2. Azure NLP Services in Depth

2.1 Agenda

Estimated reading time: ~12 minutes

Learning Outcomes

Explain the full capability set of Azure AI Language beyond basic text analytics
Describe how Conversational Language Understanding (CLU) processes intents and entities
Map Azure AI Speech capabilities to the correct business scenario
Apply the service selection framework to NLP-specific exam scenarios

2.2 Glossary

Term	Quick Explanation
Utterance	A sentence or stretch of speech that a user sends to a conversational AI system. Example: "Book me a flight to Da Nang next Monday."
CLU	Conversational Language Understanding: the service that recognizes intents and entities in conversational utterances. Successor to LUIS.
Knowledge Base	A collection of question-answer (Q&A) pairs used to build a question-answering chatbot (FAQ bot).
Custom Training	Supplying your own data to tailor (fine-tune) a service to a specific domain.
Confidence Score	The probability the model assigns to a prediction, from 0.0 to 1.0.
Phoneme	The smallest unit of sound that can distinguish meaning. English has about 44 phonemes. Example: "cat" = /k/ + /æ/ + /t/, i.e. 3 phonemes.
Neural Voice	A highly natural synthetic voice generated with deep learning; often nearly indistinguishable from a human voice.
Diarization	The technique of identifying and labeling each speaker in a multi-speaker conversation.
Custom Speech	A version of Azure AI Speech fine-tuned on data from a specific domain, improving accuracy for specialized terminology.
Dialog Turn	One question-and-answer exchange in a multi-turn conversation. The system must remember context from earlier turns.

3. Azure AI Language — Full Capability Map

Workshop 2 introduced Azure AI Language at an overview level. Workshop 5 covers its internal structure and the capabilities that matter most when building real NLP applications.

3.1 Capability Groupings

3.2 Custom vs. Prebuilt Capabilities

Capability	Prebuilt (ready to use)	Custom (requires training)
Sentiment Analysis	Yes	No
NER	Yes (general entities)	Yes (domain-specific entities)
Text Classification	No	Yes (your own categories)
Question Answering	No	Yes (your own Q&A pairs)
Conversational Language Understanding	No	Yes (your intents and entities)
Summarization	Yes	No
Language Detection	Yes	No
PII Detection	Yes	Yes (custom entity types)

4. Deep Dive: Conversational Language Understanding (CLU)

4.1 What CLU Does

CLU extracts two structured pieces of information from a user's natural-language utterance, the intent and the entities, so that a conversational application can take the correct action.

4.2 Intent + Entity Extraction

Example utterance: "Book a window seat on a flight to Da Nang on Friday for two people."

Intent:    book_flight          (confidence: 0.97)
Entities:
  seat_type:    "window seat"
  destination:  "Da Nang"
  date:         "Friday"
  passenger_count: "two"

The application uses the intent to decide which action to take and the entities to fill in that action's parameters.
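A minimal sketch of that dispatch step, assuming CLU has already returned its prediction. The field names (`topIntent`, `intents[].category`, `intents[].confidenceScore`, `entities[].category`, `entities[].text`) follow the CLU runtime response as we understand it and should be checked against the current REST reference. `book_flight`, `HANDLERS`, and the 0.7 threshold are hypothetical.

```python
from typing import Any, Callable, Dict

# Hypothetical threshold: below this, treat the prediction as "not understood".
CONFIDENCE_THRESHOLD = 0.7


def book_flight(destination: str, date: str,
                passenger_count: str = "1", seat_type: str = "any") -> str:
    """Hypothetical action; a real app would call a booking backend here."""
    return (f"Booked: {destination} on {date}, "
            f"{passenger_count} passenger(s), seat: {seat_type}")


# Intent name -> handler. Intent names must match those defined in the CLU project.
HANDLERS: Dict[str, Callable[..., str]] = {"book_flight": book_flight}


def handle_prediction(prediction: Dict[str, Any]) -> str:
    """Use the top intent to pick an action and the entities as its arguments."""
    top = prediction["topIntent"]
    score = next((i["confidenceScore"] for i in prediction["intents"]
                  if i["category"] == top), 0.0)
    if score < CONFIDENCE_THRESHOLD or top not in HANDLERS:
        return "Sorry, could you rephrase that?"  # fallback path
    # Assumes entity category names match the handler's parameter names.
    params = {e["category"]: e["text"] for e in prediction["entities"]}
    return HANDLERS[top](**params)


# Prediction shaped like the example above (values copied from it).
sample = {
    "topIntent": "book_flight",
    "intents": [{"category": "book_flight", "confidenceScore": 0.97}],
    "entities": [
        {"category": "seat_type", "text": "window seat"},
        {"category": "destination", "text": "Da Nang"},
        {"category": "date", "text": "Friday"},
        {"category": "passenger_count", "text": "two"},
    ],
}

print(handle_prediction(sample))
# Booked: Da Nang on Friday, two passenger(s), seat: window seat
```

Keeping intent handling in a lookup table means adding a new intent to the CLU project only requires registering one more handler.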

4.3 CLU vs. LUIS (Deprecated)

Dimension	LUIS (Deprecated)	CLU (Current)
Status	Retired; existing apps must migrate to CLU	Current service, actively supported
Training approach	Separate service with its own portal	Unified in Azure AI Language
Multilingual support	Limited	Native multilingual support
Orchestration	Manual integration	Built-in workflow orchestration

Exam note: AI-900 may still reference LUIS in a historical context. Know that CLU is its successor and that all new projects should use CLU.

4.4 CLU Orchestration Workflow — How It Ties Together

A real conversational application rarely uses only CLU or only Question Answering. It usually needs both, coordinated by an Orchestration Workflow that decides, for each utterance, which connected project (CLU or Question Answering) should handle it.
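A minimal sketch of the routing idea. In the service, the orchestration workflow is itself a trained project whose intents point at connected CLU and Question Answering projects. Here a keyword rule (`toy_orchestrator`) stands in for that trained model so the routing logic can run offline. The project names, scores, and threshold are hypothetical.

```python
from typing import Callable, Dict, Tuple

# Hypothetical names for the two connected projects.
QA_PROJECT = "hr_faq"
CLU_PROJECT = "booking_actions"


def toy_orchestrator(utterance: str) -> Tuple[str, float]:
    """Stand-in for the trained orchestration model (keyword rules, not the real classifier)."""
    text = utterance.lower().strip()
    if text.endswith("?") or text.startswith(("what", "how", "when", "why")):
        return QA_PROJECT, 0.9
    if any(word in text.split() for word in ("book", "cancel", "open")):
        return CLU_PROJECT, 0.9
    return "None", 0.1


def route(utterance: str, projects: Dict[str, Callable[[str], str]],
          threshold: float = 0.5) -> str:
    """Send the utterance to whichever connected project the orchestrator picks."""
    target, score = toy_orchestrator(utterance)
    if score < threshold or target not in projects:
        return "fallback: ask for clarification or hand off to a human"
    return projects[target](utterance)


# Each "project" is a placeholder function standing in for a real service call.
projects = {
    QA_PROJECT: lambda u: f"[Question Answering] {u}",
    CLU_PROJECT: lambda u: f"[CLU] {u}",
}
print(route("How many vacation days do I get?", projects))
print(route("Book a flight to Da Nang", projects))
print(route("blue elephant", projects))
```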

Dialog Management is the layer that makes multi-turn conversations possible:

Problem	Dialog Management Solution
User says "him": who is "him"?	Track entities from previous turns and resolve pronouns from context
User changes topic mid-conversation	Detect the intent shift; decide whether to switch topics or confirm with the user
Slot filling	Chatbot collects multiple entities over several turns: "Where?" → "When?" → "How many?"
Fallback	When no intent is recognized, hand off to a human agent or ask for clarification
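A minimal sketch of slot filling and fallback. It assumes each turn's intent and entities have already been extracted (for example by CLU) and that the conversation state is kept by the bot application between turns. `DialogState`, `next_turn`, and the slot names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical slots for a booking flow, in the order the bot asks for them.
REQUIRED_SLOTS = ["destination", "date", "passenger_count"]
PROMPTS = {
    "destination": "Where would you like to fly?",
    "date": "When do you want to travel?",
    "passenger_count": "How many passengers?",
}


@dataclass
class DialogState:
    """Conversation memory kept by the bot between turns."""
    intent: Optional[str] = None
    slots: Dict[str, str] = field(default_factory=dict)


def next_turn(state: DialogState, intent: str, entities: Dict[str, str]) -> str:
    """Update the state with one turn's intent/entities and choose the bot's reply."""
    if intent != "None":
        state.intent = intent          # a recognized intent becomes the active goal
    if state.intent is None:
        return "Sorry, I didn't catch that. Could you rephrase?"  # fallback
    state.slots.update(entities)       # remember entities from every turn
    for slot in REQUIRED_SLOTS:
        if slot not in state.slots:
            return PROMPTS[slot]       # slot filling: ask for the next missing value
    s = state.slots
    return (f"Confirm: {s['passenger_count']} passenger(s) "
            f"to {s['destination']} on {s['date']}?")


state = DialogState()
print(next_turn(state, "book_flight", {}))                   # Where would you like to fly?
print(next_turn(state, "None", {"destination": "Da Nang"}))  # When do you want to travel?
print(next_turn(state, "None", {"date": "Friday"}))          # How many passengers?
print(next_turn(state, "None", {"passenger_count": "two"}))
# Confirm: two passenger(s) to Da Nang on Friday?
```

Turns two to four carry no intent of their own ("Da Nang" alone), yet the bot still knows it is booking a flight because the active intent lives in `DialogState`.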

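Exam note: For multi-turn scenarios, Azure AI Language (CLU plus the orchestration workflow) supplies the language understanding for each turn: the intent, the entities, and which project should respond. The conversation state that lets an intelligent chatbot remember context across turns is typically held by the bot itself, for example one built with Azure AI Bot Service.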

4.5 Question Answering (Custom)

Build a knowledge base from:

Manually entered Q&A pairs
Existing documents (FAQ pages, PDFs, Word documents, URLs)
Chitchat templates (friendly small talk such as greetings)

The service uses NLP to find the best-matching answer to a user's question, returning:

The answer text
A confidence score
The source of the matched Q&A pair
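A toy illustration of the three outputs listed above (answer, confidence score, source). The matching here is plain word-overlap (Jaccard) similarity, swapped in for the service's own ranking model, which is far more capable. The knowledge-base entries, file names, and the 0.3 threshold are made up for the example.

```python
import re
from typing import List, NamedTuple, Optional, Set


class QAPair(NamedTuple):
    question: str
    answer: str
    source: str  # where the pair came from (hypothetical file names below)


class Match(NamedTuple):
    answer: str
    confidence: float
    source_question: str
    source: str


def tokens(text: str) -> Set[str]:
    """Lowercase word set; punctuation is ignored."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_answer(query: str, kb: List[QAPair], min_score: float = 0.3) -> Optional[Match]:
    """Return the KB answer whose question overlaps most with the query (Jaccard score)."""
    q = tokens(query)
    best, best_score = None, 0.0
    for pair in kb:
        p = tokens(pair.question)
        union = q | p
        score = len(q & p) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = pair, score
    if best is None or best_score < min_score:
        return None  # no confident match: a bot would fall back here
    return Match(best.answer, round(best_score, 2), best.question, best.source)


kb = [
    QAPair("How many vacation days do I get?",
           "Full-time staff receive 12 days per year.", "hr_policy.pdf"),
    QAPair("How do I reset my password?",
           "Use the self-service portal.", "it_faq.html"),
]
print(best_answer("How many vacation days do employees get?", kb))
print(best_answer("What is the weather?", kb))  # None
```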

5. Deep Dive: Azure AI Speech

5.1 The Speech Processing Pipeline

For Speech-to-Text, the process involves two stages: acoustic modeling (audio → phonemes) and language modeling (phonemes → the most probable sequence of words). Custom Speech can improve both layers for domain-specific vocabulary.
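A toy version of the language-modeling stage. It assumes the acoustic stage has already produced phonemes grouped per word (real decoders search word boundaries and acoustics jointly). A pronunciation lexicon maps each group to candidate words, and a bigram model picks the most probable sentence. This is the classic textbook picture, not how Azure AI Speech is built internally, and every probability below is invented.

```python
import math
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Toy pronunciation lexicon: phoneme sequence -> candidate words (homophones share a key).
LEXICON: Dict[Tuple[str, ...], List[str]] = {
    ("p", "l", "iː", "z"): ["please"],
    ("r", "aɪ", "t"): ["right", "write", "rite"],
    ("ɪ", "t"): ["it"],
    ("d", "aʊ", "n"): ["down"],
}

# Toy bigram language model: P(word | previous word). Values are invented.
BIGRAMS: Dict[Tuple[str, str], float] = {
    ("<s>", "please"): 0.5,
    ("please", "write"): 0.4,
    ("please", "right"): 0.01,
    ("write", "it"): 0.3,
    ("right", "it"): 0.05,
    ("rite", "it"): 0.01,
    ("it", "down"): 0.4,
}
UNSEEN = 1e-4  # crude floor for word pairs missing from the table


def lm_log_prob(words: Sequence[str]) -> float:
    """Sum of log bigram probabilities, starting from the sentence-start token."""
    prev, total = "<s>", 0.0
    for w in words:
        total += math.log(BIGRAMS.get((prev, w), UNSEEN))
        prev = w
    return total


def decode(phoneme_groups: List[List[str]]) -> List[str]:
    """Pick the word sequence the LM scores highest among all lexicon readings."""
    candidates = product(*(LEXICON[tuple(g)] for g in phoneme_groups))
    return list(max(candidates, key=lm_log_prob))


groups = [["p", "l", "iː", "z"], ["r", "aɪ", "t"], ["ɪ", "t"], ["d", "aʊ", "n"]]
print(decode(groups))  # ['please', 'write', 'it', 'down']
```

In this picture, a domain term the lexicon or language model has never seen (say, a drug name) cannot be chosen, which is the gap Custom Speech training is meant to close.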

5.2 Speech Capabilities Deep Dive

Capability	Key Detail	Typical Use / When to Customize
Real-time STT	Streams audio and returns the transcript as the speaker talks	Use Custom Speech when the domain has specialized vocabulary
Batch STT	Submits stored audio files to Azure and retrieves the transcripts later; the application does not wait synchronously for the result	Large volumes of recorded audio, such as call-center archives
TTS: Neural Voices	400+ prebuilt neural voices across 140+ languages	When brand consistency is important, use Custom Neural Voice
Speaker Diarization	Labels each speaker in a multi-person conversation	Call-center transcription, to separate the agent from the customer
Pronunciation Assessment	Scores pronunciation at phoneme, word, and sentence level	Language-learning apps, accent training
Keyword Spotting	Detects a specific word or phrase to trigger an action	Wake-word detection ("Hey Cortana"), always-on monitoring
Speech Translation	Translates spoken audio into text or speech in another language	Real-time multilingual meetings, conference interpretation
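Diarization answers "who spoke when", but it labels speakers rather than identifying them; deciding which label is the agent and which is the customer is left to application logic. A minimal sketch of post-processing diarized output for the call-center case, assuming a hypothetical segment format (speaker label, start and end in seconds, text) rather than the SDK's actual result objects:

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Segment(NamedTuple):
    speaker: str    # label assigned by diarization, e.g. "Speaker 1" (hypothetical format)
    start_s: float  # segment start, seconds from the beginning of the call
    end_s: float    # segment end
    text: str       # transcribed words for this segment


def talk_time(segments: List[Segment]) -> Dict[str, float]:
    """Total seconds of speech per speaker label."""
    totals: Dict[str, float] = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.end_s - seg.start_s
    return dict(totals)


def merge_turns(segments: List[Segment]) -> List[Tuple[str, str]]:
    """Join consecutive segments from the same speaker into one conversational turn."""
    turns: List[Tuple[str, str]] = []
    for seg in sorted(segments, key=lambda s: s.start_s):
        if turns and turns[-1][0] == seg.speaker:
            turns[-1] = (seg.speaker, turns[-1][1] + " " + seg.text)
        else:
            turns.append((seg.speaker, seg.text))
    return turns


segments = [
    Segment("Speaker 1", 0.0, 4.5, "Thank you for calling."),
    Segment("Speaker 1", 4.5, 6.0, "How can I help?"),
    Segment("Speaker 2", 6.5, 11.5, "I want to cancel my order."),
]
print(talk_time(segments))  # {'Speaker 1': 6.0, 'Speaker 2': 5.0}
for speaker, text in merge_turns(segments):
    print(f"{speaker}: {text}")
```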

5.2b TTS: Concatenative vs. Neural Voice

Not all Text-to-Speech is equal. Understanding the difference helps explain why Neural Voices sound human:

Dimension	Concatenative TTS	Neural Voice
How it works	Stitches together pre-recorded audio fragments	Generates the speech waveform end-to-end from text using deep learning
Sound quality	Robotic, with audible seams between fragments	Natural prosody, emotion, and rhythm
Flexibility	Fixed voice recordings; cannot adapt tone	Adjustable speed, pitch, and speaking style
Azure example	Legacy TTS (deprecated)	Azure AI Speech — Neural Voices (400+)
Custom Neural Voice	N/A	Train on your brand's voice samples — creates a unique branded voice
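Neural voices are usually steered with SSML (Speech Synthesis Markup Language). A minimal sketch that builds an SSML document selecting a voice and adjusting rate, pitch, and, optionally, an Azure-specific speaking style via `mstts:express-as`. The voice name `en-US-JennyNeural`, the `cheerful` style, and the percentage values are assumptions to verify against the voice gallery, since supported styles vary by voice; `xml:lang` is fixed to `en-US` here and should match the chosen voice. The resulting string could then be passed to the Speech SDK's `speak_ssml_async`.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"


def build_ssml(text: str, voice: str = "en-US-JennyNeural",
               rate: str = "default", pitch: str = "default",
               style: Optional[str] = None) -> str:
    """Wrap text in SSML that selects a neural voice and adjusts its delivery."""
    # escape() protects characters like & and < inside the spoken text.
    inner = f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>{escape(text)}</prosody>"
    if style:  # Azure-specific speaking style; availability depends on the voice
        inner = f"<mstts:express-as style={quoteattr(style)}>{inner}</mstts:express-as>"
    return (f'<speak version="1.0" xmlns="{SSML_NS}" xmlns:mstts="{MSTTS_NS}" '
            f'xml:lang="en-US"><voice name={quoteattr(voice)}>{inner}</voice></speak>')


ssml = build_ssml("Your order has shipped!", rate="-10%", pitch="+5%", style="cheerful")
ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
print(ssml)
```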

5.3 Custom Speech — When and Why

Custom Speech improves STT accuracy in scenarios such as:

Scenario	Problem with Base STT	Custom Speech Solution
Medical transcription	Misrecognizes drug names, dosages, and procedures	Train on a medical vocabulary dataset
Legal proceedings	Struggles with legal Latin terms and case citations	Train on a legal corpus
Manufacturing	Factory noise, technical part names	Acoustic + language model adaptation
Accented speech	Regional accents reduce accuracy	Train on representative accent samples
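Whether customization helped is normally judged by word error rate (WER): the substitutions, deletions, and insertions needed to turn the model's transcript into a human reference, divided by the number of reference words. "Accuracy" is often quoted loosely as 1 − WER. A minimal sketch using word-level edit distance; the medical sentence and both model outputs are hypothetical.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in the reference."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j] (Levenshtein over words)
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)


reference = "take metformin 500 mg twice daily"
base_output = "take met for men 500 mg twice daily"  # hypothetical base-model output
custom_output = "take metformin 500 mg twice daily"  # hypothetical Custom Speech output
print(word_error_rate(reference, base_output))    # 0.5
print(word_error_rate(reference, custom_output))  # 0.0
```

The base output needs three edits (one substitution, two insertions) against six reference words, hence 0.5.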

6. NLP Service Selection — Exam Scenarios

These are the most frequently tested distinctions in AI-900:

Scenario	Correct Service + Capability	Why Not the Alternative
Classify customer support tickets into 15 custom categories	AI Language — Custom Text Classification	Prebuilt NER extracts entities, not custom categories
Build a chatbot that answers questions from a 200-page HR policy PDF	AI Language — Question Answering (custom KB)	Not CLU — CLU is for intent/entity, not document Q&A
Detect when a user says "cancel" in an ongoing phone call	AI Speech — Keyword Spotting	STT transcribes everything; keyword spotting listens for one specific word or phrase
Extract all person names and company names from contracts	AI Language — NER (prebuilt)	Sentiment analysis extracts tone, not structured entities
Identify which of three speakers is speaking in a recorded meeting	AI Speech — Speaker Diarization	Speaker Recognition verifies identity; Diarization separates speakers
Detect if a customer review contains the customer's home address	AI Language — PII Detection	NER identifies location entities; PII Detection specifically flags sensitive data
Understand what a user wants to do in a booking chatbot	AI Language — CLU	Question Answering retrieves facts; CLU recognizes intent for action

7. Discussion Questions

Q1. CLU vs. Question Answering: A developer is building a bank's internal chatbot. Use case A: customers ask "What is the penalty for early loan repayment?", and the answer is in the bank's FAQ document. Use case B: a customer says "I want to open a savings account", and the chatbot must route them to the account-opening flow. Which Azure AI Language capability handles each, and why?

Q2. Custom Speech ROI: A Vietnamese media company wants to transcribe 10,000 hours of archived broadcast footage. The base Azure AI Speech STT achieves 85% accuracy on Vietnamese broadcast language. Custom Speech would raise this to 97% but requires 40 hours of labeled audio data and 2 weeks of ML engineering time. How would you calculate whether the investment in Custom Speech is justified?
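One way to frame the Q2 arithmetic, as a sketch. It assumes "85% / 97% accuracy" means the share of words transcribed correctly, that every remaining wrong word is fixed by hand at a flat cost, and that the investment is a one-off labeling plus engineering cost. The words-per-hour rate and all money figures in the example call are hypothetical placeholders, not data; replace them with measured values.

```python
from typing import Dict


def custom_speech_roi(audio_hours: float, base_accuracy: float,
                      custom_accuracy: float, words_per_hour: float,
                      cost_per_error: float, labeling_cost: float,
                      engineering_cost: float) -> Dict[str, float]:
    """Compare the cost of fixing extra base-model errors with the Custom Speech investment."""
    total_words = audio_hours * words_per_hour
    # Net reduction in misrecognized words from switching to the custom model.
    errors_avoided = total_words * (custom_accuracy - base_accuracy)
    savings = errors_avoided * cost_per_error
    investment = labeling_cost + engineering_cost
    return {
        "errors_avoided": round(errors_avoided),
        "savings": round(savings, 2),
        "investment": round(investment, 2),
        "net_benefit": round(savings - investment, 2),
        # Per-error correction cost at which the investment exactly pays for itself.
        "break_even_cost_per_error": round(investment / errors_avoided, 6),
    }


# HYPOTHETICAL inputs: 9,000 words/hour (~150 words/min), $0.01 per corrected word,
# $8,000 to label 40 h of audio, $6,000 for 2 weeks of engineering.
result = custom_speech_roi(audio_hours=10_000, base_accuracy=0.85,
                           custom_accuracy=0.97, words_per_hour=9_000,
                           cost_per_error=0.01, labeling_cost=8_000,
                           engineering_cost=6_000)
print(result)
```

With these placeholder figures the investment breaks even at well under a cent per corrected word, so the decision hinges mostly on the real correction cost and speaking rate, which is where the analysis should focus.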

Q3. The Privacy Pipeline: A telehealth company uses Azure AI Speech for real-time transcription of doctor-patient calls, then passes the transcript to Azure AI Language for sentiment and keyword analysis. The transcript contains patient names, diagnoses, and medications. Design a responsible data-handling pipeline: which Azure AI capabilities should be applied at each step, and in what order?

Made by Anh Tu - Share to be shared
