2. Azure AI Services

2.1 Agenda

Estimated reading time: ~15 minutes

Learning Outcomes

Describe the capabilities and use cases of each Azure AI prebuilt service
Distinguish between services that handle similar data types (e.g., Vision vs. Document Intelligence)
Identify which service is deprecated and its replacement
Map a real-world capability to its specific Azure service

2.2 Glossary

Term	Quick Explanation
SDK	Software Development Kit: a set of libraries and tools for integrating Azure services into an application.
REST API	An HTTP-based interface; the most common way to call Azure AI services from any programming language.
Deprecated	A service that is officially being phased out: it receives no new updates and will be shut down after a period of time.
Neural Voice	A synthetic voice generated with deep learning that sounds natural and close to a real human voice.
Bounding Box	A rectangle marking an object's position in an image (coordinates x, y, width, height).
Grounding	A technique for anchoring an LLM to real, verifiable data; the opposite of hallucination (making up information).
Jailbreak	An attack on an AI system that crafts prompts to bypass the model's safety restrictions.
Schema	A structure defining the data fields to extract from a document (for example: invoice_number, total_amount, date).

3. Service 1: Azure AI Language

Azure AI Language is a unified NLP service that provides text analytics, language understanding, summarization, and question answering through a single API.

3.1 Capabilities

Capability	What It Does
Sentiment Analysis	Classifies text as Positive / Negative / Neutral with a confidence score
Named Entity Recognition (NER)	Extracts structured entities: people, organizations, locations, dates, quantities
Key Phrase Extraction	Identifies the most informative concepts in a document
Language Detection	Identifies the language of input text with a confidence score
PII Detection (Personally Identifiable Information)	Detects and redacts sensitive personal data (names, phone numbers, ID numbers)
Text Classification	Assigns predefined labels to text; supports custom training
Summarization	Extractive (selects the key sentences) and abstractive (generates new summary text) summarization
Question Answering	Answers natural language questions from a knowledge base
Conversational Language Understanding (CLU)	Extracts intent and entities from conversational input; successor to the deprecated LUIS

3.2 Use Cases

Customer feedback analysis (sentiment at scale)
Automated document tagging and routing
Compliance monitoring: detect PII in documents
Build FAQ bots using the Question Answering capability
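
As a rough illustration of the "single API" idea, the sketch below builds (but does not send) a sentiment analysis request for the Language REST endpoint. The resource URL is a placeholder, and the `api-version` value and the `build_sentiment_request` helper are assumptions made for this example; check the current REST reference before relying on them.

```python
import json

# Assumption: this api-version is only an example value and may be outdated.
API_VERSION = "2023-04-01"


def build_sentiment_request(endpoint: str, texts: list[str], language: str = "en") -> tuple[str, dict]:
    """Return the URL and JSON body for a sentiment analysis call (nothing is sent)."""
    url = f"{endpoint.rstrip('/')}/language/:analyze-text?api-version={API_VERSION}"
    body = {
        # Other capabilities use the same route with a different "kind".
        "kind": "SentimentAnalysis",
        "analysisInput": {
            "documents": [
                {"id": str(i), "language": language, "text": text}
                for i, text in enumerate(texts, start=1)
            ]
        },
    }
    return url, body


# Placeholder resource name, not a real endpoint.
url, body = build_sentiment_request(
    "https://example-resource.cognitiveservices.azure.com/",
    ["The delivery was fast and the staff were friendly."],
)
print(url)
print(json.dumps(body, indent=2))
```

In a real call the request would also carry the resource key in a header, and the response would report a sentiment label with confidence scores per document.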

4. Service 2: Azure AI Translator

Azure AI Translator provides real-time and batch language translation for text and documents across 100+ languages.

4.1 Capabilities

Capability	What It Does
Text Translation	Translates plain text between languages in real time
Document Translation	Translates entire documents (Word, PDF, HTML) while preserving formatting
Transliteration	Converts text to a different script, e.g., Arabic script → Latin characters
Language Detection	Auto-detects source language before translation
Custom Translator	Fine-tunes the model with domain-specific terminology

4.2 Use Cases

Multilingual customer support portals
Global content publishing and localization
Real-time cross-language communication in collaboration tools
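
To make Text Translation concrete, here is a minimal sketch that assembles a request for the Translator v3 `translate` route without sending it. The key and region values are placeholders and `build_translate_request` is a hypothetical helper; the header names and query parameters reflect the v3 REST API as commonly documented, so verify them against the current reference.

```python
from urllib.parse import urlencode

# Global Translator endpoint; regional or custom endpoints may differ.
TRANSLATOR_ENDPOINT = "https://api.cognitive.microsofttranslator.com"


def build_translate_request(texts, to_langs, key, region, from_lang=None):
    """Return (url, headers, body) for a text translation call (nothing is sent)."""
    params = [("api-version", "3.0")]
    if from_lang:
        # Leaving "from" out asks the service to auto-detect the source language.
        params.append(("from", from_lang))
    params += [("to", lang) for lang in to_langs]
    url = f"{TRANSLATOR_ENDPOINT}/translate?{urlencode(params)}"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Ocp-Apim-Subscription-Region": region,
        "Content-Type": "application/json",
    }
    body = [{"Text": text} for text in texts]
    return url, headers, body


url, headers, body = build_translate_request(
    ["Where is my order?"], ["vi", "fr"], key="<your-key>", region="<your-region>"
)
print(url)
print(body)
```

One request can target several languages at once by repeating the `to` parameter, which suits the multilingual support portal use case.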

5. Service 3: Azure AI Speech

Azure AI Speech enables bidirectional conversion between spoken audio and text, plus speaker recognition and voice synthesis.

5.1 Capabilities

Capability	What It Does
Speech-to-Text (STT)	Transcribes audio in real time or from batch files
Text-to-Speech (TTS)	Synthesizes natural-sounding speech using Neural Voices
Speech Translation	Translates spoken audio to text or audio in another language, in real time
Speaker Recognition	Verifies or identifies a speaker's identity from audio
Pronunciation Assessment	Evaluates pronunciation accuracy; useful for language learning
Custom Speech	Fine-tunes STT for domain-specific vocabulary (medical, legal, technical)
Custom Neural Voice	Creates a brand-specific synthetic voice from recorded samples

5.2 Use Cases

Call center transcription and analytics
Accessibility features: voice control, screen readers
Real-time interpretation in multilingual meetings
Language learning applications with pronunciation feedback
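
Text-to-Speech requests are often expressed as SSML (Speech Synthesis Markup Language), where a `voice` element selects the Neural Voice. The sketch below only builds and parses such a document locally. The `build_ssml` helper is hypothetical, and the voice name `en-US-JennyNeural` is used as an example; confirm which voices are available for your resource and region.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Wrap plain text in a minimal SSML document that selects one voice."""
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


ssml = build_ssml("Tom & Jerry will arrive at 5 pm.")
# Parsing the result confirms that it is well-formed XML.
root = ET.fromstring(ssml)
print(ssml)
print(root[0].get("name"))
```

Escaping the input text matters because characters such as `&` and `<` would otherwise break the markup.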

6. Service 4: Azure AI Vision

Azure AI Vision analyzes images and video to generate tags, descriptions, text, and spatial insights.

6.1 Capabilities

Capability	What It Does
Image Analysis	Returns tags, captions, objects, colors, and metadata from images
Object Detection	Detects and localizes multiple objects with bounding boxes
OCR (Read API)	Extracts printed and handwritten text from images
Spatial Analysis	Detects people, tracks movement, and measures occupancy in video streams
Smart Crop	Automatically identifies the most important region of an image for thumbnails

6.2 Azure AI Vision vs. Azure AI Document Intelligence

This distinction is frequently tested in AI-900:

Dimension	Azure AI Vision	Azure AI Document Intelligence
Primary input	General images	Document images (invoices, forms, contracts)
Output	Tags, objects, captions, general text	Structured fields: key-value pairs, tables, document structure
Intelligence level	"What is in this image?"	"What does this document say, field by field?"
Example	Photo of a park → tags: [grass, trees, dog]	Photo of an invoice → `{invoice_no: "INV-001", total: "$500"}`
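
The practical difference shows up in what downstream code has to do with each output. The sketch below uses made-up data shaped after the invoice example in the table: `ocr_lines` stands in for plain OCR text and `document_fields` for extracted fields. Neither is an actual SDK response type, and the regular expression is just one hand-written way to recover a value from unstructured lines.

```python
import re

# Hypothetical OCR-style output: ordered text lines with no field labels.
ocr_lines = ["ACME Supplies", "Invoice No: INV-001", "Total: $500"]

# Hypothetical field-extraction output, mirroring the invoice example above.
document_fields = {"invoice_no": "INV-001", "total": "$500"}


def total_from_ocr(lines):
    """Recover the total from raw text lines with a hand-written pattern."""
    for line in lines:
        match = re.search(r"Total:\s*(\$\d+(?:\.\d{2})?)", line)
        if match:
            return match.group(1)
    return None


# With raw OCR text, every field needs its own parsing rule.
print(total_from_ocr(ocr_lines))
# With structured fields, the value is a direct lookup.
print(document_fields["total"])
```

Parsing rules like this one tend to break when the document layout changes, which is the gap that field-level extraction is meant to close.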

6.3 Use Cases

Auto-tagging product photos in e-commerce
Retail shelf monitoring
Security and surveillance analytics

7. Service 5: Azure AI Document Intelligence

Azure AI Document Intelligence (formerly Form Recognizer) automates the extraction of structured data from documents using AI-enhanced OCR and ML models.

7.1 Capabilities

Capability	What It Does
Prebuilt Models	Ready-to-use extraction for invoices, receipts, ID cards, tax forms, and business cards
Layout Model	Extracts text, tables, and structure from any document, without domain-specific training
Custom Model	Train on your own labeled documents to extract domain-specific fields
Composed Model	Combines multiple custom models into one endpoint for mixed document types
RAG Ingestion	Pre-processes documents for use in Azure AI Search and RAG pipelines

7.2 Use Cases

Automated invoice processing in accounts payable
KYC (Know Your Customer) document verification in banking
Medical record digitization
Logistics: automated bill of lading extraction

8. Service 6: Azure AI Content Safety

Azure AI Content Safety detects and filters harmful, inappropriate, or policy-violating content in text, images, and multimodal inputs, for both user-generated and AI-generated content.

8.1 Capabilities

Capability	What It Does
Text Moderation	Detects hate speech, violence, sexual content, and self-harm in text
Image Moderation	Detects harmful visual content in images
Prompt Shields	Protects against jailbreak attacks and prompt injection
Groundedness Detection	Verifies whether an LLM's response is supported by the provided source material
Protected Material Detection	Identifies copyrighted text or code in model outputs
Custom Categories	Define your own harmful content categories specific to your application

8.2 Use Cases

Social media platforms — moderation of user-generated content
Customer-facing chatbots — prevent harmful AI responses
Developer tools — detect copyrighted code snippets in AI code suggestions
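
Moderation results are typically turned into an allow or block decision by comparing a per-category severity against a threshold chosen by the application. The sketch below assumes a response shaped as a `categoriesAnalysis` list of `category` and `severity` pairs; that shape, the sample values, and the default threshold of 4 are assumptions for illustration and should be checked against the service reference.

```python
def flagged_categories(analysis: dict, threshold: int = 4) -> list[str]:
    """Return the categories whose severity meets or exceeds the threshold."""
    return [
        item["category"]
        for item in analysis.get("categoriesAnalysis", [])
        if item.get("severity", 0) >= threshold
    ]


# Made-up result for a single piece of text.
sample = {
    "categoriesAnalysis": [
        {"category": "Hate", "severity": 0},
        {"category": "SelfHarm", "severity": 0},
        {"category": "Sexual", "severity": 2},
        {"category": "Violence", "severity": 4},
    ]
}

flagged = flagged_categories(sample)
print("blocked" if flagged else "allowed", flagged)
```

A lower threshold blocks more content at the cost of more false positives, so the right value depends on the audience and the platform's policy.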

9. Service 7: Azure AI Content Understanding

Azure AI Content Understanding extracts structured insights from unstructured, multimodal content (documents, images, audio, and video) using generative AI.

9.1 Capabilities

Capability	What It Does
Prebuilt Analyzer	Extract key fields from common content types using a ready-made schema
Custom Analyzer	Define your own schema for domain-specific field extraction
Multimodal Processing	Analyze text, images, audio, and video in a single pipeline
RAG Ingestion	Structure content for use in AI Search and agent workflows

9.2 Content Understanding vs. Document Intelligence

Dimension	Document Intelligence	Content Understanding
Data types	Documents only (PDF, images of documents)	Multimodal (text, image, audio, video)
Extraction approach	OCR + ML field detection	Generative AI + schema mapping
Best for	Structured business documents	Unstructured mixed-media content

9.3 Use Cases

Call center recording analysis: extract topics, outcomes, sentiment from audio
Knowledge management: structure unstructured enterprise content for AI search
Media intelligence: index and query video content by topic

10. Service Summary Reference

Service	Primary Data	Core Capability	Common Use Case
Azure AI Language	Text	Sentiment, NER, Q&A, CLU	Customer feedback, FAQ bot
Azure AI Translator	Text	Cross-language translation	Multilingual support portal
Azure AI Speech	Audio	STT, TTS, Speaker ID	Call transcription, voice apps
Azure AI Vision	Images/Video	Analysis, OCR, Detection	Product tagging, surveillance
Azure AI Document Intelligence	Document images	Structured field extraction	Invoice processing, KYC
Azure AI Content Safety	Text/Images	Harmful content detection	Content moderation, chatbot safety
Azure AI Content Understanding	Multimodal	Schema-based insight extraction	Call center analytics, knowledge base
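
The summary table can double as a lookup when practicing how to map a capability or data type to a service. The sketch below copies three columns of the table into a list and searches them by keyword; `find_services` is a hypothetical helper for self-study, not part of any Azure SDK.

```python
# (service, primary data, core capability), copied from the summary table.
SUMMARY = [
    ("Azure AI Language", "Text", "Sentiment, NER, Q&A, CLU"),
    ("Azure AI Translator", "Text", "Cross-language translation"),
    ("Azure AI Speech", "Audio", "STT, TTS, Speaker ID"),
    ("Azure AI Vision", "Images/Video", "Analysis, OCR, Detection"),
    ("Azure AI Document Intelligence", "Document images", "Structured field extraction"),
    ("Azure AI Content Safety", "Text/Images", "Harmful content detection"),
    ("Azure AI Content Understanding", "Multimodal", "Schema-based insight extraction"),
]


def find_services(keyword: str) -> list[str]:
    """Return services whose data type or core capability mentions the keyword."""
    needle = keyword.lower()
    return [
        name
        for name, data, capability in SUMMARY
        if needle in data.lower() or needle in capability.lower()
    ]


print(find_services("OCR"))
print(find_services("audio"))
print(find_services("extraction"))
```

Keywords that match more than one row, such as "extraction", point at exactly the service pairs that the comparison tables in sections 6.2 and 9.2 set apart.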

11. Discussion Questions

Q1 — Picking the Right Tool: A logistics company wants to automatically extract the sender, recipient, weight, and tracking number from scanned bills of lading. They already tried Azure AI Vision's Read API but got raw text with no structure. What service should they use instead, and why?

Q2 — Content Safety in Generative AI: A company deploys an Azure OpenAI-powered customer service chatbot. A user finds a way to make the chatbot respond with competitor product recommendations and offensive language. What Azure AI Content Safety capabilities would address each attack vector, and at what point in the pipeline should they be applied?

Q3 — Custom Speech Trade-off: A medical transcription service uses Azure AI Speech but notices a 15% error rate on drug names and surgical terminology. They are considering Custom Speech fine-tuning. What are the costs and risks of this approach vs. simply post-processing the output with a medical terminology dictionary?

Made by Anh Tu - Share to be share
