2. Azure Vision Services in Depth
2.1 Agenda
Estimated reading time: ~12 minutes
Learning Outcomes
- Map each CV task to the correct Azure Vision service
- Explain when to use Custom Vision vs. Azure AI Vision (prebuilt)
- Identify the Computer Vision service-selection scenarios most likely to appear on the AI-900 exam
- Describe the responsible AI considerations specific to vision systems deployed in production
2.2 Glossary
| Term | Quick Explanation |
|---|---|
| Azure AI Vision | Microsoft's prebuilt service for analyzing images and video: Image Analysis, OCR/Read API, and Spatial Analysis. |
| Custom Vision | A service that lets you train your own image classification or object detection model on your own data. |
| Azure AI Face | A specialized service for Face Detection, Face Verification, and Face Identification. |
| Azure AI Document Intelligence | An intelligent OCR service specialized for documents: it extracts structured data fields from invoices, forms, IDs, and similar documents. |
| Spatial Analysis | A feature of Azure AI Vision that analyzes video to count people, measure dwell time, and track movement. |
| Custom Neural Voice | A custom voice model (an Azure AI Speech feature) trained from audio samples to create a brand voice. |
| Prebuilt Document Model | A Document Intelligence model that Microsoft has already trained for common document types: invoices, receipts, tax forms, ID cards, and so on. |
| Limited Access | A Microsoft policy that requires approval before you can use sensitive Face features such as identification and liveness detection. |
3. Azure Vision Services — Overview
| Service | Primary Capability | Prebuilt or Custom |
|---|---|---|
| Azure AI Vision | Image Analysis (tag, caption, detect objects), OCR (Read API), Spatial Analysis | Prebuilt |
| Azure AI Face | Face detection, verification, identification, liveness | Prebuilt + Limited Access |
| Azure AI Custom Vision | Train your own image classifier or object detector | Custom (you provide data + labels) |
| Azure AI Document Intelligence | Structured field extraction from document images | Prebuilt models + Custom models |
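The overview table can be turned into a small lookup for drilling service-selection questions. This is a study aid only: the task keys and the `pick_service` helper are hypothetical names chosen for illustration, not part of any Azure SDK.

```python
# Study aid only: the task keys and pick_service are hypothetical,
# not an Azure API. The mapping mirrors the overview table.
SERVICE_FOR_TASK = {
    "caption_or_tag_image": "Azure AI Vision",
    "read_text_in_image": "Azure AI Vision",
    "count_people_in_video": "Azure AI Vision",
    "verify_or_identify_face": "Azure AI Face",
    "classify_own_image_categories": "Azure AI Custom Vision",
    "detect_own_object_types": "Azure AI Custom Vision",
    "extract_invoice_fields": "Azure AI Document Intelligence",
}


def pick_service(task: str) -> str:
    """Return the service the overview table associates with a task."""
    if task not in SERVICE_FOR_TASK:
        raise ValueError(f"Unknown task: {task!r}")
    return SERVICE_FOR_TASK[task]


print(pick_service("extract_invoice_fields"))
```

A quick rule of thumb that falls out of the mapping: general images go to Azure AI Vision, your own categories go to Custom Vision, and structured documents go to Document Intelligence.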
4. Azure AI Vision (Image Analysis)
4.1 Capabilities
| Feature | What It Returns | Example |
|---|---|---|
| Image Tagging | Set of keyword tags with confidence scores | ["outdoor", "mountain", "snow", "skiing"] confidence: [0.99, 0.97, 0.95, 0.88] |
| Image Captioning | Natural language description of the image | "A person skiing down a snow-covered mountain slope" |
| Object Detection | Bounding boxes + labels for multiple objects | [Person at (120,80), Skis at (130,200)] |
| Background Removal | Separates the foreground from the background | Product photo with transparent background |
| Smart Crop | Automatically identifies the most important region | Crop thumbnail to focal point |
| Read API (OCR) | Extracts all text from an image with layout information | Scanned page → text + position of each word |
| Spatial Analysis | Detects and tracks people in video frames | Video → "Zone A: 8 people, avg dwell: 3.2 min" |
| Content Moderation | Flags adult, racy, or violent content | Image → {"isAdultContent": false, "adultScore": 0.02} |
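Because tags come back with confidence scores, a caller typically keeps only the tags above some cutoff. The sketch below uses the values from the Image Tagging row. The `(name, confidence)` pair layout and the 0.9 threshold are assumptions made for illustration and do not reproduce the actual response schema.

```python
def confident_tags(tags, threshold=0.9):
    """Keep tag names whose confidence is at or above the threshold.

    `tags` is a list of (name, confidence) pairs; this layout is an
    assumption for illustration, not the service's response format.
    """
    return [name for name, confidence in tags if confidence >= threshold]


tags = [("outdoor", 0.99), ("mountain", 0.97), ("snow", 0.95), ("skiing", 0.88)]
print(confident_tags(tags))  # ['outdoor', 'mountain', 'snow']
```

Raising the threshold trades coverage for certainty: "skiing" at 0.88 is dropped here but would be kept at a threshold of 0.85.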
4.2 Spatial Analysis Use Cases (Video vs. Image)
Object Detection (Image Analysis) looks at a single static image. Spatial Analysis is designed for video streams — it tracks objects across time. This allows it to calculate metrics that are impossible with single images:
| Industry | Use Case | Metric Tracked |
|---|---|---|
| Retail | Measure time customers spend at product displays | Dwell time per display zone |
| Airports | Count passengers in security queues | Queue length, wait time |
| Hospitals | Monitor social distancing compliance | Distance between individuals |
| Manufacturing | Enforce PPE usage in restricted zones | Presence of hard hat, safety vest |
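To see why a metric such as dwell time needs video rather than a single image, consider a minimal sketch that derives it from tracked sightings. The `(timestamp_seconds, person_id, zone)` tuples are a hypothetical, simplified stand-in for tracking output; the real Spatial Analysis feature emits events in its own format.

```python
from collections import defaultdict


def average_dwell_seconds(observations):
    """Average time between first and last sighting of each person, per zone.

    `observations` holds hypothetical (timestamp_seconds, person_id, zone)
    tuples; a single image would give only one timestamp per person.
    """
    first_last = {}  # (zone, person_id) -> [first_seen, last_seen]
    for timestamp, person_id, zone in observations:
        key = (zone, person_id)
        if key not in first_last:
            first_last[key] = [timestamp, timestamp]
        else:
            span = first_last[key]
            span[0] = min(span[0], timestamp)
            span[1] = max(span[1], timestamp)

    dwell_by_zone = defaultdict(list)
    for (zone, _person), (first, last) in first_last.items():
        dwell_by_zone[zone].append(last - first)
    return {zone: sum(d) / len(d) for zone, d in dwell_by_zone.items()}


observations = [
    (0, "p1", "A"), (60, "p1", "A"), (180, "p1", "A"),
    (30, "p2", "A"), (90, "p2", "A"),
    (10, "p3", "B"), (40, "p3", "B"),
]
print(average_dwell_seconds(observations))  # {'A': 120.0, 'B': 30.0}
```

The calculation depends on recognizing the same person across frames, which is exactly what per-image object detection cannot provide.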
5. Azure AI Custom Vision
5.1 When Prebuilt Is Not Enough
The prebuilt Azure AI Vision models are trained on general image datasets. They recognize broad categories (dogs, cats, cars) but fall short on domain-specific tasks:
| Task Where General Vision Falls Short | Custom Vision Solution |
|---|---|
| Classifying proprietary product SKUs by photo | Train a classifier on your product photos with SKU labels |
| Detecting defects specific to your manufacturing process | Train a detector on images of your specific defect patterns |
| Identifying rare species or custom object types | Provide labeled examples so Custom Vision learns the pattern |
5.2 Custom Vision Workflow
Step 1: Labeling Best Practices
A model is only as good as its labels. "Garbage in, garbage out."
- Draw bounding boxes tightly around objects.
- Ensure balanced classes (don't have 500 images of defect A and only 10 of defect B).
- Include background variations and different lighting.
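The class-balance advice above can be checked before training with a quick count. This is a minimal sketch: `imbalance_ratio` is a hypothetical helper, and the label list stands in for whatever tags your dataset actually uses.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Ratio of the most common class count to the least common one."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    return max(counts.values()) / min(counts.values())


# The unbalanced case from the bullet above: 500 of defect A, 10 of defect B.
labels = ["defect_a"] * 500 + ["defect_b"] * 10
print(imbalance_ratio(labels))  # 50.0
```

A ratio near 1 means the classes are roughly balanced; a value like 50 suggests collecting more images of the rare class before training.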
Step 2: Evaluation Metrics
Custom Vision provides two key metrics. You must understand the trade-off:
- Precision: Of all the objects the model flagged as defects, what percentage actually were defects? (Low precision = many false positives, "crying wolf".)
- Recall: Of all the actual defects, what percentage did the model find? (Low recall = many false negatives, "missed defects".)
Exam Tip: In medical diagnosis or safety inspection, Recall is more important (you'd rather have a false alarm than miss a tumor). In automated sorting, Precision might be more important.
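The two metrics follow directly from counts of true positives, false positives, and false negatives. The sketch below uses made-up counts purely to show the arithmetic; they are not results from any real model.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw prediction counts."""
    predicted = true_positives + false_positives  # everything flagged as a defect
    actual = true_positives + false_negatives     # every real defect
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Illustrative counts only: 80 correct detections, 20 false alarms, 40 misses.
precision, recall = precision_recall(
    true_positives=80, false_positives=20, false_negatives=40
)
print(f"precision={precision:.2f} recall={recall:.2f}")
# precision=0.80 recall=0.67
```

Here the model rarely cries wolf (precision 0.80) but misses a third of the real defects (recall 0.67), which would be the wrong trade-off for a safety inspection.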
5.3 Custom Vision vs. Azure Machine Learning
| Dimension | Azure AI Custom Vision | Azure Machine Learning |
|---|---|---|
| Expertise needed | Minimal: no ML coding | Data science skills required |
| Supported tasks | Image classification + Object detection only | Any ML task |
| Control | Limited: fixed architecture | Full control over model architecture |
| Time to model | Hours | Days to weeks |
| Best for | Domain-specific vision without ML team | Advanced custom CV with specific architecture needs |
6. Azure AI Document Intelligence
6.1 Service Architecture
6.2 Prebuilt Models
| Model | Document Type | Key Fields Extracted |
|---|---|---|
| Invoice | Business invoices | Vendor name, invoice date, line items, subtotal, tax, total amount |
| Receipt | Retail receipts | Merchant name, transaction date, items purchased, total |
| ID Document | Passports, driver's licenses, ID cards | Name, date of birth, document number, expiry date |
| Tax Form (W-2, 1099) | US tax documents | Employer info, wages, federal tax withheld |
| Business Card | Business cards | Name, company, email, phone, address |
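Extracted fields are usually consumed together with a confidence score, so that uncertain values can be routed to a person for review. The sketch below assumes a simplified field layout (a `content` string and a `confidence` number per field) and uses invented sample values; the field names, the 0.8 threshold, and the `split_by_confidence` helper are all illustrative assumptions, not the service's full response schema.

```python
def split_by_confidence(fields, threshold=0.8):
    """Separate extracted fields into accepted values and ones needing review.

    `fields` maps a field name to {"content": str, "confidence": float};
    this is a simplified, assumed layout for illustration.
    """
    accepted, needs_review = {}, {}
    for name, field in fields.items():
        target = accepted if field["confidence"] >= threshold else needs_review
        target[name] = field["content"]
    return accepted, needs_review


# Invented sample values, not real service output.
invoice_fields = {
    "VendorName": {"content": "Contoso Ltd.", "confidence": 0.97},
    "InvoiceDate": {"content": "2024-03-15", "confidence": 0.93},
    "InvoiceTotal": {"content": "$1,250.00", "confidence": 0.62},
}
accepted, needs_review = split_by_confidence(invoice_fields)
print(accepted)
print(needs_review)
```

With these sample values the vendor name and date are accepted automatically, while the low-confidence total is held back for a human check.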
6.3 The Layout Model