Getting Started With Azure Document AI Document Intelligence API In Python (Source Code In Desc)

Jie Jenn
5 Mar 2024 · 50:24

TLDR: This Azure tutorial introduces developers to the Document Intelligence API in Python, focusing on building document processing solutions with AI. The video is divided into four parts, covering pricing, use cases, setting up resources, and exploring pre-built models. It demonstrates script development for the API client, extracting data from documents like invoices and W-2 forms, and concludes with table extraction from documents using the layout model, providing a comprehensive guide for beginners in document AI.

Takeaways

  • 📘 Azure Document Intelligence is an AI service for building document processing solutions that automatically analyze and extract information from documents.
  • 👶 This tutorial is aimed at beginners, but the API is considered one of the most useful yet challenging to learn, so prior Python and Azure experience is recommended.
  • 📈 The tutorial is divided into four parts: pricing and use cases, setting up Azure Document Intelligence, script development, and various example implementations.
  • 💰 Azure offers a free tier with 500 free pages per month, and paid tiers range from $3 to $50 per 1000 pages depending on the document type.
  • 🚫 Free tier limitations include processing only the first two pages of a document, while paid tiers allow up to 2000 pages per document.
  • 🖼️ Supported file formats include PDF, JPEG, PNG, BMP, TIFF, HEIF, and Microsoft Office files, with the latter only supporting read and layout models.
  • 🔍 The API includes pre-built models such as 'read' for text extraction and 'layout' for structured document text extraction, tables, and document structure.
  • 🏢 Common use cases for Azure Document Intelligence include invoice processing, receipt capture, legal document review, tax form processing, and bank statement analysis.
  • 🛠️ The script development section demonstrates constructing a Document Intelligence API client, handling document sources, and processing documents using the API.
  • 📑 Example implementations cover extracting data from W-2 forms, invoices, and tables within documents, showcasing the API's capabilities for different document types.
  • 📚 The tutorial concludes with instructions on navigating the API's response structure to retrieve specific data points from processed documents.

Q & A

  • What is Azure Document Intelligence?

    -Azure Document Intelligence is an AI service used to build document processing solutions that can automatically analyze and extract information from documents.

  • Who is the target audience for this Azure tutorial?

    -The tutorial is aimed at beginners, but it's more useful for those who have some experience in Python and Azure.

  • What are the four main parts of the tutorial?

    -The tutorial is divided into four parts: discussing pricing and common use cases, installing Python dependencies, setting up the Azure Document Intelligence resource, and exploring pre-built models and script development.

  • What are some common use cases for Azure Document Intelligence API?

    -Common use cases include invoice processing, receipt capture, legal document review, tax form processing, and bank statement analysis.

  • What is the free tier limitation for Azure Document Intelligence API?

    -With the free tier, you can only process the first two pages of a document. For paid tiers, you can process up to 2000 pages per document.

  • What are the supported file formats for Azure Document Intelligence API?

    -The API supports PDF and image files like JPEG, PNG, BMP, TIFF, and HEIF, as well as Microsoft Office files (Word, Excel, PowerPoint) and HTML.

  • What are the core models available in Azure Document Intelligence API?

    -The core models are the Read model, used for text extraction, and the Layout model, which can extract text and document structure in an organized format.

  • How can one obtain an API key and endpoint URL for Azure Document Intelligence API?

    -You can obtain an API key and endpoint URL by creating an instance of the Azure Document Intelligence service in the Azure portal and then opening the 'Keys and Endpoint' section under Resource Management.

  • What is the process for extracting data from a document using Azure Document Intelligence API?

    -The process involves creating a document intelligence client instance, analyzing the document with a specified model ID, and then using the result to navigate and extract data from the fields of the document.

  • How can tables be extracted from a document using Azure Document Intelligence API?

    -Tables can be extracted using the pre-built layout model. The model treats the document as free-form content and identifies any tables within it, which can then be compiled into a pandas DataFrame.

Outlines

00:00

📚 Introduction to Azure Document Intelligence API

This paragraph introduces a tutorial on utilizing Azure's Document Intelligence API with Python. It's aimed at beginners but assumes some knowledge of Python and Azure. The agenda includes an overview of pricing, use cases, free tier limitations, setting up the Azure resource, exploring pre-built models, and developing scripts for the API client. The tutorial will cover various examples, starting with basic API calls, examining responses, loading documents, and extracting data from forms like W-2, invoices, and tables from documents.

05:06

💰 Azure Document Intelligence Pricing and Limitations

The paragraph discusses the pricing model of Azure's Document Intelligence API, highlighting a free tier offering 500 pages per month. Beyond the free tier, pricing ranges from $3 to $50 per 1000 pages, depending on document type, with a special note on custom-trained models. It also outlines limitations of the free tier, such as processing only the first two pages of a document, compared to up to 2000 pages for paid subscriptions. File size limitations and supported formats are also detailed, including PDF, image files, and Microsoft Office files, with the latter only supporting read and layout models.
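
The limits above reduce to simple arithmetic. The following is a minimal sketch, assuming the page caps and per-1000-page rates quoted in the video; the function names are hypothetical, and the actual rate depends on the model and may have changed since the video was recorded.

```python
# Page caps as quoted in the video (assumed current at the time of recording).
FREE_TIER_PAGE_CAP = 2      # free tier: only the first two pages are processed
PAID_TIER_PAGE_CAP = 2000   # paid tiers: up to 2000 pages per document


def pages_processed(document_pages: int, paid: bool) -> int:
    """Return how many pages of a single document would be analyzed."""
    cap = PAID_TIER_PAGE_CAP if paid else FREE_TIER_PAGE_CAP
    return min(document_pages, cap)


def estimate_cost(pages: int, rate_per_1000: float) -> float:
    """Estimate the paid-tier cost in dollars for a page count at a given rate."""
    return pages / 1000 * rate_per_1000


if __name__ == "__main__":
    print(pages_processed(10, paid=False))
    print(estimate_cost(2000, rate_per_1000=3.0))
```

A 10-page PDF on the free tier would therefore yield results for two pages only, which is worth remembering when a response looks incomplete.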

10:10

🛠️ Setting Up Azure Document Intelligence Resource

The speaker guides viewers through setting up an Azure Document Intelligence resource, starting from creating an Azure account and subscription to navigating the Azure portal for resource group creation. The process includes naming the resource, selecting a region, and creating the Document Intelligence service instance with a globally unique name. After deployment, the tutorial covers accessing the instance dashboard, monitoring usage, and obtaining API keys and endpoint URLs for API access.

15:15

🔧 Configuring Azure Document Intelligence Client

This section explains how to configure the Azure Document Intelligence client in Python. It involves creating a configuration file for API keys and endpoints, and writing a script to instantiate the client with these credentials. The script includes helper functions to check document sources and handle file paths, culminating in a test run to ensure the client object is created correctly.
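
A minimal sketch of this setup follows. It is not the video's actual script: the `[azure]` section name, the `endpoint` and `key` option names, and all function names are assumptions chosen for illustration. The client construction assumes the `azure-ai-documentintelligence` package is installed, and the import is deferred so the helpers can be used without it.

```python
import configparser
from pathlib import Path
from urllib.parse import urlparse


def load_credentials(config_text: str) -> tuple[str, str]:
    """Parse INI-style text with a hypothetical [azure] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    section = parser["azure"]
    return section["endpoint"], section["key"]


def is_url(source: str) -> bool:
    """Return True when the document source looks like an http(s) URL."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def resolve_source(source: str) -> tuple[str, str]:
    """Classify a document source as 'url' or 'file', checking that files exist."""
    if is_url(source):
        return "url", source
    path = Path(source)
    if not path.is_file():
        raise FileNotFoundError(f"Document not found: {source}")
    return "file", str(path)


def build_client(endpoint: str, key: str):
    """Create the API client; requires the azure-ai-documentintelligence package."""
    from azure.ai.documentintelligence import DocumentIntelligenceClient
    from azure.core.credentials import AzureKeyCredential

    return DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))
```

Keeping the key in a separate configuration file, rather than in the script, makes it easier to keep credentials out of version control.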

20:20

🔎 Analyzing Documents with Pre-built Models

The paragraph delves into the process of analyzing documents using pre-built models available in the Document Intelligence API. It explains the use of the 'analyze document request' to prepare documents for analysis and the initiation of the analysis process using the client instance. The focus is on extracting information from tax forms such as the W-2, with a mention of challenges in navigating the documentation and locating model IDs.
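
The request and analysis steps can be sketched as below. This is a sketch under assumptions rather than the video's code: it builds the request as a plain dictionary using the REST field names `urlSource` and `base64Source` (the SDK also offers an `AnalyzeDocumentRequest` model), it assumes `prebuilt-tax.us.w2` is the W-2 model ID for the API version in use, and it assumes the result object exposes `as_dict()`. The helper names are hypothetical.

```python
import base64
from pathlib import Path

# Assumed model ID for the pre-built W-2 model; check the model list for your API version.
W2_MODEL_ID = "prebuilt-tax.us.w2"


def build_analyze_body(source: str) -> dict:
    """Build the analyze request body from a URL or a local file path."""
    if source.lower().startswith(("http://", "https://")):
        return {"urlSource": source}
    data = Path(source).read_bytes()
    return {"base64Source": base64.b64encode(data).decode("ascii")}


def analyze(client, model_id: str, source: str) -> dict:
    """Start the long-running analysis, wait for it, and return the result as a dict."""
    poller = client.begin_analyze_document(model_id, build_analyze_body(source))
    result = poller.result()  # blocks until the service finishes
    return result.as_dict()
```

Here `client` would be a `DocumentIntelligenceClient` instance; passing it in as a parameter keeps the function easy to exercise with a stand-in object.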

25:21

📝 Navigating the Document Analysis Results

The speaker describes how to navigate the results of document analysis, explaining the structure of the output and the significance of keys such as 'content', 'pages', and 'styles'. He shows how to reference specific pages, extract text, and understand the document hierarchy. The section also touches on passing the results to generative APIs or iterating through documents to extract specific field values.
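
To make that structure concrete, here is a small sketch that pulls the text of one page out of a result dictionary. The sample data is hand-made for illustration and only mimics the assumed response shape (`pages`, each with a `pageNumber` and a list of `lines`); a real response contains many more keys.

```python
# Hand-made sample that imitates the assumed shape of an analysis result.
sample_result = {
    "content": "Form W-2\nWage and Tax Statement",
    "pages": [
        {
            "pageNumber": 1,
            "lines": [
                {"content": "Form W-2"},
                {"content": "Wage and Tax Statement"},
            ],
        },
    ],
    "styles": [],
}


def page_text(result: dict, page_number: int) -> str:
    """Join the text lines of the page with the given 1-based page number."""
    for page in result.get("pages", []):
        if page.get("pageNumber") == page_number:
            return "\n".join(line["content"] for line in page.get("lines", []))
    raise ValueError(f"Page {page_number} not found in result")


if __name__ == "__main__":
    print(page_text(sample_result, 1))
```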

30:22

📑 Extracting Data from Tax Forms and Invoices

This section details the process of extracting data from tax forms like the W-2 and from invoices using the Document Intelligence API. It discusses identifying field IDs and navigating through the nested dictionary structure to retrieve values. The tutorial also covers handling special cases like box 12 on a W-2 form, which requires accessing an array of values, and the process of extracting and iterating through line items on an invoice.
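
The nested lookup can be sketched with one recursive helper. The sample below is hand-made, and the field names (`Employee`, `AdditionalInfo`, `LetterCode`, `Amount`) and the `type`/`valueObject`/`valueArray` keys are assumptions about the response shape, so confirm them against an actual response before relying on them.

```python
def field_value(field: dict):
    """Unwrap a field into plain Python values, recursing into objects and arrays."""
    kind = field.get("type")
    if kind == "object":
        return {name: field_value(sub) for name, sub in field.get("valueObject", {}).items()}
    if kind == "array":
        return [field_value(item) for item in field.get("valueArray", [])]
    # Scalar fields carry a single typed key such as valueString or valueNumber.
    for key, value in field.items():
        if key.startswith("value"):
            return value
    return field.get("content")  # fall back to the raw text if no typed value exists


# Hand-made sample imitating one analyzed W-2 document (field names are assumptions).
sample_document = {
    "fields": {
        "Employee": {
            "type": "object",
            "valueObject": {"Name": {"type": "string", "valueString": "Jane Doe"}},
        },
        "AdditionalInfo": {  # box 12 is assumed to arrive as an array of objects
            "type": "array",
            "valueArray": [
                {
                    "type": "object",
                    "valueObject": {
                        "LetterCode": {"type": "string", "valueString": "DD"},
                        "Amount": {"type": "number", "valueNumber": 1200.0},
                    },
                },
            ],
        },
    }
}

fields = {name: field_value(f) for name, f in sample_document["fields"].items()}
```

The same helper should apply to invoice line items, which are likewise described as an array to iterate through.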

35:23

📈 Extracting and Handling Tables from Documents

The paragraph explains how to extract tables from documents using the pre-built layout model of the Document Intelligence API. It covers the process of analyzing a document to identify tables and compiling the data into a pandas DataFrame. The steps include iterating through rows and columns, handling empty cells, and preparing the data for further use, with an emphasis on cleaning up the data to remove empty rows.
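
A sketch of the table step, using only the standard library, is shown below. The `rowCount`, `columnCount`, `rowIndex`, and `columnIndex` keys are assumptions about the layout response, the sample table is hand-made, and the function names are hypothetical.

```python
def table_to_grid(table: dict) -> list[list[str]]:
    """Place each cell's text at its row and column position; missing cells stay empty."""
    grid = [["" for _ in range(table["columnCount"])] for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        grid[cell["rowIndex"]][cell["columnIndex"]] = cell.get("content", "")
    return grid


def drop_empty_rows(grid: list[list[str]]) -> list[list[str]]:
    """Remove rows in which every cell is blank."""
    return [row for row in grid if any(cell.strip() for cell in row)]


# Hand-made sample imitating one table from a layout result.
sample_table = {
    "rowCount": 3,
    "columnCount": 2,
    "cells": [
        {"rowIndex": 0, "columnIndex": 0, "content": "Item"},
        {"rowIndex": 0, "columnIndex": 1, "content": "Amount"},
        {"rowIndex": 2, "columnIndex": 0, "content": "Widget"},
        {"rowIndex": 2, "columnIndex": 1, "content": "9.99"},
    ],
}

rows = drop_empty_rows(table_to_grid(sample_table))
```

With pandas installed, `pandas.DataFrame(rows[1:], columns=rows[0])` would turn the cleaned grid into a DataFrame whose first row serves as the header.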

40:26

🏁 Conclusion and Next Steps

The final paragraph wraps up the tutorial by summarizing the process of extracting data from documents using Azure's Document Intelligence API. It invites viewers to ask questions or provide feedback in the comments and encourages them to like and subscribe for more content. The speaker also hints at continuing the topic in a follow-up video.

Keywords

💡Azure Document Intelligence API

Azure Document Intelligence API is a service provided by Microsoft Azure that enables developers to build solutions for processing documents. It automatically analyzes and extracts information from various document formats. In the video, it is the central technology around which the tutorial is structured, with the aim of teaching viewers how to integrate and use this API in Python to process documents.

💡Document Processing

Document processing refers to the automated manipulation and analysis of documents to extract meaningful information. In the context of the video, document processing is the core functionality provided by the Azure Document Intelligence API, which can analyze documents like invoices, receipts, and legal documents, extracting key data points.

💡Python

Python is a widely used high-level programming language known for its readability and versatility. In the video, Python is the chosen language for demonstrating how to interact with the Azure Document Intelligence API, indicating that the tutorial assumes some level of Python experience among its audience.

💡Pre-built Models

Pre-built models in the context of the Azure Document Intelligence API are pre-trained AI models that can recognize and extract information from specific types of documents without the need for custom training. The video mentions several pre-built models like those for invoices, receipts, and W-2 forms, which simplify the process of document analysis for common document types.

💡API Endpoint

An API endpoint is a specific location in a web service that can be called upon to execute a certain function or retrieve data. In the video, the presenter discusses how to access the Azure Document Intelligence API endpoint, which is necessary for making API calls to process documents.

💡Pricing and Free Tier

The pricing and free tier refer to the cost structure and the availability of a limited version of the service at no cost. The video explains the Azure Document Intelligence API's pricing model, including the free tier that allows users to process a certain number of pages for free each month, and the different pricing tiers for paid services.

💡Usage Limits

Usage limits are the constraints imposed on the usage of a service, often based on the subscription tier. The video script outlines the limitations of the free tier of the Azure Document Intelligence API, such as the maximum number of pages that can be processed and the file size limits.

💡Supported File Formats

Supported file formats are the types of documents that a service can handle. The video mentions that the Azure Document Intelligence API supports PDF and image files, including JPEG, PNG, BMP, TIFF, and HEIF formats, as well as Microsoft Office files for read and layout models.

💡Read Model

The read model is one of the core models available in the Azure Document Intelligence API, primarily used for text extraction from documents. It is mentioned in the video as a model that does not require the document to be organized in a specific format, focusing solely on extracting text.

💡Layout Model

The layout model is another core model in the Azure Document Intelligence API, which not only extracts text but also maintains the document's structure and format. The video describes how this model can be used to extract text, tables, and the layout of the document, which is useful for documents that have a structured format.

💡Script Development

Script development involves writing and creating scripts that automate tasks or processes. In the video, script development is part of the tutorial where the presenter guides viewers on constructing the client for the Azure Document Intelligence API and demonstrates how to write scripts to interact with the API.

Highlights

Introduction to Azure's document intelligence API for building document processing solutions.

The tutorial is aimed at beginners, though some Python and Azure experience is recommended.

The agenda is divided into four parts covering pricing, use cases, setup, and examples.

Azure Document Intelligence offers a free tier with limitations on document processing.

Pricing details vary based on document type and usage tiers.

Free tier limitations include processing only the first two pages of a document.

Supported file formats include PDF, image files, and Microsoft Office files.

Pre-built models for text extraction and document layout are available.

Installing Python dependencies for Azure AI and pandas library.

Setting up Azure document intelligence resource and obtaining API key and endpoint.

Script development for constructing the document intelligence API client.

Examples include making API calls to document intelligence endpoints and examining responses.

Loading documents in various formats like JPEG, PDF, and other image files.

Extracting data points from W-2 forms using pre-built models.

Demonstration of extracting data from invoices and other document files.

Table extraction from documents using the pre-built layout model.

Common use cases for Azure Document Intelligence include invoice processing and legal document review.

Navigating the complexities of document structure and data extraction.

Practical applications of the API in real-world scenarios for document processing.

The tutorial concludes with a summary of the covered topics and an invitation for feedback.