Azure Form Recognizer — Extract data from bulk scanned documents

Amir Mustafa
13 min read · Feb 21, 2023


→ Form Recognizer is Azure’s AI service to extract data from scanned forms or documents.

→ Suppose a company, say a hospital or a bank, deals with a large number of documents.

→ Manually copying data from so many files is slow and error-prone.

→ Using this Azure service, we can extract data from forms and images.

Form Recognizer Capabilities:

→ We will go through these one by one. When we open Form Recognizer, we see the following API services:

a. Layout: Extracts data such as text and tables from forms.

b. Pre-built model:

→ These are intended for specific scenarios such as receipts, invoices, business cards, ID documents, and many more.

→ Think of these as document types whose structure is already defined (say, a driving licence).

c. General Documents:

This uses pre-trained models to extract key-value pairs as well as entities from any document.

d. Custom Models:

→ Think of this as the document structures your company uses regularly but that are not covered by a prebuilt model.

→ For these, we can train a custom model in minutes to extract specific key-value pairs.

Creating Form Recognizer Account in Azure Portal:

→ Go to your Azure Portal account (i.e. log in to Azure Cloud).

→ Search for Form Recognizer in the search bar.

→ Click the Create icon to create a new resource. We can create multiple Form Recognizer instances.

→ If you are on a free trial, choose the Free Trial subscription; otherwise, choose your Pay-as-you-go subscription.

→ A resource group is a logical container for related Azure resources, e.g. one for the dev environment and one for prod.

→ Name your Form Recognizer resource (this will be the Form Recognizer account name).

→ Once the account is created, it will be visible on the Form Recognizer home page. Click it.

→ We will be on the landing page:

Form Recognizer Studio:

→ Here we see two options:

  1. Form Recognizer Studio: experiment with all features of Form Recognizer here
  2. SDKs for different technologies: implement the same features in projects

→ Whatever we see in Form Recognizer Studio can be implemented in projects using the SDK.

→ We are actively using these features in our current project. For learning purposes, we will explore them in Form Recognizer Studio.

→ Clicking the Try it link takes us to the page below.

Let us see different features of Form Recognizer:

1. Layout API:

→ In the Layout API, we can upload documents using the Add button. A sample document is also provided.

→ Click on Result to see the extracted data.

→ This is the best part for developers. Just like Postman, the Studio provides ready-made code snippets for specific technologies.

→ It is important to note that the Layout API can extract text, table data, etc.

→ We see the extracted data on the right in JSON format.

→ So whenever we upload a document, Form Recognizer extracts its data for us.
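As a minimal sketch of how the table part of that JSON could be used in code: assuming each table in the result carries `rowCount`, `columnCount`, and a flat list of `cells` with `rowIndex`, `columnIndex`, and `content` (the shape shown in the Result tab), the cells can be rearranged into rows. The helper name `table_to_rows` and the sample data are illustrative only.

```python
def table_to_rows(table):
    """Rebuild a 2-D list of strings from the flat cell list of one table."""
    grid = [["" for _ in range(table["columnCount"])]
            for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        grid[cell["rowIndex"]][cell["columnIndex"]] = cell["content"]
    return grid


# Hand-written sample mimicking one entry of analyzeResult.tables (assumed shape).
sample_table = {
    "rowCount": 2,
    "columnCount": 2,
    "cells": [
        {"rowIndex": 0, "columnIndex": 0, "content": "Item"},
        {"rowIndex": 0, "columnIndex": 1, "content": "Price"},
        {"rowIndex": 1, "columnIndex": 0, "content": "Scan"},
        {"rowIndex": 1, "columnIndex": 1, "content": "120"},
    ],
}

rows = table_to_rows(sample_table)
for row in rows:
    print(row)
```

Cells that the service did not report stay as empty strings, so every row keeps the same number of columns.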

2. Prebuilt models:

→ We will use the Invoice API as an example.

→ The same applies to receipts, business cards, etc. shown in the above image (basically all document formats with a prebuilt model).

→ There is a sample invoice in the portal.

→ There are three specific types of information we can extract (i.e. text, tables, and selection marks).

→ Let us click the Analyse button.

→ We observe that all table values, text values, and so on are extracted.

→ If we scroll down, we see that each line item on the right contains the extracted table data.

→ Let us click the Result tab and see the JSON response.

→ We also have technology-specific code snippets here.
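To illustrate how the invoice JSON response might be consumed, here is a small sketch. It assumes the response lists analyzed documents under `documents`, each with a `fields` dictionary whose entries have a `content` string and a `confidence` score. `VendorName` and `InvoiceTotal` are field names of the prebuilt invoice model; the helper `read_fields` and the sample values are made up for illustration.

```python
def read_fields(analyze_result, wanted):
    """Return {field name: text content} for the first analyzed document."""
    documents = analyze_result.get("documents", [])
    if not documents:
        return {}
    fields = documents[0].get("fields", {})
    # Fields the model did not find are simply left out of the summary.
    return {name: fields[name].get("content")
            for name in wanted if name in fields}


# Hand-written sample mimicking an invoice analysis result (assumed shape).
sample_invoice = {
    "documents": [{
        "docType": "invoice",
        "fields": {
            "VendorName": {"type": "string", "content": "Contoso", "confidence": 0.95},
            "InvoiceTotal": {"type": "currency", "content": "$110.00", "confidence": 0.97},
        },
    }]
}

summary = read_fields(sample_invoice, ["VendorName", "InvoiceTotal", "DueDate"])
print(summary)
```

Requesting a field that is absent from the response, such as `DueDate` here, does not raise an error; it is just omitted.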

3. General Purpose API:

Eg 1: Clicking the Analyse button extracts the information on the right.

Eg 2: Second Document:

→ We observe that all data values are extracted on the right.

→ Clicking on the table also shows its extracted data.

→ Let us now observe what gets extracted from documents and how it benefits a project.

Fields: Key-value pairs found in the document.

Content: The extracted data in a readable format.

Result: The JSON object for use in the application.

Code: Technology-specific code snippets are provided here as well.
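The key-value pairs in the Result JSON can be flattened into a plain dictionary for use in an application. The sketch below assumes the pairs appear under `keyValuePairs`, each with a `key` and an optional `value` that both hold a `content` string; the helper name and the sample data are hypothetical.

```python
def key_values_to_dict(analyze_result):
    """Flatten the extracted key-value pairs into {key text: value text}."""
    pairs = {}
    for pair in analyze_result.get("keyValuePairs", []):
        key = pair["key"]["content"]
        # A key can be detected without a value, e.g. a form field left blank.
        value = pair.get("value", {}).get("content")
        pairs[key] = value
    return pairs


# Hand-written sample mimicking a general document result (assumed shape).
sample_result = {
    "keyValuePairs": [
        {"key": {"content": "Patient Name"},
         "value": {"content": "Jane Doe"}, "confidence": 0.93},
        {"key": {"content": "Date of Birth"}, "confidence": 0.88},
    ]
}

pairs = key_values_to_dict(sample_result)
print(pairs)
```

If the same key text occurs twice in a document, the later pair overwrites the earlier one in this sketch; a real application may want to collect such values in a list instead.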

4. Custom Model:

→ Click on Custom Model on the Form Recognizer page.

→ Click on Create a Project

→ Enter your custom model name, say W9 Parser, and click the Create button.

→ W-9 is a US tax form used to provide a taxpayer identification number.

→ There are two types of custom model: template and neural.

→ Enter your subscription, resource group, and the other details as shown below. Click Next.

→ Azure now asks for the following details:

a. Storage account: where we store the files

b. Blob container: a specific container inside the storage account

c. Folder path: the path of the blob inside the container

d. Permissions: Form Recognizer's permission to access the blob container

→ As this is a paid setup, we will not implement it. Instead, for learning purposes, we will use a sample provided in the Microsoft portal.

→ Let us understand the high-level steps of the Custom model:

→ Prepare: Form Recognizer requires us to upload a minimum of 5 documents of a similar template.

Label:

→ A label is simply a tag, e.g. nature, photography, w9-format, address, etc.

→ Below is the view of the custom model, with 5 documents uploaded in Form Recognizer Studio.

→ To create a label, click the + icon on the right. We can give it any name as per our requirement.

→ There are 4 types of labels: Field, Selection mark, Signature, and Table.

A. Field Type Label

→ Enter the field name and press Enter.

→ Create as many labels as the document requires, e.g. Address, Name, Signature, Pincode, etc.

→ The next step is to select a word in the document and assign it to a label.

Eg 1:

→ A popup asks which field to tag the selection with. As it is a name, we choose the Name label.

→ Select the relevant label for the highlighted word (e.g. the Name label).

Eg 2: Selecting another word for a category. It can be part of the business field.

Eg 3: Tagging a checkmark from the document

Eg 4: Tagging Address

Eg 5: Tagging Zip code

Eg 6: We can also select a region. Click Region and select the area in the document (a square pops up). Say this belongs to the EIN category.

Eg 7: We can also select a region for the Signature field:

→ Once tagging is done, we can train a model by clicking the Train button in Form Recognizer Studio.

NOTE:

Training a model means teaching Form Recognizer (the machine) which tags/fields to extract whenever a new document of a similar type is uploaded.

→ Give the model any model ID and click the Train button inside the popup.

→ In this example, training took at most about 2 minutes.

→ We observe that our custom form model is trained, with most of the fields at 95% confidence, which is good.
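The confidence score is also useful in code, for example to decide which extracted values to trust and which to send for manual review. The sketch below assumes each field in the result carries a `content` string and a `confidence` between 0 and 1; the 0.9 threshold, the helper name, and the sample values are arbitrary choices for illustration.

```python
def split_by_confidence(fields, threshold=0.9):
    """Separate extracted fields into trusted values and ones needing review."""
    accepted, needs_review = {}, {}
    for name, field in fields.items():
        # A missing confidence is treated as 0.0, so the field goes to review.
        is_trusted = field.get("confidence", 0.0) >= threshold
        target = accepted if is_trusted else needs_review
        target[name] = field.get("content")
    return accepted, needs_review


# Hand-written sample mimicking the fields of a custom model result (assumed shape).
sample_fields = {
    "Name": {"content": "Jane Doe", "confidence": 0.97},
    "Address": {"content": "12 Main St", "confidence": 0.95},
    "EIN": {"content": "12-3456789", "confidence": 0.61},
}

accepted, needs_review = split_by_confidence(sample_fields)
print("accepted:", accepted)
print("needs review:", needs_review)
```

A suitable threshold depends on how costly a wrong value is for the application, so it is best tuned against real documents.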

→ So training the model is done. Before uploading a new document to extract data, let us also see the Table type of label.

B. Table Type Label:

Eg: Vehicle Registration Document (CarMain named model):

This label type is used when the uploaded document has tables in it.

→ Let us go to the 5th document, which has fewer tags.

→ Say we click the MaintenanceLog field. A dynamic table pops up.

→ We then select text in the document's table and click the matching cell of the table field to assign it to that column.

→ Click the + icon to add a new row.

→ The idea behind uploading 5 documents is that if we tag them with variations, the machine gets a good understanding of the document.

NOTE: All 5 documents have the same format; only the data differs.

This is similar to setting up a fingerprint on a phone: the more variations and angles we register, the easier recognition becomes.

Eg: the 4th document has 3 rows tagged.

Eg: the 3rd document has 8 rows tagged. So we have tagged the documents with variation.

Training

→ Once tagging is done, we have to train the model so that the machine remembers it. Click the Train button as shown in the above screenshot.

Uploading New Document:

→ Everything we have done so far in the custom model leads up to this step.

→ In the Layout API or the pre-trained models (e.g. Invoice), Microsoft has pre-defined the document structure. In a custom model, we defined it ourselves through labels and training.

→ Go to Test in Form Recognizer Studio and click the CarMain custom model.

→ This is the name of the second custom model we built (the one for the Vehicle Registration Document with a Table label).

→ Let us now upload a sample document from the local computer. It should match the format of the model we trained.

→ Click the Analyse button.

→ This invokes the model we tagged and trained and extracts the data of all the tagged fields.

→ We observe that the machine has extracted all the data.
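For a table label such as MaintenanceLog, the extracted rows can be turned into a list of records. This is a sketch under the assumption that a dynamic table field comes back as an array (`valueArray`) of row objects (`valueObject`), where each column holds a `content` string. The column names `Date` and `Service` are hypothetical, as is the helper name.

```python
def table_field_to_records(field):
    """Convert a table-type field into a list of {column: text} records."""
    records = []
    for row in field.get("valueArray", []):
        columns = row.get("valueObject", {})
        records.append({name: cell.get("content")
                        for name, cell in columns.items()})
    return records


# Hand-written sample mimicking a MaintenanceLog table field (assumed shape).
maintenance_log = {
    "type": "array",
    "valueArray": [
        {"type": "object", "valueObject": {
            "Date": {"type": "string", "content": "2023-01-10"},
            "Service": {"type": "string", "content": "Oil change"},
        }},
        {"type": "object", "valueObject": {
            "Date": {"type": "string", "content": "2023-02-02"},
            "Service": {"type": "string", "content": "Brake check"},
        }},
    ],
}

records = table_field_to_records(maintenance_log)
for record in records:
    print(record)
```

Because the rows are read from a list, the same code handles a document with 3 tagged rows and one with 8.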

Implementing in Project:

→ We have seen Form Recognizer and its capabilities in Form Recognizer Studio. The same can be implemented in our app using the SDK provided by Azure.

→ We have also seen the code snippets provided for each feature of Form Recognizer.

→ Microsoft has SDKs available for multiple technologies, e.g. Node.js, Python, etc., for implementing this in our application.

Check out more about SDK here.
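As a rough sketch of what such an SDK call could look like in Python: the code below uses `DocumentAnalysisClient` and `begin_analyze_document` from the `azure-ai-formrecognizer` package (version 3.2 or later is assumed). The feature-to-model mapping and both helper names are my own illustration, and the endpoint and key must come from your own Form Recognizer resource.

```python
# Prebuilt model IDs for the Studio features covered in this article.
MODEL_IDS = {
    "layout": "prebuilt-layout",
    "invoice": "prebuilt-invoice",
    "general": "prebuilt-document",
}


def resolve_model_id(feature):
    """Map a feature name to a prebuilt model ID; custom model IDs pass through."""
    return MODEL_IDS.get(feature, feature)


def analyze_file(path, feature, endpoint, key):
    """Send one local file to Form Recognizer and return the analysis result."""
    # Imported here so resolve_model_id works even without the SDK installed.
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentAnalysisClient(endpoint, AzureKeyCredential(key))
    with open(path, "rb") as document:
        poller = client.begin_analyze_document(
            resolve_model_id(feature), document=document)
        # Analysis is a long-running operation; result() waits for it to finish.
        return poller.result()


print(resolve_model_id("invoice"))
print(resolve_model_id("CarMain"))
```

Calling `analyze_file` with a local invoice file, the feature name `"invoice"`, and your resource's endpoint and key would return a result object exposing the extracted tables, key-value pairs, and documents. Passing a custom model ID such as `"CarMain"` runs the custom model instead.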


Closing Thoughts:

In this article, we have learned about Azure’s Form Recognizer service. This Azure AI service is used to extract data from documents.

We have learned about the Layout API, pre-trained models (e.g. Invoice), the General Purpose API, and Custom Models.

This becomes powerful when documents arrive at a company in bulk: an AI-powered application can extract the data and process it downstream.

Thank you for reading till the end 🙌. If you enjoyed this article or learned something new, consider sharing it so it reaches more people, and follow me on Twitter to see other tips, articles, and things I learn about and share there. Happy Learnings!!
