Azure’s Custom Vision: Building and Training ML Models

Amir Mustafa

9 min read · May 1, 2023

→ Today, we will learn about Custom Vision, an Azure Machine Learning service that extracts data from images.

→ Remember the old times when we had to search for a file physically, or even now, when we search through soft copies?

→ Copying one to ten files by hand is okay. Now imagine we have a service where we deal with documents regularly.

→ The question is: can an ML model extract data from images? The answer is yes.

→ In this article, we will get our hands dirty and do it live. It will be fun to understand.

Prerequisites:

→ Being keen to learn is enough. You can read through this article or follow along with the video.

How it works:

→ It is very important to know how Custom Vision works.

→ Suppose you walk into a bank branch, say HDFC Bank. If we want to deposit money, we take a deposit form.

→ If we want to withdraw money, we take the withdrawal form. Likewise, there are multiple forms.

→ Now suppose 100 people have filled out the deposit form and 200 people the withdrawal form.

→ Once Custom Vision knows the structure of a form, say the deposit form, it can extract the data with little effort, whether there are 100 filled-in forms or 10K 😀

→ The same can be done for withdrawal forms and other types of forms.

Architecture:

→ From an IT perspective, we have to do two things:

a. Training:

→ Here we will make Azure understand multiple sets of similar documents and tag them.

→ Every form type is called a model (say a Deposit model or a Withdrawal model)

b. Prediction/Extraction:

→ Once the model is trained, we upload a new document/image, run it through the trained model, and extract the data.

Project Types:

→ We are about to start with Azure Custom Vision. When we create a project, it asks us to choose a project type. These are:

a. Classification: This simply means getting the tags. There are two types of Classification:

i. Multilabel — Multiple tags per image

ii. Multiclass — Single tag per image

b. Object Detection: Returns Coordinates of objects in the image i.e. location
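To make "coordinates" concrete, here is a minimal sketch of how an Object Detection result can be turned into pixel positions. It assumes the box is given as left, top, width, and height fractions (0 to 1) of the image size, which is how Custom Vision reports bounding boxes. The `BoundingBox` class and `to_pixels` helper are illustrative names, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Illustrative box; all four values are fractions of the image size (0 to 1)."""
    left: float
    top: float
    width: float
    height: float


def to_pixels(box: BoundingBox, image_width: int, image_height: int) -> tuple[int, int, int, int]:
    """Convert a normalized box to (left, top, width, height) in pixels."""
    return (
        round(box.left * image_width),
        round(box.top * image_height),
        round(box.width * image_width),
        round(box.height * image_height),
    )


# Example box on an 800x600 image.
box = BoundingBox(left=0.25, top=0.5, width=0.5, height=0.25)
print(to_pixels(box, 800, 600))  # (200, 300, 400, 150)
```

A Classification result, by contrast, would carry only tag names and probabilities, with no box at all.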

Realtime Demo:

→ It is time to start the fun stuff. Let us explore together.

→ If you have access to the Azure portal, try following along in parallel. Otherwise, reading the article or watching the video should be enough.

STEP 1: Go to Azure Cloud.

STEP 2: Search for Custom Vision Service:

→ Click Create Custom Vision. This opens the landing page of the Custom Vision service.

→ Remember the deposit form example: we have to make Azure understand the deposit form (or image) by creating a model and training it.

NOTE: Every specific category of images or forms will have a separate model

e.g. Deposit form

→ To create a model, we will follow the steps below:

STEP 3: Click the Create custom vision button

→ The first step on the creation page is to create a new resource group. We will keep one dedicated to Custom Vision.

→ We can think of a resource group as a category that keeps related resources in one place, e.g. dev-custom-vision, uat-custom-vision, or custom-vision-rg

→ The creation page lets us set up both training and prediction, or either one individually

NOTE:
Training: Making the machine learn about a document or image, e.g. the Deposit Form
Prediction/Extraction: When we later upload a similar type of document with different data, the machine extracts the data without extra effort

→ We will choose both. Click on Create a new resource group. Give it any name, e.g. custom-vision-rg

→ For both Training and Prediction, we have to choose a pricing tier. We will choose the Free tier (i.e. Free F0) for both.

→ For the region, we can choose the nearest one (e.g. East US).

→ Click the Review + create button

→ Now we review the chosen configuration and click the Create button:

→ Click the Go to resource group button

→ Here we will see that with the new Azure interface, we get two resources:

a. cognitiveserviceshappylearnings786 — for training

b. cognitiveserviceshappylearnings786-Prediction — for prediction

→ Click on cognitiveserviceshappylearnings786 link to start training.

Training:

→ Click on Custom Vision Portal

→ A new tab will open for Custom Vision Studio. Click on the Sign in button

→ In the next step, it asks us to read the Terms and Conditions. Check the box and click Agree.

→ It is important to know that everything we do here in the portal can also be integrated into our app using an SDK, e.g. Node.js, Python, etc.

→ Click New Project.

→ Now there are two main options to choose:

Different Project Types:

a. Classification: for tags

b. Object Detection: for location (i.e. coordinates)
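The same choice can be made from code. The sketch below assumes the Python package `azure-cognitiveservices-vision-customvision` and picks Object Detection, which matches the box-based tagging used later in this walkthrough. The endpoint, key, and project name are placeholders, and `make_trainer` and `create_detection_project` are illustrative helper names.

```python
def make_trainer(endpoint: str, training_key: str):
    """Build a training client. Imports are local so the helper below can be
    read and tried without the Azure SDK installed."""
    from azure.cognitiveservices.vision.customvision.training import (
        CustomVisionTrainingClient,
    )
    from msrest.authentication import ApiKeyCredentials

    credentials = ApiKeyCredentials(in_headers={"Training-key": training_key})
    return CustomVisionTrainingClient(endpoint, credentials)


def create_detection_project(trainer, name: str):
    """Create a project that uses the General object detection domain."""
    domain = next(
        d for d in trainer.get_domains()
        if d.type == "ObjectDetection" and d.name == "General"
    )
    return trainer.create_project(name, domain_id=domain.id)


# Usage (placeholders, not real values):
# trainer = make_trainer("https://<your-resource>.cognitiveservices.azure.com/", "<training-key>")
# project = create_detection_project(trainer, "pomegranate-demo")
```

Passing the client in as an argument keeps the project-creation step easy to try out with a stand-in object before pointing it at a real resource.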

→ The first step of training is to upload at least 15 images of the documents or objects we want to recognize.

What will we upload?

→ The idea is to upload different images that contain pomegranate fruits.

→ At the time of prediction, the machine will extract information

→ Below, I have downloaded a set of openly available images into the Desktop directory

→ Clicking the Upload button opens a popup to upload the images

NOTE:
a. As per Azure’s rule, we cannot upload a file larger than 6 MB
b. JPG, PNG, or BMP format files are accepted at present.
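Before uploading a whole folder, the two rules in the note can be checked locally. This is a minimal sketch: `upload_problems` is an illustrative helper, treating 6 MB as 6 × 1024 × 1024 bytes is an assumption, and so is accepting `.jpeg` alongside `.jpg`.

```python
from pathlib import Path

MAX_BYTES = 6 * 1024 * 1024  # 6 MB limit from the note above (byte count is an assumption)
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def upload_problems(path: Path) -> list[str]:
    """Return the reasons a file would be rejected; an empty list means it looks fine."""
    problems = []
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported format: {path.suffix or '(none)'}")
    if path.stat().st_size > MAX_BYTES:
        problems.append("larger than 6 MB")
    return problems


def scan_folder(folder: Path) -> dict[str, list[str]]:
    """Check every file in a folder and map file name to its list of problems."""
    return {p.name: upload_problems(p) for p in sorted(folder.iterdir()) if p.is_file()}
```

For example, `scan_folder(Path.home() / "Desktop" / "pomegranates")` (a hypothetical folder name) would list which images need resizing or converting before the upload.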

→ Now click Done button:

→ As we have freshly uploaded these images, they are untagged. We have to tag them.

Tagging:

→ Currently the Tagged tab is empty, and all images are in the Untagged tab

→ Now we have to click the images one by one and tag them. Let us click the first image and add tags, e.g. Pomegranate and Pomegranate juice

→ Just hovering over an object in the image draws a suggested box around it.

→ Click it and type the tag name. From the second time onward, the existing tags appear as suggestions

→ Now once tagging of the first image is done, click on the right arrow for the next image.

→ Similarly, we have to tag the other images.

→ Once all the images are tagged, click on the x icon.

→ We now see all images in the Tagged tab. Let us click one at random.

→ Now if we open the Untagged tab, we see no images, as all images are tagged

→ Once the tagging is done, click on the Train button. Training makes the machine learn all the tags on the images.

→ There we have two options: Quick Training and Advanced Training

Quick Training trains a little faster
Advanced Training trains a little better

→ Choose a training type and click the Train button

Error 😐

→ This error simply says that we need at least 15 images per tag. In our case, the two tags have 11 and 1.

→ Let us bring each tag up to 15
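The same check can be done up front so the Train button does not surprise us. The sketch below is plain Python with no Azure calls. It assumes the threshold counts images per tag, and `missing_per_tag` is an illustrative helper name.

```python
from collections import Counter

MIN_IMAGES_PER_TAG = 15  # threshold reported by the error above


def missing_per_tag(tags_per_image: list[list[str]], minimum: int = MIN_IMAGES_PER_TAG) -> dict[str, int]:
    """Map each under-represented tag to how many more tagged images it needs."""
    # set(tags) makes sure an image counts once per tag, even if tagged twice.
    counts = Counter(tag for tags in tags_per_image for tag in set(tags))
    return {tag: minimum - n for tag, n in counts.items() if n < minimum}


# Mirrors the situation above: 11 images tagged Pomegranate, 1 tagged Pomegranate juice.
images = [["Pomegranate"]] * 10 + [["Pomegranate", "Pomegranate juice"]]
print(missing_per_tag(images))  # {'Pomegranate': 4, 'Pomegranate juice': 14}
```

An empty result means every tag has reached the minimum and training can start.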

Error Solved 🙂

→ Fingers crossed, clicking the Train button again

→ We observe Training in Progress. The error is gone.

→ Now, it takes some time to train a model. It will show the training status in the dashboard.
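When training is started from code instead of the portal, the waiting can be handled with a small polling loop. This is a sketch under assumptions: `trainer` is taken to be a `CustomVisionTrainingClient` from the `azure-cognitiveservices-vision-customvision` package, whose `train_project` and `get_iteration` methods are used here, and the status strings "Completed" and "Failed" as well as the timeout values are assumptions to adjust.

```python
import time


def train_and_wait(trainer, project_id, poll_seconds: float = 10.0, timeout_seconds: float = 3600.0):
    """Start training and poll until the iteration reports Completed."""
    iteration = trainer.train_project(project_id)
    deadline = time.monotonic() + timeout_seconds
    while iteration.status != "Completed":
        if iteration.status == "Failed":
            raise RuntimeError("training failed")
        if time.monotonic() > deadline:
            raise TimeoutError("training did not finish in time")
        time.sleep(poll_seconds)
        # Ask the service for the latest state of the same iteration.
        iteration = trainer.get_iteration(project_id, iteration.id)
    return iteration
```

The returned iteration is the trained version of the model, the same thing the dashboard shows once the status changes.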

→ Once training is done, we see the screen below:

→ Hurray, training is completed. We will look at prediction in the next article.

→ Covering it here would make the video too long.

Closing Thoughts:

In this article, we learned how to make a machine learn about document images using Azure’s Custom Vision service.

We created a custom model and trained it successfully. In the next article, we will extract data using the trained model.

Thank you for reading till the end 🙌. If you enjoyed this article or learned something new, support me by sharing it to reach more people, and/or follow me on Twitter and subscribe to Happy Learnings !! to see other tips, articles, and things I learn about and share there.

https://twitter.com/amir__mustafa