Azure ML Designer: Creating an ML Pipeline Graphically Using Azure ML Service
→ Today we will set up an ML pipeline using the Designer feature of Azure ML Service.
→ In the previous article, we learned about setting up Automated ML, creating a dataset, creating a compute, and running the first job.
Why use this service?
→ Basically, we can create an ML pipeline visually, i.e. build the pipeline graphically with drag and drop.
Why is this service in demand?
→ Because we can visually test, build, and deploy machine learning models.
→ Azure Machine Learning comes with two parts — Automated ML and Designer. Anyone who has used the Automated ML service can easily create a pipeline with Designer.
→ This article is a continuation of a previous article or video.
Designer Demo:
→ Log in to Azure Cloud and search for Azure Machine Learning Service.
→ Click on the workspace.
Note: we created the workspace in the Automated ML article (it takes max 2 to 3 mins), or you can simply click the + Create icon in the screenshot below.
→ Click on Launch ML Studio:
→ Finally we are inside ML studio.
→ Designer is the section where we add the multiple pieces of the pipeline.
→ It is a graphical canvas where we can drag and drop items from left to right to build a pipeline.
→ By default, Azure provides us with four pre-built templates.
→ We will create a new pipeline using classic prebuilt components. Hit the plus sign.
→ This is used most of the time (as the dataset is organization-specific).
→ A quick tour appears the first time; let us click the Done button to start.
In the designer, we have two parts:
a. Left part: the data/component source — items to be used
b. Right part: called the Canvas. Items from the left can be dragged here and used.
→ Basically, we will create the ML pipeline graphically.
→ The left part is divided into two tabs:
i. Data — this contains the datasets we already created; we see the diabetes dataset.
→ Drag data from left to right canvas.
→ Data Added (Piece 1) ✅
→ If we double-click it, we see info.
ii. Components:
→ Here, lots of operations/algorithms can be performed. Think of each component as a reusable function that matches our requirement,
e.g. remove duplicate rows, partition, sample, split data, and other mathematical operations.
→ E.g. using the Split component.
Q. How is it beneficial?
A. Suppose we have a large dataset — we need to split it:
- One part for training, i.e. model training
- One part for evaluation/prediction
NOTE: Whenever we use data for ML (say 50K rows), we use some for training and some for evaluation, hence the Split component becomes important here (see the sketch below).
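Outside the Designer, the same split can be sketched in plain Python with scikit-learn's train_test_split (an illustrative analogue only; this is not what the Designer runs internally, and the arrays below are made up):

```python
# A minimal sketch of the Split Data idea using scikit-learn.
# The arrays are hypothetical stand-ins for a 50K-row dataset.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(50_000, 10)   # hypothetical feature matrix
y = np.random.rand(50_000)       # hypothetical labels

# 80% of rows go to training, 20% are held out for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))  # 40000 10000
```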
Component 1: Split Data
→ Let us now drag the Split Data component to the right (i.e. towards the Canvas).
→ Alternatively, instead of dragging, we can also click it, followed by clicking the Use Component button.
→ The designer area is a free canvas, just like laying out an architecture diagram in PPT.
→ We can drag left or right and adjust when the diagram has lots of components.
→ Split Data Added (Piece 2) ✅
Component 2: ML Algorithm:
→ Now, another component we need is an ML algorithm.
There are three types of models:
a. Regression — model that has numeric output
b. Clustering — model that groups data based on similarity
c. Classification — model that has Boolean output (true/false, yes/no, etc.)
→ Now there are several algorithms inside Regression, Clustering, and Classification.
→ We have to choose as per project needs; a rough code analogue of each family is sketched below.
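For intuition only, here is one scikit-learn estimator per family (these are scikit-learn names, not Designer component names):

```python
# One scikit-learn estimator per model family, for intuition.
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.cluster import KMeans

regressor = LinearRegression()      # regression: numeric output
clusterer = KMeans(n_clusters=3)    # clustering: groups similar rows
classifier = LogisticRegression()   # classification: discrete output (yes/no)
```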
→ Our data is numeric. The data used is a sample diabetic patient report, as shown below:
→ Let us use the simplest regression algorithm, i.e. Linear Regression.
→ Select it and click Use component.
→ ML Algorithm component added (Piece 3) ✅
Component 3: Models:
→ Now we have to use the next piece of the block, i.e. the models.
Depending on the type of data, we use one of these models:
a. Train Clustering Model: cluster data
b. Train Model: regression or classification data
c. Train PyTorch: for using PyTorch
d. Tune Model Hyperparameters
→ As our data calls for regression, let us use Train Model; a code sketch of this step follows below.
NOTE: We see one warning in the model block. This is because we have to add the name. We will do it after adding all the components.
→ We can also zoom the right canvas in/out as required using the Ctrl + and Ctrl - keys respectively.
→ Model component added (Piece 4) ✅
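As a rough code analogue of the Linear Regression and Train Model pieces, here is a hedged sketch using scikit-learn's bundled diabetes dataset (similar columns to our sample data, but not the same file):

```python
# Rough code analogue of "Linear Regression + Train Model" on a
# diabetes-style dataset. This mirrors the pipeline; it is not what
# the Designer executes internally.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)          # features and label (Y)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42       # the Split Data step
)

model = LinearRegression().fit(X_train, y_train)   # the Train Model step
print(model.coef_[:3])                             # a few learned coefficients
```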
Components 4 and 5: Score Model, Evaluate Model:
→ The next pieces we need are to Score and Evaluate the model.
Why do we need it?
→ When we connect all these pieces, we will see that one part of the split goes to the training model, and the remaining part of the split goes to the evaluation block.
→ This is for evaluating data based on the trained model.
→ Let us add the Score Model component (under Model Scoring & Evaluation) to our canvas.
→ Score Model component added (Piece 5) ✅
→ Adding the Evaluate Model component:
→ Evaluate Model component added (Piece 6) ✅
→ In addition, we can also use this in various ways — like Python scripts, R scripts, exposing it as a web service, etc., i.e. developer friendly 😀
→ For us it is not required — because we can do it directly from ML Studio. Still, a small sketch of the Python route follows below.
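For instance, the Designer's Execute Python Script component expects the script to define an azureml_main entry point, roughly like the sketch below (check the component documentation for the exact contract in your version):

```python
# Sketch of a script for the Execute Python Script component.
# dataframe1/dataframe2 arrive from the component's input ports.
import pandas as pd

def azureml_main(dataframe1: pd.DataFrame = None, dataframe2: pd.DataFrame = None):
    # Example transformation: drop duplicate rows before passing data on.
    cleaned = dataframe1.drop_duplicates()
    # The returned sequence is wired to the component's output port(s).
    return cleaned,

# Quick local check with a toy frame (the Designer calls azureml_main for us):
print(azureml_main(pd.DataFrame({"a": [1, 1, 2]})))
```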
→ At a high level, we are developing an ML pipeline.
→ For that we are dragging multiple components or modules into Canvas.
Connections:
→ We have successfully added all the components required for creating the ML pipeline.
→ Let us now connect the components as per our flow.
a. Connecting ml-data to split data:
Q. Why connect these two?
A. We want to split the whole dataset into two parts — some for training and the rest for evaluation. ml-data may have more than 50K rows.
Taking the same fingerprint example of the iPhone:
a. training — initially we add fingerprints to set impressions into the phone | training the machine | training data
b. evaluation — fingerprints used for unlocking the phone | evaluation data | if an unknown fingerprint is detected, it will not unlock the device 😀
→ Let us now drag from the output of the dataset (i.e. ml-data) to Split Data.
→ Double-click on the Split Data text to configure the split.
→ Let us say 0.8 (80% training) and 0.2 (20% evaluation).
→ Click on the right arrow on the popup to close it.
→ Let us see if we have configured it correctly. Click Parameters next to the Split Data component.
→ It is rightly configured (80% training, 20% evaluation).
→ Now we have to train a model — for this, we need two things:
a. Model (i.e. Train Model) and algorithm (Linear Regression) connected:
→ We will drag the connection from Linear Regression to the 1st input point of the Train Model.
→ Now the Split component has two outputs.
→ The first split of the dataset (i.e. the training data) goes to the second input point of the Train Model.
Q. Is this connection justified?
A. Yes, because the Train Model will take as input the ML algorithm (i.e. Linear Regression) and the 80% split of the data.
→ Now the model side is ready. Time to connect the remaining split data (i.e. the 20% one) to Score Model.
→ One input point takes the model and the other takes the remaining dataset.
→ So basically, think of Score Model like an iPhone that already knows the fingerprints of the user (i.e. the training data is known).
→ The first input is the trained model and the second input is the remaining split data.
→ Finally, we connect Score Model to the first input point of Evaluate Model.
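Continuing the earlier sketch, Score Model and Evaluate Model correspond roughly to predicting on the held-out 20% and then computing error metrics (again an analogue, not what the Designer executes):

```python
# Rough analogue of Score Model (predict) and Evaluate Model (metrics).
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)

y_pred = model.predict(X_test)                            # "Score Model"
print("MAE :", mean_absolute_error(y_test, y_pred))      # "Evaluate Model"
print("RMSE:", mean_squared_error(y_test, y_pred) ** 0.5)
```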
→ This is the high-level view of the ML Pipeline.
→ We have successfully connected all our components (Piece 7) ✅
Note: The art lies in two things:
a. Choosing the right ML algorithm
b. Choosing the right dataset, cleaning the data (no missing rows or columns, etc.), and identifying the right features (i.e. the inputs, e.g. AGE, SEX, BMI, etc.)
For this dataset, all columns other than Y = features (input)
Y column = label (output)
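In pandas terms, that feature/label split looks like this (the two-row frame is a made-up stand-in for the real dataset):

```python
# Hypothetical stand-in for the diabetes dataset: every column
# except Y is a feature, Y is the label to predict.
import pandas as pd

df = pd.DataFrame({
    "AGE": [59, 48], "SEX": [2, 1], "BMI": [32.1, 21.6], "Y": [151, 75],
})
X = df.drop(columns=["Y"])   # features (input)
y = df["Y"]                  # label (output)
```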
Submitting Pipeline:
→ Now that our connections are ready, first we rename our pipeline and then submit it.
→ Finally, after renaming, click the Submit button.
→ Now, on clicking Submit, we observe that an error comes up.
Error Message: Select compute target in the setting portal.
Solution:
This is because we have to select the compute for the pipeline. Double-click on the error.
→ From the dropdown, we select the compute.
→ Click on the dropdown. The compute we created will be visible here. Select it.
NOTE: We created ml-compute in the previous video on Automated ML. We can see the compute creation there at the linked timestamp (max 2 to 3 mins for creating ml-compute).
→ Click the Save button followed by the Submit button.
Running Pipeline (i.e. Experiment)
→ A popup appears on clicking the Submit button.
→ Click Create a new experiment.
→ We also observe that, behind the popup, the Azure compute error is gone and there is a green tick from the Azure side.
Experiment: Every run in the Azure pipeline is called an Experiment.
→ Let us give the experiment a name and click on the Submit button.
→ We observe our pipeline run has been initiated.
→ Finally, after some time, we will receive the experiment completion notification.
STEP 6: Model Evaluation — Regression Models:
→ In ML, every run = an experiment.
→ Click on Jobs or Experiments to see the status of the ML pipeline run.
→ Click on the pipeline we just created, i.e. custom-ml-experiment.
Note: In real time, when working with applications in production, the Status here will show Failed or Completed.
→ Logs become very helpful here for fixing errors. Clicking the pipeline itself will show the status and logs.
→ Here we see our pipeline is successful. This is a graph view.
→ Click on the Job overview to see the status of the job.
→ There are a couple of tabs here with some data.
→ The first one is Overview — basic details and JSON.
→ Let us see the actual JSON we get. Scroll down a little. Click Raw JSON.
Note: Run ID is the unique id for this instance of the pipeline run.
→ The second tab is Pipeline parameters.
→ The third tab is Metrics. As this is a regression model, some numeric output is generated.
→ Finally, the fourth tab is Child Jobs — here we see all the steps and their info.
→ In the Regression module, what we are really concerned about is the error and how to reduce it.
Publishing Pipeline:
→ Now that everything is ready and tested in the training pipeline, we will publish the pipeline.
→ We can also say this pipeline is ready for production.
→ Click on Publish button.
→ Select Create new.
→ Click on the published link provided to us.
→ We now observe that the REST endpoint is ready for us. Developer friendly 😀
→ So if we click the documentation link, we see the request format for triggering the pipeline (a hedged sketch follows below).
→ These are the sample data we used, from the Microsoft Open Diabetes data.
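As an illustration, triggering the published pipeline over REST could look roughly like this; the URL and token are placeholders copied from the Studio page and your Azure AD auth flow, and the body shape should be verified against the documentation link above:

```python
# Hedged sketch: submitting a run of the published pipeline via REST.
# ENDPOINT_URL and TOKEN are placeholders, not real values.
import requests

ENDPOINT_URL = "https://<region>.api.azureml.ms/pipelines/v1.0/<...>"  # from the Studio page
TOKEN = "<azure-ad-bearer-token>"                                      # from your auth flow

response = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"ExperimentName": "custom-ml-experiment"},  # body shape per the endpoint docs
)
print(response.status_code, response.json())
```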
STEP 7: Model Evaluation — Classification Models
→ In the Regression model, we have multiple output metrics — MAE (Mean Absolute Error), RMSE (Root Mean Squared Error), etc.
→ These are basically the metrics used in Regression models.
→ In the Classification model, we have simpler outputs.
→ Let us take the simplest classification model, i.e. two-class classification.
→ Here the output is Boolean, e.g. true/false, 0/1, positive/negative.
→ A little bit of theory, guys. Important here:
Metrics Terminologies:
Predicted label: whatever the model predicts
True label: the actual result
True negative: Predicted negative + Actual negative
False negative: Predicted negative + Actual positive
False positive: Predicted positive + Actual negative
True positive: Predicted positive + Actual positive
Accuracy: Proportion of correct results to total cases
Precision: (True positive)/(True positive + False positive)
Recall: (True positive)/(True positive + False negative)
F1 Score: 2 × (Precision × Recall)/(Precision + Recall)
→ So the F1 Score is the balance between Precision and Recall.
→ When working on a production project, people will ask about this after training: the higher the F1 Score, the better the trained model.
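A tiny worked example of these formulas using scikit-learn's metric helpers (the labels below are made up):

```python
# Worked example of the classification metrics above.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual (true) labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # predicted labels

# For these arrays: TP=3, FP=1, FN=1, TN=3.
print("Accuracy :", accuracy_score(y_true, y_pred))    # 6/8 = 0.75
print("Precision:", precision_score(y_true, y_pred))   # TP/(TP+FP) = 0.75
print("Recall   :", recall_score(y_true, y_pred))      # TP/(TP+FN) = 0.75
print("F1       :", f1_score(y_true, y_pred))          # 2PR/(P+R) = 0.75
```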
→ We successfully set up our first ML pipeline ✅
Closing Thoughts:
In this article, we created an ML pipeline graphically using the Designer provided by Azure Machine Learning Service.
We pulled data onto the canvas (right screen) and added multiple components to make the ML pipeline — Split Data, ML Algorithm (Linear Regression), Train Model, Score Model, and Evaluate Model.
We also connected each component as per the flow.
Finally, we successfully ran the ML pipeline using an Experiment. In different environments we may see a Failed status, and we can get to know the error from the logs for fixing it.
Thank you for reading till the end 🙌 . If you enjoyed this article or learned something new, support me by clicking the share button below to reach more people and/or give me a follow on Twitter and subscribe to see some other tips, articles, and things I learn about and share there. Happy Learnings !!