X
30Dec

Microsoft Azure : Machine Learning - Price Prediction

We'll create linear regression model that predicts price of automobile based on different variables such as make and technical specifications.

 


Pre-requisite


We'll use Azure Machine Learning Studio to develop and iterate on simple predictive analytics experiment.

Enter Machine Learning Studio http://studio.azureml.net and click Get started button. We can choose either Guest Access or log in with our Microsoft account.

Machine Learning Studio experiment consists of dragging components to canvas and connecting them in order to create a modeltrain the model and score and test the model. Experiment uses predictive modelling techniques in form of Machine Learning Studio modules that ingest data, train a model against it and apply model to new data. We can also add modules to pre-process data and select features, split data into training and test sets and evaluate or cross-validate the quality of our model.

 


Five steps to create experiment


We'll follow five basic steps to build experiment in Machine Learning Studio in order to create, train and score our model:

  •  Create model

o    Step 1 : Get data

o    Step 2 : Pre-process data

o    Step 3 : Define features

 

  • Train model

o    Step 4 : Choose and apply learning algorithm

 

  • Score and test model

o    Step 5 : Predict new automobile prices

 

 Step 1:  Get data

There are number of sample datasets included with Machine Learning Studio that we can choose from and we can import data from many sources. For this example, we will use the included sample dataset, Automobile price data (Raw). This dataset includes entries for number of individual automobiles, including information such as make, model, technical specifications and price.

1)    Select Blank Experiment. Select default experiment name at the top of canvas and rename it to something meaningful.

 

5

 

2)    To left of experiment canvas is palette of datasets and modules. Type automobile in Search box at top of this palette to find dataset labelled Automobile price data (Raw).

3)    Drag dataset to experiment canvas.

To see what this data looks like, click output port at bottom of automobile dataset and then select Visualize. The variables in the dataset appear as columns and each instance of automobile appears as row. Far-right column is target variable we're going to try to predict.

 

6

 

Close the visualization window by clicking the x in upper-right corner.

 

7

 

Step 2: Pre-process data

A dataset usually requires some pre-processing before it can be analysed. We might have noticed missing values present in columns of various rows. These missing values need to be cleaned so model can analyse data correctly. In our case, we'll remove any rows that have missing values. Also, normalized-losses column has a large proportion of missing values, so we'll exclude that column from model altogether.

 

TIP: Cleaning the missing values from input data is pre-requisite for using most of the modules.

 

First we'll remove normalized-losses column and then we'll remove any row that has missing data.

1)    Type project columns in Search box at top of module palette to find Project Columns module, then drag it to experiment canvas and connect it to output port of Automobile price data (Raw) dataset. This module allows us to select which columns of data we want to include or exclude in model.

2)    Select Project Columns module and click Launch column selector in Properties pane.

 

7

 

  •  Make sure All columns is selected in filter drop-down list, Begin With. This directs Project Columns to pass through all the columns.
  • In next row, select Exclude and column names and then click inside text box. A list of columns is displayed. Select normalized-losses and it will be added to text box.

 

7

 

  • Click check mark button to close column selector.

 

8

 

The properties pane for Project Columns indicates that it will pass through all columns from the dataset except normalized-losses.

 

TIP: We can add a comment to module by double-clicking module and entering text. This can help us to see at glance what module is doing in our experiment. In this case, double-click the Project Columns module and type the comment.

 

8

 

3)    Drag Clean Missing Data module to experiment canvas and connect it to Project Columns module. In Properties pane, select Remove entire row under Cleaning mode to clean data by removing rows that have missing values. Double-click module and type comment.

 

9

 

10

 

4)    Run experiment by clicking RUN under experiment canvas.

 

11

 

When experiment is finished, all modules have green check mark to indicate that they finished successfully. Notice also the Finished running status in upper-right corner.

All we have done in experiment to this point is clean data. If we want to view cleaned dataset, click left output port of the Clean Missing Data module and select Visualize. Notice that normalized-losses column is no longer included and there are no missing values.

 

12

 

12

 

Now that data is clean, we're ready to specify what features we're going to use in predictive model.

 

Step 3: Define features

In machine learning, features are individual measurable properties of something we’re interested in. In our dataset, each row represents one automobile and each column is feature of that automobile. Finding good set of features for creating a predictive model requires experimentation and knowledge about problem we want to solve. Some features are better for predicting target than others. Also, some features have strong correlation with other features, so they will not add much new information to model and they can be removed.

Let's build model that uses subset of features in our dataset. We can come back and select different features, run experiment again and see if we get better results. As first guess, we'll select following features with the Project Columns module. Note that for training model, we need to include price value that we're going to predict.

1)    Drag another Project Columns module to experiment canvas and connect it to left output port of the Clean Missing Data module. Double-click module and type comment.

2)    Click Launch column selector in Properties pane.

 

12

 

3)    In column selector, select No columns for Begin With and then select Include and column names in filter row. Enter our list of column names. This directs module to pass through only columns that we specify.

 

TIP: Because we’ve run experiment, column definitions for our data have passed from the original dataset through the Clean Missing Data module. When we connect Project Columns to Clean Missing Data, Project Columns module becomes aware of column definitions in our data. When we click the column names box, list of columns is displayed and we can select the columns that we want to add to list.

 

4)    Click check mark button.

 

13

 

This produces dataset that will be used in learning algorithm in next steps. Later, we can return and try again with different selection of features.

 

Step 4: Choose and apply learning algorithm

Now that data is ready, constructing a predictive model consists of training and testing. We'll use our data to train model and then test model to see how close it's able to predict prices.

Classification and regression are two types of supervised machine learning techniques. Classification is used to make prediction from a defined set of values, such as a colour. Regression is used to make a prediction from a continuous set of values, such as a person's age.

We want to predict price of automobile, which can be any value, so we'll use a regression model. For this example, we'll train a simple linear regression model and in next step, we'll test it.

1)    We can use our data for both training and testing by splitting it into separate training and testing sets. Select and drag Split Data module to experiment canvas and connect it to output of last Project Columns module. Set Fraction of rows in the first output dataset to 0.75. This way, we'll use 75 percent of the data to train the model, and hold back 25 percent for testing.

 

TIP: By changing Random seed parameter, we can produce different random samples for training and testing. This parameter controls the seeding of pseudo-random number generator.

 

14

 

2)    Run the experiment. This allows Project Columns and Split Data modules to pass column definitions to the modules we'll be adding next.

 

14

 

15

 

3)    To select learning algorithm, expand Machine Learning category in module palette to left of canvas and then expand Initialize Model. This displays several categories of modules that can be used to initialize machine learning algorithms.

For this experiment, select the Linear Regression module under Regression category and drag it to experiment canvas.

 

4)    Find and drag Train Model module to experiment canvas. Connect left input port to output of Linear Regression module. Connect right input port to training data output of Split Data module.

5)    Select the Train Model module, click Launch column selector in Properties pane and then select price column. This is value that our model is going to predict.

 

16

 

17

 

6)    Run experiment.

 

18

 

19

 

20

 

The result is trained regression model that can be used to score new samples to make predictions.

 

Step 5: Predict new automobile prices

Now that we've trained model using 75 percent of our data, we can use it to score the other 25 percent of data to see how well our model functions.

1)    Find and drag Score Model module to experiment canvas and connect the left input port to output of the Train Model module. Connect right input port to test data output of Split Data module.

2)    To run the experiment and view output from Score Model module, click output port and then select Visualize. Output shows predicted values for price and known values from test data.

 

22

 

23

 

24

 

3)    Finally, to test quality of results, select and drag the Evaluate Model module to experiment canvas and connect left input port to the output of Score Model module.

4)    Run experiment.

 

25

 

26

 

To view output from Evaluate Model module, click output port and then select Visualize. Following statistics are shown for our model:

 

 

25

 

27

 

  • Mean Absolute Error: The average of absolute errors.
  • Root Mean Squared Error: The square root of average of squared errors of predictions made on test dataset.
  • Relative Absolute Error: The average of absolute errors relative to absolute difference between actual values and average of all actual values.
  • Relative Squared Error: The average of squared errors relative to squared difference between actual values and average of all actual values.
  • Coefficient of Determination: Also known as R squared value, this is a statistical metric indicating how well model fits data.

For each of error statistics, smaller is better. A smaller value indicates that predictions more closely match actual values. For Coefficient of Determination, closer its value is to one, better predictions.

 

Related

Azure Blob Storage: The PowerShell Way!

Hi folks!Great to see you again.This blog post is purely based on Azure Blob Storage: The PowerShell...

Read More >

Create a Windows Server 2012 R2 VM using ARM in Azure PowerShell

Hi Folks,In this Blog Post we will learn how to create an Azure ARM Virtual Machine using Azure Powe...

Read More >

Continuous Integration/ Continuous Deployment VSTS

Following the below steps you can build and deploy your ASP.NET  app to Azure from either Visua...

Read More >

How to Sync On-premise AD with Windows Azure AD using Azure AD Connect tool

 Azure AD is a service that provides identity and access management capabilities in the cloud. ...

Read More >

Creating a Point-to-Site Connectivity using Azure Resource Manager

Configure a Point-to-Site connectivity to a VNet using PowerShell (ARM Mode)Task 1: Create a Self-Si...

Read More >

How to Create an Azure Virtual Network by using a Deployment Template

Hello Folks!In this Blog post, we will try to learn how to create an Azure V-Net using an ARM templa...

Read More >

Locking VMs and Resources Groups with Azure Resource Manager using Azure PowerShell

Hello Folks!In this blog post we will be talking about locking down your Azure Resources with Azure ...

Read More >

Microsoft Azure: Implementing Internet Facing Load Balancers using Azure Resource Manager

Howdy Folks!I was exploring Network Load Balancer in Azure Resource Manager and found out that you c...

Read More >

Microsoft Azure Stack : Power of Azure in our datacentre

Why Azure Stack?Microsoft Azure Stack is a new hybrid cloud platform product that enables our organi...

Read More >

Microsoft Azure : Mobile Services - Xamarian.Android with .Net

NOTE: Microsoft Azure recommends Azure App Service Mobile Apps for all new mobile backend deployment...

Read More >

Share

Comments

This is most informative and also this posts most user-friendlyhttps://socialprachar.com/ and super navigation to all posts... Thank you so much for giving this information to me. Microsoft Windows Azure Training | Online Course | Certification in Chennai | Microsoft Windows Azure Training | Online Course https://socialprachar.com/Artificial Intelligence Training and Course in Hyderabad
5/7/2021 8:49:30 PM | Reply
I at last discovered extraordinary post here.I will get back here. I just added your blog to my bookmark locales. thanks.Quality presents is the vital on welcome the guests to visit the page, that is the thing that this website page is giving.data science courses in delhi ncr
3/2/2021 3:30:00 PM | Reply
Wow, amazing post! Really engaging, thank you.data scientist course
2/26/2021 3:08:31 PM | Reply
Anu
https://www.sevenmentor.com/machine-learning-course-in-pune.php
7/25/2020 12:11:06 PM | Reply
Anu
Machine learning is one of the top trending technology of the era and machine learning has stretched its roots deep down in the corporate industry. Due to this vast inference, new age learners and working professionals are keen to learn this technology. https://www.sevenmentor.com/machine-learning-course-in-pune.php
7/25/2020 12:10:03 PM | Reply
This is most informative and also this post most user friendly and super navigation to all posts... Thank you so much for giving this information to me. Microsoft Windows Azure Training | Online Course | Certification in chennai | Microsoft Windows Azure Training | Online Course | Certification in bangalore | Microsoft Windows Azure Training | Online Course | Certification in hyderabad | Microsoft Windows Azure Training | Online Course | Certification in pune
6/3/2020 8:40:30 PM | Reply
Awesome article, Thanks for sharing.
8/1/2019 3:19:37 PM | Reply

Post a Comment

Try DevOpSmartBoard Ultimate complete Azure DevOps End-to end reporting tool

Sign Up

  • Recent
  • Popular
  • Tag
Monthly Archive
Subscribe
Name

Text/HTML
Text/HTML
Contact Us
  • *
  • *