Integrating Azure Machine Learning With Azure Stream Analytics to Predict Customer Churn

In one of my previous posts, I covered building a prototype of the IoT analytics architecture authored by David Crook from Microsoft. David graciously provided some great feedback on the architecture model and asked me to explore yet another aspect of his architecture: making intelligent decisions on streams of incoming data using predictive models built with Azure Machine Learning. In this post, let’s see how we can integrate Microsoft Azure Machine Learning (MAML) with Stream Analytics and extend David’s IoT analytics architecture.

Businesses need to have an effective strategy for managing customer churn because it costs more to attract new customers than to retain existing ones. Customer churn can take different forms, such as switching to a competitor’s service, reducing the time spent using the service, reducing the number of services used, or switching to a lower-cost service. Companies in the retail, media, telecommunication, and banking industries use churn modeling to create better products, services, and experiences that lead to a higher customer retention rate.

Delving deep into Machine Learning is out of scope for this post. I recommend that you read Predictive Analytics with Microsoft Azure Machine Learning by Apress to get an overview of Azure ML. In this post, we will build a small MAML experiment, publish the experiment as a web service, integrate the web service with a Stream Analytics job, and test the application.

Building The Azure ML Experiment

There are custom templates available in the Cortana Gallery to build real-life customer churn experiments, such as one that uses data from the KDD tournament. However, for the purpose of demonstration, I will build a simple, contrived experiment that just works. I will not walk you through the MAML dashboard. You can learn about the steps involved in building a predictive experiment from the Azure ML walk-through documentation.

The Dataset

I built a dataset named CustomerChurnDataset, which is a CSV file with the following values that I repeated about 100 times.

I deliberately created a pattern in the dataset for our experiment. According to the pattern, customers in their 20s churn (value 1) from the provider’s services, whereas those in their 30s do not (value 0). I uploaded this dataset to my ML workspace.
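To make the shape of such a dataset concrete, here is a minimal Python sketch that generates a comparable CSV. The column names are taken from the Stream Analytics query used later in this post; the seed values themselves are made up, and the label encoding (1 for churn) is an assumption chosen to match the scored label of 1 that the web service returns for the 21-year-old test input.

```python
import csv
import io

# Column names follow the parameters passed to the function in the
# Stream Analytics query; the numeric values are hypothetical.
COLUMNS = ["Age", "CallsPerMonth", "InternetUsageInMbPerMonth", "Churn"]

# Assumed pattern: customers in their 20s are labeled 1 (churn),
# customers in their 30s are labeled 0 (no churn).
SEED_ROWS = [
    (21, 300, 1500, 1),
    (25, 280, 1400, 1),
    (29, 310, 1600, 1),
    (31, 120, 600, 0),
    (35, 100, 500, 0),
    (39, 90, 450, 0),
]


def build_dataset(repeats=100):
    """Return CSV text: one header row, then the seed rows repeated."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    for _ in range(repeats):
        writer.writerows(SEED_ROWS)
    return buffer.getvalue()
```

Writing the returned text to a file named CustomerChurnDataset.csv gives a file that can be uploaded to the workspace as a dataset.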

The Experiment

Once the data is in place, we can build our ML experiment. Create a new experiment in your ML workspace that consists of the following modules and connections.

The experiment works as follows. The dataset first passes through the Split Data module which divides the data into training and test data. The first output of the Split Data module that contains 90% (0.9 fraction of rows) of the data connects to the Train Model module and is used for training the model. The other 10% of the data is passed to the Score Model module that is used for scoring the predictions of the model.
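For readers who think in code, the split can be pictured with the following sketch. It is not how the Split Data module is implemented; it only illustrates a 90/10 split, assuming the rows are shuffled before being divided. The function name and the seed parameter are hypothetical.

```python
import random


def split_rows(rows, fraction=0.9, seed=42):
    """Shuffle the rows and return (training_rows, test_rows)."""
    shuffled = list(rows)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]
```

With 100 rows, the first list holds 90 rows for training and the second holds the remaining 10 for scoring.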

There are several algorithms available in MAML for binary classification. I picked one of them, the Two-Class Boosted Decision Tree module. The Train Model module trains the model to predict the values of the Churn column of the dataset. Following is the property window snapshot of the Train Model module.

Once you run the experiment in your workspace, you can visualize the predictions made by the experiment by clicking on the circle below the Score Model module and selecting Visualize from the list.

Deploy The Experiment As Web Service

Once your experiment is ready, click on Set Up Web Service in the options menu and select Predictive Web Service (Recommended) from the menu options. Once you do that, your experiment will be copied to a new experiment. It will then be modified and a web service input and a web service output endpoint will be added to the experiment.

Run the experiment at this point so that you can publish it. Once the experiment run has completed successfully, select the Deploy Web Service option from the options menu. Upon successful deployment, you will be presented with your web service dashboard, which lists your API key and other helpful information that you can use to access your web service. Let’s use the Test button to test our web service now.

Let’s input a test value with high churn probability to the service, i.e. a consumer of age 21 and some other values.

Once the service call is complete, you will find the following output at the bottom of the dashboard.

The result shows that for the input, the service calculated the churn as 1 or true (Scored Labels) with a probability of 0.95 (Scored Probabilities).
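The same test can be made from code. The sketch below only builds the HTTP request and does not send it. The payload layout (an Inputs object holding ColumnNames and Values under the default input name input1) and the Bearer authorization header reflect my understanding of the request-response format for these web services; treat both as assumptions and compare them with the sample code on the API Help Page of your own service. The URL and key passed to the function are placeholders.

```python
import json
import urllib.request

# Column names are assumed to match the dataset used to train the model.
COLUMNS = ["Age", "CallsPerMonth", "InternetUsageInMbPerMonth", "Churn"]


def build_request(url, api_key, rows):
    """Build (but do not send) a POST request for the scoring service."""
    payload = {
        "Inputs": {
            # "input1" is assumed to be the name of the web service input.
            "input1": {
                "ColumnNames": COLUMNS,
                "Values": [[str(value) for value in row] for row in rows],
            }
        },
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send the request; the response body should then carry the scored label and scored probability for each input row.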

Connecting Azure ML and Stream Analytics

Create a new Stream Analytics job named CustomerChurnJob. Click on the job and select Functions from the menu.

Select Add a Machine Learning Function and populate the values in the dialog window that follows. Set the function alias name as predictchurn.

If the ML service that you previously provisioned is outside your subscription, you would need to specify the service URL and the Authorization Key specified in the API Help Page of your ML Service.

Setting Up The Input and The Output of The Stream Analytics Job

Although I wanted to serve a continuous data stream to the Stream Analytics job by connecting it to Event Hub or IoT Hub, to keep the sample brief and to the point, let’s serve it static test data from blob storage. Set up Blob Storage as an input of the Stream Analytics job so that files from a container named data serve as input to the job. Set the input alias name as input.

Since the test file that we are going to use will be a CSV with headers, we will set the required properties in the next step of adding Blob Storage Input.

Similarly, set up Blob Storage as the output of the Stream Analytics job so that the output of the job is saved in a container named output. Set the output alias name as output.

Now let’s write a query that fetches data from the storage container, executes the function that we just configured and directs the output of the query to the storage container. Select Query from the top menu and write the following query in the query editor.

WITH subquery AS (
    SELECT predictchurn(Age, CallsPerMonth, InternetUsageInMbPerMonth, Churn) AS result FROM input)
SELECT result.[Age], result.[CallsPerMonth], result.[InternetUsageInMbPerMonth], result.[Scored Labels], result.[Scored Probabilities]
INTO output
FROM subquery

Click on the Start button in the bottom menu to start the job and wait for the job to reach the running state.

Executing The Sample

Use the Microsoft Azure Storage Explorer tool to upload a test file to your storage account that you configured as input for the Stream Analytics job. You can create a test file yourself with tools such as Microsoft Excel. Just make sure that the names of the headers are consistent with the parameters that we passed to the function that we provisioned in the Stream Analytics job. Following is the snapshot of the test file that I used for testing the sample.

I uploaded this file to the input container.
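A quick check before the upload can catch a header mismatch early. The following sketch reads only the first row of a CSV and reports which of the expected columns are missing. The function name is hypothetical, and the required column list simply mirrors the parameters passed to predictchurn in the query.

```python
import csv
import io

# Mirrors the parameters passed to predictchurn in the job query.
REQUIRED_COLUMNS = ("Age", "CallsPerMonth", "InternetUsageInMbPerMonth", "Churn")


def missing_columns(csv_text, required=REQUIRED_COLUMNS):
    """Return the required column names that the CSV header lacks."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]
```

An empty list means the header carries every column the query expects.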

Soon after the upload, the Stream Analytics job kicked in and started processing the data.

The output of the job was saved in the target container.

The file contained the scored labels (the predicted churn value) and the scored probabilities that the model produced for each row.
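Once the output file is downloaded, acting on the predictions takes only a few lines. This sketch picks the rows whose scored probability meets a threshold. It assumes that the output CSV carries the column names selected in the query and that a higher scored probability means a more likely churn; because the casing of column names in the job output may differ, the lookup is case-insensitive. The function name and the default threshold are hypothetical.

```python
import csv
import io


def likely_churners(csv_text, threshold=0.5):
    """Return rows whose scored probability is at or above the threshold."""
    reader = csv.DictReader(io.StringIO(csv_text))
    flagged = []
    for row in reader:
        # Normalize the keys so that "Scored Probabilities" and
        # "scored probabilities" are treated alike.
        record = {key.strip().lower(): value for key, value in row.items()}
        if float(record["scored probabilities"]) >= threshold:
            flagged.append(record)
    return flagged
```

Each returned record is a dictionary with lowercase keys, so the age of a flagged customer is available as record["age"].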

This is a powerful feature of Stream Analytics through which intelligent decisions can be taken on a continuous stream of data. For example, for this very scenario, if the telecommunication companies run analytics on a stream of telephone call data, then they can predict customer satisfaction and take corrective actions even before customers plan on opting out of operator services. I hope that this blog is fun to read and proves helpful. As always, do provide your feedback and comments in the comments section below! Thank you!
