Intro to Azure Analysis Services

Azure Analysis Services is essentially Analysis Services from SQL Server offered as a platform as a service (PaaS) in the Microsoft Azure cloud. The following are the steps I took to build, deploy and use Analysis Services for business intelligence. To get a complete overview and more comprehensive steps, read What is Azure Analysis Services?

In the Azure Portal, create an Analysis Services server
Azure Analysis Services-1

Notice that you can pause the server to save on costs, back up, set admins and browse models.
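Pausing and resuming can also be scripted, which helps keep dev/test costs down. A minimal sketch, assuming the AzureRM.AnalysisServices PowerShell module is installed (the resource group name is a placeholder):

# Pause the server when not in use to stop compute billing, and resume it when needed
Login-AzureRmAccount
Suspend-AzureRmAnalysisServicesServer -Name "rkas" -ResourceGroupName "<resource group>"
Resume-AzureRmAnalysisServicesServer -Name "rkas" -ResourceGroupName "<resource group>"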

Note the Pricing Options
Azure Analysis Services-2

Open Visual Studio with the latest SQL Server Data Tools (SSDT).
Create a new project using the Analysis Services Tabular Project template
Azure Analysis Services-3

Ensure Integrated Workspace is set to true

Azure Analysis Services-4

You may have to double-click model.bim to generate the metadata. As a result, the Tabular Model Explorer displays as follows:

Azure Analysis Services-5

Right-click Data Sources > Import From Data Source > select Microsoft SQL Azure.
I have data prepared in my Azure SQL Database.

Azure Analysis Services-6

Select Service Account, then click Next.
Select the tables to import
Azure Analysis Services-7

The result of importing the tables:
Azure Analysis Services-8

After some modelling activities, such as creating relationships, a date hierarchy and a location hierarchy, and showing/hiding tables or columns for client tools, the model looks as follows:
Azure Analysis Services-9

To set up deployment to Azure Analysis Services, right-click the project > Properties.
Set the Server to the Analysis Services server name, asazure://canadacentral.asazure.windows.net/rkas
Azure Analysis Services-11

To deploy, right-click the project and select Deploy. You may get prompted for credentials. Ensure these are the credentials of an account in your Azure AD directory (not a Windows Live account like Hotmail) and that the account is one of the Analysis Services admins. Also, ensure you have installed the latest SSDT. Otherwise, you get a vague error message stating ‘An error occurred while connecting to the server.’
Azure Analysis Services-12

Deploying…
Azure Analysis Services-13

To test the deployment, I used SQL Server Management Studio to connect to AS and query the database
Azure Analysis Services-14
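The same smoke test can be scripted, assuming the SqlServer PowerShell module is installed and you are signed in as one of the Analysis Services admins; the table name in the DAX query is a placeholder for whatever exists in your model:

# Run a simple DAX query against the deployed model to confirm it responds
Import-Module SqlServer
Invoke-ASCmd -Server "asazure://canadacentral.asazure.windows.net/rkas" -Query "EVALUATE TOPN(10, 'YourTable')"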

This confirms deployment was successful.
Also, using Power BI to connect to AS
Azure Analysis Services-15

Enter server Url to AS
Azure Analysis Services-16

Select the model
Azure Analysis Services-17

Create some charts with some measures, dimensions and hierarchies.
Azure Analysis Services-18

Azure Analysis Services is easy to work with, and the developer experience is much the same as with SQL Server. The ability to connect to other cloud data sources like Azure SQL Database, and to have the Power BI service connect to AS, makes for an entirely cloud-based solution.


A Basic Demo of an Azure Machine Learning Experiment

In recent months, the hype around machine learning has caught my attention, so I have spent time reading articles and watching videos and university lectures on the topic. Wearing the developer hat, I wondered how a developer can build a machine learning solution in Azure. This blog post shares how I built a machine learning experiment using Azure Machine Learning Studio. By no means am I a data scientist, and I have forgotten everything from my statistics courses in university. I have come to understand that a machine learning solution is much less about the tools and much more about statistical modelling and cleaning the data. I hope to show the general steps to build an experiment that will assist other beginners in understanding Azure Machine Learning.

In Azure Portal, go to Machine Learning workspaces. I had one already created.
A Basic Demo of an Azure Machine Learning Experiment-1

Click on Launch Machine Learning Studio
A Basic Demo of an Azure Machine Learning Experiment-2

This will take you essentially to https://studio.azureml.net
Upload a data set. I will upload a .tsv file I produced with Azure Data Lake Analytics.
A Basic Demo of an Azure Machine Learning Experiment-3
In the left navigation DATASETS, click NEW, select data file from local computer to upload. Note my file has a column header row for ease of data manipulation.
A Basic Demo of an Azure Machine Learning Experiment-4

Create a new Experiment > Blank Experiment
A Basic Demo of an Azure Machine Learning Experiment-5

We see the design surface. Drag the dataset onto it.
A Basic Demo of an Azure Machine Learning Experiment-6

We can do further data cleanup and manipulation with the modules in the left pane. For example, I will remove any duplicates based on a selected key field.
A Basic Demo of an Azure Machine Learning Experiment-7

Split the rows of the dataset into two distinct sets: one for training the model and one for scoring the model.
A Basic Demo of an Azure Machine Learning Experiment-8

To predict salary, I use the Linear Regression module. Regression is a machine learning technique used to predict a numeric outcome. Linear regression attempts to establish a linear relationship between independent variables and an outcome variable, or dependent variable, that is also numeric. Read more here
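In its simplest form, the fitted model is salary ≈ b0 + b1·x1 + b2·x2 + … + bn·xn, where the x values are the input features (here, the city and the featurized job description terms) and the b coefficients are what the training step estimates by minimizing the error between predicted and actual salaries.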

Use the Train Model module. Training a classification or regression model is a kind of supervised machine learning. That means you must provide a dataset that contains historical data from which to learn patterns. The data should contain both the outcome you are trying to predict, and related factors (variables). The machine learning model uses the data to extract statistical patterns and build a model. Read more here

One of the split data sets is used to train the model.
A Basic Demo of an Azure Machine Learning Experiment-9

Once the model has been trained, we can use the 2nd split data set to make predictions by using the Score Model module. Read more about it here

Use the Evaluate Model module to see how accurate the predictions were from the 2nd split data set.
A Basic Demo of an Azure Machine Learning Experiment-10

To address the missing values, I select a subset of columns (city, salary and jobdescription) and use the Clean Missing Data module to remove rows where the salary is missing. In addition, to improve the accuracy, I add Extract N-Gram Features from Text with jobdescription as the selected text column. This module featurizes the long text string to infer information that may help provide a correlation between the job posting and its salary. I must say that this may not be the right approach, but I want to demonstrate the use of such modules in an experiment. For further details on this module, read Extract N-Gram Features from Text.
A Basic Demo of an Azure Machine Learning Experiment-11

A Basic Demo of an Azure Machine Learning Experiment-12

Click Run.
A Basic Demo of an Azure Machine Learning Experiment-13

To see a sample of the scored dataset, right-click the Score Model output and select Visualize
A Basic Demo of an Azure Machine Learning Experiment-14

We can see the predicted salary value in the scored label column. In this one row, as an example, the actual salary is 70,661. Based on the city and the job description (through the N-gram extraction), the predicted salary is 47,904, which is quite a bit off. Let’s evaluate the model as a whole.
A Basic Demo of an Azure Machine Learning Experiment-15

Right-click the Evaluate Model module > Evaluation results > Visualize
A Basic Demo of an Azure Machine Learning Experiment-16

The metrics provide details on the amount of error. For example, mean absolute error shows how close the predictions are to the actual outcomes, computed against the 2nd split data set. For more details on the metrics, read Evaluate Model
A Basic Demo of an Azure Machine Learning Experiment-17
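For reference, mean absolute error is simply the average of the absolute differences between predicted and actual values, MAE = (1/n) · Σ |actual − predicted|, so it is expressed in the same units as salary; root mean squared error is similar but penalizes large misses more heavily.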

This is where statistical knowledge and expertise come into play: tweak the parameters of your machine learning modules and model, then re-run the experiment. Continue to iterate until you have a trained model with an acceptably low amount of error.
You then create a web service that takes the input parameters and returns the salary prediction.
A Basic Demo of an Azure Machine Learning Experiment-18

As a result, the experiment becomes
A Basic Demo of an Azure Machine Learning Experiment-19

For further details and steps on the predictive web service, read How a Machine Learning model progresses from an experiment to an operationalized Web service
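Once operationalized, the web service can be called from any client through its request-response endpoint. The following is a rough PowerShell sketch; the URL and API key are placeholders taken from the web service dashboard, and the input column names are assumed to match the experiment's dataset:

# Call the Azure ML request-response endpoint with one row to score
$apiKey = "<web service API key>"
$uri = "<request-response URL from the web service dashboard>"
$body = @{
    Inputs = @{
        input1 = @{
            ColumnNames = @("city", "jobdescription")
            Values = @(, @("Chicago", "Senior software developer"))
        }
    }
    GlobalParameters = @{}
} | ConvertTo-Json -Depth 5
$response = Invoke-RestMethod -Uri $uri -Method Post -Body $body -ContentType "application/json" -Headers @{ Authorization = "Bearer $apiKey" }
$response.Results.output1   # includes the Scored Labels column, i.e. the predicted salary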

This concludes my very basic demo of an Azure ML experiment.

My takeaways from this introductory learning experience with Azure Machine Learning and the background reading are that

  • machine learning is essentially modelling with statistics and mathematics.
  • a prediction can be an outcome with or without an associated probability.
  • a developer needs a robust understanding of statistics much more than software engineering skills. In other words, a .NET developer can’t just pick up the technical tooling and provide a solution. Partnering with a statistician/data scientist would be more productive.
  • a machine learning algorithm is like an application being written for you based on a model, inputs and predicted outcomes. This contrasts with formulating a model, figuring out the business rules, logic and parameters, and writing the application yourself.

I hope to learn more about machine learning and build on my understanding as technology platforms continue to add ML service offerings.


Power BI Embedded Walk Through Part 3 of 3

Power BI Embedded Walk Through Part 1 of 3
Power BI Embedded Walk Through Part 2 of 3

In the previous two blog articles, I showed how to create a Power BI Embedded workspace collection, import a Power BI Desktop file, and get the generated embed details.

To recap from the last post, I obtained the embed URL and embed token:
Embed Url: https://embedded.powerbi.com/appTokenReportEmbed?reportId=47e3e117-65b9-4bb4-8283-98525e5b7c59
Embed Token:
eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ2ZXIiOiIwLjIuMCIsInR5cGUiOi
JlbWJlZCIsIndjbiI6InJrcGJpIiwid2lkIjoiMjY0MTJlNGQtNmQ0YS00ZTE1LTk3MzEtOTMxNzdkNT
U1ODNmIiwicmlkIjoiNDdlM2UxMTctNjViOS00YmI0LTgyODMtOTg1MjVlNWI3YzU5Iiwic2NwIjoiUm
Vwb3J0LlJlYWRXcml0ZSIsImlzcyI6IlBvd2VyQklTREsiLCJhdWQiOiJodHRwczovL2FuYWx5c2lzLn
dpbmRvd3MubmV0L3Bvd2VyYmkvYXBpIiwiZXhwIjoxNDk0NDY5ODY5LCJuYmYiOjE0OTQ0NjYyNjl9.n
ZW5lcop9VNLf1mWzc3-QUN2L1WaMe_b5bI5SQ6CojU

Create or use an existing web application. For myself, I created a new ASP.NET web application with the Single Page Application template.
Power BI Embedded Walk Through Part 3 of 3-1

I add the Power BI JavaScript NuGet package to my project.
Power BI Embedded Walk Through Part 3 of 3-2
Thus, I see the powerbi.js file added
Power BI Embedded Walk Through Part 3 of 3-3

The following are the places to put the code.
In App_Start\BundleConfig.cs, add the Power BI script reference

bundles.Add(new ScriptBundle("~/bundles/app").Include(
    "~/Scripts/sammy-{version}.js",
    "~/Scripts/app/common.js",
    "~/Scripts/powerbi.js",
    "~/Scripts/app/app.datamodel.js",
    "~/Scripts/app/app.viewmodel.js",
    "~/Scripts/app/home.viewmodel.js",
    "~/Scripts/app/_run.js"));

In Scripts\app\home.viewmodel.js, add the Power BI embedded JavaScript code snippet. Note the configuration needed for the accessToken, embedUrl, and id. You can get these from my part 2 of 3 post. As for the accessToken, it has a short expiry. In the ProvisionSample app, there is a call to createEmbedToken, and you can pass an expiry date to control the lifetime of the token. As for a refresh token mechanism, I have yet to figure out how that would happen.
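For context, the app token itself is just a JWT over the workspace collection name, workspace id and report id, signed (HMAC-SHA256) with the workspace collection access key; the claim names below mirror what is inside the embed token obtained in part 2. A rough PowerShell sketch of what createEmbedToken produces, for illustration only:

# Hand-roll a short-lived app token (the SDK's createEmbedToken does this for you)
$accessKey = "<workspace collection access key>"
function ConvertTo-Base64Url([byte[]]$bytes) {
    [Convert]::ToBase64String($bytes).TrimEnd('=').Replace('+', '-').Replace('/', '_')
}
$now = [DateTimeOffset]::UtcNow
$header = '{"typ":"JWT","alg":"HS256"}'
$payload = @{ ver = "0.2.0"; type = "embed"; wcn = "rkpbi"; wid = "<workspace id>"; rid = "<report id>";
    scp = "Report.ReadWrite"; iss = "PowerBISDK"; aud = "https://analysis.windows.net/powerbi/api";
    exp = $now.AddHours(1).ToUnixTimeSeconds(); nbf = $now.ToUnixTimeSeconds() } | ConvertTo-Json -Compress
$unsigned = (ConvertTo-Base64Url ([Text.Encoding]::UTF8.GetBytes($header))) + "." + (ConvertTo-Base64Url ([Text.Encoding]::UTF8.GetBytes($payload)))
$hmac = New-Object System.Security.Cryptography.HMACSHA256 (,[Text.Encoding]::UTF8.GetBytes($accessKey))
$appToken = $unsigned + "." + (ConvertTo-Base64Url ($hmac.ComputeHash([Text.Encoding]::UTF8.GetBytes($unsigned))))
$appToken   # use this value as accessToken in the embed configuration below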

As for KnockoutJS conventions, I could have avoided using jQuery on the reportContainer div and done proper binding instead.

function HomeViewModel(app, dataModel) {
    var self = this;
    self.myHometown = ko.observable("");

    Sammy(function () {
        // Removed function code for brevity
    });

    var models = window['powerbi-client'].models;

    var PBIconfig = {
        type: 'report',
        accessToken: 'eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ2ZXIiOiIwLjIuMCIsInR5cGUiOiJlbWJlZCIsIndjbiI6InJrcGJpIiwid2lkIjoiMjY0MTJlNGQtNmQ0YS00ZTE1LTk3MzEtOTMxNzdkNTU1ODNmIiwicmlkIjoiMDYxODlkYmQtYWRhZS00YWRiLTljMDQtZTNkMGNkZmIxZmQ3Iiwic2NwIjoiUmVwb3J0LlJlYWRXcml0ZSIsImlzcyI6IlBvd2VyQklTREsiLCJhdWQiOiJodHRwczovL2FuYWx5c2lzLndpbmRvd3MubmV0L3Bvd2VyYmkvYXBpIiwiZXhwIjoxNDk0NjQwMzM3LCJuYmYiOjE0OTQ1NTM5Mzd9.VroIsYUxVDzbgHl0HyJ2sJTPRWWtQ6wjzZkrRXP8SmQ',
        embedUrl: 'https://embedded.powerbi.com/appTokenReportEmbed?reportId=06189dbd-adae-4adb-9c04-e3d0cdfb1fd7',
        id: '06189dbd-adae-4adb-9c04-e3d0cdfb1fd7',
        permissions: models.Permissions.All /* gives maximum permissions */,
        viewMode: models.ViewMode.Edit,
        settings: {
            filterPaneEnabled: true,
            navContentPaneEnabled: true
        }
    };

    var reportContainer = $('#reportContainer')[0];
    var report = powerbi.embed(reportContainer, PBIconfig);

    // report.off removes a given event handler if it exists.
    report.off("loaded");
    // report.on adds an event handler which prints to the Log window.
    report.on("loaded", function () {
        Log.logText("Loaded");
    });
    report.off("error");
    report.on("error", function (event) {
        Log.log(event.detail);
    });
    report.off("saved");
    report.on("saved", function (event) {
        Log.log(event.detail);
        if (event.detail.saveAs) {
            Log.logText('In order to interact with the new report, create a new token and load the new report');
        }
    });

    return self;
}

Go to Views\Home\_Home.cshtml and add the div container for the iframe that displays the Power BI report:
<div id="reportContainer" style="height:600px"></div>

Run the application, preferably in the Chrome browser.
Power BI Embedded Walk Through Part 3 of 3-4

Since this is in Edit mode, the user can interactively edit the chart and add to it as if they were in the Power BI service or Power BI Desktop, and they also have the ability to save or save as. I find this quite neat, and it serves DIY analytical scenarios.

Going back to Azure Portal, since a user had started a session with this report, you can see the one embed session.
Power BI Embedded Walk Through Part 3 of 3-5

Final Remarks
My experience learning about Power BI Embedded was quite challenging and time-consuming, as each piece of online documentation told only part of the developer story. I hope I have shown a good end-to-end walkthrough of what is involved.


Power BI Embedded Walk Through Part 2 of 3

In my previous blog post, Power BI Embedded Walk Through Part 1 of 3, I showed how to create a Power BI Embedded workspace collection in the Azure Portal as one of the first major steps. In this post, I will show how to get your Power BI report into the Azure Power BI Embedded workspace and obtain the embed details needed to set up the client-side code in your web application.

To manage a workspace and upload a PBIX file, there unfortunately isn’t a UI yet; you must rely on the SDK- and API-based options.

I opted for the ProvisionSample app.
So, download the code from the github repository
Power BI Embedded Walk Through Part 2 of 3-1
Extract the zip and open the .sln file
Power BI Embedded Walk Through Part 2 of 3-2
Restore the NuGet packages by right-clicking on the solution. Otherwise, you will get build errors.
Power BI Embedded Walk Through Part 2 of 3-3
Open the App.config file and fill out the app settings

<appSettings file="Cloud.config">
<!-- Your Azure subscription ID -->
<add key="subscriptionId" value="" />
<!-- The Azure resource group name -->
<add key="resourceGroup" value="" />
<!-- The Power BI Workspace Collection Name -->
<add key="workspaceCollectionName" value="" />
<!-- The Power BI Workspace Collection Access Key -->
<add key="accessKey" value="" />
<!-- The Power BI Workspace-->
<add key="workspaceId" value="" />
<!-- Credentials to connect to datasource within Power BI -->
<add key="username" value="" />
<add key="password" value="" />
</appSettings>

To start, you can get the subscription id, resource group, workspace collection name and access key from the Azure Portal.
Power BI Embedded Walk Through Part 2 of 3-4

Get one of the Access Keys
Power BI Embedded Walk Through Part 2 of 3-5

Update the app.config file with these values
Run the ProvisionSample project in Visual Studio
Console application loads.
Power BI Embedded Walk Through Part 2 of 3-6

We first need to create a workspace
Enter 1 to go to Collection Management
Enter 6 to Provision a new Workspace
Enter a name for the workspace
Copy the workspace Id and enter it into the app.config file. This is so that you don’t have to enter it again in the console each time you want to work in that workspace.
Power BI Embedded Walk Through Part 2 of 3-7
To import the PBIX file, enter 7 to Exit group
Enter 2 to go to Report management

Power BI Embedded Walk Through Part 2 of 3-8
Enter 3 to Import PBIX Desktop file
Press Enter to work with current workspace
Enter dataset name to skip for now
Enter the full path and file name

Power BI Embedded Walk Through Part 2 of 3-9

To get the embed details to embed the report into your app, enter 8 to generate embed details
Hit enter for default workspace collection and workspace id
For embed mode, I select 2 for Edit & Save Mode
Press enter for first report
Press enter for no row level security
Press enter to skip RLS for roles
Press Y to copy embed token to clipboard
Power BI Embedded Walk Through Part 2 of 3-10
Also, copy the Embed Url:
Embed Url: https://embedded.powerbi.com/appTokenReportEmbed?reportId=47e3e117-65b9-4bb4-8283-98525e5b7c59
In the Azure Portal, we can see the workspace just created
Power BI Embedded Walk Through Part 2 of 3-11

Next, we will look at how to embed this report into a web application: Power BI Embedded Walk Through Part 3 of 3

 


Power BI Embedded Walk Through Part 1 of 3

Power BI Embedded is an interesting approach to allow web applications to embed Power BI reports such that users of the application do not require any Office 365 account and user license. In trying to figure out how to do a simple demo for myself, I found the documentation to be fragmented. I will provide a simple end-to-end walkthrough of embedding a Power BI report into a web application.

The online documentation to help me piece together this walkthrough is as follows:

High-level steps

  1. In Azure Portal, create workspace collection
  2. Create or use a Power BI report using Power BI Desktop
  3. Use the ProvisionSample code to import PBIX file and get embed details
  4. Create or use a web application
  5. Develop client side code to embed the Power BI report into the web application
  6. Test the application and embedded report

1. Go to the Azure Portal and create Power BI Embedded
Power BI Embedded Walk Through Part 1 of 3-1

2. Fill out details
Power BI Embedded Walk Through Part 1 of 3-2

3. Workspace collection
Power BI Embedded Walk Through Part 1 of 3-3
4. Where do I create a workspace? Where do I import a Power BI report file? Where do I get Power BI embed details for client side code?
You can’t do any of these in the portal at the moment. A disappointment, if you ask me. You need to use the SDK and APIs from https://www.nuget.org/profiles/powerbi

Therefore, obtain the Access Keys to use with the SDK and APIs
Power BI Embedded Walk Through Part 1 of 3-4
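For what it’s worth, the collection and its keys can also come from PowerShell rather than the portal blade. A sketch assuming the AzureRM.PowerBIEmbedded module is installed (resource group and region are placeholders):

# Create a workspace collection and read its access keys
New-AzureRmPowerBIWorkspaceCollection -ResourceGroupName "<resource group>" -WorkspaceCollectionName "rkpbi" -Location "<region>"
Get-AzureRmPowerBIWorkspaceCollectionAccessKeys -ResourceGroupName "<resource group>" -WorkspaceCollectionName "rkpbi"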
For the next steps of creating a workspace, importing a Power BI desktop file and getting embed details, see the next Power BI Embedded Walk Through Part 2 of 3


Building a Spark Application for HDInsight using IntelliJ Part 2 of 2

This post continues from my blog article Building a Spark Application for HDInsight using IntelliJ Part 1 of 2, which outlines my experience installing IntelliJ and the dependent SDKs and creating an HDInsight project.

To add some code, right click src, create Scala Class
Building a Spark Application for HDInsight using IntelliJ Part 1 of 2-1
Building a Spark Application for HDInsight using IntelliJ Part 1 of 2-2

Project folders and MainApp
Building a Spark Application for HDInsight using IntelliJ Part 1 of 2-3

Scala code:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object MainApp {
  def main(arg: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MainApp")
    val sc = new SparkContext(conf)

    val rdd = sc.textFile("adl://rkbigdata.azuredatalakestore.net/MyDatasets/Crimes_-_2001_to_present_pg2.csv")
    // find the rows where primary type == THEFT
    val rdd1 = rdd.filter(s => s.split(",")(5) == "THEFT")

    val spark = SparkSession.builder().appName("Spark SQL basic").enableHiveSupport().getOrCreate()

    spark.sql("USE usdata")
    val crimesDF = spark.sql("SELECT * FROM CRIMES WHERE primarytype == 'NARCOTICS'")

    // save data frame of results into an existing or non-existing hive table.
    crimesDF.write.mode("overwrite").saveAsTable("crimebytype_NARCOTICS")
  }
}

Logic

  1. Read from csv file in Azure Data Lake Store into RDD
  2. Filter RDD for rows where primary type field is “THEFT”
  3. Set Hive Database to usdata (from default database)
  4. Query Hive table CRIMES for rows where the primary type field is “NARCOTICS”
  5. Save data frame into new or existing crimebytype_NARCOTICS hive table.

Begin to set up IntelliJ to submit the application to HDInsight

Before using Azure Explorer, I encountered an issue where signing in resulted in an error and it kept prompting me to enter the credentials. Sorry, I didn’t capture the error message. This led me to disable Android Support by unchecking it.
Building a Spark Application for HDInsight using IntelliJ Part 1 of 2-4

Click Ok.
Sign into Azure via Azure Explorer
Building a Spark Application for HDInsight using IntelliJ Part 1 of 2-5
Select Interactive
Enter credentials

Building a Spark Application for HDInsight using IntelliJ Part 1 of 2-6

See the Azure resources display

Building a Spark Application for HDInsight using IntelliJ Part 1 of 2-7
Right click the Project and click on Submit Spark Application to HDInsight

Building a Spark Application for HDInsight using IntelliJ Part 1 of 2-8
Set Main class name to MainApp
Building a Spark Application for HDInsight using IntelliJ Part 1 of 2-9
HDInsight Spark Submission window
Confirm success

Building a Spark Application for HDInsight using IntelliJ Part 1 of 2-10
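As an aside, once the assembled jar is uploaded to the cluster’s storage, the same application can be submitted without IntelliJ through the cluster’s Livy endpoint. A hedged PowerShell sketch using the cluster admin (Ambari) credentials; the cluster URL and jar path are placeholders:

# Submit the Spark application as a Livy batch job
$clusterAdmin = Get-Credential
$job = @{ file = "adl://<your adls account>.azuredatalakestore.net/<path>/SparkApp.jar"; className = "MainApp" } | ConvertTo-Json
Invoke-RestMethod -Uri "https://<clustername>.azurehdinsight.net/livy/batches" -Method Post -Credential $clusterAdmin -Body $job -ContentType "application/json" -Headers @{ "X-Requested-By" = "admin" }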
Go to Ambari Hive View to query the hive table created from the spark application.
Building a Spark Application for HDInsight using IntelliJ Part 1 of 2-11
From Jupyter notebook, I query the same hive table.

Building a Spark Application for HDInsight using IntelliJ Part 1 of 2-12

I have shown a walkthrough of setting up the development tooling, building a simple Spark application, and running it against an HDInsight Spark 2.0 cluster.


Building a Spark Application for HDInsight using IntelliJ Part 1 of 2

For developers with a Microsoft .NET background who want to get familiar with building Spark applications in the Scala programming language, this blog post series is a walkthrough of installing the development tools, building a simple Spark application, and submitting it against an HDInsight Spark cluster.

My HDInsight configuration is Spark 2.0 (HDI 3.5) with Azure Data Lake Store as the primary storage.

The articles I used to understand the installation and setup:
https://docs.microsoft.com/en-us/azure/azure-toolkit-for-intellij-installation

A point of confusion for me was that HDInsight Tools for IntelliJ is deprecated; it has been rolled into the Azure Toolkit for IntelliJ. So any online articles referring to HDInsight Tools are still valid, but read them with the Azure Toolkit in mind.
Download

I’m working on a Windows Server 2012 R2 VM.
JDK 8
Building a Spark Application for HDInsight using IntelliJ Part 1 of 2-1

IntelliJ Installation
Building a Spark Application for HDInsight using IntelliJ Part 1 of 2-2
Start IntelliJ
Create New Project
Building a Spark Application for HDInsight using IntelliJ Part 1 of 2-3

Before creating a project, we need to install Azure Toolkit so that we can use Spark on HDInsight project template.
Building a Spark Application for HDInsight using IntelliJ Part 1 of 2-4Building a Spark Application for HDInsight using IntelliJ Part 1 of 2-5
Select Spark On HDInsight (Scala) project template. As a side note, an alternate approach is with Maven, but I found this approach generally easier.

Building a Spark Application for HDInsight using IntelliJ Part 1 of 2-6

Enter Project Name
For the Project SDK, select Java 1.8 by finding it at C:\Program Files\Java\jdk1.8.0_131
For the Scala SDK, download and select the latest version. In this screenshot, I had previously downloaded it.
For the Spark SDK, select and browse to where you downloaded spark-assembly-2.0.0-hadoop2.7.0-SNAPSHOT.jar

Selected SDKs
Building a Spark Application for HDInsight using IntelliJ Part 1 of 2-8
Note: For Spark 2.0 cluster, you need Java 1.8 and Scala 2.11 or above.
Click Finish
Go to File > Project Structure and set the Project Language Level to 7.
Building a Spark Application for HDInsight using IntelliJ Part 1 of 2-9

To write code and submit to HDInsight, see my next blog post Building a Spark Application for HDInsight using IntelliJ Part 2 of 2


Build Power BI Reports with HDInsight Spark Connector

With Power BI Desktop you can use the Azure HDInsight Spark BI Connector to get data from the Spark cluster to build reports. I have an HDInsight Spark 2.0 cluster with Azure Data Lake Store as the primary storage.

Open Power BI Desktop

Click Get Data
Build Power BI Report with HDInsight-1

Enter the URL of your HDInsight Spark cluster, e.g. https://<clustername>.azurehdinsight.net.
Build Power BI Report with HDInsight-2

Enter the cluster admin credentials. These are the same credentials used for Ambari.
Build Power BI Report with HDInsight-3

We are able to see a list of tables and views along with a preview of data. In my case, usdata is the database. Crimes and crimesgroupbytype are internal hive tables, crimes_ext is an external table and crimesgroupbytype_view is a view.

I selected crimes, crimesgroupbytype and crimesgroupbytype_view.
Click Load to generate queries for each
Build Power BI Report with HDInsight-4

The Query Editor
Build Power BI Report with HDInsight-5

When I click Close & Apply, the crimes query results in an error. My suspicion is that crimes has over a million rows, whereas the other queries deal with only several hundred rows. Since this connector is in beta, perhaps I have to wait for the final release.
Build Power BI Report with HDInsight-6

To continue, I delete usdata crimes query.
Build Power BI Report with HDInsight-7

Click Close & Apply

In the Report page, you can see the tables from the two queries in the Fields pane
Build Power BI Report with HDInsight-8

I build my report showing the number of crimes by crime type, with filters on the right for year and crime type.
Build Power BI Report with HDInsight-9

HDInsight and Power BI are a powerful combination for working with big data, with the ability to transform, analyze and visualize the data in Power BI Desktop.


 

Re-Create HDInsight Cluster with Pre-Existing Azure Data Lake Store and Hive Metastore

Creating an HDInsight cluster for big data analytics development can be quite costly, at around $4 an hour. This adds up, especially when the cluster sits idle most of the time. To save on costs, I delete the cluster and re-create it when needed.

For simple development purposes, my HDInsight depends on

  • Azure Data Lake Store as its primary data storage (i.e. HDFS compliant)
  • Hive Metastore Azure SQL DB, to preserve my hive tables

Re-creating it manually in the Azure Portal is time-consuming and tedious. Therefore, I script the creation of the HDInsight cluster in PowerShell and re-point it to ADLS and the Hive metastore. The costs of these two services are relatively very small.

I used the following online references, but they don’t exactly serve my configuration.

Create HDInsight clusters with Data Lake Store as default storage by using PowerShell

New-AzureRmHDInsightCluster

With a lot of trial and error, the following script is a working start.

The high-level logic is

  1. Login to Azure with an admin account and set the subscription
  2. Setup the ADLS service principal and certificate so that HDInsight cluster has access to ADLS
  3. Grant permissions to ADLS folders
  4. Setup config for Hive metastore DB and HDInsight cluster identity
  5. Create the HDInsight cluster

PowerShell Script

# Existing ADLS
$dataLakeStoreName = "rkdatalake"
$subscriptionId = "<Sub Id>"
$resourceGroupName = "rkbigdata"
$myrootdir = "/"
$certificateFilePath = "C:\Users\Roy\Downloads\adls-cert-rkdatalake.pfx"
$certificatePassword = "<Certificate Password>"
# Existing Hive MetaStore
$hivemetastoreSqlAzureServerName = "rkbigdata.database.windows.net"
$hivemetastoreDBName = "HiveMetaStore"
$hivemetastoreDBUsername = "rkim";
$hivemetastoreDBPassword = ConvertTo-SecureString "<Password>" -AsPlainText -Force
$hivemetastoreCredentials = New-Object System.Management.Automation.PSCredential ($hivemetastoreDBUsername, $hivemetastoreDBPassword)
# HDInsight Cluster (To be created)
$clusterRootPath = $myrootdir+"clusters/rkhdinsight"
$clusterType = "Spark"
$clusterName = "rkhdinsight"
$clusterVersion = "3.5"
$location = "Central US" # Region
$storageRootPath = $clusterRootPath # E.g. /clusters/hdiadlcluster
$clusterNodes = 3            # The number of nodes in the HDInsight cluster
$adminName = "admin"
$adminPassword = ConvertTo-SecureString "<Password>" -AsPlainText -Force
$httpCredentials = New-Object System.Management.Automation.PSCredential ($adminName, $adminPassword)
$sshuserName = "sshuser"
$sshuserPassword = ConvertTo-SecureString "<Password>" -AsPlainText -Force
$sshuserCredentials = New-Object System.Management.Automation.PSCredential ($sshuserName, $sshuserPassword)
$hivemetastoreAdmin = "kimr"
$metastoreadminPassword = ConvertTo-SecureString "<Password>" -AsPlainText -Force
$metastoreadminCredentials = New-Object System.Management.Automation.PSCredential ($hivemetastoreAdmin, $metastoreadminPassword)

# Errors from the cmdlets below are captured into $errorVar via -ErrorVariable errorVar

# Sign in to your Azure account
Login-AzureRmAccount
# List all the subscriptions associated to your account
Get-AzureRmSubscription
# Select a subscription
Set-AzureRmContext -SubscriptionId $subscriptionId

$resourceGroup = Get-AzureRmResourceGroup -Name $resourceGroupName -Location "Central US"
Test-AzureRmDataLakeStoreAccount -Name $dataLakeStoreName
New-AzureRmDataLakeStoreItem -Folder -AccountName $dataLakeStoreName -Path $clusterRootPath -Confirm:$true -Force

# ADLS Service Principal and Certificate
$certificatePFX = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2($certificateFilePath, $certificatePassword)

# Service Principal which is set to have access to ADLS
$servicePrincipal = Get-AzureRmADServicePrincipal -SearchString rkdatalakestoretemp
$servicePrincipalobjectId = $servicePrincipal.Id
write-host "service principal object Id: " $servicePrincipalobjectId 

# Grant the Service Prinicipal permissions to the following 3 folders in ADLS
Set-AzureRmDataLakeStoreItemAclEntry -AccountName $dataLakeStoreName -Path / -AceType User -Id $servicePrincipalobjectId -Permissions All
Set-AzureRmDataLakeStoreItemAclEntry -AccountName $dataLakeStoreName -Path /clusters -AceType User -Id $servicePrincipalobjectId -Permissions All
Set-AzureRmDataLakeStoreItemAclEntry -AccountName $dataLakeStoreName -Path $clusterRootPath -AceType User -Id $servicePrincipalobjectId -Permissions All

$tenantID = (Get-AzureRmContext).Tenant.TenantId

# Setup configuration to existing Hive Metastore Azure SQL DB and HDInsight cluster Identity
$azureHDInsightConfig = New-AzureRMHDInsightClusterConfig -debug -ErrorVariable errorVar `
     -ClusterType Spark `
    | Add-AzureRMHDInsightMetastore `
        -SqlAzureServerName $hivemetastoreSqlAzureServerName `
        -DatabaseName $hivemetastoreDBName `
        -Credential $hivemetastoreCredentials `
        -MetastoreType HiveMetastore `
    | Add-AzureRmHDInsightClusterIdentity `
        -AadTenantId $tenantId `
        -ObjectId $servicePrincipalobjectId `
        -CertificateFilePath $certificateFilePath `
        -CertificatePassword $certificatePassword

# output any error
Write-Output $errorVar

# Create new HDInsight cluster with config object
New-AzureRmHDInsightCluster `
    -ClusterType $clusterType `
    -OSType Linux `
    -ClusterSizeInNodes 2 `
    -ResourceGroupName $resourceGroupName `
    -ClusterName $clusterName `
    -HttpCredential $httpCredentials `
    -Location $location `
    -DefaultStorageAccountType AzureDataLakeStore `
    -DefaultStorageAccountName "$dataLakeStoreName.azuredatalakestore.net" `
    -DefaultStorageRootPath $clusterRootPath `
    -Version 3.5 `
    -SshCredential $sshuserCredentials `
    -AadTenantId $tenantId `
    -ObjectId $servicePrincipalobjectId `
    -Config $azureHDInsightConfig `
    -debug -ErrorVariable errorVar

# output any error
Write-Output $errorVar

Upon executing the script, go to the Azure Portal and see if the HDInsight cluster is being created. If you see Applying changes then it is in progress. This takes about 15-20 minutes.
Re-Create HDInsight Cluster with Pre-Existing Azure Data Lake Store and Hive Metastore-1

Pain Points

  • General typos such as file paths with too many slashes and misspellings. Just be extra careful.
  • The following error was a false positive. The .pfx file indeed has a private key.
"errors": [
      {
        "code": "InvalidDocumentErrorCode",
        "message": "DeploymentDocument 'AmbariConfiguration_1_7' failed the validation. Error: 'Error while getting access to the datalake storage account rkdatalake: The private key is not present in the X.509 certificate..'"
      }
]

To resolve, I had the certificate configured as

Add-AzureRmHDInsightClusterIdentity `
        -AadTenantId $tenantId `
        -ObjectId $servicePrincipalobjectId `
        -CertificateFilePath $certificateFilePath `
        -CertificatePassword $certificatePassword

rather than -CertificateFileContents.

Another error encountered during the trial and error was a generic 500 response from cluster creation:

New-AzureRmHDInsightCluster : Response status code indicates server error: 500 (InternalServerError).
At line:3 char:1
+ New-AzureRmHDInsightCluster `
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : CloseError: (:) [New-AzureRmHDInsightCluster], CloudException
    + FullyQualifiedErrorId : Microsoft.Azure.Commands.HDInsight.NewAzureHDInsightClusterCommand

Debugging Tips

-Debug switch

The -Debug switch allows you to see various execution diagnostics and any specific error messages. For example, in some cases you can see details of the HTTP request and response along with the JSON payloads.

Error Variable

https://blogs.msdn.microsoft.com/powershell/2006/11/02/erroraction-and-errorvariable/

Other Tips

Don’t run the provisioning script immediately after deleting the cluster and receiving the deletion confirmation. Wait a couple of minutes so that all the Azure resources behind the scenes are cleaned up, even if the Azure Portal UI confirms the cluster has been deleted.
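For completeness, the delete step that pairs with this script is a one-liner; the ADLS data and the Hive metastore database are left untouched, which is the whole point of this setup:

# Delete the cluster when done; re-run the provisioning script later to get it back
Remove-AzureRmHDInsightCluster -ClusterName $clusterName -ResourceGroupName $resourceGroupName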


			

Text Analytics of Movie Reviews using Azure Data Lake, Cognitive Services and Power BI (part 2 of 2)

Applicable Business Scenario

Marketing or data analysts who need to review sentiments and key phrases of a very large data set of consumer-based movie reviews.

Applied Technologies

  1. Azure Data Lake Store
  2. Azure Data Lake Analytics
    1. Cognitive Services
  3. Visual Studio with Azure Data Lake Analytics Tools
  4. Power BI Desktop & Power BI Service
  5. SharePoint Online site and preview of Power BI Web Part

Power BI Desktop

Use Power BI Desktop as the report authoring tool.

Data Source

Get Data from Azure Data Lake Store. Retrieve the output of the U-SQL script executed in Azure Data Lake Analytics in part 1 of this blog series.

TextAnalytics-6.png

Data Source to Azure Data Lake Store

textanalytics-7
Point to the folder containing the .tsv (tab-delimited) files, which were the output of the U-SQL script execution.

Provide credentials to an account that has permissions to the Azure Data Lake Store. In this case, it was an Azure AD account.

Queries

Create a query for each .TSV file

textanalytics-8

textanalytics-9

Relationships

Define a 1-to-many relationship based on the ID of each movie review.

textanalytics-10

Reports/Visualization

Sentiment confidence value for each of the 2000 movie reviews

textanalytics-11

 

textanalytics-12

Publish

Click ‘Publish’ to upload your report to the Power BI service in the Microsoft cloud. You can view it at http://app.powerbi.com with your Office 365 or Microsoft account.

textanalytics-13

SharePoint Online

If you want to publish and share this report with a wide audience via a SharePoint Online site, you can leverage the new Power BI web part (in preview as of Feb 2017). I have displayed this report in the latest SPO modern page experience as a publishing page. Each user that views the report must have a Power BI Pro license, which is not free.

textanalytics-14

To configure it, create a modern publishing page displaying the Power BI report via the Power BI web part (preview).

textanalytics-15

Web Part in Edit Mode
textanalytics-16

Enter the report link which you get from Power BI Service at http://app.powerbi.com

textanalytics-17

Options to further extend this solution

  • For the movie reviews .csv file, one can add date/time, movie being reviewed, genre, location and any other descriptive metadata. Thus, supporting more reporting and insights.
  • Overlay this data set against other data sets for correlation, such as related news events, weather, trending popular movies, other movie review sources, etc. This is to find any cause-and-effect relationships for diagnostic insights: “Why is this happening?”.
  • To get data from an internal network to Azure Data Lake or any Azure storage account, an option is to use the Data Management Gateway. This is installed within the internal network to allow transferring files and data from other internal data sources with little to no corporate firewall changes. See Move data between on-premises sources and the cloud with Data Management Gateway.

Closing Remarks

Azure Cognitive Services built into Azure Data Lake Analytics is a suitable option for very high-volume, unstructured and complex processing of data where scalable computing power is needed. In addition, it is priced in a pay-per-use model, making it cost-effective in many scenarios. The agility of Azure services allows you to experiment, iterate quickly and fail fast in finding the right technical solution and applying the right techniques and approach. This article highlights how data can be ingested, analyzed/processed, modeled, visualized and then published to a business audience.