Build Power BI Reports with HDInsight Spark Connector

With Power BI Desktop, you can use the Azure HDInsight Spark BI Connector to pull data from a Spark cluster and build reports on it. For this walkthrough I have an HDInsight Spark 2.0 cluster with Azure Data Lake Store as its primary storage.

Open Power BI Desktop

Click Get Data

Enter the URL of your HDInsight Spark cluster.
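
If you only know the cluster name, the URL can be derived from it. Here is a minimal sketch, assuming the standard public endpoint form `https://<clustername>.azurehdinsight.net`. The function name `cluster_url` and the name check are my own illustration and are not part of the connector.

```python
import re


def cluster_url(cluster_name: str) -> str:
    """Build the HTTPS endpoint for an HDInsight cluster name."""
    name = cluster_name.strip().lower()
    # Assumption: names contain only letters, digits, and hyphens,
    # and do not start or end with a hyphen.
    if not re.fullmatch(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?", name):
        raise ValueError(f"unexpected cluster name: {cluster_name!r}")
    return f"https://{name}.azurehdinsight.net"


# "mysparkcluster" is a placeholder, not the cluster used in this post.
print(cluster_url("mysparkcluster"))
```

Paste the resulting value into the connector's server field.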

Enter the cluster admin credentials. These are the same credentials you use for Ambari.

Power BI now shows a list of tables and views along with a preview of the data. In my case, usdata is the database. crimes and crimesgroupbytype are internal Hive tables, crimes_ext is an external table, and crimesgroupbytype_view is a view.
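
To make the three kinds of objects concrete, here is a rough sketch of the kind of Hive DDL that could produce them. The table and view names come from my cluster, but the column names, the storage format, and the `adl:///` path are hypothetical, since the real definitions are not shown in this post. The helper `object_kind` is only there to label each statement.

```python
# Hypothetical DDL; columns, formats, and the storage path are assumptions.
DDL = {
    "crimes": """
        CREATE TABLE usdata.crimes (
            id INT, crime_type STRING, crime_year INT
        ) STORED AS ORC
    """,
    "crimes_ext": """
        CREATE EXTERNAL TABLE usdata.crimes_ext (
            id INT, crime_type STRING, crime_year INT
        )
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        LOCATION 'adl:///hypothetical/path/crimes'
    """,
    "crimesgroupbytype_view": """
        CREATE VIEW usdata.crimesgroupbytype_view AS
        SELECT crime_type, COUNT(*) AS crime_count
        FROM usdata.crimes
        GROUP BY crime_type
    """,
}


def object_kind(ddl: str) -> str:
    """Classify a CREATE statement by its leading keywords."""
    head = " ".join(ddl.split()).upper()
    if head.startswith("CREATE EXTERNAL TABLE"):
        return "external table"
    if head.startswith("CREATE VIEW"):
        return "view"
    if head.startswith("CREATE TABLE"):
        return "internal table"
    raise ValueError("not a recognized CREATE statement")


for name, statement in DDL.items():
    print(f"{name}: {object_kind(statement)}")
```

Hive owns the data of an internal table, an external table only points at files stored elsewhere, and a view stores just a query. All three show up in the Power BI navigator in the same list.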

I selected crimes, crimesgroupbytype, and crimesgroupbytype_view.
Click Load to generate a query for each.

The Query Editor opens with a query for each selected object.

When I click Close & Apply, the crimes query results in an error. I suspect the cause is size: crimes has over a million rows, whereas the other queries deal with only several hundred. Since this connector is in beta, I may have to wait for the final release.

To continue, I delete the usdata crimes query.

Click Close & Apply

In the Report view, the tables from the two remaining queries appear in the Fields pane.

I build a report showing the number of crimes by crime type, with filters on the right for year and crime type.
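
For readers who want to see the logic behind that visual, here is a small sketch of the same aggregation in plain Python: count rows per crime type, optionally filtered by year and crime type. The sample rows and the field names `crime_type` and `year` are made up for illustration and are not taken from my dataset. Power BI does this for you when you drop fields onto a chart and add slicers.

```python
from collections import Counter

# Made-up sample rows, for illustration only.
SAMPLE_ROWS = [
    {"crime_type": "THEFT", "year": 2015},
    {"crime_type": "THEFT", "year": 2016},
    {"crime_type": "BURGLARY", "year": 2015},
    {"crime_type": "ASSAULT", "year": 2016},
    {"crime_type": "THEFT", "year": 2015},
]


def crimes_by_type(rows, year=None, crime_types=None):
    """Count rows per crime type, applying the optional filters."""
    counts = Counter()
    for row in rows:
        if year is not None and row["year"] != year:
            continue
        if crime_types is not None and row["crime_type"] not in crime_types:
            continue
        counts[row["crime_type"]] += 1
    return counts


# Equivalent to selecting a single year in the year filter.
for crime_type, count in crimes_by_type(SAMPLE_ROWS, year=2015).most_common():
    print(crime_type, count)
```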

HDInsight and Power BI are a powerful combination for working with big data, with Power BI Desktop providing the tools to transform, analyze, and visualize it.


 
