Applicable Business Scenario
Marketing or data analysts who need to review the sentiment and key phrases of a very large data set of consumer movie reviews.
- Azure Data Lake Store
- Azure Data Lake Analytics
- Visual Studio with Azure Data Lake Analytics Tools
- Power BI Desktop & Power BI Service
- SharePoint Online site and preview of Power BI Web Part
Azure Data Lake Store
Upload a .csv file of 2000 movie reviews to a folder in Azure Data Lake Store.
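Before uploading, it is worth confirming the file has the shape the later U-SQL `EXTRACT` expects: a single `Text` column, one review per row, with quoting so commas inside a review don't split the row. A minimal Python sketch (the two sample reviews are hypothetical; the real file holds 2000):

```python
import csv

# Hypothetical sample reviews standing in for the real 2000-row data set.
reviews = [
    "An absolute triumph of storytelling and visual effects.",
    "Dull, predictable, and far too long.",
]

# Write one quoted column per row, matching the single `Text string`
# column that the U-SQL EXTRACT ... USING Extractors.Csv() reads.
with open("moviereviews.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, quoting=csv.QUOTE_ALL)
    for text in reviews:
        writer.writerow([text])
```
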
Azure Data Lake Analytics
Execute the following U-SQL script either in the Azure portal (Azure Data Lake Analytics > Jobs > New Job) or in Visual Studio with Azure Data Lake Analytics Tools.
The script references the Cognitive Services assemblies, which come out of the box in the Azure Data Lake Analytics master database.
The script reads the moviereviews.csv file in Azure Data Lake Store, then analyzes each review for sentiment and extracts its key phrases. Two .tsv files are produced: one with the sentiment and key phrases for each movie review, and another listing each individual key phrase with a foreign-key ID back to the parent movie review.
REFERENCE ASSEMBLY [TextCommon];
REFERENCE ASSEMBLY [TextSentiment];
REFERENCE ASSEMBLY [TextKeyPhrase];

@comments =
    EXTRACT Text string
    FROM @"/TextAnalysis/moviereviews.csv"
    USING Extractors.Csv();

@sentiment =
    PROCESS @comments
    PRODUCE Text,
            Sentiment string,
            Conf double
    READONLY Text
    USING new Cognition.Text.SentimentAnalyzer(true);

@keyPhrases =
    PROCESS @sentiment
    PRODUCE Text, Sentiment, Conf,
            KeyPhrase string
    READONLY Text, Sentiment, Conf
    USING new Cognition.Text.KeyPhraseExtractor();

@keyPhrases =
    SELECT *, ROW_NUMBER() OVER () AS RowNumber
    FROM @keyPhrases;

OUTPUT @keyPhrases
TO "/TextAnalysis/out/MovieReviews-keyPhrases.tsv"
USING Outputters.Tsv();

// Split the key phrases.
@kpsplits =
    SELECT RowNumber, Sentiment, Conf, T.KeyPhrase
    FROM @keyPhrases
    CROSS APPLY new Cognition.Text.Splitter("KeyPhrase") AS T(KeyPhrase);

OUTPUT @kpsplits
TO "/TextAnalysis/out/MovieReviews-kpsplits.tsv"
USING Outputters.Tsv();
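The structural part of this script (numbering each parent review, then exploding a delimited key-phrase column into child rows) can be mimicked locally to sanity-check the expected output shape. A minimal Python sketch, with hypothetical scored rows in place of the Cognitive Services output and assuming phrases arrive as a single semicolon-delimited string (an assumption for illustration, not the documented extractor format):

```python
import csv

# Hypothetical rows standing in for @keyPhrases before numbering:
# (Text, Sentiment, Conf, KeyPhrase) with KeyPhrase assumed to be a
# semicolon-delimited string of phrases.
scored = [
    ("Great film, loved the cast.", "Positive", 0.95, "great film;cast"),
    ("Terrible pacing throughout.", "Negative", 0.88, "terrible pacing"),
]

# Parent table: one row per review, numbered like ROW_NUMBER() OVER ().
key_phrases = [
    (text, sentiment, conf, phrase, row_number)
    for row_number, (text, sentiment, conf, phrase) in enumerate(scored, start=1)
]

# Child table: one row per individual phrase keyed by RowNumber,
# mirroring the CROSS APPLY ... Splitter("KeyPhrase") step.
kp_splits = [
    (row_number, sentiment, conf, single)
    for text, sentiment, conf, phrase, row_number in key_phrases
    for single in phrase.split(";")
]

def write_tsv(path, rows):
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter="\t").writerows(rows)

write_tsv("MovieReviews-keyPhrases.tsv", key_phrases)
write_tsv("MovieReviews-kpsplits.tsv", kp_splits)
```
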
Azure Portal > Azure Data Lake Analytics U-SQL execution
Create a new job to execute a U-SQL script.
Visual Studio Option
Create a U-SQL project and paste in the script. Submit the U-SQL script to Azure Data Lake Analytics for execution. The following shows the successful job summary.
Power BI Desktop
Use Power BI Desktop as the report authoring tool.
Get Data from Azure Data Lake Store.
Data Source to Azure Data Lake Store
Point to the folder containing the .tsv (tab-delimited) files that the U-SQL script produced.
Provide credentials for an account that has permissions on the Azure Data Lake Store. In this case, an Azure AD account was used.
Create a query for each .tsv file.
Define a 1-to-many relationship based on the ID of each movie review.
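The relationship Power BI models here can be sketched outside the tool as an ordinary key lookup: the parent query holds one row per review (keyed by RowNumber), and the split-phrase query holds many rows per review that join back on that key. A minimal Python analog over hypothetical snippets of the two TSV outputs (Outputters.Tsv writes no header row):

```python
import csv
import io

# Hypothetical sample rows from the two U-SQL outputs.
key_phrases_tsv = (
    "Great film, loved the cast.\tPositive\t0.95\tgreat film;cast\t1\n"
    "Terrible pacing throughout.\tNegative\t0.88\tterrible pacing\t2\n"
)
kp_splits_tsv = (
    "1\tPositive\t0.95\tgreat film\n"
    "1\tPositive\t0.95\tcast\n"
    "2\tNegative\t0.88\tterrible pacing\n"
)

# "One" side: RowNumber -> full review text.
reviews = {
    int(row[4]): row[0]
    for row in csv.reader(io.StringIO(key_phrases_tsv), delimiter="\t")
}

# "Many" side: each split phrase joins back to its parent review via
# RowNumber, the same key the Power BI relationship uses.
joined = [
    (int(row[0]), row[3], reviews[int(row[0])])
    for row in csv.reader(io.StringIO(kp_splits_tsv), delimiter="\t")
]
```
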
Sentiment confidence value for each of the 2000 movie reviews
Click ‘Publish’ to upload your report to the Power BI Service in the Microsoft cloud. You can view it at http://app.powerbi.com with your Office 365 or Microsoft account.
If you want to publish and share this report with a wide audience via a SharePoint Online site, you can leverage the new Power BI Web Part (in preview as of Feb 2017). I have displayed this report in the latest SPO modern page experience as a publishing page. Note that each user who views the report must have a Power BI Pro license, which is not free.
To configure it, create a modern publishing page that displays the Power BI report via the Power BI Web Part (preview).
Web Part in Edit Mode
Enter the report link, which you get from the Power BI Service at http://app.powerbi.com.
Options to further extend this solution
- For the movie reviews .csv file, one can add date/time, the movie being reviewed, genre, location, and other descriptive metadata, thus supporting richer reporting and insights.
- Overlay this data set against other data sets for correlation, such as related news events, weather, trending movies, and other movie review sources. The aim is to find cause-and-effect relationships for diagnostic insights ("Why is this happening?").
- To get data from an internal network into Azure Data Lake or any Azure storage account, one option is the Data Management Gateway. It is installed within the internal network and allows transferring files and data from internal data sources with little to no corporate firewall changes.
Move data between on-premises sources and the cloud with Data Management Gateway
Azure Cognitive Services built into Azure Data Lake Analytics is a suitable option for very high-volume, unstructured, and complex data processing where scalable computing power is needed. In addition, it is priced on a pay-per-use model, making it cost-effective in many scenarios. The agility of Azure services lets you experiment, iterate quickly, and fail fast in finding the right technical solution and applying the right techniques and approach. This article has shown how data can be ingested, analyzed/processed, modeled, visualized, and then published to a business audience.