Text Analytics of Movie Reviews using Azure Data Lake, Cognitive Services and Power BI (part 1 of 2)

Applicable Business Scenario

This scenario is for marketing or data analysts who need to review the sentiments and key phrases in a very large data set of consumer movie reviews.

Applied Technologies

  1. Azure Data Lake Store
  2. Azure Data Lake Analytics
    1. Cognitive Services
  3. Visual Studio with Azure Data Lake Analytics Tools
  4. Power BI Desktop & Power BI Service
  5. SharePoint Online site and preview of Power BI Web Part

Azure Data Lake Store

Upload a .csv file of 2,000 movie reviews to a folder in Azure Data Lake Store.

textanalytics-1
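If the reviews are not yet in the shape the script expects, a short Python sketch like the one below can prepare them. It assumes one review per row in a single column (the script extracts only a Text column) and, as a precaution rather than a documented requirement, collapses line breaks so every review stays on one physical line. The function name and the sample reviews are hypothetical.

```python
import csv
import io


def write_reviews_csv(reviews, out):
    """Write one review per row into a single-column CSV stream."""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for review in reviews:
        # Collapse line breaks and repeated whitespace so each review
        # occupies exactly one line in the file.
        single_line = " ".join(review.split())
        if single_line:
            writer.writerow([single_line])


# Made-up sample reviews, for illustration only.
sample_reviews = ["Great cast, weak plot.", "Slow start.\nStrong ending."]

buffer = io.StringIO()
write_reviews_csv(sample_reviews, buffer)
print(buffer.getvalue())
```

To produce a real file, pass a file opened with open("moviereviews.csv", "w", newline="", encoding="utf-8") instead of the in-memory buffer.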

Azure Data Lake Analytics

Execute the following U-SQL script either in the Azure Portal (Azure Data Lake Analytics > Jobs > New Job) or in Visual Studio with Azure Data Lake Analytics Tools.

The script references the Cognitive Services text assemblies, which are available out of the box in the Azure Data Lake Analytics master database.

TextAnalytics-2.png

U-SQL Script

The following script reads the moviereviews.csv file from Azure Data Lake Store and runs sentiment analysis and key phrase extraction on each review. It produces two .tsv files: one with the sentiment and key phrases for each movie review, and another that lists each key phrase on its own row with a foreign key ID pointing to the parent movie review.

// Cognitive Services text assemblies from the master database.
REFERENCE ASSEMBLY [TextCommon];
REFERENCE ASSEMBLY [TextSentiment];
REFERENCE ASSEMBLY [TextKeyPhrase];

// Read each review as a single text column.
@comments =
    EXTRACT Text string
    FROM @"/TextAnalysis/moviereviews.csv"
    USING Extractors.Csv();

// Add a sentiment label and a confidence score to each review.
@sentiment =
    PROCESS @comments
    PRODUCE Text,
            Sentiment string,
            Conf double
    READONLY Text
    USING new Cognition.Text.SentimentAnalyzer(true);

// Add the key phrases extracted from each review.
@keyPhrases =
    PROCESS @sentiment
    PRODUCE Text,
            Sentiment,
            Conf,
            KeyPhrase string
    READONLY Text,
             Sentiment,
             Conf
    USING new Cognition.Text.KeyPhraseExtractor();

// Number the rows so that each review has an ID for the split output.
@keyPhrases =
    SELECT *,
           ROW_NUMBER() OVER () AS RowNumber
    FROM @keyPhrases;

OUTPUT @keyPhrases
TO "/TextAnalysis/out/MovieReviews-keyPhrases.tsv"
USING Outputters.Tsv();

// Split the key phrases into one row per phrase.
@kpsplits =
    SELECT RowNumber,
           Sentiment,
           Conf,
           T.KeyPhrase
    FROM @keyPhrases
         CROSS APPLY
             new Cognition.Text.Splitter("KeyPhrase") AS T(KeyPhrase);

OUTPUT @kpsplits
TO "/TextAnalysis/out/MovieReviews-kpsplits.tsv"
USING Outputters.Tsv();
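To make the relationship between the two output files concrete, here is a rough Python sketch of what the split step does. It is not how U-SQL works internally. It assumes the KeyPhrase column holds a semicolon-separated list, that the first file has the columns Text, Sentiment, Conf, KeyPhrase and RowNumber in that order, and that there is no header row. The sample rows and the function name are made up for illustration.

```python
import csv
import io

# Hypothetical rows in the shape of MovieReviews-keyPhrases.tsv:
# Text, Sentiment, Conf, KeyPhrase, RowNumber (values are invented).
SAMPLE_KEYPHRASES_TSV = (
    '"Great cast, weak plot."\t"Positive"\t0.81\t"cast;plot"\t1\n'
    '"Slow start."\t"Negative"\t0.67\t"start"\t2\n'
)


def split_key_phrases(tsv_text, separator=";"):
    """Return one (RowNumber, Sentiment, Conf, KeyPhrase) tuple per phrase."""
    rows = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for _text, sentiment, conf, key_phrases, row_number in reader:
        for phrase in key_phrases.split(separator):
            phrase = phrase.strip()
            if phrase:
                # RowNumber acts as the foreign key back to the review.
                rows.append((int(row_number), sentiment, float(conf), phrase))
    return rows


for row in split_key_phrases(SAMPLE_KEYPHRASES_TSV):
    print(row)
```

Each review row fans out into as many rows as it has key phrases, and RowNumber is what lets a reporting tool relate the phrases back to their review.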

Azure Portal > Azure Data Lake Analytics U-SQL Execution

Create a new job to execute a U-SQL script.
TextAnalytics-3.png

TextAnalytics-4.png

Visual Studio Option

You need the Azure Data Lake Tools for Visual Studio. Create a U-SQL project, paste the script, and submit it to Azure Data Lake Analytics for execution. The following shows the job summary after the script has run successfully.

TextAnalytics-5.png
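Once the job has succeeded, a quick sanity check of the downloaded output can catch problems before any reporting is built on it. The Python sketch below counts reviews per sentiment and lists the most frequent key phrases in the split file. It assumes the columns RowNumber, Sentiment, Conf and KeyPhrase in that order with no header row. The sample rows and the function name are invented for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical rows in the shape of MovieReviews-kpsplits.tsv:
# RowNumber, Sentiment, Conf, KeyPhrase (values are invented).
SAMPLE_KPSPLITS_TSV = (
    '1\t"Positive"\t0.81\t"cast"\n'
    '1\t"Positive"\t0.81\t"plot"\n'
    '2\t"Negative"\t0.67\t"plot"\n'
)


def summarize_kpsplits(tsv_text, top_n=3):
    """Return (reviews per sentiment, most common key phrases)."""
    sentiment_by_review = {}
    phrase_counts = Counter()
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for row_number, sentiment, _conf, phrase in reader:
        # A review appears once per key phrase, so key by RowNumber
        # to count each review only once.
        sentiment_by_review[row_number] = sentiment
        phrase_counts[phrase.lower()] += 1
    reviews_per_sentiment = Counter(sentiment_by_review.values())
    return reviews_per_sentiment, phrase_counts.most_common(top_n)


sentiments, top_phrases = summarize_kpsplits(SAMPLE_KPSPLITS_TSV, top_n=1)
print(dict(sentiments))
print(top_phrases)
```

For a real check, read the downloaded file's contents and pass them in place of the sample string.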

Continue to Part 2 of 2 of this blog series.
