Azure Data Lake Analytics: Populating & Querying Tables

Upon creating a table with a defined set of columns, let’s go through a basic example of populating it from a CSV file stored in Azure Data Lake.

The .csv file as the sample data to be imported into the table. I am using Data Lake Explorer in visual studio. The relative file path is \MyData\FL_insurance_sample.csv. This sample data file I got from a public website.
Azure Data Lake Analytics- Populating - Querying Tables-1

U-SQL application:

// Use insurance database;
USE Insurance;

// Extract from file
@csvData =
    EXTRACT policyID int,
        statecode string,
        county string,
        eq_site_limit float,
        hu_site_limit float,
        fl_site_limit float,
        fr_site_limit float,
        tiv_2011 float,
        tiv_2012 float,
        eq_site_deductible float,
        hu_site_deductible float,
        fl_site_deductible float,
        fr_site_deductible float,
        point_latitude float,
        point_longitude float,
        line string,
        construction string,
        point_granularity int?
    FROM "/MyData/FL_insurance_sample.csv"
    USING Extractors.Csv(skipFirstNRows:1);

//Insert it into a previously created table
INSERT INTO Policies
SELECT *
FROM @csvData;

Since the data file was in CSV format, I would use the Extractors.csv. Other extractors are TSV and Text file. Since the data file had a header row of column names as its first row, I had to indicate to skip the first row by stating skipFristNrows: 1. Otherwise, the script submission would result in an error because the header columns wouldn’t work with the field data types defined. There are other arguments to specify to offer more configurability and flexibility.

Submit Job
Azure Data Lake Analytics- Populating - Querying Tables-2
Azure Data Lake Analytics- Populating - Querying Tables-3

To query and see the results, they must be stored into an output file in the Azure Data Lake Store. Unfortunately, I don’t see a way to see them on a console like you do in SQL Management Studio.

// Use insurance database;
USE Insurance;

@queryPolicy = SELECT * FROM Policies
WHERE county == "NASSAU COUNTY";

// output query into a file
OUTPUT @queryPolicy
TO "/MyData/queryresults/policies-by-nassaucounty.csv"
USING Outputters.Csv();

Azure Data Lake Analytics- Populating - Querying Tables-4

You can preview the file in Visual Studio
Azure Data Lake Analytics- Populating - Querying Tables-5

@queryView = SELECT * FROM PoliciesByCounty;

// output query into a file
OUTPUT @queryView
TO "/MyData/queryresults/view-by-county.csv"
USING Outputters.Csv();     

Note the View definition is:
CREATE VIEW IF NOT EXISTS Insurance.dbo.PoliciesByCounty AS
    SELECT COUNT(policyID) AS NumOfPolicies, county
    FROM Insurance.dbo.Policies
    GROUP BY county;

The preview of the output file:
Azure Data Lake Analytics- Populating - Querying Tables-6

I just went through some basic scenarios for populating and querying a table and view in ADLA.

References:
Tutorial: Get started with Azure Data Lake Analytics U-SQL language
U-SQL Views


Advertisements

2 thoughts on “Azure Data Lake Analytics: Populating & Querying Tables

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s