Azure Data Lake Analytics: Database and Tables

Upon creating your Azure Data Lake Analytics service, you are ready to create database and tables. This is to define a schema and structure to your data. Your data may be in an semi-structured format. For the development tooling, you need to install Azure Data Lake Tools for Visual Studio as a plugin-in to Visual Studio.

This provides the ability to create U-SQL projects and functionality for development.
Azure Data Lake Analytics- Database and Tables-1
Azure Data Lake Analytics- Database and Tables-2

The catalog of Azure Data Lake Analytics:

Visual Studio – Sever Explorer Pane
Azure Data Lake Analytics- Database and Tables-3

Azure Portal – Azure Data Lake AnalyticsAzure Data Lake Analytics- Database and Tables-4

When creating tables, you have the option to create database into master database. This has assemblies that you can be leveraged in your scripts such as Cognitive Services to do key phrase extraction in text.

To create a new database and table, create a new .usql file into your U-SQL project.


//Create Database
CREATE DATABASE IF NOT EXISTS Insurance;

//Create Table
CREATE TABLE IF NOT EXISTS Insurance.dbo.Policies
(
        //Define schema of table
        policyID int,
        statecode string,
        county string,
        eq_site_limit float,
        hu_site_limit float,
        fl_site_limit float,
        fr_site_limit float,
        tiv_2011 float,
        tiv_2012 float,
        eq_site_deductible float,
        hu_site_deductible float,
        fl_site_deductible float,
        fr_site_deductible float,
        point_latitude float,
        point_longitude float,
        line string,
        construction string,
        point_granularity int?,
    INDEX idx1 //Name of index
    CLUSTERED (county ASC) //Column to cluster by
    DISTRIBUTED BY HASH (county) //Column distribute by
);

// CREATE VIEW based on the Policies table
CREATE VIEW IF NOT EXISTS Insurance.dbo.PoliciesByCounty AS
    SELECT COUNT(policyID) AS NumOfPolicies
    FROM Insurance.dbo.Policies
    GROUP BY county;

  • Create a database that is a container for tables, views, assemblies, schemas, table valued functions, etc..
  • Create a table with a set fields with data types. The data types are C#-like in such that you can define nullable types just as I did with point_granularity as int?.
  • Create an index for this table based for how rows are physically stored to one another and distribute by hash. The purpose is for efficient querying and processing.

Click Submit to submit the script as a job
Azure Data Lake Analytics- Database and Tables-5

Azure Data Lake Analytics- Database and Tables-6

In the Server Explorer pane, you will see the resulting outcome.
Azure Data Lake Analytics- Database and Tables-7

In the following article Azure Data Lake Analytics: Populating & Querying Tables, we will explore how to populate and query the table from a flat file from Azure Data Lake Store.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s