Creating Azure Data Lake Store

Azure Data Lake Store is a storage solution for managing files for big data analytical workloads. According to Wikipedia, a data lake “is a method of storing data within a system or repository, in its natural format, that facilitates the collocation of data in various schemata and structural forms, usually object blobs or files. The idea of data lake is to have a single store of all data in the enterprise ranging from raw data (which implies exact copy of source system data) to transformed data which is used for various tasks including reporting, visualization, analytics and machine learning.”

To read more, see the Microsoft documentation: Overview of Azure Data Lake Store.


In summarizing the documentation’s overview, here are some of the key capabilities for starting out.

  • Hadoop compatible
  • Virtually unlimited storage
  • Performance for analytical processing
  • User and Role based security
  • Data encryption
  • Store any data format

In its simplest form, it is a hierarchical file system of folders and files. You run your analytical processing scripts pointing to a set of folders or files.
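As a rough illustration of that folder-and-file view, the sketch below builds the kind of `adl://` URI that Hadoop-compatible tools typically use to point at a location in a Data Lake Store account. The account name, folder and file used here are placeholders for illustration only, and the helper `adl_uri` is hypothetical, not part of any Azure SDK.

```python
from pathlib import PurePosixPath


def adl_uri(account: str, *parts: str) -> str:
    """Build an adl:// URI for a folder or file in a Data Lake Store account.

    Assumes the usual <account>.azuredatalakestore.net host name pattern.
    """
    # Join the folder/file names into an absolute POSIX-style path.
    path = PurePosixPath("/").joinpath(*parts)
    return f"adl://{account}.azuredatalakestore.net{path}"


# Placeholder account, folder and file names.
print(adl_uri("rkdatalake", "MyData", "MvcWeb.log"))
print(adl_uri("rkdatalake", "MyData"))
```

A processing script would then be pointed at either a single file or a whole folder, depending on how much data it should read.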

The following is how I created the Azure Data Lake Store. For the Microsoft documentation, see Get started with Azure Data Lake Store using the Azure Portal.

  1. New Data Lake Store
    A. Encryption Settings. I chose the more sophisticated option: a master encryption key that I own, created in an existing Azure Key Vault. For details, read Data protection.
  2. Click the Create button.
  3. Confirm provisioning of the Azure Data Lake.
  4. The Overview section is meant to show details of the data lake service. However, it prompts for further action: grant the data lake store account RN_rkdatalake access to the key vault.
    A. Click on the orange bar to set up.
    B. Click on the Grant Permissions button to grant the RN-rkdatalake account permissions.
    C. A notification appears.
  5. Let’s go back to the rkdatalake blade and take a tour of some of the unique settings.
    A. Encryption settings. The master encryption key is located and managed in my Key Vault named rkEntKeyVault. The data lake store account RN_rkdatalake only has access to the key vault to encrypt the data stored.
    B. Firewall. As a security best practice, it is recommended to enable the firewall. The firewall is based on a client IP address or IP address range.
    C. Pricing. For a developer scenario, pay-as-you-go should be quite fine. For me, this option has not been expensive at all, and I usually work with only several GBs of data anyway. Currently, it is 0.039 USD per GB, which is still pennies. Other monthly plans are also available.
    D. Data Explorer. This is more of a tool to explore the file system in Data Lake Store. You can create folders, upload files and manage permissions.
      File Preview of MvcWeb.log
      Access. You can assign permissions to a folder or file. Here I am managing permissions on the MyData folder I had created. Click Add to give a user or group in Azure Active Directory access to this folder.

Creating the data lake store sets the foundation for analytical processing. You may begin to upload large amounts of data into their respective folders. Examples include IoT sensor data, tweets, .csv exports from relational databases, log files, images, videos or documents. It is up to the processing application, such as Azure Data Lake Analytics with U-SQL or a Hadoop application, to process the data using a set of libraries and your custom logic. To be specific about which open source applications can work with Azure Data Lake Store, read Open Source Big Data applications that work with Azure Data Lake Store. Essentially, to my current understanding, only Azure’s HDInsight works with it and not any other cloud or on-premises Hadoop platform.

Next, we will look at PowerShell and Options to Upload Data to Azure Data Lake.
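
To make the firewall and pricing settings from step 5 a little more concrete, here is a minimal sketch in plain Python. It does not call any Azure API. The helper names are hypothetical, the IP addresses come from a reserved documentation range, and the rate is simply the 0.039 USD per GB figure quoted above, which may well have changed.

```python
import ipaddress

# Pay-as-you-go rate quoted in this post (USD per GB); treat it as an assumption.
RATE_USD_PER_GB = 0.039


def estimate_storage_cost(gigabytes: float, rate: float = RATE_USD_PER_GB) -> float:
    """Estimate the storage cost in USD for a given number of gigabytes."""
    return round(gigabytes * rate, 2)


def ip_allowed(client_ip: str, start_ip: str, end_ip: str) -> bool:
    """Return True if client_ip lies inside the inclusive start..end range.

    This mirrors the idea of a firewall rule defined by a client IP address
    or an IP address range; it is not the service's actual implementation.
    """
    client = ipaddress.ip_address(client_ip)
    return ipaddress.ip_address(start_ip) <= client <= ipaddress.ip_address(end_ip)


# A few GBs of data stays in the range of pennies.
print(estimate_storage_cost(10))

# Example addresses from the 203.0.113.0/24 documentation range.
print(ip_allowed("203.0.113.25", "203.0.113.10", "203.0.113.50"))
print(ip_allowed("203.0.113.60", "203.0.113.10", "203.0.113.50"))
```

A rule for a single client address is just a range whose start and end are the same.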

