PowerShell and Options to Upload Data to Azure Data Lake

Once you have started using Azure Data Lake Store and uploading files manually, you will reach a point where you want to automate file uploading and management. This is especially true if you are dealing with a massive number of files and an extensive folder hierarchy. There are several options for uploading data:

  1. PowerShell
  2. .NET SDK
  3. Java SDK
  4. REST API
  5. Azure Command Line Interface
  6. Node.js
  7. Python

Key factors in choosing among these options include whether you need administrative scripts or custom applications, and which platform you must support (Windows, Linux, Mac, or cross-platform).

My personal preference is to use PowerShell for administrative scenarios and the .NET SDK for building custom applications. Here is an example using PowerShell.

PowerShell

# Variable Declaration
$rgName = "rkbigdata"
$subscriptionID = "<your subscription ID>"
$dataLakeStoreName = "rkdatalake"
$myDataRootFolder = "/datasets"
$sourceFilesPath = "C:\Users\Roy\Downloads\datasets\"

# Log in to your Azure account
Login-AzureRmAccount
# List all the subscriptions associated with your account
Get-AzureRmSubscription
# Select a subscription
Set-AzureRmContext -SubscriptionId $subscriptionID

# See if the folder exists.
# If a folder or item does not exist, you will see
#  Get-AzureRmDataLakeStoreChildItem : Operation returned an invalid status code 'NotFound'
Get-AzureRmDataLakeStoreChildItem -AccountName $dataLakeStoreName -Path $myDataRootFolder

# Create a new folder
New-AzureRmDataLakeStoreItem -Folder -AccountName $dataLakeStoreName -Path "$myDataRootFolder/population"
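Relying on a 'NotFound' error to detect a missing folder is awkward in an unattended script. A minimal sketch of an idempotent alternative follows. It assumes your installed AzureRM.DataLakeStore module provides Test-AzureRmDataLakeStoreItem (which returns $true or $false) and accepts -AccountName for the account; the function name Confirm-AdlsFolder is hypothetical.

```powershell
# Hypothetical helper (not part of AzureRM): create a Data Lake Store folder only if it is missing.
# Assumes the AzureRM.DataLakeStore module is loaded and you are already logged in.
function Confirm-AdlsFolder {
    param(
        [Parameter(Mandatory = $true)][string]$AccountName,
        [Parameter(Mandatory = $true)][string]$Path
    )

    # Test-AzureRmDataLakeStoreItem returns $true when the path already exists
    if (Test-AzureRmDataLakeStoreItem -AccountName $AccountName -Path $Path) {
        Write-Verbose "Folder $Path already exists"
        return $false
    }

    # Create the folder and report that something was created
    New-AzureRmDataLakeStoreItem -Folder -AccountName $AccountName -Path $Path | Out-Null
    return $true
}

# Usage with the variables declared earlier:
# Confirm-AdlsFolder -AccountName $dataLakeStoreName -Path "$myDataRootFolder/population"
```

Returning a Boolean lets a calling script log which folders were newly created without parsing error messages.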

# Upload a single file
# Data Lake Store paths use forward slashes; Join-Path builds the local Windows path
Import-AzureRmDataLakeStoreItem -AccountName $dataLakeStoreName `
    -Path (Join-Path $sourceFilesPath "ComputerSystemsImportSample.csv") `
    -Destination "$myDataRootFolder/ComputerSystemsImportSample.csv"

# Upload a folder and its contents recursively, overwriting existing files
Import-AzureRmDataLakeStoreItem -AccountName $dataLakeStoreName `
    -Path $sourceFilesPath `
    -Destination $myDataRootFolder `
    -Recurse `
    -Force
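If you only want a subset of a large folder hierarchy (for example, only CSV files), one option is to enumerate the local files yourself and upload them one at a time, mapping each local relative path to a Data Lake path with forward slashes. The sketch below does that under these assumptions: the function name Import-AdlsFilteredFolder and its -DryRun switch are hypothetical, and it calls Import-AzureRmDataLakeStoreItem once per file rather than relying on -Recurse.

```powershell
# Hypothetical helper (not part of AzureRM): upload only files matching a filter,
# preserving the local folder layout under a Data Lake Store destination folder.
function Import-AdlsFilteredFolder {
    param(
        [Parameter(Mandatory = $true)][string]$AccountName,
        [Parameter(Mandatory = $true)][string]$LocalRoot,
        [Parameter(Mandatory = $true)][string]$DestinationRoot,
        [string]$Filter = '*.csv',
        [switch]$DryRun   # report the planned uploads without calling Azure
    )

    # Resolve to an absolute path so each file's relative path can be computed
    $root = (Resolve-Path -LiteralPath $LocalRoot).Path.TrimEnd('\', '/')

    Get-ChildItem -LiteralPath $root -Filter $Filter -Recurse -File | ForEach-Object {
        # Relative path below the local root, converted to '/' separators for Data Lake
        $relative = $_.FullName.Substring($root.Length).TrimStart('\', '/') -replace '\\', '/'
        $destination = $DestinationRoot.TrimEnd('/') + '/' + $relative

        if (-not $DryRun) {
            # -Force overwrites an existing file at the destination
            Import-AzureRmDataLakeStoreItem -AccountName $AccountName `
                -Path $_.FullName -Destination $destination -Force | Out-Null
        }

        # Emit the source/destination pair so callers can log or review it
        [pscustomobject]@{ Source = $_.FullName; Destination = $destination }
    }
}

# Usage: preview first, then run for real
# Import-AdlsFilteredFolder -AccountName $dataLakeStoreName -LocalRoot $sourceFilesPath -DestinationRoot $myDataRootFolder -DryRun
```

Running with -DryRun first is a cheap way to check the path mapping before moving a large volume of data.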

When large numbers of files or large volumes of data need to be transferred, read the Performance guidance while using PowerShell article. For further details on the Import-AzureRmDataLakeStoreItem cmdlet, see:

https://docs.microsoft.com/en-us/powershell/resourcemanager/azurerm.datalakestore/v2.1.0/import-azurermdatalakestoreitem
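The performance guidance centres on two parameters of Import-AzureRmDataLakeStoreItem: -PerFileThreadCount (parallel threads per file) and -ConcurrentFileCount (files transferred at the same time). These may not be present in older module versions such as the v2.1.0 reference linked above, so check Get-Help Import-AzureRmDataLakeStoreItem for your installed version first. The wrapper name Invoke-AdlsTunedUpload and the default values below are hypothetical, illustrative starting points rather than tuned recommendations.

```powershell
# Hypothetical wrapper (not part of AzureRM): recursive upload with explicit parallelism.
# Assumes your AzureRM.DataLakeStore version supports -PerFileThreadCount and -ConcurrentFileCount.
function Invoke-AdlsTunedUpload {
    param(
        [Parameter(Mandatory = $true)][string]$AccountName,
        [Parameter(Mandatory = $true)][string]$LocalPath,
        [Parameter(Mandatory = $true)][string]$Destination,
        [int]$PerFileThreadCount = 10,   # threads used for each file (illustrative value)
        [int]$ConcurrentFileCount = 20   # files uploaded in parallel (illustrative value)
    )

    # Splatting keeps the long parameter list readable; $true values turn on the switches
    $uploadParams = @{
        AccountName         = $AccountName
        Path                = $LocalPath
        Destination         = $Destination
        Recurse             = $true
        Force               = $true
        PerFileThreadCount  = $PerFileThreadCount
        ConcurrentFileCount = $ConcurrentFileCount
    }
    Import-AzureRmDataLakeStoreItem @uploadParams
}

# Usage with the variables declared earlier:
# Invoke-AdlsTunedUpload -AccountName $dataLakeStoreName -LocalPath $sourceFilesPath -Destination $myDataRootFolder
```

Many small files generally benefit from a higher -ConcurrentFileCount, while a few very large files benefit more from -PerFileThreadCount; measure with your own data before settling on values.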

PowerShell suits general administration of the data lake on behalf of many projects, departments, and other organizational domains. It gives administrators the flexibility to manage files, folders, and permissions across the entire data lake, or across whichever parts they have been granted permissions to.
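For the permissions side, the AzureRM.DataLakeStore module includes ACL cmdlets such as Set-AzureRmDataLakeStoreItemAclEntry. A minimal sketch, assuming you already know the Azure AD object ID of the group you want to grant access to; the function name Grant-AdlsGroupReadAccess is hypothetical, and the exact parameter set should be confirmed with Get-Help for your module version.

```powershell
# Hypothetical helper (not part of AzureRM): give an Azure AD group read and list access to a folder.
# Assumes the AzureRM.DataLakeStore module is loaded and you are already logged in.
function Grant-AdlsGroupReadAccess {
    param(
        [Parameter(Mandatory = $true)][string]$AccountName,
        [Parameter(Mandatory = $true)][string]$Path,
        [Parameter(Mandatory = $true)][guid]$GroupObjectId
    )

    # Access ACL entry: applies to the folder itself (ReadExecute allows listing its contents)
    Set-AzureRmDataLakeStoreItemAclEntry -AccountName $AccountName -Path $Path `
        -AceType Group -Id $GroupObjectId -Permissions ReadExecute

    # Default ACL entry: inherited by files and folders created under $Path afterwards
    Set-AzureRmDataLakeStoreItemAclEntry -AccountName $AccountName -Path $Path `
        -AceType Group -Id $GroupObjectId -Permissions ReadExecute -Default
}

# Usage (the GUID is a placeholder for your group's object ID):
# Grant-AdlsGroupReadAccess -AccountName $dataLakeStoreName -Path $myDataRootFolder -GroupObjectId '00000000-0000-0000-0000-000000000000'
```

Default ACL entries only affect items created later, so existing files under the folder keep their current permissions.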