Once you have started using Azure Data Lake Store and uploaded a few files manually, you will reach a point where you want to automate file uploads and management, especially if you are dealing with a massive number of files and extensive folder hierarchies.
Two key factors in deciding between the available options are whether you need administrative scripts or a custom application, and which platform must be supported (Windows, Linux, macOS, or cross-platform).
My personal preference is to use PowerShell in administrative scenarios and the .NET SDK to build custom applications. Here is an example of using PowerShell.
    # Variable declaration
    $rgName = "rkbigdata"
    $subscriptionID = "<your subscription ID>"
    $dataLakeStoreName = "rkdatalake"
    $myDataRootFolder = "/datasets"
    $sourceFilesPath = "C:\Users\Roy\Downloads\datasets\"

    # Log in to your Azure account
    Login-AzureRmAccount

    # List all the subscriptions associated with your account
    Get-AzureRmSubscription

    # Select a subscription
    Set-AzureRmContext -SubscriptionId $subscriptionID

    # See if the folder exists.
    # If a folder or item does not exist, you will see:
    # Get-AzureRmDataLakeStoreChildItem : Operation returned an invalid status code 'NotFound'
    Get-AzureRmDataLakeStoreChildItem -AccountName $dataLakeStoreName -Path $myDataRootFolder

    # Create a new folder
    New-AzureRmDataLakeStoreItem -Folder -AccountName $dataLakeStoreName -Path "$myDataRootFolder/population"
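    # Upload a single file
    # Join-Path copes with the trailing backslash in $sourceFilesPath;
    # Data Lake Store paths use forward slashes, so the destination does too.
    Import-AzureRmDataLakeStoreItem -AccountName $dataLakeStoreName `
        -Path (Join-Path $sourceFilesPath "ComputerSystemsImportSample.csv") `
        -Destination "$myDataRootFolder/ComputerSystemsImportSample.csv"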
    # Upload a folder and its contents recursively, overwriting existing items
    Import-AzureRmDataLakeStoreItem -AccountName $dataLakeStoreName `
        -Path $sourceFilesPath `
        -Destination $myDataRootFolder `
        -Recurse `
        -Force
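The script above notes that Get-AzureRmDataLakeStoreChildItem fails with 'NotFound' when a folder is missing. For scripts that are run repeatedly, it may be cleaner to check for the folder first. The following is a minimal sketch, assuming the AzureRM Data Lake Store module is loaded and you are already logged in. The function name New-DataLakeFolderIfMissing is hypothetical, introduced only for illustration.

```powershell
# Hypothetical helper (not part of the AzureRM module): creates a Data Lake Store
# folder only when it is missing, so the script can be re-run safely.
function New-DataLakeFolderIfMissing {
    param(
        [Parameter(Mandatory = $true)][string] $AccountName,
        [Parameter(Mandatory = $true)][string] $Path
    )

    # Test-AzureRmDataLakeStoreItem returns $true or $false rather than raising 'NotFound'
    $exists = Test-AzureRmDataLakeStoreItem -AccountName $AccountName -Path $Path -PathType Folder

    if (-not $exists) {
        # Discard the cmdlet output so the function returns only a boolean
        New-AzureRmDataLakeStoreItem -Folder -AccountName $AccountName -Path $Path | Out-Null
        return $true    # the folder was created
    }

    return $false       # the folder already existed
}

# Example call, reusing the variables from the script above:
# New-DataLakeFolderIfMissing -AccountName $dataLakeStoreName -Path "$myDataRootFolder/population"
```

Returning a boolean lets the calling script log whether anything was changed.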
When large quantities or volumes of data need to be transferred, read Performance guidance while using PowerShell. See further details on the Import-AzureRmDataLakeStoreItem cmdlet.
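As a rough illustration of what that tuning involves, Import-AzureRmDataLakeStoreItem accepts -PerFileThreadCount and -ConcurrentFileCount parameters. The sketch below derives starting values for them. The rules of thumb it encodes (24 threads per physical core, 10 threads for the first 2.5 GB of a file plus one per additional 256 MB) are my reading of Microsoft's guidance and should be treated as assumptions to check against the current article. The function name Get-AdlsUploadThreadSettings is hypothetical.

```powershell
# Hypothetical helper: suggests starting values for the upload threading parameters.
# The constants (24, 10, 2.5 GB, 256 MB) are assumed rules of thumb, not guarantees.
function Get-AdlsUploadThreadSettings {
    param(
        [Parameter(Mandatory = $true)][int]    $PhysicalCores,
        [Parameter(Mandatory = $true)][double] $LargestFileSizeGB
    )

    # Total thread budget for the machine running the upload
    $totalThreads = $PhysicalCores * 24

    # 10 threads cover files up to 2.5 GB; add one thread per extra 256 MB
    $perFileThreads = 10
    if ($LargestFileSizeGB -gt 2.5) {
        $extraMB = ($LargestFileSizeGB - 2.5) * 1024
        $perFileThreads += [int][math]::Ceiling($extraMB / 256)
    }

    # Spread the thread budget across files, uploading at least one file at a time
    $concurrentFiles = [math]::Max(1, [int][math]::Floor($totalThreads / $perFileThreads))

    [pscustomobject]@{
        PerFileThreadCount  = $perFileThreads
        ConcurrentFileCount = $concurrentFiles
    }
}

# Example use with the recursive upload from the script above:
# $t = Get-AdlsUploadThreadSettings -PhysicalCores 16 -LargestFileSizeGB 10
# Import-AzureRmDataLakeStoreItem -AccountName $dataLakeStoreName `
#     -Path $sourceFilesPath -Destination $myDataRootFolder -Recurse `
#     -PerFileThreadCount $t.PerFileThreadCount `
#     -ConcurrentFileCount $t.ConcurrentFileCount
```

Treat the result as a starting point and adjust it after measuring actual throughput.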
The PowerShell option suits general administrators who manage the data lake on behalf of many projects, departments, and other organizational domains. PowerShell gives them the flexibility to manage files, folders, and permissions across the entire data lake, or across whatever portion of it the admin has been granted access to.
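To make the permissions part concrete, here is a minimal sketch of granting an Azure AD group access to several folders with Set-AzureRmDataLakeStoreItemAclEntry. It assumes you are logged in and already know the group's object ID. The function name Grant-DataLakeFolderAccess, the folder paths, and the default permission level are all hypothetical choices for illustration.

```powershell
# Hypothetical helper: applies the same group ACL entry to a list of folders.
function Grant-DataLakeFolderAccess {
    param(
        [Parameter(Mandatory = $true)][string]   $AccountName,
        [Parameter(Mandatory = $true)][string[]] $Paths,
        [Parameter(Mandatory = $true)][guid]     $ObjectId,
        [string] $Permissions = "ReadExecute"    # assumed default; "All" would grant full access
    )

    foreach ($path in $Paths) {
        # Adds or updates a single ACL entry on the folder itself (not on its children)
        Set-AzureRmDataLakeStoreItemAclEntry -AccountName $AccountName `
            -Path $path `
            -AceType Group `
            -Id $ObjectId `
            -Permissions $Permissions | Out-Null
    }
}

# Example call (the object ID is a placeholder):
# Grant-DataLakeFolderAccess -AccountName "rkdatalake" `
#     -Paths "/datasets", "/datasets/population" `
#     -ObjectId "<group object ID>"
```

Because each entry applies only to the folder it is set on, every folder in the path that a reader must traverse is listed explicitly.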