Building a Spark Application for HDInsight using IntelliJ Part 2 of 2

This is a continuation of my blog article Building a Spark Application for HDInsight using IntelliJ Part 1 of 2, which outlines my experience installing IntelliJ and the other dependent SDKs and creating an HDInsight project.

To add some code, right-click src and create a Scala class.

Project folders and MainApp

Scala code:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object MainApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MainApp")
    val sc = new SparkContext(conf)

    // Read the CSV file from Azure Data Lake Store into an RDD of lines.
    val rdd = sc.textFile("adl://rkbigdata.azuredatalakestore.net/MyDatasets/Crimes_-_2001_to_present_pg2.csv")
    // Keep the rows whose primary type (sixth column, index 5) is THEFT.
    // Transformations are lazy: rdd1 is not computed until an action is called on it.
    val rdd1 = rdd.filter(s => s.split(",")(5) == "THEFT")

    // Reuses the SparkContext created above and adds Hive support.
    val spark = SparkSession.builder().appName("Spark SQL basic").enableHiveSupport().getOrCreate()

    // Switch from the default Hive database to usdata.
    spark.sql("USE usdata")
    val crimesDF = spark.sql("SELECT * FROM CRIMES WHERE primarytype == 'NARCOTICS'")

    // Save the data frame of results into a new or existing Hive table.
    crimesDF.write.mode("overwrite").saveAsTable("crimebytype_NARCOTICS")
  }
}

Logic

  1. Read the CSV file in Azure Data Lake Store into an RDD.
  2. Filter the RDD for rows where the primary type field is “THEFT”.
  3. Set the Hive database to usdata (from the default database).
  4. Query the Hive table CRIMES for rows where the primary type field is “NARCOTICS”.
  5. Save the data frame into a new or existing crimebytype_NARCOTICS Hive table.
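
A note on step 2: splitting each line on every comma miscounts columns if a quoted field itself contains a comma, and indexing position 5 directly throws on a row with fewer than six fields. Below is a minimal sketch of a more defensive predicate, written in plain Scala so it can be tried without a cluster. The names CsvFilter, splitCsvLine and hasPrimaryType are hypothetical and not part of the original application; the column index 5 is taken from the code above.

```scala
object CsvFilter {
  // Split one CSV line on commas that sit outside double quotes.
  // Simplified on purpose: quote characters are dropped from the output,
  // and fields spanning several lines are not supported.
  def splitCsvLine(line: String): Vector[String] = {
    val fields = Vector.newBuilder[String]
    val current = new StringBuilder
    var inQuotes = false
    for (ch <- line) ch match {
      case '"' => inQuotes = !inQuotes
      case ',' if !inQuotes =>
        fields += current.toString
        current.clear()
      case other => current.append(other)
    }
    fields += current.toString
    fields.result()
  }

  // True when the column at `index` equals `expected`;
  // false (rather than an exception) when the row is too short.
  def hasPrimaryType(line: String, expected: String, index: Int = 5): Boolean =
    splitCsvLine(line).lift(index).contains(expected)
}
```

Under these assumptions, the filter in the application could be written as rdd.filter(line => CsvFilter.hasPrimaryType(line, "THEFT")). For anything beyond a quick experiment, a proper CSV reader is likely the safer choice.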

Set up IntelliJ to submit the application to HDInsight

Before I could use Azure Explorer, I ran into an issue where signing in resulted in an error and kept prompting me for credentials. Sorry, I didn’t capture the error message. The fix I was led to was to disable the Android Support plugin by unchecking it.

Click OK.
Sign in to Azure via Azure Explorer
Select Interactive
Enter credentials


Confirm that the Azure resources are displayed

Right-click the project and click Submit Spark Application to HDInsight

Set Main class name to MainApp
HDInsight Spark Submission window
Confirm success

Go to the Ambari Hive View to query the Hive table created by the Spark application.
From a Jupyter notebook, I queried the same Hive table.
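
A check along these lines could also be written in Scala. This is only a sketch: the object name VerifyCrimeTable and the helper countQuery are hypothetical, and it assumes the table landed in the usdata database because the application issued USE usdata before calling saveAsTable.

```scala
import org.apache.spark.sql.SparkSession

object VerifyCrimeTable {
  // Hypothetical helper: builds a row-count query for a Hive table.
  def countQuery(database: String, table: String): String =
    s"SELECT COUNT(*) AS row_count FROM $database.$table"

  def main(args: Array[String]): Unit = {
    // Hive support is needed so that spark.sql can see the saved table.
    val spark = SparkSession.builder()
      .appName("VerifyCrimeTable")
      .enableHiveSupport()
      .getOrCreate()

    // How many rows did the application write?
    spark.sql(countQuery("usdata", "crimebytype_NARCOTICS")).show()
    // Peek at a few of the saved rows.
    spark.sql("SELECT * FROM usdata.crimebytype_NARCOTICS LIMIT 10").show()
  }
}
```

In a notebook that already provides a Spark session, only the two spark.sql calls would be needed.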


I have shown a walkthrough of setting up the development tooling, building a simple Spark application, and running it against an HDInsight Spark 2.0 cluster.

