The Spark RDD API or Spark SQL DataFrames/Datasets

Assignment Task:

Answer each question using either the Spark RDD API or Spark SQL DataFrames/Datasets. Use Spark RDDs for questions marked [Spark RDD] and DataFrames/Datasets for questions marked [Spark SQL]. For Spark SQL questions you must use Dataset/DataFrame operations rather than SQL syntax.
Example using dataset/dataframe operations syntax:
df.filter($"price" > 1000000).show()
df.select($"Number_bedrooms", $"Number_bedrooms" + 1).show()
Do not use SQL syntax like this:
val sqlDF = spark.sql("SELECT * FROM houses")
sqlDF.show()
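To make the distinction concrete, the sketch below pairs two common SQL clauses with their Dataset/DataFrame counterparts. It is only an illustration under stated assumptions: the `toy` DataFrame, its rows, and the locally built `SparkSession` are hypothetical and are not part of the assignment data. In spark-shell, `spark` already exists and does not need to be built.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.avg

// Local session for experimentation only (assumption); spark-shell provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("df-ops-sketch").getOrCreate()
import spark.implicits._

// Hypothetical toy rows, not taken from houses.txt.
val toy = Seq(
  ("Bundoora", 3, 750000.0),
  ("Bundoora", 2, 1200000.0),
  ("Preston", 4, 950000.0)
).toDF("Suburb", "Number_bedrooms", "Price")

// SQL equivalent: SELECT Suburb, Price FROM toy WHERE Price > 900000
val expensive = toy.filter($"Price" > 900000).select($"Suburb", $"Price")

// SQL equivalent: SELECT Suburb, AVG(Price) AS avg_price FROM toy GROUP BY Suburb
val avgBySuburb = toy.groupBy($"Suburb").agg(avg($"Price").as("avg_price"))

expensive.show()
avgBySuburb.show()
```

Each SQL clause maps to a method call (`WHERE` to `filter`, `GROUP BY` to `groupBy(...).agg(...)`), which is the style the [Spark SQL] questions ask for.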
For Spark RDD questions assume you have been given the code below.
val lines = sc.textFile("houses.txt")
// assume all the columns are separated by ", "
val split_lines = lines.map(_.split(", "))
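As a rough sketch of how `split_lines` might be used, the example below indexes into each `Array[String]`, converts fields to typed values, and applies `filter`, `count`, and `reduceByKey`. Several things here are assumptions: the sample lines are hypothetical stand-ins for houses.txt, the column positions follow the field order of the `House` case class given for the Spark SQL questions, and booleans are assumed to be stored as `true`/`false`.

```scala
import org.apache.spark.sql.SparkSession

// Local session for experimentation only (assumption); spark-shell provides `sc`.
val spark = SparkSession.builder().master("local[*]").appName("rdd-sketch").getOrCreate()
val sc = spark.sparkContext

// Hypothetical sample lines standing in for houses.txt.
val lines = sc.parallelize(Seq(
  "1, Bundoora, 3083, true, 750000.0, 10, 3, 450.0",
  "2, Bundoora, 3083, false, 620000.0, 11, 2, 380.5",
  "3, Preston, 3072, false, 910000.0, 12, 4, 600.0"
))
val split_lines = lines.map(_.split(", "))

// Assumed column positions: 1 = Suburb, 3 = Multistorey, 4 = Price.
// Count the rows whose Multistorey flag parses to true.
val multistoreyCount = split_lines.filter(cols => cols(3).toBoolean).count()

// Key each row by suburb, then keep the largest price seen per key.
val maxPriceBySuburb = split_lines
  .map(cols => (cols(1), cols(4).toFloat))
  .reduceByKey((a, b) => math.max(a, b))
  .collect()
  .toMap
```

The same map-to-pairs then `reduceByKey` pattern applies to any per-key minimum or maximum; only the extracted column and the combining function change.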
For Spark SQL questions assume you have been given the code below.
import org.apache.spark.sql.Encoders
case class House(House_ID: Int, Suburb: String, Postcode: Int, Multistorey: Boolean, Price: Float, Owner_ID: Int, Number_bedrooms: Int, Land_size: Float)
val df = spark.read.schema(Encoders.product[House].schema).option("delimiter", " ").csv("houses.txt").as[House]
Q1) [Spark RDD] How many houses have 2 or more bedrooms and are multistorey?
Q2) [Spark RDD] Output the House_ID of the house with the smallest Land_size in the Bundoora suburb. Break ties arbitrarily if more than one house has the smallest Land_size.
Q3) [Spark RDD] Output all suburbs whose largest land size is greater than 4000.
Q4) [Spark SQL] Output the House_ID, postcode, and price for all houses that are single-storey and priced below 800000.00.
Q5) [Spark SQL] For each suburb, output the House_ID of the least expensive house. Break ties arbitrarily.
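
Several of the questions above ask for the row that attains an extreme value within a group. One common way to express that with DataFrame operations (not necessarily the intended one) is a window function: rank the rows inside each group, then keep the first. The sketch below applies this to a different query, the house with the most bedrooms per postcode. The sample rows and the local `SparkSession` are hypothetical, and the data is chosen so that no ties occur.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

// Local session for experimentation only (assumption); spark-shell provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("window-sketch").getOrCreate()
import spark.implicits._

// Hypothetical rows; only the columns needed for this illustration.
val houses = Seq(
  (1, 3083, 3),
  (2, 3083, 2),
  (3, 3072, 4)
).toDF("House_ID", "Postcode", "Number_bedrooms")

// Number the rows within each postcode, most bedrooms first.
val byPostcode = Window.partitionBy($"Postcode").orderBy($"Number_bedrooms".desc)

// Keep only the top-ranked row per postcode; row_number breaks ties arbitrarily.
val topPerPostcode = houses
  .withColumn("rn", row_number().over(byPostcode))
  .filter($"rn" === 1)
  .select($"Postcode", $"House_ID")

topPerPostcode.show()
```

Because `row_number` assigns distinct ranks even to equal values, exactly one row per group survives, which matches a "break ties arbitrarily" requirement.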
