
Monday, December 17, 2018

Spark Interview Questions

Spark Interview Questions:

Spark is a framework that is heavily used on Hadoop (Hadoop 2.0/YARN) to run analytical, streaming, and machine learning workloads efficiently. Below are some of the questions that are frequently asked in Spark interviews.

Spark:

1. What is Spark?
2. Explain the high-level architecture of Spark.
3. What are the Driver and the Executor? Explain the difference between them.
4. What is a DAG?
5. How do you trace a failed job through the DAG?
6. What is persistence? Name the different levels of persistence.
7. Why do we use repartition and coalesce?
8. What are RDD, DataFrame, and Dataset? Explain the differences between them.
9. How do you see the partitions after loading an input file in Spark?
10. How do we load/store a Hive table in Spark?
11. How do you read JSON and CSV files in Spark?
12. What is Spark Streaming?
13. Name some properties that you have set in your project.
14. How can a Hive UDF be used in a Spark session?
15. Explain some troubleshooting you have done in Spark.
16. What are Stage, Job, and Task in Spark?
17. What is the difference between groupByKey and reduceByKey?
18. What is executor memory? Explain how you set it in your project.
19. Name some Spark functions that you have used in your project.
20. What is a Spark UDF? Write its signature.

Saturday, December 8, 2018

JSON File Parsing Using Spark Scala

Parsing a JSON file in Spark:


First, create the SparkContext and SQLContext objects.


import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext


val conf = new SparkConf().setAppName(appName).setMaster(master)
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Read through the SQLContext created above (SparkContext has no sqlContext member)
val df = sqlContext.read
  .option("multiLine", <is_multiline>)
  .option("mode", "PERMISSIVE")
  .option("quote", "\"")
  .option("escape", "\u0000")
  .json(<path>)


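** is_multiline: Boolean = true/false (set it to true when a single JSON record spans multiple lines)
** path = path of the JSON file
** quote and escape are documented as CSV reader options; the JSON reader options do not list them, so they are unlikely to affect JSON parsing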
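
For Spark 2.x and later, the same read can also be done through SparkSession instead of a separate SQLContext. Below is a minimal sketch, assuming Spark 2.2 or later (where the multiLine option is available) and a local master. The object name JsonReadSketch, the helper readJson, and the sample records are hypothetical and only there to make the example self-contained.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}

object JsonReadSketch {

  // Hypothetical helper (not from the original post): reads a JSON file into a DataFrame.
  // PERMISSIVE mode keeps malformed records instead of failing the whole read.
  def readJson(spark: SparkSession, path: String, multiLine: Boolean): DataFrame =
    spark.read
      .option("multiLine", multiLine)
      .option("mode", "PERMISSIVE")
      .json(path)

  def main(args: Array[String]): Unit = {
    // App name and master are placeholder settings for a local run.
    val spark = SparkSession.builder()
      .appName("json-read-sketch")
      .master("local[*]")
      .getOrCreate()

    // Write a small, made-up multi-line JSON file so the example can run on its own.
    val sample = Files.createTempFile("people", ".json")
    val json =
      """[
        |  {"name": "Asha", "age": 31},
        |  {"name": "Ravi", "age": 28}
        |]""".stripMargin
    Files.write(sample, json.getBytes(StandardCharsets.UTF_8))

    val df = readJson(spark, sample.toString, multiLine = true)
    df.printSchema() // schema is inferred from the JSON records
    df.show()

    spark.stop()
  }
}
```

With multiLine set to false, Spark expects one complete JSON record per line, so a file laid out like the sample above would not be parsed as intended.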