
Saturday, December 8, 2018

PySpark Interview Questions

Big Data is evolving day by day, and large organizations now use Python on Spark to build analytics solutions. Here are some interview questions I would like to share.

1. What is Spark?
2. Give a use case where you would prefer Python over Scala in the Spark framework.
3. What are the differences between Python and Scala?
4. How do you execute a Python script in the Spark framework?
5. What is Pandas in Python?
6. What is Cython?
7. Explain the OOP features in Python.
8. What is a dictionary in Python?
9. What is the difference between a tuple and a list in Python?
10. Name some packages you have used in your programs.
11. What is an anonymous function in Python, and when would you use one?
12. What is a module, and how do you import one module into another?
13. How do you handle I/O operations in Python?
14. How do you handle exceptions in Python?
15. How do you execute Hive queries from Python?
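A few of the Python questions above (dictionaries, tuple vs. list, anonymous functions) can be illustrated with a short plain-Python sketch; the variable names here are just for illustration:

```python
# Q8: a dictionary maps keys to values and is mutable.
scores = {"alice": 90, "bob": 75}
scores["carol"] = 82              # add a new key/value pair

# Q9: a list is mutable; a tuple is immutable.
nums_list = [1, 2, 3]
nums_list.append(4)               # fine: lists support in-place changes
nums_tuple = (1, 2, 3)
try:
    nums_tuple[0] = 9             # tuples do NOT support item assignment
except TypeError:
    tuple_is_immutable = True

# Q11: an anonymous function (lambda) is handy for short, one-off
# transformations, e.g. the function passed to map() here, or to an
# RDD transformation such as rdd.map(...) in PySpark.
doubled = list(map(lambda x: x * 2, nums_list))
```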

JSON File Parsing Using Spark Scala

Parsing a JSON file in Spark:


First, create a SparkContext and an SQLContext:


import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext

// appName and master are assumed to be defined elsewhere.
val conf = new SparkConf().setAppName(appName).setMaster(master)
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Read the JSON file into a DataFrame.
val df = sqlContext.read
  .option("multiLine", <is_multiline>)  // true if a record spans multiple lines
  .option("mode", "PERMISSIVE")         // tolerate corrupt records instead of failing
  .option("quote", "\"")
  .option("escape", "\u0000")
  .json(<path>)


** is_multiline: Boolean = true/false
** path = JSON file path
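To make the multiLine option concrete, here is a plain-Python sketch (using only the standard json module, not Spark) of the two layouts the option distinguishes: with multiLine enabled, a single record may be pretty-printed across several lines; with it disabled (the default), Spark expects JSON Lines, i.e. one complete record per line:

```python
import json

# multiLine = true: one JSON record spanning several lines.
multiline_doc = """{
  "name": "alice",
  "age": 30
}"""
record = json.loads(multiline_doc)

# multiLine = false (default): JSON Lines, one complete record per line.
json_lines_doc = '{"name": "alice", "age": 30}\n{"name": "bob", "age": 25}'
records = [json.loads(line) for line in json_lines_doc.splitlines()]
```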