Python Dictionary

A dictionary is a collection of key-value pairs that, since Python 3.7, keeps its items in insertion order. Other Python data structures such as list and tuple hold only values, whereas a dictionary holds key-value pairs. Because Python is a dynamically typed language, there is no need to declare a variable's type; we can assign to a variable name directly.

We can create a dictionary with the following commands.

# Empty dictionary
my_collections = {}

# Dictionary with keys of the same type
my_collections = {1: 'Rahul', 2: 'Shayam'}

# Dictionary with keys of different types
my_collections = {'name': 'Rahul', 2: [2, 3, 4]}
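Besides the literal syntax above, a dictionary can also be built with the built-in dict() constructor. Keys must be hashable (for example int, str or tuple), so a list cannot be used as a key. A minimal sketch; the variable names are illustrative only:

```python
# The same dictionary built three ways (names are illustrative)
literal = {'name': 'Rahul', 'age': 23}
from_pairs = dict([('name', 'Rahul'), ('age', 23)])   # from (key, value) pairs
from_kwargs = dict(name='Rahul', age=23)              # from keyword arguments

print(literal == from_pairs == from_kwargs)   # True

# Keys must be hashable: a tuple key works...
coords = {(0, 0): 'origin'}

# ...but a list key raises TypeError
try:
    bad = {[0, 0]: 'origin'}
except TypeError as err:
    print('TypeError:', err)
```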

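We can access dictionary values with the following commands.

my_collections1 = {1: 'Rahul', 2: 'Shayam'}
my_collections2 = {'name': 'Rahul', 2: [2, 3, 4]}

my_collections1[1]            # 'Rahul'
my_collections2['name']       # 'Rahul'

# get() also returns the value for a key, but returns None instead of raising KeyError if the key is missing
my_collections2.get('name')   # 'Rahul'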
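Square-bracket access and get() behave differently only when the key is missing. A short sketch using the my_collections2 dictionary from above, where the 'age' key is deliberately absent:

```python
my_collections2 = {'name': 'Rahul', 2: [2, 3, 4]}

# get() returns None for a missing key
print(my_collections2.get('age'))        # None

# An optional second argument is returned instead when the key is missing
print(my_collections2.get('age', 0))     # 0

# Square-bracket access raises KeyError for a missing key
try:
    my_collections2['age']
except KeyError:
    print('KeyError: age not found')
```

Use get() when a missing key is an expected case, and square brackets when a missing key should be treated as an error.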

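We can add or update dictionary items with the following commands.

my_collections2['name'] = 'John'   # 'name' already exists, so its value is updated

my_collections2['age'] = 23        # 'age' is a new key, so the pair is added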
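The same assignment syntax either updates or adds, depending on whether the key already exists, and update() does the same for several keys at once. A sketch continuing the example above; the 'city' key and its value are made-up illustrations:

```python
my_collections2 = {'name': 'Rahul', 2: [2, 3, 4]}

my_collections2['name'] = 'John'   # existing key: the value is replaced
my_collections2['age'] = 23        # new key: the pair is added at the end

print(my_collections2)   # {'name': 'John', 2: [2, 3, 4], 'age': 23}

# update() adds or overwrites several keys in one call ('city'/'Pune' are illustrative)
my_collections2.update({'age': 24, 'city': 'Pune'})
print(my_collections2['age'], my_collections2['city'])   # 24 Pune
```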

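We can delete items from a dictionary, or clear it entirely, with the following commands.

cubes = {1: 1, 2: 8, 3: 27, 4: 64, 5: 125}

# Deleting a particular item
cubes.pop(4)      # removes key 4 and returns its value, 64
del cubes[5]      # removes key 5

# Deleting an arbitrary item
cubes.popitem()   # removes and returns a (key, value) pair; since Python 3.7, the last inserted one

# Clearing the dictionary
cubes.clear()     # cubes is now {}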
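To make the return values of the deletion methods concrete, here is a runnable sketch on the same cubes dictionary. It also shows that pop() accepts a default for a missing key, so it does not raise KeyError the way del does:

```python
cubes = {1: 1, 2: 8, 3: 27, 4: 64, 5: 125}

removed = cubes.pop(4)        # returns the value of the removed key
print(removed, cubes)         # 64 {1: 1, 2: 8, 3: 27, 5: 125}

# pop() with a default does not raise when the key is missing
print(cubes.pop(10, None))    # None

del cubes[5]                  # del removes a key but returns nothing

last = cubes.popitem()        # removes the most recently inserted pair (Python 3.7+)
print(last, cubes)            # (3, 27) {1: 1, 2: 8}

cubes.clear()                 # empties the dictionary in place
print(cubes)                  # {}
```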

Mutable and Immutable Collections

In Scala, the collections framework provides two kinds of collections:

  • Mutable
  • Immutable

In mutable collections, we can change, add or remove elements in place. Immutable collections, on the other hand, never change: you can still perform addition, deletion and update operations on an immutable collection, but each operation returns a new collection and leaves the original unchanged.

Mutable: scala.collection.mutable
Immutable: scala.collection.immutable

If you don't explicitly specify a package, names such as List, Set and Map refer to the immutable collections by default. Below is a comparison of the two types of collections.

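Manipulation of variable
  • Mutable: possible
  • Immutable: not possible
Memory allocation
  • Mutable: allocated into memory once defined
  • Immutable: uses memory at execution time
Data security
  • Mutable: lower, since the data can be changed after it is shared
  • Immutable: higher, since the data cannot be changed
Use case
  • Mutable: use it while performing operations on the data
  • Immutable: prefer it when exposing the data to an end user or third party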

Create DataFrame from RDD

Creating DataFrame from RDD in Spark:

RDDs and DataFrames are both widely used APIs in the Spark framework, and converting an RDD to a DataFrame is a task almost every Spark programmer has to do. Below are the most suitable ways to achieve this.

There are two commonly used techniques:
- Inferring the Schema Using Reflection
- Programmatically specifying the schema

Inferring the Schema Using Reflection:

//Creating RDD
val rdd = sc.parallelize(Seq(1,2,3,4))
import spark.implicits._
//Creating Dataframe from RDD
val dataFrame = rdd.toDF()

//Defining the schema as a case class
case class Person(name: String, age: Int)
//Creating DataFrame using reflection
val people = sc.textFile("SaleData.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt)).toDF()

Programmatically specifying the schema:

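//Imports for the schema, RDD and Row types
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row

//Creating Schema function: builds a StructType from the three column names
def dfSchema(columnNames: List[String]): StructType =
  StructType(
    Seq(
      StructField(name = columnNames(0), dataType = StringType, nullable = false),
      StructField(name = columnNames(1), dataType = StringType, nullable = false),
      StructField(name = columnNames(2), dataType = IntegerType, nullable = false)
    )
  )

//Calling Schema function
val schema = dfSchema(List("name", "gender", "age"))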

//Creating RDD
val rdd: RDD[String] = ...

//Creating function to map Row data
def row(line: List[String]): Row = Row(line(0), line(1),line(2).toInt)

//Mapping Row Data
val data = rdd.map(_.split(",").toList).map(row)

//Creating DataFrame
val dataFrame = spark.createDataFrame(data, schema)

PySpark Interview Questions

Big Data is evolving quickly, and many large organizations now use Python on Spark (PySpark) to build analytics solutions. Here are some interview questions I would like to share.

1. What is Spark?
2. Describe some use cases where you would prefer Python over Scala in the Spark framework.
3. What are the differences between Python and Scala?
4. How will you execute a Python script in the Spark framework?
5. What is Pandas in Python?
6. What is Cython?
7. Explain the OOP features of Python.
8. What is a dictionary in Python?
9. What is the difference between a tuple and a list in Python?
10. Name some packages you have used in your programs.
11. What is an anonymous function in Python? Explain a use case.
12. What is a module, and how will you import one module into another?
13. How do you handle I/O operations in Python?
14. How to handle exception in Python?
15. How will you execute Hive queries in Python?

Reading JDBC (Oracle/SQL Server) Data Using Spark

Reading JDBC table data using Spark Scala

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext

val conf = new SparkConf().setAppName(appName).setMaster(master)
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

Reading JDBC:

val jdbcDF = sqlContext.read.format("jdbc")
                               .option("url", <jdbc URL>)
                               .option("dbtable", <table name>)
                               .option("user", <user name>)
                               .option("password", <password>)
                               .option("driver", <Driver Name>)
                               .load()

Driver Details:
SQL Server: ""
Oracle: "oracle.jdbc.driver.OracleDriver"

URL Details:
SQL Server: "jdbc:sqlserver://<serverName>[\<instanceName>][:<portNumber>][;<property1>=<value>][;<property2>=<value>]"
Oracle: "jdbc:oracle:thin:@localhost:1521:orcl"

** In the Oracle JDBC URL, replace localhost (and the port and SID, if different) with your database server's host name or IP.
** When building the SQL Server JDBC URL, you can pass the user name and password as property1 and property2, e.g. user=<user name>;password=<password>.

JSON File Parsing Using Spark Scala

Parsing JSON file in Spark:

First, create the SparkContext and SQLContext objects.

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext

val conf = new SparkConf().setAppName(appName).setMaster(master)
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

Parsing JSON:

val jsonDF = sqlContext.read.option("multiline", <is_multiline>).json(<path>)

** is_multiline: Boolean, true if a single JSON record spans multiple lines, false (the default) if each line holds one record
** path: path of the JSON file

Python Interview Questions

Python is a general-purpose, interpreted, interactive, object-oriented, high-level programming language. It was created by Guido van Rossum in the late 1980s and first released in 1991. Python's source code is available under the Python Software Foundation License, which is GPL-compatible. Here are some questions that are frequently asked in Python interviews.
1. What is Python?
2. How will you define function in Python?
3. What kind of typed language is Python? Explain why.
4. How will you define variable in Python?
5. What are the different loops used in Python?
6. What are the different ways to declare a string in Python?
7. Name some built-in functions that are widely used for scripting.
8. How do you compile and run a Python script?
9. What is the difference between Python and shell scripting?
10. What is Pandas in Python?
11. What is Cython?
12. Explain the OOP features of Python.
13. What is a dictionary in Python?
14. What is the difference between a tuple and a list in Python?
15. Name some packages you have used in your programs.
16. What is an anonymous function in Python? Explain a use case.

