Tech Learn

Friday, December 14, 2018

Python Dictionary

Dictionary in Python

A dictionary is a collection of items stored as key-value pairs; since Python 3.7 it preserves insertion order. Other Python data structures such as lists and tuples hold only values, whereas a dictionary maps each key to a value. Because Python is a dynamically typed language, there is no need to declare a variable's type. We can simply assign a value to a variable name.

We can create a dictionary with the commands below.

#Empty dictionary
my_collections = {}

#Dictionary with same type of key
my_collections = {1:'Rahul',2:'Shayam'}

#Dictionary with different type of keys
my_collections = {'name':'Rahul',2:[2,3,4]}

We can access dictionary values with the commands below.

my_collections1 = {1:'Rahul',2:'Shayam'}
my_collections2 = {'name':'Rahul',2:[2,3,4]}

my_collections1[1]
my_collections2['name']

or

my_collections2.get('name')
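The two access styles above differ when the key is missing. The following is a minimal sketch of that difference, reusing the post's sample data; the 'age' key and the default value 0 are illustrative assumptions.

```python
# Sample data mirroring the post's example.
my_collections2 = {'name': 'Rahul', 2: [2, 3, 4]}

# Square brackets return the value for an existing key.
name = my_collections2['name']

# Square brackets raise KeyError when the key is missing.
try:
    my_collections2['age']
    missing_raised = False
except KeyError:
    missing_raised = True

# get() returns None for a missing key, or a caller-supplied default.
age = my_collections2.get('age')
age_or_default = my_collections2.get('age', 0)

print(name, missing_raised, age, age_or_default)
```

In practice, get() is convenient when a missing key is expected, while square brackets are useful when a missing key should be treated as an error.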

We can add or update dictionary items with the commands below.

#update
my_collections2['name'] = 'John'

#add
my_collections2['age'] = 23
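Assignment handles one key at a time. As a small sketch, the built-in update() method can add and overwrite several keys in one call; the 'city' key and its value are made up for illustration.

```python
# Sample data mirroring the post's example.
my_collections2 = {'name': 'Rahul', 2: [2, 3, 4]}

# Assigning to an existing key updates it; assigning to a new key adds it.
my_collections2['name'] = 'John'
my_collections2['age'] = 23

# update() merges several pairs at once: 'age' is overwritten, 'city' is added.
my_collections2.update({'age': 24, 'city': 'Pune'})

# The in operator checks whether a key is present.
has_city = 'city' in my_collections2

print(my_collections2, has_city)
```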

We can remove items from a dictionary with the commands below.

cubes = {1:1, 2:8, 3:27, 4:64, 5:125}

#Deleting particular item
cubes.pop(4)

#Deleting the most recently inserted item (an arbitrary item before Python 3.7)
cubes.popitem()

#Clearing Dictionary
cubes.clear()
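The removal methods above also return values, which is easy to miss. A minimal sketch, assuming Python 3.7 or later (where popitem() removes the most recently inserted pair); the del statement and the pop() default are additions not shown in the post.

```python
cubes = {1: 1, 2: 8, 3: 27, 4: 64, 5: 125}

# pop(key) removes the key and returns its value.
removed = cubes.pop(4)

# pop(key, default) returns the default instead of raising KeyError.
fallback = cubes.pop(10, None)

# popitem() removes and returns the last inserted (key, value) pair.
last_pair = cubes.popitem()

# del removes a key without returning anything.
del cubes[1]
remaining = dict(cubes)

# clear() empties the dictionary in place.
cubes.clear()

print(removed, fallback, last_pair, remaining, cubes)
```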

Thursday, December 13, 2018

Mutable and Immutable Collections

Mutable and Immutable Collections in Scala

In Scala, the collections framework has two main types.

Mutable
Immutable

In mutable collections we can change, add, or remove elements in place. Immutable collections, in contrast, never change. You can still perform operations such as addition, deletion, and update on an immutable collection, but each operation returns a new collection and leaves the original untouched.

Package:
Mutable: scala.collection.mutable
Immutable: scala.collection.immutable

If you don't explicitly specify a package, common names such as List, Set, and Map refer to the immutable versions by default. Below are some comparisons between the two types of collections.

Property	Mutable	Immutable
In-place modification	Possible	Not possible
Speed	Generally faster for in-place updates	Generally slower for updates, since each change builds a new collection
Memory allocation	Allocated once and updated in place	Each operation allocates memory for the new collection
Data safety	Lower	Higher
Use case	When the data must be modified during processing.	When exposing the data to end users or third parties.
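The same distinction can be sketched in Python as a rough analogy, not as Scala code: a list stands in for a mutable collection and a tuple for an immutable one. The variable names are illustrative only.

```python
# Mutable: the list is changed in place, so every reference sees the change.
mutable_items = [1, 2, 3]
alias = mutable_items
mutable_items.append(4)

# Immutable: "adding" to a tuple builds a new tuple; the original is untouched.
immutable_items = (1, 2, 3)
extended = immutable_items + (4,)

# In-place modification of a tuple is rejected with TypeError.
try:
    immutable_items[0] = 99
    in_place_allowed = True
except TypeError:
    in_place_allowed = False

print(alias, immutable_items, extended, in_place_allowed)
```

The alias seeing the appended element is the kind of side effect that makes immutable collections the safer choice when data is handed to other code.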

Tuesday, December 11, 2018

Create DataFrame from RDD

Creating DataFrame from RDD in Spark:

RDD and DataFrame are both heavily used APIs in the Spark framework, and converting an RDD to a DataFrame is a very common task. This post walks through the most suitable ways to do it.

The two most commonly used techniques are:
- Inferring the Schema Using Reflection
- Programmatically specifying the schema

Inferring the Schema Using Reflection:

//Creating RDD
val rdd = sc.parallelize(Seq(1,2,3,4))
import spark.implicits._
//Creating Dataframe from RDD
val dataFrame = rdd.toDF()

//Creating Schema
case class Person(name: String, age: Int)

//Creating DataFrame using Reflection

val people = sc.textFile("SaleData.txt").map(_.split(",")).map(p => Person(p(0), p(1).toInt)).toDF()

Programmatically specifying the schema:

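//Imports needed for the schema, Row and RDD types
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

//Creating Schema function
//Note: columnNames is not used; the three fields are hard-coded
def dfSchema(columnNames: List[String]): StructType =
  StructType(
    Seq(
      StructField(name = "name", dataType = StringType, nullable = false),
      StructField(name = "gender", dataType = StringType, nullable = false),
      StructField(name = "age", dataType = IntegerType, nullable = false)
    )
  )

//Calling Schema function
val schema = dfSchema(List("name", "gender", "age"))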

//Creating RDD

val rdd: RDD[String] = ...

//Creating function to map Row data

def row(line: List[String]): Row = Row(line(0), line(1),line(2).toInt)

//Mapping Row Data

val data = rdd.map(_.split(",").toList).map(row)

//Creating DataFrame

val dataFrame = spark.createDataFrame(data, schema)
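The row-mapping step above can be illustrated without Spark. The following plain-Python sketch only mimics the idea of splitting each line and casting fields according to a schema; it does not use any Spark API, and SCHEMA, to_row, and the sample lines are hypothetical names and data.

```python
# Hypothetical schema: (column name, converter) pairs for name, gender, age.
SCHEMA = [("name", str), ("gender", str), ("age", int)]


def to_row(line):
    """Split a comma-separated line and cast each field per the schema."""
    fields = line.split(",")
    if len(fields) != len(SCHEMA):
        raise ValueError(f"expected {len(SCHEMA)} fields, got {len(fields)}")
    return tuple(convert(value.strip())
                 for (_, convert), value in zip(SCHEMA, fields))


# Hypothetical input lines standing in for the contents of the RDD.
lines = ["Rahul,M,23", "Asha,F,31"]
rows = [to_row(line) for line in lines]

print(rows)
```

As in the Scala version, the age field must be converted to an integer before it matches an integer column in the schema.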

Saturday, December 8, 2018

PySpark Interview Questions

Big Data is evolving day by day. Large organizations now use Python on Spark to build analytics-based solutions. Here are some interview questions.

1. What is Spark?
2. Describe a use case where Python is preferred over Scala in the Spark framework.
3. What is the difference between Python and Scala?
4. How will you execute a Python script in the Spark framework?
5. What is Pandas in Python?
6. What is Cython?
7. Explain the OOP features of Python.
8. What is a dictionary in Python?
9. What is the difference between a tuple and a list in Python?
10. Name some packages you have used in your programs.
11. What is an anonymous function in Python? Explain a use case.
12. What is a module, and how will you import one module into another?
13. How do you handle I/O operations in Python?
14. How do you handle exceptions in Python?
15. How will you execute Hive queries in Python?

Reading JDBC(Oracle/SQL Server) Data using Spark

Reading JDBC table data using Spark Scala

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext

val conf = new SparkConf().setAppName(appName).setMaster(master)
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

Reading JDBC:

sqlContext.read.format("jdbc")
.option("url",<jdbc URL>)
.option("dbtable",<table name>)
.option("user",<user name>)
.option("password",<password>)
.option("driver",<Driver Name>)
.load()

Driver Details:
SQL Server: "com.microsoft.sqlserver.jdbc.SQLServerDriver"
Oracle: "oracle.jdbc.driver.OracleDriver"

URL Details:
SQL Server: "jdbc:sqlserver://<serverName>\<instanceName>:<portNumber>;<property1>=<value>;<property2>=<value>"
Oracle: "jdbc:oracle:thin:@localhost:1521:orcl"

** Replace localhost in the Oracle JDBC URL with your server's host name or IP address.
** When building the SQL Server JDBC URL, you can pass the user name and password as property1 and property2.
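The URL formats above can be assembled with small helper functions. This is a plain-Python sketch, not part of Spark or any JDBC driver: sqlserver_url and oracle_thin_url are hypothetical helpers, and the host names, instance name, and databaseName value are made-up examples. The default ports 1433 and 1521 are the usual defaults for SQL Server and Oracle.

```python
def sqlserver_url(server, port=1433, instance=None, **properties):
    """Build a SQL Server JDBC URL from its parts (hypothetical helper)."""
    host = server if instance is None else f"{server}\\{instance}"
    url = f"jdbc:sqlserver://{host}:{port}"
    # Each connection property is appended as ;name=value.
    for key, value in properties.items():
        url += f";{key}={value}"
    return url


def oracle_thin_url(host, port=1521, sid="orcl"):
    """Build an Oracle thin-driver JDBC URL (hypothetical helper)."""
    return f"jdbc:oracle:thin:@{host}:{port}:{sid}"


print(sqlserver_url("dbhost", databaseName="sales"))
print(sqlserver_url("dbhost", instance="SQLEXPRESS"))
print(oracle_thin_url("10.0.0.5"))
```

Putting credentials directly in the URL is convenient for a quick test, but passing them through the separate user and password options keeps them out of logged connection strings.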

JSON File Parsing Using Spark Scala

Parsing JSON file in Spark:

First, create the SparkContext and SQLContext objects.

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext

val conf = new SparkConf().setAppName(appName).setMaster(master)
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

sqlContext.read.option("multiline",<is_multiline>)
.option("mode","PERMISSIVE")
.option("quote","\"")
.option("escape","\u0000")
.json(<path>)

** is_multiline: Boolean = true/false
** path = JSON file Path
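The multiline flag matters because Spark's JSON reader by default treats each line of the file as a separate record. The following plain-Python sketch, which uses only the standard json module and no Spark, shows the two layouts; the sample records and function names are made up for illustration.

```python
import json

# Line-delimited JSON: one complete object per line (multiline = false).
json_lines_text = '{"name": "Rahul", "age": 23}\n{"name": "Asha", "age": 31}\n'

# A single JSON document spread over several lines (multiline = true).
multiline_text = """[
  {"name": "Rahul", "age": 23},
  {"name": "Asha", "age": 31}
]"""


def parse_json_lines(text):
    """Parse each non-empty line as its own JSON record."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def parse_multiline(text):
    """Parse the whole text as one JSON document."""
    parsed = json.loads(text)
    return parsed if isinstance(parsed, list) else [parsed]


records_a = parse_json_lines(json_lines_text)
records_b = parse_multiline(multiline_text)

# Reading the multi-line document line by line fails on the very first line.
try:
    parse_json_lines(multiline_text)
    line_mode_ok = True
except json.JSONDecodeError:
    line_mode_ok = False

print(records_a == records_b, line_mode_ok)
```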

Wednesday, December 5, 2018

Python Interview Questions

Python is a general-purpose, interpreted, interactive, object-oriented, high-level programming language. It was created by Guido van Rossum, who began work on it in the late 1980s and first released it in 1991. Python source code is available under the Python Software Foundation License, an open-source license that is compatible with the GNU General Public License (GPL). Here are some questions that are frequently asked in interviews.

1. What is Python?
2. How will you define a function in Python?
3. What kind of typing does Python use? Explain why.
4. How will you define a variable in Python?
5. What are the different loops used in Python?
6. What are the different ways to declare a string in Python?
7. Name some built-in functions that are widely used for scripting.
8. How do you compile and run a Python script?
9. What is the difference between Python and shell scripting?
10. What is Pandas in Python?
11. What is Cython?
12. Explain the OOP features of Python.
13. What is a dictionary in Python?
14. What is the difference between a tuple and a list in Python?
15. Name some packages you have used in your programs.
16. What is an anonymous function in Python? Explain a use case.

To check the answers click here.