
Friday, January 18, 2019

Python Interview Questions & Answers


Python is one of the fastest-growing programming languages today. There are some basic questions that are commonly asked in interviews, and I would like to share some of the most frequently asked ones.

1. What is Python?
Ans: Python is an object-oriented, interpreted, high-level programming language created by Guido van Rossum and first released in 1991. It offers rich high-level built-in data structures combined with dynamic typing and dynamic binding. More importantly, its easy-to-learn syntax improves readability and reduces the cost of maintenance. Python supports modularity and code reuse in a very efficient manner, which makes it both extensible and flexible.

2. How will you define a function in Python?
Ans: A function is a block of organized, reusable code that is used to perform a single, related action. Functions provide better modularity for your application and a high degree of code reuse.

 a. A function block begins with the def keyword.
 b. The function header ends with a colon (e.g. def calculate(): ).
 c. There are no braces marking the start/end of the function; the body is delimited by indentation.
 d. A return type is not declared; the function may simply return a value with the return statement.
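The rules above can be sketched with a minimal function (the name calculate_total and its parameters are illustrative):

```python
# A function block begins with the def keyword; the header ends with a colon.
# The body is delimited by indentation, not braces or parentheses.
def calculate_total(price, quantity, tax_rate=0.25):
    subtotal = price * quantity
    # No return type is declared; a value is simply returned (or None by default).
    return subtotal * (1 + tax_rate)

print(calculate_total(100, 2))  # 250.0
```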

3. What kind of typed language is Python? Explain why.
Ans: Python is a dynamically typed language. This means you do not need to declare a variable's type while writing the code; the type is determined at execution time from the value assigned to the variable.

Example:
a=7
b="Python"

At execution time, variable a will be treated as an integer and b as a string.
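This can be checked with the built-in type() function; the same name can even be rebound later to a value of a different type:

```python
a = 7
b = "Python"
print(type(a))  # <class 'int'>
print(type(b))  # <class 'str'>

# Rebinding the same name to a different type is allowed.
a = "now a string"
print(type(a))  # <class 'str'>
```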

4. How will you define a variable in Python?
Ans: Simply type the variable name and assign a value to it. Since Python is a dynamically typed language, the interpreter determines the variable's type at run time.

5. What are the different loops and conditional blocks used in Python?
Ans: Like other programming languages, Python has conditional blocks and loops:
if, elif, else, for, while, etc.
Example:

a = 2
if a == 1:
    print("Python")
elif a == 2:
    print("Java")
else:
    print("C++")
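Since the question also covers loops, here is a brief sketch of for and while:

```python
# for iterates over the items of any iterable
for language in ["Python", "Java", "C++"]:
    print(language)

# while repeats as long as its condition remains true
count = 0
while count < 3:
    count += 1
print(count)  # 3
```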


6. What kinds of string formatting are possible in Python?
Ans: String formatting is used to produce output with the required precision and layout. With the % operator, the common conversion specifiers are:
%s - String (or any object with a string representation, like numbers)

%d - Integers

%f - Floating point numbers

%.<number of digits>f - Floating point numbers with a fixed amount of digits to the right of the dot.

%x/%X - Integers in hex representation (lowercase/uppercase)
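A quick demonstration of these specifiers (str.format() and f-strings are the more modern alternatives):

```python
name = "Python"
version = 3
pi = 3.14159

print("%s %d" % (name, version))  # Python 3
print("%f" % pi)                  # 3.141590
print("%.2f" % pi)                # 3.14
print("%x / %X" % (255, 255))     # ff / FF
```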


7. Name some built-in functions that are widely used in scripting.
Ans: There are a lot of widely used built-in functions. Here are a few of them:
abs() - returns the absolute value of a number
dict() - creates a dictionary
dir() - tries to return the attributes of an object
divmod() - returns a tuple of the quotient and remainder
enumerate() - returns an enumerate object
eval() - evaluates a Python expression within the program
exec() - executes dynamically created Python code
filter() - constructs an iterator from the elements that are true
float() - returns a floating-point number from a number or string
format() - returns a formatted representation of a value
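A few of these in action:

```python
print(abs(-7))                              # 7
print(divmod(17, 5))                        # (3, 2)
print(list(enumerate(["a", "b"])))          # [(0, 'a'), (1, 'b')]
print(list(filter(None, [0, 1, "", "x"])))  # [1, 'x'] (only truthy elements)
print(float("2.5"))                         # 2.5
print(format(255, "x"))                     # ff
```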

8. How do you compile and run a Python script?
Ans: Python scripts are not compiled explicitly; the interpreter compiles them to bytecode automatically and runs them. For a Windows user:
1. First install Python.
2. Set the path variable (My Computer > Properties > Advanced System Settings > Environment Variables > create a new variable PYTHONPATH and add the values C:\Python\Lib;C:\Python\DLLs;C:\Python\Lib\lib-tk).
3. Write your Python program and save it, for example as "hello.py".
4. Open cmd.exe.
5. Go to the directory where you saved "hello.py", then type python hello.py and press the Enter key.


9. What is the difference between Python and shell script?
Ans: Python and shell script are both scripting languages, but Python has additional features that make it useful for building a wide variety of applications. Here are some differences between the two:

i. Python has a large ecosystem of external libraries; by importing them we can implement almost any functionality with less code and complexity than in shell script.
ii. Python has built-in data structures such as lists, dictionaries and tuples, so building a data structure is very easy; shell script has no comparable concepts, which makes it less suitable for such tasks.
iii. Python can be used for web development through its stock of libraries, which is not practical with shell script.

10. What is Pandas in Python?
Ans: Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language.

11. What is Cython?
Ans: Cython is an optimising static compiler for both the Python language and the extended Cython programming language. Cython is a superset of Python that additionally supports calling C functions and declaring C types on variables and class attributes. This allows the compiler to generate very efficient C code from Cython source.
For details please go to the following link:
http://docs.cython.org/en/latest/

12. Explain OOPS Feature in Python?
Ans: Python is an object-oriented programming language in addition to its scripting nature. It allows us to develop applications using an object-oriented approach; in Python, we can easily create and use classes and objects.

Major principles of object-oriented programming system are given below.

Object
Class
Method
Inheritance
Polymorphism
Data Abstraction
Encapsulation
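A minimal sketch of these principles (the Animal/Dog/Cat names are illustrative):

```python
class Animal:                          # class: a blueprint for objects
    def __init__(self, name):
        self.name = name               # encapsulated instance state

    def speak(self):                   # method intended to be overridden
        raise NotImplementedError

class Dog(Animal):                     # inheritance
    def speak(self):                   # polymorphism: same call, different behaviour
        return self.name + " says Woof"

class Cat(Animal):
    def speak(self):
        return self.name + " says Meow"

# objects: instances of the classes above
for pet in [Dog("Rex"), Cat("Tom")]:
    print(pet.speak())  # Rex says Woof / Tom says Meow
```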

13. What is dictionary in Python?
Ans: A dictionary is a special data type in Python; those familiar with Map/HashMap in Java will find it easy to understand. A dictionary is a collection of key-value pairs. In versions before Python 3.7 it was an unordered collection, but since Python 3.7 insertion order is preserved. Lookup and update operations on a dictionary are very fast.
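A short sketch of the basic operations:

```python
ages = {"Alice": 30, "Bob": 25}  # key-value pairs
ages["Carol"] = 35               # insert
ages["Bob"] = 26                 # update
print(ages["Bob"])               # 26
print("Dave" in ages)            # False
print(ages.get("Dave", 0))       # 0 (default when the key is missing)
```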

14. Difference between Tuple and List in Python?
Ans: Tuples and lists are both important and frequently used data types in Python, but there are a few basic differences between the two:
1. A tuple is immutable, whereas a list is mutable.
2. A tuple cannot be changed after creation (no insertion, deletion or update), whereas a list supports all of these operations.
3. By convention, a tuple is used for heterogeneous (mixed-type) data and a list for homogeneous items, although technically both can hold values of any type.
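The mutability difference in practice:

```python
nums_list = [1, 2, 3]
nums_list[0] = 10        # lists are mutable
nums_list.append(4)
print(nums_list)         # [10, 2, 3, 4]

nums_tuple = (1, 2, 3)
try:
    nums_tuple[0] = 10   # tuples are immutable: this raises TypeError
except TypeError as e:
    print("error:", e)
```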

15. Name some packages you have used in your programs.

Ans: scapy, matplotlib, kivy, nltk, keras, SQLAlchemy, etc.

16. What is anonymous function in Python and explain the use case.
Ans: In Python, an anonymous function is a function that is defined without a name.
While normal functions are defined using the def keyword, anonymous functions are defined using the lambda keyword.
A lambda function can have any number of arguments but only one expression; the expression is evaluated and returned. Lambda functions can be used wherever function objects are required.

double = lambda x: x * 2

print(double(5))  # Output: 10
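Because a lambda is an expression, it can be passed inline wherever a function object is expected, for example to filter() or as a sort key:

```python
nums = [5, 2, 8, 1]
evens = list(filter(lambda x: x % 2 == 0, nums))
print(evens)  # [2, 8]

words = ["banana", "fig", "apple"]
print(sorted(words, key=lambda w: len(w)))  # ['fig', 'apple', 'banana']
```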

Monday, December 17, 2018

Spark Interview Questions


Spark is a framework heavily used with Hadoop (Hadoop 2.0/YARN) to run analytical, streaming and machine-learning workloads very efficiently. I would like to take you through some of the questions that are frequently asked in interviews.

Spark:

1. What is Spark?
2. Explain the high-level architecture of Spark.
3. What are the Driver and Executor, and what is the difference between them?
4. What is a DAG?
5. How do you trace a failed job through the DAG?
6. What is persistence? Name the different levels of persistence.
7. Why do we use repartition and coalesce?
8. What are RDD, DataFrame and Dataset, and what are the differences between them?
9. How do you see the partitions after loading an input file in Spark?
10. How do we load/store a Hive table in Spark?
11. How do you read JSON and CSV files in Spark?
12. What is Spark Streaming?
13. Name some properties you have set in your project.
14. How can a Hive UDF be used in a Spark session?
15. Explain some troubleshooting you have done in Spark.
16. What are stages, jobs and tasks in Spark?
17. What is the difference between groupByKey and reduceByKey?
18. What is executor memory, and how did you set it in your project?
19. Name some Spark functions that have been used in your project.
20. What is a Spark UDF? Write its signature.

Thursday, December 13, 2018

Mutable and Immutable Collections

Mutable and Immutable Collections in Scala


In the Scala language, the collections framework is typically of two types.

  • Mutable
  • Immutable


In mutable collections we can change, add or remove elements in place. Immutable collections, by contrast, never change: you can still perform addition, deletion and update operations on an immutable collection, but each operation returns a new collection.

Package:
Mutable: scala.collection.mutable
Immutable: scala.collection.immutable

If you don't explicitly specify a package name, then by default it points to the immutable collections. Please find below some comparisons between the two types of collections.

Properties              | Mutable                                              | Immutable
Manipulation of variable| Possible (changed in place)                          | Not possible (operations return a new collection)
Speed                   | Faster for in-place updates                          | Slower when many modified copies are created
Memory allocation       | Allocated once when defined                          | New allocations at execution time as copies are created
Data security           | Less (state can be changed from anywhere)            | High (state can never change)
Use case                | Preferable while performing heavy operations on data | Preferable while exposing data to an end user/third party



Tuesday, December 11, 2018

Create DataFrame from RDD


Creating DataFrame from RDD in Spark:

RDD and DataFrame are both heavily used APIs in the Spark framework, and converting an RDD to a DataFrame is a very common task almost every Spark programmer runs into. I would like to take you through the most common ways to achieve this.

There are 2 most commonly used techniques.
- Inferring the Schema Using Reflection
- Programmatically specifying the schema

Inferring the Schema Using Reflection:

//Creating RDD
val rdd = sc.parallelize(Seq(1,2,3,4))
import spark.implicits._
//Creating Dataframe from RDD
val dataFrame = rdd.toDF()

//Creating Schema
case class Person(name: String, age: Int)
//Creating DataFrame using Reflection
val people = sc.textFile("SaleData.txt").map(_.split(",")).map(p => Person(p(0), p(1).toInt)).toDF()


Programmatically specifying the schema:

//Creating Schema function
import org.apache.spark.sql.types._

def dfSchema(columnNames: Seq[String]): StructType =
  StructType(
    Seq(
      StructField(name = columnNames(0), dataType = StringType, nullable = false),
      StructField(name = columnNames(1), dataType = StringType, nullable = false),
      StructField(name = columnNames(2), dataType = IntegerType, nullable = false)
    )
  )

//Calling Schema function
val schema = dfSchema(Seq("name", "gender", "age"))

//Creating RDD (contents elided; assume an RDD of CSV lines such as "name,gender,age")
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
val rdd: RDD[String] = ...

//Creating function to map a line of fields to a Row
def row(line: List[String]): Row = Row(line(0), line(1), line(2).toInt)

//Mapping Row Data
val data = rdd.map(_.split(",").to[List]).map(row)

//Creating DataFrame
val dataFrame = spark.createDataFrame(data, schema)


Saturday, September 29, 2018

Big Data Interview Questions

Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs. I would like to share some interview questions.

Hadoop:

1. What are Big Data and Hadoop?
2. What is MapReduce?
3. Which distribution are you using in your project?
4. Write a word-count program in MapReduce.
5. What is YARN?
6. What is the difference between Hadoop 1.x and 2.x?
7. Explain the high-level architecture of YARN.
8. What is the difference between the Application Master and the Application Manager?
9. How does the Node Manager work in the YARN framework?
10. What is data locality?
11. What is speculative execution?
12. What is rack awareness?
13. What is the replication factor? How do you set it?
14. Draw the diagram of the I/O read and write anatomy in the YARN framework.
15. How many primary nodes are maintained in YARN, and why?
16. What is ZooKeeper?
17. What different types of clusters are there?
18. What is the difference between local and cluster mode?
19. What is Mesos?
20. What is the difference between Flume and Kafka?


Hive:

1. What is Hive?
2. What is the difference between Hive and an RDBMS?
3. What is the underlying storage of Hive?
4. Are ACID properties supported in Hive or not?
5. How will you insert/append/overwrite a Hive table?
6. Which file formats have been used in your project?
7. What is a columnar format? Which columnar format have you used in your project?
8. What are the ORC and Parquet file formats?
9. How do you parse a CSV file in Hive?
10. What is the vectorization technique in Hive?
11. How will you import/export data between an RDBMS and Hive?
12. Which properties did you set in your project?
13. What are partitioning and bucketing? Explain the use case of each.
14. What are the limitations of partitioning?
15. How do you optimize I/O reads in Hive?
16. How many types of tables are there in Hive?
17. Why do we use external tables in Hive?
18. Name some SerDe properties that have been used in your project.
19. Which metastore has been used underlying Hive in your project?
20. Name some date and string manipulation functions.