Saturday, September 29, 2018

Big Data Interview Questions

Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides large-scale storage for any kind of data, distributed processing power, and the ability to run a very large number of concurrent tasks or jobs. Below are some interview questions on Hadoop and Hive that I would like to share.

Hadoop:

1. What are Big Data and Hadoop?
2. What is MapReduce?
3. Which Hadoop distribution are you using in your project?
4. Write a word count program in MapReduce.
5. What is YARN?
6. What is the difference between Hadoop 1.x and 2.x?
7. Explain the high-level architecture of YARN.
8. What is the difference between the Application Master and the Application Manager?
9. How does the NodeManager work in the YARN framework?
10. What is data locality?
11. What is speculative execution?
12. What is rack awareness?
13. What is the replication factor, and how do you set it?
14. Draw a diagram of the I/O read and write anatomy in the YARN framework.
15. How many primary nodes are maintained in YARN, and why?
16. What is ZooKeeper?
17. What different types of clusters are there?
18. What is the difference between local mode and cluster mode?
19. What is Mesos?
20. What is the difference between Flume and Kafka?
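
For question 4, it can help to have the shape of a word count in mind before writing it against the real Hadoop API. The sketch below is only an illustration in plain Python, not Hadoop code: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are made up for this example, and the shuffle step is a simple in-memory stand-in for what the framework does between mappers and reducers.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key (in-memory stand-in for the framework)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data big cluster", "data node"]
    print(word_count(sample))
```

In an interview, the same three roles map onto a Mapper class, the framework's shuffle and sort, and a Reducer class.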


Hive:

1. What is Hive?
2. What is the difference between Hive and an RDBMS?
3. What is the underlying storage of Hive?
4. Does Hive support ACID properties?
5. How do you insert into, append to, or overwrite a Hive table?
6. Which file formats have you used in your project?
7. What is a columnar format? Which columnar formats have you used in your project?
8. What are the ORC and Parquet file formats?
9. How do you parse a CSV file in Hive?
10. What is vectorization in Hive?
11. How do you import and export data between an RDBMS and Hive?
12. Which Hive properties did you set in your project?
13. What are partitioning and bucketing? Explain a use case for each.
14. What are the limitations of partitioning?
15. How do you optimize I/O reads in Hive?
16. How many types of tables are there in Hive?
17. Why do we use external tables in Hive?
18. Name some SerDe properties that you have used in your project.
19. Which metastore does Hive use in your project?
20. Name some date and string manipulation functions.
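
For question 13, a rough way to picture the difference: a partition value selects a directory named column=value, while a bucket is chosen by hashing the column value modulo the number of buckets. The sketch below is a plain Python illustration under loud assumptions: the helper names and the example path are hypothetical, and crc32 is only a stand-in hash, not the hash function Hive actually uses.

```python
import zlib


def partition_dir(table_root, partition_col, value):
    """Partitioning: each distinct value gets its own column=value directory."""
    return f"{table_root}/{partition_col}={value}"


def bucket_index(key, num_buckets):
    """Bucketing: hash the key and take it modulo the bucket count.

    crc32 is an assumed stand-in; Hive uses its own hash function.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


if __name__ == "__main__":
    # Hypothetical table location and columns, for illustration only.
    print(partition_dir("/warehouse/sales", "country", "IN"))
    for customer_id in (101, 102, 103):
        print(customer_id, "-> bucket", bucket_index(customer_id, 4))
```

The picture also hints at the answer to question 14: a partition column with many distinct values produces many directories, whereas the number of buckets is fixed up front.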



