Advanced Big Data and Hadoop Course & Certification

Course Schedule and Upcoming Batches

Please note the upcoming batch schedules. The course is delivered either as instructor-led online classes or as classroom sessions.

Self Paced Course
Class Mode
Classes Start
Price (INR)
Price (USD)
Self Paced
6 Weeks
5% Discount for limited period only
Class Mode
Classes Start
Time (IST)
Price (INR)
Price (USD)
To be confirmed
6 Weeks

We are running a special Combo Offer for the Self Paced Courses on Big Data and on Data and Dimensional Modeling. Click here to know more, or call our sales representative.

*Service Tax rate is revised to 14.5% from 15 November 2015. This includes S.B. cess of 0.5%.

*Classroom location is Bangalore.

If you want to register now and take up this course later, you can do so in the "Register" section.

If you have any query, you can call or mail us, or send your query through the "Send a Query" section.

Sample Class Video

Check out the Introduction Class video here:

About the Course

Learn the Big Data concepts. Learn why Hadoop has become so important for processing huge amounts of data and how it is changing the rules of the game. Learn how Big Data fits into the overall scheme of Data Science and Business Intelligence.

This course is designed to give learners general knowledge of Big Data systems and practical knowledge of Hadoop. Learn Hadoop in detail, including MapReduce, Pig, Hive, HBase, ZooKeeper, Oozie, MRv2, Flume, Sqoop and other Hadoop-related tools. Get hands-on exposure to Hadoop by working on a project.

After completing this course, you will be able to appreciate the importance of Big Data technologies and will become comfortable working on Hadoop. This is the right course if you want to start your career in Big Data.

Course Objective

In this course, you will:

  1. Learn about Big Data and why it is becoming so important nowadays.
  2. Understand the differences between data science technologies such as Business Intelligence, Data Warehousing, Analytics and Big Data, and how they work together in a typical data science environment.
  3. Learn the basic concepts behind Hadoop's hugely successful MapReduce architecture.
  4. Learn about Hadoop Architecture, the Hadoop Ecosystem and the Hadoop Family.
  5. Learn the data loading tools Flume and Sqoop.
  6. Learn advanced MapReduce topics such as the MapReduce execution framework, MapReduce user interfaces, configuration, job environment, Hadoop data types, etc.
  7. Learn Pig and Pig Latin.
  8. Learn Hive and HiveQL.
  9. Learn NoSQL, HBase and ZooKeeper.
  10. Learn Oozie and HCatalog.
  11. Learn Hadoop 2.0, MRv2 (or YARN).
  12. Learn how to integrate Hadoop with Business Intelligence and Analytics tools.
  13. Learn how to create a Hadoop cluster.
  14. Learn Hadoop development best practices.
  15. Work on 17 comprehensive lab exercises on Hadoop and start writing your own programs using Hadoop.
  16. Work on end-to-end projects on Hadoop.

Why is it the right course for you?

If you aspire to become a Data Scientist with a focus on Big Data, then this is the right course for you. This course will equip you with an overall understanding of Hadoop. It is beneficial for Software Professionals, Business Intelligence/Analytics/ETL developers, Managers and Quality Assurance Professionals, as well as for any other professional who wants to gain a good understanding of Big Data.


Familiarity with Java is helpful, but it is not a must. We will guide you to the required level of understanding in Java.

Course Curriculum

1) Introduction to Big Data

Learning Objectives

In this module you will understand what Big Data is and why there is so much hype around it. What types of problems can Big Data solve? What are the current mega projects in the Big Data space? How is this becoming a big business opportunity? What are the best practices of Big Data?


What is Big Data?, Why Big Data?, Evolution of Big Data, Big Data in Science and Research, Big Data in Government, Big Data in the Private Sector, Big Data Business Opportunity, Some use cases of Big Data, Big Data Critique, Big Data Best Practices.

2) What is the difference between Business Intelligence, Analytics and Big Data?

Learning Objectives

In this module you will get a clear idea of the distinctions between BI, Analytics and Big Data, and learn how these Data Science technologies will coexist rather than kill each other off, as speculated by the media.

3) Framework of Big Data and Data Science Study

Learning Objectives

In this module you will learn about the various disciplines of data science and how they are interlinked. You will also learn about various learning and job opportunities and how you can transition into next-generation roles.

4) Big Data and Hadoop Architecture

Learning Objectives

In this module you will learn the high-level architecture behind Big Data and how Hadoop became the de facto standard for Big Data. You will understand why traditional database technologies were not sufficient to handle such large amounts of data and how Hadoop addresses this problem.


Brewer's Theorem, MapReduce Architecture, SMAQ stack for Big Data, Big Data on Cloud.

5) Hadoop and Hadoop Eco-system

Learning Objectives

In this module you will learn about the common Hadoop ecosystem components, Hadoop architecture, HDFS and the MapReduce framework, Hadoop cluster architecture and setup, the important configuration files in a Hadoop cluster, and how to install Hadoop on your local machine and work with it.


Hadoop, Hadoop MapReduce, Hadoop Server Roles: NameNode, Secondary NameNode and DataNode, Hadoop Cluster Architecture, Hadoop Installation and Configuration, Hadoop Installation and Configuration on Learners' machines.

6) Data Loading Techniques

Learning Objectives

In this module you will learn the data loading techniques Flume and Sqoop.



7) Advanced Map-Reduce

Learning Objectives

In this module, you will learn the Hadoop MapReduce framework. You will learn what the input and output formats are and how they are used. You will also learn advanced MapReduce concepts such as Counters, Schedulers, Custom Writables, Compression, Serialization, Tuning, Error Handling, etc.


MapReduce execution framework, MapReduce User Interfaces, Mapper, Reducer, Shuffle, Sort, Reduce, Partitioner, Reporter, Configuring a MapReduce Job, MapReduce Job Environment, Job Submission and Monitoring, Job Authorization, Job Control, Job Credentials, Hadoop Data Types, Job Input and Output Formats (Input Splits and Records, Text Input, Binary Input, Multiple Inputs, Text Output, Binary Output, Multiple Outputs), Partitioners and Combiners, Advanced MapReduce, Counters, Custom Writables, Unit Testing: JUnit and MRUnit testing framework, Error Handling, Hadoop Project: MapReduce Programming.
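
To make the phases named in this module concrete (Mapper, Partitioner, Shuffle, Sort, Reducer), here is a minimal single-process word count in plain Python. It is only a sketch of the idea and does not use Hadoop's Java API; the function names (map_words, partition, reduce_counts, run_job) and the two-reducer setup are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def map_words(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def partition(key, num_reducers):
    """Partitioner: choose which reducer receives a given key.

    A simple character-sum is used here so the result is repeatable.
    Hadoop's default HashPartitioner uses the key's hash code modulo
    the number of reducers.
    """
    return sum(ord(ch) for ch in key) % num_reducers


def reduce_counts(word, counts):
    """Reducer: sum all the counts emitted for one word."""
    return word, sum(counts)


def run_job(lines, num_reducers=2):
    # Map phase: run the mapper over every record and route each
    # intermediate pair to a reducer bucket chosen by the partitioner.
    buckets = defaultdict(list)
    for line in lines:
        for word, count in map_words(line):
            buckets[partition(word, num_reducers)].append((word, count))

    # Shuffle and sort: inside each bucket, sort by key so that equal
    # keys sit next to each other and can be grouped.
    results = {}
    for pairs in buckets.values():
        pairs.sort(key=itemgetter(0))
        for word, group in groupby(pairs, key=itemgetter(0)):
            # Reduce phase: one reducer call per distinct key.
            key, total = reduce_counts(word, (c for _, c in group))
            results[key] = total
    return results


if __name__ == "__main__":
    print(run_job(["big data big ideas", "hadoop handles big data"]))
```

In a real Hadoop job the buckets live on different machines and the framework performs the shuffle and sort for you; only the mapper, reducer and (optionally) the partitioner are written by the developer.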

8) Pig and Pig Latin

Learning Objectives

You will learn why Pig is an important component of the Hadoop framework and how Pig makes it easier to create MapReduce programs. You will learn Pig Latin commands and will be able to write Pig Latin scripts.


Introduction to Pig, Pig Keywords, Pig Installation and execution, Pig basic commands, Exercise on Data Processing in Pig, Uploading data files, Creating scripts, running scripts.

9) Hive and HiveQL

Learning Objectives

In this module you will learn about Hive and HiveQL, how to install Apache Hive, how to load and query data in Hive, and so on.


Hive Architecture, Hive Installation Steps, Hive Data Model, Hive Data Types, How to Process Data with Apache Hive (Exercise on Hive), Introduction to HiveQL, Data Units, Built-in Operators and Functions, Operators on Complex Types, Built-in Functions, Hive Data Definition Language (DDL) and Data Manipulation Language (DML), Browsing Tables and Partitions, Altering Tables, Truncate Table, Dropping Tables and Partitions, Loading Data, Loading files into tables, Inserting data into Hive Tables from queries, Partitions and Buckets, Create/Drop/Alter Database, Querying and Inserting Data.

10) NoSQL, HBase and Zookeeper

Learning Objectives

You will learn the importance of NoSQL databases, learn about HBase in detail, and learn how to load and query data in HBase. You will also learn about ZooKeeper and why HBase uses it, and see an example ZooKeeper application.


NoSQL databases, Introduction to HBase, HBase Architecture details.

Introduction to ZooKeeper, Data model and the hierarchical namespace, Nodes and ephemeral nodes, ZooKeeper example.


11) Oozie

Learning Objectives

In this module you will learn about Oozie, the importance of a workflow scheduler, and how to define, package and deploy an Oozie workflow.


Introduction to Oozie, Defining an Oozie workflow, Packaging and deploying an Oozie workflow application.

12) HCatalog

Learning Objectives

You will learn how HCatalog makes it easier to read and write data on the grid, and why it is important to give users data format independence while working with Pig, MapReduce or Hive.


Introduction to HCatalog, HCatalog Architecture, Data Flow example.

13) Hadoop 2.0, MRv2 (or YARN)

Learning Objectives

In this module, you will learn new features in Hadoop 2.0, namely YARN (also called MRv2), NameNode High Availability, HDFS Federation etc.


Fair Scheduler, Capacity Scheduler, NameNode High Availability, Introduction to YARN, Programming in the YARN framework.

14) Integration of Hadoop with Business Intelligence and Analytics tools

Learning Objectives

In this module, you will learn how Hadoop is becoming an inseparable part of the Data Science ecosystem and how Hadoop is integrated with BI and Analytics tools.


How Hadoop fits in the ecosystem with BI and Analytics tools, Hadoop integration with BIRT, Hadoop integration with R.

15) Hadoop Cluster Setup

Learning Objectives

In this module, you will learn how to set up a Hadoop cluster using Ambari.


Hadoop Cluster Setup using Ambari.

16) Introduction to other Apache tools for Big Data and Machine Learning tasks

Learning Objectives

In this module, you will learn about some other big data and machine learning tools under the Apache umbrella.


  1. Apache Mahout
  2. Apache Spark
  3. Apache Storm

17) Labs

Learning Objectives

This course includes many lab exercises. They give you hands-on practice with Hadoop and help you gain practical knowledge of Hadoop from beginner to advanced level.

Lab 1: Hadoop

  • Installation of Hadoop
  • Installation of required software such as VMware, PuTTY, FileZilla, etc.
  • Starting the Virtual Machine and Hadoop
  • Test to check if Hadoop is running properly

Lab 2 : HDFS

  • To demonstrate how data is stored in HDFS.
  • Upload File to HDFS
  • General HDFS commands such as ls, du, tail, mkdir, put, etc.

Lab 3 : Loading Data to HDFS

  • Loading Data from Local system (using put etc)
  • Loading Data from a Database to HDFS using Sqoop
  • Export Data from HDFS to a Database using Sqoop

Lab 4 : Run a MapReduce Job

  • Set up and run a job using MapReduce

Lab 5 : Getting Started with Pig & Pig Latin

  • Execute some basic Pig commands, load data into a relation, and store a relation into a folder in HDFS using different formats.
  • Run the Pig scripts
  • Define Schema, store, analyze and query the datasets.
  • Running operations like Group By, ForEach, Limit, Filter, Dump, Store
  • Advanced Pig Programming - Parameter, Parallel, Flatten, Order by, Case, Inner join, Outer join, Replicated join, Cogroup
  • Splitting Datasets
  • Joining Datasets
  • Preparing data for Hive

Lab 6 : Analyzing Website Log Data using Pig

  • Analyze the data generated by visitors to a website
  • The file has data about users, total time spent on the website and website URL
  • Use Pig to determine statistical information about the web sessions, such as the length of each session and the average and median lengths of all sessions
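
The statistics described in this lab can be prototyped before you write the Pig script. The sketch below uses plain Python rather than Pig Latin; the record layout (user, seconds spent, URL), the sample values, and the choice to treat all of a user's visits as one session are assumptions for illustration, not the lab's actual data file.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical sample records: (user, seconds spent, url).
SAMPLE_VISITS = [
    ("alice", 30, "/home"),
    ("alice", 90, "/courses"),
    ("bob", 45, "/home"),
    ("carol", 15, "/home"),
    ("carol", 60, "/faq"),
    ("carol", 25, "/contact"),
]


def session_lengths(visits):
    """Total time per user, treating each user's visits as one session."""
    totals = defaultdict(int)
    for user, seconds, _url in visits:
        totals[user] += seconds
    return dict(totals)


def summarize(visits):
    """Return per-session lengths plus their average and median."""
    lengths = session_lengths(visits)
    values = list(lengths.values())
    return lengths, mean(values), median(values)


if __name__ == "__main__":
    lengths, average, middle = summarize(SAMPLE_VISITS)
    print(lengths, average, middle)
```

In Pig, the grouping step corresponds to a GROUP ... BY on the user field followed by an aggregate inside FOREACH ... GENERATE.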

Lab 7 : Analyzing Stock Market Data using Pig

  • Analyze Stock Market data and find out the range of stock prices of some of the stocks.
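
As a rough illustration of what "range of stock prices" means here, the following plain Python sketch computes the lowest price, highest price and their difference per stock. The symbols, prices and the (symbol, price) row layout are hypothetical; the lab itself does this in Pig on its own dataset.

```python
# Hypothetical (symbol, closing price) rows.
SAMPLE_PRICES = [
    ("AAA", 10.0),
    ("AAA", 12.5),
    ("BBB", 7.0),
    ("AAA", 9.5),
    ("BBB", 8.25),
]


def price_ranges(rows):
    """Return {symbol: (low, high, high - low)} for every symbol seen."""
    lows, highs = {}, {}
    for symbol, price in rows:
        # Track the running minimum and maximum for each symbol.
        lows[symbol] = min(price, lows.get(symbol, price))
        highs[symbol] = max(price, highs.get(symbol, price))
    return {s: (lows[s], highs[s], highs[s] - lows[s]) for s in lows}


if __name__ == "__main__":
    print(price_ranges(SAMPLE_PRICES))
```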

Lab 8 : Hive

  • Create Hive Table and store data from a file
  • Create partition
  • Query the data stored as a Table and find out some useful insights
  • Write multiple Hive Queries and become comfortable with Hive
  • Compute nGram in Hive
  • Join Datasets in Hive

Lab 9 : HCatalog

  • Write a program for HCatalog by using Pig

Lab 10 : Advanced Hive Programming

  • Run Hive queries and use some of the more advanced features of Hive, such as views and window functions.

Lab 11 : YARN application

  • Execute the distributed shell YARN application

Lab 12 : Oozie workflow

  • Define and run an Oozie workflow

Lab 13 : Hadoop Integration with R

  • Integrate R with Hadoop

Lab 14 : Run Word Count Program on R & Hadoop

  • Run Word Count Program on R & Hadoop

Lab 15 : Website visitor prediction using R and Hadoop

  • Predict the number of website visitors using the predictive power of R on data stored in Hadoop.

Lab 16 : Integration of Hadoop with BIRT

  • Integrate BIRT (Open source BI tool) with Hadoop and create reports by directly connecting to Hadoop.

Lab 17 : Hadoop Cluster Setup

  • Set up a three-node Hadoop cluster using Ambari.

18) Projects

Learning Objectives

In this module you will work on some Hadoop projects and get first-hand experience of solving real-world problems using Hadoop.

Project 1: Extract, Refine and Visualize Sentiment Data

This project demonstrates how to extract live social media data, how to refine and attach a sentiment score to each set of data and how to analyze and visualize this refined sentiment data. This is a live project where you can use your preferred keywords to extract and download live social media data.

Project 2: Visualize Website Clickstream Data

This project demonstrates how to refine website clickstream data and how to analyze and visualize this refined data.

Project Assessment

Project assessment will be done by the Trainer, and a grade will be awarded based on performance in the project. Grading is on a scale of 1 to 5, with 5 being outstanding performance and 1 being unsatisfactory performance. The scale is as below:

  5. Outstanding
  4. Very Good
  3. Good
  2. Average
  1. Unsatisfactory

Course Duration

Online Instructor Led Classes OR Classroom Sessions: 32 Hours

There will be eight instructor-led interactive online classes or classroom sessions, depending on the learning mode. Each class will last approximately four hours. If you miss a class, you can reschedule it in a different batch or access the class video recordings at any time.

Lab Work : 40 Hours

There will be 40 hours of lab exercises. These lab exercises will be guided by our expert faculty.

Project Work : 30 Hours

Project work needs to be carried out during the course. We keep adding new projects to each batch to give our learners real-life experience of creating and running Big Data applications.

Course Work : 40 Hours

Study material (e-books) of 40 hours or more will be given as course work to be completed.

Exam : 1 Hour

A proctored exam of 1 hour will be conducted for final assessment.

Big Data Course Features

  • 8 days of classes (Classroom or Online)
  • 40 Hrs of Lab Exercises with proprietary VM
  • 30 Hrs of Real Time Industry based Projects
  • 40 Hrs of High Quality e-Learning content
  • Anytime doubt clarification policy
  • Advanced modules like YARN, Flume, Oozie, Mahout & Ambari
  • Java Essentials for Hadoop guide included
  • Certified Big Data and Hadoop Developer
  • Hadoop Installation Procedure Included
  • 45 PDUs offered towards PMI certification
  • Quizzes in each class
  • Lifetime access to video lectures
  • Differences between BI, Analytics and Big Data
  • End-to-end knowledge of Big Data and Hadoop
  • Lifetime access to BIWHIZ Support for any Big Data query

Other Features

  • Live Classes

    Live Instructor Led classes from Industry Experts. Option to choose from Online or Classroom Lectures. Case studies from real life projects.

  • Ongoing Research

    Research team works hard to bring out the latest innovations and best practices of course subjects. Courses are evolving continuously; they never get stale.

  • Be More Productive

    Get work-related tips and perform your work more efficiently. Once you know the tricks of the trade, you become more productive.

  • Industry Experts

    Classes are conducted by Industry Experts. Learners gain from a world-class curriculum and from the extensive experience of the Trainers as well.

  • Award of PDUs

    We award you 45 PDUs which can be used towards PMI certification requirements. This is completely free, but you need to raise a separate request for it.

  • Learning Material

    Learners get unlimited access to online and offline materials. Don't worry if you miss a class; we will provide you a repeat class online or offline.

  • Learning Support

    Learners are encouraged to ask online and offline questions. Our team of Trainers makes it a priority to answer these questions.

  • Money-back Guarantee

    If you are not satisfied with the quality of the "Classroom Course", you can take a full refund within seven days of the first class. However, there is NO REFUND for the "Self Paced Course", because we deliver all the training material at once in soft copy format and cannot undo that. Please buy the Self Paced Course only when you are 100% convinced about the quality.

  • Easy Reschedule

    Have you missed a class? Don't worry! You can watch the class videos or request a reschedule. We will invite you to the next class for free.

  • Career Support

    The learner's resume is reviewed and a one-to-one discussion is arranged with an expert to advise on career roadmap and job opportunities. For details, visit Career Centre.

System Requirements

Operating System: 32-bit or 64-bit OS (Windows XP, Windows 7, Windows 8 or Mac OS X)

RAM: Minimum 4 GB; 8 GB required to run Ambari and HBase

Processor: i3 equivalent or better

Browser: Chrome 25+, IE 9+, Safari 6+

Virtualization enabled in BIOS

System/Laptop with complete setup will be provided in the classroom; you can also bring your own laptop.

If you are attending Instructor Led Online training then please make sure that your system meets these requirements.


Certificate of Participation

A certificate of participation will be awarded after participating in the training program. The name of certificate is "BIWHIZ CERTIFICATE OF PARTICIPATION ON BIG DATA".

Certification after Completion

A certification will be issued after assessment of assignments, course work requirements, projects and a written test as per the course curriculum. You must successfully complete all the requirements and pass the written test to receive it. The name of the certification is "BIWHIZ CERTIFICATION ON BIG DATA".

Certificate Issuing Authority

BIWHIZ is part of the company "Business Intelligence Consultant and Services LLP". Certificates are issued by "Business Intelligence Consultant and Services LLP". This is a company registered with the Ministry of Corporate Affairs, Government of India.

Sample Exam Questions

This sample is for illustrative purposes only, and only basic-level questions are shown here. Actual exam questions may be completely different and in a different format as well. Please contact your coordinator to learn more about the prevailing exam format.

ZooKeeper is?

Choose one.

  1. MapReduce framework
  2. Workflow scheduler
  3. Coordination service
  4. Data loading tool

Oozie is?

Choose one.

  1. Data Access Language
  2. Columnar Database
  3. Machine Learning Library
  4. Workflow scheduler

HCatalog is?

Choose one.

  1. Database Lock
  2. Storage management
  3. Database catalog
  4. Web Interface

Big Data is?

Choose one.

  1. Huge Volume of Data
  2. Data stored in Hadoop
  3. Unstructured Data
  4. Social Media Data

Only Hadoop can handle Big Data?

Choose one.

  1. True
  2. False
  3. Yes, so far
  4. Can't Say

NoSQL Databases support?

Choose one.

  1. No Query
  2. SQL like query
  3. No DDL
  4. Key-Value Store

Frequently Asked Questions

  1. Is it necessary to have an experience in Java before starting this course?

    Absolutely not! Only a minimal level of Java knowledge is required, and that too can be learnt during the course. However, it is better if you have basic computer knowledge and have worked in any programming language. We will cover Hadoop topics from basic to advanced.

  2. I have good knowledge of BI/Data Warehousing. Is this course for me? What value would it add to my career?

    Business Intelligence, Data Warehousing, Analytics and Big Data together form the science of data, or the field of Information Management. Even today some of your colleagues might be working on Big Data techniques. It's good to have this skill if you belong to the BI community. It will enable you to work in diverse areas and grow as a complete Data Science professional.

  3. I do coding in ASP.NET. My domain is e-commerce applications. Can I do this course? And what would be the professional advantages of doing so?

    Yes, you can definitely enroll for this course. A lot of companies need professionals who have coding experience plus Big Data/Hadoop knowledge. Data problems can come from anywhere, whether it is e-commerce, retail, banking, finance or any other domain. You can switch to the Big Data domain in your company once you have good knowledge of it.

  4. Where would I be developing and running Hadoop Projects?

    We will help you set up a Hadoop environment on your machine. If you are attending classroom sessions, we will provide the required systems with all the necessary setup.

  5. Do I need a very fast Internet connection to attend online courses?

    Generally a 1 MB connection is sufficient; lower-bandwidth connections also work fine.

  6. What are the system requirements to install Hadoop?

    4 GB RAM and i3 processor or more.

  7. I heard that Hadoop runs on Linux, but I have Windows OS. Will I be able to install it?

    Yes, you can do it through VMPlayer; we will guide you through the installation on Windows.

  8. Can I request another class if I miss one?

    You can watch the recorded session video, or you can attend the same class in the next batch.

  9. I like your course but I am not comfortable with online classes. What can I do?

    You can attend our classroom based courses.

  10. Why is Certification included in the course?

    Certification ensures that the learner has understood the course content and that BIWHIZ has verified the learner's knowledge level. It is proof that you have achieved a certain level of expertise in the course, and any authorized organization can verify that with us.

  11. What if I am not able to clear the Certification exam in the first attempt?

    You can take extra attempts for free.

  12. What is the learning support, and is it available after completing the course as well?

    You can raise your queries and doubts during and after the training period. All queries will be resolved by Big Data/Hadoop experts.

  13. Do you help with career-related issues as well, such as reviewing my resume and mentoring for my career growth?

    Yes, we have a panel of Big Data experts who will guide and mentor you; please check Career Centre for more details.

  14. Will you sell my data to recruitment/marketing companies, or use it for recruitment or any other activities not related to training?

    No, never. Your details are highly confidential and safe with us. However, if you are willing to accept such calls, you can inform us in advance with a specific need and we will contact you only for those specific requirements. For a detailed privacy policy, please check Terms, Conditions & Privacy Policy.

Register Here

Please register for this course here. Even if you are not ready to enroll now, you can register and be notified when our next batch starts.

Please Fill Your Details

Send a Query

Do you have a query for us? Please use the form below to send it. We will get back to you soon. Have a great day!

Please Fill Your Details & Query