UV Learn - Learning in Cloud made simple, easy and effective

Big Data And Hadoop

Big data has gone mainstream. Everywhere you look, people are talking about how to unlock value from the massive, ever-growing volumes of information residing within and beyond traditional data repositories.

Objectives of Course

  1. To build participants' knowledge and skills in the Big Data field.
  2. To bridge the large gap between the demand for and supply of manpower trained in Big Data.
  3. To create awareness of the research opportunities in this field.

Learn the basics of the Hadoop Distributed File System (HDFS) and the MapReduce framework, how to write programs against their APIs, and design techniques for larger workflows.

Pre-requisites

Laptop/PC with a 64-bit processor and a minimum of 2 GB RAM (for programming practice alongside the sessions)

Windows Users:

64-bit OS, min 4 GB RAM
VMware Player 5.0.0
Linux VM – Ubuntu 12.04 LTS
Eclipse 3.6+
PuTTY – For opening Telnet sessions to the Linux VM
WinSCP – For transferring files between Windows and Linux VM

Linux/Mac Users (preferably a 64-bit machine):

Min 4 GB RAM
Eclipse 3.6+
JDK 1.6 or higher installed on your machine
SSH installed

1. Introduction

  1. What is Big Data?
  2. Why Big Data?
  3. Limitations of Big Data
  4. Hadoop Background
  5. The Hadoop Way

2. Getting Started with Hadoop

  1. Setting up VM Hadoop Environment
  2. Installing VMware Player
  3. Setting up the Virtual Environment (Virtual Machine User Accounts; Running a Hadoop Job; Accessing the VM via SSH; Shutting Down the VM)

3. Hadoop Architecture

  1. Hadoop Cluster on commodity hardware
  2. Hadoop core services and components
  3. Regular file system vs. Hadoop
  4. HDFS layer
  5. HDFS operation principle
  6. HDFS 1.0 & HDFS 2.0
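
As a rough illustration of the HDFS operation principle, the sketch below estimates how a file is split into blocks and replicated. The 128 MB block size and the replication factor of 3 are assumptions (common defaults, configurable per cluster), and `hdfs_footprint` is a hypothetical helper name, not part of any Hadoop API.

```python
import math

BLOCK_SIZE_MB = 128  # assumed block size; configurable in a real cluster
REPLICATION = 3      # assumed replication factor; also configurable


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (number of blocks, total block replicas, raw storage in MB)."""
    # A file is cut into fixed-size blocks; the last block may be partly filled.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored 'replication' times on different nodes.
    replicas = blocks * replication
    # A partly filled last block only occupies its actual size on disk.
    raw_mb = file_size_mb * replication
    return blocks, replicas, raw_mb


print(hdfs_footprint(1000))  # (8, 24, 3000)
```

Under these assumptions, a 1000 MB file occupies 8 blocks, stored as 24 block replicas across the cluster.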

4. Hadoop Deployment

  1. Hadoop installation
  2. Single node and multi node configuration
  3. Hadoop configuration in a cluster environment

5. MapReduce

  1. MapReduce concepts
  2. Hadoop MapReduce example
  3. Hadoop MapReduce requirements
  4. Steps of Hadoop MapReduce
  5. What the user supplies to MapReduce
  6. MapReduce framework
  7. Basics of MapReduce programming
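
The map, shuffle, and reduce steps listed above can be sketched with the classic word count example. This is a minimal single-process simulation in plain Python, not the Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    # Keys are processed in sorted order, as reducers see them sorted.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


print(word_count(["big data big ideas", "data moves"]))
# {'big': 2, 'data': 2, 'ideas': 1, 'moves': 1}
```

In a real cluster the framework runs the map and reduce functions on many machines and performs the shuffle itself; the user supplies only the two functions and the job configuration.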

6. Advanced MapReduce

  1. Custom Data Types
  2. Input formats
  3. Output formats
  4. Combiners and Partitioners
  5. Error handling and Unit Testing
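
To make the roles of combiners and partitioners concrete, here is a small plain-Python sketch, again a simulation rather than Hadoop code. The names `combine` and `partition` are hypothetical, and CRC32 is used only as a stand-in for a stable hash function.

```python
import zlib
from collections import Counter


def combine(pairs):
    """Combiner: pre-aggregate one mapper's output before it crosses the network."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partition(key, num_reducers):
    """Partitioner: choose the reducer for a key with a stable hash.

    CRC32 is an assumption made for this sketch; any deterministic hash
    works, as long as equal keys always land on the same reducer.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


mapper_output = [("big", 1), ("data", 1), ("big", 1)]
combined = combine(mapper_output)
print(combined)  # [('big', 2), ('data', 1)]

for key, count in combined:
    print(key, count, "-> reducer", partition(key, 4))
```

The combiner shrinks three records to two before the shuffle, and the partitioner guarantees that every occurrence of a key is routed to the same reducer.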

7. PIG

  1. Introduction to PIG
  2. Why PIG
  3. Comparison between PIG and SQL
  4. Installing and configuring PIG
  5. Running PIG
  6. PIG Latin

8. HIVE

  1. Why another data warehousing system
  2. What is HIVE
  3. Type System
  4. Data Model – Tables, Partitions, Buckets, External Tables
  5. Serialization/De-serialization
  6. Hive file formats
  7. System Architecture and components
  8. Hive Query Language
  9. HIVE: Installing, running and programming
  10. Difference between Hive and PIG

9. HBase

  1. HBase introduction
  2. HBase history
  3. Who uses HBase
  4. When to use HBase
  5. HBase Data Model
  6. HBase Families
  7. HBase Components
  8. Row Distribution between region servers
  9. Data Storage
  10. HBase Master
  11. HBase and ZooKeeper
  12. HBase Deployment
  13. Installation of HBase
  14. Configuration of HBase

10. Cloudera

  1. What is Cloudera
  2. Cloudera Enterprise pictorial view
  3. Downloading Cloudera QuickStart VM
  4. Starting the Cloudera VM
  5. Exploring the Welcome Page
  6. Understanding Hue
  7. Understanding Cloudera Manager

11. ZooKeeper and Sqoop

  1. Introduction to ZooKeeper
  2. What is ZooKeeper
  3. Challenges faced in distributed applications
  4. Coordination
  5. ZooKeeper: Goals and Uses
  6. ZooKeeper: Entities, Data Model, Services
  7. Client APIs
  8. Introduction to Sqoop (why, what, processing, under the hood)
  9. Importing data into Hive
  10. Importing data into HBase
  11. Exporting data from Hadoop using Sqoop
  12. Sqoop Connectors
  13. Connecting MongoDB (NoSQL database)

12. Ecosystem and its Components

  1. Why Flume and Chukwa
  2. What is Apache Flume
  3. Flume Model
  4. Scalability in Flume
  5. Chukwa
  6. Chukwa Architecture
  7. Chukwa Capability
  8. Chukwa Agent
  9. Introduction to Apache Oozie
  10. Apache Oozie Workflow
  11. Introduction to Mahout
  12. Introduction to YARN
  13. YARN Architecture
  14. Apache Cassandra
  15. Why Apache Cassandra

13. Hadoop Administration and Troubleshooting

  1. Different configurations of Hadoop cluster
  2. Performance monitoring
  3. Performance tuning
  4. Troubleshooting and Log observation
© 2019 UV Learn. All rights reserved.