Hadoop Administration

Course curriculum


Module 1 – Introduction to Hadoop
• The volume of data processed in today's world
• What Hadoop is and why it is important
• Hadoop comparison with traditional systems
• Hadoop history
• Hadoop main components and architecture

Module 2 – Hadoop Distributed File System (HDFS)
• HDFS overview and design
• HDFS architecture
• HDFS file storage
• Component failures and recoveries
• Block placement
• Balancing the Hadoop cluster

Module 3 – Planning your Hadoop cluster
• Planning a Hadoop cluster and its capacity
• Hadoop software and hardware configuration
• HDFS Block replication and rack awareness
• Network topology for Hadoop cluster

Module 4 – Hadoop Deployment
• Different Hadoop deployment types
• Hadoop distribution options
• Hadoop competitors
• Hadoop installation procedure
• Distributed cluster architecture
Lab: Hadoop Installation
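The installation lab can be previewed as the following single-node (pseudo-distributed) sketch; the version number and install paths are assumptions and will vary with your chosen distribution:

```shell
# Unpack the Hadoop tarball (version is an assumption -- adjust to yours).
tar -xzf hadoop-3.3.6.tar.gz -C /opt
export HADOOP_HOME=/opt/hadoop-3.3.6
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Format the NameNode's metadata directory (first run only -- this
# erases any existing HDFS namespace).
hdfs namenode -format

# Start the HDFS daemons (NameNode, DataNode, SecondaryNameNode).
start-dfs.sh

# Verify the daemons are running.
jps
```

A distributed cluster follows the same steps, plus listing worker hostnames in the `workers` file and pointing all nodes at the same NameNode address.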

Module 5 – Working with HDFS
• Ways of accessing data in HDFS
• Common HDFS operations and commands
• Internals of a file read in HDFS
• Data copying with ‘distcp’
Lab: Working with HDFS
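The operations above can be sketched with the standard HDFS shell; all commands assume a running cluster, and the paths and hostnames are placeholders:

```shell
# Create a directory in HDFS.
hdfs dfs -mkdir -p /user/student/input

# Upload a local file into HDFS.
hdfs dfs -put localfile.txt /user/student/input

# List the directory and print the file's contents.
hdfs dfs -ls /user/student/input
hdfs dfs -cat /user/student/input/localfile.txt

# Download a copy back to the local filesystem.
hdfs dfs -get /user/student/input/localfile.txt copy.txt

# Parallel inter- or intra-cluster copy with distcp
# (source and destination NameNode URIs are placeholders).
hadoop distcp hdfs://nn1:8020/src hdfs://nn2:8020/dest
```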

Module 6 – MapReduce Abstraction
• What MapReduce is and why it is popular
• The big picture of MapReduce
• MapReduce process and terminology
• MapReduce component failures and recoveries
• Working with MapReduce
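A quick way to see MapReduce in action is the word-count example jar that ships with Hadoop; the jar path below is an assumption and varies by distribution and version:

```shell
# Submit the stock word-count job: input and output are HDFS paths,
# and the output directory must not already exist.
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    wordcount /user/student/input /user/student/output

# Inspect the reducer output (one part file per reducer).
hdfs dfs -cat /user/student/output/part-r-00000
```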

Module 7 – Hadoop Cluster Configuration
• Hadoop configuration overview and important configuration files
• Configuration parameters and values
• HDFS parameters and MapReduce parameters
• Hadoop environment setup
• ‘Include’ and ‘Exclude’ configuration files
Lab: MapReduce Performance Tuning
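As a minimal illustration of the configuration files covered here, the sketch below writes the two core files; the property names are standard Hadoop parameters, but the hostname and replication value are placeholders:

```shell
# core-site.xml: where clients find the filesystem (hostname is a placeholder).
cat > $HADOOP_HOME/etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode-host:8020</value>
  </property>
</configuration>
EOF

# hdfs-site.xml: HDFS-specific parameters, e.g. the block replication factor.
cat > $HADOOP_HOME/etc/hadoop/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
EOF
```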

Module 8 – Hadoop Administration and Maintenance
• Namenode/Datanode directory structures and files
• File system image and Edit log
• The Checkpoint Procedure
• Namenode failure and recovery procedure
• Safe Mode
• Metadata and Data backup
• Potential problems and solutions / what to look for
• Adding and removing nodes
Lab: MapReduce File system Recovery
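The routine administration tasks above map to a handful of `hdfs dfsadmin` commands, sketched below; all of them assume a running cluster and HDFS superuser privileges:

```shell
# Cluster-wide DataNode health summary.
hdfs dfsadmin -report

# Check and control safe mode on the NameNode.
hdfs dfsadmin -safemode get
hdfs dfsadmin -safemode leave

# Checkpoint: save the namespace (merge edit log into a new fsimage);
# the NameNode must be in safe mode first.
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave

# After adding a hostname to the 'exclude' file, tell the NameNode to
# re-read it and begin decommissioning that DataNode.
hdfs dfsadmin -refreshNodes
```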

Module 9 – Hadoop Monitoring and Troubleshooting
• Best practices of monitoring a Hadoop cluster
• Using logs and stack traces for monitoring and troubleshooting
• Using open-source tools to monitor Hadoop cluster

Module 10 – Job Scheduling
• How to schedule Hadoop jobs on the same cluster
• The default Hadoop FIFO Scheduler
• The Fair Scheduler and its configuration
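On YARN, switching from the default scheduler to the Fair Scheduler is a one-property change; the property name and class below are standard, while the file path is an assumption:

```shell
# Enable the Fair Scheduler in yarn-site.xml, then restart the
# ResourceManager for the change to take effect.
cat > $HADOOP_HOME/etc/hadoop/yarn-site.xml <<'EOF'
<configuration>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
</configuration>
EOF
```

Pool weights, minimum shares, and preemption are then defined in a separate Fair Scheduler allocation file.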

Module 11 – Project – Hadoop Multi-Node Cluster Setup and Running MapReduce Jobs on Amazon EC2
• Hadoop multi-node cluster setup using Amazon EC2 – creating a 4-node cluster
• Running MapReduce jobs on the cluster

Module 12 – High Availability, Federation, YARN and Security
1. Project – Working with MapReduce, Hive and Sqoop
Problem Statement – Import MySQL data into Hadoop using Sqoop, query it using Hive, and run the word-count MapReduce job.
2. Project – Multi-Node Cluster Setup
Problem Statement – It includes the following actions:
• Hadoop multi-node cluster setup using Amazon EC2 – creating a 4-node cluster
• Running MapReduce jobs on the cluster

About the course

About the trainer

Trainer details here

Sample resumes

Sample resumes will follow