Cloud Computing with Hadoop
Duration:
3 Days
Audience
This course is designed for professionals who have experience programming in Java.
- Architects
- Designers
- Consultants
- Developers
- Technical Managers
Prerequisites
Fluency in Java
Description
Hadoop is an open-source cloud computing environment that implements Google's MapReduce
framework in Java. Hadoop is created and maintained by the Apache project.
MapReduce makes it easy to process and generate large data sets on the cloud. Using MapReduce, you
divide the work to be performed into smaller chunks that can be processed concurrently, and then
combine the partial results to obtain the final result. MapReduce lets you exploit the massive parallelism
provided by the cloud while presenting a simple interface to a very complex, distributed computing infrastructure.
If you can model your problem as a MapReduce problem, you can take advantage of the cloud computing
environment provided by Hadoop.
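To make the model concrete, here is a toy sketch of the idea in plain Java; no Hadoop is involved, and the names and inputs are illustrative. Each input chunk is "mapped" to (word, 1) pairs, and the pairs are then "reduced" by key into a combined word count. On a real cluster, each chunk would be handled by its own concurrent map task.

    import java.util.HashMap;
    import java.util.Map;

    // Toy illustration of the MapReduce idea in plain Java (no Hadoop).
    public class MapReduceIdea {
        public static void main(String[] args) {
            String[] chunks = { "the quick fox", "the lazy dog", "the fox" };
            Map<String, Integer> counts = new HashMap<String, Integer>();
            for (String chunk : chunks) {                // map phase: one pass per chunk
                for (String word : chunk.split(" ")) {   // emit (word, 1) for each word
                    Integer n = counts.get(word);        // reduce phase: sum the 1s per key
                    counts.put(word, n == null ? 1 : n + 1);
                }
            }
            System.out.println(counts);                  // {lazy=1, quick=1, the=3, fox=2, dog=1}
        }
    }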
Hadoop enables reliable, scalable, efficient, economical, distributed computing through
very simple Java interfaces - massively parallel code without the pain. Hadoop includes a distributed file
system (HDFS) and Hadoop On Demand (HOD), a system for provisioning virtual Hadoop clusters over a
large physical cluster.
We will systematically go through the installation of the Hadoop development tools in Eclipse and the APIs used when
building Hadoop applications. We will also set up a cluster of nodes to execute Hadoop applications. This course uses
the latest stable Hadoop release.
The course is filled with useful, real-world examples of applications that are best implemented with the Hadoop
framework, and it is taught by instructors with more than 20 years of experience in distributed and parallel software
development. The course also covers best practices for architecting and designing Hadoop applications.
Objectives
This course teaches students how to design, test, and deploy
Java applications on the cloud using the Hadoop framework. Upon completion
of this course, students should be able to:
- Understand the benefits and architecture of Cloud Computing
- Understand the MapReduce programming paradigm
- Understand the Hadoop API
- Install the Hadoop Eclipse plugins for development
- Implement solutions to deploy on the Hadoop platform
- Use the Hadoop Distributed File System (HDFS)
- Configure a Hadoop cluster on Linux
- Administer the Hadoop environment
- Master design principles and patterns for cloud computing on Hadoop
Course Outline
Cloud Computing
- What is Cloud Computing?
- Cloud Computing and Web 2.0
- Why Cloud Computing?
- Architectural overview
- Comparison with traditional distributed computing platforms
- Survey of platforms
MapReduce
- What is MapReduce?
- Relevance of MapReduce to Cloud computing
- Map operation
- Reduce operation
- Survey of real-world "MapReduce" problems
- Execution strategies for MapReduce
Hadoop
- What is Hadoop?
- The Hadoop architecture
- Hadoop tools installation
- Configuring a Hadoop cluster
- First example in Hadoop
Hadoop API
- Mapper
- Reducer
- Combiner
- JobConf
- JobClient
- Exercise - A real-life Hadoop application (see the WordCount sketch below)
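For a feel for these classes, here is a minimal sketch of the classic word-count job written against the org.apache.hadoop.mapred API (Mapper, Reducer, Combiner, JobConf, JobClient). Input and output paths come from the command line; everything else is illustrative.

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.StringTokenizer;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class WordCount {
        // Mapper: emits (word, 1) for every word in a line of input
        public static class Map extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, IntWritable> {
            private final static IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();
            public void map(LongWritable key, Text value,
                            OutputCollector<Text, IntWritable> output, Reporter reporter)
                    throws IOException {
                StringTokenizer tok = new StringTokenizer(value.toString());
                while (tok.hasMoreTokens()) {
                    word.set(tok.nextToken());
                    output.collect(word, ONE);
                }
            }
        }

        // Reducer (also used as Combiner): sums the counts for each word
        public static class Reduce extends MapReduceBase
                implements Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterator<IntWritable> values,
                               OutputCollector<Text, IntWritable> output, Reporter reporter)
                    throws IOException {
                int sum = 0;
                while (values.hasNext()) sum += values.next().get();
                output.collect(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(WordCount.class);   // job configuration
            conf.setJobName("wordcount");
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);
            conf.setMapperClass(Map.class);
            conf.setCombinerClass(Reduce.class);           // combine locally before the shuffle
            conf.setReducerClass(Reduce.class);
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            JobClient.runJob(conf);                        // submit and wait for completion
        }
    }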
Hadoop Distributed File System (HDFS)
- HDFS Architecture
- HDFS API (see the sketch after this list)
- Web interface
- Command shell
- Managing upgrades and rollback
- Permissions and Security
- Scalability
- Data replication
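As a taste of the HDFS Java API, below is a minimal sketch that writes a file to HDFS and reads it back through org.apache.hadoop.fs.FileSystem. The path is illustrative, and the Configuration object is assumed to pick up the cluster's site files from the classpath.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();        // reads the cluster's site configuration
            FileSystem fs = FileSystem.get(conf);            // handle to the configured file system
            Path file = new Path("/user/student/hello.txt"); // illustrative path
            FSDataOutputStream out = fs.create(file);        // create (or overwrite) the file
            out.writeUTF("Hello, HDFS");
            out.close();
            FSDataInputStream in = fs.open(file);            // open it for reading
            System.out.println(in.readUTF());
            in.close();
        }
    }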
Hadoop On Demand (HOD)
- HOD principles
- HOD session
- Provisioning and managing clusters
- Configuring Hadoop
- Web user interfaces
- Managing logs
- Deallocating idle clusters
- Configuring HOD
- Command line shell
Chukwa
- What is Chukwa?
- Chukwa Architecture
- Data collection with Chukwa
- Displaying data
- Monitoring
- Analyzing results
HBase
- What is HBase?
- HBase Architecture
- HBase API (see the sketch after this list)
- Managing large data sets with HBase
- Using HBase in Hadoop applications
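For a flavor of the HBase client API, here is a minimal sketch that writes and reads a single cell. The table, column family, and qualifier names are illustrative, and the calls assume an HBase 0.90-era client (HTable, Put, Get).

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();  // locates the HBase cluster
            HTable table = new HTable(conf, "courses");        // illustrative table name
            Put put = new Put(Bytes.toBytes("row1"));          // write one cell
            put.add(Bytes.toBytes("info"), Bytes.toBytes("title"),
                    Bytes.toBytes("Cloud Computing with Hadoop"));
            table.put(put);
            Get get = new Get(Bytes.toBytes("row1"));          // read it back
            Result result = table.get(get);
            byte[] value = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("title"));
            System.out.println(Bytes.toString(value));
            table.close();
        }
    }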
Hive
- What is Hive?
- Hive architecture
- Data warehouse using Hive
- Hive QL (see the example after this list)
- Plugging custom mappers and reducers
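As a taste of Hive QL, here is a minimal sketch that runs a query from Java through Hive's JDBC driver. The HiveServer1-era driver class and connection URL are assumptions, and the docs table is illustrative; Hive compiles the query into MapReduce jobs behind the scenes.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveExample {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver"); // HiveServer1 JDBC driver
            Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/default");
            Statement stmt = con.createStatement();
            // Hive QL looks like SQL; 'docs' is an illustrative table
            ResultSet rs = stmt.executeQuery(
                "SELECT word, COUNT(1) FROM docs GROUP BY word");
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
            con.close();
        }
    }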
Pig
- What is Pig?
- Pig architecture
- Analyzing data using Pig
- Using Pig Latin to build data analysis programs (see the example after this list)
- Using the Pig compiler
- Optimization
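To show what Pig Latin looks like when embedded in Java, here is a minimal word-count sketch using PigServer. The file names are illustrative, and the script uses Pig's built-in TOKENIZE and COUNT functions; Pig compiles the whole pipeline into MapReduce jobs.

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    public class PigExample {
        public static void main(String[] args) throws Exception {
            PigServer pig = new PigServer(ExecType.MAPREDUCE); // or ExecType.LOCAL for testing
            // Each statement below is one Pig Latin step
            pig.registerQuery("lines = LOAD 'input.txt' AS (line:chararray);");
            pig.registerQuery("words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;");
            pig.registerQuery("grouped = GROUP words BY word;");
            pig.registerQuery("counts = FOREACH grouped GENERATE group, COUNT(words);");
            pig.store("counts", "wordcount-out");              // run the pipeline and save results
        }
    }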
Zookeeper
- What is Zookeeper?
- Zookeeper architecture
- Implementing coordination services
- Data model
- Zookeeper API (see the sketch after this list)
- Examples
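Finally, a minimal sketch of the Zookeeper Java API: connect to an ensemble, create a znode, and read its data back. The connection string and znode path are illustrative.

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs.Ids;
    import org.apache.zookeeper.ZooKeeper;

    public class ZkExample {
        public static void main(String[] args) throws Exception {
            // Connect to a ZooKeeper ensemble; the watcher receives session events
            ZooKeeper zk = new ZooKeeper("localhost:2181", 3000, new Watcher() {
                public void process(WatchedEvent event) {
                    System.out.println("event: " + event.getType());
                }
            });
            // Create a persistent znode holding a small payload
            zk.create("/demo", "hello".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            byte[] data = zk.getData("/demo", false, null);   // read the payload back
            System.out.println(new String(data));
            zk.close();
        }
    }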
Summary and Conclusions
- What have we learned?
- Sample Applications
- Useful references