Big Data Solution – Hadoop Development

 
(More information and how to register)
 

Introduction to Big Data

All about Data!
Data Storage and Analysis
Comparison with Other Systems
Relational Database Management System
Grid Computing
Volunteer Computing
A Brief History of Hadoop
Compatibility
 

Installing Single-Node Hadoop

Prerequisites
Installation
Configuration
Standalone Mode
Pseudo-Distributed Mode
Configuring SSH
Formatting the HDFS Filesystem
Starting and stopping MapReduce
Fully Distributed Mode
 

Creating an Eclipse Plugin for Hadoop-2.x.0

Contents
Download and install Eclipse
Install git
Download source code for Hadoop Plugin for Eclipse from git
Compile and create jar
Install the plugin in Eclipse
 

Developing a MapReduce Application

The Configuration
Combining Resources
Variable Expansion
Setting Up the Development Environment
Managing Configuration
GenericOptionsParser, Tool, and ToolRunner
Writing a Unit Test with MRUnit
Mapper
Reducer
Running Locally on Test Data
Running a Job in a Local Job Runner
Testing the Driver
Running on a Cluster
Packaging a Job
Launching a Job
The MapReduce Web UI
Retrieving the Results
Debugging a Job
Hadoop Logs
Remote Debugging
Tuning a Job
Profiling Tasks
 

MapReduce Workflows

Decomposing a Problem into MapReduce Jobs
JobControl
Apache Oozie
MapReduce Features
Counters
Built-in Counters
User-Defined Java Counters
User-Defined Streaming Counters
Sorting
Preparation
Partial Sort
Total Sort
Secondary Sort
Joins
Map-Side Joins
Reduce-Side Joins
Side Data Distribution
Using the Job Configuration
Distributed Cache
MapReduce Library Classes
 

Setting Up a Hadoop Cluster

Cluster Specification
Network Topology
Cluster Setup and Installation
Installing Java
Creating a Hadoop User
Installing Hadoop
Testing the Installation
SSH Configuration
Hadoop Configuration
Configuration Management
Environment Settings
Important Hadoop Daemon Properties
Hadoop Daemon Addresses and Ports
Other Hadoop Properties
User Account Creation
YARN Configuration
Important YARN Daemon Properties
YARN Daemon Addresses and Ports
Security
Kerberos and Hadoop
Delegation Tokens
Other Security Enhancements
Benchmarking a Hadoop Cluster
Hadoop Benchmarks
User Jobs
Hadoop in the Cloud
Apache Whirr
 

Administering Hadoop

HDFS
Persistent Data Structures
Safe Mode
Audit Logging
Tools
Monitoring
Logging
Metrics
Java Management Extensions
Maintenance
Routine Administration Procedures
Commissioning and Decommissioning Nodes
Upgrades
 

Pig

Installing and Running Pig
Execution Types
Running Pig Programs
Grunt
Pig Latin Editors
An Example
Generating Examples
Comparison with Databases
Pig Latin
Structure
Statements
Expressions
Types
Schemas
Functions
Macros
User-Defined Functions
A Filter UDF
An Eval UDF
A Load UDF
Data Processing Operators
Loading and Storing Data
Filtering Data
Grouping and Joining Data
Sorting Data
Combining and Splitting Data
Pig in Practice
Parallelism
Parameter Substitution
 

Hive

Installing Hive
The Hive Shell
An Example
Running Hive
Configuring Hive
Hive Services
The Metastore
Comparison with Traditional Databases
Schema on Read Versus Schema on Write
Updates, Transactions, and Indexes
HiveQL
Data Types
Operators and Functions
Tables
Managed Tables and External Tables
Partitions and Buckets
Storage Formats
Importing Data
Altering Tables
Dropping Tables
Querying Data
Sorting and Aggregating
MapReduce Scripts
Joins
Subqueries
Views
User-Defined Functions
Writing a UDF
Writing a UDAF
 

HBase

HBasics
Backdrop
Concepts
Whirlwind Tour of the Data Model
Implementation
Installation
Test Drive
Clients
Java
Avro, REST, and Thrift
Example
Schemas
Loading Data
Web Queries
HBase Versus RDBMS
Successful Service
HBase
Use Case: HBase at Streamy.com
Praxis
Versions
HDFS
UI
Metrics
Schema Design
Counters
Bulk Load
 

R and Hadoop

Introduction to the R Language
Introduction to the RHadoop Big Data Solution
RHadoop
RHadoop data analysis
RHadoop machine learning
 

Python and Hadoop

Python Programming
Python and Hadoop
Hadoop - mrjob development
 

Spark

Introduction to Spark
PySpark
Machine Learning
 

Advanced Administration and Monitoring

Multiple nodes

Add nodes
Decommission nodes
Recovering from NameNode failure
Monitoring cluster health using Ganglia - Pure monitoring
Install Ambari - Management and monitoring
Install Hue - Emphasis on using and managing the Hadoop environment
 

Cloudera Hadoop Certification

CCAH - Hadoop Administrator
CCDH - Hadoop Developer
 

Case Studies

Hadoop Usage at Last.fm
Last.fm: The Social Music Revolution
Hadoop at Last.fm
Generating Charts with Hadoop
The Track Statistics Program
Summary
 
 
 
 
 