AM2: Introduction to the Hadoop Ecosystem
Vladimir Bacvanski
Founder
SciSpike

Tuesday, August 20, 2013
08:30 AM - 11:45 AM

Level:  Technical - Introductory


This course provides a rapid immersion into NoSQL and Big Data systems through the Hadoop ecosystem. We start with an introduction to the Hadoop cluster and show how to interact with the Hadoop file system and the cluster. We then introduce Hive and Pig, two popular higher-level interfaces for managing data in Hadoop.

Upon completion, attendees will understand:

  • NoSQL and Big Data concepts and technologies
  • MapReduce concepts
  • The Hadoop file system
  • Hive and Pig for productive data management and development

Big Data and Hadoop: A quick dive:

  • Big Data
  • Problems with conventional systems
  • MapReduce algorithm
  • Traditional Database Applications
  • Hadoop

MapReduce:

  • What is MapReduce?
  • Relevance of MapReduce to Big Data
  • Map operation
  • Reduce operation

Hadoop:

  • What is Hadoop?
  • The Hadoop architecture
  • Hadoop Distributed File System

Hadoop Distributed File System (HDFS):

  • HDFS Architecture
  • HDFS API
  • Scalability
  • Data replication

Hadoop Applications:

  • Typical Hadoop algorithms
  • Best practices for Hadoop

Working with Hive:

  • What is Hive?
  • Hive architecture
  • Data warehouse using Hive

Working with Pig:

  • What is Pig?
  • Analyzing data using Pig
  • Using Pig Latin to build data analysis programs

Dr. Vladimir Bacvanski has over two decades of engineering experience with mission-critical and distributed enterprise systems and data technologies. Vladimir has helped a number of companies, including the US Treasury, the Federal Reserve Bank, the US Navy, IBM, Dell, Hewlett Packard, JP Morgan Chase, General Electric, BAE Systems, AMD, and others, to select, transition to, and apply new software and data technologies.

Vladimir is published worldwide and is a keynote speaker, session chair, and workshop organizer at leading industry events. As a founder of SciSpike, Vladimir focuses on Big Data technologies and highly scalable reactive software architectures with Node.js and Scala. Vladimir is the author of the O'Reilly course on Big Data and NoSQL.


       