Analyzing Big Data with Hive

Apache Hive is designed to let users apply their existing SQL skills to Big Data analysis, but it is still a relatively new data warehouse infrastructure, built on Hadoop and executing queries as MapReduce jobs.

In this workshop, Shrikanth Shankar, Head of Engineering at Qubole, will explain what Hive is and how it works behind the scenes. The workshop begins with a brief overview of Big Data and of Apache Hive's pros and cons, focusing on the key differences between Hive and traditional data warehouses built on top of relational databases.

During the workshop, Shrikanth will cover data modeling in Hive; the constructs, features, and syntax of the Hive Query Language (HQL); the Hive execution model using MapReduce; and Hive performance optimization.

The workshop concludes with recommendations and examples based on best practices for Hive query optimization and for running a data warehouse on Hive.

Workshop Agenda:

Why Data Professionals Should Use Apache Hive

The Difference Between Hive and Traditional Data Warehousing Built on Relational Databases

Data Modeling in Hive (including input formats, SerDes, partitioning, etc.)

Hive Language Constructs (including the TRANSFORM operator, UDFs, and UDTFs)

Features and Syntax of HQL

The Hive Execution Model (MapReduce)

Hive Performance Optimization (layouts, advanced execution options, etc.)

Best Practices for Hive Query Optimization

Best Practices for Running a Data Warehouse on Hive

Before coming to Qubole, Shrikanth Shankar worked at Oracle for over a decade, rising to become Director of Development on the BI team. Shrikanth was one of the leaders of the Oracle Exalytics effort and helped drive the product from conception to release. Before that, Shrikanth was on the Database team in the SQL/DSS group, where he made significant contributions to many parts of the Oracle stack, ranging from Partitioning, SQL Optimization, and SQL/Parallel Execution all the way to the Indexing and Data layers. Shrikanth is now Head of Engineering at Qubole, a pioneering startup in Big Data.