ACM Data Mining: 1) Hadoop - Distributed Data Processing 2) Facebook’s Petabyte Scale Data Warehouse Using Hive and Hadoop
January 25, 2010 at 6:30 PM - 8:30 PM
LinkedIn, Mountain View
NEW MEETING DATE & LOCATION!

TITLE 1: “Hadoop: Distributed Data Processing”

Hadoop is an open-source distributed platform designed to economically store and process data using clustered commodity hardware. Hadoop is Apache’s implementation of the MapReduce/GFS frameworks popularized by Google. In this talk we will demystify this powerful platform and describe how it enables you to consolidate many different data storage and processing needs in an economically scalable cloud resource.

SPEAKER BIOGRAPHY

Dr. Amr Awadallah is Chief Technical Officer and Founder of Cloudera, Inc. Before Cloudera, he was vice president of product intelligence engineering at Yahoo! Inc., where he had worked since June 2000, after Yahoo acquired his first startup (VivaSmart). Dr. Awadallah received his PhD from Stanford University in 2007 and his BS/MS degrees from Cairo University in 1992 and 1995, respectively.

TITLE 2: “Facebook’s Petabyte Scale Data Warehouse Using Hive and Hadoop”

Hive is an open-source, petabyte-scale data warehousing framework built on top of Hadoop that enables scalable analytics on large data sets using SQL and some language extensions. Scalable analysis of large data sets has been core to the functions of a number of teams at Facebook, both engineering and non-engineering. This talk will highlight how Hive and Hadoop allow us at Facebook to offer a cheap, scalable and flexible infrastructure for different kinds of analysis. We will talk about the architecture, applications and capabilities of this infrastructure, which handles close to 8,000 jobs a day and stores nearly 2.5 PB of compressed data.

SPEAKER BIOGRAPHY

Ashish Thusoo has been with Facebook for the last couple of years and, in his most recent role, manages the Facebook data infrastructure team. He started the Hive project at Facebook along with Joydeep and serves as the project lead for Hive at Apache.
Event Owner: Greg Makowski (Director of Risk Analytics and Policy at CashEdge)

Updates

Judith Barnett (CEO at Vyndyco Corporation) attended.
John Gillson (Senior Software Engineer at ON24) will be attending.
Bohan Chen (Director, Hadoop Development and Operations at Apollo Group) will be attending.
Atul Mohidekar (CTO and SVP of Operations at rfXcel) will be attending.
Francois Andry (Software Architect at Optum) will be attending.
Rahul Singh (Content Processing, Enrichment and Web Scale data analysis) will be attending.
Shreepadma Venugopalan (Software Engineer) will be attending.
Stefan Groschupf (CEO and Co-Founder at Datameer) will be attending.
Nalini Kartha (Member of Technical Staff at Salesforce.com) will be attending.
Ramarao Kadiyala (President Broadrange CRM Solutions, Inc) will be attending.
Attendees by company
Yahoo!: 2
ON24: 1
Apollo Group: 1
rfXcel: 1
Optum (NYSE:UNH): 1