BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook 11.0 MIMEDIR//EN
VERSION:1.0
BEGIN:VEVENT
DTSTART:20090528T013000Z
DTEND:20090528T040000Z
LOCATION;ENCODING=QUOTED-PRINTABLE:Cubberley Community Center
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:Abstract: A rapid introduction to Hadoop architecture, MapReduce patterns, and best practices with Cascading. Hadoop is an open source implementation of the Google MapReduce processing model and has been widely embraced by startups and established companies like Yahoo! and Amazon. Cascading, also an open source project, is an alternative API to MapReduce that allows developers to rapidly create sophisticated applications on the Hadoop platform.=0D=0A=
=0D=0A=
Unfortunately, the MapReduce model can be very complex to manipulate when attempting to perform tasks developers take for granted when using relational-style databases, such as joins and secondary sorting of grouped values.=0D=0A=
=0D=0A=
Further, integrating Hadoop with external systems requires a deep knowledge of its internals. Yet this is where Hadoop clusters offer the most value: off-loading data cleansing and data migration tasks from traditional tools and expensive, load-sensitive systems.=0D=0A=
=0D=0A=
Cascading is an API that replaces the "Map" and "Reduce" primitives and their associated Key/Value algebra with functions, filters, and aggregators, and links them all together with a familiar columns and records model. It also provides key processing primitives familiar to developers.=0D=0A=
=0D=0A=
In this presentation, we will present the Hadoop architecture, how MapReduce influences that architecture and is used for common tasks, and how Cascading helps developers rapidly build sophisticated data processing and orchestration applications that can be very simply tested and executed.=0D=0A=
=0D=0A=
Bio: Chris K Wensel has been a Software and Systems Architect for over 15 years. He is the founder of Concurrent Inc., and the author of the Cascading data processing open-source project. He's also a Principal at Scale Unlimited, a professional services company offering commercial training and consulting for Hadoop and related large architectures.=0D=0A=
Over the last 7 years he has deployed large and sophisticated data processing applications for use by companies providing geo-spatial, web content, and financial data services in both the traditional enterprise data-center and on Amazon EC2.=0D=0A=
=0D=0A=
Location: Cubberley Community Center, 4000 Middlefield Road, Room H-1, Palo Alto, CA=0D=0A=
=0D=0A=
Agenda:=0D=0A=
6:30 - 7:00 p.m. Registration/Networking/Refreshments/Pizza=0D=0A=
7:00 - 9:00 p.m. Presentations=0D=0A=
=0D=0A=
Price:=0D=0A=
$15 at the door for non-SDForum members=0D=0A=
No charge for SDForum members=0D=0A=
No registration required
SUMMARY;ENCODING=QUOTED-PRINTABLE:SAM SIG: Hadoop architecture, MapReduce patterns, and best practices w/Cascading
END:VEVENT
END:VCALENDAR
