Hadoop was created in 2003. In its early years, Hadoop provided large-scale data processing with MapReduce and distributed, fault-tolerant storage with the Hadoop Distributed File System. Over the last decade, Hadoop has evolved rapidly with the support of a large open-source community.
Today’s guest is Mike Cafarella, co-creator of Hadoop. Mike takes us on a journey from past to present. Hadoop was based on the Google File System and MapReduce papers, so Mike and I talk about what it was like to work on a distributed file system in 2004 and the challenges of implementing real software systems based on white papers. We also discuss YARN and the wave of innovation it enabled within the Hadoop ecosystem. Mike will also be presenting at Strata + Hadoop World in San Jose. We’re partnering with O’Reilly to support this conference – if you want to go to Strata, you can save 20% on a ticket with our code PCSED.