Database Research Group PhD Seminar

2010 Feb 10 at 14:30

DC 1331

Integrating MapReduce ideas into Distributed DBMS

Iman Elghandour, PhD candidate, David R. Cheriton School of Comp. Sci., Univ. Waterloo

MapReduce has emerged as a framework for processing large-scale data, and it has become popular for its scalability and fault tolerance. MapReduce and parallel databases share some similarities, but MapReduce is designed for unstructured data and lacks the efficiency of a DBMS. Recent research has therefore focused on combining MapReduce with independent DBMS instances running on cluster nodes.

In this talk, I will discuss two different approaches: HadoopDB [1] and Osprey [2]. In HadoopDB, the Hadoop MapReduce implementation is used as a communication layer on top of single-node DBMS instances. In contrast, Osprey borrows MapReduce-style fault tolerance and adds it to a distributed shared-nothing database.

[1] A. Abouzeid, K. Bajda-Pawlikowski, D. Abadi, A. Silberschatz, and A. Rasin. HadoopDB: An architectural hybrid of mapreduce and DBMS technologies for analytical workloads. Proc. VLDB Endow., 2(1):922–933, 2009.

[2] C. Yang, C. Yen, C. Tan, and S. Madden. Osprey: Implementing mapreduce-style fault tolerance in a shared-nothing distributed database. In ICDE '10, 2010.