MPI and MapReduce
CCGSC 2010, Flat Rock NC, September 8 2010
Geoffrey Fox
gcf@indiana.edu
http://www.infomall.org
http://www.futuregrid.org
Director, Digital Science Center, Pervasive Technology Institute
Associate Dean for Research and Graduate Studies, School of Informatics and Computing
Indiana University Bloomington

MapReduce
Map(Key, Value)
Reduce(Key, List)
[Figure: Data Partitions, Reduce Outputs]
A hash function maps the results of the map tasks to reduce tasks.
• Implementations (Hadoop – Java; Dryad – Windows) support:
  – Splitting of data with customized file systems
  – Passing the output of map functions to reduce functions
  – Sorting the inputs to the reduce function based on the intermediate keys
  – Quality of service
• 20 petabytes per day (on an average of 400 machines) processed by Google using MapReduce, September 2007

MapReduce "File/Data Repository" Parallelism
Map = (data parallel) computation reading and writing data
Reduce = Collective/Consolidation phase, e.g. forming multiple global ...
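The MapReduce flow on these slides has four steps: Map(Key, Value), hashing each intermediate key to a reduce task, sorting each reduce task's inputs by intermediate key, and Reduce(Key, List). The following is a single-process sketch of that flow, not the Hadoop or Dryad API. The word-count map and reduce functions, the names run_mapreduce and partition, and the use of zlib.crc32 as the hash function are all hypothetical choices made for illustration.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical word-count Map: key = document name (unused), value = document text.
def word_count_map(key: str, value: str) -> Iterator[Tuple[str, int]]:
    for word in value.split():
        yield word.lower(), 1

# Hypothetical word-count Reduce: receives one intermediate key and the list of its values.
def word_count_reduce(key: str, values: List[int]) -> int:
    return sum(values)

def partition(key: str, num_reducers: int) -> int:
    # Hash function mapping an intermediate key to a reduce task.
    # crc32 is an assumption, chosen because Python's built-in hash() of str is salted per process.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def run_mapreduce(
    inputs: Iterable[Tuple[str, str]],
    map_fn: Callable[[str, str], Iterator[Tuple[str, int]]],
    reduce_fn: Callable[[str, List[int]], int],
    num_reducers: int = 3,
) -> List[Dict[str, int]]:
    # One bucket per reduce task, grouping intermediate values by key.
    buckets: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    # Map phase: every input record is processed independently (data parallel).
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            buckets[partition(ikey, num_reducers)][ikey].append(ivalue)
    # Reduce phase: each reduce task sorts its intermediate keys, then reduces each group.
    return [{k: reduce_fn(k, bucket[k]) for k in sorted(bucket)} for bucket in buckets]

docs = [("doc1", "the quick brown fox"), ("doc2", "the lazy dog the end")]
outputs = run_mapreduce(docs, word_count_map, word_count_reduce)
merged = {k: v for out in outputs for k, v in out.items()}
print(merged["the"])  # prints 3
```

A deterministic hash guarantees that every occurrence of the same intermediate key reaches the same reduce task. That is what lets each reducer see the complete List for its keys. A real implementation would also run the map and reduce tasks on separate machines and move the buckets over the network or a shared file system.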