Facebook’s Petabyte Scale Data Warehouse using Hive and Hadoop
Wednesday, January 27, 2010

Why Another Data Warehousing System?
Data, data and more data
200 GB per day in March 2008
12+ TB (compressed) raw data per day today

Trends Leading to More Data
Free or low cost of user services
Realization that more insights are derived from simple algorithms on more data

Deficiencies of Existing Technologies
Cost of Analysis and Storage on proprietary systems does not support trends towards more data
Limited Scalability does not support trends towards more data
Closed and Proprietary Systems

Let's try Hadoop…
Pros
– Superior in availability/scalability/manageability
– Efficiency not that great, but throw more hardware
– Partial Availability/resilience/scale more important than ACID
Cons: Programmability ...