Facebook's Petabyte Scale Data Warehouse using Hive and Hadoop



Published on 4 August 2011

Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Wednesday, January 27, 2010

Why Another Data Warehousing System?
Data, data and more data
– 200 GB per day in March 2008
– 12+ TB (compressed) raw data per day today
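
To put the two volume figures side by side, the short calculation below estimates the overall growth factor and the average monthly growth it implies. It is a rough sketch under stated assumptions: decimal units (1 TB = 1000 GB), the slide date of January 2010 as "today", and "12+ TB" treated as a lower bound. The slide does not say whether the 200 GB figure is compressed, so the comparison is indicative only.

```python
# Figures taken from the slide; decimal units (1 TB = 1000 GB) are an assumption.
daily_gb_march_2008 = 200
daily_gb_jan_2010 = 12 * 1000  # "12+ TB" is a lower bound

# March 2008 to January 2010 (the date on the slides).
months_elapsed = (2010 - 2008) * 12 + (1 - 3)

growth_factor = daily_gb_jan_2010 / daily_gb_march_2008
# Constant monthly rate that would compound to the same overall growth.
monthly_growth = growth_factor ** (1 / months_elapsed)

print(f"months elapsed: {months_elapsed}")
print(f"growth factor: at least {growth_factor:.0f}x")
print(f"implied average monthly growth: about {(monthly_growth - 1) * 100:.0f}%")
```

Under these assumptions the daily volume grew by a factor of at least 60 in under two years.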

Trends Leading to More Data
– Free or low cost of user services
– Realization that more insights are derived from simple algorithms on more data

Deficiencies of Existing Technologies
– Cost of Analysis and Storage on proprietary systems does not support trends towards more data
– Limited Scalability does not support trends towards more data
– Closed and Proprietary Systems

Let's try Hadoop
Pros
– Superior in availability/scalability/manageability
– Efficiency not that great, but throw more hardware
– Partial Availability/resilience/scale more important than ACID
Cons: Programmability and Metadata
– Map-reduce hard to program (users know sql/bash/python)
– Need to publish data in well known schemas
Solution: HIVE
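
To illustrate the programmability point, here is a minimal single-process sketch of the map, shuffle, and reduce steps needed for a simple per-key count. The `page_views` records and all function names are hypothetical, invented for this example; this is not Hadoop's API and nothing here is distributed.

```python
from collections import defaultdict

# Hypothetical page-view log of (user_id, page) pairs, invented for illustration.
page_views = [
    ("u1", "home"),
    ("u2", "home"),
    ("u1", "profile"),
    ("u3", "home"),
    ("u2", "photos"),
]

def map_phase(records):
    """Emit one (page, 1) pair per input record."""
    for _user_id, page in records:
        yield page, 1

def shuffle_phase(pairs):
    """Group emitted values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}

views_per_page = reduce_phase(shuffle_phase(map_phase(page_views)))
print(views_per_page)  # {'home': 3, 'profile': 1, 'photos': 1}
```

A user who knows SQL would express the same computation in one statement, for example `SELECT page, COUNT(*) FROM page_views GROUP BY page;` (table and column names again hypothetical).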

What is HIVE?
A system for managing and querying structured data built on top of Hadoop
– Map-Reduce for execution
– HDFS for storage
– Metadata in an RDBMS
Key Building Principles:
– SQL as a familiar data warehousing tool
– Extensibility: Types, Functions, Formats, Scripts
– Scalability and Performance
– Interoperability
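
The split between "HDFS for storage" and "Metadata in an RDBMS" can be pictured with a toy example: raw delimited lines carry no schema of their own, and a separate relational store records the column names and types used to interpret them at read time. The sketch below uses Python's built-in `sqlite3` as a stand-in for the metadata database and an in-memory list as a stand-in for a file. The `columns` table layout, the `page_views` table, and the `read_table` helper are all assumptions made for illustration; they are not Hive's actual metastore schema or API.

```python
import sqlite3

# Stand-in for the metadata RDBMS: one row per column of each table.
# This layout is invented for the example, not Hive's real metastore schema.
metastore = sqlite3.connect(":memory:")
metastore.execute(
    "CREATE TABLE columns ("
    "table_name TEXT, position INTEGER, column_name TEXT, column_type TEXT)"
)
metastore.executemany(
    "INSERT INTO columns VALUES (?, ?, ?, ?)",
    [
        ("page_views", 0, "user_id", "string"),
        ("page_views", 1, "page", "string"),
        ("page_views", 2, "duration_ms", "int"),
    ],
)

# Stand-in for a tab-delimited file in the storage layer (hypothetical data).
raw_lines = ["u1\thome\t120", "u2\thome\t80", "u1\tprofile\t300"]

# Map declared type names to Python converters.
CASTS = {"string": str, "int": int}

def read_table(table_name, lines):
    """Look up the table's schema, then parse each raw line into a typed record."""
    schema = metastore.execute(
        "SELECT column_name, column_type FROM columns "
        "WHERE table_name = ? ORDER BY position",
        (table_name,),
    ).fetchall()
    for line in lines:
        fields = line.split("\t")
        yield {
            name: CASTS[column_type](value)
            for (name, column_type), value in zip(schema, fields)
        }

rows = list(read_table("page_views", raw_lines))
total_ms = sum(row["duration_ms"] for row in rows)
print(rows[0])
print(total_ms)
```

Keeping the schema outside the data files is one way to meet the earlier requirement to "publish data in well known schemas": every reader resolves the same column names and types from a shared catalog.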