MapReduce

E185673

MapReduce is a programming model and processing framework for distributed computation of large data sets across clusters of computers.
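
The programming model can be illustrated with a minimal single-process sketch. It is a toy word count, not Google's or Hadoop's implementation: the names map_fn, reduce_fn, and run_job are hypothetical, and a real framework would run the phases in parallel across a cluster rather than in one Python process.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Iterable, Iterator


def map_fn(_doc_id: str, text: str) -> Iterator[tuple[str, int]]:
    """Map function: emit an intermediate (word, 1) pair for every word."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce function: combine all values that share one key."""
    return word, sum(counts)


def run_job(
    records: Iterable[tuple[str, str]],
    mapper: Callable[[str, str], Iterator[tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], tuple[str, int]],
) -> list[tuple[str, int]]:
    """Hypothetical in-memory driver (an assumption for illustration only)."""
    # Map phase: apply the mapper to every input key-value pair.
    intermediate: list[tuple[str, int]] = []
    for key, value in records:
        intermediate.extend(mapper(key, value))

    # Shuffle phase: group intermediate values by key.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Sort phase, then Reduce phase: visit keys in sorted order.
    return [reducer(key, groups[key]) for key in sorted(groups)]


documents = [("d1", "the quick brown fox"), ("d2", "the lazy dog and the fox")]
result = run_job(documents, map_fn, reduce_fn)
print(result)
```

Only map_fn and reduce_fn are specific to the job. The driver is generic, which is the sense in which the model hides data distribution and parallelization from the person writing the job.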

Statements (50)

Predicate Object
instanceOf distributed computing framework; parallel computing model; programming model
abstractsAway details of data distribution; details of fault tolerance; details of parallelization
basedOn map function; reduce function
category big data technology; distributed data processing framework
commonlyUsedWith Google File System; HDFS (surface form: Hadoop Distributed File System)
dataLocalityStrategy move computation to data
dataModel key-value pairs
describedBy Jeffrey Dean; Sanjay Ghemawat
describedIn MapReduce (self-link; surface form: MapReduce: Simplified Data Processing on Large Clusters)
designedFor fault-tolerant distributed processing
developer Google
executionModel batch processing
faultToleranceMechanism re-execution of failed tasks
handles automatic data distribution; automatic fault recovery; task scheduling
hasComponent Map phase; Reduce phase; Shuffle phase; Sort phase
implementedIn Google internal infrastructure
influenced Hadoop (surface form: Apache Hadoop MapReduce); Apache Spark; Dryad; FlumeJava
inspiredBy functional programming
jobInput input splits
jobOutput output files in distributed file system
publicationYear 2004
purpose batch data processing; distributed computation; processing large data sets
runsOn cluster of commodity hardware
scalesTo petabytes of data; thousands of machines
supports data parallelism; task parallelism
usedFor ETL workloads; data mining; index building; log processing; machine learning preprocessing
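
The jobInput and faultToleranceMechanism statements (input splits, re-execution of failed tasks) can be sketched as follows. This is a simplified simulation under stated assumptions: split_input, FlakyWorker, and run_with_reexecution are hypothetical names, the failure is injected deliberately, and a real scheduler would reassign the task to a different machine instead of retrying in a loop.

```python
from collections import defaultdict


def split_input(records, split_size):
    """Cut the input into fixed-size splits; each split becomes one map task."""
    return [records[i:i + split_size] for i in range(0, len(records), split_size)]


class FlakyWorker:
    """Hypothetical worker that fails on its first attempt at chosen splits."""

    def __init__(self, fail_once_on):
        self.fail_once_on = set(fail_once_on)
        self.attempts = defaultdict(int)  # split_id -> number of attempts so far

    def run_map_task(self, split_id, split):
        self.attempts[split_id] += 1
        if split_id in self.fail_once_on and self.attempts[split_id] == 1:
            raise RuntimeError(f"simulated worker failure on split {split_id}")
        # Deterministic map over the split: emit (word, 1) pairs.
        return [(word, 1) for line in split for word in line.split()]


def run_with_reexecution(splits, worker, max_attempts=3):
    """Re-run a failed map task from its input split until it succeeds."""
    outputs = {}
    for split_id, split in enumerate(splits):
        for attempt in range(1, max_attempts + 1):
            try:
                outputs[split_id] = worker.run_map_task(split_id, split)
                break
            except RuntimeError:
                if attempt == max_attempts:
                    raise  # give up after the configured number of attempts
    return outputs


lines = ["a b", "b c", "c d", "d a"]
splits = split_input(lines, 2)
worker = FlakyWorker(fail_once_on={1})
outputs = run_with_reexecution(splits, worker)
print(dict(worker.attempts))
```

Re-execution is safe in this sketch because each map task depends only on its own input split and has no side effects, so running it again produces the same output.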

Referenced by (17)

Full triples — surface form annotated when it differs from this entity's canonical label.

Hadoop hasComponent MapReduce
Hadoop processingLayer MapReduce
Jeff Dean notableWork MapReduce
Jeff Dean notablePublication MapReduce (surface form: MapReduce: Simplified Data Processing on Large Clusters)
Jeffrey Dean knownFor MapReduce
Jeffrey Dean workedOn MapReduce (surface form: MapReduce programming model)
Jeffrey Dean coAuthorOf MapReduce (surface form: MapReduce: Simplified Data Processing on Large Clusters)
YARN supportsFramework MapReduce
MapReduce describedIn MapReduce (self-link; surface form: MapReduce: Simplified Data Processing on Large Clusters)
Apache HBase integratesWith MapReduce (surface form: Apache MapReduce)
Google MapReduce paperTitle MapReduce (surface form: MapReduce: Simplified Data Processing on Large Clusters)
HDFS usedBy MapReduce
Apache Pig programmingModel MapReduce
Apache Pig executionEngine MapReduce
Sanjay Ghemawat coDesignerOf MapReduce
Sanjay Ghemawat notableWork MapReduce (surface form: MapReduce: Simplified Data Processing on Large Clusters)