MapReduce

E185673

distributed computing framework, parallel computing model, programming model

MapReduce is a programming model and processing framework for distributed computation of large data sets across clusters of computers.

All labels observed (4)

Label	Occurrences
MapReduce (canonical)	10
MapReduce: Simplified Data Processing on Large Clusters	5
Apache MapReduce	1
MapReduce programming model	1

How this entity was disambiguated

This entity first appeared as the object of triple T1647831; resolving that mention is where its identity was fixed. The disambiguator weighed the candidate entities listed below and picked the one marked "chosen" (or "None of the above", minting a new entity). This is how homonymy is resolved: the same surface form can point to different entities.

NED1 Entity disambiguation (via context triple) gpt-5-mini-2025-08-07

Target entity: MapReduce
Context triple: [Hadoop, hasComponent, MapReduce]

A. Hadoop
Hadoop is an open-source framework that enables distributed storage and parallel processing of large data sets across clusters of commodity hardware.
B. Google Cloud Dataflow
Google Cloud Dataflow is a fully managed service for developing and executing batch and streaming data processing pipelines, based on Apache Beam, within the Google Cloud ecosystem.
C. Apache Mesos
Apache Mesos is an open-source cluster manager that abstracts CPU, memory, storage, and other resources away from machines to enable efficient deployment and scaling of distributed applications and frameworks.
D. Google Cloud Dataproc
Google Cloud Dataproc is a managed cloud service for running Apache Hadoop, Spark, and other big data workloads on scalable, automated clusters in Google Cloud.
E. Paxos consensus algorithm
The Paxos consensus algorithm is a fault-tolerant protocol for achieving agreement among distributed systems, widely used as a foundation for reliable, replicated state machines and modern distributed databases.
F. None of the above. (chosen)
G. Unsure - the case is ambiguous/there is not enough information to decide.

NED2 Entity disambiguation (via description) gpt-5-mini-2025-08-07

Target entity: MapReduce
Target entity description: MapReduce is a programming model and processing framework for distributed computation of large data sets across clusters of computers.

A. Hadoop
Hadoop is an open-source framework that enables distributed storage and parallel processing of large data sets across clusters of commodity hardware.
B. Google Cloud Dataflow
Google Cloud Dataflow is a fully managed service for developing and executing batch and streaming data processing pipelines, based on Apache Beam, within the Google Cloud ecosystem.
C. Apache Mesos
Apache Mesos is an open-source cluster manager that abstracts CPU, memory, storage, and other resources away from machines to enable efficient deployment and scaling of distributed applications and frameworks.
D. Google Cloud Dataproc
Google Cloud Dataproc is a managed cloud service for running Apache Hadoop, Spark, and other big data workloads on scalable, automated clusters in Google Cloud.
E. Paxos consensus algorithm
The Paxos consensus algorithm is a fault-tolerant protocol for achieving agreement among distributed systems, widely used as a foundation for reliable, replicated state machines and modern distributed databases.
F. None of the above. (chosen)

Statements (50)

Predicate	Object
instanceOf	distributed computing framework; parallel computing model; programming model
abstractsAway	details of data distribution; details of fault tolerance; details of parallelization
basedOn	map function; reduce function
category	big data technology; distributed data processing framework
commonlyUsedWith	Google File System; HDFS (surface form: Hadoop Distributed File System)
dataLocalityStrategy	move computation to data
dataModel	key-value pairs
describedBy	Jeffrey Dean; Sanjay Ghemawat
describedIn	MapReduce (self-link; surface form: MapReduce: Simplified Data Processing on Large Clusters)
designedFor	fault-tolerant distributed processing
developer	Google
executionModel	batch processing
faultToleranceMechanism	re-execution of failed tasks
handles	automatic data distribution; automatic fault recovery; task scheduling
hasComponent	Map phase; Reduce phase; Shuffle phase; Sort phase
implementedIn	Google internal infrastructure
influenced	Hadoop (surface form: Apache Hadoop MapReduce); Apache Spark; Dryad; FlumeJava
inspiredBy	functional programming
jobInput	input splits
jobOutput	output files in distributed file system
publicationYear	2004
purpose	batch data processing; distributed computation; processing large data sets
runsOn	cluster of commodity hardware
scalesTo	petabytes of data; thousands of machines
supports	data parallelism; task parallelism
usedFor	ETL workloads; data mining; index building; log processing; machine learning preprocessing
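
The basedOn, dataModel, and hasComponent statements above can be illustrated with a small word-count job. This is a minimal single-process sketch, not Google's or Hadoop's implementation: the names `map_words`, `reduce_counts`, and `run_job` are hypothetical, the input splits are plain in-memory strings, and the distribution, scheduling, and fault recovery that a real framework handles are left out entirely.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(_split_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map function (hypothetical): emit a (word, 1) pair for every word."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce function (hypothetical): sum all counts emitted for one key."""
    return word, sum(counts)


def run_job(
    input_splits: Dict[str, str],
    map_fn: Callable[[str, str], Iterable[Tuple[str, int]]],
    reduce_fn: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> List[Tuple[str, int]]:
    """Run the four phases in one process (assumption: no cluster involved)."""
    # Map phase: apply map_fn to every input split independently.
    intermediate: List[Tuple[str, int]] = []
    for split_id, text in input_splits.items():
        intermediate.extend(map_fn(split_id, text))

    # Shuffle phase: group intermediate values by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Sort phase, then Reduce phase: visit keys in sorted order
    # and apply reduce_fn to each key's list of values.
    return [reduce_fn(key, groups[key]) for key in sorted(groups)]


splits = {
    "split-0": "the quick brown fox",
    "split-1": "the lazy dog the end",
}
result = run_job(splits, map_words, reduce_counts)
print(result)
```

Because each call to the map function depends only on its own split, and each call to the reduce function only on one key's values, a framework is free to run those calls on different machines and to re-execute any call that fails.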

Referenced by (17)

Full triples — surface form annotated when it differs from this entity's canonical label.

Hadoop → hasComponent → MapReduce

Hadoop → processingLayer → MapReduce

Jeff Dean → notableWork → MapReduce

Jeff Dean → notablePublication → MapReduce