Google MapReduce

E185683

data processing framework, distributed computing framework, programming model

Google MapReduce is a programming model and processing framework developed by Google for large-scale distributed data processing across clusters of commodity hardware.

All labels observed (2)

Label	Occurrences
Google MapReduce (canonical)	1
MapReduce	1

How this entity was disambiguated

This entity first appeared as the object of triple T1647862; resolving that mention is where its identity was fixed. The disambiguator weighed the candidate entities listed below and picked the one marked "chosen" (or "None of the above", which mints a new entity). This is how homonymy is resolved: the same surface form can point to different entities.

NED1 Entity disambiguation (via context triple) gpt-5-mini-2025-08-07

Target entity: Google MapReduce
Context triple: [Hadoop, initiallyInspiredBy, Google MapReduce]

A. Hadoop
Hadoop is an open-source framework that enables distributed storage and parallel processing of large data sets across clusters of commodity hardware.
B. Google Earth
Google Earth is a virtual globe and mapping application that lets users explore detailed satellite imagery, 3D terrain, and geographic information for locations around the world.
C. Google Maps
Google Maps is a web-based mapping and navigation service by Google that provides detailed maps, real-time GPS navigation, traffic conditions, and location search worldwide.
D. The Anatomy of a Large-Scale Hypertextual Web Search Engine
"The Anatomy of a Large-Scale Hypertextual Web Search Engine" is a seminal research paper by Sergey Brin and Larry Page that introduced the design and PageRank algorithm behind the early Google search engine.
E. Google Cloud Dataflow
Google Cloud Dataflow is a fully managed service for developing and executing batch and streaming data processing pipelines, based on Apache Beam, within the Google Cloud ecosystem.
F. None of the above. (chosen)
G. Unsure - the case is ambiguous/there is not enough information to decide.

NED2 Entity disambiguation (via description) gpt-5-mini-2025-08-07

Target entity: Google MapReduce
Target entity description: Google MapReduce is a programming model and processing framework developed by Google for large-scale distributed data processing across clusters of commodity hardware.

A. Hadoop
Hadoop is an open-source framework that enables distributed storage and parallel processing of large data sets across clusters of commodity hardware.
B. Google Earth
Google Earth is a virtual globe and mapping application that lets users explore detailed satellite imagery, 3D terrain, and geographic information for locations around the world.
C. Google Maps
Google Maps is a web-based mapping and navigation service by Google that provides detailed maps, real-time GPS navigation, traffic conditions, and location search worldwide.
D. The Anatomy of a Large-Scale Hypertextual Web Search Engine
"The Anatomy of a Large-Scale Hypertextual Web Search Engine" is a seminal research paper by Sergey Brin and Larry Page that introduced the design and PageRank algorithm behind the early Google search engine.
E. Google Cloud Dataflow
Google Cloud Dataflow is a fully managed service for developing and executing batch and streaming data processing pipelines, based on Apache Beam, within the Google Cloud ecosystem.
F. None of the above. (chosen)

Statements (50)

Predicate	Object
instanceOf	data processing framework; distributed computing framework; programming model
abstractsFromProgrammer	details of data distribution; details of fault tolerance; details of parallelization
basedOn	map function; reduce function
category	big data technology; distributed data processing
dataLocalityStrategy	moving computation to data
designedFor	batch processing; distributed data processing; large-scale data processing
developer	Google
executionModel	map phase followed by reduce phase
faultToleranceMechanism	re-execution of failed tasks
firstPublicationYear	2004
handles	automatic task scheduling; data partitioning; intermediate data shuffling; load balancing; re-execution of failed tasks
influenced	Hadoop (surface form: Apache Hadoop MapReduce); Apache Spark (surface form: Apache Spark programming model); Dryad; FlumeJava
inputDataModel	key-value pairs
inspiredBy	functional programming
notableFeature	automatic handling of machine failures; simple programming interface for large clusters
operatesOn	clusters of commodity hardware
outputDataModel	key-value pairs
paperAuthors	Jeffrey Dean; Sanjay Ghemawat
paperTitle	MapReduce (surface form: MapReduce: Simplified Data Processing on Large Clusters)
publishedBy	Google
schedulingUnit	map task; reduce task
supports	automatic parallelization; data locality optimization; fault tolerance; scalability across thousands of machines
usedFor	data mining; indexing web pages; log analysis; machine learning preprocessing
usedWithin	Google indexing systems; Google log processing pipelines; Google search infrastructure
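
The basedOn, executionModel, inputDataModel and outputDataModel statements above can be illustrated with a minimal single-process sketch of a word count. This is an illustration of the programming model only, not Google's implementation: the names map_words, reduce_counts and run_mapreduce are hypothetical, and the sketch omits everything the real framework handles across a cluster (task scheduling, data partitioning, data locality, re-execution of failed tasks).

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator

# Hypothetical user-supplied map function: takes one input key-value pair
# (document id, text) and emits intermediate (word, 1) pairs.
def map_words(doc_id: str, text: str) -> Iterator[tuple[str, int]]:
    for word in text.lower().split():
        yield word, 1

# Hypothetical user-supplied reduce function: takes one intermediate key and
# all values grouped under it, and emits one output key-value pair.
def reduce_counts(word: str, counts: Iterable[int]) -> tuple[str, int]:
    return word, sum(counts)

# Toy driver (an assumption for illustration): runs the map phase, groups
# intermediate values by key (the "shuffle"), then runs the reduce phase.
def run_mapreduce(
    records: Iterable[tuple[str, str]],
    mapper: Callable[[str, str], Iterator[tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], tuple[str, int]],
) -> dict[str, int]:
    intermediate: dict[str, list[int]] = defaultdict(list)
    for key, value in records:                      # map phase
        for out_key, out_value in mapper(key, value):
            intermediate[out_key].append(out_value)  # shuffle: group by key
    # reduce phase: one reducer call per distinct intermediate key
    return dict(reducer(k, vs) for k, vs in sorted(intermediate.items()))

docs = [("d1", "the quick brown fox"), ("d2", "the lazy dog the end")]
result = run_mapreduce(docs, map_words, reduce_counts)
print(result)
```

In a distributed setting each mapper call and each reducer call would correspond to work inside a map task or reduce task (the schedulingUnit values above); here they simply run in a loop on one machine.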

Referenced by (2)

Full triples: the surface form is annotated when it differs from this entity's canonical label.

Hadoop → initiallyInspiredBy → Google MapReduce

Sanjay Ghemawat → knownFor → Google MapReduce (this entity's surface form: MapReduce)