Apache Flink

E188935

big data framework, distributed data processing framework, open-source software, stream processing framework

Apache Flink is an open-source distributed stream-processing framework designed for high-throughput, low-latency data processing and real-time analytics on large-scale data.


All labels observed (1)

Label	Occurrences
Apache Flink (canonical)	7

How this entity was disambiguated

This entity first appeared as the object of triple T1647851, and resolving that mention is where its identity was fixed. The disambiguator weighed the candidate entities listed below and picked the one marked "chosen" (or "None of the above", which mints a new entity). This is how homonymy is resolved: the same surface form can point to different entities.
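The decision rule described above (reuse the chosen candidate, or mint a new entity when the verdict is "None of the above") can be sketched in a few lines. This is a minimal illustration under stated assumptions: `Entity`, `EntityStore`, `resolve_mention`, and the sequential `E1`, `E2`, ... ID scheme are hypothetical, not this site's implementation, and the "Unsure" option is not modeled because the page does not say how it is handled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Entity:
    entity_id: str
    label: str
    description: str


@dataclass
class EntityStore:
    """Hypothetical in-memory entity store; the real storage and ID scheme are not shown on this page."""
    entities: Dict[str, Entity] = field(default_factory=dict)
    next_number: int = 1

    def mint(self, label: str, description: str) -> Entity:
        """Create a new entity with the next sequential (hypothetical) ID."""
        entity = Entity(f"E{self.next_number}", label, description)
        self.entities[entity.entity_id] = entity
        self.next_number += 1
        return entity


def resolve_mention(store: EntityStore, candidates: List[Entity],
                    choice: Optional[str], label: str, description: str) -> Entity:
    """Return the candidate whose ID the disambiguator chose; choice=None means
    "None of the above", so a new entity is minted instead."""
    if choice is None:
        return store.mint(label, description)
    for candidate in candidates:
        if candidate.entity_id == choice:
            return candidate
    raise ValueError(f"choice {choice!r} is not one of the candidates")


store = EntityStore()
spark = store.mint("Apache Spark", "Distributed data processing engine.")
storm = store.mint("Apache Storm", "Distributed real-time computation system.")
# The verdict for the mention in [Hadoop, influenced, Apache Flink] was "None of the above".
flink = resolve_mention(store, [spark, storm], None, "Apache Flink",
                        "Distributed stream-processing framework.")
print(flink.entity_id)  # E3
```

Because a new entity is minted only on a "None" verdict, two mentions with the same surface form can end up as distinct entities, which is the homonymy case the paragraph describes.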

NED1: Entity disambiguation via context triple (model: gpt-5-mini-2025-08-07)

Target entity: Apache Flink
Context triple: [Hadoop, influenced, Apache Flink]

A. Apache Spark
Apache Spark is an open-source, distributed data processing engine designed for large-scale data analytics, machine learning, and stream processing.
B. Apache Storm
Apache Storm is a distributed real-time computation system designed for processing large streams of data with low latency and high fault tolerance.
C. Apache Flume
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log and event data into Hadoop and other data stores.
D. Hadoop
Hadoop is an open-source framework that enables distributed storage and parallel processing of large data sets across clusters of commodity hardware.
E. Google Cloud Dataflow
Google Cloud Dataflow is a fully managed service for developing and executing batch and streaming data processing pipelines, based on Apache Beam, within the Google Cloud ecosystem.
F. None of the above. (chosen)
G. Unsure: the case is ambiguous, or there is not enough information to decide.

NED2: Entity disambiguation via description (model: gpt-5-mini-2025-08-07)

Target entity: Apache Flink
Target entity description: Apache Flink is an open-source distributed stream-processing framework designed for high-throughput, low-latency data processing and real-time analytics on large-scale data.

A. Apache Spark
Apache Spark is an open-source, distributed data processing engine designed for large-scale data analytics, machine learning, and stream processing.
B. Apache Storm
Apache Storm is a distributed real-time computation system designed for processing large streams of data with low latency and high fault tolerance.
C. Apache Flume
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log and event data into Hadoop and other data stores.
D. Hadoop
Hadoop is an open-source framework that enables distributed storage and parallel processing of large data sets across clusters of commodity hardware.
E. Google Cloud Dataflow
Google Cloud Dataflow is a fully managed service for developing and executing batch and streaming data processing pipelines, based on Apache Beam, within the Google Cloud ecosystem.
F. None of the above. (chosen)

Statements (69)

Predicate	Object
instanceOf	big data framework, distributed data processing framework, open-source software, stream processing framework
designedFor	high-throughput workloads, large-scale data processing, low-latency workloads
developer	Apache Software Foundation
hasAPI	DataSet API, DataStream API, SQL API, Table API
hasComponent	Dispatcher, JobManager, ResourceManager, TaskManager
integratesWith	Amazon Kinesis, Apache HBase, Hadoop (surface form: Apache Hadoop), Apache Hive, Apache Kafka, JDBC data sources
license	Apache License 2.0
partOf	Apache ecosystem
programmingLanguage	Java, Scala
repository	https://github.com/apache/flink
supportsDeployment	Kubernetes, Mesos, YARN, cloud environments, native Kubernetes deployment, standalone cluster
supportsFeature	SQL-based analytics, batch table API, checkpointing, complex event processing, connectors to external systems, data stream API, event-time windows, exactly-once state consistency, fault tolerance, high-throughput processing, iterative processing, low-latency processing, savepoints, state backends, stateful functions, streaming table API, table API, watermarks
supportsLanguage	Java, Python, SQL, Scala
supportsModel	batch processing, event-time processing, stateful stream processing, stream processing
supportsSemantic	at-least-once, exactly-once
useCase	ETL pipelines, event-driven applications, fraud detection, log processing, machine learning pipelines, monitoring and alerting, real-time analytics
website	https://flink.apache.org/
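The statements list event-time windows and watermarks among Flink's features. As a rough illustration of how the two interact, here is a plain-Python simulation, not Flink's API: the function name `tumbling_window_counts`, the window size, and the "largest event time seen minus a fixed `max_delay`" watermark rule are assumptions chosen for illustration. Events arrive out of order, a tumbling window is emitted once the watermark reaches its end, and events that arrive for an already-emitted window are counted as dropped. Flink's own watermark generators, window triggers, and allowed-lateness handling differ in detail.

```python
from collections import defaultdict


def tumbling_window_counts(events, size, max_delay):
    """Count events per key in tumbling event-time windows [start, start + size).

    events: iterable of (event_time, key) in arrival order, possibly out of order.
    max_delay: assumed bound on out-of-orderness; watermark = max event time seen - max_delay.
    Returns (fired, dropped): fired lists (window_start, key, count) in emission order,
    dropped counts events whose window had already been emitted.
    """
    open_windows = defaultdict(lambda: defaultdict(int))  # window_start -> key -> count
    watermark = float("-inf")
    fired, dropped = [], 0
    for ts, key in events:
        start = ts - ts % size
        if start + size <= watermark:
            dropped += 1  # this window was already emitted; the event is too late
            continue
        open_windows[start][key] += 1
        watermark = max(watermark, ts - max_delay)
        # Emit every open window whose end the watermark has reached.
        for s in sorted(w for w in open_windows if w + size <= watermark):
            for k, n in sorted(open_windows.pop(s).items()):
                fired.append((s, k, n))
    # End of input: treat the watermark as +infinity and flush the remaining windows.
    for s in sorted(open_windows):
        for k, n in sorted(open_windows[s].items()):
            fired.append((s, k, n))
    return fired, dropped


events = [(1, "a"), (4, "b"), (12, "a"), (7, "a"), (25, "a"), (3, "b"), (28, "b")]
fired, dropped = tumbling_window_counts(events, size=10, max_delay=5)
print(fired)    # [(0, 'a', 2), (0, 'b', 1), (10, 'a', 1), (20, 'a', 1), (20, 'b', 1)]
print(dropped)  # 1
```

The watermark trades latency for completeness: with `max_delay=5` the event at time 3 arrives after window [0, 10) has been emitted and is dropped, while a larger bound such as `max_delay=25` keeps it but holds every window open longer.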

How these facts were elicited

Referenced by (7)

Full triples, with the surface form annotated when it differs from this entity's canonical label.

Hadoop → influenced → Apache Flink

ORC → usedIn → Apache Flink

Google Cloud Dataproc → supportsFramework → Apache Flink

Avro → usedWith → Apache Flink

YARN → supportsFramework → Apache Flink

Apache Storm → competesWith → Apache Flink

Apache Mahout → integratesWith → Apache Flink
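The display convention for these triples (print the canonical label, and add the surface form only when it differs) can be sketched as follows. The `Triple` type and `render` function are hypothetical illustrations, not this site's code; the second example reuses the Hadoop / "Apache Hadoop" case from the `integratesWith` statement.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str                            # canonical label of the object entity
    surface_form: Optional[str] = None  # text as written where the triple was elicited


def render(triple: Triple) -> str:
    """Format a triple, annotating the surface form only when it differs from the canonical label."""
    obj = triple.obj
    if triple.surface_form is not None and triple.surface_form != triple.obj:
        obj += f" (surface form: {triple.surface_form})"
    return f"{triple.subject} → {triple.predicate} → {obj}"


print(render(Triple("Hadoop", "influenced", "Apache Flink")))
# Hadoop → influenced → Apache Flink
print(render(Triple("Apache Flink", "integratesWith", "Hadoop", "Apache Hadoop")))
# Apache Flink → integratesWith → Hadoop (surface form: Apache Hadoop)
```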
