Apache Spark

E185661

Apache Spark is an open-source, distributed data processing engine designed for large-scale data analytics, machine learning, and stream processing.

Statements (91)

Predicate Object
instanceOf big data framework
cluster computing framework
distributed data processing engine
open-source software
abbreviation Apache Spark (self-link; surface differs)
surface form: RDD
architecture master-slave architecture
canRunOn Apache Mesos
YARN
surface form: Hadoop YARN

Kubernetes
standalone cluster manager
canUseStorage Amazon S3
Azure Data Lake Storage
Google Cloud Storage
HDFS
surface form: Hadoop Distributed File System

local file system
category big data analytics
data engineering
machine learning platform
stream processing framework
component GraphX
Apache Spark (self-link; surface differs)
surface form: MLlib

PySpark
Spark Core

Apache Spark (self-link; surface differs)
surface form: Spark SQL

Apache Spark (self-link; surface differs)
surface form: Spark Streaming

Apache Spark (self-link; surface differs)
surface form: SparkR

Structured Streaming
coreAbstraction Apache Spark (self-link; surface differs)
surface form: Resilient Distributed Dataset
designedFor batch processing
interactive data analytics
large-scale data processing
machine learning workloads
stream processing
developer Apache Software Foundation
donatedTo Apache Software Foundation
donationYear 2013
executionModel in-memory computing
hasComponent cluster manager
driver program
executors
initialReleaseDate 2010
integratesWith Apache Cassandra
Apache HBase
Hadoop
surface form: Apache Hadoop

Apache Hive
Apache Kafka
JDBC data sources
license Apache License 2.0
optimizedFor in-memory data processing
originatedAt UC Berkeley AMPLab
programmingLanguage Java
Python
R
SQL
Scala
provides Catalyst query optimizer
Tungsten execution engine
high-level APIs
low-level RDD API
schedulingUnit job
stage
task
supports SQL queries
batch processing
data parallelism
distributed computing
fault tolerance
graph processing
lazy evaluation
machine learning algorithms
stream processing
task parallelism
supportsAbstraction DataFrame
Dataset
supportsDeployment cloud environments
on-premises clusters
supportsLanguageAPI Java API
PySpark
Scala
surface form: Scala API

Apache Spark (self-link; surface differs)
surface form: Spark SQL

Apache Spark (self-link; surface differs)
surface form: SparkR
topLevelProjectSince 2014
useCase ETL pipelines
data warehousing
graph analytics
log processing
real-time analytics
recommendation systems
website https://spark.apache.org
writtenIn Java
Scala

Referenced by (36)

Full triples — surface form annotated when it differs from this entity's canonical label.

Azure Synapse Analytics supports Apache Spark
Hadoop influenced Apache Spark
Scala ecosystem Apache Spark
KMeans implementedIn Apache Spark
this entity surface form: Apache Spark MLlib
AWS Glue programmingModel Apache Spark
ORC usedIn Apache Spark
Avro usedWith Apache Spark
Apache Mesos supportsFramework Apache Spark
Apache Spark supportsLanguageAPI Apache Spark (self-link; surface differs)
this entity surface form: SparkR
Apache Spark supportsLanguageAPI Apache Spark (self-link; surface differs)
this entity surface form: Spark SQL
Apache Spark coreAbstraction Apache Spark (self-link; surface differs)
this entity surface form: Resilient Distributed Dataset
Apache Spark abbreviation Apache Spark (self-link; surface differs)
this entity surface form: RDD
Apache Spark component Apache Spark (self-link; surface differs)
this entity surface form: Spark SQL
Apache Spark component Apache Spark (self-link; surface differs)
this entity surface form: Spark Streaming
Apache Spark component Apache Spark (self-link; surface differs)
this entity surface form: MLlib
Apache Spark component Apache Spark (self-link; surface differs)
this entity surface form: SparkR
Synapse Studio supports Apache Spark
YARN supportsFramework Apache Spark
MapReduce influenced Apache Spark
Apache Storm competesWith Apache Spark
this entity surface form: Apache Spark Streaming
Apache Hive runsOn Apache Spark
Apache HBase integratesWith Apache Spark
Apache Mahout integratesWith Apache Spark
Google MapReduce influenced Apache Spark
this entity surface form: Apache Spark programming model
HDFS usedBy Apache Spark
Apache Pig executionEngine Apache Spark
this entity surface form: Spark
Apache Pig comparedWith Apache Spark
this entity surface form: Apache Spark SQL
NVIDIA RAPIDS integratesWith Apache Spark
Databricks coreTechnology Apache Spark
ASF governs Apache Spark
subject surface form: Apache Software Foundation
ASF hasKeyProject Apache Spark
subject surface form: Apache Software Foundation
ApacheCon isRelatedTo Apache Spark
Cloudera usesTechnology Apache Spark