PySpark
E702182
PySpark is the Python API for Apache Spark, enabling large-scale data processing, analysis, and machine learning using Python.
All labels observed (1)
| Label | Occurrences |
|---|---|
| PySpark (canonical) | 2 |
Statements (65)
| Predicate | Object |
|---|---|
| instanceOf | Apache Spark component, Python API, big data framework component, software library |
| canBeUsedWith | AWS EMR, Azure Synapse Analytics, Databricks, Google Dataproc |
| compatibleWith | Hadoop ecosystem |
| developer | Apache Software Foundation |
| documentation | https://spark.apache.org/docs/latest/api/python/ |
| hasComponent | pyspark.RDD, pyspark.accumulators, pyspark.broadcast, pyspark.conf, pyspark.context, pyspark.files, pyspark.ml, pyspark.mllib, pyspark.profiler, pyspark.resource, pyspark.serializers, pyspark.sql, pyspark.sql.DataFrame, pyspark.sql.SparkSession, pyspark.sql.Window, pyspark.sql.functions, pyspark.sql.types, pyspark.storagelevel, pyspark.streaming, pyspark.taskcontext |
| interoperatesWith | Jupyter Notebook, NumPy, pandas, scikit-learn |
| license | Apache License 2.0 |
| partOf | Apache Spark |
| programmingLanguage | Python |
| repository | https://github.com/apache/spark |
| runsOn | Apache Spark cluster, Kubernetes, YARN, standalone Spark cluster |
| supports | ETL workloads, SQL queries, batch processing, distributed computing, graph processing, large-scale data processing, machine learning, stream processing |
| supportsLanguageFeature | DataFrame API, RDD API, SQL API, pandas user-defined functions, structured streaming, user-defined functions |
| typicalUseCase | data analytics, data engineering, data warehousing, machine learning pipelines |
| uses | Spark Core, Spark MLlib, Spark SQL engine, Spark Streaming |
Referenced by (2)
Full triples — surface form annotated when it differs from this entity's canonical label.