Google Cloud Dataproc

E100303

Google Cloud Dataproc is a managed cloud service for running Apache Hadoop, Spark, and other big data workloads on scalable, automated clusters in Google Cloud.

All labels observed (3)

Label                              Occurrences
Amazon EMR                         1
Dataproc                           1
Google Cloud Dataproc (canonical)  1

Statements (67)

Predicate            Object
instanceOf           Google Cloud Platform service
                     big data processing service
                     managed cloud service
billingModel         pay-as-you-go
                     per-second billing
clusterType          high availability
                     single node
                     standard
deploymentModel      managed cluster
                     serverless
developer            Google
feature              autoscaling
                     autoscaling policies
                     component gateway
                     custom images
                     ephemeral clusters
                     high availability clusters
                     initialization actions
                     integrated logging
                     integrated monitoring
                     job-level scheduling
                     preemptible worker nodes
                     workflow templates
integratesWith       Google BigQuery (surface form: BigQuery)
                     Bigtable (surface form: Cloud Bigtable)
                     Cloud Composer
                     Cloud IAM
                     Cloud Key Management Service (surface form: Cloud KMS)
                     Cloud Logging
                     Cloud Monitoring
                     Google Cloud Pub/Sub (surface form: Cloud Pub/Sub)
                     Google Cloud Storage
                     VPC networks
                     Vertex AI
managementInterface  Google Cloud Console
                     REST API
                     client libraries
                     gcloud CLI
partOf               Google Cloud (surface form: Google Cloud Platform)
regionAvailability   multiple Google Cloud regions
securityFeature      IAM-based access control
                     VPC Service Controls
                     encryption at rest
                     encryption in transit
supports             long-running clusters
                     on-demand jobs
                     scheduled jobs
                     short-lived clusters
supportsFramework    Apache Flink
                     Hadoop (surface form: Apache Hadoop)
                     Apache Hive
                     Apache Pig
                     Apache Spark
                     Project Jupyter (surface form: Jupyter)
                     Presto
supportsLanguage     Java
                     Python
                     SQL
                     Scala
supportsStorage      Google BigQuery (surface form: BigQuery connector)
                     Google Cloud Storage (surface form: Cloud Storage connector)
                     HDFS
useCase              ETL workloads
                     batch data processing
                     data warehousing
                     log processing
                     machine learning pipelines
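
Illustrative sketch (not part of the recorded statements): the three clusterType values listed above differ mainly in how many master and worker instances a cluster requests. The Python below builds a plain dictionary for each type. The helper name build_cluster_config and its arguments are hypothetical. The field names (master_config, worker_config, software_config, num_instances) follow the snake_case cluster shape used by the Dataproc v1 Python client as best understood here, and the instance counts (one master for standard and single node, three masters for high availability, a zero-worker property for single node) are stated assumptions to be checked against the current Dataproc documentation.

```python
from typing import Any, Dict

# Assumed master counts per cluster type (hypothetical lookup table).
MASTERS_BY_TYPE = {
    "standard": 1,
    "single node": 1,
    "high availability": 3,
}


def build_cluster_config(
    project_id: str,
    cluster_name: str,
    cluster_type: str = "standard",
    num_workers: int = 2,
) -> Dict[str, Any]:
    """Return a cluster description dict for the given cluster type.

    Hypothetical helper for illustration only; it performs no API calls.
    """
    if cluster_type not in MASTERS_BY_TYPE:
        raise ValueError(f"unknown cluster type: {cluster_type!r}")

    config: Dict[str, Any] = {
        "master_config": {"num_instances": MASTERS_BY_TYPE[cluster_type]},
    }

    if cluster_type == "single node":
        # Assumption: a single node cluster has no workers and is requested
        # through this cluster property.
        config["software_config"] = {
            "properties": {"dataproc:dataproc.allow.zero.workers": "true"}
        }
    else:
        config["worker_config"] = {"num_instances": num_workers}

    return {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": config,
    }


if __name__ == "__main__":
    for kind in MASTERS_BY_TYPE:
        print(kind, build_cluster_config("example-project", "demo", kind)["config"])
```

A dictionary of this shape is the kind of value that would be passed as the cluster field of a create-cluster request through the client libraries or REST API listed under managementInterface; the project and cluster names here are placeholders.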

Referenced by (3)

Full triples — surface form annotated when it differs from this entity's canonical label.

Google BigQuery integratesWith Google Cloud Dataproc
Google Cloud hasComponent Google Cloud Dataproc (this entity's surface form: Dataproc)
Amazon Web Services offersService Google Cloud Dataproc (this entity's surface form: Amazon EMR)