Apache Pig

E187922

Apache Pig is a high-level platform for writing data analysis programs in its Pig Latin language. Scripts run on Hadoop, compiling to MapReduce, Tez, or Spark jobs, to analyze large data sets.

All labels observed (1)

Label Occurrences
Apache Pig canonical 4


Statements (49)

Predicate Object
instanceOf Apache Software Foundation project
big data tool
data processing platform
high-level programming language
abstractionLevel high-level
comparedWith Apache Hive
Apache Spark
surface form: Apache Spark SQL
designedFor batch processing
parallel data processing
developedBy Apache Software Foundation
ecosystem Hadoop
surface form: Hadoop ecosystem
executionEngine MapReduce
Apache Spark
surface form: Spark
Tez
hasComponent Pig Latin
hasFeature automatic optimization of execution plans
extensibility via UDFs
logical and physical execution plans
inputFormat semi-structured data
structured data
unstructured data
integratesWith Apache HBase
surface form: HBase
HDFS
Apache Hive
surface form: Hive
YARN
language Pig Latin
license Apache License 2.0
openSource true
paradigm data flow programming
PigLatin data flow language
programmingModel MapReduce
purpose analyzing large data sets
simplifying MapReduce programming
repository https://pig.apache.org/
runsOn Hadoop
supports MapReduce mode execution
data aggregation
data filtering
data joining
data transformation
local mode execution
schema-on-read
user-defined functions
targetUser data analysts
data engineers
useCase ETL pipelines
data preparation for analytics
log processing
writtenIn Java
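
The filtering, grouping, and aggregation operations listed above, applied to the log-processing use case, can be sketched as a small data flow. This is a plain-Python analogy, not Pig's API: the sample records, field names (host, level, bytes), and helper functions are all hypothetical, and the comments name the Pig Latin operators (FILTER, GROUP, FOREACH ... GENERATE) that each step roughly corresponds to.

```python
from collections import defaultdict

# Hypothetical sample records; the field names are illustrative only.
LOGS = [
    {"host": "web1", "level": "ERROR", "bytes": 120},
    {"host": "web1", "level": "INFO", "bytes": 300},
    {"host": "web2", "level": "ERROR", "bytes": 50},
    {"host": "web1", "level": "ERROR", "bytes": 80},
]


def filter_rows(rows, predicate):
    """Keep rows that satisfy predicate (roughly Pig Latin's FILTER ... BY)."""
    return [row for row in rows if predicate(row)]


def group_rows(rows, key):
    """Collect rows under the value of one field (roughly GROUP ... BY)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def aggregate(groups):
    """Per group, compute a row count and a byte total
    (roughly FOREACH ... GENERATE with COUNT and SUM)."""
    return {
        key: {"count": len(rows), "bytes": sum(r["bytes"] for r in rows)}
        for key, rows in groups.items()
    }


def error_summary(rows):
    """Chain the steps as a data flow: filter, then group, then aggregate."""
    errors = filter_rows(rows, lambda r: r["level"] == "ERROR")
    return aggregate(group_rows(errors, "host"))


if __name__ == "__main__":
    print(error_summary(LOGS))
```

Each step takes a whole relation and returns a new one, which is the data flow style the statements attribute to Pig Latin. In Pig itself, the equivalent script would be compiled into jobs for one of the listed execution engines rather than evaluated in memory.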


Referenced by (4)

Full triples — surface form annotated when it differs from this entity's canonical label.

Hadoop ecosystemIncludes Apache Pig
Apache HBase integratesWith Apache Pig
Apache Oozie integratesWith Apache Pig