ORC

E97127

ORC (Optimized Row Columnar) is a columnar storage file format, commonly used in big data systems, that is designed for fast analytical reads and high compression.


Statements (50)

Predicate Object
instanceOf columnar storage file format
abbreviationFor Optimized Row Columnar
category big data file format
compatibleWith HDFS
cloud object stores
compressionCodec LZ4
LZO
SNAPPY
ZLIB
ZSTD
dataModel columnar
designedFor efficient columnar reads
fast analytics
high compression ratio
fileExtension .orc
fullName Optimized Row Columnar
hasFeature bloom filters
column-level statistics
file-level statistics
lazy decompression
lightweight indexes
row groups
stripes
support for complex types
type-specific encodings
license Apache License 2.0
openSource true
optimizedFor large analytical queries
read-heavy workloads
origin developed for Apache Hive
partOf Apache ORC project
stores metadata in file footer
statistics in file footer
supports ACID tables in Hive
columnar storage
compression
predicate pushdown
schema evolution
splittable files
statistics per column
statistics per stripe
usedFor big data analytics
data compression
efficient data storage
usedIn Apache Flink
Apache Hive
Apache Spark
Hadoop (surface form: Hadoop ecosystem)
Presto
Trino

Referenced by (2)

Full triples — surface form annotated when it differs from this entity's canonical label.