ORC
E97127
ORC (Optimized Row Columnar) is a columnar storage file format, commonly used in big data systems, that is designed for fast analytical reads and high compression ratios.
Statements (50)
| Predicate | Object |
|---|---|
| instanceOf | columnar storage file format |
| abbreviationFor | Optimized Row Columnar |
| category | big data file format |
| compatibleWith | HDFS, cloud object stores |
| compressionCodec | LZ4, LZO, SNAPPY, ZLIB, ZSTD |
| dataModel | columnar |
| designedFor | efficient columnar reads, fast analytics, high compression ratio |
| fileExtension | .orc |
| fullName | Optimized Row Columnar |
| hasFeature | bloom filters, column-level statistics, file-level statistics, lazy decompression, lightweight indexes, row groups, stripes, support for complex types, type-specific encodings |
| license | Apache License 2.0 |
| openSource | true |
| optimizedFor | large analytical queries, read-heavy workloads |
| origin | developed for Apache Hive |
| partOf | Apache ORC project |
| stores | metadata in file footer, statistics in file footer |
| supports | ACID tables in Hive, columnar storage, compression, predicate pushdown, schema evolution, splittable files, statistics per column, statistics per stripe |
| usedFor | big data analytics, data compression, efficient data storage |
| usedIn | Apache Flink, Apache Hive, Apache Spark, Hadoop (surface form: Hadoop ecosystem), Presto, Trino |
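
The statements above list stripes, per-stripe statistics, and predicate pushdown among ORC's features. As a rough illustration of how these can fit together, the following is a toy, in-memory sketch. It is not the ORC file layout or any ORC library's API: the `Stripe` class and the `build_stripes` and `scan_greater_than` functions are hypothetical names introduced only for this example, and it assumes a single integer column with min/max statistics.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stripe:
    """Toy stand-in for a stripe: one column's values plus min/max statistics."""
    values: List[int]
    min_value: int
    max_value: int


def build_stripes(values: List[int], stripe_size: int) -> List[Stripe]:
    """Split a column into fixed-size chunks and record min/max for each chunk."""
    stripes = []
    for start in range(0, len(values), stripe_size):
        chunk = values[start:start + stripe_size]
        stripes.append(Stripe(chunk, min(chunk), max(chunk)))
    return stripes


def scan_greater_than(stripes: List[Stripe], threshold: int) -> Tuple[List[int], int]:
    """Return the values > threshold and the number of stripes actually read."""
    matches: List[int] = []
    stripes_read = 0
    for stripe in stripes:
        # Pushdown step: if the stripe's maximum cannot satisfy the predicate,
        # skip the stripe without touching its values.
        if stripe.max_value <= threshold:
            continue
        stripes_read += 1
        matches.extend(v for v in stripe.values if v > threshold)
    return matches, stripes_read


if __name__ == "__main__":
    stripes = build_stripes(list(range(1, 101)), stripe_size=25)
    matches, stripes_read = scan_greater_than(stripes, threshold=80)
    print(len(matches), "matches;", stripes_read, "of", len(stripes), "stripes read")
```

In this sketch, only the last of the four chunks has a maximum above the threshold, so the other three are skipped on the basis of their statistics alone. How much is skipped depends on how the data is ordered: values that are clustered or sorted give tighter min/max ranges than values scattered across chunks.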
Referenced by (2)
Full triples — surface form annotated when it differs from this entity's canonical label.