ORC

E97127

columnar storage file format

ORC (Optimized Row Columnar) is a highly efficient, columnar storage file format commonly used in big data systems to enable fast analytics and compression.


All labels observed (1)

Label	Occurrences
ORC (canonical)	3

How this entity was disambiguated

This entity first appeared as the object of triple T817110, and resolving that mention fixed its identity. The disambiguator weighed the candidate entities listed below and picked the one marked "chosen" (or "None of the above", which mints a new entity). This is how homonymy is resolved: the same surface form can point to different entities.

NED1 Entity disambiguation (via context triple) gpt-5-mini-2025-08-07

Target entity: ORC
Context triple: [Amazon Redshift, supportsDataFormat, ORC]

A. OAR
OAR is the commonly used abbreviation for the Oregon Administrative Rules, which comprise the codified regulations issued by Oregon’s state agencies.
B. OAR
OAR is the commonly used acronym for the U.S. Environmental Protection Agency’s Office of Air and Radiation, which oversees national efforts to protect and improve air quality and control radiation exposure.
C. ORX
ORX is the ticker symbol used to represent the Orix Buffaloes, a professional baseball team in Japan's Nippon Professional Baseball league.
D. Orb
The Orb is a river in southern France that flows through the Occitanie region before emptying into the Mediterranean Sea.
E. ORTA
ORTA is the commonly used abbreviation for the U.S. Stevenson-Wydler Technology Innovation Act of 1980, a federal law that promotes the transfer of technology from government laboratories to the private sector.
F. None of the above. (chosen)
G. Unsure - the case is ambiguous/there is not enough information to decide.

NED2 Entity disambiguation (via description) gpt-5-mini-2025-08-07

Target entity: ORC
Target entity description: ORC (Optimized Row Columnar) is a highly efficient, columnar storage file format commonly used in big data systems to enable fast analytics and compression.

A. OAR
OAR is the commonly used abbreviation for the Oregon Administrative Rules, which comprise the codified regulations issued by Oregon’s state agencies.
B. OAR
OAR is the commonly used acronym for the U.S. Environmental Protection Agency’s Office of Air and Radiation, which oversees national efforts to protect and improve air quality and control radiation exposure.
C. ORX
ORX is the ticker symbol used to represent the Orix Buffaloes, a professional baseball team in Japan's Nippon Professional Baseball league.
D. Orb
The Orb is a river in southern France that flows through the Occitanie region before emptying into the Mediterranean Sea.
E. ORTA
ORTA is the commonly used abbreviation for the U.S. Stevenson-Wydler Technology Innovation Act of 1980, a federal law that promotes the transfer of technology from government laboratories to the private sector.
F. None of the above. (chosen)
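
The link-or-mint rule recorded in the two runs above can be sketched as follows. This is a minimal illustration rather than the pipeline's actual code: the `Candidate` type, the `resolve` function, the placeholder candidate IDs, and the counter-based ID scheme are all hypothetical names introduced here. Only the rule itself (link to the chosen candidate, or mint a new entity when "None of the above" is chosen) comes from this page.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    """A candidate entity offered to the disambiguator (hypothetical shape)."""
    entity_id: str
    label: str


def resolve(choice: Optional[int],
            candidates: List[Candidate],
            next_id: int) -> Tuple[str, bool]:
    """Return (entity_id, minted).

    choice is the index of the picked candidate, or None when the
    disambiguator answered "None of the above".
    """
    if choice is None:
        # No existing entity matches the mention: mint a fresh one.
        # The "E<number>" scheme is an assumption made for this sketch.
        return f"E{next_id}", True
    # Otherwise link the mention to the existing entity.
    return candidates[choice].entity_id, False


# Placeholder IDs: the page does not list the candidates' real identifiers.
candidates = [Candidate("X1", "OAR"), Candidate("X2", "Orb")]
entity_id, minted = resolve(None, candidates, next_id=97127)
```

With `choice=None`, as in both runs on this page, the sketch takes the minting branch; passing an index instead links the mention to an existing candidate.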

Statements (50)

Predicate	Object
instanceOf	columnar storage file format
abbreviationFor	Optimized Row Columnar
category	big data file format
compatibleWith	HDFS, cloud object stores
compressionCodec	LZ4, LZO, SNAPPY, ZLIB, ZSTD
dataModel	columnar
designedFor	efficient columnar reads, fast analytics, high compression ratio
fileExtension	.orc
fullName	Optimized Row Columnar
hasFeature	bloom filters, column-level statistics, file-level statistics, lazy decompression, lightweight indexes, row groups, stripes, support for complex types, type-specific encodings
license	Apache License 2.0
openSource	true
optimizedFor	large analytical queries, read-heavy workloads
origin	developed for Apache Hive
partOf	Apache ORC project
stores	metadata in file footer, statistics in file footer
supports	ACID tables in Hive, columnar storage, compression, predicate pushdown, schema evolution, splittable files, statistics per column, statistics per stripe
usedFor	big data analytics, data compression, efficient data storage
usedIn	Apache Flink, Apache Hive, Apache Spark, Hadoop (surface form: Hadoop ecosystem), Presto, Trino
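
Several of the statements above (stripes, statistics per stripe, predicate pushdown) work together: a reader can skip a stripe whose recorded minimum and maximum rule out any match. The toy sketch below illustrates only that idea in plain Python. It does not read or write real ORC files, and `build_stripes`, `scan_greater_than`, and the dictionary layout are hypothetical names chosen for illustration.

```python
def build_stripes(values, stripe_size):
    """Split values into fixed-size stripes, recording min/max per stripe."""
    stripes = []
    for start in range(0, len(values), stripe_size):
        chunk = values[start:start + stripe_size]
        stripes.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return stripes


def scan_greater_than(stripes, threshold):
    """Return (matching values, number of stripes actually read)."""
    matches = []
    stripes_read = 0
    for stripe in stripes:
        # Pushdown: if the stripe's maximum cannot satisfy `value > threshold`,
        # skip the stripe without touching its rows.
        if stripe["max"] <= threshold:
            continue
        stripes_read += 1
        matches.extend(v for v in stripe["rows"] if v > threshold)
    return matches, stripes_read


stripes = build_stripes(list(range(100)), stripe_size=25)
matches, stripes_read = scan_greater_than(stripes, threshold=80)
```

Here only the last of the four stripes (values 75 to 99) is read, because the maxima of the first three are at or below the threshold. In an actual ORC file the statistics live in file metadata rather than next to the rows, so the skip decision can be made before any row data is decompressed.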

How these facts were elicited

Referenced by (3)

Full triples — surface form annotated when it differs from this entity's canonical label.

Amazon Redshift → supportsDataFormat → ORC

Snowflake → supportsDataFormat → ORC

Apache Hive → supportsFileFormat → ORC
