ORC
E97127
ORC (Optimized Row Columnar) is a highly efficient, columnar storage file format commonly used in big data systems to enable fast analytics and compression.
All labels observed (1)
| Label | Occurrences |
|---|---|
| ORC canonical | 3 |
How this entity was disambiguated
This entity first appeared as the object of triple T817110; resolving that mention is where its identity was fixed. The disambiguator weighed the candidate entities below and picked the one marked "chosen" (or "None of above", minting a new entity). This is how homonymy is resolved: the same surface form can point to different entities.
Target entity: ORC
Context triple: [Amazon Redshift, supportsDataFormat, ORC]
- A. OAR: OAR is the commonly used abbreviation for the Oregon Administrative Rules, which comprise the codified regulations issued by Oregon’s state agencies.
- B. OAR: OAR is the commonly used acronym for the U.S. Environmental Protection Agency’s Office of Air and Radiation, which oversees national efforts to protect and improve air quality and control radiation exposure.
- C. ORX: ORX is the ticker symbol used to represent the Orix Buffaloes, a professional baseball team in Japan's Nippon Professional Baseball league.
- D. Orb: The Orb is a river in southern France that flows through the Occitanie region before emptying into the Mediterranean Sea.
- E. ORTA: ORTA is the commonly used abbreviation for the U.S. Stevenson-Wydler Technology Innovation Act of 1980, a federal law that promotes the transfer of technology from government laboratories to the private sector.
- F. None of above. (chosen)
- G. Unsure - the case is ambiguous/there is not enough information to decide.
Target entity: ORC
Target entity description: ORC (Optimized Row Columnar) is a highly efficient, columnar storage file format commonly used in big data systems to enable fast analytics and compression.
- A. OAR: OAR is the commonly used abbreviation for the Oregon Administrative Rules, which comprise the codified regulations issued by Oregon’s state agencies.
- B. OAR: OAR is the commonly used acronym for the U.S. Environmental Protection Agency’s Office of Air and Radiation, which oversees national efforts to protect and improve air quality and control radiation exposure.
- C. ORX: ORX is the ticker symbol used to represent the Orix Buffaloes, a professional baseball team in Japan's Nippon Professional Baseball league.
- D. Orb: The Orb is a river in southern France that flows through the Occitanie region before emptying into the Mediterranean Sea.
- E. ORTA: ORTA is the commonly used abbreviation for the U.S. Stevenson-Wydler Technology Innovation Act of 1980, a federal law that promotes the transfer of technology from government laboratories to the private sector.
- F. None of above. (chosen)
Statements (50)
| Predicate | Object |
|---|---|
| instanceOf | columnar storage file format |
| abbreviationFor | Optimized Row Columnar |
| category | big data file format |
| compatibleWith | HDFS, cloud object stores |
| compressionCodec | LZ4, LZO, SNAPPY, ZLIB, ZSTD |
| dataModel | columnar |
| designedFor | efficient columnar reads, fast analytics, high compression ratio |
| fileExtension | .orc |
| fullName | Optimized Row Columnar |
| hasFeature | bloom filters, column-level statistics, file-level statistics, lazy decompression, lightweight indexes, row groups, stripes, support for complex types, type-specific encodings |
| license | Apache License 2.0 |
| openSource | true |
| optimizedFor | large analytical queries, read-heavy workloads |
| origin | developed for Apache Hive |
| partOf | Apache ORC project |
| stores | metadata in file footer, statistics in file footer |
| supports | ACID tables in Hive, columnar storage, compression, predicate pushdown, schema evolution, splittable files, statistics per column, statistics per stripe |
| usedFor | big data analytics, data compression, efficient data storage |
| usedIn | Apache Flink, Apache Hive, Apache Spark, Hadoop (surface form: Hadoop ecosystem), Presto, Trino |
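The table above shows one row per predicate, with all objects for that predicate collected in a single cell. The following is a minimal sketch of how such a grouping could be produced, assuming each statement is a dictionary with `subject`, `predicate` and `object` keys. The function name `group_by_predicate` and the sample list are illustrative assumptions, not part of the pipeline.

```python
from collections import defaultdict


def group_by_predicate(triples):
    """Collect objects per predicate, keeping first-seen order and dropping repeats."""
    grouped = defaultdict(list)
    for triple in triples:
        objects = grouped[triple["predicate"]]
        if triple["object"] not in objects:
            objects.append(triple["object"])
    return dict(grouped)


# Hypothetical sample: a handful of the statements listed for this entity.
sample = [
    {"subject": "ORC", "predicate": "instanceOf", "object": "columnar storage file format"},
    {"subject": "ORC", "predicate": "compressionCodec", "object": "LZ4"},
    {"subject": "ORC", "predicate": "compressionCodec", "object": "ZSTD"},
    {"subject": "ORC", "predicate": "fileExtension", "object": ".orc"},
]

rows = group_by_predicate(sample)

# Render one table row per predicate, in alphabetical order of predicate.
for predicate in sorted(rows):
    print(f"| {predicate} | {', '.join(rows[predicate])} |")
```

Keeping the objects in a list rather than a set preserves the order in which statements arrived, which makes the rendered row stable across runs.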
How these facts were elicited
The pipeline generated the facts above by prompting gpt-5.1 with this entity's name + description and the instruction below.
You are a knowledge base construction expert. Given a subject entity and a description of it, return factual statements that you know for the subject as a JSON list of dictionaries(triples), where keys must be "subject", "predicate" and "object". The number of facts may be very high, between 25 to 50 or more, for very popular subjects. For less popular subjects, the number of facts can be very low, like 5 or 10.
# Requirements
- If you don't know the subject at all, return an empty list.
- If the subject is not a named entity, return an empty list.
- Include at least one triple where predicate is "instanceOf".
- Do not get too wordy.
- Separate several objects into multiple triples with one object.
Subject: ORC
Description of subject: ORC (Optimized Row Columnar) is a highly efficient, columnar storage file format commonly used in big data systems to enable fast analytics and compression.
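The instruction above fixes the shape of a response: a JSON list of dictionaries with the keys "subject", "predicate" and "object", at least one "instanceOf" triple, and an empty list when the subject is unknown. A minimal sketch of a check for those constraints follows. The function name `check_triples` and the sample response are assumptions made for illustration; whether the pipeline validates responses this way is not stated in the source.

```python
import json

REQUIRED_KEYS = {"subject", "predicate", "object"}


def check_triples(raw_json):
    """Return a list of problems found in a response; an empty list means it passes."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, list):
        return ["top-level value is not a list"]
    if not data:
        # An empty list is the permitted answer for unknown or unnamed subjects.
        return []
    problems = []
    for index, item in enumerate(data):
        # Assumption: "keys must be" is read strictly, so extra keys are rejected too.
        if not isinstance(item, dict) or set(item) != REQUIRED_KEYS:
            problems.append(f"item {index}: keys must be subject, predicate, object")
    has_instance_of = any(
        isinstance(item, dict) and item.get("predicate") == "instanceOf"
        for item in data
    )
    if not has_instance_of:
        problems.append("no instanceOf triple")
    return problems


# Hypothetical response containing two of the statements listed for ORC.
response = json.dumps([
    {"subject": "ORC", "predicate": "instanceOf", "object": "columnar storage file format"},
    {"subject": "ORC", "predicate": "fileExtension", "object": ".orc"},
])
print(check_triples(response))
```

The check covers only the structural requirements. Requirements such as "Do not get too wordy" are judgment calls that a simple validator like this does not attempt.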
Referenced by (3)
Full triples — surface form annotated when it differs from this entity's canonical label.