RCFile
E702193
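RCFile (Record Columnar File) is a columnar storage file format for Hadoop-based systems. It splits a table into row groups and stores each group column by column, so analytical queries can read only the columns they need.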
Statements (42)
| Predicate | Object |
|---|---|
| instanceOf | Hadoop file format; columnar storage file format |
| abbreviationOf | Record Columnar File |
| accessPatternOptimizedFor | scan-heavy analytical workloads |
| belongsTo | Hadoop storage formats family |
| category | big data storage format |
| compatibleWith | HDFS block structure |
| compressionGranularity | per column within a row group |
| dataLayout | columnar; row-group based |
| dataModel | table with rows and columns |
| dataOrganization | row groups followed by column chunks |
| designedFor | Hadoop-based systems; efficient data processing; efficient data querying |
| ecosystem | Apache Hadoop; Apache Hive |
| enables | reading only selected columns |
| fileExtension | .rcfile |
| fullName | Record Columnar File |
| hasComponent | key buffer; row group header; value buffer |
| improvesOver | row-oriented storage formats |
| optimizes | I/O efficiency; data compression; query performance |
| primaryGoal | balance between row and columnar storage advantages |
| runsOn | Hadoop Distributed File System |
| storageType | on-disk file format |
| stores | table data in columnar format |
| supports | MapReduce; block-level compression; column pruning; projection pushdown; schema-on-read; splittable input for MapReduce |
| tradeOff | higher write cost for faster reads |
| typicalUseCase | analytical queries; large-scale data warehousing |
| usedIn | Apache Hive; Hadoop ecosystem |
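
The statements on data layout, compression granularity, and column pruning can be illustrated with a toy sketch. This is not the actual RCFile binary format (which uses row group headers plus key and value buffers); it only assumes the general idea of row groups whose columns are stored and compressed separately. The function names `write_row_groups` and `read_columns`, the `rows_per_group` parameter, and the sample data are hypothetical, and `zlib` stands in for whatever codec a real deployment would use.

```python
import zlib


def write_row_groups(rows, rows_per_group=2):
    """Split rows into row groups; inside each group, store values column by column."""
    groups = []
    for start in range(0, len(rows), rows_per_group):
        chunk = rows[start:start + rows_per_group]
        columns = list(zip(*chunk))  # transpose the rows of this group into columns
        # Compress each column of the group independently (per-column granularity).
        groups.append([
            zlib.compress("\n".join(map(str, col)).encode("utf-8"))
            for col in columns
        ])
    return groups


def read_columns(groups, wanted):
    """Decompress only the requested column indexes from every row group."""
    out = {i: [] for i in wanted}
    for group in groups:
        for i in wanted:
            # Columns that were not requested are never decompressed.
            out[i].extend(zlib.decompress(group[i]).decode("utf-8").split("\n"))
    return out


# Hypothetical sample table: (id, name, country). Values must not contain newlines
# in this toy encoding.
rows = [(1, "alice", "US"), (2, "bob", "DE"), (3, "carol", "FR")]
groups = write_row_groups(rows)
result = read_columns(groups, wanted=[1])  # read only the "name" column
```

Because each column in a row group is compressed on its own, a reader that needs one column can skip the bytes of the others, which is the intuition behind the "reading only selected columns" and "higher write cost for faster reads" statements: the writer pays for the transpose and per-column compression up front.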
Referenced by (1)
Full triples — surface form annotated when it differs from this entity's canonical label.