WARC
E457874
WARC is a standardized file format used to store and archive web crawls and their associated metadata at scale.
Statements (48)
| Predicate | Object |
|---|---|
| instanceOf | file format; web archiving format |
| allows | linking between records via identifiers; storing multiple representations of same resource |
| associatedSoftware | Heritrix web crawler; OpenWayback; Webrecorder tools; pywb |
| compatibleWith | HTTP |
| compressionSupport | GZIP |
| designedBy | International Internet Preservation Consortium |
| designedFor | batch-oriented processing; large-scale web crawls |
| domain | digital preservation; web archiving |
| fileExtension | .warc; .warc.gz |
| fullName | Web ARChive format |
| governedBy | ISO 28500:2009; ISO 28500:2017 |
| headerFormat | text-based key-value headers |
| initialPublicationYear | 2009 |
| latestRevisionYear | 2017 |
| mediaType | application/warc |
| payloadFormat | binary or text payloads |
| predecessor | ARC file format |
| primaryUse | preserving web content at scale; storing web crawls; web archiving |
| recordIdentification | URI-based identifiers |
| recordStructure | sequence of self-contained records |
| standardizedBy | International Organization for Standardization |
| standardNumber | ISO 28500 |
| status | international standard |
| stores | HTTP request records; HTTP response records; continuation records; conversion records; metadata records; resource records; revisit records |
| supports | deduplication via revisit records; embedded metadata; long-term preservation of web content |
| usedBy | Internet Archive; national libraries; research institutions; web archiving projects |
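
The recordStructure, headerFormat, payloadFormat, and compressionSupport statements can be illustrated with a minimal sketch in Python, using only the standard library. It assumes the common layout of a version line, `Name: value` headers, a blank line, and a payload whose size is given by `Content-Length`, and it assumes that a `.warc.gz` file holds one GZIP member per record. The helper names `build_record` and `read_records` and the `example.com` URIs are hypothetical and chosen for illustration. This is not a complete or validating parser (for example, it ignores folded header lines), so real archives are better handled with an established WARC library.

```python
import gzip
import io
import uuid

CRLF = b"\r\n"


def build_record(warc_type: str, target_uri: str, payload: bytes, date: str) -> bytes:
    """Serialize one record: version line, key-value headers, blank line, payload."""
    headers = [
        ("WARC-Type", warc_type),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),  # URI-based identifier
        ("WARC-Date", date),
        ("WARC-Target-URI", target_uri),
        ("Content-Length", str(len(payload))),
    ]
    head = b"WARC/1.1" + CRLF
    head += b"".join(f"{name}: {value}".encode("utf-8") + CRLF for name, value in headers)
    # A blank line ends the headers; two CRLFs close the record.
    return head + CRLF + payload + CRLF + CRLF


def read_records(stream):
    """Yield (headers, payload) for each record in a binary stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if line.strip() == b"":
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (CRLF, b"\n", b""):
                break  # blank line: end of the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # The payload may be binary, so it is read by length, not by line.
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


# Two self-contained records, each compressed as its own GZIP member.
first = build_record("resource", "http://example.com/a", b"hello", "2017-01-01T00:00:00Z")
second = build_record("metadata", "http://example.com/a", b"k: v\r\n", "2017-01-01T00:00:01Z")
archive = gzip.compress(first) + gzip.compress(second)

with gzip.open(io.BytesIO(archive), "rb") as stream:
    for headers, payload in read_records(stream):
        print(headers["WARC-Type"], headers["WARC-Record-ID"], len(payload))
```

Because every record carries its own headers and length, a reader can process records one at a time, which suits the batch-oriented processing the format is designed for.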
Referenced by (1)