WARC

E457874

WARC (Web ARChive) is an ISO-standardized file format for storing large web crawls, together with their associated metadata, as a sequence of self-contained records.

Statements (48)

Predicate Object
instanceOf file format
web archiving format
allows linking between records via identifiers
storing multiple representations of same resource
associatedSoftware Heritrix web crawler
OpenWayback
Webrecorder tools
pywb
compatibleWith HTTP
compressionSupport GZIP
designedBy International Internet Preservation Consortium
designedFor batch-oriented processing
large-scale web crawls
domain digital preservation
web archiving
fileExtension .warc
.warc.gz
fullName Web ARChive format
governedBy ISO 28500:2009
ISO 28500:2017
headerFormat text-based key-value headers
initialPublicationYear 2009
latestRevisionYear 2017
mediaType application/warc
payloadFormat binary or text payloads
predecessor ARC file format
primaryUse preserving web content at scale
storing web crawls
web archiving
recordIdentification URI-based identifiers
recordStructure sequence of self-contained records
standardizedBy International Organization for Standardization
standardNumber ISO 28500
status international standard
stores HTTP request records
HTTP response records
continuation records
conversion records
metadata records
resource records
revisit records
supports deduplication via revisit records
embedded metadata
long-term preservation of web content
usedBy Internet Archive
national libraries
research institutions
web archiving projects
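
Several of the statements above (text-based key-value headers, binary or text payloads, URI-based record identifiers, a sequence of self-contained records, GZIP compression) can be illustrated with a small reader. The following is a minimal sketch using only the Python standard library, not a complete or validated implementation of ISO 28500. The function name read_warc_records, the sample record, and its urn:uuid identifier are hypothetical and exist only for illustration. The sketch assumes each record carries a Content-Length header and that compressed files consist of one gzip member per record, a common convention for .warc.gz files.

```python
import gzip
import io


def read_warc_records(stream):
    """Yield (headers, payload) for each record in a WARC byte stream.

    Hypothetical helper for illustration; it does no validation beyond
    checking the version line and requires a Content-Length header.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines separate consecutive records
        version = line.strip().decode("utf-8")
        if not version.startswith("WARC/"):
            raise ValueError(f"unexpected line: {version!r}")
        headers = {"version": version}
        # Header block: "Name: value" lines terminated by an empty line.
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # The payload is opaque bytes (binary or text) of the declared length.
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


# A made-up "resource" record; the record ID and target URI are placeholders.
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000001>\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
)

# Uncompressed: two records back to back (.warc).
records = list(read_warc_records(io.BytesIO(sample * 2)))

# Compressed: one gzip member per record, concatenated (.warc.gz).
gz_bytes = gzip.compress(sample) + gzip.compress(sample)
gz_records = list(read_warc_records(gzip.GzipFile(fileobj=io.BytesIO(gz_bytes))))
```

Because every record declares its own length and identifier, a reader can skip or extract records without interpreting their payloads. For real archives, an established tool such as those listed under associatedSoftware is a safer choice than a hand-written parser.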

Referenced by (1)

Full triples — surface form annotated when it differs from this entity's canonical label.

Common Crawl dataFormat WARC