UTF-8

E162096

UTF-8 is a widely used variable-length character encoding standard for Unicode that efficiently represents text in most of the world's writing systems while maintaining backward compatibility with ASCII.


All labels observed (1)

Label Occurrences
UTF-8 canonical 6

Statements (51)

Predicate Object
instanceOf Unicode transformation format
character encoding
variable-length encoding
advantages no byte-order mark required
robust to resynchronization after errors
space-efficient for ASCII-heavy text
ASCIICompatibility code points U+0000 to U+007F encoded as single bytes 0x00–0x7F
backwardCompatibleWith ASCII
byteOrder byte-order independent
continuationBytePattern 10xxxxxx
defaultEncodingFor HTML5
XML (when not otherwise specified)
designedBy Ken Thompson
Rob Pike
designedFor Unicode
disallows non-shortest form encodings
surrogate code points U+D800–U+DFFF
dominantOn World Wide Web
encodesRange U+0000 to U+10FFFF
encodingType variable-length
endianness not applicable
errorDetectionProperty invalid byte sequences can be detected
excludes code points above U+10FFFF
fullName 8-bit Unicode Transformation Format
leadingBytePattern 0xxxxxxx for 1-byte sequences
110xxxxx for 2-byte sequences
1110xxxx for 3-byte sequences
11110xxx for 4-byte sequences
maxBytesPerCodePoint 4
minBytesPerCodePoint 1
originallyDescribedIn RFC 2044 (first IETF specification; obsoleted by RFC 2279)
preferredBy modern web standards
primaryUse general text encoding
web content encoding
relatedEncoding UTF-16
UTF-32
replaces UTF-1
securityProperty rejects overlong encodings in modern specifications
selfSynchronizing true
specifiedBy RFCs (surface form: IETF RFCs)
Unicode Standard
standardizedIn RFC 3629
supports all Unicode scalar values (every code point except the surrogates U+D800–U+DFFF)
usesCodeUnitSize 1 byte
8 bits
usesPrefixEncoding true
widelyUsedIn Unix-like operating systems
databases
network protocols
programming languages and libraries
yearIntroduced 1992
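
The byte patterns in the statements above (leadingBytePattern, continuationBytePattern, disallows, selfSynchronizing) can be illustrated with a minimal Python sketch. The function names `utf8_encode_code_point`, `is_continuation_byte`, and `next_boundary` are hypothetical helpers introduced only for this illustration; they are not part of any standard or library, and real code would normally rely on the language's built-in encoder instead.

```python
def utf8_encode_code_point(cp: int) -> bytes:
    """Encode one Unicode scalar value by hand (illustrative helper, not a library API)."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("value outside U+0000 to U+10FFFF")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points are not encodable in UTF-8")
    if cp <= 0x7F:
        # 0xxxxxxx: ASCII range, one byte, identical to ASCII
        return bytes([cp])
    if cp <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


def is_continuation_byte(b: int) -> bool:
    """True when the byte matches the continuation pattern 10xxxxxx."""
    return (b & 0xC0) == 0x80


def next_boundary(data: bytes, i: int) -> int:
    """Skip continuation bytes from index i to reach the start of the next sequence.

    This is the self-synchronizing property: a reader dropped into the middle
    of a stream needs to skip at most three bytes to find a sequence start.
    """
    while i < len(data) and is_continuation_byte(data[i]):
        i += 1
    return i


if __name__ == "__main__":
    # Cross-check the hand-written encoder against Python's built-in one.
    for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
        assert utf8_encode_code_point(cp) == chr(cp).encode("utf-8")
    print(utf8_encode_code_point(0x20AC).hex())  # e282ac
```

Because each range is checked from smallest to largest, the encoder always emits the shortest form, so it never produces an overlong encoding. A conforming decoder rejects overlong input such as the bytes C0 AF; Python's own decoder, for example, raises `UnicodeDecodeError` on it.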

How these facts were elicited

The pipeline generated the facts above by prompting gpt-5.1 with this entity's name + description and the instruction below.

Instruction
You are a knowledge base construction expert. Given a subject entity and a description of it, return factual statements that you know for the subject as a JSON list of dictionaries(triples), where keys must be "subject", "predicate" and "object". The number of facts may be very high, between 25 to 50 or more, for very popular subjects. For less popular subjects, the number of facts can be very low, like 5 or 10.

# Requirements
- If you don't know the subject at all, return an empty list.
- If the subject is not a named entity, return an empty list.
- Include at least one triple where predicate is "instanceOf".
- Do not get too wordy.
- Separate several objects into multiple triples with one object.
Input
Subject: UTF-8
Description of subject: UTF-8 is a widely used variable-length character encoding standard for Unicode that efficiently represents text in most of the world's writing systems while maintaining backward compatibility with ASCII.

Referenced by (6)

Full triples — surface form annotated when it differs from this entity's canonical label.

Ken Thompson coCreatorOf UTF-8
Ken Thompson notableWork UTF-8
Rob Pike coInventorOf UTF-8
Rob Pike notableWork UTF-8
RFC 3629 defines UTF-8
RFC 3629 standardizes UTF-8