UTF-8
E162096
UTF-8 is a widely used variable-length character encoding standard for Unicode that efficiently represents text in most of the world's writing systems while maintaining backward compatibility with ASCII.
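To make "variable-length" and "backward compatible with ASCII" concrete, here is a minimal sketch of encoding a single code point by hand. It follows the byte layouts listed in the statements below and the RFC 3629 limits. `utf8_encode` is a hypothetical helper written only for illustration; real code should simply call Python's built-in `str.encode("utf-8")`, which the sketch checks itself against.

```python
def utf8_encode(code_point: int) -> bytes:
    """Hypothetical illustrative helper: encode one code point as UTF-8 (RFC 3629 limits)."""
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("code point outside U+0000..U+10FFFF")
    if code_point < 0x80:
        # 1 byte: 0xxxxxxx, byte-for-byte identical to ASCII
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


for ch in ("A", "é", "€", "😀"):
    encoded = utf8_encode(ord(ch))
    # The hand-written encoder should agree with Python's built-in codec.
    assert encoded == ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')}")
```

The four sample characters take one, two, three, and four bytes respectively. This is where the space efficiency for ASCII-heavy text comes from: plain ASCII costs exactly one byte per character.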
All labels observed (1)
| Label | Occurrences |
|---|---|
| UTF-8 (canonical) | 6 |
Statements (51)
| Predicate | Object |
|---|---|
| instanceOf | Unicode transformation format; character encoding; variable-length encoding |
| advantages | no byte-order mark required; robust to resynchronization after errors; space-efficient for ASCII-heavy text |
| ASCIICompatibility | code points U+0000 to U+007F encoded as single bytes 0x00–0x7F |
| backwardCompatibleWith | ASCII |
| byteOrder | byte-order independent |
| continuationBytePattern | 10xxxxxx |
| defaultEncodingFor | HTML5; XML (when not otherwise specified) |
| designedBy | Ken Thompson; Rob Pike |
| designedFor | Unicode |
| disallows | non-shortest form encodings; surrogate code points U+D800–U+DFFF |
| dominantOn | World Wide Web |
| encodesRange | U+0000 to U+10FFFF |
| encodingType | variable-length |
| endianness | not applicable |
| errorDetectionProperty | invalid byte sequences can be detected |
| excludes | code points above U+10FFFF |
| fullName | 8-bit Unicode Transformation Format |
| leadingBytePattern | 0xxxxxxx for 1-byte sequences; 110xxxxx for 2-byte sequences; 1110xxxx for 3-byte sequences; 11110xxx for 4-byte sequences |
| maxBytesPerCodePoint | 4 |
| minBytesPerCodePoint | 1 |
| originallyDescribedIn | RFC 2279 |
| preferredBy | modern web standards |
| primaryUse | general text encoding; web content encoding |
| relatedEncoding | UTF-16; UTF-32 |
| replaces | UTF-1 |
| securityProperty | rejects overlong encodings in modern specifications |
| selfSynchronizing | true |
| specifiedBy | RFCs (surface form: IETF RFCs); Unicode Standard |
| standardizedIn | RFC 3629 |
| supports | all Unicode code points |
| usesCodeUnitSize | 1 byte; 8 bits |
| usesPrefixEncoding | true |
| widelyUsedIn | Unix-like operating systems; databases; network protocols; programming languages and libraries |
| yearIntroduced | 1992 |
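Several of the statements above can be checked together by a small validating decoder. These are the leading-byte and continuation-byte patterns, the ban on non-shortest forms and surrogates, the U+10FFFF ceiling, and self-synchronization. The sketch below is illustrative only: `utf8_decode` and `char_start` are hypothetical helper names. In practice Python's `bytes.decode("utf-8")` already rejects the same malformed inputs.

```python
def utf8_decode(data: bytes) -> list[int]:
    """Hypothetical illustrative helper: decode UTF-8 to code points, rejecting malformed input."""
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        # The leading byte's prefix fixes the sequence length and the smallest
        # code point that length may carry (the shortest-form rule).
        if lead < 0x80:               # 0xxxxxxx
            length, value, minimum = 1, lead, 0x0
        elif lead >> 5 == 0b110:      # 110xxxxx
            length, value, minimum = 2, lead & 0x1F, 0x80
        elif lead >> 4 == 0b1110:     # 1110xxxx
            length, value, minimum = 3, lead & 0x0F, 0x800
        elif lead >> 3 == 0b11110:    # 11110xxx
            length, value, minimum = 4, lead & 0x07, 0x10000
        else:
            raise ValueError(f"invalid leading byte 0x{lead:02X} at offset {i}")
        if i + length > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for cont in data[i + 1:i + length]:
            if cont >> 6 != 0b10:     # every continuation byte must match 10xxxxxx
                raise ValueError(f"bad continuation byte 0x{cont:02X} after offset {i}")
            value = (value << 6) | (cont & 0x3F)
        if value < minimum:
            raise ValueError(f"overlong (non-shortest form) encoding at offset {i}")
        if 0xD800 <= value <= 0xDFFF:
            raise ValueError(f"encoded surrogate U+{value:04X} at offset {i}")
        if value > 0x10FFFF:
            raise ValueError(f"code point above U+10FFFF at offset {i}")
        code_points.append(value)
        i += length
    return code_points


def char_start(data: bytes, pos: int) -> int:
    """Self-synchronization: from any offset 0 <= pos < len(data), back up to the
    first byte of the character containing it by skipping continuation bytes."""
    while pos > 0 and data[pos] >> 6 == 0b10:
        pos -= 1
    return pos


text = "naïve €"
assert utf8_decode(text.encode("utf-8")) == [ord(c) for c in text]

# Overlong "/", encoded surrogate, value above U+10FFFF, truncated sequence.
for bad in (b"\xc0\xaf", b"\xed\xa0\x80", b"\xf4\x90\x80\x80", b"\xe2\x82"):
    try:
        utf8_decode(bad)
    except ValueError as err:
        print(err)
```

`char_start` works because continuation bytes (10xxxxxx) can never be mistaken for leading bytes. A reader that lands in the middle of a stream therefore needs to skip at most three bytes to reach a character boundary.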
How these facts were elicited
The pipeline generated the facts above by prompting gpt-5.1 with this entity's name and description and the instruction below.
Instruction
You are a knowledge base construction expert. Given a subject entity and a description of it, return factual statements that you know for the subject as a JSON list of dictionaries(triples), where keys must be "subject", "predicate" and "object". The number of facts may be very high, between 25 to 50 or more, for very popular subjects. For less popular subjects, the number of facts can be very low, like 5 or 10.
# Requirements
- If you don't know the subject at all, return an empty list.
- If the subject is not a named entity, return an empty list.
- Include at least one triple where predicate is "instanceOf".
- Do not get too wordy.
- Separate several objects into multiple triples with one object.
Input
Subject: UTF-8
Description of subject: UTF-8 is a widely used variable-length character encoding standard for Unicode that efficiently represents text in most of the world's writing systems while maintaining backward compatibility with ASCII.
Referenced by (6)
Full triples — surface form annotated when it differs from this entity's canonical label.