UTF-8

E162096

Unicode transformation format, character encoding, variable-length encoding

UTF-8 is a widely used variable-length character encoding standard for Unicode that efficiently represents text in most of the world's writing systems while maintaining backward compatibility with ASCII.
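As a quick illustration of the variable-length property and ASCII compatibility described above, here is a minimal Python sketch. It relies only on the built-in `str.encode`; the sample characters and the helper name `utf8_length` are chosen for illustration and are not part of this entry.

```python
# Sample characters (illustrative): one for each possible encoded length.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]


def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))


lengths = {ch: utf8_length(ch) for ch in samples}
for ch, n in lengths.items():
    # Print only ASCII so the output does not depend on the console encoding.
    print(f"U+{ord(ch):04X} -> {n} byte(s): {ch.encode('utf-8').hex(' ')}")

# Pure ASCII text produces identical bytes under ASCII and UTF-8.
ascii_identical = "hello".encode("utf-8") == "hello".encode("ascii")
```

The four samples occupy 1, 2, 3, and 4 bytes respectively, and `ascii_identical` is `True`, which is what backward compatibility with ASCII means in practice.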


All labels observed (1)

Label	Occurrences
UTF-8 (canonical)	6

Statements (51)

Predicate	Object
instanceOf	Unicode transformation format; character encoding; variable-length encoding
advantages	no byte-order mark required; robust to resynchronization after errors; space-efficient for ASCII-heavy text
ASCIICompatibility	code points U+0000 to U+007F encoded as single bytes 0x00–0x7F
backwardCompatibleWith	ASCII
byteOrder	byte-order independent
continuationBytePattern	10xxxxxx
defaultEncodingFor	HTML5; XML (when not otherwise specified)
designedBy	Ken Thompson; Rob Pike
designedFor	Unicode
disallows	non-shortest form encodings; surrogate code points U+D800–U+DFFF
dominantOn	World Wide Web
encodesRange	U+0000 to U+10FFFF
encodingType	variable-length
endianness	not applicable
errorDetectionProperty	invalid byte sequences can be detected
excludes	code points above U+10FFFF
fullName	8-bit Unicode Transformation Format
leadingBytePattern	0xxxxxxx for 1-byte sequences; 110xxxxx for 2-byte sequences; 1110xxxx for 3-byte sequences; 11110xxx for 4-byte sequences
maxBytesPerCodePoint	4
minBytesPerCodePoint	1
originallyDescribedIn	RFC 2279
preferredBy	modern web standards
primaryUse	general text encoding; web content encoding
relatedEncoding	UTF-16; UTF-32
replaces	UTF-1
securityProperty	rejects overlong encodings in modern specifications
selfSynchronizing	true
specifiedBy	RFCs (surface form: IETF RFCs); Unicode Standard
standardizedIn	RFC 3629
supports	all Unicode code points
usesCodeUnitSize	1 byte; 8 bits
usesPrefixEncoding	true
widelyUsedIn	Unix-like operating systems; databases; network protocols; programming languages and libraries
yearIntroduced	1992
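
The bit patterns and restrictions listed in the statements above (leadingBytePattern, continuationBytePattern, encodesRange, disallows, selfSynchronizing) can be sketched as a small hand-written encoder. This is an illustrative sketch, not a reference implementation: the function names `encode_code_point`, `is_continuation`, and `next_boundary` are hypothetical, and production code should use the language's built-in codec instead.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative helper)."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("code point outside U+0000..U+10FFFF")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points U+D800..U+DFFF are disallowed")
    if cp <= 0x7F:
        # 0xxxxxxx: identical to ASCII
        return bytes([cp])
    if cp <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


def is_continuation(byte: int) -> bool:
    """True if the byte matches the continuation pattern 10xxxxxx."""
    return (byte & 0xC0) == 0x80


def next_boundary(data: bytes, pos: int) -> int:
    """Return the first index at or after pos that is not a continuation byte.

    This is the self-synchronizing property: a reader dropped into the middle
    of a sequence only has to skip continuation bytes to find the next start.
    """
    while pos < len(data) and is_continuation(data[pos]):
        pos += 1
    return pos
```

Because each branch picks the shortest form that fits the code point, the encoder never emits an overlong sequence. For example, `encode_code_point(0x20AC)` yields the three bytes E2 82 AC, matching `"\u20ac".encode("utf-8")`.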

How these facts were elicited

The pipeline generated the facts above by prompting gpt-5.1 with this entity's name + description and the instruction below.

Instruction

You are a knowledge base construction expert. Given a subject entity and a description of it, return factual statements that you know for the subject as a JSON list of dictionaries(triples), where keys must be "subject", "predicate" and "object". The number of facts may be very high, between 25 to 50 or more, for very popular subjects. For less popular subjects, the number of facts can be very low, like 5 or 10.

# Requirements
- If you don't know the subject at all, return an empty list.
- If the subject is not a named entity, return an empty list.
- Include at least one triple where predicate is "instanceOf".
- Do not get too wordy.
- Separate several objects into multiple triples with one object.

Input

Subject: UTF-8
Description of subject: UTF-8 is a widely used variable-length character encoding standard for Unicode that efficiently represents text in most of the world's writing systems while maintaining backward compatibility with ASCII.

Referenced by (6)

Full triples — surface form annotated when it differs from this entity's canonical label.

Ken Thompson → coCreatorOf → UTF-8

Ken Thompson → notableWork → UTF-8

Rob Pike → coInventorOf → UTF-8

Rob Pike → notableWork → UTF-8

RFC 3629 → defines → UTF-8

RFC 3629 → standardizes → UTF-8