Unicode Technical Report #29

E26869

Unicode Technical Report #29 is the specification that defines how to determine the default boundaries between user-perceived characters (grapheme clusters), words, and sentences in Unicode text, so that the text can be segmented at those boundaries.
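As a rough illustration of grapheme cluster segmentation, here is a minimal Python sketch of a few of the default extended grapheme cluster rules (GB3, GB4, GB5, GB9, GB9a, GB12/GB13 and GB999). It is not conformant. It approximates the Grapheme_Cluster_Break property from General_Category instead of reading GraphemeBreakProperty.txt from the Unicode Character Database, and it omits the Hangul, Prepend and emoji ZWJ rules. The helper names `gcb`, `is_boundary` and `grapheme_clusters` are hypothetical.

```python
import unicodedata


# Hypothetical approximation of the Grapheme_Cluster_Break property.
# A conformant implementation would load GraphemeBreakProperty.txt from
# the Unicode Character Database; here General_Category stands in for it.
def gcb(ch: str) -> str:
    cp = ord(ch)
    if ch == "\r":
        return "CR"
    if ch == "\n":
        return "LF"
    if ch == "\u200d":
        return "ZWJ"
    if 0x1F1E6 <= cp <= 0x1F1FF:
        return "Regional_Indicator"
    cat = unicodedata.category(ch)
    if cat in ("Mn", "Me"):
        return "Extend"        # approximation: nonspacing and enclosing marks
    if cat == "Mc":
        return "SpacingMark"   # approximation: not every Mc is a SpacingMark
    if cat in ("Cc", "Zl", "Zp"):
        return "Control"
    return "Other"


def is_boundary(prev: str, curr: str, ri_run: int) -> bool:
    """Return True if a cluster boundary falls between prev and curr.

    ri_run counts the consecutive Regional_Indicator characters that end
    at prev; it decides how flag pairs are grouped (GB12/GB13).
    """
    p, c = gcb(prev), gcb(curr)
    if p == "CR" and c == "LF":                 # GB3
        return False
    if p in ("Control", "CR", "LF"):            # GB4
        return True
    if c in ("Control", "CR", "LF"):            # GB5
        return True
    if c in ("Extend", "ZWJ"):                  # GB9
        return False
    if c == "SpacingMark":                      # GB9a (extended clusters only)
        return False
    if p == "Regional_Indicator" and c == "Regional_Indicator":  # GB12/GB13
        return ri_run % 2 == 0                  # break only after a complete pair
    return True                                 # GB999


def grapheme_clusters(text: str) -> list[str]:
    clusters: list[str] = []
    ri_run = 0
    for i, ch in enumerate(text):
        if i == 0 or is_boundary(text[i - 1], ch, ri_run):
            clusters.append(ch)                 # start a new cluster (GB1 at i == 0)
        else:
            clusters[-1] += ch                  # extend the current cluster
        ri_run = ri_run + 1 if gcb(ch) == "Regional_Indicator" else 0
    return clusters


# "e" + COMBINING ACUTE ACCENT, then the flag pair U+1F1EB U+1F1F7:
# four code points, two user-perceived characters.
print(len(grapheme_clusters("e\u0301\U0001F1EB\U0001F1F7")))  # → 2
```

Removing the GB9a check moves this sketch toward legacy grapheme cluster behavior for spacing marks; legacy clusters also drop the GB9b Prepend rule, which the sketch does not implement in the first place.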


Statements (48)
Predicate Object
instanceOf Unicode technical report
text segmentation specification
allows locale-specific tailoring of segmentation rules
appliesTo Unicode text
all Unicode scripts
compatibleWith Unicode Character Database
defines Unicode text segmentation rules
grapheme cluster boundaries
line break interaction with segmentation
sentence boundaries
word boundaries
definesProperty Grapheme_Cluster_Break
Line_Break interaction with segmentation
Sentence_Break
Word_Break
hasAbbreviation UAX #29
UTR #29
hasAlternateTitle Unicode Standard Annex #29
hasScope locale-independent default segmentation
user-perceived text elements
hasStatus Unicode Standard Annex
hasTitle Unicode Text Segmentation
intendedFor implementers of regular expression engines
implementers of search and indexing systems
implementers of text editors
implementers of text rendering systems
implementers of word processors
partOf Unicode Standard
provides algorithmic rules for grapheme cluster segmentation
algorithmic rules for sentence segmentation
algorithmic rules for word segmentation
publishedBy Unicode Consortium
references Unicode Standard Annex #14
Unicode Standard Annex #15
specifies default grapheme cluster boundaries
default sentence boundary rules
default word boundary rules
tailorable segmentation rules
updatedWithEachVersionOf Unicode Standard
usedBy operating systems
programming language libraries
search engines
text layout engines
web browsers
usesConcept default sentence boundaries
default word boundaries
extended grapheme cluster
legacy grapheme cluster
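To make the default word boundary rules more concrete, the Python below sketches a simplified subset of them (WB3 to WB13b). It is a hypothetical illustration, not a conformant implementation. The Word_Break classes are approximated from General_Category plus a few hand-picked punctuation characters rather than loaded from WordBreakProperty.txt. WB4 is only partly applied (no break before a mark, but later rules do not skip over marks), and the Hebrew_Letter, Katakana, Format, ZWJ and Regional_Indicator rules are left out. The names `wb`, `word_boundaries` and `word_segments` are hypothetical.

```python
import unicodedata

# Hypothetical subsets of the Word_Break classes; a conformant implementation
# would load WordBreakProperty.txt from the Unicode Character Database.
MID_LETTER = {":", "\u00b7"}
MID_NUM_LET = {".", "\u2019"}
MID_NUM = {",", ";"}


def wb(ch: str) -> str:
    if ch == "\r":
        return "CR"
    if ch == "\n":
        return "LF"
    if ch == "'":
        return "Single_Quote"
    if ch == "_":
        return "ExtendNumLet"
    if ch == " ":
        return "WSegSpace"
    cat = unicodedata.category(ch)
    if cat in ("Mn", "Me", "Mc"):
        return "Extend"
    if ch.isalpha():
        return "ALetter"      # approximation: also catches ideographs, kana, etc.
    if cat == "Nd":
        return "Numeric"
    if ch in MID_LETTER:
        return "MidLetter"
    if ch in MID_NUM_LET:
        return "MidNumLet"
    if ch in MID_NUM:
        return "MidNum"
    return "Other"


MID_LET_Q = ("MidLetter", "MidNumLet", "Single_Quote")  # MidLetter | MidNumLetQ
MID_NUM_Q = ("MidNum", "MidNumLet", "Single_Quote")     # MidNum | MidNumLetQ


def word_boundaries(text: str) -> list[int]:
    """Return boundary offsets, including 0 and len(text) (WB1, WB2)."""
    if not text:
        return []
    cls = [wb(ch) for ch in text]
    breaks = [0]
    for i in range(1, len(text)):
        before = cls[i - 2] if i >= 2 else None
        prev, curr = cls[i - 1], cls[i]
        after = cls[i + 1] if i + 1 < len(text) else None
        if prev == "CR" and curr == "LF":                        # WB3
            continue
        if prev in ("CR", "LF") or curr in ("CR", "LF"):         # WB3a, WB3b
            breaks.append(i)
            continue
        if prev == "WSegSpace" and curr == "WSegSpace":          # WB3d
            continue
        if curr == "Extend":                                     # WB4 (partial)
            continue
        if (
            (prev == "ALetter" and curr == "ALetter")                              # WB5
            or (prev == "ALetter" and curr in MID_LET_Q and after == "ALetter")    # WB6
            or (before == "ALetter" and prev in MID_LET_Q and curr == "ALetter")   # WB7
            or (prev == "Numeric" and curr == "Numeric")                           # WB8
            or (prev == "ALetter" and curr == "Numeric")                           # WB9
            or (prev == "Numeric" and curr == "ALetter")                           # WB10
            or (before == "Numeric" and prev in MID_NUM_Q and curr == "Numeric")   # WB11
            or (prev == "Numeric" and curr in MID_NUM_Q and after == "Numeric")    # WB12
            or (prev in ("ALetter", "Numeric", "ExtendNumLet")
                and curr == "ExtendNumLet")                                        # WB13a
            or (prev == "ExtendNumLet" and curr in ("ALetter", "Numeric"))         # WB13b
        ):
            continue
        breaks.append(i)                                         # WB999
    breaks.append(len(text))
    return breaks


def word_segments(text: str) -> list[str]:
    b = word_boundaries(text)
    return [text[s:e] for s, e in zip(b, b[1:])]


print(word_segments("can't stop at 3.14"))
# → ["can't", ' ', 'stop', ' ', 'at', ' ', '3.14']
```

The apostrophe in "can't" and the period in "3.14" do not cause breaks because of the context rules WB6/WB7 and WB11/WB12, which look one character ahead of or behind the candidate boundary. A simple per-pair lookup table cannot express this.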

Referenced by (5)
Subject (surface form when different) Predicate
Unicode Technical Report #29 ("Unicode text segmentation rules") defines
Unicode Technical Report #29 ("UAX #29") hasAbbreviation
Unicode hasTechnicalReport
Unicode Technical Report #29 ("Unicode Text Segmentation") hasTitle
Unicode Consortium ("Unicode Technical Reports") maintainsStandard
