2.5. uniseg.linebreak
— Line Break
Unicode line breaking algorithm.
UAX #14: Unicode Line Breaking Algorithm (Unicode 15.0.0) https://www.unicode.org/reports/tr14/tr14-49.html
- uniseg.linebreak.line_break(c: str, index: int = 0, /) -> str
Return the Line_Break property of c.
c must be a single Unicode code point string.
>>> print(line_break('\x0d'))
CR
>>> print(line_break(' '))
SP
>>> print(line_break('1'))
NU
If index is specified, this function treats c as a Unicode string and returns the Line_Break property of the code point at c[index].
>>> print(line_break(u'a\x0d', 1))
CR
- uniseg.linebreak.line_break_boundaries(s: str, legacy: bool = False, tailor: Callable[[str, Iterator[Literal[0, 1]]], Iterator[Literal[0, 1]]] | None = None) -> Iterator[int]
Iterate indices of the line breaking boundaries of s.
The yielded indices range from 0 up to the end of the string (== len(s)).
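The relationship between boundary indices and per-position break flags can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `boundaries_from_flags` is a hypothetical helper, and it assumes that a boundary is any position flagged 1 plus the end of a non-empty string. The flags are the ones shown for 'Hello, world.' in the line_break_breakables doctest of this section.

```python
from typing import Iterable, Iterator


def boundaries_from_flags(flags: Iterable[int]) -> Iterator[int]:
    """Hypothetical helper: yield each index flagged 1, then the end index."""
    length = 0
    for index, flag in enumerate(flags):
        if flag:
            yield index
        length = index + 1
    if length:
        # Assumption: the end of a non-empty string always counts as a boundary.
        yield length


# Flags copied from the 'Hello, world.' doctest in this section.
flags = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
print(list(boundaries_from_flags(flags)))  # [7, 13]
```

Under these assumptions an empty flag sequence produces no boundaries at all, which matches the empty-string behaviour documented for the other functions in this module.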
- uniseg.linebreak.line_break_breakables(s: str, legacy: bool = False, /) -> Iterator[Literal[0, 1]]
Iterate line breaking opportunities for every position of s.
1 means “break” and 0 means “do not break” BEFORE the position. The length of the iteration will be the same as len(s).
>>> list(line_break_breakables('ABC'))
[0, 0, 0]
>>> list(line_break_breakables('Hello, world.'))
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
>>> list(line_break_breakables(u''))
[]
- uniseg.linebreak.line_break_units(s: str, legacy: bool = False, tailor: Callable[[str, Iterator[Literal[0, 1]]], Iterator[Literal[0, 1]]] | None = None, /) -> Iterator[str]
Iterate every line breaking token of s.
>>> s = 'The quick (\u201cbrown\u201d) fox can\u2019t jump 32.3 feet, right?'
>>> '|'.join(line_break_units(s)) == 'The |quick |(\u201cbrown\u201d) |fox |can\u2019t |jump |32.3 |feet, |right?'
True
>>> list(line_break_units(u''))
[]
>>> list(line_break_units('\u03b1\u03b1')) == [u'\u03b1\u03b1']
True
>>> list(line_break_units(u'\u03b1\u03b1', True)) == [u'\u03b1', u'\u03b1']
True
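The tailor parameter is typed as a callable that receives the string and the stream of break flags and returns an adjusted stream. The sketch below shows what such a callable might look like and how flags could be turned into units. It does not call uniseg: `keep_hyphenated_together` and `units_from_flags` are hypothetical names, the flags are hand-written for illustration, and the way the library actually applies a tailor is an assumption here.

```python
from typing import Callable, Iterable, Iterator, Optional

# Same shape as the tailor parameter in the signatures above (simplified to int).
Tailor = Callable[[str, Iterator[int]], Iterator[int]]


def keep_hyphenated_together(s: str, breakables: Iterator[int]) -> Iterator[int]:
    """Hypothetical tailor: suppress any break directly after a hyphen-minus."""
    for index, flag in enumerate(breakables):
        if index > 0 and s[index - 1] == "-":
            yield 0
        else:
            yield flag


def units_from_flags(
    s: str, flags: Iterable[int], tailor: Optional[Tailor] = None
) -> Iterator[str]:
    """Hypothetical helper: cut s before every position whose flag is 1."""
    stream = iter(flags)
    if tailor is not None:
        # Assumption: the tailor post-processes the default flags.
        stream = tailor(s, stream)
    start = 0
    for index, flag in enumerate(stream):
        if flag and index > start:
            yield s[start:index]
            start = index
    if start < len(s):
        yield s[start:]


s = "well-known fact"
# Hand-written flags for illustration only: breaks before 'k' and before 'f'.
flags = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0]
print(list(units_from_flags(s, flags)))
# ['well-', 'known ', 'fact']
print(list(units_from_flags(s, flags, keep_hyphenated_together)))
# ['well-known ', 'fact']
```

Writing the tailor as a generator keeps it lazy, which fits the Iterator-in, Iterator-out shape of the documented signature.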