3.5. uniseg.linebreak — Line Break
Unicode line breaking algorithm.
UAX #14: Unicode Line Breaking Algorithm (Unicode 16.0.0)
- uniseg.linebreak.LB
alias of Line_Break
- class uniseg.linebreak.Line_Break(value)
Line_Break property values.
- AI = 'AI'
Line_Break property value AI, Ambiguous (Alphabetic or Ideographic)
- AK = 'AK'
Line_Break property value AK, Aksara
- AL = 'AL'
Line_Break property value AL, Alphabetic
- AP = 'AP'
Line_Break property value AP, Aksara Pre-Base
- AS = 'AS'
Line_Break property value AS, Aksara Start
- B2 = 'B2'
Line_Break property value B2, Break Opportunity Before and After
- BA = 'BA'
Line_Break property value BA, Break After
- BB = 'BB'
Line_Break property value BB, Break Before
- BK = 'BK'
Line_Break property value BK, Mandatory Break
- CB = 'CB'
Line_Break property value CB, Contingent Break Opportunity
- CJ = 'CJ'
Line_Break property value CJ, Conditional Japanese Starter
- CL = 'CL'
Line_Break property value CL, Close Punctuation
- CM = 'CM'
Line_Break property value CM, Combining Mark
- CP = 'CP'
Line_Break property value CP, Close Parenthesis
- CR = 'CR'
Line_Break property value CR, Carriage Return
- EB = 'EB'
Line_Break property value EB, Emoji Base
- EM = 'EM'
Line_Break property value EM, Emoji Modifier
- EX = 'EX'
Line_Break property value EX, Exclamation/Interrogation
- GL = 'GL'
Line_Break property value GL, Non-breaking (“Glue”)
- H2 = 'H2'
Line_Break property value H2, Hangul LV Syllable
- H3 = 'H3'
Line_Break property value H3, Hangul LVT Syllable
- HL = 'HL'
Line_Break property value HL, Hebrew Letter
- HY = 'HY'
Line_Break property value HY, Hyphen
- ID = 'ID'
Line_Break property value ID, Ideographic
- IN = 'IN'
Line_Break property value IN, Inseparable
- IS = 'IS'
Line_Break property value IS, Infix Numeric Separator
- JL = 'JL'
Line_Break property value JL, Hangul L Jamo
- JT = 'JT'
Line_Break property value JT, Hangul T Jamo
- JV = 'JV'
Line_Break property value JV, Hangul V Jamo
- LF = 'LF'
Line_Break property value LF, Line Feed
- NL = 'NL'
Line_Break property value NL, Next Line
- NS = 'NS'
Line_Break property value NS, Nonstarter
- NU = 'NU'
Line_Break property value NU, Numeric
- OP = 'OP'
Line_Break property value OP, Open Punctuation
- PO = 'PO'
Line_Break property value PO, Postfix Numeric
- PR = 'PR'
Line_Break property value PR, Prefix Numeric
- QU = 'QU'
Line_Break property value QU, Quotation
- RI = 'RI'
Line_Break property value RI, Regional Indicator
- SA = 'SA'
Line_Break property value SA, Complex Context Dependent (South East Asian)
- SG = 'SG'
Line_Break property value SG, Surrogate
- SP = 'SP'
Line_Break property value SP, Space
- SY = 'SY'
Line_Break property value SY, Symbols Allowing Break After
- VF = 'VF'
Line_Break property value VF, Virama Final
- VI = 'VI'
Line_Break property value VI, Virama
- WJ = 'WJ'
Line_Break property value WJ, Word Joiner
- XX = 'XX'
Line_Break property value XX, Unknown
- ZW = 'ZW'
Line_Break property value ZW, Zero Width Space
- ZWJ = 'ZWJ'
Line_Break property value ZWJ, Zero Width Joiner
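The `Line_Break(value)` signature and the string-valued members listed above suggest an enumeration keyed by the short property name. The following is a minimal sketch of that pattern, assuming standard `enum.Enum` behavior; `LineBreakSketch` and `LBSketch` are hypothetical stand-ins written for illustration and are not part of uniseg.

```python
from enum import Enum


class LineBreakSketch(Enum):
    """Hypothetical stand-in mirroring three of the documented members."""
    CR = 'CR'
    LF = 'LF'
    SP = 'SP'


# Lookup by value, as the `Line_Break(value)` signature suggests.
member = LineBreakSketch('CR')
assert member is LineBreakSketch.CR

# Each member carries its short property-value string.
assert member.value == 'CR'
assert member.name == 'CR'

# An alias such as LB is simply a second name bound to the same class.
LBSketch = LineBreakSketch
assert LBSketch.SP is LineBreakSketch.SP
```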
- uniseg.linebreak.line_break(c: str, /) → Line_Break
Return the Line_Break property for c.
c must be a single Unicode code point string.
>>> line_break('\x0d')
Line_Break.CR
>>> line_break(' ')
Line_Break.SP
>>> line_break('1')
Line_Break.NU
>>> line_break('\u1b44')
Line_Break.VI
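Because `c` must be a single code point, classifying a whole string means calling the function once per character. A minimal sketch of that loop follows; `count_classes` is a hypothetical helper, and the classifier is passed in as a parameter so the block runs without uniseg installed (the `toy_classifier` below is a stand-in, not the real property lookup).

```python
from collections import Counter
from typing import Callable, Hashable


def count_classes(text: str, classify: Callable[[str], Hashable]) -> Counter:
    """Tally the class returned by `classify` for each code point in `text`."""
    # Iterating a Python str yields one code point at a time, which matches
    # the single-code-point requirement described above.
    return Counter(classify(ch) for ch in text)


def toy_classifier(ch: str) -> str:
    """Stand-in classifier for demonstration; NOT the Unicode property data."""
    if ch.isdigit():
        return 'digit'
    if ch.isspace():
        return 'space'
    return 'other'


counts = count_classes('a1 b2', toy_classifier)
print(counts['digit'], counts['space'], counts['other'])
```

With uniseg installed, passing `line_break` as the `classify` argument should tally `Line_Break` members in the same way.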
- uniseg.linebreak.line_break_boundaries(s: str, legacy: bool = False, tailor: Callable[[str, Iterable[Literal[0, 1]]], Iterable[Literal[0, 1]]] | None = None) → Iterator[int]
Iterate over the indices of the line breaking boundaries of s.
The yielded indices run from 0 up to the end of the string (== len(s)).
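The `tailor` parameter takes a callable that receives the string and its 0/1 break flags and returns adjusted flags. The sketch below shows one possible shape for such a callable; `no_break_after_hyphen` is a hypothetical example, and the input flags are hand-written for illustration rather than output captured from uniseg.

```python
from typing import Iterable, Iterator


def no_break_after_hyphen(s: str, breakables: Iterable[int]) -> Iterator[int]:
    """Hypothetical tailor: clear any break opportunity right after '-'."""
    for i, flag in enumerate(breakables):
        if i > 0 and s[i - 1] == '-':
            yield 0  # suppress a break before the character following '-'
        else:
            yield flag  # leave every other position unchanged


text = 'well-known fact'
# Hand-written flags for illustration only: a break before 'k' and before 'f'.
flags = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0]
tailored = list(no_break_after_hyphen(text, flags))
print(tailored)
```

Given the signature above, such a function could presumably be supplied as `line_break_boundaries(s, tailor=no_break_after_hyphen)`.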
- uniseg.linebreak.line_break_breakables(s: str, legacy: bool = False, /) → Iterable[Literal[0, 1]]
Iterate over the line breaking opportunities for every position of s.
1 means “break” and 0 means “do not break” BEFORE the position. The length of the iteration is the same as len(s).
>>> list(line_break_breakables('ABC'))
[0, 0, 0]
>>> list(line_break_breakables('Hello, world.'))
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
>>> list(line_break_breakables(''))
[]
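To make the "break BEFORE the position" convention concrete, the sketch below cuts a string at every flagged index. `split_at_breakables` is a hypothetical helper written for illustration, not uniseg's implementation; the flags for 'Hello, world.' are copied from the doctest above.

```python
from typing import Iterable, List


def split_at_breakables(s: str, breakables: Iterable[int]) -> List[str]:
    """Cut `s` before every index whose flag is 1."""
    units = []
    start = 0
    for i, flag in enumerate(breakables):
        # A flag of 1 at index i means a break is allowed before s[i].
        if flag and i > start:
            units.append(s[start:i])
            start = i
    if start < len(s):
        units.append(s[start:])  # the remainder after the last break
    return units


flags = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
pieces = split_at_breakables('Hello, world.', flags)
print(pieces)  # ['Hello, ', 'world.']
```

The only flagged index is 7, the position of 'w', so the trailing space stays with the first piece.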
- uniseg.linebreak.line_break_units(s: str, legacy: bool = False, tailor: Callable[[str, Iterable[Literal[0, 1]]], Iterable[Literal[0, 1]]] | None = None, /) → Iterator[str]
Iterate over every line breaking token of s.
>>> s = 'The quick (\u201cbrown\u201d) fox can\u2019t jump 32.3 feet, right?'
>>> '|'.join(line_break_units(s)) == 'The |quick |(\u201cbrown\u201d) |fox |can\u2019t |jump |32.3 |feet, |right?'
True
>>> list(line_break_units(''))
[]
>>> list(line_break_units('\u03b1\u03b1')) == ['\u03b1\u03b1']
True
>>> list(line_break_units('\u03b1\u03b1', True)) == ['\u03b1', '\u03b1']
True
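One common use of such tokens is greedy line wrapping: keep appending units until the next one would exceed the target width. The sketch below is a simplified illustration under stated assumptions: `wrap_units` is a hypothetical helper, width is measured in code points with trailing whitespace ignored, and mandatory breaks are not handled. The unit list is the one shown in the doctest above.

```python
from typing import Iterable, List


def wrap_units(units: Iterable[str], width: int) -> List[str]:
    """Greedily pack units into lines no wider than `width` where possible."""
    lines = []
    current = ''
    for unit in units:
        candidate = current + unit
        # Trailing whitespace does not count against the width.
        if current and len(candidate.rstrip()) > width:
            lines.append(current.rstrip())
            current = unit
        else:
            current = candidate
    if current:
        lines.append(current.rstrip())
    return lines


units = ['The ', 'quick ', '(\u201cbrown\u201d) ', 'fox ', 'can\u2019t ',
         'jump ', '32.3 ', 'feet, ', 'right?']
for line in wrap_units(units, 20):
    print(line)
```

With uniseg installed, `wrap_units(line_break_units(s), 20)` should produce the same lines for the example string. A single unit longer than `width` is placed on its own line rather than split.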