3.5. uniseg.linebreak — Line Break
Unicode line breaking algorithm.
UAX #14: Unicode Line Breaking Algorithm (Unicode 16.0.0)
- uniseg.linebreak.LB
alias of Line_Break
- class uniseg.linebreak.Line_Break(value)
Line_Break property values.
- AI = 'AI'
Line_Break property value AI, Ambiguous (Alphabetic or Ideographic)
- AK = 'AK'
Line_Break property value AK, Aksara
- AL = 'AL'
Line_Break property value AL, Alphabetic
- AP = 'AP'
Line_Break property value AP, Aksara Pre-Base
- AS = 'AS'
Line_Break property value AS, Aksara Start
- B2 = 'B2'
Line_Break property value B2, Break Opportunity Before and After
- BA = 'BA'
Line_Break property value BA, Break After
- BB = 'BB'
Line_Break property value BB, Break Before
- BK = 'BK'
Line_Break property value BK, Mandatory Break
- CB = 'CB'
Line_Break property value CB, Contingent Break Opportunity
- CJ = 'CJ'
Line_Break property value CJ, Conditional Japanese Starter
- CL = 'CL'
Line_Break property value CL, Close Punctuation
- CM = 'CM'
Line_Break property value CM, Combining Mark
- CP = 'CP'
Line_Break property value CP, Close Parenthesis
- CR = 'CR'
Line_Break property value CR, Carriage Return
- EB = 'EB'
Line_Break property value EB, Emoji Base
- EM = 'EM'
Line_Break property value EM, Emoji Modifier
- EX = 'EX'
Line_Break property value EX, Exclamation/Interrogation
- GL = 'GL'
Line_Break property value GL, Non-breaking (“Glue”)
- H2 = 'H2'
Line_Break property value H2, Hangul LV Syllable
- H3 = 'H3'
Line_Break property value H3, Hangul LVT Syllable
- HL = 'HL'
Line_Break property value HL, Hebrew Letter
- HY = 'HY'
Line_Break property value HY, Hyphen
- ID = 'ID'
Line_Break property value ID, Ideographic
- IN = 'IN'
Line_Break property value IN, Inseparable
- IS = 'IS'
Line_Break property value IS, Infix Numeric Separator
- JL = 'JL'
Line_Break property value JL, Hangul L Jamo
- JT = 'JT'
Line_Break property value JT, Hangul T Jamo
- JV = 'JV'
Line_Break property value JV, Hangul V Jamo
- LF = 'LF'
Line_Break property value LF, Line Feed
- NL = 'NL'
Line_Break property value NL, Next Line
- NS = 'NS'
Line_Break property value NS, Nonstarter
- NU = 'NU'
Line_Break property value NU, Numeric
- OP = 'OP'
Line_Break property value OP, Open Punctuation
- PO = 'PO'
Line_Break property value PO, Postfix Numeric
- PR = 'PR'
Line_Break property value PR, Prefix Numeric
- QU = 'QU'
Line_Break property value QU, Quotation
- RI = 'RI'
Line_Break property value RI, Regional Indicator
- SA = 'SA'
Line_Break property value SA, Complex Context Dependent (South East Asian)
- SG = 'SG'
Line_Break property value SG, Surrogate
- SP = 'SP'
Line_Break property value SP, Space
- SY = 'SY'
Line_Break property value SY, Symbols Allowing Break After
- VF = 'VF'
Line_Break property value VF, Virama Final
- VI = 'VI'
Line_Break property value VI, Virama
- WJ = 'WJ'
Line_Break property value WJ, Word Joiner
- XX = 'XX'
Line_Break property value XX, Unknown
- ZW = 'ZW'
Line_Break property value ZW, Zero Width Space
- ZWJ = 'ZWJ'
Line_Break property value ZWJ, Zero Width Joiner
- uniseg.linebreak.line_break(c: str, /) → Line_Break
Return the Line_Break property for c.
c must be a single Unicode code point string.
>>> line_break('\r')
Line_Break.CR
>>> line_break(' ')
Line_Break.SP
>>> line_break('1')
Line_Break.NU
>>> line_break('᭄')  # (== '\u1b44')
Line_Break.VI
- uniseg.linebreak.line_break_boundaries(s: str, /, legacy: bool = False, tailor: Callable[[str, Iterable[Literal[0, 1]]], Iterable[Literal[0, 1]]] | None = None) → Iterator[int]
Iterate indices of the line breaking boundaries for s.
The yielded indices are positions in s and run up to len(s), the end boundary of the string. As the examples below show, index 0 (the start of the string) is not reported as a boundary.
>>> list(line_break_boundaries('a'))
[1]
>>> list(line_break_boundaries('a b'))
[2, 3]
>>> list(line_break_boundaries('a b\n'))
[2, 4]
>>> list(line_break_boundaries('あい、うえ、お。'))
[1, 3, 4, 6, 8]
The number of indices yielded equals the number of line break units in the string.
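The tailor parameter is typed as a callable that receives the string and an iterable of 0/1 flags and returns an iterable of 0/1 flags. As a minimal sketch, assuming those flags follow the per-position convention of line_break_breakables (1 means a break is allowed before that position), the hypothetical tailor below suppresses a break that directly follows a comma and a space. The name keep_after_comma and the rule itself are illustrative only, not part of uniseg. The flags are the documented output for 'Hello, world.', so the block runs without the library installed.

```python
from typing import Iterable, Iterator


def keep_after_comma(s: str, breakables: Iterable[int]) -> Iterator[int]:
    """Hypothetical tailor: forbid a break right after a comma and a space."""
    for i, flag in enumerate(breakables):
        # Flag i describes the position BEFORE s[i]; s[i-2:i] precedes it.
        if flag and i >= 2 and s[i - 2:i] == ', ':
            yield 0
        else:
            yield flag


s = 'Hello, world.'
# Documented result of line_break_breakables('Hello, world.')
default_flags = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
tailored = list(keep_after_comma(s, default_flags))
print(tailored)  # the break before 'world.' is removed
```

With the library installed, a function of this shape would presumably be passed as line_break_boundaries(s, tailor=keep_after_comma); that call is an assumption based on the signature above.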
- uniseg.linebreak.line_break_breakables(s: str, /, legacy: bool = False) → Iterable[Literal[0, 1]]
Iterate line breaking opportunities for every position of s.
1 means “break” and 0 means “do not break” BEFORE the position. The length of the iteration equals len(s).
>>> list(line_break_breakables('ABC'))
[0, 0, 0]
>>> list(line_break_breakables('Hello, world.'))
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
>>> list(line_break_breakables(''))
[]
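To make the flag convention concrete, the following sketch turns a flag sequence into boundary indices and then into substrings. It assumes the end of a non-empty string always counts as a boundary, as the line_break_boundaries examples suggest. The helper names boundaries_from_flags and split_at are hypothetical, and the code illustrates the relationship between the outputs rather than the library's internals.

```python
from typing import Iterable, List


def boundaries_from_flags(flags: Iterable[int]) -> List[int]:
    """Collect positions flagged 1, then append the end-of-string index."""
    flags = list(flags)
    indices = [i for i, flag in enumerate(flags) if flag]
    if flags:
        # Assumption: the end of a non-empty string is always a boundary.
        indices.append(len(flags))
    return indices


def split_at(s: str, boundaries: Iterable[int]) -> List[str]:
    """Cut s into consecutive pieces that end at each boundary index."""
    units, start = [], 0
    for end in boundaries:
        units.append(s[start:end])
        start = end
    return units


s = 'Hello, world.'
# Documented result of line_break_breakables('Hello, world.')
flags = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
bounds = boundaries_from_flags(flags)
print(bounds)               # [7, 13]
print(split_at(s, bounds))  # ['Hello, ', 'world.']
```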
- uniseg.linebreak.line_break_units(s: str, /, legacy: bool = False, tailor: Callable[[str, Iterable[Literal[0, 1]]], Iterable[Literal[0, 1]]] | None = None) → Iterator[str]
Iterate every line breaking token of s.
>>> s = 'The quick (“brown”) fox can’t jump 32.3 feet, right?'
>>> '|'.join(line_break_units(s))
'The |quick |(“brown”) |fox |can’t |jump |32.3 |feet, |right?'
>>> list(line_break_units(''))
[]
>>> list(line_break_units('αα'))
['αα']
>>> list(line_break_units('αα', True))
['α', 'α']
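A common use of these units is simple line wrapping. The sketch below greedily packs units into lines of a given width. The function wrap_units is hypothetical and not part of uniseg. It measures width in code points rather than display columns, and it ignores mandatory breaks such as '\n'. The input list is the documented output for the example sentence, so the block runs without the library installed.

```python
from typing import Iterable, List


def wrap_units(units: Iterable[str], width: int) -> List[str]:
    """Greedily pack break units into lines of at most `width` characters.

    Trailing whitespace is not counted against the width. A single unit
    longer than `width` is placed on a line of its own.
    """
    lines: List[str] = []
    current = ''
    for unit in units:
        candidate = current + unit
        if current and len(candidate.rstrip()) > width:
            # The unit does not fit: close the current line, start a new one.
            lines.append(current.rstrip())
            current = unit
        else:
            current = candidate
    if current:
        lines.append(current.rstrip())
    return lines


# Units as documented for the example sentence above.
units = ['The ', 'quick ', '(“brown”) ', 'fox ', 'can’t ',
         'jump ', '32.3 ', 'feet, ', 'right?']
wrapped = wrap_units(units, 20)
for line in wrapped:
    print(line)
```

With the library installed, wrap_units(line_break_units(s), 20) should behave the same way, since the helper accepts any iterable of strings.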