3.5. uniseg.linebreak — Line Break

Unicode line breaking algorithm.

UAX #14: Unicode Line Breaking Algorithm (Unicode 16.0.0)

uniseg.linebreak.LB

alias of Line_Break

class uniseg.linebreak.Line_Break(value)

Line_Break property values.

AI = 'AI'

Line_Break property value AI, Ambiguous (Alphabetic or Ideographic)

AK = 'AK'

Line_Break property value AK, Aksara

AL = 'AL'

Line_Break property value AL, Alphabetic

AP = 'AP'

Line_Break property value AP, Aksara Pre-Base

AS = 'AS'

Line_Break property value AS, Aksara Start

B2 = 'B2'

Line_Break property value B2, Break Opportunity Before and After

BA = 'BA'

Line_Break property value BA, Break After

BB = 'BB'

Line_Break property value BB, Break Before

BK = 'BK'

Line_Break property value BK, Mandatory Break

CB = 'CB'

Line_Break property value CB, Contingent Break Opportunity

CJ = 'CJ'

Line_Break property value CJ, Conditional Japanese Starter

CL = 'CL'

Line_Break property value CL, Close Punctuation

CM = 'CM'

Line_Break property value CM, Combining Mark

CP = 'CP'

Line_Break property value CP, Close Parenthesis

CR = 'CR'

Line_Break property value CR, Carriage Return

EB = 'EB'

Line_Break property value EB, Emoji Base

EM = 'EM'

Line_Break property value EM, Emoji Modifier

EX = 'EX'

Line_Break property value EX, Exclamation/Interrogation

GL = 'GL'

Line_Break property value GL, Non-breaking (“Glue”)

H2 = 'H2'

Line_Break property value H2, Hangul LV Syllable

H3 = 'H3'

Line_Break property value H3, Hangul LVT Syllable

HL = 'HL'

Line_Break property value HL, Hebrew Letter

HY = 'HY'

Line_Break property value HY, Hyphen

ID = 'ID'

Line_Break property value ID, Ideographic

IN = 'IN'

Line_Break property value IN, Inseparable

IS = 'IS'

Line_Break property value IS, Infix Numeric Separator

JL = 'JL'

Line_Break property value JL, Hangul L Jamo

JT = 'JT'

Line_Break property value JT, Hangul T Jamo

JV = 'JV'

Line_Break property value JV, Hangul V Jamo

LF = 'LF'

Line_Break property value LF, Line Feed

NL = 'NL'

Line_Break property value NL, Next Line

NS = 'NS'

Line_Break property value NS, Nonstarter

NU = 'NU'

Line_Break property value NU, Numeric

OP = 'OP'

Line_Break property value OP, Open Punctuation

PO = 'PO'

Line_Break property value PO, Postfix Numeric

PR = 'PR'

Line_Break property value PR, Prefix Numeric

QU = 'QU'

Line_Break property value QU, Quotation

RI = 'RI'

Line_Break property value RI, Regional Indicator

SA = 'SA'

Line_Break property value SA, Complex Context Dependent (South East Asian)

SG = 'SG'

Line_Break property value SG, Surrogate

SP = 'SP'

Line_Break property value SP, Space

SY = 'SY'

Line_Break property value SY, Symbols Allowing Break After

VF = 'VF'

Line_Break property value VF, Virama Final

VI = 'VI'

Line_Break property value VI, Virama

WJ = 'WJ'

Line_Break property value WJ, Word Joiner

XX = 'XX'

Line_Break property value XX, Unknown

ZW = 'ZW'

Line_Break property value ZW, Zero Width Space

ZWJ = 'ZWJ'

Line_Break property value ZWJ, Zero Width Joiner

uniseg.linebreak.line_break(c: str, /) → Line_Break

Return the Line_Break property for c.

c must be a single Unicode code point string.

>>> line_break('\x0d')
Line_Break.CR
>>> line_break(' ')
Line_Break.SP
>>> line_break('1')
Line_Break.NU
>>> line_break('\u1b44')
Line_Break.VI

uniseg.linebreak.line_break_boundaries(s: str, legacy: bool = False, tailor: Callable[[str, Iterable[Literal[0, 1]]], Iterable[Literal[0, 1]]] | None = None) → Iterator[int]

Iterate over the indices of the line breaking boundaries of s.

The yielded indices run from 0 up to the end of the string (== len(s)).
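
The relationship between per-position break flags and boundary indices can be sketched in plain Python. This is only an illustration, not the library's implementation: boundaries_from_breakables is a hypothetical helper, and it assumes that the boundaries are the start of the string, every later position flagged 1, and the end of the string. Treating an empty string as having no boundaries is also an assumption. The flags used are the ones documented for line_break_breakables('Hello, world.').

```python
from typing import Iterable, Iterator


def boundaries_from_breakables(s: str, breakables: Iterable[int]) -> Iterator[int]:
    """Hypothetical helper: turn per-position 0/1 flags into boundary indices."""
    if not s:
        return  # assumption: an empty string has no boundaries
    yield 0  # the start of the string
    for i, flag in enumerate(breakables):
        # A flag of 1 at position i marks a break BEFORE s[i].
        if flag and i > 0:
            yield i
    yield len(s)  # the end of the string


flags = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
print(list(boundaries_from_breakables('Hello, world.', flags)))  # [0, 7, 13]
```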

uniseg.linebreak.line_break_breakables(s: str, legacy: bool = False, /) → Iterable[Literal[0, 1]]

Iterate over the line breaking opportunities for every position of s.

1 means “break” and 0 means “do not break” BEFORE the position. The iteration yields exactly len(s) values.

>>> list(line_break_breakables('ABC'))
[0, 0, 0]
>>> list(line_break_breakables('Hello, world.'))
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
>>> list(line_break_breakables(''))
[]
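
To make the meaning of the flags concrete, the following sketch splits a string at every position flagged 1. split_at_breakables is a hypothetical helper written for this illustration and is not part of uniseg; it only assumes the convention stated above, that a flag refers to the break before that position.

```python
from typing import Iterable, List


def split_at_breakables(s: str, breakables: Iterable[int]) -> List[str]:
    """Hypothetical helper: cut s before every position whose flag is 1."""
    units: List[str] = []
    start = 0
    for i, flag in enumerate(breakables):
        # Skip a flag at the very start so no empty unit is produced.
        if flag and i > start:
            units.append(s[start:i])
            start = i
    if start < len(s):
        units.append(s[start:])  # the trailing unit
    return units


flags = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
print(split_at_breakables('Hello, world.', flags))  # ['Hello, ', 'world.']
```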

uniseg.linebreak.line_break_units(s: str, legacy: bool = False, tailor: Callable[[str, Iterable[Literal[0, 1]]], Iterable[Literal[0, 1]]] | None = None, /) → Iterator[str]

Iterate over every line breaking token of s.

>>> s = 'The quick (\u201cbrown\u201d) fox can\u2019t jump 32.3 feet, right?'
>>> '|'.join(line_break_units(s)) == 'The |quick |(\u201cbrown\u201d) |fox |can\u2019t |jump |32.3 |feet, |right?'
True
>>> list(line_break_units(''))
[]
>>> list(line_break_units('\u03b1\u03b1')) == ['\u03b1\u03b1']
True
>>> list(line_break_units('\u03b1\u03b1', True)) == ['\u03b1', '\u03b1']
True
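
The tailor parameter is described above only by its type: a callable that receives the string and its break flags and returns adjusted flags. A minimal sketch of a function with that shape follows. The name keep_hyphenated_together and the rule it applies (never break immediately after an ASCII hyphen) are assumptions chosen for illustration, and how the library invokes the callable is inferred from the signature alone.

```python
from typing import Iterable, Iterator


def keep_hyphenated_together(s: str, breakables: Iterable[int]) -> Iterator[int]:
    """Hypothetical tailor: clear any break flag right after an ASCII hyphen."""
    prev = ''
    for ch, flag in zip(s, breakables):
        # A flag describes the break BEFORE ch, so inspect the previous character.
        yield 0 if prev == '-' else flag
        prev = ch


# Made-up input flags, used only to exercise the function on its own.
print(list(keep_hyphenated_together('a-b c', [0, 0, 1, 0, 1])))  # [0, 0, 0, 0, 1]
```

A callable of this shape could then be supplied as the tailor argument of line_break_boundaries or line_break_units.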