3.4. `uniseg.sentencebreak` — Sentence Break

Unicode sentence boundaries.

UAX #29: Unicode Text Segmentation (Unicode 16.0.0)

uniseg.sentencebreak.SB: alias of Sentence_Break

class uniseg.sentencebreak.Sentence_Break(value)

Sentence_Break property values.

ATerm = 'ATerm': Sentence_Break property value ATerm

CR = 'CR': Sentence_Break property value CR

Close = 'Close': Sentence_Break property value Close

Extend = 'Extend': Sentence_Break property value Extend

Format = 'Format': Sentence_Break property value Format

LF = 'LF': Sentence_Break property value LF

Lower = 'Lower': Sentence_Break property value Lower

Numeric = 'Numeric': Sentence_Break property value Numeric

OLetter = 'OLetter': Sentence_Break property value OLetter

Other = 'Other': Sentence_Break property value Other

SContinue = 'SContinue': Sentence_Break property value SContinue

STerm = 'STerm': Sentence_Break property value STerm

Sep = 'Sep': Sentence_Break property value Sep

Sp = 'Sp': Sentence_Break property value Sp

Upper = 'Upper': Sentence_Break property value Upper

uniseg.sentencebreak.sentence_boundaries(s: str, tailor: Callable[[str, Iterable[Literal[0, 1]]], Iterable[Literal[0, 1]]] | None = None, /) → Iterator[int]

Iterate over the indices of the sentence boundaries of s.

For a non-empty string, the yielded indices start at 0 and end at the end of the string (== len(s)). An empty string yields nothing.

>>> list(sentence_boundaries('ABC'))
[0, 3]
>>> s = 'He said, \u201cAre you going?\u201d John shook his head.'
>>> list(sentence_boundaries(s))
[0, 26, 46]
>>> list(sentence_boundaries(''))
[]
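The tailor parameter is not described here. Judging only from its type in the signature, it appears to be a callable that takes the string and the default per-position break flags and returns adjusted flags. The sketch below is written under that assumption. The function name no_break_after_abbreviation and the ABBREVIATIONS tuple are hypothetical and not part of uniseg.

```python
from typing import Iterable, Iterator, Literal

Breakable = Literal[0, 1]

# Hypothetical abbreviation list, for illustration only.
ABBREVIATIONS = ("Mr.", "Mrs.", "Dr.")


def no_break_after_abbreviation(
    s: str, breakables: Iterable[Breakable]
) -> Iterator[Breakable]:
    """Clear a break flag when the text before it ends with an abbreviation.

    Assumption: breakables has one entry per position of s, in the same
    form that sentence_breakables() produces.
    """
    for i, flag in enumerate(breakables):
        if flag and i > 0:
            # Text before position i, ignoring the spaces after the period.
            head = s[:i].rstrip(" ")
            if head.endswith(ABBREVIATIONS):
                flag = 0
        yield flag
```

If the assumption holds, such a function would be passed as the second positional argument, for example sentence_boundaries(s, no_break_after_abbreviation).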

uniseg.sentencebreak.sentence_break(c: str, /) → Sentence_Break

Return the Sentence_Break property value of c.

c must be a string containing exactly one Unicode code point.

>>> sentence_break('\x0d')
Sentence_Break.CR
>>> sentence_break(' ')
Sentence_Break.Sp
>>> sentence_break('a')
Sentence_Break.Lower

>>> sentence_break('/')
Sentence_Break.Other

uniseg.sentencebreak.sentence_breakables(s: str, /) → Iterable[Literal[0, 1]]

Iterate over the sentence breaking opportunities for every position of s.

Each item is 1 for “break” or 0 for “do not break”. The iteration yields exactly len(s) items, one per position of s.

>>> from pprint import pprint
>>> s = 'He said, \u201cAre you going?\u201d John shook his head.'
>>> pprint(list(sentence_breakables(s)), width=76, compact=True)
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
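The flags above line up with the indices that sentence_boundaries reports for the same string ([0, 26, 46]): every position flagged 1 is a boundary, and the end of the string is the final boundary. A minimal sketch of that relationship follows. The helper boundaries_from_breakables is hypothetical and is not claimed to be how the library computes its result.

```python
from typing import Iterable, Iterator


def boundaries_from_breakables(breakables: Iterable[int]) -> Iterator[int]:
    """Yield each index flagged 1, then the total length as the last boundary."""
    length = 0
    for i, flag in enumerate(breakables):
        if flag:
            yield i
        length = i + 1
    # An empty input has no boundaries at all.
    if length:
        yield length


# Flags copied from the example above: breaks at positions 0 and 26.
flags = [1] + [0] * 25 + [1] + [0] * 19
print(list(boundaries_from_breakables(flags)))  # [0, 26, 46]
```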

uniseg.sentencebreak.sentences(s: str, tailor: Callable[[str, Iterable[Literal[0, 1]]], Iterable[Literal[0, 1]]] | None = None, /) → Iterator[str]

Iterate over every sentence of s.

>>> s = 'He said, \u201cAre you going?\u201d John shook his head.'
>>> list(sentences(s)) == ['He said, \u201cAre you going?\u201d ', 'John shook his head.']
True
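The sentences in this example are the slices of s between consecutive boundary indices. The sketch below reproduces the result from the boundaries [0, 26, 46] documented for sentence_boundaries. The helper split_at is hypothetical and only illustrates the slicing, not the library's implementation.

```python
from typing import Iterable, List


def split_at(s: str, boundaries: Iterable[int]) -> List[str]:
    """Slice s between each pair of consecutive boundary indices."""
    b = list(boundaries)
    return [s[i:j] for i, j in zip(b, b[1:])]


s = 'He said, \u201cAre you going?\u201d John shook his head.'
# Boundary indices taken from the sentence_boundaries example.
parts = split_at(s, [0, 26, 46])
print(parts == ['He said, \u201cAre you going?\u201d ', 'John shook his head.'])  # True
```

Note that the trailing space after the closing quotation mark stays with the first sentence, so joining the pieces restores s exactly.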
