2.4. uniseg.sentencebreak — Sentence Break

Unicode sentence boundaries.

UAX #29: Unicode Text Segmentation (Unicode 15.0.0) https://www.unicode.org/reports/tr29/tr29-41.html

uniseg.sentencebreak.SB

alias of SentenceBreak

class uniseg.sentencebreak.SentenceBreak(value)

Sentence_Break property values.

uniseg.sentencebreak.sentence_boundaries(s: str, tailor: Callable[[str, Iterator[Literal[0, 1]]], Iterator[Literal[0, 1]]] | None = None, /) -> Iterator[int]

Iterate over the indices of the sentence boundaries of s.

The yielded indices start at 0 and end at the length of the string (len(s)). An empty string yields no indices.

>>> list(sentence_boundaries('ABC'))
[0, 3]
>>> s = 'He said, \u201cAre you going?\u201d John shook his head.'
>>> list(sentence_boundaries(s))
[0, 26, 46]
>>> list(sentence_boundaries(''))
[]
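
The boundary indices can be used directly as slice offsets. The following is a minimal sketch that uses only the standard library and reuses the indices from the example above instead of calling uniseg; split_at is a hypothetical helper name, not part of the library.

```python
def split_at(s, boundaries):
    """Slice s between each pair of consecutive boundary indices."""
    indices = list(boundaries)
    # Pair every index with its successor: (0, 26), (26, 46), ...
    return [s[start:end] for start, end in zip(indices, indices[1:])]


s = 'He said, \u201cAre you going?\u201d John shook his head.'
# [0, 26, 46] is the sentence_boundaries(s) output shown above.
pieces = split_at(s, [0, 26, 46])
print(pieces)
```

Because the first index is 0 and the last is len(s), the slices cover the whole string with nothing dropped, including the trailing space of the first sentence.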

uniseg.sentencebreak.sentence_break(c: str, index: int = 0, /) -> SentenceBreak

Return the Sentence_Break property value of c.

c must be a single Unicode code point string.

>>> sentence_break('\x0d')
<SentenceBreak.CR: 'CR'>
>>> sentence_break(' ')
<SentenceBreak.SP: 'Sp'>
>>> sentence_break('a')
<SentenceBreak.LOWER: 'Lower'>

If index is specified, this function treats c as a Unicode string and returns the Sentence_Break property value of the code point at c[index].

>>> sentence_break('a\x0d', 1)
<SentenceBreak.CR: 'CR'>

uniseg.sentencebreak.sentence_breakables(s: str, /) -> Iterator[Literal[0, 1]]

Iterate over the sentence-breaking opportunities for every position of s.

1 means “break” and 0 means “do not break”. The iteration yields exactly len(s) values.

>>> from pprint import pprint
>>> s = 'He said, \u201cAre you going?\u201d John shook his head.'
>>> pprint(list(sentence_breakables(s)),
...        width=76, compact=True)
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
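
The flags above line up with the indices reported by sentence_boundaries: each position flagged 1 is a boundary, and the end of the string is one more boundary. The following is a minimal sketch of that relationship, inferred from the outputs shown in this section rather than from uniseg's implementation; boundaries_from_breakables is a hypothetical name.

```python
def boundaries_from_breakables(breakables):
    """Yield each index flagged 1, then the total length as the last boundary."""
    length = 0
    for index, flag in enumerate(breakables):
        if flag:
            yield index
        length = index + 1
    # An empty input has no boundaries at all, matching the '' example.
    if length:
        yield length


# The same 46 flags as the output above: 1 at positions 0 and 26.
flags = [1] + [0] * 25 + [1] + [0] * 19
result = list(boundaries_from_breakables(flags))
print(result)
```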

uniseg.sentencebreak.sentences(s: str, tailor: Callable[[str, Iterator[Literal[0, 1]]], Iterator[Literal[0, 1]]] | None = None, /) -> Iterator[str]

Iterate over every sentence of s.

>>> s = 'He said, \u201cAre you going?\u201d John shook his head.'
>>> list(sentences(s)) == ['He said, \u201cAre you going?\u201d ', 'John shook his head.']
True
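
This section documents the tailor parameter only through its type: a callable that takes the string and an iterator of 0/1 flags and returns an iterator of 0/1 flags. The following sketch assumes the callable receives the default breakables and returns adjusted ones; that reading is an assumption. The names keep_abbreviations and ABBREVIATIONS are hypothetical, and the default flags are written by hand so the block runs without uniseg.

```python
from typing import Iterable, Iterator

# Illustrative abbreviation list; not something uniseg provides.
ABBREVIATIONS = ('Mr.', 'Mrs.', 'Dr.')


def keep_abbreviations(s: str, breakables: Iterable[int]) -> Iterator[int]:
    """Clear any break that directly follows a listed abbreviation."""
    for index, flag in enumerate(breakables):
        # Look at the text before this position, ignoring trailing spaces.
        if flag and index > 0 and s[:index].rstrip(' ').endswith(ABBREVIATIONS):
            yield 0
        else:
            yield flag


s = 'Mr. Smith left. He was late.'
# Hand-written stand-in for the default flags: breaks at 0, 4 and 16.
default_flags = [1 if i in (0, 4, 16) else 0 for i in range(len(s))]
tailored = list(keep_abbreviations(s, default_flags))
print([i for i, flag in enumerate(tailored) if flag])
```

If that reading of the signature is right, such a function would be passed as the second positional argument, for example sentences(s, keep_abbreviations).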