3.4. uniseg.sentencebreak — Sentence Break

Unicode sentence boundaries.

UAX #29: Unicode Text Segmentation (Unicode 16.0.0)

uniseg.sentencebreak.SB

alias of SentenceBreak

class uniseg.sentencebreak.SentenceBreak(value)

Sentence_Break property values.

ATERM = 'ATerm'

Sentence_Break property value ATerm

CLOSE = 'Close'

Sentence_Break property value Close

CR = 'CR'

Sentence_Break property value CR

EXTEND = 'Extend'

Sentence_Break property value Extend

FORMAT = 'Format'

Sentence_Break property value Format

LF = 'LF'

Sentence_Break property value LF

LOWER = 'Lower'

Sentence_Break property value Lower

NUMERIC = 'Numeric'

Sentence_Break property value Numeric

OLETTER = 'OLetter'

Sentence_Break property value OLetter

OTHER = 'Other'

Sentence_Break property value Other

SCONTINUE = 'SContinue'

Sentence_Break property value SContinue

SEP = 'Sep'

Sentence_Break property value Sep

SP = 'Sp'

Sentence_Break property value Sp

STERM = 'STerm'

Sentence_Break property value STerm

UPPER = 'Upper'

Sentence_Break property value Upper

uniseg.sentencebreak.sentence_boundaries(s: str, /, *, property: Callable[[str], SentenceBreak] = sentence_break, tailor: Callable[[str, Iterable[Literal[0, 1]]], Iterable[Literal[0, 1]]] | None = None) -> Iterator[int]

Iterate indices of the sentence boundaries of s.

The yielded indices run from 0 to the end of the string (len(s)), both included. An empty string yields nothing.

>>> list(sentence_boundaries('ABC'))
[0, 3]
>>> s = 'He said, “Are you going?” John shook his head.'
>>> list(sentence_boundaries(s))
[0, 26, 46]
>>> list(sentence_boundaries(''))
[]
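
Because the boundaries include both 0 and len(s), each pair of consecutive indices delimits one sentence. The following is a minimal sketch of that relationship in plain Python; split_at is a hypothetical helper, not part of uniseg, and the boundary list is copied from the doctest above rather than computed.

```python
def split_at(s, boundaries):
    """Slice s between each pair of consecutive boundary indices."""
    boundaries = list(boundaries)
    # Pair every index with its successor: (0, 26), (26, 46), ...
    return [s[start:end] for start, end in zip(boundaries, boundaries[1:])]


s = 'He said, “Are you going?” John shook his head.'
boundaries = [0, 26, 46]  # values taken from the doctest above
parts = split_at(s, boundaries)
print(parts)
```

Joining the slices gives back the original string, so no character is lost or duplicated at a boundary.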

uniseg.sentencebreak.sentence_break(c: str, /) -> SentenceBreak

Return the Sentence_Break value assigned to the code point c.

c must be a single Unicode code point string.

>>> sentence_break('\r')
SentenceBreak.CR
>>> sentence_break(' ')
SentenceBreak.SP
>>> sentence_break('a')
SentenceBreak.LOWER
>>> sentence_break('/')
SentenceBreak.OTHER

uniseg.sentencebreak.sentence_breakables(s: str, /, *, property: Callable[[str], SentenceBreak] = sentence_break) -> Iterable[Literal[0, 1]]

Iterate sentence breaking opportunities for every position of s.

Each yielded value is 1 (“break”) or 0 (“do not break”). A 1 at index i means a sentence boundary lies immediately before s[i]. Exactly len(s) values are yielded.

>>> from pprint import pp
>>> s = 'He said, \u201cAre you going?\u201d John shook his head.'
>>> pp(list(sentence_breakables(s)), width=76, compact=True)
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
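
The flags above and the indices returned by sentence_boundaries describe the same information. As a rough sketch of how one maps to the other (boundaries_from_flags is a hypothetical helper, and this is not necessarily how uniseg computes the result internally), the positions holding 1 are collected and the string length is appended as the final boundary:

```python
def boundaries_from_flags(flags):
    """Convert per-position 0/1 break flags into boundary indices."""
    flags = list(flags)
    if not flags:
        # An empty string has no boundaries at all.
        return []
    indices = [i for i, flag in enumerate(flags) if flag]
    # The end of the string always closes the last sentence.
    indices.append(len(flags))
    return indices


# Hand-built flags matching the example above: 46 positions,
# with breaks before index 0 and index 26.
flags = [0] * 46
flags[0] = flags[26] = 1
print(boundaries_from_flags(flags))
```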

uniseg.sentencebreak.sentences(s: str, /, *, property: Callable[[str], SentenceBreak] = sentence_break, tailor: Callable[[str, Iterable[Literal[0, 1]]], Iterable[Literal[0, 1]]] | None = None) -> Iterator[str]

Iterate every sentence of s.

>>> s = 'He said, “Are you going?” John shook his head.'
>>> list(sentences(s))
['He said, “Are you going?” ', 'John shook his head.']
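
According to the signatures above, a tailor callable receives the string and its break flags and returns adjusted flags. The sketch below shows what such a function might look like; keep_abbreviations and the ABBREVIATIONS tuple are hypothetical, and the input flags are written by hand for illustration rather than produced by uniseg.

```python
ABBREVIATIONS = ('Dr.', 'Mr.', 'Mrs.')  # hypothetical, illustrative list


def keep_abbreviations(s, breakables):
    """Hypothetical tailor: clear a break that directly follows an abbreviation."""
    for i, flag in enumerate(breakables):
        if flag and i > 0:
            # Look at the last word before position i, ignoring spaces.
            head = s[:i].rstrip(' ')
            last_word = head.rsplit(' ', 1)[-1]
            if last_word in ABBREVIATIONS:
                flag = 0
        yield flag


s = 'I met Dr. Smith. He waved.'
# Assumed input flags: breaks before indices 0, 10 ("Smith") and 17 ("He").
flags = [0] * len(s)
for i in (0, 10, 17):
    flags[i] = 1

tailored = list(keep_abbreviations(s, flags))
print([i for i, flag in enumerate(tailored) if flag])
```

With uniseg installed, a function of this shape would presumably be passed by keyword, as in sentences(s, tailor=keep_abbreviations). The resulting output depends on the library's default rules and is not shown here.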