2.4. uniseg.sentencebreak — Sentence Break
Unicode sentence boundaries.
UAX #29: Unicode Text Segmentation (Unicode 15.0.0) https://www.unicode.org/reports/tr29/tr29-41.html
- uniseg.sentencebreak.SB
alias of SentenceBreak
- class uniseg.sentencebreak.SentenceBreak(value)
Sentence_Break property values.
- uniseg.sentencebreak.sentence_boundaries(s: str, tailor: Callable[[str, Iterator[Literal[0, 1]]], Iterator[Literal[0, 1]]] | None = None, /) -> Iterator[int]
Iterate over the indices of the sentence boundaries of s.
The indices run from 0 up to and including the end of the string (len(s)). An empty string yields no indices.
>>> list(sentence_boundaries('ABC'))
[0, 3]
>>> s = 'He said, \u201cAre you going?\u201d John shook his head.'
>>> list(sentence_boundaries(s))
[0, 26, 46]
>>> list(sentence_boundaries(''))
[]
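The tailor parameter is typed as a callable that takes the string and an iterator of 0/1 break opportunities and returns another such iterator. The following is a minimal sketch of a custom tailor, assuming that uniseg passes in the default opportunities (the same values sentence_breakables produces) and uses whatever the callable yields. The function no_break_after_abbrev and the ABBREVIATIONS tuple are hypothetical illustrations, not part of uniseg, and the input flags are written by hand rather than computed by the library.

```python
from typing import Iterator, Literal

# Hypothetical abbreviation list for illustration; not part of uniseg.
ABBREVIATIONS = ("Mr. ", "Mrs. ", "Dr. ")


def no_break_after_abbrev(
    s: str, breakables: Iterator[Literal[0, 1]]
) -> Iterator[Literal[0, 1]]:
    """Hypothetical tailor: turn a 1 into 0 when s[:i] ends with an abbreviation."""
    for i, flag in enumerate(breakables):
        # str.endswith accepts a tuple of suffixes plus start/end indices,
        # so this tests s[0:i] without slicing a new string.
        if flag and s.endswith(ABBREVIATIONS, 0, i):
            yield 0
        else:
            yield flag


s = "Mr. Smith left. He was tired."
# Hand-made default opportunities (not computed by uniseg): 1 at 0, 4 and 16.
flags = [0] * len(s)
flags[0] = flags[4] = flags[16] = 1
tailored = list(no_break_after_abbrev(s, iter(flags)))
print([i for i, f in enumerate(tailored) if f])  # [0, 16]
```

Both sentence_boundaries and sentences declare their parameters positional-only (the trailing /), so with uniseg installed the tailor would be passed positionally, for example list(sentence_boundaries(s, no_break_after_abbrev)).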
- uniseg.sentencebreak.sentence_break(c: str, index: int = 0, /) -> SentenceBreak
Return the Sentence_Break property value of c.
c must be a string consisting of a single Unicode code point.
>>> sentence_break('\x0d')
<SentenceBreak.CR: 'CR'>
>>> sentence_break(' ')
<SentenceBreak.SP: 'Sp'>
>>> sentence_break('a')
<SentenceBreak.LOWER: 'Lower'>
If index is specified, this function treats c as a Unicode string and returns the Sentence_Break property of the code point at c[index].
>>> sentence_break('a\x0d', 1)
<SentenceBreak.CR: 'CR'>
- uniseg.sentencebreak.sentence_breakables(s: str, /) -> Iterator[Literal[0, 1]]
Iterate over the sentence breaking opportunities for every position of s.
Each value is 1 for “break” and 0 for “do not break”. The iterator yields exactly len(s) values.
>>> from pprint import pprint
>>> s = 'He said, \u201cAre you going?\u201d John shook his head.'
>>> pprint(list(sentence_breakables(s)),
...        width=76, compact=True)
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
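The break opportunities line up with the indices reported by sentence_boundaries. In the documented example the 1s sit at indices 0 and 26, and the boundaries are [0, 26, 46]. The following is a minimal stdlib-only sketch of that correspondence. It assumes that the boundaries are exactly the positions flagged 1, followed by len(s) for a non-empty string. The helper boundaries_from_breakables is hypothetical and is not how uniseg itself is implemented.

```python
from typing import Iterable, Iterator


def boundaries_from_breakables(flags: Iterable[int], length: int) -> Iterator[int]:
    """Hypothetical helper: yield every index flagged 1, then the end index.

    An empty string yields nothing, matching list(sentence_boundaries('')) == [].
    """
    if length == 0:
        return
    for i, flag in enumerate(flags):
        if flag:
            yield i
    yield length


# Flags copied from the sentence_breakables example: 1 at indices 0 and 26.
flags = [0] * 46
flags[0] = flags[26] = 1
print(list(boundaries_from_breakables(flags, 46)))  # [0, 26, 46]
```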
- uniseg.sentencebreak.sentences(s: str, tailor: Callable[[str, Iterator[Literal[0, 1]]], Iterator[Literal[0, 1]]] | None = None, /) -> Iterator[str]
Iterate over every sentence of s.
>>> s = 'He said, \u201cAre you going?\u201d John shook his head.'
>>> list(sentences(s)) == ['He said, \u201cAre you going?\u201d ', 'John shook his head.']
True
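Each sentence in this example is the slice of s between two consecutive boundary indices, which explains why the first sentence keeps its trailing space. The following stdlib-only sketch shows that relationship, using the boundary indices [0, 26, 46] from the sentence_boundaries example. The helper slice_sentences is hypothetical. It illustrates the slicing rather than reproducing uniseg's internals.

```python
from typing import Iterable, Iterator


def slice_sentences(s: str, boundaries: Iterable[int]) -> Iterator[str]:
    """Hypothetical helper: cut s between each pair of consecutive boundaries."""
    bounds = list(boundaries)
    # zip pairs each boundary with the next one: (b0, b1), (b1, b2), ...
    for start, end in zip(bounds, bounds[1:]):
        yield s[start:end]


s = 'He said, \u201cAre you going?\u201d John shook his head.'
# Boundary indices copied from the sentence_boundaries example.
print(list(slice_sentences(s, [0, 26, 46])))
# ['He said, “Are you going?” ', 'John shook his head.']
```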