3.4. uniseg.sentencebreak — Sentence Break
Unicode sentence boundaries.
UAX #29: Unicode Text Segmentation (Unicode 16.0.0)
- uniseg.sentencebreak.SB
alias of Sentence_Break
- class uniseg.sentencebreak.Sentence_Break(value)
Sentence_Break property values.
- ATerm = 'ATerm'
Sentence_Break property value ATerm
- CR = 'CR'
Sentence_Break property value CR
- Close = 'Close'
Sentence_Break property value Close
- Extend = 'Extend'
Sentence_Break property value Extend
- Format = 'Format'
Sentence_Break property value Format
- LF = 'LF'
Sentence_Break property value LF
- Lower = 'Lower'
Sentence_Break property value Lower
- Numeric = 'Numeric'
Sentence_Break property value Numeric
- OLetter = 'OLetter'
Sentence_Break property value OLetter
- Other = 'Other'
Sentence_Break property value Other
- SContinue = 'SContinue'
Sentence_Break property value SContinue
- STerm = 'STerm'
Sentence_Break property value STerm
- Sep = 'Sep'
Sentence_Break property value Sep
- Sp = 'Sp'
Sentence_Break property value Sp
- Upper = 'Upper'
Sentence_Break property value Upper
- uniseg.sentencebreak.sentence_boundaries(s: str, /, tailor: Callable[[str, Iterable[Literal[0, 1]]], Iterable[Literal[0, 1]]] | None = None) -> Iterator[int]
Iterate indices of the sentence boundaries of s.
The indices are yielded in ascending order, starting at 0 and ending at the end of the string (== len(s)). Nothing is yielded for an empty string.
>>> list(sentence_boundaries('ABC'))
[0, 3]
>>> s = 'He said, “Are you going?” John shook his head.'
>>> list(sentence_boundaries(s))
[0, 26, 46]
>>> list(sentence_boundaries(''))
[]
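The tailor parameter is not described beyond its type. Judging from the signature alone, a tailor takes the string and an iterable of 0/1 flags and returns an adjusted iterable of flags. The following is a minimal sketch of such a callable under that assumption. The name keep_abbreviations and the abbreviation list are hypothetical and not part of uniseg, and the flags are written by hand so that the example runs without the library.

```python
from typing import Iterable, Iterator, Literal

# Hypothetical abbreviation list, for illustration only (not part of uniseg).
ABBREVIATIONS = ("Mr.", "Mrs.", "Dr.")


def keep_abbreviations(
    s: str, breakables: Iterable[Literal[0, 1]]
) -> Iterator[Literal[0, 1]]:
    """Clear a break flag when the text before it ends with an abbreviation."""
    for i, flag in enumerate(breakables):
        # Text before position i, ignoring the trailing whitespace.
        preceding = s[:i].rstrip()
        if flag and preceding.endswith(ABBREVIATIONS):
            yield 0  # suppress the break after e.g. "Mr. "
        else:
            yield flag


s = "Mr. Smith left. He was late."
# Hand-written flags standing in for untailored output:
# breaks before "Mr.", before "Smith", and before "He".
flags = [0] * len(s)
flags[0] = flags[4] = flags[16] = 1

tailored = list(keep_abbreviations(s, flags))
```

If the library calls the tailor with the string and its default flags, as the signature suggests, a function of this shape could be passed as tailor=keep_abbreviations. That calling convention is an assumption here, not something this section states.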
- uniseg.sentencebreak.sentence_break(c: str, /) -> Sentence_Break
Return the Sentence_Break property value of c.
c must be a string containing a single Unicode code point.
>>> sentence_break('\r')
Sentence_Break.CR
>>> sentence_break(' ')
Sentence_Break.Sp
>>> sentence_break('a')
Sentence_Break.Lower
>>> sentence_break('/')
Sentence_Break.Other
- uniseg.sentencebreak.sentence_breakables(s: str, /) -> Iterable[Literal[0, 1]]
Iterate sentence breaking opportunities for every position of s.
Each value is 1 for “break” and 0 for “do not break”. The number of values yielded is the same as len(s).
>>> from pprint import pp
>>> s = 'He said, \u201cAre you going?\u201d John shook his head.'
>>> pp(list(sentence_breakables(s)), width=76, compact=True)
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
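The examples for sentence_breakables and sentence_boundaries suggest a simple relationship: each position flagged 1 is a boundary, and the end of a non-empty string is a final boundary. The sketch below illustrates that relationship with a hypothetical helper, boundaries_from_breakables. It is not part of uniseg and is not necessarily how the library computes boundaries.

```python
from typing import Iterable, Iterator


def boundaries_from_breakables(breakables: Iterable[int]) -> Iterator[int]:
    """Yield every index flagged 1, then the total length if non-empty."""
    length = 0
    for i, flag in enumerate(breakables):
        if flag:
            yield i
        length = i + 1
    # The end of a non-empty string is always the last boundary.
    if length:
        yield length


# Flags matching the documented example: 46 positions, breaks at 0 and 26.
flags = [0] * 46
flags[0] = flags[26] = 1

result = list(boundaries_from_breakables(flags))  # [0, 26, 46]
empty = list(boundaries_from_breakables([]))      # []
```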
- uniseg.sentencebreak.sentences(s: str, /, tailor: Callable[[str, Iterable[Literal[0, 1]]], Iterable[Literal[0, 1]]] | None = None) -> Iterator[str]
Iterate every sentence of s.
>>> s = 'He said, “Are you going?” John shook his head.'
>>> list(sentences(s))
['He said, “Are you going?” ', 'John shook his head.']
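The sentences in the example above are the slices of s between consecutive boundary indices ([0, 26, 46] for this string), with trailing whitespace kept on the preceding sentence. The sketch below shows that slicing with a hypothetical helper, split_at, which is not part of uniseg. The boundary list is written out by hand so that the example runs without the library.

```python
from typing import Iterable, List


def split_at(s: str, boundaries: Iterable[int]) -> List[str]:
    """Slice s between each pair of consecutive boundary indices."""
    bounds = list(boundaries)
    return [s[i:j] for i, j in zip(bounds, bounds[1:])]


s = 'He said, \u201cAre you going?\u201d John shook his head.'
# Boundary indices taken from the sentence_boundaries example for this string.
parts = split_at(s, [0, 26, 46])
```

Joining the parts reproduces the original string, since no characters are dropped at the boundaries.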