3.4. uniseg.sentencebreak — Sentence Break
Unicode sentence boundaries.
UAX #29: Unicode Text Segmentation (Unicode 16.0.0)
- uniseg.sentencebreak.SB
alias of SentenceBreak
- class uniseg.sentencebreak.SentenceBreak(value)
Sentence_Break property values.
- ATERM = 'ATerm'
Sentence_Break property value ATerm
- CLOSE = 'Close'
Sentence_Break property value Close
- CR = 'CR'
Sentence_Break property value CR
- EXTEND = 'Extend'
Sentence_Break property value Extend
- FORMAT = 'Format'
Sentence_Break property value Format
- LF = 'LF'
Sentence_Break property value LF
- LOWER = 'Lower'
Sentence_Break property value Lower
- NUMERIC = 'Numeric'
Sentence_Break property value Numeric
- OLETTER = 'OLetter'
Sentence_Break property value OLetter
- OTHER = 'Other'
Sentence_Break property value Other
- SCONTINUE = 'SContinue'
Sentence_Break property value SContinue
- SEP = 'Sep'
Sentence_Break property value Sep
- SP = 'Sp'
Sentence_Break property value Sp
- STERM = 'STerm'
Sentence_Break property value STerm
- UPPER = 'Upper'
Sentence_Break property value Upper
- uniseg.sentencebreak.sentence_boundaries(s: str, /, *, property: Callable[[str], SentenceBreak] = sentence_break, tailor: Callable[[str, Iterable[Literal[0, 1]]], Iterable[Literal[0, 1]]] | None = None) -> Iterator[int]
Iterate over the indices of the sentence boundaries of s.
For a non-empty string, the first index yielded is 0 and the last is the end of the string (== len(s)). An empty string yields nothing.
>>> list(sentence_boundaries('ABC'))
[0, 3]
>>> s = 'He said, “Are you going?” John shook his head.'
>>> list(sentence_boundaries(s))
[0, 26, 46]
>>> list(sentence_boundaries(''))
[]
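To make the meaning of the yielded indices concrete, the following minimal sketch slices a string between consecutive boundaries. The helper name `split_at` is hypothetical and not part of uniseg, and the indices are copied from the doctest above rather than computed, so the block runs without the library installed.

```python
def split_at(s: str, boundaries: list[int]) -> list[str]:
    """Slice s between each pair of consecutive boundary indices."""
    return [s[start:end] for start, end in zip(boundaries, boundaries[1:])]


s = 'He said, \u201cAre you going?\u201d John shook his head.'

# Indices taken from the documented doctest output, not computed here.
boundaries = [0, 26, 46]

parts = split_at(s, boundaries)
print(parts)
```

Because the first boundary is 0 and the last is len(s), joining the slices reproduces the original string.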
- uniseg.sentencebreak.sentence_break(c: str, /) -> SentenceBreak
Return the Sentence_Break value assigned to the code point c.
c must be a single Unicode code point string.
>>> sentence_break('\r')
SentenceBreak.CR
>>> sentence_break(' ')
SentenceBreak.SP
>>> sentence_break('a')
SentenceBreak.LOWER
>>> sentence_break('/')
SentenceBreak.OTHER
- uniseg.sentencebreak.sentence_breakables(s: str, /, *, property: Callable[[str], SentenceBreak] = sentence_break) -> Iterable[Literal[0, 1]]
Iterate over the sentence-breaking opportunities for every position of s.
Each value is 1 for “break” or 0 for “do not break”. The iteration yields exactly len(s) values.
>>> from pprint import pp
>>> s = 'He said, \u201cAre you going?\u201d John shook his head.'
>>> pp(list(sentence_breakables(s)), width=76, compact=True)
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
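The per-position flags and the boundary indices describe the same information in two forms. The sketch below shows one way to derive boundaries from flags, assuming a boundary is every position flagged 1 plus the end of the sequence. The function name `boundaries_from_flags` is hypothetical; this illustrates the relationship and is not necessarily how uniseg implements it.

```python
from collections.abc import Iterable, Iterator


def boundaries_from_flags(flags: Iterable[int]) -> Iterator[int]:
    """Yield each index flagged 1, then the total length if non-empty."""
    count = 0
    for index, flag in enumerate(flags):
        if flag:
            yield index
        count = index + 1
    # An empty input has no boundaries at all.
    if count:
        yield count


# Flags with the same shape as the documented example: 46 positions,
# with breaks at positions 0 and 26.
flags = [1] + [0] * 25 + [1] + [0] * 19
result = list(boundaries_from_flags(flags))
print(result)
```

With these flags the result agrees with the documented sentence_boundaries output for the same string, [0, 26, 46].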
- uniseg.sentencebreak.sentences(s: str, /, *, property: Callable[[str], SentenceBreak] = sentence_break, tailor: Callable[[str, Iterable[Literal[0, 1]]], Iterable[Literal[0, 1]]] | None = None) -> Iterator[str]
Iterate over every sentence of s.
>>> s = 'He said, “Are you going?” John shook his head.'
>>> list(sentences(s))
['He said, “Are you going?” ', 'John shook his head.']
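The tailor parameter is described here only by its type: a callable that takes the string and an iterable of 0/1 flags and returns an iterable of 0/1 flags. Assuming those flags are the default break opportunities and the return value replaces them, a tailoring function that suppresses breaks after a few abbreviations might look like the sketch below. The name `no_break_after_abbreviations`, the abbreviation list, and the hard-coded input flags are all hypothetical; the flags are illustrative input, not output captured from uniseg.

```python
from collections.abc import Iterable, Iterator

# Hypothetical abbreviation list, for illustration only.
ABBREVIATIONS = ("Dr.", "Mr.", "Mrs.")


def no_break_after_abbreviations(s: str, breakables: Iterable[int]) -> Iterator[int]:
    """Clear any break flag that directly follows a listed abbreviation."""
    for index, flag in enumerate(breakables):
        if flag and index > 0:
            # Text before this position, ignoring trailing whitespace.
            head = s[:index].rstrip()
            if head.endswith(ABBREVIATIONS):
                yield 0
                continue
        yield flag


s = "Dr. Smith left. He returned."

# Illustrative flags: breaks at 0, after "Dr. " (4), and after "left. " (16).
default_flags = [1 if i in (0, 4, 16) else 0 for i in range(len(s))]

tailored = list(no_break_after_abbreviations(s, default_flags))
print([i for i, flag in enumerate(tailored) if flag])
```

If the assumption about tailor holds, such a function would be passed as sentences(s, tailor=no_break_after_abbreviations). The suffix check is deliberately naive and would also match a longer word that happens to end in a listed abbreviation.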