2.6. uniseg.wrap — Text Wrapping

Unicode-aware text wrapping.

class uniseg.wrap.Formatter

The abstract base class for formatters invoked by a Wrapper object.

This class exists only for convenience and does nothing by itself. You do not have to write your own formatter as a subclass of it, although doing so is not deprecated either.

Your formatters should provide the methods and properties this class has. A Wrapper object invokes them to determine the logical widths of texts and to let you handle those texts, for example to render them.

handle_new_line() None

The handler method invoked when the current line is over and a new line begins.

handle_text(text: str, extents: List[int]) None

The handler method invoked when text should be placed at the current position with the given extents.

reset() None

Reset all states of the formatter.

property tab_width: int

The logical width of tab forwarding.

A Wrapper object uses this property value to determine the actual forwarding extent of a tab at each position.

text_extents(s: str, /) List[int]

Return a list of logical lengths from the start of the string to each of the characters in s.

property wrap_width: int | None

The logical width of text wrapping.

Note that returning None (which is the default) means “do not wrap” while returning 0 means “wrap as narrowly as possible.”
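The interface above can be sketched as a small duck-typed class. This is a minimal illustration, not part of uniseg: the class name ListFormatter and its lines attribute are hypothetical, and text_extents() assumes every character is one cell wide, which a real formatter would replace with a proper width calculation.

```python
from itertools import accumulate
from typing import List, Optional


class ListFormatter:
    """Hypothetical formatter that collects wrapped lines in a list.

    It does not subclass uniseg.wrap.Formatter; it only mirrors the
    documented methods and properties.
    """

    def __init__(self, wrap_width: Optional[int] = None, tab_width: int = 8):
        self._wrap_width = wrap_width
        self._tab_width = tab_width
        self.reset()

    @property
    def wrap_width(self) -> Optional[int]:
        # None means "do not wrap"; 0 means "wrap as narrowly as possible".
        return self._wrap_width

    @property
    def tab_width(self) -> int:
        return self._tab_width

    def reset(self) -> None:
        # Forget all collected output and start with one empty line.
        self.lines: List[str] = [""]

    def text_extents(self, s: str, /) -> List[int]:
        # Simplifying assumption: every character is one cell wide, so the
        # cumulative widths are 1, 2, 3, ...
        return list(accumulate(1 for _ in s))

    def handle_text(self, text: str, extents: List[int]) -> None:
        # Append the text to the line currently being built.
        self.lines[-1] += text

    def handle_new_line(self) -> None:
        # Close the current line and open a new one.
        self.lines.append("")


# Drive the handlers by hand, in the order a wrapping engine might call them.
fmt = ListFormatter(wrap_width=5)
fmt.handle_text("Hello", fmt.text_extents("Hello"))
fmt.handle_new_line()
fmt.handle_text("world", fmt.text_extents("world"))
print(fmt.lines)  # ['Hello', 'world']
```

In real use the handlers would not be called by hand; an instance would be passed as the formatter argument of wrap().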

class uniseg.wrap.TTFormatter(*, wrap_width: int, tab_width: int = 8, tab_char: str = ' ', ambiguous_as_wide: bool = False)

A fixed-width text wrapping formatter.

property ambiguous_as_wide: bool

Treat code points whose East_Asian_Width property is ‘A’ (ambiguous) as if it were ‘W’ (wide), that is, as twice the width of alphanumerics.

handle_new_line() None

The handler which is invoked when the current line is over and a new line begins

handle_text(text: str, extents: Sequence[int], /) None

The handler which is invoked when a text should be put on the current position

lines() Iterator[str]

Iterate over the wrapped lines as strings.

reset() None

Reset all states of the formatter

property tab_char: str

Character to fill tab spaces with

property tab_width: int

Forwarding size of tabs.

text_extents(s: str, /) List[int]

Return a list of logical lengths from the start of the string to each of the characters in s.

property wrap_width: int

Wrapping width

class uniseg.wrap.Wrapper

Text wrapping engine.

Usually, you don’t need to create an instance of the class directly. Use wrap() instead.

wrap(formatter: Formatter, s: str, cur: int = 0, offset: int = 0, *, char_wrap: bool = False) int

Wrap string s with formatter and invoke its handlers.

Of the optional arguments, cur is the starting position of the string in logical length, and offset is the left-side offset of the wrapping area in logical length. For now, offset is used only for calculating tab-stop positions.

If char_wrap is set to True, the text is wrapped at its grapheme cluster boundaries instead of its line break boundaries. This may be helpful when you don’t want the word wrapping feature in your application.

This function returns the total count of wrapped lines.
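The role of cur and offset in tab handling can be illustrated with conventional tab-stop arithmetic. The helper below is a hypothetical sketch, not a uniseg function: it assumes tab stops sit at multiples of tab_width measured from the left edge of the screen, with the wrapping area starting offset cells in. The library's actual calculation may differ.

```python
def tab_advance(cur: int, offset: int, tab_width: int) -> int:
    """Return how many cells a tab at logical position cur would advance.

    Assumption: tab stops are at multiples of tab_width counted from the
    left edge of the screen, and the wrapping area starts at offset.
    """
    return tab_width - (offset + cur) % tab_width


print(tab_advance(cur=0, offset=0, tab_width=8))  # 8
print(tab_advance(cur=3, offset=0, tab_width=8))  # 5
print(tab_advance(cur=3, offset=2, tab_width=8))  # 3
```

Under this assumption, a non-zero offset shifts where the next stop falls without changing the wrapping width itself.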

uniseg.wrap.tt_text_extents(s: str, *, ambiguous_as_wide: bool = False) List[int]

Return a list of logical widths from the start of s to each of its characters (not code points) in fixed-width typography.

>>> tt_text_extents('')
[]
>>> tt_text_extents('abc')
[1, 2, 3]
>>> tt_text_extents('\u3042\u3044\u3046')
[2, 4, 6]
>>> import sys
>>> s = '\U00029e3d'   # test a code point out of BMP
>>> actual = tt_text_extents(s)
>>> expect = [2] if sys.maxunicode > 0xffff else [2, 2]
>>> len(s) == len(expect)
True
>>> actual == expect
True

The meaning of ambiguous_as_wide is the same as that of tt_width().

uniseg.wrap.tt_width(s: str, index: int = 0, ambiguous_as_wide: bool = False) Literal[1, 2]

Return the logical width of the grapheme cluster at s[index] in fixed-width typography.

Return value will be 1 (halfwidth) or 2 (fullwidth).

Generally, the width of a grapheme cluster is determined by its leading code point.

>>> tt_width('A')
1
>>> tt_width('\u8240')     # U+8240: CJK UNIFIED IDEOGRAPH-8240
2
>>> tt_width('g\u0308')    # U+0308: COMBINING DIAERESIS
1
>>> tt_width('\U00029e3d') # U+29E3D: CJK UNIFIED IDEOGRAPH-29E3D
2

If ambiguous_as_wide is True, some characters such as Greek letters are treated as fullwidth, just as ideographs are.

>>> tt_width('\u03b1')     # U+03B1: GREEK SMALL LETTER ALPHA
1
>>> tt_width('\u03b1', ambiguous_as_wide=True)
2

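The effect of ambiguous_as_wide can be approximated with the standard library alone. The sketch below is not how tt_width() is implemented; cell_width is a hypothetical helper that works on a single code point rather than a grapheme cluster, and it assumes that the East_Asian_Width values ‘W’ and ‘F’ count as two cells.

```python
import unicodedata


def cell_width(ch: str, ambiguous_as_wide: bool = False) -> int:
    """Approximate the cell width (1 or 2) of a single code point."""
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        # Wide and fullwidth code points occupy two cells.
        return 2
    if eaw == "A" and ambiguous_as_wide:
        # Ambiguous code points are widened only on request.
        return 2
    return 1


for ch in ("A", "\u3042", "\u03b1"):
    print(
        repr(ch),
        unicodedata.east_asian_width(ch),
        cell_width(ch),
        cell_width(ch, ambiguous_as_wide=True),
    )
```

Here only U+03B1 (GREEK SMALL LETTER ALPHA), whose East_Asian_Width is ‘A’, changes width when the flag is set.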
uniseg.wrap.tt_wrap(s: str, wrap_width: int, /, *, tab_width: int = 8, tab_char: str = ' ', ambiguous_as_wide: bool = False, cur: int = 0, offset: int = 0, char_wrap: bool = False) Iterator[str]

Wrap s with the given parameters and return an iterator over the wrapped lines.

See TTFormatter for wrap_width, tab_width and tab_char, and Wrapper.wrap() for cur, offset and char_wrap.
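A usage sketch based only on the signatures documented in this section: the two helper functions are hypothetical wrappers, and uniseg is a third-party package that must be installed for them to run. The imports are done lazily inside each function so that the module itself loads without uniseg.

```python
from typing import List, Tuple


def wrap_to_lines(text: str, width: int) -> List[str]:
    """Wrap text with tt_wrap() and collect the lines in a list."""
    # Imported lazily so this sketch can be loaded without uniseg installed.
    from uniseg.wrap import tt_wrap

    return list(tt_wrap(text, width))


def wrap_with_formatter(text: str, width: int) -> Tuple[int, List[str]]:
    """Do the same through an explicit TTFormatter and wrap()."""
    from uniseg.wrap import TTFormatter, wrap

    formatter = TTFormatter(wrap_width=width)
    line_count = wrap(formatter, text)  # invokes the formatter's handlers
    return line_count, list(formatter.lines())
```

Calling wrap_to_lines('A quick brown fox jumped over the lazy dog.', 24) should yield lines that each fit within 24 cells. The second helper is expected, though not verified here, to produce the same lines, along with the line count returned by wrap().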

uniseg.wrap.wrap(formatter: Formatter, s: str, cur: int = 0, offset: int = 0, *, char_wrap: bool = False) int

Wrap string s with formatter using the module’s static Wrapper instance.

See Wrapper.wrap() for further details of the parameters.

  • Changed in version 0.7.1: It returns the count of lines now.