gh-131507: Refactor screen and cursor position calculations #131547

ambv · 2025-03-21T15:07:30Z

This is based off #131509.

I changed a bunch of variable names to make things more obvious in calc_screen() and pos2xy(). Plus very mild refactoring once the better names make it obvious which things are the same. It is helpful to me at least.

The one actual change here is moving disp_str() to _pyrepl.utils because this is where syntax highlighting will live as well. With that move comes a slight performance optimization that will become functionally important later: disp_str() no longer repacks the list of characters into a string (that is later only iterated on anyway in calc_screen()). Additionally, our version of disp_str() never had the behavior presented in the docstring, so I replaced it with something more sensible.

That's about it. Tests prove no functional change.

Issue: Syntax highlighting in PyREPL #131507

This is based off python#131509.

pablogsal · 2025-03-21T16:12:04Z

Lib/_pyrepl/reader.py

-            offset = len(character_widths) - character_widths.count(0)
-            in_wrapped_line = prompt_len + sum(character_widths) >= self.console.width
+            prompt_len, char_widths = self.screeninfo[i]
+            offset = len(char_widths)


Isn't this breaking 0-width characters (like combining marks)?

No, this is fixing 0-width characters.

Edit: to explain more clearly. The code I removed looks like it was meant to help with 0-width characters, but the code was actually the wrong way round. The reason why was that originally the "0-width" was historically used in pyrepl to show characters like ^C as two-wide by returning [1, 0] from disp_str(). You can still see that in the docstring I edited. But either we made a change to undo that when originally merging pyrepl, or they changed it over the course of history so it didn't work anyway. So those 0-counts weren't doing anything productive anyway, but were only adding to our confusion. As proof, all zero-width characters are currently printed out as part of the Other unicodedata group:

To be clear, the screenshot represents the state before this PR, and this PR is not affecting it in any way.

The way I discovered all this was that I originally wanted disp_str() to emit syntax highlighting as separate zero-width characters and then it turned out that the removal of the .count(0) was necessary for the cursor positioning to work correctly. And even that was insufficient, since having zero-width characters in the stream was causing wrapping on narrow terminals to fall in infinite loops, there's an open issue about this (#126685).

This PR fixes #126685.

pablogsal · 2025-03-21T16:14:34Z

Lib/_pyrepl/reader.py

@@ -560,29 +530,33 @@ def setpos_from_xy(self, x: int, y: int) -> None:

    def pos2xy(self) -> tuple[int, int]:
        """Return the x, y coordinates of position 'pos'."""
-        # this *is* incomprehensible, yes.


pablogsal · 2025-03-21T16:16:00Z

Lib/_pyrepl/utils.py

+def disp_str(buffer: str) -> tuple[CharBuffer, CharWidths]:
+    r"""Decompose the input buffer into a printable variant.
+
+    Returns a tuple of two lists:
+    - the first list is the input buffer, character by character;
+    - the second list is the visible width of each character in the input
+      buffer.
+
+    Examples:
+    >>> utils.disp_str("a = 9")
+    (['a', ' ', '=', ' ', '9'], [1, 1, 1, 1, 1])
+    """
+    chars: CharBuffer = []
+    char_widths: CharWidths = []
+
+    if not buffer:
+        return chars, char_widths
+
+    for c in buffer:
+        if c == "\x1a":  # CTRL-Z on Windows
+            chars.append(c)
+            char_widths.append(2)
+        elif ord(c) < 128:
+            chars.append(c)
+            char_widths.append(1)
+        elif unicodedata.category(c).startswith("C"):
+            c = r"\u%04x" % ord(c)
+            chars.append(c)
+            char_widths.append(len(c))
+        else:
+            chars.append(c)
+            char_widths.append(str_width(c))
+    trace("disp_str({buffer}) = {s}, {b}", buffer=repr(buffer), s=chars, b=char_widths)
+    return chars, char_widths


Check if you like something of this version with some improvements:

Suggested change

def disp_str(buffer: str) -> tuple[CharBuffer, CharWidths]:

r"""Decompose the input buffer into a printable variant.

Returns a tuple of two lists:

- the first list is the input buffer, character by character;

- the second list is the visible width of each character in the input

buffer.

Examples:

>>> utils.disp_str("a = 9")

(['a', ' ', '=', ' ', '9'], [1, 1, 1, 1, 1])

"""

chars: CharBuffer = []

char_widths: CharWidths = []

if not buffer:

return chars, char_widths

for c in buffer:

if c == "\x1a": # CTRL-Z on Windows

chars.append(c)

char_widths.append(2)

elif ord(c) < 128:

chars.append(c)

char_widths.append(1)

elif unicodedata.category(c).startswith("C"):

c = r"\u%04x" % ord(c)

chars.append(c)

char_widths.append(len(c))

else:

chars.append(c)

char_widths.append(str_width(c))

trace("disp_str({buffer}) = {s}, {b}", buffer=repr(buffer), s=chars, b=char_widths)

return chars, char_widths

CharBuffer = List[str]

CharWidths = List[int]

CTRL_Z = "\x1a"

ASCII_MAX = 127

CTRL_Z_WIDTH = 2

ASCII_WIDTH = 1

@lru_cache(maxsize=1024)

def cached_category(c: str) -> str:

"""Cache unicodedata.category calls for better performance."""

return unicodedata.category(c)

def disp_str(buffer: str) -> Tuple[CharBuffer, CharWidths]:

r"""Decompose the input buffer into a printable variant.

Returns a tuple of two lists:

- the first list is the input buffer, character by character;

- the second list is the visible width of each character in the input buffer.

Examples:

>>> utils.disp_str("a = 9")

(['a', ' ', '=', ' ', '9'], [1, 1, 1, 1, 1])

"""

# Fast path for empty buffer

if not buffer:

return [], []

# Optimization for common case: pure ASCII text

if all(ord(c) < ASCII_MAX for c in buffer):

chars = list(buffer)

char_widths = [ASCII_WIDTH] * len(buffer)

trace("disp_str({buffer}) = {s}, {b}", buffer=repr(buffer), s=chars, b=char_widths)

return chars, char_widths

chars: CharBuffer = []

char_widths: CharWidths = []

# Pre-allocate lists for better performance

chars = [None] * len(buffer)

char_widths = [0] * len(buffer)

for i, c in enumerate(buffer):

if c == CTRL_Z: # CTRL-Z on Windows

chars[i] = c

char_widths[i] = CTRL_Z_WIDTH

elif ord(c) < ASCII_MAX:

chars[i] = c

char_widths[i] = ASCII_WIDTH

elif cached_category(c).startswith("C"):

unicode_repr = f"\\u{ord(c):04x}"

chars[i] = unicode_repr

char_widths[i] = len(unicode_repr)

else:

chars[i] = c

try:

char_widths[i] = str_width(c)

except Exception:

# Fallback if str_width fails

char_widths[i] = 1

trace("disp_str({buffer}) = {s}, {b}", buffer=repr(buffer), s=chars, b=char_widths)

return chars, char_widths

I like caching character categories, will add this to the third PR in the sequence. The other optimization you added would have to be removed anyway when adding color codes.

pablogsal · 2025-03-21T16:17:00Z

Lib/_pyrepl/reader.py

+            if not in_wrapped_line:
+                offset += 1  # there's a newline in buffer
+
+            pos -= offset


I don't see why these two are equivalent :(

This is equivalent, but it's subtle. Note that even in the old version, the if test in line 580 is the same as what's under the variable in_wrapped_line.

In the new version, we're reusing the offset variable that already had -1 applied if in_wrapped_line. So we only need to handle the newline case, which we do.

There's less computation in the new version, so it reads better to me. It almost could be shortened further, but the break check is awkwardly in the middle there, so I kept it for equivalence.

pablogsal

LGTM with some comments

miss-islington-app · 2025-03-21T17:27:39Z

Thanks @ambv for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

…thonGH-131547) This is based off pythonGH-131509. (cherry picked from commit 4cc82ff) Co-authored-by: Łukasz Langa <[email protected]>

bedevere-app · 2025-03-21T17:27:49Z

GH-131557 is a backport of this pull request to the 3.13 branch.

…H-131547) (GH-131557) gh-131507: Refactor screen and cursor position calculations (GH-131547) This is based off GH-131509. (cherry picked from commit 4cc82ff) Co-authored-by: Łukasz Langa <[email protected]>

…thonGH-131547) This is based off python#131509.

pythongh-131507: Refactor screen and cursor position calculations

f9f4722

This is based off python#131509.

ambv requested review from pablogsal and lysnikolaou as code owners March 21, 2025 15:07

bedevere-app bot added the awaiting core review label Mar 21, 2025

bedevere-app bot mentioned this pull request Mar 21, 2025

Syntax highlighting in PyREPL #131507

Closed

ambv added skip news needs backport to 3.13 bugs and security fixes labels Mar 21, 2025

pablogsal reviewed Mar 21, 2025

View reviewed changes

pablogsal approved these changes Mar 21, 2025

View reviewed changes

bedevere-app bot added awaiting merge and removed awaiting core review labels Mar 21, 2025

ambv merged commit 4cc82ff into python:main Mar 21, 2025
56 checks passed

bedevere-app bot removed the awaiting merge label Mar 21, 2025

bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Mar 21, 2025

ambv mentioned this pull request Mar 21, 2025

Zero-width characters cause unbounded memory use in PyREPL #126685

Closed

ambv deleted the pyrepl-sh-refactor-calc-screen branch March 21, 2025 18:03

seehwan pushed a commit to seehwan/cpython that referenced this pull request Apr 16, 2025

pythongh-131507: Refactor screen and cursor position calculations (py…

c6b57ee

…thonGH-131547) This is based off python#131509.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

gh-131507: Refactor screen and cursor position calculations #131547

gh-131507: Refactor screen and cursor position calculations #131547

Uh oh!

ambv commented Mar 21, 2025 •

edited by bedevere-app bot

Loading

Uh oh!

pablogsal Mar 21, 2025

Uh oh!

ambv Mar 21, 2025 •

edited

Loading

Uh oh!

ambv Mar 21, 2025

Uh oh!

ambv Mar 21, 2025

Uh oh!

ambv Mar 21, 2025

Uh oh!

pablogsal Mar 21, 2025

Uh oh!

pablogsal Mar 21, 2025

Uh oh!

ambv Mar 21, 2025

Uh oh!

pablogsal Mar 21, 2025

Uh oh!

ambv Mar 21, 2025

Uh oh!

pablogsal left a comment

Uh oh!

Uh oh!

miss-islington-app bot commented Mar 21, 2025

Uh oh!

bedevere-app bot commented Mar 21, 2025

Uh oh!

Uh oh!

Uh oh!

gh-131507: Refactor screen and cursor position calculations #131547

gh-131507: Refactor screen and cursor position calculations #131547

Uh oh!

Conversation

ambv commented Mar 21, 2025 • edited by bedevere-app bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ambv Mar 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pablogsal left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

miss-islington-app bot commented Mar 21, 2025

Uh oh!

bedevere-app bot commented Mar 21, 2025

Uh oh!

Uh oh!

ambv commented Mar 21, 2025 •

edited by bedevere-app bot

Loading

ambv Mar 21, 2025 •

edited

Loading