Skip to content

gh-131507: Refactor screen and cursor position calculations #131547

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 21, 2025

Conversation

ambv
Copy link
Contributor

@ambv ambv commented Mar 21, 2025

This is based off #131509.

I changed a bunch of variable names to make things more obvious in calc_screen() and pos2xy(). Plus very mild refactoring once the better names make it obvious which things are the same. It is helpful to me at least.

The one actual change here is moving disp_str() to _pyrepl.utils because this is where syntax highlighting will live as well. With that move comes a slight performance optimization that will become functionally important later: disp_str() no longer repacks the list of characters into a string (that is later only iterated on anyway in calc_screen()). Additionally, our version of disp_str() never had the behavior presented in the docstring, so I replaced it with something more sensible.

That's about it. Tests prove no functional change.

offset = len(character_widths) - character_widths.count(0)
in_wrapped_line = prompt_len + sum(character_widths) >= self.console.width
prompt_len, char_widths = self.screeninfo[i]
offset = len(char_widths)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this breaking 0-width characters (like combining marks)?

Copy link
Contributor Author

@ambv ambv Mar 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is fixing 0-width characters.

Edit: to explain more clearly. The code I removed looks like it was meant to help with 0-width characters, but the code was actually the wrong way round. The reason why was that originally the "0-width" was historically used in pyrepl to show characters like ^C as two-wide by returning [1, 0] from disp_str(). You can still see that in the docstring I edited. But either we made a change to undo that when originally merging pyrepl, or they changed it over the course of history so it didn't work anyway. So those 0-counts weren't doing anything productive anyway, but were only adding to our confusion. As proof, all zero-width characters are currently printed out as part of the Other unicodedata group:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2025-03-21 at 17 47 16

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, the screenshot represents the state before this PR, and this PR is not affecting it in any way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I discovered all this was that I originally wanted disp_str() to emit syntax highlighting as separate zero-width characters and then it turned out that the removal of the .count(0) was necessary for the cursor positioning to work correctly. And even that was insufficient, since having zero-width characters in the stream was causing wrapping on narrow terminals to fall in infinite loops, there's an open issue about this (#126685).

This PR fixes #126685.

@@ -560,29 +530,33 @@ def setpos_from_xy(self, x: int, y: int) -> None:

def pos2xy(self) -> tuple[int, int]:
"""Return the x, y coordinates of position 'pos'."""
# this *is* incomprehensible, yes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😆

Comment on lines +44 to +77
def disp_str(buffer: str) -> tuple[CharBuffer, CharWidths]:
r"""Decompose the input buffer into a printable variant.

Returns a tuple of two lists:
- the first list is the input buffer, character by character;
- the second list is the visible width of each character in the input
buffer.

Examples:
>>> utils.disp_str("a = 9")
(['a', ' ', '=', ' ', '9'], [1, 1, 1, 1, 1])
"""
chars: CharBuffer = []
char_widths: CharWidths = []

if not buffer:
return chars, char_widths

for c in buffer:
if c == "\x1a": # CTRL-Z on Windows
chars.append(c)
char_widths.append(2)
elif ord(c) < 128:
chars.append(c)
char_widths.append(1)
elif unicodedata.category(c).startswith("C"):
c = r"\u%04x" % ord(c)
chars.append(c)
char_widths.append(len(c))
else:
chars.append(c)
char_widths.append(str_width(c))
trace("disp_str({buffer}) = {s}, {b}", buffer=repr(buffer), s=chars, b=char_widths)
return chars, char_widths
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if you like something of this version with some improvements:

Suggested change
def disp_str(buffer: str) -> tuple[CharBuffer, CharWidths]:
r"""Decompose the input buffer into a printable variant.
Returns a tuple of two lists:
- the first list is the input buffer, character by character;
- the second list is the visible width of each character in the input
buffer.
Examples:
>>> utils.disp_str("a = 9")
(['a', ' ', '=', ' ', '9'], [1, 1, 1, 1, 1])
"""
chars: CharBuffer = []
char_widths: CharWidths = []
if not buffer:
return chars, char_widths
for c in buffer:
if c == "\x1a": # CTRL-Z on Windows
chars.append(c)
char_widths.append(2)
elif ord(c) < 128:
chars.append(c)
char_widths.append(1)
elif unicodedata.category(c).startswith("C"):
c = r"\u%04x" % ord(c)
chars.append(c)
char_widths.append(len(c))
else:
chars.append(c)
char_widths.append(str_width(c))
trace("disp_str({buffer}) = {s}, {b}", buffer=repr(buffer), s=chars, b=char_widths)
return chars, char_widths
CharBuffer = List[str]
CharWidths = List[int]
CTRL_Z = "\x1a"
ASCII_MAX = 127
CTRL_Z_WIDTH = 2
ASCII_WIDTH = 1
@lru_cache(maxsize=1024)
def cached_category(c: str) -> str:
"""Cache unicodedata.category calls for better performance."""
return unicodedata.category(c)
def disp_str(buffer: str) -> Tuple[CharBuffer, CharWidths]:
r"""Decompose the input buffer into a printable variant.
Returns a tuple of two lists:
- the first list is the input buffer, character by character;
- the second list is the visible width of each character in the input buffer.
Examples:
>>> utils.disp_str("a = 9")
(['a', ' ', '=', ' ', '9'], [1, 1, 1, 1, 1])
"""
# Fast path for empty buffer
if not buffer:
return [], []
# Optimization for common case: pure ASCII text
if all(ord(c) < ASCII_MAX for c in buffer):
chars = list(buffer)
char_widths = [ASCII_WIDTH] * len(buffer)
trace("disp_str({buffer}) = {s}, {b}", buffer=repr(buffer), s=chars, b=char_widths)
return chars, char_widths
chars: CharBuffer = []
char_widths: CharWidths = []
# Pre-allocate lists for better performance
chars = [None] * len(buffer)
char_widths = [0] * len(buffer)
for i, c in enumerate(buffer):
if c == CTRL_Z: # CTRL-Z on Windows
chars[i] = c
char_widths[i] = CTRL_Z_WIDTH
elif ord(c) < ASCII_MAX:
chars[i] = c
char_widths[i] = ASCII_WIDTH
elif cached_category(c).startswith("C"):
unicode_repr = f"\\u{ord(c):04x}"
chars[i] = unicode_repr
char_widths[i] = len(unicode_repr)
else:
chars[i] = c
try:
char_widths[i] = str_width(c)
except Exception:
# Fallback if str_width fails
char_widths[i] = 1
trace("disp_str({buffer}) = {s}, {b}", buffer=repr(buffer), s=chars, b=char_widths)
return chars, char_widths

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like caching character categories, will add this to the third PR in the sequence. The other optimization you added would have to be removed anyway when adding color codes.

Comment on lines +554 to +557
if not in_wrapped_line:
offset += 1 # there's a newline in buffer

pos -= offset
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why these two are equivalent :(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is equivalent, but it's subtle. Note that even in the old version, the if test in line 580 is the same as what's under the variable in_wrapped_line.

In the new version, we're reusing the offset variable that already had -1 applied if in_wrapped_line. So we only need to handle the newline case, which we do.

There's less computation in the new version, so it reads better to me. It almost could be shortened further, but the break check is awkwardly in the middle there, so I kept it for equivalence.

Copy link
Member

@pablogsal pablogsal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some comments

@ambv ambv merged commit 4cc82ff into python:main Mar 21, 2025
56 checks passed
@miss-islington-app
Copy link

Thanks @ambv for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Mar 21, 2025
@bedevere-app
Copy link

bedevere-app bot commented Mar 21, 2025

GH-131557 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Mar 21, 2025
ambv added a commit that referenced this pull request Mar 21, 2025
…H-131547) (GH-131557)

gh-131507: Refactor screen and cursor position calculations (GH-131547)

This is based off GH-131509.
(cherry picked from commit 4cc82ff)

Co-authored-by: Łukasz Langa <[email protected]>
@ambv ambv deleted the pyrepl-sh-refactor-calc-screen branch March 21, 2025 18:03
seehwan pushed a commit to seehwan/cpython that referenced this pull request Apr 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants