
Prompt Regex Escaping Confusion #86

Closed
@s-weigand

First of all, thanks for the new Prompt RegEx feature ❤️
I just tried it out and, after some fiddling around with the escaping, it works like a charm.

Expected behaviour

First of all, I think the docs should mention that the RegEx you type in your conf.py is a JS RegEx and not a Python one, since this makes a difference in some edge cases.
That said, IMHO it should work similar to this:

copybutton_prompt_text = r"<valid_js_reg_ex> "
copybutton_prompt_is_regexp = True

🌈 🦄

What I did

In my case the prompt I wanted to get rid of was of the form

r"In \[\d*\]: |\.\.\.: "

which is the Jupyter console style and pretty much the same as the IPython style from the docs

"\\[\\d*\\]: |\\.\\.\\.: "

So it should be straightforward

"In \\[\\d*\\]: |\\.\\.\\.: "

as in the docs and I would be done?

But it didn't work, so I did some digging and ended up with:

"In \\[\\d*\\]: |\\.\\.\\.: |\\$ "

What internally happens

In my conf.py, I set copybutton_prompt_text

copybutton_prompt_text = r"In \[\d*\]: |\.\.\.: "

which is the same as:

copybutton_prompt_text = "In \\[\\d*\\]: |\\.\\.\\.: "
>>> r"\\" == "\\\\"
True
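A quick way to confirm the equivalence claimed above is to compare both spellings in Python and count the backslashes the resulting string really contains. This is only a sanity check of the two literals quoted in this issue; the variable names are illustrative.

```python
# Both spellings from the issue: a raw string and a manually escaped one.
raw = r"In \[\d*\]: |\.\.\.: "
escaped = "In \\[\\d*\\]: |\\.\\.\\.: "

# They are the same Python string object value.
print(raw == escaped)

# The string holds one backslash per escaped regex character, not two.
backslash_count = raw.count("\\")
print(backslash_count)
```

So whatever Sphinx writes into the template is the single-backslash form, no matter which spelling was used in the config.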

Sphinx now replaces the context variable copybutton_prompt_text in copybutton.js_t and generates copybutton.js.
But when we look at that string, which should have just been passed as is, it is missing backslashes.

'In \[\d*\]: |\.\.\.: |\$ '

Well, this looks like a totally valid JS RegEx, so this is fine?
And it would be if written like this (JS just wants to keep people on their toes, so why not a different delimiter for RegEx literals?):

/In \[\d*\]: |\.\.\.: |\$ /

Now the string gets passed to the RegExp constructor, which expects strings to be escaped like in Python when you don't use raw strings.
But it doesn't match.

> "In [1]: bar".match(new RegExp('^(In \[\d*\]: |\.\.\.: )(.*)'))
null
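The failing match above can be reproduced in Python under one assumption: a quoted JS string literal drops the backslash in front of characters that have no escape meaning (`[`, `]`, `d`, `.`), so the RegExp constructor never sees them. The helper `simulate_js_string_literal` below is hypothetical and only approximates that behaviour for simple escapes; Python's `re` is used as a stand-in for the JS engine, which should agree for a pattern this simple.

```python
import re

# Single-character escapes that a JS string literal treats specially.
JS_SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "b": "\b",
                     "f": "\f", "v": "\v", "0": "\0"}


def simulate_js_string_literal(body: str) -> str:
    """Hypothetical helper: approximate how JS evaluates a quoted string body.

    Only simple escapes are handled; hex/unicode escapes are rejected.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt in "xu":
                raise NotImplementedError("hex/unicode escapes are not simulated")
            # Known escapes become control characters; for anything else the
            # backslash is dropped and the character itself is kept.
            out.append(JS_SIMPLE_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Text as it appears between the quotes in the generated copybutton.js.
generated = r"^(In \[\d*\]: |\.\.\.: )(.*)"
received = simulate_js_string_literal(generated)
print(received)                            # pattern the regex engine gets
print(re.match(received, "In [1]: bar"))   # no match

# Doubling every backslash in the generated text restores the pattern.
recovered = simulate_js_string_literal(generated.replace("\\", "\\\\"))
print(recovered == generated)
print(re.match(recovered, "In [1]: bar").groups())
```

Under this assumption the pattern handed to the engine is `^(In [d*]: |...: )(.*)`, where `[d*]` is a character class and each `.` matches any character, which is why `In [1]: bar` is rejected.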

Why is this going wrong and why didn't the tests catch this?

TLDR

IMHO this all comes down to JS RegExp 'helping' you in some cases when it can interpret what you might have meant, which gives you false security (a false positive match in the tests).

Details

Disclaimer: I'm in no way a JS expert with insight into the internals; I just found this through WTF moments plus trial and error.

> let proper_regex = new RegExp('^(>>> |\\$ |In \\[\\d*\\]: |\\[\\d*\\]: |\\.\\.\\.: )(.*)')
undefined
> let false_positive_regex = new RegExp('^(>>> |\\$ |In \[\d*\]: |\[\d*\]: |\.\.\.: )(.*)')
undefined
> "$ bar".match(proper_regex)
(3) ["$ bar", "$ ", "bar", index: 0, input: "$ bar", groups: undefined]
> "[1]: bar".match(proper_regex)
(3) ["[1]: bar", "[1]: ", "bar", index: 0, input: "[1]: bar", groups: undefined]
> "[1]: bar".match(false_positive_regex)
(3) ["[1]: bar", "[1]: ", "bar", index: 0, input: "[1]: bar", groups: undefined]
> "...: bar".match(proper_regex)
(3) ["...: bar", "...: ", "bar", index: 0, input: "...: bar", groups: undefined]
> "...: bar".match(false_positive_regex)
(3) ["...: bar", "...: ", "bar", index: 0, input: "...: bar", groups: undefined]

So far so good: all test cases work fine, as we would expect since they pass.

> "In [1]: bar".match(proper_regex)
(3) ["In [1]: bar", "In [1]: ", "bar", index: 0, input: "In [1]: bar", groups: undefined]
> "In [1]: bar".match(false_positive_regex)
null

The only explanation I have for this behaviour is that JS wants to "help" the user by guessing their intention, like the famous addition example:

> "5" + 1
"51"

but at some point it is like "I don't get it", and the pattern that worked before fails.
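One possible reading of these results, sketched in Python: if the string literal drops the single backslashes, false_positive_regex ends up with the alternatives listed below, and the question becomes which alternative accepts each test line. The list is derived by hand from the pattern quoted above and is an assumption, not output from the extension; `matching_alternatives` is an illustrative helper.

```python
import re

# Alternatives of false_positive_regex after the string literal has dropped
# the single backslashes. Only "\\$" was double-escaped, so it survives as \$.
degraded_alternatives = [">>> ", r"\$ ", "In [d*]: ", "[d*]: ", "...: "]


def matching_alternatives(line):
    """Return the degraded alternatives that match at the start of `line`."""
    return [alt for alt in degraded_alternatives if re.match(alt, line)]


for line in ["$ bar", "[1]: bar", "...: bar", "In [1]: bar"]:
    print(repr(line), "->", matching_alternatives(line))
```

In this sketch `[1]: bar` and `...: bar` are both accepted by the `...: ` alternative, where every dot matches any character, so the passing tests would be a coincidence of three-character prompts. `In [1]: bar` has no such fallback, which would explain the `null`.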

Possible fix

Use repr in add_to_context

If this line was changed to:

      {"copybutton_prompt_text": r"{}".format(repr(config.copybutton_prompt_text))[1: -1]}

I know it looks ugly, but it would add the extra backslashes needed for escaping (maybe someone else has a more elegant solution).
I'm not sure how much escapeRegExp would need to be adjusted to those changes.
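To see what the repr-based proposal would produce, here is a minimal sketch outside of Sphinx. `prompt_text` is a stand-in for `config.copybutton_prompt_text`; the surrounding `r"{}".format(...)` from the proposal does not change the string, so it is left out. The `json.dumps` variant is an assumed alternative, not something the extension is known to use.

```python
import json

# Stand-in for config.copybutton_prompt_text as set in the Sphinx config.
prompt_text = r"In \[\d*\]: |\.\.\.: "

# Proposal from this issue: repr() doubles each backslash and [1:-1]
# strips the surrounding quotes.
via_repr = repr(prompt_text)[1:-1]
print(via_repr == prompt_text.replace("\\", "\\\\"))

# Caveat: repr() switches to double quotes when the text contains a single
# quote, so the quote comes out unescaped after slicing.
tricky = repr("don't")[1:-1]
print(tricky)

# Assumed alternative: json.dumps() emits a complete double-quoted literal
# with backslashes and double quotes escaped.
via_json = json.dumps(prompt_text)
print(via_json == '"' + via_repr + '"')
```

With the JSON variant the template would have to insert the value without adding its own quotes, since the quotes are already part of the output.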

I didn't have luck with backslash replacing in JS:

> 'In \[\d*\]: |\.\.\.: '.replace(/\\/, '\\\\');
"In [d*]: |...: "
> 'In \[\d*\]: |\.\.\.: '
"In [d*]: |...: "

So in conclusion:
Python ❤️
JS 😠

P.S.: Wish we had sphinx-copybutton on GitHub, for the JS blobs I posted 😢
