-
Notifications
You must be signed in to change notification settings - Fork 341
Allow gitlab URL link shortening from non-gitlab/github.com domains #2068
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Thanks @mattpitkin for this work! I felt that this work needed more tests, so I added tests, and that led to some slight refactoring. @drammock, do you think you could give my additions a review? Please bear in mind that a lot of the test cases are marked with "TODO: this is wrong", but the same is true not just for the bitbucket URLs but for some of the GitHub and GitLab URLs, as well. I think those problems should maybe be tackled in a separate PR. What do you think? |
tests/test_short_url.py
Outdated
# TODO: this is wrong | ||
"pydata/pydata-sphinx-theme#issues", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should probably be pydata/pydata-sphinx-theme/issues
I think. Let's do in a follow-up PR
tests/test_short_url.py
Outdated
# TODO: should this be shortened the way that GitHub does it: | ||
# pydata/pydata-sphinx-theme@3caf346 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would be ideal. Let's do in a follow-up PR
tests/test_short_url.py
Outdated
# TODO, I belive this is wrong as both orgs/pydata/projects/2 and | ||
# pydata/projects/issue/2 shorten to the same | ||
("github", "https://github.com/orgs/pydata/projects/2", "pydata/projects#2"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I agree this looks wrong. The collision is unlikely (would need a repo called "projects") but still, I think the #
should only be used for issues and PRs, not projects. So this should be pydata/projects/2
I guess.
elif self.platform == "bitbucket": | ||
# split the url content | ||
parts = path.split("/") | ||
|
||
if len(parts) > 0: | ||
text = parts[0] # organisation | ||
if len(parts) > 1 and not ( | ||
parts[-2] == "workspace" and parts[-1] == "overview" | ||
): | ||
if len(parts) > 1: | ||
text += f"/{parts[1]}" # repository | ||
if len(parts) > 2: | ||
if parts[2] in ["issues", "pull-requests"]: | ||
itemnumber = parts[-1] | ||
text += f"#{itemnumber}" # element number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is hard to follow, and feels brittle to me. Why aren't we using regex for this? I worked up this gist as a POC and it seems to capture the behavior that we want:
https://gist.github.com/drammock/78d2d3c9837aafd1259866c7b936b9e4
presumably similar things will work for other forges (though IIRC gitlab is a bit more complex). @gabalafou WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. The code in this PR follows patterns already established for GitHub and GitLab, but I think those patterns are bad.
The way I think this should work is that we should be rather picky about which URL patterns we recognize/support and only shorten those. All the other URLs should be not be shortened. But right now, the code takes the opposite approach, as soon as it sees github.com or gitlab.com, it tries to shorten the URL.
For example, let's say we were just starting to support GitHub URLs. But in this scenario, let's start only with supporting pull request URLs.
Then we would convert the following link, like so:
https://github.com/pydata/pydata-sphinx-theme/101
=> pydata/pydata-sphinx-theme#101
But we would not convert any of the following links (if we were only supporting pull request links):
https://github.com/pydata/pydata-sphinx-theme/issues
https://github.com/pydata/pydata-sphinx-theme/issues/
https://github.com/pydata/
https://github.com/pydata/pydata-sphinx-theme/commit/3caf346cacd2dad2a192a83c6cc9f8852e5a722e
None of those links would get shortened until we specifically add each type of URL that we want to support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to go down the route of a rewrite of the link shortening logic, I would be happy to do it. In that case, I would just close this PR and open a new one.
I would abandon the feature in this PR to turn off link shortening because I suspect that part of the motivation for adding a config value to turn it off is because our current link shortener is bad. Perhaps with a better shortener, there would be little to no demand for a config setting to turn it off. What do you think, @mattpitkin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code in this PR follows patterns already established for GitHub and GitLab, but I think those patterns are bad. [...] If we want to go down the route of a rewrite of the link shortening logic, I would be happy to do it. In that case, I would just close this PR and open a new one.
Since I think I've basically solved it for bitbucket in the linked Gist, I feel like it might be the right call to just rewrite the others too. +1 to close and open a new PR to refactor what we have and also add bitbucket. See also #2215 which adds link shortening for codeberg/forgejo and gitea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would abandon the feature in this PR to turn off link shortening because I suspect that part of the motivation for adding a config value to turn it off is because our current link shortener is bad. Perhaps with a better shortener, there would be little to no demand for a config setting to turn it off. What do you think, @mattpitkin?
Do you mean, why did I put the gitlab_url
etc in the ``html_contextdictionary of the
conf.py` file, rather than being at the root level? In part, I think I just found that other information was conveyed in that dictionary, so it seemed a safe place to add new information that wouldn't interfere with anything else. If that information can go at root level, then I don't have an issue with that. You'd still have to specify the required URL bases to turn on/off URL shortening for them though - remember this is specifically for URLs that are not in the standard gitlab.com, github.com, bitbucket.org base domains, and they don't even have to have, e.g., gitlab, in the domain name (see, for example, https://git.ligo.org).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mattpitkin, sorry, my comment actually didn't make any sense.
I was writing it in a hurry yesterday and I got two different PRs mixed up in my head: this one versus #2109. But the other PR is very much related to this PR because it provides an option to turn off link shortening.
Co-authored-by: Daniel McCloy <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized it would probably be better not to close this PR and open a new one but rather to do the refactor within this PR.
That said, I'm really wondering if this logic should be spun out into a separate Sphinx extension.
@@ -0,0 +1,16 @@ | |||
<div class="bitbucket-container docutils container"> | |||
<p> | |||
<a class="reference external" href="https://bitbucket.org"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we still add the bitbucket
class, which adds the BitBucket icon before the link, even if we do not shorten the link?
My sense is that only shortened links should get the icon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is could be slightly shortened, as it omits by omitting the https://
part. If we did so and I had to pick I would include the logo but I don't really care.
<a class="github reference external" href="https://github.com/orgs/pydata/projects/2"> | ||
pydata/projects#2 | ||
<a class="reference external" href="https://github.com/orgs/pydata/projects/2"> | ||
https://github.com/orgs/pydata/projects/2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To keep this rewrite manageable, I settled on a limited number of URL types to support, and project URLs were not one of them. It can be added later if folks want it.
</p> | ||
<p> | ||
<a class="reference external" href="https://www.github.com/pydata/pydata-sphinx-theme/pull/1012"> | ||
https://www.github.com/pydata/pydata-sphinx-theme/pull/1012 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we should support www-prefixed URLs or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the argument for not supporting it? Is it hard to make it work?
<a class="gitlab reference external" href="https://gitlab.com/gitlab-org/gitlab/issues"> | ||
gitlab-org/gitlab/issues | ||
<a class="reference external" href="https://gitlab.com/gitlab-org/gitlab/issues"> | ||
https://gitlab.com/gitlab-org/gitlab/issues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, same question, should we add the gitlab
class, which adds the GitLab icon before the URL, if we don't shorten the GitLab URL?
), | ||
( | ||
# TODO, non canonical url, discuss if should maybe be shortened to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR takes care of all the TODO comments, yay!
# only do something if the platform is identified | ||
self.platform = self.supported_platform.get(uri.netloc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure about this part of the class refactor.
I got rid of the platform
member. I also renamed the parse_url
method and moved it out of the class.
If I am reading the code correctly, it seems odd to me that each time this class encounters a node, it changes self.platform to whatever is matched in that moment.
I felt like it would be cleaner and easier to test if I just passed platform
as an argument to the shortener function.
platform = self.supported_platform.get(parsed_uri.netloc) | ||
if platform is not None: | ||
short = shorten_url(platform, uri) | ||
if short != uri: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the code change that prevents adding the platform class (and thereby the icon) unless the link is actually shortened.
@@ -39,6 +39,7 @@ repos: | |||
hooks: | |||
- id: djlint-jinja | |||
types_or: ["html"] | |||
exclude: ^tests/test_build/.*\.html$ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Linter was complaining about http link in a test fixture so I excluded it here.
# Branch URL | ||
elif match := re.match(r"/([^/]+)/([^/]+)/tree/([^/]+)", path): | ||
owner, repo, branch = match.groups() | ||
return f"{owner}/{repo}:{unquote(branch)}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the colon here? Aren't (tips of) branches usually represented with @
? For example pip install git+https://git.example.com/MyProject.git@master
if (m := re.match(r"^/(.+)/([^/]+)/-/merge_requests/(\d+)$", path)) or ( | ||
m := re.match(r"^/(.+)/([^/]+)/merge_requests/(\d+)$", path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the second pattern can be omitted if you insert a (?:/-)?
(optional non-capturing group containing literal /-
) in place of the /-
in the first pattern.
# Issue URL | ||
elif match := re.match(r"/([^/]+)/([^/]+)/issues/(\d+)", path): | ||
workspace, repo, issue_id = match.groups() | ||
return f"{workspace}/{repo}!{issue_id}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the !
standard for bitbucket issues? above we use #
for both PRs and issues in the GitHub/GitLab shorteners.
</p> | ||
<p> | ||
<a class="reference external" href="https://www.github.com/pydata/pydata-sphinx-theme/pull/1012"> | ||
https://www.github.com/pydata/pydata-sphinx-theme/pull/1012 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the argument for not supporting it? Is it hard to make it work?
This is a PR for #2065. It allows URL shortening for GitLab/GitHub links that use the
gitlab_url
/github_url
given inhtml_context
, e.g.,:rather than just for GitLab/Github URLs that are in the git[lab/hub].com domain.
Update: I've expanded this PR to also add in the ability to shorten bitbucket URLs.