-
Notifications
You must be signed in to change notification settings - Fork 584
Replace checkAUTHORS.pl with updateAUTHORS.pl/updateAUTHORS.pm, allow users to be excluded #20070
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
If there are pending changes, it checks if the current author is listed as author. This is to prevent the test-suite from succeeding before making one's first commit but failing after it. |
On Mon, 8 Aug 2022 at 23:10, Leon Timmermans ***@***.***> wrote:
t/porting/pending-author.t - I couldnt figure out what this was for, and
it doesnt seem to be required with the updateAUTHORS.pl workflow. If
someone can explain what it was used for better than the test file itself
then I would have no problem adding it back.
If there are pending changes, it checks if the current author is listed as
author.
What is a pending change?
Yves
…--
perl -Mre=debug -e "/just|another|perl|hacker/"
|
On Tue, 9 Aug 2022 at 09:09, demerphq ***@***.***> wrote:
On Mon, 8 Aug 2022 at 23:10, Leon Timmermans ***@***.***> wrote:
>
> t/porting/pending-author.t - I couldnt figure out what this was for, and it doesnt seem to be required with the updateAUTHORS.pl workflow. If someone can explain what it was used for better than the test file itself then I would have no problem adding it back.
>
> If there are pending changes, it checks if the current author is listed as author.
What is a pending change?
To expand on my failure to understand this concept, authors.t checks
that the AUTHORS file is up to date. That means it checks that if you
ran updateAUTHORS.pl nothing would change given the commits that are
in the repo when it runs.
When pending-author.t runs what does it check?
Our tests run in two scenarios relevant to this discussion, on peoples
local box while developing and on CI when the code is pushed.
When this executes on my local box what is it testing that authors.t
doesnt test? When this runs on CI when the code has been pushed what
is it testing that authors.t doesnt test?
I cannot work out a reasonable understanding of what a "pending
change" is in this context. Either the commit is in the commit log and
it is being tested or it is not. If it is then it gets tested by
authors.t, if the commit isn't in the commit log then it isn't pending
it is non-existent as far our testing goes.
cheers,
Yves
…--
perl -Mre=debug -e "/just|another|perl|hacker/"
|
d27aa44
to
e25eb4c
Compare
Uncommited changes |
@Leont - Ok, when I first read pending-author i think i misunderstood what it does and got myself in a muddle, the term "pending" really threw me. I should have just ignored that and looked at the code more closely. So it is looking to see if there are uncommitted changes that if they were committed with the current users details would result in a test failure. I dont see why we need another test file for this, we can just put it into the authors.t testing. I will add that. |
1d8e24a
to
60cfdac
Compare
Ok, authors.t now does the equivalent of what pending authors used to do, arguably better as it now checks (nearly) all the things that git would. I also added copious verbiage if the tests fail. |
Some early feedback without really looking at the code yet:
|
On Thu, 11 Aug 2022 at 12:56, Bram ***@***.***> wrote:
Some early feedback without really looking at the code yet:
commit Porting/updateAUTHORS.pl - add support for excluding contributors:
based on the commit description I got a feeling that the commit tries to do too much at once (i.e. consider splitting the commit?)
Ok.
Porting/checkAUTHORS.pl which does not do the right thing these days anyway.: Is it possible/sensible to include a small description of how/when it doesn't do the right thing?
Sure. I think it was that it doesn't understand the new
Porting/exclude_contrib.txt but ill update the commit.
commit remove redundant data from .mailmap:
can you describe in the commit message what you consider redundant data?
When the LHS and RHS of a .mailmap line are the same.
and the reason why this matters? [I'm not disputing that there - probably - is a good reason for it; but I like seeing it in the commit message]
Actually this is a controversial point. Some folks seem to prefer us
not having it in the .mailmap. It also makes the mailmap larger.
I personally dont think it matters and think that removing the entries
results in us losing capabilities, specifically the ability to
recognize when someone has manually updated the AUTHORS file to change
their preferred email or their name. I only implemented it because it
came up multiple times and I got tired with discussing it. To
compensate I added options to change name via the command line.
Yves
…--
perl -Mre=debug -e "/just|another|perl|hacker/"
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I reviewed this commit per commit and not by final changes..
GitHub UI doesn't really make clear on which commit the comment was added tho :(
Some of the comments apply to code that was later changed, so it might be a bit confusing when looking at the current code..
db4f96f
to
3f1d192
Compare
BTW, I am still not entirely satisfied with a couple of aspects of this PR, it might be wise to wait until I say so before anyone wastes their time reviewing code I am going to kill. |
3f1d192
to
2627f5b
Compare
Hi @bram-perl I have update the code and include I think all of your feedback. With the amount that I have rebased, squashed, and otherwise munged the sequence I am going to mark all your previous feedback as resolved (as I hope it is) and start fresh. The patch sequence should be much cleaner now with much of the interim stuff squashed out. Please let me know what you find that I overlooked. I have been staring at it so long I have become desensitized. :-) |
2627f5b
to
046763a
Compare
That last push just removes a function made obsolete by the checkAUTHORS removal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done a commit-per-commit review.
(I may have gotten carried away a bit during reviewing - hope you don't mind :) )
9d613e5
to
4a03be6
Compare
Hi @bram-perl I have incorporated most of your feedback, either literally as you suggested, or in some case in spirit but i went a different way. The only points I did not go along with were the following two:
So if you don't mind I will let you push a follow PR for point 1. For point 2, I am open to doing something, but I would really like to know how this fails when that travis logic fires, which is simplest done by merging, waiting for it to fail (if it does) and then looking at the failure and then dealing with it. If folks are ok with that I would like to proceed that way. |
I'll try todo a final review tonight on this PR; |
e279804
to
7203f62
Compare
Re: POD docs for the module. I would /like/, long run, to have complete docs for this, but I don't see those docs as a necessary precondition to getting the PR merged. Rather I would like to get this merged and then incrementally improve it over time, ideally with the support from others who care about these things. IMO after the PR (very much with your help) updateAUTHORS.pl will be a better tool than we had with either itself or checkAUTHORS.pl before this PR. We can incrementally make it even better as we like once its merged. |
A. Sinan Unur was listed twice in AUTHORS, under Sinan Unur as well. This removes the dupe. This is in prep for various changes to the updateAUTHORS.pl process.
they are not pod, but rather editor droppings.
…lved_debug" The existing name is confusing to me, especially when the workflow gets skipped so change the name to something more intuitive. We also change it to respect .mailmap by using %aN and %aE instead of %an and %ae. We also change it to use ".." and "..." in the git log statement, and use `sort | uniq -c` instead of `sort -u`, it is helpful to see counts. We also show the resolved begin and end commits.
We can at least make sure it compiles... Later we can build on this, for now this sanity checks that we haven't *completely* broken the script while we refactor it.
As we extend the module logic mixing the docs for the module and the command line tool becomes problematic, and it is pain passing around the state via distinct variables. This is first step to moving to a simpler model. We don't make much use of the OO in this patch at all, that will come in a follow up patch.
It is much cleaner when the object stores its state internally. That way we don't have to adjust the transitive caller graph when we need something new. Also normalize some var names, we were using "author_info" and "authors_info" in a way that was confusing, now it is consistently "author_info".
This makes it easier and simpler to add new options and extend the script in the future.
Makes it easy to add diagnostics while developing.
If we want to just check if the files are up to date we dont want to have to write to the disk. We will use this in a future patch.
checkAUTHORS.pl supports specifying a commit range, and we actually need it even more as we do not support piping from git log (as our git log format is ungainly) so this allows us to be feature compatible.
Exactly what "update_authors" does is a little less clear than what "update_authors_file" does.
This add a new file Porting/exclude_contrib.txt to hold a list of base64 SHA-256 digests of the user name and email who should be excluded. This adds the options --exclude-me --exclude-contrib=NAME_AND_EMAIL --exclude-missing to add exclusions in different ways. --exclude-me uses the git credentials for the current user, --exclude-contrib expects a name and email, and --exclude-missing excludes someone who is missing from the change log. When excluding someone their name will be removed from AUTHORS if it is already listed, and the relevant entries to .mailmap will be removed, and if necessary additional exclusion entries will be added for any of their old identities which were mentioned in .mailmap. This feature will be used in a later patch to resolve GH issue #20010.
Adds the --stats, --files, --who, and related options similar to what checkAUTHORS.pl offers. See perldoc Porting/updateAUTHORS.pl for the list of options it supports.
Binmode the pipe, use tabs not nulls, and on windows use double quotes not single quotes for arguments to subprocess scripts. Since we weren't using Porting/updateAUTHORS.pm as part of our test pipeline we didn't know it didnt work on Windows builds.
Use consistent hash key and list sort orders.
This is helpful when adding a new option to updateAUTHORS.pl. It not really meant for users, more for someone working on the tool itself, and thus is not documented.
The blurb mentions checkAUTHORS.pl, and we should point people at updateAUTHORS.pl instead.
This emulates the test feature in checkAUTHORS.pl and in t/porting/pending-author.t which overlaps with the tests that checkAUTHORS.pl does on behalf of t/porting/authors.t We test that the .mailmap and AUTHORS files are up to date. If the working directory is not clean (eg git status --porcelain returns a non empty list of files) then we also validate the git credentials in use to make sure that if the changes were applied we would also still pass the test. The script implements the same logic that `git commit` uses to decide what identity a commit would use with the exception of the last resort measures git undertakes to use the hostname and similar files.
Improved POD verbiage. Explain how to exclude people in a more friendly way. Various other pod cleanups and improvements.
A number of devs have noted and raised concerns that having .mailmap entries where the LHS and RHS of the entry is redundant. A few have also expressed the view that this exposes unnecessarily exposes their email address in an easily harvestible form. On the technical level as far as git specifically is concerned it is true this data is redundant, as git uses .mailmap to transform user data that match the RHS of an entry into the value on LHS, and when they are the same obviously it is a no-op. However on the technical level for our infra these entries are not redundant. We can use them to identify and correctly respond to many cases of manual update of the AUTHORS file, for instance changing ones preferred name. With the .mailmap entries we have the data to identify the old preferred name, and join it together with the unchanged email for the user and then automatically update their .mailmap entries. This is why these entries were created originally. However, I believe that this functionality is not useful enough to require us to have discussions with contributors on this subject on a regular basis. We can add command line options that allow people to change the AUTHORS file and the .mailmap file properly, so we can drop the "redundant" data and avoid the need to talk about why it is there. The required functionality for changing names will come in a follow up patch in this PR.
With the removal of the redundant data from .mailmap we lacked a way to rename an author properly. Previously someone could manually update a name or email in AUTHORS files and things would "just work" as the .mailmap file would tell us the old mapping. Now we have removed those mappings we need a way to easily rename someones canonical data. This also means manually changing the AUTHORS file is now a problem.
updateAUTHORS.pl now can replace all the capabilities of checkAUTHORS.pl, and checkAUTHORS.pl has a slightly different idea of what needs to be validated, and is not aware of the Porting/exclude_contrib.txt. Instead of teaching it to be aware and maintaining two tools this drops checkAUTHORS.pl In the process we move the logic for t/porting/pending-author.t into t/porting/authors.t, by letting updateAUTHORS.pl do it as well. This is helpful because updateAUTHORS.pl contains similar core git logic so that it can implement some of its options.
Contributor has requested they be excluded from our contributor databases.
08c5367
to
9a25b33
Compare
rebased to deal with the conflicts in .mailmap |
And remove the duplicate for "vividsnow" that this found.
9a25b33
to
17f95b8
Compare
In #20010 a user has asked that we do not list their name or email any of our latest files in the repo.
This was a bit problematic as updateAUTHORS.pl would just add the user back if we did so. I added functionality to updateAUTHORS.pl to support us using hashed contributor details as a way to prevent users from being added by our automation. Unfortunately this then meant that checkAUTHORS.pl and updateAUTHORS.pl had different idea about what should be in the files. There are other issues, the checkAUTHORS.pl file contains a lot of data in a hard to reuse or manage form, and in plaintext, (although slightly munged so its not a direct scraper target). In the end it made a lot more sense to me to simply drop checkAUTHORS.pl and replace it with updateAUTHORS.pl for all use cases.
This patch sequence adds support for a bunch of features that checkAUTHORS.pl had that updateAUTHOR.pl did not, almost uniformly related to reporting so that the tool is now nearly feature equivalent with checkAUTHORS.pl.
There are a few differences:
This PR adds two new files
This PR removes two files:
This PR removes the personal details as requested in #20010
The latest version of updateAUTHORS.pl has support for additional reports, and I expect as I have time I will add more. IMO it is much easier to do this with updateAUTHORS.pl than it was with checkAUTHORS.pl (because of the reliance on the git standard log formats). I plan to add a "who-works-on-what" to show file data by author. Eg, it might show that @khwilliamson, myself and @hvds do most of the changes to the regex engine.
I have marked this PR as draft for now, as i expect to make changes as I get feedback, but the patch it pretty stable, review appreciated!