-
Notifications
You must be signed in to change notification settings - Fork 10.1k
etcd_debugging_mvcc_watcher_total
may become negative value
#19987
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I think we should clearly document our review policy:
|
I personally don't agree with it. The change was pretty simple, covered with proper unit tests and reviewed by me and @ivanvc.
While agree that for complicated changes it's good and customary to ask for more reviewers, I don't think it's a good default. I would prefer that we start moving in K8s direction of having a directory owners that are trusted to approve the PR or delegate to the best person to approve. |
@ivanvc did not approve the PR. Any production change on the core part (i.e. mvcc, applying workflow, etcdserver etc) should have at least two approvals. Rule is rule ( of course, to be reviewed & approved ) Also I see that some PRs were merged without any reviews,
Note we have blameless culture. It's not blaming anyone, it's just for safety & quality -- prevent any potential risk in future. |
While I agree in general more reviews would make etcd safer and improve code quality and we should do better. We also need to take into considerations that we only have a small pool of reviewers across different time zones. Having hard requirements could greatly hinder dev speed, especially for the ongoing antithesis project. I think it is acceptable to trust the approver's judgement calls while we all lean more on the more cautious side. In addition, if etcd is going to own submodules in kubernetes, it would be more consistent to adopt the review process from kubernetes. |
It isn't just antithesis, it's also robustness test. When robustness test raises any issue, we need to analyse it as well, although not everyone work on it. So quality of robustness test matters.
See my #19988, the reviewers include the submodule owners.
Quality & stability outweighs dev speed. If you see the PRs listed above, they are all raised & merged in few hours.
I think we should rely on process/policy to guarantee quality instead of any individual's judgement. Human makes mistakes. |
If you read the #19988, they aren't hard requirements. There are some flexiblities. |
That doesn't really seem blameless, a good practice is to ensure diverse examples. However I don't think there is a need scour PR history to make a point. I can take the fault because I merged those PRs. I did that after I broke robustness tests on main, after I missed that tests were not run on PR. It's easy to miss that tests didn't run because they are not showing up until I think the proper solution is not a policy, but just taking back the repo write access from maintainers. It's just to easy to make a mistake, to tempting to try to make a dirty fix.
I agree with the second part, but I take different conclusion. People make mistakes, but we should trust them to not be malicious and take accountability. Sorry if I broken your trust, but the goal was not to abuse the system. |
👋 Hello team, sorry for any confusion my PR may have caused here. With respect to the failpoint test and cherry-picking to backport, I'm happy to help here, bearing in mind I'm not at all familiar with your processes so might need some pointers. For the failpoint for example - is this what you had in mind: #19997 ? It does help make the test fail deterministically locally without the patch after |
This is to clarify why I added the policy: "
Agreed. It's also exactly the reason why I raised the comment in the first place. kubernetes/org#5523 (comment)
I think we trust all members, at least trust all maintainers and reviewers.
thx
Yes, failpoint isn't enabled by default for unit test. You can add an e2e test instead. Please refer to the example #19608 |
Bug report criteria
What happened?
etcd_debugging_mvcc_watcher_total
may become negative value due to a race condition. The issue was fixed in #19600.Two problems/followups,
cc @kjgorman @ivanvc @jmhbnz @serathius
What did you expect to happen?
.
How can we reproduce it (as minimally and precisely as possible)?
.
Anything else we need to know?
No response
Etcd version (please run commands below)
Etcd configuration (command line flags or environment variables)
paste your configuration here
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
Relevant log output
The text was updated successfully, but these errors were encountered: