-
Notifications
You must be signed in to change notification settings - Fork 173
[ENH] Add reset index flag to row_to_name function #849
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fireddd thanks for your work!
A few things I hope you could fix, please.
The first one is a bit nitpicky, but it's in the interest of keeping the codebase uniform - could you replace all tabs with spaces? I noticed your PR has some tabs and some spaces mixed in, and in the interest of keeping the codebase uniform, I think it might be better to use all spaces.
Secondly, could you add an associated test for the code? This simply guarantees that if the pandas
' implementation changes, we get an early warning system in place.
Thanks for handling this PR, much appreciated!
1-> Sure, I will fix the spaces and tabs mixup. 2-> Sorry I am not getting your point. What exactly is needed in the test? The default behavoiur of the tests will change so I missed that in my PR. Thanks for pointing out |
@ericmjl I have tried to address the indentation issue. Can you let me know if now it's alright? |
Thanks @fireddd! Yes, spaces/indentation looks correct to me. For tests, we usually request that any code changes be accompanied by a test function that goes into the test suite. The pattern usually should be not too difficult to follow, but if you're having issues let us know. Regarding your question, we use I'm in a bit of a rush to type my reply, so if there's something unclear still, firstly, pardon my brevity, and secondly, be sure to ask here! |
Btw, looks like code style checks are failing; could you run |
How can I run these tests locally? |
The dev docs should be able to give you a starting point! (We should have them in our official documentation site.) |
I was following this doc-> https://pyjanitor.readthedocs.io/CONTRIBUTION_TYPES.html#code-pattern-guidelines Makefile:22: recipe for target 'test' failed |
@fireddd please hang tight! There's another PR open (#851) that is infrastructural in nature that reinstates code coverage. It will give you a way of running tests and get feedback on which lines of code the tests are covering, without needing to run them locally. In the meantime, you might want to check your |
tried fixing the terminal in my jupyter notebook, was able to do it. However still facing some issues: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I think this exercise is complete
Codecov Report
@@ Coverage Diff @@
## dev #849 +/- ##
==========================================
+ Coverage 94.86% 94.89% +0.03%
==========================================
Files 19 19
Lines 2065 2078 +13
==========================================
+ Hits 1959 1972 +13
Misses 106 106 |
this is kindoff annoying now :( |
@fireddd thanks for all of your hard work here! Just a few more things and I think we might be able to pull it over the finish line. Firstly, can you merge in the latest dev branch? We just merged in a PR that I did that helps with some infrastructural matters. That may bump up the code coverage on your PR branch up to the newly set baseline from that PR. Secondly, could you check that the lines of code that you added are covered by the test suite? Code test coverage is something we don't compromise on because it is one (even if imperfect) measure of code quality. Ensuring that all newly contributed lines of code are covered by a test makes maintenance much easier. I might write docs about this at some point, but on "how to write the test", they should be small and easy to understand while accurately replicating real-world usage of the function being tested. |
I have added the tests regarding the functionality that i added. I think that should cover the code that i wrote regarding resetting the index. Let's bring this home :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very good work, @fireddd! We have one final thing pertaining to the tests that you've provided. They're in the same mould as the other tests in the same file, but I think we have an opportunity here to radically improve the quality of the test suite, starting with your contribution 😄. Please see the review pointer I made.
Other than that, I have to say that I enjoyed reviewing the PR. It was small in scope, thus making it easy to follow. Though there may have been frustrations with the CI system, we have them there to ensure code quality and maintainability in the long run. Thanks for being patient with it, I know it can be intimidating for a newcomer to the library, but it's also a great learning opportunity on how to use automation to maintain standards. Hope you also learned something along the way!
Final thing, I am going to ask the dev team for one more pair of eyes on the PR to ensure that it's not just me who knows what's going on here. This too is important for maintenance. Hang tight as there may be one or two more review pointers coming your way.
Co-authored-by: Eric Ma <[email protected]>
… into 285_row_to_names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonderful work, @fireddd - approving! Let's get one more pair of eyes on the PR from the team before we merge.
Before this change, the default is not reset index = True. Wondering why we are making this default to True? This would be a breaking change for those who rely on the index to join to other dataframes. |
In the original issue mentioned that having resetting the index by default is a good idea( personally agreeing with it) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
janitor/functions.py
Outdated
@@ -2289,6 +2293,7 @@ def row_to_names( | |||
row_number: int = None, | |||
remove_row: bool = False, | |||
remove_rows_above: bool = False, | |||
reset_index: bool = True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fireddd I think @jk3587 brings up a good point and that we should set the default value of reset_index
to False
.
This isn't because I disagree with the reasons that you stated. It's just that I think we should avoid making breaking changes without warning users.
The best course of action here, then, is to set the value to False
and raise a warning for users that says something like:
warnings.warn(
"The function row_to_names will, in the official 1.0 release, "
"change its behaviour to reset the dataframe's index by default. "
"You can prepare for this change right now by explicitly setting "
"`reset_index=True` when calling on `row_to_names`."
)
(If you copy/paste the message verbatim, please preserve the spacing just before the quotes!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Description
Please describe the changes proposed in the pull request:
This PR resolves #285.
PR Checklist
Please ensure that you have done the following:
<your_username>
:dev
, but rather from<your_username>
:<feature-branch_name>
.AUTHORS.rst
.CHANGELOG.md
under the latest version header (i.e. the one that is "on deck") describing the contribution.Automatic checks
There will be automatic checks run on the PR. These include:
Relevant Reviewers
Please tag maintainers to review.