Skip to content

[ENH] Add reset index flag to row_to_name function #849

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
Jul 31, 2021

Conversation

fireddd
Copy link
Contributor

@fireddd fireddd commented Jul 20, 2021

PR Description

Please describe the changes proposed in the pull request:

  • Add reset index flag to row_to_name function

This PR resolves #285.

PR Checklist

Please ensure that you have done the following:

  1. [x ] PR in from a fork off your branch. Do not PR from <your_username>:dev, but rather from <your_username>:<feature-branch_name>.
  1. [x ] If you're not on the contributors list, add yourself to AUTHORS.rst.
  1. [x ] Add a line to CHANGELOG.md under the latest version header (i.e. the one that is "on deck") describing the contribution.
    • Do use some discretion here; if there are multiple PRs that are related, keep them in a single line.

Automatic checks

There will be automatic checks run on the PR. These include:

  • Building a preview of the docs on Netlify
  • Automatically linting the code
  • Making sure the code is documented
  • Making sure that all tests are passed
  • Making sure that code coverage doesn't go down.

Relevant Reviewers

Please tag maintainers to review.

Copy link
Member

@ericmjl ericmjl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fireddd thanks for your work!

A few things I hope you could fix, please.

The first one is a bit nitpicky, but it's in the interest of keeping the codebase uniform - could you replace all tabs with spaces? I noticed your PR has some tabs and some spaces mixed in, and in the interest of keeping the codebase uniform, I think it might be better to use all spaces.

Secondly, could you add an associated test for the code? This simply guarantees that if the pandas' implementation changes, we get an early warning system in place.

Thanks for handling this PR, much appreciated!

@fireddd
Copy link
Contributor Author

fireddd commented Jul 20, 2021

1-> Sure, I will fix the spaces and tabs mixup.

2-> Sorry I am not getting your point. What exactly is needed in the test? The default behavoiur of the tests will change so I missed that in my PR. Thanks for pointing out

@fireddd
Copy link
Contributor Author

fireddd commented Jul 21, 2021

@ericmjl I have tried to address the indentation issue. Can you let me know if now it's alright?

@ericmjl
Copy link
Member

ericmjl commented Jul 21, 2021

Thanks @fireddd! Yes, spaces/indentation looks correct to me.

For tests, we usually request that any code changes be accompanied by a test function that goes into the test suite.

The pattern usually should be not too difficult to follow, but if you're having issues let us know.

Regarding your question, we use pytest fixtures to obtain the data frames necessary. The reference docs are here: https://docs.pytest.org/en/6.2.x/fixture.html.

I'm in a bit of a rush to type my reply, so if there's something unclear still, firstly, pardon my brevity, and secondly, be sure to ask here!

@ericmjl
Copy link
Member

ericmjl commented Jul 21, 2021

Btw, looks like code style checks are failing; could you run black locally and push up?

@fireddd
Copy link
Contributor Author

fireddd commented Jul 21, 2021

How can I run these tests locally?

@ericmjl
Copy link
Member

ericmjl commented Jul 21, 2021

The dev docs should be able to give you a starting point! (We should have them in our official documentation site.)

@fireddd
Copy link
Contributor Author

fireddd commented Jul 21, 2021

I was following this doc-> https://pyjanitor.readthedocs.io/CONTRIBUTION_TYPES.html#code-pattern-guidelines
Tried to run all the tests.
fireddd@LAPTOP-AFM6T731:/mnt/c/Users/shubh/Desktop/Learn Git/pyjanitor$ make test
Running test suite...
pytest --cov-report html
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/_pytest/config.py", line 365, in _importconftest
mod = conftestpath.pyimport()
File "/usr/lib/python2.7/dist-packages/py/_path/local.py", line 668, in pyimport
import(modname)
File "/mnt/c/Users/shubh/Desktop/Learn Git/pyjanitor/tests/conftest.py", line 17
"Bell__Chart": [1.234_523_45, 2.456_234, 3.234_612_5] * 3,
^
SyntaxError: invalid syntax
ERROR: could not load /mnt/c/Users/shubh/Desktop/Learn Git/pyjanitor/tests/conftest.py

Makefile:22: recipe for target 'test' failed
make: *** [test] Error 4
Could this be because i have pyt
fireddd@LAPTOP-AFM6T731:/mnt/c/Users/shubh/Desktop/Learn Git/pyjanitor$ python --version
Python 2.7.15+
fireddd@LAPTOP-AFM6T731:/mnt/c/Users/shubh/Desktop/Learn Git/pyjanitor$ python3 --version
Python 3.6.7
I am using a jupyter notebook that sets by defaults to python3 for dev but for some reason i can't use terminal there so i was trying to use windows bash for testing this

@ericmjl
Copy link
Member

ericmjl commented Jul 22, 2021

@fireddd please hang tight! There's another PR open (#851) that is infrastructural in nature that reinstates code coverage. It will give you a way of running tests and get feedback on which lines of code the tests are covering, without needing to run them locally.

In the meantime, you might want to check your PATH environment variable. (If you're unsure about what that is, I wrote a blog post to consolidate what I've learned.)

@fireddd
Copy link
Contributor Author

fireddd commented Jul 22, 2021

tried fixing the terminal in my jupyter notebook, was able to do it.
PS C:\Users\shubh\Desktop\Learn Git\pyjanitor> python --version
Python 3.7.1
Installed make as i am on windows.
Following this https://github.com/pyjanitor-devs/pyjanitor/blob/89e3bd4ffe735f16f657ec73da0627e948c8b158/CONTRIBUTING.rst

However still facing some issues:
I am trying to see if i can understand the errors and am trying to fix them.
Can you test now?

Copy link
Contributor Author

@fireddd fireddd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I think this exercise is complete

@codecov
Copy link

codecov bot commented Jul 23, 2021

Codecov Report

Merging #849 (3e99789) into dev (e76db2c) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##              dev     #849      +/-   ##
==========================================
+ Coverage   94.86%   94.89%   +0.03%     
==========================================
  Files          19       19              
  Lines        2065     2078      +13     
==========================================
+ Hits         1959     1972      +13     
  Misses        106      106              

@fireddd
Copy link
Contributor Author

fireddd commented Jul 23, 2021

this is kindoff annoying now :(

@ericmjl
Copy link
Member

ericmjl commented Jul 23, 2021

@fireddd thanks for all of your hard work here! Just a few more things and I think we might be able to pull it over the finish line.

Firstly, can you merge in the latest dev branch? We just merged in a PR that I did that helps with some infrastructural matters. That may bump up the code coverage on your PR branch up to the newly set baseline from that PR.

Secondly, could you check that the lines of code that you added are covered by the test suite?

Code test coverage is something we don't compromise on because it is one (even if imperfect) measure of code quality. Ensuring that all newly contributed lines of code are covered by a test makes maintenance much easier.

I might write docs about this at some point, but on "how to write the test", they should be small and easy to understand while accurately replicating real-world usage of the function being tested.

@fireddd
Copy link
Contributor Author

fireddd commented Jul 23, 2021

I have added the tests regarding the functionality that i added. I think that should cover the code that i wrote regarding resetting the index. Let's bring this home :)

@fireddd fireddd requested a review from ericmjl July 23, 2021 21:08
@fireddd fireddd changed the title Add reset index flag to row_to_name function [ENH] Add reset index flag to row_to_name function Jul 23, 2021
Copy link
Member

@ericmjl ericmjl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very good work, @fireddd! We have one final thing pertaining to the tests that you've provided. They're in the same mould as the other tests in the same file, but I think we have an opportunity here to radically improve the quality of the test suite, starting with your contribution 😄. Please see the review pointer I made.

Other than that, I have to say that I enjoyed reviewing the PR. It was small in scope, thus making it easy to follow. Though there may have been frustrations with the CI system, we have them there to ensure code quality and maintainability in the long run. Thanks for being patient with it, I know it can be intimidating for a newcomer to the library, but it's also a great learning opportunity on how to use automation to maintain standards. Hope you also learned something along the way!

Final thing, I am going to ask the dev team for one more pair of eyes on the PR to ensure that it's not just me who knows what's going on here. This too is important for maintenance. Hang tight as there may be one or two more review pointers coming your way.

@fireddd fireddd requested a review from ericmjl July 24, 2021 21:03
Copy link
Member

@ericmjl ericmjl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonderful work, @fireddd - approving! Let's get one more pair of eyes on the PR from the team before we merge.

@jk3587
Copy link
Collaborator

jk3587 commented Jul 25, 2021

Before this change, the default is not reset index = True. Wondering why we are making this default to True? This would be a breaking change for those who rely on the index to join to other dataframes.

@fireddd
Copy link
Contributor Author

fireddd commented Jul 25, 2021

Before this change, the default is not reset index = True. Wondering why we are making this default to True? This would be a breaking change for those who rely on the index to join to other dataframes.

In the original issue mentioned that having resetting the index by default is a good idea( personally agreeing with it)
#285 ->
In both these cases, when rows are deleted post a column name change operation, there is a good chance of having a bad index.
Sometimes it's exactly what you need, but most of the time it's probably bad news. 🐈
However you have also raised a valid point. @ericmjl

Copy link
Member

@ericmjl ericmjl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fireddd one final change requested based on @jk3587's comment about default behaviours. I think the warning message suggested in my review should help disambiguate what a user needs to do to prepare for a breaking change.

@@ -2289,6 +2293,7 @@ def row_to_names(
row_number: int = None,
remove_row: bool = False,
remove_rows_above: bool = False,
reset_index: bool = True,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fireddd I think @jk3587 brings up a good point and that we should set the default value of reset_index to False.

This isn't because I disagree with the reasons that you stated. It's just that I think we should avoid making breaking changes without warning users.

The best course of action here, then, is to set the value to False and raise a warning for users that says something like:

warnings.warn(
    "The function row_to_names will, in the official 1.0 release, "
    "change its behaviour to reset the dataframe's index by default. "
    "You can prepare for this change right now by explicitly setting "
    "`reset_index=True` when calling on `row_to_names`."
)

(If you copy/paste the message verbatim, please preserve the spacing just before the quotes!)

@ericmjl
Copy link
Member

ericmjl commented Jul 25, 2021

@jk3587 could you also mark your review as "approved"? Once I see the final changes from @fireddd, I'd like to invite you to do the honors of hitting the big green button 😄.

@fireddd fireddd requested a review from ericmjl July 31, 2021 08:09
Copy link
Member

@ericmjl ericmjl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fireddd we're good to go here.

@jk3587 please do the honors of hitting the big green button after you've reviewed!

@jk3587 jk3587 merged commit 6040418 into pyjanitor-devs:dev Jul 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ENH] Reset index after row/s delete in row_to_names() ?
3 participants