Skip to content

Clean names remove outer underscores #13

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

JoshuaC3
Copy link
Contributor

Adds kwarg to the function clean_names to strip leading and trailing underscores.

I did change some the syntax previously decided in the original issue. This was because I used pythons built-in strip, lstrip and rstrip string functions to remove the underscores. Therefore, it seemed more intuitive and "pythonic" to use 'left' and 'right' (and 'both').

I also added a shorthand 'l' and 'r' and True. This type of behaviour is common place in the Pandas API e.g. here and here, so this should be familiar to users.

We can also add 'end' and 'start' easily if these are still desired. Or, if you feel the above is not correct, I can just add exclusively 'both', 'end' and 'start' for a more terse/concise API.

Cheers

Closes #12

Copy link
Member

@ericmjl ericmjl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good! I would also like to see a test and an example added before merging in. Could you get those up into the PR too?

df = _strip_underscores(df, strip_underscores='left')

:param df: The pandas DataFrame object.
:param strip_underscores: A str of either 'left', 'right' or 'both'.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document the options in line 24, and what they're intended to do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added better documentation of strip_underscores.

df[column] = (pd.TimedeltaIndex(df[column], unit='d')
+ dt.datetime(1899, 12, 30))
df[column] = (pd.TimedeltaIndex(df[column], unit='d') +
dt.datetime(1899, 12, 30))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logical and mathematical operators are put on a new line for clarity (particularly if there are multiple math operations going on). Please revert this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Sorry, I only changed it as I was getting these two cautions from flake8.

janitor/functions.py:223:15: W503 line break before binary operator
janitor/functions.py:323:19: W503 line break before binary operator

Reverted to operator on newline.

elif (isinstance(target_columns, list)
or isinstance(target_columns, tuple)):
elif (isinstance(target_columns, list) or
isinstance(target_columns, tuple)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logical and mathematical operators are put on a new line for clarity (particularly if there are multiple math operations going on). Please revert this.

@@ -29,6 +57,7 @@ def clean_names(df):
df = jn.DataFrame(df).clean_names()

:param df: The pandas DataFrame object.
:param strip_underscores: A str of either 'left', 'right' or 'both'.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise here, please document the options for clarity purposes, and what they're intended to do.

@ericmjl
Copy link
Member

ericmjl commented Mar 28, 2018

@JoshuaC3 thanks for the PR! This looks great. Just for my own memory recordkeeping, here's my thoughts below:

I did change some the syntax previously decided in the original issue. This was because I used pythons built-in strip, lstrip and rstrip string functions to remove the underscores. Therefore, it seemed more intuitive and "pythonic" to use 'left' and 'right' (and 'both').

Good reasoning. Let's follow the Pandas API then.

I also added a shorthand 'l' and 'r' and True. This type of behaviour is common place in the Pandas API e.g. here and here, so this should be familiar to users.

As mentioned in the code review, I'd like to have those options clearly documented, otherwise it'll be a nightmare for my future self to decipher!

We can also add 'end' and 'start' easily if these are still desired. Or, if you feel the above is not correct, I can just add exclusively 'both', 'end' and 'start' for a more terse/concise API.

Not necessary, reasoning for shorthand and copying the "left/right" paradigm is good enough justification.

@JoshuaC3
Copy link
Contributor Author

I think that is all of the changes now.

Do I create another PR or does this one update now that I have pushed the changes to this branch?

@ericmjl
Copy link
Member

ericmjl commented Mar 28, 2018

Do I create another PR or does this one update now that I have pushed the changes to this branch?

@JoshuaC3: The latter is true! 😄

@ericmjl
Copy link
Member

ericmjl commented Mar 28, 2018

@JoshuaC3: one thing I couldn't easily decipher from the test - does the strip underscore test explicitly deal with the left and right underscores? Please let me know.

@JoshuaC3
Copy link
Contributor Author

JoshuaC3 commented Mar 28, 2018

It did not previously but I am pushing the left and right tests now. Just as well as I made an easy typo! I have not done tests for 'l', 'r' or True but can do if you think it is needed.

@ericmjl
Copy link
Member

ericmjl commented Mar 28, 2018

I have not done tests for 'l', 'r' or True but can do if you think it is needed.

@JoshuaC3 I think it's better to have more tests than fewer. Let's get those in as well while you have the momentum, otherwise my future self laziness will get in the way of making it happen!

@JoshuaC3
Copy link
Contributor Author

All done :) learning lots from this too so thanks.

Copy link
Member

@ericmjl ericmjl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoshuaC3 my apologies, but I think I left this last point hanging when I went to grab a coffee! Let me know what you think; if things are good, we can merge!



def test_clean_names_strip_underscores_right(multiindex_dataframe):
df = clean_names(multiindex_dataframe, strip_underscores='right')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw the following on line 192:

df = multiindex_dataframe.rename(columns=lambda x: '_' + x)

Do you think we need the same before line 179?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only need that for 'l' and 'left'. The original multiindex_dataframe has a right/trailing underscore on r_i_p_rhino_. There were no leading/left underscores so I added some with df = multiindex_dataframe.rename(columns=lambda x: '_' + x) . Is this ok?

BTW, It was actually the trailing underscore in r_i_p_rhino_ that made me think that this strip_underscores was needed!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I should add this line to 'both' so that we test fully a mix of stripping lefts, rights and boths. Will add this now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only need that for 'l' and 'left'. The original multiindex_dataframe has a right/trailing underscore on r_i_p_rhino_. There were no leading/left underscores so I added some with df = multiindex_dataframe.rename(columns=lambda x: '_' + x) . Is this ok?

Ok got it! Thanks for pointing it out, I should have read the code a bit more closely.

@ericmjl
Copy link
Member

ericmjl commented Mar 29, 2018

@JoshuaC3 everything looks great! Thanks for contributing 😄.

@ericmjl ericmjl merged commit e5c9c76 into pyjanitor-devs:master Mar 29, 2018
@JoshuaC3
Copy link
Contributor Author

Great 😄 I got there eventually. Thanks for your guidance.

@ericmjl
Copy link
Member

ericmjl commented Mar 29, 2018

It's my pleasure, @JoshuaC3!

Btw, I got my start contributing to open source software through the guidance of the matplotlib team, who guided me basically the same way. If and when you get a chance, pay it forward to someone else too! 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants