-
Notifications
You must be signed in to change notification settings - Fork 173
Clean names remove outer underscores #13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean names remove outer underscores #13
Conversation
Fix formatting errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good! I would also like to see a test and an example added before merging in. Could you get those up into the PR too?
janitor/functions.py
Outdated
df = _strip_underscores(df, strip_underscores='left') | ||
|
||
:param df: The pandas DataFrame object. | ||
:param strip_underscores: A str of either 'left', 'right' or 'both'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please document the options in line 24, and what they're intended to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added better documentation of strip_underscores
.
janitor/functions.py
Outdated
df[column] = (pd.TimedeltaIndex(df[column], unit='d') | ||
+ dt.datetime(1899, 12, 30)) | ||
df[column] = (pd.TimedeltaIndex(df[column], unit='d') + | ||
dt.datetime(1899, 12, 30)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logical and mathematical operators are put on a new line for clarity (particularly if there are multiple math operations going on). Please revert this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Sorry, I only changed it as I was getting these two cautions from flake8.
janitor/functions.py:223:15: W503 line break before binary operator
janitor/functions.py:323:19: W503 line break before binary operator
Reverted to operator on newline.
janitor/functions.py
Outdated
elif (isinstance(target_columns, list) | ||
or isinstance(target_columns, tuple)): | ||
elif (isinstance(target_columns, list) or | ||
isinstance(target_columns, tuple)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logical and mathematical operators are put on a new line for clarity (particularly if there are multiple math operations going on). Please revert this.
janitor/functions.py
Outdated
@@ -29,6 +57,7 @@ def clean_names(df): | |||
df = jn.DataFrame(df).clean_names() | |||
|
|||
:param df: The pandas DataFrame object. | |||
:param strip_underscores: A str of either 'left', 'right' or 'both'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise here, please document the options for clarity purposes, and what they're intended to do.
@JoshuaC3 thanks for the PR! This looks great. Just for my own memory recordkeeping, here's my thoughts below:
Good reasoning. Let's follow the Pandas API then.
As mentioned in the code review, I'd like to have those options clearly documented, otherwise it'll be a nightmare for my future self to decipher!
Not necessary, reasoning for shorthand and copying the "left/right" paradigm is good enough justification. |
I think that is all of the changes now. Do I create another PR or does this one update now that I have pushed the changes to this branch? |
@JoshuaC3: The latter is true! 😄 |
@JoshuaC3: one thing I couldn't easily decipher from the test - does the strip underscore test explicitly deal with the left and right underscores? Please let me know. |
It did not previously but I am pushing the left and right tests now. Just as well as I made an easy typo! I have not done tests for |
@JoshuaC3 I think it's better to have more tests than fewer. Let's get those in as well while you have the momentum, otherwise my future self laziness will get in the way of making it happen! |
All done :) learning lots from this too so thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JoshuaC3 my apologies, but I think I left this last point hanging when I went to grab a coffee! Let me know what you think; if things are good, we can merge!
|
||
|
||
def test_clean_names_strip_underscores_right(multiindex_dataframe): | ||
df = clean_names(multiindex_dataframe, strip_underscores='right') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw the following on line 192:
df = multiindex_dataframe.rename(columns=lambda x: '_' + x)
Do you think we need the same before line 179?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only need that for 'l' and 'left'. The original multiindex_dataframe has a right/trailing underscore on r_i_p_rhino_
. There were no leading/left underscores so I added some with df = multiindex_dataframe.rename(columns=lambda x: '_' + x)
. Is this ok?
BTW, It was actually the trailing underscore in r_i_p_rhino_
that made me think that this strip_underscores was needed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I should add this line to 'both
' so that we test fully a mix of stripping lefts, rights and boths. Will add this now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only need that for 'l' and 'left'. The original multiindex_dataframe has a right/trailing underscore on r_i_p_rhino_. There were no leading/left underscores so I added some with df = multiindex_dataframe.rename(columns=lambda x: '_' + x) . Is this ok?
Ok got it! Thanks for pointing it out, I should have read the code a bit more closely.
@JoshuaC3 everything looks great! Thanks for contributing 😄. |
Great 😄 I got there eventually. Thanks for your guidance. |
It's my pleasure, @JoshuaC3! Btw, I got my start contributing to open source software through the guidance of the matplotlib team, who guided me basically the same way. If and when you get a chance, pay it forward to someone else too! 😄 |
Adds kwarg to the function
clean_names
to strip leading and trailing underscores.I did change some the syntax previously decided in the original issue. This was because I used pythons built-in
strip
,lstrip
andrstrip
string functions to remove the underscores. Therefore, it seemed more intuitive and "pythonic" to use'left'
and'right'
(and'both'
).I also added a shorthand
'l'
and'r'
andTrue
. This type of behaviour is common place in the Pandas API e.g. here and here, so this should be familiar to users.We can also add
'end'
and'start'
easily if these are still desired. Or, if you feel the above is not correct, I can just add exclusively'both'
,'end'
and'start'
for a more terse/concise API.Cheers
Closes #12