Clean names remove outer underscores #13

JoshuaC3 · 2018-03-28T10:56:32Z

Adds a strip_underscores kwarg to the function clean_names to strip leading and trailing underscores.

I did change some of the syntax previously decided in the original issue. This was because I used Python's built-in strip, lstrip and rstrip string methods to remove the underscores. Therefore, it seemed more intuitive and "pythonic" to use 'left' and 'right' (and 'both').

I also added the shorthands 'l' and 'r', and True. This type of behaviour is commonplace in the Pandas API, e.g. here and here, so it should be familiar to users.

We can also add 'end' and 'start' easily if these are still desired. Or, if you feel the above is not correct, I can just add exclusively 'both', 'end' and 'start' for a more terse/concise API.

Cheers

Closes #12
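
For reference, the option handling described above can be sketched on a single column name. This is only an illustration under stated assumptions, not the code merged in this PR: the function name strip_underscores_from_name is hypothetical, raising ValueError on an unknown option is an assumption, and treating True as 'both' is inferred from the description rather than confirmed by it. The real clean_names operates on a DataFrame's columns.

```python
def strip_underscores_from_name(name, strip_underscores=None):
    """Strip outer underscores from one column name (hypothetical helper).

    'left' / 'l'   -> remove leading underscores (str.lstrip)
    'right' / 'r'  -> remove trailing underscores (str.rstrip)
    'both' / True  -> remove both (str.strip); True == 'both' is an assumption
    None / False   -> leave the name unchanged
    """
    if strip_underscores in ("left", "l"):
        return name.lstrip("_")
    if strip_underscores in ("right", "r"):
        return name.rstrip("_")
    if strip_underscores == "both" or strip_underscores is True:
        return name.strip("_")
    if strip_underscores is None or strip_underscores is False:
        return name
    # Assumption: unknown options are rejected rather than silently ignored.
    raise ValueError("strip_underscores must be 'left', 'right', 'both', "
                     "'l', 'r', True, False or None")


columns = ["_a_", "b_", "_c"]
stripped = [strip_underscores_from_name(c, "both") for c in columns]
print(stripped)  # ['a', 'b', 'c']
```

Only the outer underscores are removed; underscores inside a name are left alone.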


ericmjl

Overall looks good! I would also like to see a test and an example added before merging in. Could you get those up into the PR too?

ericmjl · 2018-03-28T12:14:35Z

janitor/functions.py

+        df = _strip_underscores(df, strip_underscores='left')
+
+    :param df: The pandas DataFrame object.
+    :param strip_underscores: A str of either 'left', 'right' or 'both'.


Please document the options in line 24, and what they're intended to do.

Added better documentation of strip_underscores.

ericmjl · 2018-03-28T12:16:52Z

janitor/functions.py

-    df[column] = (pd.TimedeltaIndex(df[column], unit='d')
-                  + dt.datetime(1899, 12, 30))
+    df[column] = (pd.TimedeltaIndex(df[column], unit='d') +
+                  dt.datetime(1899, 12, 30))


Logical and mathematical operators are put on a new line for clarity (particularly if there are multiple math operations going on). Please revert this.

Sure. Sorry, I only changed it because I was getting these two warnings from flake8:

janitor/functions.py:223:15: W503 line break before binary operator
janitor/functions.py:323:19: W503 line break before binary operator

Reverted to operator on newline.
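
To make the two layouts under discussion concrete, here is a small standard-library sketch. It is not the project's code: it mirrors the Excel serial date arithmetic from the diff with datetime.timedelta instead of pandas, and the function names are hypothetical. Both functions compute the same value; they differ only in where the line breaks relative to the + operator.

```python
import datetime as dt

# Same epoch as in the diff above.
EXCEL_EPOCH = dt.datetime(1899, 12, 30)


def serial_to_datetime_operator_first(serial_days):
    # Operator starts the continuation line: the layout the reviewer asks
    # for, and the one flake8 reported as W503 in this thread.
    return (dt.timedelta(days=serial_days)
            + EXCEL_EPOCH)


def serial_to_datetime_operator_last(serial_days):
    # Operator ends the first line: the layout the PR briefly switched to.
    return (dt.timedelta(days=serial_days) +
            EXCEL_EPOCH)


print(serial_to_datetime_operator_first(43187))
print(serial_to_datetime_operator_last(43187))
```

Since the choice is purely stylistic, the project convention (operator on the new line) decides it.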

ericmjl · 2018-03-28T12:16:55Z

janitor/functions.py

-        elif (isinstance(target_columns, list)
-                or isinstance(target_columns, tuple)):
+        elif (isinstance(target_columns, list) or
+              isinstance(target_columns, tuple)):


Logical and mathematical operators are put on a new line for clarity (particularly if there are multiple math operations going on). Please revert this.

ericmjl · 2018-03-28T12:17:48Z

janitor/functions.py

@@ -29,6 +57,7 @@ def clean_names(df):
        df = jn.DataFrame(df).clean_names()

    :param df: The pandas DataFrame object.
+    :param strip_underscores: A str of either 'left', 'right' or 'both'.


Likewise here, please document the options for clarity purposes, and what they're intended to do.

ericmjl · 2018-03-28T12:26:15Z

@JoshuaC3 thanks for the PR! This looks great. Just for my own recordkeeping, here are my thoughts:

I did change some of the syntax previously decided in the original issue. This was because I used Python's built-in strip, lstrip and rstrip string methods to remove the underscores. Therefore, it seemed more intuitive and "pythonic" to use 'left' and 'right' (and 'both').

Good reasoning. Let's follow the Pandas API then.

I also added the shorthands 'l' and 'r', and True. This type of behaviour is commonplace in the Pandas API, e.g. here and here, so it should be familiar to users.

As mentioned in the code review, I'd like to have those options clearly documented, otherwise it'll be a nightmare for my future self to decipher!

We can also add 'end' and 'start' easily if these are still desired. Or, if you feel the above is not correct, I can just add exclusively 'both', 'end' and 'start' for a more terse/concise API.

Not necessary, reasoning for shorthand and copying the "left/right" paradigm is good enough justification.


JoshuaC3 · 2018-03-28T16:24:47Z

I think that is all of the changes now.

Do I create another PR or does this one update now that I have pushed the changes to this branch?

ericmjl · 2018-03-28T17:26:41Z

Do I create another PR or does this one update now that I have pushed the changes to this branch?

@JoshuaC3: The latter is true! 😄

ericmjl · 2018-03-28T17:28:46Z

@JoshuaC3: one thing I couldn't easily decipher from the test - does the strip underscore test explicitly deal with the left and right underscores? Please let me know.

JoshuaC3 · 2018-03-28T17:55:13Z

It did not previously, but I am pushing the left and right tests now. Just as well, as I had made an easy typo! I have not done tests for 'l', 'r' or True but can do so if you think they are needed.

ericmjl · 2018-03-28T20:12:49Z

I have not done tests for 'l', 'r' or True but can do if you think it is needed.

@JoshuaC3 I think it's better to have more tests than fewer. Let's get those in as well while you have the momentum, otherwise my future self's laziness will get in the way of making it happen!

JoshuaC3 · 2018-03-28T21:18:02Z

All done :) learning lots from this too so thanks.

ericmjl

@JoshuaC3 my apologies, but I think I left this last point hanging when I went to grab a coffee! Let me know what you think; if things are good, we can merge!

ericmjl · 2018-03-28T20:13:57Z

tests/test_functions.py

+
+
+def test_clean_names_strip_underscores_right(multiindex_dataframe):
+    df = clean_names(multiindex_dataframe, strip_underscores='right')


I saw the following on line 192:

df = multiindex_dataframe.rename(columns=lambda x: '_' + x)

Do you think we need the same before line 179?

We only need that for 'l' and 'left'. The original multiindex_dataframe has a right/trailing underscore on r_i_p_rhino_. There were no leading/left underscores, so I added some with df = multiindex_dataframe.rename(columns=lambda x: '_' + x). Is this ok?

BTW, it was actually the trailing underscore in r_i_p_rhino_ that made me think that strip_underscores was needed!

I guess I should add this line to 'both' so that we test fully a mix of stripping lefts, rights and boths. Will add this now.

We only need that for 'l' and 'left'. The original multiindex_dataframe has a right/trailing underscore on r_i_p_rhino_. There were no leading/left underscores, so I added some with df = multiindex_dataframe.rename(columns=lambda x: '_' + x). Is this ok?

Ok got it! Thanks for pointing it out, I should have read the code a bit more closely.
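
The test setup discussed in this exchange can be sketched without pandas, using a plain list of column names. Everything here is a stand-in: clean is a hypothetical substitute for clean_names, and the column names other than r_i_p_rhino_ are made up for illustration. The point is that a fixture with only trailing underscores exercises 'right' as-is, while 'left' and 'both' need leading underscores added first.

```python
def clean(names, strip_underscores=None):
    """Hypothetical stand-in for clean_names, acting on a list of names."""
    strippers = {"left": str.lstrip, "right": str.rstrip, "both": str.strip}
    strip = strippers.get(strip_underscores)
    # No matching option: return the names unchanged.
    return [strip(name, "_") if strip else name for name in names]


# Only r_i_p_rhino_ comes from the thread; the other names are invented.
names = ["a", "bell_chart", "r_i_p_rhino_"]

# 'right' can be tested directly: the fixture already has a trailing underscore.
assert clean(names, "right") == ["a", "bell_chart", "r_i_p_rhino"]

# 'left' needs leading underscores, mirroring rename(columns=lambda x: '_' + x).
prefixed = ["_" + name for name in names]
assert clean(prefixed, "left") == names

# 'both' on the prefixed names covers leading and trailing underscores together.
assert clean(prefixed, "both") == ["a", "bell_chart", "r_i_p_rhino"]
```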

ericmjl · 2018-03-29T00:14:04Z

@JoshuaC3 everything looks great! Thanks for contributing 😄.

JoshuaC3 · 2018-03-29T10:36:12Z

Great 😄 I got there eventually. Thanks for your guidance.

ericmjl · 2018-03-29T11:41:09Z

It's my pleasure, @JoshuaC3!

Btw, I got my start contributing to open source software through the guidance of the matplotlib team, who guided me basically the same way. If and when you get a chance, pay it forward to someone else too! 😄

JoshuaC3 and others added 2 commits March 22, 2018 11:41

Merge pull request #1 from JoshuaC3/clean_names-multiindex

d292b05

Fix formatting errors.

Add strip_underscores

0a62978

ericmjl requested changes Mar 28, 2018

View reviewed changes

Joshua Dunn added 3 commits March 28, 2018 16:30

Add test for clean_names strip_underscores

e8e6e8a

Add detailed function descriptions for clean_names and _strip_undersc…

d4ec01d

…ores

Add detailed function descriptions formatted.

de340d2

Add left and right tests for clean_names strip_underscore.

ea74a28

Add tests for clean_names strip_underscores l, r and True.

c316427

ericmjl reviewed Mar 28, 2018

View reviewed changes

Improve 'both' and True tests.

be35c09

ericmjl approved these changes Mar 29, 2018

View reviewed changes

ericmjl merged commit e5c9c76 into pyjanitor-devs:master Mar 29, 2018


