[ENH] Add null_flag function #501 #510

anzelpwj · 2019-07-29T03:11:37Z

PR Description

Adds a null flag function, which creates a new column in a DataFrame to mark which rows had null values. You can choose a subset of columns to consider as needed.
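As a rough illustration of the behaviour described above, here is a minimal pandas-only sketch. The helper name `flag_nulls_sketch` and its parameters are hypothetical and chosen for illustration; this is not the code added in this PR.

```python
import numpy as np
import pandas as pd


def flag_nulls_sketch(df, column_name="null_flag", columns=None):
    """Hypothetical helper: return a copy of df with a 0/1 null-flag column."""
    if isinstance(columns, str):
        columns = [columns]
    # Consider every column unless a subset is requested.
    subset = df if columns is None else df[list(columns)]
    out = df.copy()
    # 1 where any considered column is null in that row, else 0.
    # An explicit "int64" keeps the dtype identical on every platform.
    out[column_name] = subset.isnull().any(axis=1).astype("int64")
    return out


df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})
flagged = flag_nulls_sketch(df)                   # looks at all columns
flagged_a = flag_nulls_sketch(df, columns=["a"])  # looks at column "a" only
print(flagged["null_flag"].tolist())    # [0, 1, 1]
print(flagged_a["null_flag"].tolist())  # [0, 1, 0]
```

The original frame is left untouched because the helper works on a copy.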

This PR resolves #501.

PR Checklist

Please ensure that you have done the following:

- Submit the PR from a branch on your fork. Do not PR from <your_username>:master, but rather from <your_username>:<branch_name>.
- If you're not on the contributors list, add yourself to AUTHORS.rst.
- Add a line to CHANGELOG.rst under the latest version header (i.e. the one that is "on deck") describing the contribution.

Quick Check

To do a very quick check that everything is correct, follow these steps below:

Run the command make check from pyjanitor's top-level directory. This will automatically run:
- black formatting
- pycodestyle checking
- running the test suite
- docs build

Once done, please check off the check-box above.

If make check does not work for you, you can execute the commands listed in the Makefile individually.

Code Changes

If you are adding code changes, please ensure the following:

Ensure that you have added tests.
Run all tests ($ pytest .) locally on your machine.
- Check to ensure that test coverage covers the lines of code that you have added.
- Ensure that all tests pass.

Relevant Reviewers

Please tag maintainers to review.

codecov · 2019-07-29T03:22:17Z

Codecov Report

Merging #510 into dev will increase coverage by 0.03%.
The diff coverage is 95.45%.

@@            Coverage Diff             @@
##              dev     #510      +/-   ##
==========================================
+ Coverage   92.88%   92.92%   +0.03%     
==========================================
  Files          10       10              
  Lines         872      891      +19     
==========================================
+ Hits          810      828      +18     
- Misses         62       63       +1

anzelpwj · 2019-07-29T03:33:57Z

Just realized I probably need to add some files to the docs folder. Will do that tomorrow.

ericmjl

Thanks for the PR, @anzelpwj! I think this will be a great addition to the library. From this first-pass review, I have a few requested changes, mostly centering around docstrings + the tests, with the comments coming from a maintainability perspective. Hope you understand 😄.

janitor/functions.py

tests/functions/test_flag_nulls.py

janitor/functions.py

ericmjl · 2019-08-04T22:49:50Z

Everything looks great, thanks @anzelpwj!

hectormz · 2019-08-12T23:20:15Z

@ericmjl, @anzelpwj I can move this conversation to an issue if more appropriate.

I rebased recently and included this PR. I'm getting some failing tests that were added with this PR:

test_non_method_functional
test_functional_on_some_columns
test_rename_output_column
test_functional_on_all_columns

The issue is that flag_nulls() adds a column with dtype=int32, but the expected dataframe has that column as int64. Not sure why this happens only for me, while the tests passed for all of you and on Azure.

Adding a line like:

expected = expected.astype({"null_flag": "int32"})

fixes the issues. Any thoughts?

I'm on Windows in a pyjanitor venv w/ Python 3.7.4, Pandas 0.24.2
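The mismatch and the proposed workaround can be reproduced on any platform with the sketch below. The two frames are built with explicit dtypes to simulate the reported situation; they are not taken from the test suite. One possible explanation, not confirmed in this thread, is that on Windows with NumPy versions before 2.0 the default integer type is 32-bit, so a cast such as `.astype(int)` yields int32 there and int64 on Linux and macOS.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Simulated result: the flag column came out as int32 (as reported on Windows).
actual = pd.DataFrame(
    {"a": [1.0, None], "null_flag": pd.Series([0, 1], dtype="int32")}
)
# Simulated expectation: the test fixture holds the flag column as int64.
expected = pd.DataFrame(
    {"a": [1.0, None], "null_flag": pd.Series([0, 1], dtype="int64")}
)

# The strict comparison fails on the dtype alone; the values are identical.
try:
    assert_frame_equal(actual, expected)
    strict_passed = True
except AssertionError:
    strict_passed = False

# Workaround from this comment: cast the expected column to int32.
expected_fixed = expected.astype({"null_flag": "int32"})
assert_frame_equal(actual, expected_fixed)  # passes
```

Note that casting the expected frame to int32 would make the same test fail on a platform where the function returns int64, so the cast only helps where the int32 result is actually produced.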

anzelpwj · 2019-08-12T23:37:52Z

Hmmm, what if we explicitly set check_column_type='equiv' in the dataframes assert? Does that fix it?

hectormz · 2019-08-13T00:53:15Z

Hmm, I believe that is the default, and setting it explicitly didn't change anything for me. That makes it even stranger that this is happening.

anzelpwj · 2019-08-13T00:56:30Z

It is the default, but I thought we'd check anyway. I'd be fine with this fix, but TBH we should open a ticket on the Pandas project to report this as a bug in their assert method.

Ram-N · 2019-08-13T23:38:04Z

I am on Windows10 and getting the exact same 4 tests failing, reported by @hectormz above.

 assert_frame_equal(df, expected)
E       AssertionError: Attributes are different
E       
E       Attribute "dtype" are different
E       [left]:  int32
E       [right]: int64

The thing is, I am just getting the dev env set up. I haven't made a single change to the code or to the docs yet.

Is there any way to 'soften' these tests so that int32 and int64 are treated as equivalent dtypes?
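One way to relax the comparison, sketched here with made-up frames rather than the project's fixtures, is the `check_dtype=False` argument of `pandas.testing.assert_frame_equal`. The `check_column_type` argument tried earlier in the thread concerns the type of the column labels (the columns Index), not the dtype of the data in each column, which would explain why it made no difference.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

left = pd.DataFrame({"null_flag": pd.Series([0, 1, 1], dtype="int32")})
right = pd.DataFrame({"null_flag": pd.Series([0, 1, 1], dtype="int64")})

# check_column_type looks at the column labels, so the dtype mismatch remains.
try:
    assert_frame_equal(left, right, check_column_type="equiv")
    equiv_passed = True
except AssertionError:
    equiv_passed = False

# check_dtype=False compares values only, so int32 and int64 are accepted.
assert_frame_equal(left, right, check_dtype=False)

# Differences in the values themselves are still caught.
different = pd.DataFrame({"null_flag": pd.Series([0, 0, 1], dtype="int64")})
try:
    assert_frame_equal(left, different, check_dtype=False)
    values_mismatch_detected = False
except AssertionError:
    values_mismatch_detected = True
```

The trade-off is that `check_dtype=False` also hides unintended dtype changes, for example an integer column silently becoming float.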

hectormz · 2019-08-14T16:34:54Z

@anzelpwj it might be an issue with Pandas' assert method. @Ram-N, are you using vanilla Python as well? Not Anaconda?

Ram-N · 2019-08-14T19:59:04Z

@hectormz I am using Anaconda's Python.

Python 3.7.3 | packaged by conda-forge | (default, Jul  1 2019, 22:01:29) 
[MSC v.1900 64 bit (AMD64)] :: Anaconda, Inc. on win32

anzelpwj · 2019-08-14T20:02:56Z

Would y'all like me to make the extra PR for the expected = expected.astype({"null_flag": "int32"}) line?

Ram-N · 2019-08-14T20:46:13Z

@anzelpwj Yes, please. That will help me. I can rebase and continue.

Paul Anzel added 3 commits July 28, 2019 22:04

[ENH] Add null_flag function

3496054

Update changelog

3d429c6

Fix merge conflict

abebc89

ericmjl changed the title ~~[ENH] Add null_flag function (See #501)~~ [ENH] Add null_flag function #501 Jul 29, 2019

ericmjl requested changes Jul 29, 2019

janitor/functions.py (resolved)

tests/functions/test_flag_nulls.py (outdated, resolved)

janitor/functions.py (resolved)

Paul Anzel added 5 commits August 4, 2019 13:12

Update docstrings

6eb01d4

Add functional test

0c429e5

Update check_column to test for exclusion, and use it

c0d6dbd

Add conftests df for test

d40c9f9

Typedef update

2b2f368

anzelpwj mentioned this pull request Aug 4, 2019

[INF] Implement utils pytest mark #520

Closed

Paul Anzel and others added 2 commits August 4, 2019 15:43

Update documentation fof check_column

c054140

Merge branch 'dev' into flagnulls-#501

c90a69c

ericmjl approved these changes Aug 4, 2019

ericmjl merged commit 0c5219c into pyjanitor-devs:dev Aug 4, 2019
