[ENH] Updated label_encode to use pandas factorize #847

Merged 6 commits on Jul 21, 2021

nvamsikrishna05
Collaborator

@nvamsikrishna05 nvamsikrishna05 commented Jul 18, 2021

PR Description

Please describe the changes proposed in the pull request:

  • Updated label_encode to use pandas factorize internally, replacing scikit-learn's LabelEncoder.
  • Added a deprecation warning to label_encode.
  • Removed scikit-learn as a dependency from the environment-dev.yml and base.in files.
  • Regenerated base.txt from base.in with pip-compile after removing scikit-learn.
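
For readers skimming the change, here is a minimal sketch of the idea behind the first two bullets. It is not the merged code: the helper name `label_encode_sketch`, the warning text, and the `_enc` column suffix are assumptions made for illustration. Only `pandas.factorize` and the standard `warnings` module are relied on.

```python
import warnings

import pandas as pd


def label_encode_sketch(df, column_names):
    """Hypothetical stand-in for label_encode; not the code merged in this PR."""
    # Assumed wording; the real deprecation message may differ.
    warnings.warn(
        "label_encode is deprecated.",
        DeprecationWarning,
        stacklevel=2,
    )
    if isinstance(column_names, str):
        column_names = [column_names]
    df = df.copy()  # leave the caller's DataFrame untouched
    for col in column_names:
        # sort=True assigns codes in sorted order of the unique values,
        # which is also how LabelEncoder orders its classes.
        codes, _uniques = pd.factorize(df[col], sort=True)
        df[f"{col}_enc"] = codes  # "_enc" suffix is an assumption here
    return df


df = pd.DataFrame({"fruit": ["pear", "apple", "pear", "kiwi"]})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    encoded = label_encode_sketch(df, "fruit")

print(encoded["fruit_enc"].tolist())  # [2, 0, 2, 1]
print(caught[0].category.__name__)    # DeprecationWarning
```

One behaviour worth keeping in mind: by default `pandas.factorize` encodes missing values as -1 rather than giving them their own label.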

This PR resolves #834

PR Checklist

Please ensure that you have done the following:

  1. Open the PR from a feature branch on your fork. Do not PR from <your_username>:dev, but rather from <your_username>:<feature-branch_name>.
  2. If you're not on the contributors list, add yourself to AUTHORS.rst.
  3. Add a line to CHANGELOG.md under the latest version header (i.e. the one that is "on deck") describing the contribution.
    • Do use some discretion here; if there are multiple related PRs, keep them in a single line.

Automatic checks

There will be automatic checks run on the PR. These include:

  • Building a preview of the docs on Netlify
  • Automatically linting the code
  • Making sure the code is documented
  • Making sure that all tests pass
  • Making sure that code coverage doesn't go down

Relevant Reviewers

Please tag maintainers to review.

@nvamsikrishna05 nvamsikrishna05 changed the title Updated label_encode to use pandas factorize [ENH] Updated label_encode to use pandas factorize Jul 20, 2021
@nvamsikrishna05 nvamsikrishna05 marked this pull request as ready for review July 20, 2021 10:47
@nvamsikrishna05
Collaborator Author

@ericmjl
I will raise another PR for the following:

  • Add a new factorize_columns function and its associated tests.
  • Update the warning message for label_encode to guide users to factorize_columns.

@ericmjl
Member

ericmjl commented Jul 20, 2021

Looking at the code changes, I have no comments to make. @nvamsikrishna05 please do the honours of merging!

@nvamsikrishna05 nvamsikrishna05 merged commit 89e3bd4 into pyjanitor-devs:dev Jul 21, 2021
Development

Successfully merging this pull request may close these issues.

Deprecate label_encode?