Commit c06d829

Merge pull request #450 from PaulWestenthanner/improve_type_hints
Add poetry and linting
2 parents 75e8f5a + e4980b5 commit c06d829

80 files changed: +6131 -3276 lines changed

.github/workflows/docs.yml

Lines changed: 6 additions & 6 deletions
@@ -12,12 +12,12 @@ jobs:
  - name: Dependencies
  run: |
  python -m pip install --upgrade pip wheel
- pip install -r requirements.txt
- pip install -r requirements-dev.txt
- - name: Build Docs
- uses: ammaraskar/sphinx-action@master
- with:
- docs-folder: "docs/"
+ python -m pip install poetry
+ poetry install
+ - name: Directly build docs
+ run: |
+ pip install -r docs/requirements.txt
+ sphinx-build -D docs/source ./docs/build/html/
  - name: Deploy Docs
  uses: peaceiris/actions-gh-pages@v3
  with:

.github/workflows/pypi-publish.yml

Lines changed: 6 additions & 7 deletions
@@ -11,16 +11,15 @@ jobs:
  steps:
  - name: Clone
  uses: actions/checkout@v2
- - name: Set up Python 3.7
+ - name: Set up Python 3.12
  uses: actions/setup-python@v2
  with:
- python-version: 3.7
+ python-version: 3.12
  - name: Build package
  run: |
- python -m pip install --upgrade pip
- pip install -r requirements.txt
- pip install -r requirements-dev.txt
- pip install wheel
- python setup.py bdist_wheel sdist
+ python -m pip install --upgrade pip wheel
+ python -m pip install poetry
+ poetry install
+ poetry build
  - name: Publish package
  uses: pypa/gh-action-pypi-publish@release/v1

.github/workflows/test-docs-build.yml

Lines changed: 5 additions & 4 deletions
@@ -7,12 +7,13 @@ jobs:
  runs-on: ubuntu-latest
  strategy:
  matrix:
- python-version: ['3.10']
+ python-version: ['3.12']
  steps:
  - uses: actions/checkout@v2
  - uses: actions/setup-python@v2
  with:
  python-version: ${{ matrix.python-version }}
- - uses: ammaraskar/sphinx-action@master
- with:
- docs-folder: "docs/"
+ - name: directly build sphinx (plugin only supports python 3.8)
+ run: |
+ pip install -r docs/requirements.txt
+ sphinx-build docs/source ./docs/build/html/

.github/workflows/test-suite.yml

Lines changed: 4 additions & 4 deletions
@@ -15,7 +15,7 @@ jobs:
  runs-on: ubuntu-latest
  strategy:
  matrix:
- python-version: ['3.7', '3.8', '3.9', '3.10']
+ python-version: ['3.10', '3.11', '3.12', '3.13']

  steps:
  - uses: actions/checkout@v2
@@ -26,8 +26,8 @@ jobs:
  - name: Install dependencies
  run: |
  python -m pip install --upgrade pip wheel
- python -m pip install -r requirements.txt
- python -m pip install -r requirements-dev.txt
+ python -m pip install poetry
+ poetry install
  - name: Test with pytest
  run: |
- pytest
+ poetry run pytest tests

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -1,6 +1,12 @@
  unreleased
  ==========

+ * Refactor: Use poetry as packaging tool
+ * Refactor: Add more typing
+ * Change `feature_names_in_` and `feature_names_out_` to `np.ndarray` instead of lists.
+ * Breaking: Do not allow scalar values as target variable (of length 1) anymore
+ * Breaking: Force dataframe column names to be strings.
+
  v2.6.4
  ======
  * fixed: Future Warning in Pandas
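
One of the breaking entries above says dataframe column names are forced to be strings. The changelog does not spell out the mechanism, so the following is only a minimal sketch of what that could mean for callers, assuming labels are converted with `str()`. `stringify_columns` is a hypothetical helper for illustration and is not part of category_encoders.

```python
def stringify_columns(columns):
    """Hypothetical helper: return the column labels converted to strings.

    Assumption: conversion is done with str(); the changelog does not say how.
    """
    return [str(label) for label in columns]


# Mixed integer and string labels, as a frame built from a 2-D array might have.
original = [0, 1, "city"]
converted = stringify_columns(original)

print(converted)         # ['0', '1', 'city']
print(0 in converted)    # False: an integer label no longer matches
print("0" in converted)  # True: refer to the column by its string name
```

Under this assumption, code that selects encoded columns by integer label would presumably need to use the string form instead.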

CONTRIBUTING.md

Lines changed: 6 additions & 2 deletions
@@ -16,14 +16,18 @@ How to Contribute
  The preferred workflow to contribute to git-pandas is:

  1. Fork this repository into your own github account.
- 2. Clone the fork on your account onto your local disk:
-
+ 2. Clone the fork and install project via poetry:
+ ```
  $ git clone git@github.com:YourLogin/category_encoders.git
  $ cd category_encoders
+ $ poetry install
+ ```

  3. Create a branch for your new awesome feature, do not work in the master branch:

+ ```
  $ git checkout -b new-awesome-feature
+ ```

  4. Write some code, or docs, or tests.
  5. When you are done, submit a pull request.

category_encoders/__init__.py

Lines changed: 32 additions & 33 deletions
@@ -1,4 +1,4 @@
- """
+ """Category encoders library.

  .. module:: category_encoders
  :synopsis:
@@ -7,51 +7,50 @@
  """

  from category_encoders.backward_difference import BackwardDifferenceEncoder
+ from category_encoders.basen import BaseNEncoder
  from category_encoders.binary import BinaryEncoder
- from category_encoders.gray import GrayEncoder
+ from category_encoders.cat_boost import CatBoostEncoder
  from category_encoders.count import CountEncoder
+ from category_encoders.glmm import GLMMEncoder
+ from category_encoders.gray import GrayEncoder
  from category_encoders.hashing import HashingEncoder
  from category_encoders.helmert import HelmertEncoder
+ from category_encoders.james_stein import JamesSteinEncoder
+ from category_encoders.leave_one_out import LeaveOneOutEncoder
+ from category_encoders.m_estimate import MEstimateEncoder
  from category_encoders.one_hot import OneHotEncoder
  from category_encoders.ordinal import OrdinalEncoder
- from category_encoders.sum_coding import SumEncoder
  from category_encoders.polynomial import PolynomialEncoder
- from category_encoders.basen import BaseNEncoder
- from category_encoders.leave_one_out import LeaveOneOutEncoder
+ from category_encoders.quantile_encoder import QuantileEncoder, SummaryEncoder
+ from category_encoders.rankhot import RankHotEncoder
+ from category_encoders.sum_coding import SumEncoder
  from category_encoders.target_encoder import TargetEncoder
  from category_encoders.woe import WOEEncoder
- from category_encoders.m_estimate import MEstimateEncoder
- from category_encoders.james_stein import JamesSteinEncoder
- from category_encoders.cat_boost import CatBoostEncoder
- from category_encoders.rankhot import RankHotEncoder
- from category_encoders.glmm import GLMMEncoder
- from category_encoders.quantile_encoder import QuantileEncoder, SummaryEncoder
-

  __version__ = '2.6.4'

- __author__ = "willmcginnis", "cmougan", "paulwestenthanner"
+ __author__ = 'willmcginnis', 'cmougan', 'paulwestenthanner'

  __all__ = [
- "BackwardDifferenceEncoder",
- "BinaryEncoder",
- "GrayEncoder",
- "CountEncoder",
- "HashingEncoder",
- "HelmertEncoder",
- "OneHotEncoder",
- "OrdinalEncoder",
- "SumEncoder",
- "PolynomialEncoder",
- "BaseNEncoder",
- "LeaveOneOutEncoder",
- "TargetEncoder",
- "WOEEncoder",
- "MEstimateEncoder",
- "JamesSteinEncoder",
- "CatBoostEncoder",
- "GLMMEncoder",
- "QuantileEncoder",
- "SummaryEncoder",
+ 'BackwardDifferenceEncoder',
+ 'BinaryEncoder',
+ 'GrayEncoder',
+ 'CountEncoder',
+ 'HashingEncoder',
+ 'HelmertEncoder',
+ 'OneHotEncoder',
+ 'OrdinalEncoder',
+ 'SumEncoder',
+ 'PolynomialEncoder',
+ 'BaseNEncoder',
+ 'LeaveOneOutEncoder',
+ 'TargetEncoder',
+ 'WOEEncoder',
+ 'MEstimateEncoder',
+ 'JamesSteinEncoder',
+ 'CatBoostEncoder',
+ 'GLMMEncoder',
+ 'QuantileEncoder',
+ 'SummaryEncoder',
  'RankHotEncoder',
  ]
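
The import block in this file ends up ordered alphabetically by module name, which is what an import-sorting lint rule typically enforces. The sketch below shows one way to check that ordering with the standard library only. The helper names `import_modules` and `is_sorted` are hypothetical, and this is not the linter the project actually runs. `ast.parse` only reads the source text, so category_encoders does not need to be installed.

```python
import ast


def import_modules(source: str) -> list[str]:
    """Return the module of each top-level `from X import Y`, in file order."""
    tree = ast.parse(source)
    return [
        node.module
        for node in tree.body
        if isinstance(node, ast.ImportFrom) and node.module
    ]


def is_sorted(names: list[str]) -> bool:
    """True when the names are already in ascending order."""
    return names == sorted(names)


# A few lines from the old and new versions of the import block.
OLD = """
from category_encoders.binary import BinaryEncoder
from category_encoders.gray import GrayEncoder
from category_encoders.count import CountEncoder
"""

NEW = """
from category_encoders.basen import BaseNEncoder
from category_encoders.binary import BinaryEncoder
from category_encoders.cat_boost import CatBoostEncoder
from category_encoders.count import CountEncoder
"""

print(is_sorted(import_modules(OLD)))  # False
print(is_sorted(import_modules(NEW)))  # True
```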

category_encoders/backward_difference.py

Lines changed: 24 additions & 17 deletions
@@ -1,7 +1,7 @@
- """Backward difference contrast encoding"""
+ """Backward difference contrast encoding."""

- from patsy.contrasts import Diff, ContrastMatrix
  import numpy as np
+ from patsy.contrasts import ContrastMatrix, Diff

  from category_encoders.base_contrast_encoder import BaseContrastEncoder

@@ -13,31 +13,39 @@ class BackwardDifferenceEncoder(BaseContrastEncoder):

  Parameters
  ----------
-
  verbose: int
  integer indicating verbosity of the output. 0 for none.
  cols: list
  a list of columns to encode, if None, all string columns will be encoded.
  drop_invariant: bool
  boolean for whether or not to drop columns with 0 variance.
  return_df: bool
- boolean for whether to return a pandas DataFrame from transform (otherwise it will be a numpy array).
+ boolean for whether to return a pandas DataFrame from transform
+ (otherwise it will be a numpy array).
  handle_unknown: str
- options are 'error', 'return_nan', 'value', and 'indicator'. The default is 'value'. Warning: if indicator is used,
- an extra column will be added in if the transform matrix has unknown categories. This can cause
- unexpected changes in dimension in some cases.
+ options are 'error', 'return_nan', 'value', and 'indicator'. The default is 'value'.
+ Warning: if indicator is used, an extra column will be added in if the transform matrix
+ has unknown categories. This can cause unexpected changes in dimension in some cases.
  handle_missing: str
- options are 'error', 'return_nan', 'value', and 'indicator'. The default is 'value'. Warning: if indicator is used,
- an extra column will be added in if the transform matrix has nan values. This can cause
- unexpected changes in dimension in some cases.
+ options are 'error', 'return_nan', 'value', and 'indicator'. The default is 'value'.
+ Warning: if indicator is used, an extra column will be added in if the transform
+ matrix has nan values. This can cause unexpected changes in dimension in some cases.

  Example
  -------
  >>> from category_encoders import *
  >>> import pandas as pd
  >>> from sklearn.datasets import fetch_openml
- >>> bunch = fetch_openml(name="house_prices", as_frame=True)
- >>> display_cols = ["Id", "MSSubClass", "MSZoning", "LotFrontage", "YearBuilt", "Heating", "CentralAir"]
+ >>> bunch = fetch_openml(name='house_prices', as_frame=True)
+ >>> display_cols = [
+ ... 'Id',
+ ... 'MSSubClass',
+ ... 'MSZoning',
+ ... 'LotFrontage',
+ ... 'YearBuilt',
+ ... 'Heating',
+ ... 'CentralAir',
+ ... ]
  >>> y = bunch.target
  >>> X = pd.DataFrame(bunch.data, columns=bunch.feature_names)[display_cols]
  >>> enc = BackwardDifferenceEncoder(cols=['CentralAir', 'Heating']).fit(X, y)
@@ -46,12 +54,11 @@ class BackwardDifferenceEncoder(BaseContrastEncoder):
  <class 'pandas.core.frame.DataFrame'>
  RangeIndex: 1460 entries, 0 to 1459
  Data columns (total 12 columns):
- # Column Non-Null Count Dtype
- --- ------ -------------- -----
- 0 intercept 1460 non-null int64
+ # Column Non-Null Count Dtype
+ --- ------ -------------- -----
  1 Id 1460 non-null float64
  2 MSSubClass 1460 non-null float64
- 3 MSZoning 1460 non-null object
+ 3 MSZoning 1460 non-null object
  4 LotFrontage 1201 non-null float64
  5 YearBuilt 1460 non-null float64
  6 Heating_0 1460 non-null float64
@@ -76,5 +83,5 @@ class BackwardDifferenceEncoder(BaseContrastEncoder):
  """

  def get_contrast_matrix(self, values_to_encode: np.array) -> ContrastMatrix:
+ """Get the contrast matrix for the backward difference encoder."""
  return Diff().code_without_intercept(values_to_encode)
-
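
`get_contrast_matrix` in the last file delegates to patsy's `Diff().code_without_intercept`. As a rough illustration of what a backward difference contrast matrix looks like, the sketch below builds one in plain Python. It assumes the usual coding in which, for k levels, column j (0-based) holds (j + 1 - k) / k in rows 0 through j and (j + 1) / k in the rows below. `backward_difference_matrix` is a hypothetical name, and this is not patsy's implementation.

```python
from fractions import Fraction


def backward_difference_matrix(n_levels: int) -> list[list[Fraction]]:
    """Build a k x (k - 1) backward difference contrast matrix (sketch)."""
    if n_levels < 2:
        raise ValueError("need at least two levels to form a contrast")
    k = n_levels
    matrix = []
    for row in range(k):
        matrix.append(
            [
                # Rows below the column index get the positive weight,
                # rows up to and including it get the negative weight.
                Fraction(col + 1, k) if row > col else Fraction(col + 1 - k, k)
                for col in range(k - 1)
            ]
        )
    return matrix


for row in backward_difference_matrix(4):
    print([float(value) for value in row])
# [-0.75, -0.5, -0.25]
# [0.25, -0.5, -0.25]
# [0.25, 0.5, -0.25]
# [0.25, 0.5, 0.75]
```

Each column sums to zero, so every encoded column compares the mean of one group of levels with the mean of the remaining levels.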