[ENH] Add optional removal of accents on functions.clean_names, enabled by default. #502

Closed
@mralbu

Brief Description

I'd like to suggest an option to remove accents from column names in the clean_names function.
It could be implemented using the normalize function from the standard library's unicodedata module, as discussed in:

What is the best way to remove accents in a Python unicode string?

I have created a branch called strip_accents and checked that the addition does not break any existing tests:
mralbu/pyjanitor/tree/strip_accents
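
For readers unfamiliar with the unicodedata approach mentioned above, here is a minimal sketch of how accent removal is commonly done with it. The helper name strip_accents_from_text is hypothetical and is not taken from the branch. It assumes NFD normalization followed by dropping combining marks, which may differ from what the branch actually does.

```python
import unicodedata


def strip_accents_from_text(text: str) -> str:
    """Return `text` with combining marks (accents) removed.

    Hypothetical helper for illustration only; not the code from the
    strip_accents branch.
    """
    # NFD splits each accented character into its base character
    # followed by one or more combining marks (e.g. "ã" -> "a" + U+0303).
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns a nonzero class for combining marks,
    # so keeping only the zero-class characters drops the accents.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Column names from the example in this issue, lowercased as clean_names does.
names = ["João", "Лука\u0301ся", "Käfer"]
cleaned = [strip_accents_from_text(name).lower() for name in names]
```

One consequence of this approach: any character with a canonical decomposition loses its mark, including letters such as the Cyrillic "й", while characters with no decomposition, such as "ø", pass through unchanged. That is one reason to keep the behaviour behind an option.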

Example API

import pandas as pd
import janitor  # registers clean_names as a DataFrame method

# create test DataFrame
df = pd.DataFrame({"João": [1, 2], "Лука́ся": [1, 2], "Käfer": [1, 2]})

# remove column name accents (strip_accents is the proposed parameter)
df = df.clean_names(strip_accents=True)
expected_columns = ["joao", "лукася", "kafer"]
assert set(df.columns) == set(expected_columns)

Labels

being worked on, enhancement
