Releases · jawah/charset_normalizer

18 Oct 19:18

Ousret

3.0.0rc1

544595d

Version 3.0.0rc1 Pre-release

This is the last pre-release. If everything goes well, I will publish the stable tag.

3.0.0rc1 (2022-10-18)

Added

Extend the capability of explain=True: when cp_isolation contains at most two entries (minimum one), the Mess-detector results are logged in detail
Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
Add parameter language_threshold in from_bytes, from_path and from_fp to adjust the minimum expected coherence ratio

Changed

Build with static metadata using 'build' frontend
Make language detection stricter

Fixed

CLI with option --normalize fails when using a full path for files
TooManyAccentuatedPlugin induces false positives in the mess detection when too few alpha characters have been fed to it

Removed

Coherence detector no longer returns 'Simple English'; it returns 'English' instead
Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead

21 Aug 19:07

Ousret

3.0.0b2

03aa701

Version 3.0.0b2 Pre-release

3.0.0b2 (2022-08-21)

Added

normalizer --version now specifies whether the current version provides the extra speedup (meaning a mypyc-compiled whl)

Removed

Breaking: Methods first() and best() from CharsetMatch
UTF-7 will no longer appear as "detected" without a recognized SIG/mark (it is unreliable and conflicts with ASCII)
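
The conflict between UTF-7 and ASCII mentioned above can be seen with Python's standard codecs alone. The following is a minimal sketch, not the library's detection logic; the sample strings are chosen for illustration only.

```python
# Illustration only: standard-library codecs, no charset_normalizer calls.
text = "café"
as_utf7 = text.encode("utf_7")

# Every byte of a UTF-7 payload is in the ASCII range...
is_ascii_only = as_utf7.isascii()

# ...so the same bytes also decode without error as ASCII, yielding different text.
read_as_ascii = as_utf7.decode("ascii")
read_as_utf7 = as_utf7.decode("utf_7")

# Conversely, plain ASCII text containing no "+" decodes unchanged as UTF-7.
plain = b"Hello, world"
same_either_way = plain.decode("ascii") == plain.decode("utf_7")
```

Because both readings succeed, the bytes alone cannot tell the two encodings apart, which is presumably why a signature is now required before UTF-7 is reported.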

Fixed

Sphinx warnings when generating the documentation

19 Aug 21:56

Ousret

2.1.1

86cda88

Version 2.1.1

2.1.1 (2022-08-19)

Deprecated

Function normalize scheduled for removal in 3.0

Changed

Removed useless call to decode in fn is_unprintable (#206)

Fixed

Third-party library (i18n xgettext) crashing because it does not recognize utf_8 (PEP 263) with an underscore, from @aleksandernovikov (#204)
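
For context on the utf_8 fix above: Python itself accepts both the hyphen and the underscore spelling as the same codec, while external tools that read a PEP 263 declaration may be stricter. The sketch below only shows the standard-library side of this; how the library chooses the name it emits is not shown here, and the `declaration` string is an illustrative assumption.

```python
import codecs

# Python resolves both spellings to the same codec.
hyphen = codecs.lookup("utf-8")
underscore = codecs.lookup("utf_8")
same_codec = hyphen.name == underscore.name

# The canonical name reported by the codec registry uses a hyphen.
canonical = underscore.name

# Illustrative PEP 263 style declaration built from the canonical name.
declaration = f"# -*- coding: {canonical} -*-"
```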

15 Aug 16:17

Ousret

3.0.0b1

09402e6

Version 3.0.0b1 Pre-release

3.0.0b1 (2022-08-15)

Changed

Optional: Module md.py can be compiled using Mypyc to provide an extra speedup, up to 4x faster than v2.1

Removed

Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
Breaking: Top-level function normalize
Breaking: Properties chaos_secondary_pass, coherence_non_latin and w_counter from CharsetMatch
Support for the backport unicodedata2

19 Jun 21:56

Ousret

2.1.0

cb2dbde

Version 2.1.0

2.1.0 (2022-06-19)

Added

Output the Unicode table version when running the CLI with --version (PR #194)

Changed

Re-use decoded buffer for single byte character sets from @nijel (PR #175)
Fixing some performance bottlenecks from @deedy5 (PR #183)

Fixed

Work around a potential bug in CPython where Zero Width No-Break Space, located in Arabic Presentation Forms-B (Unicode 1.1), is not acknowledged as a space (PR #175)
CLI default threshold aligned with the API threshold from @oleksandr-kuzmenko (PR #181)
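
The behavior behind the Zero Width No-Break Space entry above can be observed with the standard library. This is a small sketch of how CPython classifies the character, not the workaround applied in the library.

```python
import unicodedata

zwnbsp = "\ufeff"

# Official character name and general category as reported by CPython.
name = unicodedata.name(zwnbsp)
category = unicodedata.category(zwnbsp)

# U+FEFF is the last code point of the Arabic Presentation Forms-B block (U+FE70 to U+FEFF).
in_forms_b = 0xFE70 <= ord(zwnbsp) <= 0xFEFF

# Despite its name, str.isspace() does not treat it as whitespace.
treated_as_space = zwnbsp.isspace()
```

Code that relies on isspace() to skip separators therefore sees this character as ordinary content.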

Removed

Support for Python 3.5 (PR #192)

Deprecated

Use of backport unicodedata from unicodedata2 as Python is quickly catching up, scheduled for removal in 3.0 (PR #194)

12 Feb 14:25

Ousret

2.0.12

a5f4348

Version 2.0.12

2.0.12 (2022-02-12)

Fixed

ASCII misdetection in rare cases (PR #170)

30 Jan 18:26

Ousret

2.0.11

f256c3e

Version 2.0.11

2.0.11 (2022-01-30)

Added

Explicit support for Python 3.11 (PR #164)

Changed

The logging behavior has been completely reviewed, now using only TRACE and DEBUG levels (PR #163 #165)

04 Jan 20:15

Ousret

2.0.10

de25562

Version 2.0.10

2.0.10 (2022-01-04)

Fixed

Fallback match entries might lead to UnicodeDecodeError for large byte sequences (PR #154)

Changed

Skipping the language-detection (CD) on ASCII (PR #155)

03 Dec 19:27

Ousret

2.0.9

3874edb

Version 2.0.9

2.0.9 (2021-12-03)

Changed

Moderating the logging impact (since 2.0.8) for specific environments (PR #147)

Fixed

Wrong logging level applied when setting kwarg explain to True (PR #146)

24 Nov 19:45

Ousret

2.0.8

8913e21

Version 2.0.8

Changed

Improved Vietnamese detection (PR #126)
MD improvement on trailing data and long foreign (non-pure Latin) data (PR #124)
Efficiency improvements in cd/alphabet_languages from @adbar (PR #122)
Call sum() without an intermediary list, following PEP 289 recommendations, from @adbar (PR #129)
Code style as refactored by Sourcery-AI (PR #131)
Minor adjustment on the MD around European words (PR #133)
Remove and replace SRTs from assets / tests (PR #139)
Initialize the library logger with a NullHandler by default from @nmaynes (PR #135)
Setting kwarg explain to True will provisionally (bound to the function's lifespan) add a specific stream handler (PR #135)
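
The PEP 289 change listed above replaces a list comprehension inside sum() with a generator expression, which avoids building a temporary list. A generic sketch with made-up sample data (not code from the library):

```python
# Hypothetical sample data, for illustration only.
words = ["naïve", "café", "plain"]

# Before: the list comprehension materializes every length before summing.
total_with_list = sum([len(word) for word in words])

# After: the generator expression feeds values to sum() one at a time.
total_with_generator = sum(len(word) for word in words)
```

Both forms return the same result; the second simply skips the intermediate list.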

Fixed

Fix large (misleading) sequence giving UnicodeDecodeError (PR #137)
Avoid using chunks that are too insignificant (PR #137)

Added

Add and expose function set_logging_handler to configure a specific StreamHandler from @nmaynes (PR #135)
Add CHANGELOG.md entries, format is based on Keep a Changelog (PR #141)
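
The logging entries in this release follow a common pattern for libraries: stay silent by default through a NullHandler, and attach a stream handler only for a bounded scope. The sketch below reproduces that pattern with the standard logging module only. The logger name "my_library" and the helper temporary_stream_handler are hypothetical and are not the library's actual implementation of explain or set_logging_handler.

```python
import io
import logging
from contextlib import contextmanager

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("my_library")
logger.addHandler(logging.NullHandler())  # silent unless the application opts in


@contextmanager
def temporary_stream_handler(stream, level=logging.DEBUG):
    """Attach a stream handler for the duration of the with-block only."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s | %(message)s"))
    previous_level = logger.level
    logger.addHandler(handler)
    logger.setLevel(level)
    try:
        yield handler
    finally:
        # Restore the logger to its previous, quiet state.
        logger.removeHandler(handler)
        logger.setLevel(previous_level)


buffer = io.StringIO()
with temporary_stream_handler(buffer) as handler:
    logger.debug("probing candidate encodings")

captured = buffer.getvalue()
handler_still_attached = handler in logger.handlers
```

Scoping the handler this way keeps verbose output from leaking into applications that never asked for it.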
