
Releases: jawah/charset_normalizer

Version 3.0.0rc1

18 Oct 19:18
544595d
Pre-release

This is the last pre-release. If everything goes well, I will publish the stable tag.

3.0.0rc1 (2022-10-18)

Added

  • Extend the capability of explain=True when cp_isolation contains one or two entries: the Mess-detector results are logged in detail
  • Support for alternative language frequency sets in charset_normalizer.assets.FREQUENCIES
  • Add parameter language_threshold in from_bytes, from_path and from_fp to adjust the minimum expected coherence ratio

Changed

  • Build with static metadata using the 'build' frontend
  • Make language detection stricter

Fixed

  • CLI with option --normalize failing when using a full path for files
  • TooManyAccentuatedPlugin inducing false positives in the mess detection when too few alphabetic characters have been fed to it

Removed

  • Coherence detector no longer returns 'Simple English'; it returns 'English' instead
  • Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead

Version 3.0.0b2

21 Aug 19:07
Pre-release

3.0.0b2 (2022-08-21)

Added

  • normalizer --version now specifies whether the current version provides the extra speedup (i.e. a mypyc-compiled wheel)
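A hedged sketch of an equivalent check from Python. It assumes, as I understand the CLI does, that a compiled wheel is recognizable because charset_normalizer.md is loaded from a native extension module rather than from md.py.

```python
import charset_normalizer.md as md_module

# Assumption: a mypyc-compiled wheel loads md as a native extension,
# so the module's __file__ no longer ends in ".py".
speedup = not md_module.__file__.lower().endswith(".py")
print("SpeedUp", "ON" if speedup else "OFF")
```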

Removed

  • Breaking: Methods first() and best() from CharsetMatch
  • UTF-7 will no longer appear as "detected" without a recognized SIG/mark (it is unreliable and conflicts with ASCII)

Fixed

  • Sphinx warnings when generating the documentation

Version 2.1.1

19 Aug 21:56
86cda88

2.1.1 (2022-08-19)

Deprecated

  • Function normalize scheduled for removal in 3.0

Changed

  • Removed useless call to decode in function is_unprintable (#206)

Fixed

  • Third-party library (i18n xgettext) crashing because it does not recognize utf_8 (PEP 263) spelled with an underscore, from @aleksandernovikov (#204)
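Python itself accepts both spellings: PEP 263 matches the declaration with a regular expression and resolves the name through the codec registry, which normalizes utf_8 to utf-8. Tools that parse the declaration on their own, such as xgettext here, may not. A small stdlib illustration:

```python
import codecs
import re

# Regular expression given in PEP 263 for source-file encoding declarations.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

declarations = ["# -*- coding: utf_8 -*-", "# -*- coding: utf-8 -*-"]
resolved = {}
for line in declarations:
    match = CODING_RE.match(line)
    if match is not None:
        name = match.group(1)
        # The codec registry maps both spellings to the canonical "utf-8".
        resolved[name] = codecs.lookup(name).name

print(resolved)
```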

Version 3.0.0b1

15 Aug 16:17
Pre-release

3.0.0b1 (2022-08-15)

Changed

  • Optional: Module md.py can be compiled using Mypyc to provide an extra speedup of up to 4x over v2.1

Removed

  • Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
  • Breaking: Top-level function normalize
  • Breaking: Properties chaos_secondary_pass, coherence_non_latin and w_counter from CharsetMatch
  • Support for the backport unicodedata2
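With normalize gone, a similar conversion can be written on top of from_path. A minimal sketch, assuming charset_normalizer 3.0 or later; convert_to_utf8 is a hypothetical helper, not part of the library, and unlike the old function it writes to an explicit destination path.

```python
import tempfile
from pathlib import Path
from typing import Optional

from charset_normalizer import from_path


def convert_to_utf8(src: Path, dst: Path) -> Optional[str]:
    """Hypothetical replacement for normalize(): write src to dst re-encoded as UTF-8.

    Returns the detected source encoding, or None when no match was found.
    """
    best = from_path(src).best()
    if best is None:
        return None
    # output() re-encodes the decoded text into the requested encoding.
    dst.write_bytes(best.output(encoding="utf_8"))
    return best.encoding


text = "Voici un fichier d'exemple, enregistré en UTF-16 avec son BOM : déjà, élève."
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "sample.txt"
    dst = Path(tmp) / "sample-utf8.txt"
    src.write_bytes(text.encode("utf_16"))  # Python's utf_16 codec prepends a BOM
    detected = convert_to_utf8(src, dst)
    converted = dst.read_text(encoding="utf_8")

print(detected, converted)
```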

Version 2.1.0

19 Jun 21:56
cb2dbde

2.1.0 (2022-06-19)

Added

  • Output the Unicode table version when running the CLI with --version (PR #194)

Changed

  • Re-use decoded buffer for single byte character sets from @nijel (PR #175)
  • Fix some performance bottlenecks from @deedy5 (PR #183)

Fixed

  • Work around a potential bug in CPython where Zero Width No-Break Space, located in Arabic Presentation Forms-B (Unicode 1.1), is not acknowledged as a space (PR #175)
  • CLI default threshold aligned with the API threshold from @oleksandr-kuzmenko (PR #181)
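The character in question is U+FEFF. A stdlib sketch of why a plain str.isspace() check misses it: it is a format control (category Cf), not a space separator. is_space_like is a hypothetical helper showing the kind of workaround involved, not the library's implementation; unicodedata.unidata_version is the Unicode table version of the running Python.

```python
import unicodedata

ZWNBSP = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE, in the Arabic Presentation Forms-B block

print(unicodedata.unidata_version)   # Unicode table version shipped with this Python
print(unicodedata.name(ZWNBSP))      # ZERO WIDTH NO-BREAK SPACE
print(unicodedata.category(ZWNBSP))  # Cf (format character), not Zs (space separator)
print(ZWNBSP.isspace())              # False


def is_space_like(char: str) -> bool:
    """Hypothetical helper: treat U+FEFF as a space in addition to str.isspace()."""
    return char.isspace() or char == ZWNBSP
```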

Removed

  • Support for Python 3.5 (PR #192)

Deprecated

  • Use of the unicodedata backport from unicodedata2, as Python is quickly catching up; scheduled for removal in 3.0 (PR #194)

Version 2.0.12

12 Feb 14:25
a5f4348

2.0.12 (2022-02-12)

Fixed

  • ASCII mis-detection in rare cases (PR #170)

Version 2.0.11

30 Jan 18:26
f256c3e

2.0.11 (2022-01-30)

Added

  • Explicit support for Python 3.11 (PR #164)

Changed

  • The logging behavior has been completely reviewed, now using only TRACE and DEBUG levels (PR #163 #165)
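A stdlib-only sketch of how the two levels relate. It assumes the library's TRACE level is numeric level 5 (below DEBUG at 10); the messages are emitted by hand on the charset_normalizer logger name purely for illustration.

```python
import io
import logging

# Assumption: TRACE is registered at numeric level 5, below DEBUG (10).
TRACE = 5
logging.addLevelName(TRACE, "TRACE")

logger = logging.getLogger("charset_normalizer")
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)
previous_level = logger.level

# At DEBUG, records logged at TRACE are filtered out.
logger.setLevel(logging.DEBUG)
logger.log(TRACE, "dropped at DEBUG")
logger.debug("kept at DEBUG")

# At TRACE, both levels come through.
logger.setLevel(TRACE)
logger.log(TRACE, "kept at TRACE")

# Leave the logger as it was.
logger.removeHandler(handler)
logger.setLevel(previous_level)

output = stream.getvalue()
print(output)
```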

Version 2.0.10

04 Jan 20:15
de25562

2.0.10 (2022-01-04)

Fixed

  • Fallback match entries might lead to UnicodeDecodeError for large byte sequences (PR #154)

Changed

  • Skip the language detection (CD) on ASCII (PR #155)

Version 2.0.9

03 Dec 19:27
3874edb

2.0.9 (2021-12-03)

Changed

  • Moderating the logging impact (since 2.0.8) for specific environments (PR #147)

Fixed

  • Wrong logging level applied when setting kwarg explain to True (PR #146)

Version 2.0.8

24 Nov 19:45
8913e21

Changed

  • Improvement over Vietnamese detection (PR #126)
  • MD improvement on trailing data and long foreign (non-pure latin) data (PR #124)
  • Efficiency improvements in cd/alphabet_languages from @adbar (PR #122)
  • Call sum() without an intermediary list, following PEP 289 recommendations, from @adbar (PR #129)
  • Code style as refactored by Sourcery-AI (PR #131)
  • Minor adjustment to the MD around European words (PR #133)
  • Remove and replace SRTs from assets / tests (PR #139)
  • Initialize the library logger with a NullHandler by default from @nmaynes (PR #135)
  • Setting kwarg explain to True will provisionally add a specific stream handler, bounded to the function's lifespan (PR #135)
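The two logging changes follow a common library pattern: stay silent by default through a NullHandler, and attach a handler only for the duration of an explained call. A stdlib sketch of that pattern; example_library, provisional_handler and detect are hypothetical names, not charset_normalizer's code.

```python
import io
import logging
import sys
from contextlib import contextmanager
from typing import Iterator, Optional, TextIO

# Hypothetical library logger; a NullHandler keeps it silent unless the
# application configures logging itself.
library_logger = logging.getLogger("example_library")
library_logger.addHandler(logging.NullHandler())


@contextmanager
def provisional_handler(logger: logging.Logger, stream: TextIO, level: int = logging.DEBUG) -> Iterator[None]:
    """Attach a StreamHandler and lower the level only for the duration of the block."""
    handler = logging.StreamHandler(stream)
    previous_level = logger.level
    logger.addHandler(handler)
    logger.setLevel(level)
    try:
        yield
    finally:
        # Restore the logger as it was, even if the wrapped call raised.
        logger.removeHandler(handler)
        logger.setLevel(previous_level)


def _run(payload: bytes) -> int:
    library_logger.debug("inspecting %d bytes", len(payload))
    return len(payload)  # stand-in for a real detection result


def detect(payload: bytes, explain: bool = False, stream: Optional[TextIO] = None) -> int:
    """Hypothetical detection routine with an explain flag."""
    if not explain:
        return _run(payload)
    with provisional_handler(library_logger, stream if stream is not None else sys.stderr):
        return _run(payload)


buffer = io.StringIO()
detect(b"abc", explain=True, stream=buffer)
print(buffer.getvalue())
```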

Fixed

  • Fix large (misleading) sequence giving UnicodeDecodeError (PR #137)
  • Avoid using chunks that are too insignificant (PR #137)

Added

  • Add and expose function set_logging_handler to configure a specific StreamHandler from @nmaynes (PR #135)
  • Add CHANGELOG.md entries, format is based on Keep a Changelog (PR #141)