Releases: jawah/charset_normalizer
Charset Normalizer
Changes :
- Feature : Added has_submatch, percent_chaos and percent_coherence properties on the single match object.
- Improvement : The best() method of CharsetNormalizerMatches has been rewritten for better readability.
- Feature : Added an explain boolean positional parameter to print out what actually happens when searching for a match.
- Improvement : Detection has been globally improved.
- Feature : You can exclude some encodings when searching for a match with the cp_exclusion parameter (list of str). Available for from_bytes, from_path and from_fp.
- Feature : You can limit the search to some encodings when looking for a match with the cp_isolation parameter (list of str). Available for from_bytes, from_path and from_fp.
- Feature : import charset_normalizer is enough to provide additional help when you encounter a UnicodeDecodeError exception.
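To make the difference between cp_isolation and cp_exclusion concrete, here is a minimal standard-library sketch. It is not the library's implementation: the helper names candidate_encodings and first_decodable and the AVAILABLE list are hypothetical, and the real detector ranks candidates by chaos and coherence rather than taking the first encoding that decodes.

```python
from typing import Iterable, List, Optional


def candidate_encodings(
    available: Iterable[str],
    cp_isolation: Optional[List[str]] = None,
    cp_exclusion: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: apply an allow-list and a deny-list to codec names."""
    isolation = {name.lower() for name in cp_isolation or []}
    exclusion = {name.lower() for name in cp_exclusion or []}
    selected = []
    for name in available:
        key = name.lower()
        if isolation and key not in isolation:
            continue  # an allow-list was given and this codec is not on it
        if key in exclusion:
            continue  # this codec was explicitly excluded
        selected.append(name)
    return selected


def first_decodable(payload: bytes, encodings: Iterable[str]) -> Optional[str]:
    """Return the first encoding that decodes the payload without error."""
    for name in encodings:
        try:
            payload.decode(name)
        except (UnicodeDecodeError, LookupError):
            continue
        return name
    return None


# Assumed candidate list for the illustration only.
AVAILABLE = ["ascii", "utf_8", "cp1252", "latin_1"]
payload = "café".encode("utf_8")

unrestricted = first_decodable(payload, candidate_encodings(AVAILABLE))
without_utf8 = first_decodable(
    payload, candidate_encodings(AVAILABLE, cp_exclusion=["utf_8"])
)
only_latin_1 = first_decodable(
    payload, candidate_encodings(AVAILABLE, cp_isolation=["latin_1"])
)
```

In this sketch, excluding utf_8 forces the search to settle on a single-byte code page, which shows why an exclusion list should be used with care.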
Charset Normalizer
Changes :
- Bugfix : The from_bytes parameters steps and chunk_size were not adapted to the sequence length when the provided values did not fit the content. This could lead to misdetection on small content.
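The kind of adjustment this bugfix describes can be sketched as follows. This is an illustration under assumptions, not the code shipped in the release: fit_sampling and chunk_offsets are hypothetical names, and the exact clamping rule used by the library may differ.

```python
from typing import List, Tuple


def fit_sampling(length: int, steps: int, chunk_size: int) -> Tuple[int, int]:
    """Hypothetical helper: shrink steps/chunk_size so they fit the content."""
    if steps < 1 or chunk_size < 1:
        raise ValueError("steps and chunk_size must be positive")
    if length <= 0:
        return 0, 0  # nothing to sample
    if length <= chunk_size:
        return 1, length  # small content: read it whole, in a single chunk
    if steps * chunk_size > length:
        # Too many chunks requested for this content: reduce the step count.
        steps = max(1, length // chunk_size)
    return steps, chunk_size


def chunk_offsets(length: int, steps: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Return (start, end) slices spread evenly over the content."""
    steps, chunk_size = fit_sampling(length, steps, chunk_size)
    if steps == 0:
        return []
    stride = length // steps
    return [
        (index * stride, min(index * stride + chunk_size, length))
        for index in range(steps)
    ]


small = chunk_offsets(100, steps=10, chunk_size=512)
medium = chunk_offsets(2000, steps=10, chunk_size=512)
```

With values that do not fit, a 100-byte input is read as one chunk instead of ten mostly empty ones.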
Charset Normalizer
Release 1.0.0 (#11)
- Adjustment in frequencies.json about Chinese: removed Latin-based characters from it.
- Added the possibility to list encoding aliases for a match. Encodings are known by many names; this can help when searching for IBM855 when it is listed as CP855.
- Added submatch in match: a list of submatches that produce the EXACT same output as a match.
- Changes in docs and commented out unused code.
- Added the giveup_threshold parameter of ProbeChaos to the docs.
- Doc improvement in unicode.py.
- Added static method list_by_range in unicode.py, which sorts letters by Unicode range in a dict.
- ProbeCoherence reliability improved. It can now probe and sort by alphabet used or by Unicode range.
- Added coherence_non_latin method in NormalizerMatch, which checks whether a non-Latin-based language was verified by the coherence probe.
- CLI is now more verbose.
- More tests, yay!
- Bump to 1.0.0.
- README update.
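The IBM855 / CP855 case can be reproduced with Python's standard alias table. The sketch below uses only the standard library; canonical_name and known_aliases are hypothetical helper names and do not reflect how the match object exposes its aliases.

```python
from encodings.aliases import aliases
from typing import List


def canonical_name(name: str) -> str:
    """Map an encoding label to the codec name used by Python's alias table."""
    key = name.lower().replace("-", "_")
    # Labels that are already canonical are not keys of the alias table.
    return aliases.get(key, key)


def known_aliases(name: str) -> List[str]:
    """List every alias in the standard table that points to the same codec."""
    target = canonical_name(name)
    return sorted(alias for alias, codec in aliases.items() if codec == target)


same_codec = canonical_name("IBM855") == canonical_name("CP855")
cp855_aliases = known_aliases("CP855")
```

Searching a result set by canonical name plus aliases avoids missing a match only because it was reported under a different label.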
Charset Normalizer
- Improvement on detection
- A performance loss is to be expected
- Added --threshold option to CLI
Charset Normalizer
- Bugfix on UTF-7 support
- Legacy detect(byte_str) method
Charset Normalizer
RC 5
- BOM support (Unicode mostly)
- Chaos prober improved on small text
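As a rough idea of what BOM support involves, the following standard-library sketch checks a payload for a Unicode byte order mark before any statistical probing. It is an assumption-laden illustration: sniff_bom is a hypothetical name, and the release does not state which BOMs are covered or how they are handled internally.

```python
import codecs
from typing import Optional, Tuple

# Longer marks come first: the UTF-32 LE mark begins with the UTF-16 LE mark.
# A UTF-16 LE text that starts with U+0000 right after its mark would be
# misread as UTF-32 LE by this simple check.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf_32_le"),
    (codecs.BOM_UTF32_BE, "utf_32_be"),
    (codecs.BOM_UTF8, "utf_8"),
    (codecs.BOM_UTF16_LE, "utf_16_le"),
    (codecs.BOM_UTF16_BE, "utf_16_be"),
]


def sniff_bom(payload: bytes) -> Tuple[Optional[str], bytes]:
    """Return (encoding, payload without the mark), or (None, payload)."""
    for bom, encoding in _BOMS:
        if payload.startswith(bom):
            return encoding, payload[len(bom):]
    return None, payload


encoding, body = sniff_bom(codecs.BOM_UTF16_LE + "hé".encode("utf_16_le"))
text = body.decode(encoding) if encoding else None
```

When a mark is present, the encoding is known without guessing, which is especially useful on small texts where statistical probes have little to work with.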
Charset Normalizer
RC 4
- Probe Chaos: Code cleanup, performance review and accuracy improved
Charset Normalizer
RC 3
- Language detection has been reviewed to give better results
- Bugfix on Japanese detection: every Japanese text was considered chaotic
Charset Normalizer
RC 2
- Fixes #3 🎉 First PR
- Close files after reading them in CLI mode