
Is NFD type normalizer supported? #1209

Closed
@adewdev

Question

Hi,

I was trying the following code in the browser, which uses the dewdev/language_detection model:

```ts
import { pipeline, Pipeline } from '@huggingface/transformers';

export class DetectLanguage {
    private modelid: string;
    private detectPipeline: Pipeline | null = null;
    private initialized: boolean = false;

    constructor(modelid: string = 'dewdev/language_detection') {
        this.modelid = modelid;
    }

    // Load the text-classification pipeline, preferring WebGPU when available.
    async initialize() {
        try {
            this.detectPipeline = await pipeline('text-classification', this.modelid, {
                dtype: 'fp32',
                device: navigator.gpu ? 'webgpu' : 'wasm'
            });
            this.initialized = true;
            console.log("Model initialization successful.");
        } catch (error) {
            console.error('Error initializing language detection model with fallback:', error);
            this.initialized = false;
            throw error;
        }
    }

    // Return the top prediction for the text, or '' if detection is unavailable.
    async detect(text: string) {
        if (!this.initialized || !this.detectPipeline) {
            console.error("Model not initialized.");
            return '';
        }
        try {
            // top_k limits the output to the single best label.
            const language = await this.detectPipeline(text, { top_k: 1 });
            return language;
        } catch (error) {
            console.error('Error during language detection:', error);
            return '';
        }
    }
}

async function main() {
    const detectLanguage = new DetectLanguage();
    await detectLanguage.initialize();
    const text = "This is a test sentence.";
    const language = await detectLanguage.detect(text);
    // The result is an object/array, so serialize it for logging.
    console.log(`Detected language: ${JSON.stringify(language)}`);
}

// Call the main function
main();
```

The above code produces the following error:

```
Error initializing language detection model with fallback: Error: Unknown Normalizer type: NFD
    at Normalizer.fromConfig (tokenizers.js:1011:1)
    at tokenizers.js:1187:1
    at Array.map (<anonymous>)
    at new NormalizerSequence (tokenizers.js:1187:1)
    at Normalizer.fromConfig (tokenizers.js:993:1)
    at new PreTrainedTokenizer (tokenizers.js:2545:1)
    at new BertTokenizer (tokenizers.js:3277:8)
    at AutoTokenizer.from_pretrained (tokenizers.js:4373:1)
    at async Promise.all (:5173/index 0)
    at async loadItems (pipelines.js:3413:1)
```

Here is the normalizer section from the tokenizer:

```json
"normalizer": {
  "type": "Sequence",
  "normalizers": [
    { "type": "NFD" },
    {
      "type": "BertNormalizer",
      "clean_text": true,
      "handle_chinese_chars": true,
      "strip_accents": true,
      "lowercase": true
    }
  ]
},
```

Maybe the NFD normalizer is missing from the library.
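
For context, here is a minimal sketch of what an NFD step does to text and why it usually sits in front of accent stripping. It uses only the standard `String.prototype.normalize`; the helper names `nfd` and `stripAccents` are made up for illustration and are not the library's implementation. The accent filter is a simplification that only covers the combining diacritical marks block (U+0300 to U+036F).

```typescript
// Illustrative helpers only; not taken from @huggingface/transformers.

// NFD = canonical decomposition: a precomposed character such as U+00E9
// is split into a base letter plus a combining mark (e + U+0301).
function nfd(text: string): string {
  return text.normalize("NFD");
}

// After decomposition, accents are separate code units and can be removed.
// Simplification: only the U+0300..U+036F combining marks are stripped.
function stripAccents(text: string): string {
  return nfd(text).replace(/[\u0300-\u036f]/g, "");
}

console.log(nfd("\u00e9").length);        // 2
console.log(stripAccents("Caf\u00e9"));   // "Cafe"
```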

Is there any way to bypass this error? Can you please let me know?
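
One possible workaround, sketched here under assumptions, would be to patch the tokenizer config so the `Sequence` no longer contains the unsupported entry, and then load the model from a copy that ships the patched file. The `NormalizerConfig` type and the `dropNormalizer` helper below are hypothetical and not part of any library. It is also an assumption that removing `NFD` leaves the output unchanged for this model; that seems plausible only because the following `BertNormalizer` has `strip_accents` enabled, so the results should be compared before relying on it.

```typescript
// Hypothetical shape of a normalizer entry in the tokenizer config.
interface NormalizerConfig {
  type: string;
  normalizers?: NormalizerConfig[];
  [key: string]: unknown;
}

// Hypothetical helper: return a copy of the config with every entry of the
// given type removed from (possibly nested) Sequence normalizers.
function dropNormalizer(config: NormalizerConfig, unsupportedType: string): NormalizerConfig {
  if (config.type !== "Sequence" || !config.normalizers) {
    return config;
  }
  const kept = config.normalizers
    .filter((n) => n.type !== unsupportedType)
    .map((n) => dropNormalizer(n, unsupportedType));
  return { ...config, normalizers: kept };
}

// The normalizer section quoted in this issue.
const original: NormalizerConfig = {
  type: "Sequence",
  normalizers: [
    { type: "NFD" },
    {
      type: "BertNormalizer",
      clean_text: true,
      handle_chinese_chars: true,
      strip_accents: true,
      lowercase: true,
    },
  ],
};

const patched = dropNormalizer(original, "NFD");
console.log(JSON.stringify(patched, null, 2));
```

The helper does not mutate its input, so the original config stays available for comparison.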

Thanks
