Is NFD type normalizer supported? #1209

### Question

Hi,

I tried the following code in the browser, which uses [dewdev/language_detection](https://huggingface.co/dewdev/language_detection):

```typescript
import { pipeline, Pipeline } from '@huggingface/transformers';

export class DetectLanguage {
    private modelid: string | null = null;
    private detectPipeline: Pipeline | null = null;
    private initialized: boolean = false;

    constructor(modelid: string = 'dewdev/language_detection') {
        this.modelid = modelid;
    }

    async initialize() {
        try {
            // Prefer WebGPU when the browser exposes it, otherwise fall back to WASM.
            this.detectPipeline = await pipeline('text-classification', this.modelid, {
                dtype: 'fp32',
                device: navigator.gpu ? 'webgpu' : 'wasm'
            });
            this.initialized = true;
            console.log("Model initialization successful.");
        } catch (error) {
            console.error('Error initializing language detection model with fallback:', error);
            this.initialized = false;
            throw error;
        }
    }

    async detect(text: string) {
        if (!this.initialized || !this.detectPipeline) {
            console.error("Model not initialized.");
            return '';
        }
        try {
            // Ask for the single best label.
            const language = await this.detectPipeline(text, { top_k: 1 });
            return language;
        } catch (error) {
            console.error('Error during language detection:', error);
            return '';
        }
    }
}

async function main() {
    const detectLanguage = new DetectLanguage();
    await detectLanguage.initialize();
    const text = "This is a test sentence.";
    const language = await detectLanguage.detect(text);
    console.log(`Detected language: ${language}`);
}

// Call the main function
main();
```

The code above fails during initialization with the following error:
  Error initializing language detection model with fallback: Error: Unknown Normalizer type: NFD
      at Normalizer.fromConfig (tokenizers.js:1011:1)
      at tokenizers.js:1187:1
      at Array.map (<anonymous>)
      at new NormalizerSequence (tokenizers.js:1187:1)
      at Normalizer.fromConfig (tokenizers.js:993:1)
      at new PreTrainedTokenizer (tokenizers.js:2545:1)
      at new BertTokenizer (tokenizers.js:3277:8)
      at AutoTokenizer.from_pretrained (tokenizers.js:4373:1)
      at async Promise.all (:5173/index 0)
      at async loadItems (pipelines.js:3413:1)

Here is the normalizer section from the model's tokenizer configuration:
```json
"normalizer": {
    "type": "Sequence",
    "normalizers": [
      {
        "type": "NFD"
      },
      {
        "type": "BertNormalizer",
        "clean_text": true,
        "handle_chinese_chars": true,
        "strip_accents": true,
        "lowercase": true
      }
    ]
  },
```
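For context, NFD is Unicode canonical decomposition, which JavaScript exposes natively as `String.prototype.normalize('NFD')`. The sketch below shows roughly what this two-step sequence does to a string. It assumes that `strip_accents` amounts to dropping nonspacing combining marks (`\p{Mn}`), and it ignores `clean_text` and `handle_chinese_chars`. The helper name `nfdStripLower` is hypothetical and is not part of transformers.js.

```typescript
// Hypothetical helper for illustration only; not a transformers.js API.
// Approximates: NFD, then accent stripping, then lowercasing.
function nfdStripLower(text: string): string {
    // Step 1: canonical decomposition, e.g. U+00E9 becomes "e" + U+0301.
    const decomposed = text.normalize('NFD');
    // Step 2 (assumption): accent stripping removes nonspacing marks.
    const stripped = decomposed.replace(/\p{Mn}/gu, '');
    // Step 3: lowercase the result.
    return stripped.toLowerCase();
}

// "Caf\u00E9" is "Café" written with a precomposed e-acute.
console.log(nfdStripLower('Caf\u00E9'));
```

If the BERT-style accent stripping already decomposes the text before removing marks, the separate NFD step may be redundant for this model, but that is worth verifying against the Python tokenizer output.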

It looks like the NFD normalizer may not be implemented.

Is there any way to work around this error? Could you please let me know?

Thanks
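One possible workaround, sketched here under assumptions, is to load a patched copy of the tokenizer configuration with the unsupported entry removed from the `Sequence`. The function below only manipulates the JSON. The names `NormalizerConfig` and `dropNormalizers` are hypothetical, and how the patched file is served to transformers.js (for example from a fork of the model repository) is not shown. Whether dropping NFD leaves tokenization unchanged for this model is an assumption that should be checked.

```typescript
// Hypothetical types and helper for illustration; not a transformers.js API.
interface NormalizerConfig {
    type: string;
    normalizers?: NormalizerConfig[];
    [key: string]: unknown;
}

// Returns a copy of the config with the listed normalizer types removed
// from any (possibly nested) "Sequence". Other normalizers pass through as-is.
function dropNormalizers(config: NormalizerConfig, unsupported: string[]): NormalizerConfig {
    if (config.type !== 'Sequence' || !config.normalizers) {
        return config;
    }
    return {
        ...config,
        normalizers: config.normalizers
            .filter((n) => unsupported.indexOf(n.type) === -1)
            .map((n) => dropNormalizers(n, unsupported)),
    };
}

// The normalizer section quoted in this issue.
const original: NormalizerConfig = {
    type: 'Sequence',
    normalizers: [
        { type: 'NFD' },
        {
            type: 'BertNormalizer',
            clean_text: true,
            handle_chinese_chars: true,
            strip_accents: true,
            lowercase: true,
        },
    ],
};

const patched = dropNormalizers(original, ['NFD']);
console.log(JSON.stringify(patched, null, 2));
```

The original object is left untouched, so the patched and unpatched configurations can be compared side by side before committing to the change.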
