Description
Question
Hi,
I was trying the following code in the browser. It uses the `dewdev/language_detection` model:
```typescript
import { pipeline, Pipeline } from '@huggingface/transformers';

export class DetectLanguage {
  private readonly modelid: string;
  private detectPipeline: Pipeline | null = null;
  private initialized = false;

  constructor(modelid: string = 'dewdev/language_detection') {
    this.modelid = modelid;
  }

  async initialize() {
    try {
      this.detectPipeline = await pipeline('text-classification', this.modelid, {
        dtype: 'fp32',
        // Use WebGPU if the browser exposes it, otherwise fall back to WASM
        device: 'gpu' in navigator ? 'webgpu' : 'wasm',
      });
      this.initialized = true;
      console.log('Model initialization successful.');
    } catch (error) {
      console.error('Error initializing language detection model with fallback:', error);
      this.initialized = false;
      throw error;
    }
  }

  async detect(text: string) {
    if (!this.initialized || !this.detectPipeline) {
      console.error('Model not initialized.');
      return '';
    }
    try {
      // text-classification takes `top_k` (not `top`); the result is an array of { label, score }
      return await this.detectPipeline(text, { top_k: 1 });
    } catch (error) {
      console.error('Error during language detection:', error);
      return '';
    }
  }
}

async function main() {
  const detectLanguage = new DetectLanguage();
  await detectLanguage.initialize();
  const text = 'This is a test sentence.';
  const language = await detectLanguage.detect(text);
  // Stringify so the { label, score } objects are readable in the log
  console.log(`Detected language: ${JSON.stringify(language)}`);
}

// Call the main function
main();
```
The code above throws the following error:
```
Error initializing language detection model with fallback: Error: Unknown Normalizer type: NFD
    at Normalizer.fromConfig (tokenizers.js:1011:1)
    at tokenizers.js:1187:1
    at Array.map ()
    at new NormalizerSequence (tokenizers.js:1187:1)
    at Normalizer.fromConfig (tokenizers.js:993:1)
    at new PreTrainedTokenizer (tokenizers.js:2545:1)
    at new BertTokenizer (tokenizers.js:3277:8)
    at AutoTokenizer.from_pretrained (tokenizers.js:4373:1)
    at async Promise.all (:5173/index 0)
    at async loadItems (pipelines.js:3413:1)
```
Here is the `normalizer` section from the model's `tokenizer.json`:
```json
"normalizer": {
  "type": "Sequence",
  "normalizers": [
    { "type": "NFD" },
    {
      "type": "BertNormalizer",
      "clean_text": true,
      "handle_chinese_chars": true,
      "strip_accents": true,
      "lowercase": true
    }
  ]
}
```
Maybe the `NFD` normalizer is not supported.
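To make the missing step concrete, here is a minimal sketch that approximates what this `Sequence` does, built on the standard `String.prototype.normalize('NFD')`. It is an assumption-based approximation, not the transformers.js implementation: `handle_chinese_chars` is skipped, and the helper names `nfd`, `bertNormalize` and `normalizeLikeConfig` are hypothetical.

```typescript
// Hypothetical helpers approximating the configured normalizer Sequence:
// [NFD, BertNormalizer(clean_text, strip_accents, lowercase)].
// This is NOT the transformers.js implementation; handle_chinese_chars is omitted.

// Step 1: NFD, canonical decomposition (U+00E9 "é" -> "e" + U+0301).
function nfd(text: string): string {
  return text.normalize("NFD");
}

// Step 2: an approximate BertNormalizer with clean_text, strip_accents
// and lowercase enabled.
function bertNormalize(text: string): string {
  let cleaned = "";
  for (const ch of text) {
    const cp = ch.codePointAt(0)!;
    // clean_text: drop NUL and U+FFFD
    if (cp === 0 || cp === 0xfffd) continue;
    // clean_text: map any whitespace (including \t and \n) to a plain space
    if (/\s/u.test(ch)) {
      cleaned += " ";
      continue;
    }
    // clean_text: drop other control and format characters
    if (/[\p{Cc}\p{Cf}]/u.test(ch)) continue;
    cleaned += ch;
  }
  // strip_accents: decompose, then remove nonspacing marks (category Mn)
  const stripped = cleaned.normalize("NFD").replace(/\p{Mn}/gu, "");
  // lowercase
  return stripped.toLowerCase();
}

function normalizeLikeConfig(text: string): string {
  return bertNormalize(nfd(text));
}

console.log("\u00e9".length, nfd("\u00e9").length);
console.log(normalizeLikeConfig("H\u00e9llo\tW\u00f6rld"));
```

The first log shows that NFD turns one precomposed character into two code units (base letter plus combining accent); the second shows accents and case removed after the full sequence.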
Is there any way to work around this error? Can you please let me know?
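One possible workaround, which I have not verified: host a patched copy of the model files (for example through the library's `env.localModelPath` setting) in which the standalone `NFD` step is removed from `tokenizer.json`. In the Hugging Face `tokenizers` library, `BertNormalizer` with `strip_accents: true` already applies NFD before removing nonspacing marks, so dropping the separate step should not change the output, assuming transformers.js mirrors that behavior. The `dropNormalizer` helper and the `TokenizerJson` type below are hypothetical; they only rewrite the parsed JSON object and do not call any library API.

```typescript
// Hypothetical types describing just the parts of tokenizer.json we touch.
interface NormalizerConfig {
  type: string;
  normalizers?: NormalizerConfig[];
  [key: string]: unknown;
}

interface TokenizerJson {
  normalizer: NormalizerConfig | null;
  [key: string]: unknown;
}

// Hypothetical helper: return a copy of the config with every normalizer of
// type `typeToDrop` removed (top level or inside a Sequence).
function dropNormalizer(config: TokenizerJson, typeToDrop: string): TokenizerJson {
  const norm = config.normalizer;
  if (norm === null) return config;
  if (norm.type === typeToDrop) return { ...config, normalizer: null };
  if (norm.type === "Sequence" && norm.normalizers) {
    const kept = norm.normalizers.filter((n) => n.type !== typeToDrop);
    return { ...config, normalizer: { ...norm, normalizers: kept } };
  }
  return config;
}

// The normalizer section from this model's tokenizer.json
const original: TokenizerJson = {
  normalizer: {
    type: "Sequence",
    normalizers: [
      { type: "NFD" },
      {
        type: "BertNormalizer",
        clean_text: true,
        handle_chinese_chars: true,
        strip_accents: true,
        lowercase: true,
      },
    ],
  },
};

const patched = dropNormalizer(original, "NFD");
console.log(JSON.stringify(patched.normalizer, null, 2));
```

The helper returns a new object instead of mutating the input, so the original config stays available for comparison; the patched JSON would then be written back to the locally hosted `tokenizer.json`.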
Thanks