Skip to content

Exception upon attempting to load a Tokenizer from file #566

Closed as not planned
@joepalermo

Description

@joepalermo

Hi, I'm attempting to simply serialize and then unserialize a trained tokenizer. When I run the following code:

tokenizer = Tokenizer(BPE())
trainer = BpeTrainer(vocab_size=280)
tokenizer.train(trainer, ["preprocessing/corpus/corpus.txt"])
save_to_filepath = 'preprocessing/tokenizer.json'
tokenizer.save(save_to_filepath)
tokenizer = Tokenizer.from_file(save_to_filepath)

I get the following traceback:

Traceback (most recent call last):
...
    tokenizer = Tokenizer.from_file(save_to_filepath)
Exception: data did not match any variant of untagged enum ModelWrapper at line 1 column 5408

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions