Is there any way to use a custom tokenizer, by the way (like UTF-8 ids)?
It'd be cool to get rid of the tokenizer itself, obviously giving up some computing efficiency in exchange for more interpretability and coverage of the whole of Unicode.
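To make the "UTF-8 ids" idea concrete, here is a minimal sketch of a tokenizer-free scheme in which each UTF-8 byte is its own token id (0 to 255). The function names `encode_utf8` and `decode_utf8` are made up for illustration; this is not part of RWKV or of any existing tokenizer package.

```python
# Hypothetical byte-level "tokenizer": every UTF-8 byte becomes one id in 0..255.
# This is an illustration only, not an RWKV API.

def encode_utf8(text: str) -> list[int]:
    """Map text to a list of byte values; covers all of Unicode with 256 ids."""
    return list(text.encode("utf-8"))


def decode_utf8(ids: list[int]) -> str:
    """Map byte ids back to text; invalid byte sequences become U+FFFD."""
    return bytes(ids).decode("utf-8", errors="replace")


if __name__ == "__main__":
    sample = "hi สวัสดี"
    ids = encode_utf8(sample)
    print(len(sample), "characters ->", len(ids), "byte ids")
    print(decode_utf8(ids) == sample)
```

The trade-off mentioned above shows up directly: the vocabulary shrinks to 256 entries and nothing is ever out of vocabulary, but non-ASCII scripts need several ids per character, so sequences get longer.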
Would it be possible to use a BPE tokenizer instead of rwkv_vocab_v20230424 in the next model?
I tried the RWKV model on Thai. The output looks good, but generation is very slow because rwkv_vocab_v20230424 tokenizes Thai at the character level.
I think that if the next model used a BPE tokenizer like Qwen2's, it could improve both the model's quality and its speed.
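A rough sketch of why character-level tokenization costs speed: the number of generation steps grows with the number of tokens, so one token per character means more steps than a vocabulary with multi-character entries. The code below uses a toy greedy longest-match tokenizer, which is neither real BPE nor the actual rwkv_vocab_v20230424 logic, and the two-entry vocabulary `toy_vocab` is invented purely for illustration.

```python
# Toy comparison: character-level tokens vs. a vocabulary with multi-character
# entries. Greedy longest-match is used here for simplicity; it is NOT BPE and
# NOT the real rwkv_vocab_v20230424 algorithm. The vocabulary is hypothetical.

def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text by always taking the longest vocab entry at each position,
    falling back to a single character when nothing longer matches."""
    max_len = max((len(entry) for entry in vocab), default=1)
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


if __name__ == "__main__":
    text = "สวัสดี"
    toy_vocab = {"สวัส", "ดี"}  # assumed entries, for illustration only
    char_tokens = list(text)  # one token per Unicode code point
    merged_tokens = greedy_tokenize(text, toy_vocab)
    print(len(char_tokens), "character-level tokens")
    print(len(merged_tokens), "tokens with multi-character entries")
```

How much a real BPE vocabulary would shorten Thai sequences depends on its training data; the numbers here only illustrate the mechanism.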