When I encode `" ..."` with the tokenizer, it comes out as `" ..."` (one space is missing). Shouldn't tokenizers always be 100% invertible?
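
For reference, here is a minimal sketch of the round-trip check I have in mind. It does not use the actual tokenizer from this report: `roundtrip_mismatch` and the two toy tokenizers are hypothetical names made up for illustration, and the sample string is only an assumed stand-in for an input with consecutive spaces.

```python
from typing import Callable, List, Optional


def roundtrip_mismatch(
    encode: Callable[[str], list],
    decode: Callable[[list], str],
    text: str,
) -> Optional[str]:
    """Return the decoded text if decode(encode(text)) != text, else None."""
    decoded = decode(encode(text))
    return None if decoded == text else decoded


# Hypothetical lossy tokenizer: splitting on whitespace discards
# how many spaces separated the pieces.
def lossy_encode(text: str) -> List[str]:
    return text.split()


def lossy_decode(tokens: List[str]) -> str:
    return " ".join(tokens)


# Hypothetical lossless tokenizer: raw UTF-8 bytes keep every character.
def bytes_encode(text: str) -> List[int]:
    return list(text.encode("utf-8"))


def bytes_decode(ids: List[int]) -> str:
    return bytes(ids).decode("utf-8")


sample = "a  b"  # assumed example input with two consecutive spaces

print(repr(roundtrip_mismatch(lossy_encode, lossy_decode, sample)))
print(repr(roundtrip_mismatch(bytes_encode, bytes_decode, sample)))
```

Passing the real tokenizer's encode and decode functions to the same helper would show whether the round trip is lossy for a given input.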