
BitNet (based on v4.51.3)


A new model has been added to transformers: BitNet.
It is built on top of the v4.51.3 release and can be installed from the following tag: v4.51.3-BitNet-preview.

To install this version, run the following command:

pip install git+https://github.com/huggingface/[email protected]
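
To verify that the preview installed correctly, a quick sanity check is to import transformers and print its version (the exact version string reported by this tag may differ from 4.51.3, so treat the output as informational):

import transformers

print(transformers.__version__)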

If fixes are needed, they will be applied to this release, so this installation can be considered stable and improving.

As the tag name implies, this is a preview of the BitNet model. It is a tagged version of the main branch and does not follow semantic versioning. The model will be included in the next minor release: v4.52.0.

BitNet


Trained on a corpus of 4 trillion tokens, this model demonstrates that native 1-bit LLMs can achieve performance comparable to leading open-weight, full-precision models of similar size, while offering substantial advantages in computational efficiency (memory, energy, latency).
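
The "b1.58" in the checkpoint name refers to ternary weights taking values in {-1, 0, +1} (log2 3 ≈ 1.58 bits per weight). As a rough illustration of the absmean weight quantization described in the BitNet b1.58 paper, here is a minimal, self-contained sketch; quantize_weights_ternary is a hypothetical helper for exposition only and is not part of transformers:

import torch

def quantize_weights_ternary(w: torch.Tensor, eps: float = 1e-5):
    # absmean scaling: normalize by the mean absolute value of the matrix
    scale = w.abs().mean().clamp(min=eps)
    # round each scaled weight to the nearest value in {-1, 0, +1}
    w_ternary = (w / scale).round().clamp(-1, 1)
    return w_ternary, scale

w = torch.randn(4, 4)
w_q, scale = quantize_weights_ternary(w)
print(w_q)          # entries are -1., 0., or 1.
print(w_q * scale)  # coarse reconstruction of the original weights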

Usage example

BitNet can be found on the Hugging Face Hub.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/bitnet-b1.58-2B-4T"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16
)

# Apply the chat template
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "How are you?"},
]
chat_input = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Generate response
chat_outputs = model.generate(chat_input, max_new_tokens=50)
response = tokenizer.decode(chat_outputs[0][chat_input.shape[-1]:], skip_special_tokens=True) # Decode only the response part
print("\nAssistant Response:", response)