Text Embeddings Inference on Habana Gaudi

Warning

This repository is deprecated. For the latest version of TEI on Intel Gaudi, please use the text-embeddings-inference repository instead. Use the latest image hosted on the TEI repo, ghcr.io/huggingface/text-embeddings-inference:hpu-latest, rather than ghcr.io/huggingface/tei-gaudi:latest. The new Gaudi images are published in the text-embeddings-inference registry, and the Gaudi Backend documentation has more information.
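If you are migrating, pulling the replacement image is a one-line change (the image name is taken from the notice above; the tag may evolve over time):

docker pull ghcr.io/huggingface/text-embeddings-inference:hpu-latest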

Get started

To use 🤗 text-embeddings-inference on Habana Gaudi/Gaudi2, follow these steps:

  1. Pull the official Docker image with:
    docker pull ghcr.io/huggingface/tei-gaudi:latest

Note

Alternatively, you can build the Docker image using Dockerfile-hpu located in this folder with:

docker build -f Dockerfile-hpu -t tei_gaudi .
  2. Launch a local server instance on a single Gaudi card:
    model=BAAI/bge-large-en-v1.5
    volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
    
    docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e MAX_WARMUP_SEQUENCE_LENGTH=512 --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tei-gaudi:latest --model-id $model --pooling cls
    For models from the Transformers library that need remote code to run customized implementations, set the environment variable TRUST_REMOTE_CODE=TRUE by adding -e TRUST_REMOTE_CODE=TRUE to the docker run command line. Here is an example:
    model="Alibaba-NLP/gte-large-en-v1.5"
    volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
    
    docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e MAX_WARMUP_SEQUENCE_LENGTH=512 -e TRUST_REMOTE_CODE=TRUE --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tei-gaudi:latest --model-id $model --pooling cls
    
  3. You can then send a request:
     curl 127.0.0.1:8080/embed \
         -X POST \
         -d '{"inputs":"What is Deep Learning?"}' \
         -H 'Content-Type: application/json'
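The /embed endpoint also accepts a batch of inputs in a single request; assuming the server launched above is running, a request like the following returns one embedding per input:

curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs":["What is Deep Learning?","What is Machine Learning?"]}' \
    -H 'Content-Type: application/json'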

For more information and documentation about Text Embeddings Inference, check out the README of the original repo.

Supported Models

Text Embeddings

tei-gaudi currently supports Nomic, BERT, CamemBERT, and XLM-RoBERTa models with absolute positions; JinaBERT models with ALiBi positions; and Mistral, Alibaba GTE, and Qwen2 models with RoPE positions.

Below are some examples of our validated models:

Architecture  Pooling              Models
BERT          Cls/Mean/Last token  BAAI/bge-large-en-v1.5
                                   sentence-transformers/all-MiniLM-L6-v2
                                   sentence-transformers/all-MiniLM-L12-v2
                                   sentence-transformers/multi-qa-MiniLM-L6-cos-v1
                                   sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
                                   sentence-transformers/paraphrase-MiniLM-L3-v2
BERT          Splade               naver/efficient-splade-VI-BT-large-query
MPNet         Cls/Mean/Last token  sentence-transformers/all-mpnet-base-v2
                                   sentence-transformers/paraphrase-multilingual-mpnet-base-v2
                                   sentence-transformers/multi-qa-mpnet-base-dot-v1
ALBERT        Cls/Mean/Last token  sentence-transformers/paraphrase-albert-small-v2
Mistral       Cls/Mean/Last token  intfloat/e5-mistral-7b-instruct
                                   Salesforce/SFR-Embedding-2_R
GTE           Cls/Mean/Last token  Alibaba-NLP/gte-large-en-v1.5
JinaBERT      Cls/Mean/Last token  jinaai/jina-embeddings-v2-base-en

Sequence Classification and Re-Ranking

tei-gaudi currently supports CamemBERT and XLM-RoBERTa Sequence Classification models with absolute positions.

Below are some examples of the currently supported models:

Task                Model Type   Model ID
Re-Ranking          XLM-RoBERTa  BAAI/bge-reranker-large
Re-Ranking          XLM-RoBERTa  BAAI/bge-reranker-base
Sentiment Analysis  RoBERTa      SamLowe/roberta-base-go_emotions

How to Use

Using re-ranker models

model=BAAI/bge-reranker-large
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e MAX_WARMUP_SEQUENCE_LENGTH=512 --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tei-gaudi:latest --model-id $model

And then you can rank the similarity between a query and a list of texts with:

curl 127.0.0.1:8080/rerank \
    -X POST \
    -d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
    -H 'Content-Type: application/json'
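Assuming the response is a JSON array of objects with index and score fields, sorted by decreasing score (the TEI rerank format at the time of writing; verify against your server's output), you can extract the index of the best-matching text with jq:

curl -s 127.0.0.1:8080/rerank \
    -X POST \
    -d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
    -H 'Content-Type: application/json' | jq '.[0].index'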

Using Sequence Classification models

You can also use classic Sequence Classification models like SamLowe/roberta-base-go_emotions:

model=SamLowe/roberta-base-go_emotions
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e MAX_WARMUP_SEQUENCE_LENGTH=512 --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tei-gaudi:latest --model-id $model

Once you have deployed the model, you can use the /predict endpoint to get the emotions most associated with an input:

curl 127.0.0.1:8080/predict \
    -X POST \
    -d '{"inputs":"I like you."}' \
    -H 'Content-Type: application/json'
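Assuming the response is a JSON array of objects with label and score fields, sorted by decreasing score (an assumption worth checking against your server), the top emotion can be extracted with jq:

curl -s 127.0.0.1:8080/predict \
    -X POST \
    -d '{"inputs":"I like you."}' \
    -H 'Content-Type: application/json' | jq '.[0]'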

Using SPLADE pooling

You can choose to activate SPLADE pooling for BERT and DistilBERT MaskedLM architectures:

docker build -f Dockerfile-hpu -t tei_gaudi .
model=naver/efficient-splade-VI-BT-large-query
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e MAX_WARMUP_SEQUENCE_LENGTH=512 --cap-add=sys_nice --ipc=host tei_gaudi --model-id $model --pooling splade

Once you have deployed the model, you can use the /embed_sparse endpoint to get the sparse embedding:

curl 127.0.0.1:8080/embed_sparse \
    -X POST \
    -d '{"inputs":"I like you."}' \
    -H 'Content-Type: application/json'
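As a quick sanity check, and assuming each sparse embedding is returned as a list of index/value pairs for its non-zero dimensions (an assumption; inspect your server's output to confirm), you can count the non-zero entries of the first embedding with jq:

curl -s 127.0.0.1:8080/embed_sparse \
    -X POST \
    -d '{"inputs":"I like you."}' \
    -H 'Content-Type: application/json' | jq '.[0] | length'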

The license for TEI on Habana Gaudi is the same as that of TEI: https://github.com/huggingface/text-embeddings-inference/blob/main/LICENSE

Please reach out to [email protected] if you have any questions.
