Commit 6e40939
Embeddings finetuning guide for LLMOps guide (#2917)

* fix docs import
* update TOC
* add skeleton
* add scarfs
* add headers
* complete intro section
* fix markdown parsing error
* update TOC and remove dead files
* update main section TOC
* update
* update image URL
* Optimised images with calibre/image-actions
* more distilabel docs
* more docs updates
* final first draft of synth page
* fix links
* remove standalone argilla section
* add image for synthetic section
* add links
* Optimised images with calibre/image-actions
* add image
* add image for synthetic section
* Optimised images with calibre/image-actions
* add section on finetuning process
* add final section part
* add final section divider
* updates
* chore: Refactor evaluate_base_model function and log model metadata
* add visualization section and image
* Optimised images with calibre/image-actions
* add final section
* add a few more links

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
1 parent 9c3fc56 commit 6e40939

22 files changed: +514 −111 lines

docs/book/toc.md

Lines changed: 5 additions & 2 deletions

```diff
@@ -56,8 +56,11 @@
 * [Understanding reranking](user-guide/llmops-guide/reranking/understanding-reranking.md)
 * [Implementing reranking in ZenML](user-guide/llmops-guide/reranking/implementing-reranking.md)
 * [Evaluating reranking performance](user-guide/llmops-guide/reranking/evaluating-reranking-performance.md)
-* [Improve retrieval by finetuning embeddings](user-guide/llmops-guide/finetuning-embeddings.md)
-* [Finetuning LLMs with ZenML](user-guide/llmops-guide/finetuning-llms.md)
+* [Improve retrieval by finetuning embeddings](user-guide/llmops-guide/finetuning-embeddings/finetuning-embeddings.md)
+* [Synthetic data generation](user-guide/llmops-guide/finetuning-embeddings/synthetic-data-generation.md)
+* [Finetuning embeddings with Sentence Transformers](user-guide/llmops-guide/finetuning-embeddings/finetuning-embeddings-with-sentence-transformers.md)
+* [Evaluating finetuned embeddings](user-guide/llmops-guide/finetuning-embeddings/evaluating-finetuned-embeddings.md)
+* [Finetuning LLMs with ZenML](user-guide/llmops-guide/finetuning-llms/finetuning-llms.md)
 
 ## How-To
```
docs/book/user-guide/llmops-guide/README.md

Lines changed: 5 additions & 2 deletions

```diff
@@ -26,8 +26,11 @@ In this guide, we'll explore various aspects of working with LLMs in ZenML, incl
 * [Understanding reranking](reranking/understanding-reranking.md)
 * [Implementing reranking in ZenML](reranking/implementing-reranking.md)
 * [Evaluating reranking performance](reranking/evaluating-reranking-performance.md)
-* [Improve retrieval by finetuning embeddings](finetuning-embeddings.md)
-* [Finetuning LLMs with ZenML](finetuning-llms.md)
+* [Improve retrieval by finetuning embeddings](finetuning-embeddings/finetuning-embeddings.md)
+* [Synthetic data generation](finetuning-embeddings/synthetic-data-generation.md)
+* [Finetuning embeddings with Sentence Transformers](finetuning-embeddings/finetuning-embeddings-with-sentence-transformers.md)
+* [Evaluating finetuned embeddings](finetuning-embeddings/evaluating-finetuned-embeddings.md)
+* [Finetuning LLMs with ZenML](finetuning-llms/finetuning-llms.md)
 
 To follow along with the examples and tutorials in this guide, ensure you have a Python environment set up with ZenML installed. Familiarity with the concepts covered in the [Starter Guide](../starter-guide/README.md) and [Production Guide](../production-guide/README.md) is recommended.
```
docs/book/user-guide/llmops-guide/finetuning-embeddings.md

Lines changed: 0 additions & 8 deletions

This file was deleted.

docs/book/user-guide/llmops-guide/finetuning-embeddings/evaluating-finetuned-embeddings.md

Lines changed: 139 additions & 0 deletions
---
description: Evaluate finetuned embeddings and compare to original base embeddings.
---

Now that we've finetuned our embeddings, we can evaluate them and compare them
to the base embeddings. We have all the data saved and versioned already, and we
will reuse the same MatryoshkaLoss function for evaluation.

In code, our evaluation steps are easy to comprehend. Here, for example, is the
base model evaluation step:

```python
from typing import Annotated, Dict

import torch
from datasets import DatasetDict
from sentence_transformers import SentenceTransformer
from zenml import log_model_metadata, step

# `get_evaluator` and the EMBEDDINGS_MODEL_* constants are defined elsewhere
# in the project's code.


def evaluate_model(
    dataset: DatasetDict, model: SentenceTransformer
) -> Dict[str, float]:
    """Evaluate the given model on the dataset."""
    evaluator = get_evaluator(
        dataset=dataset,
        model=model,
    )
    return evaluator(model)


@step
def evaluate_base_model(
    dataset: DatasetDict,
) -> Annotated[Dict[str, float], "base_model_evaluation_results"]:
    """Evaluate the base model on the given dataset."""
    model = SentenceTransformer(
        EMBEDDINGS_MODEL_ID_BASELINE,
        device="cuda" if torch.cuda.is_available() else "cpu",
    )

    results = evaluate_model(
        dataset=dataset,
        model=model,
    )

    # Convert numpy.float64 values to regular Python floats
    # (needed for serialization)
    base_model_eval = {
        f"dim_{dim}_cosine_ndcg@10": float(
            results[f"dim_{dim}_cosine_ndcg@10"]
        )
        for dim in EMBEDDINGS_MODEL_MATRYOSHKA_DIMS
    }

    log_model_metadata(
        metadata={"base_model_eval": base_model_eval},
    )

    return results
```
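
The `get_evaluator` helper isn't shown above. Conceptually, it builds one
information-retrieval evaluator per Matryoshka dimension and chains them
together, in the style of the Phil Schmid tutorial this project credits. Here
is a minimal sketch of that idea, assuming `anchor` and `positive` column names
in the test split (illustrative, not the project's verbatim code):

```python
from sentence_transformers.evaluation import (
    InformationRetrievalEvaluator,
    SequentialEvaluator,
)
from sentence_transformers.util import cos_sim


def get_evaluator(dataset, model):
    """Build one IR evaluator per Matryoshka dimension (illustrative)."""
    # `model` is unused here but kept to match the call signature above.
    # Assumed column names: "anchor" holds questions, "positive" holds chunks.
    corpus = {str(i): doc for i, doc in enumerate(dataset["test"]["positive"])}
    queries = {str(i): q for i, q in enumerate(dataset["test"]["anchor"])}
    # Each query's only relevant document is its paired positive chunk.
    relevant_docs = {qid: {qid} for qid in queries}
    matryoshka_evaluators = [
        InformationRetrievalEvaluator(
            queries=queries,
            corpus=corpus,
            relevant_docs=relevant_docs,
            name=f"dim_{dim}",
            truncate_dim=dim,  # score embeddings truncated to this size
            score_functions={"cosine": cos_sim},
        )
        for dim in EMBEDDINGS_MODEL_MATRYOSHKA_DIMS  # e.g. [384, 256, 128, 64]
    ]
    return SequentialEvaluator(matryoshka_evaluators)
```

Evaluators built this way emit metric keys like `dim_384_cosine_ndcg@10`, which
is exactly what the step above extracts.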

We log the results for our core Matryoshka dimensions as model metadata to ZenML
within our evaluation step. This will allow us to inspect these results from
within [the Model Control Plane](https://docs.zenml.io/how-to/use-the-model-control-plane) (see
below for more details). Our results come in the form of a dictionary of string
keys and float values which will, like all step inputs and outputs, be
versioned, tracked, and saved in your artifact store.
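
Because these values are attached to the Model, you can also read them back
programmatically. Here's a small sketch using the ZenML client; the model name
is a hypothetical placeholder, and depending on your ZenML version the metadata
values may be wrapped in response objects (hence the `.value`):

```python
from zenml.client import Client

# "finetuned-embeddings" is a placeholder; use your pipeline's model name.
model_version = Client().get_model_version(
    model_name_or_id="finetuned-embeddings",
    model_version_name_or_number_or_id="latest",
)

# Metadata logged via `log_model_metadata` is available on the model version.
base_eval = model_version.run_metadata["base_model_eval"].value
for metric, score in base_eval.items():
    print(f"{metric}: {score:.4f}")
```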

## Visualizing results

It's possible to visualize results in a few different ways in ZenML, but one
easy option is just to output your chart as a `PIL.Image` object. (See our
[documentation on more ways to visualize your
results](../../../how-to/visualize-artifacts/README.md).) The rest of the
implementation of our `visualize_results` step is just simple `matplotlib` code
to plot the base model evaluation against the finetuned model evaluation. We
represent the results as percentage values and horizontally stack the two sets
to make comparison a little easier.
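
As a rough sketch of that idea (the step name and exact plotting choices are
illustrative rather than the project's verbatim code), such a step renders the
chart with `matplotlib` and hands back a `PIL.Image`:

```python
import io
from typing import Annotated, Dict

import matplotlib.pyplot as plt
from PIL import Image
from zenml import step


@step
def visualize_results(
    base_results: Dict[str, float],
    finetuned_results: Dict[str, float],
) -> Annotated[Image.Image, "evaluation_chart"]:
    """Plot base vs. finetuned nDCG@10 scores as horizontally stacked bars."""
    dims = sorted(base_results)
    positions = range(len(dims))
    fig, ax = plt.subplots(figsize=(8, 5))
    # Offset the two bar sets so each dimension shows both scores side by side
    ax.barh(
        [p + 0.2 for p in positions],
        [base_results[d] * 100 for d in dims],
        height=0.4,
        label="base",
    )
    ax.barh(
        [p - 0.2 for p in positions],
        [finetuned_results[d] * 100 for d in dims],
        height=0.4,
        label="finetuned",
    )
    ax.set_yticks(list(positions))
    ax.set_yticklabels(dims)
    ax.set_xlabel("cosine nDCG@10 (%)")
    ax.legend()
    # Render to an in-memory PNG and return it as a PIL image artifact
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", bbox_inches="tight")
    plt.close(fig)
    buffer.seek(0)
    return Image.open(buffer)
```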

![Visualizing finetuned embeddings evaluation
results](../../../.gitbook/assets/finetuning-embeddings-visualization.png)

We can see that our finetuned embeddings have improved the recall of our
retrieval system across all of the dimensions, but the results are still not
amazing. In a production setting, we would likely want to focus on improving the
data being used for the embeddings training. In particular, we could consider
stripping out some of the logs output from the documentation, and perhaps omit
some pages which offer low signal for the retrieval task. This embeddings
finetuning was run purely on the full set of synthetic data generated by
`distilabel` and `gpt-4o`, so we wouldn't necessarily expect to see huge
improvements out of the box, especially when the underlying data chunks are
complex and contain multiple topics.
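
To make that concrete, a first cleaning pass could be as small as a filter over
the training rows. The heuristics below are hypothetical, and "positive" is the
document-chunk column in our synthetic dataset:

```python
from datasets import load_dataset

dataset = load_dataset(
    "zenml/rag_qa_embedding_questions_0_60_0_distilabel", split="train"
)


def looks_low_signal(chunk: str) -> bool:
    """Heuristic sketch: drop shell-log noise and very short fragments."""
    return chunk.count("\n$") > 3 or len(chunk.split()) < 20


dataset = dataset.filter(lambda row: not looks_low_signal(row["positive"]))
```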

## Model Control Plane as unified interface

Once all our pipelines are finished running, the best place to inspect our
results as well as the artifacts and models we generated is the Model Control
Plane.

![Model Control Plane](../../../.gitbook/assets/mcp-embeddings.gif)

The interface is split into sections that correspond to:

- the artifacts generated by our steps
- the models generated by our steps
- the metadata logged by our steps
- (potentially) any deployments of models made, though we haven't used this in
  this guide so far
- any pipeline runs associated with this 'Model'

We can easily see the latest artifact and technical model versions, as well as
compare the actual values of our evals or inspect the hardware or
hyperparameters used for training.

This one-stop-shop interface is available on ZenML Pro, and you can learn more
about it in the [Model Control Plane
documentation](https://docs.zenml.io/how-to/use-the-model-control-plane).

## Next Steps

Now that we've finetuned our embeddings and evaluated them, once they're in
good shape we could bring them into [the original RAG pipeline](../rag/basic-rag-inference-pipeline.md),
regenerate a new series of embeddings for our data, and then rerun our RAG
retrieval evaluations to see how they've improved in our hand-crafted and
LLM-powered evaluations.

The next section will cover [LLM finetuning and deployment](../finetuning-llms/finetuning-llms.md) as the
final part of our LLMOps guide. (This section is currently still a work in
progress, but if you're eager to try out LLM finetuning with ZenML, you can use
[our LoRA
project](https://github.com/zenml-io/zenml-projects/blob/main/llm-lora-finetuning/README.md)
to get started. We also have [a
blogpost guide](https://www.zenml.io/blog/how-to-finetune-llama-3-1-with-zenml) that
takes you through
[all the steps you need to finetune Llama 3.1](https://www.zenml.io/blog/how-to-finetune-llama-3-1-with-zenml) using GCP's Vertex AI with ZenML,
including one-click stack creation!)

To try out the two pipelines, please follow the instructions in [the project
repository README](https://github.com/zenml-io/zenml-projects/blob/main/llm-complete-guide/README.md),
and you can find the full code in that same directory.

<!-- For scarf -->
<figure><img alt="ZenML Scarf" referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" /></figure>

docs/book/user-guide/llmops-guide/finetuning-embeddings/finetuning-embeddings-for-better-retrieval-performance.md

Lines changed: 0 additions & 8 deletions

This file was deleted.

docs/book/user-guide/llmops-guide/finetuning-embeddings/finetuning-embeddings-with-sentence-transformers.md

Lines changed: 102 additions & 0 deletions
---
description: Finetune embeddings with Sentence Transformers.
---

We now have a dataset that we can use to finetune our embeddings. You can
[inspect the positive and negative examples on the Hugging Face datasets page](https://huggingface.co/datasets/zenml/rag_qa_embedding_questions_0_60_0_distilabel), since
our previous pipeline pushed the data there.

![Synthetic data generated with distilabel for embeddings finetuning](../../../.gitbook/assets/distilabel-synthetic-dataset-hf.png)

Our pipeline for finetuning the embeddings is relatively simple. We'll do the
following (sketched as a ZenML pipeline below):

- load our data either from Hugging Face or [from Argilla via the ZenML
  annotation integration](../../../component-guide/annotators/argilla.md)
- finetune our model using the [Sentence
  Transformers](https://www.sbert.net/) library
- evaluate the base and finetuned embeddings
- visualise the results of the evaluation

![Embeddings finetuning pipeline with Sentence Transformers and
ZenML](../../../.gitbook/assets/rag-finetuning-embeddings-pipeline.png)
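
Wired together as a ZenML pipeline, those four stages look roughly like the
following sketch. The step names are illustrative; the real ones live in the
project repository and would need to be imported for this to run:

```python
from zenml import pipeline

# Assumes the step functions (load_training_data, finetune, ...) are imported
# from the project's steps module; the names here are illustrative.


@pipeline
def finetune_embeddings_pipeline(use_argilla: bool = False):
    """Illustrative wiring of the four stages described above."""
    dataset = load_training_data(use_argilla=use_argilla)
    finetuned_model = finetune(dataset=dataset)
    base_results = evaluate_base_model(dataset=dataset)
    finetuned_results = evaluate_finetuned_model(
        dataset=dataset, model=finetuned_model
    )
    visualize_results(base_results, finetuned_results)
```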

## Loading data

By default the pipeline will load the data from our Hugging Face dataset. If
you've annotated your data in Argilla, you can load the data from there instead.
You'll just need to pass an `--argilla` flag to the Python invocation when
you're running the pipeline, like so:

```bash
python run.py --embeddings --argilla
```

This assumes that you've set up an Argilla annotator in your stack. The code
checks for the annotator and downloads the data that was annotated in Argilla.
Please see our [guide to using the Argilla integration with ZenML](../../../component-guide/annotators/argilla.md) for more details.
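
In outline, the loading step can branch on that flag. This is a sketch that
assumes the active stack's annotator exposes a `get_dataset` method and uses a
placeholder dataset name, rather than the project's exact code:

```python
from datasets import load_dataset
from zenml import step
from zenml.client import Client


@step
def load_training_data(use_argilla: bool = False):
    """Load the finetuning dataset from Argilla or the Hugging Face Hub."""
    if use_argilla:
        # Requires an Argilla annotator registered in the active ZenML stack.
        annotator = Client().active_stack.annotator
        # "rag_qa_embeddings" is a placeholder dataset name; converting the
        # returned Argilla records into a Hugging Face dataset is elided here.
        return annotator.get_dataset(dataset_name="rag_qa_embeddings")
    # Default: the public dataset that our previous pipeline pushed to the Hub.
    return load_dataset(
        "zenml/rag_qa_embedding_questions_0_60_0_distilabel", split="train"
    )
```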

## Finetuning with Sentence Transformers

The `finetune` step in the pipeline is responsible for finetuning the embeddings model using the Sentence Transformers library. Let's break down the key aspects of this step:

1. **Model Loading**: The code loads the base model (`EMBEDDINGS_MODEL_ID_BASELINE`) using the Sentence Transformers library. It utilizes the SDPA (scaled dot-product attention) implementation for efficient training with Flash Attention 2.

2. **Loss Function**: The finetuning process employs a custom loss function called `MatryoshkaLoss`. This loss function is a wrapper around the `MultipleNegativesRankingLoss` provided by Sentence Transformers. The Matryoshka approach involves training the model with different embedding dimensions simultaneously. It allows the model to learn embeddings at various granularities, improving its performance across different embedding sizes.

3. **Dataset Preparation**: The training dataset is loaded from the provided `dataset` parameter. The code saves the training data to a temporary JSON file and then loads it using the Hugging Face `load_dataset` function.

4. **Evaluator**: An evaluator is created using the `get_evaluator` function. The evaluator is responsible for assessing the model's performance during training.

5. **Training Arguments**: The code sets up the training arguments using the `SentenceTransformerTrainingArguments` class. It specifies various hyperparameters such as the number of epochs, batch size, learning rate, optimizer, precision (TF32 and BF16), and evaluation strategy.

6. **Trainer**: The `SentenceTransformerTrainer` is initialized with the model,
   training arguments, training dataset, loss function, and evaluator. The
   trainer handles the training process. The `trainer.train()` method is called
   to start the finetuning process. The model is trained for the specified
   number of epochs using the provided hyperparameters.

7. **Model Saving**: After training, the finetuned model is pushed to the Hugging Face Hub using the `trainer.model.push_to_hub()` method. The model is saved with the specified ID (`EMBEDDINGS_MODEL_ID_FINE_TUNED`).

8. **Metadata Logging**: The code logs relevant metadata about the training process, including the training parameters, hardware information, and accelerator details.

9. **Model Rehydration**: To handle materialization errors, the code saves the
   trained model to a temporary file, loads it back into a new
   `SentenceTransformer` instance, and returns the rehydrated model.

(*Thanks and credit to Phil Schmid for [his tutorial on finetuning embeddings](https://www.philschmid.de/fine-tune-embedding-model-for-rag) with Sentence
Transformers and a Matryoshka loss function. This project uses many ideas and
some code from his implementation.*)

## Finetuning in code

Here's a simplified code snippet highlighting the key parts of the finetuning process:
```python
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

# Load the base model
model = SentenceTransformer(EMBEDDINGS_MODEL_ID_BASELINE)
# Define the loss function, wrapping MultipleNegativesRankingLoss so the model
# is trained at each of the Matryoshka embedding dimensions
train_loss = MatryoshkaLoss(
    model,
    MultipleNegativesRankingLoss(model),
    matryoshka_dims=EMBEDDINGS_MODEL_MATRYOSHKA_DIMS,
)
# Prepare the training dataset
train_dataset = load_dataset("json", data_files=train_dataset_path, split="train")
# Set up the training arguments
args = SentenceTransformerTrainingArguments(...)
# Create the trainer (keyword arguments avoid mixing up positional parameters)
trainer = SentenceTransformerTrainer(
    model=model, args=args, train_dataset=train_dataset, loss=train_loss
)
# Start training
trainer.train()
# Save the finetuned model
trainer.model.push_to_hub(EMBEDDINGS_MODEL_ID_FINE_TUNED)
```
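
Item 5 in the list above mentions the main hyperparameters. Purely as an
illustration of what the elided `SentenceTransformerTrainingArguments(...)` can
contain (these values are hypothetical, not the project's actual
configuration):

```python
from sentence_transformers import SentenceTransformerTrainingArguments

# All values below are illustrative, not the project's actual configuration.
args = SentenceTransformerTrainingArguments(
    output_dir="finetuned-embeddings",  # hypothetical output path
    num_train_epochs=4,
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    optim="adamw_torch_fused",
    tf32=True,  # requires an Ampere-or-newer GPU
    bf16=True,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_dim_128_cosine_ndcg@10",  # hypothetical key
)
```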

The finetuning process leverages the capabilities of the Sentence Transformers library to efficiently train the embeddings model. The Matryoshka approach allows for learning embeddings at different dimensions simultaneously, enhancing the model's performance across various embedding sizes.

Our model is finetuned and saved in the Hugging Face Hub for easy access and
reference in subsequent steps, but it is also versioned and tracked within ZenML
for full observability. At this point the pipeline will evaluate the base and
finetuned embeddings and visualise the results.

<!-- For scarf -->
<figure><img alt="ZenML Scarf" referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" /></figure>

docs/book/user-guide/llmops-guide/finetuning-embeddings/finetuning-embeddings.md

Lines changed: 38 additions & 3 deletions

```diff
@@ -1,8 +1,43 @@
 ---
-description: Finetune embeddings to improve retrieval performance.
+description: Finetune embeddings on custom synthetic data to improve retrieval performance.
 ---
 
-🚧 This guide is a work in progress. Please check back soon for updates.
+We previously learned [how to use RAG with ZenML](../rag-with-zenml/README.md) to
+build a production-ready RAG pipeline. In this section, we will explore how to
+optimize and maintain your embedding models through synthetic data generation and
+human feedback. So far, we've been using off-the-shelf embeddings, which provide
+a good baseline and decent performance on standard tasks. However, you can often
+significantly improve performance by finetuning embeddings on your own domain-specific data.
 
-Coming soon!<!-- For scarf -->
+Our RAG pipeline uses a retrieval-based approach, where it first retrieves the
+most relevant documents from our vector database, and then uses a language model
+to generate a response based on those documents. By finetuning our embeddings on
+a dataset of technical documentation similar to our target domain, we can improve
+the retrieval step and overall performance of the RAG pipeline.
+
+Finetuning embeddings based on synthetic data and human feedback is
+a multi-step process. We'll go through the following steps:
+
+- [generating synthetic data with `distilabel`](synthetic-data-generation.md)
+- [finetuning embeddings with Sentence Transformers](finetuning-embeddings-with-sentence-transformers.md)
+- [evaluating finetuned embeddings and using ZenML's model control plane to get a systematic overview](evaluating-finetuned-embeddings.md)
+
+Besides ZenML, we will do this by using two open-source libraries:
+[`argilla`](https://github.com/argilla-io/argilla/) and
+[`distilabel`](https://github.com/argilla-io/distilabel). Both of these
+libraries focus on optimizing model outputs by improving data quality;
+however, each takes a different approach to the same problem.
+`distilabel` provides a scalable and reliable approach to distilling knowledge
+from LLMs by generating synthetic data or providing AI feedback with LLMs as
+judges. `argilla` enables AI engineers and domain experts to collaborate on data
+projects by allowing them to organize and explore data within an
+interactive and engaging UI. Both libraries can be used individually, but they
+work better together. We'll showcase their use via ZenML pipelines.
+
+To follow along with the example explained in this guide, please follow the
+instructions in [the `llm-complete-guide` repository](https://github.com/zenml-io/zenml-projects/blob/main/llm-complete-guide/README.md), where the full code is also
+available. This specific section on embeddings finetuning can be run locally or
+using cloud compute, as you prefer.
+
+<!-- For scarf -->
 <figure><img alt="ZenML Scarf" referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" /></figure>
```

docs/book/user-guide/llmops-guide/finetuning-embeddings/integrating-finetuned-embeddings-into-zenml-pipelines.md

Lines changed: 0 additions & 8 deletions
This file was deleted.

0 commit comments