
Commit 32b82a4

feat: Updating documents to highlight v2 api for Vector Similarity Search (#5000)
* feat: Updating documents to highlight v2 api for Vector Similarity Search
* updated to add another column about v2 support

Signed-off-by: Francisco Javier Arceo <[email protected]>
1 parent 92dde13 commit 32b82a4

2 files changed: +149, -68 lines changed


docs/reference/alpha-vector-database.md

Lines changed: 146 additions & 66 deletions
@@ -7,20 +7,35 @@ Vector database allows user to store and retrieve embeddings. Feast provides gen
## Integration
Below are supported vector databases and implemented features:

-| Vector Database | Retrieval | Indexing |
-|-----------------|-----------|----------|
-| Pgvector | [x] | [ ] |
-| Elasticsearch | [x] | [x] |
-| Milvus | [ ] | [ ] |
-| Faiss | [ ] | [ ] |
-| SQLite | [x] | [ ] |
-| Qdrant | [x] | [x] |
+| Vector Database | Retrieval | Indexing | V2 Support* |
+|-----------------|-----------|----------|-------------|
+| Pgvector | [x] | [ ] | [ ] |
+| Elasticsearch | [x] | [x] | [ ] |
+| Milvus | [x] | [x] | [x] |
+| Faiss | [ ] | [ ] | [ ] |
+| SQLite | [x] | [ ] | [ ] |
+| Qdrant | [x] | [x] | [ ] |
+
+*Note: V2 Support means the SDK supports retrieval of features along with vector embeddings from vector similarity search.

Note: SQLite is in limited access and only working on Python 3.10. It will be updated as [sqlite_vec](https://github.com/asg017/sqlite-vec/) progresses.

-## Example
+{% hint style="danger" %}
+We will be deprecating the `retrieve_online_documents` method in the SDK in the future.
+We recommend using the `retrieve_online_documents_v2` method instead, which offers easier vector index configuration
+directly in the Feature View and the ability to retrieve standard features alongside your vector embeddings for richer context injection.
+
+Long term we will collapse the two methods into one, but for now, we recommend using the `retrieve_online_documents_v2` method.
+Beyond that, we will have `retrieve_online_documents` and `retrieve_online_documents_v2` simply point to `get_online_features` for
+backwards compatibility and adopt industry-standard naming conventions.
+{% endhint %}
+
+**Note**: Milvus implements the v2 `retrieve_online_documents_v2` method in the SDK. This will be the longer-term solution so that Data Scientists can easily enable vector similarity search by just flipping a flag.

-See [https://github.com/feast-dev/feast-workshop/blob/rag/module_4_rag](https://github.com/feast-dev/feast-workshop/blob/rag/module_4_rag) for an example on how to use vector database.
+## Examples
+
+- See the v0 [RAG Demo](https://github.com/feast-dev/feast-workshop/blob/rag/module_4_rag) for an example of how to use a vector database with the `retrieve_online_documents` method (migration and deprecation are planned).
+- See the v1 [Milvus Quickstart](../../examples/rag/milvus-quickstart.ipynb) for a quickstart guide on how to use Feast with Milvus using the `retrieve_online_documents_v2` method.

### **Prepare offline embedding dataset**
Run the following commands to prepare the embedding dataset:
@@ -34,25 +49,23 @@ The output will be stored in `data/city_wikipedia_summaries.csv.`
Use the feature_store.yaml file to initialize the feature store. This will use the data as offline store, and Pgvector as online store.

```yaml
-project: feast_demo_local
+project: local_rag
provider: local
-registry:
-  registry_type: sql
-  path: postgresql://@localhost:5432/feast
+registry: data/registry.db
online_store:
-  type: postgres
+  type: milvus
+  path: data/online_store.db
  vector_enabled: true
-  vector_len: 384
-  host: 127.0.0.1
-  port: 5432
-  database: feast
-  user: ""
-  password: ""
+  embedding_dim: 384
+  index_type: "IVF_FLAT"


offline_store:
  type: file
-entity_key_serialization_version: 2
+entity_key_serialization_version: 3
+# By default, no_auth for authentication and authorization; other possible values are kubernetes and oidc. Refer to the documentation for more details.
+auth:
+  type: no_auth
```
Run the following command in terminal to apply the feature store configuration:
@@ -63,75 +76,128 @@ feast apply
Note that when you run `feast apply` you are going to apply the following Feature View that we will use for retrieval later:

```python
-city_embeddings_feature_view = FeatureView(
-    name="city_embeddings",
-    entities=[item],
+document_embeddings = FeatureView(
+    name="embedded_documents",
+    entities=[item, author],
    schema=[
-        Field(name="Embeddings", dtype=Array(Float32)),
+        Field(
+            name="vector",
+            dtype=Array(Float32),
+            # Look how easy it is to enable RAG!
+            vector_index=True,
+            vector_search_metric="COSINE",
+        ),
+        Field(name="item_id", dtype=Int64),
+        Field(name="author_id", dtype=String),
+        Field(name="created_timestamp", dtype=UnixTimestamp),
+        Field(name="sentence_chunks", dtype=String),
+        Field(name="event_timestamp", dtype=UnixTimestamp),
    ],
-    source=source,
-    ttl=timedelta(hours=2),
+    source=rag_documents_source,
+    ttl=timedelta(hours=24),
)
```
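The Feature View above references `item`, `author`, and `rag_documents_source`, which are defined elsewhere in the example repository. As a rough sketch only (the entity names and file path below are assumptions, not part of this commit), those definitions could look like:

```python
from feast import Entity, FileSource

# Hypothetical entity definitions matching the join keys used in the Feature View above.
item = Entity(name="item_id", join_keys=["item_id"])
author = Entity(name="author_id", join_keys=["author_id"])

# Hypothetical batch source holding the pre-computed embeddings and document metadata.
rag_documents_source = FileSource(
    name="rag_documents_source",
    path="data/city_wikipedia_summaries_with_embeddings.parquet",  # assumed path
    timestamp_field="event_timestamp",
    created_timestamp_column="created_timestamp",
)
```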

-Then run the following command in the terminal to materialize the data to the online store:
-
-```shell
-CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
-feast materialize-incremental $CURRENT_TIME
+Let's use the SDK to write a data frame of embeddings to the online store:
+```python
+store.write_to_online_store(feature_view_name='city_embeddings', df=df)
```
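Here `store` is a `FeatureStore` instance and `df` is a pandas DataFrame whose columns line up with the Feature View schema. A minimal sketch of that setup, with purely illustrative ids, chunk text, and embedding values (not part of this commit), might look like:

```python
from datetime import datetime, timezone

import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Illustrative rows: one 384-dimensional embedding per document chunk,
# matching embedding_dim in feature_store.yaml and the Feature View schema.
now = datetime.now(timezone.utc)
df = pd.DataFrame(
    {
        "item_id": [0, 1],
        "author_id": ["wikipedia", "wikipedia"],
        "sentence_chunks": ["New York City is the most populous city ...", "Houston is the most populous city in Texas ..."],
        "vector": [[0.1] * 384, [0.2] * 384],
        "event_timestamp": [now, now],
        "created_timestamp": [now, now],
    }
)
```

In practice the embeddings would be produced by the same Sentence Transformer used for the query below rather than hard-coded values.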

### **Prepare a query embedding**
+During inference (e.g., when a user submits a chat message) we need to embed the input text. This can be thought of as a feature transformation of the input data. In this example, we'll do this with a small Sentence Transformer from Hugging Face.
+
```python
-from batch_score_documents import run_model, TOKENIZER, MODEL
+import torch
+import torch.nn.functional as F
+from feast import FeatureStore
+from pymilvus import MilvusClient, DataType, FieldSchema
from transformers import AutoTokenizer, AutoModel
-
-question = "the most populous city in the U.S. state of Texas?"
+from example_repo import city_embeddings_feature_view, item
+
+TOKENIZER = "sentence-transformers/all-MiniLM-L6-v2"
+MODEL = "sentence-transformers/all-MiniLM-L6-v2"
+
+def mean_pooling(model_output, attention_mask):
+    token_embeddings = model_output[
+        0
+    ]  # First element of model_output contains all token embeddings
+    input_mask_expanded = (
+        attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
+    )
+    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(
+        input_mask_expanded.sum(1), min=1e-9
+    )
+
+def run_model(sentences, tokenizer, model):
+    encoded_input = tokenizer(
+        sentences, padding=True, truncation=True, return_tensors="pt"
+    )
+    # Compute token embeddings
+    with torch.no_grad():
+        model_output = model(**encoded_input)
+
+    sentence_embeddings = mean_pooling(model_output, encoded_input["attention_mask"])
+    sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)
+    return sentence_embeddings
+
+question = "Which city has the largest population in New York?"

tokenizer = AutoTokenizer.from_pretrained(TOKENIZER)
model = AutoModel.from_pretrained(MODEL)
-query_embedding = run_model(question, tokenizer, model)
-query = query_embedding.detach().cpu().numpy().tolist()[0]
+query_embedding = run_model(question, tokenizer, model).detach().cpu().numpy().tolist()[0]
```
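As a quick sanity check (not part of the commit), the query embedding should have the same dimensionality as the `embedding_dim` configured in `feature_store.yaml`; all-MiniLM-L6-v2 produces 384-dimensional sentence embeddings:

```python
# Must match online_store.embedding_dim (384) or the index cannot serve the query vector.
assert len(query_embedding) == 384
```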

-### **Retrieve the top 5 similar documents**
-First create a feature store instance, and use the `retrieve_online_documents` API to retrieve the top 5 similar documents to the specified query.
+### **Retrieve the top K similar documents**
+First create a feature store instance, and use the `retrieve_online_documents_v2` API to retrieve the top K documents most similar to the specified query.

```python
-from feast import FeatureStore
-store = FeatureStore(repo_path=".")
-features = store.retrieve_online_documents(
-    feature="city_embeddings:Embeddings",
-    query=query,
-    top_k=5
-).to_dict()
-
-def print_online_features(features):
-    for key, value in sorted(features.items()):
-        print(key, " : ", value)
-
-print_online_features(features)
+context_data = store.retrieve_online_documents_v2(
+    features=[
+        "city_embeddings:vector",
+        "city_embeddings:item_id",
+        "city_embeddings:state",
+        "city_embeddings:sentence_chunks",
+        "city_embeddings:wiki_summary",
+    ],
+    query=query_embedding,
+    top_k=3,
+    distance_metric='COSINE',
+).to_df()
```
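The call should return a pandas DataFrame with one row per retrieved chunk and the requested features as columns. A quick way to inspect what was retrieved before building the prompt (a minimal usage sketch, not part of the commit):

```python
# Peek at the retrieved context and the column names that came back.
print(context_data.head())
print(context_data.columns.tolist())
```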
+### **Generate the Response**
+Let's assume we have a base prompt and a function called `format_documents` that formats the retrieved documents, which we
+can then use to generate the response with OpenAI's chat completion API.
+```python
+FULL_PROMPT = format_documents(context_data, BASE_PROMPT)

-### Configuration
+import os
+from openai import OpenAI

-We offer [PGVector](https://github.com/pgvector/pgvector), [SQLite](https://github.com/asg017/sqlite-vec), [Elasticsearch](https://www.elastic.co) and [Qdrant](https://qdrant.tech/) as Online Store options for Vector Databases.
-
-#### Installation with SQLite
+client = OpenAI(
+    api_key=os.environ.get("OPENAI_API_KEY"),
+)
+response = client.chat.completions.create(
+    model="gpt-4o-mini",
+    messages=[
+        {"role": "system", "content": FULL_PROMPT},
+        {"role": "user", "content": question}
+    ],
+)

-If you are using `pyenv` to manage your Python versions, you can install the SQLite extension with the following command:
-```bash
-PYTHON_CONFIGURE_OPTS="--enable-loadable-sqlite-extensions" \
-LDFLAGS="-L/opt/homebrew/opt/sqlite/lib" \
-CPPFLAGS="-I/opt/homebrew/opt/sqlite/include" \
-pyenv install 3.10.14
+# This prints the generated answer. See examples/rag/milvus-quickstart.ipynb for an end-to-end example.
+print('\n'.join([c.message.content for c in response.choices]))
```
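Neither `BASE_PROMPT` nor `format_documents` is defined in this commit; they are left to the reader. A rough sketch of what they could look like (the prompt wording and the reliance on a `sentence_chunks` column are assumptions):

```python
BASE_PROMPT = (
    "You are a helpful assistant. Answer the user's question using only the "
    "documents provided below. If the answer is not in the documents, say so.\n\n"
    "Documents:\n"
)

def format_documents(context_df, base_prompt):
    # Concatenate the retrieved chunks into a single numbered context block
    # appended to the base system prompt.
    chunks = context_df["sentence_chunks"].astype(str).tolist()
    numbered = "\n".join(f"{i + 1}. {chunk}" for i, chunk in enumerate(chunks))
    return base_prompt + numbered
```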
-And you can the Feast install package via:
+
+### Configuration and Installation
+
+We offer [Milvus](https://milvus.io/), [PGVector](https://github.com/pgvector/pgvector), [SQLite](https://github.com/asg017/sqlite-vec), [Elasticsearch](https://www.elastic.co) and [Qdrant](https://qdrant.tech/) as Online Store options for Vector Databases.
+
+Milvus offers a convenient local implementation for vector similarity search. To use Milvus, you can install the Feast package with the Milvus extra.
+
+#### Installation with Milvus

```bash
-pip install feast[sqlite_vec]
+pip install feast[milvus]
```
-
#### Installation with Elasticsearch

```bash
@@ -143,3 +209,17 @@ pip install feast[elasticsearch]
```bash
pip install feast[qdrant]
```
+#### Installation with SQLite
+
+If you are using `pyenv` to manage your Python versions, you can install the SQLite extension with the following command:
+```bash
+PYTHON_CONFIGURE_OPTS="--enable-loadable-sqlite-extensions" \
+LDFLAGS="-L/opt/homebrew/opt/sqlite/lib" \
+CPPFLAGS="-I/opt/homebrew/opt/sqlite/include" \
+pyenv install 3.10.14
+```
+
+And you can install the Feast package via:
+```bash
+pip install feast[sqlite_vec]
+```

docs/reference/online-stores/milvus.md

Lines changed: 3 additions & 2 deletions
@@ -43,7 +43,7 @@ The set of functionality supported by online stores is described in detail [here
Below is a matrix indicating which functionality is supported by the Milvus online store.

| | Milvus |
-| :-------------------------------------------------------- |:-------|
+|:----------------------------------------------------------|:-------|
| write feature values to the online store | yes |
| read feature values from the online store | yes |
| update infrastructure (e.g. tables) in the online store | yes |
@@ -59,6 +59,7 @@ Below is a matrix indicating which functionality is supported by the Milvus onli
| support for deleting expired data | yes |
| collocated by feature view | no |
| collocated by feature service | no |
-| collocated by entity key | yes |
+| collocated by entity key | no |
+| vector similarity search | yes |

To compare this set of functionality against other online stores, please see the full [functionality matrix](overview.md#functionality-matrix).
