Race condition when using upsert_text_artifacts with the meta kwarg to upsert multiple chunks

If [BaseVectorStoreDriver.upsert_text_artifacts](https://github.com/griptape-ai/griptape/blob/1074cc7b3e71bbbed00ad94b240ce0e47e051645/griptape/drivers/vector/base_vector_store_driver.py#L41-L72) is used to upsert multiple chunks, and a dict is provided via its `meta` kwarg, then the `TextArtifact` stored in the `meta` column in the embeddings table is not guaranteed to be the artifact used to generate the embedding vector.

To reproduce (requires OpenAI credentials and a local PostgreSQL instance with pgvector):

```python
#!/usr/bin/env python

from griptape.chunkers import TextChunker
from griptape.drivers.vector.pgvector import PgVectorVectorStoreDriver
from griptape.drivers.embedding.openai import OpenAiEmbeddingDriver

# Prepare external deps
embedding_driver = OpenAiEmbeddingDriver(model="text-embedding-3-small")
vector_store = PgVectorVectorStoreDriver(
    connection_string="postgresql://localhost:5432/test_db",
    embedding_driver=embedding_driver,
    table_name="test_embeddings",
)
vector_store.setup()

test_text = """
This is some content for testing embeddings.
It spans multiple lines.
It is otherwise quite uninteresting.
"""

chunker = TextChunker(max_tokens=10)
chunks = chunker.chunk(test_text)

for i, chunk in enumerate(chunks):
    print(f"Chunk {i}: {chunk.to_text()}")

print(f"Upserting {len(chunks)} chunks...")
vector_store.upsert_text_artifacts(chunks, meta={"metadata_field": "metadata_value"})

print("Done!")
```

In the database we end up with:

```sql
test_db=# select (vector::float4[])[0:3],meta->'artifact' from test_embeddings;
                 vector                  |                                                                                                 ?column?                                                                                                 
-----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 {0.015975304,-0.01191094,0.0093917055}  | "{\"type\": \"TextArtifact\", \"id\": \"747c4fff2f00453dbedc64cb7b14b28e\", \"reference\": null, \"meta\": {}, \"name\": \"747c4fff2f00453dbedc64cb7b14b28e\", \"value\": \"It spans multiple lines.\"}"
 {-0.012808571,0.011958273,0.09028612}   | "{\"type\": \"TextArtifact\", \"id\": \"747c4fff2f00453dbedc64cb7b14b28e\", \"reference\": null, \"meta\": {}, \"name\": \"747c4fff2f00453dbedc64cb7b14b28e\", \"value\": \"It spans multiple lines.\"}"
 {-0.041691482,0.023411594,-0.032732744} | "{\"type\": \"TextArtifact\", \"id\": \"747c4fff2f00453dbedc64cb7b14b28e\", \"reference\": null, \"meta\": {}, \"name\": \"747c4fff2f00453dbedc64cb7b14b28e\", \"value\": \"It spans multiple lines.\"}"
(3 rows)
```

Each row has a different vector, but the `TextArtifact` stored in the `meta` column is the same chunk in every row.

I spent a bit of time debugging and I think the following is a likely explanation:

1. `upsert_text_artifacts` executes `BaseVectorStoreDriver.upsert_text_artifact` using worker threads.
2. Each thread adds its `TextArtifact` to the `meta` dict via [`meta["artifact"] = artifact.to_json()`](https://github.com/griptape-ai/griptape/blob/1074cc7b3e71bbbed00ad94b240ce0e47e051645/griptape/drivers/vector/base_vector_store_driver.py#L92).
3. Because every thread receives the same dict object, each write overwrites the `artifact` key for all threads.
4. By the time a thread sends the `meta` dict to the vector store, the `artifact` field may contain a `TextArtifact` written by a different thread.
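
The steps above can be imitated without griptape, OpenAI, or PostgreSQL. This is only a minimal sketch of the suspected mechanism, not the library's code: `simulate_upsert` and `worker` are hypothetical names, and a `threading.Barrier` stands in for the slow embedding call so that every thread writes to the dict before any thread reads it back.

```python
import threading


def simulate_upsert(chunks, meta=None):
    """Hypothetical stand-in for upsert_text_artifacts: one thread per chunk."""
    rows = []
    rows_lock = threading.Lock()
    # Assumption: the barrier plays the role of the embedding request, so
    # all writes to meta happen before any thread reads its value back.
    barrier = threading.Barrier(len(chunks))

    def worker(chunk):
        # Reuse the caller's dict if one was given, otherwise create a new one.
        local_meta = {} if meta is None else meta
        local_meta["artifact"] = chunk
        barrier.wait()
        with rows_lock:
            # Record (chunk that was embedded, artifact that would be stored).
            rows.append((chunk, local_meta["artifact"]))

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return rows


chunks = ["chunk-a", "chunk-b", "chunk-c"]

# Shared dict: every row ends up with the same stored artifact
# (whichever thread happened to write last).
rows_shared = simulate_upsert(chunks, meta={"metadata_field": "metadata_value"})

# No dict passed: each thread builds its own, so stored artifacts match.
rows_fresh = simulate_upsert(chunks)

for chunk, stored in rows_shared:
    print(f"shared meta: embedded={chunk!r} stored={stored!r}")
for chunk, stored in rows_fresh:
    print(f"fresh meta:  embedded={chunk!r} stored={stored!r}")
```

With the shared dict, the stored artifact is identical across rows, which matches the database output above. In the real driver the interleaving depends on timing, so the mismatch may affect only some rows.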

If the `meta` dict is omitted when calling `upsert_text_artifacts` then everything works as expected because [each thread creates its own `meta` dict](https://github.com/griptape-ai/griptape/blob/1074cc7b3e71bbbed00ad94b240ce0e47e051645/griptape/drivers/vector/base_vector_store_driver.py#L83).
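
One possible direction for a fix is to copy the caller's dict per artifact instead of mutating it. The sketch below is an assumption about how that could look, not the actual griptape implementation: `build_row` is a hypothetical helper, and a shallow copy is assumed to be enough because only the top-level `artifact` key is written.

```python
from concurrent.futures import ThreadPoolExecutor


def build_row(chunk, meta=None):
    """Hypothetical helper: build per-artifact metadata without touching meta."""
    # A new dict is created on every call, so threads never share state.
    row_meta = {**(meta or {}), "artifact": chunk}
    return chunk, row_meta


shared_meta = {"metadata_field": "metadata_value"}
chunks = ["chunk-a", "chunk-b", "chunk-c"]

with ThreadPoolExecutor() as pool:
    fixed_rows = list(pool.map(lambda c: build_row(c, shared_meta), chunks))

for chunk, row_meta in fixed_rows:
    print(f"embedded={chunk!r} stored={row_meta['artifact']!r}")
```

Each row keeps the caller's fields alongside its own artifact, and the caller's dict is left without an `artifact` key.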
