How to handle updating Documents associated with a single source in vector stores? #452
Replies: 2 comments
-
What I've found useful for managing vector data is to keep an association to the original docs in a relational DB. If a large document is split into many smaller AI documents, save a reference to the original alongside the vector IDs. That way you can handle CRUD operations on the original docs and simply delete the old vector DB entries and replace them with new vectors. It's not a perfect solution, but I'm currently working with Weaviate and, as far as I can tell, I can only really delete by ID. I guess that since changed content has to be re-embedded anyway, updating a vector in place isn't really possible.
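A minimal sketch of that bookkeeping, assuming a store that can only add and delete by ID. `ChunkStore`, `InMemoryChunkStore`, and `SourceTrackedStore` are hypothetical names made up for illustration, not part of any vector store library, and the in-memory map stands in for the relational table:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for a vector store that only supports add and delete-by-ID.
interface ChunkStore {
    void add(String id, String text);
    void delete(List<String> ids);
}

// In-memory implementation so the sketch runs without a database.
class InMemoryChunkStore implements ChunkStore {
    final Map<String, String> chunks = new HashMap<>();

    public void add(String id, String text) { chunks.put(id, text); }

    public void delete(List<String> ids) { ids.forEach(chunks::remove); }
}

// Remembers which vector IDs belong to which original document.
// Assumption: in a real setup this map would be a table (source_id, vector_id).
class SourceTrackedStore {
    private final ChunkStore store;
    private final Map<String, List<String>> idsBySource = new HashMap<>();

    SourceTrackedStore(ChunkStore store) { this.store = store; }

    // Drops any chunks previously stored for this source, then stores the new ones.
    List<String> replace(String sourceId, List<String> chunkTexts) {
        delete(sourceId);
        List<String> newIds = new ArrayList<>();
        for (String text : chunkTexts) {
            String id = UUID.randomUUID().toString();
            store.add(id, text);
            newIds.add(id);
        }
        idsBySource.put(sourceId, newIds);
        return newIds;
    }

    // Removes every chunk recorded for this source; does nothing if the source is unknown.
    void delete(String sourceId) {
        List<String> oldIds = idsBySource.remove(sourceId);
        if (oldIds != null) {
            store.delete(oldIds);
        }
    }
}
```

Calling `replace("page-1", chunks)` a second time removes the earlier chunks for `page-1` before adding the new ones, so an edited source never leaves stale vectors behind. With a real database, the mapping update and the vector store calls would ideally be kept consistent, for example by recording the new IDs only after the store accepts them.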
-
You could potentially store the document source as metadata on your documents, and search or delete by metadata if your particular DB supports that.
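To illustrate the idea only: the sketch below tags each chunk with a `source` metadata key and deletes by that key. `TaggedChunk` and `MetadataChunkStore` are hypothetical in-memory stand-ins; whether and how a real vector DB exposes delete-by-metadata depends on the DB.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical chunk: text plus free-form metadata such as the source page ID.
final class TaggedChunk {
    final String id;
    final String text;
    final Map<String, String> metadata;

    TaggedChunk(String id, String text, Map<String, String> metadata) {
        this.id = id;
        this.text = text;
        this.metadata = metadata;
    }
}

// In-memory stand-in for a store that supports filtering on metadata.
class MetadataChunkStore {
    private final List<TaggedChunk> chunks = new ArrayList<>();

    void add(TaggedChunk chunk) { chunks.add(chunk); }

    // Removes every chunk whose metadata maps key to value; returns how many were removed.
    int deleteByMetadata(String key, String value) {
        int before = chunks.size();
        chunks.removeIf(c -> value.equals(c.metadata.get(key)));
        return before - chunks.size();
    }

    int size() { return chunks.size(); }
}
```

With this shape no separate mapping table is needed: replacing a page becomes `deleteByMetadata("source", pageId)` followed by adding the re-split chunks with the same `source` value.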
-
I am developing a RAG application that uses an Intranet as its source of Documents. When I load the content, a single Intranet page may be broken up into multiple Documents, and those Documents get stored in the vector DB. But the API doesn't appear to allow updating Documents based on the Intranet page they came from.
Is this a design choice, or are there thoughts on adding something like a correlation ID, where all Documents associated with a single Intranet page share the same ID?
Maybe it would work like this:
vectorStore.add(correlationId, documents);
vectorStore.replace(correlationId, documents);
vectorStore.delete(correlationId);
I'm currently using the PGVector VectorStore, and as a workaround I am maintaining a second table that tracks my correlation ID (the Intranet page ID) and the VectorStore IDs that go along with it.
Are there other ways to handle this with the current API?