
Hands-on-Tech: Choosing a vector database for querifai.ai's RAG application

Discover the options for vector databases in RAG applications, with a focus on MongoDB's and Pinecone's features, costs, and practical use.
We at querifai had to choose the best-fitting vendor for a database that supports vector search to power our generative AI applications. In this guide, we discuss the cost-effectiveness and ease of use of different options, ultimately highlighting the choice we prefer for managing complex data in RAG applications.

What is Retrieval-Augmented Generation (RAG)?

RAG, short for Retrieval-Augmented Generation, is a technology in the field of Artificial Intelligence. It enables smart assistants by combining retrieval with generative AI. The idea behind RAG is to find information in a database that is semantically similar to the user query, add this information to the prompt, and let a language model formulate an answer based on it.
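As a minimal sketch of that loop in Python (embed, vector_db, and llm are hypothetical stand-ins for an embedding model, a vector database client, and a language model, not a specific API):

    # Minimal RAG loop: embed the query, retrieve similar text, ground the answer.
    def retrieve_and_generate(query, embed, vector_db, llm, top_k=3):
        query_vector = embed(query)                     # turn the question into a vector
        chunks = vector_db.search(query_vector, top_k)  # fetch semantically similar text
        context = "\n".join(chunks)
        prompt = f"Answer based only on this context:\n{context}\n\nQuestion: {query}"
        return llm(prompt)                              # the model answers from the context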

RAG applications are significant because they help automate and improve tasks that involve dealing with large amounts of information. This includes answering customer queries, supporting research, automating workflows, or even making recommendations. They are fast and efficient and reduce the likelihood of errors.

Understanding Vector Databases in RAG Applications

In RAG applications, information like text, images, or other data types is converted into a numerical format: vectors. Vector databases are designed to store and manage this high-dimensional vector data. Unlike traditional databases that handle simpler data types like numbers and strings, vector databases are optimized for fast searches over complex, multi-dimensional data. When a query is made, the RAG system uses the vector database to quickly find and retrieve the vectors most similar to the query vector. A fast implementation of this search is essential for an effectively functioning RAG application.
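To make "most similar" concrete: vector databases typically rank results by a metric such as cosine similarity, the dot product of two vectors divided by the product of their lengths. A plain-Python illustration (the example vectors are made up):

    import math

    def cosine_similarity(a, b):
        # Values close to 1 mean the vectors (and hence the contents) are very similar.
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b)

    print(cosine_similarity([1.0, 0.0], [0.9, 0.1]))  # ~0.99: nearly identical direction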

Exploring Popular Vector Database Options

Selecting the appropriate vector database is a critical decision in developing RAG applications. This choice impacts not just the application's performance but also its efficiency, scalability, and cost-effectiveness.

There are many popular vector databases, such as Pinecone, MongoDB, Chroma, Weaviate, and Qdrant. This guide, however, focuses on a comparative analysis of MongoDB and Pinecone.

The page https://benchmark.vectorview.ai/vectordbs.html provides a benchmark of Pinecone and other vector databases; of these, we looked at Pinecone in particular.

While MongoDB offers a more general-purpose database solution with added capabilities for handling vector data, Pinecone is specifically made for handling vector data for RAG applications.

Database as a Service (DBaaS) and Its Advantages

DBaaS is a cloud-based data storage and management approach that allows developers to access and use database functionality without the complexities of setting up and deploying the database infrastructure.

The integration of DBaaS in vector databases offers significant advantages in data management and analytics:

  • Scalability and flexibility
  • Reliability and availability
  • Ease of use, with no maintenance burden

querifai.ai uses DBaaS vendors for these reasons.

Both Pinecone and MongoDB are available as DBaaS, so they are both an excellent fit for our stack.

Assessing Pinecone's Fit

Challenges with Separate Vector Storage

Since Pinecone is optimized solely for vector storage, all other application metadata needs to be stored in a separate database. The Pinecone index accepts metadata, but only up to 40 KB per vector.

Thus, one of the primary challenges of using a dedicated vector storage system is the complexity involved in syncing data across multiple databases. This poses a threat to data consistency and creates performance overhead.

Vector Storage Costs

Indexing is the process of organizing data to enable quick retrieval. Pinecone utilizes an indexing technique known as Approximate Nearest Neighbour (ANN) search, which allows for quick retrieval of vectors that are similar to a given query vector. However, indexing can be resource-intensive: RAG applications often require handling and indexing large volumes of data, and the costs can escalate quickly.

In Pinecone's architecture, the term "pod" refers to a configured hardware unit that runs the Pinecone server. Pods are the units of cloud resources, with allocated virtual CPUs, RAM, and disk space, that supply storage and computational power for every index. Each index is tied to one or more pods. This structural decision can increase costs, especially for applications requiring multiple indices, and was one of the main reasons for us to look for alternatives.

Pinecone does allow indices to be partitioned by namespaces, so a single index (and thus a single pod) can be reused across multiple applications. During implementation, however, we found the design choice of sharing one index across different applications rather unnatural. We therefore used several indices (if only for the different environments: Test, Production, …), which contributes significantly to the monthly costs of our stack.

MongoDB's Solution to This Challenge

MongoDB, a popular NoSQL database, addresses the challenges discussed above with its comprehensive approach. Many organizations, including ours, already host their metadata on MongoDB. This existing integration simplifies the process of implementing vector search, as there's no need to transfer data to a new system.

However, it is worth noting that MongoDB excels at rapidly storing and querying large volumes of structured or semi-structured data, whereas Pinecone is tailored for similarity searches over large vector datasets. MongoDB supports vector dimensions up to 2,048, while Pinecone can handle up to 20,000 dimensions, making Pinecone more suitable for high-dimensional vector storage.

Cost-Effective Solution

MongoDB allows the creation of multiple indices within a single collection. This is an advantage for applications that require diverse indexing strategies to optimize performance across different query types. Creating numerous indices on a single collection at no additional cost is a major advantage, and contrasts with Pinecone's model, where each index requires a separate pod, ultimately increasing expenses.

In our case, using a Pinecone vector database across four environments (development global, local, test, and production) incurs significant costs, as each index requires a separate pod per environment. The smallest Pinecone pod costs ~$70 per month, so maintaining a single index across all four environments costs ~$280. In contrast, MongoDB allows multiple indices within a single collection, and environment-specific databases can be deployed on a single Atlas instance. Beyond that, the core feature of querifai is the comparison of different AI models, and different models produce vectors of different dimensions, requiring different indices as well, which further increases cost and complexity.

Overall, the pricing structure makes MongoDB a more cost-effective and scalable option for querifai.ai.

Practical Implementation: MongoDB vs. Pinecone

Finally, let us illustrate some sample code for an exemplary application that involves index setup and querying. The syntax of MongoDB and Pinecone is rather different, since indexing and searching on metadata differ substantially between the two.

If you plan to query the vector index combined with a filter, this filter needs to be explicitly defined in the index in the MongoDB case, while on Pinecone the developer can use basic Mongo-style syntax to restrict the query to a certain field in the metadata.

So, let us assume that we have put embeddings of separate documents into a collection and plan to query them file by file, e.g. for a bot that should answer questions based on the contents of a single file.

To support such an application, a compound index needs to be defined in MongoDB, including the field "file_uri". Since "file_uri" is of type string, the syntax for the index setup is "token"; see below.

Syntax for Indexing on Mongo

The following JSON structure outlines an indexing schema for MongoDB. It includes:

  • mappings: Configures how document fields are mapped and indexed.
  • dynamic: Determines if fields are indexed automatically.
  • fields: Lists specific fields to be indexed.
  • openai_embeddings: An array that defines how the openai_embeddings field in the documents is indexed.
  • type: Defines the indexing method.
  • dimensions: Specifies the size of the vector.
  • similarity: Indicates the metric used for vector comparison.
  • file_uri: A field intended for tokenized text indexing.
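A sketch of such a definition, assuming 1,536-dimensional OpenAI embeddings with cosine similarity and the knnVector field type (these values are illustrative, not our exact production setup):

    {
      "mappings": {
        "dynamic": false,
        "fields": {
          "openai_embeddings": [
            {
              "type": "knnVector",
              "dimensions": 1536,
              "similarity": "cosine"
            }
          ],
          "file_uri": {
            "type": "token"
          }
        }
      }
    }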

Syntax for Indexing on Pinecone

For Pinecone, the index setup is less complex, because metadata does not become part of the index. In the Pinecone create_index function, the fields are:

  • name: Identifies the index.
  • dimension: Specifies the vectors’ dimension.
  • metric: Selects the similarity measurement method.
  • pods: Sets the quantity of computing resources.
  • pod_type: Defines the category of computing resources.
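A sketch using the classic pod-based Pinecone client (the API key, environment, and parameter values are placeholders; the dimension must match the embedding model):

    import pinecone

    pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")

    # One pod of the smallest type backs this index; metadata needs no schema here.
    pinecone.create_index(
        name="azure-embeddings-openai",  # identifies the index
        dimension=1536,                  # must match the embedding model's output
        metric="cosine",                 # similarity measurement method
        pods=1,                          # quantity of computing resources
        pod_type="p1.x1",                # category (and size) of computing resources
    )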

Query Syntax and Pipeline in MongoDB

The syntax to query the collection in MongoDB is then an aggregation pipeline, with the filter on the file_uri.

In the MongoDB $vectorSearch pipeline, the fields are:

  • $vectorSearch: Executes an approximate nearest neighbor vector search.
  • index: Names the Atlas Vector Search index for the query.
  • path: Identifies the document field containing the searchable vector data.
  • filter: Optional criteria for pre-filtering documents using expressions.
  • queryVector: The vector whose nearest neighbours should be found.
  • numCandidates: Specifies the number of potential matches to consider.
  • limit: Limits the number of search results returned.
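A sketch of such a pipeline with pymongo (connection string, database, collection, and index names are illustrative; query_vector stands in for the embedded user query):

    from pymongo import MongoClient

    client = MongoClient("YOUR_CONNECTION_STRING")
    collection = client["rag_db"]["documents"]  # illustrative database and collection

    file_uri = "docs/report.pdf"   # the single file to answer from
    query_vector = [0.01] * 1536   # placeholder; normally the embedding of the question

    pipeline = [
        {
            "$vectorSearch": {
                "index": "default",                         # Atlas Vector Search index name
                "path": "openai_embeddings",                # field holding the vectors
                "filter": {"file_uri": {"$eq": file_uri}},  # pre-filter to one file
                "queryVector": query_vector,                # vector to find neighbours for
                "numCandidates": 100,                       # candidates considered by ANN
                "limit": 8,                                 # results returned
            }
        }
    ]
    results = list(collection.aggregate(pipeline))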

Query Syntax in Pinecone

This Python code snippet demonstrates the use of Pinecone for executing a query with metadata filtering. After initializing Pinecone with an API key and environment, it accesses an index named 'azure-embeddings-openai'. The query searches for the top 8 vectors closest to the specified vector, applying a metadata filter on the field 'file_uri'. The code sets both include_values=True, to return the vector values, and include_metadata=True, to include the associated metadata of each returned vector, providing a comprehensive result set based on both vector similarity and metadata criteria.
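A sketch of that snippet with the classic pod-based client (the API key, environment, and query vector are placeholders):

    import pinecone

    pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")
    index = pinecone.Index("azure-embeddings-openai")

    file_uri = "docs/report.pdf"   # the single file to answer from
    query_vector = [0.01] * 1536   # placeholder; normally the embedding of the question

    result = index.query(
        vector=query_vector,
        top_k=8,                                 # the 8 closest vectors
        filter={"file_uri": {"$eq": file_uri}},  # Mongo-style metadata filter
        include_values=True,                     # return the vector values
        include_metadata=True,                   # return the stored metadata
    )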

Conclusion: Why MongoDB is the Preferable Choice for Us

During the development of the querifai.ai platform, we had to choose a suitable vector database. Initially considering Pinecone, we ultimately selected MongoDB for the advantages we discussed. MongoDB's ability to create multiple vector indexes within a single collection makes it a more cost-effective choice for our operations model. Its straightforward approach to querying and building indexes was key in addressing a major development challenge, allowing us to provide a technically proficient, scalable, and cost-effective AI solution.

Sign up now on our platform to discover and compare off-the-shelf AI models tailored to your needs and business in just a few clicks!