Beyond RAG Ingestion: A Smarter Approach to Context Retrieval
Everyone is racing to ingest data for RAG - but it isn't always optimal for accurate/efficient retrieval. Learn when you should leverage API-based query-time retrieval instead.
Brian Yam, Head of Marketing
12 mins to read
RAG (retrieval augmented generation) picked up a lot of steam last year, and it has resulted in a race to "ingest everything". But this indiscriminate ingestion strategy that AI companies are adopting has led to more problems than it solves - inaccurate retrieval being the biggest challenge.
We’ve seen this happen with a lot of our enterprise AI customers.
They tell us “We want to ingest all of our users’ external data. Google Drive to start, but after that, we want Salesforce, Jira, Confluence, Slack, and dozens more.”
But after we helped these customers go live with their 3rd-party connectors and ingestion pipelines, they consistently came back to us for advice: “Do you have any tips on retrieval accuracy? We can’t seem to get our product to retrieve the right sales data from Salesforce.”
While optimizing RAG retrieval accuracy is a whole science of its own, one that many AI companies consider their secret sauce, we noticed that one key decision was often overlooked:
“What data should we NOT ingest?”
After working with all of these AI companies on their RAG & retrieval strategies, we’ve put together our learnings and a framework for navigating this decision.
TL;DR
Traditional querying is much more effective at accessing the necessary structured data than a semantic search
More data does not always result in better retrieval
Ingesting data for RAG comes with permissions management overhead
In most cases, you should ingest unstructured data for RAG retrieval, and query APIs at runtime for structured data
AI agent tool calling is critical for enabling an optimized data and context retrieval strategy
Companies like Intercom are starting to build in this direction
Why RAG ingestion is not always the answer
First, let’s dive into a few of the reasons why ingesting all of your users’ data into a vector store for query-time retrieval is not always the best approach.
Vector stores don’t respect structure
While vector search in RAG is powerful for finding semantic similarities in unstructured text, it is not the right tool for searching through structured data. Structured systems like CRMs, Shopify inventory, or ticket management systems are already designed for precision search.
For example, imagine an end-user asking about customer John Smith. A vector search might return semantically similar records (Jane Smith, John Smitten, etc.) when an exact database lookup is what’s actually needed. On top of that, users are likely to ask questions that require computation over structured data, such as "show our total net new revenue from the last 30 days". Questions like this are trivial for a relational database to answer, but become convoluted and unreliable when attempted through vector similarity. By ingesting structured data into vector stores, you’re essentially trying to reinvent database functionality that already exists, and doing it less effectively.
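To make the contrast concrete, here is a minimal sketch of the two questions above answered with plain SQL, using Python’s built-in sqlite3 module. The schema, table names, and rows are hypothetical and invented purely for illustration; a real CRM would expose this data through its own API or query language.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE deals (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL,
        closed_on TEXT  -- ISO 8601 date
    );
    INSERT INTO customers VALUES
        (1, 'John Smith'), (2, 'Jane Smith'), (3, 'John Smitten');
    INSERT INTO deals VALUES
        (1, 1, 1200.0, date('now', '-3 days')),
        (2, 2, 800.0,  date('now', '-10 days')),
        (3, 3, 500.0,  date('now', '-45 days'));
""")

# Exact lookup: only the record that matches, never a near neighbour.
exact = conn.execute(
    "SELECT id, name FROM customers WHERE name = ?", ("John Smith",)
).fetchall()

# Aggregation: the database does the arithmetic, not the LLM.
(revenue_30d,) = conn.execute(
    "SELECT COALESCE(SUM(amount), 0) FROM deals "
    "WHERE closed_on >= date('now', '-30 days')"
).fetchone()

print(exact)        # [(1, 'John Smith')]
print(revenue_30d)  # 2000.0
```

Neither "Jane Smith" nor "John Smitten" can leak into the first result, and the 45-day-old deal is excluded from the total by the date filter rather than by similarity ranking.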
More data != Better retrieval
As your vector store grows, maintaining retrieval accuracy becomes harder if much of what you add is noise. More data and context generally improve response quality, but if you introduce structured data that isn’t fit for a semantic search model, you are more likely to run into semantic collisions and false positives (retrieving context that isn’t actually relevant). Techniques like hybrid search, which combines semantic search and keyword search, can help, but larger search spaces still make finding the right information harder and less efficient.
Permissions complexity
When you ingest all of your customers’ business data, you also need to ingest and reconcile their permissions metadata. This creates a parallel permissions system that must be kept in sync with your source systems, which can be very difficult to manage. If you’re curious how this can be implemented, check out this tutorial on implementing 3rd-party permissions in your RAG application.
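As a rough sketch of what that parallel permissions system implies at retrieval time: every ingested chunk carries a copy of the source system’s access list, and results are filtered against the querying user before they reach the LLM. The `Chunk` class, the `allowed_principals` field, and the principal strings below are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    source_id: str
    # Copy of the source system's ACL, which must be re-synced on every change.
    allowed_principals: set[str] = field(default_factory=set)


def filter_by_permissions(candidates, user_principals):
    """Keep only chunks that share at least one principal with the user."""
    return [c for c in candidates if c.allowed_principals & user_principals]


# Pretend these came back from a vector search.
candidates = [
    Chunk("Q3 board deck notes", "drive:1", {"group:execs"}),
    Chunk("Onboarding guide", "drive:2", {"group:all-staff"}),
]

visible = filter_by_permissions(candidates, {"user:ana", "group:all-staff"})
print([c.source_id for c in visible])  # ['drive:2']
```

The filter itself is simple. The hard part is keeping `allowed_principals` accurate as sharing settings change in the source application.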
Stale data problems
Your customers’ external data is changing constantly. For your AI product to retrieve accurate data, your pipelines need to apply every change in the external source to your vector store, and handle any errors that occur along the way. Depending on the requirements of your product and use case, you may need to keep your database in sync within seconds of a change occurring in the 3rd-party application, which is a very difficult engineering challenge.
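Here is a minimal sketch of the kind of incremental sync loop this implies, assuming the source exposes an `updated_at` timestamp you can filter on (many APIs do, but not all). The function names, record shape, and sample data are hypothetical. A production pipeline would also need to handle deletions, rate limits, and records that share the same timestamp.

```python
def sync_once(fetch_changes, upsert, cursor):
    """Apply every change newer than `cursor`; return the cursor to resume from."""
    new_cursor = cursor
    changes = sorted(fetch_changes(cursor), key=lambda r: r["updated_at"])
    for record in changes:
        try:
            upsert(record)  # a real pipeline would also re-chunk and re-embed here
        except Exception as exc:
            # Stop without advancing, so the failed record is retried next run.
            print(f"sync paused at {record['id']}: {exc}")
            break
        new_cursor = record["updated_at"]
    return new_cursor


# Hypothetical change feed standing in for a 3rd-party API.
SOURCE = [
    {"id": "doc-1", "updated_at": "2024-05-01T10:00:00Z", "title": "Pricing v1"},
    {"id": "doc-2", "updated_at": "2024-05-02T09:30:00Z", "title": "Roadmap"},
    {"id": "doc-1", "updated_at": "2024-05-03T08:00:00Z", "title": "Pricing v2"},
]


def fetch_changes(since):
    # ISO 8601 timestamps in the same format compare correctly as strings.
    return [r for r in SOURCE if r["updated_at"] > since]


store = {}


def upsert(record):
    store[record["id"]] = record


cursor = sync_once(fetch_changes, upsert, "2024-05-01T12:00:00Z")
print(cursor, store["doc-1"]["title"])
```

Even this toy version has to decide what happens on failure and where to resume, and that logic has to exist for every connector you ingest from.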
Infrastructure maintenance costs
Building resilient data & permissions ingestion and pre-processing pipelines across dozens of external applications can get very expensive from an engineering perspective, not just to build, but also to maintain. The more of that work you can avoid, the faster you’ll be able to ship core product functionality.
Compliance and data governance
If you are selling into highly regulated industries or enterprises that have stringent data governance requirements, you need to account for all of those policies and regulations across your ingested data. As an example, we have customers like Pryon whose AI products are used by the government. For companies like that, where possible, query-time retrieval offers a simpler compliance story by keeping data within already-authorized systems. As for unstructured data sources where ingestion is inevitable, they need to enforce the source systems’ data retention policies and maintain clear data lineage.
Now let’s talk about why query-time retrieval is a better approach when possible.
Query-Time Retrieval for Structured Data
The prerequisite here is that the data source being pulled from at query-time must be structured in nature and accessible via API. As hinted earlier, most system-of-record SaaS applications fit this bill, including CRMs, e-commerce platforms, project management tools, and even HR systems.
Data is always up-to-date
When you query external data sources via their APIs, you always get the most up-to-date data available: changes made in the source system are reflected immediately.
Permissions are inherited via authentication
Permissions will be enforced by the 3rd-party API via each of your users’ 3rd-party credentials, which means you won’t need to reconstruct your entire permissions model to support that specific third-party data source. If your AI product uses a user’s credentials to request data that user doesn’t have access to, the 3rd-party API will typically return either an error or an empty array.
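A minimal sketch of what this looks like in code, with both outcomes handled. The `http_get` callable, the `/opportunities` path, the response shape, and the tokens are all assumptions made for this example. The fake API below stands in for a real provider so the snippet runs without network access.

```python
class PermissionDenied(Exception):
    """Raised when the 3rd-party API rejects the user's credentials."""


def fetch_for_user(http_get, path, user_token):
    """Call the API as the end user, so the provider enforces their permissions."""
    status, body = http_get(path, headers={"Authorization": f"Bearer {user_token}"})
    if status in (401, 403):
        raise PermissionDenied(f"{path}: access denied for this user")
    if status != 200:
        raise RuntimeError(f"{path}: unexpected status {status}")
    return body.get("records", [])


def fake_http_get(path, headers):
    """Hypothetical stand-in for a real HTTP client and provider."""
    token = headers["Authorization"].removeprefix("Bearer ")
    if token == "manager-token":
        return 200, {"records": [{"id": "opp-1", "amount": 5000}]}
    if token == "rep-token":
        return 200, {"records": []}  # authenticated, but no visible rows
    return 403, {"error": "forbidden"}


print(fetch_for_user(fake_http_get, "/opportunities", "manager-token"))
print(fetch_for_user(fake_http_get, "/opportunities", "rep-token"))
```

Your application never decides who can see what. It only has to pass the right user’s token and surface a sensible message when the result is empty or denied.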
No ingestion/storage needed
Since your product will be querying the 3rd-party API every time data from the specific source is required, there is no need to store it in a database on your end. This saves your team from needing to ingest, store, and maintain all of your customers’ external structured data on your end.
Exceptions to Query-Time Retrieval for Structured Data
While most AI applications should optimize towards query-time retrieval for structured data, there are specific use cases in which it may not be the optimal implementation.
Query complexity
There are specific scenarios where this query-time retrieval approach breaks down, even with structured data. One significant limitation is the need to support complex analytical queries that compute across large datasets.
While certain enterprise applications like Salesforce would be an exception, as they have a SOQL endpoint that enables more complex querying, most application APIs do not.
[Screenshot: our AI agent playground for ActionKit, where we tested how far we could push the limits of the actions.]
In general, most third-party APIs are designed for simple CRUD operations and record lookups—they rarely support the kind of sophisticated querying needed for in-depth analysis.
In fact, we have multiple customers building AI analyst-type products that need to process natural language queries about their customers’ business data. A question like “Show me the win rates of deals across different company size and funding ranges” would require joining and analyzing data across multiple objects. This simply wouldn’t be possible to do via a single API call, and would require pulling excessive amounts of data into the context window, which can introduce even more complexity and error.
For these types of use cases, the optimal approach is to ingest and keep data in sync from those specific external data sources, albeit in your own structured database, not a vector database.
To be clear, this isn't an ingestion strategy for RAG in the traditional sense—you're not storing this data in vector embeddings. Instead, you're maintaining a proper relational database that can be queried precisely and efficiently. This pattern works particularly well when:
Queries require complex joins across multiple data objects or sources
Analysis needs to process large amounts of historical data
Real-time updates aren't critical for the use case
The external APIs have limited query capabilities
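As an illustration of the first point, here is a sketch of the win-rate question from earlier answered against a synced relational copy of the data. The tables, columns, and rows are hypothetical, and only company size is shown. Funding range would be one more column in the `GROUP BY`.

```python
import sqlite3

# Hypothetical synced copy of CRM data; schema invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, size_band TEXT);
    CREATE TABLE deals (id INTEGER PRIMARY KEY, account_id INTEGER, stage TEXT);
    INSERT INTO accounts VALUES (1, '1-50'), (2, '1-50'), (3, '51-500');
    INSERT INTO deals VALUES
        (1, 1, 'won'), (2, 1, 'lost'), (3, 2, 'won'), (4, 2, 'won'),
        (5, 3, 'lost'), (6, 3, 'won');
""")

# Join deals to accounts, then compute the share of won deals per size band.
rows = conn.execute("""
    SELECT a.size_band,
           ROUND(AVG(CASE WHEN d.stage = 'won' THEN 1.0 ELSE 0.0 END), 2) AS win_rate
    FROM deals AS d
    JOIN accounts AS a ON a.id = d.account_id
    GROUP BY a.size_band
    ORDER BY a.size_band
""").fetchall()

print(rows)  # [('1-50', 0.75), ('51-500', 0.5)]
```

An agent can generate or parameterize a query like this and receive a handful of rows back, instead of paging thousands of deal records through an API into the context window.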
Latency
Latency should not be a concern in most use cases. In fact, we have customers equipping their AI voice support agent products with tools to query Salesforce within an AI-led customer call without any issues. But if you had extremely strict latency requirements, ingesting and caching that data would result in faster retrieval at query-time. Generally, most applications can tolerate a few hundred milliseconds of API latency, especially if it means getting fresher, more accurate data.
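If you do land in that corner case, a short-lived cache in front of the API call is often a lighter first step than full ingestion. The sketch below is one possible shape for it, not a recommendation of specific values: the `TTLCache` class, the 30-second TTL, and the cache key are all assumptions for illustration.

```python
import time


class TTLCache:
    """Serve a cached value until it is older than `ttl_seconds`."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (stored_at, value)

    def get_or_fetch(self, key, fetch):
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh enough, skip the API round trip
        value = fetch()
        self._entries[key] = (now, value)
        return value


def slow_api_call():
    """Stand-in for a real 3rd-party request."""
    return {"account": "Acme", "open_tickets": 3}


cache = TTLCache(ttl_seconds=30)
first = cache.get_or_fetch("acme:tickets", slow_api_call)   # calls the API
second = cache.get_or_fetch("acme:tickets", slow_api_call)  # served from cache
print(first == second)
```

The trade-off is explicit: a longer TTL buys lower latency at the cost of staleness, and cached data is subject to the same permissions concerns as ingested data, so cache keys should be scoped per user.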
Now that we’ve covered the exceptions, let’s briefly discuss how query-time retrieval works.
Query-Time Retrieval via Tool Calling
LLMs cannot call external APIs on their own, which is why you’ll need to implement agentic functionality into your product.
At their core, agents use an LLM to reason about what they should do, and can take action via ‘tools’.
It’s up to your engineering team to define individual tools for your AI product to use, which in the case of external data retrieval means wrapping individual API calls as tools. For example, they would create a tool for searching contacts in Salesforce, a tool for creating contacts in Salesforce, a tool for searching opportunities in.. you get the idea.
This does come with its challenges - as you can imagine, for every external API, you would have to build dozens of tools to provide it sufficient access to the external data.
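To give a sense of what a single tool involves, here is a minimal sketch of one tool definition and a dispatcher that executes the model’s tool call. The tool-call JSON shape, the `TOOLS` registry, and the canned contact data are assumptions for illustration. Each LLM provider defines its own tool-calling format, and a real handler would call the Salesforce API instead of filtering a list.

```python
import json


def search_salesforce_contacts(name: str) -> list[dict]:
    """Stand-in for a real API call; returns canned data."""
    contacts = [
        {"id": "003A", "name": "John Smith"},
        {"id": "003B", "name": "Jane Smith"},
    ]
    return [c for c in contacts if c["name"] == name]


# Registry of tools: a description and parameter schema the LLM sees,
# plus the handler your code runs when the LLM picks that tool.
TOOLS = {
    "search_salesforce_contacts": {
        "description": "Find Salesforce contacts by exact name.",
        "parameters": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
        },
        "handler": search_salesforce_contacts,
    },
}


def dispatch(tool_call_json: str) -> str:
    """Run the tool the model asked for and return the result as JSON."""
    call = json.loads(tool_call_json)
    tool = TOOLS[call["name"]]
    result = tool["handler"](**call["arguments"])
    return json.dumps(result)


# What a model might emit after reasoning about the user's question.
model_output = '{"name": "search_salesforce_contacts", "arguments": {"name": "John Smith"}}'
print(dispatch(model_output))
```

Multiply this by every object and operation in every API you want to support, and the maintenance burden becomes clear.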
ActionKit
This is actually one of the reasons we built ActionKit. It’s a single 'tool' that you can implement for your AI agent in a few lines of code, and will instantly give your AI product access to thousands of 3rd-party actions across the most popular applications like CRMs, ticketing systems, and more. While it can also be used to enable AI agents to take action in 3rd-party APIs (such as creating / updating records), it also provides all the necessary actions for querying those external applications.
Learn more about ActionKit here.
Real-World Example: Intercom Fin's Hybrid Approach
Since most companies know of Intercom, we thought it’d be relevant to use them as a reference. Intercom's AI customer service agent, Fin, has demonstrated this approach of balancing ingestion and query-time retrieval.
While they leverage the ingestion approach to equip Fin with context from help center articles and knowledge base content (all unstructured), when an end-user asks about specific orders or account details, Fin uses Actions (aka tool calls) to query their customers’ Shopify accounts via the API for up to date inventory/order data.
This hybrid approach showcases the benefits of selective ingestion. By maintaining help center content in a vector store, Fin can quickly provide general product information and troubleshoot with guidance. And by using API-based retrieval for dynamic data, it avoids the complexity of keeping order data synchronized while ensuring customers always receive up-to-date information about their specific situations. This architecture also naturally handles permissions, as the Shopify API integration respects existing access controls for customer data.
A Summary: When to Ingest vs. Retrieve
When to choose RAG ingestion
RAG ingestion into a vector store remains the best choice when dealing with unstructured data, as semantic search is the most effective way to retrieve relevant context. From a product development perspective, this could be ingesting your users’ Google Drive files, Confluence documents, Slack threads, knowledge bases, and/or call transcripts.
When to choose query-time retrieval
Query-time retrieval via agent tool calling is the better choice for retrieving context from structured data sources such as CRMs, ticketing systems, and even ERPs. It saves you the overhead of building and maintaining infrastructure to ingest and sync millions of records and thousands of updates per day, while ensuring only the most up-to-date information is retrieved. Query-time retrieval also inherits the 3rd-party providers’ existing auth mechanisms, saving you from building a separate permissions strategy for every application.
Ultimately, it is up to you to decide what data should be ingested and stored and what should be queried at runtime. Hopefully this article has given you the context you need to design an efficient retrieval approach for your AI application.
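The framework in this article can be condensed into a small decision function. This is only a sketch of the rules of thumb above: the source names, the two sets, and the `needs_heavy_analytics` flag are hypothetical, and your own categories will differ.

```python
# Hypothetical source categories; adjust to the integrations you support.
UNSTRUCTURED = {"google_drive", "confluence", "slack", "call_transcripts"}
STRUCTURED = {"salesforce", "jira", "shopify"}


def retrieval_strategy(source: str, needs_heavy_analytics: bool = False) -> str:
    """Pick a retrieval approach for a data source, per the rules of thumb above."""
    if source in UNSTRUCTURED:
        return "ingest_to_vector_store"
    if source in STRUCTURED:
        # Complex joins over large histories are the exception to query-time retrieval.
        if needs_heavy_analytics:
            return "sync_to_relational_db"
        return "query_api_at_runtime"
    raise ValueError(f"unclassified source: {source}")


print(retrieval_strategy("confluence"))
print(retrieval_strategy("salesforce"))
print(retrieval_strategy("salesforce", needs_heavy_analytics=True))
```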