Subscribed! Superb article! Lucid explanation with solid examples. Re-read it twice. Thank you Nicolas! 👏👏
Thanks!
Additionally, Claude searches through code, which is a structured web of interrelated pieces that *must* be correctly set up for the code to compile, i.e. work properly. Code is different from books, for example, where meaning and keyword relevance matter more for accuracy than structure.
In your example, I would agree that you're right, financial reports are structured, and you must take advantage of that structure. On the other hand you're still using RAG, with a more agentic approach.
The only alternative to RAG is fine-tuning. Even if you feed the model the whole book you're still doing RAG, because you have to search for a relevant book to add to context and because you're augmenting the LLM's internal knowledge with external sources.
I'm wondering how expensive LLMs would be in 2027
Very cheap. Sam Altman: "the cost to use a given level of AI falls about 10× every 12 months."
Great article, Nicolas!
I agree that agentic search seems very promising. Calling it 'search' may even be doing it a disservice. Seeing agents like Claude Code's work, I wonder if something like 'agentic investigation' may be more appropriate.
Do you think we'll still want hybrid search, just perhaps with less or no chunking, as a tool for the agents to use, iteratively if necessary? (If so, I guess we'll still need much of the expensive infrastructure.)
Or are you envisaging agents having some other way to find relevant documents? e.g. Having all of an organisation's documents downloaded to a filesystem that they explore through grep, etc? (I guess this may introduce other questions like how to manage permissions in that world.)
Along these lines, Hornet (https://hornet.dev/) may be an interesting company to follow - they're building some kind of retrieval engine that's somehow optimised for use by agents.
I like agentic investigation! I think for now it's still good to have a hybrid search for complex "needle in the haystack" searches. But it's no longer the default imo.
Agents can already use glob to explore the file system, e.g.:
ls /data/sec_filings/{10-K,10-Q,8-K}/*.txt
I think you're not being fair to the term RAG. Claude Code is doing RAG; it's just using grep instead of vectors and an inverted index. It's relying more on its own context window + summarizing than on a vector database. And you're not being fair to search. When I'm using Claude I never start from the top of the codebase, but point it to the right files.
Imagine the time needed to navigate to the correct files, and how many tokens it would spend to do that. Giving pointers is not possible when customers don't know your data structure.
Yes, I'm talking about "traditional RAG" aka vector+keyword search versus "agentic search" (grep+glob). But yes, both of them are forms of retrieval.
It's good to give it pointers like files (I do that too). But I bet your codebase is less than 2M tokens and Claude Code can navigate it easily without pointers. Grok 4 Fast can probably put the whole codebase in its context window to find the specific file it needs.
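If you're curious, here's a rough way to ballpark whether a codebase fits in a 2M-token window (word count as a crude proxy for tokens; adjust the extensions to your stack):
# Count words across source files; one word is roughly 1-2 tokens, so this is only a ballpark
find . -type f \( -name '*.py' -o -name '*.ts' -o -name '*.go' \) -print0 \
  | xargs -0 cat | wc -w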
Thanks for your answer, loved your article btw!
I'll also explore grep+glob more.
What is the main difference, in your opinion, between agentic search over files vs. agentic search over a search engine (like Elastic)?
What I'm trying to ask is the following:
do you think an inverted index is not our "go to" search option anymore now that we have agents that can program queries and perform deep research?
That would be a powerful statement, a paradigm shift.
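To make the contrast concrete, here's a rough sketch of what I mean (the filings index and text field are made up):
# Agentic search over files: the agent greps a local corpus directly
grep -rlE 'revenue guidance' /data/sec_filings/
# Agentic search over a search engine: the agent programs a query against an inverted index
curl -s 'localhost:9200/filings/_search' -H 'Content-Type: application/json' \
  -d '{"query": {"match": {"text": "revenue guidance"}}}'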
For agentic search, from the blog it is clear how grep works to find exact matches in documents.
What I don't know is how the interconnection of documents is being managed. Appreciate any help here.
The agent reads a doc, realizes that a footnote mentions another doc, and so navigates to that doc to read it. For example:
# Find .txt files referenced in the filing, then follow each reference one level deeper
grep -oE '[A-Za-z0-9_-]+\.txt' filing_10K.txt \
| xargs -I{} sh -c 'echo "Reading {}"; grep -oE "[A-Za-z0-9_-]+\.txt" {}'
I'm a non-technical person trying to understand enough to make good practical decisions. This was a fantastic explainer. Thank you Nick
Awesome piece! Both on the problems with vector dbs, hybrid search, etc. -- reminiscent of our experience with DeepNewz, an AI-written news website based on content from the top authors on Twitter / X.
And on the solutions with huge new context windows. Itching to try just giving all ~1M articles to Claude and see how quickly it can pull up the best ones for a search.
Not to mention the kind of multi-document navigation mentioned at the end.
Great stuff.
Thanks Niko!
I think you missed the main point of RAG, which was always fast and targeted retrieval to feed context-limited LLMs. You could re-architect your ingestion pipeline to run a token-level filter before chunking and embedding to improve efficiency. Instead of heavy metadata enrichment, use a multi-stage RAG or agent-driven pipeline with a graph DB (for relational context) to improve retrieval accuracy. The "context window" expansion mainly reduces dependency on massive vector stores; it's a cost and scaling fix, not a replacement for retrieval. And while Claude's "investigation" is really just a fancy term for structured search, imagine combining that agentic reasoning with RAG and a graph DB for a truly hybrid setup: Agents + RAG + GraphDB + Context Windows = LLMs on roids
Context has always been part of AI pipelines; the store (now called the context window) has just gotten larger.
I really enjoyed reading your article. Thank you very much for sharing it.
However, I have some points, and I hope you can help to clarify:
1. Security concern: while traditional RAG can operate on local networks, giving LLM APIs part or all of your internal document base may risk leaking internal info.
2. Costs of using LLMs: Do you have any clear benchmarks for this case?
For heavy RAG users, it would still cost a lot of tokens to interact, and be hard to scale up to more users. In the case of one .txt file (with links) and your queries, how much would it cost to use? (I agree with your point that maintaining old RAG can be complicated and expensive.)
3. You mentioned: "No need for similarity when you can use exact matches."
What happens in the case where the queries do not include the exact words, but only words with a similar meaning? I think Claude Code will still have to pick the nearest embedding word to answer.
1. For security/privacy, Claude Code is better because it runs locally, whereas Cursor needs to ingest your codebase.
2. The cost of intelligence is going to zero. The cost to use a given level of intelligence per token is falling by at least 80% every 12 months. If it is expensive today it will be cheap tomorrow.
What's expensive is bad answer quality.
3. I think the agent can grep different terms at the same time, like bitcoin/token/cryptocurrencies. But yes, it might be slightly less effective.
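Something like this, in one pass (the path and terms are just an example):
# Case-insensitive recursive search for several related terms at once
grep -rniE 'bitcoin|token|cryptocurrency' /data/sec_filings/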
Thank you for your reply.
1. I am not sure I stated my concern clearly enough; it is not about Claude Code vs Cursor. I mean for RAG vs agentic search: with RAG, search can be done locally and you just feed the retrieved piece to an LLM (which can be a local one), while the other requires you to give LLM APIs the whole file as prompt context (sorry if I understand it wrong).
2. You are totally right; a bad answer is expensive. I hope that LLM API providers are not covering up the real price to fight for initial market share.
Btw: your Fintool system in your other article is a really great case study. Thank you again for sharing it. Subscribed and shared.
Some solutions seem to appear in that direction: https://vectifyai.notion.site/agentic-retrieval
Thanks for sharing! It was a great read.
I thought you'd like it.
Because I was exploring solutions after reading your post and stumbled upon that.
While I understand and agree with the post in general, I feel like an important part is not mentioned: the costs of "agentic search" will usually be way higher than those of traditional RAG pipelines.
Cost is going to zero (10x decline every six months for the same level of intelligence) and people underestimate how costly it is to maintain a RAG pipeline.
Although I agree,
I am predicting that the cost of agentic search will be a problem of the past in less than a year's time, because agentic search can be (will be) built on SLMs instead of LLMs.
Prepare to have some (internal) servers with a GPU (either an Apple M6 or a simple RTX 5070) like it's a commodity for future start-ups ;) 🤑
I think latency can be a problem for such setups (for tasks that care about latency).
I think there will always be some spaces where RAG will be a more efficient way of providing context than letting an agent find stuff, but it's exciting that we have different options now.