Subscribed! Superb article! Lucid explanation with solid examples. Re-read it twice. Thank you Nicolas! 👏👏
Thanks!
Additionally, Claude searches through code, which is a structured web of interrelated pieces that *must* be correctly set up for the code to compile, i.e. work properly. Code is different from books, for example, where meaning and keyword relevance matter more for accuracy than structure.
In your example, I would agree that you're right, financial reports are structured, and you must take advantage of that structure. On the other hand you're still using RAG, with a more agentic approach.
The only alternative to RAG is fine-tuning. Even if you feed the model the whole book you're still doing RAG, because you have to search for a relevant book to add to context and because you're augmenting the LLM's internal knowledge with external sources.
I'm wondering how expensive LLMs would be in 2027
Very cheap. Sam Altman: "the cost to use a given level of AI falls about 10× every 12 months."
Great article, Nicolas!
I agree that agentic search seems very promising. Calling it 'search' may even be doing it a disservice. Seeing agents like Claude Code's work, I wonder if something like 'agentic investigation' may be more appropriate.
Do you think we'll still want hybrid search, just perhaps with less or no chunking, as a tool for the agents to use, iteratively if necessary? (If so, I guess we'll still need much of the expensive infrastructure.)
Or are you envisaging agents having some other way to find relevant documents? e.g. Having all of an organisation's documents downloaded to a filesystem that they explore through grep, etc? (I guess this may introduce other questions like how to manage permissions in that world.)
Along these lines, Hornet (https://hornet.dev/) may be an interesting company to follow - they're building some kind of retrieval engine that's somehow optimised for use by agents.
I like agentic investigation! I think for now it's still good to have a hybrid search for complex "needle in the haystack" searches. But it's no longer the default imo.
Agents can already use glob to explore the file system, e.g.:
ls /data/sec_filings/{10-K,10-Q,8-K}/*.txt
I think you're not being fair to the term RAG. Claude Code is doing RAG; it's just using grep instead of vectors and an inverted index. It's relying more on its own context window + summarizing than on a vector database. And you're not being fair to search. When I'm using Claude I never start from the top of the codebase, but point it to the right files.
Imagine the time needed to navigate to the correct files, and how many tokens it would spend to do that. Giving pointers is not possible when customers don't know your data structure.
Yes, I'm talking about "traditional RAG" aka vector+keyword search versus "agentic search" (grep+glob). But yes, both of them are forms of retrieval.
It's good to give it pointers like files (I do that too). But I bet your codebase is less than 2M tokens and Claude Code can navigate it easily without pointers. Grok 4 Fast can probably put the whole codebase in its context window to find the specific file it needs.
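If you're curious, here's a rough way to ballpark whether a codebase fits in a 2M-token window (word count as a crude proxy for tokens; adjust the extensions to your stack):
# Count words across source files; one word is roughly 1-2 tokens, so this is only a ballpark
find . -type f \( -name '*.py' -o -name '*.ts' -o -name '*.go' \) -print0 \
  | xargs -0 cat | wc -w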
Thanks for your answer, loved your article btw!
I'll also explore grep+glob more.
What is the main difference, in your opinion, between agentic search over files vs. agentic search over a search engine (like Elastic)?
What I'm trying to ask is the following:
do you think an inverted index is not our "go to" search option anymore now that we have agents that can program queries and perform deep research?
That would be a powerful statement, a paradigm shift.
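To make the contrast concrete, here's a rough sketch of what I mean (the filings index and text field are made up):
# Agentic search over files: the agent greps a local corpus directly
grep -rlE 'revenue guidance' /data/sec_filings/
# Agentic search over a search engine: the agent programs a query against an inverted index
curl -s 'localhost:9200/filings/_search' -H 'Content-Type: application/json' \
  -d '{"query": {"match": {"text": "revenue guidance"}}}'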
For agentic search, from the blog it is clear how grep works to find exact matches in documents.
What I don't know is how the interconnection of documents is being managed. Appreciate any help here.
The agent reads a doc, realizes that a footnote mentions another doc, and so navigates to that doc to read it. For example:
# Find .txt files referenced in the filing, then follow each reference one level deeper
grep -oE '[A-Za-z0-9_-]+\.txt' filing_10K.txt \
| xargs -I{} sh -c 'echo "Reading {}"; grep -oE "[A-Za-z0-9_-]+\.txt" {}'
I'm a non-technical person trying to understand enough to make good practical decisions. This was a fantastic explainer. Thank you Nick
Awesome piece! Both on the problems with vector dbs, hybrid search, etc. -- reminiscent of our experience with DeepNewz, an AI-written news website based on content from the top authors on Twitter / X.
And on the solutions with huge new context windows. Itching to try just giving all ~1M articles to Claude and see how quickly it can pull up the best ones for a search.
Not to mention the kind of multi-document navigation mentioned at the end.
Great stuff.
Thanks Niko!
I think you missed the main point of RAG, which was always fast and targeted retrieval to feed context-limited LLMs. You could re-architect your ingestion pipeline to run a token-level filter before chunking and embedding to improve efficiency. Instead of heavy metadata enrichment, use a multi-stage RAG or agent-driven pipeline with a graph DB (for relational context) to improve retrieval accuracy. The "context window" expansion mainly reduces dependency on massive vector stores; it's a cost and scaling fix, not a replacement for retrieval. And while Claude's "investigation" is really just a fancy term for structured search, imagine combining that agentic reasoning with RAG and a graph DB for a truly hybrid setup: Agents + RAG + GraphDB + Context Windows = LLMs on roids
Context has always been part of AI pipelines; the store (now called the context window) has just gotten larger.
I really enjoyed reading your article. Thank you very much for sharing it.
However, I have some points, and I hope you can help to clarify:
1. Security concern: while traditional RAG can operate on local networks, giving LLM APIs part or all of your internal document base may risk leaking internal info.
2. Costs of using LLMs: Do you have any clear benchmarks for this case?
For heavy RAG users, it would still cost a lot of tokens to interact, and be hard to scale up to more users. In the case of one .txt file (with links) and your queries, how much would it cost to use? (I agree with your point that maintaining old RAG can be complicated and expensive.)
3. You mentioned: "No need for similarity when you can use exact matches."
What happens in the case where the queries do not include the exact words, but only words with a similar meaning? I think Claude Code will still have to pick the nearest embedding word to answer.
1. For security/privacy, Claude Code is better because it runs locally, whereas Cursor needs to ingest your codebase.
2. The cost of intelligence is going to zero. The cost to use a given level of intelligence per token is falling by at least 80% every 12 months. If it is expensive today it will be cheap tomorrow.
What's expensive is bad answer quality.
3. I think the agent can grep different terms at the same time, like bitcoin/token/cryptocurrencies. But yes, it might be slightly less effective.
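Something like this, in one pass (the path and terms are just an example):
# Case-insensitive recursive search for several related terms at once
grep -rniE 'bitcoin|token|cryptocurrency' /data/sec_filings/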
Thank you for your reply.
1. I am not sure I stated my concern clearly enough; it is not about Claude Code vs Cursor. I mean for RAG vs agentic search: with RAG, search can be done locally and you just feed the retrieved piece to an LLM (which can be a local one), while the other requires you to give LLM APIs the whole file as prompt context (sorry if I understand it wrong).
2. You are totally right; a bad answer is expensive. I hope that LLM API providers are not covering up the real price to fight for initial market share.
Btw: your Fintool system in your other article is a really great case study. Thank you again for sharing it. Subscribed and shared.
Some solutions seem to appear in that direction: https://vectifyai.notion.site/agentic-retrieval
Thanks for sharing! It was a great read.
I thought you'd like it.
Because I was exploring solutions after reading your post and stumbled upon that.
While I understand and agree with the post in general, I feel like an important part is not mentioned: the costs of "agentic search" will usually be way higher than those of traditional RAG pipelines.
Cost is going to zero (10x decline every six months for the same level of intelligence) and people underestimate how costly it is to maintain a RAG pipeline.
Although I agree,
I am predicting that the cost of agentic search will be a problem of the past in less than a year's time, because agentic search can be (will be) built on SLMs instead of LLMs.
Prepare to have some (internal) servers with a GPU (either an Apple M6 or a simple RTX 5070) like it's a commodity for future start-ups ;) 🤑
I think latency can be a problem for such setups (for tasks that care about latency).
I think there will always be some spaces where RAG will be a more efficient way of providing context than letting an agent find stuff, but it's exciting that we have different options now.