10 Comments
Ed Barahona:

Nit: I just stumbled on this after reading your other post about RAG.

A lot of what you describe, the 50B tokens a week, 70M chunks, and constant embedding tuning, comes down to the scale friction every RAG setup eventually hits. A cleaner long-term path could be adding a token-level filter before embedding and a graph-backed retrieval layer on top. The token filter cuts noise before it ever hits your GPU by removing boilerplate like "Forward-Looking Statements" sections, repeated disclaimers, and duplicate tables, and by normalizing numerics ($45.2M ↔ 45,200,000). That alone can drop embedding volume by 30–40%, lower GPU load, and make your vectors far more signal-heavy.
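To make that concrete, here is a minimal sketch of such a pre-embedding filter. The boilerplate patterns, money regex, and dedupe strategy are illustrative assumptions, not a tested rule set:

```python
import re

# Hypothetical pre-embedding filter; these patterns are illustrative
# assumptions, not a production rule set for SEC filings.
BOILERPLATE = [
    re.compile(r"(?is)forward-looking statements.*?(?:\n\n|\Z)"),
    re.compile(r"(?i)safe harbor (statement|provisions?)[^\n]*\n?"),
]

MONEY = re.compile(r"\$\s*([\d,]+(?:\.\d+)?)\s*(million|billion|M|B)\b", re.I)
SCALE = {"million": 1e6, "m": 1e6, "billion": 1e9, "b": 1e9}

def normalize_money(m: re.Match) -> str:
    # "$45.2M" is rewritten to "$45,200,000", so both spellings
    # produce the same string before embedding.
    value = float(m.group(1).replace(",", "")) * SCALE[m.group(2).lower()]
    return f"${value:,.0f}"

def prefilter(text: str) -> str:
    for pat in BOILERPLATE:
        text = pat.sub("", text)
    return MONEY.sub(normalize_money, text)

def dedupe(chunks):
    # Drop exact duplicates (repeated disclaimers, copied tables) by hash.
    seen = set()
    for chunk in chunks:
        key = hash(chunk.strip().lower())
        if key not in seen:
            seen.add(key)
            yield chunk
```

Running prefilter plus dedupe before the embedding stage is where the 30–40% volume reduction would come from: the disclaimer sections disappear entirely, and repeated chunks never reach the GPU.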

On the retrieval side, a graph layer on top of the vector index fixes context fragmentation by linking sections, tables, and footnotes through relationships like explains, refers_to, or quantifies. When someone searches "companies with declining net income excluding stock-based comp," retrieval can follow Metric(Net Income) → Footnote(Stock-Based Comp) → Table(Income Statement) instead of guessing across scattered chunks. The graph restores structure while vectors handle semantics, turning retrieval from flat and noisy into something relational and context-aware.
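A rough sketch of that traversal, using networkx for the graph; the node IDs, edge types, and the commented-out vector_search() call are assumptions for illustration:

```python
import networkx as nx

# Hypothetical graph layer over a vector index; node IDs and edge
# types are placeholders, not a real schema.
G = nx.MultiDiGraph()
G.add_node("metric:net_income", kind="Metric")
G.add_node("footnote:sbc", kind="Footnote")
G.add_node("table:income_statement", kind="Table")
G.add_edge("footnote:sbc", "metric:net_income", rel="explains")
G.add_edge("metric:net_income", "table:income_statement", rel="quantifies")

def expand_hits(graph, hit_ids, rels=("explains", "refers_to", "quantifies"), hops=1):
    """Follow typed edges out from vector-search hits so a hit on one
    chunk also retrieves the footnotes/tables that give it context."""
    expanded = set(hit_ids)
    frontier = set(hit_ids)
    for _ in range(hops):
        nxt = set()
        for node in frontier:
            for _, nbr, data in graph.out_edges(node, data=True):
                if data.get("rel") in rels:
                    nxt.add(nbr)
            for nbr, _, data in graph.in_edges(node, data=True):
                if data.get("rel") in rels:
                    nxt.add(nbr)
        frontier = nxt - expanded
        expanded |= nxt
    return expanded

# hits = vector_search("declining net income excluding stock-based comp")
# context_ids = expand_hits(G, [h.node_id for h in hits])
```

One hop over typed edges is usually enough to pull a metric's footnote and source table back into the context window; the vector index still decides where the walk starts.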

P.S. You could take it a step further by fine-tuning lightweight LoRA/QLoRA retrieval adapters on common analyst queries and topics, like the ones you mentioned: identifying revenue drivers or detecting liquidity stress patterns. Since LoRA adapters can be hot-loaded at runtime, you could dynamically switch retrieval behavior between domains or query types without retraining or redeploying the main model, keeping inference light while improving domain precision.
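With the peft library, that hot-swapping is a few lines; the model name, adapter paths, and the keyword router below are placeholders for illustration, not anyone's actual setup:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Placeholder base model and adapter paths; swap in your own.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
model = PeftModel.from_pretrained(
    base, "adapters/revenue-drivers", adapter_name="revenue_drivers"
)
# Additional adapters can be loaded once and switched between at runtime.
model.load_adapter("adapters/liquidity-stress", adapter_name="liquidity_stress")

def route(query: str) -> str:
    # Trivial keyword router; a real system might classify the query first.
    return "liquidity_stress" if "liquidity" in query.lower() else "revenue_drivers"

# Activate the adapter for this query type; no retraining or redeploy needed.
model.set_adapter(route("detect liquidity stress patterns in Q3 filings"))
```

Because set_adapter only flips which low-rank weights are active, switching domains costs essentially nothing at inference time compared to reloading a full model.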

Pierre Brunelle:

This is a great blog post. I'd love to chat about your workflow and see what tools/UDFs I need to build in https://github.com/pixeltable/pixeltable to support it from A to Z.

inferenceloop:

Are you planning to rewrite your whole stack now that you yourself have written about the issues of semantic search?

Nicolas Bustamante:

With every leap in model capabilities, much of the scaffolding code becomes unnecessary.

River:

You did some pretty serious work here building all these fast pipelines! Also very interesting to learn about the technology you used!

Nicolas Bustamante:

Thanks!! Elastic for the win again, haha!

Mehdi Cornilliet:

Great insights.

Do you plan to add transcripts of calls?

Nicolas Bustamante:

Thanks! We already have all earnings calls as well as conference transcripts like JP Morgan Healthcare or Goldman Sachs Technology Conference :)

Mighty Nine:

Such a great piece of content, full of insights. Thank you for sharing ☺️

Nicolas Bustamante:

Thanks!
