10 Comments
Ed Barahona:

Nit: I just stumbled on this after reading your other post about RAG.

A lot of what you describe, the 50B tokens a week, 70M chunks, and constant embedding tuning, comes down to the scale friction every RAG setup eventually hits. A cleaner long-term path could be adding a token-level filter before embedding and a graph-backed retrieval layer on top. The token filter cuts noise before it ever hits your GPU by removing boilerplate like "Forward-Looking Statements" sections, repeated disclaimers, and duplicate tables, and by normalizing numerics ($45.2M ↔ 45,200,000). That alone can drop embedding volume by 30–40%, lower GPU load, and make your vectors far more signal-heavy.
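To make that concrete, here is a minimal sketch of such a pre-embedding filter. The boilerplate patterns, money regex, and dedupe strategy are illustrative assumptions, not a tested rule set:

```python
import re

# Hypothetical pre-embedding filter; these patterns are illustrative
# assumptions, not a production rule set for SEC filings.
BOILERPLATE = [
    re.compile(r"(?is)forward-looking statements.*?(?:\n\n|\Z)"),
    re.compile(r"(?i)safe harbor (statement|provisions?)[^\n]*\n?"),
]

MONEY = re.compile(r"\$\s*([\d,]+(?:\.\d+)?)\s*(million|billion|M|B)\b", re.I)
SCALE = {"million": 1e6, "m": 1e6, "billion": 1e9, "b": 1e9}

def normalize_money(m: re.Match) -> str:
    # "$45.2M" is rewritten to "$45,200,000", so both spellings
    # produce the same string before embedding.
    value = float(m.group(1).replace(",", "")) * SCALE[m.group(2).lower()]
    return f"${value:,.0f}"

def prefilter(text: str) -> str:
    for pat in BOILERPLATE:
        text = pat.sub("", text)
    return MONEY.sub(normalize_money, text)

def dedupe(chunks):
    # Drop exact duplicates (repeated disclaimers, copied tables) by hash.
    seen = set()
    for chunk in chunks:
        key = hash(chunk.strip().lower())
        if key not in seen:
            seen.add(key)
            yield chunk
```

Running prefilter plus dedupe before the embedding stage is where the 30–40% volume reduction would come from: the disclaimer sections disappear entirely, and repeated chunks never reach the GPU.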

On the retrieval side, a graph layer on top of the vector index fixes context fragmentation by linking sections, tables, and footnotes through relationships like explains, refers_to, or quantifies. When someone searches "companies with declining net income excluding stock-based comp," retrieval can follow Metric(Net Income) → Footnote(Stock-Based Comp) → Table(Income Statement) instead of guessing across scattered chunks. The graph restores structure while vectors handle semantics, turning retrieval from flat and noisy into something relational and context-aware.
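A rough sketch of that traversal, using networkx for the graph; the node IDs, edge types, and the commented-out vector_search() call are assumptions for illustration:

```python
import networkx as nx

# Hypothetical graph layer over a vector index; node IDs and edge
# types are placeholders, not a real schema.
G = nx.MultiDiGraph()
G.add_node("metric:net_income", kind="Metric")
G.add_node("footnote:sbc", kind="Footnote")
G.add_node("table:income_statement", kind="Table")
G.add_edge("footnote:sbc", "metric:net_income", rel="explains")
G.add_edge("metric:net_income", "table:income_statement", rel="quantifies")

def expand_hits(graph, hit_ids, rels=("explains", "refers_to", "quantifies"), hops=1):
    """Follow typed edges out from vector-search hits so a hit on one
    chunk also retrieves the footnotes/tables that give it context."""
    expanded = set(hit_ids)
    frontier = set(hit_ids)
    for _ in range(hops):
        nxt = set()
        for node in frontier:
            for _, nbr, data in graph.out_edges(node, data=True):
                if data.get("rel") in rels:
                    nxt.add(nbr)
            for nbr, _, data in graph.in_edges(node, data=True):
                if data.get("rel") in rels:
                    nxt.add(nbr)
        frontier = nxt - expanded
        expanded |= nxt
    return expanded

# hits = vector_search("declining net income excluding stock-based comp")
# context_ids = expand_hits(G, [h.node_id for h in hits])
```

One hop over typed edges is usually enough to pull a metric's footnote and source table back into the context window; the vector index still decides where the walk starts.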

P.S. You could take it a step further by fine-tuning lightweight LoRA/QLoRA retrieval adapters on common analyst queries and topics, like the ones you mentioned: identifying revenue drivers or detecting liquidity stress patterns. Since LoRA adapters can be hot-loaded at runtime, you could dynamically switch retrieval behavior between domains or query types without retraining or redeploying the main model, keeping inference light while improving domain precision.
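With the peft library, that hot-swapping is a few lines; the model name, adapter paths, and the keyword router below are placeholders for illustration, not anyone's actual setup:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Placeholder base model and adapter paths; swap in your own.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
model = PeftModel.from_pretrained(
    base, "adapters/revenue-drivers", adapter_name="revenue_drivers"
)
# Additional adapters can be loaded once and switched between at runtime.
model.load_adapter("adapters/liquidity-stress", adapter_name="liquidity_stress")

def route(query: str) -> str:
    # Trivial keyword router; a real system might classify the query first.
    return "liquidity_stress" if "liquidity" in query.lower() else "revenue_drivers"

# Activate the adapter for this query type; no retraining or redeploy needed.
model.set_adapter(route("detect liquidity stress patterns in Q3 filings"))
```

Because set_adapter only flips which low-rank weights are active, switching domains costs essentially nothing at inference time compared to reloading a full model.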

Pierre Brunelle:

This is a great blog post. I'd love to chat about your workflow and see what tools/UDFs I need to build in https://github.com/pixeltable/pixeltable to support it from A to Z.

inferenceloop:

Are you planning to rewrite your whole stack now that you yourself have written about the issues of semantic search?

Nicolas Bustamante:

With every leap in model capabilities, much of the scaffolding code becomes unnecessary.

River:

You did some pretty serious work here building all these fast pipelines! Also very interesting to learn about the technology you used!

Nicolas Bustamante:

Thanks!! Elastic for the win again, haha!

Mehdi Cornilliet:

Great insights.

Do you plan to add transcripts of calls?

Nicolas Bustamante:

Thanks! We already have all earnings calls as well as conference transcripts like JP Morgan Healthcare or Goldman Sachs Technology Conference :)

Mighty Nine:

Such a great piece of content, full of insights. Thank you for sharing ☺️

Nicolas Bustamante:

Thanks!
