Burning Billions: The Gamble Behind Training LLMs
Why most AI infrastructure companies end up with burned cash and broken dreams, while the application layer is printing money.
Why don’t you train your own large language model?
I've been frequently asked this question over the past year. I wrote this piece in September 2023 but never published it, thinking the answer was obvious and would become even more apparent with time. I was asked the same question twice last week, so here is my perspective.
As a reminder, Fintool is an AI equity research analyst for institutional investors. We leverage LLMs to discover financial insights beyond the reach of human analysis. Fintool helps summarize long annual reports, crunch numbers, and find new investment opportunities. We have a front-row seat to witness how LLMs are revolutionizing the way information is organized, consumed, and created.
Go Big or Go Broke: The LLM Training Gamble
Training large language models is challenging. It requires billions in capital to secure GPUs, hundreds of millions to label data, access to proprietary data sets, and the ability to hire the brightest minds. Vinod Khosla, an early OpenAI investor, estimated that “a typical model in 2025 will cost $5-10b to train.” Only hyperscalers like Google, Meta, or Microsoft, who already spend $25B+ in capex per year, can afford this game. A company like Meta can raise its capex guidance by $3+ billion to train frontier models, and that’s not a big deal considering its $43.8B in annual free cash flow. Good luck competing with those guys!
The additional challenge is the need to keep training the next frontier model just to stay in the race. If your model is not first, it might as well be last. Users and customers gravitate towards the best, leaving little market for inferior models. It’s a power law where the model with the optimal mix of intelligence, speed, and cost-effectiveness dominates. Training is a multi-billion-dollar recurring expense, and the monetization window lasts only as long as your model stays at the top of the leaderboard before being outperformed.
Sequoia Capital recently emphasized that an estimated $600 billion in revenue would be necessary to justify the massive investments in AI data centers and GPUs. In my view, as seen in most technological booms, a large portion of the money invested will ultimately be wasted, similar to the dot-com bubble that led to excessive investment in telecom infrastructure. The telecom boom saw massive capital inflows into building out networks and laying vast amounts of fiber optic cables. Companies thrived initially, but as the bubble burst, it became evident that much of the infrastructure was redundant, leading to significant financial losses. Global Crossing filed for bankruptcy with $12.4 billion in debt, while WorldCom went bankrupt with $107 billion in largely worthless assets.
Similarly, the current surge in investment for LLM infrastructure risks leading to overcapacity and inefficiencies. While a few key players may achieve significant rewards, many others will likely face considerable financial setbacks.
Most companies entering the LLM race fail despite massive investments. Bloomberg’s effort, BloombergGPT, trained on 363 billion tokens, was quickly outperformed by GPT-3.5 on financial tasks. Even well-funded startups struggle: Inflection, despite raising $1.525 billion, was acqui-hired by Microsoft. Adept, with $415M in funding, is rumored to be exploring a sale, and the models developed by Databricks, IBM, and Snowflake are absent from today’s top LLM rankings.
When I explain why Fintool doesn’t train its own LLM, pundits invariably ask: “Well, in that case, why don’t you fine-tune a model on your vertical?”
Small Tweaks, Big Disappointments: The Fine-Tuning Trap
The case for fine-tuning is the hope of better quality on a narrow set of tasks at lower cost and higher speed, since fine-tuned models are smaller than generalist models. In my opinion, this approach is not yet yielding results worth the millions invested. For instance, OpenAI developed Codex, a model fine-tuned on a large corpus of code, and that model was outperformed by GPT-4, a large generic model. The same was true for fine-tuned text-to-SQL models, which were better on some narrow benchmarks but got outclassed by the next general model release. So far, every fine-tuned model has been outclassed by the next big generic model. The rapid decline in LLM prices, coupled with significant improvements in quality and latency, makes such investments increasingly hard to justify.
If you don’t like losing millions and billions of dollars, it’s better to stay away from this game. For most organizations, training or fine-tuning is driven by FOMO and a lack of understanding of technological trends. Only a few B2C players, such as Character.ai, which processes 20,000 queries per second (approximately 20% of Google’s search volume), genuinely need their own models.
Intelligence for Pennies: The Real Value Lies in Applications
LLMs are such a commodity that a leaked Google memo stated, “We have no moat, and neither does OpenAI.” It’s fairly easy to switch models, and the fact that open-source models are getting better accelerates the commoditization. There is still a premium for the most intelligent model, but most tasks don’t require the best intelligence. Commoditized tasks are already worth zero, while harder tasks are worth something but not much. Training LLMs and selling intelligence as a service is not a great business.
FutureSearch estimated that OpenAI makes $2.9B a year from ChatGPT products versus $510M a year from the API. The fact that the API accounts for only about 15% of the leading provider’s revenue exemplifies that most of the value creation and value capture happen at the application layer.
Application-layer companies like Fintool are developing model-agnostic infrastructure tailored to specific use cases, so they benefit from improvements in any AI model. Just as Charlie Munger practiced “sit on your ass investing,” waiting for the market to recognize the intrinsic value of his investments, I practice “sit on my ass product building”: I focus on creating complex workflows that meet specific user needs while waiting for AI models to become better, faster, and cheaper.
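To make “model-agnostic” concrete, here is a minimal sketch of the pattern: every backend, whether a hosted frontier model or a self-hosted open-source model behind an OpenAI-compatible server such as vLLM, is reduced to a base URL and a model name. The endpoints, model names, and registry below are illustrative assumptions, not Fintool’s production setup.

```python
# Minimal sketch of a model-agnostic LLM client (illustrative, not production).
# Assumes the OpenAI Python SDK (>= 1.0) and that open-source models are
# served behind an OpenAI-compatible endpoint (e.g., a vLLM server).
from dataclasses import dataclass

from openai import OpenAI


@dataclass(frozen=True)
class Backend:
    base_url: str  # OpenAI-compatible endpoint
    model: str     # model identifier at that endpoint
    api_key: str   # real key for hosted APIs, placeholder for local servers


# Hypothetical registry: swap or add entries as better models ship.
BACKENDS = {
    "frontier": Backend("https://api.openai.com/v1", "gpt-4o", "sk-..."),
    "cheap": Backend("http://localhost:8000/v1", "llama-3-8b-instruct", "unused"),
}


def complete(prompt: str, backend: str = "cheap") -> str:
    """Send a prompt to whichever backend currently wins on quality/cost/speed."""
    b = BACKENDS[backend]
    client = OpenAI(base_url=b.base_url, api_key=b.api_key)
    resp = client.chat.completions.create(
        model=b.model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


# Upgrading to next year's best model is a config change, not a retraining run:
# complete("Summarize this 10-K section: ...", backend="frontier")
```

Because the application only depends on this thin interface, every model release, closed or open source, makes the product better without retraining anything.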
When we started Fintool, the cost of analyzing an earnings call for a complex task was roughly $1 with GPT-4. A year later, the price of GPT-4 has dropped by about 79%, and the model is significantly smarter and faster. By running open-source models, we dropped the price to less than $0.01. So, without wasting our time and money on training or fine-tuning, we got better quality and speed with a price drop of more than 99%. What’s not to like?
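For a back-of-the-envelope version of that arithmetic (the token count and per-token prices here are rough assumptions for illustration, not our actual figures):

```python
# Back-of-the-envelope cost of analyzing one earnings call.
# The token count and $/1M-token prices are illustrative assumptions.
TOKENS_PER_CALL = 30_000  # transcript + prompts + output, assumed


def cost_per_call(price_per_million_tokens: float) -> float:
    return TOKENS_PER_CALL * price_per_million_tokens / 1_000_000


gpt4_2023 = cost_per_call(33.0)               # assumed blended price -> ~$0.99
gpt4_2024 = cost_per_call(33.0 * (1 - 0.79))  # after a ~79% price cut -> ~$0.21
open_source = cost_per_call(0.30)             # self-hosted open model -> ~$0.009

print(f"2023 GPT-4: ${gpt4_2023:.2f} | 2024 GPT-4: ${gpt4_2024:.2f} | "
      f"open source: ${open_source:.3f}")
```

The exact numbers matter less than the slope: the same workload gets two orders of magnitude cheaper while quality improves, which is the whole argument for sitting at the application layer.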