This article is superb. It validates my learnings and some of my intuitions, and pushes me forward in ways I've not experienced yet.
I'd like to ask if you can tell us more about evals, and whether you've been able to get your non-technical users to start thinking in terms of evals as well. I've been advocating for this across cross-functional teams, but it's quite a challenge to explain a concept like evals to different audiences (I was avoiding explaining sandboxing/observability to start, and focusing on the incremental path of intention -> determinism -> stability).
In the RAG obituary, you hint that classic RAG with chunking is no longer relevant because it adds points of failure and infrastructure cost and burden.
But in this article you mention that chunking strategy matters.
Did you change your mind about classic RAG with chunking, or did you find that it still performs better than passing the whole document into the context window in some situations? If so, which ones?
Thanks for writing this. I’d be interested to hear more about the decision to just give the agent a sandbox rather than access to a number of discrete tools. Is it because it is too difficult to predict and manage which tools may be useful? Or because doing so leads to a proliferation of tools? I am still reading through your write-up, but it seems like “skills are the new tools”? If the agent needs to do some dynamic retrieval (e.g., a simple use case like calling a function to search a Postgres DB), where and how are the results stored? P.S. I also have a legal tech background (including a law background :tear_smiley:) and am currently working on a fintech agent as well, in a narrower niche.
Agents need total freedom. They also need to be able to retry when a tool fails, or to combine the tools in a different way. It's just way more reliable and provides better answers.
Yes, skills are the new tools. Skills are the new everything. All business logic will live inside the skills.
Which sandbox is better to use? Is a Docker runtime too heavy for processing multiple tasks?
Thanks! Streaming is very very important.