Beautiful analysis. As Karpathy highlighted, agentic behavior is the next frontier. I still think contextualization is a big challenge right now for improving on real-world complex tasks (window size + proper data contexts).
Thank you for another post! I agree. As someone tinkering to push all the models, I have my own internal benchmarking system to test models as they come out and find out which are best suited for which task, and there is definitely a positive slope.
As a mathematician, I can’t wait for new models to start farting out incomprehensible proofs. We’ll soon see how little we know.
As for the general consumers, they are just the loudest, it’s that simple. Those who know, know.
Most people measure the speed of the car by going the speed limit, but you’re right, they need to floor it to notice it’s a rocket. Good stuff, thanks for sharing.
Yeah, most people use LLMs to correct the grammar in their emails.
Nice performance on the benchmark :)
thanks!