Discussion about this post

Pawel Jozefiak

Tested roughly the same window of updates, mostly through breaking things rather than benchmarking.

The extended thinking changes were the biggest miss for me - looked impressive in demos, slower in practice when you need quick decisions in a workflow. What actually landed: the file handling improvements and context loading. Small thing, huge practical difference when running agents on multi-day tasks.

The hype/reality gap on tool features is real. Half the announcements read like 'we made X slightly better' in engineering terms, but the blog post sounds like a breakthrough.

Good way to cut through it - pairing two builders with different use cases.

Hidayat Ali

The five custom skills you are using are just mind-blowing. I will start using them too.
