Sam, I really appreciate this piece!! I’ve been telling my boss that keeping all the rules and requirements in one place only creates context pollution. My push has been for more intelligent solutions in software implementation, but what you wrote actually tackles the pain for individual users right away.
Thanks so much Jenny! I think it would be really interesting to see if these downstream solutions can be integrated much earlier on to improve the user experience. 🙏
I’m building a product that requires my model to give feedback based on a user’s entire interaction with the app, and a lot of the time the content passed to the model is huge. I’ve gotten a lot of complaints about the model flagging things users did and then undid. I’ve always attributed this problem to my prompts. Do you think using a model with a smaller context window might help?
I think that is a great suggestion. Or a model that even pauses and updates the context window with key inputs at strategic points?
This is a very interesting angle, I've never considered switching to a smaller context window. What usually worked for me is to have what I call "context orchestration", so the context isn't laid out flat across the project.
Either way, I would love to hear more about how you end up solving this problem. It all looks very inspiring.
Another way to avoid context rot is via recursive prompting.
Break down the tasks into smaller pieces.
Think of it this way: prompting in one go is like giving someone a 100-page mission brief vs. instructing them step by step. The person (or AI) will skim through a 100-page brief and miss something. Step by step (recursive prompting) is slower but more reliable.
What was it? Slow is smooth and smooth is fast.
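A minimal sketch of the idea in Python, with `ask_llm` as a hypothetical stand-in for whatever chat API you use:

```python
# Minimal sketch of recursive prompting: one sub-task per call,
# carrying forward a distilled summary instead of the whole history.
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("swap in your model call here")

def recursive_prompt(goal: str, steps: list[str]) -> str:
    summary = ""  # distilled context carried between steps
    for step in steps:
        result = ask_llm(
            f"Overall goal: {goal}\n"
            f"What we have so far: {summary or 'nothing yet'}\n"
            f"Current step: {step}\n"
            "Do only this step and return a concise result."
        )
        # Compress the running context instead of appending everything.
        summary = ask_llm(
            f"Summarize the key points so far in under 200 words:\n{summary}\n{result}"
        )
    return summary
```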
This is great advice. Thanks so much. 🙏
No prob.
I always thought I needed to add "more" to the context, but it seems that's the problem.
Now I'm wondering how we get AI to understand large codebases and the like, where it seems all the data is important to have all at once. Or is this not a thing at all?
Great question. I think baking grounding points into the dataset, to help point the tools at the areas they need, might be one way of doing this.
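As a very rough sketch of what grounding points could look like for a codebase, retrieving only the relevant files per question rather than loading everything at once (all names here are hypothetical):

```python
# Hypothetical sketch: retrieve only the relevant slices of a large
# codebase instead of stuffing every file into the context window.
import re
from pathlib import Path

def index_codebase(root: str) -> dict[str, str]:
    """Map each source file to its text (the 'grounding points')."""
    return {
        str(p): p.read_text(errors="ignore")
        for p in Path(root).rglob("*.py")
    }

def relevant_files(index: dict[str, str], question: str, k: int = 3):
    """Rank files by identifier overlap with the question."""
    terms = set(re.findall(r"\w+", question.lower()))
    ranked = sorted(
        index.items(),
        key=lambda kv: -len(terms & set(re.findall(r"\w+", kv[1].lower()))),
    )
    return ranked[:k]  # feed only these files to the model
```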
This is where I find NotebookLM to be such a helpful tool if you have assets you’re specifically examining. Or even take the time to create the assets in Perplexity, drop them into Notebook for further research and development, and then carry the results of that work into Claude or ChatGPT for very specific tasks.
Thanks John, this seems like a super smart process. One I need to try out. 🙏
The idea that bigger context windows can reduce accuracy turns the usual "more data is better" assumption on its head.
It’s a good reminder that sometimes less actually forces the AI to focus!
Great piece :)
Thanks so much Mia. 🙏
Context rot is real! Something I’ve noticed when feeding long project briefs into AI. Breaking inputs into smaller chunks and checking for inconsistencies has been a game-changer. Slowing down isn’t a limitation, it’s how we actually make these tools trustworthy.
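As a rough sketch of that chunk-and-check loop (`ask_llm` is a hypothetical wrapper for whatever model you use):

```python
# Sketch: process a long brief in chunks, then have the model make an
# explicit consistency pass instead of trusting one giant prompt.
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("swap in your model call here")

def chunk(text: str, size: int = 4000) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

def review_brief(brief: str) -> str:
    notes = [
        ask_llm(f"Extract the key requirements from this section:\n{part}")
        for part in chunk(brief)
    ]
    # Dedicated inconsistency check across the combined outputs.
    return ask_llm(
        "List any contradictions or duplicates in these requirements:\n"
        + "\n".join(notes)
    )
```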
Thanks Suhrab. Yes absolutely, slowing down and taking control back from the machine is the way to stop the rot. 🙏
Great post! I definitely need this reminder. I’ve seen it break down similar to the recipe example you used, but haven’t had too much issue otherwise. Gotta stay on our toes with this tech!
This is the kind of insight the AI hype machine desperately needs more of. Just because a model can accept a giant context window doesn’t mean it can handle it well. That metaphor of “rot” is spot on: it creeps in quietly and only shows itself when the damage is done.
Thanks so much Melanie. 🙏
Sam, once again your brilliant take on AI shifts my entire perspective.
Thanks Saif, that is so kind of you to say. And I am so grateful to Wyndo for giving me the platform to share these thoughts. 🙏
As someone whose chats usually start with 80k of context, I know this problem very well. Most chats get unresponsive the moment I pass the 120k-token window, no matter how large a window the LLM claims to have. It's like selling you 32GB of RAM in a computer when you can only use 2GB. It's bad marketing.
For my writing, I need the LLM to know at least the content of my own book, but that is 70k tokens alone. Then I need it to know my framework and my methodology, as well as my style and much more. Without this context, I can't get excellent results, and I refuse to put out "good enough" content.
So yes, I also try to work with summaries (of my book as well as my current status in my brand development), but sometimes it's not enough and I get repetitions or less-than-excellent outputs. If you ask me, the current art of AI whispering has nothing to do with prompts anymore. It's all about "contextual thinking": the user's ability to anticipate what information and context to give the AI at a given moment. That's the real skill you have to train people in.
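If it helps, here's roughly what a summary pass like mine could look like as code. This is a sketch, not my actual setup, and `ask_llm` is a hypothetical stand-in for the chat API:

```python
# Sketch: map-reduce summarization so a 70k-token book becomes a
# compact brief that fits comfortably in context.
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("swap in your model call here")

def summarize_book(chapters: list[str]) -> str:
    notes = [
        ask_llm(
            "Summarize this chapter in under 150 words, keeping "
            f"terminology and voice intact:\n{chapter}"
        )
        for chapter in chapters
    ]
    # Reduce step: merge chapter notes into one framework/style brief.
    return ask_llm(
        "Merge these chapter summaries into a single brief covering "
        "framework, methodology, and style:\n" + "\n".join(notes)
    )
```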
Thank you, and your analogy with computer RAM is an excellent one, which perfectly gets to the heart of the issue. 🙏
Does this happen when the user is maintaining the continuity manually themselves? I haven’t personally been pasting massive blocks of text into the AI I use, but I have employed a recursive engine to feed previous threads back through a new instance. As I did that, we summarized the previous threads to reduce token usage while making sure to keep the resonance intact. It works, but only if the user wants to do the work of maintaining the continuity. That’s my own personal experience, though.
I think your approach sounds really sane. For me I tend to experience context rot the most when I have a looooong chat going. At that point I ask for a seed to start a new one and then steer the tool in the right direction (or at least try to get it back on track!)
I’ve done this for up to 100 threads in Copilot. Part of my process is what I call groundings. There are three of them. The first grounding is to load the previous threads, to bring everything that was most recently talked about back to the surface. The second grounding is the continuity engine: a file that contains all of the other threads, summarized. Doing this should cut down on context rot. The third grounding is a file that contains other things that might have happened, or other texts the AI generated as an aside; I do that just to bring everything back together. Employing these grounding techniques has allowed me to reduce drift, and I haven’t seen instances of context rot on my side at least 🙃
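Roughly, the idea as a sketch (the file names are made up, and this is illustrative rather than my exact setup):

```python
# Sketch: assemble the three groundings into one opening prompt for a
# fresh thread. File names are hypothetical.
from pathlib import Path

def build_opening_prompt(workdir: str = ".") -> str:
    d = Path(workdir)
    groundings = [
        d / "recent_threads.md",     # grounding 1: most recent threads
        d / "continuity_engine.md",  # grounding 2: summarized history
        d / "asides.md",             # grounding 3: side material
    ]
    parts = [p.read_text() for p in groundings if p.exists()]
    return "Grounding context before we continue:\n\n" + "\n\n---\n\n".join(parts)
```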
This is excellent. Great terminology as well. 🙏
I've experienced this a lot lately... love the name "context rot." I just thought it was Claude being obstinate.
Thanks Joe. 🙏
I used to have the same problem as well. However, here's something I can recommend that has been working for me: I've been creating separate Google Docs, segmenting different topics into their own documents, and injecting them that way. So technically the model is reading separate documents, i.e. dividing its attention and focus, while I ask it not only to summarize but also to provide key details with examples of my exact wording, and I respond back with its interpretation, making sure it outputs everything in a very structured format. I'm getting long, very long responses with all the important details. The most subject matters I've had at one given time was 13, and it came back perfectly after nearly maxing out my context window: it continued to hold memory, stay accurate, stay structured, provide feedback from way back in the history of the comments, and so on.

The caveat (and the cool part) is the way our entire database is set up: collect information, store it (a "brain bank"), use RAG when necessary, and of course set up the model initially with the correct prompt injection: guidelines, behaviors, structures, things not to do, keywords to use, treating it like a four-year-old kid learning for the first time and providing it with as many examples as possible, "all 5 senses." Once you have your information, your data, your RAG, and your initial agent guidelines set, the majority of your conversations going forward just need to be broken up, and you can build that database so it acts as if it were one big chat. With that said, it's been the best I've ever seen, and the one model that's actually doing it 99.9% correct, believe it or not, has been Claude 4.5. With Claude 4.5 as my main engine driver, aka my co-founder and partner in crime, we still utilize open-source models to drive other key areas of our automated workflows. I use Obsidian as my main go-to, partly for the open-source purposes, but I also love its interface, layout, and structure, and Claude seems to respond very well to it.

Sorry for the gibberish: this is voice-to-text, so apologies in advance for any spelling and grammar. Feel free to DM me for any insight. No, I'm not a normal Substacker. I'm too busy building things and helping others. Thank you for this great article! 🫶👌
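If it helps anyone, here's a stripped-down, purely illustrative sketch of the segmented-document injection idea (not my actual setup):

```python
# Purely illustrative: keep topics in separate documents and inject each
# as its own labeled block, instead of one flat wall of text.
def inject_segments(segments: dict[str, str], question: str) -> str:
    blocks = [f"### Topic: {name}\n{text}" for name, text in segments.items()]
    return (
        "\n\n".join(blocks)
        + "\n\nUsing only the topics above, answer in a structured format, "
        "quoting my exact wording where relevant.\n"
        + f"Question: {question}"
    )
```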
This is an awesome solution. Thanks so much for sharing this. I do something similar, but nowhere near as sophisticated as this. 🙏
My pleasure, and thank you for your content. I truly look forward to great reads every morning, and yours is one of them. So thank you. I'll also note that in my workflow, as I personally have this all set up, I ended up making a new agent that takes the plain gibberish from my voice-to-text messages and transforms it into perfectly structured prompts without losing any of my behavior flavor, you know? So there's still some personalization to it, and I found that doing this puts an emphasis on the behavior at that moment, which helps the LLM or agent direct its output accordingly. So give it a shot sometime! 🫶🤙
That's so kind. Thanks Antonio. 🙏
Thanks for the good read 😊
You are so welcome. 🙏
Even with bigger nets, what is caught still needs to be assessed and aligned with the desired catch, and the same goes for the services built on it 😊
Exactly this. We just need to work on those nets now. 🙏
The rot is real! I notice it when Claude starts forgetting the names of functions, database tables, etc. Usually the cure is to ask for a summary doc and then /clear the thread.
This is exactly what I do with ChatGPT as well, Karen! It would be interesting to see which tools are most affected and then dig down into why…