
Clean Context Cheatsheet

Hello everyone,

I’m about to board my flight to San Francisco, and I just received an email informing me that there will be no WiFi on the flight today. Uh oh! I was really hoping that my 11 hours without the kids would be a productive stretch (usually, I’m offline for the long haul with no internet). Nevertheless, I still have some work to fine-tune for a talk I’m giving on Tuesday.

While in town, I’m also looking to write $100k checks to founders building development tools and infrastructure, in addition to catching up with my amazing LPs and meeting new ones. Ben’s Bites Fund II has already started investing.

Back to my flight situation… I had to quickly download a few local models so I can run my agents offline, and so far it looks like Gemma 4:26b will be my model of choice.

We are incredibly fortunate today to have fast intelligence at our fingertips, and it’s amusing how quickly we’ve come to take that level of intelligence for granted.

That said, local models take time to start up; you have to be more careful about the context loaded at startup. To speed things up, I’m running with no-skills and can activate the skills as needed. This might actually be preferable for me moving forward. 🤔 While local models feel slow in executing tasks, it’s really just a reflection of our expectations after being spoiled.

I’ve been delving into context management recently for a course I’m developing, and it’s been a good reminder of how tricky this can be:

  • If an agent performs web searches, you likely haven’t checked those sources to ensure they are 1. accurate, 2. free from AI inaccuracies, and 3. from a source you would endorse.

  • Little (or major) inaccuracies can slip into the context and accumulate over time.

  • Staying at or below roughly 60% of a context window is likely where you want to be for optimal performance.

  • Use different sessions for context gathering; if there are multiple documents, create a summary file consolidating the information. And do make an effort to read or at least skim it! – I promise to try.

  • I’m skeptical of context windows spanning 1M tokens. There’s a fantastic post by Thariq from Anthropic on this topic. The context necessary for my tasks shouldn’t require perfect recall beyond ~150k tokens—that’s quite a bit of information. Only when 1M context windows become standard will models effectively forget irrelevant information and assist in cleaning up context pollution along the way!
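
To make the "around 60%" rule of thumb from the list above concrete, here is a minimal sketch of a context budget tracker. Everything in it is an assumption for illustration: the `ContextBudget` class, the `estimate_tokens` helper, and the 4-characters-per-token estimate are hypothetical, not part of any agent framework or model API. A real setup would use the model's own tokenizer.

```python
from dataclasses import dataclass, field

# Assumption: ~4 characters per token is a crude heuristic, not a real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN


@dataclass
class ContextBudget:
    """Hypothetical helper that refuses new context past a target fill level."""

    window_tokens: int          # total context window of the model
    target_percent: int = 60    # the "around 60%" rule of thumb
    used_tokens: int = 0
    sources: list[str] = field(default_factory=list)

    @property
    def limit(self) -> int:
        # Integer arithmetic avoids floating-point surprises.
        return self.window_tokens * self.target_percent // 100

    def add(self, name: str, text: str) -> bool:
        """Add a document if it fits under the limit; return whether it was added."""
        cost = estimate_tokens(text)
        if self.used_tokens + cost > self.limit:
            return False  # caller should summarize or start a fresh session
        self.used_tokens += cost
        self.sources.append(name)
        return True


budget = ContextBudget(window_tokens=1000)
added_notes = budget.add("notes.md", "x" * 2000)     # about 500 tokens, fits
added_search = budget.add("web-search", "x" * 800)   # about 200 more, over the limit
```

When `add` returns `False`, that is the cue to do what the list suggests: gather context in a separate session and bring back a summary file instead of the raw material.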

Anyway, I’m off to the gate! This introduction was a bit different, so I’d love to hear your feedback. I want to share more as I learn and explore deeper.

Ben’s Bites is sponsored by Attio, the AI CRM.

Honestly, no one really gets thrilled about a CRM. But after trying Attio, it’s a different story. It connects with Claude Code and n8n through its MCP server, seamlessly integrating my customer data and applications. And there’s more, such as flagging churn risks and converting customer feedback into Linear projects. Give it a try!

  • Claude Code’s desktop app has been redesigned. A number of CLI-only features are now available in the desktop app, including split windows for multitasking. It’s a significant upgrade, but there’s still room for improvement: it struggles to recognize some CLI sessions and lacks an intuitive way to open and edit files.

  • New models have emerged – GPT-5.4-Cyber from OpenAI, specially calibrated for cybersecurity, now available to a limited set of trusted partners. Additionally, Gemini 3.1 Flash TTS from Google promises enhanced voice quality, audio tags for tone and pacing control, and support for 70 languages.

  • Routines in Claude Code are currently in research preview. You can set up a prompt, a repository, and your connectors once, then execute on a schedule (or via API/GitHub trigger). This runs on Anthropic’s infrastructure without needing your laptop open, functioning as extended cron jobs. OpenClaw refers to these as heartbeats.

  • With the recent update to OpenAI’s Agents SDK, you can now run Codex-style agents in production without building the entire harness from scratch. Sandboxed execution, computer use, skills, memory, and compaction come pre-configured.

  • Many RAG systems return incorrect answers with overconfidence. Gauntlet’s free Night School will cover how production AI engineers tackle this — including setup, evaluation, and feedback loops. Join us on Wednesday, April 22. Register for free!*

  • Skills in Chrome allow you to save prompts as reusable one-click workflows applicable to any web page you’re visiting.

  • Cursor has been upgraded to support interactive canvases, offering dashboards and custom interfaces beyond plain text.

  • Resend has launched a new email editor featuring BYOA (bring your own agent). Alongside a built-in LLM, you can integrate your own setup into the editor.

  • Sparkle v4 by Every enables AI to organize your filesystem the way you would.

  • Daniel directed an agent at five years of home-building emails (511 events, 690 documents, 170 finance records), resulting in a comprehensive project timeline for around $500 in Opus tokens.

  • Impeccable v2 enhances coding agents with a CLI scanner (functional without an LLM), a Chrome extension, and a /shape command that conducts a design interview before generating code.

  • “Using Claude Code” is a guide to managing sessions, context compaction, and the 1M context window.

  • A 30-minute tutorial on building software using agents in Cursor is available for viewing.

  • Lindy AI’s founder mentioned that GLM 5.1 will likely become their default model for most use cases, replacing closed-source models. That would significantly reduce their inference costs, which are currently higher than payroll.

  • OpenRouter now provides video generation models via a universal API applicable across all video models.

  • Copilot in Word can now track changes and add comments to documents.

  • Windsurf 2.0 allows you to manage all your agents from a single platform and delegate tasks to the cloud with Devin.

  • Gradient Bang is an enjoyable multiplayer game featuring subagents in space, developed using Pipecat, Supabase, and open-source technology.


* sponsors who make this newsletter possible 🙂
Interested in partnering with us for the upcoming quarter?
Email us at shanice@bensbites.com or k@bensbites.com
