Using GPT-5.4: A Ben’s Bites Guide

Hello, I’m Ben. While I may not be a technical expert, I enjoy experimenting and creating with agents. Here you’ll find everything I’m exploring. If you’re looking to start building or enhance your ‘vibe-coding’ abilities, come join our community.

Hi everyone,

The ‘Become a Builder’ workshop we held last week had its ups and downs 😊 (Codex experienced some issues). The recording is now available, and I’m currently preparing a comprehensive guide to cover everything properly, including the topics we couldn’t touch on during the session. I’m about halfway finished with that and hope to share it with you this week.

In addition, Factory is hosting a hackathon this Thursday, where every participant will receive 200 million tokens, and there’s a Mac Mini up for grabs!

OpenAI has launched GPT-5.4, available in “thinking” and “pro” variants. It brings the coding capabilities of GPT-5.3-Codex into the main model line, improves vision and tool-use efficiency, and now has a 1-million-token context window. These gains make it better suited to computer use (see demo) and financial work. It also costs more than GPT-5.2: $2.50/$15 versus $1.75/$14 per million input/output tokens. OpenAI plans to keep instant models (GPT-5.3 Instant) separate from reasoning models going forward.
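To see what the price change means for a given workload, here is a minimal Python sketch that applies the per-million-token rates quoted above. The model labels and the 200k-input/20k-output request size are illustrative assumptions, not official model IDs or typical usage.

```python
# Per-million-token prices quoted above, in USD: (input, output).
# The dictionary keys are illustrative labels, not necessarily API model IDs.
PRICES = {
    "gpt-5.2": (1.75, 14.00),
    "gpt-5.4": (2.50, 15.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-million-token rates."""
    input_rate, output_rate = PRICES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 200k input tokens (e.g. a large codebase) and 20k output tokens.
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 200_000, 20_000):.2f}")
```

For this input-heavy request the sketch gives $0.63 versus $0.80, about 27% more. The input rate rose roughly 43% while the output rate rose only about 7%, so workloads that feed in a lot of context feel the increase most.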

Additional updates from OpenAI:

ChatGPT for Excel – A sidebar extension allowing you to use ChatGPT directly from your workbooks.
Codex Security – An AI application security agent that evolved from Project Aardvark; free for one month for Enterprise customers.
Codex for Open Source – A program offering six months of ChatGPT Pro with Codex, conditional access to Codex Security, and API credits for open-source maintainers.

OpenAI is also acquiring Promptfoo, an open-source AI security testing tool popular with Fortune 500 companies. Promptfoo will remain open source.

Claude Code has introduced a new built-in skill, /loop, which lets you schedule recurring tasks that run for up to three days within a single session. You can also now schedule tasks in Claude Code Desktop; they run on schedule as long as your computer stays awake. Anthropic has also launched a community ambassadors program for Claude.
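The three-day limit caps how many times a recurring task can fire. The sketch below is a hypothetical illustration of that arithmetic, not how Claude Code implements /loop. It assumes the first run happens immediately and that no run is scheduled at or after the three-day mark; `scheduled_runs` and `MAX_LIFETIME` are made-up names.

```python
from datetime import datetime, timedelta

# Assumed cap, taken from the three-day limit described above (hypothetical name).
MAX_LIFETIME = timedelta(days=3)


def scheduled_runs(start: datetime, interval: timedelta) -> list:
    """List run times from `start`, every `interval`, stopping before the cap expires."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    runs = []
    t = start
    # Assumption: a run exactly at the three-day mark is not scheduled.
    while t - start < MAX_LIFETIME:
        runs.append(t)
        t += interval
    return runs


if __name__ == "__main__":
    runs = scheduled_runs(datetime(2026, 3, 9, 9, 0), timedelta(hours=6))
    print(len(runs), runs[-1])
```

Under these assumptions, a task that runs every six hours fires 12 times before it expires, and a daily task fires three times.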

For businesses, Anthropic has released Code Review by Claude and Claude Marketplace. The review tool uses a team of agents to analyze every pull request (PR), at an average cost of around $15–25 per review. The marketplace helps businesses consolidate their AI spending by letting them apply their existing Anthropic commitments to other AI applications such as GitLab, Harvey, and Replit.
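Because Code Review is priced per review, cost scales with PR volume. Here is a rough Python estimate using the $15–25 average quoted above. The 200-PRs-per-month figure is a hypothetical example, and actual per-review costs will vary around that average.

```python
def monthly_review_cost(prs_per_month: int, low: float = 15.0, high: float = 25.0) -> tuple:
    """Return a (low, high) USD estimate for reviewing every PR at the quoted average."""
    if prs_per_month < 0:
        raise ValueError("prs_per_month must be non-negative")
    return prs_per_month * low, prs_per_month * high


if __name__ == "__main__":
    low, high = monthly_review_cost(200)  # hypothetical team merging ~200 PRs a month
    print(f"${low:,.0f} to ${high:,.0f} per month")
```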

Karpathy has released autoresearch, which lets agents autonomously iterate on LLM training code. Over two days on 8xH100, it produced 20 substantial improvements and an 11% speedup. The open-source project is 630 lines of code and runs on a single GPU. I expect to see much more of this pattern this year, with agents generating and testing their own ideas.
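The core idea of agents iterating on their own experiments can be pictured as a keep-if-better loop. The Python sketch below is a generic greedy hill-climbing loop, not autoresearch's actual code. `improve_loop` is a hypothetical name, and `propose` and `evaluate` are stand-ins for an agent editing training code and a short training run scoring the edit (lower is better). The toy learning-rate objective exists only so the sketch runs on its own.

```python
import random
from typing import Callable, Dict, Tuple

Config = Dict[str, float]


def improve_loop(
    baseline: Config,
    propose: Callable[[Config], Config],
    evaluate: Callable[[Config], float],
    rounds: int,
) -> Tuple[Config, float, int]:
    """Greedy keep-if-better loop: a candidate replaces the best config only if it scores lower."""
    best, best_score = baseline, evaluate(baseline)
    accepted = 0
    for _ in range(rounds):
        candidate = propose(best)  # stand-in for an agent editing the training code
        score = evaluate(candidate)  # stand-in for a short training run; lower is better
        if score < best_score:
            best, best_score = candidate, score
            accepted += 1
        # Otherwise the candidate is discarded and the next round starts from `best` again.
    return best, best_score, accepted


if __name__ == "__main__":
    rng = random.Random(0)

    # Toy stand-ins: the "agent" rescales a learning rate, and the "training run"
    # is a quadratic whose minimum sits at lr = 3e-3.
    def propose(cfg: Config) -> Config:
        return {"lr": cfg["lr"] * rng.choice([0.5, 0.8, 1.25, 2.0])}

    def evaluate(cfg: Config) -> float:
        return (cfg["lr"] - 3e-3) ** 2

    best, score, accepted = improve_loop({"lr": 1e-4}, propose, evaluate, rounds=50)
    print(f"best lr={best['lr']:.2e}, score={score:.2e}, accepted {accepted} of 50 proposals")
```

Rejecting every change that doesn't beat the current best keeps the loop from drifting. The trade-off is that it can stall in a local optimum, which is one reason a real agent would propose more varied edits than this toy does.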

Yann LeCun, Meta’s former Chief AI Scientist, and a group of fellow researchers have raised over $1 billion at a $3.5 billion valuation for their startup, Advanced Machine Intelligence (AMI Labs). With offices in Paris, New York, Montreal, and Singapore, the company is building world models and pursuing research that goes beyond traditional LLMs.

Streamline your approach and enhance your sales efforts. Remember when selling meant genuinely connecting with people? Before the hassle of switching tabs and constant syncing errors? Reevo consolidates everything into one platform: prospecting, calls, pipeline, and reporting happen all in a single tab, from prospect to close. reevo.ai*

Cursor Automations – Create continuous agents that can run on a schedule or activate based on events like Slack messages.
T3 Code – A desktop app that runs the Codex CLI as an alternative to the Codex app; it’s user-friendly but still feels like alpha software.
Handles by here.now – Personalized subdomains for everything you publish with your agents.
Copilot Cowork – Delegate tasks to agents that can operate across your Microsoft 365 applications.
Air by JetBrains – A development environment tailored for collaboration with agents from various vendors.
Clawcard – A secure inbox, phone number, and credit card for your agents, with safeguards against misuse.
21st Agents – Infrastructure for integrating agents into your app, covering runtime, sandboxing, billing, UI, streaming, and more. Also check out Terminal Use (very similar, YC W26).
Code Review Tools:
- Warden by Sentry – A toolkit for reviewing every pull request in your codebase.
- Vet by Imbue – A fast, local code review tool that checks whether agents followed your specifications.
- OpenReview – An open-source, self-hosted AI code review assistant powered by the Vercel AI Cloud.

Notchi – An adorable Tamagotchi that lives in your notch and reacts emotionally to your interactions with Claude.
Context Hub – An open tool that provides your coding agents with the most current API documentation they need. (read more)
Agent Safehouse – A macOS-native sandboxing solution for local agents.
Flue by Astro – A framework designed for building sandboxed AI agents and CI workflows.
Slacrawl – Retrieve your Slack data locally with or without API keys.
Claude-replay – Convert Claude Code session transcripts into self-contained, embeddable HTML replays.
Executor – A local-first execution environment tailored for AI agents. (read more)
Agent-coworker – An agent backend accessible via a terminal or desktop application.
Agent-kanban – A VS Code extension offering an integrated Kanban board for managing tasks assigned to coding agents.
Fractals – Recursively breaks tasks into subtasks, has agents complete them, and orchestrates the entire process.
Uithub – Now open source; converts GitHub repositories into LLM-ready context.
shadcn/cli v4 – Now includes skills, presets, dry-run capability, monorepo support, and more.
An experimental UI for forking conversations to explore side topics without disrupting the main thread. (read more)
An agent skill that helps you write smarter, simpler, and more modern SwiftUI code.
Connecting OpenClaw and Codex apps via ACP for seamless interaction.
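Fractals’ own interface isn’t covered here, so the Python sketch below only illustrates the general recursive-decomposition pattern its entry describes: split a task until the pieces are small, run each leaf, and merge results upward. `Task`, `solve`, `split_on_and`, and `fake_execute` are hypothetical names. The two helpers stand in for agent calls, and `max_depth` is an assumed guard against endless splitting.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Task:
    description: str
    subtasks: List["Task"] = field(default_factory=list)
    result: Optional[str] = None


def solve(
    task: Task,
    split: Callable[[str], List[str]],
    execute: Callable[[str], str],
    depth: int = 0,
    max_depth: int = 3,
) -> str:
    """Recursively split a task, execute the leaves, and merge results bottom-up."""
    parts = split(task.description) if depth < max_depth else []
    if not parts:
        # Leaf: small enough (or deep enough) to hand to a single agent.
        task.result = execute(task.description)
        return task.result
    task.subtasks = [Task(p) for p in parts]
    child_results = [solve(t, split, execute, depth + 1, max_depth) for t in task.subtasks]
    # Naive merge step; a real orchestrator might ask an agent to combine these.
    task.result = " | ".join(child_results)
    return task.result


# Hypothetical stand-ins for agent calls, so the sketch runs on its own.
def split_on_and(text: str) -> List[str]:
    parts = [p.strip() for p in text.split(" and ")]
    return parts if len(parts) > 1 else []


def fake_execute(text: str) -> str:
    return f"done: {text}"


if __name__ == "__main__":
    root = Task("write tests and fix the bug and update the docs")
    print(solve(root, split_on_and, fake_execute))
```

Keeping the whole tree in `Task` objects, rather than returning only the merged string, lets an orchestrator inspect or retry individual subtasks.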

Thank you for reading this newsletter! Feel free to share it with a friend.

That’s all for today! Don’t hesitate to share your thoughts or comments. 👋

* sponsors who support this newsletter 🙂

Interested in partnering with us for March? Limited slots are still available.