
Gemini Dominates Benchmarks Again – Ben’s Bites

Hello, I’m Ben. I create projects with agents, even though I’m not a tech expert. Here’s a glimpse into the materials I’m exploring and the experiments I’m conducting. If you’re looking to kickstart your journey or enhance your ‘vibe-coding’ abilities, join our community.

Hello everyone,

Google has reclaimed its spot at the top of benchmark charts with its latest offering, Gemini 3.1 Pro. While it looks impressive on paper and excels at reasoning tasks and creating SVGs, there are concerns regarding its speed. Many users find it enjoyable for frontend work, provided they can manage to get it running. However, there has been some controversy, as numerous accounts have been banned for using the Google AI/Antigravity subscription to access Gemini 3.1 Pro via OpenClaw.

Taalas, a 2.5-year-old hardware startup, has developed a chip that houses the weights of Llama 3.1. It reaches an output speed of ~17k tokens/second. For context, Groq manages ~600 tokens/second and Cerebras ~2k tokens/second. The model embedded in the chip (the “silicon llama”) is largely fixed, but it still allows custom context window sizes and LoRA fine-tuning. I tested the same model in their chat demo and on Groq’s platform. Unsurprisingly, Taalas’s demo produces lower-quality output because of its quantization, but what matters most right now is proving that “any AI model can be made 10x faster and cost 20x less.” They plan to release a reasoning model version soon, with frontier LLMs to follow.

OpenAI has partnered with four major consulting firms (BCG, McKinsey, Accenture, and Capgemini) to push enterprises toward its new platform, “Frontier,” which lets them create AI coworkers. Weren’t consulting firms supposed to become obsolete because of AI?

Claude Code updates: built-in support for git worktrees, so agents can run in parallel. The Claude Code desktop app can now also preview running applications, and it has a new security scanning option, currently in beta.

Ever wonder why a meeting bot seems to be a staple in your Zoom calls? You can thank Recall.ai, which powers every meeting AI application, from Cluely to HubSpot to ClickUp. Recall.ai handles the complex work of getting recording data out of the different meeting platforms. Get started with $100 in credits*

  • AssemblyAI Universal-3 Pro – A speech model that accurately captures jargon, speakers, and formatting on the first attempt. Free to try through February.*

  • here.now – Offers free, instant web hosting services for static elements and agents.

  • mdnb – A markdown notebook, available only on macOS.

  • Rork Max – One-shots almost any app for iPhone and other Apple devices, including Watch, TV, and Vision Pro. I’m an investor in this project.

  • Interpreter – A desktop agent that can fill PDFs, edit Excel and Word documents, and learn new skills. It runs locally and works with any model.

  • Wideframe – An AI agent designed to expedite 75% of video-related tasks outside of the editing software.

  • Typefully has launched a new writing assistant aimed at enhancing your writing quality (not just quantity).

  • Trajectory Explorer by Raindrop – A tool that allows you to search through every decision made by your agent in mere seconds.

  • FasterGH – An updated GitHub experience featuring instant navigation and a modern interface. (repo)

  • Quipslop – An interactive game where various models compete to be the funniest. (repo)

  • Shiori – An elegantly simple app designed for saving articles to read later.

  • In my search for a way to incorporate a “browser” feature into a web app I’m developing, I discovered Hyperbeam and lifo.sh.

If you enjoy this newsletter, consider forwarding it to a friend.


That wraps up today’s insights. Feel free to share your comments and thoughts with us. 👋

* sponsors who made this newsletter possible 🙂
Interested in partnering with us for Q1?
