Will It Mythos?

OK, so Mythos finds really challenging security bugs, right? That’s why it’s cordoned off from the hoi polloi, to protect the world from such a powerful finder of exploits. I am skeptical of the reasons given publicly; I suspect it’s really just so much more expensive to operate than their current models that they don’t […]
GLM-5.2 is a step change for open agents

Housekeeping: Following my “State of the blog” post last week, noting a slight increase in paid features, it’s a good time to remind folks that I offer group subscriptions with larger discounts proportional to the number of seats. I also released a new paper today on open RL recipes for terminal agents, read more here. […]
Ultralytics YOLO26: Unified Real-Time End-to-End Vision Models

Abstract: Real-time vision demands models that are accurate, efficient, and simple to deploy across diverse hardware. The YOLO family has become widely deployed for this reason, yet most YOLO detectors still rely on non-maximum suppression at inference, carry heavy detection heads due to Distribution Focal Loss, require long training schedules, and can leave the smallest objects […]
VibeThinker: 3B param model that beats Opus 4.5 on reasoning with novel SFT+GRPO

Abstract: This technical report introduces VibeThinker-3B, a compact dense model with 3B parameters developed to investigate how far verifiable reasoning can be pushed within a strictly small-model regime. Building upon the Spectrum-to-Signal post-training paradigm, we systematically enhance the model through an optimized pipeline that includes curriculum-based supervised fine-tuning, multi-domain reinforcement learning, and offline self-distillation. Experimental evaluations […]
An Introduction to YOLO26

YOLO26 is an end-to-end object detection and multi-task model family supporting detection, instance segmentation, pose estimation, oriented object detection, and image classification across five size variants from Nano to Extra Large. Released in January 2026, it removes Non-Maximum Suppression for lower latency and drops the Distribution Focal Loss module for better compatibility with edge and […]
In Praise of Memcached

If you happen to find yourself in a sysadmin position, or a position where you just so happen to maintain someone’s infrastructure, chances are that at some point in time the topic “we need a cache” comes up. You think for a moment and reach out for Redis, because you’re used to it, it’s fully […]
Polymarket’s viral videos showed people winning big, but the bets were fake

Polymarket is seeking the CFTC’s permission to bring its main exchange back to the US, but also offers a more limited, US-regulated version of its trading service through a mobile app. Polymarket launched the app last year after acquiring QCX, a firm that is licensed by the CFTC and now operates under the name Polymarket […]
Why eval startups fail (2025)

May 8th, 2025. Why are there so few independent eval startups? Whenever there’s a new AI trend, like agents, or voice, or voice agents, developers are faced with a flurry of options, and a subset of them are convinced that there’s a business opportunity in identifying the best models and selling that knowledge to other […]
Bipartite Matching Is in NC

Since I’m in a good mood today—at a beautiful science camp with my kids, high in the mountains near Big Bear Lake in California—I thought I’d blog about something positive. Last week, five authors (Chatterjee, Ghosh, Gurjar, Raj, and Thierauf) posted a major paper to the Electronic Colloquium on Computational Complexity, which shows (or anyway, credibly […]
The anxiety of the perfect loaf: the illusion of culinary precision

One of my favorite recipes for challah does not tell you how much flour to use. To a modern amateur baker, this omission borders on heresy. In an era where we measure yeast to the gram, a recipe that merely offers a rough estimate and casually instructs you to “add flour until the dough feels tacky” […]