Are local models strong enough for chat?
OpenClaw lets you switch models in the middle of a session. This enables one of my favorite quick-and-dirty evals, which I call the “brain transplant”: start talking to a frontier model like Sonnet-4.6, switch to a local model like Nemotron 3 Super, and see if you can spot the difference.
When you do this, it turns out local models are both stronger and weaker than you’d expect. But are they strong enough for chat? Specifically, for voice chat?
One example brain transplant shows that they’re strong enough to sound smart, but maybe too weak to follow the instructions they need to sound natural.