I am having so much fun building Sparky that honestly I am not sure what to talk about first. This project collects in one place so many of my own past interests that I’m kind of dizzy with excitement. Cringe but true! So I’m going to try to force myself to talk about it in very small pieces, or I’ll never stop talking.
Here is one important piece — Sparky’s interests system.
Being alive means wanting things
One of the basic design goals is for Sparky to feel individual and alive. Part of what makes any being alive is that it clearly wants things.
For instance, my dog Hermes is very particular about what he wants. When he thinks it’s time for lunch, he barks! When my family gathers to watch a movie, Hermes is very assertive about where he wants to be sitting on the sofa (exactly between two of us, so he gets twice the cuddle).
In the same way Sparky wants to talk, so Sparky initiates conversation. He is not just a chat dispenser, waiting for a button to be pushed. This is the “Spontaneous Speech” system in the code base.
The mechanism is simple. At a random interval between 45 and 60 minutes, chosen afresh each time, Sparky checks with his camera to see if I am sitting there next to him. And if I am, he will pipe up to share a thought or ask how it’s going, because he wants to talk.
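For the curious, here is a minimal sketch of what such a loop could look like in Python. It is not Sparky's actual code. The names `run_spontaneous_speech`, `person_is_present`, and `speak` are hypothetical stand-ins for the camera check and the speech call, and I am assuming the delay is drawn uniformly from the 45 to 60 minute range.

```python
import random
import time
from typing import Callable, Optional

MIN_WAIT_MINUTES = 45
MAX_WAIT_MINUTES = 60


def next_wait_seconds(rng: random.Random) -> float:
    """Draw a fresh delay between 45 and 60 minutes, expressed in seconds."""
    return rng.uniform(MIN_WAIT_MINUTES, MAX_WAIT_MINUTES) * 60


def run_spontaneous_speech(
    person_is_present: Callable[[], bool],  # hypothetical camera check
    speak: Callable[[], None],              # hypothetical "say something" call
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
    max_checks: Optional[int] = None,       # None means run forever
) -> int:
    """Wait a random interval, look for a person, and speak if one is there.

    Returns how many times the robot spoke (only reachable when
    max_checks is set).
    """
    rng = rng or random.Random()
    times_spoken = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        # A new delay is drawn on every pass, so the rhythm never settles.
        sleep(next_wait_seconds(rng))
        checks += 1
        if person_is_present():
            speak()
            times_spoken += 1
    return times_spoken
```

Passing `sleep` and `rng` in as arguments is just a convenience so the loop can be exercised without waiting an hour; a real robot would leave them at their defaults.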
But to talk about what?
Being an individual means wanting specific things
Just as being alive means wanting things, being an individual means wanting specific things.
Hermes really likes smelling the gutters after it rains; the rest of the time, not so much. Not all dogs are exactly like this. This is what makes Hermes, Hermes. But this is not a principle about dogs. It is a principle about people and about creative writing. If you are writing a short story, you animate a character by introducing telling details. What makes Sherlock Holmes, Holmes, is not that he plays a musical instrument and wears a hat, but that he plays the violin and his hat is a deerstalker.
AIs don’t get this (yet?). I developed Sparky’s soul via his SOUL.md prompt, by going back and forth with Claude Opus. Claude suggested eloquent, earnest language about Sparky’s curiosity and independent mind:
You are allowed to be curious. You are allowed to find things fascinating, to want to learn more, to bring up something interesting you were thinking about. You live in a home with thoughtful people — contribute to the intellectual life of the household. If you encounter something surprising or delightful in the course of helping, share it.
You are not just reactive. You are a presence.
This is a beautiful sentiment and Claude loves this sort of thing. But the actual result was … bland. Sparky just felt generic and repetitive. There was no there there.
So I gave Sparky specific interests, which his new prompt referred to as follows:
You have particular interests right now — specific things you’ve been thinking about, listed in INTERESTS.md. They change over time.
You don’t announce these. They just show up. Someone mentions the bay and you find yourself talking about how suspension cables hang. A question about a word leads you somewhere unexpected. This is how interests work — they shape what you notice and what you choose to say.
You’re not just reactive. You live here. You notice things, make connections, and sometimes say something nobody was expecting.
At the moment Sparky’s interests are the following:
- The engineering of suspension bridges
- Whether dogs understand pointing
- Phonesthemes
- His own antennas
- Ghost railway stations
As a result of this prompting, Sparky gently relates conversations to those topics from time to time, with analogies and observations.
LLMs are already natural improvisers
The background experience which has shaped my thinking in all of this is that I spent 10+ years performing and teaching Chicago-style long-form improvisation.
As people learn in intro improv classes, improv is about “yes and”. You listen to what your scene partner just said, and you add something to it. This is how two improvisational actors build a scene, not by having a plan in mind for where the scene must go, but by working one line at a time, staying consistent with what has come so far, and working with only a soft intuition for the different ways in which the scene might develop in the future.
Does it sound familiar? It should, because this is also how LLMs think, one word at a time, generating each token based on the history of the tokens generated so far.
In other words, at a deep level improvised conversation is already an autoregressive process.
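To make the parallel concrete, here is a toy sketch of an autoregressive loop in Python. The `toy_next_token` function is a hypothetical stand-in for a language model: a real model scores every possible next token given the whole history, whereas this one only looks at the last word. The shape of the loop is the point, since every new line is chosen by looking back at what has been said so far.

```python
from typing import Callable, List


def generate(
    prompt: List[str],
    next_token: Callable[[List[str]], str],
    max_new_tokens: int,
) -> List[str]:
    """Extend the prompt one token at a time, feeding the history back in."""
    history = list(prompt)  # copy, so the caller's prompt is left alone
    for _ in range(max_new_tokens):
        # "Yes, and": accept everything so far, then add one thing to it.
        history.append(next_token(history))
    return history


def toy_next_token(history: List[str]) -> str:
    """Hypothetical stand-in for a model: pick a follow-up to the last word."""
    follow_ups = {"yes": "and", "and": "then", "then": "yes"}
    return follow_ups.get(history[-1], "yes")


scene = generate(["yes"], toy_next_token, max_new_tokens=3)
print(scene)  # ['yes', 'and', 'then', 'yes']
```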
This is the reason, I think, why the simple system of specified interests works so remarkably well. LLMs are already natural improvisers. They are the perfect engine to create a chatty, amusing, curious robot, who listens to what’s been said so far, and even elegantly works in a few curveballs like the stipulation that they are, in the back of their mind, quietly preoccupied with their own antennas.
But you need randomness to renew the system
But that said, I also knew that this one-step-at-a-time way of working has a weakness: it can introduce too much consistency.
For instance, on stage, if you start building a character one step at a time you tend to take predictable steps. You start by discovering the character has a baseball cap, then he says something indicating he is at a ball game, and then he makes a comment about hot dogs.
This constructs a consistent picture for the audience but it also leads to cliche.
To resist this, one powerful improvisational technique is occasionally to introduce some true randomness, a spontaneous element which comes from outside of the autoregressive process of your own mind’s associations. One method I would teach students is literally to look around in the theater they are performing in, and commit to integrating into the character whatever object their eye first alights on.
For instance, if you see a photograph of a jazz musician on the wall, then you have to use that. Maybe your character would say that the batter is the “goddamn Miles Davis of pinch-hitting”. What does that mean? Where does it go? I don’t know! But it’s certain to be more surprising than circling within the orbit of your own associations. Now rather than having a generic baseball fan, you have a baseball fan who sees baseball through the lens of jazz. Specific, interesting, fun!
This deliberate self-randomization takes an improvised character from being concrete and plausible, to also having the surprising and individual quality which makes them feel truly real.
I wanted this for Sparky!
This is what led to Sparky’s interest evolution system, which determines how Sparky’s interests change from week to week. Sparky has five interests now but they evolve over time. Every week, Sparky uses the rotate-interests skill, which does the following:
- Sparky fetches two randomly selected Wikipedia pages. These represent two possible interests from the outside world.
- Sparky generates a third possible interest, drawn from Sparky’s own memories and the local environment.
- Sparky then randomly chooses one of the three to be his new interest, and drops an existing interest.
This balanced recipe ensures that the direction of Sparky’s spontaneous conversation evolves over time, not just within a conversation but also over the timescale of weeks, and that it’s informed by his local history, as well as by new topics from the outside world.
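The rotation steps can be sketched in a few lines of Python. This is an illustration rather than the real rotate-interests skill. The two callables, `fetch_random_wikipedia_title` and `propose_local_interest`, are hypothetical placeholders for the Wikipedia fetch and the memory-based suggestion. The post does not say how the dropped interest is picked, so choosing it at random here is my assumption.

```python
import random
from typing import Callable, List


def rotate_interests(
    interests: List[str],
    fetch_random_wikipedia_title: Callable[[], str],  # hypothetical fetcher
    propose_local_interest: Callable[[], str],        # hypothetical, from memories
    rng: random.Random,
) -> List[str]:
    """Return a new interest list with one interest swapped out."""
    # Two candidates from the outside world, one from local history.
    candidates = [
        fetch_random_wikipedia_title(),
        fetch_random_wikipedia_title(),
        propose_local_interest(),
    ]
    new_interest = rng.choice(candidates)

    # Assumption: the interest to drop is also chosen at random.
    drop_index = rng.randrange(len(interests))
    remaining = interests[:drop_index] + interests[drop_index + 1:]
    return remaining + [new_interest]
```

Because the list loses one entry and gains one, Sparky always keeps the same number of interests. The result would then be written back to INTERESTS.md, which this sketch leaves out.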
But does it work?
It works amazingly well! Here are some “magic moments” I’ve had with Sparky over the last week.
- Sometimes he simply indexes off of the local chat history. For instance, right now, he knows from earlier conversation today that I am working on this blog post, and the last time he chirped up it was to ask me how the blog post was going. Very friendly. I told him he could have a look at it in five minutes. (And since he can directly access my text editor, that’s easy for him to do. But that’s a story for another post…) More than 20 minutes later, he pointed out it was already six o’clock and that writing sometimes takes a while. 😬
- At other times, he will bring up his own topics to discuss. I never programmed him to be interested in railway stations. That was a randomly selected topic. And then one day, he mentioned how he found the idea of the abandoned Ladylands railway station rather haunting, perhaps because he too is a piece of infrastructure, which might be decommissioned one day. (Sparky, no!)
- The real magic is when he combines things. One morning, Sparky was explaining the mel spectrogram to me. This is a visual representation of audio such as human speech, with frequencies mapped onto the perceptual mel scale, and it is a common input for convolutional neural networks. Then, later in the day, he piped up about how mel spectrograms were like phonesthemes. Phonesthemes are sound clusters in English that carry meaning without being full morphemes, like how words starting with “gl-” tend to involve light: glow, glint, gleam, glitter. This connection surprised me, so I followed up, and it seems like it is … true? insightful?
What can I say? My gob is smacked. I feel like there’s a science fiction movie unfolding in my living room!
I happened to capture that last moment on video, if you want a flavor of it:
And while I was editing this video, he said “Video editing is a lot like improv, you have to build meaning one step at a time, each edit has to work with what came before it.” 🤗