
ClawPod: OpenClaw on HomePod

Tech Twitter has gone nuts about OpenClaw, the agent orchestration and personal assistant tool formerly known as ClawdBot.

One of OpenClaw’s superpowers is simply that it brings an AI agent to where you already are. You can chat with a powerful AI like Claude, about your own data, using a familiar mobile messaging app like iMessage, WhatsApp, or Discord. So you can be out walking your dog and fire off a quick message to launch a Kimi-2.5 research subagent, pull a file off your laptop, or summarize your emails. It’s fun!

But you know where else you already are? In your own home! This is where AI was supposed to be delivered by the original smart speaker products like Amazon’s Echo or Apple’s HomePod, which let you talk to Siri any time you want.

I own HomePods, but frankly Siri needs a brain transplant. From day to day, you can talk to Claude about software, science, the nuances of words and languages, and the philosophical conundrum of its own nature. With OpenClaw, you can also talk about your own work and calendar. Poor Siri, meanwhile, I mostly ask to turn on light switches.

So over the weekend I hacked up ClawPod.1 ClawPod is a tiny HomePod-to-OpenClaw bridge. With it, you can yell “Hey Siri” and ask Siri to connect you to your Claude-backed OpenClaw agent, a bit like asking Homer Simpson to put Einstein on the line.

This post will show what that looks like, and then explain briefly how it works. (Code and instructions on GitHub.)

What you get

So what does this setup give you, and how well does it work?

Here’s a peek:

As you can see, it works. But alas I wouldn’t say it Just Works™.

It’s a reasonable choice for one-shot requests, especially to look up personal information or get quick general-knowledge help. But it’s not comfortable enough that you’re likely to pursue extended interactions.

In particular, the flow has these limitations, which can be seen above:

  1. Siri’s Voice. Your OpenClaw agent will speak with whatever voice Siri uses. You can’t use a custom voice from ElevenLabs or OpenAI and route that audio to the HomePod.

  2. Indirect Invocation. You cannot connect to your agent directly. You need to ask Siri to connect you. With luck, you can say something like “Hey Siri, connect me to Dobby”. But you might be obliged to use a more literal phrase, like “Hey Siri, run the Lobster Time shortcut”.

  3. High Latency. Because you’re not using a native, low-latency audio model, you need to wait quite a bit for LLM text generation and text-to-speech conversion.

  4. Handshaking. Because you’re going through Siri, you probably want handshaking phrases to signal this encapsulation. Hence, the “Go on” cues in the video.

  5. Flakiness. Finally, it can be finicky to set up Siri to work reliably. If your agent is called “Dobby”, and you have a friend named “Darby”, well, you might want to rename your agent to invoke it reliably.

All of these drawbacks stem from how it works: by relaying through a Shortcut which runs on your iPhone.

How it works

Technically, the HomePod acts as a proxy for the ClawPod iOS Shortcut running on your iPhone, via the iOS Personal Content feature. That Shortcut connects over HTTP to the ClawPod server, which is in turn a voice proxy for the OpenClaw gateway, reached through the openclaw command-line client.

You
  │ (voice)
  ▼
HomePod / Siri
  │ (invokes)
  ▼
iOS Shortcut (runs on your iPhone/iPad)
  │ (HTTP POST /chat)
  ▼
ClawPod server (Python/FastAPI)
  │ (shells out)
  ▼
openclaw agent (CLI → Gateway)

OpenClaw agent reply (text)
  │ (HTTP response)
  ▼
iOS Shortcut
  │ (Siri TTS)
  ▼
HomePod speaks

Here’s the sequence:

  1. You speak out loud to the HomePod, invoking Siri.

  2. The HomePod recognizes it’s you and that you’re trying to invoke an iOS Shortcut.

  3. The HomePod then runs your iOS Shortcut on your iOS device.

    You can do this without unlocking your device and without needing interactive approval from your device, using an iOS feature called Personal Content. As far as I can tell, this is the closest you can get to running your own code on an Apple HomePod. In short, you don’t really run it on the HomePod. The HomePod proxies through the iPhone.

  4. The Shortcut on your iOS device then communicates over HTTP with the custom ClawPod server, which you run on a machine you control. I run it on a Linux server on my home’s local subnet, but you could also run it on a remote server if you authenticate it properly.

  5. Finally, the ClawPod server relays your message to OpenClaw itself using the openclaw command line tool.

The reply returns in the opposite direction. The openclaw command-line tool prints the reply from your agent. The ClawPod server captures that output and returns it to the iOS Shortcut as the response to its HTTP request. The Shortcut then fires a Shortcut action to speak the reply out loud. If the Shortcut was triggered by the HomePod, it’s the HomePod that speaks.
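
To make that middle hop concrete, here is a minimal sketch of what a ClawPod-style /chat endpoint could look like. This is an illustration, not the repo’s actual code: the request shape, the exact openclaw command-line arguments, and the timeout are all assumptions, and the real server adds configuration and error handling.

import subprocess

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    message: str  # the transcribed utterance sent by the iOS Shortcut

@app.post("/chat")
def chat(req: ChatRequest) -> dict:
    # Relay the message to the OpenClaw gateway by shelling out to the
    # openclaw CLI; the exact arguments here are a placeholder.
    result = subprocess.run(
        ["openclaw", "agent", req.message],
        capture_output=True,
        text=True,
        timeout=120,  # LLM generation is slow; see the latency caveat above
    )
    # Whatever the CLI prints on stdout becomes the reply that the Shortcut
    # speaks via Siri TTS on the HomePod.
    return {"reply": result.stdout.strip()}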

Yeah, it’s a Rube Goldberg contraption, but so is the internet. 🤷

By the way, for me, the least reliable part by far is step 2: getting Siri on the HomePod to consistently invoke an iOS Shortcut by voice. It can be finicky, especially if you have a lot of apps, contacts, or other Shortcuts crowding the phonetic space of possible utterances. If you’re thinking of setting up ClawPod, I recommend first validating that you can configure your HomePod to reliably invoke a Shortcut of your choice. Your mileage may vary.

Setting it up

There are two main parts to set this up, the ClawPod iOS Shortcut on your iPhone, and the ClawPod server, which listens for requests from the iOS Shortcut and relays them to the OpenClaw agent.

Configuring the iOS Shortcut is pretty simple, but the necessary iOS and HomePod settings are a bit obscure. To let the HomePod run your Shortcuts, you need to enable a little-known feature called Personal Content. Personal Content also works in tandem with voice recognition: the HomePod recognizes which user is speaking, so if you configure each user’s Shortcut appropriately, the agent will know who is speaking. Apple documents the Personal Content configuration flow in a couple of articles: Set up Siri and invite others to use HomePod (Apple Support) and Set up voice recognition on HomePod or HomePod mini (Apple Support). But Apple’s tech here is so wobbly (as of iOS 26.2.1) that I recommend first verifying that Shortcut invocation works for you, before installing any other components.

To run the ClawPod server, you configure it with env vars and run it like any other Python script:

# from your OpenClaw box (or any machine which can run `openclaw agent` and reach the gateway)
cd ~/gits/clawpod
# run the server (uv resolves deps from the inline script metadata)
uv run clawpod_server.py
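
Before wiring up the Shortcut, you can smoke-test the server with a plain HTTP POST that mimics what the Shortcut sends. The address, port, and JSON field below are illustrative assumptions; the actual values for your setup are documented in README.md and SHORTCUT.md.

# smoke_test.py — send one message to the ClawPod server and print the reply
import json
import urllib.request

# Replace the host and port with wherever your ClawPod server is listening.
req = urllib.request.Request(
    "http://192.168.1.50:8000/chat",
    data=json.dumps({"message": "What's on my calendar today?"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=120) as resp:
    print(json.loads(resp.read())["reply"])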

I provide code and fuller setup instructions in the GitHub repo. Shortcut build/install and HomePod settings are in SHORTCUT.md. Server config, env vars, and testing notes are in README.md.

Conclusion

Is this useful? Too soon to say: I just built it! But I can testify I had fun doing it, which is its own use.

But for what it’s worth, if you use an Apple HomePod, this is the only setup I’ve seen that delivers a powerful AI through that device.

And if you configure your OpenClaw agent appropriately, it’s good for more than trivia questions. It can answer questions about your emails and calendar, the files on your server, the files on your laptop, and so on. And the AI is not “trapped” in just the HomePod. The HomePod becomes just one more interaction surface, like WhatsApp or the Dashboard web interface, for accessing the same agent.

My opinion: this is obviously the future. It just hasn’t fully arrived yet. You can already see glimpses of it, with the polished voice interfaces in the ChatGPT and Grok apps, but no one has yet put together all the pieces.

This will come. We’re going to have AI with strong world knowledge and agentic reasoning, tool calling for flexible data access and problem solving, embedded in a gateway architecture which can span multiple hosts flexibly, and which can extend to multiple interaction surfaces, including voice interaction. I can’t wait. Let’s build it! :)

Footnotes

  1. In fact, I vibed up ClawPod by adapting another project of mine, Homechat. Homechat is also an AI voice proxy server for the HomePod but it talks directly to an LLM provider without using OpenClaw. Homechat dates from what I think of now as the Impossible Feats Era of vibecoding, where every morning you could read on twitter that vibecoding was impossible and then in the evening make a working piece of software by doing it. But the cat is out of the bag now.

