On the Uselessness of AI Assistants

There’s been some much-needed provocation of the tech world recently, popularised (but not spearheaded) by MKBHD.

There’s been a lot of back and forth on both the Humane ai pin* and the Rabbit R1. About how they’re essentially just what you’ve got on your phone already, but worse in every way. This isn’t incorrect, and it showcases what is endemic in Big Tech™: a failure of imagination.

* I’m ever so slightly pissed off at this company name, because I had the same one in Chromed: Consensus and subsequently Chromed: Restore. Except instead of lower-casing the product name, I upper-cased the last letter: HumanE. Now that’s innovation.

That might seem ironic – Big Tech™ is supposed to be innovating, right? Sure, but innovation is expensive and risky. Very few companies do it, let alone well. Microsoft prefer to push partner innovations. Apple innovate, but years late. Tesla pushes the cost of innovation (like training data) onto customers. Etc.

The AI assistants we’re seeing are nothing different. Two different companies wanted to be in a hot market, but the conservatism driving investment nervousness means (again, ironically) you can’t be too bold. But hey, it’s sure easy to throw stones from the sidelines. So, Richard, what would you have done to make these things better?

If you have a look at yesterday’s post, you’ll see a conversation between me and an AI assistant. This is pretty phenomenal, not least because it’s an 8B parameter model running on local hardware. The product/feature gap is that it’s not doing this itself; it’s not general purpose AI, walking around with thoughts of its own. It still needs a human (me) to prompt it.

If you couple multiple technologies, though, you can have something driving something else. This isn’t a radical idea; it’s essentially what Fooocus does. This is a Stable Diffusion art generator that uses GPT2 for prompt expansion. Basically, you say, “Bro, draw me an elf,” and it uses a generative large language model (GPT2) to drive the generative art model (Stable Diffusion) to make better art. A prompt like this:

an elf

Fooocus’s GPT2 expands to this:

[Prompt Expansion] an elf, intricate, highly detailed, lush, dramatic light, sharp focus, animated, strong, stunning, noble, fair, inspired, extremely handsome, background composed, iconic, fine, epic composition, amazing detail, professional, best, creative, positive, glowing, vibrant, pure, beautiful, symmetry, illuminated, perfect, coherent, cute, very inspirational, color

Which gives you this instead of that (Fooocus left, AUTOMATIC1111 right):

Fooocus vs. AUTOMATIC1111 with the prompt, “An elf,” JuggernautXL X model.
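The two-stage pattern is easy to sketch. Here’s a toy Python version of the idea: one generative model expands a terse prompt, and the expanded prompt then drives the image generator. The style-word list and the function bodies are illustrative stand-ins, not Fooocus’s actual internals (which really do run a fine-tuned GPT2 and a diffusion model).

```python
# Toy sketch of Fooocus-style prompt expansion: a language model
# enriches the prompt before it reaches the art model. These terms
# are a hand-picked stand-in for what GPT2 would generate.
QUALITY_TERMS = [
    "intricate", "highly detailed", "dramatic light", "sharp focus",
    "epic composition", "vibrant", "professional",
]

def expand_prompt(prompt: str, terms=QUALITY_TERMS) -> str:
    """Stand-in for the GPT2 expansion step: append style descriptors."""
    return ", ".join([prompt] + terms)

def generate_image(expanded_prompt: str) -> str:
    """Stand-in for the Stable Diffusion call; returns a description."""
    return f"<image rendered from: {expanded_prompt}>"

def draw(prompt: str) -> str:
    # The key idea: one generative model's output feeds the next.
    return generate_image(expand_prompt(prompt))

print(draw("an elf"))
```

The point isn’t the word list; it’s the plumbing. Chain two generative models and the second one gets much better input than a human could be bothered to type.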

About now you’re wondering whether I dropped a tab of acid before starting this post, and while I’m not saying either way, we’re almost at my main point! Which is, what would happen if either the Humane ai pin or Rabbit R1 did this, but like, for your life?

We want these things to be like human assistants, but better. They have a large advantage human assistants don’t, which is that they ride on your shoulder or in your pocket. They are literally everywhere you are. They know your location, who you talk to, what car you drive, how you organise your day, who you don’t talk to, and can probably infer based on your tone what sparks joy vs. pisses you off. They are more fucking omniscient than Santa, who seems to be able to make better naughty:nice decisions than this advanced tech.

Riddle me this: when I have the All Seeing Eye in my pocket, why do I have to ask it stuff that I can do easier and faster on my phone (e.g., how tall the Empire State Building is)? That’s old news. We can already do that better with existing tech, without the additional fiscal reaming. Why aren’t these devices learning from what I do, and using that to generate a series of conversations or ideas about me and my life?

Take the observed data. Feed it into a generative model (LAM, LLM…) and expand it. Use this to drive daily suggestions, life advice, or so on. Imagine your life coach in your pocket. Or the better angel suggesting not to send that Xitter reply as it watches what you type on the screen. Advising you to get to Alcoholics Anonymous to stop using the gin to stave off ennui. Teaching you how to parent better. Helping you train your dog. Being a BBQ cooking coach.

And so on.
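The loop above can be sketched too: take observed context, fold it into a prompt, and have a generative model volunteer a suggestion without being asked. Everything here is hypothetical scaffolding; `query_local_llm` stands in for whatever 8B on-device model these gadgets might run, and the observation keys are made up for illustration.

```python
# Minimal sketch of a proactive assistant: observed data drives the
# generative model, rather than waiting for the human to prompt it.

def query_local_llm(prompt: str) -> str:
    # Hypothetical stand-in: a real device would call its on-board model.
    return f"[suggestion based on: {prompt}]"

def build_prompt(observations: dict) -> str:
    """Fold the day's observed context into a single prompt."""
    facts = "; ".join(f"{k}: {v}" for k, v in observations.items())
    return (
        "You are a proactive personal assistant. Given what you observed "
        f"today ({facts}), offer one concrete, helpful suggestion."
    )

# Example of the kind of context these devices already have access to.
observed = {
    "location": "home office",
    "calendar": "back-to-back meetings until 6pm",
    "tone": "irritable after 3pm",
}

print(query_local_llm(build_prompt(observed)))
```

Flip the direction of the prompt, in other words. The human stops being the trigger and becomes the subject.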

This is what I mean by a failure of imagination. These devices have cameras and see what we see. In MKBHD’s videos, he shows how relatively good they are at ‘seeing’ things. And just how terribly poor they are at doing anything with that astounding wealth of information.

But hey, maybe in the 2.0 product, right? Don’t forget to preorder.

If you enjoyed this piece, get on my mailing list for more barely innovative ideas trillion dollar companies should have had first.

MKBHD on the Humane ai pin
MKBHD on the Rabbit R1
