“Just use AI.” “Just prompt it.” “AI will handle that.”

I keep hearing this, and it’s not wrong exactly. AI can do everything. But one prompt doesn’t magically produce what we need.

Nobody talks about the work between the prompt and the result, probably because that work is boring, and boring doesn’t get clicks. I want to talk about the boring part, because that’s where the actual value is.


The demo is real. So is the gap.

Let me be clear about where I stand, because “AI sceptic” content is its own kind of lazy. I’m not a sceptic. After I left my agency job, I took on a freelance project building an AI chat system with long-term memory for a client, on my own. Retrieval, embeddings, background workers, the whole thing. I’ve seen what these models can do up close, and it’s genuinely impressive.

Here’s the thing, though. Once you’ve built one of these systems yourself, you also see exactly where the magic stops.

The demo you see online, where someone pastes a prompt and gets a beautiful result, is real. It happened. But it’s one lucky path through a system with a thousand paths. Your business doesn’t run on the lucky path. Your business runs on Tuesday afternoon, when the input is messy, the edge case shows up, and there’s nobody checking whether the output is actually right.

The gap between what AI can do and what it will do for your business, reliably, every day, doesn’t get filled by a better prompt. It gets filled by work that has nothing to do with AI at all.


A story about memory

The chat project taught me this better than anything else, because on paper it sounds like a one-prompt problem.

The client wanted an AI companion that remembers you. Past conversations, preferences, the things you told it last month. Sounds simple, right? The model is smart. Just tell it to remember.

Except models don’t remember. Every conversation starts from zero. So “remember the user” actually breaks down into a pile of unglamorous sub-problems. You need to store what was said in a form you can search later. When the user sends a new message, you need to decide which old memories are relevant right now, out of everything they’ve ever said. You need to handle the moment when the user contradicts something they said before, and decide whether to keep both versions, overwrite the old one, or flag it for review. And you need to keep all of this fast enough that the chat still feels like a chat.
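
To make the contradiction decision concrete, here is a minimal Python sketch of the three options (keep both, overwrite, flag for review). It is an illustration, not the project code: `MemoryStore`, `ConflictPolicy` and the `favourite_drink` key are hypothetical names, and it assumes facts have already been extracted into tidy key/value pairs, which in a real system is the hard part.

```python
from dataclasses import dataclass, field
from enum import Enum


class ConflictPolicy(Enum):
    KEEP_BOTH = "keep_both"
    OVERWRITE = "overwrite"
    FLAG = "flag"


@dataclass
class MemoryStore:
    policy: ConflictPolicy
    # key -> remembered values, oldest first (hypothetical structure)
    facts: dict[str, list[str]] = field(default_factory=dict)
    # (key, old value, new value) triples waiting for a human decision
    flagged: list[tuple[str, str, str]] = field(default_factory=list)

    def remember(self, key: str, value: str) -> None:
        existing = self.facts.setdefault(key, [])
        if value in existing:
            return  # exact repeat, nothing new to store
        if not existing:
            existing.append(value)  # first time this key is seen
            return
        # A different value for a known key: treat it as a contradiction.
        if self.policy is ConflictPolicy.OVERWRITE:
            self.facts[key] = [value]
        elif self.policy is ConflictPolicy.KEEP_BOTH:
            existing.append(value)
        else:
            self.flagged.append((key, existing[-1], value))


store = MemoryStore(ConflictPolicy.FLAG)
store.remember("favourite_drink", "coffee")
store.remember("favourite_drink", "tea")
print(store.facts)    # the old value is kept until someone reviews the flag
print(store.flagged)
```

Even in this toy form, the policy is a product decision rather than a model capability. Someone has to choose it.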

By the time I had something that actually behaved the way the client described, “just tell it to remember” had become a hybrid search system running three retrieval channels in parallel (vector similarity, full-text matching, daily summaries), a background worker extracting facts from conversations, deduplication so the same fact doesn’t pile up fifty times, and contradiction detection for when new information conflicts with old.
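
As a rough illustration of what merging several retrieval channels can look like, here is a small Python sketch. It uses reciprocal rank fusion, one common way to combine ranked lists; that choice is an assumption made for the example, not a description of how the project above did it. The function names, the sample memories and the constant `k = 60` are likewise illustrative, and the sketch assumes each channel has already returned its ranked results.

```python
from collections import defaultdict


def normalise(text: str) -> str:
    # Crude deduplication key: lowercase and collapse whitespace.
    return " ".join(text.lower().split())


def fuse(channels: dict[str, list[str]], k: int = 60, top_n: int = 5) -> list[str]:
    """Merge ranked lists with reciprocal rank fusion, deduplicating as we go."""
    scores: dict[str, float] = defaultdict(float)
    original: dict[str, str] = {}
    for ranked in channels.values():
        seen: set[str] = set()
        for rank, memory in enumerate(ranked, start=1):
            key = normalise(memory)
            if key in seen:
                continue  # count a memory at most once per channel
            seen.add(key)
            original.setdefault(key, memory)  # keep the first spelling seen
            scores[key] += 1.0 / (k + rank)
    best = sorted(scores, key=lambda key: scores[key], reverse=True)
    return [original[key] for key in best[:top_n]]


# Hypothetical results from the three channels, best match first.
channels = {
    "vector": ["Lives in Berlin", "Has a dog named Miso", "Prefers tea"],
    "full_text": ["has a dog named miso", "Works as a nurse"],
    "daily_summary": ["Prefers tea", "Has a dog named Miso"],
}
top = fuse(channels, top_n=3)
print(top)
```

A memory that several channels agree on rises to the top even if no single channel ranked it first, which is the main reason to fuse rather than pick one channel.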

The actual model calls were maybe a fifth of the work. The rest was deciding what “remember” means, what good output looks like, and building the plumbing that turns a capable model into a dependable system.

That ratio shows up everywhere. The model is the easy part. Knowing what you actually want from it is the hard part.


“Just add AI” fails for a specific reason

Before the chat project, I spent more than seven years inside a marketing agency, doing development, tracking and automation, and a lot of that time went into building internal tools for the team. So when AI tools became something everyone suddenly had access to, I paid attention to how the teams around me tried to use them. The attempts to “add AI” to a workflow tend to fail in the same way, and it’s not because anyone is stupid.

It’s because they’re trying to automate a process they’ve never actually defined.

Ask a team “how do you write your monthly client report?” and you’ll get an answer like “we pull the numbers and summarise them.” But watch them do it, and you’ll see twenty invisible steps. Someone on the team knows which campaigns to exclude because of that thing in March. The numbers get checked against last month before anyone trusts them. There’s an unwritten rule about how bad news gets phrased for this particular client. None of that is in the SOP. Most of it isn’t written anywhere. It lives in people’s heads.

Now hand that process to AI with a prompt like “summarise this data into a client report.” The model does exactly what you asked, and it produces something that’s fluent, confident, and wrong in ways only that one person would catch. So they end up reviewing everything line by line, which takes longer than writing it themselves, and the team concludes that AI doesn’t work.

AI didn’t fail. The prompt was asked to stand in for a process nobody had ever made explicit. One prompt can’t carry twenty steps of unwritten judgment. Nothing can.


The framework I actually use

When I look at any process, AI or not, I break it into three parts: input, output, and the system in between. It sounds almost too simple, but once you can read a process this way, you can measure it, and you can find the points where it tends to break.

Figure: The Input-Output-System framework. Every process has an input, a system that transforms it, and an output.

Start with the input. What does the AI actually receive? Is the data clean, or is it the same messy spreadsheet that confuses your own staff? If you feed it rubbish, you get confident-sounding rubbish back, and AI’s failure mode is worse than a human’s, because it rarely stops to say “this looks weird, let me check.”

Then the output. Do you know what good looks like, specifically? Not just “a good report”, but what makes a report good for this client? If you can’t describe it, you can’t prompt for it, and you definitely can’t check for it. Vague standards are why people end up in endless re-prompting loops, hoping the model guesses what they meant.

Then the system around it. Where does the AI’s work go next? Who checks it, and at what point? Almost everyone ignores this part, which is really about designing the handover. AI is great at the first draft, the bulk transformation, the summary of two hundred rows. Humans are great at judgment, context, and catching the thing that’s technically correct but wrong for the situation. A working AI workflow draws that line on purpose, so that AI handles what it’s good at, a human handles the rest, and everyone knows where the line is.
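
The three questions can be sketched as a small Python pipeline: check the input, produce a draft, check the draft against explicit criteria, and decide whether a human needs to look. Everything here is a hypothetical stand-in chosen for illustration, including `run_step`, the two example checks, and `stand_in_model`, which simply fakes a model call so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A check returns a description of the problem, or None if all is fine.
Check = Callable[[object], Optional[str]]


@dataclass
class Result:
    status: str  # "rejected_input", "needs_review" or "passed_checks"
    draft: Optional[str]
    reasons: list[str]


def run_step(data, draft_fn, input_checks, output_checks) -> Result:
    # Input: refuse to draft anything from data that fails basic checks.
    problems = [p for check in input_checks if (p := check(data)) is not None]
    if problems:
        return Result("rejected_input", None, problems)
    # System: the model call (or any other transformation) happens here.
    draft = draft_fn(data)
    # Output: compare the draft with what "good" has been defined to mean.
    problems = [p for check in output_checks if (p := check(draft)) is not None]
    status = "needs_review" if problems else "passed_checks"
    return Result(status, draft, problems)


def no_missing_spend(rows):
    missing = [r["campaign"] for r in rows if r.get("spend") is None]
    return f"missing spend for: {', '.join(missing)}" if missing else None


def compares_to_last_month(draft):
    return None if "last month" in draft else "no comparison with last month"


def stand_in_model(rows):
    # Placeholder for a real model call.
    total = sum(r["spend"] for r in rows)
    return f"Total spend this month was {total}."


clean_rows = [{"campaign": "Spring sale", "spend": 1200},
              {"campaign": "Newsletter", "spend": 300}]
messy_rows = [{"campaign": "Spring sale", "spend": 1200},
              {"campaign": "Newsletter", "spend": None}]

clean = run_step(clean_rows, stand_in_model, [no_missing_spend], [compares_to_last_month])
messy = run_step(messy_rows, stand_in_model, [no_missing_spend], [compares_to_last_month])
print(clean.status, clean.reasons)
print(messy.status, messy.reasons)
```

The checks are the interesting part. Each one is a piece of the unwritten process written down, and none of them requires knowing anything about the model.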

None of this is AI knowledge. It’s process knowledge. Which is exactly why “just prompt it” advice skips it.


The mistake I made too

I should be honest here, because knowing all this didn’t stop me from making my own version of the mistake.

That chat system I built? The architecture is solid. The hybrid search works. The model layer is separated cleanly enough that swapping providers is a config change. I’m proud of the engineering.

But if I were restarting that project today, I wouldn’t start with any of it. I would run a concierge version first, meaning I’d handle the workflow manually, with actual humans behind the scenes, before building any infrastructure. That’s how you find out whether real users want the thing the way you imagined it. Instead, I built a strong system around assumptions that were never tested against real users. It’s the same trap as “just add AI”, only dressed up better. I skipped the understand-the-problem step and jumped straight to the build, because the build is the fun part.

The technology was never the risk. The understanding was.


Where this leaves small teams

If you run a small team and you’re feeling behind on AI, here’s the part that should actually be a relief: the bottleneck isn’t technical, which means it’s not out of your reach.

You don’t need to start by picking tools or learning prompt tricks. Start by picking one process that annoys you and writing down how it actually works today. Every step, including the unofficial ones, including “then someone checks it because of that thing in March.” Then ask the three questions. What goes in? What does good output look like? Where does a human need to stay in the loop?

Do that honestly and you’ll often find that the answer isn’t even AI. Sometimes it’s a template, or deleting a step nobody needs. And when AI is the answer, you’ll know exactly where to point it, and the prompt almost writes itself. Then start small and improve from there. There’s no best setup, only a better one than yesterday’s, and only the right one for right now.

After building one of these systems myself and watching teams around me try to figure it out, the thing I keep coming back to is simple. AI really can do everything. It just can’t decide what to do, for you, in your business. That part was always your job, and it still is. Honestly, I think that’s the most reassuring thing about this whole era.