Why AI Still Needs a Human Digital Operations Partner (Like Me)

Six months in, the tools are sharper and more wrong, and the human part of the work has grown more, not less, valuable. How I use (and check) AI in real client work.

1 Aug 2025 · 6 min read · By Sophie Kazandjian

Earlier this year I wrote a piece asking whether AI would replace Digital Operations Partners. The short version: no, but it would make us better. Several months later, the picture has shifted in the quiet details. The tools have improved in some ways and got worse in others. The part of the job that doesn't transfer to AI has grown more important, not less.

This is what I've seen, running a tech-aware, human-first digital operations practice, and how I now actually use (and check) AI in real client work.

The newer models are more capable, and more wrong

The strangest thing about the last six months is that some of the newer AI tools are less reliable than the ones that came before them. They hallucinate more often, more confidently, and in better prose.

OpenAI's own April 2025 system card for its reasoning models, o3 and o4-mini, showed o3 hallucinating on 33% of PersonQA questions and o4-mini on 48%. On the SimpleQA benchmark, o4-mini hallucinated 79% of the time. The previous reasoning model, o1, hallucinated on 16% of questions in the same PersonQA test. OpenAI itself stated that "more research is needed" to understand why this is happening.

The tools I use across a working week:

Claude: strong on long-form work and structured analysis, but I've caught it inventing statistics that read as plausible enough that you'd believe them if you didn't already know the answer.
ChatGPT: fast, and good for first drafts, but unreliable with anything involving numbers or percentages.
Omni in Airtable: useful for spotting patterns across structured data, less reliable for actual calculations.
Perplexity: I used to use it for research because it showed sources, but I've now moved away from it (the ethical AI alternatives piece explains why).

In the last six months I've seen AI produce project summaries that read as polished and confident and were almost entirely fabricated, client reports that looked professional but contained invented data points, and email sequences that captured a brand voice perfectly while proposing strategies the client would never approve.

These tools are useful. They don't know what the work is for. Someone still has to be the one asking, every time:

Is that even true?
Would the client actually say it like that?
Does this match what we agreed?
Let me double-check those numbers.

That's the part that doesn't delegate.

The work has shifted from doing to checking

The conversation about "human versus AI" stopped being useful around the start of 2025. What I do now is closer to editing and direction than to execution. The model generates a draft. I bring the brand context, the client knowledge, the operational judgment, and the willingness to say "this is wrong, try again." That willingness does more work than people realise.

For Digital Operations Partners, or anyone running similar work:

We aren't doing the tasks any more. We're curating, fixing, steering, fact-checking.
Clients don't want output, they want trust. They want someone who understands the texture of their business and won't ship them a confident lie.
AI gives you a draft. You give it direction, brand voice, accuracy, and the call about whether what came back is good enough.

Where this is heading, in my experience, is a market that increasingly values the people who can hold the steering wheel without getting impressed by the engine.

What's tripping people up

Common traps I've watched colleagues and clients run into, including some I've fallen into myself:

Over-trusting AI numbers

Tools fudge data more often than you'd think, especially across large spreadsheets or when layered logic is involved. I had Claude confidently tell me a client's email open rates had improved by 23% when they had actually dropped. Always verify anything involving figures, percentages, or comparisons.
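
One way to make that check routine is to recompute the change from the raw figures rather than trusting the summary. What follows is a minimal sketch in Python, assuming you can export the two open rates yourself. The function names and the numbers are hypothetical, chosen only to illustrate a claimed improvement that is really a drop:

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a percentage."""
    if before == 0:
        raise ValueError("cannot compute a relative change from a zero baseline")
    return (after - before) / before * 100


def check_claim(before: float, after: float, claimed_change: float,
                tolerance: float = 0.5) -> bool:
    """True if the claimed change is within `tolerance` points of the real one."""
    return abs(percent_change(before, after) - claimed_change) <= tolerance


# Hypothetical figures: open rate went from 25.0% to 22.0% between two campaigns.
actual = percent_change(25.0, 22.0)  # a relative drop, so the result is negative
claim_ok = check_claim(25.0, 22.0, claimed_change=23.0)

print(f"actual change: {actual:.1f}%, claim holds: {claim_ok}")
```

The sign matters as much as the size. A tool that reports "+23%" for a figure that fell has got the direction wrong, and a quick recomputation from the source numbers is usually enough to catch it.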

The AI tell

Perfect grammar. Slightly Mid-Atlantic tone. No contractions. Sentences that open with "It's important to note" or "While it's true that". Output that reads correctly but doesn't sound like the person who's meant to have written it. The fix is unglamorous: read the draft aloud, rewrite anything that sounds like a press release, leave some texture in.

Forgetting what it was meant to remember

I use ChatGPT's memory feature and Claude's projects feature to load client preferences and project context, but neither is reliable enough for ongoing complex work. I still find myself reminding both tools, mid-conversation, about a client's tone of voice, their audience, or which buzzwords they hate. The memory helps. It doesn't replace re-briefing.

The context cliff

Both ChatGPT and Claude have conversation limits that interrupt complex work. Nothing worse than building a long, careful strategy document and hitting a session cap with no warning. The discipline is to save the context elsewhere as you go, so you can hand it back.
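
One low-tech way to do that is to keep a running brief in a plain file and paste it back in when a session ends. This is a minimal sketch in Python. The file name, the helper names, and the example notes are all hypothetical, and any notes app or shared document would serve the same purpose:

```python
from datetime import date
from pathlib import Path


def save_note(log: Path, heading: str, note: str) -> None:
    """Append one dated decision or preference to the running brief."""
    with log.open("a", encoding="utf-8") as f:
        f.write(f"[{date.today().isoformat()}] {heading}: {note}\n")


def rebrief(log: Path) -> str:
    """Build the block of text to paste at the top of a fresh conversation."""
    notes = log.read_text(encoding="utf-8") if log.exists() else ""
    return "Context from earlier sessions:\n" + notes


# Hypothetical usage: record context as you go, then hand it back after a cap.
log = Path("client_brief.txt")
save_note(log, "Tone", "warm, plain English, no buzzwords")
save_note(log, "Agreed scope", "three-email welcome sequence")
print(rebrief(log))
```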

What AI has actually made easier

For all the above, some things have got faster:

Transcribing video and audio. Otter and similar tools save hours. I still scan transcripts for the odd hilarious misinterpretation of technical terms.
First drafts. Blog posts, social copy, tricky emails. The more specific the brief, the better the draft, which means the work shifts from facing a blank page to refining something concrete.
Reformatting one piece of writing across multiple platforms. A LinkedIn version, an Instagram caption, an email subject line, generated from one base in under a minute.
Automation between tools. Make and n8n combined with AI have replaced several hours of weekly manual work in my own practice. Invoice summaries pulled from Xero, project updates generated from ClickUp data, files routed from MailerLite to storage. None of it earth-shattering on its own, all of it adding up.

What I don't hand to AI

Some parts of the work still need a person paying attention:

Anything emotionally sensitive. Difficult emails. Conversations where someone is stressed and needs to feel heard before they need to feel helped.
Anything that requires instinct. Reading between the lines on a client call. Noticing what a client hasn't said.
Final deliverables. Nothing goes to a client without my eyes on it first.
Strategic decisions that affect the client's business.
Quality control. Particularly anything I've used AI to draft. The double-check is the work.

There's a line where handing over judgment turns you from a partner into a prompt operator. Those aren't the same role.

For people choosing whether to hire

If you're considering a Digital Operations Partner, or a VA, or whatever the equivalent is in your part of the world, there are two questions worth asking on the call.

How do you use AI in your work? If the answer is "I don't", they're behind. The shift has happened.

How do you check what AI gives you? If the answer is "I trust it completely", or there isn't a clear answer, that's the bigger problem. The value isn't in whether someone uses AI. It's in whether they catch the things AI gets wrong.

Where this is going

The framing that's settled across most of the research I trust is "collaborative intelligence" rather than "replacement versus augmentation". AI as a fast, slightly unreliable colleague. That matches my experience.

The Digital Operations Partners who'll do best from here are the ones who can do both jobs at once: technical fluency with the tools, and the brand judgment and operational instinct that make the work worth paying for in the first place. The tools amplify whoever's using them. They don't substitute for the person making the calls.

If you've been told AI is about to replace your work, I'd want to know who told you and why. From inside the work, what I'm seeing is more work becoming possible, less of it mindless, and a sharper distinction between the parts that need a person and the parts that don't. The middle is getting squeezed. What's left at the edges is getting clearer.
