What Five Rounds of Correction Taught Me About Briefing AI

A composite case study from a workflow simulation, and how it changed the way I brief AI tasks: the basic checks AI does not run unless you tell it to.

30 Oct 2025 · 5 min read · By Sophie Kazandjian

I use AI to do work I would otherwise spend a day on. Mostly it earns its keep. Once in a while it does something that looks fine on the surface and turns out to be quietly broken underneath, which is the version of failure that matters because nothing alerts you until you check.

This is what happened on a task I gave it earlier this year, and what has changed about how I brief AI tasks since.

The scenario uses simulated workflow data with anonymised item identifiers. Specifics are altered, the pattern is real, and I have seen variations of it across enough projects to think it is worth describing.

The task

I needed three different groupings of 100 items, around 25 groups of 4-5 items in each arrangement, with rules that prevented certain pairings and encouraged variety across categories and levels. There was an optional preference for keeping pre-existing collaboration sets lightly represented where possible.

The brief was detailed. After processing, the AI returned three configurations that looked clean. I was about to send them on. Then I opened the second one.

The missing third

That second configuration had 68 items in it. Thirty-two were missing.

The first and third arrangements were complete. Optimisation had been reported as successful against every constraint I had specified. What had not happened was a basic count to confirm the second configuration actually contained all 100 items.

This is a particular kind of failure. A human briefed on this task would have run that count intuitively, the way you would re-check that the right number of forks went out at a dinner party. The check is not a step you write down, because you assume it is obvious. AI does not assume anything is obvious.
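The count itself is only a few lines of code. Here is a minimal sketch in Python, assuming each configuration is a list of groups and each group is a list of item identifiers. The function name `check_completeness` and the `item-001` style identifiers are hypothetical, made up for illustration rather than taken from the original task.

```python
from collections import Counter


def check_completeness(groups, expected_items):
    """Compare the items placed in groups against the full expected set."""
    placed = Counter(item for group in groups for item in group)
    expected = set(expected_items)
    return {
        "placed_count": sum(placed.values()),
        "missing": sorted(expected - set(placed)),
        "duplicated": sorted(item for item, n in placed.items() if n > 1),
        "unexpected": sorted(set(placed) - expected),
    }


# Hypothetical data: 100 item ids, and a configuration that only places 68.
all_items = [f"item-{i:03d}" for i in range(1, 101)]
incomplete = [all_items[i:i + 4] for i in range(0, 68, 4)]

report = check_completeness(incomplete, all_items)
print(report["placed_count"], "placed,", len(report["missing"]), "missing")
```

Run against each of the three configurations, a check like this would have flagged the second one before I opened it.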

Rules without intent

After that was fixed, I checked the optional preference about collaboration sets. The AI had topped up the smallest sets aggressively, which technically maximised set representation. Trouble was, it did so by concentrating set members into specific groups, which broke the balance across levels and categories I had asked for as a higher priority.

It followed the letter of the secondary rule and lost the spirit of the primary one. When two rules pull in different directions, the AI does not know that one is more important than the other unless I say so. I had not said so.
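One way to make "this rule outranks that one" explicit is a lexicographic comparison, where the secondary preference only breaks ties between candidates that are equal on the primary rule. The sketch below is an illustration under assumptions, not a description of how the AI scored anything. The `Candidate` fields and the numbers are hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    balance_violations: int  # primary rule: lower is better
    set_representation: int  # secondary preference: higher is better


def pick(candidates):
    """Choose by the primary rule first; use the preference only as a tie-break."""
    return min(candidates, key=lambda c: (c.balance_violations, -c.set_representation))


# Hypothetical scores for three candidate arrangements.
candidates = [
    Candidate("concentrated", balance_violations=6, set_representation=9),
    Candidate("balanced", balance_violations=0, set_representation=4),
    Candidate("balanced-plus", balance_violations=0, set_representation=5),
]

best = pick(candidates)
print(best.name)
```

Here the arrangement with the highest set representation loses, because it breaks the balance rule. The preference only decides between the two balanced options.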

The detail that was not in the spec

Several rounds in, I asked for a final spreadsheet. A small but useful detail was missing: the per-group collaboration-set summary, the column that says "Set A: 2, Set D: 1" so the person reading the spreadsheet can see at a glance how a group is composed.

When I pointed this out, the AI apologised and re-issued the file with the column included. It had seen the format in the template I gave it and reproduced it. What it had not registered was that the column existed because someone needed to read the spreadsheet quickly. Without it the spreadsheet was slower to use.

This was the most subtle of the three failure modes. The output was mathematically correct and operationally unhelpful, which is a category that does not show up in any optimisation metric.
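For what it is worth, the column is trivial to produce once someone asks for it. Here is a minimal sketch, assuming a mapping from item identifier to collaboration set. The names `set_summary` and `set_of` and the identifiers are hypothetical.

```python
from collections import Counter


def set_summary(group, set_of):
    """Build the per-group summary text, e.g. 'Set A: 2, Set D: 1'."""
    # Items that belong to no collaboration set are skipped.
    counts = Counter(set_of[item] for item in group if item in set_of)
    return ", ".join(f"Set {label}: {n}" for label, n in sorted(counts.items()))


# Hypothetical membership: item-004 belongs to no collaboration set.
set_of = {"item-001": "A", "item-002": "A", "item-003": "D"}
group = ["item-001", "item-002", "item-003", "item-004"]

summary = set_summary(group, set_of)
print(summary)
```

The code was never the hard part. The hard part was knowing that a reader needed the column.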

What I changed about briefing

After working through those corrections, I changed how I begin AI tasks. The shift is small. It has changed almost everything about how usable the first draft is.

Now, instead of writing a brief and asking the AI to execute it, I ask the AI to interview me about the brief before it starts. The prompt I use, in some variant:

"Before you proceed, ask me any questions you need to deliver this correctly. Cover completeness requirements, how to handle conflicts between competing priorities, what validation checks I expect, and how I will use the output."

That puts the AI in discovery mode rather than execution mode. It surfaces the assumptions it would otherwise make and lets me correct them before any work happens.

The questions it asks tend to be the ones I should have thought of myself:

Should the final output include all items, or is a subset acceptable?
If constraint A and preference B conflict, which takes priority?
What format will make this easiest to review or use?
How will you know the output is complete and correct?

I answer. The AI confirms its understanding. Then it starts.

What this catches is the misalignment that would otherwise only surface after the work is done. After the five rounds of correction that the 100-item task ended up taking, that change in framing has made the single biggest difference. The first draft comes back closer to usable, and the iteration is on substance rather than rework.

Validation checks inside the brief

A second change is to ask the AI to run its own validation before it returns the work. Instructions like:

"After completing the work, verify [key requirement] and flag any gaps."
"Count [critical metric] and confirm it matches expectations."
"Highlight any outputs that violate [specific constraint]."

The AI becomes its first reviewer. Most mechanical errors surface before I see the output, which is the right place for them to surface.
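When the work is data rather than prose, instructions like these can also be written down as a small script that the AI, or I, can run against the output. This is a rough sketch, assuming groups are lists of identifiers and forbidden pairings are given as pairs. The `validate` function, its parameters, and the sample data are hypothetical.

```python
def validate(groups, expected_count, forbidden_pairs, size_range=(4, 5)):
    """Return a list of human-readable findings; an empty list means all checks passed."""
    findings = []
    placed = [item for group in groups for item in group]
    unique = set(placed)
    # Count check: every item present exactly once.
    if len(placed) != expected_count or len(unique) != len(placed):
        findings.append(
            f"count: expected {expected_count} unique items, "
            f"found {len(unique)} unique of {len(placed)} placed"
        )
    low, high = size_range
    for index, group in enumerate(groups, start=1):
        # Size check: each group within the allowed range.
        if not low <= len(group) <= high:
            findings.append(f"group {index}: size {len(group)} outside {low}-{high}")
        # Constraint check: no forbidden pairing inside a group.
        members = set(group)
        for a, b in forbidden_pairs:
            if a in members and b in members:
                findings.append(f"group {index}: forbidden pairing {a} + {b}")
    return findings


# Hypothetical output with three problems: a missing item, a forbidden
# pairing in group 1, and an undersized group 2.
groups = [["a", "b", "c", "d"], ["e", "f", "g"]]
findings = validate(groups, expected_count=8, forbidden_pairs=[("a", "c")])
for line in findings:
    print(line)
```

Returning findings as plain sentences, rather than a single pass or fail, keeps the report readable by whoever signs off the work.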

The pattern

Clarify, execute, verify, review:

AI asks qualifying questions
I answer and confirm the brief
AI processes the work and runs validation checks
AI reports results with check outcomes
I review and provide final sign-off

This is the same workflow I would use with a careful junior colleague, except faster. The AI handles the clarification and routine checks I would otherwise do manually. I keep the judgment and the quality control, plus whatever context I need to bring from outside the brief. The article on why AI still needs a Digital Operations Partner covers the broader version of this argument with a different worked example.

If you take one thing from this

AI does not run basic sanity checks unless you tell it to. The cost of telling it to is a few seconds at the start. Skip those seconds and the cost stays invisible until you look closely, and the looking is what changed for me.

If you use AI at work, the question worth keeping in front of you is: what would I have done as a sanity check if a person had handed me this? Then ask the AI to do that, by name, before it returns the work.
