Have you ever had one of those maddening moments: you send the same request to ChatGPT, Claude, and Gemini, and each answer reads as if it came from a different person. Midjourney is even more outrageous: the prompt clearly didn't change, yet the images feel like opening a loot box. Rather than cursing based on gut feeling, I recommend applying a conversation-analysis mindset: give the AI a "checkup" and quantify the issues.
Metric 1: Resolution rate — don’t just look at whether it wrote a lot
A commonly used KPI in conversation analysis is the "resolution rate": put simply, can this output be used as-is? My approach is crude but effective: tag each result as "ready to deliver / needs follow-up / totally off-topic." After a week, you'll see which tool is consistent and which is prone to self-indulgent rambling.
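The tagging workflow above is simple enough to track in a few lines. Here's a minimal Python sketch; the tool names are real, but the tag values and the sample logs are hypothetical placeholders for whatever you record during your week of testing:

```python
from collections import Counter

# Hypothetical week of tags per tool:
# "deliver" = ready to deliver, "follow-up" = needs follow-up, "off-topic" = totally off-topic
logs = {
    "ChatGPT": ["deliver", "deliver", "follow-up", "deliver", "off-topic"],
    "Claude":  ["deliver", "follow-up", "deliver", "deliver", "deliver"],
    "Gemini":  ["follow-up", "off-topic", "deliver", "follow-up", "deliver"],
}

def resolution_rate(tags):
    """Share of outputs usable as-is (tagged "deliver")."""
    counts = Counter(tags)
    return counts["deliver"] / len(tags)

for tool, tags in logs.items():
    print(f"{tool}: {resolution_rate(tags):.0%}")
```

A spreadsheet works just as well; the point is that a single percentage per tool makes "who's more consistent" a number instead of a feeling.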
Metric 2: Rework count — the cure for answering the wrong question
Rework isn't a sign that you're bad at prompting; models often miss constraints. Record each extra sentence you have to add (like "output as a table," "don't make up data," "use Chinese") and calculate how many additional prompts each tool needs, on average, before it gets things right.
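Averaging those counts is the whole computation. A minimal sketch, where the per-task numbers are hypothetical examples of how many corrective follow-ups each tool needed:

```python
# Hypothetical rework log: for each task, the number of extra corrective
# prompts (e.g. "output as a table", "don't make up data") before the
# answer was usable.
rework = {
    "ChatGPT": [0, 2, 1, 0, 1],
    "Claude":  [1, 0, 0, 1, 0],
    "Gemini":  [2, 1, 3, 0, 2],
}

def avg_rework(counts):
    """Average number of follow-up prompts needed per task."""
    return sum(counts) / len(counts)

for tool, counts in rework.items():
    print(f"{tool}: {avg_rework(counts):.1f} extra prompts on average")
```

A lower average means the tool catches your constraints on the first try more often, which is exactly the "cure for answering the wrong question" this metric is after.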