
Unstable outputs from ChatGPT, Claude, Gemini, and Midjourney: use conversation analysis and 3 metrics to quickly pinpoint the problem

2/2/2026
Tips & Tricks

Have you ever had one of those maddening moments: you throw the same request at ChatGPT, Claude, and Gemini, and each answer reads like it came from "a different person"; Midjourney is even more outrageous: the prompt clearly didn't change, yet the images feel like opening a loot box. Rather than cursing based on gut feeling, I recommend borrowing a conversation-analysis mindset to give the AI a "checkup" and quantify the problem.

Metric 1: Resolution rate — don’t just look at whether it wrote a lot

A commonly used KPI in conversation analysis is “resolution rate,” which, put simply, is whether this output can be used as-is. My approach is crude but effective: tag each result as “ready to deliver / needs follow-up / totally off-topic.” After a week, you’ll be able to see who’s more consistent and who’s more prone to self-indulgent rambling.
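The tagging scheme above is easy to automate once you keep a simple log. As a minimal sketch (the log entries and tag names are illustrative, not from any real tool's output):

```python
# Hypothetical hand-tagged log: one entry per AI output, tagged after
# review using the article's three buckets:
#   "deliver"   = ready to deliver as-is
#   "follow-up" = needs a follow-up prompt
#   "off-topic" = totally off-topic
log = [
    ("ChatGPT", "deliver"),
    ("ChatGPT", "follow-up"),
    ("Claude", "deliver"),
    ("Claude", "deliver"),
    ("Gemini", "off-topic"),
    ("Gemini", "deliver"),
]

def resolution_rate(entries, tool):
    """Share of a tool's outputs tagged 'deliver' (usable as-is)."""
    tags = [tag for name, tag in entries if name == tool]
    return sum(t == "deliver" for t in tags) / len(tags) if tags else 0.0

for tool in ("ChatGPT", "Claude", "Gemini"):
    print(tool, f"{resolution_rate(log, tool):.0%}")
```

After a week of entries, the per-tool percentages make the "who is more consistent" question a lookup rather than a feeling.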

Metric 2: Rework count — the cure for answering the wrong question

Rework isn’t because you’re bad; models often miss constraints. Record each extra corrective sentence you have to add (e.g. “output as a table,” “don’t make up data,” “use Chinese”) and calculate how many additional prompts each tool needs, on average, before it gets it right.

  • ChatGPT: usually good at structuring, but sometimes confidently makes things up, so you need to watch it
  • Claude: more stable on long-form writing; if detailed constraints aren’t clearly stated, it can “gently drift off course”
  • Gemini: fast at synthesizing information, but it’s best to nail down formatting requirements from the start
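Averaging the rework counts per tool takes only a few lines. A minimal sketch, with made-up numbers standing in for your own records:

```python
from statistics import mean

# Hypothetical records: for each tool, the number of extra corrective
# prompts each task needed before the output was acceptable.
rework = {
    "ChatGPT": [0, 2, 1, 0],
    "Claude":  [1, 0, 0, 1],
    "Gemini":  [2, 1, 0, 1],
}

def avg_rework(counts):
    """Average number of extra prompts needed per task, per tool."""
    return {tool: mean(c) for tool, c in counts.items()}

print(avg_rework(rework))
```

A tool that averages close to zero on your tasks is one whose constraints you can state once and trust.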

Metric 3: Response experience — time cost is still a cost

Conversation analysis also looks at performance metrics like response time. Track two numbers: the wait time, plus the minutes you spend editing after reading. The same logic applies to Midjourney: treat the number of rerolls and variations as “rework.” The more of them you need, the less stable the prompt, or the model’s understanding of it.
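Combining wait time and edit time into one cost number is straightforward. A minimal sketch, assuming you log each task's wait in seconds and editing in minutes (the records below are invented for illustration):

```python
# Hypothetical per-task records: seconds spent waiting for the response,
# plus minutes spent editing the output afterwards. For Midjourney you
# would log rerolls/variations as rework counts instead of edit minutes.
tasks = [
    {"tool": "ChatGPT", "wait_s": 12, "edit_min": 4},
    {"tool": "ChatGPT", "wait_s": 9,  "edit_min": 7},
    {"tool": "Claude",  "wait_s": 20, "edit_min": 2},
]

def total_cost_min(records, tool):
    """Total time cost in minutes: waiting plus post-editing."""
    rows = [r for r in records if r["tool"] == tool]
    return sum(r["wait_s"] / 60 + r["edit_min"] for r in rows)

print(f"ChatGPT: {total_cost_min(tasks, 'ChatGPT'):.1f} min")
```

Editing time usually dominates, which is exactly why "it answered fast" alone is a misleading metric.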

A conclusion I often use

Once you turn “feels unstable” into data (resolution rate, rework, time), it becomes obvious at a glance whether you should change the prompt, switch models, or change the workflow.

If you want a more hassle-free way to handle subscriptions, access, and all the tinkering around these AI tools, you can also drop by Titikey; I use it myself quite often to sidestep the common pitfalls.
