
Unstable outputs from ChatGPT, Claude, Gemini, and Midjourney: use conversation analysis and 3 metrics to quickly pinpoint the problem

2/2/2026
Tips & Tricks

Have you ever had one of those maddening moments: you throw the same request at ChatGPT, Claude, and Gemini, and each answer reads like it came from "a different person"; Midjourney is even more outrageous: the prompt clearly didn't change, yet the images feel like opening a loot box. Rather than cursing based on gut feeling, I recommend borrowing a conversation-analysis mindset to give the AI a "checkup" and quantify the problem.

Metric 1: Resolution rate — don’t just look at whether it wrote a lot

A commonly used KPI in conversation analysis is “resolution rate,” which, put simply, is whether this output can be used as-is. My approach is crude but effective: tag each result as “ready to deliver / needs follow-up / totally off-topic.” After a week, you’ll be able to see who’s more consistent and who’s more prone to self-indulgent rambling.
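The tagging scheme above is easy to automate once you keep a simple log. As a minimal sketch (the log entries and tag names are illustrative, not from any real tool's output):

```python
# Hypothetical hand-tagged log: one entry per AI output, tagged after
# review using the article's three buckets:
#   "deliver"   = ready to deliver as-is
#   "follow-up" = needs a follow-up prompt
#   "off-topic" = totally off-topic
log = [
    ("ChatGPT", "deliver"),
    ("ChatGPT", "follow-up"),
    ("Claude", "deliver"),
    ("Claude", "deliver"),
    ("Gemini", "off-topic"),
    ("Gemini", "deliver"),
]

def resolution_rate(entries, tool):
    """Share of a tool's outputs tagged 'deliver' (usable as-is)."""
    tags = [tag for name, tag in entries if name == tool]
    return sum(t == "deliver" for t in tags) / len(tags) if tags else 0.0

for tool in ("ChatGPT", "Claude", "Gemini"):
    print(tool, f"{resolution_rate(log, tool):.0%}")
```

After a week of entries, the per-tool percentages make the "who is more consistent" question a lookup rather than a feeling.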

Metric 2: Rework count — the cure for answering the wrong question

Rework isn’t because you’re bad; models often miss constraints. Record each extra corrective sentence you have to add (e.g. “output as a table,” “don’t make up data,” “use Chinese”) and calculate how many additional prompts each tool needs, on average, before it gets it right.

  • ChatGPT: usually good at structuring, but sometimes confidently makes things up, so you need to watch it
  • Claude: more stable on long-form writing; if detailed constraints aren’t clearly stated, it can “gently drift off course”
  • Gemini: fast at synthesizing information, but it’s best to nail down formatting requirements from the start
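Averaging the rework counts per tool takes only a few lines. A minimal sketch, with made-up numbers standing in for your own records:

```python
from statistics import mean

# Hypothetical records: for each tool, the number of extra corrective
# prompts each task needed before the output was acceptable.
rework = {
    "ChatGPT": [0, 2, 1, 0],
    "Claude":  [1, 0, 0, 1],
    "Gemini":  [2, 1, 0, 1],
}

def avg_rework(counts):
    """Average number of extra prompts needed per task, per tool."""
    return {tool: mean(c) for tool, c in counts.items()}

print(avg_rework(rework))
```

A tool that averages close to zero on your tasks is one whose constraints you can state once and trust.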

Metric 3: Response experience — time cost is still a cost

Conversation analysis also looks at performance metrics like response time. Track two numbers: the wait time, plus the minutes you spend editing after reading. The same logic applies to Midjourney: treat the number of rerolls and variations as “rework.” The more of them you need, the less stable the prompt, or the model’s understanding of it.
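Combining wait time and edit time into one cost number is straightforward. A minimal sketch, assuming you log each task's wait in seconds and editing in minutes (the records below are invented for illustration):

```python
# Hypothetical per-task records: seconds spent waiting for the response,
# plus minutes spent editing the output afterwards. For Midjourney you
# would log rerolls/variations as rework counts instead of edit minutes.
tasks = [
    {"tool": "ChatGPT", "wait_s": 12, "edit_min": 4},
    {"tool": "ChatGPT", "wait_s": 9,  "edit_min": 7},
    {"tool": "Claude",  "wait_s": 20, "edit_min": 2},
]

def total_cost_min(records, tool):
    """Total time cost in minutes: waiting plus post-editing."""
    rows = [r for r in records if r["tool"] == tool]
    return sum(r["wait_s"] / 60 + r["edit_min"] for r in rows)

print(f"ChatGPT: {total_cost_min(tasks, 'ChatGPT'):.1f} min")
```

Editing time usually dominates, which is exactly why "it answered fast" alone is a misleading metric.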

A conclusion I often use

Once you turn “feels unstable” into data (resolution rate, rework, time), it becomes obvious at a glance whether you should change the prompt, switch models, or change the workflow.

If you want a more hassle-free way to handle subscriptions, access, and all the tinkering around these AI tools, you can also drop by Titikey; I use it myself quite often to sidestep the common pitfalls.
