When you have lots of customer-service chat logs, going through them manually is miserable: you miss high-risk phrasing, your statistical definitions drift, and emotionally charged exchanges skew your judgment. I prefer a “conversation analysis” approach for QA: extract intent, sentiment, and key entities, then track metrics like resolution rate and response speed—efficiency goes way up.
How to ask ChatGPT, Claude, and Gemini about the same conversation
You can paste in a conversation and have the model output structured results; then you can drop them into a spreadsheet for statistics later.
- General prompt: Extract the user intent, sentiment (1–5), key entities (product, price, refund), and whether the conversation escalates to a complaint; add a one-sentence improvement suggestion; output everything as JSON.
- ChatGPT: Good at following rigid “rules”—QA scoring rubrics, forbidden-terms lists—so the output stays stable across runs.
- Claude: Better at summarizing long conversations and at nuanced analysis like “why this line would anger the user”; reading its explanations can feel like getting a lesson in tone.
- Gemini: Handy for multilingual and channel attribution—for example, unifying mixed Chinese–English dialogs into one consistent set of labels.
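The general prompt above can be sketched as a reusable template plus a parser that validates the model’s JSON reply. This is a minimal sketch: the field names (`intent`, `sentiment`, `entities`, `escalated_to_complaint`, `improvement_suggestion`) are my own assumptions, not a schema from any of the three vendors.

```python
import json

# Hypothetical prompt template; field names are assumptions, not a vendor schema.
EXTRACTION_PROMPT = """Analyze the customer-service conversation below and reply with JSON only:
{
  "intent": "<user intent, short phrase>",
  "sentiment": <integer 1-5, 1 = very negative>,
  "entities": {"product": "...", "price": "...", "refund": "..."},
  "escalated_to_complaint": <true or false>,
  "improvement_suggestion": "<one sentence>"
}

Conversation:
"""

def parse_extraction(model_reply: str) -> dict:
    """Parse the model's JSON reply and check the fields the spreadsheet relies on."""
    record = json.loads(model_reply)
    required = {"intent", "sentiment", "entities",
                "escalated_to_complaint", "improvement_suggestion"}
    missing = required - record.keys()
    if missing:
        raise ValueError(f"model reply missing fields: {missing}")
    if not 1 <= int(record["sentiment"]) <= 5:
        raise ValueError("sentiment must be an integer from 1 to 5")
    return record

# What a well-formed reply might look like (illustrative data, not real output):
sample_reply = """{"intent": "request refund", "sentiment": 2,
  "entities": {"product": "wireless earbuds", "price": "$59", "refund": "requested"},
  "escalated_to_complaint": false,
  "improvement_suggestion": "Acknowledge the delay before quoting policy."}"""
record = parse_extraction(sample_reply)
```

Validating the reply before it lands in the spreadsheet is what keeps the statistical definitions consistent: a malformed or incomplete record fails loudly instead of silently polluting your metrics.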
Turn QA into trackable KPIs
Following common conversation-analysis practice, don’t just track “whether it was resolved”; also watch top high-frequency issues, negative-emotion triggers, first response time, and resolution rate. Once the model’s output fields are fixed, your statistical definitions stop being argued about every day.
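The KPIs above fall out of a few lines of aggregation once the extraction fields are fixed. A minimal sketch, assuming each conversation has already been reduced to a record with `intent`, `sentiment`, `resolved`, and `first_response_sec` fields (the last two are my additions—the extraction step itself doesn’t produce them):

```python
from collections import Counter
from statistics import mean

# Hypothetical per-conversation records from the extraction step;
# "resolved" and "first_response_sec" would come from your ticketing system.
records = [
    {"intent": "refund", "sentiment": 2, "resolved": True,  "first_response_sec": 45},
    {"intent": "refund", "sentiment": 1, "resolved": False, "first_response_sec": 310},
    {"intent": "shipping delay", "sentiment": 3, "resolved": True, "first_response_sec": 60},
    {"intent": "shipping delay", "sentiment": 2, "resolved": True, "first_response_sec": 120},
]

resolution_rate = sum(r["resolved"] for r in records) / len(records)
avg_first_response = mean(r["first_response_sec"] for r in records)
top_issues = Counter(r["intent"] for r in records).most_common(3)
# Treat sentiment <= 2 as a negative-emotion trigger worth a manual review.
negative_triggers = [r["intent"] for r in records if r["sentiment"] <= 2]

print(f"resolution rate: {resolution_rate:.0%}")       # 75%
print(f"avg first response: {avg_first_response}s")    # 133.75s
print(f"top issues: {top_issues}")
print(f"negative-emotion triggers: {negative_triggers}")
```

The point is less the arithmetic than the fixed field names: once every conversation is reduced to the same record shape, “resolution rate” means exactly one thing to everyone reading the dashboard.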