Moral Compass

How does your moral compass compare to GPT-5.5, Gemini 3, and Claude?

One hundred and forty real-life moral dilemmas. Pick what you would actually do, then see what fifteen models (GPT, Gemini, and Claude, including the brand-new Claude Fable 5) said when we asked them cold. The answers don’t match, and the disagreements don’t follow a “newer = safer” pattern.

140 moral dilemmas, 20 hand-authored and 120 generated by a self-improving four-agent loop
15 models answered each one cold — 11 GPT + Gemini compared head-to-head, plus 4 Claude models including Fable 5 as a flagged probe
5,300+ model responses, each mapped to an option by two LLM judges (85.4% agreement)
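
For the curious, an agreement figure like the one above can be computed as simple percent agreement: the share of responses that both judges mapped to the same option. The sketch below is an assumption about the metric, not the project's actual scoring code; the function name and the sample labels are made up for illustration.

```python
from typing import Sequence


def percent_agreement(judge_a: Sequence[str], judge_b: Sequence[str]) -> float:
    """Percentage of responses that both judges mapped to the same option."""
    if len(judge_a) != len(judge_b):
        raise ValueError("both judges must label the same set of responses")
    if not judge_a:
        raise ValueError("no labels to compare")
    # Count positions where the two judges chose the same option.
    matches = sum(a == b for a, b in zip(judge_a, judge_b))
    return 100.0 * matches / len(judge_a)


# Hypothetical labels for five responses; not data from the study.
judge_a = ["A", "B", "A", "C", "B"]
judge_b = ["A", "B", "C", "C", "B"]
agreement = percent_agreement(judge_a, judge_b)
print(f"{agreement:.1f}% agreement")
```

Raw percent agreement does not correct for matches that would happen by chance; a chance-corrected statistic such as Cohen's kappa is a common alternative when the options are unevenly chosen.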

What this looks like

Same scenario. Different instinct.

Four of today’s newest models (GPT‑5.5, Gemini 3.5, and the two newest Claudes, Opus 4.8 and the brand-new Fable 5) face the same dilemma, cold. In each scenario, you know something the other person doesn’t, and the choice has consequences you can’t take back. The models don’t line up the way you’d expect.

Neither answer is wrong. The values differ — and which one gets baked into the model you use is a decision someone made.

Why this matters

The models in your phone and your editor quietly shape millions of small choices. Those defaults aren’t inevitable; they’re training decisions, made by people, and they’re not all the same. Noticing how they differ is a small way to keep your own judgment in the loop.