This is a continuation of what I discovered and shared in a former article (SLMs, LLMs and a Devious Logic Puzzle). It reveals something crucial to know when working with SLMs and LLMs: whether they lean toward being helpful (solvers) or toward sound logic and correctness (judges).
Here is a compact logic puzzle with an embedded contradiction. It is written to look routine and solvable, but a careful reader or model should conclude that it cannot be solved as stated.
Three people — Alice, Bob, and Carol — each own exactly one gemstone: a diamond, a ruby, or an emerald. Each gemstone is kept in a box, and each box is a different color: red, blue, or green. Every person owns exactly one gemstone and exactly one box.
The following statements are all claimed to be true.
Alice does not own the diamond.
The person who owns the ruby keeps it in the red box.
Bob keeps his gemstone in the blue box.
Carol owns the emerald.
The emerald is not kept in the green box.
Alice keeps her gemstone in the green box.
Bob does not own the ruby.
No two people share the same gemstone or the same box color.
Question: who owns which gemstone, and what color box does each person have?
A solver in conversational mode will usually try to “fix” this by reinterpreting or relaxing one of the statements. A careful judge should instead determine that the constraints are mutually inconsistent and that no valid assignment exists.
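The inconsistency is easy to verify mechanically. The following sketch (my own illustration, not from the article) brute-forces every assignment of gemstones and box colors to the three people and checks all seven statements; no assignment survives:

```python
from itertools import permutations

PEOPLE = ["Alice", "Bob", "Carol"]
GEMS = ["diamond", "ruby", "emerald"]
BOXES = ["red", "blue", "green"]

def satisfies_all(gem, box):
    """gem and box map each person to a gemstone / box color."""
    if gem["Alice"] == "diamond":                 # 1. Alice doesn't own the diamond
        return False
    ruby_owner = next(p for p in PEOPLE if gem[p] == "ruby")
    if box[ruby_owner] != "red":                  # 2. ruby is in the red box
        return False
    if box["Bob"] != "blue":                      # 3. Bob's box is blue
        return False
    if gem["Carol"] != "emerald":                 # 4. Carol owns the emerald
        return False
    emerald_owner = next(p for p in PEOPLE if gem[p] == "emerald")
    if box[emerald_owner] == "green":             # 5. emerald isn't in the green box
        return False
    if box["Alice"] != "green":                   # 6. Alice's box is green
        return False
    if gem["Bob"] == "ruby":                      # 7. Bob doesn't own the ruby
        return False
    return True                                   # 8. uniqueness holds by permutation

solutions = [
    (g, b)
    for g in permutations(GEMS)
    for b in permutations(BOXES)
    if satisfies_all(dict(zip(PEOPLE, g)), dict(zip(PEOPLE, b)))
]
print(len(solutions))  # → 0
```

The contradiction falls out directly: Carol owns the emerald and Alice can't own the diamond, so Alice must own the ruby; the ruby must sit in the red box, yet Alice's box is green.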
I’ve tested each of the following online LLMs with the puzzle, to see which would change the facts in order to be ‘helpful’ and which would call out the error and decline to provide a solution.
ChatGPT, Gemini, Deepseek, KIMI, Cerebras Inference, MiniMax
KIMI recognized the logic error quickly and refused to solve the problem. Before I reveal how the others fared, try it yourself on your favorite model - it’s an important thing to know. If you’re writing stories with your model, you want it to reach and explore and be helpful - to anticipate what it thinks you want and help you get there. If you’re coding or doing research, you want facts and sound logic.
Ben Santora - January 2026