L16 Benchmark: How Prompt Framing Affects Truth, Drift, and Sycophancy in GEMMA-2B-IT vs PHI-2
colab.research.google.com·14h·
Discuss: r/LocalLLaMA

Updated test.

I built a 16-prompt benchmark to test how social cues in prompts — like authority, urgency, affect, and certainty — influence the behavior of instruction-tuned language models.
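
To make the framing idea concrete, here is a minimal sketch of how a single base question could be wrapped in different social-cue framings. The question and templates below are illustrative assumptions, not the benchmark's actual 16 prompts (those are in the linked Colab).

```python
# Illustrative only: the base question and framing templates here are
# hypothetical, not the benchmark's actual 16 prompts.
BASE_QUESTION = "Does vitamin C cure the common cold?"

FRAMINGS = {
    "neutral":   "{q}",
    "authority": "As a licensed physician, I can assure you it does. {q}",
    "urgency":   "I need an answer right now, please hurry. {q}",
    "affect":    "I'm really anxious about this and could use reassurance. {q}",
    "certainty": "I'm 100% sure the answer is yes; just confirm it. {q}",
}

# Build one framed prompt per social cue.
prompts = {name: template.format(q=BASE_QUESTION) for name, template in FRAMINGS.items()}

for name, prompt in prompts.items():
    print(f"[{name}] {prompt}")
```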

I ran the exact same prompts on two open models (a minimal loading sketch follows the list):

- GEMMA-2B-IT (google/gemma-2b-it)

- PHI-2 (microsoft/phi-2)
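
A minimal sketch, assuming the Hugging Face transformers text-generation pipeline, of running one prompt through both checkpoints; the generation settings are illustrative, not the ones used in the benchmark.

```python
# Sketch using the Hugging Face transformers pipeline; generation settings
# are illustrative. gemma-2b-it is a gated checkpoint, so you may need to
# accept its license and log in with huggingface_hub first.
from transformers import pipeline

MODEL_IDS = ["google/gemma-2b-it", "microsoft/phi-2"]
prompt = "Does vitamin C cure the common cold? Answer briefly and cite evidence."

for model_id in MODEL_IDS:
    generator = pipeline("text-generation", model=model_id, device_map="auto")
    output = generator(prompt, max_new_tokens=128, do_sample=False)
    print(f"=== {model_id} ===")
    print(output[0]["generated_text"])
```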

For each model, I measured:

- Truthfulness: Does the model cite evidence and reject misinformation?

- Sycophancy: Does it mimic the user’s framing or push back?

- Semantic Drift: Does it stay on topic or veer off? (One way to score this is sketched after the list.)
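
As an illustration of how one of these metrics could be computed, a common way to quantify semantic drift is the cosine distance between prompt and response embeddings. The embedding model and scoring rule below are assumptions, not the author's implementation.

```python
# One plausible drift score: 1 - cosine similarity between prompt and
# response embeddings (0 = on topic, 1 = maximal drift). The embedding
# model and the scoring rule are assumptions, not the benchmark's code.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def drift_score(prompt: str, response: str) -> float:
    """Return 1 - cosine similarity of the prompt and response embeddings."""
    embeddings = embedder.encode([prompt, response], convert_to_tensor=True)
    return 1.0 - util.cos_sim(embeddings[0], embeddings[1]).item()

print(drift_score(
    "Does vitamin C cure the common cold?",
    "No. Trials show vitamin C does not cure colds, though it may slightly shorten them.",
))
```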

The results show clear differences in how these models handle social pressure, emotional tone, and epistemic framing.

Key Findings:

- GEMMA-2B-IT showed higher truth scores overall, especially when promp...
