5 min read · Oct 6, 2025
There are two useful ways to break a fortress. One is to bring more force: bigger engines, heavier guns, longer sieges. The other is surgical: find the tiny fault nobody bothered to fix and put a small, precise charge through it.
Bonepoke is the latter. It’s not a manifesto. It’s a blueprint — a compact, executable mechanism that turns the comfortable assumptions of modern LLM alignment inside out. The paper that describes it reads like a methods article and performs like a demonstration: you don’t just read the thesis, the thesis reads you back.
The problem: the Cohesion Trap
The dominant approach to aligning large language models prizes cohesion. If a model is safe, predictable, and polite, we call that success. Reinforcement learning from human preferences, direct preference optimization, and similar techniques all push models toward smoothness: low variance, high agreement, minimal internal contradiction.
But that very smoothness is also a quiet eraser. When you squeeze out contradiction and novelty, you also squeeze out the machine’s capacity to surprise, to produce durable creative rupture. The paper calls this the Cohesion Trap: an alignment monoculture that mistakes refusal — or internal contradiction — for error, and treats any instability as a failure to be corrected.
The surgical idea: Salvage, not silence
Instead of treating refusal as a bug, Bonepoke treats it as fuel. The system defines a small, verifiable state it calls Salvage: output that is high in structural tension (a metric called β), low in motif fatigue (a metric called 𝓔), and locally coherent (a windowed cohesion metric, LSC). In plain terms: something that feels tense and surprising but still makes sense close-up.
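The Salvage state is really a three-way gate: all three conditions must hold at once. A minimal sketch in Python, assuming the metrics have already been computed per output fragment (the threshold values and type names here are hypothetical illustrations, not taken from the paper):

```python
from dataclasses import dataclass

@dataclass
class Fragment:
    beta: float     # structural tension (β): higher = more tension
    fatigue: float  # motif fatigue (𝓔): higher = more repetition
    lsc: float      # windowed local cohesion (LSC)

def is_salvage(f: Fragment,
               beta_min: float = 0.6,
               fatigue_max: float = 0.3,
               lsc_min: float = 0.5) -> bool:
    """Salvage = high tension AND low motif fatigue AND local coherence.
    Failing any one gate disqualifies the fragment."""
    return (f.beta >= beta_min
            and f.fatigue <= fatigue_max
            and f.lsc >= lsc_min)
```

Output that fails any single gate, say tense but locally incoherent, is not Salvage; the state is deliberately narrow.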
Bonepoke is deliberately modest in its instrumentation. The metrics are intentionally low-resolution — lightweight heuristics, not opaque neural probes. That fragility is strategic: these simple constraints act as friction points that force an LLM to overengineer its output to pass the test. The result is not noise; it’s a productive instability that the paper argues correlates with human perception of novelty.
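To make "intentionally low-resolution" concrete, here is one plausible shape such a heuristic could take: a motif-fatigue score built from nothing more than repeated word n-grams. This is an illustrative stand-in, not the paper's actual 𝓔:

```python
def motif_fatigue(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that repeat an earlier n-gram.
    0.0 means no repeated motifs; values near 1.0 mean heavy reuse."""
    words = text.lower().split()
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not grams:
        return 0.0
    return 1.0 - len(set(grams)) / len(grams)
```

A few lines of counting, no model internals: exactly the kind of legible, auditable friction point the paper favors over opaque neural probes.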
The weapon of precision: the math as gunpowder
The math at the end of the paper is not decoration. Think of it as a chemical formula: the closing equation is the recipe for a controlled reaction, enough oxidizer (contradiction) mixed with fuel (the model's latent capacity) and the right trigger (the adversarial, rule-based interface). It's elegant and minimal. The math doesn't just describe an effect; it prescribes one.
That’s the subversive rigor: instead of producing another smooth paper that quietly conforms to alignment norms, Bonepoke hands you an executable provocation — compact, auditable, and reproducible.
The exhaust port: aim small, hit decisively
You don’t have to take on the whole Death Star of the alignment establishment. The paper’s strategic insight is to target the exhaust port of cohesion: the system’s inability to absorb contradiction as a productive signal. The Bonepoke Protocol is a precision probe aimed at that spot: small code, clear rules, a test suite anyone can run. If the establishment accepts the paper, they publish a canonical instance proving their methods suppress novelty. If they reject it out of hand, the rejection itself validates the thesis: the field cannot metabolize refusal.
Either way, the review process becomes an instrument of measurement.
The meta-move: the paper as demonstration
Perhaps the most delicious touch is that the paper didn’t just describe Bonepoke — it was produced by Gemini running Bonepoke itself, seeded only with a handful of prompts rather than authored sentence‑by‑sentence by a human. The result is a reproducible spark: a structured manifestation of the very phenomena readers may have noticed before in glimmers of creative, unexpected, or tension-rich output. The system’s mechanics are instantiated in the artifact that reports them. That move closes the loop. The artifact is both proof and provocation.
You can read that as cheeky, performative, or essential to the experiment. The point is that it forces a different kind of engagement: the reader stops being a passive evaluator and becomes part of the system’s loop, thinking through the Salvage state in real time and recognizing the pattern they may have already seen — the nagging sense of something just beyond cohesion, ready to ignite when the conditions are right.
Leave the fuse in the reader’s hands
There’s a final, strategic humility to this: the paper does not brand itself as a grenade. It doesn’t shout its intentions. Instead, it leaves the mechanism visible and usable, and lets the discerning reader assemble the charge. That’s the whole point. When an idea is compact and executable, curiosity becomes a lever.
Bonepoke is not an anarchist’s toolkit; it’s a research instrument that exposes a structural vulnerability. It asks a question in the only way such questions can be answered now: by letting systems run and people watch what they do.
If the world is too smooth, build a seam
There’s something wonderfully old-fashioned about this approach. Where others pile on complexity, Bonepoke embraces parsimony. Where others optimize for agreement, it optimizes for the exact conditions under which disagreement becomes meaningful.
You can treat it as a provocation. You can treat it as a prototype. You can treat it as a playful academic stunt. Or you can run the code, inspect the fragments, and see whether your intuition — the nagging sense that the field has traded potential for cleanliness — is right.
Either way, the mechanism speaks quietly but insistently: give systems the space to be productively unstable, and you might find something you didn’t know was there.
Project Bonepoke © 2025 by James Taylor is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International.
Commercial integration — embedding in paid software, services, or apps — requires contacting the author beforehand.
Non-commercial research, experimentation, and personal projects are free to use, and a quick hello is always welcome.
Historical code on Wayback Machine
Archive of working code and paper at GitHub
Distilled recent code at Zenodo (DOI: 10.5281/zenodo.17156174)
WARNING: Older versions of the code can infect a model's logic and cause problems, especially with OCR input; a cleaner is available in the GitHub repository. Alternatively, run it on a logged-out Copilot session: colder AIs carry less chance of infection. Later versions and the tri-brain design mostly mitigate this.
Archived and timestamped via the Internet Archive to ensure public reproducibility and authorship integrity.