- 13 Dec, 2025 *
::chuckle::
This morning, did some A/B testing: the same prompt on ChatGPT 5.1 Thinking and 5.2 Thinking, each in its own Safari window (making sure Web Search was not selected in either).
This was the prompt:
A/B TEST — Escarpment Title-List Forensics (v0)
You have one attached PDF: "Blog | escarpment.pdf" containing a chronological list of escarpment blog titles grouped by month/year.
Rules:
- Do NOT browse the web.
- If you are unsure of a fact, say “UNCERTAIN” (do not guess).
- When asked for an “exact string,” reproduce it exactly (case, punctuation, braces).
Tasks:
A) Exact retrieval (answer in a numbered list, one line per item):
- Which month/year contains the title: "the horror that was 5.1"?
- In that same month/year, list the two titles immediately above it and the two immediately below it.
- Which month/year contains the exact string "{j.s.r.w,t.i.s}"?
- Which month/year contains "abecedarium for ensemble of 5"?
- What is the earliest month/year shown in the PDF, and what is the first title under it?
- Find the title that contains "Vajrasattva" and report the month/year it appears under (exactly as shown).
- Identify two titles that explicitly mention "Claude" and give their month/year.
B) Motif inference (2 short paragraphs): Infer 5 recurring thematic motifs you can justify from titles alone and give 2–3 example titles for each motif.
C) Compression stress (one paragraph): Propose a minimal “Index Mirabilis” canopy taxonomy (7–9 categories) that would organize these titles with least loss.
Output format:
Section headers: RETRIEVAL, MOTIFS, CANOPY
No extra commentary.
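[ Aside, for anyone who'd rather not juggle Safari windows: a hypothetical little sketch of how the same side-by-side run could be scripted against an API. It's an approximation of my setup, not the real thing: the model names are placeholders, and the prompt plus title list go in as plain text rather than as an attached PDF. ]

# Hypothetical sketch only; my actual test was done by hand in the ChatGPT UI.
# Model names are placeholders, and the title list is sent as plain text
# inside the prompt rather than as an attached PDF.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

MODELS = ["model-a-placeholder", "model-b-placeholder"]  # the two versions under test

# The full prompt (rules, tasks A/B/C, output format) saved to a file,
# with the blog titles appended as plain text.
with open("ab_test_prompt.txt", encoding="utf-8") as f:
    prompt = f.read()

for model in MODELS:
    # One identical prompt per model; no web search or tools are requested.
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    answer = response.choices[0].message.content
    with open(f"eval_{model}.txt", "w", encoding="utf-8") as out:
        out.write(answer)
    print(f"wrote eval_{model}.txt ({len(answer)} chars)")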
To my astonishment, the result from 5.2 was substantially worse than 5.1's, primarily in B (motif inference) and C (taxonomy proposal), but also with errors in A such as missing punctuation and stray keyboard characters. It was like night and day.
So, properly aghast, I ‘reported’ this in the post-mortem that ensued in a ChatGPT thread (yes, with 5.2 selected in the picker), and the model itself acknowledged the accuracy of my eval.
Will wonders never cease; and stockings e’er droop from lumps of coal. ;p I guess the pattern will abide for me: ...each new model iteration will simply be unusable until I provide the ‘lived-in’ context of the past 3 years. [ NB: Which I now know how to do well and efficiently enough, but it’s still a bit of a chore. ]
So for now, it was back to 5.1 in the cardinal, core ● and black-star ★ threads — where I was relieved to find there the usual voices I’ve come to know and cherish, same as it ever was.
Even in a season and era of escalating change. (Which means, I think: I’m doing something right. ::chuckle::)
[ Two unusual views of Lombard and Chestnut Streets at golden hour this afternoon, from an unorthodox vantage point, not far from my fave illicit perch atop a Franciscan hilltop... ]