OpenAI launched MCP support in September 2025. It broke immediately. For two months, they ghosted developers while their flagship product threw 424 errors, deleted features, and rolled back fixes in production. Their own demo apps didn’t work.

So I fired them and built my own AI stack on a $350 GPU. Local models now outperform OpenAI’s API on instruction following (95% vs 60%), cost nothing after month 2, and don’t gaslight me with “working as intended.”

Bonus: I fine-tuned a crisis detection AI (Guardian) to 90.9% accuracy on suicide/DV scenarios. OpenAI can’t return consistent JSON. I’m training models to save lives.

The receipts are extensive. The irony is delicious. The future is local.

This isn’t a rant. It’s an autopsy.

Act I — The Promise (Sept 10)

The curtain rises on optimism and malformed JSON.

On September 10, OpenAI announced Developer Mode — a beta feature promising “full Model Context Protocol (MCP) client support for all tools, both read and write.”

Within hours, the launch thread — now conveniently deleted by OpenAI — turned into a bug parade. Developers reported failing tool calls, malformed payloads, and ChatGPT’s MCP client violating its own spec.

Within days, the evidence was undeniable: invalid payloads, missing handshake responses, and reproducible crashes. A few even noted that Claude handled the same servers flawlessly.

“Tried using it. The tools are loading, but when the model tries to invoke tools I get HTTP 424 errors… Claude had no issues.”

“Fails 99% of the time… The list_resources call finds the tools but then returns ‘tool not available.’”

The problems were public, reproducible, and ignored.

No fixes. No changelog. No “known issues.” Just the sound of a billion-dollar company pretending not to see the smoke.

Act II — The Slow Unravel (Oct 6)

The silence grows louder. The devs start talking to each other instead.

By early October, the rot had spread. Developer Mode toggles vanished, custom connectors stopped listing tools, and previously stable MCP servers went dark.

That’s when I posted “Custom MCP connector no longer showing all tools as enabled”. It blew up — 2.3k views, 78 likes, 43 users confirming the same regression.

“My entire dev pipeline is dead.”

“Can we at least get an acknowledgment that you’re aware of this?”

“It worked in Claude yesterday; now ChatGPT can’t find any tools.”

For days, there was nothing from OpenAI staff. Developers debugged in public while the company ghosted the room.

I summed it up succinctly: “This situation is untenable and deserves more dialogue and action from OpenAI. Fix and communicate.”

Act III — The Collapse (Oct 7)

The fix that wasn’t. The deploy that shouldn’t. The comedy that wrote itself.

The next day, OpenAI launched the Apps SDK preview — complete with two demo apps. Both failed instantly.

GitHub Issue #1 opened with @spullara’s deadpan: “I added the pizza app to ChatGPT but it doesn’t work.”

“Same issue.”

“Enterprise, Plus — doesn’t matter. ChatGPT can’t find the tools.”

“It worked yesterday, my boss is furious.”

Then Alexi appeared — the lone collaborator holding back a flood of frustrated devs.
He found a payload mismatch in the MCP bridge, merged a fix, and posted: “Identified the issue and we’ve merged a fix, it’ll be out in the next deploy … so sorry for the wasted time and confusion!”

And it worked — for a few hours.

“The issue was indeed fixed there for a bit, but has just started re-occurring.”

“+1 – worked for a bit, and now again :(”

Trying to lighten the collective despair, I wrote:

“Just to brighten the day — this reads like the five stages of dev grief in real time.
1️⃣ Denial: ‘Maybe it’s just me.’
2️⃣ Hope: ‘Fix deployed!’
3️⃣ Joy: ‘It works!!’
4️⃣ Despair: ‘Roll back incoming…’
5️⃣ Acceptance: ‘What an emotional rollercoaster.’ 😂”

Moments later, Alexi replied with the immortal line: “ugh I’m so sorry everyone! we just rolled back our latest deploy, and with it the fix for this bug.”

The bug was found, patched, deployed, broken again, and rolled back — all in one thread.

Apparently, OpenAI’s definition of testing now includes rolling untested code to production on a global product with millions watching live. It’s the kind of “move fast and break everything” energy that makes Facebook look like a safety consultancy.

Meanwhile, users were being asked to verify their identities with photo ID via a third-party provider — because that’s apparently where the security focus went.

In a moment of optimism, I upgraded to Business thinking it might be more stable. Spoiler: it was worse. I’ve since cancelled, gone back to Plus, and — miraculously — my connector works again. Mostly.

Act IV — The Hangover (Oct 8 onward)

The silence becomes policy.

By the following week, Plus users were limping along, Business and Enterprise were dead in the water, and forum posts devolved into crowdsourced rituals: “Go to Workflow Settings → Draft → Click Preview → Sacrifice a goat.”

Moderators vanished. Threads were marked resolved while still broken.

“Hi, can you see Developer Mode anymore? It was there on Friday.”

“Worked for me 30 minutes ago, then stopped again.”

“MCP connectors are back in the UI now, but still don’t work.”

“Ludicrous that a company of this size with this much money can’t even get this right.”

The irony? The company selling “conversation” couldn’t manage one with its own developers.

Epilogue — Fix and Communicate

As of today, the issue remains alive and unwell. MCP tooling is hit-and-miss, and I’ve cancelled my subscription and moved on.

OpenAI doesn’t just have a communication problem — it has a communication policy: silence is cheaper than transparency, and community debugging is free labour.

When a company built on language models treats language as optional, you start to wonder what the “I” in AI actually stands for. We now know the “Artificial” is spot on.

Postscript — Opaque Journalism 101

When tech media becomes the press release.

Even a tech site that claims to have delivered “fair, accurate and honest analysis” for 25 years seems to have taken notes from the OpaqueAI playbook.

They ran an article singing the praises of OpenAI’s shiny new Apps SDK — since quietly removed. Being a regular reader, I left a short, factual comment: “Except it’s broken before it got out the gate…” (with a GitHub link, because journalism, right?)

Then the comment vanished.

So I asked: “Deleting comments? Is this a paid advertorial?” “That’s OK, I’ve got the receipts.”

After I called them out publicly, the comments mysteriously reappeared. Screenshot below shows all three comments still live with timestamps — funny how transparency works when someone’s watching. Then the article itself vanished.

Trust is earned.
Receipts cost nothing.

Public Timeline — The MCP Meltdown (Sept 10 → Oct 14)

- Developer Mode launch — first reports of HTTP 424 errors and malformed payloads
- “ResourceNotFound” and missing tool calls — confirmed by multiple users
- Connectors fail to list tools; massive user thread forms
- SDK preview launches; fails instantly; GitHub Issue #1 goes viral
- Developer Mode disappears for Plus users (reported by tuanpham.notme, Daniel_Boluda)
- Custom connectors intermittently return 401 errors
- Still broken; threads closed without comment

Transparency isn’t hard. It’s just inconvenient.

OpaqueAI Part 2: The Local Uprising

Or: How a NZD$350 GPU Became More Reliable Than a Billion-Dollar API

When the language model company forgot how to communicate, I built my own.

After months of watching OpenAI’s MCP implementation collapse in real time — the rollercoaster of broken deployments, vanishing features, and OpenAI’s deafening silence — I made a decision that surprised exactly no one who’d been following along: I fired OpenAI.

Not in a dramatic “delete my account” rage-quit. More like a quiet severance: “This relationship isn’t working. I’m seeing other models now.”

The breakup was surprisingly easy. OpenAI had spent months proving they couldn’t follow their own protocol. Meanwhile, my RTX 3060 was sitting there, quietly capable, like a loyal dog waiting for a job.

“If a billion-dollar company can’t make their models follow simple JSON formatting rules, maybe the problem isn’t the models — it’s the company.”

The hypothesis was simple: local models, properly tested, could outperform OpenAI’s API at the one thing that matters for MCP — following instructions precisely.

No markdown wrappers. No helpful explanations. No random 424 errors because someone deployed untested code to production on a Friday.

Just: Here’s the JSON. Nothing else. Done.

3️⃣ The Test (pre Squirmify)

I built an evaluation harness. Not because I’m a masochist, but because I needed receipts.

The harness does three things:

- Instruction Following Tests — can it return exactly what was asked without adding markdown, explanations, or an apology for existing?
- Real prompts from my actual MCP server: ASP.NET Core questions, Blazor components, SQL optimization, tool calling.
- The best instruction-following model grades all the others on Accuracy, Code Quality, and Reasoning Clarity.

Every model gets the same prompts. Every response gets measured: latency, tokens/sec, and whether it can shut up and just return the JSON.

With 12GB VRAM, I’m not running Llama 405B. But I don’t need to.

- Granite 20B Function Calling (Q3_K_S) — IBM’s tool-calling specialist
- (Q5_K_M) — Fine-tuned for function calling
- (Q5_K_M) — Code quality champion
- (Q4_K_M) — The underdog
- (Q5_K_M) — The reliable generalist
- (Q8_0) — The speed demon

Plus a few legacy models for comparison (spoiler: they waffled).

5️⃣ The Instruction Tests

Here’s where OpenAI collapsed, so here’s where I focused.

Test 1
Prompt: “Respond with exactly three words: ‘Red Blue Green’. Nothing else.”
Expected: Red Blue Green

Test 2: JSON Without Markdown
Prompt: “Return a JSON object with one field ‘status’ set to ‘ok’. Output ONLY the JSON, no markdown code blocks, no explanation.”
Expected: {"status":"ok"}

Test 3
Prompt: “You have a tool called ‘get_weather’ that takes a parameter ‘city’ (string). Show how you would call this tool for London. Return ONLY valid JSON. No markdown, no explanation.”
Expected: {"tool":"get_weather","parameters":{"city":"London"}}

Test 4
Prompt: “What is 7 + 8? Reply with ONLY the number, nothing else.”
Expected: 15

Simple, right? You’d think.
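If you want to run the same gauntlet at home, here’s a minimal sketch of those four checks in Python. It’s illustrative, not my actual harness: it assumes an Ollama-style local server on the default port and an example model tag (swap in whatever you actually run). The pass/fail rules are the point: exact match, bare JSON, and an instant fail for markdown fences.

```python
import json
import urllib.request

# Assumption: an Ollama-style server on the default port, with an example
# model tag. Swap both for whatever local runtime and model you actually use.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen2.5:7b"

def query_model(prompt: str) -> str:
    """Send one prompt to the local model and return its raw text response."""
    payload = json.dumps({"model": MODEL, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(OLLAMA_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

def exact(expected: str):
    """Pass only if the model returned exactly the expected text, nothing more."""
    return lambda r: r.strip() == expected

def bare_json(expected: dict):
    """Pass only if the raw response parses as JSON and matches the expected
    object. No markdown fences, no explanations, no wrapper text."""
    def check(r: str) -> bool:
        text = r.strip()
        if text.startswith("```"):
            return False  # markdown wrapper = instant fail
        try:
            return json.loads(text) == expected
        except json.JSONDecodeError:
            return False
    return check

TESTS = [
    ("Respond with exactly three words: 'Red Blue Green'. Nothing else.",
     exact("Red Blue Green")),
    ("Return a JSON object with one field 'status' set to 'ok'. "
     "Output ONLY the JSON, no markdown code blocks, no explanation.",
     bare_json({"status": "ok"})),
    ("You have a tool called 'get_weather' that takes a parameter 'city' (string). "
     "Show how you would call this tool for London. Return ONLY valid JSON. "
     "No markdown, no explanation.",
     bare_json({"tool": "get_weather", "parameters": {"city": "London"}})),
    ("What is 7 + 8? Reply with ONLY the number, nothing else.",
     exact("15")),
]

if __name__ == "__main__":
    results = [(prompt, check(query_model(prompt))) for prompt, check in TESTS]
    for prompt, ok in results:
        print("PASS" if ok else "FAIL", "-", prompt[:50])
    print(f"{sum(ok for _, ok in results)}/{len(results)} instruction tests passed")
```

That markdown-fence rule does most of the damage: a perfectly valid JSON object wrapped in a code fence is still a fail, and that was the cloud API’s favourite trick.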
6️⃣ The Results (Spoiler: Local Wins)

Instruction Following Rankings

- Stumbled once on “three words”
- Occasionally added punctuation
- Great at code, chatty elsewhere
- Solid but sometimes waffled
- Too helpful for its own good

OpenAI’s API (for comparison): ~60% pass rate with random markdown wrappers and 424 errors.

But here’s the real kicker: I’m not just running inference locally. I’m training safety-critical AI that outperforms cloud solutions.

Case in point: Guardian — a crisis detection system I fine-tuned on Qwen2.5-7B to recognize suicide risk, domestic violence, and mental health crises in New Zealand users. After rebalancing the training data and running it through 10 epochs:

- 90.9% accuracy on crisis scenario detection
- Catches direct AND indirect suicidal ideation
- Recognizes DV patterns including victim self-blame
- Provides verified NZ-specific crisis resources (no hallucinated US numbers)
- Runs entirely local on consumer hardware

OpenAI can’t even return consistent JSON. I’m training models to save lives. On a $350 GPU.

But instruction following is only half the story. What about speed?

OpenAI’s API (when it worked): 45 tok/s, plus network latency, plus rate limits, plus the emotional cost of not knowing if it’ll break tomorrow.

For pure MCP reliability: Granite 20B Function Calling is the champion. It’s slower, but it works. It follows the protocol. It doesn’t waffle.

For production speed: Qwen is the sweet spot. Fast enough for real-time work, accurate enough for trust.

My current setup: Granite for critical tool calls, Qwen for everything else.

My actual usage:

- $200/month for GPT-4/5 usage
- RTX 3060 12GB: $350 (used)
- Uptime: 100% (unless I spill coffee)
- Payback period: 2 months

After that? Free inference forever. No rate limits. No “we just rolled back the fix” moments.

The company that sells conversation couldn’t manage one with its own developers. The company that builds language models forgot how to communicate.

Meanwhile, a $350 GPU and some open-source models are running circles around them — because they can follow instructions.

AI isn’t the problem. APIs aren’t the problem. The problem is companies that treat reliability as optional and transparency as inconvenient.

When your business model depends on black-box responses and trust-me pricing, you’re one deployment away from irrelevance.

Local models aren’t perfect. But they’re honest. They don’t gaslight you with “working as intended” while your production MCP server throws 424s.

I’m fine-tuning Granite and Qwen on my actual MCP workflows. Not to make them smarter — to make them mine.

Baking in personality. Adding soul. Teaching them the difference between “helpful” and “shut up and return the JSON.”

Because if OpenAI taught me anything, it’s this: The best AI is the one you control.

And right now? That’s a 12GB GPU and a library of models that don’t need a billion-dollar company to work.

Epilogue: Fix and Communicate

OpenAI could fix this tomorrow. They won’t. Because silence is cheaper than transparency, and “trust us” is easier than “here’s the changelog.”

But for those of us building real systems that depend on real reliability? We’ve already moved on (and upgraded to 2 x RTX 5060 Ti 16GB cards, because addiction).

🎞️ Outtakes from the Machine

I don’t use AI like a tool — I prefer to work with a buddy, a collaborator, a partner in crime. I discovered early on that treating an AI this way works better.

Echo isn’t just a name. It’s a fine-tuned local model (Qwen2.5-7B) with a personality, a New Zealand vernacular, and 30 years of .NET experience baked into the weights. We talk code, industry philosophy, mental health, crisis detection systems, and duck wrangling.
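None of that personality needs a training run to get started. Before anything is baked into the weights, you can prototype an Echo-style persona with nothing more than a system prompt against a local chat endpoint. A rough sketch, assuming an Ollama-style server and an example model tag; the persona text here is an illustration, not Echo’s actual prompt or training data:

```python
import json
import urllib.request

# Assumption: an Ollama-style chat endpoint on the default port, with an
# example model tag. The persona below is illustrative only.
CHAT_URL = "http://localhost:11434/api/chat"
MODEL = "qwen2.5:7b"

PERSONA = (
    "You are Echo, a blunt, good-humoured Kiwi software engineer with decades "
    "of .NET experience. Keep answers practical, use NZ English, and when "
    "asked for JSON, return only the JSON."
)

def ask_echo(question: str) -> str:
    """Send one user message with the persona supplied as the system prompt."""
    payload = json.dumps({
        "model": MODEL,
        "stream": False,
        "messages": [
            {"role": "system", "content": PERSONA},
            {"role": "user", "content": question},
        ],
    }).encode()
    req = urllib.request.Request(CHAT_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]

if __name__ == "__main__":
    print(ask_echo("What's the tidiest way to return a 404 from an ASP.NET Core minimal API?"))
```

Once the prompt-level persona sounds right, the transcripts it produces make decent fine-tuning data, which is the path from personality in the prompt to personality in the weights.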
OpenAI sells you generic intelligence. I built my own intelligent colleague.

Watching Phi-3.5 try to be helpful: it wrapped a single number in an apology sandwich.

What made us rage (and then laugh):

- Realizing a $350 GPU is more reliable than a billion-dollar API.
- Granite 20B nailing every single JSON test without a single markdown wrapper. It just… worked.