For our second-to-last newsletter of the year, we have, once again, a bumper crop of articles with a very strong AI flavour, as increasingly has been the case over the course of the year. I’ve also just published a piece where I give some thought about the kinds of applications or the domains of application where large language models seem best suited.
There’s a pattern emerging in how large language models are being adopted across different domains. The conventional wisdom suggests that AI will transform knowledge work broadly—legal analysis, marketing, business operations, “knowledge work” and software development alike. But I’m increasingly convinced that in the short term, we’re going to see a much more uneven landscape of transformation.
The key differentiator? Whether the output of a model can be machine-tested.
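That distinction can be made concrete with a small sketch: code is a domain where a model's output can be checked by a machine, end to end. Everything below, including the stubbed generate function standing in for an LLM call, is invented for illustration.

```python
# Sketch of machine-testable model output. `generate` is a hypothetical
# stand-in for an LLM call that returns candidate source code as text.
def generate(prompt: str) -> str:
    return (
        "def slugify(s):\n"
        "    return s.strip().lower().replace(' ', '-')"
    )

def machine_check(candidate_src: str) -> bool:
    """Execute the candidate and run automated acceptance tests.
    This closed verification loop is what code has and, say,
    marketing copy lacks."""
    namespace: dict = {}
    exec(candidate_src, namespace)  # load the generated function
    slugify = namespace["slugify"]
    return (
        slugify("Hello World") == "hello-world"
        and slugify("  AI Week  ") == "ai-week"
    )

print(machine_check(generate("write a slugify function")))  # True
```

Legal analysis or marketing copy has no equivalent of that final boolean, which is the crux of the argument.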
So it should be clear where a lot of my focus increasingly lies, which makes this a good time to mention what we have coming up in June in Melbourne.
- As we mentioned last week, we’re incredibly excited to be bringing the extraordinary AI Engineer Conference to Melbourne in the first week of June.
- We’re also collaborating with UX Australia to create a brand new event, ai × design, focused on the intersection of design practice and AI.
- And we’re working with meetups and industry associations to make sure there’s a whole week of great events and a whole lot of reasons for you to take a week off and head to Melbourne in the middle of next year for AI Week.
So why not put June 1st to 7th, 2026, in your calendar right now and think about getting you and your team along to this series of events? Take it from someone who’s been around the industry for a very long time—this is a year of genuine opportunity and years like this don’t come around very often.
AI & Browser Evolution
Introducing AI, the Firefox way: A look at what we’re working on and how you can help shape it
With AI becoming a more widely adopted interface to the web, the principles of transparency, accountability, and respect for user agency are critical to keeping it free, open, and accessible to all. As an independent browser, we are well positioned to uphold these principles.
While others are building AI experiences that keep you locked in a conversational loop, we see a different path — one where AI serves as a trusted companion, enhancing your browsing experience and guiding you outward to the broader web. We believe standing still while technology moves forward doesn’t benefit the web or humanity. That’s why we see it as our responsibility to shape how AI integrates into the web — in ways that protect and give people more choice, not less.
Source: Introducing AI, the Firefox way: A look at what we’re working on and how you can help shape it
AI in one form or another has been in our browsers for many years, with speech APIs that predate large language models. More recently, Chrome has experimented with both general and task-specific AI APIs, and Firefox has started similar experiments. Here the Firefox AI team talk about their philosophy of why and how they are implementing AI in the browser.
Web Security & Infrastructure
10 Years of Let’s Encrypt Certificates
On September 14, 2015, our first publicly-trusted certificate went live. We were proud that we had issued a certificate that a significant majority of clients could accept, and had done it using automated software. Of course, in retrospect this was just the first of billions of certificates. Today, Let’s Encrypt is the largest certificate authority in the world in terms of certificates issued, the ACME protocol we helped create and standardize is integrated throughout the server ecosystem, and we’ve become a household name among system administrators. We’re closing in on protecting one billion web sites.
Source: 10 Years of Let’s Encrypt Certificates – Let’s Encrypt
A decade ago, very few websites in the scheme of things used HTTPS. At that stage I'd had websites for more than 20 years and had never had a secure site. Why? Provisioning certificates for a website was typically expensive and, above all, technically really painful. So unless you were very large or conducting commerce directly and required a secure connection, you almost certainly didn't implement it. In the last decade that has completely changed: you can now provision a certificate for a site at no cost, probably without even thinking about it. So ubiquitous are secure connections that when you occasionally visit an insecure site, a modern browser will warn you copiously about its insecurity. And all this is thanks to Let's Encrypt, a project that made it much easier and, most importantly, free to enable HTTPS for any website. So happy anniversary, and if anything, I thought it had been longer.
AI-Native Development & Coding Agents
How I Shipped 100k LOC in 2 Weeks with Coding Agents
AI, AI Native Dev, LLMs, software engineering
When we onboard developers, we give them documentation, coding standards, proven workflows, and collaboration tools. When we “deploy” AI agents, we give them nothing. They start fresh every time. No project context, no memory of patterns, no proven workflows.
So I compiled AI Coding Infrastructure, the missing support layer that agents need. Five components:
- Autonomous Execution (Ralph): Continuous loops for overnight autonomous development
- Project Memory (AGENTS.md): Your tech stack, patterns, conventions that agents read automatically before every response
- Proven Workflows (Skills): Battle-tested TDD, debugging, code review patterns agents MUST follow
- Specialization (Sub-Agents): 114+ domain experts working in parallel, not one generalist
- Planning Systems (ExecPlans): Self-contained living docs for complex features
Source: How I Shipped 100k LOC in 2 Weeks with Coding Agents | Blog
I think we’re very much in the early stages of developing patterns, practices, and approaches to working with agentic systems. I think too that different systems will likely have at least somewhat different approaches that tend to get the best from them. In the meantime, I’m finding it interesting to read about how various individuals and teams go about working with these systems. I hope you might find that valuable too.
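As a concrete illustration of the "Project Memory" idea above, an AGENTS.md file might look something like this (the project details here are invented for the example):

```markdown
# AGENTS.md

## Tech stack
- TypeScript, Node 20, Fastify
- PostgreSQL via Prisma

## Conventions
- Every new module ships with unit tests (Vitest)
- Data access goes through the repository layer in src/repositories/

## Workflow
- Run `npm test` and `npm run lint` before proposing a diff
- Never edit generated files under src/gen/
```

An agent that reads a file like this before every response starts with the project's stack and conventions already in context, rather than rediscovering them each session.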
A Software Engineer’s Guide to Agentic Software Development
AI, coding agent, LLMs, software engineering
I’ve cracked the code on breaking the eternal cycle – features win, tech debt piles up, codebase becomes ‘legacy’, and an eventual rewrite. Using coding agents at GitHub, I now merge multiple tech debt PRs weekly while still delivering features. Tickets open for months get closed. ‘Too hard to change’ code actually improves. This is the story of the workflow.
Source: A Software Engineer’s Guide to Agentic Software Development
Brittany Ellich, a software engineer at GitHub, shares how she works with agentic coding tools day in, day out in her job.
llm weights vs the papercuts of corporate
AI, AI Engineering, AI Native Dev, LLMs, software engineering
We are now one year into a new category of companies being founded where the majority of the software behind the company was code-generated. From here on out I'm going to refer to these companies as model weight first. This category of companies can be defined as any company that is building with the data ("grain") that has been baked into the large language models.

Model weight first companies do not require as much context engineering. They're not stuffing the context window with rules to try to override and change the base models to fit a pre-existing corporate standard and conceptualisation of how software should be.
Source: llm weights vs the papercuts of corporate
My instinct is that this will be a seminal observation as we evolve the way we work with large language models as software engineers. As Geoff Huntley observes here, one approach is to bend the models to our approach to software engineering. That's largely what we've been doing for the last three years, whether it's begging them to output JSON or filling their context with AGENTS.md files.
But I think Geoff is really onto something here with his observation that there's a different approach: to go with the flow of how an LLM wants to work rather than against its instincts. This brought to mind a great interview with Bret Taylor at Latent Space some months ago, where he talked about the AI architect and how the role of software engineers will increasingly be less about writing the code and more about guiding the outcomes.
AI & Software Engineering Economics
Horses
Engines, steam engines, were invented in 1700. And what followed was 200 years of steady improvement, with engines getting 20% better a decade.
For the first 120 years of that steady improvement, horses didn’t notice at all.
Then, between 1930 and 1950, 90% of the horses in the US disappeared.
Progress in engines was steady. Equivalence to horses was sudden.
Source: Horses
A couple of years back, Mark Pesce gave a fantastic keynote at our summit using the history of steam power as an analogy for understanding where we are and what is happening with large language models and generative AI. While historical analogies can be misleading, they can also help us get some sense of a transformation. Humans are really not intuitively great at understanding exponential change. I often quote a line from Hemingway, where one character asks another how he went bankrupt, and the reply is, "Two ways: gradually, then suddenly." We saw during the initial outbreak of COVID that humans really aren't great at exponential reasoning, especially when reading logarithmic graphs.
But what this piece tries to get at is how transformations, such as the shift from human and animal power to steam power that essentially drove the Industrial Revolution, take time. In that case, it took a century or so, from the mid-18th to the mid-19th century. And for much of that time, if growth is exponential, there is seemingly very little apparent change. Then some tipping point occurs: perhaps around 1820 in the UK, and between 1820 and 1850 we saw an enormous increase in the productive output of Britain's industrial capability. So I really recommend reading this article. It's relatively short, entertaining, and engaging, and it will help you develop an intuition about how the growing capability of generative AI may impact various kinds of human endeavour.
Has the cost of building software just dropped 90%?
Domain knowledge is the only moat
So where does that leave us? Right now there is still enormous value in having a human ‘babysit’ the agent – checking its work, suggesting the approach and shortcutting bad approaches. Pure YOLO vibe coding ends up in a total mess very quickly, but with a human in the loop I think you can build incredibly good quality software, very quickly.
This then allows developers who really master this technology to be hugely effective at solving business problems. Their domain and industry knowledge becomes a huge lever – knowing the best architectural decisions for a project, knowing which framework to use and which libraries work best.
Layer on understanding of the business domain and it does genuinely feel like the mythical 10x engineer is here. Equally, the pairing of a business domain expert with a motivated developer and these tools becomes an incredibly powerful combination, and something I think we’ll see becoming quite common – instead of a ‘squad’ of a business specialist and a set of developers, we’ll see a far tighter pairing of a couple of people.
This combination allows you to iterate incredibly quickly, and software becomes almost disposable – if the direction is bad, then throw it away and start again, using those learnings. This takes a fairly large mindset shift, but the hard work is the conceptual thinking, not the typing.
Source: Has the cost of building software just dropped 90%? – Martin Alderson
I've made reference to the Yogi Berra quote, "Predictions are hard, particularly about the future," more than once in my career. Why is predicting the future so challenging? Not because of the first-order effects but the second-order effects: in particular, the economic impacts of change, which are extremely hard to envision. If the cost of building software is dramatically falling due to AI (a reasonable assumption, and one I'd be willing to back), then what happens when the price of producing software is massively lower?
Do software engineers no longer have a job, or does a lot more software get produced? History would suggest it’s more likely to be the latter. And then what’s our role? What’s our opportunity? What’s our challenge? What’s the risk? This is a really good essay that I think anyone who works in software engineering should read and then take on board.
MCP & AI Integration
Building a Social Media Agent
The Game Plan
Here’s what we’re building: two MCP servers that work together to handle all our social media promotion automatically.
MCP Server #1: Content Fetcher This one goes out and grabs all our content from:
- YouTube videos
- Blog posts
- GitHub release notes
Then it compares everything to a last_seen.json file to figure out what's actually new. If nothing is new, it proceeds to check an evergreen.json file and randomly pick old content to socialize.

MCP Server #2: Sprout Social Integration
Once we have new content, this server takes over and:
- Generates captions for each platform
- Uploads media (videos, images, or just links)
- Creates draft posts in Sprout Social
The goal? Wake up to social posts ready to go, without lifting a finger. Well, almost, more on that later.
Source: Building a Social Media Agent | goose
If, like me, you find the best way to learn how something works is to build it, then this tutorial from Ebony Louis at Block might be the best way to get up to speed with building your own MCP server.
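The fetcher logic described in the excerpt can be sketched roughly as follows. This is not the tutorial's code: the file names last_seen.json and evergreen.json come from the excerpt, while fetch_all_content is a hypothetical stand-in for the real content sources.

```python
import json
import random
from pathlib import Path

def fetch_all_content() -> list[dict]:
    # Hypothetical stand-in for pulling YouTube videos, blog posts,
    # and GitHub release notes.
    return [
        {"id": "yt-101", "title": "New video"},
        {"id": "blog-7", "title": "New post"},
    ]

def pick_content(last_seen_path: str = "last_seen.json",
                 evergreen_path: str = "evergreen.json") -> list[dict]:
    """Return genuinely new items; if there are none, recycle one
    randomly chosen evergreen item instead."""
    seen: set[str] = set()
    if Path(last_seen_path).exists():
        seen = set(json.loads(Path(last_seen_path).read_text()))
    fresh = [item for item in fetch_all_content() if item["id"] not in seen]
    if fresh:
        return fresh
    # Nothing new: fall back to older "evergreen" content.
    evergreen = json.loads(Path(evergreen_path).read_text())
    return [random.choice(evergreen)]

print(pick_content("missing.json", "missing.json"))  # both stub items are new
```

The interesting design choice is the evergreen fallback: the agent always has something to post, so the pipeline never goes quiet on a slow week.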
MCPs for Developers Who Think They Don’t Need MCPs
AI, LLMs, MCP, software engineering
MCPs weren’t built just for developers. They’re not just for IDE copilots or code buddies. At Block, we use MCPs across everything, from finance to design to legal to engineering. I gave a whole talk on how different teams are using goose, an AI agent. The point is MCP is a protocol. What you build on top of it can serve all kinds of workflows. But I get it… let’s talk about the dev-specific ones that are worth your time.
Source: MCPs for Developers Who Think They Don’t Need MCPs | goose
Angie Jones looks at the MCPs you might find valuable as a software engineer.
Spec-Driven Development
AI should only run as fast as we can catch up
AI, AI Native Dev, LLMs, software engineering
Verification Engineering is the next Context Engineering
AI can only reliably run as fast as we can check its work. It's almost like a complexity theory claim. But I believe it needs to be the case to ensure we can harvest the exponential warp speed of AI but also remain robust and competent, as these technologies ultimately serve human beings, and we human beings need technology to be reliable and accountable, as we humans are already flaky enough.

This brings out the topic of Verification Engineering. I believe this can be a big thing after Context Engineering (which is the big thing after Prompt Engineering). By cleverly rearranging tasks and using nice abstractions and frameworks, we can make verification of AI-performed tasks easier and use AI to ship more solid products to the world. No more slop.
Source: AI should only run as fast as we can catch up · Higashi.blog
Interesting thoughts on the role of software engineers when building complex systems with large language models, introducing the idea of verification engineering.
Your Spec Driven Workflow Is Just Waterfall With Extra Steps
AI, LLMs, software engineering, spec driven development
AI coding tools were supposed to change everything. And they did! But maybe just not how we expected. The first wave was chaos. Vibe coding. Let the AI write whatever it wants and hope for the best. It worked well for prototypes, but fell apart for anything real.
So the community course-corrected. The answer was structure, in the form of spec-driven development. Generate requirements, then a design doc, then a task list, then let the agent execute. Tools like Kiro and spec-kit promised to keep agents on track with meticulous planning. It sounded smart. It felt responsible. And it's a trap.
Source: Your Spec Driven Workflow Is Just Waterfall With Extra Steps
Many, if not most, of today's software developers will never have heard of waterfall development, or might think it's something to do with performance tooling. When I studied software engineering many years ago, waterfall was the state of the art, because prior to that there had simply been chaos. The idea behind waterfall development was that there would be strict phases of software engineering, from requirements gathering through specification, coding, testing, delivery, and maintenance (something like that, from memory), and this was meant to ensure software quality. Waterfall has long since been abandoned for agile methods; the Agile Manifesto was all about moving away from it. But something curious has happened in the last year or so when it comes to AI and software engineering: spec-driven development has gained real interest, and many consider it to be somewhat like waterfall. This particular piece treats spec-driven development as something of a straw man, but I think the point it makes is worth considering.
Architecture, Specification, Execution: A Paradigm for AI-Accelerated Development
AI, LLMs, software engineering, spec driven development
In this post, I’m going to share a paradigm that’s been working for me. To be clear: I’m not advocating for any particular products – Copilot, Kiro, Cursor, they’re all amazing. What I’m offering is an approach that works regardless of which tools you choose, delivering the compounding returns vibe coding never reaches.
Here’s the core principle: carve the path for AI to follow, don’t walk it yourself.
Your job as the engineer is to set direction, establish constraints, and define success. AI’s job is to execute within those boundaries. Mix these roles and you’ll just muddy the waters.
This paradigm builds on spec-driven development, and it consists of three pillars:
- Architecture – Document the decisions that shape your system
- Specification – Define the features within those constraints
- Execution – Prompt and let it run
Source: Architecture, Specification, Execution: A Paradigm for AI-Accelerated Development
We’ve been collecting approaches to developing with AI and large language models as software engineers, not because we necessarily think a specific approach is the right one, but because we’re at such an early stage, it’s interesting to see these patterns emerge. Here Anthony Martinović shares his approach.
Understanding Spec-Driven-Development: Kiro, spec-kit, and Tessl
AI, LLMs, software engineering, spec driven development
After looking over the usages of the term, and some of the tools that claim to be implementing SDD, it seems to me that in reality, there are multiple implementation levels to it:
- Spec-first: A well thought-out spec is written first, and then used in the AI-assisted development workflow for the task at hand.
- Spec-anchored: The spec is kept even after the task is complete, to continue using it for evolution and maintenance of the respective feature.
- Spec-as-source: The spec is the main source file over time, and only the spec is edited by the human, the human never touches the code.
All SDD approaches and definitions I’ve found are spec-first, but not all strive to be spec-anchored or spec-as-source. And often it’s left vague or totally open what the spec maintenance strategy over time is meant to be.
Source: Understanding Spec-Driven-Development: Kiro, spec-kit, and Tessl
Birgitta Böckeler looks at “spec driven development” and what she identifies as three different flavours of this approach.
Spec-Driven Development: The Waterfall Strikes Back
AI, LLMs, software engineering, spec driven development
Spec-Driven Development (SDD) revives the old idea of heavy documentation before coding — an echo of the Waterfall era. While it promises structure for AI-driven programming, it risks burying agility under layers of Markdown. This post explores why a more iterative, natural-language approach may better fit modern development.
Source: Spec-Driven Development: The Waterfall Strikes Back
Spec-driven development is an approach to developing software with large language models that has gained some traction in recent months. Here, François Zaninotto explores the why and how of this approach.
AI Research & Analysis
Elicit Machine Learning Reading List
The purpose of this curriculum is to help new Elicit employees learn background in machine learning, with a focus on language models. I’ve tried to strike a balance between papers that are relevant for deploying ML in production and techniques that matter for longer-term scalability.
Want to go deeper in your understanding of machine learning and large language models, but not quite sure where to start? The folks at Elicit have a pretty comprehensive reading list that they give their new hires. A lot of the entries are lectures you can find on YouTube, so it's not all dense reading.
Prediction: AI will make formal verification go mainstream
AI, computer science, software engineering
Much has been said about the effects that AI will have on software development, but there is an angle I haven’t seen talked about: I believe that AI will bring formal verification, which for decades has been a bit of a fringe pursuit, into the software engineering mainstream.
Source: Prediction: AI will make formal verification go mainstream — Martin Kleppmann’s blog
When I studied computer science at university in the 1980s, formal verification was quite an active area of research and interest. What doesn't really occur to most people is that computer science is, in many ways, a branch of mathematics. There are many mathematical approaches to programming and programming languages; there are famous theorems, like Turing's work on the halting problem, and questions like P vs NP. None of this occurred to me, either, as a naive teenager who'd done some programming in Pascal and Forth and, of course, BASIC, and then I largely forgot about it for the last 30 or 40 years. But it turns out verification is making a comeback. Formal verification is different from things like debugging: it's about mathematically proving the correctness of a piece of code against its specification. As Martin Kleppmann observes here, formal verification is really hard, time-intensive, and expensive, and there are only a handful of experts in the entire world. It's very valuable for systems that essentially cannot fail, but years of work can go into verifying a few hundred lines of code. It turns out, though, that large language models might be really good at this work, and we might see a renaissance of formal verification of software.
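To make the distinction from testing concrete, here is a toy example in Lean 4 (my own, not from the article): instead of checking a property on a few sample inputs, we prove it holds for every input.

```lean
-- A toy verified property: doubling any natural number yields an even result.
def double (n : Nat) : Nat := n + n

-- Not a unit test over samples, but a proof for all n: the witness is n itself.
theorem double_is_even (n : Nat) : ∃ k, double n = 2 * k :=
  ⟨n, by simp [double, Nat.two_mul]⟩
```

A unit test for double could only ever sample a few values; the theorem covers all of them, which is why verification is so expensive and so valuable.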
AI in 2025: gestalt
My view: compared to last year, AI is much more impressive but not proportionally more useful. They improved on some things they were explicitly optimised for (coding, vision, OCR, benchmarks), and did not hugely improve on everything else. Progress is thus (still!) consistent with current frontier training bringing more things in-distribution rather than generalising very far.
Source: AI in 2025: gestalt — LessWrong
A rather comprehensive look at how frontier large-language models have evolved and improved over the last twelve months or so across a number of different aspects.
A pragmatic guide to LLM evals for devs
One word that keeps cropping up when I talk with software engineers who build large language model (LLM)-based solutions is "evals". They use evaluations to verify that LLM solutions work well enough because LLMs are non-deterministic, meaning there's no guarantee they'll provide the same answer to the same question twice. This makes it more complicated to verify that things work according to spec than it does with other software, for which automated tests are available.
Evals feel like they are becoming a core part of the AI engineering toolset. And because they are also becoming part of CI/CD pipelines, we, software engineers, should understand them better — especially because we might need to use them sooner rather than later! So, what do good evals look like, and how should this non-deterministic-testing space be approached?
Source: A pragmatic guide to LLM evals for devs
Evals are a core part of debugging LLM-based systems, managing non-determinism and ensuring quality of output. This is a really good introduction to the concept and some of the key ideas based on real-world case studies.
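A minimal eval, in the sense the article describes, might look like this sketch. The call_model function is a hypothetical stand-in for a real LLM API call, and the graded keyword scorer is just one simple scoring strategy among many.

```python
# A minimal LLM eval harness: score non-deterministic outputs against
# graded criteria rather than exact string matches.
def call_model(prompt: str) -> str:
    # Hypothetical stand-in; a real system would call an LLM API here.
    return "Paris is the capital of France."

def keyword_score(output: str, required: list[str]) -> float:
    """Fraction of required facts present: tolerant of rephrasing."""
    hits = sum(1 for kw in required if kw.lower() in output.lower())
    return hits / len(required)

def run_eval(cases: list[dict], threshold: float = 0.8) -> float:
    """Sample each case several times, because outputs can vary run to
    run, and report the pass rate across all samples."""
    passes, total = 0, 0
    for case in cases:
        for _ in range(3):  # repeated sampling handles non-determinism
            out = call_model(case["prompt"])
            if keyword_score(out, case["required"]) >= threshold:
                passes += 1
            total += 1
    return passes / total

cases = [{"prompt": "What is the capital of France?",
          "required": ["Paris", "France"]}]
print(run_eval(cases))  # 1.0 with the deterministic stub above
```

The pass-rate-over-samples shape is what distinguishes an eval from a conventional unit test, and it is what lets evals sit in a CI/CD pipeline despite non-deterministic outputs.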
State of AI
Categories: How Are People Using LLMs?

Understanding the distribution of tasks that users perform with LLMs is central to assessing real-world demand and model–market fit. As described in the Data and Methodology section, we categorized billions of model interactions into high-level application categories. In the Open vs. Closed Source Models section, we focused on open source models to see community-driven usage. Here, we broaden the lens to all LLM usage on OpenRouter (both closed and open models) to get a comprehensive picture of what people use LLMs for in practice.
Source: State of AI | OpenRouter
OpenRouter is a service that unifies APIs across different large language model providers, so it has deep insight into which models are being used and how. In this pretty detailed report, they outline, based on the traffic they see, how different models are being used and to what extent. One thing well worth noticing is that code generation, or use for software engineering, accounts for over 50% of all large language model use on a per-token basis. And, perhaps a little more surprisingly (though not so much if you work with these technologies), Claude models account for 60% of token usage in this category. Other use cases fall away pretty quickly. Role play got early traction but seems to be fading somewhat as an overall percentage of token use, although I imagine it is growing in absolute terms. And other areas that gained initial traction, like marketing automation and legal applications, get quite a bit of attention but quite a bit less use than, above all, the software engineering use case.
Design & Development Philosophy
Resonant Computing Manifesto
And so, we find ourselves at this crossroads. Regardless of which path we choose, the future of computing will be hyper-personalized. The question is whether that personalization will be in service of keeping us passively glued to screens—wading around in the shallows, stripped of agency—or whether it will enable us to direct more attention to what matters.
In order to build the resonant technological future we want for ourselves, we will have to resist the seductive logic of hyper-scale, and challenge the business and cultural assumptions that hold it in place. We will have to make deliberate decisions that stand in the face of accepted best practices—rethinking the system architectures, design patterns, and business models that have undergirded the tech industry for decades.
Source: Resonant Computing Manifesto
It's no surprise that this manifesto on resonant computing resonates with me (sorry, not sorry). Two of its drafters, Maggie Appleton and Simon Willison, are past speakers at our conference and people whose thinking and work I very much admire. Those of us of my generation who were drawn to the early web and its promise were drawn to principles like this, and to the hope that the web could connect us in positive, uplifting ways. The last 20 years or so have gone rather differently, for all kinds of reasons we can get into elsewhere. That doesn't mean we can't take a deep breath, take stock, and commit to doing something better, as this manifesto challenges us to do. Many of the signatories have also spoken at our conferences. I invite you to join in signing it, too.
Design Systems for AI: Introducing the Context Engine
For years, design systems have served one primary purpose: humans. They document patterns, components, decisions, and principles, all presented in formats meant for designers and engineers to read, interpret, and translate into products.
However, the moment AI entered your workflow, one truth became painfully clear. Your tokens, guidelines, accessibility rules, and UX patterns don’t matter if the LLM consuming them can’t read them as structured, meaningful context. This is why AI prototypes often fail: they feature off-brand UI, inconsistent layouts, vague flows, and content that doesn’t align with the intended personality. It’s not hallucination, it’s missing context.
As Diana Wolosin observes, design systems were created by humans for humans, which made a lot of sense until LLMs came around. Here she asks: “What happens to design systems when AI becomes our new user?”
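One way to read Wolosin's point: tokens that live only in prose or a design file are invisible to a model, while tokens expressed as structured data can be serialized straight into its context, or into CSS. A hypothetical sketch (the token names and values are invented):

```python
# Hypothetical design tokens expressed as structured data that an LLM
# (or a build step) can consume, rather than prose on a wiki page.
TOKENS = {
    "color": {"brand": "#0a5cff", "surface": "#ffffff"},
    "space": {"sm": "8px", "md": "16px"},
    "font": {"body": "Inter, sans-serif"},
}

def to_css_variables(tokens: dict) -> str:
    """Flatten nested token groups into CSS custom properties. The same
    structured source could equally be serialized into an LLM prompt."""
    lines = [":root {"]
    for group, values in tokens.items():
        for name, value in values.items():
            lines.append(f"  --{group}-{name}: {value};")
    lines.append("}")
    return "\n".join(lines)

print(to_css_variables(TOKENS))
```

The point of the single structured source is that the humans, the build pipeline, and the model all read the same definitions, so an AI prototype inherits the brand rather than inventing one.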