November 11, 2025
Summary
In this article, I explore an approach made possible by integrating code-writing LLMs into our apps: if you need to solve a problem whose complete scope is not known beforehand, give your users a way to run task-specific code snippets and ask the LLM to write a solution when the scope becomes known.
I give three examples of how I use this approach in my app—from answering questions about personal data to generating schema conversion functions and custom UIs. I then discuss the key challenge of correctness: how to make LLM-generated code reliable using strategies like type-checking and feedback loops, and how to handle inevitable mistakes when they occur.
Self-Promotion Disclosure
I wrote this article to promote myself as a consultant and the products I build. Of course, I also think the topic is genuinely interesting and valuable from a technical perspective, but my primary goal is self-promotion.
A Thought Experiment
Imagine you’re asked to design and develop an app from scratch. You can think of any app; the specifics are not important. You choose to build it in $LANGUAGE.
At the design stage, you learn that, bizarrely, all the users your app will have happen to be $LANGUAGE developers. If you make it possible for them to do so, they could whip up, in the blink of an eye, any code snippet to change or extend the functionality of the app.
How do you take advantage of this? Does this oddity change how you design the app? What kind of features can you develop that wouldn’t be possible if not for this peculiar user base?
I’ll get back to these questions in just a second.
Code Power to the People
Fact: LLMs can write code. How well? Up for debate, and indeed it’s been a hot topic among us developers in the past couple of years.
The discussion has been mostly from our perspective: can the LLM implement this feature for me? Is the resulting code maintainable? Can I vibe-code an entire app?
In my opinion, however, the user’s perspective is equally interesting: if we put this magical code-writing technology in the hands of users, it transforms them into the developers in the thought experiment!
Well, okay, that’s hyperbole; it takes more than an LLM to make someone a developer. But it definitely gives a regular user the ability to “write” code. That’s something! Jumping back to the questions above: can we take advantage of this capability? How?
Let me give you three examples from Superego, the open-source personal database app I’ve been building for the past few months, where I fully embrace the pattern of “making an LLM write app-extending code for the user”.
Afterward, I will talk about the strategies I employed to ensure it’s safe and effective to give users this code-writing power.
Example 1: Answering Questions
Superego allows users to record and collect every piece of information about their lives. The idea is that, once they have all this data in one place, they can slice it and dice it to get an answer to any question about themselves.
Ambitious goal, I know. How do I make it possible?
It’s actually not that difficult if I design the system as if all my users were developers:
- I give them a code execution environment (think: Jupyter Notebook);
- I connect it to their data;
- I have them write code snippets to analyze the data and answer their questions.
Then, to make it accessible to non-technical users, I throw in an LLM for step 3: when the user asks a question, the LLM generates the snippet of code that answers it. No need for the user to know how to code.
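The flow can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Superego's actual code: `RunDocument`, `CodeGenerator`, `runSnippet`, and `answerQuestion` are hypothetical names, the LLM is replaced by a canned function, and everything is synchronous to keep the sketch short (a real LLM call would be asynchronous).

```typescript
// Hypothetical document shape, loosely modeled on the "Runs" example.
interface RunDocument {
  id: string;
  content: { date: string; distanceKm: number };
}

// Stand-in for the LLM call: takes a question and returns the body of a
// JavaScript function that receives `documents`.
type CodeGenerator = (question: string) => string;

// Toy runner for illustration only: it evaluates the snippet in the host
// runtime. A real app should execute generated code inside a sandbox.
function runSnippet(source: string, documents: RunDocument[]): unknown {
  const snippet = new Function("documents", source);
  return snippet(documents);
}

function answerQuestion(
  question: string,
  documents: RunDocument[],
  generateCode: CodeGenerator,
): unknown {
  const source = generateCode(question); // step 3, delegated to the LLM
  return runSnippet(source, documents); // steps 1 and 2
}

// Usage with a canned "LLM" that always returns the same snippet.
const cannedGenerator: CodeGenerator = () =>
  "return documents.reduce((sum, doc) => sum + doc.content.distanceKm, 0);";

const sampleRuns: RunDocument[] = [
  { id: "Document_1", content: { date: "2025-01-05", distanceKm: 10 } },
  { id: "Document_2", content: { date: "2025-01-12", distanceKm: 5.5 } },
];

const totalKm = answerQuestion("How far did I run?", sampleRuns, cannedGenerator);
```

The point of the shape is that the app only knows how to generate and run snippets; what a snippet computes is decided when the question arrives.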
This is basically a text-to-SQL variant, though more powerful, I’d argue, since TypeScript allows expressing more complex aggregation logic and can also be used to render charts, tables, and other UI elements. Here’s an example result:
[Screenshot: Superego (GLM-4.6)]
The LLM-generated TypeScript function that aggregates data to answer the question.
/**
* Analyze running progress and determine if 2000km goal is realistic
* @param {Array} documents - Array of run documents
* @returns {Object} Analysis results with progress and projections
*/
export default function analyzeRunningGoal(documents) {
const now = LocalInstant.now();
const currentYear = now.toPlainDate().substring(0, 4);
const goalDistance = 2000;
// Filter runs for current year
const currentYearRuns = documents.filter(
(doc) => doc.content.date.substring(0, 4) === currentYear,
);
// Calculate total distance so far this year
const totalDistanceSoFar = currentYearRuns.reduce(
(sum, doc) => sum + doc.content.distanceKm,
0,
);
// Calculate progress percentage
const progressPercentage = (totalDistanceSoFar / goalDistance) * 100;
// Calculate days passed and remaining in year
const yearStart = LocalInstant.fromPlainDate(`${currentYear}-01-01`);
const yearEnd = LocalInstant.fromPlainDate(`${currentYear}-12-31`);
const daysPassed = Math.floor((now - yearStart) / (1000 * 60 * 60 * 24));
const totalDaysInYear =
Math.floor((yearEnd - yearStart) / (1000 * 60 * 60 * 24)) + 1;
const daysRemaining = totalDaysInYear - daysPassed;
// Calculate average daily pace so far
const avgDailyPaceSoFar =
daysPassed > 0 ? totalDistanceSoFar / daysPassed : 0;
// Calculate required daily pace to meet goal
const remainingDistance = goalDistance - totalDistanceSoFar;
const requiredDailyPace =
daysRemaining > 0 ? remainingDistance / daysRemaining : 0;
// Calculate weekly averages
const weeksPassed = Math.floor(daysPassed / 7);
const weeksRemaining = Math.floor(daysRemaining / 7);
const avgWeeklyPaceSoFar =
weeksPassed > 0 ? totalDistanceSoFar / weeksPassed : 0;
const requiredWeeklyPace =
weeksRemaining > 0 ? remainingDistance / weeksRemaining : 0;
// Determine if goal is realistic
const isRealistic = requiredDailyPace <= 10; // Assuming 10km/day is reasonable max
// Project final distance if current pace continues
const projectedFinalDistance =
totalDistanceSoFar + avgDailyPaceSoFar * daysRemaining;
// Calculate monthly breakdown
const monthlyData = {};
currentYearRuns.forEach((doc) => {
const month = doc.content.date.substring(0, 7);
if (!monthlyData[month]) {
monthlyData[month] = 0;
}
monthlyData[month] += doc.content.distanceKm;
});
return {
currentYear,
totalDistanceSoFar: Math.round(totalDistanceSoFar * 100) / 100,
goalDistance,
progressPercentage: Math.round(progressPercentage * 100) / 100,
daysPassed,
daysRemaining,
avgDailyPaceSoFar: Math.round(avgDailyPaceSoFar * 100) / 100,
requiredDailyPace: Math.round(requiredDailyPace * 100) / 100,
avgWeeklyPaceSoFar: Math.round(avgWeeklyPaceSoFar * 100) / 100,
requiredWeeklyPace: Math.round(requiredWeeklyPace * 100) / 100,
projectedFinalDistance: Math.round(projectedFinalDistance * 100) / 100,
isRealistic,
monthlyData,
totalRuns: currentYearRuns.length,
};
}
The LLM-generated TypeScript function that renders the chart.
/**
* Create a chart showing running progress vs 2000km goal
* @param {Array} documents - Array of run documents
* @returns {Object} ECharts configuration
*/
export default function getEchartsOption(documents) {
const now = LocalInstant.now();
const currentYear = now.toPlainDate().substring(0, 4);
const goalDistance = 2000;
// Filter runs for current year
const currentYearRuns = documents.filter(
(doc) => doc.content.date.substring(0, 4) === currentYear,
);
// Calculate cumulative distance by month
const monthlyData = {};
let cumulativeDistance = 0;
// Initialize all months
for (let month = 1; month <= 12; month++) {
const monthKey = `${currentYear}-${month.toString().padStart(2, "0")}`;
monthlyData[monthKey] = 0;
}
// Sum distances by month
currentYearRuns.forEach((doc) => {
const month = doc.content.date.substring(0, 7);
if (monthlyData[month] !== undefined) {
monthlyData[month] += doc.content.distanceKm;
}
});
// Create cumulative data
const cumulativeData = [];
const monthlyColumns = [];
const goalLine = [];
Object.keys(monthlyData)
.sort()
.forEach((month, index) => {
cumulativeDistance += monthlyData[month];
const monthDate = LocalInstant.fromPlainDate(`${month}-01`);
cumulativeData.push([
monthDate.toISO(),
Math.round(cumulativeDistance * 100) / 100,
]);
monthlyColumns.push([
monthDate.toISO(),
Math.round(monthlyData[month] * 100) / 100,
]);
goalLine.push([monthDate.toISO(), goalDistance]);
});
// Add projection line
const lastActualMonth = cumulativeData[cumulativeData.length - 1];
if (lastActualMonth) {
const lastDate = LocalInstant.fromISO(lastActualMonth[0]);
const yearEnd = LocalInstant.fromPlainDate(`${currentYear}-12-31`);
// Linear projection based on average pace
const avgPace = lastActualMonth[1] / cumulativeData.length;
const projectedDistance =
lastActualMonth[1] + avgPace * (12 - cumulativeData.length);
cumulativeData.push([
yearEnd.toISO(),
Math.round(projectedDistance * 100) / 100,
]);
}
return {
title: {
text: "2025 Running Progress: 2000km Goal Analysis",
left: "center",
},
tooltip: {
trigger: "axis",
axisPointer: {
type: "cross",
},
},
grid: {
left: 60,
right: 20,
top: 60,
bottom: 60,
},
xAxis: {
type: "time",
name: undefined,
},
yAxis: {
type: "value",
name: undefined,
min: 0,
max: Math.round(goalDistance * 1.1),
},
series: [
{
name: "Cumulative Distance",
type: "line",
data: cumulativeData,
smooth: true,
lineStyle: {
width: 3,
color: "#5470c6",
},
areaStyle: {
color: {
type: "linear",
x: 0,
y: 0,
x2: 0,
y2: 1,
colorStops: [
{ offset: 0, color: "rgba(84, 112, 198, 0.3)" },
{ offset: 1, color: "rgba(84, 112, 198, 0.05)" },
],
},
},
},
{
name: "Monthly Distance",
type: "bar",
data: monthlyColumns,
itemStyle: {
color: "#91cc75",
},
},
{
name: "Goal (2000km)",
type: "line",
data: goalLine,
lineStyle: {
width: 2,
color: "#ee6666",
type: "dashed",
},
symbol: "none",
},
],
};
}
Example 2: Conversions Between Schemas
To produce the answer in the screenshot above, Superego is reading from the local “Runs” collection the user created, which is set up to sync data from their Strava account.
The “Runs” collection has a custom schema defined by the user, based on their needs and their intended use of the collection. The Strava API, however, has its own, different schema for the resources it exposes. To make the synchronization possible, Superego needs to convert between the two.
When I set out to implement the sync feature, my first ideas for the conversion involved complex configuration objects defining a mapping between the two schemas, and even more complex UIs to allow the user to build those mappings.
Then I thought: what if all my users were developers? I could simply ask them to write an ad hoc conversion function when they are setting up the synchronization.
So that’s what I did, again throwing in an LLM to write the code:
[Screenshot: Superego]
The magic wand button (top-right) asks the LLM to implement the conversion function.
Superego is still in alpha, and is only targeting techies at the moment, so in this case its interface does show the generated code. But the plan is to keep it all behind the scenes, showing only a preview with the results of a few sample conversions so the user can check they’re correct.
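To give an idea of what such a conversion function might look like, here is a hedged sketch. The `Run` shape is inferred from the fields the generated code in this article reads (`date`, `distanceKm`, `durationSeconds`); the `StravaActivity` subset, the choice of `moving_time` over other duration fields, and the name `convertActivity` are assumptions made for illustration, not the function Superego actually generated.

```typescript
// Assumed subset of a Strava activity; only the fields this sketch reads.
interface StravaActivity {
  start_date_local: string; // ISO timestamp, e.g. "2025-03-02T07:30:00Z"
  distance: number; // meters
  moving_time: number; // seconds
}

// Assumed shape of a document in the user's "Runs" collection.
interface Run {
  date: string; // assumed to be a plain date, "YYYY-MM-DD"
  distanceKm: number;
  durationSeconds: number;
}

function convertActivity(activity: StravaActivity): Run {
  return {
    // Keep only the date part of the local start timestamp.
    date: activity.start_date_local.slice(0, 10),
    // Strava reports meters; the collection stores kilometers.
    distanceKm: activity.distance / 1000,
    durationSeconds: activity.moving_time,
  };
}

// Sample conversion, the kind of preview a user could check by eye.
const sampleRun = convertActivity({
  start_date_local: "2025-03-02T07:30:00Z",
  distance: 10500,
  moving_time: 3150,
});
```

A function like this replaces an entire mapping configuration UI: unit conversions and field renames are just expressions.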
Example 3: Custom UIs
When a user creates a collection in Superego, the collection’s default view is a pretty basic table listing all documents. Ideally, though, the user would like a UI that fits the data in the collection and helps them understand it at a glance.
But how can Superego possibly offer the right UI for every collection users might create?
Well, if all its users were developers… You get the drift. In Superego, users can write snippets of React code to render custom UIs. Non-developers can, of course, just ask the LLM:
[Screenshot: Superego]
The LLM-generated TypeScript function that renders the custom UI.
import React, { useState, useMemo } from "react";
import {
Grid,
Tile,
Text,
Button,
Echart,
Table,
} from "@superego/app-sandbox/components";
import type * as Collection_CUbKFLByjPaYESYhMUD3R from "./Collection_CUbKFLByjPaYESYhMUD3R.js";
interface Props {
collections: {
/**
* "Runs" collection.
*
* My runs
*/
Collection_CUbKFLByjPaYESYhMUD3R: {
id: "Collection_CUbKFLByjPaYESYhMUD3R";
versionId: "CollectionVersion_CUbKFLByjPaYETwYVjZgx";
displayName: string;
documents: {
id: `Document_${string}`;
content: Collection_CUbKFLByjPaYESYhMUD3R.Run;
}[];
};
};
}
export default function App(props: Props): React.ReactElement | null {
const runs = props.collections.Collection_CUbKFLByjPaYESYhMUD3R.documents;
// State for current month navigation
const [currentMonth, setCurrentMonth] = useState<Date>(() => {
const now = new Date();
return new Date(now.getFullYear(), now.getMonth(), 1);
});
// Filter runs for current month
const monthRuns = useMemo(() => {
return runs.filter((doc) => {
const runDate = new Date(doc.content.date);
return (
runDate.getFullYear() === currentMonth.getFullYear() &&
runDate.getMonth() === currentMonth.getMonth()
);
});
}, [runs, currentMonth]);
// Calculate monthly statistics
const monthlyStats = useMemo(() => {
if (monthRuns.length === 0) {
return {
totalRuns: 0,
totalDistance: 0,
totalDuration: 0,
avgDistance: 0,
avgDuration: 0,
avgPace: 0,
};
}
const totalDistance = monthRuns.reduce(
(sum, doc) => sum + doc.content.distanceKm,
0,
);
const totalDuration = monthRuns.reduce(
(sum, doc) => sum + doc.content.durationSeconds,
0,
);
const avgDistance = totalDistance / monthRuns.length;
const avgDuration = totalDuration / monthRuns.length;
const avgPace = avgDistance > 0 ? avgDuration / 60 / avgDistance : 0;
return {
totalRuns: monthRuns.length,
totalDistance: Math.round(totalDistance * 100) / 100,
totalDuration: Math.round(totalDuration / 60) * 60, // Round to nearest minute
avgDistance: Math.round(avgDistance * 100) / 100,
avgDuration: Math.round(avgDuration),
avgPace: Math.round(avgPace * 100) / 100,
};
}, [monthRuns]);
// Format duration to mm:ss or hh:mm:ss
const formatDuration = (seconds: number): string => {
const hours = Math.floor(seconds / 3600);
const minutes = Math.floor((seconds % 3600) / 60);
const secs = Math.round(seconds % 60);
if (hours > 0) {
return `${hours}:${minutes.toString().padStart(2, "0")}:${secs.toString().padStart(2, "0")}`;
}
return `${minutes}:${secs.toString().padStart(2, "0")}`;
};
// Navigate months
const navigateMonth = (direction: number) => {
setCurrentMonth((prev) => {
const newDate = new Date(prev);
newDate.setMonth(prev.getMonth() + direction);
return newDate;
});
};
// Format month display
const monthDisplay = currentMonth.toLocaleDateString("en-US", {
year: "numeric",
month: "long",
});
// Prepare chart data for daily distance
const dailyDistanceData = useMemo(() => {
const daysInMonth = new Date(
currentMonth.getFullYear(),
currentMonth.getMonth() + 1,
0,
).getDate();
const dailyData: number[] = new Array(daysInMonth).fill(0);
monthRuns.forEach((doc) => {
const day = new Date(doc.content.date).getDate();
dailyData[day - 1] = Math.round(doc.content.distanceKm * 100) / 100;
});
return dailyData.map((distance, index) => [
`${currentMonth.getFullYear()}-${(currentMonth.getMonth() + 1).toString().padStart(2, "0")}-${(index + 1).toString().padStart(2, "0")}`,
distance,
]);
}, [monthRuns, currentMonth]);
// Calculate max distance for chart scaling
const maxDailyDistance = useMemo(() => {
return monthRuns.reduce(
(max, doc) => Math.max(max, doc.content.distanceKm),
0,
);
}, [monthRuns]);
// ECharts option for daily distance
const distanceChartOption = {
tooltip: {
trigger: "axis",
axisPointer: { type: "cross" },
},
grid: { left: 0, right: 0, top: 0, bottom: 0 },
xAxis: {
type: "time",
name: undefined,
},
yAxis: {
type: "value",
name: undefined,
min: 0,
max:
maxDailyDistance > 0
? Math.round(maxDailyDistance * 1.2 * 100) / 100
: 10,
},
series: [
{
type: "bar",
data: dailyDistanceData,
itemStyle: { color: "#228be6" },
},
],
};
return (
<Grid>
<Grid.Col span={{ sm: 12, md: 12, lg: 12 }}>
<div
style={{
display: "flex",
justifyContent: "center",
alignItems: "center",
marginBottom: 8,
}}
>
<Button variant="invisible" onPress={() => navigateMonth(-1)}>
←
</Button>
<Text
element="h2"
size="lg"
weight="semibold"
style={{ margin: "0 16px" }}
>
{monthDisplay}
</Text>
<Button variant="invisible" onPress={() => navigateMonth(1)}>
→
</Button>
</div>
</Grid.Col>
<Grid.Col span={{ sm: 12, md: 6, lg: 3 }}>
<Tile style={{ marginBottom: 4 }}>
<Text element="h3" size="md" weight="medium" color="secondary">
Total Runs
</Text>
<Text
element="p"
size="xl3"
weight="bold"
style={{ marginTop: 4, marginBottom: 0 }}
>
{monthlyStats.totalRuns}
</Text>
</Tile>
</Grid.Col>
<Grid.Col span={{ sm: 12, md: 6, lg: 3 }}>
<Tile style={{ marginBottom: 4 }}>
<Text element="h3" size="md" weight="medium" color="secondary">
Total Distance
</Text>
<Text
element="p"
size="xl3"
weight="bold"
style={{ marginTop: 4, marginBottom: 0 }}
>
{monthlyStats.totalDistance} km
</Text>
</Tile>
</Grid.Col>
<Grid.Col span={{ sm: 12, md: 6, lg: 3 }}>
<Tile style={{ marginBottom: 4 }}>
<Text element="h3" size="md" weight="medium" color="secondary">
Total Duration
</Text>
<Text
element="p"
size="xl3"
weight="bold"
style={{ marginTop: 4, marginBottom: 0 }}
>
{formatDuration(monthlyStats.totalDuration)}
</Text>
</Tile>
</Grid.Col>
<Grid.Col span={{ sm: 12, md: 6, lg: 3 }}>
<Tile style={{ marginBottom: 4 }}>
<Text element="h3" size="md" weight="medium" color="secondary">
Avg Pace
</Text>
<Text
element="p"
size="xl3"
weight="bold"
style={{ marginTop: 4, marginBottom: 0 }}
>
{monthlyStats.avgPace > 0
? `${Math.floor(monthlyStats.avgPace)}:${Math.round(
(monthlyStats.avgPace % 1) * 60,
)
.toString()
.padStart(2, "0")}/km`
: "--:--"}
</Text>
</Tile>
</Grid.Col>
<Grid.Col span={{ sm: 12, md: 12, lg: 8 }}>
<Tile style={{ height: 400 }}>
<Text
element="h3"
size="md"
weight="medium"
color="secondary"
style={{ marginBottom: 16 }}
>
Daily Distance
</Text>
<Echart option={distanceChartOption} width="100%" height="320px" />
</Tile>
</Grid.Col>
<Grid.Col span={{ sm: 12, md: 12, lg: 4 }}>
<Tile style={{ height: 400 }}>
<Text
element="h3"
size="md"
weight="medium"
color="secondary"
style={{ marginBottom: 16 }}
>
Monthly Averages
</Text>
<div style={{ display: "flex", flexDirection: "column", gap: 12 }}>
<div>
<Text size="sm" color="secondary">
Avg Distance
</Text>
<Text
size="lg"
weight="medium"
style={{ marginTop: 4, marginBottom: 4, marginLeft: 8 }}
>
{monthlyStats.avgDistance} km
</Text>
</div>
<div>
<Text size="sm" color="secondary">
Avg Duration
</Text>
<Text
size="lg"
weight="medium"
style={{ marginTop: 4, marginBottom: 4, marginLeft: 8 }}
>
{formatDuration(monthlyStats.avgDuration)}
</Text>
</div>
<div>
<Text size="sm" color="secondary">
Avg Pace
</Text>
<Text
size="lg"
weight="medium"
style={{ marginTop: 4, marginBottom: 4, marginLeft: 8 }}
>
{monthlyStats.avgPace > 0
? `${Math.floor(monthlyStats.avgPace)}:${Math.round(
(monthlyStats.avgPace % 1) * 60,
)
.toString()
.padStart(2, "0")}/km`
: "--:--"}
</Text>
</div>
</div>
</Tile>
</Grid.Col>
<Grid.Col span={{ sm: 12, md: 12, lg: 12 }}>
<Tile>
<Text
element="h3"
size="md"
weight="medium"
color="secondary"
style={{ marginBottom: 16 }}
>
Run Details
</Text>
{monthRuns.length > 0 ? (
<Table ariaLabel="Run details for current month">
<Table.Header>
<Table.Column isRowHeader>Date</Table.Column>
<Table.Column align="right">Distance (km)</Table.Column>
<Table.Column align="right">Duration</Table.Column>
<Table.Column align="right">Pace (/km)</Table.Column>
</Table.Header>
<Table.Body>
{monthRuns
.sort(
(a, b) =>
new Date(a.content.date).getTime() -
new Date(b.content.date).getTime(),
)
.map((doc) => {
const pace =
doc.content.distanceKm > 0
? doc.content.durationSeconds /
60 /
doc.content.distanceKm
: 0;
return (
<Table.Row key={doc.id}>
<Table.Cell>
{new Date(doc.content.date).toLocaleDateString()}
</Table.Cell>
<Table.Cell align="right">
{Math.round(doc.content.distanceKm * 100) / 100}
</Table.Cell>
<Table.Cell align="right">
{formatDuration(doc.content.durationSeconds)}
</Table.Cell>
<Table.Cell align="right">
{pace > 0
? `${Math.floor(pace)}:${Math.round((pace % 1) * 60)
.toString()
.padStart(2, "0")}`
: "--:--"}
</Table.Cell>
</Table.Row>
);
})}
</Table.Body>
</Table>
) : (
<Text color="secondary">No runs recorded for this month</Text>
)}
</Tile>
</Grid.Col>
</Grid>
);
}
The Common Theme
These three examples are variations of the same theme:
- There is a problem the app wants to solve, but the details of the problem are not fully known beforehand.
- The app enables users to solve the problem by giving them a way to write and execute custom code.
- The app also gives users direct or indirect access to an LLM, which makes it trivial (or even completely transparent) to write the solution code.
You can also see the theme in ChatGPT’s “Code Interpreter” feature: to solve a certain class of problems (usually data analysis), the user has (indirect) access to a Python sandbox for running arbitrary code; then, ChatGPT writes (and executes) the code to solve those problems for the user.
Points one and two actually describe a paradigm–end-user programming–that has been around for decades in various forms: spreadsheets, interactive notebooks, userscripts, etc.
Until now, however, these capabilities have been reserved for power users, the only ones who could take advantage of them. Adding point three unlocks these capabilities for everyone.
The Elephant in the Room
I’ve titled this post “Improvised Software” because the approach of generating code just-in-time reminds me a lot of improvised music. Just like an improvised solo, the code the LLM “improvises”:
- happens in the moment–it’s not predetermined;
- is slightly different every time it’s improvised;
- can, at times, come out completely wrong.
And therein lies the elephantine problem I’ve been conveniently ignoring: how on earth do I know that the improvised code is correct? And if it’s not, can the user even tell?
Ensuring Correctness (An Attempt)
Answering the first question is easy: I don’t. But actually I also don’t know that the code I write is correct! Or, well, I can’t prove it for certain.
However, just like I do with my own code, I can use a Swiss cheese model to try to make it as unlikely as possible that the code is wrong, using type-checking, tests, code reviews, etc.
For the code improvised in Superego, I’m currently using these “slices”:
- Good instructions for the LLM. I tell it what it’s implementing, what the goal is, what template it should follow, etc. From the basics up to more specific rules for the task at hand. (Example.)
- Types. I’m asking the LLM to write TypeScript functions with well-defined input and output types.
- TypeScript compilation. I run the generated code through tsc to ensure it compiles. If compilation fails, I feed the errors back to the LLM and ask it to fix them.
- Utilities for common or error-prone pieces of logic. When I started using this “code improvisation” approach for the question-answering feature in example 1, various LLMs I tried kept messing up dates and time zones. So, I wrote a LocalInstant utility to work with dates and made it so it’s really difficult to misuse. I instructed the LLM to “always use this”, and that effectively got rid of date-related bugs.
- Feedback loop for runtime errors. For code that executes right after it’s generated, if it throws an error while executing, I feed the error back to the LLM and ask it to fix the code.
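The compilation and runtime-error slices in the list above share the same shape: generate, check, and feed any errors back into the next attempt. A minimal sketch of that loop follows, assuming hypothetical names (`Check`, `Generate`, `improvise`) and a synchronous stand-in for the LLM call; a real check could wrap tsc or a trial execution.

```typescript
// A check inspects generated source and returns error messages
// (an empty array means the check passed).
type Check = (source: string) => string[];

// Stand-in for the LLM call. `feedback` carries the errors from the
// previous attempt and is empty on the first attempt.
type Generate = (task: string, feedback: string[]) => string;

interface ImproviseResult {
  source: string | null; // null when every attempt failed
  attempts: number;
  lastErrors: string[];
}

function improvise(
  task: string,
  generate: Generate,
  checks: Check[],
  maxAttempts = 3,
): ImproviseResult {
  let feedback: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const source = generate(task, feedback);
    // Run every check and collect all errors, so the next attempt
    // sees them together.
    const errors: string[] = [];
    for (const check of checks) {
      errors.push(...check(source));
    }
    if (errors.length === 0) {
      return { source, attempts: attempt, lastErrors: [] };
    }
    feedback = errors;
  }
  return { source: null, attempts: maxAttempts, lastErrors: feedback };
}

// Usage with a canned generator that only gets it right after feedback.
const fakeGenerate: Generate = (_task, feedback) =>
  feedback.length === 0
    ? "function convert() {}"
    : "export default function convert() {}";

const requiresDefaultExport: Check = (source) =>
  source.indexOf("export default") >= 0 ? [] : ["Missing default export"];

const result = improvise("convert activity", fakeGenerate, [requiresDefaultExport]);
```

Capping the number of attempts matters: when the loop gives up, the caller still gets the last errors and can surface them to the user instead of retrying forever.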
These layers are already surprisingly effective, but, to push reliability even further, I also plan to try:
- Test generation. Ask the LLM to also implement a handful of unit tests for the function, run the tests, and feed errors back to the LLM.
- Double-entry implementation. Ask the LLM to implement the function twice, compare results, and–if they don’t match–ask the LLM to correct itself.
- User review. Show the user a preview of the result and ask them if it looks correct. (Actually, I’m sort of doing this already in example 3.)
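As an illustration of the double-entry idea, here is a sketch of the comparison step, assuming the two implementations are already available as functions and that their results are plain JSON-like values. `compareImplementations` and the related types are hypothetical names, and I haven't built this yet.

```typescript
type Implementation<I, O> = (input: I) => O;

interface Mismatch<I, O> {
  input: I;
  first: O;
  second: O;
}

// Runs two independently generated implementations on the same sample
// inputs and reports every input on which their results differ.
function compareImplementations<I, O>(
  first: Implementation<I, O>,
  second: Implementation<I, O>,
  sampleInputs: I[],
): Mismatch<I, O>[] {
  const mismatches: Mismatch<I, O>[] = [];
  for (const input of sampleInputs) {
    const a = first(input);
    const b = second(input);
    // Comparison via JSON serialization: adequate for plain JSON-like
    // results only, and sensitive to object key order.
    if (JSON.stringify(a) !== JSON.stringify(b)) {
      mismatches.push({ input, first: a, second: b });
    }
  }
  return mismatches;
}

// Usage: the second implementation deliberately skips the first element.
const totalA = (kms: number[]) => kms.reduce((sum, km) => sum + km, 0);
const totalB = (kms: number[]) => kms.slice(1).reduce((sum, km) => sum + km, 0);

const mismatches = compareImplementations(totalA, totalB, [[1, 2, 3], [], [5]]);
```

A mismatch doesn't say which implementation is wrong, only that at least one is, which is enough to trigger another round with the LLM.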
Dealing With Mistakes
What happens when the holes line up, and the LLM produces incorrect code? Can the user even tell?
The easiest failures to deal with are the spectacular ones, where the user immediately sees that something is wrong. For those cases, I found that a retry button and a way for the user to ask for corrections are good, practical remediation strategies.
Other easy failures are those where I can validate the code execution result. Take example 2: if the conversion produces documents that don’t match the collection schema, an error is surfaced to the user. (Ideally, these failures should be automatically fed back to the LLM to allow it to correct itself.)
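A sketch of that kind of result validation, assuming a deliberately simplified schema format (field name to primitive type). `Schema` and `validateDocument` are hypothetical; Superego's real schemas are richer than this.

```typescript
type FieldType = "string" | "number";
type Schema = Record<string, FieldType>;

// Returns one message per problem; an empty array means the document
// matches the schema.
function validateDocument(
  schema: Schema,
  document: Record<string, unknown>,
): string[] {
  const errors: string[] = [];
  for (const field of Object.keys(schema)) {
    const expected = schema[field];
    const value = document[field];
    if (typeof value !== expected) {
      errors.push(`Field "${field}": expected ${expected}, got ${typeof value}`);
    } else if (typeof value === "number" && !isFinite(value)) {
      // NaN and Infinity are numbers by type but rarely valid data.
      errors.push(`Field "${field}": expected a finite number`);
    }
  }
  return errors;
}

// Usage: a conversion result with one mistyped and one missing field.
const runSchema: Schema = {
  date: "string",
  distanceKm: "number",
  durationSeconds: "number",
};

const validationErrors = validateDocument(runSchema, {
  date: "2025-03-02",
  distanceKm: "10.5",
});
```

Returning messages rather than throwing keeps the errors in a form that can be shown to the user or passed to the LLM as feedback.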
The real head-scratchers are the subtle failures: off-by-one errors, time zone bugs, mishandled edge cases, etc. Alas, I don’t have a solution for those. Those are hard to spot in human-written software as well, though, admittedly, improvised code that’s generated for and used by a single user has the additional disadvantage of never getting field-tested by many.
In Superego, I try to mitigate the most serious consequences by making all data-writing operations traceable and reversible (via document versioning), so users at least have the possibility to manually correct them.
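The general idea behind versioning can be sketched as an append-only history where a revert is just another write. This is an assumption-laden illustration, not Superego's storage model: `VersionedDocument`, `DocumentVersion`, and the `createdBy` labels are hypothetical.

```typescript
interface DocumentVersion<T> {
  versionNumber: number;
  content: T;
  createdBy: "user" | "improvised-code";
}

class VersionedDocument<T> {
  private readonly versions: DocumentVersion<T>[] = [];

  constructor(initialContent: T) {
    this.write(initialContent, "user");
  }

  // Every write appends a new version; nothing is overwritten.
  write(
    content: T,
    createdBy: DocumentVersion<T>["createdBy"],
  ): DocumentVersion<T> {
    const version: DocumentVersion<T> = {
      versionNumber: this.versions.length + 1,
      content,
      createdBy,
    };
    this.versions.push(version);
    return version;
  }

  current(): DocumentVersion<T> {
    const latest = this.versions[this.versions.length - 1];
    if (latest === undefined) throw new Error("Document has no versions");
    return latest;
  }

  history(): readonly DocumentVersion<T>[] {
    return this.versions;
  }

  // Reverting is itself a write, so the bad version stays traceable.
  revertTo(versionNumber: number): DocumentVersion<T> {
    const target = this.versions[versionNumber - 1];
    if (target === undefined) throw new Error(`Unknown version ${versionNumber}`);
    return this.write(target.content, "user");
  }
}

// Usage: improvised code writes a wrong value, the user reverts it.
const runDoc = new VersionedDocument({ distanceKm: 10 });
runDoc.write({ distanceKm: 100 }, "improvised-code");
runDoc.revertTo(1);
```

Recording who created each version is what makes a bad write traceable: the user can see which changes came from improvised code before deciding what to undo.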
And well, we live in the LLM era, where disclaimers that “the assistant can make mistakes” are omnipresent. I can only join the choir and tell my users: “use the software with caution and, if you find it useful more often than not, don’t throw the baby out with the bathwater”. It’s a cop-out, I know, but it’s a problem for which we just don’t have a solution yet.
Not Only for Superego?
I see a lot of potential for this “code improvisation” pattern in other contexts, and I’m not the only one. Cloudflare recently proposed “Code Mode”, an alternative way to use MCP that is also based on code improvisation, and Theo has spoken about something similar on multiple occasions.
Here I’ve only presented how I use it in the limited context of Superego. I showed some of the strategies I employ to make it work, but, of course, they might not be applicable in other scenarios. I also haven’t touched on other important aspects needed to make the pattern work:
- Security: How to ensure improvised code can’t cause any damage? [1]
- Observability: How to “observe” and debug a piece of code generated on the fly?
- Versioning: How to keep old improvised code working throughout app changes?
But the article is already very long, and I don’t want to write a treatise.
And again, the strategies I employed might not be very generalizable. Nonetheless, I think they’re interesting and worth sharing, and they might even be useful to someone else.
[1] You might have been thinking this was the elephant in the room, but for Superego I found it surprisingly straightforward to set up a combination of QuickJS and sandboxed iframes to ensure improvised code can’t do real damage. Cloudflare sandboxes, a recently launched service specifically designed to run AI-generated code on demand, should also make security much easier. In general, it’s a very interesting topic on its own, and I might write a follow-up article dedicated to it.