“The ChatGPT app ecosystem is built on trust,” OpenAI’s guidelines for app developers proclaim.
As the AI ecosystem expands, OpenAI’s ChatGPT needs help drilling down on local information — and this is primarily what the OpenAI Apps SDK, introduced at the beginning of the month, is for. The problem for OpenAI is that it is actually the web that is built on trust; but for all intents and purposes, OpenAI wants to bypass that.
The development process is relatively austere and unapproachable, so this post just focuses on the basic concepts you need to understand before you can get down to development.
What Are ChatGPT Apps and What Is Their Purpose?
What exactly is a ChatGPT App for? The primary point of an app is to enhance a ChatGPT conversation, nothing more. Think of an app as a visual experience that appears in various guises (a carousel, for instance) and integrates seamlessly into a conversation without breaking the user’s context.
OpenAI is relying on your app’s metadata to accurately describe the questions it can answer, either for the app as a whole or for its individual parts. We will see that come up later. Beyond this, I imagine it will play out much like the Apple App Store, with some apps being favored by OpenAI and some being blackballed. Without favor, your app will not be able to act as a merchant.
Let’s take a quick look at the effect OpenAI is hoping for.
Understanding the Inline App Display
The inline display mode appears directly in the flow of the conversation. Imagine that you are on ChatGPT and are asking about pizzas in San Francisco:

Let’s take a look, from top to bottom, at the various fixed aspects:
- At the top is the query; this will be what your app’s metadata suggests it can answer.
- Then, just above the main visuals, is an icon with the app name “Pizzazz.”
- Then we see the inline display, which is your “app content.”
- Finally, it tails off with a model-generated response. (Although it might also be largely driven by your app; the documentation is ambiguous here.)
I have no doubt the introduction of OpenAI’s Atlas browser will increase interest in providing apps, much as the spread of the web increased interest in websites.
In fact, you can even ask Atlas to run an app example. This one didn’t work, but at least it tried:

But don’t read anything into the above until you’ve understood the main technology you will need.
The Core Technology: Model Context Protocol (MCP)
The central star of any OpenAI App is the Model Context Protocol (or MCP), which fortunately has already been written about quite a bit in recent months. We’ve even written about rich components through MCP, so even that isn’t new. But don’t read any further until you have grokked the basics of MCP, because the Apps SDK leans heavily on it, treating your app as a set of tool calls.
The idea is that you describe your tools well enough that ChatGPT knows when it can use them, and each tool returns data in a fixed format that ChatGPT can plug into a design.
But why MCP? A lot of the reasons come down to its transport-agnostic design and its use of natural-language descriptions, which large language models are good with. OAuth 2.1 flows are used for access control. Plus, you can easily run local MCP servers.
So, your minimal MCP server for the Apps SDK has to implement at least these three capabilities:
- It must list (or advertise) all of its tools, and the shape of the incoming and return data.
- It must respond to the model calling the tool via call_tool.
- The tool must then return structured content and, optionally, point to anything else needed by the ChatGPT client to render your content (a sketch of this result shape follows the list).
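To make that last point concrete, here is a rough sketch of the shape of a tool-call result. The field names (content, structuredContent) follow the kanban example later in this post; the task data itself is invented purely for illustration.

```typescript
// Sketch only: the rough shape of a tool-call result.
// `content` is plain text the model can narrate; `structuredContent` is the
// fixed-format data the ChatGPT client can hand to your component.
const exampleToolResult = {
  content: [{ type: "text", text: "Here is your kanban board." }],
  structuredContent: {
    // Invented task shape; your component defines what it actually expects
    tasks: [{ id: "1", title: "Draft the post", column: "doing" }],
  },
};
```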
One of the slightly sinister but well-intentioned quotes in the OpenAI Apps SDK documentation is this: “Good discovery hygiene means your app appears when it should and stays quiet when it should not.” What they mean is that each tool call should come in response to an “action-oriented” request initiated by the user in the conversation.
The Role and Rules of Your MCP Server
So your MCP server is the foundation of your app. It exposes tools that the model can call, and returns the packaged structured data plus component HTML that the ChatGPT client renders inline. It may also need to handle any authentication required to access certain resources.
Within your server, there are a number of rules. There should be one “action-focused” job per tool. This may not be efficient to write, but it makes the customer journey more refined; read and write behavior, for example, would certainly be separate calls.
Apart from the action-oriented tool name, the model also looks for action-oriented sentences within the tool’s description that start with “Use this when ….” The example in the docs is, “Use this when the user wants to view their kanban board.”
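As a small illustration, this is the kind of descriptor metadata the model scans during discovery. The tool name and the “Use this when…” sentence come from the docs’ kanban example; the object layout itself is just a sketch, not the literal SDK API.

```typescript
// Sketch only: an action-oriented tool name plus a "Use this when..."
// description. The layout is illustrative; the real registration call is
// shown further down in this post.
const kanbanToolDescriptor = {
  name: "kanban-board",
  description: "Use this when the user wants to view their kanban board.",
};
```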
In addition to returning structured data, each tool on your MCP server should also reference an HTML UI template in its descriptor. This HTML template will be rendered in an iframe by ChatGPT. We will look at this briefly below.
Example App
The fastest way to get started is to use the officially supported Python SDK or TypeScript SDK, then add your preferred web framework.
We’ll end this post with the HTML UI template registration that ChatGPT renders in an iframe, as we saw in the examples at the top. These are all in the documentation; otherwise, use the given examples to fashion your first attempts.
The Node example is the one featured here. First, the MCP server is created:
```typescript
// Create an MCP server
const server = new McpServer({
  name: "kanban-server",
  version: "1.0.0"
});
```
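For context, here is a sketch of the imports and transport wiring that this snippet assumes. The module paths are those of the official MCP TypeScript SDK; the stdio transport is simply the easiest way to run the server locally, whereas a deployed ChatGPT app would expose it over HTTP.

```typescript
// Sketch only: imports and transport wiring assumed around the snippet above.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "kanban-server", version: "1.0.0" });

// ...registerResource and registerTool calls (shown below) go here...

// Connect a transport. Stdio is the simplest for local testing; a hosted
// ChatGPT app would serve MCP over an HTTP transport instead.
const transport = new StdioServerTransport();
await server.connect(transport);
```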
Then we register the UI template:
```typescript
// UI resource (no inline data assignment; host will inject data)
server.registerResource(
  "kanban-widget",
  "ui://widget/kanban-board.html",
  {},
  async () => ({
    contents: [
      {
        uri: "ui://widget/kanban-board.html",
        mimeType: "text/html+skybridge",
        text: `
<div id="kanban-root"></div>
${KANBAN_CSS ? `<style>${KANBAN_CSS}</style>` : ""}
<script type="module">${KANBAN_JS}</script>
        `.trim(),
        _meta: {
          /* Renders the widget within a rounded border and shadow.
             Otherwise, the HTML is rendered full-bleed in the conversation */
          "openai/widgetPrefersBorder": true,

          /* Assigns a subdomain for the HTML. When set, the HTML is rendered
             within `chatgpt-com.web-sandbox.oaiusercontent.com`.
             It's also used to configure the base url for external links. */
          "openai/widgetDomain": "https://chatgpt.com",

          /* Required to make external network requests from the HTML code.
             Also used to validate `openai.openExternal()` requests. */
          "openai/widgetCSP": {
            // Maps to `connect-src` rule in the iframe CSP
            connect_domains: ["https://chatgpt.com"],
            // Maps to style-src, style-src-elem, img-src, font-src, media-src etc. in the iframe CSP
            resource_domains: ["https://*.oaistatic.com"],
          }
        }
      },
    ],
  })
);
```
From now on, the resource is recognised by that URI, ui://widget/kanban-board.html. Also, note the required MIME type of text/html+skybridge. (It is assumed that the CSS and JavaScript content for the kanban board itself, KANBAN_CSS and KANBAN_JS, exists elsewhere.)
Finally, the tool is registered:
```typescript
server.registerTool(
  "kanban-board",
  {
    title: "Show Kanban Board",
    _meta: {
      // associate this tool with the HTML template
      "openai/outputTemplate": "ui://widget/kanban-board.html",
      // labels to display in ChatGPT when the tool is called
      "openai/toolInvocation/invoking": "Displaying the board",
      "openai/toolInvocation/invoked": "Displayed the board"
    },
    inputSchema: { tasks: z.string() }
  },
  async () => {
    return {
      content: [{ type: "text", text: "Displayed the kanban board!" }],
      structuredContent: {}
    };
  }
);
```
Note that the openai/outputTemplate value matches the template URI.
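As written, the handler returns an empty structuredContent object, so the widget has no data to render. Below is a sketch of a handler that forwards the task data instead; the JSON parsing and the task shape are assumptions for illustration, and (as I read the docs) the embedded HTML picks this data up from the host via the window.openai bridge.

```typescript
// Sketch only: a drop-in replacement for the registerTool callback above.
// It forwards the task data to the widget via structuredContent.
const showKanbanBoard = async ({ tasks }: { tasks: string }) => {
  // inputSchema declared `tasks` as a string, so parse it before passing it on
  const parsed = JSON.parse(tasks); // assumed to be a JSON array of tasks
  return {
    content: [{ type: "text", text: "Displayed the kanban board!" }],
    structuredContent: { tasks: parsed },
  };
};
```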
Conclusion
I’m fairly certain a friendlier way of doing all this will emerge, either through pick-and-match UI templates or through better libraries. But for now, the pioneers of ChatGPT Apps must work with what OpenAI has provided, the prize being an early presence in the burgeoning AI-based economy.