(I gave two talks at SPLASH this year. Besides my Onward! Essays submission, I was kindly invited to open the PAINT workshop on Programming Abstractions and Interactive Notations, Tools, and Environments. Because it was not recorded, and there was no associated paper, I’m posting a lightly edited script here. This is my first iteration of these ideas; I already wish to re-work them and present again somewhere. The main patch to apply to this transcript is that “vector diagram parsing” is distinct from “self-raising diagrams”, though the latter depends on diagram parsing to get off the ground.)
Abstract: Some things are better drawn than coded. However, it takes a lot of work to build a custom editing interface for each new notation, discouraging experimentation. There has to be a better way than Greenspunning poor approximations of Adobe Illustrator over and over again. “Self-raising diagrams” are a promising escape from this trap. Just as source code—a static artefact—“raises itself” into a dynamic running program, a vector graphics diagram—taken to generalise source code—can similarly be parsed, interpreted, and “animated” into a GUI. Notational engineers can then focus on the notations themselves, and their semantics, having left the implementation of drawing interfaces to the experts (implementors of standard vector graphics editors). The tasks of normalising, interpreting, and transforming vector notations contain many relevant problems for the notational engineer—far more relevant than coding line rubberbanding for the umpteenth time!
In programming, we have a maxim of “use the right tool for the job”. There are three load-bearing concepts here: right, tool, and job. I think the ideal spirit of this maxim is captured by the following interpretations:
- First, whether or not a tool is *right* for the job is best determined by the person who is going to use the tool. It would be best if each person were able to use the tool that they feel they’ll be most productive with. The judgement of what is “right” is subjective and idiosyncratic to the preferences and skills of the user.
- The “tool” part has a range of interpretations; a tool could be an entire programming language, it could be a design approach, or it could be a way of representing information, such as a notation.
- The “job” could be an entire project, but it doesn’t have to be: it could be a smaller component, a class, a function, a data structure, even perhaps a single *line* of code, or a single expression, or a single *token* in the code.
There’s also something to be said about the word “use”: if the tool we want to use already exists, then what is the cost of using it alongside all our other tools? Call that the cost of tool integration. And if the right tool for the job doesn’t exist, what is the cost of making it ourselves for the occasion? Call this the cost of tool production.
I find all of this most relevant for how we represent code and data in programming. There is a lot of room for improvement because very often we have to use one single syntax, or one single notation, to express everything. This could be called the “one size fits all” approach, which is the antithesis of “use the right tool for the job”, and it happens because the cost of integrating other notations is too high, let alone the cost of inventing ad-hoc notations for the problem at hand.
STEPS and Mood-Specific Languages
An important reason I’m doing research today is that, when I was an undergraduate, I somehow discovered the work of the STEPS project, which utterly blew my mind. The goal of the project was to replicate the stack of functionality for personal computing in under 20,000 lines of code. I believe they were successful in this. And the way they did this was, in large part, by making copious use of domain-specific languages, or by using the right language for the job as much as possible.

(Related: Alan Kay’s talks on “T-Shirt Computing”; “Complex vs. Complicated”)
Of course, to support lots of domain-specific languages, they needed to somehow lower the cost of integration and cost of production for these languages. And this was the purpose of OMeta (later Ohm), a domain-specific language for pattern matching and implementing other domain-specific languages.

I was particularly taken by this quote from one of my favourite papers from the project (my emphasis):
Applying [internal evolution] locally provides scoped, domain-specific languages in which to express arbitrarily small parts of an application (these might be better called mood-specific languages). Implementing new syntax and semantics should be (and is) as simple as defining a new function or macro in a traditional language.
A simple example of such a “mood-specific language” might be where you’re doing graphics programming and you have a single line of code that does some vector algebra, but your language doesn’t support operator overloading (🤢):
vec_a.mul(cos(ang_b/2))
     .add(vec_b.mul(cos(ang_a/2)))
     .add(vec_a.cross(vec_b))
In this case I would certainly prefer to be able to define a local syntax using infix operators and implicit multiplication, which would just compile to the verbose original:
cos(ang_b/2) vec_a + cos(ang_a/2) vec_b + vec_a × vec_b
(Related: the Unicode-heavy Nile stream processing language and the Gezira graphics stack built atop it, all part of STEPS.)
However, I think a much more instructive example of one of these languages can be found in the STEPS 2007 progress report. Whenever I need to convince someone of the practical motivation for these sorts of ideas, I’m always thinking that they said it best in the source material, and I couldn’t do any better by paraphrasing it myself! So here it says (my emphasis):
Elevating syntax to a ‘first-class citizen’ of the programmer’s toolset suggests some unusually expressive alternatives to complex, repetitive, opaque and/or error-prone code. Network protocols are a perfect example of the clumsiness of traditional programming languages obfuscating the simplicity of the protocols and the internal structure of the packets they exchange. We thought it would be instructive to see just how transparent we could make a simple TCP/IP implementation.
Our first task is to describe the format of network packets. Perfectly good descriptions already exist in the various IETF Requests for Comments (RFCs) in the form of “ASCII-art diagrams”. This form was probably chosen because the structure of a packet is immediately obvious just from glancing at the pictogram. For example:

If we teach our programming language to recognize pictograms as definitions of accessors for bit fields within structures, our program is the clearest of its own meaning.
Then they show an executable grammar in OMeta (or some precursor called IS) which can parse and interpret the ASCII art diagram for the packet header. And then:
We can now define accessors for the fields of an IP packet header simply by drawing its structure. The following looks like documentation, but it’s a valid program.
They subsequently re-use this domain-specific language to define the TCP packet format in a similar way.
Notational Freedom and Diagram Parsing
This was clearly a fantastic idea—use the right language for the job, no matter how small that job is—but it’s always bothered me that many jobs in programming don’t neatly fit into a language—because what we usually mean by “language”, at least in terms of what we interact with on the screen, is MONOSPACED TEXT, or syntax. So the last line of the “mood-specific languages” quote, about lowering the cost of production for ad-hoc languages—
Implementing new syntax and semantics should be (and is) as simple as defining a new function or macro in a traditional language.
—could be called syntactic freedom: the freedom to use the right *syntax* for the job. That’s not *quite* the same thing as the right TOOL for the job. And I’ve never been able to resist seeing this as a frustrating special case of a more general notational freedom, i.e. the freedom to use the right notation (including syntax) for the job:
Implementing a new *notation* (and its semantics) really ought to be as simple as defining a new function or macro. (Or, at the very least, it should be a lot simpler than it normally is.)
As brilliant as they are, those ASCII art diagrams always frustrated me, because what is ASCII art if not a CONCESSION that programmers make to the medium of plain text, whenever we want to draw a diagram! Why is this beautiful idea, of using the right notation for the job, limited to ASCII art? Why can’t we have the table, the *real* table made of *real* lines that this diagram is *clearly* approximating—why can’t we have *that* as the source code? What’s stopping us from parsing *this* instead?

The immediate answer is that parsing REAL graphics (rather than ASCII art) is not remotely an established thing the way that parsing text is. Where do you begin? How is this diagram encoded? Must we use Computer Vision or AI to recognise that these pixels are a line, while those pixels spell the word “sourceAddress”…?
Of course not! Vector Graphics exists and has existed for a long time, so no fancy computer vision techniques are necessary. As to the other questions about what it means to parse a vector diagram, I think these are very interesting questions that deserve to be answered practically—by trying to do it:

Ta-da! Now it has to be said that there’s no OMeta here; I just wrote a load of JavaScript. But there are some interesting patterns in the JavaScript. In a moment, I will come back to what’s going on here in more detail. But first, I will explain the general idea.
Self-Raising Diagrams via Diagram Parsing
Just as the set of all possible syntaxes is merely a subset of all possible notations, the set of all possible source code is a subset of all possible vector graphics diagrams—because diagrams can include text! And I know that, at least in the SVG format, you can programmatically extract strings from the text elements. So it’s interesting to consider vector diagrams as a strict *generalisation* of source code. And this is what leads me to the concept of “self-raising diagrams”.
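For instance, in the browser, pulling every string out of an SVG document is a one-liner (a trivial sketch):

// All strings in the SVG document, via its <text> elements:
const strings = [...document.querySelectorAll('text')].map(t => t.textContent);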
I like to think of source code as self-raising text. Source code is this static artefact—a text file—which could be printed out on paper. Yet, with the aid of an interpreter, it raises itself into the dynamic world of a running program. Analogously, what we might call “generalised source code” exists as a static vector diagram, which could also be faithfully reproduced on paper. Yet, given an interpreter, it raises itself into a running program. A self-raising diagram. Except this time, because the source code is fully graphical, I think it’s actually easier for the running program to be graphical and interactive by default; more so than what you get with self-raising text.

For example, I drew this diagram in my go-to editor and exported it to SVG. The only change I made to the file afterwards was the insertion of <script> tags to load my infrastructure in the background; other than that, it’s an ordinary SVG file. It contains some circles, and some JavaScript code in red boxes. The red border means the code will be executed as the final step of processing the diagram (green boxes have a semantics of “ignore contents”, i.e. they are comments). The code sets up some event handlers so that, if any “circle” element is dragged, it will move with the mouse. Initially, they don’t move. But after I run the parser/processor, the code in the red boxes will have been executed, and the circles are draggable. I would probably regard this as the “Hello, World” of self-raising diagrams: the simplest one that does something interactive.
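I won’t reproduce the actual red-box code, but a minimal sketch of the idea might look like this (the pointer-event handling and coordinate conversion are my guesses at an implementation, not the original code):

// Hypothetical sketch of the red-box code: make every <circle> draggable.
let dragging = null;
for (const circle of document.querySelectorAll('circle')) {
  circle.addEventListener('pointerdown', () => { dragging = circle; });
}
document.addEventListener('pointermove', e => {
  if (dragging === null) return;
  // Convert screen coordinates into the SVG coordinate system.
  const svg = dragging.ownerSVGElement;
  const pt = new DOMPoint(e.clientX, e.clientY)
    .matrixTransform(svg.getScreenCTM().inverse());
  dragging.setAttribute('cx', pt.x);
  dragging.setAttribute('cy', pt.y);
});
document.addEventListener('pointerup', () => { dragging = null; });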
Of course, running code as a program isn’t the *only* thing we do with source code. We also *compile* source code into an executable binary (which you could say is interpreted by the hardware) or *transpile* it into source code in another language. Some languages like C, TeX, and Lisp have macros which expand into code before execution, and compilers make multiple *passes* to transform the code. All these concepts must also have interesting analogies in the realm of self-raising diagrams. Not that I know what they are yet (besides passes, to be elaborated shortly).
However, no matter whether we’re compiling or interpreting, the first step in all of these processes is *parsing* the source representation. In this early stage of the research, where I’ve only spent about 3 weeks of work on it, it is this first task of diagram parsing that I’ve had the most experience with. And—with apologies to Tratt—it’s very interesting to consider just how much of a solved problem text parsing is. Decades of computer science research and formalisation have been poured into parsing strings into trees. Yet if we look at a diagram like the “drag circles” demo, all of that appears to go out the window, because now there are some quite daunting differences from the world of text.
For one, we no longer have a one-dimensional stream to scan from beginning to end. Even worse, we no longer have a discrete space of characters; we’ve got a *continuous *space of floating-point coordinates. So if text parsing could be seen as a formalisation of what the eyes do when reading text, we’re now confronted with formalising what the eyes do when reading a two-dimensional diagram that might not have an obvious reading order.
In order to make some progress on this, I just tried brute-forcing several different types of diagrams: I wrote JavaScript to parse them, trying to be reflective about the whole process, noticing which patterns seem to recur and what architecture seems to grow out of the experimentation. I identify 8 emergent principles that are very clear to me even after just 3 weeks and about four or five different diagrams. I’m tentatively naming them as follows:
- Precise, Predictable Syntax/Semantics
- The Document is the Ground Truth
- Idempotent, Typed Passes
- Visual Debug Annotations
- SVG Normalisation / Jank Correction
- De-spatialise / Graphify Early
- Concrete Prototyping with Dev Tools
- Local Scoping Mechanisms
I’ll run through these using an example. Take a look at this simple diagram format (on the left) called BoxGraph:

Just as text has a concrete syntax representing an abstract syntax tree or AST, BoxGraph can be thought of as one possible visual syntax for encoding an Abstract Data Graph or ADG. There are boxes, which can be named by text labels, and the boxes point to each other with labelled arrows. When we “abstract” something, we throw away information; and the information we’re using this syntax to represent is the *topology* and the names, not the distances or relative positioning or any “metric” qualities. So what it means to parse this diagram is to end up with some representation of this ADG somewhere, such as in JavaScript runtime data.
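To make that concrete: a hypothetical shape for the parsed output (the field and node names here are mine, purely for illustration):

// Hypothetical JS representation of the ADG: topology and names only; all
// metric information (positions, sizes, distances) has been thrown away.
const adg = {
  nodes: {
    b1: { name: 'counter' },
    b2: { name: 'display' },
    b3: { name: null },      // anonymous box
  },
  arrows: [
    { label: 'updates', from: 'b1', to: 'b2' },
    { label: 'owns',    from: 'b2', to: 'b3' },
  ],
};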
Precise, Predictable Syntax/Semantics
As a user of this visual syntax, I want to be able to rely on some basic spatial rules to know how the diagram is going to parse. I don’t want to do any guessing, and I don’t want any non-determinism. The fact that we’re using vector graphics instead of computer vision or AI is a big help here. However, I must still be careful to avoid non-determinism in things like iterating through SVG elements. In the parser, if I know that the iteration order might make a difference, then I should make sure to e.g. sort left-to-right by the x coordinate, just to be safe.
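For example, a one-line defence against document-order non-determinism (a sketch):

// Impose a deterministic spatial order before iterating over boxes.
const boxes = [...document.querySelectorAll('rect')]
  .sort((a, b) => a.getBBox().x - b.getBBox().x); // left-to-right by x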
Of course, there’s also a “tree syntax” (distinct from the concrete XML syntax) of how this diagram is encoded as SVG XML elements. Different XML trees could map onto diagrams that look exactly the same. So we don’t want the parsing to depend on that either; the semantics have to be independent of properties that aren’t visible in the diagram.
(I’m confused in this graphical domain between saying “syntax” and “semantics”; I instinctively want to say “the semantics OF the syntax”, i.e: the meaning of a box is a node in the ADG, and the meaning of an arrow is a named mapping from node to node. I think this points to the fact that, when we have a long list of transformations, the line between “syntax” and “semantics” becomes somewhat arbitrary. There are other ways to regard this: perhaps syntax is concerned with Boolean evaluations of a structure (valid / invalid), as we see not just in formal languages but also in binary file formats and C structs; while what we ought to call “semantics” involves evaluating a structure to something else, or transforming it to similarly-shaped structures.)

Anyway, this BoxGraph syntax works according to the following rules, which are phrased in terms of spatial relationships:
- If arrow A begins inside box B1 and ends on or inside box B2, then B1’s abstract node points to B2’s via A. Here, “on” means that the arrowhead is just touching the edge of the target box. (Empirically, Mathcha.io was outputting diagrams that look correct, but where the arrow endpoint is numerically one or two pixels short of the border, so I make this work using a parameter for tolerance or “epsilon”; see the sketch after this list.)
- Each arrow is labelled by the closest text label to its origin point.
- Each box is labelled by its closest un-claimed text label, out to a certain max distance parameter. Past that, it’ll just be anonymous.
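Here is the promised sketch of the first rule’s “ends on or inside” test (the helper name and default epsilon are mine):

const EPSILON = 2; // px; Mathcha arrow endpoints can stop a pixel or two short

// Does point pt lie on or inside box, within tolerance eps?
function onOrInside(pt, box, eps = EPSILON) {
  const b = box.getBBox();
  return pt.x >= b.x - eps && pt.x <= b.x + b.width + eps
      && pt.y >= b.y - eps && pt.y <= b.y + b.height + eps;
}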
These rules are what I mean by precise/predictable syntax/semantics. Ideally, they would be written in some formal logic language as I’ve sketched, or even in a mood-specific notation of their own—but that’s a dream for the far future.
(My intent is for these rules to ideally apply “in parallel”, as declaratively as possible. In this BoxGraph case, I haven’t quite achieved this: there is a dependency in the fact that boxes get their labels from the “unclaimed” pool after the arrows take first priority. This could perhaps be rephrased in a purely parallel way, but since this talk is about using the right notation for the job, I used the conceptual presentation that was more intuitive to me.)
The Document Is The Ground Truth
The JavaScript code for parsing BoxGraph broke down into a lot of smaller “passes” over the diagram. These passes have dependencies on each other, and I actually programmed their dependencies by drawing another diagram…!

(function() {
  // `passReqs` (defined by the surrounding infrastructure) maps each pass
  // name to the list of passes it depends on.
  const arrow = function(origin, target) {
    let reqs = passReqs[origin];
    if (reqs === undefined) passReqs[origin] = reqs = [];
    reqs.push(target);
  }
  // Generated from boxGraph-deps.svg (labelGraph)
  arrow('annotateContainments', 'normalizeRects');
  arrow('annotateContainments', 'idLabels');
  arrow('annotateArrowConnections', 'normalizeRects');
  arrow('annotateArrowConnections', 'idArrows');
  arrow('generateJSOG', 'nameBoxesIfApplicable');
  arrow('nameBoxesIfApplicable', 'labelArrows');
  arrow('nameBoxesIfApplicable', 'normalizeRects');
  arrow('labelArrows', 'idArrows');
  arrow('generateJSOG', 'annotateArrowConnections');
  arrow('labelArrows', 'idLabels');
  arrow('nameBoxesIfApplicable', 'annotateContainments');
})()
The SVG in the slide shows the structure of the passes for parsing BoxGraph. Each pass makes some change to the SVG document tree, possibly invisible at the diagram level. For example, one of the first passes we need to perform involves recognising the arrows and assigning each one an ID in the DOM. These passes write their results back to the DOM tree: the document is the ground truth. The DOM “data attributes” are invaluable for this. In the slide, you can see that pass idArrows has given the arrow <g> an ID, a class is-arrow, and a data-origin-pt (offscreen, there’s also a data-target-pt).
(Recognising arrows here is tied to Mathcha’s output conventions; more on that later. Because BoxGraph doesn’t involve any lines that aren’t arrows, I cheat and don’t bother checking for the existence or shape of arrowheads. Instead, I just look for elements with class arrow-line, which Mathcha uses for all arrows/lines. I expect a <path> child element, whose d coordinates are in the right order for origin/target. This is not robust to e.g. drawing the arrows the wrong way round in Mathcha (setting the arrowhead on the origin point and deleting it on the target point), but this is not a very interesting objection; it could be, and ought to be, made robust to this in the future—so that whatever the user sees is processed in the way they expect.)
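In that spirit, a condensed, hypothetical sketch of what a pass like idArrows does (details simplified from the real code):

// Tag each Mathcha arrow group with an ID and record its endpoints as
// data attributes, writing the results back onto the DOM itself.
function idArrows() {
  document.querySelectorAll('.arrow-line').forEach((g, i) => {
    if (g.classList.contains('is-arrow')) return;  // idempotence: already done
    const d = g.querySelector('path').getAttribute('d');
    const pts = d.match(/-?[\d.]+,-?[\d.]+/g);     // "x,y" pairs, in path order
    g.id = 'arrow' + (i + 1);
    g.classList.add('is-arrow');
    g.dataset.originPt = pts[0];
    g.dataset.targetPt = pts[pts.length - 1];
  });
}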
This principle is very convenient and it means that intermediate parses can be saved (and resumed later on) by taking the XML in the DOM Inspector and copying it to a file. This wouldn’t be nearly as easy for random JavaScript heap data structures, so I’m careful to use JavaScript data only as a *cache* of what’s in the tree (The Document Is The Ground Truth!).
(This leads to a pervasive need to sync JS objects with their source DOM attributes, which are stringly typed 🤢. Since I would much rather do arithmetic on JS arrays, rather than space-delimited vectors in strings, I can’t escape the need to sync these two representations. Currently this is done manually, but it would be good to automate it.)
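A minimal sketch of the manual version, assuming points stored as comma-separated strings (the helper names are mine):

// Shuttle a point between its stringly-typed data attribute and a JS array.
const readPt  = (el, key) => el.dataset[key].split(',').map(Number);
const writePt = (el, key, [x, y]) => { el.dataset[key] = x + ',' + y; };

// e.g.: const [x, y] = readPt(arrow, 'originPt');
//       writePt(arrow, 'originPt', [x + 5, y]); // nudge 5px right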
Visual Debug Annotations
“The Document Is The Ground Truth” also means that I can get *visual feedback*, or debug annotations, on what a pass did or what it was thinking. For example, after running labelArrows and nameBoxesIfApplicable, I can see which names got attached to which arrows and boxes from these blue lines:

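Producing such an annotation is itself a tiny pass; a hypothetical sketch:

// Draw a blue debug line between the centres of two elements, tagged with
// a class so that all annotations can be cleared again in one query.
function debugLine(fromEl, toEl) {
  const [a, b] = [fromEl.getBBox(), toEl.getBBox()];
  const line = document.createElementNS('http://www.w3.org/2000/svg', 'line');
  line.setAttribute('x1', a.x + a.width / 2);
  line.setAttribute('y1', a.y + a.height / 2);
  line.setAttribute('x2', b.x + b.width / 2);
  line.setAttribute('y2', b.y + b.height / 2);
  line.setAttribute('stroke', 'blue');
  line.setAttribute('class', 'debug-annotation');
  fromEl.ownerSVGElement.append(line);
}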
Concrete Prototyping with Dev Tools
I’m basically treating the browser dev tools as my development environment. When I want to write new passes or fix bugs, I have concrete example data to experiment with in the console before I commit to the JavaScript file.
(Additionally, I am heavily exploiting the fact that the browser understands XML, SVG, DOM, and handles all of that parsing for me when I open the SVG file. My JavaScript can easily use DOM APIs to query for elements, navigate the tree, and rewrite it. This would not be the case if using node.js with a mundane XML parsing library, while the alternative of a “headless” browser setup is beyond my knowledge at the moment.)
SVG Normalisation / Jank Correction
One thing that surprised me when I started out is that Mathcha actually outputs all shapes as SVG polylines or Bézier curves (all via <path>). This is despite the fact that SVG has specific elements like <rect>angle and <circle>…! These patterns of “rectangle” and “circle” are semantically meaningful in my diagrams. For this reason, the normalizeRects pass looks for all line loops that look like rectangles and replaces them with the equivalent <rect> element:

This here, by the way, is my “aspirational notation” for expressing such a transformation; the real thing is currently just JavaScript. There is a similar pass for circles.
Pass normalizeCircles not shown, but this is the ad-hoc visualisation I drew to help me craft it.
(As an intermediate step, I’ve made up this ad-hoc mood-specific syntax in a comment, which captures how I’d prefer to express the JS version of normalizeRects given the constraints of plain text:)
FOR ALL matchElt(i) (
  tag = 'path' ⟹ 'rect'
  class ⊇ { 'real', 'connection' }
  d = " Mlx,ty Lrx,ty Lrx,by Llx,by Z"
  let [x, y, w, h] = [lx, ty, rx - lx, by - ty]
  attributes {
    x ⟹ x, y ⟹ y, width ⟹ w, height ⟹ h, d ⟹ nil
  }
  id ⟹ 'r'+(i+1)
)
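For reference, a hedged sketch of what the JavaScript version amounts to (the regex and ID scheme approximate, rather than reproduce, the real pass):

// Find closed four-segment <path> loops that trace an axis-aligned
// rectangle, and swap each one for an equivalent <rect>.
function normalizeRects() {
  const RECT_LOOP = /^ ?M([\d.]+),([\d.]+) L([\d.]+),\2 L\3,([\d.]+) L\1,\4 Z$/;
  document.querySelectorAll('path.real.connection').forEach((path, i) => {
    const m = path.getAttribute('d').match(RECT_LOOP);
    if (m === null) return;                        // not a rectangle loop
    const [lx, ty, rx, by] = m.slice(1).map(Number);
    const rect = document.createElementNS('http://www.w3.org/2000/svg', 'rect');
    rect.setAttribute('x', lx);
    rect.setAttribute('y', ty);
    rect.setAttribute('width', rx - lx);
    rect.setAttribute('height', by - ty);
    rect.setAttribute('class', path.getAttribute('class')); // keep classes
    rect.id = 'r' + (i + 1);
    path.replaceWith(rect);
  });
}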
I call this “Jank Correction” in the case of Mathcha, because its avoidance of <rect>/<circle> does strike me as pretty janky. However, more generally, it seems like a pattern of *normalising* the different SVG that might get output from different editors. If I made the diagram in Inkscape instead, I’d apply the “normalise from Inkscape” preprocessing pass to get the SVG that all the later passes expect to see.

De-spatialise / Graphify Early
Usually, the first few passes that do real work are de-spatialising passes: they do all the computational geometry to reduce the “metric” properties of distance, positioning, containment, intersection etc. to *discrete* properties like “the origin box of the arrow” or “the label of this box”.

After de-spatialising, we’ve left the hardest parts behind and we’re just operating on graphs of DOM elements connected by their IDs. (Past this point, all the hard work—the visual work—is done; are we even “parsing” anymore? Just traversing a data structure like a compiler or interpreter.) This is why I also call this principle “graphify early”.
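A hypothetical sketch of one such de-spatialising pass (the smallest-container rule is my assumption):

// Reduce geometric containment to a discrete data-contained-in link
// between element IDs.
function annotateContainments() {
  const contains = (o, i) =>
    o.x <= i.x && o.y <= i.y &&
    o.x + o.width >= i.x + i.width && o.y + o.height >= i.y + i.height;
  const rects = [...document.querySelectorAll('rect')];
  for (const inner of rects) {
    const parents = rects.filter(outer =>
      outer !== inner && contains(outer.getBBox(), inner.getBBox()));
    if (parents.length === 0) continue;
    // Take the smallest containing box as the immediate parent.
    parents.sort((a, b) => a.getBBox().width * a.getBBox().height
                         - b.getBBox().width * b.getBBox().height);
    inner.dataset.containedIn = parents[0].id;
  }
}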

Idempotent, Typed Passes
There is a strong smell of functional programming and type theory concepts lurking in these passes! Each pass reads certain parts of the DOM tree as input and annotates the tree with its outputs, usually in a declarative way; sometimes even as simple as a map over nodes matching a CSS selector. It expects its input diagram to meet certain preconditions, and the changes it makes to the diagram might enable further passes that are compatible with it (or disable passes which are not). There is an implicit type system here—currently in the code and in my comments:

The passes are also supposed to be idempotent, so once you’ve applied a pass it should figure out that it’s already run and has no more work to do. Hence, the principle of idempotent typed passes. The idempotence seems like a disguised form of functional purity, so there’s definitely a system of typed pure unary functions waiting to be realised here…
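Concretely, these properties buy you a very simple runner; a hypothetical sketch over the passReqs graph from earlier (the passes registry is assumed):

// Run a pass only after its prerequisites; idempotence makes repeats harmless.
function runPass(name, done = new Set()) {
  if (done.has(name)) return;
  done.add(name);
  for (const req of passReqs[name] ?? []) runPass(req, done);
  passes[name](); // assumed: a registry mapping pass names to pass functions
}

runPass('generateJSOG'); // runs the whole BoxGraph pipeline in dependency order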
Local Scoping Mechanisms
Finally, I want “Local Scoping Mechanisms” in order to treat certain regions of the diagram specially; maybe to hold comments, metadata, or executable code. Or even to contain an embedded diagram in a different notation, to be processed in its own way.

So far, I treat green boxes as comments to be ignored (by the way—notice how you can use real shapes and rich text in your comments! No more ASCII art required!), blue boxes as containing format metadata, and red boxes as code to execute. Since these features are common across many diagram formats, they force a common set of passes for ignoring comments and executing code before deferring to the more specific passes. You can see this common core in the pass graph for the labelGraph format: everything above the red line is shared across diagrams, while everything below it (apart from restoreComments) is syntax/format-specific.
(“Green”, “blue” and “red” mean the specific RGB values of these colours that are convenient default colours in the Mathcha palette; this currently violates an unstated principle of “visual robustness” that should be filed with Precise/Predictable Semantics. I.e. two identical-looking diagrams (to the human eye) may have wildly different semantics. This is, obviously, improvable in the future. I would like to make the colours or even the shape patterns themselves configurable in the future; this is all reminiscent of conventions like “the top left pixel in your bitmap icon shall be treated as the transparency colour”.)
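For illustration only, the execute-red-boxes part of that common core might be condensed like this (the colour constant and the textInsideBox helper are placeholders, not the real values or code):

// Hypothetical sketch: execute the text inside every red-bordered box.
const CODE_STROKE = 'rgb(204,0,0)'; // placeholder for the actual Mathcha red

function executeCodeBoxes() {
  for (const box of document.querySelectorAll('rect')) {
    if (box.getAttribute('stroke') !== CODE_STROKE) continue;
    const source = textInsideBox(box); // placeholder: gather contained <text>
    new Function(source)();            // run as the final processing step
  }
}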
Why does Joel love parsing all of a sudden?
All of this emphasis on parsing will come as a great surprise to anyone who knows me or who has followed my work. I’ve previously harped on about “Explicit Structure”, i.e. what you get in a structured editor with no need for parsing and serialising. I’ve always regarded “escape” characters and SQL injection as unholy abominations that have no place in any sanely-designed information substrate. I’m supposed to hate all this stuff! So why would I be embracing it in this domain? Why not go for a “structured editing” of vector graphics notations?

This is a good question. We are, after all, still transforming static artefacts to static artefacts. We take a diagram on digital paper, and compile it to code on digital paper. There aren’t any moving parts or interactions; it’s not live. Even the “drag circles” self-raising diagram counts for this: the diagram had to be *parsed* before the *static* JavaScript code could be executed to set up *invisible* event handlers in the background. It’s interactive, but hardly live or in the spirit of my usual interests.
The word “notation” might have connotations of being a static thing, as if any notation could work just as well printed on paper. But this myth is laid to rest by the name of this very workshop, PAINT—Programming Abstractions and Interactive Notations, Tools, and Environments—it mentions interactive notations, and I think this is exactly the right definition. I think of notations as interactive interfaces (or if we want to be pedantic about it, possibly-interactive interfaces). Any combination of shapes and text that does *anything* in response to your input is a notation. So, a GUI is a notation. If what it happens to do is “nothing”, then it’s a *static* notation that will be faithfully represented if you print it out on paper.
So, to answer why I’m so enamoured with parsing all of a sudden: the problem with interactive notations, or making editors for them, is that the cost of production is too damn high!
To demonstrate this, I would like to return to another of my favourite papers from the STEPS project that has motivated a lot of my research. “Open, Reusable Object Models” describes an object model called Id, which has all these nice properties of *minimality* and *flexibility* and maximal late-binding that were just *catnip* to me as an undergraduate. It’s had me mesmerised for 7 years now. There is a sample implementation in C at the back, but in order to understand this system I had to look at the diagrams, and I even drew diagrams of my own. For me, the diagrams were a better tool for the job of understanding this system than the code. And in order to have a working prototype of the running system to play with, it felt like a waste to manually compile these diagrams back into code and to work in a command-line REPL. I really wanted the running system to be *made* out of these diagrams.
This was clearly an example of a domain-specific interface, or interactive notation, and it seemed like a worthy project for the first year of my PhD. So that’s exactly what I worked on, and I gave a talk about it entitled “What does it take to create with domain-appropriate tools?”. This question of “what does it take” was referring to the amount of work involved in implementing such a notation. In other words, it was about the cost of production of this notation.
(Well, it was supposed to be titled as a question, but I found out at the last minute that ACM publishing doesn’t allow question marks in titles, so I had to rephrase it. I’ve never forgiven them for this—many papers in many fields are titled with questions!)

What I ended up with was this rather … crude … notation of nested boxes, where each box has a name and an optional text box for JavaScript code, and you can draw arrows between the boxes. Now this basic implementation of the Id object model *does* work: I can send a message and play with the system to an extent. But the amount of labour that it took to get here in the technologies that I chose—namely SVG and JavaScript—was about five months of work! Even though the notation works for the thing I wanted, it’s very brittle and lacks the sort of conveniences that you really want in an interface for drawing and manipulating shapes; especially one that you spent five months on.
What sorts of conveniences are these? Well, let’s take a look at a vector graphics web app. In my case, Mathcha:

We have toolbars where we can adjust various parameters; colour pickers; we can select things and get point handles (I only implemented that for boxes, not for arrows or text labels!); we can drag things around; we have the all-important UNDO—so that when you inevitably mess up, you don’t have to kick yourself and refresh the page and start again from scratch.
If getting the bare essentials took me five months, how much longer would these features take? And do we *really* have to duplicate all this functionality every time we want some sort of interactive notation…?
We said previously that a diagram has a graphical syntax: some arrangements of shapes represent a valid abstract structure, and others don’t. A vector graphics editor is like a text editor: it doesn’t enforce these syntax rules, it just lets you draw whatever you want. And unlike a text editor, it can’t even give you helpful feedback by highlighting incorrect visual syntax!
Moreover, as soon as we want an interactive notation, we have additional constraints on what sorts of *edits* to the shapes are allowed. Some things can be moved and resized, but others can’t. Some things stick together when they’re moved. Some things automatically resize when you add items. Some shapes will appear and disappear in response to different inputs. This could be thought of as an “interactional syntax”. If you’re using external vector graphics software, most of the edits it’ll let you make will *break* these rules. So in order to create and edit shapes in a manner consistent with this interactional syntax, you have to invest in your own BESPOKE editor for the notation. There’s no way to get some off-the-shelf editor to respect the rules of your interactional syntax.
There is actually another reason you’re forced to go bespoke, even in the case of a static (non-interactive) notation, if you want any alternative to diagram parsing. The notation corresponds to an Abstract Data Graph in memory, and you want to do things with this data—like compile or interpret it. All that an off-the-shelf editor can do is save your diagram in some file format. It won’t let you access the nice parsed structures in its memory, because its memory is a heap of opaque binary blocks accessible only through debug APIs. (And even if you figure it out, the EULA probably makes it illegal to use them. Useless!)
So these were the forces that caused me to make my own bespoke editor in the first year of my PhD. As far as I can tell, this is the only way that you can get an equivalent of “structured editing” for custom notations, creating the structures interactively in a manner that obeys your particular interactional syntax instead of by parsing diagrams that you drew in a standard editor. And the benefit just wasn’t worth the cost for me to continue on that project. For five months of work I got this underwhelming, ugly interface that technically did the job while being very unforgiving to use or extend.
In contrast—when I started my three weeks of work on self-raising diagrams in August, the second or third experiment I did was to re-create these diagrams for the Id object system and see if I could parse them. This was basically the BoxGraph format with a few extensions. And would you look at this:

I’d say that drawing this in Mathcha took me … 10 minutes. And writing the parser took me 3 fun afternoons. And yes, Mathcha will allow me to create invalid diagrams, or even diagrams with subtle bugs in them (e.g. from positioning), and then when I try to parse them, it’ll blow up because there’s no proper error handling. My first-year editor, to its credit, will not let me create certain invalid structures, but that’s not enough to offset the fact that I didn’t have the ability to duplicate Mathcha’s editing conveniences in my postgrad-tier JavaScript. So it’s really no contest from the editing or creation perspective.
(Attentive readers will note that there’s some sleight-of-hand here! I am comparing an interactive notation to a static notation, and it could well be just as difficult to make the diagrams dynamic; I admit this at the end. What I should say is something like this: I’m giving up on laborious *interactive* notations for the time being, and in exchange for this concession, I get truly rapid iteration on *static* notations via drawing and diagram parsing, like I’ve never experienced before. That’s the real reason I’m enamoured with it at the moment. A further subterfuge is that I’m comparing a complete set of Id objects and methods with a sketch of a JS-biased strain of Id without all of the methods contained in the diagram. However, I think the point still stands; it wouldn’t take much longer to draw these extra parts, nor to parse them.)
The Division-of-Labour Argument
This point about duplication of existing editor functionality is at the core of why I’m so won over by parsing diagrams. I’ve been greatly influenced over the years by Stephen Kell’s warnings about silo-ization, in which different programming language ecosystems all need their own duplications of basic infrastructure like libraries and package managers and tooling, all implemented in The Language. People in one silo aren’t able to make use of all the work that people did in another silo.
I also think of a certain talk I attended at a previous SPLASH, “Relentless Repairability or Reckless Reuse”, on the same tension between having control and understanding of your tools, versus leveraging work that’s already been done by the outside world.
The talk features these two quotes that I felt were relevant here:
Google takes care of it. A lot of people are working on that.
In Chrome, graphics are fast and I can still inspect all the nodes of the document object model.
This dilemma seems to be the curse of the notational engineer, at least when we need to build an editor that respects our interactional syntax. As I discovered when I tried it, a bespoke editor must duplicate the work of supporting basic drawing operations and shapes found in a wide variety of existing tools—all just to let you add a few novel behaviours to the shapes at the end.

This seems wasteful and hubristic to me. I expect that the existing multitude of vector drawing workflows, and especially industrial tools like Adobe Illustrator, will have figured out the optimal GUIs for basic and advanced vector drawing operations. At the very least, I expect them to have been funded, staffed, and incentivised much better than myself or a grad student to achieve such a result; consider all the money poured into Illustrator and the feedback they receive from its users. I would much prefer to accept that they have done that job well, and take advantage of it.
Instead of spending ages creating a crude GUI for drawing shapes that can be made dynamic, and then forcing myself or others to tolerate it in order to make dynamic graphics (knowing that it will lack basic amenities like undo, which I didn’t have time to implement)—why not allow the user to draw using whatever existing tool they find most convenient? After all, we enjoy this pluralism in the world of plain text. You can use any text editor you like to work with your source code, and you don’t have to roll your own text editor whenever you want to experiment with a new syntax. So let them use the right drawing tool for that job and then figure out a way to leverage their skills with that tool, in order to make the drawing dynamic.
In other words, embed the dynamic instructions or dynamic behaviours *somehow* using a special visual syntax. It’s much easier for the user to obey those rules than to use a poorly-working prototype.
If you are a notational engineer, you’re interested in the essential complexity of your notation. You want rapid iteration on different designs and features. Implementing line rubberbanding is not a good use of your time. Implementing selection rectangles, or shape libraries, or even UNDO is not a good use of your time.
[Adobe] takes care of it. A lot of people are working on that.
The Argument from Quality
There is also a strong argument from quality: say you want to make a dynamic notation or GUI which is truly *beautiful* or fancy; like the sort of thing you might see in a game. In Mathcha, I was able to draw this fancy titlebar for a window:

…but imagine what a skilled digital artist could do in a professional tool. If I want those beautiful graphics available *programmatically* in my go-to programming system (e.g. JavaScript, or even Squeak), it is absurd to think that I or a team of developers could *ever* replicate the professional editing features necessary to draw such a thing in-system. So I’m prepared to claim that, for any sufficiently advanced notational project, it’s inevitable that the basic parts *have* to be flattened into a static diagram which is drawn externally, and then imported into the programming system, and then raised into a dynamic notation.
Confused…
Then again … I honestly can’t claim to be fully comfortable with my own argument here. I do believe in the power of diagram parsing as a way to experiment with *static* notations, because I have experienced precisely that. But for *dynamic* notations, the external editor can only hope to create the initial state of the notation. If the intended interaction patterns for the notation resemble editing operations—say, moving or resizing—then there’s nowhere else for these behaviours to live, other than inside the programming system that’s responsible for raising the diagram! Won’t I have to write the same code either way?

…Perhaps we can safely assume that the complexity of drawing the *initial* state will be far greater than the complexity of *interacting* with it? (e.g. I don’t need to allow the user to adjust minute shape control points, or to select colour gradients.)
I’ve lamented the shortcomings of my first-year editor for the Id object model, but at least it still functions as a live executable environment of the system modelled by the diagrams, which is what I wanted. Meanwhile, the diagrams are currently just parsed into JavaScript objects that can only simulate the system in JavaScript. In order to make the running diagrams resemble my old editor, will I need to implement line rubberbanding or arrow re-drawing after all…?
Conclusion
I suppose this is the most important question I want to see answered: do self-raising diagrams offer a nontrivial improvement for implementing *interactive* notations? And the other unanswered questions are:
- What’s going on with the typed functional programming in the pass graphs? Can it be made explicit?
- Can we take it meta, and define *passes* via diagram notations, compiling them to JS or interpreting them directly?
- What’s the analogue of OMeta in the domain of graphical syntax? A graphical syntax for defining graphical syntaxes?
To conclude: I’ve greatly admired the STEPS work for its advances towards syntactic freedom, but I’ve always wondered what the generalisation to Notational Freedom would look like. Only then can we “Use The Right Tool For The Job” in a meaningful sense. Diagram Parsing, and potentially Self-Raising Diagrams, represent a promising step in this direction, with many interesting problems to solve. There are interesting parallels of standard programming concepts (parsing, compilation, interpretation, macro expansion) if we take diagrams to generalise source code. By leveraging *existing* tool infrastructure, common time-sinks are avoided in favour of rapid experimentation; however, this might only have value for *static* notations, and it remains to be seen if this can be made to work for substantially interactive notations. At only three weeks in … more research is definitely needed!
(Follow-up ideas and FAQs coming in a separate post.)