Is OOXML Artificially Complex?

Sep. 5, 2025

OOXML as a Sloppy Standard
Why Microsoft’s Motive Wasn’t Deliberate Sabotage
Conclusion

A while ago, the official blog of LibreOffice published a provocative article: “An artificially complex XML schema as a lock-in tool.” Its target is Microsoft’s XML-based file formats — the Office Open XML (OOXML).

The article alleges that, although Microsoft put its Office formats through standardization, the spec is engineered to be so complex that it obstructs interoperability with third-party software. Moreover, the complexity is allegedly gratuitous and disconnected from real-world needs; it’s like advertising an “open” railway system while designing the signaling so only one manufacturer can run trains. Users, the argument continues, often accept proprietary technology uncritically, which makes it easy for Microsoft to lock people into its ecosystem.

A quick refresher: Historically, Office used binary formats (.doc, .xls, and .ppt) whose contents weren’t human-readable. Starting with Office 2007, Microsoft switched the defaults to .docx, .xlsx, and .pptx, where the “x” stands for XML. These files are ZIP containers holding a set of XML parts and resources such as images. Both the XML structure and the packaging follow a published spec—OOXML.
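
Because the container is an ordinary ZIP archive, its parts can be inspected with nothing more than a ZIP library. Below is a minimal sketch using Python's standard library. The helper names list_parts and read_main_part are hypothetical, and the sketch assumes the main text lives at word/document.xml, the part name Word itself uses (a fully general reader would resolve it through the package relationships in _rels/.rels instead).

```python
import io
import zipfile


def list_parts(docx_bytes: bytes) -> list[str]:
    """Return the sorted names of all entries stored in a .docx package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as pkg:
        return sorted(pkg.namelist())


def read_main_part(docx_bytes: bytes, part: str = "word/document.xml") -> str:
    """Return the XML of the main document part as text.

    Assumes the conventional part name written by Word; a general reader
    would look the part up via the relationships in _rels/.rels.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as pkg:
        return pkg.read(part).decode("utf-8")
```

For example, read a file with open("example.docx", "rb").read() and pass the bytes to either helper; "example.docx" is a placeholder path.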

With Microsoft’s backing, OOXML was adopted by international standards bodies, first as ECMA-376 and later as ISO/IEC 29500. Microsoft also put it under the Open Specification Promise (OSP), committing not to assert certain patent claims against compliant implementations.

On paper, then, anyone can parse, create, and edit OOXML to be compatible with Microsoft Office, which sounds great. But the LibreOffice article calls this premise into question, arguing that OOXML’s deliberate complexity turns this supposed openness into a one-way trap, a tool for maintaining a monopoly.

Let’s be honest: few people would describe their experience with Microsoft Office as satisfying, which is part of why this article resonated widely. In my past life doing legal grunt work, battling convoluted Word documents was a daily ritual. I also authored the Word section of an Office tutorial series, where my main approach was to explain Word’s quirks by digging into the underlying OOXML format. Thus, I’m intimately familiar with what makes Office and OOXML painful.

Despite this, I disagree with LibreOffice’s framing and conclusion. Aiming for mass appeal, the post is heavy on emotion and accusation but light on factual analysis, missing a solid educational opportunity. (LibreOffice later published a more technical comparison, but it still jumped straight from code snippets to conclusions.)

In my view, OOXML is indeed complex, convoluted, and obscure. But that’s likely less about a plot to block third-party compatibility and more about self-interested negligence: Microsoft prioritized the convenience of its own implementation and neglected the qualities of clarity, simplicity, and universality that a general-purpose standard should have. Yes, that neglect has anticompetitive effects in practice, but the motive is different from deliberate sabotage and thus warrants a different judgment. (A detailed legal analysis is beyond the scope of this article.)

In other words, LibreOffice identified the right problem but may have reached the wrong conclusion. Here’s why.

OOXML as a Sloppy Standard

The LibreOffice article criticizes how “a simple sentence such as ‘To be, or not to be, that is the question’ becomes an inextricable sequence of tags that users cannot access.” Let’s use this very example to see what happens.

Create a Word document containing

To be, or not to be, that is the question.

(with “To be” in bold), save, and peek at word/document.xml inside the resulting .docx:

<w:p w14:paraId="6F3ED131" w14:textId="46C90999" w:rsidR="00BF5D1D"
     w:rsidRDefault="004249FF">
  <w:r w:rsidRPr="00D41C8D">
    <w:rPr>
      <w:b />
      <w:bCs />
    </w:rPr>
    <w:t>To be</w:t>
  </w:r>
  <w:r w:rsidRPr="004249FF">
    <w:t>, or not to be, that is the question</w:t>
  </w:r>
  <w:r>
    <w:t>.</w:t>
  </w:r>
</w:p>

(Line breaks added for readability.)

Take a breath. The core structure is a paragraph (<w:p>) containing three runs (<w:r>). A run is a contiguous span of text sharing the same formatting. In OOXML, all of a paragraph’s text lives inside runs.
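
To see the run model in action, here is a minimal sketch that walks the paragraph above with Python's standard xml.etree.ElementTree and reports each run's text and whether it carries direct bold formatting. The helper names runs and is_on are hypothetical; the sample drops the w14 and w:rsid* attributes for brevity; and the sketch deliberately ignores bold inherited from paragraph or character styles, which a real reader would also have to resolve.

```python
import xml.etree.ElementTree as ET

# Namespace URI bound to the w: prefix in WordprocessingML.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# The paragraph from word/document.xml, trimmed to the w: namespace only.
SAMPLE = f"""<w:p xmlns:w="{W}">
  <w:r><w:rPr><w:b/><w:bCs/></w:rPr><w:t>To be</w:t></w:r>
  <w:r><w:t>, or not to be, that is the question</w:t></w:r>
  <w:r><w:t>.</w:t></w:r>
</w:p>"""


def is_on(prop) -> bool:
    """Interpret an on/off property such as <w:b/>: present means on,
    unless its w:val attribute says otherwise."""
    if prop is None:
        return False
    return prop.get(f"{{{W}}}val", "true") not in ("false", "0", "off")


def runs(paragraph_xml: str) -> list[tuple[str, bool]]:
    """Return (text, directly_bold) for each w:r in a w:p element."""
    p = ET.fromstring(paragraph_xml)
    result = []
    for r in p.findall("w:r", NS):
        # The text of a run is spread over its w:t leaves.
        text = "".join(t.text or "" for t in r.findall("w:t", NS))
        bold = is_on(r.find("w:rPr/w:b", NS))
        result.append((text, bold))
    return result


print(runs(SAMPLE))
# [('To be', True), (', or not to be, that is the question', False), ('.', False)]
```

Note that the period comes back as its own run even though it is formatted exactly like its neighbor; a consumer that wants "the sentence" has to merge runs itself.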

Breaking it further down:

The outer <w:p> element represents the paragraph. The attributes like w14:paraId and w:rsidR are internal identifiers Word uses for features like collaborative editing and tracking revisions.
The first <w:r> represents the bolded text To be. It contains a <w:rPr> (Run Properties) element to define its formatting. Inside, <w:b/> and <w:bCs/> set the font to bold for Western and complex scripts, respectively (even though there are no complex scripts here). Only after all that does the <w:t> element hold the actual text.
The second <w:r> contains the rest of the text up to the period. Since it uses the default formatting, it lacks a <w:rPr> element.
The third <w:r> contains only the final period. There’s no formatting difference; it’s split off from the prior run simply because I pasted the sentence but typed the period, exactly the kind of “surprise” OOXML happily encodes.

Contrast that with the same content saved as ODF (content.xml):

<text:p text:style-name="Standard">
  <text:span text:style-name="T1">To be</text:span>
  , or not to be, that is the question.
</text:p>

Even at a glance it’s more intelligible. Strip the text: namespaces and it’s nearly valid HTML.

The only thing that needs explaining is that ODF doesn’t wrap To be with a dedicated “bold” tag. Instead, it applies an auto-style named T1 to a <text:span>, an act of separating content and presentation that mirrors established web practices.
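
A minimal sketch of how a reader resolves that indirection, again with xml.etree.ElementTree. The content.xml below is hand-written and trimmed: the definition of T1 (a text-family automatic style with fo:font-weight="bold") is an assumption modeled on what an office suite typically writes for bold text, and the hypothetical spans helper only consults automatic styles, ignoring inheritance from named styles such as Standard.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by the ODF specification.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}

SAMPLE = """<office:document-content
  xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
  xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
  xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
  xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0">
  <office:automatic-styles>
    <style:style style:name="T1" style:family="text">
      <style:text-properties fo:font-weight="bold"/>
    </style:style>
  </office:automatic-styles>
  <office:body><office:text>
    <text:p text:style-name="Standard"><text:span text:style-name="T1">To be</text:span>, or not to be, that is the question.</text:p>
  </office:text></office:body>
</office:document-content>"""


def q(prefix: str, local: str) -> str:
    """Build an ElementTree-style qualified name: {namespace-uri}local."""
    return f"{{{NS[prefix]}}}{local}"


def spans(content_xml: str) -> list[tuple[str, bool]]:
    """Return (text, bold) pieces of the first paragraph, deciding boldness
    by looking up each span's automatic style."""
    root = ET.fromstring(content_xml)
    # Presentation lives apart from content: collect the bold style names.
    bold_styles = set()
    for s in root.iterfind("office:automatic-styles/style:style", NS):
        props = s.find("style:text-properties", NS)
        if props is not None and props.get(q("fo", "font-weight")) == "bold":
            bold_styles.add(s.get(q("style", "name")))
    p = root.find(".//text:p", NS)
    pieces = [(p.text, False)] if p.text else []
    for child in p:
        if child.tag == q("text", "span"):
            bold = child.get(q("text", "style-name")) in bold_styles
            pieces.append((child.text or "", bold))
        if child.tail:
            # Text after a span belongs directly to the paragraph.
            pieces.append((child.tail, False))
    return pieces


print(spans(SAMPLE))
# [('To be', True), (', or not to be, that is the question.', False)]
```

The lookup is the same pattern as a CSS class: the span only names a style, and the style definition says what it looks like.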

In short, if you have a basic understanding of the web stack, you can largely make sense of ODF’s XML. On the other hand, OOXML, with its abstruse tag names, feels like it requires a PhD to decipher.

And this is just for simple text formatting. When you get into complex elements like tables and lists — a shared nightmare for every heavy Word user — OOXML’s complexity only skyrockets. That thousand-page specification isn’t just for show.

Beyond its formal complexity, the quality of OOXML as a standard is also questionable. Contemporary critiques of the submission catalogued technical defects, for example:

Canonizing known bugs and compromises from Office (e.g., maintaining two separate date systems starting in 1900 or 1904, and incorrectly “treating 1900 as a leap year”);
Conflicting with established standards for language codes (ISO 639), vector graphics (W3C SVG), and mathematical notation (W3C MathML);
Using vaguely defined and inconsistent units of measurement; and
Lacking clear and consistent naming conventions for elements and attributes (e.g., inconsistent case rules).

The process of OOXML becoming an ISO standard was itself highly dramatic. First, Microsoft chose to submit it via the “fast track,” a path intended for mature, widely implemented, and stable specifications. OOXML in 2006 met none of these criteria: it was new; its only complete implementation was the not-yet-released Office 2007; and nobody could plausibly review thousands of pages on that timetable. Organizations like Google and the Free Software Foundation Europe (FSFE), along with many technical experts, raised objections.
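
The first defect in the list above, the two date systems and the phantom leap day, is easy to make concrete. Spreadsheet cells store dates as serial day numbers, and the workbook records which system applies (the date1904 attribute of workbookPr in SpreadsheetML). The sketch below, with a hypothetical function name, shows how a third-party reader might map whole-day serials to calendar dates, assuming Excel's well-known behavior: in the 1900 system serial 1 is 1900-01-01 and serial 60 is the nonexistent 1900-02-29, while in the 1904 system serial 0 is 1904-01-01. Fractional (time-of-day) parts are ignored here.

```python
from datetime import date, timedelta


def serial_to_date(serial: int, date1904: bool = False) -> date:
    """Convert a whole-day spreadsheet serial number to a calendar date."""
    if date1904:
        if serial < 0:
            raise ValueError("negative serials are not valid dates")
        # 1904 system: serial 0 is 1904-01-01, no phantom day.
        return date(1904, 1, 1) + timedelta(days=serial)
    if serial < 1:
        raise ValueError("the 1900 system starts at serial 1")
    if serial == 60:
        # Kept for Lotus 1-2-3 compatibility: 1900 treated as a leap year.
        raise ValueError("serial 60 is the fictitious 1900-02-29")
    if serial < 60:
        return date(1899, 12, 31) + timedelta(days=serial)
    # From 1900-03-01 onward, the phantom day shifts the epoch back by one.
    return date(1899, 12, 30) + timedelta(days=serial)
```

For example, serial_to_date(36526) and serial_to_date(36526 - 1462, date1904=True) both return date(2000, 1, 1): the same day is stored as two different numbers, 1,462 apart, depending on a workbook-level flag.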

The voting that followed was among ISO’s most contentious: several national bodies abruptly swelled with new members, many of them Microsoft partners, who then voted in favor. Sweden’s initial approval was voided after incentives linked to support came to light.

In the end, OOXML squeaked through after two rounds, but Brazil, India, South Africa, Venezuela, and others filed formal appeals alleging procedural defects. Although these appeals failed to overturn the result, they underscored the divisive and chaotic nature of the standardization.

Why Microsoft’s Motive Wasn’t Deliberate Sabotage

So far, the evidence seems to support LibreOffice’s claim: OOXML is a sloppy standard, both technically and procedurally. But facts don’t directly prove intent. If we dig into the context of OOXML’s creation, it can be argued that harming competitors was not Microsoft’s primary aim.

First, OOXML was, in material part, a defensive posture under intensifying antitrust and “open standards” pressure. Microsoft announced OOXML in late 2005 while appealing an adverse European Commission decision centered on interoperability disclosures. Thus, it was only a matter of time before Office file compatibility came under the regulatory microscope. (The Commission indeed opened a probe in 2008.)

Meanwhile, the rival ODF matured and became an ISO standard in May 2006. Governments, especially in Europe, began to mandate open standards in public procurement. If Microsoft did nothing, Office risked exclusion from government deals.

Given that context, the sensible inference is that Microsoft’s goal was to create a format that it controlled but that also carried the “international standard” seal of approval, which would serve both as a shield against potential regulation and as a weapon against the challenge from ODF. Thus, the primary goal for this new format wasn’t to be elegant, universal, or easy to implement; it was to placate regulators while preserving Microsoft’s technological and commercial advantages. The easiest, cheapest way to do that, of course, is to package its existing complexity as the new “standard.”

To support this, it’s worth noting a more fundamental difference between OOXML and ODF. Look again at the XML snippets, but this time, pay attention to where the actual text content appears:

<!-- OOXML -->
<w:p ...><w:r ...><w:rPr>...</w:rPr><w:t>To be</w:t></w:r><w:r ...><w:t>, or not to be...</w:t></w:r><w:r><w:t>.</w:t></w:r></w:p>

<!-- ODF -->
<text:p ...><text:span ...>To be</text:span>, or not to be...</text:p>

In ODF, the text content interleaves with XML tags, just like in HTML, while in OOXML, text is always buried inside <w:t> at the leaves, and never appears as a peer of structural elements.
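
This difference can be checked mechanically. The sketch below (helper names hypothetical; the two snippets are abridged from the ones above, with "..." kept as literal text) records which element directly owns each non-whitespace text node. In ElementTree terms, an element's .text belongs to that element, while its .tail belongs to its parent, which is exactly where ODF's mixed content shows up.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
T = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

OOXML = (f'<w:p xmlns:w="{W}"><w:r><w:rPr><w:b/></w:rPr><w:t>To be</w:t></w:r>'
         "<w:r><w:t>, or not to be...</w:t></w:r></w:p>")
ODF = (f'<text:p xmlns:text="{T}"><text:span text:style-name="T1">To be</text:span>'
       ", or not to be...</text:p>")


def local(tag: str) -> str:
    """Strip the {namespace} part from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def text_owners(doc: str) -> list[tuple[str, str]]:
    """List (owning element's local name, text) for every text node."""
    owners = []

    def walk(elem):
        if elem.text and elem.text.strip():
            owners.append((local(elem.tag), elem.text))
        for child in elem:
            walk(child)
            # A tail is text that follows the child inside elem.
            if child.tail and child.tail.strip():
                owners.append((local(elem.tag), child.tail))

    walk(ET.fromstring(doc))
    return owners


print(text_owners(OOXML))  # [('t', 'To be'), ('t', ', or not to be...')]
print(text_owners(ODF))    # [('span', 'To be'), ('p', ', or not to be...')]
```

In the OOXML case every piece of text is owned by a w:t leaf; in the ODF case the paragraph itself owns text, sitting alongside its span child.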

That reflects two opposed uses of XML:

ODF uses XML as markup. Text is first-class; tags annotate spans with structure and styling. This matches XML’s original design goal for information presentation.
OOXML uses XML as a serialization format. In other words, OOXML isn’t so much describing the document content as it is describing the abstract data structures that the Office application “sees.” Our ex