Tag Systems

I’ve tried to write a blog post on tag systems for years now. Literally years; I think I first started drafting it in 2018 or so? The problem is that there’s just so much to them, so many different approaches and models and concerns, that trying to be comprehensive and rigorous is an exercise in madness.

So screw it. These are my noncomprehensive, poorly-researched thoughts on tag systems, thrown on the newsletter. This is not about implementation of tag systems, just their design.

What is a Tag

A tag is a metadata label associated with content. The tag name is also the id: two tags with the same name are the same tag. Tags appear in almost all large systems:

Social media #hashtags.
Wikipedia categories.
Blog post/CMS labels.
AWS infrastructure tags.
ACM digital library index terms.

Tags are primarily used for two things: querying (discovering information based on tags) and grouping (organizing content for processing). They are sometimes also used for business logic, like "all articles tagged covid are free even without a subscription" or "devs can upload to any S3 bucket with the qa tag".
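
Here's a minimal in-memory sketch of querying and grouping, assuming content is identified by string ids and tags are plain strings keyed by name (so the name really is the id). `TagIndex`, `query`, `group_by_tag`, and the `is_free` paywall rule are hypothetical names, not any particular system's API.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory tag index: a tag's name doubles as its id."""

    def __init__(self) -> None:
        # tag name -> set of content ids carrying that tag
        self._by_tag: dict[str, set[str]] = defaultdict(set)

    def tag(self, content_id: str, *tags: str) -> None:
        # Adding a tag is a data change: new tag names need no code update.
        for t in tags:
            self._by_tag[t].add(content_id)

    def untag(self, content_id: str, tag: str) -> None:
        ids = self._by_tag.get(tag)
        if ids is not None:
            ids.discard(content_id)

    def query(self, *tags: str) -> set[str]:
        """Querying: ids of content carrying every given tag."""
        if not tags:
            return set()
        result = set(self._by_tag.get(tags[0], set()))
        for t in tags[1:]:
            result &= self._by_tag.get(t, set())
        return result

    def group_by_tag(self) -> dict[str, set[str]]:
        """Grouping: bucket all tagged content by tag, skipping empty tags."""
        return {t: set(ids) for t, ids in self._by_tag.items() if ids}


def is_free(index: TagIndex, article_id: str) -> bool:
    # Hypothetical business rule: articles tagged "covid" skip the paywall.
    return article_id in index.query("covid")
```

For example, after `tag("a1", "covid", "health")` and `tag("a2", "health")`, `query("health")` returns both ids and `query("covid", "health")` returns only `a1`. Adding or removing a tag is just a data change, no deploy involved.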

Tags are primarily client-driven. You do not need to make changes to the codebase to add or remove a tag. This is in contrast with the content structure, which requires dev intervention to change.

Relationships between Tags

In the simplest system, all tags are uniquely identified by their name. horse and horses are separate tags. This is easy to implement and reason about, but it’s also really annoying for taggers. Why should I have to tag everything with both horse and horses if they clearly mean the same thing?

Tag Aliases

The simplest relationship we could add is the "tag alias": if A is aliased to B, then querying A is identical to querying B. Content can still be tagged horses, but the internal system only "knows" horse, and automatically converts searches for horses into searches for horse.
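
A minimal sketch of aliasing, assuming aliases are stored as a plain dict from alias to canonical tag and that tags are normalized both when content is tagged and when a query comes in. `AliasedTagIndex` and `canonical` are hypothetical names; following alias chains (and rejecting cycles) is an assumption about how multi-step aliases should behave, not something every system does.

```python
from collections import defaultdict


class AliasedTagIndex:
    def __init__(self, aliases: dict[str, str]) -> None:
        # alias -> canonical tag, e.g. {"horses": "horse"} (hypothetical layout)
        self.aliases = aliases
        self._by_tag: dict[str, set[str]] = defaultdict(set)

    def canonical(self, tag: str) -> str:
        # Follow alias chains until reaching a tag that is not itself an alias.
        seen: set[str] = set()
        while tag in self.aliases:
            if tag in seen:
                raise ValueError(f"alias cycle involving {tag!r}")
            seen.add(tag)
            tag = self.aliases[tag]
        return tag

    def tag(self, content_id: str, tag: str) -> None:
        # Stored under the canonical name: internally, only "horse" exists.
        self._by_tag[self.canonical(tag)].add(content_id)

    def query(self, tag: str) -> set[str]:
        # A search for an alias is rewritten into a search for its canonical tag.
        return set(self._by_tag.get(self.canonical(tag), set()))
```

With `{"horses": "horse"}`, tagging one item `horses` and another `horse` means `query("horse")` and `query("horses")` both return the two items.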

The best example of tag aliases is the fanfiction repository Archive of Our Own (AO3). Fanfics use a lot of jargon to refer to various character pairings, which makes querying difficult. Teams of volunteers comb through stories and manually add aliases to tags, so that stories tagged Snape/Harry and snarry show up under the root tag Harry/Snape.

The additional tag structure adds expressiveness. But it also raises use-case questions, like "should users be able to query a specific tag alias?" That is, can I search for only the stories tagged with the alias snarry? There’s no correct choice here. AO3 went with "no".
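
If you do want to leave the door open, one approach (a sketch, not how AO3 is built) is to record each tag as the tagger wrote it alongside its root tag. `TaggedStore` and the `exact` flag are hypothetical, and aliases resolve in a single hop here for brevity.

```python
from collections import defaultdict


class TaggedStore:
    def __init__(self, aliases: dict[str, str]) -> None:
        self.aliases = aliases  # alias -> root tag, single hop for brevity
        self._by_root: dict[str, set[str]] = defaultdict(set)
        self._by_written: dict[str, set[str]] = defaultdict(set)

    def root(self, tag: str) -> str:
        return self.aliases.get(tag, tag)

    def tag(self, content_id: str, tag: str) -> None:
        # Keep the tag exactly as written *and* under its root tag.
        self._by_written[tag].add(content_id)
        self._by_root[self.root(tag)].add(content_id)

    def query(self, tag: str, exact: bool = False) -> set[str]:
        if exact:
            # Only content tagged with this exact alias or spelling.
            return set(self._by_written.get(tag, set()))
        # Default: resolve the alias, so every spelling finds the same stories.
        return set(self._by_root.get(self.root(tag), set()))
```

With `snarry` and `Snape/Harry` both aliased to `Harry/Snape`, a normal `query("snarry")` returns every story under the root tag, while `query("snarry", exact=True)` returns only the stories literally tagged `snarry`. An AO3-style "no" is the same store with the `exact` path never exposed to users.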

Subtags
