I recently deployed Woodpecker CI.
My original post talks about using CI jobs to periodically rebuild container images, but that’s not the only thing that I’ve been using Woodpecker for.
Almost all of my sites are built using static site generators (SSG): mostly Nikola with some sites using Hugo (Note: I also want to get around to playing about with Stefano Marinelli’s BSSG).
Deploying a static site isn’t a particularly novel use of CI, but it’s still something that I wanted to get up and running.
This post describes using Woodpecker CI to build and deploy static sites before flushing the BunnyCDN cache.
Why?
Deploying and managing things via Git is known as GitOps and brings a number of benefits (including auditability).
In my professional life, I’ve used GitOps a lot.
Most of the benefits, though, aren’t all that relevant to a blog that’s only ever updated by a single person.
The primary benefit, for me, lies in reducing the opportunity for a class of mistake that I didn’t think I’d make... until I very nearly did.
Ooops
I’ve talked, in various places, about my pseudo-blog.
It’s a private space where I blog for an audience of none (though I do sometimes promote posts up to this site). I’ve found that doing so can help to work through thoughts, decisions and (sometimes) feelings.
It’s not entirely unfiltered [1], but writing in that private space helps with things that I either can’t or don’t want to discuss publicly.
Late last year, I had quite a significant rant into it - letting off steam whilst working through something that had really pissed me off.
A day or two later, I drafted a post on a different topic for www.bentasker.co.uk and started to manually publish:
- Opened the Obsidian window, `Ctrl-A`, `Ctrl-C`
- Switched to my terminal window, SSH’d into my host
- Truncated the post file (which had the previous draft in)
- Opened the post file, `Ctrl-V`
- Saved and exited
- Triggered `nikola build`
As the build output flew by, there were a surprising number of warnings about Nikola not being able to resolve some magic links.
You’ve probably guessed what had happened: I’d clicked on the wrong Obsidian window and had pasted in the content of a post that was never meant for public consumption.
I realised quickly enough to interrupt the process before it had updated my RSS feed (thus preventing my POSSE scripts from advertising my mistake across the internet).
Advertised or not, though, the post itself had still been published.
I had to very carefully work through Nikola’s output directory to make sure that I’d removed every trace of it (because the slug was different, it had been written out to a different URL than the post that I meant to publish, so simply pasting the correct content in wasn’t enough).
All that prevented this from being a very public fuck-up was luck. If I hadn’t noticed the warnings, I’d only have realised when Telegram buzzed my phone.
Moving to GitOps makes this class of mistake less likely: there’s no longer any copying and pasting of posts to screw up; I simply commit and push [2].
Building A Hugo site
One of the sites that I’ve hooked up to CI is built using Hugo.
The workflow (saved in .woodpecker/publish.yaml) is configured to run on any push to the main branch:
when:
  - event: push
    branch: main
It’s not just Hugo that CI needs to invoke: I also have a Bash script which scales images down to multiple sizes so that they can be injected into a `srcset` attribute:
<img src="{{ .url }}" alt="{{ .alt | safeHTML }}" srcset="{{ .srcset }}" sizes="{{ .size }}" />
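For context, the rendered markup ends up looking something like this (the filenames and sizes here are illustrative; the real values are populated from the shortcode’s parameters and the scaling script below):

```html
<img src="/images/photo.jpg" alt="An example photo"
     srcset="/images/scaled/photo.250.jpg 250w,
             /images/scaled/photo.480.jpg 480w,
             /images/scaled/photo.600.jpg 600w"
     sizes="(max-width: 600px) 100vw, 600px" />
```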
The script itself is pretty simple:
#!/bin/bash
set -e

widths="600 480 250"
mkdir -p static/images/scaled

for img in static/images/*.jpg
do
    fname=$(basename "$img")
    for width in $widths
    do
        new_fname=$(echo "$fname" | sed "s/\./.$width./")
        convert "$img" \
            -sampling-factor 4:2:0 \
            -resize "${width}x" \
            -quality 85% \
            -interlace JPEG \
            -colorspace RGB \
            "static/images/scaled/$new_fname"
    done
done
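The `sed` expression is what derives the scaled filenames: it inserts the target width after the first dot, so each source image gains one variant per width. A quick standalone check (`photo.jpg` is just an example name):

```shell
# Insert the width after the first "." - "photo.jpg" becomes "photo.600.jpg"
echo "photo.jpg" | sed "s/\./.600./"
# -> photo.600.jpg
```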
In Woodpecker, I have a pipeline step which installs the necessary dependencies and then invokes my script:
steps:
  - name: scale images
    image: cgr.dev/chainguard/wolfi-base
    commands:
      - apk -Uuv add imagemagick bash
      - tools/img_scale.sh
Next is a step to build the site by invoking Hugo:
  - name: build site
    image: hugomods/hugo:exts-0.134.2
    commands:
      - hugo
The pipeline then moves on to publishing and flushing the CDN cache.
Building a Nikola Site
My Nikola sites follow a similar pattern.
Unlike the Hugo sites, the custom script that they invoke doesn’t scale images and instead works around Obsidian not adding a leading slash to absolute paths:
#!/bin/sh

echo "Fixing image embeds"

# The embeds we care about will look something like
#
#   ![alt text](images/example.png)
#
# We just need to inject a /
grep -Rl '(images/' posts/ | while read -r post
do
    sed -i 's~(images/~(/images/~g' "$post"
done
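To illustrate what the fix does (using a hypothetical embed): Obsidian writes `(images/...)`, and the `sed` expression rewrites it to `(/images/...)`:

```shell
# Rewrite Obsidian-style relative image paths to absolute ones
echo "![diagram](images/diagram.png)" | sed 's~(images/~(/images/~g'
# -> ![diagram](/images/diagram.png)
```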
Because it’s quite a simple task, the Nikola container image already has the necessary utilities, so it doesn’t require a separate pipeline step:
when:
  - event: push
    branch: main

steps:
  - name: build
    image: dragas/nikola:alpine
    commands:
      - ./scripts/fix_image_embeds.sh
      - nikola build
Publishing and Flushing CDN Caches
Once a site has been built, it needs to be pushed to the hosting server.
My first implementation of the publishing workflow was for my pseudo-blog, which doesn’t sit behind a CDN and so just needs the generated files rsynced to the relevant host.
In Woodpecker I created a secret called ssh_key and pasted a (freshly generated) private key into it.
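Generating that key pair might look something like this (the key type is my assumption; anything the server accepts will do). The private half goes into the Woodpecker secret and the public half onto the server:

```shell
# Create a dedicated, passphrase-less deploy key pair:
#   deploy_key     -> paste into the Woodpecker "ssh_key" secret
#   deploy_key.pub -> append to ~/.ssh/authorized_keys on the host
ssh-keygen -t ed25519 -N "" -C "woodpecker-deploy" -f deploy_key
```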
The pipeline step pulls a base image, installs the necessary packages and then adds the SSH key before attempting to rsync the output directory up to the hosting server:
  - name: upload
    image: cgr.dev/chainguard/wolfi-base
    environment:
      REMOTE: www@internal-host:/mnt/nginx/psuedo_blog
      SOURCE: output
      SSH_KEY:
        from_secret: ssh_key
    commands:
      - apk -Uuv add rsync openssh
      - mkdir -p /root/.ssh
      - echo "$${SSH_KEY}" > "/root/.ssh/id_rsa"
      - chmod 0600 /root/.ssh/id_rsa
      - rsync -racv --delete -e 'ssh -o UserKnownHostsFile=/dev/null -o LogLevel=quiet -o StrictHostKeyChecking=no' $${SOURCE} $${REMOTE}
I added the associated public key to the hosting server and things just worked.
However, most of my sites sit behind BunnyCDN.
Ages ago, I wrote a Python script to purge a URL from Bunny’s cache, so I decided to turn it into a Woodpecker CI plugin, allowing me to add a simple pipeline step for cache purges.
The plugin just needs to be passed an API key and a list of the URLs to flush [3]:
  - name: Flush CDN Cache
    image: codeberg.org/bentasker/woodpecker-ci-bunnycdn-cache-flush
    settings:
      BUNNY_API_KEY:
        from_secret: bunny_api_key
      FLUSH_URLS: "https://www.bentasker.co.uk/posts/* https://www.bentasker.co.uk/"
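Under the hood, a purge amounts to one authenticated POST per URL against Bunny’s purge endpoint. This is a rough sketch of the idea rather than the plugin’s actual code (check Bunny’s current API docs before relying on the endpoint details); here the requests are printed rather than sent:

```shell
#!/bin/sh
# Sketch only: print the purge request that would be made for each URL.
# set -f stops the shell glob-expanding the trailing * in the first URL.
set -f
FLUSH_URLS="https://www.bentasker.co.uk/posts/* https://www.bentasker.co.uk/"
for url in $FLUSH_URLS
do
    echo curl -X POST -H "AccessKey: \$BUNNY_API_KEY" "https://api.bunny.net/purge?url=${url}"
done
```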
Having created this plugin, I decided to expand it so that it could also be used to rsync published files up [3]:
  - name: Publish
    image: codeberg.org/bentasker/woodpecker-ci-bunnycdn-cache-flush
    settings:
      LOAD_KEY: "y"
      SSH_KEY:
        from_secret: ssh_key
      DO_RSYNC: "y"
      RSYNC_REMOTE: sites@ext-host:/var/sites/bt
      RSYNC_PORT: 1322
      RSYNC_SOURCE: public
      DEBUG: "n"
      DO_FLUSH: "n"

  - name: Flush CDN Cache
    image: codeberg.org/bentasker/woodpecker-ci-bunnycdn-cache-flush
    settings:
      BUNNY_API_KEY:
        from_secret: bunny_api_key
      FLUSH_URLS: "https://www.bentasker.co.uk/posts/* https://www.bentasker.co.uk/"
You can find the plugin source on Codeberg.
Putting It All Together
For a Nikola site, my YAML looks something like this:
when:
  - event: push
    branch: main

steps:
  - name: build
    image: dragas/nikola:alpine
    commands:
      - ./scripts/fix_image_embeds.sh
      - nikola build

  - name: Publish
    image: codeberg.org/bentasker/woodpecker-ci-bunnycdn-cache-flush
    settings:
      LOAD_KEY: "y"
      SSH_KEY:
        from_secret: ssh_key
      DO_RSYNC: "y"
      RSYNC_REMOTE: sites@ext-host:/var/sites/bt
      RSYNC_PORT: 1322
      RSYNC_SOURCE: public
      DEBUG: "n"
      DO_FLUSH: "n"

  - name: Flush CDN Cache
    image: codeberg.org/bentasker/woodpecker-ci-bunnycdn-cache-flush
    settings:
      BUNNY_API_KEY:
        from_secret: bunny_api_key
      FLUSH_URLS: "https://www.bentasker.co.uk/posts/* https://www.bentasker.co.uk/"
Conclusion
I haven’t moved everything over yet, but most of my sites can now be updated with no more effort than a git push.
As well as being more convenient for me, the new flows also significantly reduce the likelihood of publishing content onto the wrong site.
It also means that I have a meaningful audit history of changes to my site - if I ever do significantly screw up, putting things right should just be a git revert away.
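As a quick, self-contained demonstration of that recovery path (a throwaway repo rather than my actual site repo):

```shell
#!/bin/sh
# Demo: git revert restores the previous content.
set -e
cd "$(mktemp -d)"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
echo "the right post" > post.md
git add post.md
git commit -qm "publish post"
echo "oops: wrong content" > post.md
git commit -qam "bad edit"
git revert --no-edit HEAD >/dev/null
cat post.md
# -> the right post
```

After the revert is pushed, CI rebuilds and republishes the site in its previous state.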
1. Even though I consider it a private space, I still end up self-censoring out of concern that it might one day leak or be compromised.
2. I could, of course, still write them in the wrong repo to begin with, but that’s less likely.
3. The idea being that it’ll eventually support a mode where it figures out which URLs to flush for itself.