I’m not going to start this post with the words “I don’t like LLMs” because, at this point, it’d seem a little redundant.
What I really don’t like, though, is paying to support the business model of companies like OpenAI.
It’s bad enough that their business is built off the back of broad scale plagiarism, but, on top of that their activities continue to cost website operators money and resources.
Although it’s understandable that it might come to mind, I’m not referring to the repeated crawling of their scrapers (along with the activity of all the other crawlers trying to get in on this latest gold rush).
ChatGPT’s web search mode is able to search the web and then summarise the results (not unlike the pseudo-browsing experience that they now promise with Atlas). When the LLM’s response includes images, they are hotlinked directly from the original website.
Despite their ridiculous valuation, OpenAI have apparently been too cheap to build an image cache into their product (or perhaps are trying to sidestep copyright concerns).
This means that, every time ChatGPT includes one of my images in its answer, I pay for the bandwidth necessary to serve it to ChatGPT’s customer (who will very likely never visit my site or read an accurate representation of what I’ve written).
Whether we’re talking about a small or a large amount of money, this is a 500-billion-dollar AI company freeloading on other people’s bills (it is somewhat fitting that a company which acts as a leech on the arse of creativity would also be a bandwidth leech).
I’ve been feeling somewhat grumpy this weekend anyway, so I decided to mitigate this by catching the requests and redirecting to a smaller (and therefore cheaper to serve) image.
Identifying Requests
chatgpt.com sets a Referrer-Policy of strict-origin-when-cross-origin:
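```http
Referrer-Policy: strict-origin-when-cross-origin
```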

This tells the user’s browser that, for cross-origin requests (i.e. those that go to another domain), it should include a referer header specifying scheme and domain only.
So, when chatgpt.com embeds one of my images, the request headers look something like this (the host and path below are placeholders; the important part is the trimmed-down Referer):
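```http
GET /images/example.jpg HTTP/1.1
Host: www.example.com
Accept: image/avif,image/webp,image/*,*/*;q=0.8
Referer: https://chatgpt.com/
```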

The existence of that Referer header means that it’s easy to identify requests which originated from chatgpt.com.
The Block Page
Twenty-odd years ago, it was quite common for anti-hotlinking protections to serve up a different image to the one that the user’s browser had requested[^1].
Although outright dropping the request is possible, doing so can lead to support overhead: well-intentioned people will helpfully tell you that your images aren’t working in ChatGPT.
To avoid this, I wanted it to be quite clear that the request was blocked - the easiest way to achieve this was to serve an image which indicated the objection.
Bing’s AI really didn’t want to generate the image that I wanted - it seems that Microsoft have configured the filters to try and avoid showing Trump in any kind of satirical or mocking context[^2], even if what’s being asked for is a depiction of something that exists in real life:

Still, this is not my first rodeo, so I eventually got Bing to generate the imagery[^2] that I wanted (though I did add the text by hand):

The image is hosted on a dedicated subdomain, which should allow me to more easily see how often it’s used.
The Ruleset
Although ChatGPT referrals make up an insignificant proportion of my real traffic, I didn’t want to interfere with the few users who were actually visiting a page from there: the aim was to only impact hotlinking.
The ruleset therefore needed to consider what was being requested:
IF request is for an image
AND (referrer contains chatgpt.com
     OR referrer contains perplexity.ai)
THEN redirect to blockimage
I use BunnyCDN, so I built this as an Edge Rule in their dashboard, combining a match on the requested URL (to catch images) with conditions on the Referer header.
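Bunny’s Edge Rules are assembled in a UI rather than written as text, but the equivalent logic, sketched as an nginx config, looks roughly like this (the hostname, extension list and regexes here are illustrative, not my actual rule):

```nginx
# Sketch of the hotlink rule: image requests bearing a Referer from
# chatgpt.com or perplexity.ai get a 302 to the block image.
location ~* \.(jpe?g|png|gif|webp)$ {
    if ($http_referer ~* "(chatgpt\.com|perplexity\.ai)") {
        return 302 https://blocked.example.com/blocked.jpg;
    }
}
```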

If these rules match, the CDN serves up a temporary redirect (a HTTP 302) to send the user’s browser to the block image.
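On the wire, that’s just a minimal response (the hostname below stands in for my dedicated subdomain):

```http
HTTP/1.1 302 Found
Location: https://blocked.example.com/blocked.jpg
```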
Including The App
The ruleset above only accounts for people who visit chatgpt.com in their browser.
Although there are obviously some who do that (otherwise they wouldn’t have appeared in my logs in the first place), it’s quite likely that they’re in the minority.
We also need to account for embeds within the app, which (rightfully) doesn’t set a Referer header.
We can, however, identify the app by its user-agent:
ChatGPT/1.2025.287 (Android 13; FP4; build 2528715)
This is different to the user-agent that ChatGPT uses when fetching something (like a web page) to feed into the LLM for summarisation.
A second ruleset catches the app’s embeds, this time matching on the User-Agent header rather than the Referer.
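Sketched in the same illustrative nginx form as before (again, not the literal BunnyCDN rule):

```nginx
# Sketch of the app rule: image requests from the ChatGPT app carry
# no Referer, so match on the app's distinctive User-Agent instead.
location ~* \.(jpe?g|png|gif|webp)$ {
    if ($http_user_agent ~* "^ChatGPT/") {
        return 302 https://blocked.example.com/blocked.jpg;
    }
}
```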

Testing
My logs indicate a particular bias towards hotlinking of images included in Vauxhall repair posts (I’ve no idea why - it’s not like they’re uncommon cars).
So, I went to chatgpt.com, toggled the search lozenge and asked it to provide me with images showing how to replace the oil pressure sensor on a Corsa D.
The result was even better than I’d expected:

I hadn’t considered that chatgpt.com would crop the image, but the effect is all the better.
If the user taps the image, ChatGPT opens a modal displaying the full image:

Because the CDN serves a temporary redirect (a HTTP 302), the correct images are displayed if the user actually clicks the link to visit my site (and will continue to display correctly while the images are in their cache).
I couldn’t test the mechanism with Perplexity because they actually seem to have stopped hotlinking my images. Although I’m not complaining, it’s a little odd: they still hotlink images from other sites and Perplexity is perfectly willing to regurgitate my content.
I’ve no idea whether that’s just luck or whether it might be related to my previous anti-hotlink setup for Perplexity.
Robustness
Anti-hotlinking protections haven’t been particularly robust for years.
They used to be a “good enough” measure because browsers sent a referer header by default and most users wouldn’t know how to change that (or wouldn’t bother).
However, that changed with the introduction of the Referrer-Policy header, which allows sites to instruct their visitors’ browsers to send a more limited referer header (or not to send one at all).
This means that chatgpt.com could trivially side-step this mechanism by updating their site to set Referrer-Policy to no-referrer.
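All it would take is a single response header on their pages:

```http
Referrer-Policy: no-referrer
```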
Of course, it’d be an obvious bad-faith move when they could also do what they should have done from the outset: set up a cache so that it’s them carrying the bandwidth bill[^3] rather than the people whose content they’re (mis)using.
There are a variety of more robust approaches (including tokenisation), but as long as referer headers are available, it’s probably not yet worth the additional effort.
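By way of example, tokenisation usually means signing image URLs so that a request without a valid, unexpired token fails no matter what headers the client sends. nginx’s secure_link module implements this off the shelf; the snippet below is adapted from that module’s documentation, with a placeholder secret:

```nginx
location /images/ {
    # URLs must carry ?md5=<hash>&expires=<unix-time>; the hash covers
    # the expiry time, the request path and a server-side secret.
    secure_link $arg_md5,$arg_expires;
    secure_link_md5 "$secure_link_expires$uri placeholder-secret";

    if ($secure_link = "") { return 403; }  # missing or invalid token
    if ($secure_link = "0") { return 410; } # expired token
}
```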
Conclusion
I appreciate that, for some, it might come across as petty to be complaining about what should be quite small costs. However, they’re still costs that I incur entirely for someone else’s benefit: if I wanted to support OpenAI, I’d be paying a monthly subscription.
Aside from this being another example of AI companies outsourcing what should be their own costs, it’s also a matter of freedom.
If, as some contend, AI companies are free to consume the entire public commons and regurgitate error-prone facsimiles of it, I am just as free to serve up whatever I see fit in response to requests for my content.
It is true that I could have served a simple “request blocked” JPG but, in a political context where Trump is issuing executive orders that will censor AI, it’s much more amusing to ensure that the product of one of his ~~minions~~ supporters serves something more pertinent to the situation.
[^1]: They tended to be quite explicit (or worse, Goatse).
[^2]: Which is quite fitting, really, considering that I wanted the image to show billionaire CEOs as Trump lackeys.
[^3]: This is far from a niche idea: it’s what Google, whose activities actually bring my site traffic and benefit, have done for years.