I’m not going to start this post with the words “I don’t like LLMs” because, at this point, it’d seem a little redundant.
What I really don’t like, though, is paying to support the business model of companies like OpenAI.
It’s bad enough that their business is built off the back of broad scale plagiarism, but, on top of that their activities continue to cost website operators money and resources.
Although it’s understandable that it might come to mind, I’m not referring to the repeated crawling of their scrapers (along with the activity of all the other crawlers trying to get in on this latest gold rush).
ChatGPT’s web search mode is able to search the web and then summarise the results (not unlike the pseudo-browsing experience that they now promise with Atlas). When the LLM’s response includes images, they are hotlinked directly from the original website.
Despite their ridiculous valuation, OpenAI have apparently been too cheap to build an image cache into their product (or perhaps are trying to sidestep copyright concerns).
This means that, every time ChatGPT includes one of my images in its answer, I pay for the bandwidth necessary to serve it to ChatGPT’s customer (who will very likely never visit my site or read an accurate representation of what I’ve written).
Whether we’re talking about a small or a large amount of money, this is a 500-billion-dollar AI company freeloading on other people’s bills (it is somewhat fitting that a company which acts as a leech on the arse of creativity would also be a bandwidth leech).
I’ve been feeling somewhat grumpy this weekend anyway, so I decided to mitigate this by catching the requests and redirecting to a smaller (and therefore cheaper to serve) image.
Identifying Requests
chatgpt.com sets a Referrer-Policy of strict-origin-when-cross-origin:
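```http
Referrer-Policy: strict-origin-when-cross-origin
```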

This tells the user’s browser that, for cross-origin requests (i.e. those that go to another domain), it should include a referer header specifying scheme and domain only.
So, when chatgpt.com embeds one of my images, the request headers look something like this (the host and path below are placeholders; the important part is the trimmed-down Referer):
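```http
GET /images/example.jpg HTTP/1.1
Host: www.example.com
Accept: image/avif,image/webp,image/*,*/*;q=0.8
Referer: https://chatgpt.com/
```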

The existence of that Referer header means that it’s easy to identify requests which originated from chatgpt.com.
The Block Page
Twenty-odd years ago, it was quite common for anti-hotlinking protections to serve up a different image to the one that the user’s browser had requested[^1].
Although outright dropping the request is possible, doing so can lead to support overhead: well-intentioned people will helpfully tell you that your images aren’t working in ChatGPT.
To avoid this, I wanted it to be quite clear that the request was blocked - the easiest way to achieve this was to serve an image which indicated the objection.
Bing’s AI really didn’t want to generate the image that I wanted - it seems that Microsoft have configured the filters to try and avoid showing Trump in any kind of satirical or mocking context[^2], even if what’s being asked for is a depiction of something that exists in real life:

Still, this is not my first rodeo, so I eventually got Bing to generate the imagery[^2] that I wanted (though I did add the text by hand):

The image is hosted on a dedicated subdomain, which should allow me to more easily see how often it’s used.
The Ruleset
Although ChatGPT referrals make up an insignificant proportion of my real traffic, I didn’t want to interfere with the few users who were actually visiting a page from there: the aim was to only impact hotlinking.
The ruleset therefore needed to consider what was being requested:
IF request is for an image
AND (referrer contains chatgpt.com
     OR referrer contains perplexity.ai)
THEN redirect to blockimage
I use BunnyCDN, so I built this as an Edge Rule in their dashboard, combining a match on the requested URL (to catch images) with conditions on the Referer header.
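Bunny’s Edge Rules are assembled in a UI rather than written as text, but the equivalent logic, sketched as an nginx config, looks roughly like this (the hostname, extension list and regexes here are illustrative, not my actual rule):

```nginx
# Sketch of the hotlink rule: image requests bearing a Referer from
# chatgpt.com or perplexity.ai get a 302 to the block image.
location ~* \.(jpe?g|png|gif|webp)$ {
    if ($http_referer ~* "(chatgpt\.com|perplexity\.ai)") {
        return 302 https://blocked.example.com/blocked.jpg;
    }
}
```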

If these rules match, the CDN serves up a temporary redirect (a HTTP 302) to send the user’s browser to the block image.
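On the wire, that’s just a minimal response (the hostname below stands in for my dedicated subdomain):

```http
HTTP/1.1 302 Found
Location: https://blocked.example.com/blocked.jpg
```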
Including The App
The ruleset above only accounts for people who visit chatgpt.com in their browser.
Although there are obviously some who do that (otherwise they wouldn’t have appeared in my logs in the first place), it’s quite likely that they’re in the minority.
We also need to account for embeds within the app, which (rightfully) doesn’t set a Referer header.
We can, however, identify the app by its user-agent:
ChatGPT/1.2025.287 (Android 13; FP4; build 2528715)
This is different to the user-agent that ChatGPT uses when fetching something (like a web page) to feed into the LLM for summarisation.
A second ruleset catches the app’s embeds, this time matching on the User-Agent header rather than the Referer.
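Sketched in the same illustrative nginx form as before (again, not the literal BunnyCDN rule):

```nginx
# Sketch of the app rule: image requests from the ChatGPT app carry
# no Referer, so match on the app's distinctive User-Agent instead.
location ~* \.(jpe?g|png|gif|webp)$ {
    if ($http_user_agent ~* "^ChatGPT/") {
        return 302 https://blocked.example.com/blocked.jpg;
    }
}
```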

Testing
My logs indicate a particular bias towards hotlinking of images included in Vauxhall repair posts (I’ve no idea why - it’s not like they’re uncommon cars).
So, I went to chatgpt.com, toggled the search lozenge and asked it to provide me with images showing how to replace the oil pressure sensor on a Corsa D.
The result was even better than I’d expected:

I hadn’t considered that chatgpt.com would crop the image, but the effect is all the better.
If the user taps the image, ChatGPT opens a modal displaying the full image:

Because the CDN serves a temporary redirect (a HTTP 302), the correct images are displayed if the user actually clicks the link to visit my site (and will continue to display correctly while the images are in their cache).
I couldn’t test the mechanism with Perplexity because they actually seem to have stopped hotlinking my images. Although I’m not complaining, it’s a little odd: they still hotlink images from other sites and Perplexity is perfectly willing to regurgitate my content.
I’ve no idea whether that’s just luck or whether it might be related to my previous anti-hotlink setup for Perplexity.
Robustness
Anti-hotlinking protections haven’t been particularly robust for years.
They used to be a “good enough” measure because browsers sent a referer header by default and most users wouldn’t know how to change that (or wouldn’t bother).
However, that changed with the introduction of the Referrer-Policy header, which allows sites to instruct their visitors’ browsers to send a more limited referer header (or not to send one at all).
This means that chatgpt.com could trivially side-step this mechanism by updating their site to set Referrer-Policy to no-referrer.
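All it would take is a single response header on their pages:

```http
Referrer-Policy: no-referrer
```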
Of course, it’d be an obvious bad-faith move when they could also do what they should have done from the outset: set up a cache so that it’s them carrying the bandwidth bill[^3] rather than the people whose content they’re (mis)using.
There are a variety of more robust approaches (including tokenisation), but as long as referer headers are available, it’s probably not yet worth the additional effort.
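By way of example, tokenisation usually means signing image URLs so that a request without a valid, unexpired token fails no matter what headers the client sends. nginx’s secure_link module implements this off the shelf; the snippet below is adapted from that module’s documentation, with a placeholder secret:

```nginx
location /images/ {
    # URLs must carry ?md5=<hash>&expires=<unix-time>; the hash covers
    # the expiry time, the request path and a server-side secret.
    secure_link $arg_md5,$arg_expires;
    secure_link_md5 "$secure_link_expires$uri placeholder-secret";

    if ($secure_link = "") { return 403; }  # missing or invalid token
    if ($secure_link = "0") { return 410; } # expired token
}
```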
Conclusion
I appreciate that, for some, it might come across as petty to be complaining about what should be quite small costs. However, they’re still costs that I incur entirely for someone else’s benefit: if I wanted to support OpenAI, I’d be paying a monthly subscription.
Aside from this being another example of AI companies outsourcing what should be their own costs, it’s also a matter of freedom.
If, as some contend, AI companies are free to consume the entire public commons and regurgitate error-prone facsimiles of it, I am just as free to serve up whatever I see fit in response to requests for my content.
It is true that I could have served a simple “request blocked” JPG but, in a political context where Trump is issuing executive orders that will censor AI, it’s much more amusing to ensure that the product of one of his ~~minions~~ supporters serves something more pertinent to the situation.
[^1]: They tended to be quite explicit (or worse, Goatse).
[^2]: Which is quite fitting, really, considering that I wanted the image to show billionaire CEOs as Trump lackeys.
[^3]: This is far from a niche idea: it’s what Google, whose activities actually bring my site traffic and benefit, have done for years.