Check 404 URLs in Database

perishablepress.com

There are many ways to do this. I wanted the quickest and easiest. I run Yourls on several sites to create shortlinks for my books. Each instance of Yourls contains many URLs. I like to keep my books current. URLs tend to change and break over time. It is a chore to check 800+ links in each of my books, page after page. So I wanted a quick way to check for 404 and other broken links. In this post, I share the technique that works best for me; your mileage may vary.

Overview

Here is a quick overview of the process:

  1. Export the URLs from your database. This can be easy or tricky depending on how things are set up and the app you’re using to interact with the database.
  2. Create a PHP or other script to output the exported URLs in HTML format, as a plain list of hyperlinks, all on the same page.
  3. Use a free browser extension to crawl each link and check for any errors.

That’s all there is to it. I am writing this down so I can easily reference it for future book updates. Hopefully it helps you too.
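As a rough illustration of step 2, here is a minimal sketch in Python rather than PHP. It assumes the export is a plain text file with one URL per line; the function name `urls_to_html` and the file layout are my own placeholders, not part of any particular tool.

```python
import sys
from html import escape


def urls_to_html(urls, title="Link check"):
    """Return one HTML page that lists every URL as a plain hyperlink."""
    items = []
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank lines in the export
        safe = escape(url, quote=True)  # escape &, <, > and quotes
        items.append(f'<li><a href="{safe}">{safe}</a></li>')
    links = "\n".join(items)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        f"<body>\n<ul>\n{links}\n</ul>\n</body></html>\n"
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed usage: python links.py exported-urls.txt > links.html
    with open(sys.argv[1], encoding="utf-8") as fh:
        print(urls_to_html(fh))
```

Open the resulting page in the browser and point the link-checking extension at it. Escaping each URL matters because query strings often contain `&`, which would otherwise produce invalid markup.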

Types of errors

This technique catches most common errors, for example:

  • 5xx errors (e.g., 503 Service Unavailable, 504 Gateway Timeout)
  • 4xx errors (e.g., 404 Not Found, 403 Forbidden)
  • 3xx redirects (e.g., 301 Permanent, 302 Temporary)
  • 2xx issues (e.g., 205 Reset Content, 206 Partial Content)
  • Empty response

Note: The errors that may be reported depend on which addon or extension or script you use to analyze the data. More about this in Step 3 below.
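If you go the script route instead of an extension, the categories above could be sketched roughly like this. This is only an assumption-laden example: the names `classify_status` and `fetch_status` are hypothetical, treating plain 200 as the only "ok" response is my own choice, and some servers answer HEAD requests differently than GET, so results may not match what a browser extension reports.

```python
import http.client
from urllib.parse import urlsplit


def classify_status(code):
    """Map an HTTP status code (or None for no response) to a report label."""
    if code is None:
        return "empty response"
    if code == 200:
        return "ok"  # assumption: only a plain 200 counts as healthy
    if 200 <= code < 300:
        return "2xx issue"
    if 300 <= code < 400:
        return "3xx redirect"
    if 400 <= code < 500:
        return "4xx error"
    if 500 <= code < 600:
        return "5xx error"
    return "unknown"


def fetch_status(url, timeout=10):
    """Send one HEAD request without following redirects; return status or None."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None  # DNS failure, timeout, refused connection, bad response
    finally:
        conn.close()
```

Calling `classify_status(fetch_status(url))` for each exported URL gives one label per link. Using `http.client` directly, rather than a helper that follows redirects automatically, keeps 301 and 302 responses visible in the report.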

This technique does not catch the case where a link still resolves but the content of the page itself has changed. For example, a link that originally pointed to an article about SQL might now lead to an article about squirrels or something. Maybe you can tap AI for that sort of task?

Fortunately, in my experience, the content-completely-changing-but-at-the-same-URL case is rare. Most of the time, when a link changes it means that the content went offline or was redirected. This technique catches both of these key scenarios.

Specific example walkthrough

To provide a real-world example, I’ll walk through the steps and code for doing this with the URLs in a Yourls database. There may already be an extension/addon or plugin/script that can do this; I didn’t check, tbh. Without further ado...

Step 1: Export the data
