Scour Bot
Scour is a personalized content feed service. It is effectively an RSS reader where users specify topics they are interested in and feeds to follow. Scour uses semantic search to rank items based on user interests.
For website owners, Scour can help direct traffic to your site by surfacing your content to users who are interested in related topics.
Scour does not use website content to train AI models.
Opt-Out
To opt out of having your content featured on Scour, email opt-out@scour.ing, or block the bot's IP addresses or User-Agent string.
Identifying the Scour Bot
User Agent
The bot sends the User-Agent header: ScourRSSBot/1.0 (+https://scour.ing/bot).
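For server operators who want to detect or block the bot, requests can be matched on this User-Agent. A minimal Python sketch; the version-tolerant pattern is an assumption, since only ScourRSSBot/1.0 is documented:

```python
import re

# Matches the documented User-Agent, tolerating future version bumps
# (e.g. ScourRSSBot/1.1). The exact future format is an assumption.
SCOUR_BOT_RE = re.compile(r"^ScourRSSBot/\d+\.\d+ \(\+https://scour\.ing/bot\)$")

def is_scour_bot(user_agent: str) -> bool:
    """Return True if the request's User-Agent identifies the Scour bot."""
    return bool(SCOUR_BOT_RE.match(user_agent))
```

For example, `is_scour_bot("ScourRSSBot/1.0 (+https://scour.ing/bot)")` returns `True`, while an ordinary browser User-Agent returns `False`.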
IP Addresses
The bot's outgoing IP addresses can be found in the IP list.
Feed Polling Behavior
Users Submit Feed URLs to Poll
Scour does not crawl or scrape every URL on a website.
Users submit feed URLs to check for updates. On submission, Scour checks whether the URL is a supported feed type (see below). If the URL is not a supported feed type, Scour automatically checks common feed paths (such as /rss.xml, /atom.xml, /feed.json, and /feed).
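The fallback behavior can be sketched as a function that lists the candidate feed URLs to try in order. The exact path list and ordering Scour uses are assumptions beyond the examples given above:

```python
from urllib.parse import urljoin

# Common feed locations to probe when the submitted URL is not itself a
# feed. The precise list Scour checks is an assumption.
COMMON_FEED_PATHS = ["/rss.xml", "/atom.xml", "/feed.json", "/feed"]

def candidate_feed_urls(url: str) -> list[str]:
    """List feed URLs to try: the submitted URL first, then the common
    feed paths resolved against the same site."""
    return [url] + [urljoin(url, path) for path in COMMON_FEED_PATHS]
```

For a submission like `https://example.com/blog`, this yields the blog URL itself followed by `https://example.com/rss.xml`, `https://example.com/atom.xml`, and so on.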
Supported Feed Types
Scour supports the following feed types:
- RSS
- Atom
- JSON Feed
Scour additionally supports some blogs that do not have RSS feeds but have a page that lists posts. For example, Mixedbread's blog does not have an RSS feed but the blog page lists posts by title and date. Scour checks pages such as these for new posts as if they were RSS feeds.
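Extracting posts from a listing page without a feed can be sketched with Python's standard html.parser. The markup below is hypothetical, and a real implementation would need site-specific handling of titles and dates:

```python
from html.parser import HTMLParser

class PostListParser(HTMLParser):
    """Collect (title, href) pairs from anchor tags on a post-listing
    page. A sketch: real listing pages need site-specific selectors."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href:
            self.posts.append(("".join(self._text).strip(), self._href))
            self._href = None

# Hypothetical listing markup for illustration only.
parser = PostListParser()
parser.feed('<ul><li><a href="/post-1">First Post</a></li>'
            '<li><a href="/post-2">Second Post</a></li></ul>')
# parser.posts now holds [("First Post", "/post-1"), ("Second Post", "/post-2")]
```

New posts would then be detected by comparing the extracted links against those seen on a previous poll, much like comparing feed items.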
Checking for Feed Updates
- Scour checks feeds for updates every 15 minutes (900 seconds).
- The bot sends 1 request per feed, independent of how many users are subscribed to the feed through Scour.
- The bot sends the If-Modified-Since header with the exact contents of the Last-Modified header of the feed or, if that header was not present, the date that the most recent feed item was published. If the feed returns a 304 Not Modified response, Scour assumes the feed has not been updated and skips the rest of the process.
- If the feed contains new content, Scour will parse the feed content. It will also make a GET request to the post URL to get the full content of the post.
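The conditional-request flow above can be sketched with Python's urllib. Function names are illustrative, and persistence of the Last-Modified value between polls is omitted:

```python
import urllib.error
import urllib.request

def build_request(url, last_modified, newest_item_date):
    """Prefer the Last-Modified value saved from the previous response;
    otherwise fall back to the newest item's publication date."""
    since = last_modified or newest_item_date
    return urllib.request.Request(url, headers={"If-Modified-Since": since})

def poll_feed(url, last_modified, newest_item_date):
    """Fetch a feed conditionally: return the body if it changed, or
    None on 304 Not Modified. A sketch; storing the Last-Modified value
    between polls and other error handling are omitted."""
    req = build_request(url, last_modified, newest_item_date)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return None  # feed unchanged; skip parsing
        raise
```

On a 304 response, nothing is downloaded or parsed, which keeps each poll to a single cheap request when nothing has changed.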
Robots.txt
Scour does not follow the robots.txt exclusion list.
My understanding of the Robots Exclusion Protocol is that it is not intended to apply to software that acts as an agent on behalf of a human user (like a web browser). Every feed that Scour polls was manually subscribed to by a human user, and Scour merely checks it for updates.
Sadly, I have also found that many websites with RSS feeds have robots.txt files that, intentionally or not, block access to their feed URLs. If RSS readers like Scour honored these exclusions, it would defeat the purpose of having an RSS feed.
How Scour Uses Website Content
Scour does not use website content to train AI models.
Scour generates an embedding for each post and user interest. It uses these embeddings to determine which posts match a user's interests.
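Matching posts to interests by embedding typically means comparing vectors with cosine similarity. A minimal sketch; the similarity measure and ranking below are assumptions, as the document only states that embeddings determine matches:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_posts(interest_embedding, posts):
    """Order (post_id, embedding) pairs by similarity to an interest,
    most similar first. The data shapes here are illustrative."""
    return sorted(
        posts,
        key=lambda p: cosine_similarity(interest_embedding, p[1]),
        reverse=True,
    )
```

With toy 2-dimensional embeddings, a post whose vector points in nearly the same direction as the interest vector ranks ahead of an orthogonal one.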
Scour primarily surfaces links for users to click on, directing traffic to the original source.
Complaints, Questions, and Feedback
Scour is run by a single developer, Evan Schwartz.
If you have any complaints, questions, or feedback, please reach out to me at bot@scour.ing.