Web Archive Analysis, Internet Archaeology, Crawl Data, Historical Web

Google Just Blocked 749 Million URLs for Anna’s Archive
lifehacker.com·1d
🌐WARC Forensics
Project researches harmful social media content and its influence on middle-aged
smidgeproject.eu·5h
📰RSS Archaeology
Google Plans Secret AI Military Outpost on Tiny Island Overrun By Crabs
tech.slashdot.org·22h
🏴󠁧󠁢󠁳󠁣󠁴󠁿Scottish Computing
World’s largest web houses 110,000 spiders thriving in total darkness
newatlas.com·1d
🧅Tor Networks
Our latest fraud and scams advisory
blog.google·1d
📡Feed Security
Inception raises $50 million to build diffusion models for code and text
techcrunch.com·1d
🧠Learned Codecs
How the UAE Built a $140 Billion Crypto Empire in Just Five Years
hackernoon.com·1d
🖥️Terminal Renaissance
How to Use GPT-5 Effectively
towardsdatascience.com·7h
🎙️Whisper
I Built a Task Manager for the AI Coding Era (and It's Just Markdown Files)
dev.to·7h
💾Persistence Strategies
Zero-RAG: Towards Retrieval-Augmented Generation with Zero Redundant Knowledge
arxiv.org·3d
🔍Information Retrieval
Expected Value Analysis in AI Product Management
towardsdatascience.com·1d
🧮Kolmogorov Bounds
BondBERT: What we learn when assigning sentiment in the bond market
arxiv.org·2d
📋Document Grammar
OpenAI RAG Starter Kit with File Search and Chat UI
github.com·1d
🔌Archive APIs
Post-Training LLMs as Better Decision-Making Agents: A Regret-Minimization Approach
arxiv.org·14h
💻Local LLMs
Unveiling Deep Semantic Uncertainty Perception for Language-Anchored Multi-modal Vision-Brain Alignment
arxiv.org·14h
🌀Riemannian Computing