Shopping online and signing up for new websites are everyday activities, but so is stumbling across scam domains. These shady sites may take your money, steal sensitive info, or vanish after operating for only a few weeks. So, how can you tell if a domain is sketchy—before you get burned?
In this complete guide, you’ll learn how to use Python to automatically screen websites for scam signals. You’ll see why and how each check works, get a working script, learn to interpret results, and discover how to tailor it for your needs.
What You’ll Learn
- Why scam domains are so hard to spot
- Which technical signals matter most—and why
- How to fetch WHOIS, DNS, HTTPS, and content info in Python
- A full script that weighs each check to give you a risk score
- How to interpret results wisely (and avoid false positives)
- Where to look for deeper verification or more advanced checks
Why Are Scam Domains So Common?
Scammers can set up a slick web store or fake landing page in minutes. Their sites tend to share the same traits:
- Cheap or free domains registered in the last year, often just weeks ago
- WHOIS privacy shields to hide their real identity
- No real email setup—just a web form, if that
- Broken or missing HTTPS
- Aggressive sales or big discounts (to lure impulse buyers)
- Almost no “real” policy pages, social proof, or company footprint
Of course, many legitimate startups show some of these signals at first. The more red flags you spot together, though, the higher the risk.
Red Flags You Can Automatically Check
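- Domain Age: Was the website registered in the last few months? Scam sites are very often brand new.
- WHOIS Privacy: When a domain owner hides behind a privacy service (such as WhoisGuard or Domains By Proxy), you can’t verify who runs the site. Many registrars now redact owner details by default, so this is a weak signal on its own.
- No MX Record: Legitimate businesses usually receive email at their own domain, which normally means publishing MX records in DNS. Scam sites often don’t bother.
- HTTPS/SSL: No HTTPS, or an expired or invalid certificate, is a serious trust problem. The reverse doesn’t hold: free certificates are easy to get, so a working padlock proves very little.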
- Suspicious On-Page Content: Language like “70% off today only!” and generic “secure checkout” badges are classic scam tactics.
- Missing or Fake Contact/Policy Pages: If there’s no easy way to reach out, or refund and privacy policies are missing or copy-pasted, beware.
No single signal proves a domain is a scam, but several together raise the odds considerably.
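The HTTPS/SSL flag above lumps together a missing certificate and one that is about to lapse. Python’s standard `ssl` module can read a certificate’s expiry date directly. This is a minimal sketch of an optional extra check, not part of the script in this guide; `cert_not_after` and `days_until_expiry` are hypothetical helper names, and it assumes the site serves HTTPS on port 443.

```python
import socket
import ssl
import time


def cert_not_after(host, port=443, timeout=10):
    """Return the server certificate's 'notAfter' string, e.g. 'Jan  5 09:34:43 2018 GMT'.

    The default context verifies the certificate, so an expired or untrusted
    certificate raises ssl.SSLCertVerificationError instead of returning a value.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Whole days from `now` (epoch seconds) until notAfter; negative once expired."""
    expires = ssl.cert_time_to_seconds(not_after)  # parses the GMT timestamp
    now = time.time() if now is None else now
    return int((expires - now) // 86400)


if __name__ == "__main__":
    # Network call; replace example.com with the host you want to inspect.
    print(days_until_expiry(cert_not_after("example.com")))
```

An already-expired certificate never reaches `days_until_expiry`: the handshake itself fails, and that failure is the red flag. Short validity alone isn’t suspicious either, since free authorities such as Let’s Encrypt issue 90-day certificates by design.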
What you’ll need:
- Python 3.8 or newer (current releases of requests and dnspython no longer support 3.7)
- Required libraries: python-whois, requests, beautifulsoup4, dnspython, tldextract
- Install them with: pip install python-whois requests beautifulsoup4 dnspython tldextract
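Three of these packages import under a different name than the one you install: `python-whois` provides the `whois` module, `beautifulsoup4` is imported as `bs4`, and `dnspython` as `dns`. A small pre-flight check like this sketch can save you from a confusing `ModuleNotFoundError` halfway through a scan; `missing_packages` is a hypothetical helper, not part of the script.

```python
import importlib.util

# pip package name -> module name the script actually imports
REQUIRED = {
    "python-whois": "whois",
    "requests": "requests",
    "beautifulsoup4": "bs4",
    "dnspython": "dns",
    "tldextract": "tldextract",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose top-level module cannot be found."""
    return [pkg for pkg, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All dependencies found.")
```

`find_spec` only confirms that *some* module with that name is installed. A separate PyPI package named `whois` also installs a `whois` module with a different API, so if the script fails on `whois.whois(...)`, check which one you have.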
The Python Code Explained
Below is a script that does the following:
- Fetches WHOIS info to check domain age and privacy
- Checks DNS records for email (MX)
- Fetches the homepage over HTTPS, falling back to HTTP if HTTPS fails
- Scans the page text for warning signs (flash sales, missing contact or refund details, unlinked trust badges)
- Combines evidence into a risk score and verdict
To use it, pass the domain you want to check as a command-line argument.
```python
import re
import json
import sys
from datetime import datetime, timezone

import dns.resolver
import requests
import tldextract
import whois  # installed by the python-whois package
from bs4 import BeautifulSoup

HEADERS = {"User-Agent": "Mozilla/5.0 (DomainRisk/0.1)"}
TIMEOUT = 10  # seconds per HTTP request


def domain_age_days(w):
    """Days since the WHOIS creation date, or None if no usable date exists."""
    created = w.get("creation_date")
    # Some registries return a list of dates; use the first.
    if isinstance(created, list):
        created = created[0] if created else None
    if not isinstance(created, datetime):
        return None
    # Assume naive timestamps are UTC so they can be compared with now().
    if created.tzinfo is None:
        created = created.replace(tzinfo=timezone.utc)
    return (datetime.now(timezone.utc) - created).days


def whois_privacy(w):
    """True if the registrar, org, or name fields mention a privacy/redaction service."""
    text = " ".join(str(w.get(k, "")).lower() for k in ["registrar", "org", "name"])
    return any(t in text for t in ["privacy", "proxy", "whoisguard", "redacted", "withheld"])


def resolve_dns(domain):
    """Collect A and MX records; a failed lookup leaves that list empty."""
    out = {"A": [], "MX": []}
    for rtype in out:
        try:
            out[rtype] = [r.to_text() for r in dns.resolver.resolve(domain, rtype)]
        except Exception:
            pass  # NXDOMAIN, NoAnswer, timeout, ...
    return out


def fetch(url):
    """Return the page HTML, or None on an error status or any request/TLS failure."""
    try:
        r = requests.get(url, headers=HEADERS, timeout=TIMEOUT)
        if 200 <= r.status_code < 400:
            return r.text
    except requests.RequestException:
        pass  # connection errors, timeouts, invalid or expired certificates
    return None


def text_signals(html):
    """Keyword and regex checks on the lower-cased visible page text."""
    soup = BeautifulSoup(html, "html.parser")
    text = re.sub(r"\s+", " ", soup.get_text(" ").lower())
    signals = {
        "aggressive_discounts": bool(re.search(r"\b(\d{2,3})% off\b|flash sale|limited time", text)),
        "no_contact_info": not any(k in text for k in ["contact us", "email", "phone", "address"]),
        "no_returns_policy": not any(k in text for k in ["refund", "returns", "return policy"]),
    }
    # "Trust" badge images that don't link anywhere can't be verified by the visitor.
    trust_imgs = [img for img in soup.find_all("img", alt=True) if "trust" in img.get("alt", "").lower()]
    signals["trust_badges_unverified"] = any(img.parent.name != "a" for img in trust_imgs)
    return signals


def risk_score(signals):
    """Sum the weights of every signal that fired."""
    weights = {
        "domain_very_new": 20,
        "whois_privacy": 5,
        "no_mx": 4,
        "no_https": 8,
        "aggressive_discounts": 10,
        "no_contact_info": 10,
        "no_returns_policy": 8,
        "trust_badges_unverified": 8,
    }
    return sum(weights[k] for k, v in signals.items() if v and k in weights)


def analyze(domain):
    ext = tldextract.extract(domain)
    # WHOIS and MX belong to the registrable domain (e.g. example.co.uk);
    # the homepage fetch uses the full host that was given (e.g. shop.example.co.uk).
    registered = ".".join(p for p in [ext.domain, ext.suffix] if p)
    host = f"{ext.subdomain}.{registered}" if ext.subdomain else registered

    try:
        w = whois.whois(registered) or {}
    except Exception:
        w = {}  # unregistered domain, unsupported TLD, or network error
    age = domain_age_days(w)
    dns_records = resolve_dns(registered)  # named so it doesn't shadow the dns module

    # Fetch over HTTPS once; fall back to HTTP only to get content to scan.
    html = fetch(f"https://{host}")
    https_ok = html is not None
    if html is None:
        html = fetch(f"http://{host}")

    signals = {
        # An unknown age counts as "very new", erring on the side of caution.
        "domain_very_new": age is None or age < 90,
        "whois_privacy": whois_privacy(w),
        "no_mx": not dns_records["MX"],
        "no_https": not https_ok,
    }
    if html:
        signals.update(text_signals(html))

    score = risk_score(signals)
    band = "High Risk" if score >= 50 else ("Moderate Risk" if score >= 30 else "Lower Risk")
    return {
        "domain": host,
        "age_days": age,
        "dns": dns_records,
        "signals": signals,
        "risk_score": score,
        "risk_band": band,
    }


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python scan.py <domain>")
        raise SystemExit(1)
    print(json.dumps(analyze(sys.argv[1]), indent=2))
```
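The content checks in `text_signals` are plain keyword and regular-expression tests on lower-cased page text, so it helps to see exactly what they catch. This sketch pulls out the same discount pattern the script uses, wraps it in a hypothetical `has_aggressive_discount` helper, and runs it on a few made-up phrases.

```python
import re

# Same pattern the script uses for "aggressive_discounts".
DISCOUNT_RE = re.compile(r"\b(\d{2,3})% off\b|flash sale|limited time")


def has_aggressive_discount(text):
    """True for a two- or three-digit '% off' claim, 'flash sale', or 'limited time'."""
    return bool(DISCOUNT_RE.search(text.lower()))


samples = [
    "70% OFF today only!",           # matches: two-digit percentage
    "Flash Sale ends at midnight",   # matches: 'flash sale'
    "Save 5% off your first order",  # no match: single-digit percentage
    "Free returns within 30 days",   # no match: no '% off'
]
for s in samples:
    print(has_aggressive_discount(s), s)
```

Only two- or three-digit percentages match, so "5% off" slips through. Phrases like "limited time" also appear on plenty of legitimate stores, so treat this signal as weak on its own.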
Running the Script
- Save your script as scan.py.
- Open your terminal.
- Type: pip install python-whois requests beautifulsoup4 dnspython tldextract
- To test a domain, type: python scan.py example.com
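The script prints a JSON report with:
- The domain checked and its age in days (null if WHOIS returned no creation date)
- DNS records (A and MX), showing whether the domain publishes mail records
- The individual signals: aggressive discounts, missing contact or returns information, and unverified trust badges appear only if the homepage could be fetched
- A risk_score and a risk_band: High Risk (50 or more), Moderate Risk (30 to 49), or Lower Risk (below 30)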
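A single number hides which checks drove it, so when reviewing a result it helps to list the signals that fired next to their weights. This sketch does that for a report in the script’s JSON format. `WEIGHTS` is copied by hand from `risk_score()`, `explain` is a hypothetical helper, and the sample report is invented for illustration. For a real scan, save the output with `python scan.py example.com > report.json` and load it with `json.load`.

```python
# Weights copied from risk_score() in scan.py; keep the two in sync if you tune them.
WEIGHTS = {
    "domain_very_new": 20,
    "whois_privacy": 5,
    "no_mx": 4,
    "no_https": 8,
    "aggressive_discounts": 10,
    "no_contact_info": 10,
    "no_returns_policy": 8,
    "trust_badges_unverified": 8,
}


def explain(report):
    """Return (signal, weight) pairs that fired, heaviest first."""
    fired = [(k, WEIGHTS[k]) for k, v in report["signals"].items() if v and k in WEIGHTS]
    return sorted(fired, key=lambda kv: kv[1], reverse=True)


# Illustrative report shaped like scan.py's output (values are made up).
report = {
    "domain": "example-shop.test",
    "signals": {
        "domain_very_new": True,
        "whois_privacy": True,
        "no_mx": False,
        "no_https": False,
        "aggressive_discounts": True,
        "no_contact_info": False,
        "no_returns_policy": True,
        "trust_badges_unverified": False,
    },
}

for name, weight in explain(report):
    print(f"{weight:>3}  {name}")
print("total:", sum(w for _, w in explain(report)))
```

For this made-up report the total is 20 + 10 + 8 + 5 = 43, which falls in the Moderate Risk band. Nearly half of it comes from domain age alone, so for a shop you know launched recently, the missing returns policy is what deserves a manual look.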
Real World Example
Want to see this in action? Here’s a review where an automatic “robot” check was part of catching a likely scam: CKlinen.com: A Scam Fashion Store Review
Conclusion
With a little Python and the right checks, you can screen a suspicious domain in seconds and see which red flags it raises before you hand over money or personal details. Stay curious, share what works, and help keep friends and family safe when they shop online.
If you have suggestions or want to contribute improvements to this script, leave a comment or open a GitHub gist—every bit helps in the fight against online fraud!