HAsh_Scanner — ML-Enhanced Phishing Detection System
I developed HAsh_Scanner, a web application that checks URLs in real time to detect phishing websites using machine learning and security checks.
Overview: Phishing attacks are one of the most common online threats, often using fake websites to steal passwords, bank details, or personal information. HAsh_Scanner is designed to help users verify links before visiting them, providing a clear risk assessment rather than a simple safe/unsafe label.
How It Works:
When a URL is submitted, the system performs multiple checks simultaneously.
URL Analysis: Examines subdomains, special characters, unusual encoding, and URLs that point to a raw IP address instead of a domain name, flagging patterns common in phishing links.
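The URL checks can be illustrated with a few lexical features. The sketch below is an assumption about how such features might be computed with Python's standard library, not the project's actual code. The function name `extract_url_features` and the feature names are hypothetical, and the subdomain count naively treats the last two host labels as the registered domain (a production version would consult the Public Suffix List).

```python
import ipaddress
import re
from urllib.parse import urlsplit


def extract_url_features(url: str) -> dict:
    """Compute simple lexical features of a URL (hypothetical feature set)."""
    parts = urlsplit(url)
    host = parts.hostname or ""

    # Does the host use a raw IP address instead of a domain name?
    try:
        ipaddress.ip_address(host)
        uses_ip = True
    except ValueError:
        uses_ip = False

    labels = host.split(".") if host and not uses_ip else []
    return {
        "uses_ip_host": uses_ip,
        # Labels beyond "name.tld" are counted as subdomains (naive rule that
        # ignores multi-part suffixes such as .co.uk).
        "subdomain_count": max(len(labels) - 2, 0),
        # An "@" makes browsers ignore everything before it in the authority.
        "has_at_symbol": "@" in url,
        "hyphens_in_host": host.count("-"),
        # Percent-encoded sequences such as %2F or %40.
        "percent_encoded": len(re.findall(r"%[0-9A-Fa-f]{2}", url)),
        "url_length": len(url),
    }
```

For example, `http://paypal.com@evil.example/` is parsed with `evil.example` as the real host, which is why an `@` in a URL is a useful warning sign.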
Domain Intelligence: Evaluates domain age, registration details, SSL certificate validity, and whether the domain uses a high-risk top-level domain (TLD). Newly registered domains and unusual registrars are flagged.
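WHOIS lookups and certificate checks need network access, so the sketch below assumes the domain's creation date has already been retrieved and only shows how age and TLD might be turned into explainable flags. The TLD list, the 30- and 180-day thresholds, and the function name `domain_flags` are assumptions for illustration.

```python
from datetime import date
from typing import List, Optional

# Illustrative list only (an assumption); a real deployment would maintain
# its own, regularly reviewed list of high-risk TLDs.
HIGH_RISK_TLDS = {"zip", "mov", "xyz", "top"}

# Hypothetical age thresholds, in days.
VERY_NEW_DAYS = 30
RECENT_DAYS = 180


def domain_flags(hostname: str, created: Optional[date], today: date) -> List[str]:
    """Return human-readable reasons why a domain looks risky."""
    flags = []
    tld = hostname.rstrip(".").rsplit(".", 1)[-1].lower()
    if tld in HIGH_RISK_TLDS:
        flags.append(f"high-risk TLD .{tld}")
    if created is None:
        # WHOIS data can be missing or redacted; treat that as a weak signal.
        flags.append("registration date unavailable")
    else:
        age_days = (today - created).days
        if age_days < VERY_NEW_DAYS:
            flags.append(f"very new domain ({age_days} days old)")
        elif age_days < RECENT_DAYS:
            flags.append(f"recently registered domain ({age_days} days old)")
    return flags
```

Returning reasons as text rather than a bare number keeps each flag explainable to the user.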
Content Inspection: Searches the page for phishing keywords (verify, update, suspend, login) and for page structures common on phishing sites.
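Keyword and structure checks can be sketched with Python's built-in `html.parser`. This assumes the page HTML has already been fetched. The keyword list comes from the description above; treating password fields and forms that submit to absolute URLs as the "phishing page structures" is an assumption, as are the names `FormScanner` and `inspect_content`.

```python
import re
from html.parser import HTMLParser
from typing import List

# Keywords listed in the description above.
PHISHING_KEYWORDS = ("verify", "update", "suspend", "login")


class FormScanner(HTMLParser):
    """Count password inputs and collect form actions in an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.password_fields = 0
        self.form_actions: List[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values may be None, e.g. <input disabled>
        if tag == "input" and (attrs.get("type") or "").lower() == "password":
            self.password_fields += 1
        elif tag == "form":
            self.form_actions.append(attrs.get("action") or "")


def inspect_content(html: str) -> dict:
    # Whole-word keyword matches anywhere in the markup (case-insensitive).
    lowered = html.lower()
    keywords = [kw for kw in PHISHING_KEYWORDS if re.search(rf"\b{kw}\b", lowered)]

    scanner = FormScanner()
    scanner.feed(html)
    scanner.close()
    # Forms posting to an absolute URL may send credentials to another site.
    external = [a for a in scanner.form_actions if a.lower().startswith(("http://", "https://"))]
    return {
        "keywords": keywords,
        "password_fields": scanner.password_fields,
        "external_form_actions": external,
    }
```

Whole-word matching means "suspension" does not count as "suspend"; whether that is desirable is a tuning decision.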
Machine Learning: A model trained on more than 156,000 real-world phishing URLs identifies brand impersonation, suspicious paths, and subdomain abuse.
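How the model works is not described here, so the sketch below is not that model. It is a simple rule-based stand-in for one signal the model is said to detect, brand impersonation: it flags URLs whose host or path mentions a known brand while the registered domain belongs to someone else. The brand list, the naive last-two-labels domain rule, and the function names are all hypothetical.

```python
from typing import Dict, List, Set
from urllib.parse import urlsplit

# Hypothetical brand list: keyword -> domains the brand legitimately uses.
BRAND_DOMAINS: Dict[str, Set[str]] = {
    "paypal": {"paypal.com"},
    "apple": {"apple.com", "icloud.com"},
    "microsoft": {"microsoft.com", "live.com", "office.com"},
}


def registered_domain(host: str) -> str:
    # Naive: keep the last two labels. Real code should use the Public Suffix List.
    return ".".join(host.split(".")[-2:])


def impersonated_brands(url: str) -> List[str]:
    """Brands mentioned in the host or path of a domain they do not own."""
    parts = urlsplit(url.lower())
    host = parts.hostname or ""
    owner = registered_domain(host)
    hits = []
    for brand, domains in BRAND_DOMAINS.items():
        if owner in domains:
            continue  # the URL is on one of this brand's own domains
        if brand in host or brand in parts.path:
            hits.append(brand)
    return hits
```

Substring matching like this is crude (it would also flag a domain such as `pineapple.example` for "apple"), which is one reason a learned model can outperform hand-written rules.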
Risk Scoring: Aggregates all findings into a single risk score from 0 to 100, giving users a more nuanced view of potential threats than a yes/no verdict.
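One plausible way to aggregate findings into a 0 to 100 score while keeping results explainable is a capped weighted sum that records why each point was added. The signal names, weights, and low/medium/high bands below are assumptions, not the project's actual values; the sketch assumes each check has already been reduced to a boolean signal.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical weights for boolean signals produced by the individual checks.
WEIGHTS: Dict[str, int] = {
    "ip_host": 25,
    "at_symbol": 20,
    "new_domain": 25,
    "high_risk_tld": 10,
    "brand_impersonation": 30,
    "offsite_password_form": 30,
}


@dataclass
class RiskReport:
    score: int = 0
    reasons: List[str] = field(default_factory=list)

    def add(self, points: int, reason: str) -> None:
        # Record every contributing reason, but never exceed 100.
        self.reasons.append(f"+{points} {reason}")
        self.score = min(100, self.score + points)

    @property
    def band(self) -> str:
        # Hypothetical cut-offs for a human-readable label.
        if self.score < 30:
            return "low"
        if self.score < 70:
            return "medium"
        return "high"


def score_signals(signals: Dict[str, bool]) -> RiskReport:
    report = RiskReport()
    for name, weight in WEIGHTS.items():
        if signals.get(name):
            report.add(weight, name.replace("_", " "))
    return report
```

Capping at 100 keeps the scale readable, and the `reasons` list is what lets a user see why a URL received its score.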
Validation and Testing:
Tested on 51 legitimate websites, including major brands and banks: 0 false positives and an average risk score of 1.2 out of 100.
Tested on 20 known phishing patterns: 70% detection rate (14 of 20), with the highest scores assigned to the most dangerous sites.
These results reflect a deliberate trade-off: the system keeps false positives to a minimum so that users can trust its warnings.
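The reported figures can be restated as rates, and because the test sets are small it helps to attach a confidence interval to each. The sketch below uses the 95% Wilson score interval, a standard method for binomial proportions; the counts are the ones reported above (14 of 20 is 70%), and the function name is illustrative. With these counts, zero false positives on 51 sites is still consistent with a true false-positive rate of up to about 7%, and the 70% detection rate has an interval of roughly 48% to 85%, so larger test sets would make both numbers more conclusive.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives 95%)."""
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - margin), min(1.0, centre + margin)


# Counts from the validation results above.
legit_total, false_positives = 51, 0
phish_total, detected = 20, 14  # 70% of 20

fp_rate = false_positives / legit_total
detection_rate = detected / phish_total
fp_low, fp_high = wilson_interval(false_positives, legit_total)
det_low, det_high = wilson_interval(detected, phish_total)

print(f"False-positive rate: {fp_rate:.0%} (95% CI {fp_low:.1%} to {fp_high:.1%})")
print(f"Detection rate: {detection_rate:.0%} (95% CI {det_low:.1%} to {det_high:.1%})")
```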
Platform Security:
Custom middleware to handle requests securely.
Rate limiting (15 requests per minute, 100 per hour) and bot detection to prevent abuse.
Content Security Policy (CSP) headers plus XSS and clickjacking protections.
HTTPS-only deployment, so all traffic between browser and server is encrypted.
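The middleware itself is not shown in this write-up. A minimal sketch of the rate-limiting and header parts, assuming a plain WSGI wrapper, might look like the following; the class name `SecurityMiddleware`, the header values, and the in-memory storage are assumptions, and bot detection is not covered. Because a Flask app is a WSGI application, a wrapper like this could be attached with `app.wsgi_app = SecurityMiddleware(app.wsgi_app)`.

```python
import time
from collections import defaultdict, deque

# Limits from the description: 15 requests per minute and 100 per hour.
LIMITS = ((60, 15), (3600, 100))  # (window in seconds, max requests)
LONGEST_WINDOW = max(window for window, _ in LIMITS)

# Example header values (assumptions); a real policy depends on the site's assets.
SECURITY_HEADERS = [
    ("Content-Security-Policy", "default-src 'self'; frame-ancestors 'none'"),
    ("X-Frame-Options", "DENY"),  # clickjacking protection for older browsers
    ("X-Content-Type-Options", "nosniff"),
    ("Strict-Transport-Security", "max-age=31536000; includeSubDomains"),
]


class SecurityMiddleware:
    """WSGI middleware: sliding-window rate limiting plus security headers."""

    def __init__(self, app, clock=time.monotonic):
        self.app = app
        self.clock = clock
        # Client address -> timestamps of its accepted requests.
        self.history = defaultdict(deque)

    def _allowed(self, client: str) -> bool:
        now = self.clock()
        hits = self.history[client]
        # Forget timestamps older than the longest window.
        while hits and now - hits[0] >= LONGEST_WINDOW:
            hits.popleft()
        for window, limit in LIMITS:
            if sum(1 for t in hits if now - t < window) >= limit:
                return False
        hits.append(now)
        return True

    def __call__(self, environ, start_response):
        def start_with_headers(status, headers, exc_info=None):
            # Append the security headers to every response.
            return start_response(status, list(headers) + SECURITY_HEADERS, exc_info)

        client = environ.get("REMOTE_ADDR", "unknown")
        if not self._allowed(client):
            start_with_headers("429 Too Many Requests", [("Content-Type", "text/plain")])
            return [b"Rate limit exceeded\n"]
        return self.app(environ, start_with_headers)
```

Two caveats apply to this design. The counters live in one process's memory, so under Gunicorn with several workers each worker enforces its own limits, and a shared store would be needed for exact global limits. Behind a hosting proxy, `REMOTE_ADDR` is usually the proxy's address, so the client IP would have to come from a trusted forwarding header, for example via Werkzeug's `ProxyFix`.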
Technology Stack:
Backend: Python, Flask, Gunicorn
Machine Learning: Pattern recognition and heuristic analysis
Frontend: HTML5 / CSS3, responsive design
Deployment: Render with auto-deploy from GitHub, global CDN
Design Principles:
Minimize false positives
Explainable results rather than black-box decisions
Privacy-first approach: no URL storage or user tracking
Built for practical use in real-world scenarios, not only lab testing
Live Demo: https://hash-scanner-1.onrender.com
I welcome feedback from my mentors, developers, and cybersecurity professionals to improve the system and expand its capabilities.
Note: I am still a student and learning, so I may make mistakes. Your feedback is appreciated.