Support me with advice and feedback

HAsh_Scanner — ML-Enhanced Phishing Detection System

I developed HAsh_Scanner, a web application that checks URLs in real time to detect phishing websites using machine learning and security checks.

Overview: Phishing attacks are one of the most common online threats, often using fake websites to steal passwords, bank details, or personal information. HAsh_Scanner is designed to help users verify links before visiting them, providing a clear risk assessment rather than a simple safe/unsafe label.

How It Works:

When a URL is submitted, the system performs multiple checks simultaneously.

URL Analysis: Examines subdomains, special characters, unusual encoding, and IP-based URLs. Detects patterns common in phishing links.
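A minimal sketch of what this kind of lexical URL check could look like, using only the Python standard library. It is an illustration, not the project's actual code: the function name `url_features` and the particular signals it returns are assumptions.

```python
import ipaddress
from urllib.parse import urlsplit


def url_features(url):
    """Extract a few lexical signals from a URL (illustrative, not exhaustive)."""
    parts = urlsplit(url)
    host = parts.hostname or ""

    # A host that parses as an IP address means the URL skips DNS names entirely.
    try:
        ipaddress.ip_address(host)
        is_ip = True
    except ValueError:
        is_ip = False

    labels = host.split(".") if host and not is_ip else []
    return {
        "is_ip_host": is_ip,
        # Naive assumption: the last two labels are the registered domain.
        # This is wrong for suffixes such as .co.uk; a public-suffix list fixes it.
        "subdomain_count": max(len(labels) - 2, 0),
        # "user@host" in the authority part can hide the real destination.
        "has_at_sign": "@" in parts.netloc,
        "percent_encoded_chars": url.count("%"),
        "hyphens_in_host": host.count("-"),
        "url_length": len(url),
    }
```

For example, `url_features("http://192.168.0.1/login")` reports `is_ip_host` as true, and a URL such as `https://secure.login.example.com/` yields a `subdomain_count` of 2.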

Domain Intelligence: Evaluates domain age, registration info, SSL certificate validity, and high-risk TLDs. New domains or unusual registrars are flagged.
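The domain checks could be sketched roughly as follows. This assumes the registration date has already been obtained elsewhere (for example from a WHOIS or RDAP lookup, not shown). The TLD set and the 30-day threshold are hypothetical placeholders, since the post does not state the real values.

```python
from datetime import datetime, timezone

# Hypothetical examples only; the actual high-risk TLD list is not given in the post.
HIGH_RISK_TLDS = {"tk", "ml", "xyz", "top"}
# Assumed cutoff for what counts as a "new" domain.
NEW_DOMAIN_DAYS = 30


def domain_flags(host, created, now=None):
    """Return human-readable warnings for a host and its registration date.

    `created` and `now` must both be timezone-aware datetimes.
    """
    now = now or datetime.now(timezone.utc)
    flags = []

    age_days = (now - created).days
    if age_days < NEW_DOMAIN_DAYS:
        flags.append(f"domain is only {age_days} days old")

    tld = host.rsplit(".", 1)[-1].lower()
    if tld in HIGH_RISK_TLDS:
        flags.append(f"high-risk TLD .{tld}")

    return flags
```

Returning a list of reasons instead of a single boolean keeps the result explainable to the user.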

Content Inspection: Searches for phishing keywords (verify, update, suspend, login) and common phishing page structures.
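One possible way to sketch this step with the standard library is shown below. The keyword list comes from the description above. Treating a password input field as a "phishing page structure" signal is an assumption made for illustration, and the names `_PageScan` and `inspect_html` are hypothetical.

```python
import re
from html.parser import HTMLParser

KEYWORDS = ("verify", "update", "suspend", "login")


class _PageScan(HTMLParser):
    """Collects visible text and counts password input fields."""

    def __init__(self):
        super().__init__()
        self.password_fields = 0
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None (e.g. a bare `type`), hence the `or ""`.
        input_type = (dict(attrs).get("type") or "").lower()
        if tag == "input" and input_type == "password":
            self.password_fields += 1

    def handle_data(self, data):
        self.text_parts.append(data)


def inspect_html(html):
    """Count whole-word keyword hits and password fields in an HTML string."""
    scan = _PageScan()
    scan.feed(html)
    scan.close()
    text = " ".join(scan.text_parts).lower()
    hits = {k: len(re.findall(rf"\b{re.escape(k)}\b", text)) for k in KEYWORDS}
    return {
        "keyword_hits": {k: n for k, n in hits.items() if n},
        "password_fields": scan.password_fields,
    }
```

Keyword hits alone are weak evidence, since legitimate login pages use the same words, so a result like this is best treated as one input to the overall score.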

Machine Learning: The model was trained on over 156,000 real-world phishing URLs and identifies brand impersonation, suspicious paths, and subdomain abuse.

Risk Scoring: Aggregates all findings into a risk score from 0–100 to give users a nuanced view of potential threats.
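To make the aggregation idea concrete, here is a small sketch of a weighted sum clamped to the 0–100 range. The weights and the signal names are hypothetical: the post does not say how the individual checks are combined, so this shows one common approach, not the system's actual formula.

```python
# Hypothetical weights in percentage points; they sum to 100.
WEIGHTS = {"url": 25, "domain": 25, "content": 20, "ml": 30}


def risk_score(signals):
    """Combine per-check scores in [0, 1] into a single 0-100 risk score.

    Missing checks count as 0, and out-of-range values are clamped.
    """
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = min(max(signals.get(name, 0.0), 0.0), 1.0)
        total += weight * value
    return round(total)
```

With these assumed weights, a URL that only trips the URL-analysis check at full strength scores 25, while one that maxes out every check scores 100.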

Validation and Testing:

Tested on 51 legitimate websites, including major brands and banks: 0 false positives, average risk score 1.2.

Tested on 20 known phishing patterns: 70% detection rate (14 of 20), with the highest scores assigned to the most dangerous sites.

The system aims to balance detection accuracy against false positives, since a scanner that flags legitimate sites quickly loses users' trust.
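Because both test sets are small, it can help to report an uncertainty range alongside the raw percentages. The sketch below computes a 95% Wilson score interval for the two results above (0 of 51 legitimate sites flagged, and 14 of 20 phishing patterns detected, where 14 is derived from 70% of 20). The helper name `wilson_interval` is an assumption, and this is offered as a suggestion for reporting, not as part of the existing system.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95%."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb tiny floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)


fp_low, fp_high = wilson_interval(0, 51)     # false-positive rate
det_low, det_high = wilson_interval(14, 20)  # detection rate
print(f"False-positive rate: 0.0% (95% CI {fp_low:.1%} to {fp_high:.1%})")
print(f"Detection rate: 70.0% (95% CI {det_low:.1%} to {det_high:.1%})")
```

The intervals are wide at these sample sizes, which is a good argument for growing both test sets before drawing strong conclusions.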

Platform Security:

Custom middleware to handle requests securely.

Rate limiting (15 requests per minute, 100 per hour) and bot detection to prevent abuse.

Content Security Policy, plus XSS and clickjacking protections.

HTTPS-only deployment ensures encrypted communications.
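As an illustration of the rate limits listed above (15 requests per minute, 100 per hour), here is a minimal in-memory sliding-window limiter. It is a sketch under stated assumptions, not the project's middleware: the class name `RateLimiter`, the per-client key, and the injectable `clock` parameter are all hypothetical.

```python
import time
from collections import defaultdict, deque

# (window in seconds, max requests in that window), taken from the post.
LIMITS = ((60, 15), (3600, 100))


class RateLimiter:
    """Sliding-window limiter that enforces several windows at once."""

    def __init__(self, limits=LIMITS, clock=time.monotonic):
        self.limits = limits
        self.clock = clock  # injectable so tests can control time
        self.history = defaultdict(deque)  # client key -> accepted timestamps
        self.longest = max(window for window, _ in limits)

    def allow(self, client):
        now = self.clock()
        stamps = self.history[client]

        # Forget requests that are older than the longest window.
        while stamps and now - stamps[0] >= self.longest:
            stamps.popleft()

        # Reject if any window is already full.
        for window, max_requests in self.limits:
            recent = sum(1 for t in stamps if now - t < window)
            if recent >= max_requests:
                return False

        # Only accepted requests are recorded.
        stamps.append(now)
        return True
```

In a Flask application, a check like `limiter.allow(client_ip)` could run before each request and return HTTP 429 when it fails. Note that state kept in a Python dictionary is per process, so with several Gunicorn workers each worker would count separately unless the counters live in a shared store.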

Technology Stack:

Backend: Python, Flask, Gunicorn

Machine Learning: Pattern recognition and heuristic analysis

Frontend: HTML5 / CSS3, responsive design

Deployment: Render with auto-deploy from GitHub, global CDN

Design Principles:

Minimize false positives

Explainable results rather than black-box decisions

Privacy-first approach: no URL storage or user tracking

Built for practical use in real-world scenarios, not only lab testing

Live Demo: https://hash-scanner-1.onrender.com

I welcome feedback from my mentors, developers, and cybersecurity professionals to improve the system and expand its capabilities.

Note: I am still a student and learning, so I may make mistakes. Your feedback is appreciated.
