Authors: Fatimo Adenike Adeniya (York St John University, London Campus, London, United Kingdom)
Abstract: Cyberattacks on e-commerce platforms have grown in sophistication, threatening consumer trust and operational continuity. This research presents a hybrid analytical framework that integrates statistical modelling and machine learning for detecting and forecasting cyberattack patterns in the e-commerce domain. Using the VERIS Community Database (VCDB) dataset, the study applies Auto ARIMA for temporal forecasting and significance testing, including a Mann-Whitney U test (U = 2579981.5, p = 0.0121), which confirmed that holiday shopping events experienced significantly more severe cyberattacks than non-holiday periods. ANOVA was also used to examine seasonal variation in threat severity, while ensemble machine learning models (XGBoost, LightGBM, and CatBoost) were employed for predictive classification. Results reveal recurrent attack spikes during high-risk periods such as Black Friday and holiday seasons, with breaches involving Personally Identifiable Information (PII) exhibiting elevated threat indicators. Among the models, CatBoost achieved the highest performance (accuracy = 85.29%, F1 score = 0.2254, ROC AUC = 0.8247). The framework uniquely combines seasonal forecasting with interpretable ensemble learning, enabling temporal risk anticipation and breach-type classification. Ethical considerations, including responsible use of sensitive data and bias assessment, were incorporated. Despite class imbalance and reliance on historical data, the study provides insights for proactive cybersecurity resource allocation and outlines directions for future real-time threat detection research.
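The holiday versus non-holiday comparison in the abstract relies on a Mann-Whitney U test. The following is a minimal sketch of how that statistic can be computed, not the author's code: the function name `mann_whitney_u` and the severity scores are hypothetical, and the p-value uses a plain normal approximation without tie or continuity correction, so it will differ slightly from what a statistics library reports.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value via normal approximation)."""
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)

    # Assign 1-based ranks; tied values share the average of their ranks.
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2
        for k in range(i, j):
            ranks[k] = average_rank
        i = j

    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2

    # Normal approximation (assumption: no tie or continuity correction).
    mean_u = n1 * n2 / 2
    std_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / std_u
    p_value = erfc(abs(z) / sqrt(2))
    return u_x, p_value


# Hypothetical severity scores, for illustration only (not VCDB data).
holiday = [3, 4, 5]
non_holiday = [1, 2, 2]
u_stat, p = mann_whitney_u(holiday, non_holiday)
print(u_stat, p)
```

Because the test compares ranks rather than means, it does not assume normally distributed severity scores, which suits skewed breach data.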
| Comments: | 32 pages, 9 figures, 6 tables; MSc Research Dissertation, York St John University, London Campus |
| Subjects: | Cryptography and Security (cs.CR); Machine Learning (cs.LG) |
| MSC classes: | 68M25, 68T05 |
| ACM classes: | C.2.0; K.6.5; I.2.6 |
| Cite as: | arXiv:2511.03020 [cs.CR] |
| | (or arXiv:2511.03020v1 [cs.CR] for this version) |
| | https://doi.org/10.48550/arXiv.2511.03020 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Fatimo Adenike Adeniya
[v1] Tue, 4 Nov 2025 21:38:59 UTC (2,820 KB)