Machine Learning–Enhanced Threat Intelligence for Understanding the Underground Cybercrime Market
Keywords:
Cybercrime analytics, illicit actors, data engineering, data warehousing, data science, dark web intelligence, threat profiling.
Abstract
The global expansion of digital networks has enabled cybercriminals to develop complex underground ecosystems that facilitate illegal trade, fraud, and coordinated cyber-attacks. Traditional security mechanisms struggle to accurately identify and profile illicit actors due to the high volume, velocity, and variety of cybercrime data generated across dark web forums, encrypted channels, and distributed threat infrastructures. This research proposes a unified analytics-powered framework integrating data engineering pipelines, data warehousing solutions, and advanced data science models to detect and profile threat actors within the cybercrime ecosystem. A scalable ETL/ELT architecture is designed for collecting and standardizing heterogeneous cyber intelligence sources, while a cloud-based warehouse supports high-performance analytical queries. Machine learning, graph analytics, and clustering methods are applied to uncover behavioral patterns, role hierarchies, and hidden relationships among cybercriminals. Experimental results demonstrate that the proposed system enhances actor identification accuracy, improves anomaly detection rates, and strengthens overall cyber-intelligence capabilities. The study concludes with recommendations for integrating automated pipelines into enterprise security operations.
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license requires users to give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.


