Dark Web Digger: Modular Scraping for Dark Web Intel
08-15, 11:00–11:50 (US/Eastern), Marillac Auditorium

Dark web forums are a major resource for the hacking community and play a large role in spreading information, data leaks, tools, services, and the transactions around them. Users commonly keep similar usernames and identifiers across forums to maintain their credibility, but they often need to create new accounts or change existing ones. The prototype presented here aims to link anonymized accounts belonging to the same user, since those accounts are likely to share language usage, post content, tools, and tactics, techniques, and procedures (TTPs). The presenters developed a modular web scraper that extracts data from forums and stores it for analysis. They also explore future opportunities to use machine learning to automate and enhance cyber threat intelligence (CTI) analysis. These include using natural language processing (NLP) to digitally fingerprint users based on their linguistic patterns, detecting trends across users and forums, and building a chatbot to help the tool’s users find specific information. Together, these capabilities give analysts a holistic view of how users interact on these forums and make the tool more functional and versatile.

Samantha Stortz is a sophomore at Marist University majoring in cybersecurity. She is currently a first-year research intern on the dark web scraping project, where she researches machine learning. Samantha is expected to graduate in the spring of 2027 and aspires to pursue a career in the cybersecurity field.
LinkedIn: samantha-stortz-s33

Dominick Foti is a professor of cybersecurity at Marist University. Dominick began his career in 2014 developing cybersecurity intelligence for the Department of Homeland Security. Since then, he has held roles with large corporations such as PricewaterhouseCoopers and Advance Publications, advising Fortune 500 companies on cybersecurity strategy, risk, vulnerability management, threat intelligence, application security, and incident response.
