Linking Exploits from the Dark Web to Known Vulnerabilities for Proactive Cyber Threat Intelligence:

An Attention-based Deep Structured Semantic Model

BEFORE ANYTHING ELSE

IS THIS A CYBERSECURITY PAPER?

  • Yes! But also no!!!

  • Let's read this paper as Computational Design Science!

  • Data collection, an artifact, AND THE TESTING AND VALIDATION OF THAT ARTIFACT!!!

  • WHAT CAN WE LEARN FROM IT FOR OUR DISSERTATIONS/THESES??

  • (Note: this paper could also be read from an AI/neural networks perspective!!!)

Computational Design Science

  • As a specific genre of design science, CDS aims to develop novel computational algorithms and methods to solve business and societal problems with significant impact;

  • “how can we create a system that does X?” Vs. “why does X happen?”

  • Fang, Xiao and Hu, Paul J. and Chau, Michael and Chen, Hsinchun, Computational Design Science: A Critical Information Systems Research Area Contributing to Artificial Intelligence and Data Science (February 01, 2025). Available at SSRN: https://ssrn.com/abstract=5455094 or http://dx.doi.org/10.2139/ssrn.5455094

  • Rai, A. 2017. “Editor’s Comments: Diversity of Design Science Research,” MIS Quarterly, (41:1), pp. iii-xviii.

RESEARCH GAPS AND QUESTIONS

  • Lack of Integrated Linking Approaches;
  • Underutilization of Vulnerability Text;
  • Methodological Limitations of DSSM;
  • Absence of Specialized Design Artifacts.

Questions

  • What vulnerabilities do hacker exploits found in Dark Web forums actually target?
  • How can device-level severity scores be calculated by incorporating both vulnerability data and hacker exploit metadata to better facilitate CTI?
  • How can attention mechanisms be integrated into a DSSM to effectively capture and prioritize the overlapping features found within exploit and vulnerability names for more accurate linking?

Proposed Exploit-Vulnerability Linking Framework

    1. Data Collection;
    2. Exploit-Vulnerability Linking and Prioritization (the actual artifact);
    3. Technical Benchmark Experiments;
    4. Case Studies and Expert Evaluation.

Data Collection

  • Two distinct datasets: hacker exploits from the Dark Web and known vulnerabilities from professional security repositories;

  • Hacker Exploit Collection: They used a web crawler routed through the Tor network to gather and parse data into a relational database.

  • The final dataset consisted of 18,052 unique exploits.
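
As a rough illustration of the "parse data into a relational database" step, the sketch below stores one parsed exploit record in SQLite. The table name, the column names, and the author and category values in the sample record are assumptions made for illustration, not the authors' schema or data. The crawling itself is not shown.

```python
import sqlite3

# Hypothetical schema: column names are illustrative, not the authors' design.
SCHEMA = """
CREATE TABLE IF NOT EXISTS exploits (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    author TEXT,
    post_date TEXT,
    category TEXT,
    target_platform TEXT,
    UNIQUE (name, author, post_date)
)
"""


def store_exploits(conn, records):
    """Insert parsed exploit records, skipping exact duplicates."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT OR IGNORE INTO exploits "
        "(name, author, post_date, category, target_platform) "
        "VALUES (:name, :author, :post_date, :category, :target_platform)",
        records,
    )
    conn.commit()


def count_exploits(conn):
    """Return the number of unique exploits currently stored."""
    return conn.execute("SELECT COUNT(*) FROM exploits").fetchone()[0]


conn = sqlite3.connect(":memory:")
sample = {
    "name": "Apache 2.4.17 - Denial of Service",
    "author": "example_author",   # made-up value
    "post_date": "2015-12-18",
    "category": "dos",            # made-up value
    "target_platform": "linux",   # made-up value
}
# The same post seen twice by the crawler is stored only once.
store_exploits(conn, [sample, dict(sample)])
print(count_exploits(conn))
```

The UNIQUE constraint is one simple way to end up with a set of unique exploits; the paper may have deduplicated differently.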

Data Collection

  • “In addition to collecting hacker forum exploits, we also compiled a comprehensive list of vulnerability names, their descriptions, and severity scores from Securityfocus.com, a trusted INFOSEC resource providing vulnerability information for tools such as Nessus, Qualys, and Burp Suite (Mell et al. 2007)”;

  • They removed vulnerabilities classified as social engineering or informational.

  • After filtering, 64,104 vulnerabilities remained.

Exploit-Vulnerability Attention Deep Structured Semantic Model (EVA-DSSM)

  • Deep Structured Semantic Model (DSSM): processes input texts separately until the final embedding comparison. As a result, it cannot capture global relationships across input texts during training to improve overall matching performance.

  • ATTENTION IS ALL YOU NEED: Attention mechanisms can be customized to focus on entire input sequences or portions, depending on the data characteristics and/or network architecture;

  • BIDIRECTIONAL PROCESSING AND ATTENTION MECHANISM TO CAPTURE THE UNIQUE SEQUENTIAL DEPENDENCIES AND SEMANTICS: Instead of always mapping a word to the same vector (embedding), the attention layer of the neural network REFINES that vector according to the context of the text. The same word gets different embeddings depending on its position and its “neighbors” in the corpus.
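
To make the "same word, different embedding" idea concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is a simplification under stated assumptions: queries, keys, and values are the raw input vectors (a real layer would first apply learned projection matrices), and the two-dimensional embeddings are made-up numbers. It is not the EVA-DSSM attention layer itself.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with Q = K = V = the inputs."""
    dim = len(vectors[0])
    contextual = []
    for q in vectors:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in vectors
        ]
        weights = softmax(scores)
        contextual.append(
            [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        )
    return contextual


# Toy static embeddings (made-up numbers, one fixed vector per word).
STATIC = {
    "apache": [1.0, 0.0],
    "denial": [0.0, 1.0],
    "struts": [1.0, 1.0],
}


def contextual_vector(tokens, target):
    """Return the attention-refined vector of `target` inside `tokens`."""
    vectors = [STATIC[t] for t in tokens]
    return self_attention(vectors)[tokens.index(target)]


a = contextual_vector(["apache", "denial"], "apache")
b = contextual_vector(["apache", "struts"], "apache")
print(a)
print(b)
```

Both calls start from the same static vector for "apache", yet the outputs differ because the neighboring word changes the attention weights.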

EVA-DSSM PROCESS

  • Pre-processing: All exploit and vulnerability names are stemmed, lowercased, and have stop words removed. Implementing these steps normalizes irregularities (e.g., capitalization) and follows common practice for hacker forum analysis;

  • Word Hashing: letter trigrams are extracted from pre-processed text;

  • Bi-LSTM Processing: It replaces the standard feed-forward layer with a Bidirectional Long Short-Term Memory (Bi-LSTM) layer to capture sequential dependencies in both forward and backward directions (e.g., recognizing that a system name usually appears before a version number).

  • Dedicated Context Attention Layer (KQV + Scoring + Context Vector): Operating in this fashion captures the relationships across exploit and vulnerability texts (i.e., global information) with the context vector, and information within the exploit texts (i.e., local information);

  • Self-Attention Layer: This component iteratively re-weights the embeddings to prioritize the most important features, further refining the final matching performance;
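
The first two steps above (pre-processing and word hashing) can be sketched in a few lines. This is an illustration under assumptions: the stop-word list is a tiny made-up one, stemming is omitted to keep the example dependency-free, and the `#` boundary markers follow the word hashing of the original DSSM rather than anything confirmed for EVA-DSSM.

```python
import re

# Tiny illustrative stop-word list (an assumption, not the list used in the paper).
STOP_WORDS = {"a", "an", "the", "of", "in", "for", "and", "with"}


def preprocess(name):
    """Lowercase a name, split it into tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9.]+", name.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def letter_trigrams(token):
    """Slide a three-character window over the token padded with '#'."""
    padded = f"#{token}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def hash_name(name):
    """Represent a whole name as the list of its letter trigrams."""
    return [tri for tok in preprocess(name) for tri in letter_trigrams(tok)]


print(letter_trigrams("ssh"))  # ['#ss', 'ssh', 'sh#']

vuln = set(hash_name("SSH Weak Algorithms"))
exploit = set(hash_name("OpenSSH attack DoS"))
print(sorted(vuln & exploit))
```

The two names share no whole word after pre-processing, but they do share letter trigrams, which is the kind of sub-word overlap the later layers can exploit.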

EVA-DSSM PROCESS

  • DNN Processing with Shared Dense Layers: To facilitate embedding similarity calculation, we input both generated embeddings into shared dense layers to project them into the same embedding space;

  • Computing Embedding Similarity: Cosine similarity measures how close the two outputs of the previous layer are. A softmax is used to obtain the conditional probability P(E|V), and during the training phase the loss is backpropagated to update the network parameters using gradient-based methods;

  • EVA-DSSM was implemented with the Keras, TensorFlow, Natural Language Toolkit (NLTK), numpy, pandas, gensim, and scikit-learn packages.
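
The similarity step can be sketched as follows: cosine similarity between one vulnerability embedding and several candidate exploit embeddings, turned into P(E|V) by a softmax. The smoothing factor `gamma` comes from the original DSSM formulation; its value here and the toy three-dimensional embeddings are assumptions, and the negative log-likelihood loss is shown as one common choice, not as the paper's confirmed training objective.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_probabilities(vuln_vec, exploit_vecs, gamma=10.0):
    """Softmax over smoothed cosine similarities: P(E | V) per candidate."""
    scores = [gamma * cosine(vuln_vec, e) for e in exploit_vecs]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def training_loss(vuln_vec, exploit_vecs, positive_index, gamma=10.0):
    """Negative log-likelihood of the exploit known to be relevant."""
    probs = match_probabilities(vuln_vec, exploit_vecs, gamma)
    return -math.log(probs[positive_index])


# Toy embeddings (made-up numbers standing in for the dense-layer outputs).
vuln = [0.9, 0.1, 0.0]
exploits = [[0.8, 0.2, 0.0], [0.0, 0.1, 0.9], [0.1, 0.9, 0.1]]

probs = match_probabilities(vuln, exploits)
print(probs)
print(training_loss(vuln, exploits, positive_index=0))
```

A low loss means the relevant exploit already receives most of the probability mass; backpropagation would push the embeddings in that direction.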

Device Vulnerability Severity Metric (DVSM)

  • Coupling hacker exploit and vulnerability metadata based on EVA-DSSM’s output to create specialized severity (risk) scores can further create holistic CTI and facilitate enhanced device prioritization capabilities;

  • Device Vulnerability Severity Metric: encompasses the number of vulnerabilities in a device, each vulnerability’s severity, and the hacker exploit age for each vulnerability;

  • A device’s overall score is higher if it has more severe vulnerabilities or newer exploits for vulnerabilities.
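
These slides do not reproduce the DVSM formula, so the sketch below is NOT the paper's metric. It is a hypothetical scoring function that merely has the properties listed above: the score grows with the number of vulnerabilities, with their severity, and with the recency of the linked exploit. The logarithmic age discount, the reference date, and the device names are all assumptions.

```python
import math
from datetime import date


def device_severity(links, today):
    """Hypothetical device score (not the paper's DVSM formula).

    `links` is a list of (cvss_score, exploit_post_date) pairs, one per
    vulnerability found on the device and linked to an exploit.
    """
    score = 0.0
    for cvss, exploit_post_date in links:
        age_years = max((today - exploit_post_date).days, 0) / 365.25
        # An exploit posted today counts at full severity; older ones count less.
        score += cvss / math.log2(2.0 + age_years)
    return score


today = date(2016, 1, 1)  # arbitrary reference date for the example
devices = {
    "device_a": [(10.0, date(2014, 4, 8)), (8.3, date(2015, 12, 18))],
    "device_b": [(5.0, date(2000, 11, 15))],
    "device_c": [(10.0, date(2012, 6, 30))],
}

ranked = sorted(
    devices, key=lambda d: device_severity(devices[d], today), reverse=True
)
print(ranked)
```

Sorting in descending order gives the prioritized device list that an analyst would work through first.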

Technical Benchmark Experiments and results

  • Consistent with computational design science principles and DL fundamentals, we evaluated the proposed EVA-DSSM with three technical benchmark experiments: (1) EVA-DSSM vs Conventional Short Text Matching Algorithms, (2) EVA-DSSM vs Deep Learning-based Short Text Matching Algorithms, and (3) EVA-DSSM Sensitivity Analysis;

  • To validate the labels from the dataset, we recruited a security analyst from a well-known, international healthcare organization. We used Cohen’s kappa to compute the level of agreement between ratings;

  • In this research, we employed three performance metrics that are commonly used to evaluate DSSMs: Normalized Discounted Cumulative Gain (NDCG); Mean Reciprocal Rank (MRR); and Mean Average Precision (MAP).
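
For the label validation step, Cohen's kappa compares the observed agreement between two raters with the agreement expected by chance. The sketch below computes it in plain Python; the two rating lists are made-up examples (1 = relevant link, 0 = not relevant), not data from the paper.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: product of each rater's marginal label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)


dataset_labels = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]  # made-up labels
analyst_labels = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]  # made-up labels

kappa = cohens_kappa(dataset_labels, analyst_labels)
print(round(kappa, 3))
```

Here the raters agree on 8 of 10 items, but because half of that agreement could arise by chance, kappa is noticeably lower than 0.8.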

METRICS:

  • MRR is the “Frustration Metric”: If the first relevant result drops from rank 1 to rank 2, the score halves (from \(1\) to \(0.5\)). If it drops from 10 to 11, the score barely changes. It measures the “Instant Gratification” of your retriever.

  • MAP is the “Information Density Metric”: It tells you how “clean” your top results are. If your model returns 10 cases but only the 1st and 10th are relevant, MAP penalizes you heavily because the user had to sift through 8 irrelevant items.

  • NDCG is the “Nuance Metric”: It accounts for the graded quality of each match, not just whether it is relevant. It uses a logarithmic discount, meaning a “Perfect Match” at rank 4 is worth much less than a “Perfect Match” at rank 1.
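
The three metrics can be sketched for a single ranked list as below; MRR and MAP are then the means of the per-query values. Two assumptions to flag: average precision is normalized by the number of relevant items that appear in the list, and DCG uses the linear gain `rel / log2(rank + 1)` (another common variant uses `2**rel - 1` as the gain).

```python
import math


def reciprocal_rank(relevance):
    """1 / rank of the first relevant item, or 0.0 if there is none."""
    for rank, rel in enumerate(relevance, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def average_precision(relevance):
    """Mean of precision@k taken at each rank k holding a relevant item."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def dcg(relevance, k):
    """Discounted cumulative gain over the top-k items."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevance[:k], start=1)
    )


def ndcg(relevance, k):
    """DCG divided by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(relevance, reverse=True), k)
    return dcg(relevance, k) / ideal if ideal else 0.0


def mean(values):
    """Average per-query scores into MRR or MAP."""
    return sum(values) / len(values)


# One relevance list per query, in ranked order (made-up examples).
queries = [[1, 0, 0, 0], [0, 1, 0, 0]]
print(mean([reciprocal_rank(q) for q in queries]))   # MRR
print(average_precision([1, 0, 0, 0, 0, 0, 0, 0, 0, 1]))
print(ndcg([0, 0, 0, 1], k=4))
```

The last two calls mirror the examples in the bullets: relevant items at ranks 1 and 10 only, and a single perfect match pushed down to rank 4.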

Experiment 1 Results: EVA-DSSM vs Conventional Short Text Matching Algorithms

  • EVA-DSSM outperformed non-DL short text matching algorithms in NDCG (at all levels), MRR, and MAP.

  • These results suggest that EVA-DSSM’s attention mechanisms combined with feed-forward, backpropagation, and error correction enable the model to identify finer grained linguistic patterns within exploit and vulnerability names that benchmark methods miss.

  • The consistency of these issues across all four datasets indicates that simple matching approaches, while appearing to have some face validity for exploit-vulnerability matching due to overlapping technology names in exploit and vulnerability names, cannot capture the semantics or context of selected technology names that EVA-DSSM can. (HERE THE PAPER SHOWS SOME OVERLAPS BETWEEN THE DATASETS).

Experiment 2 Results: EVA-DSSM vs Deep Learning-based Short Text Matching Algorithms

  • In Experiment 2, we evaluated the performance of EVA-DSSM against state-of-the-art DL-based short text matching algorithms;

  • Eleven models were selected for benchmarking; all models were evaluated based on MAP, MRR, and NDCG;

  • EVA-DSSM’s outperformance of both CNN and LSTM-based methods suggests that incorporating attention mechanisms can help capture global relationships and semantics across exploit and vulnerability short texts missed by prevailing approaches.

Experiment 3 Results: EVA-DSSM Sensitivity Analysis

  • The base EVA-DSSM model using letter trigrams, one-layer Bi-LSTM, two dense layers, and self-attention and context attention mechanisms achieved the strongest performance;

  • Letter trigrams create a window large enough to capture key three-letter acronyms in exploit and vulnerability names;

  • When considering the Bi-LSTM sensitivity analysis, results indicated that using only the LSTM that processes in a single direction rather than two directions resulted in performance degradation. This is likely due to the nature of how sequential dependencies appear in exploit and vulnerability names;

  • For the dense layers, performance increased with two layers as opposed to one, but the differences were negligible when adding a third layer. However, removing either attention mechanism from the EVA-DSSM substantially reduced performance. The decrease was most pronounced when removing the context attention, where performance dropped by nearly 15% in some cases;

US Hospital and SCADA Case Study Results

  • We used Nessus, a state-of-the-art vulnerability assessment tool, to discover the vulnerabilities of each device without port scanning and payload dropping. Scanning for vulnerabilities in this fashion has been noted in past literature to avoid adverse events;

  • After identifying vulnerabilities, EVA-DSSM determined the most relevant hacker exploit for each vulnerability;

  • After creating exploit-vulnerability links, we used the metadata from the exploit (post date) and vulnerability (CVSS score) for each exploit-vulnerability pair for each device. The DVSM score for each device is computed using these data. The final outputted DVSM values are ranked in descending order to help facilitate vulnerable device prioritization;

  • The exploit-vulnerability linkages identified by EVA-DSSM and the DVSM scores can offer cybersecurity experts an excellent starting point for their mitigation and remediation activities.

US Hospital Case Study Results

  • 344/1,879 (18.31%) of scanned devices have vulnerabilities, and 176 of those have multiple vulnerabilities, while the remaining 168 have only one;

Table 20. Selected Hospital Vulnerabilities and Their Most Relevant Exploits Identified by the EVA-DSSM

| Risk Level | Vulnerability Name (Severity) | Top Linked Exploit Name (Post Date) | # of Devices |
|---|---|---|---|
| Critical | “PHP Unsupported Version Detection” (10.0) | “phpshop 2.0 Injection Vulnerability” (1/14/2013) | 11 |
| Critical | “OpenSSL Unsupported” (10.0) | “OpenSSL TLS Heartbeat Extension - Memory Disclosure” (4/8/2014) | 7 |
| Critical | “Unix OS Unsupported Version Detection” (10.0) | “TCP/IP Invisible Userland Unix Backdoor with Reverse Shell” (6/30/2012) | 6 |
| High | “Multiple Apache Vulnerabilities” (8.3) | “Apache 2.4.17 - Denial of Service” (12/18/2015) | 17 |
| Medium | “HTTP TRACE / TRACK Methods Allowed” (5.0) | “traceroute Local Root Exploit” (11/15/2000) | 58 |
| Medium | “SSH Weak Algorithms” (4.3) | “OpenSSH attack DoS” (7/4/2010) | 55 |

I think that's it!

Thank you very much!!!

APPENDIX: METADATA

Table 3. Summary of Metadata in Hacker Forums that Provide Exploits

| Category | Metadata | Description |
|---|---|---|
| Description | Exploit Name | Exploit name that defines its function and target |
| Description | Author Name | Name of hacker who posted |
| Description | Post Date | Date when exploit was posted |
| Description | Exploit Category | Major category an exploit belongs to |
| Operation | Targeted Platform | Specific platform and exploit targets |
| Operation | Common Vulnerabilities and Exposure (CVE) | Standardized representation of a vulnerability |
| Operation | Verified Exploit | Verified by community that the exploit is operational |
| Content | Exploit Description | Natural language explanation of the exploit |
| Content | Exploit Discussion | Discussions between forum members |
| Content | Exploit Content | Raw exploit source code |

APPENDIX: METADATA 2

Table 7. Summary of Vulnerability Information Collection

| Risk Level | CVSS Score | Number of Vulnerability Listings | Number Amenable for Text Analytics |
|---|---|---|---|
| Critical | 9.0 – 10.0 | 8,355 | 8,170 |
| High | 7.0 – 8.9 | 24,098 | 23,897 |
| Medium | 4.0 – 6.9 | 28,707 | 28,674 |
| Low | 0.1 – 3.9 | 3,163 | 3,163 |
| Informational | 0.0 – 0.0 | 22,696 | 0 |
| Total | - | 87,019 | 64,104 |
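
The risk levels in Table 7 amount to a simple binning of CVSS scores, which can be sketched as below. The function names and the informational sample listing are hypothetical. Dropping informational listings mirrors the table, where none of them is counted as amenable for text analytics.

```python
def risk_level(cvss):
    """Map a CVSS score to the risk levels used in Table 7."""
    if not 0.0 <= cvss <= 10.0:
        raise ValueError("CVSS score must be between 0.0 and 10.0")
    if cvss >= 9.0:
        return "Critical"
    if cvss >= 7.0:
        return "High"
    if cvss >= 4.0:
        return "Medium"
    if cvss > 0.0:
        return "Low"
    return "Informational"


def drop_informational(listings):
    """Keep only (name, cvss) listings that carry an actual severity."""
    return [item for item in listings if risk_level(item[1]) != "Informational"]


listings = [
    ("OpenSSL Unsupported", 10.0),
    ("Multiple Apache Vulnerabilities", 8.3),
    ("SSH Weak Algorithms", 4.3),
    ("Example informational listing", 0.0),  # hypothetical entry
]

for name, cvss in drop_informational(listings):
    print(name, risk_level(cvss))
```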

APPENDIX: METADATA 3

Table 8. Features Incorporated into the Device Vulnerability Severity Metric (DVSM)

| Feature Category | Feature | Justification for Inclusion | References |
|---|---|---|---|
| Vulnerability | Vulnerability severity (CVSS, 0.0-10.0) | A higher severity score indicates more severe consequences if device is compromised. | Mell et al. 2007; Weidman 2014; Kennedy et al. 2011 |
| Vulnerability | Number of device vulnerabilities | Devices with more vulnerabilities have a higher exploit susceptibility. | |
| Hacker Exploit | # of exploits targeting vulnerabilities | More hacker exploits targeting a vulnerability increases the probability of the device’s harm. | Friedman 2015; Robertson et al. 2017 |
| Hacker Exploit | Age of hacker exploits (i.e., forum post date) | Newer exploits are more valuable for CTI since there is less time to formulate defenses. | Shackleford 2016 |