Linking Exploits from the Dark Web to Known Vulnerabilities for Proactive Cyber Threat Intelligence:

An Attention-based Deep Structured Semantic Model

BEFORE ANYTHING ELSE

IS THIS A CYBERSECURITY ARTICLE?

  • Yes! But no!!!

  • Let's read this article as Computational Design Science!

  • Data collection, the artifact, AND THE TESTING AND VALIDATION OF THAT ARTIFACT!!!

  • WHAT CAN WE LEARN FROM IT FOR OUR OWN THESES/DISSERTATIONS??

  • (Note: This article could also be read from an AI/Neural Networks perspective!!!)

RESEARCH GAPS AND QUESTIONS

  • “organizations need to take a more proactive approach to cybersecurity”
  • “What vulnerabilities do hacker exploits from hacker forums target?”
  • “How can device-level severity scores be calculated that incorporate vulnerability and hacker exploit metadata to facilitate CTI?”
  • “How can attention mechanisms be incorporated into the DSSM to capture and prioritize overlapping features within exploit and vulnerability names to create exploit-vulnerability links?”

Computational Design Science

  • As a specific genre of design science, CDS aims to develop novel computational algorithms and methods to solve business and societal problems with significant impact;

  • “how can we create a system that does X?” Vs. “why does X happen?”

  • Fang, Xiao and Hu, Paul J. and Chau, Michael and Chen, Hsinchun, Computational Design Science: A Critical Information Systems Research Area Contributing to Artificial Intelligence and Data Science (February 01, 2025). Available at SSRN: https://ssrn.com/abstract=5455094 or http://dx.doi.org/10.2139/ssrn.5455094

  • Rai, A. 2017. “Editor’s Comments: Diversity of Design Science Research,” MIS Quarterly, (41:1), pp. iii– xviii.

Proposed Exploit-Vulnerability Linking Framework

    1. Data Collection;
    2. Exploit-Vulnerability Linking and Prioritization (the actual artifact);
    3. Technical Benchmark Experiments;
    4. Case Studies and Expert Evaluation.

Data Collection

  • A web crawler routed through Tor collected and parsed all exploit category, post date, author name, platforms targeted, and exploit description data into a relational database;
Table 3. Summary of Metadata in Hacker Forums that Provide Exploits
Category | Metadata | Description
Description | Exploit Name | Exploit name that defines its function and target
Description | Author Name | Name of hacker who posted
Description | Post Date | Date when exploit was posted
Description | Exploit Category | Major category an exploit belongs to
Operation | Targeted Platform | Specific platform an exploit targets
Operation | Common Vulnerabilities and Exposures (CVE) | Standardized representation of a vulnerability
Operation | Verified Exploit | Verified by community that the exploit is operational
Content | Exploit Description | Natural language explanation of the exploit
Content | Exploit Discussion | Discussions between forum members
Content | Exploit Content | Raw exploit source code

Data Collection

  • “In addition to collecting hacker forum exploits, we also compiled a comprehensive list of vulnerability names, their descriptions, and severity scores from Securityfocus.com, a trusted INFOSEC resource providing vulnerability information for tools such as Nessus, Qualys, and Burp Suite (Mell et al. 2007). The overall collection is summarized in Table 7.”
Table 7. Summary of Vulnerability Information Collection
Risk Level | CVSS Score | Number of Vulnerability Listings | Number Amenable for Text Analytics
Critical | 9.0 – 10.0 | 8,355 | 8,170
High | 7.0 – 8.9 | 24,098 | 23,897
Medium | 4.0 – 6.9 | 28,707 | 28,674
Low | 0.1 – 3.9 | 3,163 | 3,163
Informational | 0.0 – 0.0 | 22,696 | 0
Total | - | 87,019 | 64,104
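The risk bands in Table 7 follow the CVSS qualitative severity scale. A minimal sketch of that mapping, assuming one-decimal base scores and using the table's label "Informational" for a score of 0.0 (the CVSS v3 specification itself calls that band "None"):

```python
def cvss_risk_level(score: float) -> str:
    """Map a CVSS base score (0.0-10.0) to the risk bands used in Table 7."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "Informational"  # CVSS v3 names this band "None"
    if score < 4.0:
        return "Low"            # 0.1 - 3.9
    if score < 7.0:
        return "Medium"         # 4.0 - 6.9
    if score < 9.0:
        return "High"           # 7.0 - 8.9
    return "Critical"           # 9.0 - 10.0


# Severity scores taken from Table 20 of the hospital case study.
for s in (10.0, 8.3, 5.0, 4.3):
    print(s, cvss_risk_level(s))
```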

Exploit-Vulnerability Attention Deep Structured Semantic Model (EVA-DSSM)

  • Deep Structured Semantic Model (DSSM): processes input texts separately until the final embedding comparison. As a result, it cannot capture global relationships across input texts during training to improve overall matching performance.

  • ATTENTION IS ALL YOU NEED: Attention mechanisms can be customized to focus on entire input sequences or portions, depending on the data characteristics and/or network architecture;

  • Instead of always associating a word with the same vector (embedding), the attention layer of the neural network REFINES that vector according to the context of the text. The same word gets different embeddings depending on its position and its “neighbors” in the text.
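A toy illustration of that refinement, using plain scaled dot-product self-attention in pure Python. It is a simplification, not EVA-DSSM's own layers: queries, keys, and values are the raw embeddings themselves (real models learn separate projection matrices), no positional information is added (so here only the neighbors, not the position, change the result), and the 3-dimensional embedding values are made up for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax: weights are positive and sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X):
    """Scaled dot-product self-attention with Q = K = V = X (no learned projections)."""
    d = len(X[0])
    out = []
    for q in X:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)
        # Each output row is a weighted average of all rows, so it depends on the neighbors.
        out.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return out


# Hypothetical static embeddings (illustrative values, not trained vectors).
EMB = {
    "apache":    [1.0, 0.0, 0.5],
    "denial":    [0.0, 1.0, 0.0],
    "service":   [0.2, 0.8, 0.1],
    "injection": [0.9, 0.1, 1.0],
}


def contextual(tokens):
    """Context-refined vectors for a token sequence."""
    return self_attention([EMB[t] for t in tokens])


print(contextual(["apache", "denial", "service"])[0])  # "apache" next to DoS terms
print(contextual(["apache", "injection"])[0])          # same word, different neighbors
```

The vector for "apache" comes out different in the two names even though its static embedding is identical; with no neighbors at all, attention simply returns the static vector.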

EVA-DSSM PROCESS

  • Pre-processing: All exploit and vulnerability names are stemmed, lowercased, and have stop words removed. Implementing these steps normalizes irregularities (e.g., capitalization) and follows common practice for hacker forum analysis;

  • Word Hashing: letter trigrams are extracted from pre-processed text;

  • Bi-LSTM Processing: The standard DSSM uses a bag-of-trigrams representation of input texts and therefore does not capture sequential dependencies within text. Each Bi-LSTM time-step processes one letter trigram, sequentially in both the forward and backward directions;

  • Context Attention Layer (specific to EVA-DSSM; KQV, i.e., keys, queries, and values + scoring + context vector): Operating in this fashion, it captures both the relationships across exploit and vulnerability texts (i.e., global information), through the context vector, and the information within the exploit texts (i.e., local information);

  • Self-Attention Layer: computes the attention weights assigned to each hidden state and summarizes the exploit text information according to both the relationships across exploit and vulnerability texts and the relationships within the exploit texts;
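The pre-processing and word-hashing steps can be sketched as below. This is an illustration under assumptions: the stop-word list is a tiny hypothetical one, stemming is omitted (a full pipeline might use NLTK's `PorterStemmer`), and the `#` word-boundary padding follows the original DSSM word-hashing scheme rather than anything stated on this slide. The trigrams are kept in order because EVA-DSSM feeds them to the Bi-LSTM one time-step at a time.

```python
import re

# Hypothetical, tiny stop-word list; a fuller list (e.g., NLTK's English stop words)
# and a stemmer (e.g., NLTK's PorterStemmer) would be used in practice.
STOP_WORDS = {"a", "an", "the", "of", "and", "for", "to", "in", "on", "with"}


def preprocess(name):
    """Lowercase, split on non-alphanumerics, and drop stop words (stemming omitted)."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def letter_trigrams(token):
    """DSSM-style word hashing: pad with '#' word boundaries, then slide a 3-char window."""
    padded = f"#{token}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def hash_name(name):
    """Ordered letter-trigram sequence for a whole name (one Bi-LSTM time-step each)."""
    return [tri for tok in preprocess(name) for tri in letter_trigrams(tok)]


print(preprocess("PHP Unsupported Version Detection"))
print(hash_name("SSH Weak Algorithms"))
```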

EVA-DSSM PROCESS

  • DNN Processing with Shared Dense Layers: To facilitate embedding similarity calculation, we input both generated embeddings into shared dense layers to project them into the same embedding space;

  • Computing Embedding Similarity: Cosine similarity measures how close the two outputs of the previous layer (the exploit and vulnerability embeddings) are. A softmax over these similarities yields the conditional probability P(E|V); in the training phase, the loss is backpropagated to update network parameters with gradient-based methods;

  • EVA-DSSM was implemented with the Keras, TensorFlow, Natural Language Toolkit (NLTK), NumPy, pandas, gensim, and scikit-learn packages.
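The similarity-and-softmax step can be sketched in the style of the original DSSM: a smoothing factor gamma scales the cosine similarities before the softmax, and the training loss is the negative log-probability of the correct exploit. The value of gamma and the toy 2-D embeddings are assumptions; in practice the candidate set would be the correct exploit plus sampled non-matching ones, and the gradients would come from the DL framework rather than from this hand-written code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors (1 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def posterior(vuln_emb, exploit_embs, gamma=10.0):
    """P(E_i | V): softmax over gamma-scaled cosine similarities to each candidate exploit."""
    scores = [gamma * cosine(vuln_emb, e) for e in exploit_embs]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def nll_loss(probs, positive_index):
    """Loss to backpropagate: -log P(E+ | V) for the correctly linked exploit E+."""
    return -math.log(probs[positive_index])


v = [1.0, 0.0]                                      # toy vulnerability embedding
candidates = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.2]]  # toy exploit embeddings
p = posterior(v, candidates)
print([round(x, 3) for x in p], nll_loss(p, 0))
```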

Device Vulnerability Severity Metric (DVSM)

  • Coupling hacker exploit and vulnerability metadata based on EVA-DSSM’s output to create specialized severity (risk) scores can further create holistic CTI and facilitate enhanced device prioritization capabilities;

  • Device Vulnerability Severity Metric: encompasses the number of vulnerabilities in a device, each vulnerability’s severity, and the hacker exploit age for each vulnerability;

  • A device’s overall score is higher if it has more severe vulnerabilities or newer exploits for vulnerabilities.

Table 8. Features Incorporated into the Device Vulnerability Severity Metric (DVSM)
Feature Category | Feature | Justification for Inclusion | References
Vulnerability | Vulnerability severity (CVSS, 0.0-10.0) | A higher severity score indicates more severe consequences if the device is compromised. | Mell et al. 2007; Weidman 2014; Kennedy et al. 2011
Vulnerability | Number of device vulnerabilities | Devices with more vulnerabilities have a higher exploit susceptibility. |
Hacker Exploit | # of exploits targeting vulnerabilities | More hacker exploits targeting a vulnerability increase the probability of harm to the device. | Friedman 2015; Robertson et al. 2017
Hacker Exploit | Age of hacker exploits (i.e., forum post date) | Newer exploits are more valuable for CTI since there is less time to formulate defenses. | Shackleford 2016
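The exact DVSM equation is not reproduced on these slides, so the sketch below is a hypothetical scoring function that merely combines the four Table 8 features in the stated directions: each vulnerability contributes its CVSS score times the sum of recency weights `1 / (1 + age_in_years)` of the exploits linked to it. More vulnerabilities, higher severity, more exploits, and newer exploits therefore all raise the score. The weighting form and the names `DeviceVulnerability` and `dvsm_sketch` are assumptions, not the paper's formula.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DeviceVulnerability:
    cvss: float                      # vulnerability severity, 0.0-10.0
    exploit_post_dates: list[date]   # post dates of hacker exploits linked to it


def recency_weight(post_date: date, today: date) -> float:
    """Hypothetical decay: 1.0 for a brand-new exploit, shrinking as it ages."""
    age_years = max((today - post_date).days, 0) / 365.25
    return 1.0 / (1.0 + age_years)


def dvsm_sketch(vulns: list[DeviceVulnerability], today: date) -> float:
    """Hypothetical DVSM-style device score (NOT the paper's published formula).

    A vulnerability with no linked exploit contributes 0 in this sketch.
    """
    return sum(
        v.cvss * sum(recency_weight(d, today) for d in v.exploit_post_dates)
        for v in vulns
    )


today = date(2020, 1, 1)
device_a = [DeviceVulnerability(10.0, [date(2019, 1, 1)])]
device_b = [DeviceVulnerability(10.0, [date(2012, 1, 1)])]
print(dvsm_sketch(device_a, today), dvsm_sketch(device_b, today))
```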

Technical Benchmark Experiments and Results

  • Consistent with computational design science principles and DL fundamentals, we evaluated the proposed EVA-DSSM with three technical benchmark experiments: (1) EVA-DSSM vs Conventional Short Text Matching Algorithms, (2) EVA-DSSM vs Deep Learning-based Short Text Matching Algorithms, and (3) EVA-DSSM Sensitivity Analysis;

  • To validate the labels from the dataset, we recruited a security analyst from a well-known international healthcare organization; we used Cohen’s kappa to compute the level of agreement between ratings;

  • In this research, we employed three performance metrics that are commonly used to evaluate DSSMs: Normalized Discounted Cumulative Gain (NDCG); Mean Reciprocal Rank (MRR); and Mean Average Precision (MAP).
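Cohen's kappa corrects raw agreement for the agreement expected by chance. A minimal pure-Python sketch; the binary labels (1 = valid exploit-vulnerability link, 0 = not a valid link) are a hypothetical encoding, not the paper's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two raters labeling the same items.

    p_o is the observed agreement rate; p_e is the agreement expected by chance,
    computed from each rater's own label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty list of items")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:  # both raters used one identical label for every item
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels: 1 = valid exploit-vulnerability link, 0 = not a valid link.
dataset_labels = [1, 1, 0, 0]
analyst_labels = [1, 0, 0, 0]
print(cohens_kappa(dataset_labels, analyst_labels))
```

Here the raters agree on 3 of 4 items (p_o = 0.75), chance agreement is p_e = 0.5, so kappa is 0.5.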

METRICS:

  • MRR is the “Frustration Metric”: If the first relevant result drops from rank 1 to rank 2, the score halves (from 1 to 0.5). If it drops from rank 10 to rank 11, the score barely changes (from 0.1 to about 0.091). It measures the “Instant Gratification” of your retriever.

  • MAP is the “Information Density Metric”: It tells you how “clean” your top results are. If your model returns 10 results but only the 1st and 10th are relevant, MAP penalizes you because the user had to sift through 8 irrelevant items: the average precision is (1/1 + 2/10) / 2 = 0.6 instead of a perfect 1.0.

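  • NDCG is the “Nuance Metric”: It accounts for graded relevance, i.e., the quality of each match, not just whether it is relevant. It uses a logarithmic discount, DCG@k = Σ_{i=1..k} rel_i / log2(i + 1), normalized by the DCG of the ideal ordering, so a “Perfect Match” at rank 4 (discount 1/log2(5) ≈ 0.43) is worth much less than a “Perfect Match” at rank 1 (discount 1).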
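The three metrics can be computed per query (here, per vulnerability, over its ranked list of candidate exploits) and then averaged across queries. A minimal sketch, assuming binary relevance for MRR and MAP (any relevance above 0 counts), graded relevance for NDCG, and the `rel_i / log2(i + 1)` form of DCG (some libraries put `2**rel_i - 1` in the numerator instead). Average precision here divides by the number of relevant items that appear in the list, which assumes every relevant item was retrieved.

```python
import math
from statistics import mean


def reciprocal_rank(relevances):
    """1 / rank of the first relevant item (relevance > 0); 0.0 if none is relevant."""
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def average_precision(relevances):
    """Mean of precision@k over the ranks k that hold a relevant item."""
    hits, precisions = 0, []
    for k, rel in enumerate(relevances, start=1):
        if rel > 0:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0


def ndcg_at_k(relevances, k):
    """DCG@k / ideal DCG@k, with DCG@k = sum of rel_i / log2(i + 1) over ranks i <= k."""
    def dcg(rels):
        return sum(rel / math.log2(i + 1) for i, rel in enumerate(rels[:k], start=1))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Toy relevance lists, one per vulnerability query (ranked candidate exploits).
runs = [[0, 1, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 1], [0, 0, 0, 2]]
print("MRR:", mean(reciprocal_rank(r) for r in runs))
print("MAP:", mean(average_precision(r) for r in runs))
print("NDCG@4:", mean(ndcg_at_k(r, 4) for r in runs))
```

The first two toy lists reproduce the MRR and MAP examples above (0.5 and 0.6).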

Experiment 1 Results: EVA-DSSM vs Conventional Short Text Matching Algorithms

  • EVA-DSSM outperformed non-DL short text matching algorithms in NDCG (at all levels), MRR, and MAP.

  • These results suggest that EVA-DSSM’s attention mechanisms, combined with feed-forward processing, backpropagation, and error correction, enable the model to identify finer-grained linguistic patterns within exploit and vulnerability names that benchmark methods miss.

  • The consistency of these issues across all four datasets indicates that simple matching approaches, while appearing to have some face validity for exploit-vulnerability matching because exploit and vulnerability names share technology names, cannot capture the semantics or context of those technology names, which EVA-DSSM can. (HERE THE AUTHOR SHOWS SOME INTERSECTIONS BETWEEN THE DATASETS).
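To see why plain token overlap has face validity yet misses context, consider a generic word-level Jaccard baseline (an illustrative choice, not necessarily one of the paper's benchmark methods) applied to names that Table 20 of the hospital case study reports as EVA-DSSM's top links:

```python
import re


def tokens(name: str) -> set[str]:
    """Lowercased alphanumeric word tokens of an exploit or vulnerability name."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def jaccard(a: str, b: str) -> float:
    """Size of the token intersection over size of the token union; 0.0 if both are empty."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


print(jaccard("OpenSSL Unsupported",
              "OpenSSL TLS Heartbeat Extension - Memory Disclosure"))
print(jaccard("SSH Weak Algorithms", "OpenSSH attack DoS"))
```

The first pair scores only 1/7, all of it from the shared word "openssl". The second scores 0.0 because "ssh" and "openssh" are different tokens, even though letter trigrams of the two words (`#ss`, `ssh`, `sh#` vs. `#op`, ..., `ssh`, `sh#`) do overlap.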

Experiment 2 Results: EVA-DSSM vs Deep Learning-based Short Text Matching Algorithms

  • In Experiment 2, we evaluated the performance of EVA-DSSM against state-of-the-art DL-based short text matching algorithms;

  • Eleven models were selected for benchmarking; all models were evaluated based on MAP, MRR, and NDCG;

  • EVA-DSSM’s outperformance of both CNN and LSTM-based methods suggests that incorporating attention mechanisms can help capture global relationships and semantics across exploit and vulnerability short texts missed by prevailing approaches.

Experiment 3 Results: EVA-DSSM Sensitivity Analysis

  • The base EVA-DSSM model (letter trigrams, a one-layer Bi-LSTM, two dense layers, and both self-attention and context attention mechanisms) achieved the strongest performance;

  • Letter trigrams create a window large enough to capture key three-letter acronyms that appear in exploit and vulnerability names (e.g., PHP, SSH);

  • In the Bi-LSTM sensitivity analysis, replacing the Bi-LSTM with a unidirectional LSTM (processing in one direction instead of two) degraded performance. This is likely due to how sequential dependencies appear in exploit and vulnerability names;

  • For the dense layers, performance increased with two layers compared to one, but adding a third layer made a negligible difference. However, removing either attention mechanism from EVA-DSSM substantially reduced performance. The drop was most pronounced when removing the context attention, with performance falling by nearly 15% in some cases;

US Hospital and SCADA Case Study Results

  • We used Nessus, a state-of-the-art vulnerability assessment tool, to discover the vulnerabilities of each device without port scanning or payload dropping. Past literature notes that scanning in this fashion avoids adverse events;

  • After identifying vulnerabilities, EVA-DSSM determined the most relevant hacker exploit for each vulnerability;

  • After creating exploit-vulnerability links, we took, for each exploit-vulnerability pair on each device, the exploit’s metadata (post date) and the vulnerability’s metadata (CVSS score), and computed each device’s DVSM score from these data. The resulting DVSM values are ranked in descending order to facilitate vulnerable device prioritization;

  • The exploit-vulnerability linkages identified by EVA-DSSM and the DVSM scores can offer cybersecurity experts an excellent starting point for their mitigation and remediation activities.
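The case-study workflow (link each scanned vulnerability to its top exploit, score each device, sort descending) can be sketched as follows. Everything here is hypothetical scaffolding: `similarity` stands in for the trained EVA-DSSM, the shared-character-trigram count used in the example is only a placeholder, and the device score is a made-up recency-weighted CVSS sum rather than the paper's DVSM equation.

```python
from datetime import date


def shared_trigram_count(a, b):
    """Placeholder similarity (NOT EVA-DSSM): number of shared lowercase character trigrams."""
    def grams(s):
        s = s.lower()
        return {s[i:i + 3] for i in range(len(s) - 2)}
    return len(grams(a) & grams(b))


def top_exploit(vuln_name, exploits, similarity):
    """Most similar (exploit_name, post_date) pair for one vulnerability."""
    return max(exploits, key=lambda ex: similarity(vuln_name, ex[0]))


def device_score(linked, today):
    """Hypothetical score: sum of CVSS / (1 + exploit age in years) over linked pairs."""
    return sum(cvss / (1.0 + max((today - posted).days, 0) / 365.25)
               for cvss, posted in linked)


def prioritize(devices, exploits, similarity, today):
    """devices: {device_id: [(vuln_name, cvss), ...]} -> [(device_id, score)], highest first."""
    ranked = []
    for device_id, vulns in devices.items():
        linked = [(cvss, top_exploit(name, exploits, similarity)[1]) for name, cvss in vulns]
        ranked.append((device_id, device_score(linked, today)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Names, severities, and post dates come from Table 20; device IDs and the date are made up.
exploits = [
    ("OpenSSL TLS Heartbeat Extension - Memory Disclosure", date(2014, 4, 8)),
    ("OpenSSH attack DoS", date(2010, 7, 4)),
]
devices = {
    "device-1": [("SSH Weak Algorithms", 4.3)],
    "device-2": [("OpenSSL Unsupported", 10.0), ("SSH Weak Algorithms", 4.3)],
}
for device_id, score in prioritize(devices, exploits, shared_trigram_count, date(2016, 1, 1)):
    print(device_id, round(score, 2))
```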

US Hospital Case Study Results

  • 344/1,879 (18.31%) of scanned devices have vulnerabilities, and 176 of those have multiple vulnerabilities, while the remaining 168 have only one;
Table 20. Selected Hospital Vulnerabilities and Their Most Relevant Exploits Identified by the EVA-DSSM
Risk Level | Vulnerability Names (Severity) | Top Linked Exploit Name and its Post Date | # of Devices
Critical | “PHP Unsupported Version Detection” (10.0) | “phpshop 2.0 Injection Vulnerability” (1/14/2013) | 11
Critical | “OpenSSL Unsupported” (10.0) | “OpenSSL TLS Heartbeat Extension - Memory Disclosure” (4/8/2014) | 7
Critical | “Unix OS Unsupported Version Detection” (10.0) | “TCP/IP Invisible Userland Unix Backdoor with Reverse Shell” (6/30/2012) | 6
High | “Multiple Apache Vulnerabilities” (8.3) | “Apache 2.4.17 - Denial of Service” (12/18/2015) | 17
Medium | “HTTP TRACE / TRACK Methods Allowed” (5.0) | “traceroute Local Root Exploit” (11/15/2000) | 58
Medium | “SSH Weak Algorithms” (4.3) | “OpenSSH attack DoS” (7/4/2010) | 55

I think that's it!

Thank you very much!!!