Nikki Scott II
2025-12-06
The issue I want to investigate is cybersecurity threats. The topic caught my interest because of my background, because I want to work in the cybersecurity field in the future, and because it is so relevant right now: new vulnerabilities and exploits keep appearing, and so do new ways to prevent and defend against them.
With this project I want to explore which kinds of attacks are the most common, which are the most effective, and which vulnerabilities remain widespread. There is a lot that could be analyzed, but given this project's structure and requirements I will focus on two main questions that together cover both the red team (attack) and blue team (defense) perspectives.
Questions:
1.) Which kind of vulnerability is the worst one to have?
2.) Which kind of attack appears to be the most effective?
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## Attaching package: 'scales'
## Attaching package: 'janitor'
## Rows: 3000 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): Country, Attack Type, Target Industry, Attack Source, Security Vuln...
## dbl (4): Year, Financial Loss (in Million $), Number of Affected Users, Inci...
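The chunk that loaded the data is not shown in this report. The raw header uses names such as `Financial Loss (in Million $)`, while the later code uses `financial_loss_in_million`, which is what `janitor::clean_names()` would produce. A minimal sketch of that setup step follows; the helper name `load_incidents` and the file path are assumptions, not taken from the original.

```r
library(tidyverse)
library(janitor)

# Hypothetical helper: `path` is a placeholder, because the report does not
# show the name of the CSV file that was loaded.
load_incidents <- function(path) {
  read_csv(path, show_col_types = FALSE) %>%  # silence the column-spec message
    clean_names()                             # "Attack Type" -> attack_type
}

# Usage (assumed file name):
# df <- load_incidents("path/to/incidents.csv")
```

Because `read_csv()` already parses the loss, user-count, and resolution-time columns as doubles, a later `as.numeric()` conversion is a safeguard rather than a requirement.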
``` r
# Convert numeric fields
df <- df %>%
  mutate(
    financial_loss_in_million = as.numeric(financial_loss_in_million),
    number_of_affected_users = as.numeric(number_of_affected_users),
    incident_resolution_time_in_hours = as.numeric(incident_resolution_time_in_hours)
  )

# Per-column summary (the call is not shown in the original; inferred from the output)
summary(df)
##    country               year      attack_type        target_industry
##  Length:3000        Min.   :2015   Length:3000        Length:3000
##  Class :character   1st Qu.:2017   Class :character   Class :character
##  Mode  :character   Median :2020   Mode  :character   Mode  :character
##                     Mean   :2020
##                     3rd Qu.:2022
##                     Max.   :2024
##  financial_loss_in_million number_of_affected_users attack_source
##  Min.   : 0.50             Min.   :   424           Length:3000
##  1st Qu.:25.76             1st Qu.:255805           Class :character
##  Median :50.80             Median :504513           Mode  :character
##  Mean   :50.49             Mean   :504684
##  3rd Qu.:75.63             3rd Qu.:758088
##  Max.   :99.99             Max.   :999635
##  security_vulnerability_type defense_mechanism_used
##  Length:3000                 Length:3000
##  Class :character            Class :character
##  Mode  :character            Mode  :character
##  incident_resolution_time_in_hours
##  Min.   : 1.00
##  1st Qu.:19.00
##  Median :37.00
##  Mean   :36.48
##  3rd Qu.:55.00
##  Max.   :72.00

# First six rows (the call is not shown in the original; inferred from the output)
head(df)
## # A tibble: 6 × 10
##   country  year attack_type       target_industry    financial_loss_in_million
##   <chr>   <dbl> <chr>             <chr>                                  <dbl>
## 1 China    2019 Phishing          Education                               80.5
## 2 China    2019 Ransomware        Retail                                  62.2
## 3 India    2017 Man-in-the-Middle IT                                      38.6
## 4 UK       2024 Ransomware        Telecommunications                      41.4
## 5 Germany  2018 Man-in-the-Middle IT                                      74.4
## 6 Germany  2017 Man-in-the-Middle Retail                                  98.2
## # ℹ 5 more variables: number_of_affected_users <dbl>, attack_source <chr>,
## #   security_vulnerability_type <chr>, defense_mechanism_used <chr>,
## #   incident_resolution_time_in_hours <dbl>
```
## Start Investigating Vulnerabilities
``` r
# Average loss, affected users, and resolution time per vulnerability type
vulnerability_summary <- df %>%
  group_by(security_vulnerability_type) %>%
  summarize(
    avg_loss = mean(financial_loss_in_million),
    avg_users = mean(number_of_affected_users),
    avg_resolution = mean(incident_resolution_time_in_hours),
    count = n()
  ) %>%
  arrange(desc(avg_loss))
vulnerability_summary
## # A tibble: 4 × 5
##   security_vulnerability_type avg_loss avg_users avg_resolution count
##   <chr>                          <dbl>     <dbl>          <dbl> <int>
## 1 Social Engineering              50.9   500802.           36.5   747
## 2 Weak Passwords                  50.5   519339.           35.6   730
## 3 Zero-day                        50.4   504836.           36.0   785
## 4 Unpatched Software              50.2   493956.           37.9   738

# Horizontal bar chart of average loss, ordered from lowest to highest
ggplot(vulnerability_summary,
       aes(x = reorder(security_vulnerability_type, avg_loss), y = avg_loss)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(
    title = "Average Financial Loss by Vulnerability Type",
    x = "Vulnerability Type",
    y = "Average Loss (Million USD)"
  )
```
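The four average losses in the vulnerability summary lie within about 0.7 million USD of each other (50.2 to 50.9), so a ranking by mean alone may only reflect noise. One way to check is a Kruskal-Wallis test from base R, which asks whether the loss distributions differ across groups at all. The helper below is a sketch under that assumption; `loss_differs_p` is a hypothetical name and this test is not part of the original analysis.

```r
# Hypothetical helper: p-value of a Kruskal-Wallis test of financial loss
# across the levels of `group_col` (a column name given as a string).
loss_differs_p <- function(data, group_col) {
  kruskal.test(
    x = data$financial_loss_in_million,
    g = factor(data[[group_col]])
  )$p.value
}

# Usage with the report's data frame:
# loss_differs_p(df, "security_vulnerability_type")
```

A large p-value would mean the data give no clear evidence that any vulnerability type is costlier than the others.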
## Start Investigating Attacks
``` r
# Average loss, affected users, and resolution time per attack type
attack_summary <- df %>%
  group_by(attack_type) %>%
  summarize(
    avg_loss = mean(financial_loss_in_million),
    avg_users = mean(number_of_affected_users),
    avg_resolution = mean(incident_resolution_time_in_hours),
    count = n()
  ) %>%
  arrange(desc(avg_loss))
attack_summary
## # A tibble: 6 × 5
##   attack_type       avg_loss avg_users avg_resolution count
##   <chr>                <dbl>     <dbl>          <dbl> <int>
## 1 DDoS                  52.0   499437.           35.7   531
## 2 Man-in-the-Middle     51.3   520064.           36.9   459
## 3 Phishing              50.5   487180.           35.9   529
## 4 SQL Injection         50.0   512470.           36.9   503
## 5 Ransomware            49.7   502825.           36.5   493
## 6 Malware               49.4   508780.           37.1   485
```
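The attack-type averages span only 49.4 to 52.0 million USD, so it helps to see how much uncertainty surrounds each mean before calling one attack "most effective". The sketch below adds a standard error and an approximate 95% interval (mean ± 1.96 standard errors) to a grouped summary. The helper name `loss_with_ci` and the normal-approximation interval are assumptions, not part of the original analysis.

```r
library(dplyr)

# Hypothetical helper: mean loss per group with a standard error and an
# approximate 95% confidence interval (normal approximation).
loss_with_ci <- function(data, group_col) {
  data %>%
    group_by(group = .data[[group_col]]) %>%
    summarise(
      avg_loss = mean(financial_loss_in_million),
      se       = sd(financial_loss_in_million) / sqrt(n()),
      count    = n(),
      .groups  = "drop"
    ) %>%
    mutate(
      ci_low  = avg_loss - 1.96 * se,
      ci_high = avg_loss + 1.96 * se
    ) %>%
    arrange(desc(avg_loss))
}

# Usage with the report's data frame:
# loss_with_ci(df, "attack_type")
```

If the intervals for the top and bottom attack types overlap heavily, the ordering in the summary table should be read as tentative.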
``` r
# Horizontal bar chart of average loss, ordered from lowest to highest
ggplot(attack_summary,
       aes(x = reorder(attack_type, avg_loss), y = avg_loss)) +
  geom_col(fill = "darkred") +
  coord_flip() +
  labs(
    title = "Average Financial Loss by Attack Type",
    x = "Attack Type",
    y = "Average Loss (Million USD)"
  )
```

## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
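The `trend` data frame used in the next plot is never defined in the visible code. The `summarise()` message about output grouped by 'year', together with the `year`, `incidents`, and `attack_type` columns the plot expects, suggests it was a count of incidents per year and attack type. The following is an assumed reconstruction, not the author's confirmed code; `count_incidents_by_year` is a hypothetical name.

```r
library(dplyr)

# Assumed reconstruction: the report does not show how `trend` was built.
# Counts incidents for each combination of year and attack type.
count_incidents_by_year <- function(data) {
  data %>%
    group_by(year, attack_type) %>%
    summarise(incidents = n(), .groups = "drop")  # "drop" avoids the grouping message
}

# Usage with the report's data frame:
# trend <- count_incidents_by_year(df)
```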
``` r
# Incident counts per year, one line per attack type.
# `linewidth` replaces the `size` argument, which is deprecated for lines since ggplot2 3.4.0.
ggplot(trend, aes(x = year, y = incidents, color = attack_type)) +
  geom_line(linewidth = 1.1) +
  labs(
    title = "Cyberattack Frequency Over Time",
    x = "Year",
    y = "Number of Incidents"
  )
```