Project 1

Quarto

Quarto enables you to weave together content and executable code into a finished document. To learn more about Quarto see https://quarto.org.

Running Code

When you click the Render button a document will be generated that includes both content and the output of embedded code. You can embed code like this:

1 + 1

[1] 2

You can add options to executable code like this:

[1] 4

The echo: false option disables the printing of code (only output is displayed).

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

missile <- read_csv("missile_attacks_daily.csv")

Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)

Rows: 3495 Columns: 22
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (11): model, launch_place, target, target_main, border_crossing, carrie...
dbl   (9): launched, destroyed, not_reach_goal, still_attacking, is_shahed, ...
dttm  (2): time_start, time_end

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
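
The parsing warning above can be investigated with `problems()`, and the column-specification message goes away once `col_types` is passed explicitly. Below is a minimal sketch on a small inline CSV. The two columns and their values are hypothetical stand-ins, not rows from missile_attacks_daily.csv, and the example assumes readr 2.x.

```r
library(readr)

# Hypothetical inline CSV: the value "x" cannot be parsed as a number.
demo <- read_csv(
  I("launched,destroyed\n10,8\nx,3\n"),
  col_types = cols(launched = col_double(), destroyed = col_double())
)

# One row per parsing failure: row, column, expected type, actual value.
issues <- problems(demo)
print(issues)

# The value that failed to parse is stored as NA.
print(demo)
```

For the real file, the equivalent call would be `problems(missile)`, which lists the rows behind the warning.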

The dataset contains information about missiles and drones launched and shot down during Russia's massive missile and drone (UAV) strikes on infrastructure, carried out since October 2022 as part of its invasion of Ukraine.
It includes both quantitative variables, such as the number of missiles launched, destroyed, or still attacking, and categorical variables, such as weapon model, launch place, target region, and carrier type.
The dataset was compiled manually from official reports of the Air Force Command of the Armed Forces of Ukraine and the General Staff of the Armed Forces of Ukraine, published on social media such as Facebook and Telegram.

model <- lm(destroyed ~ launched, data = missile)
summary(model)


Call:
lm(formula = destroyed ~ launched, data = missile)

Residuals:
     Min       1Q   Median       3Q      Max 
-249.240   -0.045    1.206    1.954  140.238 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.956356   0.264410  -3.617 0.000302 ***
launched     0.750269   0.003658 205.084  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 14.55 on 3484 degrees of freedom
  (9 observations deleted due to missingness)
Multiple R-squared:  0.9235,    Adjusted R-squared:  0.9235 
F-statistic: 4.206e+04 on 1 and 3484 DF,  p-value: < 2.2e-16
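
The coefficients in the summary above describe the fitted line destroyed = -0.956356 + 0.750269 × launched. As a rough illustration of how to read them, the sketch below plugs the reported estimates into that formula. The attack size of 10 is a hypothetical value chosen only for the example.

```r
# Coefficients copied from the summary output above.
intercept <- -0.956356
slope     <- 0.750269

# Hypothetical attack size, chosen only for illustration.
launched <- 10

# Fitted value from the linear model: intercept + slope * launched.
predicted_destroyed <- intercept + slope * launched
print(predicted_destroyed)
```

With the fitted object itself, `coef(model)` returns the same two estimates and `predict(model, newdata = data.frame(launched = 10))` performs the same calculation.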

top_models <- missile %>%
  filter(!is.na(model)) %>%
  count(model, sort = TRUE) %>%
  slice_head(n = 5)

ggplot(top_models, aes(x = reorder(model, n), y = n, fill = model)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 5 Weapon Models in the Dataset",
    x = "Weapon Model",
    y = "Number of Records",
    fill = "Weapon Model",
    caption = "Source: Public reports from the Armed Forces of Ukraine and official Ukrainian government communications."
  ) +
  scale_fill_brewer(palette = "Set2") +
  theme_classic()

Before the analysis, the dataset was cleaned. First, missing values (NA) were identified across several variables, particularly in categorical fields such as model and target_main. Because target_main contained a large number of missing values, it was excluded from the analysis to avoid biased or incomplete results. For the visualization, rows with a missing value in the model variable were removed using filter(!is.na(model)).
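
The missing-value check described above could be sketched as follows. The toy data frame is hypothetical. Dropping the column with `select()` is an assumption, since the report does not show how target_main was excluded.

```r
library(dplyr)

# Hypothetical toy data standing in for the real dataset.
toy <- tibble(
  model       = c("A", NA, "B"),
  target_main = c(NA, NA, "x"),
  launched    = c(3, 5, 2)
)

# Count the missing values in every column.
na_counts <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))
print(na_counts)

# Assumed cleaning step: drop the mostly empty column,
# then remove the rows that have no weapon model.
cleaned <- toy %>%
  select(-target_main) %>%
  filter(!is.na(model))
print(cleaned)
```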

Next, the data was transformed using dplyr functions. The count() function grouped the records by weapon model and counted the occurrences of each model, sort = TRUE ordered the results from most to least frequent, and slice_head(n = 5) kept the five most frequent models. This allowed a clearer and more focused comparison of the most common weapon types.
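
To make the aggregation step concrete, the sketch below compares `count(model, sort = TRUE)` with the longer `group_by()` / `summarise()` / `arrange()` pipeline that gives the same result. The model labels "A", "B", and "C" are hypothetical, and the example keeps the top two rows instead of five because the toy data is so small.

```r
library(dplyr)

# Hypothetical toy data: three models plus one missing value.
toy <- tibble(model = c("A", "A", "A", "B", "B", "C", NA))

# Short form, as used in the analysis.
short_form <- toy %>%
  filter(!is.na(model)) %>%
  count(model, sort = TRUE) %>%
  slice_head(n = 2)

# Equivalent long form: group, count, sort in descending order.
long_form <- toy %>%
  filter(!is.na(model)) %>%
  group_by(model) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) %>%
  slice_head(n = 2)

print(short_form)
print(long_form)
```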

The visualization shows the five most frequently recorded weapon models in the dataset. A horizontal bar chart was created using ggplot2, with coord_flip() placing the weapon models on the vertical axis and the number of records on the horizontal axis. Each model has its own fill color, drawn from the ColorBrewer Set2 palette instead of the default ggplot2 colors. The theme_classic() theme was selected to improve readability and presentation.

From the visualization, a clear pattern emerges: the Shahed-136/131 drone appears far more frequently than the other weapon types, making it the most common model in the dataset.