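# Load the packages used below: tidyverse for %>%, dplyr verbs, and ggplot2; readxl for read_excel()
library(tidyverse)
library(readxl)

military <- read_excel("../00_data/my_data.xlsx")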
In this analysis, I explore a dataset on military personnel and their traumatic brain injury (TBI) diagnoses. The data include the branch of service, current status (active, reserve, or guard), severity of TBI, number of diagnoses, and year.
This analysis is guided by the following questions:
What are the most typical values for key variables?
Are there any unusual or extreme values in the data?
Are there missing values, and how might they affect the analysis?
How do different variables relate to each other?
ggplot(data = military) +
  geom_bar(mapping = aes(x = service))
military %>% count(service)
## # A tibble: 4 × 2
##   service       n
##   <chr>     <int>
## 1 Air Force   135
## 2 Army        135
## 3 Marines      90
## 4 Navy         90
military %>%
  summarize(
    mean_year = mean(year, na.rm = TRUE),
    median_year = median(year, na.rm = TRUE)
  )
## # A tibble: 1 × 2
##   mean_year median_year
##       <dbl>       <dbl>
## 1      2010        2010
ggplot(military) +
  geom_boxplot(aes(y = year))
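A boxplot draws as outliers any points more than 1.5 times the interquartile range beyond the quartiles. To list such rows rather than only see them, the same rule can be applied directly. The following is a minimal sketch on a made-up `demo` tibble (the values are hypothetical and are not taken from `military`); the names `demo`, `bounds`, and `outliers` are my own.

```r
library(dplyr)

# Hypothetical stand-in for the `year` column; these values are invented.
demo <- tibble(year = c(1990, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014))

# Compute the quartiles, then the 1.5 * IQR fences that geom_boxplot() uses by default.
bounds <- demo %>%
  summarize(
    q1 = quantile(year, 0.25, na.rm = TRUE),
    q3 = quantile(year, 0.75, na.rm = TRUE)
  ) %>%
  mutate(
    lower = q1 - 1.5 * (q3 - q1),
    upper = q3 + 1.5 * (q3 - q1)
  )

# Keep only the rows that fall outside the fences.
outliers <- demo %>%
  filter(year < bounds$lower | year > bounds$upper)

outliers
```

Swapping `demo` for `military` should apply the same check to the real data, assuming `year` is numeric as the output above indicates.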
military %>%
  filter(is.na(diagnosed))
## # A tibble: 0 × 5
## # ℹ 5 variables: service <chr>, component <chr>, severity <chr>,
## # diagnosed <chr>, year <dbl>
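The check above covers only `diagnosed`. To answer the missing-values question for every variable at once, one option is to count `NA`s per column with `across()`. This is a sketch on a small made-up tibble (`demo` and its values are hypothetical, not the real data), so the counts below say nothing about `military` itself.

```r
library(dplyr)

# Hypothetical data with a few deliberately missing entries.
demo <- tibble(
  service   = c("Army", "Navy", NA),
  diagnosed = c("12", NA, "7"),
  year      = c(2006, 2007, 2008)
)

# Count the missing values in every column in one pass.
na_counts <- demo %>%
  summarize(across(everything(), ~ sum(is.na(.x))))

na_counts
```

Running the same `summarize()` call on `military` would give one count per variable, which is quicker than filtering each column separately.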
ggplot(military) +
  geom_boxplot(aes(x = severity, y = year))
ggplot(military) +
  geom_bar(aes(x = service, fill = severity))
I cannot plot two continuous variables against each other, as the dataset has only one numeric variable (year).
ggplot(military) +
  geom_boxplot(aes(x = service, y = year))