Week 7: Apply it to your data 6

Import Data

library(readxl)  # provides read_excel()
military <- read_excel("../00_data/my_data.xlsx")

Introduction

In this analysis, I explore a dataset on military personnel and their traumatic brain injury (TBI) diagnoses. The data include the branch of service, current status (active, reserve, guard), severity of TBI, number of diagnoses, and year.

Questions

This analysis is guided by the following questions:

What are the most typical values for key variables?

Are there any unusual or extreme values in the data?

Are there missing values, and how might they affect the analysis?

How do different variables relate to each other?

Variation

Visualizing distributions

ggplot(data = military) +
  geom_bar(mapping = aes(x = service))

military %>% count(service)

## # A tibble: 4 × 2
##   service       n
##   <chr>     <int>
## 1 Air Force   135
## 2 Army        135
## 3 Marines      90
## 4 Navy         90

Typical values

military %>%
  summarize(
    mean_year = mean(year, na.rm = TRUE),
    median_year = median(year, na.rm = TRUE)
  )

## # A tibble: 1 × 2
##   mean_year median_year
##       <dbl>       <dbl>
## 1      2010        2010

Unusual values

ggplot(military) +
  geom_boxplot(aes(y = year))
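
The boxplot flags unusual values visually. As a rough sketch, the same 1.5 × IQR rule that geom_boxplot() uses for points beyond the whiskers can be applied by hand to list the flagged rows. The tibble `year_demo` is hypothetical, with invented values and a planted outlier so the example runs on its own; the real `military` data would take its place.

```r
library(dplyr)
library(tibble)

# Hypothetical years, invented for illustration; 1990 is planted as an outlier
year_demo <- tibble(
  year = c(2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 1990)
)

# First and third quartiles, and the 1.5 * IQR distance beyond them
q <- quantile(year_demo$year, c(0.25, 0.75))
fence <- 1.5 * IQR(year_demo$year)

# Keep only rows that fall outside the fences
outliers <- year_demo %>%
  filter(year < q[[1]] - fence | year > q[[2]] + fence)

outliers
```

An empty result would suggest that no year is extreme by this rule.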

Missing Values

military %>%
  filter(is.na(diagnosed))

## # A tibble: 0 × 5
## # ℹ 5 variables: service <chr>, component <chr>, severity <chr>,
## #   diagnosed <chr>, year <dbl>
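
The filter above checks only diagnosed. A minimal sketch of counting missing values in every column at once is shown here. `military_demo` is a hypothetical stand-in with invented values (not the real data) so that the example is self-contained.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for `military`; values are invented for illustration
military_demo <- tibble(
  service   = c("Army", "Navy", NA, "Marines"),
  severity  = c("Mild", NA, "Severe", "Mild"),
  diagnosed = c("120", "45", NA, "8"),
  year      = c(2006, 2007, 2008, NA)
)

# Count NA values in each column
na_counts <- military_demo %>%
  summarize(across(everything(), ~ sum(is.na(.x))))

na_counts
```

Running the same summarize() call on `military` would show whether any of the five columns contain missing values.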

Covariation

A categorical and continuous variable

ggplot(military) +
  geom_boxplot(aes(x = severity, y = year))

Two categorical variables

ggplot(military) +
  geom_bar(aes(x = service, fill = severity))
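
The stacked bars can be hard to read exactly, so it may help to tabulate the same combinations. The sketch below counts each service and severity pair. `military_demo` is a hypothetical tibble with invented rows, used only so the code runs without the Excel file.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for `military`; rows are invented for illustration
military_demo <- tibble(
  service  = c("Army", "Army", "Army", "Navy", "Navy", "Marines"),
  severity = c("Mild", "Mild", "Severe", "Mild", "Severe", "Mild")
)

# Number of rows for each service/severity combination
combo_counts <- military_demo %>%
  count(service, severity)

combo_counts
```

These counts could also feed a heatmap, for example ggplot(combo_counts) + geom_tile(aes(x = service, y = severity, fill = n)).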

Two continuous variables

I cannot do this because the dataset has only one numeric variable (year); diagnosed is stored as character.
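
One possible way around this, assuming the diagnosed column holds counts stored as text (the tibble output above lists it as <chr>), is to convert it to numeric first. This is only a sketch: `military_demo` and its values are hypothetical, and the real column may contain entries that do not convert cleanly.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for `military`; values are invented for illustration
military_demo <- tibble(
  diagnosed = c("120", "45", "8", "310"),
  year      = c(2006, 2007, 2008, 2009)
)

# Convert the text counts to numbers; entries that are not plain numbers
# would become NA with a warning
military_num <- military_demo %>%
  mutate(diagnosed = as.numeric(diagnosed))

military_num
```

With diagnosed numeric, a scatterplot such as ggplot(military_num) + geom_point(aes(x = year, y = diagnosed)) would show two continuous variables together.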

Patterns and models

ggplot(military) +
  geom_boxplot(aes(x = service, y = year))