Part 1 - Introduction

This script examines how age relates to the type of medical intervention received when a person sustains a traumatic brain injury. Understanding the relationship between severity of injury and age can guide age specific prevention strategies and improve treatment outcomes for brain injuries.

The comparison of the brain scans illustrates the contrasting effects of Alzheimer’s disease and traumatic brain injury (TBI), and focuses on how age impacts the severity and recovery from brain injuries.
The comparison of the brain scans illustrates the contrasting effects of Alzheimer’s disease and traumatic brain injury (TBI), and focuses on how age impacts the severity and recovery from brain injuries.

Research Question: How does age correspond to the type of traumatic brain injury sustained and does age affect the severity of the injury? The response variable is the severity of injury regarding the type of hospital care required in terms of the rate of ER visits, hospitalizations and deaths. Severity can vary depending on the age group, with younger and older populations potentially experiencing more severe injuries from the same types of accidents The independent variable is age, as we hypothesized that age affects the severity of the injury, even if it is from the same mechanism.

Part 2 - Exploring the Data (Descriptive Statistics)

The data was grouped by age group and then the mean, standard deviation, median and standard error of the actual number of people sampled was taken. This was to ensure that the data had enough numerical values to perform these tests and to allow us to visualize the type of data that would be analyzed statistically.

new_tbi_age %>% group_by(age_group) %>% summarise(count = n(), mean = mean(number_est), std = sd(number_est), median = median(number_est), SE = sd(number_est)/sqrt(length(number_est)))
## # A tibble: 11 × 6
##    age_group count    mean     std median     SE
##    <chr>     <int>   <dbl>   <dbl>  <dbl>  <dbl>
##  1 0-17         18  46484. 103135.  2865  24309.
##  2 0-4          18  18411.  54550.   828. 12858.
##  3 15-24        21  22698.  37196.  2419   8117.
##  4 25-34        21  15205.  24059.  2600   5250.
##  5 35-44        21  11451.  18492.  2255   4035.
##  6 45-54        21  12639.  22164.  3140   4837.
##  7 5-14         18  19384.  40342.  1208.  9509.
##  8 55-64        21  11501.  24441.  3165   5333.
##  9 65-74        20  10552.  26601.  2284   5948.
## 10 75+          20  22119.  64226.  2381  14361.
## 11 Total        21 137026. 274604. 18485  59924.

A bar graph of Type of hospital visit vs injury mechanism, sorted by age group was created to visualize the relationship between those two potential explanatory variables work with each other. This data seemed to have two humps and it was determined that this data would be better looked at once at a time. Therefore, a bar graph of age group vs type of hospital stay was made to see the relationship between the severity of the injury and the age groups were most affected. From this graph we can see that people over the age of 75 seemed to have a higher chance of facing all three types of stays. Chat GPT was used to help style the graphs so they displayed all the information we wanted to show on the bar graphs.

ggplot(new_tbi_age, aes(x = age_group, y = rate_est, fill = injury_mechanism)) +
  geom_bar(stat = "identity") +
  labs(title = "Injuries by Age Group and Type",
       x = "Age Group", y = "Rate of Injuries/100,000 people") +
  theme_minimal()+ 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplot(new_tbi_age, aes(x = age_group, y = rate_est, fill = age_group)) +
  geom_bar(stat = "identity") +
  facet_wrap(~type) +
  labs(title = "Injuries by Age Group",
       x = "Age Group", y = "Rate of Injuries/100,000 people") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Part 3 - Statistical Test (Inferential Statistics)

The data was split into 3 new data sets, one for death rate by age group, the other for ER visit rate by age group and the third for hospitalization rate by age group. This was done so that each variable could be tested independently for statistical significance.

death_data <- new_tbi_age %>% 
  filter(type == "Deaths")
ER_data <- new_tbi_age %>% 
  filter(type == "Emergency Department Visit")
Hosp_data <- new_tbi_age %>% 
  filter(type == "Hospitalizations")

Shapiro-Wilks Test was used to test for normality. We wanted to assess whether the age distribution within each outcome, hospitalization, ER or death, was normally distributed. The data indicated that for death rates (W= 0.369, p = 1.928e-15), ER visits (W=0.5146, p = 2.484e-13), and hospitalization rates (W= 0.2714, p < 2.2e-16), all deviated from a normal distribution.

# Test for normality and equal variance
shapiro.test(death_data$rate_est)
## 
##  Shapiro-Wilk normality test
## 
## data:  death_data$rate_est
## W = 0.37272, p-value = 3.551e-16
shapiro.test(ER_data$rate_est)
## 
##  Shapiro-Wilk normality test
## 
## data:  ER_data$rate_est
## W = 0.52016, p-value = 5.68e-14
shapiro.test(Hosp_data$rate_est)
## 
##  Shapiro-Wilk normality test
## 
## data:  Hosp_data$rate_est
## W = 0.27599, p-value < 2.2e-16

Levene’s Test was used to test for equal variance among the outcome groups. These test indicated that the assumption of equal variance was met for the three different types of outcomes (hospitalization, ER visit or death), as p-value were all greater than 0.05 (hospital: p=0.305, ER: p= 0.672, death: p=0.256)

leveneTest(data = death_data, rate_est ~ age_group)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group 10  1.2909 0.2551
##       63
leveneTest(data = ER_data, rate_est ~ age_group)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group 10  0.7288 0.6945
##       61
leveneTest(data = Hosp_data, rate_est ~ age_group)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group 10  1.1969 0.3104
##       63

QQplot with QQ line of the residuals for each type of stay was used to visualize whether or not age distribution against the possible outcome (type) would be normally distributed. However none of them showed normal data. A quick histogram of the residuals was also performed which showed highly right skewed data. Furthermore, a Shapiro Wilks test was done on each of the residuals, and it also came back as not normal in terms of the p values, ER = (W = 0.6917, p-value = 2.14e-10), death = (W = 0.52892, p-value = 2.469e-13), hospital = (W = 0.44345, p-value = 1.616e-14).

is.factor(ER_data$age_group)
## [1] FALSE
ER_data$age_group <- as.factor(ER_data$age_group)
age_ER_model <-lm(rate_est ~ age_group, data=ER_data)
age_ER_model
## 
## Call:
## lm(formula = rate_est ~ age_group, data = ER_data)
## 
## Coefficients:
##    (Intercept)    age_group0-4  age_group15-24  age_group25-34  age_group35-44  
##         183.87           85.88          -39.57          -89.37         -108.28  
## age_group45-54   age_group5-14  age_group55-64  age_group65-74    age_group75+  
##        -109.02          -46.27         -113.21          -78.93           96.43  
## age_groupTotal  
##         -70.30
qqnorm(resid(age_ER_model), main= "QQ PLot of Residuals: ER Visist") 
qqline(resid(age_ER_model), col="red")

hist(resid(age_ER_model), col= "#f3b5d0", main= "Residuals of ER Visits")

shapiro.test(resid(age_ER_model))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(age_ER_model)
## W = 0.6941, p-value = 5.84e-11
is.factor(death_data$age_group)
## [1] FALSE
death_data$age_group <- as.factor(death_data$age_group)
age_death_model <- lm(rate_est ~ age_group, data = death_data)
age_death_model
## 
## Call:
## lm(formula = rate_est ~ age_group, data = death_data)
## 
## Coefficients:
##    (Intercept)    age_group0-4  age_group15-24  age_group25-34  age_group35-44  
##         0.4667          0.1333          1.5762          1.6333          1.5048  
## age_group45-54   age_group5-14  age_group55-64  age_group65-74    age_group75+  
##         1.9476         -0.2333          2.2762          3.0476         10.7333  
## age_groupTotal  
##         2.0619
qqnorm(resid(age_death_model), main= "QQ PLot of Residuals: Death")
qqline(resid(age_death_model), col="red")

hist(resid(age_death_model), col= "#78e3f6", main= "Residuals of Death")

shapiro.test(resid(age_death_model))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(age_death_model)
## W = 0.53006, p-value = 5.082e-14
is.factor(Hosp_data$age_group)
## [1] FALSE
Hosp_data$age_group <- as.factor(Hosp_data$age_group)
age_Hosp_model <- lm(rate_est ~ age_group, data = Hosp_data)
age_Hosp_model
## 
## Call:
## lm(formula = rate_est ~ age_group, data = Hosp_data)
## 
## Coefficients:
##    (Intercept)    age_group0-4  age_group15-24  age_group25-34  age_group35-44  
##          5.217           2.283           3.369           3.155           2.169  
## age_group45-54   age_group5-14  age_group55-64  age_group65-74    age_group75+  
##          4.912          -1.883           7.569          15.569          62.026  
## age_groupTotal  
##          7.698
qqnorm(resid(age_Hosp_model), main= "QQ PLot of Residuals: Hospitalization")
qqline(resid(age_Hosp_model), col="red")

hist(resid(age_Hosp_model), col= "#87efc2", main= "Residuals of Hospitalization")

shapiro.test(resid(age_Hosp_model))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(age_Hosp_model)
## W = 0.44676, p-value = 3.227e-15

Kruskal-Wallis test was done instead of ANOVA to compare age across the three types because it does not assume normality or equal variance. For death rates, the data showed a significant difference across age groups (p = 0.01205), whereas the tests for ER visits (p = 0.6401) and hospitalizations (p = 0.8894) was not significant, suggesting there is no strong relationship between age group and these two injury outcomes.

Kruskal-Wallis Test (death data) Null: The distribution of death rates is the same across all age groups. Alternative:At least one of the age groups will have a different distribution of death rates compared to the others within the study (Kruskal-Wallis chi-squared(10) = 22.115, p-value = 0.01453).

(ER data) Null: The distribution of ER visit rates is normal across all age groups Alternative: The data will contain at least one age group in which will have a different distribution (Kruskal-Wallis chi-squared(10) = 7.347, p-value = 0.6923).

(Hospitalization) Null: The distribution of hospitalization rates will be the same across the variety of age groups used within the testing Alternative: There will be at least one age groups that has a different distribution (Kruskal-Wallis chi-squared(10) = 4.4734, p-value = 0.9235).

There is no evidence from this testing that ER or hospitalization rates significantly differ across age groups.

# Kruskal Wallis 
kruskal.test(rate_est ~ age_group, data = death_data)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  rate_est by age_group
## Kruskal-Wallis chi-squared = 22.115, df = 10, p-value = 0.01453
kruskal.test(rate_est ~ age_group, data = ER_data)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  rate_est by age_group
## Kruskal-Wallis chi-squared = 7.347, df = 10, p-value = 0.6923
kruskal.test(rate_est ~ age_group, data = Hosp_data)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  rate_est by age_group
## Kruskal-Wallis chi-squared = 4.4734, df = 10, p-value = 0.9235

Dunn Test was used to identify which specific pairs of age groups had significant differences within the death data as it was the only one to show significance during the Kruskal Wallis test. Although the Kruskal-Wallis test indicated a significant difference in death rates among age groups, the Dunn post-hoc tests with Bonferroni correction found no statistically significant pairwise differences between specific age groups (all adjusted p-values > 0.05). Due to the small sample size, this indicates that the sample size is too small to make significant pairwise comparisons. Stack Overflow was used to determine which post hoc test to perform after the Kruskal Wallis

death_data %>%
  dunn_test(rate_est~age_group, p.adjust.method = "bonferroni")
## # A tibble: 55 × 9
##    .y.      group1 group2    n1    n2 statistic       p p.adj p.adj.signif
##  * <chr>    <chr>  <chr>  <int> <int>     <dbl>   <dbl> <dbl> <chr>       
##  1 rate_est 0-17   0-4        6     6     0.269 0.788   1     ns          
##  2 rate_est 0-17   15-24      6     7     1.21  0.225   1     ns          
##  3 rate_est 0-17   25-34      6     7     1.44  0.151   1     ns          
##  4 rate_est 0-17   35-44      6     7     1.57  0.117   1     ns          
##  5 rate_est 0-17   45-54      6     7     1.85  0.0638  1     ns          
##  6 rate_est 0-17   5-14       6     6    -0.578 0.563   1     ns          
##  7 rate_est 0-17   55-64      6     7     2.12  0.0343  1     ns          
##  8 rate_est 0-17   65-74      6     7     2.33  0.0200  1     ns          
##  9 rate_est 0-17   75+        6     7     2.83  0.00459 0.253 ns          
## 10 rate_est 0-17   Total      6     7     1.97  0.0485  1     ns          
## # ℹ 45 more rows

The log transformation was attempted, but did not make the data normal. A variety of other transformations were conducted but none proved any normality of the comparisons and thus were omitted from this script.

death_data$log_rate <- log(death_data$rate_est)
shapiro.test(death_data$log_rate)
## 
##  Shapiro-Wilk normality test
## 
## data:  death_data$log_rate
## W = NaN, p-value = NA

Part 4 - Conclusion

We have no definitive answer for the relationship between age groups and injury mechanisms. Although we saw some relation between mortality and older age groups, the results are not statistically significant enough to confirm. The comparison between age and rate of injury and mechanism shows no conclusive relationship which was determined through the statistical testing which primarily resulted in a failure to reject the null hypothesis. While the Kruskal Wallis shows significance in older individuals dying from traumatic brain injuries, once it comes to pairwise comparisons it shows no significance. This is because the sample size is so small that there is no room for pairwise comparison. Therefore, if there was a bigger sample taken there would be room for these comparisons and we would see a difference in age versus the severity of brain injury. This analysis is important because being able to understand the potential trends between age, injury types, and severity has the potential to help inform public health strategies, improve medical responses, and tailor prevention efforts for different age groups, even if no clear relationship was found. The potential limitations within this study could include a lack of individual-level data, possible missing or incomplete information for certain age groups or injury types (e.g., NA values). The study also did not include all relevant variables (like pre-existing conditions, access to care, or injury severity scores) were included, which could hide true relationships among the categories. The statistical tests lack power since the sample sizes in certain categories were too small and all contained equal sample sizes.

References

[1] Bigler, E. D. (2013). Traumatic brain injury, neuroimaging, and neurodegeneration. Frontiers in Human Neuroscience, 7. https://doi.org/10.3389/fnhum.2013.00395

[2] OpenAI. (2023). ChatGPT (Mar 14 version) [Large language model]. https://chat.openai.com/chat

[3] Contributer, G. (2023, May 8). A Post-hoc Test for Kruskal-Wallis. The Analysis Factor. https://www.theanalysisfactor.com/dunns-test-post-hoc-test-after-kruskal-wallis/