This is a practice exercise. We use a data set of 20 variables recorded for 316 prostate cancer patients.
prostate %>%
  mutate(aa = factor(aa, levels = c(0, 1),
                     labels = c("White", "African-American"))) %>%
  mutate(fam_hx = factor(fam_hx, levels = c(0, 1),
                         labels = c("No Family History", "FHx of Prostate Cancer"))) ->
  prostate_factors
prostate %>%
  select(age, p_vol, preop_psa, aa, fam_hx) %>%
  group_by(aa, fam_hx) %>%
  summarize(across(age:preop_psa, ~ mean(.x, na.rm = TRUE)), .groups = "drop")
## # A tibble: 4 × 5
##      aa fam_hx   age p_vol preop_psa
##   <dbl>  <dbl> <dbl> <dbl>     <dbl>
## 1     0      0  61.8  56.9      8.06
## 2     0      1  59.5  57.3      7.22
## 3     1      0  60.7  54.3      9.90
## 4     1      1  60.1  51.4      8.71
In this table, aa = 0 is White and aa = 1 is African-American; fam_hx = 1 indicates a family history of prostate cancer. Within each race group, patients with a family history are slightly younger on average and have a lower mean preoperative prostate-specific antigen (PSA).
Comparing across race groups, the African-American group has a smaller mean prostate volume and a higher mean preoperative PSA, with or without a family history.
These preliminary observations need further verification.
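Group means alone do not show how many patients fall in each cell or how spread out the values are. The following is a minimal base-R sketch of one way to tabulate counts and standard deviations next to the means. The `toy` data frame and its values are invented for illustration and are not taken from the prostate data.

```r
# Toy data: invented values, NOT from the prostate data set.
toy <- data.frame(
  aa        = c(0, 0, 0, 1, 1, 1, 0, 1),
  fam_hx    = c(0, 0, 1, 0, 1, 1, 1, 0),
  preop_psa = c(6.1, 8.3, 7.0, 9.8, NA, 8.4, 7.4, 10.2)
)

# Count, mean and SD of preop_psa per (aa, fam_hx) cell.
# The formula interface drops rows with a missing preop_psa by default.
cell_summary <- aggregate(
  preop_psa ~ aa + fam_hx,
  data = toy,
  FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x))
)

# aggregate() stores the three statistics in one matrix column; flatten it
# into separate columns named preop_psa.n, preop_psa.mean, preop_psa.sd.
cell_summary <- do.call(data.frame, cell_summary)
print(cell_summary)
```

A cell with a single observation gets an `NA` standard deviation, which is itself a useful warning that the cell mean rests on very little data.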
ggplot(prostate_factors) +
  aes(x = p_vol, y = preop_psa, col = aa) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_grid(aa ~ fam_hx) +
  labs(x = "Prostate Volume", y = "Preoperative PSA",
       title = "Relationship Between Prostate Volume and Preop PSA,\nSubdivided by Family History and Race") +
  theme(legend.position = "bottom")
## `geom_smooth()` using formula = 'y ~ x'
The fitted regression lines are all nearly flat, suggesting that preoperative PSA may be more or less independent of prostate volume and stays at a similar level within each group. In particular, the average preoperative PSA appears higher in the African-American group.
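To put numbers on slopes that are otherwise read off a plot, one option is to fit the same kind of straight line that `geom_smooth(method = "lm")` draws, once per panel, and extract the coefficient on the x variable. This is a base-R sketch on invented data. The `toy` data frame is hypothetical and only borrows the column names used above.

```r
# Toy data: invented values, NOT from the prostate data set.
set.seed(1)
toy <- data.frame(
  aa        = rep(c("White", "African-American"), each = 20),
  fam_hx    = rep(rep(c("No", "Yes"), each = 10), times = 2),
  p_vol     = runif(40, min = 20, max = 120),
  preop_psa = rnorm(40, mean = 8, sd = 3)
)

# One subset per (aa, fam_hx) combination, mirroring facet_grid(aa ~ fam_hx).
panels <- split(toy, list(toy$aa, toy$fam_hx))

# Fit preop_psa ~ p_vol within each panel and keep the slope on p_vol.
slopes <- sapply(
  panels,
  function(d) coef(lm(preop_psa ~ p_vol, data = d))[["p_vol"]]
)
print(slopes)
```

On real data, `confint()` on each fitted model would show whether a slope is distinguishable from zero rather than merely small.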
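Based on the summary and plot above, we hypothesize that African-American patients in this data set have, on average, a higher preoperative PSA level. We can test this with a two-sample t test; rstatix's `t_test()` does not assume equal variances by default, so the call below runs Welch's version rather than the classic Student's t test.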
prostate_factors %>%
  t_test(formula = preop_psa ~ aa,
         detailed = TRUE)
## # A tibble: 1 × 15
##   estimate estimate1 estimate2 .y.    group1 group2    n1    n2 statistic      p
## *    <dbl>     <dbl>     <dbl> <chr>  <chr>  <chr>  <int> <int>     <dbl>  <dbl>
## 1    -1.89      7.86      9.75 preop… White  Afric…   259    54     -1.96 0.0534
## # ℹ 5 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## #   alternative <chr>
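However, the test gives a 95% confidence interval of [-3.81, 0.03] for the difference in means (White minus African-American), which includes zero, with a p-value of 0.0534. At the 0.05 level we therefore cannot conclude that the two groups differ significantly. The small African-American group (n = 54, against 259 White patients) may be limiting the power of the test.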
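For reference, the statistic and p-value of an unequal-variance (Welch) two-sample t test can be reproduced by hand. Assuming `t_test()` above is the rstatix function, Welch's test is what it computes by default. The sketch below uses invented data and base R's `t.test()` instead, so the numbers are unrelated to the result above. With group means $\bar{x}_1, \bar{x}_2$, variances $s_1^2, s_2^2$ and sizes $n_1, n_2$, the statistic is $t = (\bar{x}_1 - \bar{x}_2) / \sqrt{s_1^2/n_1 + s_2^2/n_2}$, with Welch-Satterthwaite degrees of freedom.

```r
# Toy data: invented values, NOT from the prostate data set.
set.seed(42)
g1 <- rnorm(60, mean = 8,  sd = 5)   # stands in for the larger group
g2 <- rnorm(15, mean = 10, sd = 7)   # stands in for the smaller group

# Squared standard errors of the two group means.
v1 <- var(g1) / length(g1)
v2 <- var(g2) / length(g2)

# Welch's t statistic and Welch-Satterthwaite degrees of freedom, by hand.
t_stat <- (mean(g1) - mean(g2)) / sqrt(v1 + v2)
df     <- (v1 + v2)^2 / (v1^2 / (length(g1) - 1) + v2^2 / (length(g2) - 1))
p_two  <- 2 * pt(-abs(t_stat), df)   # two-sided p-value

# Compare with base R's t.test(), whose default (var.equal = FALSE) is Welch.
fit <- t.test(g1, g2)
print(c(by_hand = t_stat, t.test = unname(fit$statistic)))
print(c(by_hand = p_two,  t.test = fit$p.value))
```

Because the smaller group's variance is divided by a small n, it dominates the standard error, which is one way an unbalanced design widens the confidence interval.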