Analysis of Prostate Cancer

This is a practice analysis. We use a data set with 20 variables recorded for 316 prostate cancer patients.

A Glimpse of Prostate Data

prostate %>% 
  select(age, p_vol, preop_psa, aa, fam_hx) %>% 
  group_by(aa, fam_hx) %>% 
  summarize(across(age:preop_psa, ~ mean(.x, na.rm = TRUE)), .groups = "drop")

## # A tibble: 4 × 5
##      aa fam_hx   age p_vol preop_psa
##   <dbl>  <dbl> <dbl> <dbl>     <dbl>
## 1     0      0  61.8  56.9      8.06
## 2     0      1  59.5  57.3      7.22
## 3     1      0  60.7  54.3      9.90
## 4     1      1  60.1  51.4      8.71

Within each race group (White or African-American), patients with a family history are slightly younger on average and have a lower mean preoperative prostate-specific antigen (PSA) level.

Comparing across race groups, African-American patients have a smaller mean prostate volume and a higher mean preoperative prostate-specific antigen level, with or without a family history.

These preliminary observations need further verification.
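
The plot and test in the next section use `prostate_factors`, whose construction is not shown in this report. A minimal sketch of how such a data frame could be derived, assuming `aa` and `fam_hx` are 0/1 indicators as in the summary table above. The toy rows and the `fam_hx` labels are hypothetical; the race labels follow the wording used in this report.

```r
library(dplyr)

# Toy stand-in with the same 0/1 coding as the summary table (not the real data)
toy <- tibble(
  aa     = c(0, 0, 1, 1),
  fam_hx = c(0, 1, 0, 1)
)

# Recode the indicators as labelled factors so plots and tests show group names.
# The label strings are assumptions, not taken from the original code.
toy_factors <- toy %>%
  mutate(
    aa     = factor(aa, levels = c(0, 1),
                    labels = c("White", "African-American")),
    fam_hx = factor(fam_hx, levels = c(0, 1),
                    labels = c("No Family History", "Family History"))
  )

levels(toy_factors$aa)
```

Fixing the level order explicitly matters later: the first level becomes group 1 in a two-sample test, so it determines the sign of the estimated difference.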

Visualization and Statistical Testing

ggplot(prostate_factors) + 
  aes(x = p_vol, y = preop_psa, col = aa) + 
  geom_point() +
  geom_smooth(method = "lm") +
  facet_grid(aa ~ fam_hx) +
  labs(x = 'Prostate Volume', y = "Preoperative PSA",
       title = 'Relationship Between Prostate Volume and Preop PSA,\nSubdivided by Family History and Race') +
  theme(legend.position = "bottom")

## `geom_smooth()` using formula = 'y ~ x'

The slopes of the fitted regression lines are all close to zero, suggesting that the preoperative PSA level is roughly independent of prostate volume within each group. In particular, we notice that the average preoperative PSA level is higher in the African-American group.
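
Reading slopes off a plot is imprecise, so it can help to extract them numerically. The sketch below fits the same kind of per-panel linear model with base R and reports each slope with a 95% confidence interval. The data are simulated stand-ins, not the prostate measurements, and the names `toy`, `panels`, and `slope_table` are illustrative.

```r
# Simulated stand-in data; NOT the real prostate measurements
set.seed(1)
toy <- data.frame(
  aa        = rep(c("White", "African-American"), each = 40),
  fam_hx    = rep(rep(c("No", "Yes"), each = 20), times = 2),
  p_vol     = runif(80, min = 20, max = 120),
  preop_psa = rnorm(80, mean = 8, sd = 4)
)

# One subset per facet panel (race group x family history)
panels <- split(toy, list(toy$aa, toy$fam_hx))

# Fit preop_psa ~ p_vol within each panel; keep the slope and its 95% CI
slope_table <- do.call(rbind, lapply(panels, function(d) {
  fit <- lm(preop_psa ~ p_vol, data = d)
  c(slope = unname(coef(fit)["p_vol"]), confint(fit)["p_vol", ])
}))

slope_table
```

A slope whose interval comfortably contains zero is consistent with the "close to zero" reading of the plot, although it does not prove independence.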

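Based on our previous summary and plot, we hypothesize that the African-American patients in this data set have, on average, a higher preoperative PSA level. We can test this with a two-sample t test. Note that `t_test()` runs Welch's version by default, which does not assume equal variances, and that the test below is two-sided.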

prostate_factors %>% 
  t_test(formula = preop_psa ~ aa,
         detailed = TRUE)

## # A tibble: 1 × 15
##   estimate estimate1 estimate2 .y.    group1 group2    n1    n2 statistic      p
## *    <dbl>     <dbl>     <dbl> <chr>  <chr>  <chr>  <int> <int>     <dbl>  <dbl>
## 1    -1.89      7.86      9.75 preop… White  Afric…   259    54     -1.96 0.0534
## # ℹ 5 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## #   alternative <chr>

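However, the test gives a 95% confidence interval for the difference in means (White minus African-American) of [-3.81, 0.03], with a p-value of 0.0534. Because the interval contains zero and the p-value exceeds 0.05, we cannot conclude that the difference is significant at the 5% level. The small African-American group (n = 54) may limit the power of the test.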
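
The hypothesis above is directional, while the reported test is two-sided. The sketch below uses base R's `t.test()` on simulated stand-in data (not the prostate data) to show how the two variants relate. It assumes the factor levels are ordered White, then African-American, so the estimated difference is White minus African-American. The choice between a one-sided and a two-sided test should be made before looking at the data.

```r
# Simulated stand-in data; NOT the real prostate measurements
set.seed(42)
toy <- data.frame(
  aa = factor(rep(c("White", "African-American"), times = c(100, 25)),
              levels = c("White", "African-American")),
  preop_psa = c(rnorm(100, mean = 8, sd = 5), rnorm(25, mean = 10, sd = 6))
)

# Welch two-sample t test (base R default: var.equal = FALSE), two-sided
two_sided <- t.test(preop_psa ~ aa, data = toy)

# One-sided version: "less" tests mean(White) - mean(African-American) < 0
one_sided <- t.test(preop_psa ~ aa, data = toy, alternative = "less")

two_sided$method
two_sided$conf.int
c(two_sided = two_sided$p.value, one_sided = one_sided$p.value)
```

For a two-sided test at the 5% level, the 95% confidence interval contains zero exactly when the p-value is above 0.05. When the observed difference lies in the hypothesized direction, the one-sided p-value is half the two-sided one.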

Yutong An

2025-02-05
