Analysis of Prostate Data

According to the data, the median age at prostatectomy was 61.85 years, and the youngest patient was 38.4. Most patients had the procedure while their tumor was still at the first stage. In more than a quarter of cases, the tumor had invaded the fibrous capsule of the prostate, and in fewer than 25% of patients it had metastasized to the lymph nodes.
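Figures like these (median, minimum, and the share of patients with a given finding) come from a few base R calls. The sketch below runs them on a small hypothetical data frame. The values are invented, and the indicator columns `capsule_invasion` and `lymph_node_pos` (coded 0/1) are assumed names that may differ from the real dataset. The key step is that the mean of a 0/1 indicator is the proportion of 1s.

```r
# Hypothetical toy data for illustration only (NOT the prostate data).
# capsule_invasion and lymph_node_pos are assumed 0/1 indicator columns;
# the real dataset's column names for these findings may differ.
toy <- data.frame(
  age              = c(45, 58, 61, 63, 66, 72),
  capsule_invasion = c(0, 1, 0, 0, 1, 0),
  lymph_node_pos   = c(0, 0, 0, 0, 1, 0)
)

median_age <- median(toy$age, na.rm = TRUE)  # median age at surgery
youngest   <- min(toy$age, na.rm = TRUE)     # youngest patient

# The mean of a 0/1 indicator is the proportion of 1s; * 100 gives a percentage
pct_capsule <- 100 * mean(toy$capsule_invasion, na.rm = TRUE)
pct_ln      <- 100 * mean(toy$lymph_node_pos, na.rm = TRUE)

round(c(median_age = median_age, youngest = youngest,
        pct_capsule = pct_capsule, pct_ln = pct_ln), 1)
```

Comparing `pct_capsule` or `pct_ln` against 25 is what statements such as "more than a quarter of cases" amount to.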

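library(dplyr)  # provides %>%, mutate(), group_by(), summarise()
# Recode the 0/1 indicators as labelled factors
prostate_factors <- prostate %>%
  mutate(
    aa     = factor(aa, levels = c(0, 1), labels = c("White", "African-American")),
    fam_hx = factor(fam_hx, levels = c(0, 1),
                    labels = c("No Family History", "FHx of Prostate Cancer"))
  )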
# Mean age, prostate volume and preoperative PSA by race (aa: 0 = White,
# 1 = African-American) and family history (fam_hx: 0 = no, 1 = yes)
prostate %>%
  select(age, p_vol, preop_psa, aa, fam_hx) %>%
  group_by(aa, fam_hx) %>%
  summarise(across(age:preop_psa, ~ mean(.x, na.rm = TRUE)), .groups = "drop")
## # A tibble: 4 × 5
##      aa fam_hx   age p_vol preop_psa
##   <dbl>  <dbl> <dbl> <dbl>     <dbl>
## 1     0      0  61.8  56.9      8.06
## 2     0      1  59.5  57.3      7.22
## 3     1      0  60.7  54.3      9.90
## 4     1      1  60.1  51.4      8.71

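Both African-American subgroups have a higher mean preoperative PSA than the White subgroups with the same family history (9.90 and 8.71 vs 8.06 and 7.22) and a lower mean prostate volume (54.3 and 51.4 vs 56.9 and 57.3). We therefore hypothesize that African-American patients may be diagnosed later: a smaller prostate is less likely to cause the obstructive urinary symptoms that prompt a clinical visit, so the tumor can progress undetected and PSA levels rise. These are unadjusted group means, so the pattern motivates the hypothesis rather than confirming it.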
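The comparison behind this hypothesis is a difference of the table's means within each family-history stratum. The sketch below redoes that arithmetic in R from the printed (rounded) values. It is purely descriptive and does not adjust for other factors or test significance.

```r
# Group means as printed in the summary table (rounded values)
# aa: 0 = White, 1 = African-American; fam_hx: 0 = no family history, 1 = family history
means <- data.frame(
  aa        = c(0, 0, 1, 1),
  fam_hx    = c(0, 1, 0, 1),
  p_vol     = c(56.9, 57.3, 54.3, 51.4),
  preop_psa = c(8.06, 7.22, 9.90, 8.71)
)

white_rows <- means[means$aa == 0, ]
aa_rows    <- means[means$aa == 1, ]
# Make sure rows are paired within the same family-history stratum
stopifnot(identical(white_rows$fam_hx, aa_rows$fam_hx))

diffs <- data.frame(
  fam_hx   = white_rows$fam_hx,
  psa_diff = aa_rows$preop_psa - white_rows$preop_psa,  # African-American minus White
  vol_diff = aa_rows$p_vol - white_rows$p_vol
)
print(diffs)
```

In both strata the PSA difference is positive (about 1.5 to 1.8) and the volume difference is negative, which is the pattern the hypothesis rests on.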

Including Plots

You can also embed plots, for example:

[Figure: scatter plot (geom_point()) with a smoothed trend line (geom_smooth(), formula y ~ x); ggplot2 dropped 11 rows with missing or non-finite values from both layers.]

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.

Statistical Testing

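library(rstatix)  # provides t_test()
# Two-sample t-test of preoperative PSA, White vs African-American
prostate_factors %>%
  t_test(formula = preop_psa ~ aa, detailed = TRUE)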
## # A tibble: 1 × 15
##   estimate estimate1 estimate2 .y.    group1 group2    n1    n2 statistic      p
## *    <dbl>     <dbl>     <dbl> <chr>  <chr>  <chr>  <int> <int>     <dbl>  <dbl>
## 1    -1.89      7.86      9.75 preop… White  Afric…   259    54     -1.96 0.0534
## # ℹ 5 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## #   alternative <chr>
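`t_test()` from rstatix wraps base R's `t.test()`, and with its default `var.equal = FALSE` it runs Welch's two-sample t-test, which does not assume equal variances in the two groups. In the output, `estimate` is the difference in means (7.86 - 9.75 = -1.89), and p = 0.0534 sits just above the conventional 0.05 threshold. These means differ from the grouped table because they pool over family history. The sketch below computes the Welch statistic, the Welch-Satterthwaite degrees of freedom and the two-sided p-value by hand, then checks them against `t.test()`. The two vectors are made-up numbers, not the study data.

```r
# Hypothetical preoperative PSA values for two groups (NOT the study data)
g1 <- c(5.1, 6.8, 7.4, 8.0, 9.2, 6.1, 7.7)
g2 <- c(8.3, 10.9, 7.5, 12.4, 9.8)

# Welch's two-sample t-test computed by hand
welch_t <- function(x, y) {
  se2_x  <- var(x) / length(x)  # squared standard error of mean(x)
  se2_y  <- var(y) / length(y)  # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (se2_x + se2_y)^2 /
    (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))
  p_value <- 2 * pt(-abs(t_stat), df)  # two-sided p-value
  c(estimate = mean(x) - mean(y), t = t_stat, df = df, p = p_value)
}

manual  <- welch_t(g1, g2)
builtin <- t.test(g1, g2)  # base R uses var.equal = FALSE (Welch) by default
print(manual)
print(builtin)
```

The hand-computed values should match `builtin$statistic`, `builtin$parameter` and `builtin$p.value`. Setting `var.equal = TRUE` would switch both functions to the pooled-variance Student's t-test instead.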