Tasting

Analysis of Prostate Data

Headline about Prostate data

This is data about prostates. Cool stuff. There are 316 rows and 20 columns.

prostate %>% 
  mutate(aa = factor(aa, levels = c(0,1), 
                     labels = c("White", "African-American"))) %>% 
  mutate(fam_hx = factor(fam_hx, levels = c(0,1), 
      labels = c("No Family History", "FHx of Prostate Cancer"))) ->
prostate_factors

prostate %>% 
  select(age, p_vol, preop_psa, aa, fam_hx) %>% 
  group_by(aa, fam_hx) %>% 
  summarize(across(age:preop_psa, ~ mean(.x, na.rm=TRUE)))

## `summarise()` has regrouped the output.
## ℹ Summaries were computed grouped by aa and fam_hx.
## ℹ Output is grouped by aa.
## ℹ Use `summarise(.groups = "drop_last")` to silence this message.
## ℹ Use `summarise(.by = c(aa, fam_hx))` for per-operation grouping
##   (`?dplyr::dplyr_by`) instead.

## # A tibble: 4 × 5
## # Groups:   aa [2]
##      aa fam_hx   age p_vol preop_psa
##   <dbl>  <dbl> <dbl> <dbl>     <dbl>
## 1     0      0  61.8  56.9      8.06
## 2     0      1  59.5  57.3      7.22
## 3     1      0  60.7  54.3      9.90
## 4     1      1  60.1  51.4      8.71

Interpretation of Results

Mean age is similar across all four groups (~59–62 years). Prostate volume (p_vol) is also fairly similar (~51–57). Pre-op PSA (preop_psa) is highest in the AA + no family history group (aa = 1, fam_hx = 0), at 9.90, compared with roughly 7.2–8.7 in the other three groups.
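The grouped means are easier to read with labelled groups and a count of observations per group. The following is a minimal sketch on made-up values (not the prostate data). It reuses the same factor labels as the recoding step, and the column names n_psa and mean_psa are chosen here for illustration. Setting .groups = "drop" returns an ungrouped result, so the regrouping message is not printed.

```r
library(dplyr)

# Made-up values for illustration only; not the prostate data.
toy <- data.frame(
  aa        = c(0, 0, 0, 1, 1, 1),
  fam_hx    = c(0, 0, 1, 0, 1, 1),
  preop_psa = c(6, 8, 7, 10, NA, 9)
)

toy_summary <- toy %>%
  mutate(
    aa = factor(aa, levels = c(0, 1),
                labels = c("White", "African-American")),
    fam_hx = factor(fam_hx, levels = c(0, 1),
                    labels = c("No Family History", "FHx of Prostate Cancer"))
  ) %>%
  group_by(aa, fam_hx) %>%
  summarize(
    n_psa    = sum(!is.na(preop_psa)),         # non-missing PSA values per group
    mean_psa = mean(preop_psa, na.rm = TRUE),  # mean, ignoring missing values
    .groups  = "drop"                          # return an ungrouped result
  )

toy_summary
```

Reporting the group size next to each mean helps show how much data each average rests on, which matters when one group is much smaller than the others.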

Including Plots

You can also embed plots, for example:

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 11 rows containing non-finite outside the scale range
## (`stat_smooth()`).

## Warning: Removed 11 rows containing missing values or values outside the scale range
## (`geom_point()`).
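The two warnings above say that 11 rows could not be drawn because of missing or non-finite values. The code for the plot is not shown, so the plotted variables are unknown. As a minimal sketch on made-up data, with x and y as hypothetical column names, one way to count such rows before plotting is:

```r
# Made-up values for illustration only; x and y are hypothetical column names.
toy <- data.frame(
  x = c(1.2, NA, 3.4, 4.1, 5.0),
  y = c(10, 12, NA, 15, 18)
)

# TRUE for rows where neither plotted variable is missing
complete <- complete.cases(toy[, c("x", "y")])

n_dropped <- sum(!complete)   # rows with a missing x or y
toy_clean <- toy[complete, ]  # rows with both values present

n_dropped
```

Plotting toy_clean instead of toy would make the dropped rows explicit in the code rather than leaving them to a warning.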

T-Test

prostate_factors %>% 
  t_test(formula = preop_psa ~ aa,
         detailed = TRUE)

## # A tibble: 1 × 15
##   estimate estimate1 estimate2 .y.    group1 group2    n1    n2 statistic      p
## *    <dbl>     <dbl>     <dbl> <chr>  <chr>  <chr>  <int> <int>     <dbl>  <dbl>
## 1    -1.89      7.86      9.75 preop… White  Afric…   259    54     -1.96 0.0534
## # ℹ 5 more variables: df <dbl>, conf.low <dbl>, conf.high <dbl>, method <chr>,
## #   alternative <chr>

Interpretation of T-Test

There’s a trend toward AA patients having higher PSA (9.75 vs 7.86), but it just misses statistical significance at the conventional 0.05 level (p = 0.053). This is a classic situation in clinical research: the difference may be real, but the study is likely underpowered to detect it, since there are only 54 AA patients vs 259 White patients. With a larger AA sample, this might reach significance.

Note that the echo = FALSE parameter was added to the code chunk for the plot in the Including Plots section to prevent printing of the R code that generated it.
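The printed t-test output truncates the degrees of freedom and confidence interval ("5 more variables"), and these help judge how close the result is to the threshold. The sketch below uses base R's t.test() on made-up values (not the prostate data) rather than the t_test() call in the report. It shows how those pieces can be pulled out of a two-sample test. The data and group sizes are invented for illustration.

```r
# Made-up values for illustration only; not the prostate data.
toy <- data.frame(
  preop_psa = c(5.1, 6.3, 7.8, 8.2, 6.9, 7.4, 9.6, 11.2, 8.8),
  aa = factor(c(rep("White", 6), rep("African-American", 3)),
              levels = c("White", "African-American"))
)

# Base R t.test() defaults to var.equal = FALSE, i.e. Welch's t-test,
# which does not assume equal variances in the two groups.
res <- t.test(preop_psa ~ aa, data = toy)

res$estimate   # mean of each group, in factor-level order
res$statistic  # t statistic
res$parameter  # Welch-Satterthwaite degrees of freedom
res$conf.int   # 95% confidence interval for the difference in means
res$p.value    # two-sided p-value
```

A confidence interval that barely includes zero conveys the same borderline result as a p-value just above 0.05, and it also shows the range of differences compatible with the data.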