Analysis of Prostate Data

This analysis uses a prostate dataset. Its variables include age, family history, prostate volume, T stage, and more.

In summary, the mean age was 61. Group means for age, prostate volume, and preoperative PSA, split by race and family history, are shown below.

prostate %>% 
  mutate(aa = factor(aa, levels = c(0,1), 
                     labels = c("White", "African-American"))) %>% 
  mutate(fam_hx = factor(fam_hx, levels = c(0,1), 
      labels = c("No Family History", "FHx of Prostate Cancer"))) ->
prostate_factors
prostate %>% 
  select(age, p_vol, preop_psa, aa, fam_hx) %>% 
  group_by(aa, fam_hx) %>% 
  summarize(across(age:preop_psa, ~ mean(.x, na.rm = TRUE)))
## `summarise()` has grouped output by 'aa'. You can override using the `.groups`
## argument.
## # A tibble: 4 × 5
## # Groups:   aa [2]
##      aa fam_hx   age p_vol preop_psa
##   <dbl>  <dbl> <dbl> <dbl>     <dbl>
## 1     0      0  61.8  56.9      8.06
## 2     0      1  59.5  57.3      7.22
## 3     1      0  60.7  54.3      9.90
## 4     1      1  60.1  51.4      8.71

Analysis of Summary

The summary suggests that White patients with a positive family history of prostate cancer have the highest mean prostate volume (57.3). This group is also the youngest on average (mean age 59.5) and has the lowest mean preoperative PSA (7.22). These differences may or may not be statistically significant, so the next step is to check the sample size in each group and run a statistical test.
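
Checking the sample size per group could be sketched as follows. This is a minimal illustration in base R on a small, invented data frame (`toy` is hypothetical and only mimics the `aa` and `fam_hx` columns of `prostate_factors`); with dplyr loaded, `count(aa, fam_hx)` on the real data should give the same kind of table.

```r
# Hypothetical toy data standing in for prostate_factors; all values are invented.
toy <- data.frame(
  aa = factor(c(0, 0, 0, 1, 1, 0), levels = c(0, 1),
              labels = c("White", "African-American")),
  fam_hx = factor(c(0, 1, 0, 0, 1, 0), levels = c(0, 1),
                  labels = c("No Family History", "FHx of Prostate Cancer"))
)

# Cross-tabulate the two grouping variables, then reshape to one row per group
# with the group size in column n.
group_sizes <- as.data.frame(table(aa = toy$aa, fam_hx = toy$fam_hx),
                             responseName = "n")
print(group_sizes)
```

Small cells in this table would be a reason to treat the corresponding group means with caution.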

Including Plots

The plot below shows preoperative PSA against prostate volume, with a linear fit, faceted by race and family history:

ggplot(prostate_factors) + 
  aes(x = p_vol, y = preop_psa, col = aa) + 
  geom_point() +
  geom_smooth(method = "lm") +
  facet_grid(aa ~ fam_hx) +
  labs(x = 'Prostate Volume', y = "Preoperative PSA",
       title = 'Relationship Between Prostate Volume and Preop PSA,\nSubdivided by Family History and Race') +
  theme(legend.position = "bottom")
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 11 rows containing non-finite values (stat_smooth).
## Warning: Removed 11 rows containing missing values (geom_point).

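A written interpretation of the graph is still to be added. Note from the warnings above that 11 rows with missing values were excluded from the plot.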
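
The warnings printed with the plot report 11 removed rows. One way to see where such rows come from is to count missing values in the two plotted columns. The sketch below uses base R on an invented data frame (`toy` and its `NA` positions are hypothetical); the same calls could be pointed at `prostate_factors`.

```r
# Hypothetical toy data; the values and NA positions are invented for illustration.
toy <- data.frame(
  p_vol     = c(45.2, NA, 60.1, 52.3, NA),
  preop_psa = c(6.1, 8.4, NA, 7.7, 9.0)
)

plot_cols <- c("p_vol", "preop_psa")

# Number of missing values in each plotted column.
na_per_column <- colSums(is.na(toy[, plot_cols]))
print(na_per_column)

# Number of rows missing at least one of the two variables;
# these are the rows a scatter plot of the two columns cannot draw.
rows_dropped <- sum(!complete.cases(toy[, plot_cols]))
print(rows_dropped)
```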


Statistical Testing: T-test

prostate_factors %>%
  t_test(formula = preop_psa ~ aa, detailed = TRUE)
## # A tibble: 1 × 15
##   estimate estima…¹ estim…² .y.   group1 group2    n1    n2 stati…³      p    df
## *    <dbl>    <dbl>   <dbl> <chr> <chr>  <chr>  <int> <int>   <dbl>  <dbl> <dbl>
## 1    -1.89     7.86    9.75 preo… White  Afric…   261    55   -1.96 0.0534  71.7
## # … with 4 more variables: conf.low <dbl>, conf.high <dbl>, method <chr>,
## #   alternative <chr>, and abbreviated variable names ¹​estimate1, ²​estimate2,
## #   ³​statistic

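The t-test compares mean preoperative PSA between White patients (n = 261, mean 7.86) and African-American patients (n = 55, mean 9.75). The estimated difference is -1.89 (t = -1.96, df = 71.7, p = 0.0534), which falls just short of significance at the 0.05 level. The confidence interval is cut off in the printed output, so it is not reported here.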
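
The fractional degrees of freedom (71.7) in the output suggest Welch's unequal-variance t-test, which is also the default of base R's `t.test()`. As a cross-check, the same comparison could be sketched in base R. The data frame below is invented (`toy` is hypothetical and only mimics the `preop_psa` and `aa` columns), so the numbers it produces say nothing about the real dataset.

```r
# Hypothetical toy data; the PSA values are invented for illustration.
toy <- data.frame(
  preop_psa = c(5.1, 6.3, 7.8, 4.9, 8.2, 9.5, 11.0, 7.4, 12.3, 10.1),
  aa = factor(rep(c("White", "African-American"), each = 5),
              levels = c("White", "African-American"))
)

# t.test() uses var.equal = FALSE by default, which gives Welch's test.
welch <- t.test(preop_psa ~ aa, data = toy)

print(welch$method)     # name of the test that was run
print(welch$estimate)   # group means, in factor-level order
print(welch$parameter)  # Welch-adjusted degrees of freedom
print(welch$p.value)
print(welch$conf.int)   # confidence interval for the difference in means
```

Printing `conf.int` explicitly avoids the truncation that hides the interval in a wide tibble.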