stent 30-365

Load the library

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Set the working directory and load the dataset

setwd("C:/Users/bkslu/Downloads/MATH 217")
stent30_365 <- read_csv("stent30_365.csv")
Rows: 451 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): group_type, outcome_30, outcome_365

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Scan the first six rows of the dataset

head(stent30_365)
# A tibble: 6 × 3
  group_type outcome_30 outcome_365
  <chr>      <chr>      <chr>      
1 treatment  stroke     stroke     
2 treatment  stroke     stroke     
3 treatment  stroke     stroke     
4 treatment  stroke     stroke     
5 treatment  stroke     stroke     
6 treatment  stroke     stroke     

Create a table of stroke and no-event counts at 365 days for the two groups

t_365 <- table(stent30_365$group_type, stent30_365$outcome_365)
t_365
           
            no event stroke
  control        199     28
  treatment      179     45
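
Raw counts are hard to compare when the groups differ in size, so it can help to convert each row to proportions. The following is a minimal sketch, assuming the counts printed above; it rebuilds the table as a matrix so the example runs without the CSV file, and the name row_props is only illustrative.

```r
# Rebuild the 2 x 2 table from the counts printed above
t_365 <- matrix(
  c(199, 28,
    179, 45),
  nrow = 2, byrow = TRUE,
  dimnames = list(group_type  = c("control", "treatment"),
                  outcome_365 = c("no event", "stroke"))
)

# margin = 1 divides each cell by its row total, so each row sums to 1
row_props <- prop.table(t_365, margin = 1)
round(row_props, 3)
```

With the real data, prop.table(t_365, margin = 1) can be applied directly to the table created by table().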

Count stroke outcomes by control or treatment group

This is similar to the previous step, though the result is a tibble with one row per combination rather than a two-way table. Notice that a new variable, n, is created to hold the counts.

stent30_365 %>%
  count(group_type, outcome_365)
# A tibble: 4 × 3
  group_type outcome_365     n
  <chr>      <chr>       <int>
1 control    no event      199
2 control    stroke         28
3 treatment  no event      179
4 treatment  stroke         45

Now use “group_by” to do the same thing

stent30_365 %>% 
  group_by(group_type) %>% 
  count(outcome_365)
# A tibble: 4 × 3
# Groups:   group_type [2]
  group_type outcome_365     n
  <chr>      <chr>       <int>
1 control    no event      199
2 control    stroke         28
3 treatment  no event      179
4 treatment  stroke         45
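
The two approaches give the same counts; the difference is that the group_by version keeps the result grouped by group_type, which matters when a later step such as mutate() computes totals within each group. Here is a small sketch of that comparison. The tibble stent_demo is a hypothetical stand-in rebuilt from the counts printed above, not the original CSV.

```r
library(dplyr)

# Hypothetical stand-in for stent30_365, rebuilt from the printed counts
stent_demo <- tibble(
  group_type  = rep(c("control", "treatment"), times = c(227, 224)),
  outcome_365 = rep(c("no event", "stroke", "no event", "stroke"),
                    times = c(199, 28, 179, 45))
)

# count() with two variables returns an ungrouped tibble
by_count <- stent_demo %>%
  count(group_type, outcome_365)

# group_by() followed by count() returns a tibble still grouped by group_type
by_group <- stent_demo %>%
  group_by(group_type) %>%
  count(outcome_365)

is_grouped_df(by_count)              # grouping status of the first result
is_grouped_df(by_group)              # grouping status of the second result
identical(by_count$n, by_group$n)    # the counts themselves match
```

Calling ungroup() on the second result removes the grouping when it is no longer needed.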

Calculate the proportion of patients who had a stroke in each group

prop1 <- stent30_365  %>%
  # Group by group_type so that sum(n) is computed within each group
  group_by(group_type) %>%
  count(outcome_365) %>%
  # Create new variable, prop, using mutate
  mutate(prop = n/sum(n)) %>%
  # Keep only the stroke rows
  filter(outcome_365 == "stroke")
prop1 
# A tibble: 2 × 4
# Groups:   group_type [2]
  group_type outcome_365     n  prop
  <chr>      <chr>       <int> <dbl>
1 control    stroke         28 0.123
2 treatment  stroke         45 0.201

The result: after one year, 20% of patients in the treatment group had a stroke, compared with 12% in the control group.

These two summary statistics are useful in looking for differences between the groups, and we are in for a surprise: the stroke rate in the treatment group was about 8 percentage points higher than in the control group! This is important for two reasons. First, it is contrary to what doctors expected, which was that stents would reduce the rate of strokes. Second, it leads to a statistical question: do the data show a “real” difference between the groups?

We will answer this question later in the course.
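
The gap between the two groups can be checked directly from the counts. This is a minimal sketch using the numbers printed in the tables above; the variable names (strokes, totals, props, diff_prop) are illustrative and not part of the original analysis. It only computes the observed difference and says nothing about whether that difference is "real".

```r
# Stroke counts and group sizes taken from the tables above
strokes <- c(control = 28, treatment = 45)
totals  <- c(control = 199 + 28, treatment = 179 + 45)

# Proportion of patients with a stroke within each group
props <- strokes / totals
round(props, 3)

# Observed difference in proportions, treatment minus control
diff_prop <- props[["treatment"]] - props[["control"]]
round(diff_prop, 3)
```

The difference comes out to roughly 0.078, which is the "about 8 percentage points" described in the text.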

Now explore the loan50 dataset. Load it first.

library(openintro)
Loading required package: airports
Loading required package: cherryblossom
Loading required package: usdata
data("loan50")

View the dataset

head(loan50)
# A tibble: 6 × 18
  state emp_length  term homeownership annual_income verified_income
  <fct>      <dbl> <dbl> <fct>                 <dbl> <fct>          
1 NJ             3    60 rent                  59000 Not Verified   
2 CA            10    36 rent                  60000 Not Verified   
3 SC            NA    36 mortgage              75000 Verified       
4 CA             0    36 rent                  75000 Not Verified   
5 OH             4    60 mortgage             254000 Not Verified   
6 IN             6    36 mortgage              67000 Source Verified
# ℹ 12 more variables: debt_to_income <dbl>, total_credit_limit <int>,
#   total_credit_utilized <int>, num_cc_carrying_balance <int>,
#   loan_purpose <fct>, loan_amount <int>, grade <fct>, interest_rate <dbl>,
#   public_record_bankrupt <int>, loan_status <fct>, has_second_income <lgl>,
#   total_income <dbl>