Rows: 451 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): group_type, outcome_30, outcome_365
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 4 × 3
# Groups: group_type [2]
group_type outcome_365 n
<chr> <chr> <int>
1 control no event 199
2 control stroke 28
3 treatment no event 179
4 treatment stroke 45
Calculate the proportion of patients who had a stroke in each of the two groups
prop1 <- stent30_365 %>%
  # Group by group_type
  group_by(group_type) %>%
  count(outcome_365) %>%
  # Create new variable, prop, using mutate
  mutate(prop = n / sum(n)) %>%
  # Keep only the stroke rows
  filter(outcome_365 == "stroke")
prop1
# A tibble: 2 × 4
# Groups: group_type [2]
group_type outcome_365 n prop
<chr> <chr> <int> <dbl>
1 control stroke 28 0.123
2 treatment stroke 45 0.201
The result is that, after one year, 20% of patients in the treatment group had a stroke, compared with 12% in the control group.
These two summary statistics are useful in looking for differences in the groups, and we are in for a surprise: an additional 8% of patients in the treatment group had a stroke! This is important for two reasons. First, it is contrary to what doctors expected, which was that stents would reduce the rate of strokes. Second, it leads to a statistical question: do the data show a “real” difference between the groups?
We will answer this question later in the course.
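The 8% gap quoted above can be checked directly from the counts in the one-year summary table. The following is a minimal sketch, not part of the original analysis: it rebuilds the four counts by hand in a data frame called counts (a name introduced here for illustration) and uses base R only, so it runs without stent30_365 or dplyr being loaded.

```r
# Counts copied from the one-year summary table shown earlier
counts <- data.frame(
  group_type  = c("control", "control", "treatment", "treatment"),
  outcome_365 = c("no event", "stroke", "no event", "stroke"),
  n           = c(199, 28, 179, 45),
  stringsAsFactors = FALSE
)

# Total number of patients in each group
group_totals <- tapply(counts$n, counts$group_type, sum)

# Proportion of each group that had a stroke within one year
strokes <- counts[counts$outcome_365 == "stroke", ]
stroke_prop <- setNames(
  strokes$n / as.numeric(group_totals[strokes$group_type]),
  strokes$group_type
)

# Difference in proportions: treatment minus control
diff_prop <- stroke_prop[["treatment"]] - stroke_prop[["control"]]

round(stroke_prop, 3)
round(diff_prop, 3)
```

The difference comes out at roughly 0.078, which is the "additional 8%" described above. Whether a gap of that size could plausibly arise by chance is the statistical question left for later.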
Now explore the loan50 dataset. Load it first
library(openintro)
Loading required package: airports
Loading required package: cherryblossom
Loading required package: usdata
data("loan50")
View the first six rows of the dataset
head(loan50)
# A tibble: 6 × 18
state emp_length term homeownership annual_income verified_income
<fct> <dbl> <dbl> <fct> <dbl> <fct>
1 NJ 3 60 rent 59000 Not Verified
2 CA 10 36 rent 60000 Not Verified
3 SC NA 36 mortgage 75000 Verified
4 CA 0 36 rent 75000 Not Verified
5 OH 4 60 mortgage 254000 Not Verified
6 IN 6 36 mortgage 67000 Source Verified
# ℹ 12 more variables: debt_to_income <dbl>, total_credit_limit <int>,
# total_credit_utilized <int>, num_cc_carrying_balance <int>,
# loan_purpose <fct>, loan_amount <int>, grade <fct>, interest_rate <dbl>,
# public_record_bankrupt <int>, loan_status <fct>, has_second_income <lgl>,
# total_income <dbl>