0 — 4792 1-2 — 925 3-5 — 493 6-9 — 311 10-19 — 373 20-29 — 298 30 — 827 did not drive — 4646 NA — 918
## # A tibble: 9 × 2
## text_while_driving_30d n
## <chr> <int>
## 1 0 4792
## 2 1-2 925
## 3 10-19 373
## 4 20-29 298
## 5 3-5 493
## 6 30 827
## 7 6-9 311
## 8 did not drive 4646
## 9 <NA> 918
ggplot(yrbss, aes(x=text_while_driving_30d)) +
geom_bar(fill='red') + labs(x='Team') +
labs(title = "Texting While Driving the Past 30 Days")0.07214901 is the proportion of people who texted while driving all 30 days and never wore helmets.
allTextingAndNoHelmet <- nrow(yrbss[(yrbss$helmet_12m == 'never') & (yrbss$text_while_driving_30d == '30'), ])
totalPeople <- nrow(yrbss)
prop <- allTextingAndNoHelmet / totalPeople
prop## [1] 0.07214901
0.004440036 is the margin of error for the yrbss survey’s non-helmet wearers that texted while driving the past 30 days.
## [1] 0.004440036
Confidence Interval for less than 1 hour of TV per school day: (0.2951302, 0.3102341) Confidence Interval for American Indian or Alaska Native: (0.0267211, 0.03349415)
set.seed(12345)
yrbss <-
yrbss |>
mutate(lowTV = ifelse((hours_tv_per_school_day == "<1") | (hours_tv_per_school_day == "do not watch") , "yes", "no"))
yrbss |>
filter(!is.na(lowTV)) |>
specify(response = lowTV, success = "yes") |>
generate(reps = 1000, type = "bootstrap") |>
calculate(stat = "prop") |>
get_ci(level = 0.95)## # A tibble: 1 × 2
## lower_ci upper_ci
## <dbl> <dbl>
## 1 0.295 0.310
yrbss <-
yrbss |>
mutate(amerIndianAlaskaNative = ifelse(race == "American Indian or Alaska Native" , "yes", "no"))
yrbss |>
filter(!is.na(amerIndianAlaskaNative)) |>
specify(response = amerIndianAlaskaNative, success = "yes") |>
generate(reps = 1000, type = "bootstrap") |>
calculate(stat = "prop") |>
get_ci(level = 0.95)## # A tibble: 1 × 2
## lower_ci upper_ci
## <dbl> <dbl>
## 1 0.0267 0.0335
p and me have a parabolic distribution. The margin of error is highest when p has a value of 0.50.
n <- 1000
p <- seq(from = 0, to = 1, by = 0.01)
me <- 2 * sqrt(p * (1 - p)/n)
dd <- data.frame(p = p, me = me)
ggplot(data = dd, aes(x = p, y = me)) +
geom_line() +
labs(title = "Margin of Error for Population Proportions",
x = "Population Proportion", y = "Margin of Error")The center is near 0.1. The spread is from about 0.05 to 0.15. The shape is a bell-curve but with two peaks close together.
The shape and spread remained about the same, but the center shifted closer to the new p.
When p was constant and n decreased, the spread got wider and the center remained about the same. The shape was also similar.
H0: People who sleep 10+ hours strength train every day at the same percentage as people who sleep less than 10 hours. HA: People who sleep 10+ hours strength train every day more often than people who sleep less than 10 hours.
The two groups did not sleep the same. People who slept 10+ hours had a higher percentage of people strength training every day of the week. Therefore, I would reject the null hypothesis. The confidence interval is (0.005210003, 0.008257855). Since 0 is not part of the range, I would say the results support the alternate hypothesis. However, 0 is very close to the range, so I would want more evidence to claim the alternate hypothesis is true. It does not seem to be a significant difference.
set.seed(1001)
tenPlusStrength <-
yrbss |>
filter(school_night_hours_sleep == "10+") |>
count(strength_training_7d)
tenPlusStrength |>
mutate(percent = tenPlusStrength$n / sum(tenPlusStrength$n))## # A tibble: 9 × 3
## strength_training_7d n percent
## <int> <int> <dbl>
## 1 0 100 0.316
## 2 1 17 0.0538
## 3 2 31 0.0981
## 4 3 31 0.0981
## 5 4 18 0.0570
## 6 5 23 0.0728
## 7 6 8 0.0253
## 8 7 84 0.266
## 9 NA 4 0.0127
noTenPlusStrength <-
yrbss |>
filter(school_night_hours_sleep != "10+") |>
count(strength_training_7d)
noTenPlusStrength |>
mutate(percent = noTenPlusStrength$n / sum(noTenPlusStrength$n))## # A tibble: 9 × 3
## strength_training_7d n percent
## <int> <int> <dbl>
## 1 0 3484 0.290
## 2 1 979 0.0815
## 3 2 1260 0.105
## 4 3 1418 0.118
## 5 4 1027 0.0854
## 6 5 1289 0.107
## 7 6 492 0.0409
## 8 7 1958 0.163
## 9 NA 112 0.00932
yrbss <-
yrbss |>
mutate(strengthTraning = ifelse((strength_training_7d == "7") & (school_night_hours_sleep == "10+") , "yes", "no"))
yrbss |>
filter(!is.na(strengthTraning)) |>
specify(response = strengthTraning, success = "yes") |>
generate(reps = 1000, type = "bootstrap") |>
calculate(stat = "prop") |>
get_ci(level = 0.95)## # A tibble: 1 × 2
## lower_ci upper_ci
## <dbl> <dbl>
## 1 0.00521 0.00826
Since the significance level is 0.05, the probability of a type 1 error is 5%.
9604 people
## [1] 9604