## [1] "C:/Users/Jerome/Documents/From_Toshiba_HD_Work_Files/0000_Montgomery_College/Math_217/Week_10/201108_Math217_Lab6"
Exercise 1 Counts in each category of texted while driving
table(yrbss$text_while_driving_30d)
##
## 0 1-2 10-19 20-29 3-5
## 4792 925 373 298 493
## 30 6-9 did not drive
## 827 311 4646
Exercise 2 Proportion of people who have texted while driving every day in the past 30 days and never wear helmets
no_helmet <- yrbss %>%
filter(helmet_12m =="never")
no_helmet <- no_helmet %>%
mutate(text_ind = ifelse(text_while_driving_30d == "30", "yes", "no"))
table (no_helmet$text_ind)
##
## no yes
## 6040 463
## [1] 6.63609
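Counting all 6,977 respondents who never wear helmets, including the 474 with no texting response, 463/6977 = 6.636% texted while driving every day in the past 30 days. Among the 6,503 who answered the texting question, the proportion is 463/6503 = 7.12%.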
Exercise 3 - Margin of error for the estimated proportion of non-helmet wearers who have texted while driving each day for the past 30 days, based on this survey
no_helmet %>%
specify(response = text_ind, success = "yes") %>%
generate(reps = 1000, type = "bootstrap") %>%
calculate(stat = "prop") %>%
get_ci(level = 0.95)
## Warning: Removed 474 rows containing missing values.
## # A tibble: 1 x 2
## lower_ci upper_ci
## <dbl> <dbl>
## 1 0.0649 0.0778
But this code did not calculate a proportion; it only gave the CI. I deleted the get_ci command and got an error: it wouldn’t run. I looked at your code to calculate the margin of error in the Chapter 6 notes, but I have data in this problem with which I could compute the margin of error. text_ind is a categorical variable, so I can’t calculate a mean and sd. So how do I do this?
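One possible way to get both the point estimate and a margin of error for a yes/no variable is the normal-approximation formula for a proportion. The sketch below is not taken from the lab or the Chapter 6 notes. It plugs in the counts from the Exercise 2 table (463 yes, 6040 no), and the names `p_hat`, `se`, and `me` are made up for illustration.

```r
# Counts taken from table(no_helmet$text_ind) above (missing values excluded)
yes <- 463
no  <- 6040
n   <- yes + no

p_hat <- yes / n                        # sample proportion
se    <- sqrt(p_hat * (1 - p_hat) / n)  # standard error of a proportion
me    <- qnorm(0.975) * se              # 95% margin of error (normal approximation)

p_hat
me
p_hat + c(-me, me)                      # approximate 95% confidence interval
```

Half the width of the bootstrap interval above, (0.0778 - 0.0649) / 2, is another rough estimate of the same margin of error.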
Use R code to solve this:
n <-              # Use the sample size from the problem
mean_length <-    # Use the mean value from the problem
s <-              # Use the st. dev. from the problem
SE <- s/sqrt(n)
E <- qt(.975, df = n - 1) * SE    # margin of error
mean_length + c(-E, E)            # This will give the lower and upper bounds of the CI
Exercise 4
Create New Variables
hrs_tv <- yrbss %>%
filter(hours_tv_per_school_day == "do not watch")
hrs_tv <- hrs_tv %>%
mutate(text_ind = ifelse(text_while_driving_30d == "30", "yes", "no"))
hrs_tv %>%
specify(response = text_ind, success = "yes") %>%
generate(reps = 1000, type = "bootstrap") %>%
calculate(stat = "prop") %>%
get_ci(level = 0.95)
## Warning: Removed 118 rows containing missing values.
## # A tibble: 1 x 2
## lower_ci upper_ci
## <dbl> <dbl>
## 1 0.0540 0.0772
no_ride <- yrbss %>%
filter(helmet_12m == "did not ride")
no_ride <- no_ride %>%
mutate(text_ind = ifelse(text_while_driving_30d == "30", "yes", "no"))
no_ride %>%
specify(response = text_ind, success = "yes") %>%
generate(reps = 1000, type = "bootstrap") %>%
calculate(stat = "prop") %>%
get_ci(level = 0.95)
## Warning: Removed 324 rows containing missing values.
## # A tibble: 1 x 2
## lower_ci upper_ci
## <dbl> <dbl>
## 1 0.0622 0.0776
Exercise 5
n <- 1000
p <- seq(from = 0, to = 1, by =0.01)
me <- 2 * sqrt(p * (1 - p)/n)
dd <- data.frame(p = p, me = me)
ggplot(data = dd, aes (x = p, y = me)) +
geom_line() +
labs(x = "Population Proportion", y = "Margin of Error")

The answer to question 5 seems to be that the margin of error is greatest when the population proportion is 0.50 and falls to zero as the proportion approaches 0 or 1.0. A proportion of 0 or 1 would mean that everyone in the population gives the same answer, which is rarely true of a question worth surveying. Since the true proportion is unknown in practice, p = 0.50 serves as the conservative worst case.
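The reading of the plot can be double-checked numerically. This is a small sketch that reuses the same `n`, `p`, and `me` definitions as the Exercise 5 code; nothing here comes from the lab beyond that formula.

```r
n  <- 1000
p  <- seq(from = 0, to = 1, by = 0.01)
me <- 2 * sqrt(p * (1 - p) / n)

p[which.max(me)]   # proportion with the largest margin of error
max(me)            # size of that margin of error
range(me)          # smallest and largest values on the grid
```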
Exercise 6
Describe the sampling distribution of sample proportions at n=300 and p=0.1. Be sure to note the center, spread, and shape.
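# Center: expected number of successes, n * p
# (the sample proportion itself is centered at p)
n <- 300
p <- 0.1
center <- n * p
# Spread: standard error of the sample proportion
se <- sqrt(p * (1 - p) / n)
center
se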
## [1] 30
## [1] 0.01732051
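To describe center, spread, and shape together, one option is to simulate the sampling distribution directly. The sketch below is an illustration, not the lab's own method: the seed, the number of repetitions `reps`, and the name `p_hats` are arbitrary choices.

```r
set.seed(217)        # arbitrary seed so the sketch is reproducible
n    <- 300
p    <- 0.1
reps <- 10000        # number of simulated samples (illustrative choice)

# Each draw is the number of successes in a sample of size n;
# dividing by n turns the count into a sample proportion
p_hats <- rbinom(reps, size = n, prob = p) / n

mean(p_hats)                 # center: should be close to p
sd(p_hats)                   # spread: should be close to sqrt(p * (1 - p) / n)
c(n * p, n * (1 - p))        # success-failure check: both should be at least 10
hist(p_hats, breaks = 30)    # shape: roughly bell-shaped
```

Because both n * p and n * (1 - p) are at least 10, the distribution of sample proportions is approximately normal, centered at p with standard error sqrt(p * (1 - p) / n).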
Exercise 7 - Change p
# Center: expected number of successes, n * p
n <- 300
p <- 0.5
center <- n * p
# Spread: standard error of the sample proportion
se <- sqrt(p * (1 - p) / n)
center
se
## [1] 150
## [1] 0.02886751
Exercise 8 - Change n
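# Center: expected number of successes, n * p (smaller sample)
n <- 200
p <- 0.1
center <- n * p
# Spread: standard error of the sample proportion
se <- sqrt(p * (1 - p) / n)
center
se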
## [1] 20
## [1] 0.0212132
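# Same calculation with a larger sample (m plays the role of n, o the role of p)
m <- 500
o <- 0.1
center <- m * o
# Spread: standard error of the sample proportion
se <- sqrt(o * (1 - o) / m)
center
se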
## [1] 50
## [1] 0.01341641
# Plot the standard error across a grid of proportions (n is still 200 here);
# a single (p, se) pair gives geom_line() nothing to connect
p_grid <- seq(from = 0, to = 1, by = 0.01)
se_df <- data.frame(p = p_grid, se = sqrt(p_grid * (1 - p_grid) / n))
ggplot(data = se_df, aes(x = p, y = se)) +
  geom_line() +
  labs(x = "Population Proportion", y = "Standard Error")

Exercise 9 There’s a lot of code here that is irrelevant. I have no idea what the Lab wants me to do.
I finally found the prop.test command and ran that.
table(yrbss$strength_training_7d)
##
## 0 1 2 3 4 5 6 7
## 3632 1012 1305 1468 1059 1333 513 2085
table(yrbss$school_night_hours_sleep)
##
## <5 10+ 5 6 7 8 9
## 965 316 1480 2658 3461 2692 763
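# Keep respondents who strength trained on 4 or more of the past 7 days.
# Comparing against the string "4" works only because every value is a single digit.
strength <- yrbss %>%
  filter(strength_training_7d >= "4")
# This is also a string comparison: "10+" sorts before "8", so only the
# 8-hour and 9-hour groups are kept
sleep <- yrbss %>%
  filter(school_night_hours_sleep >= "8")
# "yes" = trained 6 or 7 days, "no" = trained 4 or 5 days
strength <- strength %>%
  mutate(text_ind = ifelse(strength_training_7d > "5", "yes", "no"))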
strength %>%
specify(response = text_ind, success = "yes") %>%
generate(reps = 1000, type = "bootstrap") %>%
calculate(stat = "prop") %>%
get_ci(level = 0.95)
## # A tibble: 1 x 2
## lower_ci upper_ci
## <dbl> <dbl>
## 1 0.507 0.534
##
## no yes
## 2392 2598
sleep <- sleep %>%
mutate(text_ind = ifelse(school_night_hours_sleep > "8", "yes", "no"))
##
## no yes
## 2692 763
sleep %>%
specify(response = text_ind, success = "yes") %>%
generate(reps = 1000, type = "bootstrap") %>%
calculate(stat = "prop") %>%
get_ci(level = 0.95)
## # A tibble: 1 x 2
## lower_ci upper_ci
## <dbl> <dbl>
## 1 0.206 0.234
prop.test (c(2598, 763), c(4990,3455))
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(2598, 763) out of c(4990, 3455)
## X-squared = 764.6, df = 1, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.2799752 0.3196286
## sample estimates:
## prop 1 prop 2
## 0.5206413 0.2208394
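To see where the prop.test interval comes from, the calculation can be redone by hand. This sketch assumes the default two-sided 95% interval with continuity correction, which is what the output above reports; the variable names are illustrative.

```r
x <- c(2598, 763)    # "yes" counts from the two tables above
n <- c(4990, 3455)   # group sizes

p_hat <- x / n
d     <- p_hat[1] - p_hat[2]                  # difference in sample proportions
se    <- sqrt(sum(p_hat * (1 - p_hat) / n))   # unpooled standard error
cc    <- 0.5 * sum(1 / n)                     # continuity correction
width <- qnorm(0.975) * se + cc               # margin of error for the difference

ci <- d + c(-width, width)
ci
```

The margin of error for the difference is `width`, and the interval agrees with the one prop.test printed.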
Problem 10 is likewise confusing. What is the question asking? A difference in proportion between those who sleep 10 or more hours per night and those who strength train 7 days/week? But that’s not how the question is worded. Does it mean those who strength train 7 days/week vs. those who don’t, among everyone who sleeps 10+ hours per night?
All the work below (until Problem 11) is my attempt to figure this out, and I think I failed. There are 316 cases in the file in which sleep == 10+. I divided those 316 cases into those who strength trained 7 days/week (84 cases) and those who didn’t (232). The two proportions are clearly different (see the calculations), although they are complementary shares of the same 316 cases rather than two independent samples, so prop.test is only a rough check here. What’s the probability I could detect a change? A change in what? How much of a change? What’s changing? I still maintain the question is unclear. I reviewed the definition of a Type I error: rejecting a null hypothesis that is actually true. If there really is no difference, then any change I “detect” is a false alarm, and the chance of that equals the significance level. So my answer is 5%. A 95% chance of detecting a change that isn’t there would make no sense.
table(yrbss$strength_training_7d)
##
## 0 1 2 3 4 5 6 7
## 3632 1012 1305 1468 1059 1333 513 2085
table(yrbss$school_night_hours_sleep)
##
## <5 10+ 5 6 7 8 9
## 965 316 1480 2658 3461 2692 763
yrbss <- yrbss %>%
mutate(text_ind1 = ifelse(school_night_hours_sleep == "10+", "yes", "no"))
strength7 <- yrbss %>%
filter(text_ind1 == "yes")
table(strength7$strength_training_7d)
##
## 0 1 2 3 4 5 6 7
## 100 17 31 31 18 23 8 84
strength7 <- strength7 %>%
mutate(text_ind2 = ifelse(strength_training_7d == "7", "yes", "no"))
prop.test (c(232, 84), c(316,316))
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(232, 84) out of c(316, 316)
## X-squared = 136.77, df = 1, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.3963062 0.5404026
## sample estimates:
## prop 1 prop 2
## 0.7341772 0.2658228
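The Type I error reasoning for Problem 10 can be illustrated with a simulation: draw two groups from the same population many times and count how often a test at the 0.05 level flags a difference anyway. Everything in this sketch is an assumption made for illustration: the common proportion `p_null`, the group size, the seed, and the number of repetitions.

```r
set.seed(10)           # arbitrary seed for reproducibility
reps   <- 2000         # number of simulated studies (illustrative)
n      <- 316          # group size, borrowed from the 10+ hours group
p_null <- 0.27         # hypothetical common proportion in both groups

# Under the null hypothesis both groups share the same true proportion,
# so every rejection is a false alarm
p_values <- replicate(reps, {
  x1 <- rbinom(1, n, p_null)
  x2 <- rbinom(1, n, p_null)
  prop.test(c(x1, x2), c(n, n), correct = FALSE)$p.value
})

mean(p_values < 0.05)  # share of false alarms; should be near 0.05
```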
Problem 11
The answer is “it depends.” An assumption must be made about the proportion. If, for example, the proportion of churchgoers is assumed to be either 25% or 75%, the required sample size will be smaller than if the proportion of churchgoers approaches 50%. In other words, the further the assumed proportion is from 50%, the smaller the sample size needed.
As explained in Example 9.6.6 and Example 9.6.7, the way to approach this problem is to assume a proportion of 50%, which will give the maximum sample size needed. Using the Wilson correction, the sample size would be 9600 observations.
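The dependence on the assumed proportion can be sketched with the usual sample-size formula for a proportion. The write-up above does not state the target, so the 95% confidence level and the 1 percentage point margin of error below are assumptions, and `sample_size` is a hypothetical helper name.

```r
# ASSUMPTIONS (not stated in the write-up above): 95% confidence and a
# target margin of error of 1 percentage point
target_me <- 0.01
z         <- qnorm(0.975)

# Smallest n with z * sqrt(p * (1 - p) / n) <= target_me
sample_size <- function(p) ceiling(z^2 * p * (1 - p) / target_me^2)

sample_size(0.50)   # worst case
sample_size(0.25)   # same as 0.75 by symmetry
sample_size(0.75)
```

Under these assumptions the worst case comes out to 9604 without any correction. Subtracting the four pseudo-observations of a plus-four (Wilson-style) adjustment would give 9600, which may be how the figure above arose, though that is a guess.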
