Question 1

# import and clean dataset

data(mtcars)
mtcars_clean <- na.omit(mtcars)


# 95% confidence interval

x_bar <- mean(mtcars_clean$mpg)
stdv <- sd(mtcars_clean$mpg)
size <- length(mtcars_clean$mpg)
alpha <- 0.05
# critical value: qt() is the t quantile function (dt() is the density)
t_star <- qt(1 - alpha/2, df = size - 1)
margin_error <- t_star * (stdv/sqrt(size))

ci_lower <- x_bar - margin_error
ci_upper <- x_bar + margin_error


print(round(ci_lower, 2))
## [1] 17.92
print(round(ci_upper, 2))
## [1] 22.26
# 90% confidence interval

alpha_2 <- 0.10
# critical value from the t quantile function qt(), not the density dt()
t_star_2 <- qt(1 - alpha_2/2, df = size - 1)
margin_error_2 <- t_star_2 * (stdv/sqrt(size))

ci_lower_2 <- x_bar - margin_error_2
ci_upper_2 <- x_bar + margin_error_2

print(round(ci_lower_2, 2))
## [1] 18.28
print(round(ci_upper_2, 2))
## [1] 21.9
# 99% confidence interval

alpha_3 <- 0.01
# critical value from the t quantile function qt(), not the density dt()
t_star_3 <- qt(1 - alpha_3/2, df = size - 1)
margin_error_3 <- t_star_3 * (stdv/sqrt(size))

ci_lower_3 <- x_bar - margin_error_3
ci_upper_3 <- x_bar + margin_error_3

print(round(ci_lower_3, 2))
## [1] 17.17
print(round(ci_upper_3, 2))
## [1] 23.01
  1. Question: How does the width of the interval change as the confidence level increases? Explain.

As the confidence level increases, the width of the confidence interval increases. The sample mean and the standard error stay the same, but a higher confidence level requires a larger critical value t*, which makes the margin of error larger. To be more confident that the interval captures the true mean, we have to accept a wider range of plausible values.
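
The pattern can be cross-checked with R's built-in `t.test()`, which returns the t-based interval for a given `conf.level`. This is only a sketch for verification, not part of the assigned solution; the helper name `ci_width` and the vector `conf_levels` are made up for illustration.

```r
data(mtcars)

# Hypothetical helper: width of the t-based confidence interval for the mean
ci_width <- function(x, level) {
  ci <- t.test(x, conf.level = level)$conf.int
  ci[2] - ci[1]
}

conf_levels <- c(0.90, 0.95, 0.99)
widths <- sapply(conf_levels, function(l) ci_width(mtcars$mpg, l))

# One row per confidence level; the width column should increase down the table
print(data.frame(level = conf_levels, width = round(widths, 2)))
```

If the manual calculation is correct, each width here should equal twice the corresponding margin of error.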

Question 2

screen_time <- c(6.2 , 7.1 , 5.8 , 6.5 , 7.9 , 6.3 , 5.5 , 7.4 , 6.0 , 6.7 ,
6.8 , 7.2 , 5.9 , 6.1 , 7.5 , 6.6 , 6.4 , 7.0 , 6.9 , 5.7)

x_bar_4 <- mean(screen_time)
stdv_2 <- sd(screen_time)
size_2 <- length(screen_time)
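# use this sample's own size and standard deviation, and qt() for the critical value
t_star_4 <- qt(1 - alpha/2, df = size_2 - 1)
margin_error_4 <- t_star_4 * (stdv_2/sqrt(size_2))

ci_lower_4 <- x_bar_4 - margin_error_4
ci_upper_4 <- x_bar_4 + margin_error_4

print(round(ci_lower_4, 2))
## [1] 6.27
print(round(ci_upper_4, 2))
## [1] 6.88

Interpret your confidence interval in context: What does it mean about the average daily screen time?

In this context, the 95% confidence interval (6.27, 6.88) means we are 95% confident that the true average daily screen time of the population lies between 6.27 and 6.88. The interval is a statement about the population mean, not about any one person's screen time: if we repeated the sampling many times, about 95% of the intervals built this way would contain the true mean.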
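
The "95% confident" wording can be illustrated with a small simulation: draw many samples from a population whose mean is known, build a 95% interval from each, and count how often the interval contains the true mean. The population values below (`true_mean`, `true_sd`) are assumptions chosen only for illustration, and the population is assumed to be normal.

```r
set.seed(1)

# Assumed population values, chosen only for illustration
true_mean <- 6.5
true_sd   <- 0.65
n         <- 20    # same sample size as the screen time data
n_sims    <- 2000  # number of simulated samples

covered <- replicate(n_sims, {
  s  <- rnorm(n, mean = true_mean, sd = true_sd)
  ci <- t.test(s, conf.level = 0.95)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

# Proportion of simulated intervals that contain the true mean
print(mean(covered))
```

Under these assumptions the printed proportion should land close to 0.95, which is what the confidence level describes.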

Extension: If your sample had twice as many people (n = 40) but a similar standard deviation, how would you expect the interval’s width to change?

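If the sample had twice as many people, the interval’s width would decrease, because the margin of error is t* times s/sqrt(n). With a similar standard deviation, doubling n divides the standard error by sqrt(2), so the interval would be roughly 71% as wide as before. It would shrink slightly more than that, since the critical value t* also gets a little smaller with more degrees of freedom. A larger sample gives a more precise estimate of the mean.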
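
A quick way to see the effect of sample size is to hold the standard deviation fixed and compare the interval widths for n = 20 and n = 40. The value of `s` below is an assumed standard deviation, and `width_for_n` is a hypothetical helper written for this sketch.

```r
# Assumed common standard deviation (illustrative value only)
s <- 0.65

# Hypothetical helper: full width of a t-based interval for the mean
width_for_n <- function(n, s, level = 0.95) {
  t_star <- qt(1 - (1 - level) / 2, df = n - 1)
  2 * t_star * s / sqrt(n)
}

w20 <- width_for_n(20, s)
w40 <- width_for_n(40, s)

# Ratio of the n = 40 width to the n = 20 width
width_ratio <- w40 / w20
print(width_ratio)
```

The ratio does not depend on the assumed `s`, because the standard deviation cancels out when the two widths are divided.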

Think critically: List at least two potential sources of bias that could affect the accuracy of your results.

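One potential problem is the assumption that the data are approximately normally distributed. The t-interval relies on either a roughly normal population or a sample large enough for the Central Limit Theorem to apply. With only 20 observations, strong skew or outliers could make the interval inaccurate. There could also be selection bias: if the people in the sample were not chosen at random, the sample would not represent the population, and the interval would be centered on the wrong value.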
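
The normality concern can be examined on the sample itself. The sketch below uses the Shapiro-Wilk test from base R's `stats` package as one possible check; it was not part of the original analysis, and with only 20 observations the test has limited power, so a large p-value does not prove normality.

```r
screen_time <- c(6.2, 7.1, 5.8, 6.5, 7.9, 6.3, 5.5, 7.4, 6.0, 6.7,
                 6.8, 7.2, 5.9, 6.1, 7.5, 6.6, 6.4, 7.0, 6.9, 5.7)

# Shapiro-Wilk test: a small p-value would suggest a departure from normality
sw <- shapiro.test(screen_time)
print(sw$p.value)
```

Selection bias cannot be detected this way, since it depends on how the sample was collected rather than on the values themselves.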