Question 1

Calculate a 95% confidence interval for the average length of pregnancies (weeks) and interpret it in context. Note that since you're doing inference on a single population parameter, there is no explanatory variable, so you can omit the x variable from the function.

inference(y = nc$weeks, est = "mean", type = "ci", null = 0,
alternative = "twosided", method = "theoretical")
## Single mean 
## Summary statistics:

## mean = 38.3347 ;  sd = 2.9316 ;  n = 998 
## Standard error = 0.0928 
## 95 % Confidence interval = ( 38.1528 , 38.5165 )

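We are 95% confident that the true mean length of pregnancy in the population these births were sampled from is between 38.15 and 38.52 weeks. Here, "95% confident" means that if we repeatedly drew samples of this size and built an interval from each one, about 95% of those intervals would capture the true mean.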

Question 2

Calculate a new confidence interval for the same parameter at the 90% confidence level. You can change the confidence level by adding a new argument to the function: conflevel = 0.90.

inference(y = nc$weeks, est = "mean", type = "ci", null = 0,
alternative = "twosided", method = "theoretical", conflevel = 0.90)
## Single mean 
## Summary statistics:

## mean = 38.3347 ;  sd = 2.9316 ;  n = 998 
## Standard error = 0.0928 
## 90 % Confidence interval = ( 38.182 , 38.4873 )
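
As a rough check, both intervals can be recomputed by hand from the summary statistics printed above. This is a minimal sketch that assumes the theoretical method uses a normal critical value, which is consistent with the reported numbers; `z_interval` is a hypothetical helper name, not part of the `inference` function.

```r
# Summary statistics copied from the inference() output above
x_bar <- 38.3347
s     <- 2.9316
n     <- 998

# Standard error of the sample mean
se <- s / sqrt(n)

# Hypothetical helper: normal-based confidence interval for a mean
z_interval <- function(center, se, conf_level) {
  z_star <- qnorm(1 - (1 - conf_level) / 2)  # two-sided critical value
  c(lower = center - z_star * se, upper = center + z_star * se)
}

ci_95 <- z_interval(x_bar, se, 0.95)
ci_90 <- z_interval(x_bar, se, 0.90)

print(round(ci_95, 4))
print(round(ci_90, 4))
```

The 90% interval is narrower than the 95% interval because a lower confidence level uses a smaller critical value, so the margin of error shrinks while the center stays at the sample mean.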

Question 3

Conduct a hypothesis test evaluating whether the average weight gained by younger mothers is different than the average weight gained by mature mothers.

inference(y = nc$weight, x = nc$mature, est = "mean", type = "ht", null = 0,
alternative = "twosided", method = "theoretical",
order = c("younger mom","mature mom"))
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_younger mom = 867, mean_younger mom = 7.0972, sd_younger mom = 1.4855
## n_mature mom = 133, mean_mature mom = 7.1256, sd_mature mom = 1.6591
## Observed difference between means (younger mom-mature mom) = -0.0283
## 
## H0: mu_younger mom - mu_mature mom = 0 
## HA: mu_younger mom - mu_mature mom != 0 
## Standard error = 0.152 
## Test statistic: Z =  -0.186 
## p-value =  0.8526

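Because the p-value (0.8526) is greater than 0.05, we fail to reject the null hypothesis. The data do not provide convincing evidence that the average weight differs between younger moms and mature moms. This is not proof that there is no difference, only that this sample does not show one. Note that the call above uses the weight variable, which records the baby's birth weight; the weight gained by the mother is recorded in gained, so answering the question as posed would require y = nc$gained.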
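
The standard error, test statistic, and p-value reported above can be approximated from the group summaries alone. The sketch below assumes the usual two-sample formula with unpooled variances and a standard normal reference distribution, which matches the Z statistic in the output; small differences in the last digit come from rounding in the printed summaries.

```r
# Summary statistics copied from the inference() output above
n_young  <- 867; mean_young  <- 7.0972; sd_young  <- 1.4855
n_mature <- 133; mean_mature <- 7.1256; sd_mature <- 1.6591

# Standard error of the difference between two independent sample means
se_diff <- sqrt(sd_young^2 / n_young + sd_mature^2 / n_mature)

# Test statistic for H0: mu_younger - mu_mature = 0
z_stat <- (mean_young - mean_mature) / se_diff

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(-abs(z_stat))

print(c(se = se_diff, z = z_stat, p = p_value))
```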

Question 4

Now, a non-inference task: Determine the age cutoff for younger and mature mothers. Use a method of your choice, and explain how your method works.

# Drop rows with missing values, then report the age range (mage) within each group
nc %>%
  group_by(mature) %>%
  na.omit() %>%
  summarise(min_age = min(mage),
            max_age = max(mage))
## # A tibble: 2 × 3
##   mature      min_age max_age
##   <fct>         <int>   <int>
## 1 mature mom       35      50
## 2 younger mom      15      34

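The code groups the births by the mature label and reports the minimum and maximum maternal age (mage) within each group. The oldest younger mom is 34 and the youngest mature mom is 35, and the two age ranges do not overlap, so the cutoff is 35: mothers aged 35 and above are classified as mature, and mothers aged 34 and below as younger.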
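
The same idea can be sketched in base R without dplyr. The data frame below is hypothetical toy data, not the real nc data set; it only illustrates how comparing per-group age ranges reveals the cutoff.

```r
# Hypothetical toy data, not the real `nc` data set
toy <- data.frame(
  mage   = c(19, 24, 34, 35, 41, 28, 37),
  mature = c("younger mom", "younger mom", "younger mom",
             "mature mom", "mature mom", "younger mom", "mature mom")
)

# Age range within each group: tapply() applies range() to each label
age_ranges <- tapply(toy$mage, toy$mature, range)
print(age_ranges)

# The cutoff is the youngest age labelled "mature mom";
# it is only meaningful if the two groups do not overlap
cutoff         <- min(toy$mage[toy$mature == "mature mom"])
oldest_younger <- max(toy$mage[toy$mature == "younger mom"])
stopifnot(oldest_younger < cutoff)

print(cutoff)
```

Checking that the oldest younger mom is below the youngest mature mom confirms that the label is a pure function of age, which is what makes a single cutoff well defined.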

Question 5

Pick a pair of numerical and categorical variables and come up with a research question evaluating the relationship between these variables. Formulate the question in a way that it can be answered using a hypothesis test and/or a confidence interval. Answer your question using the inference function, report the statistical results, and also provide an explanation in plain language.

Research question: is a mother's smoking habit associated with the average number of hospital visits she makes during pregnancy?

H0: The average number of hospital visits is the same for smokers and nonsmokers (mu_smoker - mu_nonsmoker = 0).
HA: The average number of hospital visits differs between smokers and nonsmokers (mu_smoker - mu_nonsmoker != 0).

inference(y = nc$visits, x = nc$habit, est = "mean", type = "ht", null = 0,
alternative = "twosided", method = "theoretical", order = c("smoker","nonsmoker"))
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_smoker = 125, mean_smoker = 11.336, sd_smoker = 4.1639
## n_nonsmoker = 866, mean_nonsmoker = 12.2159, sd_nonsmoker = 3.9139
## Observed difference between means (smoker-nonsmoker) = -0.8799
## 
## H0: mu_smoker - mu_nonsmoker = 0 
## HA: mu_smoker - mu_nonsmoker != 0 
## Standard error = 0.395 
## Test statistic: Z =  -2.225 
## p-value =  0.026

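Because the p-value (0.026) is less than 0.05, we reject the null hypothesis at the 5% significance level. The data provide evidence that the average number of hospital visits differs between smokers and nonsmokers: in this sample, smokers made about 0.88 fewer visits on average (11.34 versus 12.22). In plain language, mothers who smoke tended to visit the hospital slightly less often during pregnancy than mothers who do not. Because the data are observational, this shows an association and does not establish that smoking causes fewer visits.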
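
Since the question also allows a confidence interval, a 95% interval for the difference in means can be sketched from the summary statistics above. This assumes the same unpooled standard error and normal critical value as the test; it is a hand calculation, not output from the inference function.

```r
# Summary statistics copied from the inference() output above
n_smoker    <- 125; mean_smoker    <- 11.336;  sd_smoker    <- 4.1639
n_nonsmoker <- 866; mean_nonsmoker <- 12.2159; sd_nonsmoker <- 3.9139

# Observed difference (smoker - nonsmoker) and its standard error
diff_means <- mean_smoker - mean_nonsmoker
se_diff    <- sqrt(sd_smoker^2 / n_smoker + sd_nonsmoker^2 / n_nonsmoker)

# 95% normal-based confidence interval for the difference in means
z_star  <- qnorm(0.975)
ci_diff <- c(lower = diff_means - z_star * se_diff,
             upper = diff_means + z_star * se_diff)

print(round(ci_diff, 3))
```

The whole interval lies below zero, which agrees with rejecting the null hypothesis at the 5% level and also gives a sense of how large the difference in visits might plausibly be.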