Question 1
Calculate a 95% confidence interval for the average length of pregnancies (weeks) and interpret it in context. Note that since you're doing inference on a single population parameter, there is no explanatory variable, so you can omit the x variable from the function.
inference(y = nc$weeks, est = "mean", type = "ci", null = 0,
          alternative = "twosided", method = "theoretical")
## Single mean
## Summary statistics:
## mean = 38.3347 ; sd = 2.9316 ; n = 998
## Standard error = 0.0928
## 95 % Confidence interval = ( 38.1528 , 38.5165 )
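We are 95% confident that the true mean length of pregnancy in the population this sample represents lies between 38.15 and 38.52 weeks. Put another way, if we drew many samples of this size and built an interval from each one, about 95% of those intervals would capture the true mean.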
Question 2
Calculate a new confidence interval for the same parameter at the 90% confidence level. You can change the confidence level by adding a new argument to the function: conflevel = 0.90.
inference(y = nc$weeks, est = "mean", type = "ci", null = 0,
          alternative = "twosided", method = "theoretical", conflevel = 0.90)
## Single mean
## Summary statistics:
## mean = 38.3347 ; sd = 2.9316 ; n = 998
## Standard error = 0.0928
## 90 % Confidence interval = ( 38.182 , 38.4873 )
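As a cross-check, both intervals can be reproduced by hand from the summary statistics printed above. The sketch below assumes the interval has the usual normal-approximation form, mean ± z* × SE. The helper name `ci_mean` is introduced here for illustration only and is not part of the `inference` function; small differences in the last digit can arise because the inputs are the rounded values from the output.

```r
# Hypothetical helper: normal-approximation confidence interval for one mean,
# built from summary statistics (sample mean, sample sd, sample size).
ci_mean <- function(xbar, s, n, conf_level = 0.95) {
  se <- s / sqrt(n)                          # standard error of the mean
  z_star <- qnorm(1 - (1 - conf_level) / 2)  # two-sided critical value
  c(lower = xbar - z_star * se, upper = xbar + z_star * se)
}

# Summary statistics copied from the inference() output above
ci95 <- ci_mean(38.3347, 2.9316, 998, conf_level = 0.95)
ci90 <- ci_mean(38.3347, 2.9316, 998, conf_level = 0.90)

print(round(ci95, 2))
print(round(ci90, 2))
```

The 90% interval is narrower than the 95% interval because a lower confidence level uses a smaller critical value (about 1.645 instead of 1.96) with the same standard error.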
Question 3
Conduct a hypothesis test evaluating whether the average weight gained by younger mothers is different than the average weight gained by mature mothers.
inference(y = nc$weight, x = nc$mature, est = "mean", type = "ht", null = 0,
          alternative = "twosided", method = "theoretical",
          order = c("younger mom","mature mom"))
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_younger mom = 867, mean_younger mom = 7.0972, sd_younger mom = 1.4855
## n_mature mom = 133, mean_mature mom = 7.1256, sd_mature mom = 1.6591
## Observed difference between means (younger mom-mature mom) = -0.0283
##
## H0: mu_younger mom - mu_mature mom = 0
## HA: mu_younger mom - mu_mature mom != 0
## Standard error = 0.152
## Test statistic: Z = -0.186
## p-value = 0.8526
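The p-value of 0.8526 is greater than 0.05, so we fail to reject the null hypothesis: the data do not provide convincing evidence that the mean differs between younger and mature mothers. This is not the same as showing that no difference exists. Note also that the call above uses nc$weight, which is the baby's birth weight, not the weight gained by the mother, so the conclusion as computed applies to birth weight. Answering the question as posed would require rerunning the test with the mother's weight gain variable (assumed here to be gained) as y.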
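The standard error, test statistic, and p-value in the output above can be checked directly from the printed group summaries. This is a minimal sketch assuming a two-sided test based on the normal approximation, which matches the Z statistic the output reports; `two_mean_z` is a hypothetical helper name, and the inputs are the rounded values from the output, so the last digit may differ slightly.

```r
# Hypothetical helper: two-sample z test for a difference in means,
# computed from summary statistics of each group.
two_mean_z <- function(m1, s1, n1, m2, s2, n2) {
  se <- sqrt(s1^2 / n1 + s2^2 / n2)  # standard error of the difference
  z <- (m1 - m2) / se                # test statistic under H0: difference = 0
  p_value <- 2 * pnorm(-abs(z))      # two-sided p-value
  list(se = se, z = z, p_value = p_value)
}

# Group 1 = younger mom, group 2 = mature mom (values from the output above)
res <- two_mean_z(7.0972, 1.4855, 867, 7.1256, 1.6591, 133)
print(lapply(res, round, 3))
```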
Question 4
Now, a non-inference task: Determine the age cutoff for younger and mature mothers. Use a method of your choice, and explain how your method works.
library(dplyr)  # provides %>%, group_by(), and summarise()
nc %>% group_by(mature) %>% na.omit() %>%
  summarise(min_age = min(mage),
            max_age = max(mage))
## # A tibble: 2 × 3
## mature min_age max_age
## <fct> <int> <int>
## 1 mature mom 35 50
## 2 younger mom 15 34
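The method groups the data by mature and reports the minimum and maximum of mage (mother's age) within each group, so the boundary between the two ranges is the cutoff. Based on the summary table above, mothers aged 35 and older are classified as mature, and mothers aged 34 and younger are classified as younger. Because na.omit() drops every row with a missing value in any column, the minimum and maximum shown are for complete cases only. The cutoff itself should be unaffected, since the two groups meet at adjacent whole-number ages (34 and 35).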
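The same idea can be sketched in base R without dropping whole rows. The example below uses a small made-up data frame (`toy`, with invented ages) purely to illustrate the technique; it is not the nc data. Passing `na.rm = TRUE` to `range()` ignores only the missing ages rather than every row that has a missing value somewhere.

```r
# Toy data with made-up ages, used only to illustrate the technique
toy <- data.frame(
  mage   = c(19, 24, 34, 35, 41, NA),
  mature = c("younger mom", "younger mom", "younger mom",
             "mature mom", "mature mom", "younger mom")
)

# range() returns c(min, max); tapply() applies it within each group,
# and na.rm = TRUE is passed through to range()
age_ranges <- tapply(toy$mage, toy$mature, range, na.rm = TRUE)
print(age_ranges)
```

The cutoff is then read off as the gap between the largest age in the younger group and the smallest age in the mature group.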
Question 5
Pick a pair of numerical and categorical variables and come up with a research question evaluating the relationship between these variables. Formulate the question in a way that it can be answered using a hypothesis test and/or a confidence interval. Answer your question using the inference function, report the statistical results, and also provide an explanation in plain language.
Research question: is the average number of hospital visits different for mothers who smoke than for mothers who do not?
H0: The average number of hospital visits is the same for smokers and nonsmokers (mu_smoker - mu_nonsmoker = 0).
H1: The average number of hospital visits differs between smokers and nonsmokers (mu_smoker - mu_nonsmoker != 0).
inference(y = nc$visits, x = nc$habit, est = "mean", type = "ht", null = 0,
          alternative = "twosided", method = "theoretical",
          order = c("smoker","nonsmoker"))
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_smoker = 125, mean_smoker = 11.336, sd_smoker = 4.1639
## n_nonsmoker = 866, mean_nonsmoker = 12.2159, sd_nonsmoker = 3.9139
## Observed difference between means (smoker-nonsmoker) = -0.8799
##
## H0: mu_smoker - mu_nonsmoker = 0
## HA: mu_smoker - mu_nonsmoker != 0
## Standard error = 0.395
## Test statistic: Z = -2.225
## p-value = 0.026
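Because the p-value of 0.026 is less than 0.05, we reject the null hypothesis at the 5% significance level and conclude that the difference in the average number of hospital visits between smokers and nonsmokers is statistically significant. In plain language, mothers who smoke made about 0.88 fewer hospital visits on average than mothers who do not smoke (11.34 versus 12.22). Since these data are observational, this result shows an association; it does not establish that smoking causes fewer visits.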
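The question also allows a confidence interval, so the test result can be complemented with an interval for the difference in means. The sketch below builds a 95% interval by hand from the group summaries printed above, assuming the same normal approximation as the reported Z test; the variable names are chosen here for illustration, and the inputs are rounded values from the output.

```r
# Summary statistics copied from the inference() output above
mean_smoker    <- 11.336;  sd_smoker    <- 4.1639; n_smoker    <- 125
mean_nonsmoker <- 12.2159; sd_nonsmoker <- 3.9139; n_nonsmoker <- 866

# Point estimate and standard error for (smoker - nonsmoker)
diff_means <- mean_smoker - mean_nonsmoker
se <- sqrt(sd_smoker^2 / n_smoker + sd_nonsmoker^2 / n_nonsmoker)

# 95% normal-approximation confidence interval for the difference
z_star <- qnorm(0.975)
ci <- diff_means + c(-1, 1) * z_star * se
print(round(ci, 2))
```

The interval lies entirely below zero, which agrees with rejecting the null hypothesis of no difference at the 5% significance level.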