download.file("http://www.openintro.org/stat/data/nc.RData", destfile = "nc.RData")
load("nc.RData")

Question 1

Calculate a 95% confidence interval for the average length of pregnancies (weeks) and interpret it in context. Note that since you’re doing inference on a single population parameter, there is no explanatory variable, so you can omit the x variable from the function. Give a sentence interpreting the interval.

inference(y = nc$weight, est = "mean", type = "ci", null = 0, alternative = "twosided", method = "theoretical", conflevel = 0.95)
## Warning: package 'openintro' was built under R version 3.3.2
## Warning: package 'BHH2' was built under R version 3.3.2
## Single mean 
## Summary statistics:

## mean = 7.101 ;  sd = 1.5089 ;  n = 1000 
## Standard error = 0.0477 
## 95 % Confidence interval = ( 7.0075 , 7.1945 )

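The call above passes nc$weight (the baby's birth weight, in pounds) rather than nc$weeks, so the output describes birth weight, not length of pregnancy. The point estimate is 7.101 pounds, and we are 95% confident that the population mean birth weight is between 7.0075 and 7.1945 pounds. With n = 1000, the sampling distribution of the mean is approximately normal, which supports the theoretical method. Answering the question as posed requires y = nc$weeks.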
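The reported interval can be checked by hand from the summary statistics in the output. The sketch below assumes a normal critical value from qnorm, which is consistent with the printed standard error and interval. The helper name ci_from_summary is made up for illustration and is not part of the lab.

```r
# Hypothetical helper (not part of the lab): normal-approximation
# confidence interval for a single mean, from summary statistics alone.
ci_from_summary <- function(mean, sd, n, conflevel = 0.95) {
  se <- sd / sqrt(n)                     # standard error of the mean
  z  <- qnorm(1 - (1 - conflevel) / 2)   # two-sided critical value
  c(lower = mean - z * se, upper = mean + z * se)
}

# Summary statistics copied from the inference output
ci95 <- ci_from_summary(mean = 7.101, sd = 1.5089, n = 1000, conflevel = 0.95)
print(round(ci95, 4))
```

The bounds should agree with the interval printed by inference up to rounding.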

Question 2

Calculate a new confidence interval for the same parameter at the 90% confidence level. You can change the confidence level by adding a new argument to the function: conflevel = 0.90. Give a sentence interpreting the interval.

inference(y = nc$weight, est = "mean", type = "ci", null = 0, alternative = "twosided", method = "theoretical", conflevel = 0.90)
## Single mean 
## Summary statistics:

## mean = 7.101 ;  sd = 1.5089 ;  n = 1000 
## Standard error = 0.0477 
## 90 % Confidence interval = ( 7.0225 , 7.1795 )

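This call also passes nc$weight, so the interval is for birth weight in pounds rather than length of pregnancy. The point estimate is 7.101 pounds, and we are 90% confident that the population mean birth weight is between 7.0225 and 7.1795 pounds. This interval is narrower than the 95% interval (7.0075, 7.1945) because a lower confidence level uses a smaller critical value.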
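To see how the confidence level changes the interval, the margin of error can be recomputed for both levels from the same summary statistics. This is a sketch that assumes normal critical values from qnorm. The variable names are chosen here for illustration.

```r
# Summary statistics copied from the inference output
xbar <- 7.101
s    <- 1.5089
n    <- 1000
se   <- s / sqrt(n)                      # standard error of the mean

conflevels <- c(0.90, 0.95)
z      <- qnorm(1 - (1 - conflevels) / 2)  # two-sided critical values
margin <- z * se                           # margin of error per level

intervals <- data.frame(
  conflevel = conflevels,
  lower     = xbar - margin,
  upper     = xbar + margin,
  width     = 2 * margin
)
print(round(intervals, 4))
```

The 90% row should be the narrower of the two, since only the critical value differs between the rows.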

Question 3

Conduct a hypothesis test evaluating whether the average weight gained by younger mothers is different from the average weight gained by mature mothers. Steps 1 through 3 (state \(H_0\) and \(H_A\), do the test, find the p-value) are given in the output, but you need to state your conclusion both in terms of \(H_0\) and in terms of the question.

inference(y = nc$weight, x = nc$mature, est = "mean", type = "ht", null = 0, alternative = "twosided", method = "theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_mature mom = 133, mean_mature mom = 7.1256, sd_mature mom = 1.6591
## n_younger mom = 867, mean_younger mom = 7.0972, sd_younger mom = 1.4855
## Observed difference between means (mature mom-younger mom) = 0.0283
## 
## H0: mu_mature mom - mu_younger mom = 0 
## HA: mu_mature mom - mu_younger mom != 0 
## Standard error = 0.152 
## Test statistic: Z =  0.186 
## p-value =  0.8526

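The call above uses nc$weight (the baby's birth weight, in pounds) as the response rather than nc$gained, so the test compares mean birth weight between the two groups, not the mothers' weight gain.

\(H_0\): mean birth weight for mature mothers \(=\) mean birth weight for younger mothers

\(H_A\): mean birth weight for mature mothers \(\neq\) mean birth weight for younger mothers

The p-value is 0.8526, which is greater than the significance level of 0.05, so we fail to reject \(H_0\).

The data do not provide convincing evidence that the mean birth weight of babies born to younger mothers differs from that of babies born to mature mothers. Answering the question as posed, about weight gained, requires y = nc$gained.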
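The standard error, test statistic, and p-value in the output can be reproduced from the two groups' summary statistics. The sketch below assumes the unpooled two-sample Z formula, which matches the printed standard error. Small differences from the output come from the rounded inputs.

```r
# Summary statistics copied from the inference output
n1 <- 133; m1 <- 7.1256; s1 <- 1.6591   # mature mom
n2 <- 867; m2 <- 7.0972; s2 <- 1.4855   # younger mom

se      <- sqrt(s1^2 / n1 + s2^2 / n2)  # standard error of the difference
z       <- (m1 - m2 - 0) / se           # null value is 0
p_value <- 2 * pnorm(-abs(z))           # two-sided p-value

print(round(c(se = se, z = z, p_value = p_value), 4))
```

A Z statistic this close to 0 puts the observed difference well within ordinary sampling variation, which is why the p-value is large.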

Question 4

Now, a non-inference task: Determine the age cutoff for younger and mature mothers. Use a method of your choice, and explain how your method works. (There is no wrong answer here, just explain your logic.)
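One possible method is to compare the age ranges within each group: the cutoff lies between the oldest "younger mom" and the youngest "mature mom". The sketch below assumes the mother's age is stored in a column named mage (an assumption about the data set, not shown in the output above). The helper name and the toy data are made up for illustration and are not values from nc.

```r
# Hypothetical helper: minimum and maximum age within each maturity group.
age_range_by_group <- function(mage, mature) {
  groups <- split(mage, mature)          # one vector of ages per group
  data.frame(
    min_age = sapply(groups, min, na.rm = TRUE),
    max_age = sapply(groups, max, na.rm = TRUE)
  )
}

# Made-up toy data, for illustration only (NOT values from nc)
toy <- data.frame(
  mage   = c(19, 24, 31, 36, 40, 28),
  mature = c("younger mom", "younger mom", "younger mom",
             "mature mom", "mature mom", "younger mom")
)
toy_ranges <- age_range_by_group(toy$mage, toy$mature)
print(toy_ranges)

# On the lab data, the same call would be (assuming the column is named mage):
# age_range_by_group(nc$mage, nc$mature)
```

If the two ranges do not overlap, the cutoff is any age above the maximum for younger mothers and at or below the minimum for mature mothers.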

Question 5

Pick a pair of numerical and categorical variables and come up with a research question evaluating the relationship between these variables. Formulate the question in a way that it can be answered using a hypothesis test and/or a confidence interval. Answer your question using the inference function, report the statistical results, and also provide an explanation in plain language.

Research Question: Is there a significant difference in the length of pregnancy between smokers and non-smokers?

inference(y = nc$weeks, x = nc$habit, est = "mean", type = "ht", null = 0, alternative = "twosided", method = "theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_nonsmoker = 872, mean_nonsmoker = 38.3188, sd_nonsmoker = 2.9936
## n_smoker = 126, mean_smoker = 38.4444, sd_smoker = 2.4676
## Observed difference between means (nonsmoker-smoker) = -0.1256
## 
## H0: mu_nonsmoker - mu_smoker = 0 
## HA: mu_nonsmoker - mu_smoker != 0 
## Standard error = 0.242 
## Test statistic: Z =  -0.519 
## p-value =  0.6038

H0: mean length of pregnancy for non-smokers = mean length of pregnancy for smokers

HA: mean length of pregnancy for non-smokers ≠ mean length of pregnancy for smokers

Standard error: 0.242. P-value: 0.6038.

The p-value is 0.6038, which is greater than the significance level of 0.05, so we fail to reject H0.

Conclusion: The data do not provide convincing evidence that the mean length of pregnancy (in weeks) differs between smoking and non-smoking mothers.
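Since the research question can also be answered with a confidence interval, the interval for the difference in means can be sketched from the summary statistics in the output. This assumes the same unpooled standard error and a normal critical value. It is computed by hand here, not taken from inference output.

```r
# Summary statistics copied from the inference output
n_non <- 872; sd_non <- 2.9936   # nonsmoker
n_smk <- 126; sd_smk <- 2.4676   # smoker
diff_means <- -0.1256            # observed difference (nonsmoker - smoker)

se     <- sqrt(sd_non^2 / n_non + sd_smk^2 / n_smk)  # SE of the difference
z_star <- qnorm(0.975)                               # 95% critical value

ci_diff <- c(lower = diff_means - z_star * se,
             upper = diff_means + z_star * se)
print(round(ci_diff, 3))
```

An interval that contains 0 is consistent with failing to reject H0 in the two-sided test.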