# Download the 2004 North Carolina births data and load it (creates the data frame nc)
download.file("http://www.openintro.org/stat/data/nc.RData", destfile = "nc.RData")
load("nc.RData")
What are the cases in this data set? How many cases are there in our sample?
dim(nc)
## [1] 1000 13
Answer: There are 1000 cases in the sample, with 13 variables measured on each. Each case is a birth recorded in North Carolina in 2004.
summary(nc)
##       fage          mage            mature            weeks          premie
##  Min.   :14.00   Min.   :13     mature mom :133   Min.   :20.00   full term:846
##  1st Qu.:25.00   1st Qu.:22     younger mom:867   1st Qu.:37.00   premie   :152
##  Median :30.00   Median :27                       Median :39.00   NA's     :  2
##  Mean   :30.26   Mean   :27                       Mean   :38.33
##  3rd Qu.:35.00   3rd Qu.:32                       3rd Qu.:40.00
##  Max.   :55.00   Max.   :50                       Max.   :45.00
##  NA's   :171                                      NA's   :2
##     visits          marital          gained          weight
##  Min.   : 0.0   married    :386   Min.   : 0.00   Min.   : 1.000
##  1st Qu.:10.0   not married:613   1st Qu.:20.00   1st Qu.: 6.380
##  Median :12.0   NA's       :  1   Median :30.00   Median : 7.310
##  Mean   :12.1                     Mean   :30.33   Mean   : 7.101
##  3rd Qu.:15.0                     3rd Qu.:38.00   3rd Qu.: 8.060
##  Max.   :30.0                     Max.   :85.00   Max.   :11.750
##  NA's   :9                        NA's   :27
##  lowbirthweight    gender         habit         whitemom
##  low    :111     female:503   nonsmoker:873   not white:284
##  not low:889     male  :497   smoker   :126   white    :714
##                               NA's     :  1   NA's     :  2
# Boxplots of each numerical variable to inspect center, spread, and potential outliers
boxplot(nc$fage, main = "Father's Age")
boxplot(nc$mage, main = "Mother's Age")
boxplot(nc$weeks, main = "Length of Pregnancy in Weeks")
boxplot(nc$visits, main = "Number of Hospital Visits")
boxplot(nc$gained, main = "Mother's Weight Gain During Pregnancy")
boxplot(nc$weight, main = "Baby's Birth Weight")
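The boxplots above draw points beyond the whiskers individually as potential outliers. To list those values without drawing anything, base R's boxplot.stats() (the function boxplot() relies on) returns the whisker ends, hinges, median, and flagged points; with the default coef = 1.5, a point is flagged when it lies more than 1.5 times the distance between the hinges beyond a hinge. The sketch below uses a hypothetical toy vector, not the nc data; on the real data, boxplot.stats(nc$weight)$out would list the flagged birth weights (NAs are omitted automatically).

```r
# Hypothetical toy data standing in for one numerical column of nc
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)

# Same computation boxplot() performs before drawing
bs <- boxplot.stats(x, coef = 1.5)

# $stats: lower whisker, lower hinge, median, upper hinge, upper whisker
print(bs$stats)  # 1.0 3.0 5.5 8.0 9.0

# $out: values drawn as individual points (potential outliers)
print(bs$out)    # 50
```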
Make a side-by-side boxplot of habit and weight. What does the plot highlight about the relationship between these two variables?
# Birth weight (lbs) split by the mother's smoking habit
boxplot(weight ~ habit, data = nc, main = "Mother's Smoking Habit and Newborn Birth Weight")
by(nc$weight, nc$habit, mean)
## nc$habit: nonsmoker
## [1] 7.144273
## ------------------------------------------------------------
## nc$habit: smoker
## [1] 6.82873
Check whether the conditions necessary for inference are satisfied. Note that you will need the sample sizes to check the conditions. You can compute the group sizes with the same by() command as above, replacing mean with length.
by(nc$weight, nc$habit, length)
## nc$habit: nonsmoker
## [1] 873
## ------------------------------------------------------------
## nc$habit: smoker
## [1] 126
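Answer: The births are a random sample, so the observations are independent both within and between the two groups. Both groups are larger than 30 (873 nonsmokers and 126 smokers), so the sampling distribution of the difference in sample means should be approximately normal, and the conditions for inference are satisfied.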
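One caveat about using length inside by(): it counts every row in a group, including rows where the response is NA. That is harmless for weight, which has no missing values, but for a variable such as gained (27 NA's in the summary) it would overstate the sample size; the inference() output for gained reports 129 + 844 = 973 usable births, not 1000. A minimal sketch on hypothetical toy data, comparing length with a count of non-missing values using base R's tapply():

```r
# Hypothetical toy data: one birth weight is missing (NA) in the nonsmoker group
weight <- c(7.1, NA, 6.5, 8.0, 6.9)
habit  <- factor(c("nonsmoker", "nonsmoker", "smoker", "nonsmoker", "smoker"))

# Counts every row, NA included (what by(weight, habit, length) reports)
n_rows <- tapply(weight, habit, length)

# Counts only the non-missing values a test can actually use
n_obs <- tapply(weight, habit, function(w) sum(!is.na(w)))

# Rule of thumb from the answer above: at least 30 observations per group
enough <- n_obs >= 30

print(n_rows)   # nonsmoker 3, smoker 2
print(n_obs)    # nonsmoker 2, smoker 2
print(enough)   # FALSE for both groups in this tiny toy example
```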
Write the hypotheses for testing if the average weights of babies born to smoking and non-smoking mothers are different.
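Answer:
H0: The mean birth weight of babies born to mothers who smoke equals the mean birth weight of babies born to mothers who do not smoke (mu_nonsmoker - mu_smoker = 0).
HA: The two mean birth weights differ (mu_nonsmoker - mu_smoker != 0).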
inference(y = nc$weight, x = nc$habit, est = "mean", type = "ht", null = 0,
alternative = "twosided", method = "theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_nonsmoker = 873, mean_nonsmoker = 7.1443, sd_nonsmoker = 1.5187
## n_smoker = 126, mean_smoker = 6.8287, sd_smoker = 1.3862
## Observed difference between means (nonsmoker-smoker) = 0.3155
##
## H0: mu_nonsmoker - mu_smoker = 0
## HA: mu_nonsmoker - mu_smoker != 0
## Standard error = 0.134
## Test statistic: Z = 2.359
## p-value = 0.0184
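Answer:
The p-value is 0.0184, which is below the conventional 0.05 significance level, so we reject H0. The data provide convincing evidence that the mean birth weight of babies born to smoking mothers differs from that of babies born to nonsmoking mothers.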
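The standard error, test statistic, and p-value printed by inference() can be reproduced by hand from its summary statistics. A sketch, assuming the unpooled, large-sample (z-based) method that the output's "Test statistic: Z" line suggests: SE = sqrt(s1^2/n1 + s2^2/n2), Z = (observed difference - 0) / SE, and the two-sided p-value is 2 * P(Z > |z|). The inputs are copied from the rounded output, so the last digit may differ slightly.

```r
# Summary statistics copied from the inference() output (nonsmoker - smoker)
n1 <- 873; xbar1 <- 7.1443; s1 <- 1.5187   # nonsmoker
n2 <- 126; xbar2 <- 6.8287; s2 <- 1.3862   # smoker

diff_hat <- xbar1 - xbar2                  # observed difference in means
se <- sqrt(s1^2 / n1 + s2^2 / n2)          # unpooled standard error
z <- (diff_hat - 0) / se                   # null value is 0
p_value <- 2 * pnorm(-abs(z))              # two-sided p-value

round(c(se = se, z = z, p = p_value), 4)
```

This gives roughly SE 0.134, Z 2.36, and p 0.018, matching the output above up to rounding.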
Change the type argument to “ci” to construct and record a confidence interval for the difference between the weights of babies born to smoking and non-smoking mothers.
inference(y = nc$weight, x = nc$habit, est = "mean", type = "ci",
method = "theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_nonsmoker = 873, mean_nonsmoker = 7.1443, sd_nonsmoker = 1.5187
## n_smoker = 126, mean_smoker = 6.8287, sd_smoker = 1.3862
## Observed difference between means (nonsmoker-smoker) = 0.3155
##
## Standard error = 0.1338
## 95 % Confidence interval = ( 0.0534 , 0.5777 )
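Answer: We are 95% confident that the mean birth weight of babies born to nonsmoking mothers is between 0.0534 and 0.5777 lbs higher than the mean birth weight of babies born to smoking mothers. Because the interval does not contain 0, it agrees with the hypothesis test: babies of nonsmokers have a higher mean birth weight.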
inference(y = nc$weight, x = nc$habit, est = "mean", type = "ci", null = 0,
alternative = "twosided", method = "theoretical",
order = c("smoker","nonsmoker"))
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_smoker = 126, mean_smoker = 6.8287, sd_smoker = 1.3862
## n_nonsmoker = 873, mean_nonsmoker = 7.1443, sd_nonsmoker = 1.5187
## Observed difference between means (smoker-nonsmoker) = -0.3155
##
## Standard error = 0.1338
## 95 % Confidence interval = ( -0.5777 , -0.0534 )
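Both intervals above have the form point estimate plus or minus z* times the standard error, with z* = qnorm(0.975), about 1.96, for 95% confidence (a sketch assuming the same z-based method as the test). Setting order = c("smoker","nonsmoker") only negates the point estimate, so the smoker - nonsmoker interval is the nonsmoker - smoker interval with its endpoints negated and swapped.

```r
# Values copied from the inference() output (nonsmoker - smoker)
diff_hat <- 0.3155
se <- 0.1338

z_star <- qnorm(0.975)                               # 95% critical value
ci_ns_minus_s <- diff_hat + c(-1, 1) * z_star * se   # (lower, upper)

# Reversing the order of subtraction negates and swaps the endpoints
ci_s_minus_ns <- -rev(ci_ns_minus_s)

round(ci_ns_minus_s, 4)
round(ci_s_minus_ns, 4)
```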
inference(y = nc$weeks, est = "mean", type = "ci", null = 0,
alternative = "twosided", method = "theoretical")
## Single mean
## Summary statistics:
## mean = 38.3347 ; sd = 2.9316 ; n = 998
## Standard error = 0.0928
## 95 % Confidence interval = ( 38.1528 , 38.5165 )
inference(y = nc$weeks, est = "mean", type = "ci", null = 0,
alternative = "twosided", conflevel = 0.90, method = "theoretical")
## Single mean
## Summary statistics:
## mean = 38.3347 ; sd = 2.9316 ; n = 998
## Standard error = 0.0928
## 90 % Confidence interval = ( 38.182 , 38.4873 )
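For a single mean the interval is xbar plus or minus z* times s / sqrt(n). A sketch using the printed summary statistics for weeks (n = 998, i.e. the 1000 births minus the 2 with weeks missing), assuming the z-based method; it shows that lowering the confidence level from 95% to 90% shrinks z* and therefore narrows the interval. The helper z_ci() is a hypothetical name introduced only for this illustration.

```r
# Summary statistics for nc$weeks copied from the inference() output
xbar <- 38.3347; s <- 2.9316; n <- 998

se <- s / sqrt(n)                       # standard error of the mean

# Hypothetical helper: z-based confidence interval at a given level
z_ci <- function(level) {
  z_star <- qnorm(1 - (1 - level) / 2)  # 1.96 for 95%, 1.645 for 90%
  xbar + c(-1, 1) * z_star * se
}

ci95 <- z_ci(0.95)
ci90 <- z_ci(0.90)
round(rbind(ci95, ci90), 4)
diff(ci95) > diff(ci90)                 # TRUE: the 90% interval is narrower
```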
inference(y = nc$gained, x = nc$mature, est = "mean", type = "ht", null = 0,
alternative = "twosided", method = "theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_mature mom = 129, mean_mature mom = 28.7907, sd_mature mom = 13.4824
## n_younger mom = 844, mean_younger mom = 30.5604, sd_younger mom = 14.3469
## Observed difference between means (mature mom-younger mom) = -1.7697
##
## H0: mu_mature mom - mu_younger mom = 0
## HA: mu_mature mom - mu_younger mom != 0
## Standard error = 1.286
## Test statistic: Z = -1.376
## p-value = 0.1686
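With a p-value of 0.1686, above 0.05, these data do not provide convincing evidence that mean weight gain differs between mature and younger mothers. As a cross-check, a two-sided z test at the 5% level and the corresponding 95% interval always agree, so the interval built from the printed summary statistics should contain 0. A sketch, assuming the same unpooled z-based method:

```r
# Summary statistics copied from the inference() output (mature - younger)
n1 <- 129; xbar1 <- 28.7907; s1 <- 13.4824   # mature mom
n2 <- 844; xbar2 <- 30.5604; s2 <- 14.3469   # younger mom

diff_hat <- xbar1 - xbar2
se <- sqrt(s1^2 / n1 + s2^2 / n2)
z <- diff_hat / se
p_value <- 2 * pnorm(-abs(z))

# 95% interval: it contains 0 exactly when the two-sided p-value exceeds 0.05
ci95 <- diff_hat + c(-1, 1) * qnorm(0.975) * se
contains_zero <- ci95[1] < 0 && ci95[2] > 0

round(c(se = se, z = z, p = p_value), 4)
round(ci95, 2)
contains_zero
```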
inference(y = nc$mage, x = nc$mature, est = "mean", type = "ht", null = 0,
alternative = "twosided", method = "theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_mature mom = 133, mean_mature mom = 37.1805, sd_mature mom = 2.4303
## n_younger mom = 867, mean_younger mom = 25.4383, sd_younger mom = 5.0278
## Observed difference between means (mature mom-younger mom) = 11.7422
##
## H0: mu_mature mom - mu_younger mom = 0
## HA: mu_mature mom - mu_younger mom != 0
## Standard error = 0.271
## Test statistic: Z = 43.292
## p-value = 0
by(nc$mage, nc$mature, min)
## nc$mature: mature mom
## [1] 35
## ------------------------------------------------------------
## nc$mature: younger mom
## [1] 13
by(nc$mage, nc$mature, max)
## nc$mature: mature mom
## [1] 50
## ------------------------------------------------------------
## nc$mature: younger mom
## [1] 34
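The min and max calls above pin down the age cutoff: mature mothers in the sample are 35 to 50 years old and younger mothers are 13 to 34, so the groups do not overlap and the mature category starts at 35. The same check can be done in one step; a sketch with base R's split() and sapply() on hypothetical toy ages (with the real data you would pass nc$mage and nc$mature instead):

```r
# Hypothetical toy data mimicking nc$mage and nc$mature (not the real nc values)
mage   <- c(13, 22, 34, 35, 41, 50)
mature <- factor(c(rep("younger mom", 3), rep("mature mom", 3)))

# Min and max age per group in one step: rows are min/max, columns are groups
age_ranges <- sapply(split(mage, mature), range)
rownames(age_ranges) <- c("min", "max")
print(age_ranges)

# If the groups do not overlap, the cutoff lies between these two ages
max_younger <- age_ranges["max", "younger mom"]
min_mature  <- age_ranges["min", "mature mom"]
non_overlapping <- max_younger < min_mature
```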
inference(y = nc$weight, x = nc$premie, est = "mean", type = "ht", null = 0,
alternative = "twosided", method = "theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_full term = 846, mean_full term = 7.4594, sd_full term = 1.075
## n_premie = 152, mean_premie = 5.1284, sd_premie = 1.9696
## Observed difference between means (full term-premie) = 2.331
##
## H0: mu_full term - mu_premie = 0
## HA: mu_full term - mu_premie != 0
## Standard error = 0.164
## Test statistic: Z = 14.216
## p-value = 0
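Both the mage-by-maturity test and the weight-by-premie test print "p-value = 0". That is a rounded display, not a literal zero: the two-sided p-value is positive but far below any conventional significance level. A sketch with base R's pnorm(); for Z = 43.292 the tail probability is too small to represent as a double and evaluates to 0, so the log.p = TRUE argument is used to report it on the log scale instead.

```r
# Test statistics copied from the two inference() outputs above
z_premie <- 14.216
z_mage   <- 43.292

# Two-sided p-value for the premie test: tiny but still representable
p_premie <- 2 * pnorm(-abs(z_premie))
print(p_premie)                          # positive, well below 1e-40

# For z_mage the tail probability underflows to exactly 0 in double precision
print(2 * pnorm(-abs(z_mage)))           # 0

# Work on the log scale instead: log10 of the two-sided p-value
log10_p_mage <- (pnorm(-abs(z_mage), log.p = TRUE) + log(2)) / log(10)
print(log10_p_mage)                      # roughly -409
```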