library(tidyverse)
library(openintro)
download.file("http://www.openintro.org/stat/data/nc.RData", destfile = "nc.RData")
load("nc.RData")
Each case in our dataset is one birth. For each birth, the categorical variables answer the following questions: 1) Is the baby premature or full term? 2) Is the mother married or unmarried? 3) Is the baby low birthweight or not low birthweight? 4) Is the baby male or female? 5) Is the mother a smoker or nonsmoker? 6) Is the mother white or not white?
The number of cases in our sample is the number of observations: there are 1000, each recorded on 13 variables.
The numerical data clearly contain outliers, as the boxplot of visits shows.
summary(nc)
##       fage          mage          mature            weeks          premie
##  Min.   :14.00   Min.   :13   mature mom :133   Min.   :20.00   full term:846
##  1st Qu.:25.00   1st Qu.:22   younger mom:867   1st Qu.:37.00   premie   :152
##  Median :30.00   Median :27                     Median :39.00   NA's     :  2
##  Mean   :30.26   Mean   :27                     Mean   :38.33
##  3rd Qu.:35.00   3rd Qu.:32                     3rd Qu.:40.00
##  Max.   :55.00   Max.   :50                     Max.   :45.00
##  NA's   :171                                    NA's   :2
##     visits          marital          gained           weight
##  Min.   : 0.0   married    :386   Min.   : 0.00   Min.   : 1.000
##  1st Qu.:10.0   not married:613   1st Qu.:20.00   1st Qu.: 6.380
##  Median :12.0   NA's       :  1   Median :30.00   Median : 7.310
##  Mean   :12.1                     Mean   :30.33   Mean   : 7.101
##  3rd Qu.:15.0                     3rd Qu.:38.00   3rd Qu.: 8.060
##  Max.   :30.0                     Max.   :85.00   Max.   :11.750
##  NA's   :9                        NA's   :27
##  lowbirthweight     gender         habit         whitemom
##  low    :111      female:503   nonsmoker:873   not white:284
##  not low:889      male  :497   smoker   :126   white    :714
##                                NA's     :  1   NA's     :  2
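The outlier claim for visits can be checked roughly from the summary output alone. The sketch below applies the common 1.5 × IQR rule to the quartiles printed by summary(nc). The variable names are my own, and R's boxplot() uses hinges rather than these exact quartiles, so its whiskers may sit slightly differently.

```r
# Quartiles and extremes of visits, copied from the summary(nc) output
q1 <- 10.0
q3 <- 15.0
visits_min <- 0.0
visits_max <- 30.0

# 1.5 * IQR rule: points beyond the fences are flagged as outliers
iqr <- q3 - q1
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# The minimum and maximum tell us whether any point lies beyond a fence
has_low_outliers <- visits_min < lower_fence
has_high_outliers <- visits_max > upper_fence

cat("Fences:", lower_fence, "to", upper_fence, "\n")
cat("Outliers below:", has_low_outliers, " above:", has_high_outliers, "\n")
```

Under this rule both the minimum and the maximum of visits fall outside the fences, which agrees with the outliers seen in the boxplot.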
boxplot(nc$visits)
These plots show that the median birthweight for babies of smokers is slightly lower than for babies of nonsmokers, and that the upper range of birthweights extends higher for nonsmokers than for smokers.
boxplot(nc$weight ~ nc$habit,
        col = 'steelblue',
        main = 'Smoking habit vs. Birthweight')
by(nc$weight, nc$habit, mean)
## nc$habit: nonsmoker
## [1] 7.144273
## ------------------------------------------------------------
## nc$habit: smoker
## [1] 6.82873
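Because we have no information about how the sample was collected, we will assume the observations are independent, both within and between the two groups. Both sample sizes are well above 30, the usual rule of thumb for the sampling distribution of the mean to be approximately normal despite skew, so the conditions for inference are met.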
by(nc$weight, nc$habit, length)
## nc$habit: nonsmoker
## [1] 873
## ------------------------------------------------------------
## nc$habit: smoker
## [1] 126
H_0: There is no difference in mean birthweight between babies of smoker and non-smoker mothers (mu_nonsmoker = mu_smoker).
H_a: There is a difference in mean birthweight between babies of smoker and non-smoker mothers (mu_nonsmoker != mu_smoker).
The inference function produces a p-value of p = 0.0184. Since 0.01 < p < 0.05, we reject H_0 at the 5% significance level: there is moderate evidence that a mother’s smoking habit is associated with the birthweight of the baby.
inference(y = nc$weight, x = nc$habit, est = "mean", type = "ht", null = 0,
alternative = "twosided", method = "theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_nonsmoker = 873, mean_nonsmoker = 7.1443, sd_nonsmoker = 1.5187
## n_smoker = 126, mean_smoker = 6.8287, sd_smoker = 1.3862
## Observed difference between means (nonsmoker-smoker) = 0.3155
##
## H0: mu_nonsmoker - mu_smoker = 0
## HA: mu_nonsmoker - mu_smoker != 0
## Standard error = 0.134
## Test statistic: Z = 2.359
## p-value = 0.0184
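The standard error, test statistic, and p-value above can be reproduced by hand from the printed summary statistics. This is a minimal sketch that assumes the usual two-sample formula with unpooled variances and a normal reference distribution, which is consistent with the Z statistic in the output. The variable names are my own, and the inputs are the rounded values from the output, so the last digit may differ slightly.

```r
# Summary statistics copied from the inference() output (rounded)
n_nonsmoker <- 873
mean_nonsmoker <- 7.1443
sd_nonsmoker <- 1.5187
n_smoker <- 126
mean_smoker <- 6.8287
sd_smoker <- 1.3862

# Observed difference in sample means (nonsmoker - smoker)
diff_means <- mean_nonsmoker - mean_smoker

# Standard error of the difference between two independent means
se <- sqrt(sd_nonsmoker^2 / n_nonsmoker + sd_smoker^2 / n_smoker)

# Test statistic under H0: mu_nonsmoker - mu_smoker = 0
z <- (diff_means - 0) / se

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(-abs(z))

cat("SE =", round(se, 4), " Z =", round(z, 3), " p =", round(p_value, 4), "\n")
```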
The researcher can be 95% confident that the true difference in mean birthweights (smoker minus non-smoker) is between -0.578 lbs and -0.053 lbs; that is, babies of smokers weigh on average between 0.053 and 0.578 lbs less than babies of non-smokers.
inference(y = nc$weight, x = nc$habit, est = "mean", type = "ci", null = 0,
alternative = "twosided", method = "theoretical",
order = c("smoker","nonsmoker"))
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_smoker = 126, mean_smoker = 6.8287, sd_smoker = 1.3862
## n_nonsmoker = 873, mean_nonsmoker = 7.1443, sd_nonsmoker = 1.5187
## Observed difference between means (smoker-nonsmoker) = -0.3155
##
## Standard error = 0.1338
## 95 % Confidence interval = ( -0.5777 , -0.0534 )
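The interval above can be checked the same way: point estimate plus or minus a critical value times the standard error. This is a sketch assuming a normal critical value, qnorm(0.975), and the rounded summary statistics from the output. The variable names are my own.

```r
# Summary statistics copied from the inference() output (rounded)
n_smoker <- 126
mean_smoker <- 6.8287
sd_smoker <- 1.3862
n_nonsmoker <- 873
mean_nonsmoker <- 7.1443
sd_nonsmoker <- 1.5187

# Point estimate in the order used for the interval (smoker - nonsmoker)
diff_means <- mean_smoker - mean_nonsmoker

# Standard error of the difference between two independent means
se <- sqrt(sd_smoker^2 / n_smoker + sd_nonsmoker^2 / n_nonsmoker)

# Critical value for a 95% interval from the standard normal distribution
z_star <- qnorm(0.975)

# Confidence interval: estimate +/- critical value * standard error
ci <- diff_means + c(-1, 1) * z_star * se

cat("95% CI: (", round(ci[1], 4), ",", round(ci[2], 4), ")\n")
```

Because the whole interval lies below zero, it agrees with the hypothesis test in rejecting a difference of zero at the 5% level.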