A data set of 1,000 randomly sampled births from North Carolina in 2004 is of interest to medical researchers who are studying the habits and practices of expecting mothers and the birth of their children. We focus on whether there is a link between birth weight and the smoking habits of the mother. We begin by comparing the ages of smoking and nonsmoking mothers. Next, we determine the average baby weight for smoking and nonsmoking mothers and compare the two. Lastly, we run a hypothesis test and construct a confidence interval to decide whether we can reject the null hypothesis. The data were accessed at http://www.openintro.org/stat/data/nc.RData.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 16.00 21.00 24.50 25.25 28.75 39.00
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 13.00 22.00 27.00 27.24 32.00 50.00
I expected the nonsmokers to be younger, but the summaries above (smokers first, then nonsmokers) show that the median age for nonsmokers is 27, compared with 24.5 for smokers. Only 25% of nonsmoking mothers are under 22 years old, and 25% are over 32.
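Age summaries like the ones above can be produced by splitting the data on smoking status. The sketch below assumes the data frame has a `habit` column (with values "smoker" and "nonsmoker") and an `mage` column (mother's age), as I believe the OpenIntro `nc` data does. The data frame `toy` is hypothetical and its values are made up so that the example runs without a download.

```r
# Hypothetical toy data standing in for the nc data frame (values are made up).
toy <- data.frame(
  habit  = c("smoker", "nonsmoker", "smoker", "nonsmoker", "nonsmoker", "smoker"),
  mage   = c(19, 31, 24, 27, 35, 22),
  weight = c(6.5, 7.4, 6.9, 7.1, 8.0, 6.2)
)

# Split into the two groups used throughout the report.
smoker    <- subset(toy, habit == "smoker")
nonsmoker <- subset(toy, habit == "nonsmoker")

# Five-number summary plus mean of mother's age for each group.
summary(smoker$mage)
summary(nonsmoker$mage)

# Side-by-side boxplots of age by smoking status.
boxplot(mage ~ habit, data = toy, ylab = "mother's age in years")
```

With the real data, `toy` would be replaced by the data frame loaded from the URL given in the introduction.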
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.690 6.077 7.060 6.829 7.735 9.190
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 6.440 7.310 7.144 8.060 11.750
The nonsmokers’ babies are heavier on average: the mean weight of the nonsmokers’ babies (7.14 pounds) is about 0.3 pounds more than the smokers’ mean (6.83 pounds), which suggests they are healthier. Both weight distributions are slightly skewed to the left, since each mean falls below its median.
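The comparison can be checked with a little arithmetic on the reported summaries. This is only a sketch: the numbers are copied from the weight summaries above, the variable names are my own, and comparing mean to median is a rough rule of thumb for skew rather than a formal test.

```r
# Values copied from the weight summaries above (pounds).
smoker_median    <- 7.060
smoker_mean      <- 6.829
nonsmoker_median <- 7.310
nonsmoker_mean   <- 7.144

# Difference in average birth weight, nonsmoker minus smoker.
mean_diff <- nonsmoker_mean - smoker_mean
mean_diff

# A mean below the median is a rough sign of left skew.
smoker_mean < smoker_median
nonsmoker_mean < nonsmoker_median
```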
Because both distributions are skewed, I need to create sampling distributions of the mean, which should be approximately normal. The following code chunk draws 50 samples (n=10) from each group and records the mean of each sample.
# Pull samples of size 10 to create a sampling distribution that is approximately normal.
# smoker and nonsmoker are the two subsets of the data, split by smoking habit.
set.seed(52319)

# Loop: 50 samples of 10 with the mean calculated for each; the vector is called smokemeans.
smokemeans <- numeric(50)
for (i in 1:50) {
  sam <- sample(smoker$weight, 10)
  smokemeans[i] <- mean(sam)
}

# Loop: 50 samples of 10 with the mean calculated for each; the vector is called nonsmokemeans.
nonsmokemeans <- numeric(50)
for (i in 1:50) {
  sam <- sample(nonsmoker$weight, 10)
  nonsmokemeans[i] <- mean(sam)
}

# Side-by-side histograms of the sampling distributions, so that I can run tests
# which require the data to be approximately normally distributed.
par(mfrow = c(1, 2))
hist(smokemeans, main = "Smoker Sample Means", xlab = "weight in pounds")
hist(nonsmokemeans, main = "Non-Smoker Sample Means", xlab = "weight in pounds")
The sampling distributions (n=10) for both groups, smoker and nonsmoker, are approximately normal, as the histograms above show. I can now proceed with inference testing.
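The sampling loop can also be written more compactly with `replicate()`. The sketch below is an alternative, not the code used in this report. The vector `weights` is a simulated, hypothetical stand-in for `smoker$weight`, and its size, mean, and standard deviation are made up.

```r
# Hypothetical stand-in for smoker$weight (simulated, not the real data).
set.seed(1)
weights <- rnorm(100, mean = 6.8, sd = 1.4)

# Loop version, following the same pattern as the report.
set.seed(52319)
loop_means <- numeric(50)
for (i in 1:50) {
  loop_means[i] <- mean(sample(weights, 10))
}

# replicate() version: it evaluates the expression 50 times in sequence,
# so it makes the same draws when started from the same seed.
set.seed(52319)
rep_means <- replicate(50, mean(sample(weights, 10)))

# Compare the two vectors of sample means.
all.equal(loop_means, rep_means)
```

Resetting the seed before each version is what makes the two vectors comparable. Without it, the second version would continue from wherever the random number stream left off.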
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.526 6.559 6.814 6.816 7.071 8.215
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.857 6.979 7.205 7.136 7.373 7.990
## [1] 0.4812031
## [1] 0.4730851
Both sampling distributions are unimodal, but the smoker distribution is symmetric while the nonsmoker distribution is slightly skewed left. The sample means for the smoker group are centered at 6.82 pounds with a standard deviation of 0.48 pounds, while those for the nonsmoker group are centered at 7.14 pounds with a standard deviation of 0.47 pounds.
Null hypothesis: There is no difference in average baby weight between the smoking and nonsmoking groups. Alternative hypothesis: There is a difference in average baby weight between the smoking and nonsmoking groups.
##
## Welch Two Sample t-test
##
## data: smokemeans and nonsmokemeans
## t = -3.3456, df = 97.972, p-value = 0.001165
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.5086629 -0.1298971
## sample estimates:
## mean of x mean of y
## 6.81624 7.13552
Because the p-value (0.0012) is well below 0.05, I reject the null. There is evidence that the nonsmoking group has healthier, heavier babies.
I am 95% confident that the true mean difference in baby weight between the nonsmoking group and the smoking group is between 0.13 and 0.51 pounds.
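The test statistic, degrees of freedom, p-value, and interval reported above can be reproduced by hand from the summary values. The sketch below uses the standard Welch formulas. The variable names are my own, and I am assuming the first reported standard deviation belongs to the smoker group. With equal group sizes, that ordering does not change the result.

```r
# Summary values copied from the output above.
mean_s  <- 6.81624    # mean of smokemeans
mean_ns <- 7.13552    # mean of nonsmokemeans
sd_s    <- 0.4812031  # sd assumed to belong to smokemeans
sd_ns   <- 0.4730851  # sd assumed to belong to nonsmokemeans
n       <- 50         # number of sample means per group

# Variance of each group mean and standard error of the difference.
v_s  <- sd_s^2 / n
v_ns <- sd_ns^2 / n
se   <- sqrt(v_s + v_ns)

# Welch t statistic and Welch-Satterthwaite degrees of freedom.
t_stat <- (mean_s - mean_ns) / se
df     <- (v_s + v_ns)^2 / (v_s^2 / (n - 1) + v_ns^2 / (n - 1))

# Two-sided p-value and 95% confidence interval (smoker minus nonsmoker).
p_value <- 2 * pt(-abs(t_stat), df)
ci      <- (mean_s - mean_ns) + c(-1, 1) * qt(0.975, df) * se

round(c(t = t_stat, df = df, p = p_value), 4)
round(ci, 3)
```

The interval is negative because it is computed as smoker minus nonsmoker. Flipping the sign gives the nonsmoker-minus-smoker difference quoted in the text.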
As stated in the introduction, the data are of interest to researchers studying the habits and practices of expecting mothers and the birth of their children. My statistical analysis shows that there is a difference in newborn baby weight: mothers who smoke have babies that weigh less, on average, than the babies of mothers who do not smoke.