This data set has been of interest to medical researchers who study the relation between the habits and practices of expectant mothers and the birth of their children. The data set contains 1,000 randomly sampled births from the records released by the state of North Carolina in 2004. We focus on determining whether there is a link between birth weight and the smoking habits of the mother. We begin by comparing the ages of smoking and non-smoking mothers: are younger mothers more likely to smoke, or are older mothers? We then compare the average baby weights of smoking and non-smoking moms to gauge the effect of smoking on baby weight, and draw comparative histograms to show the difference between the two groups. Lastly, we compute a confidence interval to see whether there is a true mean difference in baby weight between the non-smoking and the smoking moms. [https://docs.google.com/document/d/1Q39CRY0A0EtJle3jX4XfZ-C-3H4nfEsqMe5DKhcOoMI/edit#]
download.file("http://www.openintro.org/stat/data/nc.RData",
destfile = "nc.RData")
load("nc.RData")
mature<-(nc$mature)
weeks<-(nc$weeks)
mage<-(nc$mage)
fage<-(nc$fage)
marital<-(nc$marital)
habit<-(nc$habit)
premie<-(nc$premie)
lowbirthweight<-(nc$lowbirthweight)
gained<-(nc$gained)
weight<-(nc$weight)
gender<-(nc$gender)
whitemom<-(nc$whitemom)
group<-(nc$habit)
smoker<-nc[which (group == "smoker"),] #Mothers who were smokers
nonsmoker <-nc[which (group == "nonsmoker"),] #Non-Smoking mothers
##Boxplot to compare female smokers and non smokers by age
boxplot(smoker$mage, nonsmoker$mage, main="Female Smokers and Non Smokers by Age", ylab= "Female's Age", xlab= "1 = Smoker, 2= Non Smoker")
summary(smoker$mage)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 16.00 21.00 24.50 25.25 28.75 39.00
summary(nonsmoker$mage)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 13.00 22.00 27.00 27.24 32.00 50.00
The boxplot shows that mothers who smoke tend to be younger than mothers who do not. The median age of the smoking mothers was 24.5 years (mean 25.25), while the median age of the non-smoking mothers was 27 years (mean 27.24). We expected the typical age of smoking mothers to be lower, since younger females are more likely to smoke than older females. This data suggests that a younger mother is more likely than an older mother to smoke while pregnant.
#Creating side-by-side histograms of baby weights for smoking and non-smoking moms to check their shape, since the tests I want to run require approximately normally distributed data.
par(mfrow=c(1,2))
hist(smoker$weight,main = "Baby Weight of Smoking Moms", xlab= "Baby Weight in Pounds")
hist(nonsmoker$weight, main = "Baby Weight of Non Smoking Moms", xlab= "Baby Weight in Pounds")
summary(smoker$weight)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.690 6.077 7.060 6.829 7.735 9.190
summary(nonsmoker$weight)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 6.440 7.310 7.144 8.060 11.750
Both histograms are unimodal, and the summary statistics show a difference in baby weight between smoking and non-smoking moms. The median birth weights differ by 0.25 pounds (7.31 vs. 7.06) and the means by about 0.32 pounds (7.144 vs. 6.829); whether this difference is statistically significant is tested below.
Because both distributions are skewed, I need to create sampling distributions in order to have approximately normal distributions. The following code chunk allows me to gather 50 samples (n=10) for both groups.
#Pulling samples of size 10 to create a sampling distribution that is approximately normal.
set.seed(51919) #setting the seed lets you rerun the code chunk and get the same samples each time
smokemeans<- rep(0,50)
for (i in 1:50) {
sam <- sample(smoker$weight, 10)
smokemeans[i] <- mean(sam)
}
#The loop above draws 50 samples of size 10 and calculates the mean of each. The vector is called smokemeans.
set.seed(51919) #setting the seed lets you rerun the code chunk and get the same samples each time
nonsmokemeans<- rep(0,50)
for (i in 1:50) {
sam <- sample(nonsmoker$weight, 10)
nonsmokemeans[i] <- mean(sam)
}
#The loop above draws 50 samples of size 10 and calculates the mean of each. The vector is called nonsmokemeans.
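The two loops above can also be written more compactly with replicate(). The following is a minimal sketch, not the code used for the results in this report: sample_means is a hypothetical helper name, and x is a made-up stand-in vector rather than the nc birth weights. With the same seed, it should draw the same samples as the explicit loop.

```r
# Hypothetical helper: draw `reps` samples of size `n` from `x`
# (without replacement) and return the mean of each sample.
sample_means <- function(x, n = 10, reps = 50) {
  replicate(reps, mean(sample(x, n)))
}

# Stand-in data for illustration only (NOT the nc birth weights).
x <- seq(5, 9, by = 0.05)

# Loop version, written the same way as in the report.
set.seed(51919)
loop_means <- rep(0, 50)
for (i in 1:50) {
  loop_means[i] <- mean(sample(x, 10))
}

# replicate() version, reset to the same seed.
set.seed(51919)
rep_means <- sample_means(x, n = 10, reps = 50)

# Both versions consume the random number stream in the same order,
# so the two vectors of 50 means should agree.
print(all.equal(loop_means, rep_means))
```

In this report the helper would be called as sample_means(smoker$weight) and sample_means(nonsmoker$weight), each preceded by set.seed(51919).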
#Creating side-by-side histograms of the sample means of baby weights for smoking and non-smoking moms to check that they are approximately normally distributed before running the t test.
par(mfrow=c(1,2))
hist(smokemeans,main = "Smoker Sample Means", xlab= "Baby Weight in Pounds")
hist(nonsmokemeans, main = "Non-Smoker Sample Means", xlab= "Baby Weight in Pounds")
#summary statistics for sampling distributions to determine whether or not data fits within 3 standard deviations of the mean
summary(smokemeans)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.675 6.608 6.910 6.834 7.088 7.676
summary(nonsmokemeans)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.970 6.896 7.152 7.154 7.448 7.985
sd(smokemeans)
## [1] 0.4519295
sd(nonsmokemeans)
## [1] 0.4344416
Both sampling distributions are unimodal, and their centers differ by about 0.32 pounds (means of 6.83 and 7.15; medians of 6.91 and 7.15). The distribution for the non-smoking mothers is slightly skewed to the left, while the distribution for the smoking mothers is roughly symmetrical. The sample means for the smoking mothers are centered at 6.83 pounds with a standard deviation of 0.45 pounds, while those for the non-smoking mothers are centered at 7.15 pounds with a standard deviation of 0.43 pounds.
#t test for a difference in sample means
t.test(smokemeans, nonsmokemeans)
##
## Welch Two Sample t-test
##
## data: smokemeans and nonsmokemeans
## t = -3.6066, df = 97.848, p-value = 0.0004906
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.4956751 -0.1438049
## sample estimates:
## mean of x mean of y
## 6.83390 7.15364
Because the p-value (0.0005) is well below 0.05, I reject the null hypothesis of no difference. There is evidence that the non-smoking moms have heavier babies on average.
I am 95% confident that the true mean difference in baby weight between the non-smoking moms and the smoking moms is between 0.14 and 0.50 pounds.
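As a check on the t test output above, the Welch statistic, its degrees of freedom, and the confidence interval can be recomputed by hand from the summary values already printed in this report (the two means from the t.test output and the two values from sd()). This is a sketch that uses only those printed numbers, with 50 sample means per group; the variable names are illustrative.

```r
# Values copied from the output printed earlier in this report.
mean_smoke    <- 6.83390    # mean of smokemeans
mean_nonsmoke <- 7.15364    # mean of nonsmokemeans
sd_smoke      <- 0.4519295  # sd(smokemeans)
sd_nonsmoke   <- 0.4344416  # sd(nonsmokemeans)
n_smoke       <- 50         # number of sample means per group
n_nonsmoke    <- 50

# Variance of each group's mean.
v_smoke    <- sd_smoke^2 / n_smoke
v_nonsmoke <- sd_nonsmoke^2 / n_nonsmoke

# Welch t statistic: difference in means over its standard error.
se     <- sqrt(v_smoke + v_nonsmoke)
t_stat <- (mean_smoke - mean_nonsmoke) / se

# Welch-Satterthwaite approximation for the degrees of freedom.
df <- (v_smoke + v_nonsmoke)^2 /
  (v_smoke^2 / (n_smoke - 1) + v_nonsmoke^2 / (n_nonsmoke - 1))

# Two-sided p-value and 95% confidence interval for the difference.
p_value <- 2 * pt(-abs(t_stat), df)
t_crit  <- qt(0.975, df)
ci      <- (mean_smoke - mean_nonsmoke) + c(-1, 1) * t_crit * se

print(round(c(t = t_stat, df = df, lower = ci[1], upper = ci[2]), 4))
print(signif(p_value, 4))
```

The printed values should agree with the t.test output up to rounding. The interval is negative because the difference is taken as smoker minus non-smoker.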
As stated in the introduction, this data is of interest for studying the relation between the habits and practices of expectant mothers and the birth of their children. According to my statistical analysis, there is a difference in newborn baby weight: on average, babies of mothers who smoke weigh less than babies of mothers who do not smoke.