This project examines the relationship between the length of pregnancy and the mother’s age, the mother’s smoking habit, and the father’s age.
I will be using a dataset called “ncbirths” from the openintro library. The data is a random sample of 1,000 cases from the state of North Carolina. The sample, taken by medical researchers, records details of each birth, including whether the mother smoked (the “habit” variable). I will analyze whether the variables studied in ncbirths are related to each other and, if so, how strong that relationship is.
# Store NC Births data
library(openintro)
## Please visit openintro.org for free statistics materials
##
## Attaching package: 'openintro'
## The following objects are masked from 'package:datasets':
##
## cars, trees
ncbirths <- ncbirths
# Scatterplot of pregnancy length against mother's age.
plot(weeks ~ mage, data = ncbirths)
title("Pregnancy length & mother's age")
# Scatterplot of pregnancy length against father's age.
plot(weeks ~ fage, data = ncbirths)
title("Pregnancy length & father's age")
Create linear models of pregnancy length against the mother’s age and against the father’s age.
# Convert the habit column to numeric codes (1 = nonsmoker, 2 = smoker).
ncbirths <- openintro::ncbirths
ncbirths$habit <- factor(ncbirths$habit, levels = c("nonsmoker", "smoker"))
ncbirths$habit <- as.numeric(ncbirths$habit)
table(ncbirths$habit)
##
## 1 2
## 873 126
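For readers unfamiliar with how `as.numeric()` treats a factor, here is a minimal sketch on a small made-up vector (not the ncbirths data). It illustrates that the codes follow the order of `levels`, so `nonsmoker` becomes 1 and `smoker` becomes 2, and that a missing value stays `NA`. Because `table()` leaves out `NA` by default, a missing entry would explain why the counts above add up to 999 rather than 1,000.

```r
# Hypothetical toy vector, for illustration only (not taken from ncbirths)
habit_demo <- factor(c("nonsmoker", "smoker", "nonsmoker", NA),
                     levels = c("nonsmoker", "smoker"))

# as.numeric() on a factor returns the position of each value in levels
codes <- as.numeric(habit_demo)
print(codes)

# table() counts only the non-missing codes unless useNA is set
counts <- table(codes)
print(counts)
```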
# Fit a linear model of pregnancy length on mother's age.
lm1 <- lm(weeks ~ mage, data = ncbirths)
# Summary output
summary(lm1)
##
## Call:
## lm(formula = weeks ~ mage, data = ncbirths)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.168 -1.213 0.605 1.635 6.772
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 38.74389 0.41442 93.489 <2e-16 ***
## mage -0.01517 0.01497 -1.013 0.311
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.932 on 996 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.00103, Adjusted R-squared: 2.662e-05
## F-statistic: 1.027 on 1 and 996 DF, p-value: 0.3112
# Fit a linear model of pregnancy length on father's age.
lm2 <- lm(weeks ~ fage, data = ncbirths)
# Summary output
summary(lm2)
##
## Call:
## lm(formula = weeks ~ fage, data = ncbirths)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.4314 -0.4726 0.5755 1.5824 6.6510
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 38.609906 0.460764 83.795 <2e-16 ***
## fage -0.006867 0.014864 -0.462 0.644
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.893 on 826 degrees of freedom
## (172 observations deleted due to missingness)
## Multiple R-squared: 0.0002583, Adjusted R-squared: -0.000952
## F-statistic: 0.2134 on 1 and 826 DF, p-value: 0.6442
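The t values and p-values in the two coefficient tables can be checked by hand: each t value is the estimate divided by its standard error, and the two-sided p-value comes from a t distribution with the residual degrees of freedom. The sketch below uses only the rounded numbers printed above, so it agrees with the summaries only up to rounding; the helper name `slope_p` is made up for this illustration. Both slopes are close to zero with p-values well above 0.05, so neither model gives evidence of a linear relationship between a parent’s age and pregnancy length.

```r
# Hypothetical helper: t value and two-sided p-value for one coefficient
slope_p <- function(estimate, std_error, df) {
  t_value <- estimate / std_error
  p_value <- 2 * pt(-abs(t_value), df = df)
  c(t = t_value, p = p_value)
}

# Rounded values copied from the summary(lm1) and summary(lm2) output above
mage_check <- slope_p(-0.01517, 0.01497, df = 996)
fage_check <- slope_p(-0.006867, 0.014864, df = 826)

print(round(mage_check, 3))
print(round(fage_check, 3))
```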
I will use t-tests to compare smoking and nonsmoking mothers on two outcomes: pregnancy length and the baby’s weight.
# Smokers subset (habit code 2)
smokers <- subset(ncbirths, habit == 2)
# Nonsmokers subset (habit code 1)
nonsmokers <- subset(ncbirths, habit == 1)
# Welch t-test of pregnancy length: smokers vs. nonsmokers
t.test(smokers$weeks, nonsmokers$weeks)
##
## Welch Two Sample t-test
##
## data: smokers$weeks and nonsmokers$weeks
## t = 0.519, df = 182.63, p-value = 0.6044
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.3519903 0.6032646
## sample estimates:
## mean of x mean of y
## 38.44444 38.31881
The p-value of the t-test comparing pregnancy length between smoking and nonsmoking mothers is 0.6044, which is greater than 0.05, so we fail to reject H_0. Conclusion: the data do not provide evidence of a difference in mean pregnancy length between smoking mothers and nonsmoking mothers.
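To show what the Welch two-sample t-test computes, here is a minimal sketch that rebuilds the statistic, the degrees of freedom, and the p-value by hand and compares them with `t.test()`. The data are simulated purely for illustration (the group sizes echo the table above, but the values are not from ncbirths), and the helper name `welch` is hypothetical.

```r
set.seed(1)
# Simulated (hypothetical) pregnancy lengths in weeks, NOT the ncbirths values
x <- rnorm(126, mean = 38.4, sd = 3)
y <- rnorm(873, mean = 38.3, sd = 3)

welch <- function(x, y) {
  vx <- var(x) / length(x)   # squared standard error of mean(x)
  vy <- var(y) / length(y)   # squared standard error of mean(y)
  t_value <- (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  p_value <- 2 * pt(-abs(t_value), df)
  list(t = t_value, df = df, p = p_value)
}

by_hand <- welch(x, y)
built_in <- t.test(x, y)   # Welch test is the default (var.equal = FALSE)
print(unlist(by_hand))
```

The non-integer degrees of freedom in the output above (182.63) come from this approximation, which does not assume equal variances in the two groups.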
# Welch t-test of baby's weight: smokers vs. nonsmokers
t.test(smokers$weight, nonsmokers$weight)
##
## Welch Two Sample t-test
##
## data: smokers$weight and nonsmokers$weight
## t = -2.359, df = 171.32, p-value = 0.01945
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.57957328 -0.05151165
## sample estimates:
## mean of x mean of y
## 6.828730 7.144273
The p-value of the t-test comparing the baby’s weight between smoking and nonsmoking mothers is 0.0195, which is less than 0.05, so we reject H_0. Conclusion: the data suggest there is a difference in mean baby weight between smoking mothers and nonsmoking mothers.
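The size of the difference and its confidence interval can be recovered from the numbers printed in the test output. The sketch below is a rough check using only those rounded values: the standard error is backed out from the reported t statistic, so the interval matches the output only up to rounding.

```r
# Values copied from the t.test output above
mean_smoker <- 6.828730
mean_nonsmoker <- 7.144273
t_value <- -2.359
df <- 171.32

# Difference in sample means (smokers minus nonsmokers)
diff_means <- mean_smoker - mean_nonsmoker

# t = difference / standard error, so standard error = difference / t
std_error <- diff_means / t_value

# 95% confidence interval: difference plus or minus critical value times SE
margin <- qt(0.975, df) * std_error
ci <- c(diff_means - margin, diff_means + margin)

print(round(diff_means, 3))
print(round(ci, 3))
```

Because the whole interval lies below zero, it agrees with the decision to reject H_0 at the 0.05 level.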
Across the analyses above, the baby’s weight appears to depend on whether the mother smokes. The mean weight of babies born to smoking mothers is 6.829 pounds, and the mean weight of babies born to nonsmoking mothers is 7.144 pounds.
The “ncbirths” dataset does not have enough information about the mothers. Other variables can affect pregnancy length and the baby’s weight, such as whether the mother is healthy or has health issues, whether she takes medication during the pregnancy, and whether she eats healthy or unhealthy food. Accounting for those factors could change the results and make them more reliable. Also, if the data were gathered from mothers with similar attributes, the results would be more reliable.
This document was produced as a final project for MAT 143H - Introduction to Statistics (Honors) at North Shore Community College.
The course was led by Professor Billy Jackson.