download.file("http://www.openintro.org/stat/data/nc.RData", destfile = "nc.RData")
load("nc.RData")
head(nc)
##   fage mage      mature weeks    premie visits marital gained weight
## 1   NA   13 younger mom    39 full term     10 married     38   7.63
## 2   NA   14 younger mom    42 full term     15 married     20   7.88
## 3   19   15 younger mom    37 full term     11 married     38   6.63
## 4   21   15 younger mom    41 full term      6 married     34   8.00
## 5   NA   15 younger mom    39 full term      9 married     27   6.38
## 6   NA   15 younger mom    38 full term     19 married     22   5.38
##   lowbirthweight gender     habit  whitemom
## 1        not low   male nonsmoker not white
## 2        not low   male nonsmoker not white
## 3        not low female nonsmoker     white
## 4        not low   male nonsmoker     white
## 5        not low female nonsmoker not white
## 6            low   male nonsmoker not white
summary(nc)
##       fage            mage           mature        weeks
##  Min.   :14.00   Min.   :13   mature mom :133   Min.   :20.00
##  1st Qu.:25.00   1st Qu.:22   younger mom:867   1st Qu.:37.00
##  Median :30.00   Median :27                     Median :39.00
##  Mean   :30.26   Mean   :27                     Mean   :38.33
##  3rd Qu.:35.00   3rd Qu.:32                     3rd Qu.:40.00
##  Max.   :55.00   Max.   :50                     Max.   :45.00
##  NA's   :171                                    NA's   :2
##        premie        visits            marital        gained
##  full term:846   Min.   : 0.0   married    :386   Min.   : 0.00
##  premie   :152   1st Qu.:10.0   not married:613   1st Qu.:20.00
##  NA's     :  2   Median :12.0   NA's       :  1   Median :30.00
##                  Mean   :12.1                     Mean   :30.33
##                  3rd Qu.:15.0                     3rd Qu.:38.00
##                  Max.   :30.0                     Max.   :85.00
##                  NA's   :9                        NA's   :27
##      weight       lowbirthweight    gender          habit
##  Min.   : 1.000   low    :111     female:503   nonsmoker:873
##  1st Qu.: 6.380   not low:889     male  :497   smoker   :126
##  Median : 7.310                                NA's     :  1
##  Mean   : 7.101
##  3rd Qu.: 8.060
##  Max.   :11.750
##
##       whitemom
##  not white:284
##  white    :714
##  NA's     :  2
##
##
##
##
Each case in this data set is one birth (one pregnancy). There are 1000 cases in the sample.
boxplot(nc$weight~nc$habit)
The boxplot suggests that babies born to mothers who smoke tend to weigh somewhat less than babies born to nonsmokers.
by(nc$weight, nc$habit,mean)
## nc$habit: nonsmoker
## [1] 7.144273
## --------------------------------------------------------
## nc$habit: smoker
## [1] 6.82873
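Inference on a difference in means requires independent observations, within and between the groups, and groups large enough (n of at least 30) for the sampling distribution of each mean to be nearly normal. Our sample is random and should be less than 10% of all NC births, so independence is reasonable. Both groups contain more than 30 observations, as the counts below show. The conditions for inference appear to be met.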
by(nc$weight, nc$habit,length)
## nc$habit: nonsmoker
## [1] 873
## --------------------------------------------------------
## nc$habit: smoker
## [1] 126
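One caveat about counting with `length`: it counts every row in a group, including rows where the response is missing. `weight` has no missing values, so 873 and 126 are the sample sizes actually used. For a variable such as `gained` (27 NA's in the summary above) the row counts would overstate the usable n. The sketch below illustrates the difference on a small made-up data frame; the `toy` values are hypothetical and are not taken from `nc`.

```r
# Toy data (hypothetical values, not from nc): one response is missing.
toy <- data.frame(
  gained = c(30, NA, 25, 40, 18),
  mature = factor(c("younger mom", "younger mom", "mature mom",
                    "mature mom", "younger mom"))
)

# length() counts every row in each group, including rows where gained is NA.
n_rows <- tapply(toy$gained, toy$mature, length)

# Counting only non-missing values gives the n that a group mean is based on.
n_used <- tapply(toy$gained, toy$mature, function(v) sum(!is.na(v)))

print(n_rows)
print(n_used)
```

On the real data the same pattern would be `tapply(nc$gained, nc$mature, function(v) sum(!is.na(v)))`.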
H0: The average birth weight of babies born to smoking mothers is the same as that of babies born to non-smoking mothers.
H1: The average birth weight of babies born to smoking mothers differs from that of babies born to non-smoking mothers.
inference(y=nc$weight, x=nc$habit, est="mean", type="ht", null=0, alternative="twosided", method="theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_nonsmoker = 873, mean_nonsmoker = 7.1443, sd_nonsmoker = 1.5187
## n_smoker = 126, mean_smoker = 6.8287, sd_smoker = 1.3862
## Observed difference between means (nonsmoker-smoker) = 0.3155
##
## H0: mu_nonsmoker - mu_smoker = 0
## HA: mu_nonsmoker - mu_smoker != 0
## Standard error = 0.134
## Test statistic: Z = 2.359
## p-value = 0.0184
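As a cross-check, the standard error, test statistic, and p-value can be recomputed by hand from the summary statistics printed above. This is a minimal sketch that assumes the normal (Z) approximation the output reports; it is not the internals of `inference()`, and the variable names are illustrative. Because the inputs are rounded to four decimals, the results should agree with the output only up to rounding.

```r
# Summary statistics copied from the inference() output above.
n1 <- 873; m1 <- 7.1443; s1 <- 1.5187   # nonsmoker
n2 <- 126; m2 <- 6.8287; s2 <- 1.3862   # smoker

# Standard error of the difference between two independent sample means.
se <- sqrt(s1^2 / n1 + s2^2 / n2)

# Test statistic under H0: mu_nonsmoker - mu_smoker = 0.
z <- (m1 - m2 - 0) / se

# Two-sided p-value from the standard normal distribution.
p_value <- 2 * pnorm(-abs(z))

print(round(c(se = se, z = z, p_value = p_value), 4))
```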
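The p-value of the test above (0.0184) is below 0.05, so we reject H0: the data provide evidence that average birth weight differs between the two groups. The 95% confidence interval for the difference in mean birth weight (smoker minus nonsmoker) is (-0.5777, -0.0534). The interval does not contain 0, which agrees with the test.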
inference(y=nc$weight, x=nc$habit, est="mean", type="ci", null=0, alternative="twosided", method="theoretical", order=c("smoker","nonsmoker"))
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_smoker = 126, mean_smoker = 6.8287, sd_smoker = 1.3862
## n_nonsmoker = 873, mean_nonsmoker = 7.1443, sd_nonsmoker = 1.5187
## Observed difference between means (smoker-nonsmoker) = -0.3155
##
## Standard error = 0.1338
## 95 % Confidence interval = ( -0.5777 , -0.0534 )
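The interval is the observed difference plus or minus a critical value times the standard error. The sketch below rebuilds it from the printed summary statistics, assuming a normal critical value (`qnorm(0.975)`, about 1.96); the variable names are illustrative, and the endpoints should match the output above up to rounding.

```r
# Summary statistics copied from the inference() output above.
n_s <- 126; m_s <- 6.8287; s_s <- 1.3862   # smoker
n_n <- 873; m_n <- 7.1443; s_n <- 1.5187   # nonsmoker

# Point estimate in the same order as the output: smoker - nonsmoker.
diff_means <- m_s - m_n

# Standard error of the difference between two independent sample means.
se <- sqrt(s_s^2 / n_s + s_n^2 / n_n)

# Critical value for 95% confidence under the normal approximation.
z_star <- qnorm(0.975)

# Lower and upper endpoints of the interval.
ci <- diff_means + c(-1, 1) * z_star * se
print(round(ci, 4))
```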
The 95% confidence interval for the mean length of pregnancy is (38.1528, 38.5165) weeks. We are 95% confident that the mean length of pregnancy in the 2004 NC population is between 38.1528 and 38.5165 weeks.
inference(y=nc$weeks, est="mean", type="ci", null=0, alternative="twosided", method="theoretical")
## Single mean
## Summary statistics:
## mean = 38.3347 ; sd = 2.9316 ; n = 998
## Standard error = 0.0928
## 95 % Confidence interval = ( 38.1528 , 38.5165 )
The 90% confidence interval for the mean pregnancy duration in our population is (38.182, 38.4873) weeks.
inference(y=nc$weeks, est="mean", type="ci", null=0, alternative="twosided", method="theoretical", conflevel=0.9)
## Single mean
## Summary statistics:
## mean = 38.3347 ; sd = 2.9316 ; n = 998
## Standard error = 0.0928
## 90 % Confidence interval = ( 38.182 , 38.4873 )
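The 90% interval is narrower than the 95% interval because only the critical value changes; the mean and standard error stay the same. A minimal sketch, assuming the normal approximation and using a hypothetical helper `mean_ci()` (not part of the lab's code), reproduces both intervals from the printed summary statistics:

```r
# Hypothetical helper: normal-approximation CI for a single mean.
mean_ci <- function(xbar, s, n, conf_level = 0.95) {
  se <- s / sqrt(n)                            # standard error of the mean
  z_star <- qnorm(1 - (1 - conf_level) / 2)    # two-sided critical value
  c(lower = xbar - z_star * se, upper = xbar + z_star * se)
}

# Summary statistics copied from the inference() output: mean, sd, n of weeks.
ci95 <- mean_ci(38.3347, 2.9316, 998, conf_level = 0.95)
ci90 <- mean_ci(38.3347, 2.9316, 998, conf_level = 0.90)

print(round(ci95, 4))
print(round(ci90, 4))

# The lower confidence level gives the shorter interval.
print(unname(diff(ci90) < diff(ci95)))
```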
H0: The average weight gained by younger mothers is the same as that gained by mature mothers.
H1: The average weight gained by younger mothers is not the same as that gained by mature mothers.
The p-value (0.1686) is greater than 0.05, so we do not reject H0. We do not have enough evidence to conclude that younger mothers gain, on average, a different amount of weight than mature mothers.
inference(y=nc$gained, x=nc$mature, est="mean", type="ht", null=0, alternative="twosided", method="theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_mature mom = 129, mean_mature mom = 28.7907, sd_mature mom = 13.4824
## n_younger mom = 844, mean_younger mom = 30.5604, sd_younger mom = 14.3469
## Observed difference between means (mature mom-younger mom) = -1.7697
##
## H0: mu_mature mom - mu_younger mom = 0
## HA: mu_mature mom - mu_younger mom != 0
## Standard error = 1.286
## Test statistic: Z = -1.376
## p-value = 0.1686
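A confidence interval gives a complementary view of the same result. The sketch below builds a 95% interval for the difference in mean weight gain (mature mom minus younger mom) from the summary statistics above, assuming the normal approximation; this interval was not part of the original analysis and the variable names are illustrative. An interval that contains 0 is consistent with not rejecting H0.

```r
# Summary statistics copied from the inference() output above.
n_m <- 129; m_m <- 28.7907; s_m <- 13.4824   # mature mom
n_y <- 844; m_y <- 30.5604; s_y <- 14.3469   # younger mom

# Observed difference in the same order as the output: mature - younger.
diff_means <- m_m - m_y

# Standard error of the difference between two independent sample means.
se <- sqrt(s_m^2 / n_m + s_y^2 / n_y)

# 95% interval under the normal approximation.
ci <- diff_means + c(-1, 1) * qnorm(0.975) * se

print(round(c(se = se, lower = ci[1], upper = ci[2]), 3))

# TRUE when the interval contains 0.
print(ci[1] < 0 && ci[2] > 0)
```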
Using the table function, we can see that the age cutoff for younger mothers is 34. A woman who is 34 or younger is classified as a younger mom, and a woman who is 35 or older is classified as a mature mom.
table(nc$mature, nc$mage)
##
##               13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
##   mature mom   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
##   younger mom  1  1  6 10 19 38 35 66 51 60 51 53 54 51 47 53 52 39 52 38
##
##               33 34 35 36 37 38 39 40 41 42 45 46 50
##   mature mom   0  0 35 31 26 12  7  9  8  2  1  1  1
##   younger mom 45 45  0  0  0  0  0  0  0  0  0  0  0
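Reading the cutoff off a wide table works, but the same answer can be obtained more directly by taking the minimum and maximum age within each group. The sketch below shows the idea on a small made-up data frame; the `toy` values are hypothetical and only mimic the structure of `nc$mage` and `nc$mature`.

```r
# Toy data (hypothetical ages, not from nc) with the same two group labels.
toy <- data.frame(
  mage   = c(22, 34, 35, 41, 29, 36),
  mature = factor(c("younger mom", "younger mom", "mature mom",
                    "mature mom", "younger mom", "mature mom"))
)

# Youngest and oldest mother within each group.
youngest <- tapply(toy$mage, toy$mature, min)
oldest   <- tapply(toy$mage, toy$mature, max)

# The cutoff sits between the oldest younger mom and the youngest mature mom.
print(oldest[["younger mom"]])
print(youngest[["mature mom"]])
```

On the real data, `tapply(nc$mage, nc$mature, range)` would return both endpoints for each group in one call.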
Let’s consider the father’s age and whether the baby was born prematurely. We want to see whether the average father’s age differs between premature and full-term babies.
Let’s formulate our hypotheses:
H0: The average age of fathers of premature babies is the same as that of fathers of full-term babies.
H1: The average age of fathers is not the same for these two groups.
The p-value (0.9222) is very high, so we have no reason to reject H0. Based on these data, we have no evidence of a difference in average father’s age between premature and full-term babies.
In plain words, the mean ages of the two groups are practically the same: 30.24 and 30.32. In this sample, father’s age is not associated with whether a baby is premature.
inference(y=nc$fage, x=nc$premie, est="mean", type="ht", null=0, alternative="twosided", method="theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_full term = 714, mean_full term = 30.2423, sd_full term = 6.6329
## n_premie = 114, mean_premie = 30.3158, sd_premie = 7.5859
## Observed difference between means (full term-premie) = -0.0735
##
## H0: mu_full term - mu_premie = 0
## HA: mu_full term - mu_premie != 0
## Standard error = 0.753
## Test statistic: Z = -0.098
## p-value = 0.9222
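The test above is two-sided. If the question is taken directionally (are fathers of premature babies older on average?), a one-sided p-value can be computed from the same summary statistics. The sketch below is an illustration under that assumed alternative and the normal approximation; it is not what the lab ran, and the variable names are illustrative.

```r
# Summary statistics copied from the inference() output above.
n_f <- 714; m_f <- 30.2423; s_f <- 6.6329   # full term
n_p <- 114; m_p <- 30.3158; s_p <- 7.5859   # premie

# Standard error of the difference between two independent sample means.
se <- sqrt(s_p^2 / n_p + s_f^2 / n_f)

# Test statistic for premie - full term, so positive z means older fathers
# in the premie group.
z <- (m_p - m_f) / se

# One-sided p-value for the assumed alternative mu_premie > mu_full_term.
p_one_sided <- pnorm(z, lower.tail = FALSE)

# Two-sided p-value, for comparison with the output above.
p_two_sided <- 2 * pnorm(-abs(z))

print(round(c(se = se, z = z, one_sided = p_one_sided,
              two_sided = p_two_sided), 4))
```

The one-sided p-value is also far above 0.05, so the conclusion does not depend on the choice of alternative.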