Our study looks at 1000 randomly sampled cases of births in North Carolina in 2004.
## 'data.frame': 1000 obs. of 13 variables:
## $ fage : int NA NA 19 21 NA NA 18 17 NA 20 ...
## $ mage : int 13 14 15 15 15 15 15 15 16 16 ...
## $ mature : Factor w/ 2 levels "mature mom","younger mom": 2 2 2 2 2 2 2 2 2 2 ...
## $ weeks : int 39 42 37 41 39 38 37 35 38 37 ...
## $ premie : Factor w/ 2 levels "full term","premie": 1 1 1 1 1 1 1 2 1 1 ...
## $ visits : int 10 15 11 6 9 19 12 5 9 13 ...
## $ marital : Factor w/ 2 levels "married","not married": 1 1 1 1 1 1 1 1 1 1 ...
## $ gained : int 38 20 38 34 27 22 76 15 NA 52 ...
## $ weight : num 7.63 7.88 6.63 8 6.38 5.38 8.44 4.69 8.81 6.94 ...
## $ lowbirthweight: Factor w/ 2 levels "low","not low": 2 2 2 2 2 1 2 1 2 2 ...
## $ gender : Factor w/ 2 levels "female","male": 2 2 1 2 1 2 2 2 2 1 ...
## $ habit : Factor w/ 2 levels "nonsmoker","smoker": 1 1 1 1 1 1 1 1 1 1 ...
## $ whitemom : Factor w/ 2 levels "not white","white": 1 1 2 2 1 1 1 1 2 2 ...
##       fage            mage          mature        weeks
##  Min.   :14.00   Min.   :13   mature mom :133   Min.   :20.00
##  1st Qu.:25.00   1st Qu.:22   younger mom:867   1st Qu.:37.00
##  Median :30.00   Median :27                     Median :39.00
##  Mean   :30.26   Mean   :27                     Mean   :38.33
##  3rd Qu.:35.00   3rd Qu.:32                     3rd Qu.:40.00
##  Max.   :55.00   Max.   :50                     Max.   :45.00
##  NA's   :171                                    NA's   :2
##       premie        visits            marital        gained
##  full term:846   Min.   : 0.0   married    :386   Min.   : 0.00
##  premie   :152   1st Qu.:10.0   not married:613   1st Qu.:20.00
##  NA's     :  2   Median :12.0   NA's       :  1   Median :30.00
##                  Mean   :12.1                     Mean   :30.33
##                  3rd Qu.:15.0                     3rd Qu.:38.00
##                  Max.   :30.0                     Max.   :85.00
##                  NA's   :9                        NA's   :27
##      weight       lowbirthweight    gender          habit
##  Min.   : 1.000   low    :111     female:503   nonsmoker:873
##  1st Qu.: 6.380   not low:889     male  :497   smoker   :126
##  Median : 7.310                                NA's     :  1
##  Mean   : 7.101
##  3rd Qu.: 8.060
##  Max.   :11.750
##       whitemom
##  not white:284
##  white    :714
##  NA's     :  2



Smokers in our sample of 2004 North Carolina births had babies that were slightly lighter, on average. There is a curious pattern of low birth-weight outliers in the nonsmoker category that doesn’t show up for smokers. The reason for this, and whether it exists in any broader population, could be a fruitful topic for future research. It might lend itself well to machine learning classifiers.
by(nc$weight, nc$habit, mean)
## nc$habit: nonsmoker
## [1] 7.144273
## --------------------------------------------------------
## nc$habit: smoker
## [1] 6.82873
by(nc$weight, nc$habit, length)
## nc$habit: nonsmoker
## [1] 873
## --------------------------------------------------------
## nc$habit: smoker
## [1] 126
library(e1071)  # provides skewness()
# Skewness of birth weight among nonsmokers
test.for.nonsmoker.skew.n <- subset(nc, nc$habit == "nonsmoker")
skewness(test.for.nonsmoker.skew.n$weight)
## [1] -1.182798
# Skewness of birth weight among smokers
test.for.smoker.skew.s <- subset(nc, nc$habit == "smoker")
skewness(test.for.smoker.skew.s$weight)
## [1] -0.9898247
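The conditions for inference are not fully met. The observations come from a random sample that is under 10% of the population, so they can be treated as independent. Both groups are large enough for inference (873 nonsmokers and 126 smokers). The birth weights in both groups, however, are quite left-skewed (skewness of about -1.18 for nonsmokers and -0.99 for smokers).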
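One way to gauge how much skew matters at these sample sizes is a small simulation. The sketch below is only an illustration under stated assumptions: the population is a made-up left-skewed distribution (not the nc data), the seed and the number of replications are arbitrary, and `skew()` is a hypothetical helper written for this example.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical helper: moment-based sample skewness, m3 / s^3
skew <- function(x) {
  x <- x[!is.na(x)]
  m3 <- mean((x - mean(x))^3)
  m3 / sd(x)^3
}

# Made-up left-skewed "population" (an assumption, NOT the nc data)
population <- 8 - rexp(100000, rate = 1)

# Means of 2000 samples of size 126, the size of the smaller (smoker) group
sample_means <- replicate(2000, mean(sample(population, 126)))

pop_skew  <- skew(population)    # strongly negative
mean_skew <- skew(sample_means)  # much closer to 0
c(population = pop_skew, sample_means = mean_skew)
```

If the skewness of the sample means is close to zero, the normal approximation used by the theoretical method is more plausible than the raw skew alone suggests.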
H0 : The mean birth weight of babies born to smokers is the same as that of babies born to nonsmokers.
H1 : The mean birth weight of babies born to smokers is different from that of babies born to nonsmokers.
inference(y = nc$weight, x = nc$habit, est = "mean", type = "ht", null = 0,
alternative = "twosided", method = "theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_nonsmoker = 873, mean_nonsmoker = 7.1443, sd_nonsmoker = 1.5187
## n_smoker = 126, mean_smoker = 6.8287, sd_smoker = 1.3862
## Observed difference between means (nonsmoker-smoker) = 0.3155
##
## H0: mu_nonsmoker - mu_smoker = 0
## HA: mu_nonsmoker - mu_smoker != 0
## Standard error = 0.134
## Test statistic: Z = 2.359
## p-value = 0.0184
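The test statistic and p-value above can be checked by hand from the summary statistics. This is a minimal sketch that assumes the standard two-sample formula, with the standard error taken as the square root of the sum of the two variances over their sample sizes. The variable names are illustrative, and the inputs are the rounded values printed in the output, so the results should agree only up to rounding.

```r
# Summary statistics copied from the output above (rounded)
n_ns <- 873; mean_ns <- 7.1443; sd_ns <- 1.5187  # nonsmokers
n_s  <- 126; mean_s  <- 6.8287; sd_s  <- 1.3862  # smokers

# Standard error of the difference between two independent means
se <- sqrt(sd_ns^2 / n_ns + sd_s^2 / n_s)

# Z statistic for H0: mu_nonsmoker - mu_smoker = 0
z <- (mean_ns - mean_s) / se

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(-abs(z))

round(c(se = se, z = z, p_value = p_value), 4)
```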

inference(y = nc$weight, x = nc$habit, est = "mean", type = "ci", null = 0,
alternative = "twosided", method = "theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_nonsmoker = 873, mean_nonsmoker = 7.1443, sd_nonsmoker = 1.5187
## n_smoker = 126, mean_smoker = 6.8287, sd_smoker = 1.3862

## Observed difference between means (nonsmoker-smoker) = 0.3155
##
## Standard error = 0.1338
## 95 % Confidence interval = ( 0.0534 , 0.5777 )
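The interval above can be approximated from the printed difference and standard error. The sketch below assumes a normal critical value (about 1.96 for 95%) and uses the rounded figures from the output, so the endpoints may differ in the last digit.

```r
# Values copied from the output above (rounded)
diff_means <- 0.3155  # nonsmoker mean minus smoker mean
se         <- 0.1338  # standard error of the difference

# Critical value for a 95% interval under the normal model
z_star <- qnorm(0.975)

# Point estimate plus or minus the margin of error
ci <- diff_means + c(-1, 1) * z_star * se
round(ci, 4)
```

Because the whole interval lies above zero, it agrees with the two-sided test, which rejects a difference of zero at the 5% level.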
On your own
The 95% confidence interval for the mean length of pregnancy is from 38.1528 to 38.5165 weeks. That means we are 95% confident that the population mean lies in this interval.
inference(y = nc$weeks, est = "mean", type = "ci", null = 0,
alternative = "twosided", method = "theoretical")
## Single mean
## Summary statistics:

## mean = 38.3347 ; sd = 2.9316 ; n = 998
## Standard error = 0.0928
## 95 % Confidence interval = ( 38.1528 , 38.5165 )
The 90% confidence interval for the mean length of pregnancy is from 38.182 to 38.4873 weeks. That means we are 90% confident that the population mean lies in this interval. It is only slightly narrower than our 95% confidence interval.
inference(y = nc$weeks, est = "mean", type = "ci", null = 0,
alternative = "twosided", method = "theoretical", conflevel = 0.90)
## Single mean
## Summary statistics:

## mean = 38.3347 ; sd = 2.9316 ; n = 998
## Standard error = 0.0928
## 90 % Confidence interval = ( 38.182 , 38.4873 )
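To see why the 90% interval is only a little narrower, both intervals can be rebuilt from the printed mean, standard deviation, and sample size. The helper `mean_ci()` below is hypothetical, written for this sketch, and assumes a normal critical value. Only the critical value changes between the two levels (roughly 1.96 versus 1.645).

```r
# Hypothetical helper: normal-based confidence interval for a single mean
mean_ci <- function(xbar, s, n, conf_level = 0.95) {
  se     <- s / sqrt(n)                       # standard error of the mean
  z_star <- qnorm(1 - (1 - conf_level) / 2)   # two-sided critical value
  c(lower = xbar - z_star * se, upper = xbar + z_star * se)
}

# Summary statistics copied from the output above (rounded)
ci95 <- mean_ci(38.3347, 2.9316, 998, conf_level = 0.95)
ci90 <- mean_ci(38.3347, 2.9316, 998, conf_level = 0.90)

# Interval widths, in weeks
widths <- c(ci95 = unname(diff(ci95)), ci90 = unname(diff(ci90)))

round(rbind(ci95, ci90), 4)
round(widths, 4)
```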
Is there a difference in weight gain between mature mothers and younger mothers?
H0 : The mean weight gain is the same for mature and younger mothers.
H1 : The mean weight gain is different for mature and younger mothers.
inference(y = nc$gained, x = nc$mature, est = "mean", type = "ht", null = 0,
alternative = "twosided", method = "theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_mature mom = 129, mean_mature mom = 28.7907, sd_mature mom = 13.4824
## n_younger mom = 844, mean_younger mom = 30.5604, sd_younger mom = 14.3469
## Observed difference between means (mature mom-younger mom) = -1.7697
##
## H0: mu_mature mom - mu_younger mom = 0
## HA: mu_mature mom - mu_younger mom != 0
## Standard error = 1.286
## Test statistic: Z = -1.376
## p-value = 0.1686

There is insufficient evidence of a difference in mean weight gain during pregnancy between mature and younger mothers. The p-value is 0.1686.
older<-subset(nc,nc$mature=="mature mom")
min(older$mage,na.rm = TRUE)
## [1] 35
younger<-subset(nc,nc$mature=="younger mom")
max(younger$mage,na.rm = TRUE)
## [1] 34
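Mothers are classified as younger if they are 34 or younger, and as mature if they are 35 or older. We created a subset for each group, then took the minimum age of the mature mothers and the maximum age of the younger mothers to find the boundary.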
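The cutoff found above can be written as an explicit rule. This is a sketch under assumptions: the ages are made-up values rather than rows from nc, and how the `mature` column was originally built is not stated in the data, so this only shows a rule consistent with the observed boundary.

```r
# Made-up maternal ages for illustration (NOT taken from nc)
mage <- c(13, 22, 34, 35, 50, NA)

# Age 35 or older is "mature mom"; 34 or younger is "younger mom"
mature <- factor(
  ifelse(mage >= 35, "mature mom", "younger mom"),
  levels = c("mature mom", "younger mom")
)

# Missing ages stay missing rather than being assigned to a group
table(mature, useNA = "ifany")
```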
Is the length of pregnancy shorter for mature mothers than for younger mothers?
H0 : The mean length of pregnancy is the same for mature mothers as for younger mothers.
H1 : The mean length of pregnancy is shorter for mature mothers than for younger mothers.
inference(y = nc$weeks, x = nc$mature, est = "mean", type = "ht", null = 0,
alternative = "less", method = "theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_mature mom = 132, mean_mature mom = 38.0227, sd_mature mom = 3.2184
## n_younger mom = 866, mean_younger mom = 38.3822, sd_younger mom = 2.8844
## Observed difference between means (mature mom-younger mom) = -0.3595
##
## H0: mu_mature mom - mu_younger mom = 0
## HA: mu_mature mom - mu_younger mom < 0
## Standard error = 0.297
## Test statistic: Z = -1.211
## p-value = 0.1129

There is insufficient evidence that mature mothers have shorter pregnancies, on average, than younger mothers. The p-value is 0.1129.
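Because this test uses alternative = "less", its p-value is the area in the lower tail only. The sketch below recomputes it from the printed summary statistics, assuming the same two-sample standard error formula as the theoretical method, and shows the two-sided value for comparison. The variable names are illustrative and the inputs are rounded.

```r
# Summary statistics copied from the output above (rounded)
n_m <- 132; mean_m <- 38.0227; sd_m <- 3.2184  # mature mothers
n_y <- 866; mean_y <- 38.3822; sd_y <- 2.8844  # younger mothers

# Standard error and Z statistic for mu_mature - mu_younger
se <- sqrt(sd_m^2 / n_m + sd_y^2 / n_y)
z  <- (mean_m - mean_y) / se

# One-sided p-value for HA: mu_mature - mu_younger < 0 (lower tail)
p_less <- pnorm(z)

# Two-sided p-value, for comparison only
p_two_sided <- 2 * pnorm(-abs(z))

round(c(z = z, p_less = p_less, p_two_sided = p_two_sided), 4)
```

With a negative Z, the two-sided p-value is exactly twice the one-sided one, so the conclusion at the 5% level is the same either way.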