Lab 5. Inference for Numerical Data

download.file("http://www.openintro.org/stat/data/nc.RData", destfile = "nc.RData")

load("nc.RData")

head(nc)
##   fage mage      mature weeks    premie visits marital gained weight
## 1   NA   13 younger mom    39 full term     10 married     38   7.63
## 2   NA   14 younger mom    42 full term     15 married     20   7.88
## 3   19   15 younger mom    37 full term     11 married     38   6.63
## 4   21   15 younger mom    41 full term      6 married     34   8.00
## 5   NA   15 younger mom    39 full term      9 married     27   6.38
## 6   NA   15 younger mom    38 full term     19 married     22   5.38
##   lowbirthweight gender     habit  whitemom
## 1        not low   male nonsmoker not white
## 2        not low   male nonsmoker not white
## 3        not low female nonsmoker     white
## 4        not low   male nonsmoker     white
## 5        not low female nonsmoker not white
## 6            low   male nonsmoker not white
summary(nc)
##       fage            mage            mature        weeks      
##  Min.   :14.00   Min.   :13   mature mom :133   Min.   :20.00  
##  1st Qu.:25.00   1st Qu.:22   younger mom:867   1st Qu.:37.00  
##  Median :30.00   Median :27                     Median :39.00  
##  Mean   :30.26   Mean   :27                     Mean   :38.33  
##  3rd Qu.:35.00   3rd Qu.:32                     3rd Qu.:40.00  
##  Max.   :55.00   Max.   :50                     Max.   :45.00  
##  NA's   :171                                    NA's   :2      
##        premie        visits            marital        gained     
##  full term:846   Min.   : 0.0   married    :386   Min.   : 0.00  
##  premie   :152   1st Qu.:10.0   not married:613   1st Qu.:20.00  
##  NA's     :  2   Median :12.0   NA's       :  1   Median :30.00  
##                  Mean   :12.1                     Mean   :30.33  
##                  3rd Qu.:15.0                     3rd Qu.:38.00  
##                  Max.   :30.0                     Max.   :85.00  
##                  NA's   :9                        NA's   :27     
##      weight       lowbirthweight    gender          habit    
##  Min.   : 1.000   low    :111    female:503   nonsmoker:873  
##  1st Qu.: 6.380   not low:889    male  :497   smoker   :126  
##  Median : 7.310                               NA's     :  1  
##  Mean   : 7.101                                              
##  3rd Qu.: 8.060                                              
##  Max.   :11.750                                              
##                                                              
##       whitemom  
##  not white:284  
##  white    :714  
##  NA's     :  2  
##                 
##                 
##                 
## 

Exercise 1

Each case in this data set is a single birth (one mother and her baby) recorded in North Carolina in 2004. There are 1000 cases.

boxplot(nc$weight~nc$habit)

Exercise 2

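The boxplot suggests that babies born to mothers who smoke tend to weigh somewhat less than babies born to nonsmoking mothers, although the two distributions overlap considerably.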

by(nc$weight, nc$habit,mean)
## nc$habit: nonsmoker
## [1] 7.144273
## -------------------------------------------------------- 
## nc$habit: smoker
## [1] 6.82873

Exercise 3

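Inference for a difference of two means requires independent observations within each group (a random sample that is less than 10% of the population), independence between the two groups, and samples large enough (commonly at least 30 per group, without extreme skew) for the sampling distribution of the difference in means to be nearly normal. Our sample is random and 1000 births are well under 10% of all North Carolina births, the smokers and nonsmokers form two separate (not paired) groups, and both groups are larger than 30 (873 nonsmokers and 126 smokers, as counted below). The conditions for inference therefore appear to be met.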

by(nc$weight, nc$habit,length)
## nc$habit: nonsmoker
## [1] 873
## -------------------------------------------------------- 
## nc$habit: smoker
## [1] 126

Exercise 4

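H0: The mean birth weight of babies born to smoking mothers is the same as that of babies born to nonsmoking mothers (mu_nonsmoker - mu_smoker = 0).
H1: The mean birth weights of the two groups differ (mu_nonsmoker - mu_smoker != 0).

The p-value from the test below is 0.0184. At the 5% significance level this is small enough to reject H0: the data provide convincing evidence that the mean birth weight of babies born to smoking mothers differs from that of babies born to nonsmoking mothers.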

inference(y=nc$weight, x=nc$habit, est="mean", type="ht", null=0, alternative="twosided", method="theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_nonsmoker = 873, mean_nonsmoker = 7.1443, sd_nonsmoker = 1.5187
## n_smoker = 126, mean_smoker = 6.8287, sd_smoker = 1.3862
## Observed difference between means (nonsmoker-smoker) = 0.3155
## 
## H0: mu_nonsmoker - mu_smoker = 0 
## HA: mu_nonsmoker - mu_smoker != 0 
## Standard error = 0.134 
## Test statistic: Z =  2.359 
## p-value =  0.0184
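The printed standard error, test statistic, and p-value can be reproduced by hand from the summary statistics. The sketch below uses base R only and assumes the unpooled standard error sqrt(s1^2/n1 + s2^2/n2) with a standard normal reference distribution, which is consistent with the Z label in the output. It starts from the rounded values printed above, so the results agree only to within rounding.

```r
# Summary statistics copied (rounded) from the inference() output above
n_ns <- 873; xbar_ns <- 7.1443; s_ns <- 1.5187   # nonsmokers
n_sm <- 126; xbar_sm <- 6.8287; s_sm <- 1.3862   # smokers

# Point estimate: difference in sample means (nonsmoker - smoker)
diff_hat <- xbar_ns - xbar_sm

# Unpooled standard error of the difference in means
se <- sqrt(s_ns^2 / n_ns + s_sm^2 / n_sm)

# Test statistic under H0: mu_nonsmoker - mu_smoker = 0
z <- (diff_hat - 0) / se

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(-abs(z))

round(c(diff = diff_hat, se = se, z = z, p = p_value), 4)
```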

Exercise 5

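The 95% confidence interval for the difference in mean birth weight (smoker minus nonsmoker) is (-0.5777, -0.0534). We are 95% confident that babies born to smoking mothers weigh, on average, between 0.0534 and 0.5777 pounds less than babies born to nonsmoking mothers. The interval does not contain 0, which agrees with the hypothesis test in Exercise 4.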

inference(y=nc$weight, x=nc$habit, est="mean", type="ci", null=0, alternative="twosided", method="theoretical", order=c("smoker","nonsmoker"))
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_smoker = 126, mean_smoker = 6.8287, sd_smoker = 1.3862
## n_nonsmoker = 873, mean_nonsmoker = 7.1443, sd_nonsmoker = 1.5187

## Observed difference between means (smoker-nonsmoker) = -0.3155
## 
## Standard error = 0.1338 
## 95 % Confidence interval = ( -0.5777 , -0.0534 )
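The interval above is the point estimate plus or minus a margin of error. As a sketch, assuming a normal critical value z* = qnorm(0.975) and the same unpooled standard error as in the test, it can be rebuilt in base R from the rounded summary statistics:

```r
# Summary statistics copied from the output above (order: smoker - nonsmoker)
n_sm <- 126; xbar_sm <- 6.8287; s_sm <- 1.3862
n_ns <- 873; xbar_ns <- 7.1443; s_ns <- 1.5187

diff_hat <- xbar_sm - xbar_ns                 # point estimate
se <- sqrt(s_sm^2 / n_sm + s_ns^2 / n_ns)     # unpooled standard error
z_star <- qnorm(0.975)                        # critical value for 95% confidence

ci <- diff_hat + c(-1, 1) * z_star * se       # estimate -/+ margin of error
round(ci, 4)
```

Because both endpoints are negative, the interval excludes 0, mirroring the rejection of H0 in Exercise 4.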

On my own

1.

The 95% confidence interval for the mean length of pregnancy is (38.1528, 38.5165). We are 95% confident that the mean length of pregnancy for births in North Carolina in 2004 is between 38.1528 and 38.5165 weeks.

inference(y=nc$weeks, est="mean", type="ci", null=0, alternative="twosided", method="theoretical")
## Single mean 
## Summary statistics:

## mean = 38.3347 ;  sd = 2.9316 ;  n = 998 
## Standard error = 0.0928 
## 95 % Confidence interval = ( 38.1528 , 38.5165 )

2.

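The 90% confidence interval for the mean length of pregnancy in the population is (38.182, 38.4873) weeks. It is narrower than the 95% interval because a lower confidence level uses a smaller critical value.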

inference(y=nc$weeks, est="mean", type="ci", null=0, alternative="twosided", method="theoretical", conflevel=0.9)
## Single mean 
## Summary statistics:

## mean = 38.3347 ;  sd = 2.9316 ;  n = 998 
## Standard error = 0.0928 
## 90 % Confidence interval = ( 38.182 , 38.4873 )
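Both intervals for the mean length of pregnancy come from the same formula, mean plus or minus z* times s/sqrt(n); only the critical value z* changes with the confidence level. The sketch below assumes a normal critical value and uses a hypothetical helper, ci_mean, written for illustration (it is not part of the lab's inference function):

```r
# Hypothetical helper: normal-based confidence interval for a single mean
ci_mean <- function(xbar, s, n, level = 0.95) {
  z_star <- qnorm(1 - (1 - level) / 2)  # critical value for the given level
  se <- s / sqrt(n)                     # standard error of the sample mean
  xbar + c(-1, 1) * z_star * se
}

# Summary statistics copied from the output above
ci90 <- ci_mean(38.3347, 2.9316, 998, level = 0.90)
ci95 <- ci_mean(38.3347, 2.9316, 998, level = 0.95)

# Widths: the 90% interval is narrower than the 95% interval
c(width90 = diff(ci90), width95 = diff(ci95))
```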

3.

H0: The mean weight gained during pregnancy is the same for younger mothers and mature mothers.
H1: The mean weight gained during pregnancy differs between younger mothers and mature mothers.

The p-value (0.1686) is greater than 0.05, so we fail to reject H0. We do not have enough evidence to conclude that younger mothers gain, on average, a different amount of weight than mature mothers.

inference(y=nc$gained, x=nc$mature, est="mean", type="ht", null=0, alternative="twosided", method="theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_mature mom = 129, mean_mature mom = 28.7907, sd_mature mom = 13.4824
## n_younger mom = 844, mean_younger mom = 30.5604, sd_younger mom = 14.3469
## Observed difference between means (mature mom-younger mom) = -1.7697
## 
## H0: mu_mature mom - mu_younger mom = 0 
## HA: mu_mature mom - mu_younger mom != 0 
## Standard error = 1.286 
## Test statistic: Z =  -1.376 
## p-value =  0.1686
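The output reports a Z statistic. As a hedged cross-check, the sketch below recomputes the standard error and statistic from the printed summary values and, as an alternative that this lab does not use, also evaluates the two-sided p-value against a t distribution with Welch-Satterthwaite degrees of freedom:

```r
# Summary statistics copied from the output above
n_m <- 129; xbar_m <- 28.7907; s_m <- 13.4824   # mature moms
n_y <- 844; xbar_y <- 30.5604; s_y <- 14.3469   # younger moms

v_m <- s_m^2 / n_m          # squared standard error of each group mean
v_y <- s_y^2 / n_y
se   <- sqrt(v_m + v_y)
stat <- (xbar_m - xbar_y) / se

# Normal reference distribution (consistent with the Z reported above)
p_z <- 2 * pnorm(-abs(stat))

# Alternative: Welch-Satterthwaite degrees of freedom and a t reference
df_welch <- (v_m + v_y)^2 / (v_m^2 / (n_m - 1) + v_y^2 / (n_y - 1))
p_t <- 2 * pt(-abs(stat), df = df_welch)

round(c(se = se, stat = stat, df = df_welch, p_z = p_z, p_t = p_t), 4)
```

The t-based p-value is slightly larger because the t distribution has heavier tails, but both are well above 0.05, so the conclusion does not depend on the choice of reference distribution.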

4.

Cross-tabulating mature against mage with the table function shows that the cutoff is at age 35: every mother aged 34 or younger is classified as a younger mom, and every mother aged 35 or older is classified as a mature mom.

table(nc$mature, nc$mage)
##              
##               13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
##   mature mom   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
##   younger mom  1  1  6 10 19 38 35 66 51 60 51 53 54 51 47 53 52 39 52 38
##              
##               33 34 35 36 37 38 39 40 41 42 45 46 50
##   mature mom   0  0 35 31 26 12  7  9  8  2  1  1  1
##   younger mom 45 45  0  0  0  0  0  0  0  0  0  0  0
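A more compact check than reading the full table is to look at the age range within each group: the cutoff lies between the oldest younger mom and the youngest mature mom. The sketch below uses a small made-up vector in place of nc$mage and nc$mature so that it runs on its own; the values are illustrative only. With the lab data you would call sapply(split(nc$mage, nc$mature), range) instead.

```r
# Made-up stand-ins for nc$mage and nc$mature (illustration only)
mage <- c(13, 22, 34, 35, 41, 50, 30, 36)
mature <- factor(c("younger mom", "younger mom", "younger mom", "mature mom",
                   "mature mom", "mature mom", "younger mom", "mature mom"))

# Split ages by group and take min/max of each:
# the result is a 2 x k matrix (row 1 = min, row 2 = max), one column per group
ranges <- sapply(split(mage, mature), range)
ranges

# The cutoff sits between these two values
c(oldest_younger = ranges[2, "younger mom"],
  youngest_mature = ranges[1, "mature mom"])
```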

5.

Let’s consider father’s age and whether the baby was born premature. We want to know whether fathers of premature babies differ in average age from fathers of full-term babies.

Let’s formulate our hypotheses:

H0: The mean age of fathers of premature babies is the same as that of fathers of full-term babies.
H1: The mean age of fathers differs between the two groups.

The p-value (0.9222) is very high, so we fail to reject H0. Based on these data, we have no evidence of a difference in mean father’s age between premature and full-term babies.

In plain words, the sample means of the two groups are practically the same (30.24 years for full-term babies and 30.32 years for premature babies). The data give no indication that father’s age is associated with whether a baby is born premature. Note that father’s age is missing for 171 births, so this test uses only the 828 births (714 full term and 114 premie) for which both variables are recorded.

inference(y=nc$fage, x=nc$premie, est="mean", type="ht", null=0, alternative="twosided", method="theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_full term = 714, mean_full term = 30.2423, sd_full term = 6.6329
## n_premie = 114, mean_premie = 30.3158, sd_premie = 7.5859
## Observed difference between means (full term-premie) = -0.0735
## 
## H0: mu_full term - mu_premie = 0 
## HA: mu_full term - mu_premie != 0 
## Standard error = 0.753 
## Test statistic: Z =  -0.098 
## p-value =  0.9222
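Failing to reject H0 does not by itself show the means are equal; a confidence interval for the difference indicates how large a difference the data can rule out. The sketch below assumes the same normal-based, unpooled approach used elsewhere in this lab and starts from the rounded summary statistics printed above:

```r
# Summary statistics copied from the output above
n_f <- 714; xbar_f <- 30.2423; s_f <- 6.6329   # full term
n_p <- 114; xbar_p <- 30.3158; s_p <- 7.5859   # premie

diff_hat <- xbar_f - xbar_p                    # full term - premie
se <- sqrt(s_f^2 / n_f + s_p^2 / n_p)          # unpooled standard error
ci <- diff_hat + c(-1, 1) * qnorm(0.975) * se  # 95% confidence interval

round(ci, 2)
```

The interval runs from roughly -1.5 to 1.4 years and contains 0, so the data are consistent with no difference and suggest that any true difference in mean father’s age is at most about a year and a half in either direction.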