Change the Background Color of the main body of the document

body {
  background-color: #e1dce8;
}
download.file("http://www.openintro.org/stat/data/nc.RData", destfile = "nc.RData")
load("nc.RData")

Exercise 1

What are the cases in this data set? How many cases are there in our sample?

dim(nc)
## [1] 1000   13

Answer: There are 1000 cases in the sample (with 13 variables measured). The cases are births in North Carolina in 2004

summary(nc)
##       fage            mage            mature        weeks             premie   
##  Min.   :14.00   Min.   :13   mature mom :133   Min.   :20.00   full term:846  
##  1st Qu.:25.00   1st Qu.:22   younger mom:867   1st Qu.:37.00   premie   :152  
##  Median :30.00   Median :27                     Median :39.00   NA's     :  2  
##  Mean   :30.26   Mean   :27                     Mean   :38.33                  
##  3rd Qu.:35.00   3rd Qu.:32                     3rd Qu.:40.00                  
##  Max.   :55.00   Max.   :50                     Max.   :45.00                  
##  NA's   :171                                    NA's   :2                      
##      visits            marital        gained          weight      
##  Min.   : 0.0   married    :386   Min.   : 0.00   Min.   : 1.000  
##  1st Qu.:10.0   not married:613   1st Qu.:20.00   1st Qu.: 6.380  
##  Median :12.0   NA's       :  1   Median :30.00   Median : 7.310  
##  Mean   :12.1                     Mean   :30.33   Mean   : 7.101  
##  3rd Qu.:15.0                     3rd Qu.:38.00   3rd Qu.: 8.060  
##  Max.   :30.0                     Max.   :85.00   Max.   :11.750  
##  NA's   :9                        NA's   :27                      
##  lowbirthweight    gender          habit          whitemom  
##  low    :111    female:503   nonsmoker:873   not white:284  
##  not low:889    male  :497   smoker   :126   white    :714  
##                              NA's     :  1   NA's     :  2  
##                                                             
##                                                             
##                                                             
## 
boxplot(nc$fage, main = "Father's Age")

boxplot(nc$mage, main = "Mother's Age")

boxplot(nc$weeks, main = "Length of Pregnancy in Weeks")

boxplot(nc$visits, main = "Number of Hospital Visits")

boxplot(nc$gained, main = "Mother's Weight Gain During Pregnancy")

boxplot(nc$weight, main = "Baby's Birth Weight")

Exercise 2

Make a side-by-side boxplot of habit and weight. What does the plot highlight about the relationship between these two variables?

boxplot(nc$weight ~ nc$habit, main = "Mother's Smoking Habits and Newborn Weight Gain")

Exercise 2

by(nc$weight, nc$habit, mean)
## nc$habit: nonsmoker
## [1] 7.144273
## ------------------------------------------------------------ 
## nc$habit: smoker
## [1] 6.82873

Exercise 3

Check if the conditions necessary for inference are satisfied. Note that you will need to obtain sample sizes to check the conditions. You can compute the group size using the same by command above but replacing mean with length.

by(nc$weight, nc$habit, length)
## nc$habit: nonsmoker
## [1] 873
## ------------------------------------------------------------ 
## nc$habit: smoker
## [1] 126

Answer: Samples are randomly selected and independent. Sample sizes of the two groups are greater than 30, so basic conditions for calculating inference are satisfied.

Exercise 4

Write the hypotheses for testing if the average weights of babies born to smoking and non-smoking mothers are different.

Answer:

Ho: mean birth weight for babies of mothers who smoke is equal to mean birth weight for babies of mothers who do not smoke.

inference(y = nc$weight, x = nc$habit, est = "mean", type = "ht", null = 0,
          alternative = "twosided", method = "theoretical")
## Warning: package 'openintro' was built under R version 4.1.1
## Warning: package 'airports' was built under R version 4.1.1
## Warning: package 'cherryblossom' was built under R version 4.1.1
## Warning: package 'usdata' was built under R version 4.1.1
## Warning: package 'BHH2' was built under R version 4.1.1
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_nonsmoker = 873, mean_nonsmoker = 7.1443, sd_nonsmoker = 1.5187
## n_smoker = 126, mean_smoker = 6.8287, sd_smoker = 1.3862
## Observed difference between means (nonsmoker-smoker) = 0.3155
## 
## H0: mu_nonsmoker - mu_smoker = 0 
## HA: mu_nonsmoker - mu_smoker != 0 
## Standard error = 0.134 
## Test statistic: Z =  2.359 
## p-value =  0.0184

Answer:
p-value = 0.0184 There is strong evidence that the mean birthweight for smokers is not equal to the mean birthweight for nonsmokers.

Exercise 5

Change the type argument to “ci” to construct and record a confidence interval for the difference between the weights of babies born to smoking and non-smoking mothers.

inference(y = nc$weight, x = nc$habit, est = "mean", type = "ci",
           method = "theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_nonsmoker = 873, mean_nonsmoker = 7.1443, sd_nonsmoker = 1.5187
## n_smoker = 126, mean_smoker = 6.8287, sd_smoker = 1.3862

## Observed difference between means (nonsmoker-smoker) = 0.3155
## 
## Standard error = 0.1338 
## 95 % Confidence interval = ( 0.0534 , 0.5777 )

Answer: We are 95% confident that the true mean difference in birthweights between smokers and nonsmokers is ( 0.0534 , 0.5777 ) lbs. Zero is not included, so we see that nonsmokers have a higher mean birthweight.

inference(y = nc$weight, x = nc$habit, est = "mean", type = "ci", null = 0, 
          alternative = "twosided", method = "theoretical", 
          order = c("smoker","nonsmoker"))
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_smoker = 126, mean_smoker = 6.8287, sd_smoker = 1.3862
## n_nonsmoker = 873, mean_nonsmoker = 7.1443, sd_nonsmoker = 1.5187

## Observed difference between means (smoker-nonsmoker) = -0.3155
## 
## Standard error = 0.1338 
## 95 % Confidence interval = ( -0.5777 , -0.0534 )

On Your

1

inference(y = nc$weeks, est = "mean", type = "ci", null = 0, 
          alternative = "twosided", method = "theoretical")
## Single mean 
## Summary statistics:

## mean = 38.3347 ;  sd = 2.9316 ;  n = 998 
## Standard error = 0.0928 
## 95 % Confidence interval = ( 38.1528 , 38.5165 )

2

inference(y = nc$weeks, est = "mean", type = "ci", null = 0, 
          alternative = "twosided", conflevel = 0.90, method = "theoretical")
## Single mean 
## Summary statistics:

## mean = 38.3347 ;  sd = 2.9316 ;  n = 998 
## Standard error = 0.0928 
## 90 % Confidence interval = ( 38.182 , 38.4873 )

3

inference(y = nc$gained, x = nc$mature, est = "mean", type = "ht", null = 0,
          alternative = "twosided", method = "theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_mature mom = 129, mean_mature mom = 28.7907, sd_mature mom = 13.4824
## n_younger mom = 844, mean_younger mom = 30.5604, sd_younger mom = 14.3469
## Observed difference between means (mature mom-younger mom) = -1.7697
## 
## H0: mu_mature mom - mu_younger mom = 0 
## HA: mu_mature mom - mu_younger mom != 0 
## Standard error = 1.286 
## Test statistic: Z =  -1.376 
## p-value =  0.1686

4

inference(y = nc$mage, x = nc$mature, est = "mean", type = "ht", null = 0,
          alternative = "twosided", method = "theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_mature mom = 133, mean_mature mom = 37.1805, sd_mature mom = 2.4303
## n_younger mom = 867, mean_younger mom = 25.4383, sd_younger mom = 5.0278
## Observed difference between means (mature mom-younger mom) = 11.7422
## 
## H0: mu_mature mom - mu_younger mom = 0 
## HA: mu_mature mom - mu_younger mom != 0 
## Standard error = 0.271 
## Test statistic: Z =  43.292 
## p-value =  0

by(nc$mage, nc$mature, min)
## nc$mature: mature mom
## [1] 35
## ------------------------------------------------------------ 
## nc$mature: younger mom
## [1] 13
by(nc$mage, nc$mature, max)
## nc$mature: mature mom
## [1] 50
## ------------------------------------------------------------ 
## nc$mature: younger mom
## [1] 34

5

inference(y = nc$weight, x = nc$premie, est = "mean", type = "ht", null = 0,
          alternative = "twosided", method = "theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_full term = 846, mean_full term = 7.4594, sd_full term = 1.075
## n_premie = 152, mean_premie = 5.1284, sd_premie = 1.9696
## Observed difference between means (full term-premie) = 2.331
## 
## H0: mu_full term - mu_premie = 0 
## HA: mu_full term - mu_premie != 0 
## Standard error = 0.164 
## Test statistic: Z =  14.216 
## p-value =  0