download.file("http://www.openintro.org/stat/data/nc.RData", destfile = "nc.RData")
load("nc.RData")

What are the cases in this data set? How many cases are there in our sample?

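Each case is one birth in the North Carolina sample (a mother and her newborn). There are 1,000 cases; the 13 columns are the variables recorded for each birth, not cases.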

summary(nc)
##       fage            mage            mature        weeks             premie   
##  Min.   :14.00   Min.   :13   mature mom :133   Min.   :20.00   full term:846  
##  1st Qu.:25.00   1st Qu.:22   younger mom:867   1st Qu.:37.00   premie   :152  
##  Median :30.00   Median :27                     Median :39.00   NA's     :  2  
##  Mean   :30.26   Mean   :27                     Mean   :38.33                  
##  3rd Qu.:35.00   3rd Qu.:32                     3rd Qu.:40.00                  
##  Max.   :55.00   Max.   :50                     Max.   :45.00                  
##  NA's   :171                                    NA's   :2                      
##      visits            marital        gained          weight      
##  Min.   : 0.0   married    :386   Min.   : 0.00   Min.   : 1.000  
##  1st Qu.:10.0   not married:613   1st Qu.:20.00   1st Qu.: 6.380  
##  Median :12.0   NA's       :  1   Median :30.00   Median : 7.310  
##  Mean   :12.1                     Mean   :30.33   Mean   : 7.101  
##  3rd Qu.:15.0                     3rd Qu.:38.00   3rd Qu.: 8.060  
##  Max.   :30.0                     Max.   :85.00   Max.   :11.750  
##  NA's   :9                        NA's   :27                      
##  lowbirthweight    gender          habit          whitemom  
##  low    :111    female:503   nonsmoker:873   not white:284  
##  not low:889    male  :497   smoker   :126   white    :714  
##                              NA's     :  1   NA's     :  2  

Side-by-side boxplots of birth weight by smoking habit

boxplot(weight ~ habit, data = nc)  # one box of birth weights per smoking group
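
Passing weight and habit to boxplot() as two separate vectors draws two unrelated boxes (the birth weights, and the integer codes of the habit factor) rather than one box per group; the formula weight ~ habit splits the response by the grouping factor. A minimal sketch on a small made-up data frame standing in for nc (the values are hypothetical); plot = FALSE returns the box statistics without drawing, so the grouping can be checked:

```r
# Hypothetical stand-in for nc: made-up birth weights (lb) and smoking habits
toy <- data.frame(
  weight = c(7.1, 6.5, 8.0, 7.4, 6.2, 5.9, 6.8),
  habit  = factor(c("nonsmoker", "nonsmoker", "nonsmoker", "nonsmoker",
                    "smoker", "smoker", "smoker"))
)

# weight ~ habit gives one box per level of habit;
# plot = FALSE returns the box statistics instead of drawing them
b <- boxplot(weight ~ habit, data = toy, plot = FALSE)
b$names        # group labels, one per box
b$n            # number of observations in each box
b$stats[3, ]   # median of each group
```

With the real data, boxplot(weight ~ habit, data = nc) draws the same kind of grouped plot.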

Mean birth weight by smoking habit

by(nc$weight, nc$habit, mean)
## nc$habit: nonsmoker
## [1] 7.144273
## ------------------------------------------------------------ 
## nc$habit: smoker
## [1] 6.82873

Sample size in each group

by(nc$weight, nc$habit, length)
## nc$habit: nonsmoker
## [1] 873
## ------------------------------------------------------------ 
## nc$habit: smoker
## [1] 126

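Both groups are reasonably large (873 nonsmokers and 126 smokers), and the mean birth weight of babies born to smokers (6.83 lb) is lower than that of babies born to nonsmokers (7.14 lb). This suggests the hypothesis that smoking during pregnancy is associated with lower birth weight; a hypothesis test is needed to judge whether the difference is larger than chance variation alone would produce.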

more information on interval

Hypothesis test comparing average birth weight for nonsmokers and smokers

inference(y = nc$weight, x = nc$habit, est = "mean", type = "ht", null = 0, 
          alternative = "twosided", method = "theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_nonsmoker = 873, mean_nonsmoker = 7.1443, sd_nonsmoker = 1.5187
## n_smoker = 126, mean_smoker = 6.8287, sd_smoker = 1.3862
## Observed difference between means (nonsmoker-smoker) = 0.3155
## 
## H0: mu_nonsmoker - mu_smoker = 0 
## HA: mu_nonsmoker - mu_smoker != 0 
## Standard error = 0.134 
## Test statistic: Z =  2.359 
## p-value =  0.0184
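
The theoretical test above can be reproduced by hand from the printed summary statistics: the standard error of a difference in means is sqrt(s1^2/n1 + s2^2/n2), the test statistic is the observed difference divided by that SE, and the two-sided p-value is twice the tail area of the standard normal beyond |Z|. A minimal sketch using only the rounded numbers printed above, so the last digit may differ slightly:

```r
# Summary statistics copied from the inference() output above
n_ns <- 873; mean_ns <- 7.1443; sd_ns <- 1.5187   # nonsmokers
n_s  <- 126; mean_s  <- 6.8287; sd_s  <- 1.3862   # smokers

diff_obs <- mean_ns - mean_s                   # observed difference (nonsmoker - smoker)
se <- sqrt(sd_ns^2 / n_ns + sd_s^2 / n_s)      # SE of the difference in means
z  <- (diff_obs - 0) / se                      # test statistic under H0: difference = 0
p_value <- 2 * pnorm(-abs(z))                  # two-sided p-value from the normal curve

round(c(se = se, z = z, p = p_value), 4)
```

Since p is about 0.018, below 0.05, we reject H0 at the 5% level.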

95% confidence interval for the difference in average birth weight (smoker minus nonsmoker)

inference(y = nc$weight, x = nc$habit, est = "mean", type = "ci", null = 0, 
          alternative = "twosided", method = "theoretical", 
          order = c("smoker","nonsmoker"))
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_smoker = 126, mean_smoker = 6.8287, sd_smoker = 1.3862
## n_nonsmoker = 873, mean_nonsmoker = 7.1443, sd_nonsmoker = 1.5187

## Observed difference between means (smoker-nonsmoker) = -0.3155
## 
## Standard error = 0.1338 
## 95 % Confidence interval = ( -0.5777 , -0.0534 )
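
A 95% interval is the point estimate plus or minus z* times the standard error, with z* = qnorm(0.975), about 1.96. A minimal sketch checking the interval above from its printed estimate and SE:

```r
# Point estimate and standard error from the output above (smoker - nonsmoker)
est <- -0.3155
se  <- 0.1338

z_star <- qnorm(0.975)              # critical value for a 95% two-sided interval
ci <- est + c(-1, 1) * z_star * se  # lower and upper bounds
round(ci, 4)

# TRUE: 0 lies outside the interval, consistent with p = 0.0184 < 0.05
ci[1] > 0 || ci[2] < 0
```

We are 95% confident that babies of smokers weigh, on average, between about 0.05 and 0.58 lb less than babies of nonsmokers.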

On your own

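1. Calculate a 95% confidence interval for the average length of pregnancies (weeks) and interpret it in context.

Note that the code below splits weeks by habit, so the interval it reports is for the difference in mean pregnancy length (smoker minus nonsmoker), not for the overall average the question asks about. That interval, -0.35 to 0.60 weeks, contains 0: we are 95% confident that smokers' pregnancies are, on average, somewhere between about 0.35 weeks shorter and 0.60 weeks longer than nonsmokers', so these data give no evidence that mean pregnancy length differs by smoking habit.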

by(nc$weeks, nc$habit, length)
## nc$habit: nonsmoker
## [1] 873
## ------------------------------------------------------------ 
## nc$habit: smoker
## [1] 126
inference(y = nc$weeks, x = nc$habit, est = "mean", type = "ci", null = 0, 
          alternative = "twosided", method = "theoretical", 
          order = c("smoker","nonsmoker"))
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_smoker = 126, mean_smoker = 38.4444, sd_smoker = 2.4676
## n_nonsmoker = 872, mean_nonsmoker = 38.3188, sd_nonsmoker = 2.9936

## Observed difference between means (smoker-nonsmoker) = 0.1256
## 
## Standard error = 0.2421 
## 95 % Confidence interval = ( -0.3488 , 0.6001 )
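
The question itself concerns a single parameter, the overall mean pregnancy length, so no explanatory variable is needed. A minimal sketch of a normal-approximation interval for one mean, mean plus or minus z* times s/sqrt(n), after dropping missing values. The helper mean_ci is hypothetical (it is not part of the lab's inference function); with the lab data it would be called as mean_ci(nc$weeks). The normal approximation is reasonable for about 1,000 pregnancies but not for the tiny made-up vector used here, which only exercises the code:

```r
# Hypothetical helper: normal-approximation confidence interval for one mean
mean_ci <- function(x, level = 0.95) {
  x <- x[!is.na(x)]                       # drop missing values (weeks has 2 NAs in nc)
  n <- length(x)
  se <- sd(x) / sqrt(n)                   # standard error of the sample mean
  z_star <- qnorm(1 - (1 - level) / 2)    # e.g. about 1.96 for 95%
  c(lower = mean(x) - z_star * se, upper = mean(x) + z_star * se)
}

# Small made-up example (not the nc data)
weeks_toy <- c(39, 40, 38, NA, 41, 37, 39)
mean_ci(weeks_toy)
```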

Calculate a new confidence interval for the same parameter at the 90% confidence level.

inference(y = nc$weeks, x = nc$habit, est = "mean", type = "ci", null = 0, conflevel = 0.90,
          alternative = "twosided", method = "theoretical", 
          order = c("smoker","nonsmoker"))
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_smoker = 126, mean_smoker = 38.4444, sd_smoker = 2.4676
## n_nonsmoker = 872, mean_nonsmoker = 38.3188, sd_nonsmoker = 2.9936

## Observed difference between means (smoker-nonsmoker) = 0.1256
## 
## Standard error = 0.2421 
## 90 % Confidence interval = ( -0.2725 , 0.5238 )
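
Lowering the confidence level shrinks the critical value from about 1.96 to about 1.645, so the 90% interval is narrower than the 95% one while the point estimate and SE stay the same. A minimal sketch using the estimate and SE printed above (ci_at is an illustrative helper, not a lab function):

```r
est <- 0.1256   # smoker - nonsmoker difference in mean weeks
se  <- 0.2421

# Two-sided normal-approximation interval at a given confidence level
ci_at <- function(level) {
  z_star <- qnorm(1 - (1 - level) / 2)
  est + c(-1, 1) * z_star * se
}

ci95 <- ci_at(0.95)
ci90 <- ci_at(0.90)
round(rbind(ci95, ci90), 4)
diff(ci90) < diff(ci95)   # TRUE: the 90% interval is narrower
```

Both intervals still contain 0, so the conclusion does not change.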

Conduct a hypothesis test evaluating whether the average weight gained by younger mothers is different from the average weight gained by mature mothers.

Mean weight gained by maternal age group

by(nc$gained, nc$mature, mean)
## nc$mature: mature mom
## [1] NA
## ------------------------------------------------------------ 
## nc$mature: younger mom
## [1] NA

Sample size in each group

by(nc$gained, nc$mature, length)
## nc$mature: mature mom
## [1] 133
## ------------------------------------------------------------ 
## nc$mature: younger mom
## [1] 867

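Is there a difference in the average amount of weight gained by younger mothers versus mature mothers? The means above come out as NA because gained has 27 missing values, and mean() returns NA whenever its input contains one. Adding na.rm = TRUE, which by() passes on to mean(), drops the missing values before averaging: by(nc$gained, nc$mature, mean, na.rm = TRUE). Note also that length() counts missing values, so the group sizes used by a test will be slightly smaller than 133 and 867.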
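
A minimal sketch on a small made-up data frame standing in for nc, showing the missing-value behavior and then a two-sample test. The lab would use the same inference() call as the smoking test with y = nc$gained and x = nc$mature; base R's t.test() (a Welch two-sample t-test) is swapped in here only so the sketch runs on its own:

```r
# Hypothetical stand-in for nc: weight gained (lb) with one missing value
toy <- data.frame(
  gained = c(30, NA, 25, 28, 40, 35, 20, 33),
  mature = factor(c("mature mom", "mature mom", "mature mom", "mature mom",
                    "younger mom", "younger mom", "younger mom", "younger mom"))
)

mean(toy$gained)                                  # NA: one value is missing
by(toy$gained, toy$mature, mean, na.rm = TRUE)    # na.rm is passed through to mean()
group_means <- tapply(toy$gained, toy$mature, mean, na.rm = TRUE)

# Welch two-sample t-test; the formula interface omits rows with NA by default
res <- t.test(gained ~ mature, data = toy)
res$p.value
```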

Now, a non-inference task: Determine the age cutoff for younger and mature mothers. Use a method of your choice, and explain how your method works.

by(nc$mage, nc$mature, length)
## nc$mature: mature mom
## [1] 133
## ------------------------------------------------------------ 
## nc$mature: younger mom
## [1] 867

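I tried to find the age that splits the sample in the ratio 133 : 867 and then look that value up, but I couldn't get the code to work. Since 867/1000 = 86.7% of mothers are younger moms, the cutoff should sit near the 86.7th percentile of mother's age, and the variable to use is mage (mother's age), not fage (father's age). A more direct method is to compare the age range within each group with by(nc$mage, nc$mature, range): the cutoff lies between the oldest younger mom and the youngest mature mom.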
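
A minimal sketch of both approaches on a small made-up data frame standing in for nc (the ages and labels are hypothetical). The range method brackets the cutoff exactly; the percentile method only approximates it, especially when many mothers share the same integer age:

```r
# Hypothetical stand-in for nc: mother's age and the mature/younger label
toy <- data.frame(
  mage   = c(19, 23, 28, 33, 36, 41),
  mature = factor(c("younger mom", "younger mom", "younger mom", "younger mom",
                    "mature mom", "mature mom"))
)

# Range method: age range within each group
by(toy$mage, toy$mature, range)
oldest_younger  <- max(toy$mage[toy$mature == "younger mom"])
youngest_mature <- min(toy$mage[toy$mature == "mature mom"])
c(oldest_younger, youngest_mature)   # the cutoff lies between these two ages

# Percentile method: use the share of younger moms as the quantile level
p_younger <- mean(toy$mature == "younger mom")
q <- quantile(toy$mage, probs = p_younger)
q
```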

Pick a pair of numerical and categorical variables and come up with a research question evaluating the relationship between these variables.

The relationship between birth weight and gender: does a baby's gender affect its weight at birth?

inference(y = nc$weight, x = nc$gender, est = "mean", type = "ci", null = 0, 
          alternative = "twosided", method = "theoretical", 
          order = c("female","male"))
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_female = 503, mean_female = 6.9029, sd_female = 1.4759
## n_male = 497, mean_male = 7.3015, sd_male = 1.5168

## Observed difference between means (female-male) = -0.3986
## 
## Standard error = 0.0947 
## 95 % Confidence interval = ( -0.5841 , -0.2131 )
by(nc$weight, nc$gender, mean)
## nc$gender: female
## [1] 6.902883
## ------------------------------------------------------------ 
## nc$gender: male
## [1] 7.301509

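On average, boys are about 0.40 lb heavier at birth than girls (7.30 lb vs. 6.90 lb). The 95% confidence interval for the difference (female minus male), -0.58 to -0.21 lb, does not contain 0, so the data suggest a real difference in mean birth weight by gender. However, boys are not always larger: the standard deviation in each group (about 1.5 lb) is much larger than the 0.4 lb difference, so the two distributions overlap substantially.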
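
The overlap between the two groups can be put in numbers. Assuming, as a rough approximation, normal birth weights with the group means and standard deviations printed above, and a girl and a boy drawn independently, the difference G - B is normal with mean 6.9029 - 7.3015 and SD sqrt(1.4759^2 + 1.5168^2), and the chance that the girl is heavier is P(G - B > 0). Birth weights in this sample are skewed (the minimum is 1 lb), so treat this as a sketch rather than an exact figure:

```r
# Group summaries from the output above (lb); normality is an assumption
mean_f <- 6.9029; sd_f <- 1.4759   # female babies
mean_m <- 7.3015; sd_m <- 1.5168   # male babies

# Difference of two independent normals: G - B ~ N(mean_f - mean_m, sd_f^2 + sd_m^2)
d_mean <- mean_f - mean_m
d_sd   <- sqrt(sd_f^2 + sd_m^2)

# P(G - B > 0): probability that a random girl outweighs a random boy
p_girl_heavier <- pnorm(0, mean = d_mean, sd = d_sd, lower.tail = FALSE)
round(p_girl_heavier, 3)
```

Under these assumptions the probability is a little above 0.4, far from 0, which is what "boys are not always larger" means quantitatively.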