download.file("http://www.openintro.org/stat/data/nc.RData", destfile = "nc.RData")
load("nc.RData")
What are the cases in this data set? How many cases are there in our sample?
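The cases are the individual births, one per row of nc; the variables are the columns. There are 1,000 cases in the sample (for example, 503 female + 497 male births), and each case is described by 13 variables.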
summary(nc)
##      fage           mage          mature            weeks          premie
##  Min.   :14.00   Min.   :13   mature mom :133   Min.   :20.00   full term:846
##  1st Qu.:25.00   1st Qu.:22   younger mom:867   1st Qu.:37.00   premie   :152
##  Median :30.00   Median :27                     Median :39.00   NA's     :  2
##  Mean   :30.26   Mean   :27                     Mean   :38.33
##  3rd Qu.:35.00   3rd Qu.:32                     3rd Qu.:40.00
##  Max.   :55.00   Max.   :50                     Max.   :45.00
##  NA's   :171                                    NA's   :2
##     visits          marital          gained           weight
##  Min.   : 0.0   married    :386   Min.   : 0.00   Min.   : 1.000
##  1st Qu.:10.0   not married:613   1st Qu.:20.00   1st Qu.: 6.380
##  Median :12.0   NA's       :  1   Median :30.00   Median : 7.310
##  Mean   :12.1                     Mean   :30.33   Mean   : 7.101
##  3rd Qu.:15.0                     3rd Qu.:38.00   3rd Qu.: 8.060
##  Max.   :30.0                     Max.   :85.00   Max.   :11.750
##  NA's   :9                        NA's   :27
##  lowbirthweight     gender         habit         whitemom
##  low    :111      female:503   nonsmoker:873   not white:284
##  not low:889      male  :497   smoker   :126   white    :714
##                                NA's     :  1   NA's     :  2
##
##
##
##
Boxplot of weight by habit
boxplot(nc$weight ~ nc$habit)

mean of weight by habit
by(nc$weight, nc$habit, mean)
## nc$habit: nonsmoker
## [1] 7.144273
## ------------------------------------------------------------
## nc$habit: smoker
## [1] 6.82873
sample size
by(nc$weight, nc$habit, length)
## nc$habit: nonsmoker
## [1] 873
## ------------------------------------------------------------
## nc$habit: smoker
## [1] 126
Both groups are large (873 nonsmokers and 126 smokers), and babies born to smokers have a lower mean weight (6.83 vs. 7.14), so you could hypothesize that a mother's smoking affects her child's birth weight.
More information on the difference in means
Average weight comparison (hypothesis test)
inference(y = nc$weight, x = nc$habit, est = "mean", type = "ht", null = 0,
alternative = "twosided", method = "theoretical")
## Warning: package 'BHH2' was built under R version 3.6.3
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_nonsmoker = 873, mean_nonsmoker = 7.1443, sd_nonsmoker = 1.5187
## n_smoker = 126, mean_smoker = 6.8287, sd_smoker = 1.3862
## Observed difference between means (nonsmoker-smoker) = 0.3155
##
## H0: mu_nonsmoker - mu_smoker = 0
## HA: mu_nonsmoker - mu_smoker != 0
## Standard error = 0.134
## Test statistic: Z = 2.359
## p-value = 0.0184
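The standard error, test statistic, and p-value in the output above can be checked by hand from the summary statistics. The sketch below assumes the normal approximation suggested by the `Z` label in the output; the variable names are made up for illustration and are not part of the lab's `inference()` function.

```r
# Summary statistics copied from the inference() output above
n_non <- 873; mean_non <- 7.1443; sd_non <- 1.5187
n_smk <- 126; mean_smk <- 6.8287; sd_smk <- 1.3862

# Standard error of the difference between two independent means
se <- sqrt(sd_non^2 / n_non + sd_smk^2 / n_smk)

# Test statistic under H0: mu_nonsmoker - mu_smoker = 0
z <- (mean_non - mean_smk - 0) / se

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(-abs(z))

round(c(se = se, z = z, p_value = p_value), 4)
```

Because the inputs are already rounded to four decimals, the results should match the reported values only up to small rounding differences. With a p-value of about 0.018, the difference would be judged significant at the 5% level.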

weight confidence interval
inference(y = nc$weight, x = nc$habit, est = "mean", type = "ci", null = 0,
alternative = "twosided", method = "theoretical",
order = c("smoker","nonsmoker"))
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_smoker = 126, mean_smoker = 6.8287, sd_smoker = 1.3862
## n_nonsmoker = 873, mean_nonsmoker = 7.1443, sd_nonsmoker = 1.5187

## Observed difference between means (smoker-nonsmoker) = -0.3155
##
## Standard error = 0.1338
## 95 % Confidence interval = ( -0.5777 , -0.0534 )
On your own
1. Calculate a 95% confidence interval for the average length of pregnancies (weeks) and interpret it in context
by(nc$weeks, nc$habit, length)
## nc$habit: nonsmoker
## [1] 873
## ------------------------------------------------------------
## nc$habit: smoker
## [1] 126
inference(y = nc$weeks, x = nc$habit, est = "mean", type = "ci", null = 0,
alternative = "twosided", method = "theoretical",
order = c("smoker","nonsmoker"))
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_smoker = 126, mean_smoker = 38.4444, sd_smoker = 2.4676
## n_nonsmoker = 872, mean_nonsmoker = 38.3188, sd_nonsmoker = 2.9936

## Observed difference between means (smoker-nonsmoker) = 0.1256
##
## Standard error = 0.2421
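## 95 % Confidence interval = ( -0.3488 , 0.6001 )
We are 95% confident that the difference in mean pregnancy length (smoker minus nonsmoker) lies between -0.35 and 0.60 weeks. Because the interval contains 0, the data do not show a clear difference in pregnancy length between the two groups. Note that this interval compares the two habit groups; it is not an interval for the overall average length of pregnancy, which is what the question asks for.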
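For the overall average length of pregnancy, a single-mean interval is needed instead of a two-group comparison. The sketch below uses base R only; `mean_ci` is a hypothetical helper and `toy_weeks` is made-up data for illustration, not values from `nc`. It uses the t distribution, which for a sample this large should be very close to the normal-based interval that `inference()` reports.

```r
# Hypothetical helper: t-based confidence interval for a single mean
mean_ci <- function(x, conf_level = 0.95) {
  x <- x[!is.na(x)]                       # drop missing values
  n <- length(x)
  se <- sd(x) / sqrt(n)                   # standard error of the mean
  t_star <- qt(1 - (1 - conf_level) / 2, df = n - 1)
  mean(x) + c(lower = -1, upper = 1) * t_star * se
}

# Toy data for illustration only (not taken from nc)
toy_weeks <- c(39, 40, 38, NA, 41, 37, 39, 40)
mean_ci(toy_weeks)

# With the lab data loaded, the call would be mean_ci(nc$weeks);
# base R's t.test(nc$weeks)$conf.int computes the same kind of interval.
```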
Calculate a new confidence interval for the same parameter at the 90% confidence level
inference(y = nc$weeks, x = nc$habit, est = "mean", type = "ci", null = 0, conflevel = 0.90,
alternative = "twosided", method = "theoretical",
order = c("smoker","nonsmoker"))
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_smoker = 126, mean_smoker = 38.4444, sd_smoker = 2.4676
## n_nonsmoker = 872, mean_nonsmoker = 38.3188, sd_nonsmoker = 2.9936

## Observed difference between means (smoker-nonsmoker) = 0.1256
##
## Standard error = 0.2421
## 90 % Confidence interval = ( -0.2725 , 0.5238 )
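The 90% interval is narrower than the 95% interval because only the critical value changes; the point estimate and standard error stay the same. A minimal sketch of that arithmetic, assuming the normal critical values that the theoretical method appears to use (variable names are illustrative):

```r
# Point estimate and standard error copied from the output above
diff_means <- 0.1256
se <- 0.2421

# Critical values of the standard normal for each confidence level
z_95 <- qnorm(1 - 0.05 / 2)   # about 1.96
z_90 <- qnorm(1 - 0.10 / 2)   # about 1.645

# Interval = point estimate +/- critical value * standard error
ci_95 <- diff_means + c(-1, 1) * z_95 * se
ci_90 <- diff_means + c(-1, 1) * z_90 * se

round(rbind(ci_95, ci_90), 4)
```

Both intervals contain 0, so lowering the confidence level does not change the conclusion here.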
Conduct a hypothesis test evaluating whether the average weight gained by younger mothers is different from the average weight gained by mature mothers.
mean
by(nc$gained, nc$mature, mean)
## nc$mature: mature mom
## [1] NA
## ------------------------------------------------------------
## nc$mature: younger mom
## [1] NA
sample size
by(nc$gained, nc$mature, length)
## nc$mature: mature mom
## [1] 133
## ------------------------------------------------------------
## nc$mature: younger mom
## [1] 867
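Is there a difference in the average amount of weight gained by younger mothers versus mature mothers? Using gained and mature, the group means come back as NA. The summary above shows 27 missing values in gained, and mean() returns NA whenever its input contains an NA unless na.rm = TRUE is passed. The group sizes from length (133 and 867) also count those missing values, so the test itself was not completed here.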
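The NA result can be reproduced and fixed on a small example. The data frame `toy` below is made up for illustration (it is not taken from `nc`); the point is that extra arguments given to `by()` are passed on to the function, so `na.rm = TRUE` reaches `mean()`.

```r
# Toy data for illustration only (not taken from nc)
toy <- data.frame(
  gained = c(30, NA, 25, 40, 20, NA, 35),
  mature = factor(c("mature mom", "mature mom", "mature mom",
                    "younger mom", "younger mom", "younger mom", "younger mom"))
)

# Without na.rm, any group that contains an NA has mean NA
means_with_na <- by(toy$gained, toy$mature, mean)
means_with_na

# Extra arguments to by() are passed on to mean(), so na.rm = TRUE works
group_means <- by(toy$gained, toy$mature, mean, na.rm = TRUE)
group_means

# Count only the non-missing observations in each group
group_n <- by(toy$gained, toy$mature, function(x) sum(!is.na(x)))
group_n
```

With the lab data loaded, the corresponding call would be `by(nc$gained, nc$mature, mean, na.rm = TRUE)`. The test itself could presumably reuse the earlier `inference()` call with `y = nc$gained`, `x = nc$mature`, and `type = "ht"`, since the weeks output above suggests the function drops missing values on its own.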
Now, a non-inference task: Determine the age cutoff for younger and mature mothers. Use a method of your choice, and explain how your method works.
by(nc$fage, nc$mature, length)
## nc$mature: mature mom
## [1] 133
## ------------------------------------------------------------
## nc$mature: younger mom
## [1] 867
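I was trying to find the point in fage that corresponds to the proportion 133/(133 + 867) and then read off the age at that point, but I couldn't get the code to work. Note that fage is the father's age; the mother's age is mage, so the cutoff has to be read from mage. Counting the group sizes with length does not reveal the cutoff; comparing the age range within each group would.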
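One simple way to find the cutoff is to compare the age range within each group: the cutoff lies between the oldest "younger mom" and the youngest "mature mom". The sketch below runs on a made-up data frame `toy` with deliberately artificial ages, so the numbers it prints say nothing about the real cutoff in `nc`.

```r
# Toy data for illustration only (ages are artificial, not taken from nc)
toy <- data.frame(
  mage = c(20, 22, 25, 28, 40, 42, 45),
  mature = factor(c("younger mom", "younger mom", "younger mom", "younger mom",
                    "mature mom", "mature mom", "mature mom"))
)

# Smallest and largest mother's age within each group
age_ranges <- by(toy$mage, toy$mature, range)
age_ranges

# The cutoff lies between the oldest younger mom and the youngest mature mom
oldest_younger  <- max(toy$mage[toy$mature == "younger mom"])
youngest_mature <- min(toy$mage[toy$mature == "mature mom"])
c(oldest_younger = oldest_younger, youngest_mature = youngest_mature)
```

With the lab data loaded, `by(nc$mage, nc$mature, range)` would give the two ranges directly.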
Pick a pair of numerical and categorical variables and come up with a research question evaluating the relationship between these variables.
The relationship between weight and gender: does the baby's gender affect its weight at birth?
inference(y = nc$weight, x = nc$gender, est = "mean", type = "ci", null = 0,
alternative = "twosided", method = "theoretical",
order = c("female","male"))
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_female = 503, mean_female = 6.9029, sd_female = 1.4759
## n_male = 497, mean_male = 7.3015, sd_male = 1.5168

## Observed difference between means (female-male) = -0.3986
##
## Standard error = 0.0947
## 95 % Confidence interval = ( -0.5841 , -0.2131 )
by(nc$weight, nc$gender, mean)
## nc$gender: female
## [1] 6.902883
## ------------------------------------------------------------
## nc$gender: male
## [1] 7.301509
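On average, boys weigh slightly more than girls at birth (7.30 vs. 6.90), and the 95% confidence interval for the difference (female minus male), (-0.5841, -0.2131), does not include 0. However, boys are not uniformly larger: the standard deviation in each group (about 1.5) is much larger than the difference in means, so the two distributions overlap considerably.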