download.file("http://www.openintro.org/stat/data/nc.RData", destfile = "nc.RData")
load("nc.RData")
names(nc)
## [1] "fage" "mage" "mature" "weeks"
## [5] "premie" "visits" "marital" "gained"
## [9] "weight" "lowbirthweight" "gender" "habit"
## [13] "whitemom"
# We have observations on 13 different variables, some categorical and some numerical.
# The meaning of each variable is as follows.
# variable         description
# fage             father's age in years.
# mage             mother's age in years.
# mature           maturity status of mother.
# weeks            length of pregnancy in weeks.
# premie           whether the birth was classified as premature (premie) or full-term.
# visits           number of hospital visits during pregnancy.
# marital          whether mother is married or not married at birth.
# gained           weight gained by mother during pregnancy in pounds.
# weight           weight of the baby at birth in pounds.
# lowbirthweight   whether baby was classified as low birthweight (low) or not (not low).
# gender           gender of the baby, female or male.
# habit            status of the mother as a nonsmoker or a smoker.
# whitemom         whether mom is white or not white.
dim(nc)
summary(nc)
plot(nc$weight ~ nc$habit)
boxplot(nc$weight ~ nc$habit)
by(nc$weight, nc$habit, mean)
by(nc$weight, nc$habit, length)
table(nc$habit)
# Exercise 4
# Write the hypotheses for testing if the average weights of babies born to smoking and non-smoking mothers are different.
inference(y = nc$weight, x = nc$habit, est = "mean", type = "ht", null = 0, alternative = "twosided", method = "theoretical")
# Let's pause for a moment to go through the arguments of this custom function.
# The first argument is y, which is the response variable that we are interested in: nc$weight.
# The second argument is the explanatory variable, x, which is the variable that splits the data into two groups,
# smokers and non-smokers: nc$habit.
# The third argument, est, is the parameter we're interested in: "mean" (other options are "median" or "proportion").
# Next we decide on the type of inference we want: a hypothesis test ("ht") or a confidence interval ("ci").
# When performing a hypothesis test, we also need to supply the null value, which in this case is 0,
# since the null hypothesis sets the two population means equal to each other.
# The alternative hypothesis can be "less", "greater", or "twosided".
# Lastly, the method of inference can be "theoretical" or "simulation" based.
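# For comparison, a minimal sketch of a similar two-sided test in base R with t.test(). This is not the lab's inference() function: t.test() runs a Welch t-test rather than the Z-based theoretical method, and the toy data frame below is hypothetical, made up only so the block runs on its own.

```r
# Hypothetical toy data (NOT the nc data set): birth weights in pounds by smoking habit.
toy <- data.frame(
  weight = c(7.6, 6.9, 8.1, 7.3, 7.8, 6.5, 7.0, 6.2, 6.8, 7.1),
  habit  = rep(c("nonsmoker", "smoker"), each = 5)
)

# Welch two-sample t-test of H0: mu_nonsmoker - mu_smoker = 0,
# mirroring null = 0 and alternative = "twosided" in the inference() call.
res <- t.test(weight ~ habit, data = toy, mu = 0, alternative = "two.sided")

res$estimate  # the two group means
res$p.value   # two-sided p-value
```

# With the lab data, the analogous call would be t.test(weight ~ habit, data = nc).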
# Construct a confidence interval for the difference between the weights of babies born to smoking and non-smoking mothers.
inference(y = nc$weight, x = nc$habit, est = "mean", type = "ci", null = 0, alternative = "twosided", method = "theoretical", order = c("smoker", "nonsmoker"))
#On your own
#1.Calculate a 95% confidence interval for the average length of pregnancies (weeks) and interpret it in context. Note that since you’re doing inference on a single population parameter, there is no explanatory variable, so you can omit the x variable from the function.
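# Answer (from the output below): mean = 7.101, sd = 1.5089, n = 1000, standard error = 0.0477, 95% confidence interval = (7.0075, 7.1945).
# Note: the call below passes nc$weight, so this interval is for birth weight in pounds; for the length of pregnancy the response would be y = nc$weeks.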
inference(y = nc$weight, est = "mean", type = "ci", null = 0, alternative = "twosided", method = "theoretical", conflevel = 0.95)
## Single mean
## Summary statistics:
## mean = 7.101 ; sd = 1.5089 ; n = 1000
## Standard error = 0.0477
## 95 % Confidence interval = ( 7.0075 , 7.1945 )
#2. Calculate a new confidence interval for the same parameter at the 90% confidence level. You can change the confidence level by adding a new argument to the function: conflevel = 0.90.
# Answer (from the output below): mean = 7.101, sd = 1.5089, n = 1000, standard error = 0.0477, 90% confidence interval = (7.0225, 7.1795).
inference(y = nc$weight, est = "mean", type = "ci", null = 0, alternative = "twosided", method = "theoretical", conflevel = 0.90)
## Single mean
## Summary statistics:
## mean = 7.101 ; sd = 1.5089 ; n = 1000
## Standard error = 0.0477
## 90 % Confidence interval = ( 7.0225 , 7.1795 )
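# The two intervals above can be reproduced from the printed summary statistics alone. A minimal sketch, assuming the theoretical method uses a normal critical value (an assumption, though it does reproduce the printed intervals); ci_mean is a hypothetical helper, not part of the lab code.

```r
# Hypothetical helper: confidence interval for a single mean from summary statistics.
ci_mean <- function(mean, sd, n, conflevel = 0.95) {
  se <- sd / sqrt(n)                     # standard error of the mean
  z  <- qnorm(1 - (1 - conflevel) / 2)   # normal critical value for the chosen level
  c(lower = mean - z * se, upper = mean + z * se)
}

# Summary statistics copied from the output above.
ci95 <- ci_mean(7.101, 1.5089, 1000, conflevel = 0.95)
ci90 <- ci_mean(7.101, 1.5089, 1000, conflevel = 0.90)

round(ci95, 4)  # 7.0075 7.1945
round(ci90, 4)  # 7.0225 7.1795
```

# The 90% interval is narrower because its critical value (about 1.645) is smaller than the 95% one (about 1.96).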
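#3. Conduct a hypothesis test evaluating whether the average weight gained by younger mothers is different than the average weight gained by mature mothers.
# Note: the call below passes nc$weight (the baby's birth weight), so its output compares birth weights between the two groups; weight gained by the mother is stored in nc$gained.
inference(y = nc$weight, x = nc$mature, est = "mean", type = "ht", null = 0, alternative = "twosided", method = "theoretical")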
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_mature mom = 133, mean_mature mom = 7.1256, sd_mature mom = 1.6591
## n_younger mom = 867, mean_younger mom = 7.0972, sd_younger mom = 1.4855
## Observed difference between means (mature mom-younger mom) = 0.0283
##
## H0: mu_mature mom - mu_younger mom = 0
## HA: mu_mature mom - mu_younger mom != 0
## Standard error = 0.152
## Test statistic: Z = 0.186
## p-value = 0.8526
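# The standard error, test statistic, and p-value above can be checked by hand from the summary statistics. A minimal sketch, assuming a normal reference distribution for the test statistic (the output labels it Z); two_mean_z is a hypothetical helper, and because its inputs are the rounded printed values, the p-value agrees with the output only to about three decimals.

```r
# Hypothetical helper: two-sided Z test for a difference between two means,
# computed from summary statistics with an unpooled standard error.
two_mean_z <- function(diff, sd1, n1, sd2, n2, null = 0) {
  se <- sqrt(sd1^2 / n1 + sd2^2 / n2)   # standard error of the difference
  z  <- (diff - null) / se              # test statistic
  p  <- 2 * pnorm(-abs(z))              # two-sided p-value
  list(se = se, z = z, p_value = p)
}

# Values copied from the output above (mature mom minus younger mom).
out <- two_mean_z(diff = 0.0283, sd1 = 1.6591, n1 = 133, sd2 = 1.4855, n2 = 867)

round(out$se, 3)  # 0.152
round(out$z, 3)   # 0.186
out$p_value       # close to the reported 0.8526
```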
#4.Now, a non-inference task: Determine the age cutoff for younger and mature mothers. Use a method of your choice, and explain how your method works.
# For younger mothers, the age cutoff is 17 years; for mature mothers, the age cutoff is 18 years.
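# One way to find the cutoff is to look at the range of mother's age within each maturity group: the largest age among younger moms and the smallest age among mature moms bracket the boundary. A minimal sketch; age_range_by_group is a hypothetical helper, and the toy vectors are made-up values, not the nc data.

```r
# Hypothetical helper: minimum and maximum age within each group.
age_range_by_group <- function(age, group) {
  ranges <- sapply(split(age, group), range, na.rm = TRUE)  # 2 x (number of groups) matrix
  rownames(ranges) <- c("min", "max")
  ranges
}

# Made-up toy values, only to show the shape of the result.
toy_age    <- c(19, 24, 31, 36, 41)
toy_mature <- c("younger mom", "younger mom", "younger mom", "mature mom", "mature mom")

toy_ranges <- age_range_by_group(toy_age, toy_mature)
toy_ranges
```

# With the lab data, the call would be age_range_by_group(nc$mage, nc$mature).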
#5.Pick a pair of numerical and categorical variables and come up with a research question evaluating the relationship between these variables. Formulate the question in a way that it can be answered using a hypothesis test and/or a confidence interval. Answer your question using the inference function, report the statistical results, and also provide an explanation in plain language.
inference (y= nc$weeks, x= nc$habit, est = "mean", type = "ht", null = 0, alternative = "twosided", method = "theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_nonsmoker = 872, mean_nonsmoker = 38.3188, sd_nonsmoker = 2.9936
## n_smoker = 126, mean_smoker = 38.4444, sd_smoker = 2.4676
## Observed difference between means (nonsmoker-smoker) = -0.1256
##
## H0: mu_nonsmoker - mu_smoker = 0
## HA: mu_nonsmoker - mu_smoker != 0
## Standard error = 0.242
## Test statistic: Z = -0.519
## p-value = 0.6038
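# The same comparison can also be reported as a confidence interval for the difference in mean pregnancy length. A minimal sketch built from the printed summary statistics, assuming a normal critical value to match the Z statistic above; the variable names are illustrative.

```r
# Values copied from the output above (nonsmoker minus smoker).
diff_means <- -0.1256
se <- sqrt(2.9936^2 / 872 + 2.4676^2 / 126)   # unpooled standard error
z_star <- qnorm(0.975)                        # 95% normal critical value

ci <- c(lower = diff_means - z_star * se, upper = diff_means + z_star * se)

round(se, 3)  # 0.242
ci            # lower bound is negative, upper bound is positive
```

# An interval that contains 0 is consistent with the large p-value: these data do not provide evidence of a difference in average pregnancy length between the two groups.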