10/8/2015
## [1] 0.0006203474
Due Date: October 8, 2015
Total Points: 30
1 The babyboom dataset (UsingR) contains the time of birth, sex, and birth weight for 44 babies born in one 24-hour period at a hospital in Brisbane, Australia.
require (UsingR)
## Loading required package: UsingR
## Loading required package: MASS
## Loading required package: HistData
## Loading required package: Hmisc
## Loading required package: grid
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
##
## Attaching package: 'Hmisc'
##
## The following objects are masked from 'package:base':
##
## format.pval, round.POSIXt, trunc.POSIXt, units
##
##
## Attaching package: 'UsingR'
##
## The following object is masked from 'package:ggplot2':
##
## movies
##
## The following object is masked from 'package:survival':
##
## cancer
bb = babyboom
ggplot(bb, aes(x = factor(gender), y = wt)) + geom_boxplot(fill = "#FF3320",
color = "black") + xlab("Gender") + ylab("Weight in grams (g)") + theme_bw()
t.test(wt ~ gender, data = bb)
##
## Welch Two Sample t-test
##
## data: wt by gender
## t = -1.4211, df = 27.631, p-value = 0.1665
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -593.1538 107.4273
## sample estimates:
## mean in group girl mean in group boy
## 3132.444 3375.308
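The p-value (0.1665) is above the conventional 0.05 level, and the 95 percent confidence interval includes 0, so we fail to reject the null hypothesis that mean birth weight is the same for girls and boys.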
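To see where the t and df values in the Welch output come from, here is a minimal sketch that recomputes them from the group means and variances. The helper name `welch_by_hand` is hypothetical (it is not part of UsingR or base R), and the `babyboom` call is guarded so the block still runs when UsingR is not installed.

```r
# Hypothetical helper: Welch's t statistic and degrees of freedom
# for two numeric samples x and y (unequal variances allowed).
welch_by_hand <- function(x, y) {
  vx <- var(x) / length(x)   # squared standard error of mean(x)
  vy <- var(y) / length(y)   # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  c(t = t_stat, df = df)
}

# Applied to the homework data when UsingR is available
if (requireNamespace("UsingR", quietly = TRUE)) {
  data(babyboom, package = "UsingR")
  girls <- babyboom$wt[babyboom$gender == "girl"]
  boys  <- babyboom$wt[babyboom$gender == "boy"]
  # Should agree with the t and df lines of t.test(wt ~ gender, data = babyboom)
  print(welch_by_hand(girls, boys))
}
```

Because the degrees of freedom are estimated from the two sample variances, they are usually not a whole number, which is why the output reports a fractional df.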
2 The BushApproval dataset (UsingR) contains approval ratings (%) for George W. Bush from different polling outlets. Perform a t-test of the null hypothesis that there is no difference in approval rating between Fox and UPenn against the alternative that there is a difference. Hint: Subset the data first. The ‘or’ logical predicate is indicated by the vertical bar | on your keyboard. (5)
ba = BushApproval
head(ba)
## date approval who
## 1 2/4/04 53 fox
## 2 1/21/04 53 fox
## 3 1/7/04 58 fox
## 4 12/3/03 52 fox
## 5 11/18/03 52 fox
## 6 10/28/03 53 fox
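The hint in the question can be sketched as follows: subset the rows with the `|` predicate, then run the two-sample t-test on what remains. The helper name `compare_outlets` is hypothetical, and the level name `"upenn"` is an assumption that should be confirmed with `levels(ba$who)` before relying on it.

```r
# Hypothetical helper: keep two polling outlets and compare their mean
# approval with Welch's two-sample t-test.
compare_outlets <- function(df, a, b) {
  sub <- df[df$who == a | df$who == b, ]   # the "or" predicate from the hint
  sub$who <- factor(sub$who)               # drop the unused outlet levels
  t.test(approval ~ who, data = sub)
}

if (requireNamespace("UsingR", quietly = TRUE)) {
  data(BushApproval, package = "UsingR")
  # "upenn" is assumed here; confirm with levels(BushApproval$who)
  print(compare_outlets(BushApproval, "fox", "upenn"))
}
```

Re-creating the factor after subsetting leaves exactly two groups, which is what the formula interface of `t.test` expects.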
3 The mtcars dataset contains the miles per gallon and whether or not the transmission is automatic (0 = automatic, 1 = manual) for 32 automobiles.
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
library(ggplot2)
ggplot(mtcars, aes(x = mpg)) + geom_histogram(binwidth = 3)
wilcox.test(mpg ~ am, data = mtcars)
## Warning in wilcox.test.default(x = c(21.4, 18.7, 18.1, 14.3, 24.4, 22.8, :
## cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: mpg by am
## W = 42, p-value = 0.001871
## alternative hypothesis: true location shift is not equal to 0
The p-value (0.001871) is below 0.1, so we reject the null hypothesis that there is no location shift in mpg between automatic and manual transmissions.
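The W statistic reported by `wilcox.test` can be reproduced from ranks alone. The sketch below uses only base R and the built-in `mtcars` data. It ranks all mpg values together, sums the ranks of the first group (am = 0), and subtracts the smallest value that sum could take.

```r
# Rank all 32 mpg values together (ties receive average ranks)
r  <- rank(mtcars$mpg)
n0 <- sum(mtcars$am == 0)   # size of the first group (am = 0)

# W = rank sum of the first group minus its minimum possible value
W <- sum(r[mtcars$am == 0]) - n0 * (n0 + 1) / 2
W
```

A small W means the am = 0 cars hold mostly low ranks, that is, lower mpg. The tied mpg values are also why R warns that it cannot compute an exact p-value and falls back to the normal approximation with continuity correction.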
4 The diamond dataset (UsingR) contains data on the price of 48 diamond rings. The variable price records the price in Singapore dollars, and the variable carat records the size of the diamond. You are interested in predicting price from carat size.
di = diamond
head(di)
## carat price
## 1 0.17 355
## 2 0.16 328
## 3 0.17 350
## 4 0.18 325
## 5 0.25 642
## 6 0.16 342
diplot = ggplot(di, aes(x = carat, y = price)) + geom_point(size = 3, col = "red") +
xlab("Carats") + ylab("Price") + theme_bw()
diplot
diplot + geom_smooth(method = lm, se = FALSE, color = "green")
dimodel = lm(price ~ carat, data = di)
predict(dimodel, data.frame(carat = 1/3))
## 1
## 980.7157
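The prediction above is simply the fitted line evaluated at carat = 1/3. As a rough check, the sketch below pulls the intercept and slope out of the model and does the arithmetic directly. The helper name `predict_by_hand` is hypothetical, and the `diamond` call is guarded so the block still runs without UsingR.

```r
# Hypothetical helper: evaluate a simple linear regression line at x
# using the fitted intercept and slope directly.
predict_by_hand <- function(fit, x) {
  b <- unname(coef(fit))   # b[1] = intercept, b[2] = slope
  b[1] + b[2] * x
}

if (requireNamespace("UsingR", quietly = TRUE)) {
  data(diamond, package = "UsingR")
  dimodel <- lm(price ~ carat, data = diamond)
  print(coef(dimodel))
  # Should agree with predict(dimodel, data.frame(carat = 1/3))
  print(predict_by_hand(dimodel, 1/3))
}
```

`predict()` is still the better choice in practice, because it also handles models with several predictors and can return confidence or prediction intervals through its `interval` argument.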