Problem Set # 3

Tyler Fricker

date()
## [1] "Tue Oct 15 12:43:35 2013"

Due Date: October 17, 2013
Total Points: 30

1 The babyboom dataset (UsingR) contains the time of birth, sex, and birth weight for 44 babies born in one 24-hour period at a hospital in Brisbane, Australia.

require(UsingR)
## Loading required package: UsingR
## Loading required package: MASS
require(ggplot2)
## Loading required package: ggplot2
## 
## Attaching package: 'ggplot2'
## 
## The following object is masked from 'package:UsingR':
## 
## movies

a) Create side-by-side box plots of birth weight (grams) by gender. (2)

str(babyboom)
## 'data.frame':    44 obs. of  4 variables:
##  $ clock.time  : num  5 104 118 155 257 405 407 422 431 708 ...
##  $ gender      : Factor w/ 2 levels "girl","boy": 1 1 2 2 2 1 1 2 2 2 ...
##  $ wt          : num  3837 3334 3554 3838 3625 ...
##  $ running.time: num  5 64 78 115 177 245 247 262 271 428 ...
ggplot(babyboom, aes(x = gender, y = wt)) + geom_boxplot() + ylab("birth weight (grams)") + 
    xlab("gender")

[Figure: side-by-side box plots of birth weight (grams) by gender]
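The boxes in part (a) summarize each group's sample quartiles. As a base-R sketch that needs no ggplot2, the hypothetical helper `box_summary()` computes the quartiles and the conventional 1.5 * IQR fences per group. It is demonstrated on the built-in `mtcars` data (mpg by transmission) so it runs without UsingR. With UsingR loaded, `box_summary(babyboom$wt, babyboom$gender)` should reproduce the numbers behind the plot above, assuming ggplot2's quartiles match R's default `quantile()` method.

```r
# Hypothetical helper: per-group quartiles and 1.5 * IQR fences,
# i.e. the quantities a box plot summarizes.
box_summary <- function(y, g) {
  groups <- split(y, g)
  per_group <- sapply(groups, function(v) {
    q <- quantile(v, c(0.25, 0.5, 0.75), names = FALSE)
    iqr <- q[3] - q[1]
    c(lower_fence = q[1] - 1.5 * iqr, q1 = q[1], median = q[2],
      q3 = q[3], upper_fence = q[3] + 1.5 * iqr)
  })
  t(per_group)  # one row per group, one column per statistic
}

# Stand-in data: mpg by transmission (0 = automatic, 1 = manual).
s <- box_summary(mtcars$mpg, mtcars$am)
print(s)
```

The whiskers in a standard box plot stop at the most extreme observation that still lies inside these fences; points beyond them are drawn individually as outliers.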

b) Perform a t-test of the null hypothesis that there is no difference in birth weight against the alternative hypothesis that girls weigh less. What do you conclude? (5)

t.test(wt ~ gender, data = babyboom, alternative = "less")
## 
##  Welch Two Sample t-test
## 
## data:  wt by gender
## t = -1.421, df = 27.63, p-value = 0.08324
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##  -Inf   48
## sample estimates:
## mean in group girl  mean in group boy 
##               3132               3375

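With a one-sided p-value of 0.083, we fail to reject the null hypothesis at the conventional 0.05 level. There is suggestive, but inconclusive, evidence that girls weigh less than boys (sample means of 3132 g versus 3375 g).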
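R's `t.test()` defaults to the Welch test (`var.equal = FALSE`), which is why the output reports a fractional df of 27.63. A minimal sketch computing the statistic, the Welch-Satterthwaite degrees of freedom, and the one-sided p-value by hand; `welch_t()` is a hypothetical helper, checked against `t.test()` on the built-in `mtcars` data because `babyboom` requires UsingR.

```r
# Hypothetical helper: Welch two-sample t-test for H1: mean(x) < mean(y).
welch_t <- function(x, y) {
  vx <- var(x) / length(x)  # squared standard error of mean(x)
  vy <- var(y) / length(y)  # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  list(t = t_stat, df = df, p_less = pt(t_stat, df))  # lower-tail p-value
}

auto <- mtcars$mpg[mtcars$am == 0]
man <- mtcars$mpg[mtcars$am == 1]
mine <- welch_t(auto, man)
ref <- t.test(auto, man, alternative = "less")
print(unlist(mine))
```

In the formula call `t.test(wt ~ gender, ...)`, the first factor level (`girl`) plays the role of `x`, so `alternative = "less"` tests whether the girls' mean is lower.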

2 The BushApproval dataset (UsingR) contains approval ratings (%) for George W. Bush from different polling outlets. Perform a t-test under the hypothesis that there is no difference in approval rating between Fox and UPenn versus the alternative that there is a difference. Hint: Subset the data first. The 'or' logical operator is indicated by the vertical bar | on your keyboard. (5)
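The `|` operator keeps a row when either comparison is TRUE. A small sketch on the built-in `iris` data (used only because it ships with R) shows that the `|` form and the `%in%` form select the same rows, and that `subset()` keeps the now-empty factor level until `droplevels()` removes it.

```r
# Two equivalent ways to keep only two groups of a factor.
two_or <- subset(iris, Species == "setosa" | Species == "virginica")
two_in <- subset(iris, Species %in% c("setosa", "virginica"))

# subset() keeps all original factor levels, even empty ones.
print(table(two_or$Species))

# droplevels() removes the unused level ("versicolor").
two_or <- droplevels(two_or)
print(levels(two_or$Species))
```

Applied to this problem, the same pattern is `subset(BA, who %in% c("fox", "upenn"))`.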

BA = BushApproval
str(BA)
## 'data.frame':    323 obs. of  3 variables:
##  $ date    : chr  "2/4/04" "1/21/04" "1/7/04" "12/3/03" ...
##  $ approval: num  53 53 58 52 52 53 52 50 58 57 ...
##  $ who     : Factor w/ 6 levels "fox","gallup",..: 1 1 1 1 1 1 1 1 1 1 ...
head(BA)
##       date approval who
## 1   2/4/04       53 fox
## 2  1/21/04       53 fox
## 3   1/7/04       58 fox
## 4  12/3/03       52 fox
## 5 11/18/03       52 fox
## 6 10/28/03       53 fox
t.test(approval ~ who, data = subset(BA, who == "fox" | who == "upenn"))
## 
##  Welch Two Sample t-test
## 
## data:  approval by who
## t = 4.269, df = 46.21, p-value = 9.65e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   3.791 10.553
## sample estimates:
##   mean in group fox mean in group upenn 
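##               65.67               58.50

With a p-value of 9.65e-05, there is convincing evidence that mean approval ratings differ between Fox (65.67%) and UPenn (58.50%). The 95% confidence interval for the difference (Fox minus UPenn) runs from 3.79 to 10.55 percentage points.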

3 The mtcars dataset contains the miles per gallon and whether or not the transmission is automatic (0 = automatic, 1 = manual) for 32 automobiles.

a) Plot a histogram of the miles per gallon over all cars. Use a bin width of 3 mpg. (3)

str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
ggplot(mtcars, aes(mpg)) + geom_histogram(binwidth = 3) + xlab("miles per gallon")

[Figure: histogram of miles per gallon, bin width 3 mpg]
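To inspect the bin counts behind the histogram in (a) without drawing anything, base R's `hist()` with `plot = FALSE` returns the counts for a given set of breaks. The breaks below (width 3 mpg, starting at 9) are an assumption for illustration; ggplot2 chooses its own bin alignment, so its bars may be shifted relative to these.

```r
# Bin edges 3 mpg apart; the starting point 9 is an arbitrary choice
# that covers the observed range (10.4 to 33.9 mpg).
breaks <- seq(9, 36, by = 3)
h <- hist(mtcars$mpg, breaks = breaks, plot = FALSE)

# One row per bin: its edges and how many cars fall in it.
# hist() uses right-closed bins (a, b]; the first bin also includes its left edge.
bins <- data.frame(
  from = head(h$breaks, -1),
  to = tail(h$breaks, -1),
  count = h$counts
)
print(bins)
```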

b) Perform a Mann-Whitney-Wilcoxon test under the hypothesis that there is no difference in mpg between automatic and manual transmission cars, without assuming that mpg follows a normal distribution. The alternative is that there is a difference. What do you conclude? (5)

auto = mtcars$mpg[mtcars$am == 0]
man = mtcars$mpg[mtcars$am == 1]
wilcox.test(auto, man)
## Warning: cannot compute exact p-value with ties
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  auto and man
## W = 42, p-value = 0.001871
## alternative hypothesis: true location shift is not equal to 0

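With a p-value of 0.0019, there is convincing evidence that mpg differs between automatic and manual transmission cars. Because some mpg values are tied, R reports a normal-approximation p-value with continuity correction rather than an exact one.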
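The W statistic reported by `wilcox.test()` is the rank sum of the first sample minus the smallest value that sum could take. A sketch, with `rank_sum_w()` as a hypothetical helper, reproduces W = 42 from the same `mtcars` split; tied mpg values receive their average rank.

```r
# Hypothetical helper: rank-sum statistic W as wilcox.test() reports it.
rank_sum_w <- function(x, y) {
  r <- rank(c(x, y))  # pooled ranks; ties get their average rank
  nx <- length(x)
  sum(r[seq_len(nx)]) - nx * (nx + 1) / 2  # subtract the minimum possible rank sum
}

auto <- mtcars$mpg[mtcars$am == 0]
man <- mtcars$mpg[mtcars$am == 1]
w <- rank_sum_w(auto, man)
print(w)
```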

4 The diamond dataset (UsingR) contains data on the prices of 48 diamond rings. The variable price records the price in Singapore dollars, and the variable carat records the size of the diamond. You are interested in predicting price from carat size.

a) Make a scatter plot of carat versus price. (3)

P = ggplot(diamond, aes(x = carat, y = price)) + geom_point()
P

[Figure: scatter plot of price against carat]

b) Add a linear regression line to the plot. (3)

P + geom_smooth(method = "lm", se = FALSE)

[Figure: scatter plot of price against carat with fitted least-squares line]
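The line added by `geom_smooth()` with a linear-model method is the ordinary least-squares fit. Its coefficients have closed forms: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). A sketch checks these against `lm()`, using `mtcars` (mpg against weight) as a stand-in because `diamond` requires UsingR; `ols_line()` is a hypothetical helper.

```r
# Hypothetical helper: closed-form simple linear regression coefficients.
ols_line <- function(x, y) {
  slope <- cov(x, y) / var(x)
  intercept <- mean(y) - slope * mean(x)
  c(intercept = intercept, slope = slope)
}

fit <- ols_line(mtcars$wt, mtcars$mpg)
print(fit)
```

With UsingR loaded, `ols_line(diamond$carat, diamond$price)` should match `coef(lm(price ~ carat, data = diamond))`.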

c) Use the model to predict the amount a 1/3 carat diamond ring would cost. (4)

model = lm(price ~ carat, data = diamond)
predict(model, data.frame(carat = 1/3))
##     1 
## 980.7

The model predicts that a 1/3-carat diamond ring would cost about 980.7 Singapore dollars.
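`predict()` evaluates the fitted line at the new predictor value, that is, intercept + slope * (1/3) here. A sketch of that arithmetic, again on `mtcars` as a stand-in (predicting mpg for a hypothetical car weighing 3000 lb, since `wt` is in units of 1000 lb). The `interval = "prediction"` argument of `predict()` for `lm` fits additionally gives a range for a single new observation.

```r
model <- lm(mpg ~ wt, data = mtcars)
new_car <- data.frame(wt = 3)  # hypothetical car weighing 3000 lb

# Point prediction two ways: predict() and the line's formula by hand.
by_predict <- unname(predict(model, new_car))
b <- coef(model)
by_hand <- unname(b[1] + b[2] * 3)

# 95% prediction interval for one new car of that weight.
pred_int <- predict(model, new_car, interval = "prediction")
print(c(by_predict = by_predict, by_hand = by_hand))
print(pred_int)
```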