Problem Set # 3

Your Name

date()
## [1] "Tue Oct  8 18:20:17 2013"

Due Date: October 17, 2013

Total Points: 30

1 The babyboom dataset (UsingR) contains the time of birth, sex, and birth weight for 44 babies born in one 24-hour period at a hospital in Brisbane, Australia.

a) Create side-by-side box plots of birth weight (grams) by gender. (2)

library(UsingR)
## Loading required package: MASS
library(ggplot2)
## Attaching package: 'ggplot2'
## 
## The following object is masked from 'package:UsingR':
## 
## movies
ggplot(babyboom, aes(x = factor(gender), y = wt)) + geom_boxplot() + xlab("Sex") + 
    ylab("Birth weight (grams)")

[Figure: side-by-side box plots of birth weight (grams) by sex (chunk sbysBox)]

b) Perform a t-test under the hypothesis that there is no difference in birth weight against the alternative hypothesis that girls weigh less. What do you conclude? (5)

t.test(wt ~ gender, data = babyboom)
## 
##  Welch Two Sample t-test
## 
## data:  wt by gender
## t = -1.421, df = 27.63, p-value = 0.1665
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -593.2  107.4
## sample estimates:
## mean in group girl  mean in group boy 
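##               3132               3375
"The two-sided p-value of approximately 0.167 indicates that there is not convincing evidence to reject the null hypothesis of no difference between the mean birth weight of boys and girls. For the one-sided alternative that girls weigh less, the p-value is half of this (about 0.083, since the t statistic is negative), which is still above 0.05."
## [1] "The two-sided p-value of approximately 0.167 indicates that there is not convincing evidence to reject the null hypothesis of no difference between the mean birth weight of boys and girls. For the one-sided alternative that girls weigh less, the p-value is half of this (about 0.083, since the t statistic is negative), which is still above 0.05."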
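
The call above uses the default two-sided alternative of t.test. A minimal sketch of the one-sided version that the question asks for is given below. It assumes that girl is the first level of gender (as the order of the group means in the output suggests), so that alternative = "less" tests whether the mean for girls is below the mean for boys. The object names two_sided and one_sided are illustrative only.

```r
library(UsingR)  # provides the babyboom data frame

# Default two-sided test, as in the answer above
two_sided <- t.test(wt ~ gender, data = babyboom)

# One-sided test: the alternative is mean(girl) - mean(boy) < 0.
# Assumption: "girl" is the first factor level of gender.
one_sided <- t.test(wt ~ gender, data = babyboom, alternative = "less")

# Because the observed t statistic is negative, the one-sided p-value
# is half of the two-sided one.
c(two_sided = two_sided$p.value, one_sided = one_sided$p.value)
```

Either way the p-value stays above 0.05, so the conclusion does not change.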

2 The BushApproval dataset (UsingR) contains approval ratings (%) for George W. Bush from different polling outlets. Perform a t-test under the hypothesis that there is no difference in approval rating between Fox and UPenn versus the alternative that there is a difference. Hint: Subset the data first. The 'or' logical predicate is indicated by the vertical line | on your keyboard. (5)

s1 = BushApproval$approval[BushApproval$who == "fox"]
s2 = BushApproval$approval[BushApproval$who == "upenn"]
t.test(s1, s2)
## 
##  Welch Two Sample t-test
## 
## data:  s1 and s2
## t = 4.269, df = 46.21, p-value = 9.65e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   3.791 10.553
## sample estimates:
## mean of x mean of y 
##     65.67     58.50
"The p-value from the t-test is less than 0.001, so there is convincing evidence to reject the null hypothesis of no difference in Pres. Bush's mean approval rating between the Fox and UPenn polling operations. The mean approval rating in the Fox polls is significantly higher, seemingly because the Fox polls were taken from 2001-2004 whereas the UPenn polls cover only 2003-2004."
## [1] "The p-value from the t-test is less than 0.001, so there is convincing evidence to reject the null hypothesis of no difference in Pres. Bush's mean approval rating between the Fox and UPenn polling operations. The mean approval rating in the Fox polls is significantly higher, seemingly because the Fox polls were taken from 2001-2004 whereas the UPenn polls cover only 2003-2004."
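
The subsetting approach from the hint can be sketched as follows: keep only the Fox and UPenn rows with the | operator, then use the formula interface of t.test. This is an alternative to the two-vector call above, not the only way to do it, and the names fox_upenn and result are made up for the example.

```r
library(UsingR)  # provides the BushApproval data frame

# Keep rows where the polling outlet is either "fox" or "upenn"
fox_upenn <- subset(BushApproval, who == "fox" | who == "upenn")

# Drop the unused outlet levels so the grouping factor has two levels
fox_upenn <- droplevels(fox_upenn)

# Same Welch test as t.test(s1, s2), written with a formula
result <- t.test(approval ~ who, data = fox_upenn)
result$p.value
```

The formula version keeps the outlet labels attached to the group means in the printed output, which makes the direction of the difference easier to read.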

3 The mtcars dataset contains the miles per gallon and whether or not the transmission is automatic (0 = automatic, 1 = manual) for 32 automobiles.

a) Plot a histogram of the miles per gallon over all cars. Use a bin width of 3 mpg. (3)

hist(mtcars$mpg, breaks = seq(0, 36, 3), main = "Distribution of Vehicle Gas Mileages", 
    xlab = "Miles per gallon", col = "lightblue")

[Figure: histogram of miles per gallon with 3 mpg bins (chunk carhist)]
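
The breaks argument is what fixes the bin width at 3 mpg. As a quick check, one can compute the histogram without drawing it and inspect the bins. The variable h is used only for illustration.

```r
# Compute the histogram bins without plotting
h <- hist(mtcars$mpg, breaks = seq(0, 36, 3), plot = FALSE)

# Width of each bin; every entry should be 3
diff(h$breaks)

# Bin counts; they add up to the number of cars in mtcars
h$counts
sum(h$counts)
```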

b) Perform a Mann-Whitney-Wilcoxon test under the hypothesis that there is no difference in mpg between automatic and manual transmission cars without assuming they follow a normal distribution. The alternative is there is a difference. What do you conclude? (5)

wilcox.test(mpg ~ am, data = mtcars)
## Warning: cannot compute exact p-value with ties
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  mpg by am
## W = 42, p-value = 0.001871
## alternative hypothesis: true location shift is not equal to 0
"The resulting p-value from the Mann-Whitney-Wilcoxon test of approximately 0.002 means there is convincing evidence to reject the null hypothesis of no difference in mpg between manual and automatic transmission cars. The mpg of manual transmission vehicles is significantly higher."
## [1] "The resulting p-value from the Mann-Whitney-Wilcoxon test of approximately 0.002 means there is convincing evidence to reject the null hypothesis of no difference in mpg between manual and automatic transmission cars. The mpg of manual transmission vehicles is significantly higher."
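
The test itself reports only that the two groups differ in location. The direction of the difference comes from comparing the groups directly. A minimal sketch of that check follows, using the group medians as one reasonable summary (an assumption, since the original answer does not say how the direction was determined). Passing exact = FALSE requests the normal approximation explicitly, which avoids the warning about ties while using the same calculation.

```r
# Median mpg for each transmission type (0 = automatic, 1 = manual)
medians <- tapply(mtcars$mpg, mtcars$am, median)
medians

# Same test as above; exact = FALSE asks for the normal approximation
# directly, so no warning about ties is raised
res <- wilcox.test(mpg ~ am, data = mtcars, exact = FALSE)
res$p.value
```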

4 The diamond dataset (UsingR) contains data about the price of 48 diamond rings. The variable price records the price in Singapore dollars, and the variable carat records the size of the diamond. You are interested in predicting price from carat size.

a) Make a scatter plot of carat versus price. (3)

p = ggplot(diamond, aes(carat, price)) + geom_point(size = 4) + ggtitle("Carat versus Price of Diamonds")
p

[Figure: scatter plot of price against carat (chunk diamondplot)]

b) Add a linear regression line to the plot. (3)

p = p + geom_smooth(method = lm, se = FALSE, col = "red")
p

[Figure: scatter plot of price against carat with fitted regression line (chunk diamondplot2)]

c) Use the model to predict the amount a 1/3 carat diamond ring would cost. (4)

model = lm(price ~ carat, data = diamond)
predict(model, data.frame(carat = 1/3))
##     1 
## 980.7
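
The prediction is the fitted line evaluated at carat = 1/3. As a sketch of that arithmetic, the same value can be rebuilt from the model coefficients. The names b, by_hand and via_predict are hypothetical and used only here.

```r
library(UsingR)  # provides the diamond data frame

model <- lm(price ~ carat, data = diamond)

# Intercept and slope of the fitted line
b <- coef(model)

# price = intercept + slope * carat, evaluated at 1/3 carat
by_hand <- unname(b["(Intercept)"] + b["carat"] * (1/3))

# predict() performs the same calculation
via_predict <- unname(predict(model, data.frame(carat = 1/3)))

c(by_hand = by_hand, via_predict = via_predict)
```

Both values should agree with the predicted price shown above, in Singapore dollars.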