Problem Set # 3

Camille Hatch

date()

## [1] "Thu Oct 18 16:56:11 2012"

Due Date: October 18, 2012
Total Points: 38

1 The use of a cell phone while driving is hypothesized to increase the chance of an accident. The data set reaction.time (UsingR) is simulated data on the time it takes to react to an external event while driving. Subjects with control=C are not using a cell phone, and those with control=T are. The time to respond to some external event is recorded in seconds.
a) Perform a t-test on the difference in mean reaction time for groups T and C. What do you conclude? (2)
b) Repeat the test separately for women and men. What do you conclude? (4)
c) Repeat the test separately for the two age groups. What do you conclude? (4)

require(UsingR)

## Loading required package: UsingR

## Loading required package: MASS

attach(reaction.time)
reaction.time

##      age gender control  time
## 1  16-24      F       T 1.360
## 2  16-24      M       T 1.468
## 3  16-24      M       T 1.512
## 4  16-24      F       T 1.391
## 5  16-24      M       T 1.384
## 6  16-24      M       C 1.394
## 7  16-24      M       T 1.419
## 8  16-24      F       T 1.461
## 9  16-24      F       T 1.382
## 10 16-24      M       C 1.260
## 11 16-24      M       T 1.293
## 12 16-24      F       T 1.331
## 13 16-24      M       T 1.369
## 14 16-24      F       C 1.330
## 15 16-24      F       T 1.345
## 16 16-24      M       C 1.358
## 17 16-24      F       T 1.520
## 18 16-24      M       T 1.352
## 19 16-24      F       T 1.348
## 20 16-24      F       T 1.373
## 21   25+      M       C 1.307
## 22   25+      F       C 1.384
## 23   25+      M       T 1.346
## 24   25+      M       C 1.595
## 25   25+      M       T 1.342
## 26   25+      F       T 1.472
## 27   25+      F       T 1.441
## 28   25+      M       T 1.503
## 29   25+      F       T 1.502
## 30   25+      F       T 1.563
## 31   25+      F       C 1.476
## 32   25+      F       T 1.475
## 33   25+      M       C 1.312
## 34   25+      F       T 1.447
## 35   25+      M       T 1.463
## 36   25+      M       C 1.293
## 37   25+      F       C 1.403
## 38   25+      M       C 1.245
## 39   25+      F       T 1.476
## 40   25+      M       T 1.520
## 41   25+      F       T 1.437
## 42   25+      F       C 1.584
## 43   25+      F       T 1.499
## 44   25+      M       C 1.437
## 45   25+      F       T 1.524
## 46   25+      F       T 1.511
## 47   25+      F       C 1.523
## 48   25+      M       C 1.462
## 49   25+      F       C 1.314
## 50   25+      F       T 1.440
## 51   25+      M       T 1.445
## 52   25+      F       C 1.317
## 53   25+      M       T 1.517
## 54   25+      M       T 1.536
## 55   25+      M       T 1.445
## 56   25+      M       C 1.443
## 57   25+      M       C 1.355
## 58   25+      F       T 1.615
## 59   25+      M       T 1.528
## 60   25+      M       T 1.467

t.test(time ~ control, data = reaction.time)

## 
##  Welch Two Sample t-test
## 
## data:  time by control 
## t = -2.205, df = 29.83, p-value = 0.03529
## alternative hypothesis: true difference in means is not equal to 0 
## 95 percent confidence interval:
##  -0.107793 -0.004122 
## sample estimates:
## mean in group C mean in group T 
##           1.390           1.446

detach(reaction.time)

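The p-value (0.035) is below 0.05, so we reject the null hypothesis that the two groups have the same mean reaction time. Subjects using a cell phone (T) reacted more slowly on average than the control group (1.446 s versus 1.390 s), a difference of about 0.06 s with a 95% confidence interval of roughly 0.004 to 0.108 s.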
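
The numbers behind this conclusion can be read directly from the object that t.test() returns instead of from the printed summary. A minimal sketch, assuming the UsingR package is installed; the names res and reject are only illustrative:

```r
library(UsingR)  # provides reaction.time; assumed to be installed

# Welch two-sample t-test, stored so its components can be inspected
res <- t.test(time ~ control, data = reaction.time)

res$p.value    # p-value of the two-sided test
res$conf.int   # 95% confidence interval for mean(C) - mean(T)
res$estimate   # the two group means

# Decision at the 5% significance level
reject <- res$p.value < 0.05
reject
```

Passing data = reaction.time to t.test() also makes the attach()/detach() pair unnecessary for this test.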

require(UsingR)
attach(reaction.time)
head(reaction.time)

##     age gender control  time
## 1 16-24      F       T 1.360
## 2 16-24      M       T 1.468
## 3 16-24      M       T 1.512
## 4 16-24      F       T 1.391
## 5 16-24      M       T 1.384
## 6 16-24      M       C 1.394

t.test(time[gender == "F"] ~ control[gender == "F"])

## 
##  Welch Two Sample t-test
## 
## data:  time[gender == "F"] by control[gender == "F"] 
## t = -0.875, df = 9.966, p-value = 0.4021
## alternative hypothesis: true difference in means is not equal to 0 
## 95 percent confidence interval:
##  -0.12179  0.05313 
## sample estimates:
## mean in group C mean in group T 
##           1.416           1.451

t.test(time[gender == "M"] ~ control[gender == "M"])

## 
##  Welch Two Sample t-test
## 
## data:  time[gender == "M"] by control[gender == "M"] 
## t = -1.989, df = 19.13, p-value = 0.06121
## alternative hypothesis: true difference in means is not equal to 0 
## 95 percent confidence interval:
##  -0.138626  0.003504 
## sample estimates:
## mean in group C mean in group T 
##           1.372           1.439

detach(reaction.time)

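For women the p-value is about 0.40, so we fail to reject the null hypothesis: these data show no significant difference in mean reaction time between the control and cell phone groups. For men the p-value is about 0.06, which is stronger evidence of a difference than for women but still not significant at the 0.05 level. In both groups the mean for T is higher than for C (1.451 versus 1.416 s for women, 1.439 versus 1.372 s for men), so the direction agrees with part (a). Splitting the sample leaves fewer observations per test, which reduces power, and further tests would be needed to determine whether the effect really differs by gender.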

attach(reaction.time)
t.test(time[age == "16-24"] ~ control[age == "16-24"])

## 
##  Welch Two Sample t-test
## 
## data:  time[age == "16-24"] by control[age == "16-24"] 
## t = -1.8, df = 5.191, p-value = 0.1296
## alternative hypothesis: true difference in means is not equal to 0 
## 95 percent confidence interval:
##  -0.14171  0.02422 
## sample estimates:
## mean in group C mean in group T 
##           1.336           1.394

t.test(time[age == "25+"] ~ control[age == "25+"])

## 
##  Welch Two Sample t-test
## 
## data:  time[age == "25+"] by control[age == "25+"] 
## t = -2.627, df = 21.58, p-value = 0.01553
## alternative hypothesis: true difference in means is not equal to 0 
## 95 percent confidence interval:
##  -0.13715 -0.01607 
## sample estimates:
## mean in group C mean in group T 
##           1.403           1.480

detach(reaction.time)

For ages 16-24 the p-value is about 0.13, so we fail to reject the null hypothesis. This group contains only 4 control subjects, so the test has little power. For the 25+ group the p-value is about 0.016, so we reject the null hypothesis: using a cell phone is associated with a slower mean reaction time (1.480 versus 1.403 s). The estimated difference points the same way in both age groups (about 0.06 s for 16-24 and about 0.08 s for 25+), but it is statistically significant only for the 25+ group.
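
The subgroup tests in parts (b) and (c) can also be run without attach() by splitting the data frame on the grouping column. The sketch below assumes the UsingR package is installed; p_by_group is a hypothetical helper written for this illustration, not a function from any package:

```r
library(UsingR)  # provides reaction.time; assumed to be installed

# Hypothetical helper: p-value of the Welch t-test of time by control
# within each level of the column named in `group`.
p_by_group <- function(df, group) {
  sapply(split(df, df[[group]]),
         function(d) t.test(time ~ control, data = d)$p.value)
}

p_gender <- p_by_group(reaction.time, "gender")
p_age    <- p_by_group(reaction.time, "age")
p_gender
p_age

# Group sizes: some subgroups contain very few control subjects
table(reaction.time$gender, reaction.time$control)
table(reaction.time$age, reaction.time$control)
```

The tables of counts help explain why a subgroup test can fail to reach significance even when the difference in means is similar to the pooled one.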

2 The data set diamond (UsingR) contains data about the price of 48 diamond rings. The variable price records the price in Singapore dollars and the variable carat records the size of the diamond. You are interested in predicting price from carat size.
a) Make a scatter plot of carat versus price. (2)
b) Add a linear regression line to the plot. (2)
c) Use the model to predict the amount a 1/3 carat diamond ring would cost. (4)

require(UsingR)
attach(diamond)
(diamond)

##    carat price
## 1   0.17   355
## 2   0.16   328
## 3   0.17   350
## 4   0.18   325
## 5   0.25   642
## 6   0.16   342
## 7   0.15   322
## 8   0.19   485
## 9   0.21   483
## 10  0.15   323
## 11  0.18   462
## 12  0.28   823
## 13  0.16   336
## 14  0.20   498
## 15  0.23   595
## 16  0.29   860
## 17  0.12   223
## 18  0.26   663
## 19  0.25   750
## 20  0.27   720
## 21  0.18   468
## 22  0.16   345
## 23  0.17   352
## 24  0.16   332
## 25  0.17   353
## 26  0.18   438
## 27  0.17   318
## 28  0.18   419
## 29  0.17   346
## 30  0.15   315
## 31  0.17   350
## 32  0.32   918
## 33  0.32   919
## 34  0.15   298
## 35  0.16   339
## 36  0.16   338
## 37  0.23   595
## 38  0.23   553
## 39  0.17   345
## 40  0.33   945
## 41  0.25   655
## 42  0.35  1086
## 43  0.18   443
## 44  0.25   678
## 45  0.25   675
## 46  0.15   287
## 47  0.26   693
## 48  0.15   316

require(ggplot2)

## Loading required package: ggplot2

## Attaching package: 'ggplot2'

## The following object(s) are masked from 'package:UsingR':
## 
## movies

D = ggplot(diamond, aes(x = carat, y = price)) + geom_point() + ylab("Price") + 
    xlab("Carat")
D

[Figure: scatter plot of Price against Carat]

D + geom_smooth(method = lm, col = "magenta", se = FALSE)

[Figure: scatter plot of Price against Carat with the fitted regression line]

model = lm(price ~ carat, data = diamond)
predict(model, data.frame(carat = 1/3))

##     1 
## 980.7

detach(diamond)
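
The predicted price of about 980.7 Singapore dollars is the fitted intercept plus the fitted slope times 1/3. The sketch below reproduces the prediction by hand and adds a 95% prediction interval as one way to express its uncertainty. It assumes the UsingR package is installed; the names by_hand, new_ring and via_predict are illustrative only:

```r
library(UsingR)  # provides diamond; assumed to be installed

model <- lm(price ~ carat, data = diamond)

# Prediction by hand: intercept + slope * carat
b <- coef(model)
by_hand <- unname(b["(Intercept)"] + b["carat"] * (1/3))

# Same prediction from predict(), plus a 95% prediction interval
new_ring <- data.frame(carat = 1/3)
via_predict <- predict(model, new_ring, interval = "prediction")

by_hand
via_predict   # columns: fit, lwr, upr
```

A 1/3 carat ring lies inside the range of carat values in the data (0.12 to 0.35), so the prediction is an interpolation rather than an extrapolation.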

3 The data set trees contains the girth (inches), height (feet) and volume of timber from 31 felled Black Cherry trees. Suppose you want to predict the volume of timber from a measure of girth.
a) Create a scatter plot of the data and label the axes. (4)
b) Add a linear regression line to the plot. (2)
c) Determine the sum of squared residuals. (4)
d) Repeat a, b, and c but use the square of the girth instead of girth as the explanatory variable. Which model do you prefer and why? (10)

3A-D

attach(trees)
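# Named tree_plot rather than T, since T is R's shorthand for TRUE
tree_plot = ggplot(trees, aes(x = Girth, y = Volume)) + geom_point() + xlab("Girth (inches)") + 
    ylab("Volume")
tree_plot

[Figure: scatter plot of Volume against Girth (inches)]

tree_plot + geom_smooth(method = lm, col = "purple", se = FALSE)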

[Figure: scatter plot of Volume against Girth (inches) with the fitted regression line]

sum(residuals(lm(trees$Volume ~ trees$Girth))^2)

## [1] 524.3

G2 = (Girth)^2
T2 = ggplot(trees, aes(x = G2, y = Volume)) + geom_point() + xlab("Girth (inches squared)") + 
    ylab("Volume")
T2

[Figure: scatter plot of Volume against squared girth (inches squared)]

T2 + geom_smooth(method = lm, col = "blue", se = FALSE)

[Figure: scatter plot of Volume against squared girth (inches squared) with the fitted regression line]

sum(residuals(lm(trees$Volume ~ G2))^2)

## [1] 329.3

detach(trees)
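
Part (d) asks which model is preferred. One reasonable basis is the sum of squared residuals reported above: 524.3 with girth and 329.3 with girth squared, so the squared-girth model fits these data more closely. That is also physically plausible, since the cross-sectional area of a trunk grows with the square of its girth. The sketch below repeats the comparison in one place; the object names are illustrative, and I(Girth^2) is used in the formula in place of the separate G2 vector:

```r
# trees ships with base R (package datasets), so no extra package is needed
fit_girth    <- lm(Volume ~ Girth, data = trees)
fit_girth_sq <- lm(Volume ~ I(Girth^2), data = trees)

# For a linear model, deviance() is the residual sum of squares
ssr <- c(girth = deviance(fit_girth),
         girth_squared = deviance(fit_girth_sq))
round(ssr, 1)

# Both models have two coefficients, so the two values are directly comparable
names(which.min(ssr))
```

Because both models estimate an intercept and a single slope, the lower sum of squared residuals is not simply a result of extra parameters.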