Exercise 1:
pirates <- read.table("http://nathanieldphillips.com/wp-content/uploads/2015/05/pirate_survey_noerrors.txt",
                      sep = "\t", header = TRUE, stringsAsFactors = FALSE)
head(pirates)
##   id    sex headband age college tattoos tchests.found parrots.lifetime
## 1  1 female      yes  35   JSSFP      18             8                9
## 2  2   male      yes  21    CCCC       6             5                1
## 3  3 female      yes  27    CCCC      12             8                1
## 4  4   male      yes  19    CCCC       9             8                1
## 5  5   male      yes  31    CCCC      11             2               13
## 6  6   male      yes  21    CCCC       7             1                0
##   favorite.pirate sword.type  sword.speed
## 1      Blackbeard    cutlass 0.0638977084
## 2      Blackbeard    cutlass 0.5601675763
## 3        Anicetus    cutlass 0.0005400172
## 4    Jack Sparrow    cutlass 3.8770396912
## 5    Jack Sparrow    cutlass 0.5080594239
## 6    Jack Sparrow    cutlass 0.6248019344
Exercise 2:
The function pairs() creates a matrix of scatterplots for the ratio or interval variables in a dataset. Enter the following code to see a matrix of scatterplots for the pirate dataset:
pairs(~ age + tattoos + tchests.found + parrots.lifetime + sword.speed,
data = pirates)
Exercise 3: What variables reliably predict the number of treasure chests a pirate has found? Conduct a simple linear regression analysis with treasure chests found as the dependent variable and 3 independent variables: parrots.lifetime, age, and tattoos. Save the model as the object model.1. Then, use the summary() function to see the coefficients. What are your conclusions?
model.1 <- lm(tchests.found ~ parrots.lifetime + age + tattoos, data = pirates)
summary(model.1)
##
## Call:
## lm(formula = tchests.found ~ parrots.lifetime + age + tattoos,
## data = pirates)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.566 -5.225 -2.271 2.636 46.003
##
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)       1.266327   1.407182   0.900 0.368389
## parrots.lifetime -0.007083   0.088349  -0.080 0.936115
## age               0.123838   0.044491   2.783 0.005480 **
## tattoos           0.274414   0.074297   3.693 0.000233 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.785 on 996 degrees of freedom
## Multiple R-squared: 0.02135, Adjusted R-squared: 0.0184
## F-statistic: 7.241 on 3 and 996 DF, p-value: 8.283e-05
model.1$coefficients
## (Intercept) parrots.lifetime age tattoos
## 1.266326826 -0.007083373 0.123837998 0.274413931
model.1$df.residual
## [1] 996
Conclusion: The overall regression was significant, F(3, 996) = 7.24, p < 0.001, although the model explains only about 2% of the variance in treasure chests found (R-squared = 0.02). There was no significant effect of parrots.lifetime (t(996) = -0.08, p = 0.94). There were significant positive effects of age (b = 0.12, t(996) = 2.78, p < 0.01) and of tattoos (b = 0.27, t(996) = 3.69, p < 0.001).
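The p-values quoted in this conclusion can be re-derived by hand from the test statistics and degrees of freedom. The sketch below does that, using only numbers copied from the summary(model.1) output above (so it runs without downloading the data); the variable names are illustrative.

```r
# Statistics copied from the summary(model.1) output above
f_value <- 7.241   # overall F-statistic
df1     <- 3       # numerator df (number of predictors)
df2     <- 996     # residual df

# Overall test: upper tail of the F distribution
p_overall <- pf(f_value, df1, df2, lower.tail = FALSE)

# Coefficient tests: two-sided p-values from the t distribution
t_values <- c(parrots.lifetime = -0.080, age = 2.783, tattoos = 3.693)
p_coef   <- 2 * pt(abs(t_values), df = df2, lower.tail = FALSE)

print(signif(p_overall, 4))
print(signif(p_coef, 3))
```

Because the t and F values are rounded in the printed output, the recomputed p-values can differ from the table in the last digit or two.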
Exercise 4: Using the results from the previous question, create a scatterplot with the true values of the dependent variable (treasure chests found) on the x-axis and the model fits on the y-axis. Make the plot look nice with appropriate labels.
plot(x = pirates$tchests.found, y = model.1$fitted.values,
     xlab = "True values: treasure chests found",
     ylab = "Model fits (parrots.lifetime + age + tattoos)",
     main = "Pirate Treasure Chests Found\nTrue versus Model",
     pch = 16, col = gray(.05, .15))
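How tightly the points in such a plot cluster is tied directly to R-squared: for a least-squares model with an intercept, the squared correlation between the true values and the model fits equals the multiple R-squared. With R-squared = 0.02135 from Exercise 3, that correlation is only about 0.146, so a diffuse cloud is expected. The sketch below checks the identity on simulated data; the data frame sim and its coefficients are made up for illustration and are not the pirate survey.

```r
# Simulated stand-in data (hypothetical, NOT the pirate survey)
set.seed(1)
n   <- 200
sim <- data.frame(age = round(rnorm(n, mean = 30, sd = 8)),
                  tattoos = rpois(n, lambda = 9))
sim$tchests <- 1 + 0.1 * sim$age + 0.3 * sim$tattoos + rnorm(n, sd = 7)

fit <- lm(tchests ~ age + tattoos, data = sim)

# Correlation between true values and model fits
r_obs_fit <- cor(sim$tchests, fitted(fit))
r_squared <- summary(fit)$r.squared

# The two numbers printed here should agree
print(c(cor_squared = r_obs_fit^2, r_squared = r_squared))

# Implied correlation for model.1, from its reported R-squared
print(sqrt(0.02135))
```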
Exercise 5: Repeat your analysis from question 3, but only include pirates who are female and have owned fewer than 5 parrots in their lives. Do your conclusions change?
female.less.5 <- subset(pirates, subset = sex == "female" & parrots.lifetime < 5)
model.female <- lm(tchests.found ~ parrots.lifetime + age + tattoos, data = female.less.5)
summary(model.female)
##
## Call:
## lm(formula = tchests.found ~ parrots.lifetime + age + tattoos,
## data = female.less.5)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.426 -4.722 -2.113 2.851 44.026
##
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)      -2.72969    2.61996  -1.042  0.29821
## parrots.lifetime -0.18443    0.29642  -0.622  0.53423
## age               0.25240    0.08075   3.126  0.00193 **
## tattoos           0.28510    0.11336   2.515  0.01237 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.258 on 338 degrees of freedom
## Multiple R-squared: 0.04689, Adjusted R-squared: 0.03843
## F-statistic: 5.542 on 3 and 338 DF, p-value: 0.001006
model.female$coefficients
## (Intercept) parrots.lifetime age tattoos
## -2.7296867 -0.1844340 0.2523967 0.2851046
model.female$df.residual
## [1] 338
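Conclusion: No, our conclusions do not change. As in Exercise 3, the overall regression is significant (F(3, 338) = 5.54, p = 0.001), age (t(338) = 3.13, p < 0.01) and tattoos (t(338) = 2.52, p = 0.01) are significant positive predictors, and parrots.lifetime is not (t(338) = -0.62, p = 0.53). The estimated age effect is larger in this subgroup (0.25 versus 0.12 chests per year of age).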
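One way to make this conclusion more concrete is to look at 95% confidence intervals instead of only the significance stars. The sketch below builds them by hand as estimate plus or minus the critical t value times the standard error, using only the numbers printed in the summary(model.female) output above; with the fitted model available, confint(model.female) should give essentially the same intervals.

```r
# Estimates and standard errors copied from summary(model.female) above
est <- c(parrots.lifetime = -0.18443, age = 0.25240, tattoos = 0.28510)
se  <- c(parrots.lifetime =  0.29642, age = 0.08075, tattoos = 0.11336)
df_resid <- 338

# Critical value for a two-sided 95% interval
t_crit <- qt(0.975, df = df_resid)

# Interval = estimate +/- t_crit * standard error
ci <- cbind(lower = est - t_crit * se,
            upper = est + t_crit * se)
print(round(ci, 3))
```

The intervals for age and tattoos lie entirely above zero, while the interval for parrots.lifetime spans zero, which matches the pattern of significant and non-significant coefficients.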
Exercise 6: Is there a relationship between whether or not a pirate wears a headband and his/her sword speed? Test this using linear regression. What is your conclusion?
model.2 <- lm(sword.speed ~ headband, data = pirates)
anova(model.2)
## Analysis of Variance Table
##
## Response: sword.speed
##            Df Sum Sq Mean Sq F value  Pr(>F)
## headband    1   26.0 26.0079  4.1152 0.04276 *
## Residuals 998 6307.3  6.3199
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(model.2)
##
## Call:
## lm(formula = sword.speed ~ headband, data = pirates)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.658 -0.895 -0.576 0.063 43.483
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   1.6576     0.2553   6.494 1.32e-10 ***
## headbandyes  -0.5449     0.2686  -2.029   0.0428 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.514 on 998 degrees of freedom
## Multiple R-squared: 0.004107, Adjusted R-squared: 0.003109
## F-statistic: 4.115 on 1 and 998 DF, p-value: 0.04276
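Conclusion: There is a significant relationship between wearing a headband and sword.speed, F(1, 998) = 4.12, p = 0.04. Pirates who wear a headband have sword.speed values that are on average 0.54 lower than those of pirates who do not (b = -0.54, t(998) = -2.03, p = 0.04). How to read this depends on how sword.speed is coded: if larger values mean a faster sword, headband wearers are slower; if the variable records the time needed to draw the sword, they are faster. Either way the effect is small, as headband explains less than 1% of the variance (R-squared = 0.004).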
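With a single two-level predictor, the regression in Exercise 6 is equivalent to an equal-variance two-sample t-test: the intercept is the mean of the reference group ("no"), the headbandyes coefficient is the difference in group means, and the ANOVA F value is the squared t value. From the output above, the implied group means are 1.6576 for pirates without a headband and 1.6576 - 0.5449 = 1.1127 for pirates with one. The sketch below checks these equivalences on simulated data; sim and its values are made up for illustration and are not the pirate survey.

```r
# Simulated stand-in data (hypothetical, NOT the pirate survey)
set.seed(2)
n   <- 300
sim <- data.frame(
  headband = factor(sample(c("no", "yes"), n, replace = TRUE),
                    levels = c("no", "yes"))
)
sim$speed <- ifelse(sim$headband == "yes", 1.1, 1.6) + rexp(n)

fit <- lm(speed ~ headband, data = sim)

# The slope equals the difference in group means (yes minus no)
group_means <- tapply(sim$speed, sim$headband, mean)
slope       <- unname(coef(fit)["headbandyes"])
mean_diff   <- unname(group_means["yes"] - group_means["no"])

# Same p-value as an equal-variance two-sample t-test
tt   <- t.test(speed ~ headband, data = sim, var.equal = TRUE)
p_lm <- summary(fit)$coefficients["headbandyes", "Pr(>|t|)"]

# The ANOVA F value is the square of the coefficient's t value
t_lm    <- summary(fit)$coefficients["headbandyes", "t value"]
f_anova <- anova(fit)[["F value"]][1]

print(c(slope = slope, mean_diff = mean_diff))
print(c(p_lm = p_lm, p_ttest = tt$p.value))
print(c(t_squared = t_lm^2, f_anova = f_anova))
```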