Exercise 1:

pirates <- read.table("http://nathanieldphillips.com/wp-content/uploads/2015/05/pirate_survey_noerrors.txt", 
                      sep = "\t", header = TRUE, stringsAsFactors = FALSE)

head(pirates)
##   id    sex headband age college tattoos tchests.found parrots.lifetime
## 1  1 female      yes  35   JSSFP      18             8                9
## 2  2   male      yes  21    CCCC       6             5                1
## 3  3 female      yes  27    CCCC      12             8                1
## 4  4   male      yes  19    CCCC       9             8                1
## 5  5   male      yes  31    CCCC      11             2               13
## 6  6   male      yes  21    CCCC       7             1                0
##   favorite.pirate sword.type  sword.speed
## 1      Blackbeard    cutlass 0.0638977084
## 2      Blackbeard    cutlass 0.5601675763
## 3        Anicetus    cutlass 0.0005400172
## 4    Jack Sparrow    cutlass 3.8770396912
## 5    Jack Sparrow    cutlass 0.5080594239
## 6    Jack Sparrow    cutlass 0.6248019344

Exercise 2:

The function pairs() creates a matrix of scatterplots for the ratio- or interval-scaled variables in a dataset. Enter the following code to see a matrix of scatterplots for the pirate dataset:

pairs(~ age + tattoos + tchests.found + parrots.lifetime + sword.speed, 
      data = pirates)

Exercise 3: What variables reliably predict the number of treasure chests a pirate has found? Conduct a multiple linear regression analysis with treasure chests found as the dependent variable and 3 independent variables: parrots.lifetime, age, and tattoos. Save the model as the object model.1. Then, use the summary() function to see the coefficients. What are your conclusions?

model.1 <- lm(tchests.found ~ parrots.lifetime + age + tattoos, data = pirates)
summary(model.1)
## 
## Call:
## lm(formula = tchests.found ~ parrots.lifetime + age + tattoos, 
##     data = pirates)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.566 -5.225 -2.271  2.636 46.003 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       1.266327   1.407182   0.900 0.368389    
## parrots.lifetime -0.007083   0.088349  -0.080 0.936115    
## age               0.123838   0.044491   2.783 0.005480 ** 
## tattoos           0.274414   0.074297   3.693 0.000233 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.785 on 996 degrees of freedom
## Multiple R-squared:  0.02135,    Adjusted R-squared:  0.0184 
## F-statistic: 7.241 on 3 and 996 DF,  p-value: 8.283e-05
model.1$coefficients
##      (Intercept) parrots.lifetime              age          tattoos 
##      1.266326826     -0.007083373      0.123837998      0.274413931
model.1$df.residual
## [1] 996

Conclusion: The overall regression was significant, F(3, 996) = 7.24, p < 0.01, although the model explains only about 2% of the variance in treasure chests found (R-squared = 0.02). There was no significant effect of parrots.lifetime (t(996) = -0.08, p = 0.94). There were significant positive effects of age (b = 0.12, t(996) = 2.78, p < 0.01) and of tattoos (b = 0.27, t(996) = 3.69, p < 0.01).
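The numbers reported in a conclusion like this do not have to be copied by hand from the printed summary. The following is a minimal sketch of how they can be pulled out of a fitted model. It runs on simulated data, so the data frame sim and all of its values are assumptions made for illustration and are not the pirate survey.

```r
# Simulated stand-in for the pirates data (hypothetical values, not the survey)
set.seed(1)
n <- 200
sim <- data.frame(age = round(runif(n, 18, 45)),
                  tattoos = rpois(n, 9),
                  parrots.lifetime = rpois(n, 3))
sim$tchests.found <- 1 + 0.12 * sim$age + 0.27 * sim$tattoos + rnorm(n, sd = 7)

fit <- lm(tchests.found ~ parrots.lifetime + age + tattoos, data = sim)
s <- summary(fit)

# Overall F test: named vector with elements value, numdf and dendf
f <- s$fstatistic
p.overall <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)

# Coefficient table with columns Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(s)
coefs[, c("t value", "Pr(>|t|)")]

# Proportion of variance explained
s$r.squared
```

The same calls should work on model.1, since summary() of any lm object stores its F statistic and coefficient table in this form.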

Exercise 4: Using the results from the previous question, create a scatterplot with the true values of the dependent variable (treasure chests found) on the x-axis and the model fits on the y-axis. Make the plot look nice with appropriate labels.

plot(x = pirates$tchests.found, y = model.1$fitted.values,
     xlab = "True treasure chests found",
     ylab = "Model fits (parrots.lifetime + age + tattoos)",
     main = "Pirate Treasure Chests Found\nTrue versus Model",
     pch = 16, col = gray(.05, .15))
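A plot of true values against model fits is easier to read with a reference line, and it connects directly to R-squared. The sketch below uses simulated data (the data frame sim and its values are assumptions, not the pirate survey) to draw the dashed line y = x, on which perfect predictions would fall, and to check that the squared correlation between true values and fits equals the R-squared reported by summary().

```r
# Simulated stand-in for the pirates data (hypothetical values, not the survey)
set.seed(2)
n <- 200
sim <- data.frame(age = round(runif(n, 18, 45)),
                  tattoos = rpois(n, 9))
sim$tchests.found <- 1 + 0.12 * sim$age + 0.27 * sim$tattoos + rnorm(n, sd = 7)

fit <- lm(tchests.found ~ age + tattoos, data = sim)

# True values on the x-axis, model fits on the y-axis
plot(x = sim$tchests.found, y = fitted(fit),
     xlab = "True treasure chests found",
     ylab = "Model fits",
     pch = 16, col = gray(.05, .3))
abline(a = 0, b = 1, lty = 2)  # points on this line would be perfect predictions

# For a least-squares model with an intercept, these two quantities agree
r2.from.cor <- cor(sim$tchests.found, fitted(fit))^2
r2.from.summary <- summary(fit)$r.squared
all.equal(r2.from.cor, r2.from.summary)
```

With an R-squared as low as the one in Exercise 3, the fits should cover a much narrower range than the true values, so the point cloud looks nearly flat against the dashed line.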

Exercise 5: Repeat your analysis from question 3, but only include pirates who are female and have owned less than 5 parrots in their lives. Do your conclusions change?

female.less.5 <- subset(pirates, subset = sex == "female" & parrots.lifetime < 5)
model.female <- lm(tchests.found ~ parrots.lifetime + age + tattoos, data = female.less.5)

summary(model.female)
## 
## Call:
## lm(formula = tchests.found ~ parrots.lifetime + age + tattoos, 
##     data = female.less.5)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.426  -4.722  -2.113   2.851  44.026 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)   
## (Intercept)      -2.72969    2.61996  -1.042  0.29821   
## parrots.lifetime -0.18443    0.29642  -0.622  0.53423   
## age               0.25240    0.08075   3.126  0.00193 **
## tattoos           0.28510    0.11336   2.515  0.01237 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.258 on 338 degrees of freedom
## Multiple R-squared:  0.04689,    Adjusted R-squared:  0.03843 
## F-statistic: 5.542 on 3 and 338 DF,  p-value: 0.001006
model.female$coefficients
##      (Intercept) parrots.lifetime              age          tattoos 
##       -2.7296867       -0.1844340        0.2523967        0.2851046
model.female$df.residual
## [1] 338

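Conclusion: No, our conclusions do not change. As in Exercise 3, age (t(338) = 3.13, p < 0.01) and tattoos (t(338) = 2.52, p = 0.01) are significant positive predictors, and parrots.lifetime is not (t(338) = -0.62, p = 0.53).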
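To compare a full-sample model with a subgroup model, it can help to put the estimates side by side rather than compare significance stars alone. The sketch below is one way to do that, using the subset argument of lm() instead of a separate subset() call. The data frame sim is simulated and its values are assumptions, not the pirate survey.

```r
# Simulated stand-in for the pirates data (hypothetical values, not the survey)
set.seed(3)
n <- 300
sim <- data.frame(sex = sample(c("female", "male"), n, replace = TRUE),
                  parrots.lifetime = rpois(n, 3),
                  age = round(runif(n, 18, 45)),
                  tattoos = rpois(n, 9))
sim$tchests.found <- 1 + 0.12 * sim$age + 0.27 * sim$tattoos + rnorm(n, sd = 7)

# Full-sample model
fit.all <- lm(tchests.found ~ parrots.lifetime + age + tattoos, data = sim)

# Same formula, restricted to females with fewer than 5 parrots
fit.sub <- lm(tchests.found ~ parrots.lifetime + age + tattoos, data = sim,
              subset = sex == "female" & parrots.lifetime < 5)

# Estimates side by side
cbind(all = coef(fit.all), subset = coef(fit.sub))

# 95% confidence intervals for each model
confint(fit.all)
confint(fit.sub)
```

Overlapping confidence intervals for a coefficient suggest that the subgroup estimate is compatible with the full-sample estimate, even when the point estimates differ.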

Exercise 6: Is there a relationship between whether or not a pirate wears a headband and his/her sword speed? Test this using linear regression. What is your conclusion?

model.2 <- lm(sword.speed ~ headband, data = pirates)
anova(model.2)
## Analysis of Variance Table
## 
## Response: sword.speed
##            Df Sum Sq Mean Sq F value  Pr(>F)  
## headband    1   26.0 26.0079  4.1152 0.04276 *
## Residuals 998 6307.3  6.3199                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(model.2)
## 
## Call:
## lm(formula = sword.speed ~ headband, data = pirates)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -1.658 -0.895 -0.576  0.063 43.483 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.6576     0.2553   6.494 1.32e-10 ***
## headbandyes  -0.5449     0.2686  -2.029   0.0428 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.514 on 998 degrees of freedom
## Multiple R-squared:  0.004107,   Adjusted R-squared:  0.003109 
## F-statistic: 4.115 on 1 and 998 DF,  p-value: 0.04276

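Conclusion: There is a significant relationship between wearing a headband and sword speed, F(1, 998) = 4.12, p = 0.04. The coefficient is negative (b = -0.54): pirates who wear a headband have, on average, a sword.speed about 0.54 lower than pirates who do not (1.11 versus 1.66). The effect is small, explaining less than 1% of the variance (R-squared = 0.004).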
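A regression with a single two-level predictor is equivalent to a two-sample t-test with pooled variance, which makes the sign of the coefficient easier to interpret. The sketch below illustrates this on simulated data (the data frame sim and its values are assumptions, not the pirate survey): the intercept is the mean of the "no" group, and the headbandyes coefficient is the mean of the "yes" group minus the mean of the "no" group.

```r
# Simulated stand-in for the pirates data (hypothetical values, not the survey)
set.seed(4)
sim <- data.frame(headband = rep(c("no", "yes"), times = c(40, 160)))
sim$sword.speed <- rexp(200, rate = ifelse(sim$headband == "yes", 1, 0.7))

fit <- lm(sword.speed ~ headband, data = sim)

# Two-sample t-test assuming equal variances, as the regression does
tt <- t.test(sword.speed ~ headband, data = sim, var.equal = TRUE)

# Group means: "no" is the reference level because levels sort alphabetically
group.means <- tapply(sim$sword.speed, sim$headband, mean)
slope <- coef(fit)["headbandyes"]

# The p-value of the coefficient matches the p-value of the t-test
p.lm <- coef(summary(fit))["headbandyes", "Pr(>|t|)"]
p.tt <- tt$p.value
c(lm = p.lm, t.test = p.tt)
```

A negative headbandyes coefficient therefore means the headband group has the lower mean on the dependent variable.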