Assignment 7: Chapter 11, Regression and the General Linear Model

1. Download the dataframe

pirates <- read.table("http://nathanieldphillips.com/wp-content/uploads/2015/05/pirate_survey_noerrors.txt", 
                      sep = "\t", header = TRUE, stringsAsFactors = FALSE)

2. The function pairs() can create a matrix of scatterplots of different ratio or interval variables in a dataset. Enter the following code to see a matrix of scatterplots for the pirate dataset:

pairs(~ age + tattoos + tchests.found + parrots.lifetime + sword.speed, data = pirates)

3. What variables reliably predict the number of treasure chests a pirate has found? Conduct a linear regression analysis with treasure chests found as the dependent variable and 3 independent variables: parrots.lifetime, age, and tattoos. Save the model as the object model.1. Then, use the summary() function to see the coefficients. What are your conclusions?

model.1 <- lm(tchests.found ~ parrots.lifetime + age + tattoos, data = pirates)
summary(model.1)

## 
## Call:
## lm(formula = tchests.found ~ parrots.lifetime + age + tattoos, 
##     data = pirates)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.566 -5.225 -2.271  2.636 46.003 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       1.266327   1.407182   0.900 0.368389    
## parrots.lifetime -0.007083   0.088349  -0.080 0.936115    
## age               0.123838   0.044491   2.783 0.005480 ** 
## tattoos           0.274414   0.074297   3.693 0.000233 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.785 on 996 degrees of freedom
## Multiple R-squared:  0.02135,    Adjusted R-squared:  0.0184 
## F-statistic: 7.241 on 3 and 996 DF,  p-value: 8.283e-05

# Conclusion: age and tattoos reliably predict the number of treasure chests found; parrots.lifetime does not.
# parrots.lifetime: not significant, b = -0.007, t(996) = -0.08, p = 0.936
# age: significant, b = 0.124, t(996) = 2.78, p = 0.005
# tattoos: significant, b = 0.274, t(996) = 3.69, p < 0.001
# Omnibus test: F(3, 996) = 7.24, p < 0.001, R2 = 0.021 (adjusted R2 = 0.018), so the model explains only about 2% of the variance.
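To report t and p values without copying them by hand from the printed summary, the coefficient table can be extracted as a numeric matrix. The sketch below uses made-up toy data rather than the pirates survey; the data frame toy, the object names fit and coef.table, and all simulated values are assumptions for illustration only.

```r
# Toy data standing in for the pirates data frame (hypothetical values)
set.seed(1)
toy <- data.frame(age = rnorm(100, 30, 5), tattoos = rpois(100, 9))
toy$tchests.found <- 1 + 0.1 * toy$age + 0.3 * toy$tattoos + rnorm(100, sd = 5)

fit <- lm(tchests.found ~ age + tattoos, data = toy)

# Coefficient table as a matrix with columns
# Estimate, Std. Error, t value, Pr(>|t|)
coef.table <- coef(summary(fit))
round(coef.table, 3)

# Residual degrees of freedom, needed to report t(df)
df.residual(fit)
```

The same two calls, coef(summary(model.1)) and df.residual(model.1), should work on the model fitted above.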

4. Using the results from the previous question, create a scatterplot with the true values of the dependent variable (treasure chests found) on the x-axis and the model fits on the y-axis. Make the plot look nice with appropriate labels.

plot(x = pirates$tchests.found, y = model.1$fitted.values,
     xlab = "True number of treasure chests found",
     ylab = "Predicted number of treasure chests found",
     main = "True vs. predicted number of treasure chests found",
     pch = 16, col = gray(0.05, 0.15))
abline(a = 0, b = 1)  # identity line: perfect predictions would fall on it

# The model does not predict the values well: many points lie far from the identity line, which is consistent with an R2 of only about 0.02.
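How closely the fitted values track the true values can also be expressed as a number. For a least-squares model with an intercept, the squared correlation between the true and fitted values equals the model's R2. A minimal sketch on simulated toy data (the data frame toy and its variables x and y are hypothetical):

```r
# Toy data with a weak linear relationship (hypothetical values)
set.seed(2)
toy <- data.frame(x = rnorm(200))
toy$y <- 2 + 0.5 * toy$x + rnorm(200, sd = 3)

fit <- lm(y ~ x, data = toy)

# R2 reported by summary() ...
r.squared <- summary(fit)$r.squared
# ... and the squared correlation between true and fitted values
cor.squared <- cor(toy$y, fitted(fit))^2

all.equal(r.squared, cor.squared)
```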

5. Repeat your analysis from question 3, but only include pirates who are female and have owned fewer than 5 parrots in their lives. Do your conclusions change?

fempirates.lessthanfive <- subset(pirates, sex == "female" & parrots.lifetime < 5)
model.2 <- lm(tchests.found ~ parrots.lifetime + age + tattoos, data = fempirates.lessthanfive)
summary(model.2)

## 
## Call:
## lm(formula = tchests.found ~ parrots.lifetime + age + tattoos, 
##     data = fempirates.lessthanfive)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.426  -4.722  -2.113   2.851  44.026 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)   
## (Intercept)      -2.72969    2.61996  -1.042  0.29821   
## parrots.lifetime -0.18443    0.29642  -0.622  0.53423   
## age               0.25240    0.08075   3.126  0.00193 **
## tattoos           0.28510    0.11336   2.515  0.01237 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.258 on 338 degrees of freedom
## Multiple R-squared:  0.04689,    Adjusted R-squared:  0.03843 
## F-statistic: 5.542 on 3 and 338 DF,  p-value: 0.001006

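# Omnibus test: F(3, 338) = 5.54, p = 0.001, R2 = 0.047 (adjusted R2 = 0.038)
# The conclusions do not change: age and tattoos remain significant predictors and parrots.lifetime remains non-significant.
# For the subset fempirates.lessthanfive the adjusted R2 roughly doubles (0.018 to 0.038), but the model still explains less than 5% of the variance.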

6. Is there a relationship between whether or not a pirate wears a headband and his/her sword speed? Test this using linear regression. What is your conclusion?

headband.swordspeed.lm <- lm(sword.speed ~ headband, data = pirates)
summary(headband.swordspeed.lm)

## 
## Call:
## lm(formula = sword.speed ~ headband, data = pirates)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -1.658 -0.895 -0.576  0.063 43.483 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.6576     0.2553   6.494 1.32e-10 ***
## headbandyes  -0.5449     0.2686  -2.029   0.0428 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.514 on 998 degrees of freedom
## Multiple R-squared:  0.004107,   Adjusted R-squared:  0.003109 
## F-statistic: 4.115 on 1 and 998 DF,  p-value: 0.04276

# Pirates who wear a headband have a significantly slower sword speed (b = -0.54, t(998) = -2.03, p = 0.043), but the model explains almost no variance (adjusted R2 = 0.003).
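A regression with a single two-level predictor is equivalent to a two-sample t-test that assumes equal variances: the slope is the difference between the group means and the p-values match. A minimal sketch on simulated toy data (the data frame toy and its values are hypothetical, not the pirates survey):

```r
# Toy data: 40 pirates without and 60 with a headband (hypothetical values)
set.seed(3)
toy <- data.frame(headband = rep(c("no", "yes"), times = c(40, 60)))
toy$sword.speed <- rexp(100, rate = 1)

fit <- lm(sword.speed ~ headband, data = toy)
tt <- t.test(sword.speed ~ headband, data = toy, var.equal = TRUE)

# p-value of the headband coefficient vs. p-value of the t-test
lm.p <- coef(summary(fit))["headbandyes", "Pr(>|t|)"]
all.equal(lm.p, tt$p.value)

# The slope is the difference in group means (yes minus no)
slope <- unname(coef(fit)["headbandyes"])
mean.diff <- mean(toy$sword.speed[toy$headband == "yes"]) -
  mean(toy$sword.speed[toy$headband == "no"])
all.equal(slope, mean.diff)
```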

7. Now, repeat the analysis from question 6, but this time add sword.type as a second independent variable. What is your conclusion now?

headband.swordspeed2.lm <- lm(sword.speed ~ headband + sword.type, data = pirates)
summary(headband.swordspeed2.lm)

## 
## Call:
## lm(formula = sword.speed ~ headband + sword.type, data = pirates)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.304 -0.564 -0.261  0.249 36.805 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          3.8331     0.3433  11.164  < 2e-16 ***
## headbandyes          3.9581     0.3044  13.003  < 2e-16 ***
## sword.typecutlass   -7.0595     0.3967 -17.796  < 2e-16 ***
## sword.typesabre     -3.3909     0.4250  -7.978 4.06e-15 ***
## sword.typescimitar  -1.5348     0.4317  -3.556 0.000395 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.063 on 995 degrees of freedom
## Multiple R-squared:  0.3314, Adjusted R-squared:  0.3287 
## F-statistic: 123.3 on 4 and 995 DF,  p-value: < 2.2e-16

anova(headband.swordspeed2.lm)

## Analysis of Variance Table
## 
## Response: sword.speed
##             Df Sum Sq Mean Sq  F value Pr(>F)    
## headband     1   26.0   26.01   6.1116 0.0136 *  
## sword.type   3 2073.1  691.02 162.3837 <2e-16 ***
## Residuals  995 4234.2    4.26                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

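# With sword.type included as a second independent variable, the relationship between wearing a headband and sword speed becomes positive and significant (b = 3.96, t(995) = 13.00, p < 0.001).
# The sign flips relative to question 6, which suggests that headband use is confounded with sword type.
# sword.type explains far more variance than headband: adjusted R2 rises from 0.003 to 0.329.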
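A coefficient can change sign once a second predictor is added when the two predictors are strongly associated. The sketch below builds toy data in which headband wearers mostly use the slower sword type, so the unadjusted headband effect is negative while the effect within each sword type is positive. All values, the data frame toy, and the names b.marginal and b.adjusted are made up for illustration.

```r
# Toy data: headbands are common among cutlass users, rare among sabre users
set.seed(4)
toy <- data.frame(
  sword.type = rep(c("cutlass", "sabre"), each = 200),
  headband = c(rep(c("yes", "no"), times = c(180, 20)),
               rep(c("yes", "no"), times = c(20, 180)))
)
# Sabres are faster; a headband adds 1 unit of speed within each sword type
toy$sword.speed <- ifelse(toy$sword.type == "sabre", 6, 1) +
  ifelse(toy$headband == "yes", 1, 0) + rnorm(400, sd = 0.5)

# Group means show the positive headband effect within each sword type
aggregate(sword.speed ~ headband + sword.type, data = toy, FUN = mean)

# Headband coefficient without and with sword.type in the model
b.marginal <- coef(lm(sword.speed ~ headband, data = toy))["headbandyes"]
b.adjusted <- coef(lm(sword.speed ~ headband + sword.type, data = toy))["headbandyes"]
c(marginal = unname(b.marginal), adjusted = unname(b.adjusted))
```

On the real data, table(pirates$headband, pirates$sword.type) would show whether headband use and sword type are associated in a similar way.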

8. Is there an interaction between sex and headband use when predicting a pirate’s sword speed? Test this only using pirates whose sex is male or female.

interaction.sex.headband <- lm(sword.speed ~ headband * sex, data = pirates)
summary(interaction.sex.headband)

## 
## Call:
## lm(formula = sword.speed ~ headband * sex, data = pirates)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -1.721 -0.894 -0.582  0.065 43.452 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           1.72108    0.35613   4.833 1.56e-06 ***
## headbandyes          -0.62761    0.37687  -1.665   0.0962 .  
## sexmale              -0.04571    0.53419  -0.086   0.9318    
## sexother             -0.61876    1.01623  -0.609   0.5427    
## headbandyes:sexmale   0.09614    0.56089   0.171   0.8639    
## headbandyes:sexother  0.45839    1.11105   0.413   0.6800    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.518 on 994 degrees of freedom
## Multiple R-squared:  0.004748,   Adjusted R-squared:  -0.0002582 
## F-statistic: 0.9484 on 5 and 994 DF,  p-value: 0.4487

anova(interaction.sex.headband)

## Analysis of Variance Table
## 
## Response: sword.speed
##               Df Sum Sq Mean Sq F value  Pr(>F)  
## headband       1   26.0 26.0079  4.1014 0.04312 *
## sex            2    2.9  1.4719  0.2321 0.79290  
## headband:sex   2    1.1  0.5597  0.0883 0.91553  
## Residuals    994 6303.2  6.3413                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

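# In the sequential ANOVA, headband has a significant effect on sword speed, F(1, 994) = 4.10, p = 0.043.
# Neither sex, F(2, 994) = 0.23, p = 0.793, nor the headband-by-sex interaction, F(2, 994) = 0.09, p = 0.916, is significant.
# Note: this model was fitted to all pirates, including those with sex "other". The question asks for male and female pirates only, so the data should be subset before fitting.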
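The question restricts the test to pirates whose sex is male or female. A minimal sketch of that restriction, using made-up toy data (the data frames toy and toy.mf and all values are hypothetical):

```r
# Toy data with three sex categories (hypothetical values)
set.seed(5)
toy <- data.frame(
  sex = sample(c("female", "male", "other"), 300, replace = TRUE,
               prob = c(0.45, 0.45, 0.10)),
  headband = sample(c("no", "yes"), 300, replace = TRUE),
  sword.speed = rexp(300)
)

# Keep only male and female pirates before fitting
toy.mf <- subset(toy, sex %in% c("male", "female"))

# Main effects plus the headband-by-sex interaction
fit.mf <- lm(sword.speed ~ headband * sex, data = toy.mf)
names(coef(fit.mf))
anova(fit.mf)
```

Applied to the real data, the same subset() call on pirates would leave a single interaction term, headbandyes:sexmale, instead of the two terms shown in the output above.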

9. Is there an effect of a pirate’s favorite pirate on the number of tattoos they have? Test this once using an ANOVA (the aov() function) and once using linear regression. How do the two p-values compare?

favorite.tattoos.lm <- lm(tattoos ~ favorite.pirate, data = pirates)
summary(favorite.tattoos.lm)

## 
## Call:
## lm(formula = tattoos ~ favorite.pirate, data = pirates)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.713 -1.713  0.287  2.380  9.393 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  9.10000    0.30283  30.050   <2e-16 ***
## favorite.pirateBlackbeard    0.52000    0.44917   1.158    0.247    
## favorite.pirateEdward Low    0.24211    0.43387   0.558    0.577    
## favorite.pirateHook          0.61304    0.43290   1.416    0.157    
## favorite.pirateJack Sparrow  0.50706    0.34059   1.489    0.137    
## favorite.pirateLewis Scot   -0.01837    0.45167  -0.041    0.968    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.317 on 994 degrees of freedom
## Multiple R-squared:  0.004601,   Adjusted R-squared:  -0.000406 
## F-statistic: 0.9189 on 5 and 994 DF,  p-value: 0.4678

anova(favorite.tattoos.lm)

## Analysis of Variance Table
## 
## Response: tattoos
##                  Df  Sum Sq Mean Sq F value Pr(>F)
## favorite.pirate   5    50.6  10.113  0.9189 0.4678
## Residuals       994 10939.0  11.005
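The question also asks for the aov() version. With a single categorical predictor, aov() and lm() fit the same model, so the ANOVA F-test and the regression omnibus F-test should give the same p-value. A minimal sketch on simulated toy data (the data frame toy, its values, and the object names are hypothetical):

```r
# Toy data: tattoos do not depend on favorite pirate (hypothetical values)
set.seed(6)
toy <- data.frame(
  favorite.pirate = sample(c("Blackbeard", "Hook", "Jack Sparrow"), 150,
                           replace = TRUE),
  tattoos = rpois(150, lambda = 9)
)

fit.aov <- aov(tattoos ~ favorite.pirate, data = toy)
fit.lm <- lm(tattoos ~ favorite.pirate, data = toy)

# p-value from the ANOVA table
p.aov <- summary(fit.aov)[[1]][["Pr(>F)"]][1]

# p-value of the regression omnibus F-test
f <- summary(fit.lm)$fstatistic
p.lm <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

all.equal(p.aov, p.lm)
```

On the pirates data, aov(tattoos ~ favorite.pirate, data = pirates) should therefore reproduce F(5, 994) = 0.92, p = 0.468 from the regression output above, indicating no reliable effect of favorite pirate on the number of tattoos.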