pirates <- read.table("http://nathanieldphillips.com/wp-content/uploads/2015/05/pirate_survey_noerrors.txt",
                      sep = "\t", header = TRUE, stringsAsFactors = FALSE)
pairs(~ age + tattoos + tchests.found + parrots.lifetime + sword.speed, data = pirates)
model.1 <- lm(tchests.found ~ parrots.lifetime + age + tattoos, data= pirates)
summary(model.1)
##
## Call:
## lm(formula = tchests.found ~ parrots.lifetime + age + tattoos,
## data = pirates)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -9.566 -5.225 -2.271  2.636 46.003
##
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)       1.266327   1.407182   0.900 0.368389
## parrots.lifetime -0.007083   0.088349  -0.080 0.936115
## age               0.123838   0.044491   2.783 0.005480 **
## tattoos           0.274414   0.074297   3.693 0.000233 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.785 on 996 degrees of freedom
## Multiple R-squared: 0.02135, Adjusted R-squared: 0.0184
## F-statistic: 7.241 on 3 and 996 DF, p-value: 8.283e-05
# Conclusion: age and tattoos are significant predictors of the number of treasure chests found; parrots.lifetime is not.
# parrots.lifetime: not significant, t(996) = -0.08, p = 0.94
# age: significant, t(996) = 2.78, p = 0.005
# tattoos: significant, t(996) = 3.69, p < 0.001
# Omnibus test: F(3, 996) = 7.24, p < 0.001; R2 = 0.021 (adjusted R2 = 0.018), so the model explains only about 2% of the variance.
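The statistics reported in a conclusion like this can be read directly from the fitted model instead of being copied by hand, which helps avoid transcription slips. The sketch below runs on a small simulated data frame (`demo`, a hypothetical stand-in for `pirates`; the column names mirror the survey but every value is made up), so it works without downloading anything.

```r
# Simulated stand-in for the pirates data; all values are made up for illustration
set.seed(1)
n <- 200
demo <- data.frame(
  parrots.lifetime = rpois(n, 3),
  age = round(runif(n, 18, 60)),
  tattoos = rpois(n, 9)
)
demo$tchests.found <- rpois(n, lambda = 1 + 0.1 * demo$age + 0.3 * demo$tattoos)

demo.lm <- lm(tchests.found ~ parrots.lifetime + age + tattoos, data = demo)
demo.sum <- summary(demo.lm)

# Coefficient table: one row per term; columns Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(demo.sum)
print(round(coefs, 4))

# Omnibus F-test: the summary stores the statistic and its two degrees of freedom
f <- demo.sum$fstatistic
omnibus.p <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

# Residual degrees of freedom, i.e. the df in t(df) when reporting each coefficient
res.df <- df.residual(demo.lm)
```

Applied to `model.1`, the same calls would return the t values, p-values, and degrees of freedom shown in the output above.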
4. Using the results from the previous question, create a scatterplot with the true values of the dependent variable (treasure chests found) on the x-axis and the model fits on the y-axis. Make the plot look nice with appropriate labels.
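plot(x = pirates$tchests.found,
     y = model.1$fitted.values,
     xlab = "True number of treasure chests found",
     ylab = "Predicted number of treasure chests found",
     main = "True vs. predicted number of treasure chests found",
     pch = 16,
     col = gray(0.05, 0.15))
abline(a = 0, b = 1)  # diagonal: perfect prediction
# The model does a poor job of predicting the true values, as expected from R2 = 0.02: points would lie on the diagonal if the predictions were perfect.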
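For a least-squares model with an intercept, the squared correlation between the true and the fitted values equals the multiple R-squared, so the scatterplot and the summary output describe the same thing. A minimal check on simulated data (the `demo` data frame and its values are hypothetical):

```r
# Simulated stand-in data; values are made up for illustration
set.seed(2)
n <- 200
demo <- data.frame(age = round(runif(n, 18, 60)), tattoos = rpois(n, 9))
demo$tchests.found <- 1 + 0.1 * demo$age + 0.3 * demo$tattoos + rnorm(n, sd = 7)

demo.lm <- lm(tchests.found ~ age + tattoos, data = demo)

# R-squared as reported by summary() ...
r2.from.summary <- summary(demo.lm)$r.squared
# ... and as the squared correlation between observed and fitted values
r2.from.fits <- cor(demo$tchests.found, fitted(demo.lm))^2

c(summary = r2.from.summary, fits = r2.from.fits)
```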
fempirates.lessthanfive <- subset(pirates, sex == "female" & parrots.lifetime < 5)
model.2 <- lm(tchests.found ~ parrots.lifetime + age + tattoos, data = fempirates.lessthanfive)
summary(model.2)
##
## Call:
## lm(formula = tchests.found ~ parrots.lifetime + age + tattoos,
## data = fempirates.lessthanfive)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -10.426  -4.722  -2.113   2.851  44.026
##
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)      -2.72969    2.61996  -1.042  0.29821
## parrots.lifetime -0.18443    0.29642  -0.622  0.53423
## age               0.25240    0.08075   3.126  0.00193 **
## tattoos           0.28510    0.11336   2.515  0.01237 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.258 on 338 degrees of freedom
## Multiple R-squared: 0.04689, Adjusted R-squared: 0.03843
## F-statistic: 5.542 on 3 and 338 DF, p-value: 0.001006
# Omnibus test: F(3, 338) = 5.54, p = 0.001; R2 = 0.047 (adjusted R2 = 0.038)
# For the subset fempirates.lessthanfive, adjusted R2 roughly doubles (0.018 to 0.038), but the model still explains less than 5% of the variance, and the two fits are based on different samples.
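The adjusted R-squared values in the two summaries can be reproduced from the printed multiple R-squared, the sample size n, and the number of predictors p. The helper `adj.r2` below is a hypothetical name introduced for this sketch; n is recovered from the printed residual degrees of freedom as df + p + 1.

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors p
adj.r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Full sample: residual df = 996, three predictors plus intercept, so n = 1000
adj.full <- adj.r2(0.02135, n = 996 + 4, p = 3)

# Female pirates with fewer than five parrots: residual df = 338, so n = 342
adj.subset <- adj.r2(0.04689, n = 338 + 4, p = 3)

round(c(full = adj.full, subset = adj.subset), 4)
```

Up to rounding of the inputs, the results agree with the 0.0184 and 0.03843 printed by `summary()`.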
6. Is there a relationship between whether or not a pirate wears a headband and his/her sword speed? Test this using linear regression. What is your conclusion?
headband.swordspeed.lm <- lm(sword.speed ~ headband, data=pirates)
summary(headband.swordspeed.lm)
##
## Call:
## lm(formula = sword.speed ~ headband, data = pirates)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -1.658 -0.895 -0.576  0.063 43.483
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   1.6576     0.2553   6.494 1.32e-10 ***
## headbandyes  -0.5449     0.2686  -2.029   0.0428 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.514 on 998 degrees of freedom
## Multiple R-squared: 0.004107, Adjusted R-squared: 0.003109
## F-statistic: 4.115 on 1 and 998 DF, p-value: 0.04276
# Pirates who wear a headband have a significantly lower sword.speed (b = -0.54, t(998) = -2.03, p = 0.043), but the model explains almost none of the variance (adjusted R2 = 0.003).
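A regression on a single two-level predictor is equivalent to a two-sample t-test that assumes equal variances. The sketch below illustrates this on simulated data (`demo` and its values are hypothetical, not the survey data).

```r
# Simulated stand-in data; values are made up for illustration
set.seed(4)
n <- 120
demo <- data.frame(
  headband = factor(sample(c("no", "yes"), n, replace = TRUE)),
  sword.speed = rexp(n, rate = 1)
)

demo.lm <- lm(sword.speed ~ headband, data = demo)
demo.tt <- t.test(sword.speed ~ headband, data = demo, var.equal = TRUE)

# Row of the coefficient table for the headband effect (yes minus no)
lm.row <- coef(summary(demo.lm))["headbandyes", ]

# t.test() reports "no" minus "yes", so the t statistics differ only in sign
c(lm.t = unname(lm.row["t value"]), ttest.t = unname(demo.tt$statistic))
c(lm.p = unname(lm.row["Pr(>|t|)"]), ttest.p = demo.tt$p.value)
```

The p-values coincide, and both tests use n - 2 degrees of freedom.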
7. Now, repeat the analysis from question 6, but this time add sword.type as a second independent variable. What is your conclusion now?
headband.swordspeed2.lm <- lm(sword.speed ~ headband + sword.type, data=pirates)
summary(headband.swordspeed2.lm)
##
## Call:
## lm(formula = sword.speed ~ headband + sword.type, data = pirates)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -6.304 -0.564 -0.261  0.249 36.805
##
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)          3.8331     0.3433  11.164  < 2e-16 ***
## headbandyes          3.9581     0.3044  13.003  < 2e-16 ***
## sword.typecutlass   -7.0595     0.3967 -17.796  < 2e-16 ***
## sword.typesabre     -3.3909     0.4250  -7.978 4.06e-15 ***
## sword.typescimitar  -1.5348     0.4317  -3.556 0.000395 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.063 on 995 degrees of freedom
## Multiple R-squared: 0.3314, Adjusted R-squared: 0.3287
## F-statistic: 123.3 on 4 and 995 DF, p-value: < 2.2e-16
anova(headband.swordspeed2.lm )
## Analysis of Variance Table
##
## Response: sword.speed
##             Df Sum Sq Mean Sq  F value Pr(>F)
## headband     1   26.0   26.01   6.1116 0.0136 *
## sword.type   3 2073.1  691.02 162.3837 <2e-16 ***
## Residuals  995 4234.2    4.26
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
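# With sword.type included as a second independent variable, the headband coefficient becomes positive and significant (b = 3.96, t(995) = 13.00, p < 0.001): holding sword type constant, headband wearers have a higher sword.speed. sword.type accounts for far more variance than headband (Sum Sq 2073.1 vs. 26.0), and adjusted R2 rises from 0.003 to 0.329.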
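A coefficient can change sign like this when the added variable is related to both the predictor and the outcome. The simulation below is a hypothetical illustration of that mechanism, not an analysis of the survey: it assumes headband wearers mostly use cutlasses, that cutlasses have a much lower sword.speed, and that a headband adds one unit within each sword type. Whether something similar holds in the survey could be checked with a cross-tabulation such as `table(pirates$headband, pirates$sword.type)`.

```r
# Hypothetical data built so that sword.type confounds the headband effect
set.seed(42)
demo <- data.frame(
  sword.type = rep(c("cutlass", "sabre"), each = 200),
  headband = c(rep(c("yes", "no"), times = c(180, 20)),   # cutlass users: mostly headband
               rep(c("yes", "no"), times = c(20, 180)))   # sabre users: mostly no headband
)
demo$sword.speed <- 5 - 4 * (demo$sword.type == "cutlass") +
  1 * (demo$headband == "yes") + rnorm(nrow(demo), sd = 0.5)

# Headband effect ignoring sword type (negative, driven by the cutlass users)
b.marginal <- coef(lm(sword.speed ~ headband, data = demo))["headbandyes"]

# Headband effect holding sword type constant (positive, close to the built-in +1)
b.adjusted <- coef(lm(sword.speed ~ headband + sword.type, data = demo))["headbandyes"]

c(marginal = unname(b.marginal), adjusted = unname(b.adjusted))
```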
8. Is there an interaction between sex and headband use when predicting a pirate’s sword speed? Test this only using pirates whose sex is male or female.
interaction.sex.headband <- lm(sword.speed ~ headband * sex, data=pirates)
summary(interaction.sex.headband)
##
## Call:
## lm(formula = sword.speed ~ headband * sex, data = pirates)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -1.721 -0.894 -0.582  0.065 43.452
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)           1.72108    0.35613   4.833 1.56e-06 ***
## headbandyes          -0.62761    0.37687  -1.665   0.0962 .
## sexmale              -0.04571    0.53419  -0.086   0.9318
## sexother             -0.61876    1.01623  -0.609   0.5427
## headbandyes:sexmale   0.09614    0.56089   0.171   0.8639
## headbandyes:sexother  0.45839    1.11105   0.413   0.6800
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.518 on 994 degrees of freedom
## Multiple R-squared: 0.004748, Adjusted R-squared: -0.0002582
## F-statistic: 0.9484 on 5 and 994 DF, p-value: 0.4487
anova(interaction.sex.headband)
## Analysis of Variance Table
##
## Response: sword.speed
##               Df Sum Sq Mean Sq F value  Pr(>F)
## headband       1   26.0 26.0079  4.1014 0.04312 *
## sex            2    2.9  1.4719  0.2321 0.79290
## headband:sex   2    1.1  0.5597  0.0883 0.91553
## Residuals    994 6303.2  6.3413
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
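# In the sequential ANOVA, headband has a significant effect on sword.speed (F(1, 994) = 4.10, p = 0.043). Neither sex (F(2, 994) = 0.23, p = 0.79) nor the headband:sex interaction (F(2, 994) = 0.09, p = 0.92) is significant.
# Note: this model was fit to all pirates, including sex == "other", whereas the question asks for male and female pirates only.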
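To follow the question literally, the data would first be restricted to male and female pirates, and the interaction could then be tested by comparing the additive model with the interaction model. The sketch below shows the steps on simulated data (`demo` is a hypothetical stand-in; its values are made up), so no result for the survey is implied.

```r
# Simulated stand-in data; values are made up for illustration
set.seed(3)
n <- 300
demo <- data.frame(
  sex = sample(c("female", "male", "other"), n, replace = TRUE,
               prob = c(0.45, 0.45, 0.10)),
  headband = sample(c("no", "yes"), n, replace = TRUE),
  sword.speed = rexp(n, rate = 1),
  stringsAsFactors = FALSE
)

# Keep only pirates whose sex is male or female
mf <- subset(demo, sex %in% c("male", "female"))

additive <- lm(sword.speed ~ headband + sex, data = mf)
interact <- lm(sword.speed ~ headband * sex, data = mf)

# Nested-model F-test: does adding headband:sex improve the fit?
cmp <- anova(additive, interact)
print(cmp)

# With two sexes the interaction is a single coefficient, so F equals t squared
t.int <- coef(summary(interact))["headbandyes:sexmale", "t value"]
```

On the survey data, the same steps would start from `subset(pirates, sex %in% c("male", "female"))`.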
9. Is there an effect of a pirate’s favorite pirate on the number of tattoos they have? Test this once using an ANOVA (the aov() function) and once using linear regression. How do the two p-values compare?
favorite.tattoos.lm <- lm (tattoos ~ favorite.pirate, data= pirates)
summary(favorite.tattoos.lm)
##
## Call:
## lm(formula = tattoos ~ favorite.pirate, data = pirates)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -9.713 -1.713  0.287  2.380  9.393
##
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)
## (Intercept)                  9.10000    0.30283  30.050   <2e-16 ***
## favorite.pirateBlackbeard    0.52000    0.44917   1.158    0.247
## favorite.pirateEdward Low    0.24211    0.43387   0.558    0.577
## favorite.pirateHook          0.61304    0.43290   1.416    0.157
## favorite.pirateJack Sparrow  0.50706    0.34059   1.489    0.137
## favorite.pirateLewis Scot   -0.01837    0.45167  -0.041    0.968
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.317 on 994 degrees of freedom
## Multiple R-squared: 0.004601, Adjusted R-squared: -0.000406
## F-statistic: 0.9189 on 5 and 994 DF, p-value: 0.4678
anova(favorite.tattoos.lm)
## Analysis of Variance Table
##
## Response: tattoos
##                  Df  Sum Sq Mean Sq F value Pr(>F)
## favorite.pirate   5    50.6  10.113  0.9189 0.4678
## Residuals       994 10939.0  11.005
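The regression output gives F(5, 994) = 0.92, p = 0.47, so there is no evidence that the favorite pirate affects the number of tattoos. The question also asks for `aov()`, which is not shown above. Because `aov()` fits the same linear model through `lm()`, the two p-values should agree exactly. The sketch below checks this on simulated data (`demo` is a hypothetical stand-in with made-up values).

```r
# Simulated stand-in data; values are made up for illustration
set.seed(5)
n <- 150
demo <- data.frame(
  favorite.pirate = sample(c("Blackbeard", "Hook", "Jack Sparrow", "Lewis Scot"),
                           n, replace = TRUE),
  tattoos = rpois(n, 9),
  stringsAsFactors = FALSE
)

demo.aov <- aov(tattoos ~ favorite.pirate, data = demo)
demo.lm <- lm(tattoos ~ favorite.pirate, data = demo)

# p-value for favorite.pirate from the ANOVA table of each fit
p.aov <- summary(demo.aov)[[1]][["Pr(>F)"]][1]
p.lm <- anova(demo.lm)[["Pr(>F)"]][1]

# The same value is the omnibus p-value on the last line of summary(demo.lm)
f <- summary(demo.lm)$fstatistic
p.omnibus <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

c(aov = p.aov, lm = p.lm, omnibus = p.omnibus)
```

On the survey data, the corresponding call would be `summary(aov(tattoos ~ favorite.pirate, data = pirates))`.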