What are the names of the columns in the capture dataframe? What are the first few rows of the dataframe?
names(capture)
## [1] "size" "cannons" "style" "warnshot"
## [5] "date" "heardof" "decorations" "daysfromshore"
## [9] "speed" "treasure"
head(capture)
##   size cannons   style warnshot date heardof decorations daysfromshore
## 1   48      54 classic        0  172       1           8            28
## 2   51      56  modern        0   15       0           3             6
## 3   50      44  modern        0   63       0           3            23
## 4   54      54  modern        0  362       1           2            23
## 5   50      56  modern        0  183       1           2            12
## 6   51      48  modern        0  279       0           1             3
##   speed treasure
## 1    16     2175
## 2    29     2465
## 3    18     1925
## 4    19     2200
## 5    21     2290
## 6    24     2195
plot(x = capture$size,
     y = capture$treasure,
     xlab = "Size",
     ylab = "Treasure",
     main = "Regression between Size and Treasure")
abline(lm(treasure ~ size, data = capture))

plot(x = capture$cannons,
     y = capture$treasure,
     xlab = "Cannons",
     ylab = "Treasure",
     main = "Regression between Cannons and Treasure")
abline(lm(treasure ~ cannons, data = capture))

plot(x = capture$date,
     y = capture$treasure,
     xlab = "Date",
     ylab = "Treasure",
     main = "Regression between Date and Treasure")
abline(lm(treasure ~ date, data = capture))

plot(x = capture$decorations,
     y = capture$treasure,
     xlab = "Decorations",
     ylab = "Treasure",
     main = "Regression between Decorations and Treasure")
abline(lm(treasure ~ decorations, data = capture))

plot(x = capture$daysfromshore,
     y = capture$treasure,
     xlab = "Days from Shore",
     ylab = "Treasure",
     main = "Regression between Days from Shore and Treasure")
abline(lm(treasure ~ daysfromshore, data = capture))

plot(x = capture$speed,
     y = capture$treasure,
     xlab = "Speed",
     ylab = "Treasure",
     main = "Regression between Speed and Treasure")
abline(lm(treasure ~ speed, data = capture))
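The six plot calls above differ only in the predictor, so they could probably be written as one loop. The following is only a sketch: `ships` is simulated stand-in data (an assumption, not the real capture values), and `predictors` is a name I made up for the illustration.

```r
# Simulated stand-in for the capture data (hypothetical values)
set.seed(1)
ships <- data.frame(
  size    = rnorm(100, mean = 50, sd = 5),
  cannons = rpois(100, lambda = 40),
  speed   = rnorm(100, mean = 20, sd = 4)
)
ships$treasure <- 20 * ships$size + 20 * ships$cannons + rnorm(100, sd = 50)

predictors <- c("size", "cannons", "speed")

pdf(NULL)  # discard the graphics output; drop this line to see the plots
for (p in predictors) {
  # Build the formula treasure ~ <predictor> from the column name
  f <- reformulate(p, response = "treasure")
  plot(x = ships[[p]],
       y = ships$treasure,
       xlab = p,
       ylab = "treasure",
       main = paste("Regression between", p, "and treasure"))
  abline(lm(f, data = ships))
}
invisible(dev.off())
```

With the real data, `ships` would be replaced by `capture` and `predictors` by the six column names plotted above.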
I am skipping Q2: I work on a Linux PC, and a package dependency problem prevents me from installing the latest version of the yarrr package.
library(dplyr)  # provides %>%, group_by() and summarise()
capture %>%
  group_by(style) %>%
  summarise(
    treasure.mean = mean(treasure)
  )
## Source: local data frame [2 x 2]
##
## style treasure.mean
## (fctr) (dbl)
## 1 classic 2184.301
## 2 modern 2095.645
capture %>%
  group_by(warnshot) %>%
  summarise(
    treasure.mean = mean(treasure)
  )
## Source: local data frame [2 x 2]
##
## warnshot treasure.mean
## (int) (dbl)
## 1 0 2085.940
## 2 1 2174.802
capture %>%
  group_by(decorations) %>%
  summarise(
    treasure.mean = mean(treasure)
  )
## Source: local data frame [10 x 2]
##
## decorations treasure.mean
## (int) (dbl)
## 1 1 3175.472
## 2 2 1764.758
## 3 3 1865.688
## 4 4 1847.904
## 5 5 1881.486
## 6 6 1879.208
## 7 7 1954.426
## 8 8 1998.511
## 9 9 1962.857
## 10 10 2005.526
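The same kind of group means can also be computed without dplyr, using base R's `aggregate()`. This is a minimal sketch on a tiny made-up data frame (`ships` and its values are assumptions for illustration, not the capture data).

```r
# Tiny hypothetical data frame standing in for capture
ships <- data.frame(
  style    = c("classic", "classic", "modern", "modern", "modern"),
  treasure = c(2000, 2200, 1900, 2000, 2100)
)

# Mean treasure for each level of style, via the formula interface
group_means <- aggregate(treasure ~ style, data = ships, FUN = mean)
group_means
```

The formula `treasure ~ style` reads as "treasure grouped by style", so swapping in `warnshot` or `decorations` on the right-hand side would give the other two tables.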
Using the formula notation above, conduct a correlation test between the number of cannons a ship has and its size. What is the p-value?
cor1 <- cor.test(~cannons + size, data = capture)
apa(cor1)
## [1] "r = 0.03, t(998) = 0.91, p = 0.37 (2-tailed)"
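If `apa()` is not available, the numbers it reports can be read directly from the object that `cor.test()` returns. A small sketch, using simulated data in place of capture (the data frame `ships` is an assumption for illustration):

```r
# Simulated stand-in data (hypothetical values)
set.seed(42)
ships <- data.frame(
  size    = rnorm(200, mean = 50, sd = 5),
  cannons = rnorm(200, mean = 40, sd = 10)
)

cor1 <- cor.test(~ cannons + size, data = ships)

cor1$estimate   # Pearson correlation r
cor1$statistic  # t statistic
cor1$parameter  # degrees of freedom (n - 2)
cor1$p.value    # two-sided p-value
```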
Now do the same with linear regression. What is the p-value?
lm1 <- lm(cannons ~ size, data = capture)
summary(lm1)
##
## Call:
## lm(formula = cannons ~ size, data = capture)
##
## Residuals:
## Min 1Q Median 3Q Max
## -34.549 -14.324 -0.324 12.498 63.414
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 26.3039 7.2645 3.621 0.000308 ***
## size 0.1309 0.1446 0.905 0.365679
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18.92 on 998 degrees of freedom
## Multiple R-squared: 0.00082, Adjusted R-squared: -0.0001812
## F-statistic: 0.819 on 1 and 998 DF, p-value: 0.3657
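The p-value for the slope (0.3657) matches the one from the correlation test (0.37 after rounding), and so does the t statistic. For a regression with a single predictor this is expected, because both tests use the same t statistic on n - 2 degrees of freedom. A quick check on simulated data (the data frame `ships` is an assumption, not the capture data):

```r
# Simulated stand-in data (hypothetical values)
set.seed(42)
ships <- data.frame(
  size    = rnorm(200, mean = 50, sd = 5),
  cannons = rnorm(200, mean = 40, sd = 10)
)

# p-value from the correlation test
p_cor <- cor.test(~ cannons + size, data = ships)$p.value

# p-value for the slope in the simple linear regression
lm1  <- lm(cannons ~ size, data = ships)
p_lm <- summary(lm1)$coefficients["size", "Pr(>|t|)"]

all.equal(p_cor, p_lm)
```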
Conduct a linear regression with treasure as the dependent variable, and with all other variables as independent variables. Save the object as treasure.model
treasure.model <- lm(treasure~., data = capture)
Using the summary() function, print the coefficients and main statistics of the regression
summary(treasure.model)
##
## Call:
## lm(formula = treasure ~ ., data = capture)
##
## Residuals:
## Min 1Q Median 3Q Max
## -880.96 -443.16 -211.02 66.08 2427.97
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 749.8957 351.0514 2.136 0.032913 *
## size 22.5203 5.9602 3.778 0.000167 ***
## cannons 19.3817 1.2932 14.987 < 2e-16 ***
## stylemodern -165.0932 84.6314 -1.951 0.051371 .
## warnshot 89.0164 61.0610 1.458 0.145205
## date 0.1508 0.2313 0.652 0.514511
## heardof 92.1270 54.7238 1.683 0.092595 .
## decorations -96.3998 10.0249 -9.616 < 2e-16 ***
## daysfromshore -8.6119 2.8180 -3.056 0.002303 **
## speed 9.2639 8.3892 1.104 0.269750
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 771.4 on 990 degrees of freedom
## Multiple R-squared: 0.2661, Adjusted R-squared: 0.2594
## F-statistic: 39.88 on 9 and 990 DF, p-value: < 2.2e-16
What are your conclusions? Which variables are significantly related to treasure and in which direction (i.e.; positive or negative)?
The variables significantly related to treasure (p < .05) are size, cannons, decorations, and daysfromshore. Size and cannons are positively related to treasure; decorations and daysfromshore are negatively related.
Which variables are NOT significantly related to treasure? At the .05 level: style (stylemodern), warnshot, date, heardof, and speed.
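Instead of reading the significant terms off the printed table, they could be filtered from the coefficient matrix. This is a sketch under assumptions: `ships` is simulated data in which `date` has no effect by construction, and the 0.05 cutoff is the same one used above.

```r
# Simulated stand-in data (hypothetical values); date has no true effect
set.seed(7)
n <- 300
ships <- data.frame(
  size    = rnorm(n, mean = 50, sd = 5),
  cannons = rnorm(n, mean = 40, sd = 10),
  date    = sample(1:365, n, replace = TRUE)
)
ships$treasure <- 20 * ships$size + 20 * ships$cannons + rnorm(n, sd = 100)

model <- lm(treasure ~ ., data = ships)
coefs <- summary(model)$coefficients

# Drop the intercept, then keep rows with p < .05
slopes      <- coefs[rownames(coefs) != "(Intercept)", , drop = FALSE]
significant <- slopes[slopes[, "Pr(>|t|)"] < 0.05, , drop = FALSE]

# Report each significant term with the sign of its estimate
data.frame(
  term      = rownames(significant),
  direction = ifelse(significant[, "Estimate"] > 0, "positive", "negative")
)
```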
Now tell me again, what was your conclusion about the relationship between decorations and treasure?
The relationship between decorations and treasure is significantly negative (estimate = -96.40, p < .001). According to this model, the more decorations a ship has, the less treasure it carries and the less it is worth capturing.
Ok, now plot the relationship between decorations and treasure again. Do you see anything strange?
plot(
x = capture$decorations,
y = capture$treasure
)
abline(lm(treasure ~ decorations, data = capture))
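The regression line is distorted by outliers. A group of ships with very high treasure values, apparently concentrated at low decoration counts (the group mean for decorations = 1 is 3175.47, versus roughly 1765 to 2006 at every other level), pulls the fitted line downward.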
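One way to see where such outliers sit is to cross-tabulate the predictor against a flag for high treasure. The sketch below uses simulated data in which I deliberately planted high-treasure ships at `decorations == 1`; that placement and the data frame `ships` are assumptions for illustration, not a finding about the capture data.

```r
# Simulated stand-in data (hypothetical values) with a positive true slope
set.seed(3)
n <- 500
ships <- data.frame(decorations = sample(1:10, n, replace = TRUE))
ships$treasure <- 1500 + 30 * ships$decorations + rnorm(n, sd = 50)

# Plant outliers: add 3000 to half of the ships with decorations == 1
ones    <- which(ships$decorations == 1)
boosted <- ones[seq_len(length(ones) %/% 2)]
ships$treasure[boosted] <- ships$treasure[boosted] + 3000

# Count ships at or above the 3500 cutoff for each decoration level
table(decorations = ships$decorations, high = ships$treasure >= 3500)
```

Run on the real data, a table like this would show directly whether the high-treasure ships cluster at one decoration level.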
Repeat your regression analysis from Question 5 again, but ONLY include ships with treasure less than 3500. Save the object as treasure.lt3500.model
treasure.lt3500 <- subset(capture, treasure < 3500)
treasure.lt3500.model <- lm(treasure ~ ., data = treasure.lt3500)
summary(treasure.lt3500.model)
##
## Call:
## lm(formula = treasure ~ ., data = treasure.lt3500)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.703 -1.926 2.320 5.420 8.845
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.046e+01 3.844e+00 -2.722 0.00662 **
## size 2.000e+01 6.540e-02 305.746 < 2e-16 ***
## cannons 1.999e+01 1.405e-02 1422.085 < 2e-16 ***
## stylemodern 6.147e+00 9.457e-01 6.500 1.32e-10 ***
## warnshot 1.001e+02 6.702e-01 149.289 < 2e-16 ***
## date -6.736e-04 2.561e-03 -0.263 0.79258
## heardof 1.462e+01 6.046e-01 24.182 < 2e-16 ***
## decorations 3.183e+01 1.137e-01 279.890 < 2e-16 ***
## daysfromshore -1.000e+01 3.107e-02 -321.940 < 2e-16 ***
## speed 9.972e+00 9.173e-02 108.711 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.101 on 905 degrees of freedom
## Multiple R-squared: 0.9996, Adjusted R-squared: 0.9996
## F-statistic: 2.587e+05 on 9 and 905 DF, p-value: < 2.2e-16
plot(x = treasure.lt3500$decorations,
     y = treasure.lt3500$treasure)
abline(lm(treasure ~ decorations, data = treasure.lt3500))
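The relationship between decorations and treasure changed from negative to positive: the decorations coefficient went from -96.40 in the full model to 31.83 once ships with treasure of 3500 or more were excluded.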
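A sign flip like this can be reproduced on simulated data, which helps to show that a few extreme points are enough to cause it. In the sketch below the true slope is positive and a handful of planted outliers turn the full-data slope negative. The data frame `ships` and the planted outliers are assumptions for illustration only.

```r
# Simulated stand-in data (hypothetical values); true slope is +30
set.seed(3)
n <- 500
ships <- data.frame(decorations = sample(1:10, n, replace = TRUE))
ships$treasure <- 1500 + 30 * ships$decorations + rnorm(n, sd = 50)

# Plant outliers: add 3000 to half of the ships with decorations == 1
ones    <- which(ships$decorations == 1)
boosted <- ones[seq_len(length(ones) %/% 2)]
ships$treasure[boosted] <- ships$treasure[boosted] + 3000

# Fit the same model on all ships and on ships below the cutoff
full    <- lm(treasure ~ decorations, data = ships)
trimmed <- lm(treasure ~ decorations, data = subset(ships, treasure < 3500))

# Coefficients side by side
cbind(full = coef(full), trimmed = coef(trimmed))
```

Putting the two coefficient vectors next to each other with `cbind()` would work the same way for `treasure.model` and `treasure.lt3500.model`.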