Q0

What are the names of the columns in the capture dataframe? What are the first few rows of the dataframe?

names(capture)
##  [1] "size"          "cannons"       "style"         "warnshot"     
##  [5] "date"          "heardof"       "decorations"   "daysfromshore"
##  [9] "speed"         "treasure"
head(capture)
##   size cannons   style warnshot date heardof decorations daysfromshore
## 1   48      54 classic        0  172       1           8            28
## 2   51      56  modern        0   15       0           3             6
## 3   50      44  modern        0   63       0           3            23
## 4   54      54  modern        0  362       1           2            23
## 5   50      56  modern        0  183       1           2            12
## 6   51      48  modern        0  279       0           1             3
##   speed treasure
## 1    16     2175
## 2    29     2465
## 3    18     1925
## 4    19     2200
## 5    21     2290
## 6    24     2195

Q1

plot(x = capture$size,
     y = capture$treasure,
     xlab = "Size",
     ylab = "Treasure",
     main = "Regression between Size and Treasure")
abline(lm(treasure ~ size, data = capture))

plot(x = capture$cannons,
     y = capture$treasure,
     xlab = "Cannons",
     ylab = "Treasure",
     main = "Regression between Cannons and Treasure")
abline(lm(treasure ~ cannons, data = capture))

plot(x = capture$date,
     y = capture$treasure,
     xlab = "Date",
     ylab = "Treasure",
     main = "Regression between Date and Treasure")
abline(lm(treasure ~ date, data = capture))

plot(x = capture$decorations,
     y = capture$treasure,
     xlab = "Decorations",
     ylab = "Treasure",
     main = "Regression between Decorations and Treasure")
abline(lm(treasure ~ decorations, data = capture))

plot(x = capture$daysfromshore,
     y = capture$treasure,
     xlab = "Days from Shore",
     ylab = "Treasure",
     main = "Regression between Days from Shore and Treasure")
abline(lm(treasure ~ daysfromshore, data = capture))

plot(x = capture$speed,
     y = capture$treasure,
     xlab = "Speed",
     ylab = "Treasure",
     main = "Regression between Speed and Treasure")
abline(lm(treasure ~ speed, data = capture))

Q2

I am skipping Q2. I work on a Linux PC, and a problem with the package dependencies prevents me from installing the latest version of the yarrr package.

Q3

library(dplyr)  # provides %>%, group_by() and summarise()
capture %>%
  group_by(style) %>%
  summarise(
    treasure.mean = mean(treasure)
  )
## Source: local data frame [2 x 2]
## 
##     style treasure.mean
##    (fctr)         (dbl)
## 1 classic      2184.301
## 2  modern      2095.645
capture %>% 
  group_by(warnshot) %>%
  summarise(
    treasure.mean = mean(treasure)
  )
## Source: local data frame [2 x 2]
## 
##   warnshot treasure.mean
##      (int)         (dbl)
## 1        0      2085.940
## 2        1      2174.802
capture %>% 
  group_by(decorations) %>%
  summarise(
    treasure.mean = mean(treasure)
  )
## Source: local data frame [10 x 2]
## 
##    decorations treasure.mean
##          (int)         (dbl)
## 1            1      3175.472
## 2            2      1764.758
## 3            3      1865.688
## 4            4      1847.904
## 5            5      1881.486
## 6            6      1879.208
## 7            7      1954.426
## 8            8      1998.511
## 9            9      1962.857
## 10          10      2005.526

Q4

Using the formula notation above, conduct a correlation test between the number of cannons a ship has and its size. What is the p-value?

cor1 <- cor.test(~ cannons + size, data = capture)
apa(cor1)  # apa() comes from the yarrr package
## [1] "r = 0.03, t(998) = 0.91, p = 0.37 (2-tailed)"

Now do the same with linear regression. What is the p-value?

lm1 <- lm(cannons ~ size, data = capture)
summary(lm1)
## 
## Call:
## lm(formula = cannons ~ size, data = capture)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -34.549 -14.324  -0.324  12.498  63.414 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  26.3039     7.2645   3.621 0.000308 ***
## size          0.1309     0.1446   0.905 0.365679    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.92 on 998 degrees of freedom
## Multiple R-squared:  0.00082,    Adjusted R-squared:  -0.0001812 
## F-statistic: 0.819 on 1 and 998 DF,  p-value: 0.3657
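
The two p-values above agree (0.37 and 0.3657, up to rounding). In a regression with a single predictor, the t-test on the slope is the same test as the one on Pearson's r. The following is a minimal sketch of that equivalence; it uses a small simulated data frame called demo (a hypothetical stand-in, not the real capture data), so the numbers will differ from the output above.

```r
# Simulated stand-in data (hypothetical; NOT the real capture data frame)
set.seed(42)
demo <- data.frame(size = rnorm(100, mean = 50, sd = 5))
demo$cannons <- 26 + 0.13 * demo$size + rnorm(100, sd = 19)

# Correlation test in formula notation
cor.fit <- cor.test(~ cannons + size, data = demo)

# Simple linear regression of cannons on size
lm.fit <- lm(cannons ~ size, data = demo)
slope.p <- summary(lm.fit)$coefficients["size", "Pr(>|t|)"]

# Same t statistic on n - 2 degrees of freedom, hence the same p-value
all.equal(cor.fit$p.value, slope.p)

# The squared correlation equals the regression's R-squared
all.equal(unname(cor.fit$estimate)^2, summary(lm.fit)$r.squared)
```

Both all.equal() calls should return TRUE for any data set, which is why the two approaches in Q4 cannot disagree.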

Q5

Conduct a linear regression with treasure as the dependent variable, and with all other variables as independent variables. Save the object as treasure.model

treasure.model <- lm(treasure~., data = capture)

Using the summary() function, print the coefficients and main statistics of the regression

summary(treasure.model)
## 
## Call:
## lm(formula = treasure ~ ., data = capture)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -880.96 -443.16 -211.02   66.08 2427.97 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    749.8957   351.0514   2.136 0.032913 *  
## size            22.5203     5.9602   3.778 0.000167 ***
## cannons         19.3817     1.2932  14.987  < 2e-16 ***
## stylemodern   -165.0932    84.6314  -1.951 0.051371 .  
## warnshot        89.0164    61.0610   1.458 0.145205    
## date             0.1508     0.2313   0.652 0.514511    
## heardof         92.1270    54.7238   1.683 0.092595 .  
## decorations    -96.3998    10.0249  -9.616  < 2e-16 ***
## daysfromshore   -8.6119     2.8180  -3.056 0.002303 ** 
## speed            9.2639     8.3892   1.104 0.269750    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 771.4 on 990 degrees of freedom
## Multiple R-squared:  0.2661, Adjusted R-squared:  0.2594 
## F-statistic: 39.88 on 9 and 990 DF,  p-value: < 2.2e-16

What are your conclusions? Which variables are significantly related to treasure and in which direction (i.e.; positive or negative)?

The variables size, cannons, decorations, and daysfromshore are significantly related to treasure (p < .05). Size and cannons are positively related to treasure; decorations and daysfromshore are negatively related.

Which variables are NOT significantly related to treasure?

style (stylemodern), warnshot, date, heardof, and speed. Of these, style comes closest to the threshold (p = .051).
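
The coefficient table can also be filtered in code instead of being read by eye. The sketch below assumes a small simulated data frame called demo (hypothetical, not the real capture data) and a 0.05 threshold; it pulls the significant predictors and their direction out of summary().

```r
# Simulated stand-in data (hypothetical; NOT the real capture data frame)
set.seed(7)
n <- 200
demo <- data.frame(
  size    = rnorm(n, mean = 50, sd = 5),
  cannons = rnorm(n, mean = 50, sd = 10),
  date    = sample(1:365, n, replace = TRUE)
)
# Assumption for illustration: only size and cannons drive treasure
demo$treasure <- 20 * demo$size + 20 * demo$cannons + rnorm(n, sd = 50)

fit <- lm(treasure ~ ., data = demo)

# Coefficient table as a data frame, without the intercept row
coefs <- as.data.frame(summary(fit)$coefficients)
coefs <- coefs[rownames(coefs) != "(Intercept)", ]

# Keep predictors with p < .05 and label the sign of each estimate
sig <- coefs[coefs[["Pr(>|t|)"]] < 0.05, ]
result <- data.frame(
  term      = rownames(sig),
  direction = ifelse(sig$Estimate > 0, "positive", "negative"),
  p.value   = sig[["Pr(>|t|)"]]
)
result
```

Applied to treasure.model, the same steps should reproduce the lists given in the answers above.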

Q6

Now tell me again, what was your conclusion about the relationship between decorations and treasure?

The relationship between decorations and treasure is significantly negative (b = -96.40, p < .001). The more decorations a ship has, the less treasure it carries, so the less it is worth capturing.

Ok, now plot the relationship between decorations and treasure again. Do you see anything strange?

plot(
  x = capture$decorations,
  y = capture$treasure
)
abline(lm(treasure ~ decorations, data = capture))

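The regression line is distorted by outliers: a group of ships with very high treasure values tilts the fitted line downward. The group means in Q3 point the same way, since ships with one decoration average about 3175 in treasure, while every other group averages between about 1765 and 2006.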
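
One way to check where such outliers sit is to tabulate them by decoration level. This is a minimal sketch on simulated data: the demo data frame and the rule that one-decoration ships carry an extra 3000 in treasure are assumptions made for illustration, not properties of the real capture data.

```r
# Simulated stand-in data (hypothetical; NOT the real capture data frame)
set.seed(3)
n <- 300
demo <- data.frame(decorations = sample(1:10, n, replace = TRUE))
demo$treasure <- 1500 + 30 * demo$decorations + rnorm(n, sd = 20)

# Assumption for illustration: ships with one decoration carry a large bonus
rich <- demo$decorations == 1
demo$treasure[rich] <- demo$treasure[rich] + 3000

# Mean treasure per decoration level
group.means <- tapply(demo$treasure, demo$decorations, mean)
round(group.means)

# Count ships at or above the 3500 cutoff within each decoration level
outlier.counts <- table(decorations = demo$decorations,
                        outlier = demo$treasure >= 3500)
outlier.counts
```

Running the same two summaries on capture would show whether the high-treasure ships are concentrated in one decoration group.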

Repeat your regression analysis from Question 5 again, but ONLY include ships with treasure less than 3500. Save the object as treasure.lt3500.model

treasure.lt3500 <- subset(capture, treasure < 3500)
treasure.lt3500.model <- lm(treasure ~ ., data = treasure.lt3500)
summary(treasure.lt3500.model)
## 
## Call:
## lm(formula = treasure ~ ., data = treasure.lt3500)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -21.703  -1.926   2.320   5.420   8.845 
## 
## Coefficients:
##                 Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)   -1.046e+01  3.844e+00   -2.722  0.00662 ** 
## size           2.000e+01  6.540e-02  305.746  < 2e-16 ***
## cannons        1.999e+01  1.405e-02 1422.085  < 2e-16 ***
## stylemodern    6.147e+00  9.457e-01    6.500 1.32e-10 ***
## warnshot       1.001e+02  6.702e-01  149.289  < 2e-16 ***
## date          -6.736e-04  2.561e-03   -0.263  0.79258    
## heardof        1.462e+01  6.046e-01   24.182  < 2e-16 ***
## decorations    3.183e+01  1.137e-01  279.890  < 2e-16 ***
## daysfromshore -1.000e+01  3.107e-02 -321.940  < 2e-16 ***
## speed          9.972e+00  9.173e-02  108.711  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.101 on 905 degrees of freedom
## Multiple R-squared:  0.9996, Adjusted R-squared:  0.9996 
## F-statistic: 2.587e+05 on 9 and 905 DF,  p-value: < 2.2e-16
plot(x = treasure.lt3500$decorations, y = treasure.lt3500$treasure)
abline(lm(treasure ~ decorations, data = treasure.lt3500))

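After excluding ships with treasure of 3500 or more, the relationship between decorations and treasure changes from negative (b = -96.40) to positive (b = 31.83). The model fit also improves sharply: R-squared rises from 0.27 to nearly 1.00, and every predictor except date is now significant.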
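
The sign change can be put side by side by fitting the same formula to the full data and to the subset. The sketch below again uses a simulated data frame called demo with an assumed bonus for one-decoration ships (both hypothetical), so only the pattern, not the numbers, carries over to capture.

```r
# Simulated stand-in data (hypothetical; NOT the real capture data frame)
set.seed(11)
n <- 300
demo <- data.frame(decorations = sample(1:10, n, replace = TRUE))
demo$treasure <- 1500 + 30 * demo$decorations + rnorm(n, sd = 20)

# Assumption for illustration: ships with one decoration carry a large bonus
rich <- demo$decorations == 1
demo$treasure[rich] <- demo$treasure[rich] + 3000

# Same model on all ships and on ships below the 3500 cutoff
fits <- list(
  all.ships = lm(treasure ~ decorations, data = demo),
  lt3500    = lm(treasure ~ decorations, data = subset(demo, treasure < 3500))
)

# One column per model: slope for decorations and R-squared
comparison <- sapply(fits, function(fit) {
  c(slope     = unname(coef(fit)["decorations"]),
    r.squared = summary(fit)$r.squared)
})
round(comparison, 3)
```

In this simulation the slope is negative for all ships and positive below the cutoff, which mirrors the change between treasure.model and treasure.lt3500.model.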