WPA7


Download the data:

capture <- read.table("http://nathanieldphillips.com/wp-content/uploads/2015/12/capture.txt")

```

What are the names of the columns in the capture dataframe?

names(capture)

##  [1] "size"          "cannons"       "style"         "warnshot"     
##  [5] "date"          "heardof"       "decorations"   "daysfromshore"
##  [9] "speed"         "treasure"

What are the first few rows of the dataframe?

head(capture)

##   size cannons   style warnshot date heardof decorations daysfromshore
## 1   48      54 classic        0  172       1           8            28
## 2   51      56  modern        0   15       0           3             6
## 3   50      44  modern        0   63       0           3            23
## 4   54      54  modern        0  362       1           2            23
## 5   50      56  modern        0  183       1           2            12
## 6   51      48  modern        0  279       0           1             3
##   speed treasure
## 1    16     2175
## 2    29     2465
## 3    18     1925
## 4    19     2200
## 5    21     2290
## 6    24     2195

Plot the relationship between each of the following continuous independent variables and treasure. For each plot, add axis and plot labels and a regression line showing the relationship between the independent and dependent variables.

A size

size.treasure <- lm(formula = treasure ~ size, data = capture)
plot(x = capture$size,
     y = capture$treasure,
     xlab = "size",
     ylab = "treasure",
     main = "Relationship between treasure and size")
abline(size.treasure)

B cannons

cannons.treasure <- lm(treasure ~ cannons, data = capture)
plot(x = capture$cannons,
     y = capture$treasure,
     xlab = "cannons",
     ylab = "treasure",
     main = "Relationship between treasure and cannons")
abline(cannons.treasure)

C date

date.treasure <- lm(treasure ~ date, data = capture)
plot(x = capture$date,
     y = capture$treasure,
     xlab = "date",
     ylab = "treasure",
     main = "Relationship between treasure and date")
abline(date.treasure)

D decorations

decorations.treasure <- lm(treasure ~ decorations, data = capture)
plot(x = capture$decorations,
     y = capture$treasure,
     xlab = "decorations",
     ylab = "treasure",
     main = "Relationship between treasure and decorations")
abline(decorations.treasure)

E daysfromshore

daysfromshore.treasure <- lm(treasure ~ daysfromshore, data = capture)
plot(x = capture$daysfromshore,
     y = capture$treasure,
     xlab = "daysfromshore",
     ylab = "treasure",
     main = "Relationship between treasure and days from shore")
abline(daysfromshore.treasure)

F speed

speed.treasure <- lm(treasure ~ speed, data = capture)
plot(x = capture$speed,
     y = capture$treasure,
     xlab = "speed",
     ylab = "treasure",
     main = "Relationship between treasure and speed")
abline(speed.treasure)

Q2 Now do the same for the following categorical independent variables and treasure (hint: try using the new pirateplot() function in the yarrr package! Look at how it works by running ?pirateplot). Again, add appropriate labels and a regression line in each plot.

A style:

capture$style[capture$style == "modern"] <- 0

capture$style[capture$style == "classic"] <- 1

treasure.style.glm <- glm(style ~ treasure, data = capture, family = "binomial")

I really don’t know what’s wrong: it doesn’t convert my style variable into a binomial one, and I don’t know what to do. As I am missing class today, I can’t ask you in person, so I wrote my ideas in the comments.
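
# A possible reason the conversion fails (an assumption, since no error message is shown):
# if style was read in as a factor, assigning 0 or 1 to it produces NA, because 0 and 1
# are not existing factor levels. A minimal sketch of one workaround builds a new 0/1
# column from a logical comparison instead of overwriting style in place.
# The data frame toy and the column style.bin are hypothetical stand-ins for illustration;
# with the real data the same lines would use capture.
toy <- data.frame(
  style = factor(c("modern", "classic", "modern", "classic", "modern", "classic")),
  treasure = c(2465, 2175, 1925, 2600, 2290, 2100)
)

# TRUE/FALSE becomes 1/0, so classic = 1 and modern = 0
toy$style.bin <- as.integer(toy$style == "classic")

# The new numeric column can then serve as the binomial outcome
toy.style.glm <- glm(style.bin ~ treasure, data = toy, family = "binomial")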

B warnshot

treasure.warnshot.glm <- glm(warnshot ~ treasure,
                             data = capture,
                             family = "binomial")

capture.warnshot <- capture$warnshot == "1"

capture$treasure.cut <- cut(capture$treasure,
                            breaks = seq(1000, 5000, 1000))

probs <- aggregate(capture.warnshot ~ treasure.cut,
                   data = capture, FUN = mean)

plot(probs, xlab = "treasure (grouped)",
     ylab = "p(ship fired a warnshot)",
     main = "Probability that a ship fired a warnshot given its amount of treasure")

C heardof

treasure.heardof.glm <- glm(heardof ~ treasure,
                            data = capture,
                            family = "binomial")

capture.heardof <- capture$heardof == "1"

capture$treasure.cut <- cut(capture$treasure,
                            breaks = seq(1000, 5000, 1000))

probs <- aggregate(capture.heardof ~ treasure.cut,
                   data = capture, FUN = mean)

plot(probs, xlab = "treasure (grouped)",
     ylab = "p(ship was heard of)",
     main = "Probability that you have heard of a ship given its amount of treasure")

```

### Q3 For each of the following variables (separately), calculate the median amount of treasure earned for each level of the IV: style, warnshot, decorations (hint: use aggregate or dplyr!)

aggregate(treasure ~ capture.heardof, data = capture, FUN = median)

##   capture.heardof treasure
## 1           FALSE     1875
## 2            TRUE     1940

aggregate(treasure ~ capture.warnshot, data = capture, FUN = median)

##   capture.warnshot treasure
## 1            FALSE     1885
## 2             TRUE     1945

aggregate(treasure ~ decorations, data = capture, FUN = median)

##    decorations treasure
## 1            1   2657.5
## 2            2   1780.0
## 3            3   1905.0
## 4            4   1797.5
## 5            5   1880.0
## 6            6   1855.0
## 7            7   1920.0
## 8            8   1935.0
## 9            9   1935.0
## 10          10   1955.0
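
Q3 also lists style among the variables. A minimal sketch of the same aggregate() call grouped by style, run on a small hypothetical data frame (toy and its values are made up for illustration; with the real data the call would use data = capture):

```r
# Hypothetical stand-in for capture, for illustration only
toy <- data.frame(
  style = c("modern", "classic", "modern", "classic", "modern"),
  treasure = c(2000, 1500, 2400, 1700, 2200)
)

# Median treasure for each level of style
style.median <- aggregate(treasure ~ style, data = toy, FUN = median)
style.median
```

Because the grouping column here holds the labels directly, no helper vector such as capture.heardof is needed.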

The formula notation for conducting a correlation test with cor.test() is a bit different from regular formula notation. Instead of dv ~ iv, you use ~ dv + iv. For example, the following code will test the correlation between chickens’ age and weight using the ChickWeight dataset.

cor.test(~ Time + weight, 
         data = ChickWeight)

## 
##  Pearson's product-moment correlation
## 
## data:  Time and weight
## t = 36.725, df = 576, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8109073 0.8599481
## sample estimates:
##       cor 
## 0.8371017

Using the formula notation above, conduct a correlation test between the number of cannons a ship has and its size. What is the p-value?

cor.test(~cannons + size, data = capture)

## 
##  Pearson's product-moment correlation
## 
## data:  cannons and size
## t = 0.90501, df = 998, p-value = 0.3657
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.03341657  0.09046832
## sample estimates:
##        cor 
## 0.02863584

The p-value is 0.3657.

Now do the same with linear regression. What is the p-value?

cannon.lm <- lm(cannons ~ size, data = capture)
summary(cannon.lm)

## 
## Call:
## lm(formula = cannons ~ size, data = capture)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -34.549 -14.324  -0.324  12.498  63.414 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  26.3039     7.2645   3.621 0.000308 ***
## size          0.1309     0.1446   0.905 0.365679    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.92 on 998 degrees of freedom
## Multiple R-squared:  0.00082,    Adjusted R-squared:  -0.0001812 
## F-statistic: 0.819 on 1 and 998 DF,  p-value: 0.3657

The p-value is the same (0.3657).
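
The match is expected: with a single predictor, the t-test on the regression slope is the same test as the Pearson correlation test. A quick check of this, sketched on the built-in ChickWeight data used earlier rather than on capture (the object names ct, fit, and slope are illustrative):

```r
# Correlation test and simple regression on the same two variables
ct <- cor.test(~ Time + weight, data = ChickWeight)
fit <- lm(weight ~ Time, data = ChickWeight)

# Row of the coefficient table that belongs to the slope
slope <- summary(fit)$coefficients["Time", ]

# Both comparisons should return TRUE
all.equal(unname(ct$statistic), unname(slope["t value"]))
all.equal(ct$p.value, unname(slope["Pr(>|t|)"]))
```

The same comparison should hold for cannons and size, since swapping which variable is the predictor does not change the correlation.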

Q5 Conduct a linear regression with treasure as the dependent variable, and with all other variables as independent variables. Save the object as treasure.model.

treasure.model <- lm(treasure ~ size + cannons + warnshot + date + heardof +
                       decorations + daysfromshore + speed,
                     data = capture)
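
When a model should use every remaining column as a predictor, the formula shorthand `dv ~ .` avoids typing each name. A hedged sketch on a subset of the built-in mtcars data (the subset and object names are illustrative, not part of the assignment). Note that `.` picks up every column in the data frame, so helper columns such as treasure.cut would need to be dropped before using it on capture:

```r
# Keep only the outcome and the predictors of interest
cars.sub <- mtcars[, c("mpg", "wt", "hp", "qsec")]

# "." stands for all columns other than the outcome
fit.all <- lm(mpg ~ ., data = cars.sub)
fit.named <- lm(mpg ~ wt + hp + qsec, data = cars.sub)

# The two specifications should give identical coefficients
all.equal(coef(fit.all), coef(fit.named))
```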

Using the summary() function, print the coefficients and main statistics of the regression

summary(treasure.model)

## 
## Call:
## lm(formula = treasure ~ size + cannons + warnshot + date + heardof + 
##     decorations + daysfromshore + speed, data = capture)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -760.81 -451.21 -211.49   58.56 2423.42 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   604.0837   343.4865   1.759 0.078940 .  
## size           22.5769     5.9685   3.783 0.000164 ***
## cannons        19.3464     1.2949  14.940  < 2e-16 ***
## warnshot       82.4122    61.0533   1.350 0.177375    
## date            0.1328     0.2315   0.574 0.566326    
## heardof        98.4881    54.7038   1.800 0.072104 .  
## decorations   -95.0941    10.0166  -9.494  < 2e-16 ***
## daysfromshore  -8.4173     2.8202  -2.985 0.002909 ** 
## speed           8.7189     8.3965   1.038 0.299338    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 772.5 on 991 degrees of freedom
## Multiple R-squared:  0.2633, Adjusted R-squared:  0.2573 
## F-statistic: 44.27 on 8 and 991 DF,  p-value: < 2.2e-16

As the results show, larger ships and ships with more cannons tend to carry more treasure, whereas ships with more decorations or more days from shore tend to carry less.

warnshot, date, heardof, and speed are not significantly related to treasure at the 0.05 level.
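
Reading significance off the printed table is easy to get wrong when there are many predictors. One way to list them programmatically, sketched on the built-in mtcars data with a conventional 0.05 cutoff (the data set, the cutoff, and the object names are assumptions for illustration):

```r
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Coefficient table: one row per term, p-values in the last column
coefs <- summary(fit)$coefficients
p.values <- coefs[-1, "Pr(>|t|)"]  # drop the intercept row

# Predictors below and above the cutoff
names(p.values)[p.values < 0.05]
names(p.values)[p.values >= 0.05]
```

Applied to the model above, the same lines would start from summary(treasure.model)$coefficients.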
