library(VGAM)
## Loading required package: stats4
## Loading required package: splines
Load the data. You may need to change this to match your computer. If the data and this file are in the same folder then you can keep it like this.
dat <- read.csv("squid_pops_data_R_4.11.2019MB.csv", header = TRUE, stringsAsFactors = FALSE)
dat <- dat[1:23, ] # remove rows that just have NAs in them
Lets take care of the date thing now and convert it to Juliuan day.
dat$date_set <- as.Date(dat$date_set, format = "%m/%d/%y")
dat$date_julian <- format(dat$date_set, "%j")
Lets take a look.
dat
## Look at the structure - this tells you how R is thinking of the different elements ##
str(dat)
## 'data.frame': 23 obs. of 24 variables:
## $ site : chr "Blanquezal" "Point Garcia" "South Wadleigh" "South Fish Egg" ...
## $ lat : num 55.6 55.6 55.5 55.5 55.6 ...
## $ long : num -133 -133 -133 -133 -133 ...
## $ sea_otter_level : chr "high" "high" "high" "high" ...
## $ date_set : Date, format: "2020-06-12" "2020-06-12" ...
## $ time_set : int 1010 1101 830 935 1131 1215 822 930 730 1025 ...
## $ date_pulled : chr "6/13/2018" "6/13/2018" "6/13/2018" "6/18/2018" ...
## $ time_pulled : int 1009 1045 1151 745 858 1400 830 933 546 602 ...
## $ total.time : num 24 23.7 27.4 22.2 21.4 ...
## $ Avg_temp_set_F : int 50 50 50 58 58 58 63 63 54 54 ...
## $ Max_wind_set_mph: int 12 12 12 9 9 9 13 13 12 12 ...
## $ Precip_set_in : num 0 0 0 0.04 0.04 0.04 0 0 0 0 ...
## $ Avg_temp_pull : int 55 55 55 63 63 63 61 61 53 53 ...
## $ Max_wind_pull : int 9 9 9 13 13 13 14 14 13 13 ...
## $ Precip_pull : num 0.08 0.08 0.08 0 0 0 0 0 0 0 ...
## $ top_pop : int 26 24 25 24 25 25 23 25 25 25 ...
## $ bottom_pop : int 26 24 25 24 25 24 23 25 25 25 ...
## $ total_pops : int 26 24 25 24 25 25 23 25 25 25 ...
## $ full_bottoms : int 18 23 17 17 8 23 13 20 23 25 ...
## $ partial_bottoms : int 6 1 1 3 1 1 2 2 1 0 ...
## $ full_tops : int 1 1 8 0 0 6 3 2 1 1 ...
## $ partial_tops : int 0 1 0 0 0 7 0 3 4 7 ...
## $ n_fallen_over : int NA 1 NA 2 NA NA NA NA NA 1 ...
## $ date_julian : chr "164" "164" "164" "169" ...
dat$sea_otter_level <- ordered(dat$sea_otter_level, levels = c("high", "low"))
What is our response? What did we measure? We counted the number of top and bottom baits in three different states, present, absent, and partial. Given how are data are structured I think it will be easiest to make or response data matrix in a slightly differnet way. We will build a new data frame from scratch while calculating the proportions at the same time. NOTE I am not totally sure how you have coded things here so I am assuming that “full_tops” means the number of top baits missing.
We will also need to sepatate the responses of top and bottom baits. So we will be looking at the top and bottom baits in isolation/independently. Tops
## Calculate proportion of tops absent ##
top.absent <- dat$full_tops
top.partial <- dat$partial_tops
top.present <- (dat$top_pop - dat$full_tops - dat$partial_tops)
## Make a response data frame then a matrix ##
tops <- data.frame(top.absent, top.partial, top.present)
tops <- as.matrix(tops)
rowSums(tops)
## [1] 26 24 25 24 25 25 23 25 25 25 25 25 25 25 25 25 24 24 24 24 25 25 25
Bottoms now you try
## Calculate proportion of tops absent ##
## Make a response data frame call is bots ##
OK great we have our response data ‘tops’ and ‘bots’. Thats pretty neat! We also have or explanatory data in dat. Things like the date, temp, precip etc. Let to some stats.
Lets fit a model for the top baits. Remember that the output is relative to whatever is designated as the reference behavior. The reference behovior is the last is the response matrix, which is baits being present. This makes interpretaion easier, because everything is relative to nothing happening.
## The Model ##
mod1 <- vglm(tops ~ sea_otter_level, multinomial, data = dat)
summary(mod1)
##
## Call:
## vglm(formula = tops ~ sea_otter_level, family = multinomial,
## data = dat)
##
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## log(mu[,1]/mu[,3]) -2.155 -1.536 -0.9201 0.889 9.447
## log(mu[,2]/mu[,3]) -2.297 -1.775 -0.6216 1.417 6.873
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept):1 -1.66891 0.12471 -13.382 <2e-16 ***
## (Intercept):2 -1.58231 0.12030 -13.153 <2e-16 ***
## sea_otter_level.L:1 0.09764 0.17637 0.554 0.580
## sea_otter_level.L:2 0.10382 0.17013 0.610 0.542
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of linear predictors: 2
##
## Names of linear predictors: log(mu[,1]/mu[,3]), log(mu[,2]/mu[,3])
##
## Residual deviance: 254.4214 on 42 degrees of freedom
##
## Log-likelihood: -177.1813 on 42 degrees of freedom
##
## Number of iterations: 5
##
## No Hauck-Donner effect found in any of the estimates
##
## Reference group is level 3 of the response
## Coefficents ##
coef(mod1)
## (Intercept):1 (Intercept):2 sea_otter_level.L:1
## -1.66891261 -1.58230767 0.09764317
## sea_otter_level.L:2
## 0.10381882
exp(coef(mod1))
## (Intercept):1 (Intercept):2 sea_otter_level.L:1
## 0.1884519 0.2055003 1.1025693
## sea_otter_level.L:2
## 1.1093994
What do you think this means? Remember that in a multinomial setting everything is thought of as a probability of occurance relative to the reference level. Using the seal example as a reference can you figure it out? You may want to add some code here to help you decide. Hint: just looking at the raw data does it look like there is a sea otter effect?