We want to know whether response category, event duration, the number of photos per event, and group size measure the same thing or different things, for white-tailed deer, caribou and moose.
First, load the data and inspect its structure.
# load the data
behav <- read.csv("data/Ungulate_behaviour_event_30min.csv", stringsAsFactors = FALSE, header = TRUE)
str(behav)
## 'data.frame': 2137 obs. of 11 variables:
## $ Species : chr "Odocoileus virginianus" "Odocoileus virginianus" "Odocoileus virginianus" "Odocoileus virginianus" ...
## $ Deployment.Location.ID: chr "Algar01" "Algar01" "Algar01" "Algar06" ...
## $ Date_Time.Captured : chr "2015-11-22 02:54:47" "2015-11-28 04:02:54" "2016-09-17 17:23:34" "2019-06-08 07:23:08" ...
## $ Event.ID : chr "E0" "E1" "E10" "E1003" ...
## $ Event.Observations : int 2 9 12 3 7 4 11 14 6 2 ...
## $ Event.Duration : int 1 49 30 4 10 4 18 23 9 2 ...
## $ Event.Groupsize : int 1 1 1 1 1 1 1 1 1 1 ...
## $ mean : num 0 1 1 0 0 0 0 1 0 0 ...
## $ mode : int 0 1 1 0 0 0 0 1 0 0 ...
## $ Secure : logi FALSE TRUE TRUE FALSE FALSE FALSE ...
## $ Season : chr "Winter" "Winter" "Summer" "Summer" ...
Then, take a look at the distribution of each variable.
# Recode Secure from TRUE/FALSE to 1/0 so it can be plotted as a histogram
behav$Secure <- as.integer(behav$Secure)
# Distribution of variables
par(mfrow=c(2,3))
hist(behav$Event.Observations)
hist(behav$Event.Duration)
hist(behav$Event.Groupsize)
hist(behav$mean)
hist(behav$mode)
hist(behav$Secure)
Event Observations, Event Duration, and group size are right-skewed. mode and Secure are binary (0/1) and look identical. mean is not strictly binary but is very close to it, which means ungulates rarely display a second behaviour within one event. So let’s log-transform the skewed variables:
# Distribution on the log scale
par(mfrow=c(1,3))
hist(log(behav$Event.Observations), main="Histogram for log Observation")
hist(log(behav$Event.Duration), main="Histogram for log Duration")
hist(log(behav$Event.Groupsize), main="Histogram for log Group Size")
This looks much better for Event Observations and Event Duration, but not for group size: most events have only 1 individual.
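The histograms only give a visual impression. A minimal sketch of how the two claims (mode and Secure look identical, and most events have one individual) could be checked numerically is given here. The data frame `toy` and its values are invented for illustration; the same calls should work on `behav`.

```r
# Toy stand-in for behav; values are invented for illustration only
toy <- data.frame(
  Event.Groupsize = c(1, 1, 1, 2, 1, 3, 1, 1),
  mode            = c(0, 1, 1, 0, 0, 0, 1, 0),
  Secure          = c(0, 1, 1, 0, 0, 0, 1, 0)
)

# Cross-tabulate mode against Secure: if the two codings agree,
# every count falls on the diagonal of the table
agreement <- table(mode = toy$mode, Secure = toy$Secure)
print(agreement)
same_coding <- all(toy$mode == toy$Secure)

# Share of events that contain exactly one individual
prop_single <- mean(toy$Event.Groupsize == 1)
print(prop_single)
```

If `same_coding` is TRUE on the real data, only one of the two variables needs to be carried forward.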
# Log-transform Event Observations and Event Duration
behav$log.Observations <- log(behav$Event.Observations)
# Durations can be 0, so apply log(x + 1) instead of log(x)
# Note: this overwrites Event.Duration, so the models below use duration + 1
behav$Event.Duration <- behav$Event.Duration + 1
behav$log.Duration <- log(behav$Event.Duration)
# Run linear regression
summary(lm(log.Duration ~ log.Observations, data = behav))
##
## Call:
## lm(formula = log.Duration ~ log.Observations, data = behav)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.5177 -0.5447 -0.2155 0.1354 5.6669
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.08377 0.04031 2.078 0.0378 *
## log.Observations 1.50573 0.02033 74.066 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8826 on 2135 degrees of freedom
## Multiple R-squared: 0.7198, Adjusted R-squared: 0.7197
## F-statistic: 5486 on 1 and 2135 DF, p-value: < 2.2e-16
visreg::visreg(lm(log.Duration ~ log.Observations, data = behav))
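Two details of this step may be worth spelling out. Base R's `log1p(x)` computes log(x + 1) directly, so the original column does not have to be overwritten, and for a regression with a single predictor the R² equals the squared Pearson correlation (so an R² of 0.72 corresponds to a correlation of about 0.85). The sketch below illustrates both on simulated data; `obs` and `dur` are made-up variables, not the camera-trap data.

```r
set.seed(1)
# Simulated counts and durations; not the camera-trap data
obs <- rpois(200, lambda = 5) + 1        # at least one photo per event
dur <- rpois(200, lambda = 3 * obs)      # durations, may contain zeros

log_obs <- log(obs)
log_dur <- log1p(dur)                    # same as log(dur + 1); dur is left untouched

fit <- lm(log_dur ~ log_obs)
r2_from_lm  <- summary(fit)$r.squared
r2_from_cor <- cor(log_dur, log_obs)^2   # equals R-squared when there is one predictor

print(c(lm = r2_from_lm, cor = r2_from_cor))
```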
# Log-transform group size
behav$log.group <- log(behav$Event.Groupsize)
summary(lm(log.Observations ~ log.group, data = behav))
##
## Call:
## lm(formula = log.Observations ~ log.group, data = behav)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.6502 -0.5319 -0.0211 0.4489 3.7401
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.63052 0.02164 75.36 <2e-16 ***
## log.group 0.73557 0.05816 12.65 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9062 on 2135 degrees of freedom
## Multiple R-squared: 0.06969, Adjusted R-squared: 0.06925
## F-statistic: 159.9 on 1 and 2135 DF, p-value: < 2.2e-16
visreg::visreg(lm(log.Observations ~ log.group, data = behav))
summary(lm(Event.Duration ~ log.group, data = behav))
##
## Call:
## lm(formula = Event.Duration ~ log.group, data = behav)
##
## Residuals:
## Min 1Q Median 3Q Max
## -209.34 -60.44 -56.44 -36.44 2750.56
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 63.443 5.433 11.677 < 2e-16 ***
## log.group 105.966 14.606 7.255 5.59e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 227.6 on 2135 degrees of freedom
## Multiple R-squared: 0.02406, Adjusted R-squared: 0.0236
## F-statistic: 52.64 on 1 and 2135 DF, p-value: 5.591e-13
visreg::visreg(lm(Event.Duration ~ log.group, data = behav))
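Log duration and log observations are strongly related (R² = 0.72). Both group-size models are statistically significant (P < 0.001), but both have very low R² (0.07 and 0.02). This suggests that the event measures (Event Observations and Event Duration) and Event Group Size are probably not measuring the same thing.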
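Because all three variables are right-skewed, a rank-based (Spearman) correlation is one alternative way to compare them that does not depend on the log transformation. This is not what was done above; it is only a sketch on simulated data, and `sim` is a hypothetical stand-in for `behav`.

```r
set.seed(42)
n <- 300
# Simulated stand-in for behav: most events have a single individual
sim <- data.frame(
  Event.Groupsize = sample(1:3, n, replace = TRUE, prob = c(0.8, 0.15, 0.05))
)
sim$Event.Observations <- rpois(n, lambda = 4 * sim$Event.Groupsize) + 1
sim$Event.Duration     <- rpois(n, lambda = 10 * sim$Event.Observations)

# Spearman correlation works on ranks, so it is unchanged by
# monotone transformations such as log() or log1p()
rho <- cor(sim, method = "spearman")
rho_log <- cor(log(sim$Event.Observations), log1p(sim$Event.Duration),
               method = "spearman")
print(round(rho, 2))
```

A correlation close to 1 would suggest two variables carry largely the same information; values near 0 suggest they do not.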
I tried a box plot, as Chris suggested, using Secure and Event Duration, since mean isn’t really categorical.
# Convert Secure to factor
behav$Secure <- as.factor(behav$Secure)
# Box plot
boxplot(behav$Event.Duration ~ behav$Secure)
It looks like there is a slight difference in Event Duration when comparing secure and non-secure behaviour.
So let’s fit the linear model:
summary(lm(Event.Duration ~ Secure, data = behav))
##
## Call:
## lm(formula = Event.Duration ~ Secure, data = behav)
##
## Residuals:
## Min 1Q Median 3Q Max
## -157.26 -52.42 -48.42 -32.42 2655.74
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 54.422 5.635 9.659 <2e-16 ***
## Secure1 103.834 11.325 9.169 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 225.9 on 2135 degrees of freedom
## Multiple R-squared: 0.03788, Adjusted R-squared: 0.03743
## F-statistic: 84.06 on 1 and 2135 DF, p-value: < 2.2e-16
visreg::visreg(lm(Event.Duration ~ Secure, data = behav))
The difference is statistically significant (P < 0.001), but Secure explains only about 4% of the variance in Event Duration (R² = 0.04).
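For a model with a single two-level factor, the intercept is the mean of the reference group and the `Secure1` coefficient is the difference between the two group means. Read that way, the output above says non-secure events average about 54.4 and secure events about 54.4 + 103.8 = 158.3, on the shifted scale (1 was added to Event.Duration earlier). The sketch below checks this identity on simulated data; `secure` and `duration` are invented values.

```r
set.seed(7)
# Simulated skewed durations for two behaviour groups; values are invented
secure   <- factor(rep(c(0, 1), times = c(150, 50)))
duration <- rexp(200, rate = ifelse(secure == 1, 1 / 150, 1 / 50))

fit <- lm(duration ~ secure)
group_means <- tapply(duration, secure, mean)

intercept <- coef(fit)[["(Intercept)"]]  # mean of the secure == 0 group
slope     <- coef(fit)[["secure1"]]      # mean(secure == 1) - mean(secure == 0)

print(group_means)
print(c(intercept = intercept, slope = slope))
```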
Since Event Duration and Event Observations are correlated, we can also expect Event Observations to differ between secure and non-secure behaviour.
# Box plot
boxplot(behav$Event.Observations ~ behav$Secure)
summary(lm(Event.Observations ~ Secure, data = behav))
##
## Call:
## lm(formula = Event.Observations ~ Secure, data = behav)
##
## Residuals:
## Min 1Q Median 3Q Max
## -20.142 -3.691 -1.691 1.309 193.858
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.6909 0.3054 18.63 <2e-16 ***
## Secure1 15.4509 0.6139 25.17 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.25 on 2135 degrees of freedom
## Multiple R-squared: 0.2288, Adjusted R-squared: 0.2285
## F-statistic: 633.5 on 1 and 2135 DF, p-value: < 2.2e-16
visreg::visreg(lm(Event.Observations ~ Secure, data = behav))
Secure explains much more of the variation in Event Observations than in Event Duration: Estimate = 15.45, R² = 0.23 (versus R² = 0.04 for Event Duration).
We can check the histograms visually:
I found three noticeable patterns just by looking at the histograms. Further examination is needed.
Finally, the relationship between each event measure and Secure behaviour was checked for each species:
| Event measure | Behaviour | Species | Estimate | P | R² |
|---|---|---|---|---|---|
| Event Observations | Secure | White-tailed deer | 12.70 | < 0.01 | 0.20 |
| Event Observations | Secure | Moose | 27.27 | < 0.01 | 0.34 |
| Event Observations | Secure | Caribou | 12.34 | < 0.01 | 0.25 |
| Event Duration | Secure | White-tailed deer | 122.72 | < 0.01 | 0.04 |
| Event Duration | Secure | Moose | 111.00 | < 0.01 | 0.05 |
| Event Duration | Secure | Caribou | 63.56 | < 0.01 | 0.02 |
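The code that fitted the per-species models (the objects `z1` to `z6` used in the plots) is not shown. One way such fits and a summary table could be produced is sketched below on simulated data. The helper `fit_by_species`, the data frame `sim`, and the species labels for moose and caribou are assumptions; only "Odocoileus virginianus" appears in the data preview above.

```r
set.seed(3)
# Simulated stand-in for behav; moose and caribou labels are assumed
sim <- data.frame(
  Species = rep(c("Odocoileus virginianus", "Alces alces", "Rangifer tarandus"),
                each = 60),
  Secure  = factor(rep(rep(c(0, 1), times = c(40, 20)), times = 3))
)
sim$Event.Observations <- rpois(180, lambda = ifelse(sim$Secure == 1, 18, 6)) + 1

# Hypothetical helper: fit response ~ Secure within each species and
# collect the estimate, p-value and R-squared in one row per species
fit_by_species <- function(data, response) {
  rows <- lapply(split(data, data$Species), function(d) {
    s <- summary(lm(reformulate("Secure", response = response), data = d))
    data.frame(Species  = d$Species[1],
               Estimate = s$coefficients["Secure1", "Estimate"],
               P        = s$coefficients["Secure1", "Pr(>|t|)"],
               R.square = s$r.squared)
  })
  do.call(rbind, rows)
}

res <- fit_by_species(sim, "Event.Observations")
print(res)
```

Calling the helper a second time with `"Event.Duration"` as the response would give the remaining rows, provided that column exists in the data.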
par(mfrow=c(1,3))
visreg::visreg(z1, main="Deer")
visreg::visreg(z2, main="Moose")
visreg::visreg(z3, main="Caribou")
par(mfrow=c(1,3))
visreg::visreg(z4, main="Deer")
visreg::visreg(z5, main="Moose")
visreg::visreg(z6, main="Caribou")
The take-home messages are: