set.seed(1)
require(forecast)
library(tscount)
library(ggplot2)
library(parameters)
library(tidyverse)
library(nlme)
library(scales)
library(ggforce)
library(kableExtra)

1 Spreading the word of temporal autocorrelation

  • Events that occur in close succession tend to be more similar than events separated by a long interval. The extent to which a measurement is determined by previous events is called temporal autocorrelation.

  • Long-term ecological studies, which collect data repeatedly over long periods, are especially likely to have to deal with biases caused by temporal autocorrelation.

Despite my limited knowledge of time series analysis, I find this topic particularly interesting because it involves nuances that cannot be fully captured by simple generalized linear models. So, let’s explore how we can handle them.

In this post, I will demonstrate how to use the tscount package and compare its performance with a simpler autoregressive Generalized Linear Model (GLM). In case you are not familiar with GLMs, please check here or here first. Because we need some background on trends, moving averages, and what a time series looks like, I’ll first introduce these concepts and then we will start playing around with the tscount package. If you’re already familiar with these concepts, feel free to skip ahead to the Autoregressive GLMs or Tscount package, here we go! sections.

To make our script more readable I’ll make heavy use of the pipe operator %>% from the tidyverse package. If you are not familiar with pipes, please check here and here.
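
Just as a quick illustration (a minimal sketch using the built-in iris dataset, which is not part of our analysis), the pipe passes the object on its left as the first argument of the function on its right, so these two lines do exactly the same thing:

# Without the pipe
head(iris)
# With the pipe: iris is passed as the first argument of head()
iris %>% head()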

2 Digging deep:

2.1 What temporally autocorrelated data looks like

The name sounds intuitive. A time series is any variable measured repeatedly over time. The particular structure in these data is that the observation on a given day (let’s say today) is the result of multiple processes that happened back in time (yesterday, a week ago…) but still have an important effect on the current conditions.

Temperature is an excellent example to consider here. Suppose you are reading this on a heavily rainy Friday, the last day before your vacation starts. You would kill to rest on a beach during your first weekend off. Given that you didn’t check the forecast and need to decide whether it is worth preparing for a short trip, how excited should you get? Unless the weather forecast says otherwise, there is a good chance you will end up postponing the trip. That is because you know that, unless something changes, the weather on Saturday is likely to reflect the heavy rain you are experiencing on Friday. Saturday is even more likely to be rainy if, besides Friday, the previous day (Thursday) was also rainy. It is worse still if it has been raining for the whole week before the weekend of your planned trip. And if you are in the wet season, it is even more likely that you should drop any expectations about a weekend trip after a rainy Friday, and so on.

I hope the explanation above sounds reasonable to you, because this is exactly the kind of challenge we need to overcome when handling temporally correlated data in time series. So, let’s explore time series a little bit. Things will get a bit more complicated, but I promise I’ll try to make it as palatable as possible! If you want to check other resources, I recommend here and here.

We will start by simulating a time series of a variable with a specific autocorrelation. I’ll explain each parameter in our simulation later.

# Set the seed for reproducibility
set.seed(123)

# Generate a random time series with an autoregressive component (ar) of 0.7 and a moving average of 0.5.
my_ar <- 0.70
my_ma <- 0.50
ts1 <- arima.sim(model = list(ar = my_ar, ma = my_ma), sd = 3, n = 365)

#Define dates
dates = seq(as.Date("01-01-2007",  format = "%d-%m-%Y"), length.out = 365, by = "day")

#Now, plot altogether
data.frame(Temperature=as.vector(ts1),dates=dates,Variable="Temperature")%>%
ggplot(., aes(y=Temperature, x=dates,color=Variable)) + 
  geom_line(aes(color=Variable)) +
  geom_point(shape=16)+
    labs(x="Date", y="Value",color=NULL)+
  scale_color_manual(values=c('black'))+
  scale_x_date(date_labels = "%b-%Y")+theme_classic(base_size =16)

2.2 Checking the autocorrelation

We can use the autocorrelation function acf() and the partial autocorrelation function pacf() to detect the extent to which an observation depends on the previous observations.

The acf() function checks how an observation is related to observations at different time-lags. At time-lag 0, the observation is compared with itself, so the correlation is always 1. As the time-lag increases to 1, 2, and 3, the observation is correlated with the observation 1, 2, or 3 steps before, respectively. Thus, if the series is positively autocorrelated the plot tends to look like a staircase. See the ACF plot of ts1 below.

On the other hand, the partial autocorrelation function pacf() decomposes the contribution of each previous observation. At time-lag 1 the ACF and PACF have the same value, but at time-lag 2 the PACF shows how strongly an observation is related to the observation two steps back after discounting the correlation already explained at time-lag 1.

This is why the PACF plot usually looks so different from the ACF plot, even though both represent how a given observation is determined by previous observations. Below, we provide a table comparing the ACF and PACF; we can see that the values are equal at time-lag 1 and differ thereafter.

par(mfrow=c(1,2))
resultACF<-acf(ts1,xlim=c(0,15),ylim=c(-.4,1),main="ACF Plot ts1")
resultPACF<-pacf(ts1,xlim=c(1,14),ylim=c(-.4,1),main="PACF Plot ts1")

data.frame(time.lag=c(0,seq(1,5,1)), 
           ACF=round(resultACF$acf[1:6,,1],2),
           PACF=c("NA",round(resultPACF$acf[1:5,,1],2)))%>%t()%>%kable(.,"html")%>%
  kable_styling(full_width = F)
time.lag 0 1 2 3 4 5
ACF 1.00 0.79 0.49 0.28 0.14 0.06
PACF NA 0.79 -0.36 0.14 -0.11 0.05

3 Modelling Time series

3.1 (S)ARIMA

Modelling time series is a broad field in statistics, and there is a huge toolbox to handle the unique nuances of time series data. Here we will focus on the tscount package, but it is worth first considering the ARIMA model. ARIMA is short for “Autoregressive Integrated Moving Average”; the idea of this model is to combine an autoregressive component, a trend, and moving average terms in a single model.

  • The autoregressive component represents the extent to which an observation is related to previous observations;

  • A trend. Winters are colder than summers every year, but if you collect data for long enough you are likely to see a general tendency for temperature to increase in most parts of the world due to climate change. This is the kind of trend that can emerge in a time series.

  • Finally, the moving average. I think this is the most difficult concept to grasp. Fortunately (or unfortunately) we heard about it every day during the Covid pandemic. At least in Brazil, for some reason, there was a general underreporting of Covid cases during the weekends. So every weekend there was a general feeling (and hope) that cases were decreasing, but then Mondays and Tuesdays came and boosted the counts once again. Any discussion based on the number of cases per day was useless because the ups and downs were almost unpredictable. Instead, the discussion was about the moving average over the last three days, or over the whole week. The moving average is basically the mean number of cases over a given window of time (three days, or a week), so the particular ups and downs of a couple of days matter less than the succession of averages along the trend.

You will probably see ARIMA models denoted as ARIMA(p,d,q). The letters p, d and q represent exactly the components mentioned above: p is the autoregressive component, d (differencing) captures the trend, and q is the moving average (a minimal example of this notation in code is shown below). Also keep in mind that the autoregressive and moving average parts may have more than one parameter. For instance, an observation might be correlated with the immediately previous observation to some degree, but observations on the same day of the week might also be similar: every Monday is more similar to other Mondays than to Wednesdays. In that case we expect at least two autoregressive components, and we call this an autoregressive model of order 2.
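
To make the (p,d,q) notation concrete, here is a minimal sketch fitting a hand-specified ARIMA(1,0,1) to our simulated ts1 with Arima() from the forecast package (loaded at the top); the order argument maps directly to p, d and q.

# Hand-specified ARIMA(1,0,1): p = 1 AR term, d = 0 (no differencing), q = 1 MA term
arima_101 <- Arima(ts1, order = c(1, 0, 1))
arima_101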

Remember our example about the rainy Friday during the wet season? So, where is the season in the model we simulated?

That is right, the seasonal component is missing! The ARIMA model described above is non-seasonal and cannot capture the variation of annual cycles, but there is also a seasonal version, (S)ARIMA, where the same p, d, and q components are applied to the seasonal part of the series and are denoted by the capital letters P, D, and Q, as follows: ARIMA(p,d,q)(P,D,Q). A quick sketch of this notation follows.
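
Just to illustrate the notation (a hedged sketch on placeholder monthly data, not our simulated series), a seasonal ARIMA(1,0,1)(1,0,0)[12] could be specified like this with forecast::Arima():

# Placeholder monthly series, only to illustrate the seasonal notation
y_monthly <- ts(rnorm(120), frequency = 12)
sarima_fit <- Arima(y_monthly, order = c(1, 0, 1), seasonal = c(1, 0, 0))
sarima_fit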

  • Please don’t bother yourself with these letters popping up every time. I’m just telling you this because it is quite likely you will need it to read the notation used in most time series material.

Now that you know the ARIMA toolkit is complex enough, let’s stick with our non-seasonal ARIMA, because it is exactly what we want given that the tscount package cannot handle the seasonal component directly. Of course, there is always a trick you can use to extend the functionality of a statistical analysis. Indeed, the documentation of the tscount package handled an annual component in an analysis of killed drivers of light goods vehicles in Great Britain (the seat belt data) by including an autoregressive component at a lag of 12 months. It was a great idea, by the way (a rough sketch of the trick follows this paragraph).
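
Here is a hedged sketch of that trick on placeholder monthly counts (not our data, and not the package’s own example): regressing on the observation 12 steps back in tsglm() mimics an annual seasonal component.

# Placeholder monthly count series, only to illustrate the lag-12 trick
monthly_counts <- rpois(120, lambda = 5)
seasonal_fit <- tsglm(ts = monthly_counts, model = list(past_obs = c(1, 12)),
                      link = "log", distr = "pois")
seasonal_fit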

3.1.1 ARIMA model: hands-on

Considering our time series, we can use the function auto.arima() to automatically estimate autocorrelation, trend, and moving average components.

arima.model1<-auto.arima(ts1)
arima.model1.tab<-arima.model1%>%parameters()%>%mutate_if(is.numeric,round,2)
arima.model1.tab

The auto.arima() function just nailed it! It correctly estimated the autoregressive component, ar1 = 0.62 (true value 0.7), and the moving average, ma1 = 0.54 (true value 0.5), in our simulated data.
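
As a quick, optional sanity check (a small sketch added here, not part of the original workflow), we can build rough 95% intervals from the estimated standard errors and see whether the true values ar = 0.7 and ma = 0.5 fall inside them.

# Rough 95% intervals: estimate ± 1.96 * SE (coef and var.coef are standard arima output components)
est <- arima.model1$coef
se  <- sqrt(diag(arima.model1$var.coef))
round(cbind(lower = est - 1.96 * se, estimate = est, upper = est + 1.96 * se), 2)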


Now let’s plot our simulated time series together with the values predicted by the auto.arima() fit.

data.frame(simulated=as.vector(ts1),auto.arima=fitted(arima.model1),dates=dates)%>%
  pivot_longer(!dates, names_to = "timeseries", values_to = "Value")%>%
  ggplot(., aes(x=dates, y=Value)) + 
  geom_line(aes(color=timeseries)) +
  geom_point(shape=16,aes(color=timeseries))+
  labs(x="Date", y="Value",color=NULL)+
  scale_color_manual(values=c('tomato',"black"))+
  scale_x_date(date_labels = "%b-%Y")+theme_classic(base_size =16)

Wow, that is so accurate! The auto.arima() function did a pretty good job in the end! But what if we are interested in using covariates? The sad news is that auto.arima() fits Gaussian ARIMA-type models (covariates can only enter as regressors with ARIMA errors through its xreg argument), so it is not well suited for testing the effect of covariates on the count variable we are analysing.

The good news is that there are several fancy models to do so; here, I will use the Generalized Linear Model (GLM) framework! 😊

3.2 Autoregressive GLMs

3.2.0.1 Loading data

To exemplify the use of an autoregressive GLM we will use data from Marquiz-Alves et al. 2023 PeerJ. In this example, the authors surveyed Andean condors perching at a communal roost three times per week and also recorded the temperature and rainfall at the location.

Data is available in the supplementary material of this paper but here we will use a slightly modified version of it.

condor_data<-read.csv("Condor_data.csv",sep=";")

head(condor_data)
# We need to standardize the covariates so we can compare the conclusions in different models at the end.
condor_data$Temperature_stand<-as.vector(scale(condor_data$Temperature))
condor_data$Rainfall_stand<-as.vector(scale(condor_data$Rainfall))

3.2.0.2 Plotting: Let’s check what we got!

The number of surveyed condors decreases dramatically over the year. This coincides with the general increasing trend in temperature.

par(mar=c(5,5,1,1))
plot(condor_data$Condors,type="b",col=NULL,xlab="Surveys days", ylab="Total condors surveyed",ylim=c(-10,40),pch=16,cex.lab=1.5,cex.axis=1.5)
barplot(condor_data$Rainfall, width = 1, space = 0, col=alpha("blue", 0.1),yaxt='n',add=T)
abline(a=0,b=0,col="grey",lty=2)
points(condor_data$Temperature,type="b",col="tomato",pch=16)
points(condor_data$Condors,type="b",col="black",pch=16)

legend(75, 40, legend=c("Condors", "Temperature","Rainfall"),
       col=c("black", "tomato",alpha("blue", 0.2)), lty=1,lwd=c(1,1,7), cex=1.2)

But let’s look closer. We will focus on surveys 1 to 20 and 60 to 80 to visually examine the behaviour of the total condors counted at different daily temperatures and rainfall. We can see a general tendency for the number of surveyed condors to increase at higher temperatures, which is easier to detect when zooming in on surveys 1–20, and a rapid reduction in the number of surveyed condors after a rainy day, which is quite clear when zooming in on surveys 60–80.

Particularly on days 60–80, the relationship with temperature does not seem so strong. What do you think?

par(mfrow=c(1,2))
plot(condor_data$Condors[1:20],type="b",col=NULL,xlab="Surveys days", ylab="Total condors surveyed",ylim=c(-10,40),pch=16, main="Days 1-20")
barplot(condor_data$Rainfall[1:20], width = 1, space = 0, col=alpha("blue", 0.1),add=T)
abline(a=0,b=0,col="grey",lty=2)
points(condor_data$Temperature[1:20],type="b",col="tomato",pch=16)
points(condor_data$Condors[1:20],type="b",col="black",pch=16)
legend(10, 40, legend=c("Condors", "Temperature","Rainfall"),
       col=c("black", "tomato",alpha("blue", 0.2)), lty=1,lwd=c(1,1,7), cex=1.2)


plot(condor_data$Condors[60:80],type="b",col=NULL,xlab="Surveys days", ylab="",ylim=c(-10,40),pch=16, main="Days 60-80")
barplot(condor_data$Rainfall[60:80], width = 1, space = 0, col=alpha("blue", 0.1),add=T,ylab="")
abline(a=0,b=0,col="grey",lty=2)
points(condor_data$Temperature[60:80],type="b",col="tomato",pch=16)
points(condor_data$Condors[60:80],type="b",col="black",pch=16)
legend(10, 40, legend=c("Condors", "Temperature","Rainfall"),
       col=c("black", "tomato",alpha("blue", 0.2)), lty=1,lwd=c(1,1,7), cex=1.2)

This is where autoregressive GLMs come into action! As mentioned above, ARIMA-type models, auto.arima() included, assume Gaussian errors and are not designed for count responses like our condor censuses. So sad, indeed! We cannot simply rely on that super fancy and very accurate function for our science stuff 😞.

Fortunately, there are plenty of alternatives to handle time series with covariates using the Generalized Linear Model framework. We will compare the performance of two alternatives here.

3.2.0.3 First, let’s start with the autoregressive model using the autoregressive correlation structure implemented in the nlme package.

So, let’s check how surveys are autocorrelated in time.

par(mfrow=c(1,2))
acf(condor_data$Condors,main="Andean condor census")
pacf(condor_data$Condors,main="Andean condor census")

As we can see, there is strong temporal autocorrelation in the Andean condor surveys. When we check the partial autocorrelation function we see that a given observation depends quite strongly on the immediately previous observation (time-lag 1), but is also significantly associated with observations 2, 3 and 4 time-lags back.
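
If you prefer a formal test to complement the plots, a quick sketch: the Ljung–Box test (base R’s Box.test()) tests the null hypothesis of no autocorrelation, so a small p-value points to autocorrelated counts.

# Ljung-Box test: a small p-value indicates autocorrelation in the condor counts
Box.test(condor_data$Condors, lag = 10, type = "Ljung-Box")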

3.2.0.4 Setting up a simple GLM to see what happens.

We will first check how these data behave under a simple GLM. We will call this model naive.glm.

naive.glm<-glm(Condors ~ Temperature_stand + Rainfall_stand, family=poisson(),data=condor_data)
parameters(naive.glm)

There is a strong relationship between the surveyed condors and the temperature and rainfall in our model.

Now, let’s see how the model predicted the number of censused condors.

plot(condor_data$Condors,type="b",col=NULL,xlab="Surveys days", ylab="Total condors surveyed",ylim=c(-10,40),pch=16,cex.lab=1.5,cex.axis=1.5)
points(condor_data$Condors,type="b",col="black",pch=16)

points(fitted(naive.glm),type="b",col="blueviolet",pch=16)

barplot(condor_data$Rainfall, width = 1, space = 0, col=alpha("blue", 0.1),yaxt='n',add=T)
abline(a=0,b=0,col="grey",lty=2)
#points(condor_data$Temperature,type="b",col="tomato",pch=16)
#points(condor_data$Condors,type="b",col="black",pch=16)
legend(60, 40, legend=c("Condors", "Estimated naive glm","Rainfall"),
       col=c("black", "blueviolet",alpha("blue", 0.2)), lty=1,lwd=c(1,1,7), cex=1.2)

Our naive GLM did a pretty good job! However, when we check the residuals (figure below) we can see a significant temporal autocorrelation structure, which is definitely something undesirable. Just remember that in a good model the residuals MUST be independent (in time or space; here we are assessing just time).

layout(t(matrix(c(1,1,1,1,2,2,3,3),ncol=2)))
plot(residuals(naive.glm),type="b",col=NULL,xlab="Surveys days", ylab="Residuals",ylim=c(-5,8),pch=16,cex.lab=1.5,cex.axis=1.5)
points(residuals(naive.glm),type="b")
acf(residuals(naive.glm))
pacf(residuals(naive.glm))

So, let’s use another tool to see if we can handle this.

3.2.0.5 Now, it’s time to incorporate the autoregressive component in our GLM.

Once again, we have bad news!

In the usual packages we ecologists customarily use there is, as far as I know, no temporal correlation structure available for count GLMs. The closest approach is to use a Generalised Least Squares (GLS) regression from the nlme package. This is not a big problem, as we can transform our count data with log(count + 1) to tame the dispersion that is common in count data. This is simple and works, but it is not very elegant. Here is an amazing example of how to use an autoregressive GLS in the social sciences.

# We need an index of observations for the AR(1) correlation structure

condor_data_ID <- condor_data
condor_data_ID$ID <- 1:nrow(condor_data_ID)

modelGLSAR <- gls(log(Condors + 1) ~ Temperature_stand + Rainfall_stand,
                  correlation = corAR1(form = ~ ID),
                  data = condor_data_ID, method = "ML")
3.2.0.5.0.1 Now, let’s use the AIC to compare both models

We cannot directly compare models using transformed and non-transformed data, but again we can handle it.

To compare the AIC from transformed and non-transformed models, you must correct the AIC of the model fitted to the transformed data by adding 2*sum(log(data + 1)) to it, which accounts for the Jacobian of the log transformation. Check why on StackOverflow.
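
In formula form, with \(y_i\) the observed counts, the adjustment applied in the code below is:

\[
\mathrm{AIC}_{\text{corrected}} = \mathrm{AIC}_{\log(y+1)} + 2\sum_{i=1}^{n}\log(y_i + 1)
\]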

AIC_res<-AIC(naive.glm,modelGLSAR)
AIC_res$AIC[2]<-AIC_res$AIC[2] + sum(2*log(condor_data$Condors+1))     
AIC_res%>%kable(.,"html")%>%kable_styling(full_width = F) # Note: kable() and kable_styling() are just a beautiful way to present the table. You can probably remove this part of the code if you are not using R Markdown
df AIC
naive.glm 3 748.8869
modelGLSAR 5 550.8080

When we compare the goodness of fit of both approaches we find clear support for the autoregressive GLS over the naive GLM.

3.2.0.5.0.2 We cannot forget to check the basic assumption of independence of the residuals.

Well, we must also check the assumptions of our model, and we can see that the residuals are finally free of autocorrelation. That is a great achievement!

layout(t(matrix(c(1,1,1,1,2,2,3,3),ncol=2)))
plot(residuals(modelGLSAR,type="normalized"),type="b",col=NULL,xlab="Surveys days", ylab="Residuals",ylim=c(-3,3),pch=16,cex.lab=1.5,cex.axis=1.5)
points(residuals(modelGLSAR,type="normalized"),type="b")
acf(residuals(modelGLSAR,type="normalized"))
pacf(residuals(modelGLSAR,type="normalized"))

3.2.0.5.0.3 But what about checking the results visually?

Now, let’s visually compare the performance of our naive GLM and our autoregressive GLS against the original dataset.

plot(condor_data$Condors,type="b",col=NULL,xlab="Surveys days", ylab="Total condors surveyed", ylim=c(-10,40), pch=16, cex.lab=1.5,cex.axis=1.5)
points(condor_data$Condors,type="b",col="black",pch=16)

points(fitted(naive.glm),type="b",col="blueviolet",pch=16)
points(expm1(fitted(modelGLSAR)),type="b",col="green",pch=16)
legend(70, 40, legend=c("Condors", "Estimated naive glm","Estimated GLSAR"),
       col=c("black", "blueviolet",alpha("green", 0.2)), lty=1,lwd=c(1,1,7), cex=1.2)

Oh my! The autoregressive GLS clearly underperforms! We definitely must try something else, since these results are not good enough to draw the conclusions we want. Now, it’s time to play around with the tscount package!

4 Tscount package, here we go!

With every component presented, we can now move on to the tscount package, which is very straightforward. There is a main function called tsglm() that performs the GLM with an autoregressive component (and also a moving-average-like component on past means, if you need it).
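
For reference, and as far as I understand from the package documentation, with the log link the model is roughly:

\[
\log(\lambda_t) = \beta_0 + \sum_{k=1}^{p}\beta_k \log(Y_{t-i_k}+1) + \sum_{l=1}^{q}\alpha_l \log(\lambda_{t-j_l}) + \eta^{\top} X_t
\]

where \(\lambda_t\) is the conditional mean of the count \(Y_t\), the \(\beta_k\) terms regress on past observations (past_obs), the \(\alpha_l\) terms regress on past conditional means (past_mean), and \(X_t\) holds the covariates passed via xreg.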

The documentation of the tscount package provides numerous comparisons with much fancier packages of proportional complexity. I recommend reading it, but if this is your first contact with time series analyses, I suggest you spend a bit more time browsing here and the other references I’m citing.

4.1 Tscount package drawback

Back to tscount: the major difference between our last models and tsglm() is that we cannot use the formula syntax y ~ x1 + x2 used in our previous models. I think this syntax is pretty intuitive and I honestly don’t know what the challenges are in implementing something like it; however, it is what it is. Therefore, we need to group our covariates in a separate matrix of regressors. Here, I will call it Betas.

Betas<-cbind(Temperature_stand= condor_data$Temperature_stand,
                Rainfall_stand=condor_data$Rainfall_stand)

Betas%>%head()%>%t()%>%kable(.,"html")%>%
  kable_styling(full_width = F)
Temperature_stand -0.6812912 -0.3798905 -0.3296570 -0.7315246 -1.0831588 -3.1427302
Rainfall_stand -0.2867711 -0.2867711 -0.2867711 -0.2867711 -0.2867711 -0.2867711

4.2 We finally got to the juicy part!

Once we have the regressors (our Betas) we can finally perform the analysis using the tscount package. So, let’s use the tsglm() function!

glm_tscount<- tsglm(ts=condor_data$Condors, link="log",
  model=list(past_obs=c(1)), distr="pois",xreg=Betas)

glm_tscount
## 
## Call:
## tsglm(ts = condor_data$Condors, model = list(past_obs = c(1)), 
##     xreg = Betas, link = "log", distr = "pois")
## 
## Coefficients:
##       (Intercept)             beta_1  Temperature_stand     Rainfall_stand  
##            0.6586             0.5106            -0.1653            -1.4631

Of course, we cannot forget to check the assumptions of our model. We can simply use plot(glm_tscount) to quickly check the most important diagnostics, or build the residual plots by hand as below.

layout(t(matrix(c(1,1,1,1,2,2,3,3),ncol=2)))
plot(residuals(glm_tscount,type="pearson"),type="b",col=NULL,xlab="Surveys days", ylab="Residuals",ylim=c(-3,7),pch=16,cex.lab=1.5,cex.axis=1.5)
points(residuals(glm_tscount,type="pearson"),type="b")
acf(residuals(glm_tscount,type="pearson"))
pacf(residuals(glm_tscount,type="pearson"))

4.3 Now, let’s compare the performance of all models

We will again compare the models using the AIC and visually.

4.3.1 Let’s kill our anxiety by checking visually first

When we plot all models together, we can see that even though the naive GLM did a fair job of estimating the total condors surveyed, the tscount model has superior accuracy. Using tscount, the model could capture the high occurrence of zeros at the end of the time series, which coincides with the increase in rainfall.

plot(condor_data$Condors,type="b",col=NULL,xlab="Surveys days", ylab="Total condors surveyed", ylim=c(-10,40), pch=16, cex.lab=1.5,cex.axis=1.5)
points(condor_data$Condors,type="b",col="black",pch=16)

points(fitted(naive.glm),type="b",col="blueviolet",pch=16)
points(expm1(fitted(modelGLSAR)),type="b",col="green",pch=16)
points(fitted(glm_tscount),type="b",col="orange",pch=16)

legend(70, 40, legend=c("Condors", "Estimated naive glm","Estimated GLSAR","Estimated Tscount"),
       col=c("black", "blueviolet","green","orange"), lty=1,lwd=c(1,1,1), cex=1.2)

4.3.2 Now, let’s see what our lovely AIC says to us

Let’s reuse the code lines we previously wrote to compute comparable AICs for the naive.glm and modelGLSAR models. We just need to add the glm_tscount model here too.

AIC_res<-AIC(naive.glm,modelGLSAR,glm_tscount)
AIC_res$AIC[2]<-AIC_res$AIC[2] + sum(2*log(condor_data$Condors+1))     
AIC_res%>%kable(.,"html")%>%kable_styling(full_width = F) 
df AIC
naive.glm 3 748.8869
modelGLSAR 5 550.8080
glm_tscount 4 633.6192

At first glance we see that the tscount model underperformed compared to modelGLSAR. Still, we made a huge achievement by accounting for the temporal autocorrelation, so don’t let it get you down: we can also consider a negative binomial distribution to see if the model improves its performance.

The negative binomial distribution usually performs better than the Poisson when we have highly dispersed data. The overdispersion in the condor dataset is likely due to a few surveys that counted over 30 individuals alongside a bunch of zero records.
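
As a quick, rough check (a small sketch, not a formal test), the variance-to-mean ratio of the counts should be close to 1 for a Poisson variable, so values well above 1 suggest overdispersion.

# Variance-to-mean ratio of the condor counts (roughly 1 expected under a Poisson distribution)
var(condor_data$Condors) / mean(condor_data$Condors)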

glm_tscount_pois<-glm_tscount

glm_tscount_negbin<- tsglm(ts=condor_data$Condors, link="log",
  model=list(past_obs=c(1)), distr="nbinom",xreg=Betas)

AIC_res<-AIC(naive.glm,modelGLSAR,glm_tscount_pois,glm_tscount_negbin)
AIC_res$AIC[2]<-AIC_res$AIC[2] + sum(2*log(condor_data$Condors+1))     
AIC_res%>%kable(.,"html")%>%kable_styling(full_width = F)
df AIC
naive.glm 3 748.8869
modelGLSAR 5 550.8080
glm_tscount_pois 4 633.6192
glm_tscount_negbin 5 535.4569

That looks to be the case here! The model using a negative binomial distribution performed much better than the other models. So, let’s use it to draw our conclusions from here on.

4.4 We all love p-values, no?

Despite the major criticism of p-values, I like to report them, as I think they have become intuitive for most minimally trained researchers. However, I prefer to report the effect size and its respective confidence interval. If you want to learn more about p-values and how to better understand them, check this amazing source!

Because tsglm() doesn’t provide p-values, we obtain them using a normal approximation applied to the ratio between each estimated parameter and its standard error. The normal approximation is explained in detail here.
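
In formula form, the approximation implemented in the code below is:

\[
z = \frac{|\hat\beta|}{\mathrm{SE}(\hat\beta)}, \qquad p \approx \exp\left(-0.717\,z - 0.416\,z^{2}\right)
\]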

glm_tscount_t_list<-abs(as.vector(summary(glm_tscount_negbin)$coeff[,1]/summary(glm_tscount_negbin)$coeff[,2]))

# P-value by normal approximation
glm_tscount_p_list<-exp(-0.717*abs(as.vector(glm_tscount_t_list)) - 
    0.416 * abs(as.vector(glm_tscount_t_list))^2)

glm_tscount_withP<-data.frame(summary(glm_tscount_negbin)$coeff,t_value=glm_tscount_t_list,p_value=glm_tscount_p_list)%>%round(4)

glm_tscount_withP

So now we see that temperature is no longer a super important variable. Indeed, there is only weak evidence supporting the idea that temperature determines the total number of surveyed condors. Now, let’s compare each model and their conclusions.

naive_glm_df<-naive.glm%>%parameters()%>%as.data.frame()

  naive_glm_df<-naive_glm_df[,c(1,2,3,9)]
    naive_glm_df$Method<- "Naive GLM"  

modelGLSAR.df<-modelGLSAR%>%parameters()%>%as.data.frame()
  
    modelGLSAR.df<-modelGLSAR.df[,c(1,2,3,9)]
    modelGLSAR.df$Method<- "GLS AR1"  
    
    
glm_tscount_df<-glm_tscount_withP%>%
  tibble::rownames_to_column(var = "Parameter")%>%
  data.frame()
    glm_tscount_df<-glm_tscount_df[,c(1,2,3,7)]
    colnames(glm_tscount_df)<-c("Parameter","Coefficient","SE","p")
    glm_tscount_df$Method<- "Tscount"  

rbind(naive_glm_df,modelGLSAR.df,glm_tscount_df)%>%mutate_if(.,is.numeric,round,3)%>%kable(.,"html")%>%kable_styling(full_width = F)
Parameter Coefficient SE p Method
(Intercept) 1.492 0.098 0.000 Naive GLM
Temperature_stand -0.300 0.032 0.000 Naive GLM
Rainfall_stand -2.358 0.355 0.000 Naive GLM
(Intercept) 1.717 0.210 0.000 GLS AR1
Temperature_stand -0.194 0.099 0.054 GLS AR1
Rainfall_stand -0.097 0.064 0.134 GLS AR1
(Intercept) 0.659 0.239 0.006 Tscount
beta_1 0.511 0.116 0.000 Tscount
Temperature_stand -0.165 0.099 0.095 Tscount
Rainfall_stand -1.463 0.422 0.001 Tscount
sigmasq 0.544 NA NA Tscount
rbind(naive_glm_df,modelGLSAR.df,glm_tscount_df)%>%
ggplot(.,aes(x=Parameter,y=Coefficient,group=Method,fill=Method,color=Method))+
  geom_point(aes(fill=Method))+
  geom_pointrange(aes(x=Parameter,ymin=Coefficient-SE,ymax=Coefficient+SE))+
  ylim(-3,2)+theme_bw()+
  labs(y="Estimated value")
## Warning: Removed 1 rows containing missing values (`geom_segment()`).

4.4.1 Now, it is time to draw the conclusions.

Comparing the three different models, we reach broadly similar results. That is cool, because it supports the idea that both parameters, rainfall and temperature, truly affect the total number of surveyed condors along the year. Yet, some small pieces differ and might lead to different conclusions.

  • First, the naive GLM seems to overestimate the importance of rainfall.
  • Second, and most interesting, the autoregressive GLS shows a stronger effect of temperature than of rainfall on the total surveyed condors.

NOTICE!

Hey, this is pretty much my first blog post. It got quite a bit longer than I expected, and I’m planning to come back and improve it in the near future. So, stay tuned!

Meanwhile, what about following me on Twitter to check the next posts? @anycommomname

If you have any comments, feel free to get in touch: leave a comment, reach out on Twitter, or send me an email.