Load Dataset

forest <- read.csv(url("https://www.dropbox.com/s/jbtrv4jy7245qdb/forest.csv?dl=1"),
                   header = TRUE)

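The dataset contains the following variables:
- Forest.loss: average annual forest loss over the period 1981-1990, expressed as a percentage of total forested area
- Pop.dens: number of people per thousand hectares
- Crop.ch: average annual cropland change, in thousands of acres
- Pasture.ch: average annual pasture land change, in thousands of acres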

Fit a Linear Model for the Dataset

model01 <- lm(Forest.loss~., forest)
summary(model01)
## 
## Call:
## lm(formula = Forest.loss ~ ., data = forest)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.49635 -0.37084 -0.06013  0.27327  3.10696 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.5656742  0.1327105   4.262 6.57e-05 ***
## Pop.dens     0.0008077  0.0001136   7.113 1.02e-09 ***
## Crop.ch     -0.0039748  0.0102140  -0.389  0.69842    
## Pasture.ch   0.0279660  0.0100031   2.796  0.00677 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6743 on 66 degrees of freedom
## Multiple R-squared:  0.4952, Adjusted R-squared:  0.4723 
## F-statistic: 21.59 on 3 and 66 DF,  p-value: 7.448e-10

A multiple linear regression model, model01, was fitted to model the average annual forest loss over the period 1981-1990, expressed as a percentage of total forested area (Forest.loss). The predictors are the number of people per thousand hectares (Pop.dens), the average annual cropland change in thousands of acres (Crop.ch), and the average annual pasture land change in thousands of acres (Pasture.ch). The overall regression is highly significant, \(\text{F}(3,66) = 21.59\), \(\text{p} < 0.001\). The multiple \(\text{R}^2\) and adjusted \(\text{R}^2\) of the model are \(0.4952\) and \(0.4723\), respectively, so the model accounts for \(49.52\%\) of the variance in Forest.loss. However, the coefficient of the average annual cropland change is not significant at \(\alpha = 0.05\), \(\hat{\beta}_2 = -0.004\), \(\text{p} = 0.70\). This suggests that the model may be improved by removing this variable.
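As a check on the numbers quoted above, the following R sketch recovers the overall F-statistic from \(\text{R}^2\) and the Crop.ch test from its estimate and standard error. The sample size n = 70 is inferred from the 66 residual degrees of freedom plus the 4 estimated coefficients. All other inputs are copied from the printed summary, so the results match only up to rounding.

```r
# Values copied from summary(model01); n = 70 is inferred (66 residual df + 4 coefficients)
n  <- 70
p  <- 3        # number of predictors, excluding the intercept
r2 <- 0.4952   # multiple R-squared

# Overall F-statistic: (R^2 / p) / ((1 - R^2) / (n - p - 1))
f_overall <- (r2 / p) / ((1 - r2) / (n - p - 1))
p_overall <- pf(f_overall, p, n - p - 1, lower.tail = FALSE)

# t test for Crop.ch: estimate / standard error, two-sided p-value on n - p - 1 df
t_crop <- -0.0039748 / 0.0102140
p_crop <- 2 * pt(-abs(t_crop), df = n - p - 1)

# Dropping a single term: the partial F-statistic equals t^2 and gives the same p-value
f_partial <- t_crop^2
p_partial <- pf(f_partial, 1, n - p - 1, lower.tail = FALSE)

round(c(F = f_overall, t = t_crop, p_t = p_crop, p_F = p_partial), 4)
```

Once the reduced model below is fitted, `anova(model02, model01)` should report this same partial F-test for dropping Crop.ch.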

Removing Crop.ch from the Original Model

model02 <- lm(Forest.loss~Pop.dens + Pasture.ch, forest)
summary(model02)
## 
## Call:
## lm(formula = Forest.loss ~ Pop.dens + Pasture.ch, data = forest)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.49235 -0.38880 -0.07072  0.28086  3.10464 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.5368158  0.1093586   4.909 6.17e-06 ***
## Pop.dens    0.0008146  0.0001115   7.307 4.27e-10 ***
## Pasture.ch  0.0269409  0.0095887   2.810  0.00649 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.67 on 67 degrees of freedom
## Multiple R-squared:  0.4941, Adjusted R-squared:  0.479 
## F-statistic: 32.72 on 2 and 67 DF,  p-value: 1.22e-10

A multiple linear regression model, model02, was fitted to model the average annual forest loss over the period 1981-1990, expressed as a percentage of total forested area (Forest.loss). The predictors are the number of people per thousand hectares (Pop.dens) and the average annual pasture land change in thousands of acres (Pasture.ch). The overall regression is highly significant, \(\text{F}(2,67) = 32.72\), \(\text{p} < 0.001\). The multiple \(\text{R}^2\) and adjusted \(\text{R}^2\) of the model are \(0.4941\) and \(0.479\), respectively, so the model accounts for \(49.41\%\) of the variance in Forest.loss. \(\text{R}^2\) drops slightly, as it cannot increase when a predictor is removed. However, the adjusted \(\text{R}^2\) rises from \(0.4723\) to \(0.479\), so by this criterion model02 is a slight improvement over model01.
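The comparison relies on the adjusted \(\text{R}^2\) formula, which trades a small loss in \(\text{R}^2\) against one fewer estimated parameter. This minimal R sketch reproduces both printed values from \(\text{R}^2\) alone. It assumes n = 70, which is inferred from the residual degrees of freedom rather than stated in the output.

```r
# Adjusted R^2 penalises extra predictors: 1 - (1 - R^2) * (n - 1) / (n - p - 1)
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 70  # inferred: 66 + 4 = 67 + 3 = 70 observations
adj <- c(model01 = adj_r2(0.4952, n, 3),   # Pop.dens, Crop.ch, Pasture.ch
         model02 = adj_r2(0.4941, n, 2))   # Crop.ch removed
round(adj, 4)
```

Because the penalty factor \((n-1)/(n-p-1)\) shrinks when p drops from 3 to 2, model02 comes out ahead even though its \(\text{R}^2\) is smaller.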

In addition, the coefficients of the intercept (\(\hat{\beta}_0 \approx 0.5368\), \(\text{p}<0.001\)), Pop.dens (\(\hat{\beta}_1 \approx 0.0008\), \(\text{p}<0.001\)), and Pasture.ch (\(\hat{\beta}_3 \approx 0.0269\), \(\text{p} \approx 0.0065\)) are all significant at \(\alpha = 0.05\). The first two are significant at the 0.001 level and Pasture.ch at the 0.01 level. Unlike model01, model02 therefore contains no non-significant predictors. The fitted model is \[ \widehat{\text{Forest.loss}} = 0.5368 + 0.0008 \text{ Pop.dens} + 0.0269 \text{ Pasture.ch} \] Holding the average annual pasture land change constant, each additional person per thousand hectares is associated with an average increase of about 0.0008 percentage points in the average annual forest loss. Likewise, holding the population density constant, each one-unit (one thousand acres) increase in the average annual pasture land change is associated with an average increase of 0.0269 percentage points in the average annual forest loss. Finally, when both the population density and the average annual pasture land change are zero, the predicted average annual forest loss is 0.5368%.
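The interpretation above can be made concrete by evaluating the fitted equation directly. The R sketch below uses the unrounded coefficients from `summary(model02)`. The helper name `predict_loss` and the input values are hypothetical and chosen only for illustration.

```r
# Coefficients copied from summary(model02)
b <- c(intercept = 0.5368158, Pop.dens = 0.0008146, Pasture.ch = 0.0269409)

# Hypothetical helper: predicted average annual forest loss (%) for given predictor values
predict_loss <- function(pop_dens, pasture_ch) {
  unname(b["intercept"] + b["Pop.dens"] * pop_dens + b["Pasture.ch"] * pasture_ch)
}

# Illustrative inputs only (not taken from the dataset)
y_hat <- predict_loss(pop_dens = 500, pasture_ch = 10)
y_hat

# One extra person per thousand hectares changes the prediction by the Pop.dens slope
predict_loss(501, 10) - predict_loss(500, 10)
```

With the data loaded, `predict(model02, newdata = data.frame(Pop.dens = 500, Pasture.ch = 10))` should return essentially the same value, because it uses the full-precision coefficients.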

Diagnostic Checking

par(mfrow=c(2,2))
plot(model02)

Residual analysis was performed to assess model02. The Residuals vs Fitted plot shows a funnel-shaped spread in the residuals, which suggests that the residual variance is not constant (heteroskedasticity). The Normal Q-Q plot suggests slightly skewed residuals. Because most of the points lie close to the reference line, however, approximate normality of the residuals appears reasonable. The Residuals vs Leverage plot, with its Cook's distance contours, flags possible influential observations, especially observation 61. Therefore, model02 may not be the best model for this dataset.
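The visual checks can be complemented with formal ones. The R sketch below is not the author's procedure. It computes Koenker's studentized Breusch-Pagan statistic by hand: the number of observations times the \(\text{R}^2\) from regressing the squared residuals on the predictors. It then screens Cook's distances against the common 4/n rule of thumb, which is only one of several cutoffs in use. The helper names are hypothetical, and the data are simulated so that the error spread grows with one predictor.

```r
# Koenker's studentized Breusch-Pagan test: n * R^2 of squared residuals on the predictors
bp_koenker <- function(fit) {
  e2   <- residuals(fit)^2
  X    <- model.matrix(fit)[, -1, drop = FALSE]  # predictors without the intercept column
  aux  <- lm(e2 ~ X)                             # auxiliary regression (with intercept)
  stat <- length(e2) * summary(aux)$r.squared
  df   <- ncol(X)
  list(statistic = stat, df = df,
       p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

# Indices of observations whose Cook's distance exceeds a rule-of-thumb cutoff
flag_influential <- function(fit, cutoff = 4 / nobs(fit)) {
  cd <- cooks.distance(fit)
  which(cd > cutoff)
}

# Hypothetical simulated data whose error spread grows with x1, mimicking a funnel
set.seed(1)
n  <- 70
x1 <- runif(n, 0, 10)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.3 * x2 + rnorm(n, sd = 0.1 + 0.3 * x1)
fit <- lm(y ~ x1 + x2)

bp <- bp_koenker(fit)
bp
flag_influential(fit)
```

With the forest data loaded, `bp_koenker(model02)` and `flag_influential(model02)` apply the same checks to the fitted model. `lmtest::bptest(model02)` uses the studentized form by default and should report the same statistic.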

library(forecast)
Acf(model02$residuals)

An autocorrelation function (ACF) plot was used to check for serial correlation in the residuals. Since all of the autocorrelations lie within the approximate 95% confidence bounds, there is no evidence of serial correlation in the residuals.
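The bounds on an ACF plot are approximately \(\pm 1.96/\sqrt{n}\), which is about \(\pm 0.234\) for 70 residuals. The R sketch below computes lag-k autocorrelations by hand, confirms them against `stats::acf`, and adds a Ljung-Box portmanteau test as a single formal summary of the first 10 lags. The series is hypothetical white noise standing in for the residuals. This check only has a clear interpretation when the row order of the data is meaningful.

```r
# Sample autocorrelation at lag k: demeaned lagged cross-products over the lag-0 sum of squares
acf_manual <- function(x, lag.max) {
  x <- x - mean(x)
  denom <- sum(x^2)
  sapply(seq_len(lag.max), function(k) sum(x[-(1:k)] * x[1:(length(x) - k)]) / denom)
}

set.seed(42)
e <- rnorm(70)                            # hypothetical stand-in for model residuals
r <- acf_manual(e, lag.max = 10)
bound <- qnorm(0.975) / sqrt(length(e))   # approximate 95% band drawn by acf()/Acf()

which(abs(r) > bound)                     # lags falling outside the band, if any
Box.test(e, lag = 10, type = "Ljung-Box") # formal test of the first 10 lags jointly
```

Applied to the model, `acf_manual(residuals(model02), 10)` gives the same values that `Acf` plots.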

library(car)
vif(model02)
##   Pop.dens Pasture.ch 
##   1.007917   1.007917

Variance inflation factors (VIFs) were used to check for multicollinearity among the predictors. Both VIFs are about 1.01, well below the common rule-of-thumb threshold of 5, so there is no evidence of multicollinearity between Pop.dens and Pasture.ch.
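For predictor j, \(\text{VIF}_j = 1/(1 - R_j^2)\), where \(R_j^2\) comes from regressing that predictor on the others. When a model has only two predictors, each auxiliary regression is a simple regression on the other predictor. Both VIFs therefore equal \(1/(1 - r^2)\), with r the correlation between the two predictors, which is why `vif(model02)` prints the same value twice. The R sketch below computes VIFs by hand on hypothetical simulated data and checks this identity. It also backs out the correlation implied by the printed VIF of 1.007917.

```r
# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the remaining columns
vif_manual <- function(X) {
  X <- as.matrix(X)
  vifs <- sapply(seq_len(ncol(X)), function(j) {
    r2 <- summary(lm(X[, j] ~ X[, -j]))$r.squared
    1 / (1 - r2)
  })
  setNames(vifs, colnames(X))
}

# Hypothetical, weakly correlated predictors
set.seed(7)
x1 <- rnorm(70)
x2 <- 0.1 * x1 + rnorm(70)
v  <- vif_manual(cbind(x1 = x1, x2 = x2))
v
1 / (1 - cor(x1, x2)^2)         # equals both VIFs when there are only two predictors

# |r| between Pop.dens and Pasture.ch implied by the printed VIF (sign not recoverable)
r_implied <- sqrt(1 - 1 / 1.007917)
r_implied                       # approximately 0.0886
```

An absolute correlation of about 0.09 between the two predictors is consistent with the conclusion that multicollinearity is negligible.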