Part 1 - Introduction

The data presented describe the area of forest and overall tree cover affected by deforestation throughout the world, with a focus on Brazil.

Deforestation has many causes throughout the world. The focus of the studied material is the different causes of deforestation affecting Brazil. The data also present the area affected by each separate cause of deforestation within the country. A very large share of global deforestation occurs in Brazil. According to Hannah Ritchie, “One-third of tropical deforestation happened in Brazil. That was 1.7 million hectares each year. The other single country where large forest areas are lost is Indonesia – it accounted for 14%. This means around half (47%) of tropical deforestation occurred in Brazil and Indonesia.” (2021) This gives a quick picture of how vast the area of tropical deforestation within Brazil is compared to the rest of the world. The expansion of pasture for beef production, croplands for soy and palm oil, and the increasing conversion of primary forest to tree plantations for paper and pulp have been the key drivers of deforestation, primarily in Brazil and Indonesia. In addition, in 2022 the world lost approximately 4 million hectares of primary forest, about a ten percent increase from 2021, and Brazil accounted for about 40 percent of that loss. A 2020 study found that two-thirds of deforested land in the Amazon and Cerrado is used for cattle pasture, which has contributed to Brazil doubling its meat exports. Examining how each of these drivers affects deforestation in Brazil will help determine which has the greatest effect.

Area of Deforestation Connected to Cattle in Brazil

The main question behind the breakdown of the data is: How severely does each cause of deforestation contribute to the overall deforestation occurring in Brazil? A particular focus is whether natural disturbances or human-caused disturbances have the greater effect on deforestation in Brazil.

To explain the breakdown of the analysis a little further, the data are separated into a response variable and an explanatory variable. The response variable is the area of forest and overall land affected by each separate cause, measured in hectares. The explanatory variable is the individual cause of deforestation within Brazil.
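The response/explanatory split described here can be made explicit by reshaping the data so that each row holds one year, one cause, and one area. The following is a minimal sketch, assuming only the first six years of three causes (values copied from the `head()` output printed in the data-loading step); the names `brazil_head`, `brazil_long`, `cause`, and `hectares` are illustrative and do not appear in the original analysis.

```r
library(tidyr)

# Illustrative subset: first six years of three causes,
# copied from the head() output in this report.
brazil_head <- data.frame(
  year = 2001:2006,
  commercial_crops = c(280000, 415000, 550000, 747000, 328000, 188000),
  flooding_due_to_dams = c(0, 79000, 0, 26000, 17000, 17000),
  natural_disturbances = c(0, 35000, 35000, 22000, 26000, 26000)
)

# One row per year and cause: `cause` plays the role of the explanatory
# variable and `hectares` the response (both names are hypothetical).
brazil_long <- pivot_longer(
  brazil_head,
  cols = -year,
  names_to = "cause",
  values_to = "hectares"
)

head(brazil_long)
```

In this long layout, grouped summaries and faceted plots can be written once instead of once per column.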

#Load Libraries and Data

library(ggplot2)
library(ggpubr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(car)
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
library('broom')
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ lubridate 1.9.4     ✔ tibble    3.2.1
## ✔ purrr     1.0.4     ✔ tidyr     1.3.1
## ✔ readr     2.1.5
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ✖ car::recode()   masks dplyr::recode()
## ✖ purrr::some()   masks car::some()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(datasets)
library(rstatix)
## 
## Attaching package: 'rstatix'
## 
## The following object is masked from 'package:stats':
## 
##     filter
library(RColorBrewer)

brazil_loss <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2021/2021-04-06/brazil_loss.csv')
## Rows: 13 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): entity, code
## dbl (12): year, commercial_crops, flooding_due_to_dams, natural_disturbances...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
brazil.df <- data.frame(brazil_loss)

head(brazil_loss)
## # A tibble: 6 × 14
##   entity code   year commercial_crops flooding_due_to_dams natural_disturbances
##   <chr>  <chr> <dbl>            <dbl>                <dbl>                <dbl>
## 1 Brazil BRA    2001           280000                    0                    0
## 2 Brazil BRA    2002           415000                79000                35000
## 3 Brazil BRA    2003           550000                    0                35000
## 4 Brazil BRA    2004           747000                26000                22000
## 5 Brazil BRA    2005           328000                17000                26000
## 6 Brazil BRA    2006           188000                17000                26000
## # ℹ 8 more variables: pasture <dbl>, selective_logging <dbl>, fire <dbl>,
## #   mining <dbl>, other_infrastructure <dbl>, roads <dbl>,
## #   tree_plantations_including_palm <dbl>, small_scale_clearing <dbl>
summary(brazil_loss)
##     entity              code                year      commercial_crops
##  Length:13          Length:13          Min.   :2001   Min.   : 52000  
##  Class :character   Class :character   1st Qu.:2004   1st Qu.: 79000  
##  Mode  :character   Mode  :character   Median :2007   Median :118000  
##                                        Mean   :2007   Mean   :234846  
##                                        3rd Qu.:2010   3rd Qu.:328000  
##                                        Max.   :2013   Max.   :747000  
##  flooding_due_to_dams natural_disturbances    pasture        selective_logging
##  Min.   :    0        Min.   :    0        Min.   : 546000   Min.   : 44000   
##  1st Qu.:    0        1st Qu.:22000        1st Qu.: 738000   1st Qu.: 87000   
##  Median : 9000        Median :26000        Median :1520000   Median : 96000   
##  Mean   :14692        Mean   :31538        Mean   :1561769   Mean   :104846   
##  3rd Qu.:17000        3rd Qu.:35000        3rd Qu.:2564000   3rd Qu.:131000   
##  Max.   :79000        Max.   :87000        Max.   :2761000   Max.   :166000   
##       fire            mining      other_infrastructure     roads      
##  Min.   : 26000   Min.   :    0   Min.   :    0        Min.   : 9000  
##  1st Qu.: 44000   1st Qu.:    0   1st Qu.: 9000        1st Qu.:13000  
##  Median : 79000   Median :    0   Median : 9000        Median :22000  
##  Mean   :157692   Mean   : 5769   Mean   :10077        Mean   :25923  
##  3rd Qu.:122000   3rd Qu.: 9000   3rd Qu.:13000        3rd Qu.:35000  
##  Max.   :537000   Max.   :35000   Max.   :17000        Max.   :57000  
##  tree_plantations_including_palm small_scale_clearing
##  Min.   : 9000                   Min.   :232000      
##  1st Qu.:26000                   1st Qu.:271000      
##  Median :35000                   Median :293000      
##  Mean   :36231                   Mean   :305769      
##  3rd Qu.:44000                   3rd Qu.:310000      
##  Max.   :92000                   Max.   :415000

Part 2 - Exploring the Data (Descriptive Statistics)

This portion presents the mean, median, standard deviation, and standard error of the area, in hectares, lost to each separate cause of deforestation.


#mean, medians, standard deviations, standard error

#mean
mean(brazil_loss$natural_disturbances)
## [1] 31538.46
mean(brazil_loss$flooding_due_to_dams)
## [1] 14692.31
mean(brazil_loss$commercial_crops)
## [1] 234846.2
mean(brazil_loss$pasture)
## [1] 1561769
mean(brazil_loss$selective_logging)
## [1] 104846.2
mean(brazil_loss$fire)
## [1] 157692.3
mean(brazil_loss$mining)
## [1] 5769.231
mean(brazil_loss$other_infrastructure)
## [1] 10076.92
mean(brazil_loss$small_scale_clearing)
## [1] 305769.2
mean(brazil_loss$roads)
## [1] 25923.08
mean(brazil_loss$tree_plantations_including_palm)
## [1] 36230.77
#median
median(brazil_loss$natural_disturbances)
## [1] 26000
median(brazil_loss$flooding_due_to_dams)
## [1] 9000
median(brazil_loss$commercial_crops)
## [1] 118000
median(brazil_loss$pasture)
## [1] 1520000
median(brazil_loss$selective_logging)
## [1] 96000
median(brazil_loss$fire)
## [1] 79000
median(brazil_loss$mining)
## [1] 0
median(brazil_loss$other_infrastructure)
## [1] 9000
median(brazil_loss$small_scale_clearing)
## [1] 293000
median(brazil_loss$roads)
## [1] 22000
median(brazil_loss$tree_plantations_including_palm)
## [1] 35000
#standard deviation
sd(brazil_loss$natural_disturbances)
## [1] 21344.85
sd(brazil_loss$flooding_due_to_dams)
## [1] 21269.64
sd(brazil_loss$commercial_crops)
## [1] 220504.7
sd(brazil_loss$pasture)
## [1] 850381.8
sd(brazil_loss$selective_logging)
## [1] 38021.59
sd(brazil_loss$fire)
## [1] 176505.6
sd(brazil_loss$mining)
## [1] 9713.855
sd(brazil_loss$other_infrastructure)
## [1] 4424.582
sd(brazil_loss$small_scale_clearing)
## [1] 54051.14
sd(brazil_loss$roads)
## [1] 15140.79
sd(brazil_loss$tree_plantations_including_palm)
## [1] 20453.83
#standard error
# length() on a data frame returns the number of columns (14),
# so nrow() is used to get the number of observations (n = 13)
n.obs = nrow(brazil_loss)
SE.ND = sd(brazil_loss$natural_disturbances)/sqrt(n.obs)
SE.ND
## [1] 5919.996
SE.Fl = sd(brazil_loss$flooding_due_to_dams)/sqrt(n.obs)
SE.Fl
## [1] 5899.136
SE.CC = sd(brazil_loss$commercial_crops)/sqrt(n.obs)
SE.CC
## [1] 61156.99
SE.P = sd(brazil_loss$pasture)/sqrt(n.obs)
SE.P
## [1] 235853.5
SE.Sl = sd(brazil_loss$selective_logging)/sqrt(n.obs)
SE.Sl
## [1] 10545.29
SE.Fire = sd(brazil_loss$fire)/sqrt(n.obs)
SE.Fire
## [1] 48953.85
SE.M = sd(brazil_loss$mining)/sqrt(n.obs)
SE.M
## [1] 2694.139
SE.OI = sd(brazil_loss$other_infrastructure)/sqrt(n.obs)
SE.OI
## [1] 1227.158
SE.SSC = sd(brazil_loss$small_scale_clearing)/sqrt(n.obs)
SE.SSC
## [1] 14991.09
SE.R = sd(brazil_loss$roads)/sqrt(n.obs)
SE.R
## [1] 4199.3
SE.TPIP = sd(brazil_loss$tree_plantations_including_palm)/sqrt(n.obs)
SE.TPIP
## [1] 5672.87
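
The four statistics computed one column at a time in this part can also be collected into a single table. The sketch below is one possible way to do that in base R, assuming an illustrative subset of the data (the first six years of three causes, copied from the `head()` output earlier in the report); `brazil_head`, `std_error`, and `descriptives` are hypothetical names. The standard error is taken as sd / sqrt(n), with n the number of yearly observations in each column.

```r
# Illustrative subset: first six years of three causes,
# copied from the head() output in this report.
brazil_head <- data.frame(
  year = 2001:2006,
  commercial_crops = c(280000, 415000, 550000, 747000, 328000, 188000),
  flooding_due_to_dams = c(0, 79000, 0, 26000, 17000, 17000),
  natural_disturbances = c(0, 35000, 35000, 22000, 26000, 26000)
)

cause_cols <- setdiff(names(brazil_head), "year")

# Standard error of the mean: sd / sqrt(n), n = number of observations.
std_error <- function(x) sd(x) / sqrt(length(x))

# One row per cause, one column per statistic.
descriptives <- data.frame(
  cause  = cause_cols,
  mean   = sapply(brazil_head[cause_cols], mean),
  median = sapply(brazil_head[cause_cols], median),
  sd     = sapply(brazil_head[cause_cols], sd),
  se     = sapply(brazil_head[cause_cols], std_error),
  row.names = NULL
)

descriptives
```

Because only six years are used here, the numbers differ from the full 13-year results reported in this part.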

Visuals of the data presented (histograms)

This portion visualizes the data directly, before any statistical tests are run. The histograms show the initial skew in the area lost to each cause of deforestation. Seeing this skew helps support the statistical tests performed later on the data.

#histograms supporting data sets 
ggplot(brazil_loss, aes(x = natural_disturbances)) +
  geom_histogram(bins = 15, fill = "lightblue", color = "black") + theme_classic()

ggplot(brazil_loss, aes(x = flooding_due_to_dams)) +
  geom_histogram(bins = 15, fill = "lightblue", color = "black") + theme_classic()

ggplot(brazil_loss, aes(x = commercial_crops)) +
  geom_histogram(bins = 15, fill = "lightblue", color = "black") + theme_classic()

ggplot(brazil_loss, aes(x = pasture)) +
  geom_histogram(bins = 15, fill = "lightblue", color = "black") + theme_classic()

ggplot(brazil_loss, aes(x = selective_logging)) +
  geom_histogram(bins = 15, fill = "lightblue", color = "black") + theme_classic()

ggplot(brazil_loss, aes(x = fire)) +
  geom_histogram(bins = 15, fill = "lightblue", color = "black") + theme_classic()

ggplot(brazil_loss, aes(x = mining)) +
  geom_histogram(bins = 15, fill = "lightblue", color = "black") + theme_classic()

ggplot(brazil_loss, aes(x = other_infrastructure)) +
  geom_histogram(bins = 15, fill = "lightblue", color = "black") + theme_classic()

ggplot(brazil_loss, aes(x = small_scale_clearing)) +
  geom_histogram(bins = 15, fill = "lightblue", color = "black") + theme_classic()

ggplot(brazil_loss, aes(x = roads)) +
  geom_histogram(bins = 15, fill = "lightblue", color = "black") + theme_classic()

ggplot(brazil_loss, aes(x = tree_plantations_including_palm)) +
  geom_histogram(bins = 15, fill = "lightblue", color = "black") + theme_classic()
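
The eleven histograms above share the same settings, so they could also be drawn in one faceted figure. This is a minimal sketch rather than the method used in the report: it assumes the same illustrative six-year subset copied from the `head()` output, reshapes it with `tidyr::pivot_longer()`, and uses the hypothetical names `brazil_head`, `brazil_long`, and `hist_plot`.

```r
library(ggplot2)
library(tidyr)

# Illustrative subset: first six years of three causes,
# copied from the head() output in this report.
brazil_head <- data.frame(
  year = 2001:2006,
  commercial_crops = c(280000, 415000, 550000, 747000, 328000, 188000),
  flooding_due_to_dams = c(0, 79000, 0, 26000, 17000, 17000),
  natural_disturbances = c(0, 35000, 35000, 22000, 26000, 26000)
)

# Long format: one row per year and cause.
brazil_long <- pivot_longer(
  brazil_head,
  cols = -year,
  names_to = "cause",
  values_to = "hectares"
)

# One panel per cause; free x scales because the causes differ
# in magnitude by orders of magnitude.
hist_plot <- ggplot(brazil_long, aes(x = hectares)) +
  geom_histogram(bins = 15, fill = "lightblue", color = "black") +
  facet_wrap(~ cause, scales = "free_x") +
  labs(x = "Forest loss (hectares)", y = "Number of years") +
  theme_classic()

hist_plot
```

Free x scales keep small causes readable next to large ones, at the cost of making the panels harder to compare directly.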

Part 3 - Statistical Test (Inferential Statistics)

The statistical portion of this comparison examines how the area lost to each cause of deforestation in Brazil changed over the study years. For each cause, a linear regression of area lost on year is fitted, and the model summary gives a quick overview of what the data are going to present. Within this portion we focus on the R-squared and adjusted R-squared values and the p-values. We then plot a histogram of the residuals to see whether they are skewed in either direction. Next, a Q-Q plot (qqnorm/qqline) is used to check whether the residuals depart from a normal distribution, for example through outliers. Finally, a Shapiro-Wilk test (shapiro.test) is performed on the residuals as a formal test of normality, because the p-values in the regression summary assume normally distributed residuals. Its p-value tests normality, not the trend itself. Using these tests in combination allows us to compare the trends across causes and helps determine which has the greatest effect on deforestation.

natural.model<- lm(natural_disturbances~year, data = brazil_loss)
natural.model
## 
## Call:
## lm(formula = natural_disturbances ~ year, data = brazil_loss)
## 
## Coefficients:
## (Intercept)         year  
##    -4798495         2407
summary(natural.model)
## 
## Call:
## lm(formula = natural_disturbances ~ year, data = brazil_loss)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -32978  -9538  -2319   8429  45835 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4798494    2979820  -1.610    0.136
## year            2407       1485   1.621    0.133
## 
## Residual standard error: 20030 on 11 degrees of freedom
## Multiple R-squared:  0.1928, Adjusted R-squared:  0.1194 
## F-statistic: 2.627 on 1 and 11 DF,  p-value: 0.1333
hist(resid(natural.model))

qqnorm(resid(natural.model))
qqline(resid(natural.model))

shapiro.test(resid(natural.model))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(natural.model)
## W = 0.9477, p-value = 0.5639
## The relationship between year and natural disturbances is not statistically significant (p-value: 0.1333).
## The low R-squared values (Multiple R-squared:  0.1928, Adjusted R-squared:  0.1194)
## show that year explains only a small proportion of the variance in natural disturbances. The Shapiro-Wilk test
## (W = 0.9477, p-value = 0.5639) shows that the residuals do not deviate significantly from a normal distribution.
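
As a worked check on how the two R-squared values in this summary relate, the adjusted value can be recovered from the multiple R-squared, the number of observations, and the number of predictors. The sketch below uses the figures printed by `summary(natural.model)`; the variable names are illustrative.

```r
# Values reported by summary(natural.model) in this report.
r_squared    <- 0.1928
n_obs        <- 13  # yearly observations, 2001 to 2013
n_predictors <- 1   # year

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r_squared <- 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

round(adj_r_squared, 4)  # 0.1194, matching the summary output
```

With only 13 observations, the penalty is noticeable: the adjusted value drops from 0.1928 to 0.1194.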

flood.model <- lm(flooding_due_to_dams~year, data = brazil_loss)
flood.model
## 
## Call:
## lm(formula = flooding_due_to_dams ~ year, data = brazil_loss)
## 
## Coefficients:
## (Intercept)         year  
##     3907390        -1940
summary(flood.model)
## 
## Call:
## lm(formula = flooding_due_to_dams ~ year, data = brazil_loss)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -26330  -8874  -1813   5489  54610 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  3907390    3089536   1.265    0.232
## year           -1940       1539  -1.260    0.234
## 
## Residual standard error: 20770 on 11 degrees of freedom
## Multiple R-squared:  0.1261, Adjusted R-squared:  0.04667 
## F-statistic: 1.588 on 1 and 11 DF,  p-value: 0.2338
hist(resid(flood.model))

qqnorm(resid(flood.model))
qqline(resid(flood.model))

shapiro.test(resid(flood.model))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(flood.model)
## W = 0.85068, p-value = 0.02909
## The model indicates a negative relationship between year and flooding due to dams,
## but it is not statistically significant, as shown by the low R-squared value (Multiple R-squared:  0.1261)
## and the non-significant p-value (p-value: 0.2338). The Shapiro-Wilk test (W = 0.85068, p-value = 0.02909) also suggests that the residuals are not normally distributed.

crops.model <- lm(commercial_crops~year, data = brazil_loss)
crops.model
## 
## Call:
## lm(formula = commercial_crops ~ year, data = brazil_loss)
## 
## Coefficients:
## (Intercept)         year  
##    80228132       -39857
summary(crops.model)
## 
## Call:
## lm(formula = commercial_crops ~ year, data = brazil_loss)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -193989  -98132  -19132   82440  392582 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 80228132   24335690   3.297  0.00712 **
## year          -39857      12125  -3.287  0.00724 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 163600 on 11 degrees of freedom
## Multiple R-squared:  0.4955, Adjusted R-squared:  0.4497 
## F-statistic:  10.8 on 1 and 11 DF,  p-value: 0.007242
hist(resid(crops.model))

qqnorm(resid(crops.model))
qqline(resid(crops.model))

shapiro.test(resid(crops.model))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(crops.model)
## W = 0.90698, p-value = 0.1667
## The model shows a statistically significant negative linear trend: forest loss to commercial crops decreased over time.
## The R-squared value (0.4955) indicates a moderate fit, and the p-value (p-value: 0.007242) confirms a significant relationship between year and forest loss to commercial crops.
## The Shapiro-Wilk test (W = 0.90698, p-value = 0.1667) gives a p-value greater than 0.05, so the null hypothesis of normality cannot be rejected; the residuals are consistent with a normal distribution.

pasture.model <- lm(pasture~year, data = brazil_loss)
pasture.model
## 
## Call:
## lm(formula = pasture ~ year, data = brazil_loss)
## 
## Coefficients:
## (Intercept)         year  
##   367100429      -182132
summary(pasture.model)
## 
## Call:
## lm(formula = pasture ~ year, data = brazil_loss)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1134560  -105110    15231   226022   738967 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 367100429   72888295   5.036 0.000380 ***
## year          -182132      36317  -5.015 0.000393 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 489900 on 11 degrees of freedom
## Multiple R-squared:  0.6957, Adjusted R-squared:  0.6681 
## F-statistic: 25.15 on 1 and 11 DF,  p-value: 0.0003931
hist(resid(pasture.model))

qqnorm(resid(pasture.model))
qqline(resid(pasture.model))

shapiro.test(resid(pasture.model))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(pasture.model)
## W = 0.93884, p-value = 0.4421
## The model shows a statistically significant negative linear trend, indicating that forest loss to pasture decreased over the years.
## The R-squared values (Multiple R-squared:  0.6957, Adjusted R-squared:  0.6681) indicate a strong fit, and the p-value (p-value: 0.0003931) indicates that the overall model is highly significant.
## The Shapiro-Wilk test (W = 0.93884, p-value = 0.4421) does not reject the null hypothesis of normality, so the residuals are consistent with a normal distribution.
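
To make the negative slope concrete, the fitted line can be evaluated at the first and last study years. This sketch uses the coefficients as printed by `pasture.model`; they are rounded in the output, so the fitted values are approximate. The function name `fitted_pasture` is hypothetical.

```r
# Coefficients as printed for pasture.model; rounded in the output,
# so the fitted values below are approximate.
intercept <- 367100429
slope     <- -182132  # hectares per year

fitted_pasture <- function(year) intercept + slope * year

fitted_2001 <- fitted_pasture(2001)
fitted_2013 <- fitted_pasture(2013)
c(`2001` = fitted_2001, `2013` = fitted_2013)

# Total fitted change across the 12-year span.
fitted_2013 - fitted_2001
```

Under these rounded coefficients, the fitted annual loss to pasture falls from roughly 2.65 million hectares in 2001 to roughly 0.47 million hectares in 2013.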

selective.model <- lm(selective_logging~year, data = brazil_loss)
selective.model
## 
## Call:
## lm(formula = selective_logging ~ year, data = brazil_loss)
## 
## Coefficients:
## (Intercept)         year  
##   193065.93       -43.96
summary(selective.model)
## 
## Call:
## lm(formula = selective_logging ~ year, data = brazil_loss)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -60670 -17758  -8846  26418  61374 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  193065.93 5907891.95   0.033    0.975
## year            -43.96    2943.64  -0.015    0.988
## 
## Residual standard error: 39710 on 11 degrees of freedom
## Multiple R-squared:  2.027e-05,  Adjusted R-squared:  -0.09089 
## F-statistic: 0.000223 on 1 and 11 DF,  p-value: 0.9884
hist(resid(selective.model))

qqnorm(resid(selective.model))
qqline(resid(selective.model))

shapiro.test(resid(selective.model))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(selective.model)
## W = 0.95957, p-value = 0.7471
## The relationship between year and selective logging shows no statistically significant linear trend (p-value: 0.9884).
## The R-squared values (Multiple R-squared:  2.027e-05,    Adjusted R-squared:  -0.09089) are extremely low, meaning that year explains essentially none of the variation in selective logging.
## The Shapiro-Wilk test (W = 0.95957, p-value = 0.7471) does not reject the null hypothesis of normality, so the residuals are consistent with a normal distribution.

fire.model <- lm(fire~year, data = brazil_loss)
fire.model
## 
## Call:
## lm(formula = fire ~ year, data = brazil_loss)
## 
## Coefficients:
## (Intercept)         year  
##    -8212159         4170
summary(fire.model)
## 
## Call:
## lm(formula = fire ~ year, data = brazil_loss)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -156714 -106670  -74522  -22841  366797 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8212159   27309823  -0.301    0.769
## year            4170      13607   0.306    0.765
## 
## Residual standard error: 183600 on 11 degrees of freedom
## Multiple R-squared:  0.008467,   Adjusted R-squared:  -0.08167 
## F-statistic: 0.09393 on 1 and 11 DF,  p-value: 0.765
hist(resid(fire.model))

qqnorm(resid(fire.model))
qqline(resid(fire.model))

shapiro.test(resid(fire.model))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(fire.model)
## W = 0.73636, p-value = 0.001315
## The relationship between year and fire-related losses is not statistically significant (p-value: 0.765).
## The R-squared values (Multiple R-squared:  0.008467, Adjusted R-squared:  -0.08167) show that year explains almost none of the variance in forest loss from fire.
## The Shapiro-Wilk test (W = 0.73636, p-value = 0.001315) indicates that the residuals are not normally distributed.

mining.model <- lm(mining~year, data = brazil_loss)
mining.model
## 
## Call:
## lm(formula = mining ~ year, data = brazil_loss)
## 
## Coefficients:
## (Intercept)         year  
##  -1449857.1        725.3
summary(mining.model)
## 
## Call:
## lm(formula = mining ~ year, data = brazil_loss)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -9396  -5044  -3593   3231  24879 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1449857.1  1444161.3  -1.004    0.337
## year             725.3      719.6   1.008    0.335
## 
## Residual standard error: 9707 on 11 degrees of freedom
## Multiple R-squared:  0.08455,    Adjusted R-squared:  0.001327 
## F-statistic: 1.016 on 1 and 11 DF,  p-value: 0.3351
hist(resid(mining.model))

qqnorm(resid(mining.model))
qqline(resid(mining.model))

shapiro.test(resid(mining.model))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(mining.model)
## W = 0.83302, p-value = 0.01729
## The relationship between year and mining is not statistically significant (p-value: 0.3351).
## Year explains only a small proportion of the variance in forest loss from mining (Multiple R-squared:  0.08455, Adjusted R-squared:  0.001327), indicating a poor fit.
## The Shapiro-Wilk test (W = 0.83302, p-value = 0.01729) gives a p-value below 0.05, so the null hypothesis of normality is rejected; the residuals are not normally distributed.

infastructure.model <- lm(other_infrastructure~year, data = brazil_loss)
infastructure.model
## 
## Call:
## lm(formula = other_infrastructure ~ year, data = brazil_loss)
## 
## Coefficients:
## (Intercept)         year  
##    870219.8       -428.6
summary(infastructure.model)
## 
## Call:
## lm(formula = other_infrastructure ~ year, data = brazil_loss)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8362.6 -2791.2   208.8  2065.9  7351.6 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 870219.8   636718.8   1.367    0.199
## year          -428.6      317.2  -1.351    0.204
## 
## Residual standard error: 4280 on 11 degrees of freedom
## Multiple R-squared:  0.1423, Adjusted R-squared:  0.06432 
## F-statistic: 1.825 on 1 and 11 DF,  p-value: 0.2039
hist(resid(infastructure.model))

qqnorm(resid(infastructure.model))
qqline(resid(infastructure.model))

shapiro.test(resid(infastructure.model))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(infastructure.model)
## W = 0.98627, p-value = 0.9973
## The relationship between year and other infrastructure is not statistically significant (p-value: 0.2039).
## The model fit is weak, as indicated by the low R-squared values (Multiple R-squared:  0.1423, Adjusted R-squared:  0.06432).
## The Shapiro-Wilk test (W = 0.98627, p-value = 0.9973) gives a p-value greater than 0.05, so the null hypothesis of normality cannot be rejected; the residuals are consistent with a normal distribution.

small_scale.model<- lm(small_scale_clearing~year, data = brazil_loss)
small_scale.model
## 
## Call:
## lm(formula = small_scale_clearing ~ year, data = brazil_loss)
## 
## Coefficients:
## (Intercept)         year  
##     8664593        -4165
summary(small_scale.model)
## 
## Call:
## lm(formula = small_scale_clearing ~ year, data = brazil_loss)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -81758 -26099  -3934   4231  96736 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  8664593    8011627   1.082    0.303
## year           -4165       3992  -1.043    0.319
## 
## Residual standard error: 53850 on 11 degrees of freedom
## Multiple R-squared:  0.09005,    Adjusted R-squared:  0.007326 
## F-statistic: 1.089 on 1 and 11 DF,  p-value: 0.3192
hist(resid(small_scale.model))

qqnorm(resid(small_scale.model))
qqline(resid(small_scale.model))

shapiro.test(resid(small_scale.model))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(small_scale.model)
## W = 0.91471, p-value = 0.2126
## The relationship between year and small-scale clearing is not statistically significant (p-value: 0.3192).
## The model fit is weak, as indicated by the low R-squared values (Multiple R-squared:  0.09005, Adjusted R-squared:  0.007326).
## The Shapiro-Wilk test (W = 0.91471, p-value = 0.2126) gives a p-value greater than 0.05, so the null hypothesis of normality cannot be rejected; the residuals are consistent with a normal distribution.

roads.model <- lm(roads~year, data = brazil_loss)
roads.model
## 
## Call:
## lm(formula = roads ~ year, data = brazil_loss)
## 
## Coefficients:
## (Intercept)         year  
##     2297582        -1132
summary(roads.model)
## 
## Call:
## lm(formula = roads ~ year, data = brazil_loss)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -19714.3 -12395.6   -582.4   6813.2  27681.3 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  2297582    2250728   1.021    0.329
## year           -1132       1121  -1.009    0.335
## 
## Residual standard error: 15130 on 11 degrees of freedom
## Multiple R-squared:  0.08476,    Adjusted R-squared:  0.001555 
## F-statistic: 1.019 on 1 and 11 DF,  p-value: 0.3345
hist(resid(roads.model))

qqnorm(resid(roads.model))
qqline(resid(roads.model))

shapiro.test(resid(roads.model))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(roads.model)
## W = 0.94073, p-value = 0.4664
## The relationship between year and roads is not statistically significant (p-value: 0.3345).
## The model fit is weak, as indicated by the low R-squared values (Multiple R-squared:  0.08476, Adjusted R-squared:  0.001555).
## The Shapiro-Wilk test (W = 0.94073, p-value = 0.4664) gives a p-value greater than 0.05, so the null hypothesis of normality cannot be rejected; the residuals are consistent with a normal distribution.

plantation.model <- lm(tree_plantations_including_palm~year, data = brazil_loss)
plantation.model
## 
## Call:
## lm(formula = tree_plantations_including_palm ~ year, data = brazil_loss)
## 
## Coefficients:
## (Intercept)         year  
##     4844209        -2396
summary(plantation.model)
## 
## Call:
## lm(formula = tree_plantations_including_palm ~ year, data = brazil_loss)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -20044 -12626   -648   3560  48582 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  4844209    2828327   1.713    0.115
## year           -2396       1409  -1.700    0.117
## 
## Residual standard error: 19010 on 11 degrees of freedom
## Multiple R-squared:  0.2081, Adjusted R-squared:  0.1361 
## F-statistic:  2.89 on 1 and 11 DF,  p-value: 0.1172
hist(resid(plantation.model))

qqnorm(resid(plantation.model))
qqline(resid(plantation.model))

shapiro.test(resid(plantation.model))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(plantation.model)
## W = 0.8623, p-value = 0.04131
## The relationship between year and tree plantations is not statistically significant (p-value: 0.1172).
## The model fit is weak, as indicated by the R-squared values (Multiple R-squared:  0.2081, Adjusted R-squared:  0.1361).
## The Shapiro-Wilk test (W = 0.8623, p-value = 0.04131) suggests that the residuals are not normally distributed.
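
Each of the eleven models above follows the same steps: fit the cause against year, read off the slope, p-value, and R-squared values, then run a Shapiro-Wilk test on the residuals. One possible way to collect those quantities in a single table is sketched below. It assumes an illustrative six-year subset of three causes copied from the `head()` output, so its numbers will not match the full 13-year results; `trend_summary`, `brazil_head`, and `trend_table` are hypothetical names.

```r
# Illustrative subset: first six years of three causes,
# copied from the head() output in this report.
brazil_head <- data.frame(
  year = 2001:2006,
  commercial_crops = c(280000, 415000, 550000, 747000, 328000, 188000),
  flooding_due_to_dams = c(0, 79000, 0, 26000, 17000, 17000),
  natural_disturbances = c(0, 35000, 35000, 22000, 26000, 26000)
)

# Fit cause ~ year and return the quantities read off each model.
trend_summary <- function(data, cause) {
  model <- lm(reformulate("year", response = cause), data = data)
  model_summary <- summary(model)
  data.frame(
    cause           = cause,
    slope           = unname(coef(model)["year"]),
    slope_p_value   = model_summary$coefficients["year", "Pr(>|t|)"],
    r_squared       = model_summary$r.squared,
    adj_r_squared   = model_summary$adj.r.squared,
    shapiro_p_value = shapiro.test(resid(model))$p.value
  )
}

cause_cols <- setdiff(names(brazil_head), "year")

# One row per cause.
trend_table <- do.call(
  rbind,
  lapply(cause_cols, function(cause) trend_summary(brazil_head, cause))
)

trend_table
```

A table like this puts the trend p-value and the normality p-value side by side, which makes it harder to confuse the two when comparing causes.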

Scatterplots with fitted regression lines give another visualization of how the area lost to each cause changed over the years.

ggplot(brazil_loss, aes(x = year, y = commercial_crops)) + 
    geom_point(size = 3, col = "black") + 
    geom_smooth(method = "lm", se = FALSE, col = "black") +
    labs(x = "year", y = "Commercial Crops (hectares)") + 
    theme_classic()
## `geom_smooth()` using formula = 'y ~ x'

ggplot(brazil_loss, aes(x = year, y = natural_disturbances)) + 
    geom_point(size = 3, col = "black") + 
    geom_smooth(method = "lm", se = FALSE, col = "black") +
    labs(x = "year", y = "Natural disturbances (hectares)") + 
    theme_classic()
## `geom_smooth()` using formula = 'y ~ x'

ggplot(brazil_loss, aes(x = year, y = flooding_due_to_dams)) + 
    geom_point(size = 3, col = "black") + 
    geom_smooth(method = "lm", se = FALSE, col = "black") +
    labs(x = "year", y = "Flooding (hectares)") + 
    theme_classic()
## `geom_smooth()` using formula = 'y ~ x'

ggplot(brazil_loss, aes(x = year, y = pasture)) + 
    geom_point(size = 3, col = "black") + 
    geom_smooth(method = "lm", se = FALSE, col = "black") +
    labs(x = "year", y = "Pasture (hectares)") + 
    theme_classic()
## `geom_smooth()` using formula = 'y ~ x'

ggplot(brazil_loss, aes(x = year, y = selective_logging)) + 
    geom_point(size = 3, col = "black") + 
    geom_smooth(method = "lm", se = FALSE, col = "black") +
    labs(x = "year", y = "Selective Logging (hectares)") + 
    theme_classic()
## `geom_smooth()` using formula = 'y ~ x'

ggplot(brazil_loss, aes(x = year, y = fire)) + 
    geom_point(size = 3, col = "black") + 
    geom_smooth(method = "lm", se = FALSE, col = "black") +
    labs(x = "year", y = "Fire (hectares)") + 
    theme_classic()
## `geom_smooth()` using formula = 'y ~ x'

ggplot(brazil_loss, aes(x = year, y = mining)) + 
    geom_point(size = 3, col = "black") + 
    geom_smooth(method = "lm", se = FALSE, col = "black") +
    labs(x = "year", y = "Mining (hectares)") + 
    theme_classic()
## `geom_smooth()` using formula = 'y ~ x'

ggplot(brazil_loss, aes(x = year, y = other_infrastructure)) + 
    geom_point(size = 3, col = "black") + 
    geom_smooth(method = "lm", se = FALSE, col = "black") +
    labs(x = "year", y = "Infrastructure (hectares)") + 
    theme_classic()
## `geom_smooth()` using formula = 'y ~ x'

ggplot(brazil_loss, aes(x = year, y = roads)) + 
    geom_point(size = 3, col = "black") + 
    geom_smooth(method = "lm", se = FALSE, col = "black") +
    labs(x = "year", y = "Roads (hectares)") + 
    theme_classic()
## `geom_smooth()` using formula = 'y ~ x'

ggplot(brazil_loss, aes(x = year, y = tree_plantations_including_palm)) + 
    geom_point(size = 3, col = "black") + 
    geom_smooth(method = "lm", se = FALSE, col = "black") +
    labs(x = "year", y = "Tree Plantations (hectares)") + 
    theme_classic()
## `geom_smooth()` using formula = 'y ~ x'

ggplot(brazil_loss, aes(x = year, y = small_scale_clearing)) + 
    geom_point(size = 3, col = "black") + 
    geom_smooth(method = "lm", se = FALSE, col = "black") +
    labs(x = "year", y = "Small Scale Clearing (hectares)") + 
    theme_classic()
## `geom_smooth()` using formula = 'y ~ x'

Part 4 - Conclusion

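In conclusion, pasture and commercial crops were determined to have the greatest effect on deforestation in Brazil. Pasture had by far the largest mean annual loss (about 1.56 million hectares), and pasture and commercial crops were the only causes whose regressions on year were statistically significant (p-values of 0.0003931 and 0.007242), with the highest R-squared values (0.6957 and 0.4955). Both slopes were negative, so the area lost to these two causes declined between 2001 and 2013. The Shapiro-Wilk tests did not reject normality of the residuals for either model, which supports these results. All other causes, including natural disturbances (a mean of about 31,500 hectares per year), contributed to deforestation, but none showed a statistically significant trend over time.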
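
The natural versus human-caused question can also be approached directly from the Part 2 means. The sketch below ranks the causes by their share of the summed mean annual loss, using the values printed by the `mean()` calls. Counting only `natural_disturbances` as natural and every other column as human-caused is an assumption made here for illustration, and `mean_loss`, `share_pct`, `natural_pct`, and `human_pct` are hypothetical names.

```r
# Mean annual loss per cause (hectares), copied from the mean() output in Part 2.
mean_loss <- c(
  natural_disturbances            = 31538.46,
  flooding_due_to_dams            = 14692.31,
  commercial_crops                = 234846.2,
  pasture                         = 1561769,
  selective_logging               = 104846.2,
  fire                            = 157692.3,
  mining                          = 5769.231,
  other_infrastructure            = 10076.92,
  small_scale_clearing            = 305769.2,
  roads                           = 25923.08,
  tree_plantations_including_palm = 36230.77
)

# Share of the summed mean annual loss for each cause, largest first.
share_pct <- sort(100 * mean_loss / sum(mean_loss), decreasing = TRUE)
round(share_pct, 1)

# Assumption: only `natural_disturbances` counts as natural;
# all other columns are treated as human-caused.
natural_pct <- unname(share_pct["natural_disturbances"])
human_pct   <- 100 - natural_pct
round(c(natural = natural_pct, human = human_pct), 1)
```

Under that grouping, natural disturbances make up a little over 1 percent of the summed mean annual loss, with pasture alone contributing more than 60 percent.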

References

Hannah Ritchie (2021) - “Drivers of Deforestation” Published online at OurWorldinData.org. Retrieved from: ‘https://ourworldindata.org/drivers-of-deforestation’ [Online Resource]

“The Cerrado Crisis: Brazil’s Deforestation Frontline.” Global Witness, 21 Feb. 2024, https://globalwitness.org/en/campaigns/forests/the-cerrado-crisis-brazils-deforestation-frontline/

Pallares, Gloria. “What’s Happening with Deforestation in the Amazon?” ThinkLandscape, Global Landscapes Forum, 21 Mar. 2024, https://thinklandscape.globallandscapesforum.org/67561/whats-happening-with-deforestation-in-the-amazon/

jonthegeek. 2021. Deforestation. Tidy Tuesday. https://github.com/rfordatascience/tidytuesday/blob/main/data/2021/2021-04-06/readme.md