The data presented here describe the area of forest and overall tree cover lost to deforestation worldwide, with a focus on Brazil.
Deforestation has many causes throughout the world. This analysis focuses on the different causes of deforestation affecting Brazil, and the data give the area affected by each cause within the country. A very large portion of global deforestation takes place in Brazil. According to Hannah Ritchie, “One-third of tropical deforestation happened in Brazil. That was 1.7 million hectares each year. The other single country where large forest areas are lost is Indonesia – it accounted for 14%. This means around half (47%) of tropical deforestation occurred in Brazil and Indonesia.” (2021) This gives a quick sense of how vast the area of tropical deforestation in Brazil is compared to the rest of the world. The expansion of pasture for beef production, croplands for soy and palm oil, and the increasing conversion of primary forest to tree plantations for paper and pulp have been the key drivers of deforestation, primarily in Brazil and Indonesia. In 2022, the world lost approximately 4 million hectares of primary forest, about ten percent more than in 2021, and Brazil accounted for about 40 percent of that loss. A 2020 study found that two-thirds of deforested land in the Amazon and Cerrado is used for cattle pasture, which has contributed to Brazil doubling its meat exports. Examining how each of these causes affects deforestation in Brazil will help determine which has the greatest effect.
The main question of this analysis is: How strongly does each cause of deforestation contribute to the overall deforestation occurring in Brazil? A particular focus is whether natural disturbances or human-caused disturbances have the greater effect on deforestation in Brazil.
To explain the setup of the analysis a little further: the data are separated into a response variable and an explanatory variable. The response variable is the area of forest and overall land affected by each cause, measured in hectares. The explanatory variable is the individual cause of deforestation within Brazil.
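The data set stores each cause in its own column, so "cause" is not yet a single explanatory variable. As a rough sketch of how the response/explanatory layout could be made explicit, the base R code below stacks a few columns into a long table with one `hectares` column and one `cause` column. The small `wide` data frame is only an illustrative subset (three years of three columns, copied from the first rows of the data), and the names `wide`, `long`, `hectares`, and `cause` are assumptions made for this example.

```r
# Illustrative subset only: three years of three cause columns.
wide <- data.frame(
  year = 2001:2003,
  commercial_crops = c(280000, 415000, 550000),
  flooding_due_to_dams = c(0, 79000, 0),
  natural_disturbances = c(0, 35000, 35000)
)

# stack() places every cause column in one 'values' column and records
# the originating column name in 'ind'.
long <- stack(wide[, names(wide) != "year"])
names(long) <- c("hectares", "cause")  # response, explanatory

# stack() concatenates the columns in order, so the years repeat once per cause.
long$year <- rep(wide$year, times = ncol(wide) - 1)

long
```

In this layout each row is one cause in one year, which matches the description of hectares as the response and cause as the explanatory variable.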
#Load Libraries and Data
library(ggplot2)
library(ggpubr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
library('broom')
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ lubridate 1.9.4 ✔ tibble 3.2.1
## ✔ purrr 1.0.4 ✔ tidyr 1.3.1
## ✔ readr 2.1.5
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ car::recode() masks dplyr::recode()
## ✖ purrr::some() masks car::some()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(datasets)
library(rstatix)
##
## Attaching package: 'rstatix'
##
## The following object is masked from 'package:stats':
##
## filter
library(RColorBrewer)
brazil_loss <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2021/2021-04-06/brazil_loss.csv')
## Rows: 13 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): entity, code
## dbl (12): year, commercial_crops, flooding_due_to_dams, natural_disturbances...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
brazil.df <- data.frame(brazil_loss)
head(brazil_loss)
## # A tibble: 6 × 14
## entity code year commercial_crops flooding_due_to_dams natural_disturbances
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Brazil BRA 2001 280000 0 0
## 2 Brazil BRA 2002 415000 79000 35000
## 3 Brazil BRA 2003 550000 0 35000
## 4 Brazil BRA 2004 747000 26000 22000
## 5 Brazil BRA 2005 328000 17000 26000
## 6 Brazil BRA 2006 188000 17000 26000
## # ℹ 8 more variables: pasture <dbl>, selective_logging <dbl>, fire <dbl>,
## # mining <dbl>, other_infrastructure <dbl>, roads <dbl>,
## # tree_plantations_including_palm <dbl>, small_scale_clearing <dbl>
summary(brazil_loss)
## entity code year commercial_crops
## Length:13 Length:13 Min. :2001 Min. : 52000
## Class :character Class :character 1st Qu.:2004 1st Qu.: 79000
## Mode :character Mode :character Median :2007 Median :118000
## Mean :2007 Mean :234846
## 3rd Qu.:2010 3rd Qu.:328000
## Max. :2013 Max. :747000
## flooding_due_to_dams natural_disturbances pasture selective_logging
## Min. : 0 Min. : 0 Min. : 546000 Min. : 44000
## 1st Qu.: 0 1st Qu.:22000 1st Qu.: 738000 1st Qu.: 87000
## Median : 9000 Median :26000 Median :1520000 Median : 96000
## Mean :14692 Mean :31538 Mean :1561769 Mean :104846
## 3rd Qu.:17000 3rd Qu.:35000 3rd Qu.:2564000 3rd Qu.:131000
## Max. :79000 Max. :87000 Max. :2761000 Max. :166000
## fire mining other_infrastructure roads
## Min. : 26000 Min. : 0 Min. : 0 Min. : 9000
## 1st Qu.: 44000 1st Qu.: 0 1st Qu.: 9000 1st Qu.:13000
## Median : 79000 Median : 0 Median : 9000 Median :22000
## Mean :157692 Mean : 5769 Mean :10077 Mean :25923
## 3rd Qu.:122000 3rd Qu.: 9000 3rd Qu.:13000 3rd Qu.:35000
## Max. :537000 Max. :35000 Max. :17000 Max. :57000
## tree_plantations_including_palm small_scale_clearing
## Min. : 9000 Min. :232000
## 1st Qu.:26000 1st Qu.:271000
## Median :35000 Median :293000
## Mean :36231 Mean :305769
## 3rd Qu.:44000 3rd Qu.:310000
## Max. :92000 Max. :415000
This portion shows the mean, median, standard deviation, and standard error for each cause of deforestation. These statistics describe the area, in hectares, affected by each cause.
#mean, median, standard deviation, standard error
#mean
mean(brazil_loss$natural_disturbances)
## [1] 31538.46
mean(brazil_loss$flooding_due_to_dams)
## [1] 14692.31
mean(brazil_loss$commercial_crops)
## [1] 234846.2
mean(brazil_loss$pasture)
## [1] 1561769
mean(brazil_loss$selective_logging)
## [1] 104846.2
mean(brazil_loss$fire)
## [1] 157692.3
mean(brazil_loss$mining)
## [1] 5769.231
mean(brazil_loss$other_infrastructure)
## [1] 10076.92
mean(brazil_loss$small_scale_clearing)
## [1] 305769.2
mean(brazil_loss$roads)
## [1] 25923.08
mean(brazil_loss$tree_plantations_including_palm)
## [1] 36230.77
#median
median(brazil_loss$natural_disturbances)
## [1] 26000
median(brazil_loss$flooding_due_to_dams)
## [1] 9000
median(brazil_loss$commercial_crops)
## [1] 118000
median(brazil_loss$pasture)
## [1] 1520000
median(brazil_loss$selective_logging)
## [1] 96000
median(brazil_loss$fire)
## [1] 79000
median(brazil_loss$mining)
## [1] 0
median(brazil_loss$other_infrastructure)
## [1] 9000
median(brazil_loss$small_scale_clearing)
## [1] 293000
median(brazil_loss$roads)
## [1] 22000
median(brazil_loss$tree_plantations_including_palm)
## [1] 35000
#standard deviation
sd(brazil_loss$natural_disturbances)
## [1] 21344.85
sd(brazil_loss$flooding_due_to_dams)
## [1] 21269.64
sd(brazil_loss$commercial_crops)
## [1] 220504.7
sd(brazil_loss$pasture)
## [1] 850381.8
sd(brazil_loss$selective_logging)
## [1] 38021.59
sd(brazil_loss$fire)
## [1] 176505.6
sd(brazil_loss$mining)
## [1] 9713.855
sd(brazil_loss$other_infrastructure)
## [1] 4424.582
sd(brazil_loss$small_scale_clearing)
## [1] 54051.14
sd(brazil_loss$roads)
## [1] 15140.79
sd(brazil_loss$tree_plantations_including_palm)
## [1] 20453.83
#standard error
SE.ND = sd(brazil_loss$natural_disturbances)/sqrt(length((brazil_loss)))
SE.ND
## [1] 5704.651
SE.Fl = sd(brazil_loss$flooding_due_to_dams)/sqrt(length(brazil_loss))
SE.Fl
## [1] 5684.549
SE.CC = sd(brazil_loss$commercial_crops)/sqrt(length(brazil_loss))
SE.CC
## [1] 58932.35
SE.P = sd(brazil_loss$pasture)/sqrt(length(brazil_loss))
SE.P
## [1] 227274.1
SE.Sl = sd(brazil_loss$selective_logging)/sqrt(length(brazil_loss))
SE.Sl
## [1] 10161.7
SE.Fire = sd(brazil_loss$fire)/sqrt(length(brazil_loss))
SE.Fire
## [1] 47173.11
SE.M = sd(brazil_loss$mining)/sqrt(length(brazil_loss))
SE.M
## [1] 2596.137
SE.OI = sd(brazil_loss$other_infrastructure)/sqrt(length(brazil_loss))
SE.OI
## [1] 1182.519
SE.SSC = sd(brazil_loss$small_scale_clearing)/sqrt(length(brazil_loss))
SE.SSC
## [1] 14445.77
SE.R = sd(brazil_loss$roads)/sqrt(length(brazil_loss))
SE.R
## [1] 4046.547
SE.TPIP = sd(brazil_loss$tree_plantations_including_palm)/sqrt(length(brazil_loss))
SE.TPIP
## [1] 5466.515
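One caveat about the standard errors above: `length()` applied to a data frame returns the number of columns, and the import message reports 13 rows and 14 columns for `brazil_loss`. The values above therefore divide by `sqrt(14)` rather than by the square root of the 13 yearly observations. A minimal sketch of the conventional calculation, sd / sqrt(n) with n taken from the observations, is shown below. The `loss` data frame is only an illustrative subset (the six rows printed by `head(brazil_loss)`), and `se_mean` is a hypothetical helper, not part of the original analysis.

```r
# Illustrative subset: the six rows printed by head(brazil_loss).
loss <- data.frame(
  year = 2001:2006,
  commercial_crops = c(280000, 415000, 550000, 747000, 328000, 188000),
  flooding_due_to_dams = c(0, 79000, 0, 26000, 17000, 17000),
  natural_disturbances = c(0, 35000, 35000, 22000, 26000, 26000)
)

length(loss)  # number of columns (4 here), not the number of observations
nrow(loss)    # number of observations (6 here)

# Hypothetical helper: standard error of the mean, sd / sqrt(n),
# where n counts the non-missing observations in the column.
se_mean <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

# One row of summary statistics per cause column.
causes <- setdiff(names(loss), "year")
stats <- data.frame(
  cause  = causes,
  mean   = sapply(loss[causes], mean),
  median = sapply(loss[causes], median),
  sd     = sapply(loss[causes], sd),
  se     = sapply(loss[causes], se_mean),
  row.names = NULL
)
stats
```

Applied to the full data, the same helper would divide by `sqrt(13)`, giving slightly larger standard errors than those printed above.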
This portion gives an immediate visualization of the data, before any statistical tests are run. The histograms show the initial skew in the area of forest lost to each cause. Visualizing this skew helps support the statistical tests performed later on the data.
#histograms supporting data sets
ggplot(brazil_loss, aes(x = natural_disturbances)) +
geom_histogram(bins = 15, fill = "lightblue", color = "black") + theme_classic()
ggplot(brazil_loss, aes(x = flooding_due_to_dams)) +
geom_histogram(bins = 15, fill = "lightblue", color = "black") + theme_classic()
ggplot(brazil_loss, aes(x = commercial_crops)) +
geom_histogram(bins = 15, fill = "lightblue", color = "black") + theme_classic()
ggplot(brazil_loss, aes(x = pasture)) +
geom_histogram(bins = 15, fill = "lightblue", color = "black") + theme_classic()
ggplot(brazil_loss, aes(x = selective_logging)) +
geom_histogram(bins = 15, fill = "lightblue", color = "black") + theme_classic()
ggplot(brazil_loss, aes(x = fire)) +
geom_histogram(bins = 15, fill = "lightblue", color = "black") + theme_classic()
ggplot(brazil_loss, aes(x = mining)) +
geom_histogram(bins = 15, fill = "lightblue", color = "black") + theme_classic()
ggplot(brazil_loss, aes(x = other_infrastructure)) +
geom_histogram(bins = 15, fill = "lightblue", color = "black") + theme_classic()
ggplot(brazil_loss, aes(x = small_scale_clearing)) +
geom_histogram(bins = 15, fill = "lightblue", color = "black") + theme_classic()
ggplot(brazil_loss, aes(x = roads)) +
geom_histogram(bins = 15, fill = "lightblue", color = "black") + theme_classic()
ggplot(brazil_loss, aes(x = tree_plantations_including_palm)) +
geom_histogram(bins = 15, fill = "lightblue", color = "black") + theme_classic()
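As a numeric companion to the histograms, the skew can also be hinted at by comparing each mean with its median. The sketch below uses the means, medians, and standard deviations reported earlier in this document and computes (mean - median) / sd for each cause. This indicator is an assumption chosen for illustration, not a test used in the original analysis; the names `reported` and `skew_hint` are likewise made up for this example.

```r
# Means, medians and standard deviations as reported earlier in this document.
reported <- data.frame(
  cause = c("natural_disturbances", "flooding_due_to_dams", "commercial_crops",
            "pasture", "selective_logging", "fire", "mining",
            "other_infrastructure", "small_scale_clearing", "roads",
            "tree_plantations_including_palm"),
  mean = c(31538.46, 14692.31, 234846.2, 1561769, 104846.2, 157692.3,
           5769.231, 10076.92, 305769.2, 25923.08, 36230.77),
  median = c(26000, 9000, 118000, 1520000, 96000, 79000,
             0, 9000, 293000, 22000, 35000),
  sd = c(21344.85, 21269.64, 220504.7, 850381.8, 38021.59, 176505.6,
         9713.855, 4424.582, 54051.14, 15140.79, 20453.83)
)

# Rough, assumed skew indicator: (mean - median) / sd.
# A positive value means the mean is pulled above the median (right skew).
reported$skew_hint <- (reported$mean - reported$median) / reported$sd

# Causes ordered from the strongest to the weakest hint of right skew.
reported[order(-reported$skew_hint), c("cause", "skew_hint")]
```

A mean above the median for a cause is consistent with a few unusually high years pulling the distribution to the right, which is what the histograms are meant to show visually.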
The statistical portion of this comparison shows the overall differences between the causes contributing to deforestation in Brazil. For each cause, a linear model of the area lost against year is fitted, and the model summary gives a quick overview of what the data present. Within this portion we focus on the adjusted R-squared value and the p-values. We then use a histogram of the residuals to see whether they are skewed in either direction. Next, a normal Q-Q plot (qqnorm/qqline) is used to check whether the residuals depart from a normal distribution, for example because of outliers. Finally, a Shapiro-Wilk test is performed on the residuals as a formal test of normality. Its p-value concerns the normality of the residuals and is separate from the p-values in the model summary. Using these tests in combination allows us to compare the trend for each cause and helps determine which has the greatest effect on deforestation.
natural.model<- lm(natural_disturbances~year, data = brazil_loss)
natural.model
##
## Call:
## lm(formula = natural_disturbances ~ year, data = brazil_loss)
##
## Coefficients:
## (Intercept) year
## -4798495 2407
summary(natural.model)
##
## Call:
## lm(formula = natural_disturbances ~ year, data = brazil_loss)
##
## Residuals:
## Min 1Q Median 3Q Max
## -32978 -9538 -2319 8429 45835
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4798494 2979820 -1.610 0.136
## year 2407 1485 1.621 0.133
##
## Residual standard error: 20030 on 11 degrees of freedom
## Multiple R-squared: 0.1928, Adjusted R-squared: 0.1194
## F-statistic: 2.627 on 1 and 11 DF, p-value: 0.1333
hist(resid(natural.model))
qqnorm(resid(natural.model))
qqline(resid(natural.model))
shapiro.test(resid(natural.model))
##
## Shapiro-Wilk normality test
##
## data: resid(natural.model)
## W = 0.9477, p-value = 0.5639
## The relationship between year and natural disturbances is not statistically significant (p-value: 0.1333).
## The low R-squared values (Multiple R-squared: 0.1928, Adjusted R-squared: 0.1194)
## suggest that year explains only a small proportion of the variance. The Shapiro-Wilk test
## shows that the residuals do not significantly deviate from a normal distribution.
flood.model <- lm(flooding_due_to_dams~year, data = brazil_loss)
flood.model
##
## Call:
## lm(formula = flooding_due_to_dams ~ year, data = brazil_loss)
##
## Coefficients:
## (Intercept) year
## 3907390 -1940
summary(flood.model)
##
## Call:
## lm(formula = flooding_due_to_dams ~ year, data = brazil_loss)
##
## Residuals:
## Min 1Q Median 3Q Max
## -26330 -8874 -1813 5489 54610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3907390 3089536 1.265 0.232
## year -1940 1539 -1.260 0.234
##
## Residual standard error: 20770 on 11 degrees of freedom
## Multiple R-squared: 0.1261, Adjusted R-squared: 0.04667
## F-statistic: 1.588 on 1 and 11 DF, p-value: 0.2338
hist(resid(flood.model))
qqnorm(resid(flood.model))
qqline(resid(flood.model))
shapiro.test(resid(flood.model))
##
## Shapiro-Wilk normality test
##
## data: resid(flood.model)
## W = 0.85068, p-value = 0.02909
## The model indicates that the relationship between year and flooding is negative
## and that it is not statistically significant. This is shown by the low R-squared value (Multiple R-squared: 0.1261)
## and the non-significant p-value (p-value: 0.2338). The Shapiro-Wilk test (W = 0.85068, p-value = 0.02909) also suggests that the residuals are not normally distributed.
crops.model <- lm(commercial_crops~year, data = brazil_loss)
crops.model
##
## Call:
## lm(formula = commercial_crops ~ year, data = brazil_loss)
##
## Coefficients:
## (Intercept) year
## 80228132 -39857
summary(crops.model)
##
## Call:
## lm(formula = commercial_crops ~ year, data = brazil_loss)
##
## Residuals:
## Min 1Q Median 3Q Max
## -193989 -98132 -19132 82440 392582
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 80228132 24335690 3.297 0.00712 **
## year -39857 12125 -3.287 0.00724 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 163600 on 11 degrees of freedom
## Multiple R-squared: 0.4955, Adjusted R-squared: 0.4497
## F-statistic: 10.8 on 1 and 11 DF, p-value: 0.007242
hist(resid(crops.model))
qqnorm(resid(crops.model))
qqline(resid(crops.model))
shapiro.test(resid(crops.model))
##
## Shapiro-Wilk normality test
##
## data: resid(crops.model)
## W = 0.90698, p-value = 0.1667
## The model shows a statistically significant linear trend: forest loss to commercial crops decreased over time.
## The R-squared value (0.4955) indicates a moderate fit, and the p-value (p-value: 0.007242) confirms a significant relationship between year and loss to commercial crops.
## The Shapiro-Wilk test (W = 0.90698, p-value = 0.1667) gives a p-value greater than 0.05, so the null hypothesis of normality cannot be rejected; the residuals are consistent with a normal distribution.
pasture.model <- lm(pasture~year, data = brazil_loss)
pasture.model
##
## Call:
## lm(formula = pasture ~ year, data = brazil_loss)
##
## Coefficients:
## (Intercept) year
## 367100429 -182132
summary(pasture.model)
##
## Call:
## lm(formula = pasture ~ year, data = brazil_loss)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1134560 -105110 15231 226022 738967
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 367100429 72888295 5.036 0.000380 ***
## year -182132 36317 -5.015 0.000393 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 489900 on 11 degrees of freedom
## Multiple R-squared: 0.6957, Adjusted R-squared: 0.6681
## F-statistic: 25.15 on 1 and 11 DF, p-value: 0.0003931
hist(resid(pasture.model))
qqnorm(resid(pasture.model))
qqline(resid(pasture.model))
shapiro.test(resid(pasture.model))
##
## Shapiro-Wilk normality test
##
## data: resid(pasture.model)
## W = 0.93884, p-value = 0.4421
## The model shows a statistically significant negative linear trend. This indicates that forest loss to pasture decreased over the years.
## The R-squared values (Multiple R-squared: 0.6957, Adjusted R-squared: 0.6681) indicate a strong fit, and the p-value (p-value: 0.0003931) indicates that the overall model is strongly significant.
## The Shapiro-Wilk test (W = 0.93884, p-value = 0.4421) gives a p-value greater than 0.05, so the null hypothesis of normality cannot be rejected; the residuals are consistent with a normal distribution.
selective.model <- lm(selective_logging~year, data = brazil_loss)
selective.model
##
## Call:
## lm(formula = selective_logging ~ year, data = brazil_loss)
##
## Coefficients:
## (Intercept) year
## 193065.93 -43.96
summary(selective.model)
##
## Call:
## lm(formula = selective_logging ~ year, data = brazil_loss)
##
## Residuals:
## Min 1Q Median 3Q Max
## -60670 -17758 -8846 26418 61374
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 193065.93 5907891.95 0.033 0.975
## year -43.96 2943.64 -0.015 0.988
##
## Residual standard error: 39710 on 11 degrees of freedom
## Multiple R-squared: 2.027e-05, Adjusted R-squared: -0.09089
## F-statistic: 0.000223 on 1 and 11 DF, p-value: 0.9884
hist(resid(selective.model))
qqnorm(resid(selective.model))
qqline(resid(selective.model))
shapiro.test(resid(selective.model))
##
## Shapiro-Wilk normality test
##
## data: resid(selective.model)
## W = 0.95957, p-value = 0.7471
## The relationship between year and selective logging shows no statistically significant linear trend (p-value: 0.9884).
## The R-squared values (Multiple R-squared: 2.027e-05, Adjusted R-squared: -0.09089) are extremely low, meaning that year explains essentially none of the variance in loss to selective logging.
## The Shapiro-Wilk test (W = 0.95957, p-value = 0.7471) gives a p-value greater than 0.05, so the null hypothesis of normality cannot be rejected; the residuals are consistent with a normal distribution.
fire.model <- lm(fire~year, data = brazil_loss)
fire.model
##
## Call:
## lm(formula = fire ~ year, data = brazil_loss)
##
## Coefficients:
## (Intercept) year
## -8212159 4170
summary(fire.model)
##
## Call:
## lm(formula = fire ~ year, data = brazil_loss)
##
## Residuals:
## Min 1Q Median 3Q Max
## -156714 -106670 -74522 -22841 366797
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8212159 27309823 -0.301 0.769
## year 4170 13607 0.306 0.765
##
## Residual standard error: 183600 on 11 degrees of freedom
## Multiple R-squared: 0.008467, Adjusted R-squared: -0.08167
## F-statistic: 0.09393 on 1 and 11 DF, p-value: 0.765
hist(resid(fire.model))
qqnorm(resid(fire.model))
qqline(resid(fire.model))
shapiro.test(resid(fire.model))
##
## Shapiro-Wilk normality test
##
## data: resid(fire.model)
## W = 0.73636, p-value = 0.001315
## The relationship between year and fire-related losses shows no statistical significance (p-value: 0.765).
## The R-squared values (Multiple R-squared: 0.008467, Adjusted R-squared: -0.08167) show that year explains almost none of the variance in forest loss from fire.
## The Shapiro-Wilk test (W = 0.73636, p-value = 0.001315) indicates that the residuals are not normally distributed.
mining.model <- lm(mining~year, data = brazil_loss)
mining.model
##
## Call:
## lm(formula = mining ~ year, data = brazil_loss)
##
## Coefficients:
## (Intercept) year
## -1449857.1 725.3
summary(mining.model)
##
## Call:
## lm(formula = mining ~ year, data = brazil_loss)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9396 -5044 -3593 3231 24879
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1449857.1 1444161.3 -1.004 0.337
## year 725.3 719.6 1.008 0.335
##
## Residual standard error: 9707 on 11 degrees of freedom
## Multiple R-squared: 0.08455, Adjusted R-squared: 0.001327
## F-statistic: 1.016 on 1 and 11 DF, p-value: 0.3351
hist(resid(mining.model))
qqnorm(resid(mining.model))
qqline(resid(mining.model))
shapiro.test(resid(mining.model))
##
## Shapiro-Wilk normality test
##
## data: resid(mining.model)
## W = 0.83302, p-value = 0.01729
## The relationship between year and mining shows no statistical significance (p-value: 0.3351).
## Year explains only a small proportion of the variance in forest loss to mining (Multiple R-squared: 0.08455, Adjusted R-squared: 0.001327), indicating a poor fit.
## The Shapiro-Wilk test (W = 0.83302, p-value = 0.01729) gives a p-value below 0.05, so the null hypothesis of normality is rejected, which indicates the residuals are not normally distributed.
infastructure.model <- lm(other_infrastructure~year, data = brazil_loss)
infastructure.model
##
## Call:
## lm(formula = other_infrastructure ~ year, data = brazil_loss)
##
## Coefficients:
## (Intercept) year
## 870219.8 -428.6
summary(infastructure.model)
##
## Call:
## lm(formula = other_infrastructure ~ year, data = brazil_loss)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8362.6 -2791.2 208.8 2065.9 7351.6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 870219.8 636718.8 1.367 0.199
## year -428.6 317.2 -1.351 0.204
##
## Residual standard error: 4280 on 11 degrees of freedom
## Multiple R-squared: 0.1423, Adjusted R-squared: 0.06432
## F-statistic: 1.825 on 1 and 11 DF, p-value: 0.2039
hist(resid(infastructure.model))
qqnorm(resid(infastructure.model))
qqline(resid(infastructure.model))
shapiro.test(resid(infastructure.model))
##
## Shapiro-Wilk normality test
##
## data: resid(infastructure.model)
## W = 0.98627, p-value = 0.9973
## The relationship between year and other infrastructure shows no statistical significance (p-value: 0.2039).
## The model fit is weak, as indicated by the low R-squared values (Multiple R-squared: 0.1423, Adjusted R-squared: 0.06432).
## The Shapiro-Wilk test (W = 0.98627, p-value = 0.9973) gives a p-value greater than 0.05, so the null hypothesis of normality cannot be rejected; the residuals are consistent with a normal distribution.
small_scale.model<- lm(small_scale_clearing~year, data = brazil_loss)
small_scale.model
##
## Call:
## lm(formula = small_scale_clearing ~ year, data = brazil_loss)
##
## Coefficients:
## (Intercept) year
## 8664593 -4165
summary(small_scale.model)
##
## Call:
## lm(formula = small_scale_clearing ~ year, data = brazil_loss)
##
## Residuals:
## Min 1Q Median 3Q Max
## -81758 -26099 -3934 4231 96736
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8664593 8011627 1.082 0.303
## year -4165 3992 -1.043 0.319
##
## Residual standard error: 53850 on 11 degrees of freedom
## Multiple R-squared: 0.09005, Adjusted R-squared: 0.007326
## F-statistic: 1.089 on 1 and 11 DF, p-value: 0.3192
hist(resid(small_scale.model))
qqnorm(resid(small_scale.model))
qqline(resid(small_scale.model))
shapiro.test(resid(small_scale.model))
##
## Shapiro-Wilk normality test
##
## data: resid(small_scale.model)
## W = 0.91471, p-value = 0.2126
## The relationship between year and small-scale clearing shows no statistical significance (p-value: 0.3192).
## The model fit is weak, as indicated by the low R-squared values (Multiple R-squared: 0.09005, Adjusted R-squared: 0.007326).
## The Shapiro-Wilk test (W = 0.91471, p-value = 0.2126) gives a p-value greater than 0.05, so the null hypothesis of normality cannot be rejected; the residuals are consistent with a normal distribution.
roads.model <- lm(roads~year, data = brazil_loss)
roads.model
##
## Call:
## lm(formula = roads ~ year, data = brazil_loss)
##
## Coefficients:
## (Intercept) year
## 2297582 -1132
summary(roads.model)
##
## Call:
## lm(formula = roads ~ year, data = brazil_loss)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19714.3 -12395.6 -582.4 6813.2 27681.3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2297582 2250728 1.021 0.329
## year -1132 1121 -1.009 0.335
##
## Residual standard error: 15130 on 11 degrees of freedom
## Multiple R-squared: 0.08476, Adjusted R-squared: 0.001555
## F-statistic: 1.019 on 1 and 11 DF, p-value: 0.3345
hist(resid(roads.model))
qqnorm(resid(roads.model))
qqline(resid(roads.model))
shapiro.test(resid(roads.model))
##
## Shapiro-Wilk normality test
##
## data: resid(roads.model)
## W = 0.94073, p-value = 0.4664
## The relationship between year and roads shows no statistical significance (p-value: 0.3345).
## The model fit is weak, as indicated by the low R-squared values (Multiple R-squared: 0.08476, Adjusted R-squared: 0.001555).
## The Shapiro-Wilk test (W = 0.94073, p-value = 0.4664) gives a p-value greater than 0.05, so the null hypothesis of normality cannot be rejected; the residuals are consistent with a normal distribution.
plantation.model <- lm(tree_plantations_including_palm~year, data = brazil_loss)
plantation.model
##
## Call:
## lm(formula = tree_plantations_including_palm ~ year, data = brazil_loss)
##
## Coefficients:
## (Intercept) year
## 4844209 -2396
summary(plantation.model)
##
## Call:
## lm(formula = tree_plantations_including_palm ~ year, data = brazil_loss)
##
## Residuals:
## Min 1Q Median 3Q Max
## -20044 -12626 -648 3560 48582
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4844209 2828327 1.713 0.115
## year -2396 1409 -1.700 0.117
##
## Residual standard error: 19010 on 11 degrees of freedom
## Multiple R-squared: 0.2081, Adjusted R-squared: 0.1361
## F-statistic: 2.89 on 1 and 11 DF, p-value: 0.1172
hist(resid(plantation.model))
qqnorm(resid(plantation.model))
qqline(resid(plantation.model))
shapiro.test(resid(plantation.model))
##
## Shapiro-Wilk normality test
##
## data: resid(plantation.model)
## W = 0.8623, p-value = 0.04131
## The relationship between year and tree plantations is not statistically significant at the 0.05 level (p-value: 0.1172).
## The model fit is weak, as indicated by the R-squared values (Multiple R-squared: 0.2081, Adjusted R-squared: 0.1361).
## The Shapiro-Wilk test (W = 0.8623, p-value = 0.04131) suggests that the residuals are not normally distributed.
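The eleven models above all repeat the same steps: fit area lost against year, read the slope p-value and adjusted R-squared from the summary, and run a Shapiro-Wilk test on the residuals. A minimal sketch of how those steps might be collected into one table is given below. The helper `trend_summary` and the data frame `loss` are hypothetical names for this example, and `loss` holds only the six rows printed by `head(brazil_loss)`, so its numbers will not match the full 13-year results above.

```r
# Illustrative subset: the six rows printed by head(brazil_loss).
loss <- data.frame(
  year = 2001:2006,
  commercial_crops = c(280000, 415000, 550000, 747000, 328000, 188000),
  natural_disturbances = c(0, 35000, 35000, 22000, 26000, 26000)
)

# Hypothetical helper: fit cause ~ year and collect the quantities
# discussed above into a one-row data frame.
trend_summary <- function(data, cause) {
  fit <- lm(reformulate("year", response = cause), data = data)
  s <- summary(fit)
  data.frame(
    cause     = cause,
    slope     = unname(coef(fit)["year"]),            # hectares per year
    slope_p   = s$coefficients["year", "Pr(>|t|)"],   # p-value of the trend
    adj_r2    = s$adj.r.squared,                      # adjusted R-squared
    shapiro_p = shapiro.test(resid(fit))$p.value      # normality of residuals
  )
}

# Apply the helper to each cause and bind the rows into one table.
causes <- c("commercial_crops", "natural_disturbances")
results <- do.call(rbind, lapply(causes, trend_summary, data = loss))
results
```

Collecting the results in one table makes it easier to compare the trend p-values and adjusted R-squared values across causes side by side.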
Scatter plots with fitted regression lines give another visualization of the data.
ggplot(brazil_loss, aes(x = year, y = commercial_crops)) +
geom_point(size = 3, col = "black") +
geom_smooth(method = "lm", se = FALSE, col = "black") +
labs(x = "year", y = "Commercial Crops (hectares)") +
theme_classic()
## `geom_smooth()` using formula = 'y ~ x'
ggplot(brazil_loss, aes(x = year, y = natural_disturbances)) +
geom_point(size = 3, col = "black") +
geom_smooth(method = "lm", se = FALSE, col = "black") +
labs(x = "year", y = "Natural disturbances (hectares)") +
theme_classic()
## `geom_smooth()` using formula = 'y ~ x'
ggplot(brazil_loss, aes(x = year, y = flooding_due_to_dams)) +
geom_point(size = 3, col = "black") +
geom_smooth(method = "lm", se = FALSE, col = "black") +
labs(x = "year", y = "Flooding (hectares)") +
theme_classic()
## `geom_smooth()` using formula = 'y ~ x'
ggplot(brazil_loss, aes(x = year, y = pasture)) +
geom_point(size = 3, col = "black") +
geom_smooth(method = "lm", se = FALSE, col = "black") +
labs(x = "year", y = "Pasture (hectares)") +
theme_classic()
## `geom_smooth()` using formula = 'y ~ x'
ggplot(brazil_loss, aes(x = year, y = selective_logging)) +
geom_point(size = 3, col = "black") +
geom_smooth(method = "lm", se = FALSE, col = "black") +
labs(x = "year", y = "Selective Logging (hectares)") +
theme_classic()
## `geom_smooth()` using formula = 'y ~ x'
ggplot(brazil_loss, aes(x = year, y = fire)) +
geom_point(size = 3, col = "black") +
geom_smooth(method = "lm", se = FALSE, col = "black") +
labs(x = "year", y = "Fire (hectares)") +
theme_classic()
## `geom_smooth()` using formula = 'y ~ x'
ggplot(brazil_loss, aes(x = year, y = mining)) +
geom_point(size = 3, col = "black") +
geom_smooth(method = "lm", se = FALSE, col = "black") +
labs(x = "year", y = "Mining (hectares)") +
theme_classic()
## `geom_smooth()` using formula = 'y ~ x'
ggplot(brazil_loss, aes(x = year, y = other_infrastructure)) +
geom_point(size = 3, col = "black") +
geom_smooth(method = "lm", se = FALSE, col = "black") +
labs(x = "year", y = "Infrastructure (hectares)") +
theme_classic()
## `geom_smooth()` using formula = 'y ~ x'
ggplot(brazil_loss, aes(x = year, y = roads)) +
geom_point(size = 3, col = "black") +
geom_smooth(method = "lm", se = FALSE, col = "black") +
labs(x = "year", y = "Roads (hectares)") +
theme_classic()
## `geom_smooth()` using formula = 'y ~ x'
ggplot(brazil_loss, aes(x = year, y = tree_plantations_including_palm)) +
geom_point(size = 3, col = "black") +
geom_smooth(method = "lm", se = FALSE, col = "black") +
labs(x = "year", y = "Tree Plantations (hectares)") +
theme_classic()
## `geom_smooth()` using formula = 'y ~ x'
ggplot(brazil_loss, aes(x = year, y = small_scale_clearing)) +
geom_point(size = 3, col = "black") +
geom_smooth(method = "lm", se = FALSE, col = "black") +
labs(x = "year", y = "Small Scale Clearing (hectares)") +
theme_classic()
## `geom_smooth()` using formula = 'y ~ x'
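In conclusion, it was determined that pasture and commercial crops have the greatest impact on deforestation in Brazil. This was determined by comparing the R-squared values and p-values from the summary of each model, together with the Shapiro-Wilk tests performed on the residuals. Pasture and commercial crops were the only causes with a statistically significant trend over the years, and pasture also had by far the largest mean annual loss (about 1.56 million hectares). All other causes did show impacts, but their trends over time were weaker and not statistically significant. Natural disturbances averaged only about 31,500 hectares per year, so human-caused disturbances had the greater effect on deforestation in Brazil.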
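To put the size of each cause in context, the mean annual losses reported earlier can be turned into shares of the total. The sketch below does this with the means printed in this document. The split into natural and human-caused loss is an assumption made for illustration: only `natural_disturbances` is counted as natural, and every other column, including `fire`, is counted as human-caused. The names `mean_loss`, `share`, `natural`, and `human` are hypothetical.

```r
# Mean annual loss in hectares, as reported earlier in this document.
mean_loss <- c(
  natural_disturbances = 31538.46,
  flooding_due_to_dams = 14692.31,
  commercial_crops = 234846.2,
  pasture = 1561769,
  selective_logging = 104846.2,
  fire = 157692.3,
  mining = 5769.231,
  other_infrastructure = 10076.92,
  small_scale_clearing = 305769.2,
  roads = 25923.08,
  tree_plantations_including_palm = 36230.77
)

# Share of the summed mean loss attributed to each cause, largest first.
share <- mean_loss / sum(mean_loss)
round(100 * sort(share, decreasing = TRUE), 1)

# Assumed grouping: only natural_disturbances is treated as natural;
# all other causes, including fire, are treated as human-caused.
natural <- unname(share["natural_disturbances"])
human <- 1 - natural
round(100 * c(natural = natural, human = human), 1)
```

Because the grouping of fire is debatable, moving it to the natural side would change the split, though pasture would remain the largest single cause either way.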
Ritchie, Hannah. “Drivers of Deforestation.” Our World in Data, 2021, https://ourworldindata.org/drivers-of-deforestation
“The Cerrado Crisis: Brazil’s Deforestation Frontline.” Global Witness, 21 Feb. 2024, https://globalwitness.org/en/campaigns/forests/the-cerrado-crisis-brazils-deforestation-frontline/
Pallares, Gloria. “What’s Happening with Deforestation in the Amazon?” ThinkLandscape, Global Landscapes Forum, 21 Mar. 2024, https://thinklandscape.globallandscapesforum.org/67561/whats-happening-with-deforestation-in-the-amazon/
jonthegeek. “Deforestation.” Tidy Tuesday, 2021, https://github.com/rfordatascience/tidytuesday/blob/main/data/2021/2021-04-06/readme.md