R-Markdown Notebook


Introduction

Brazilian Forest Fire

The Amazon biome spans approximately 6.7 million square kilometers and is shared by eight countries (Brazil, Bolivia, Peru, Ecuador, Colombia, Venezuela, Guyana and Suriname), as well as the overseas territory of French Guiana. Approximately 60% of the Amazon Basin is located within Brazil. Wildfires often occur in the Amazon rainforest, but over the past couple of decades an increase in illegal human activities such as logging, deforestation and agricultural burning has made these fires rampant, causing significant ecological damage. This notebook analyzes the forest fires reported in the states of Brazil from 1998 to 2017, in an effort to identify a pattern in their occurrence and possible means of forecasting them.

Packages Used

tidyverse: Collection of R packages used for data manipulation
dplyr: Easy functions to perform data manipulation in R
ggplot2: Data visualisation in R
mice: Package to impute missing values with plausible data values
stringi: Package for convenient string/text processing
forecast: Forecasting Functions for Time Series and Linear Models
plotly: Graphing library for interactive, publication-quality graphs

Data

The data for this study come from an open dataset on Kaggle. The file consists of 6,454 observations of 5 variables: ‘year’, ‘state’, ‘month’, ‘number’ and ‘date’. Each row records the number of fires reported for one state in one month of one year.

year(#: 20) : 1998 to 2017
months(#: 12) : January to December (in Portuguese)
states(#: 23) : Acre, Alagoas, Amapa, Amazonas, Bahia, Ceara, Distrito Federal, Espirito Santo, Goias, Maranhao, Mato Grosso, Minas Gerais, Para, Paraiba, Pernambuco, Piau, Rio, Rondonia, Roraima, Santa Catarina, Sao Paulo, Sergipe and Tocantins
date(#: 20) : 01-01-1998 to 01-01-2017

Data Import

##    year state   month number       date
## 1  1998  Acre Janeiro      0 1998-01-01
## 2  1999  Acre Janeiro      0 1999-01-01
## 3  2000  Acre Janeiro      0 2000-01-01
## 4  2001  Acre Janeiro      0 2001-01-01
## 5  2002  Acre Janeiro      0 2002-01-01
## 6  2003  Acre Janeiro     10 2003-01-01
## 7  2004  Acre Janeiro      0 2004-01-01
## 8  2005  Acre Janeiro     12 2005-01-01
## 9  2006  Acre Janeiro      4 2006-01-01
## 10 2007  Acre Janeiro      0 2007-01-01
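
The import code itself is not shown in this rendering. The following is a minimal sketch, assuming the Kaggle file is a plain CSV with the five columns listed above. So that the snippet runs without the file, the inline text reproduces only the ten rows printed above. In a real run a path to the downloaded CSV would be passed to `read.csv()` instead. The file name is not given in this notebook.

```r
# First ten rows of the dataset, copied from the printed output above.
# In a real run, replace `text = csv_text` with the path to the Kaggle CSV
# (the file name is not stated in this notebook, so none is assumed here).
csv_text <- "year,state,month,number,date
1998,Acre,Janeiro,0,1998-01-01
1999,Acre,Janeiro,0,1999-01-01
2000,Acre,Janeiro,0,2000-01-01
2001,Acre,Janeiro,0,2001-01-01
2002,Acre,Janeiro,0,2002-01-01
2003,Acre,Janeiro,10,2003-01-01
2004,Acre,Janeiro,0,2004-01-01
2005,Acre,Janeiro,12,2005-01-01
2006,Acre,Janeiro,4,2006-01-01
2007,Acre,Janeiro,0,2007-01-01"

fires <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Parse the ISO-formatted date column into Date objects
fires$date <- as.Date(fires$date)

str(fires)       # column types
head(fires, 10)  # first rows, as printed above
```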

Data Cleansing

##  /\     /\
## {  `---'  }
## {  O   O  }
## ==>  V <==  No need for mice. This data set is completely observed.
##  \  \|/  /
##   `-----'

##      year month state number  
## 5497    1     1     1      1 0
##         0     0     0      0 0
##    year   month            state number
## 1  1998 January             Acre      0
## 2  1998 January          Alagoas      0
## 3  1998 January            Amapa      0
## 4  1998 January         Amazonas      0
## 5  1998 January            Bahia      0
## 6  1998 January            Ceara      0
## 7  1998 January Distrito Federal      0
## 8  1998 January   Espirito Santo      0
## 9  1998 January            Goias      0
## 10 1998 January         Maranhao      0
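
The cat printed above is the message `mice::md.pattern()` shows for a fully observed data set. The cleaned table differs from the raw import in three visible ways: the month names are in English, the `date` column is gone, and the rows are ordered by year, month and state. The sketch below is one possible way to get there in base R, not necessarily the steps used in this notebook. It uses `colSums(is.na())` as a dependency-free stand-in for the missingness check. The `pt_months` lookup vector and the third data row are assumptions made for illustration.

```r
# Portuguese month names in calendar order (lookup vector is an assumption;
# "\u00e7" is the c-cedilla in "Marco", escaped to keep the source ASCII)
pt_months <- c("Janeiro", "Fevereiro", "Mar\u00e7o", "Abril", "Maio", "Junho",
               "Julho", "Agosto", "Setembro", "Outubro", "Novembro", "Dezembro")

# Small sample in the raw layout; the third row is illustrative only
fires <- data.frame(
  year   = c(1999, 1998, 1998),
  state  = c("Acre", "Alagoas", "Acre"),
  month  = c("Agosto", "Janeiro", "Janeiro"),
  number = c(5, 0, 0),
  date   = as.Date(c("1999-01-01", "1998-01-01", "1998-01-01")),
  stringsAsFactors = FALSE
)

# Missing values per column (base-R stand-in for mice::md.pattern)
missing_per_column <- colSums(is.na(fires))
print(missing_per_column)

# Translate months to English and keep them in calendar order
fires$month <- factor(month.name[match(fires$month, pt_months)],
                      levels = month.name)

# Drop the date column and sort by year, month, state
fires <- fires[, c("year", "month", "state", "number")]
fires <- fires[order(fires$year, fires$month, fires$state), ]
rownames(fires) <- NULL

print(fires)
```

Storing `month` as a factor with `month.name` as its levels keeps later tables and plots in calendar order rather than alphabetical order.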

Exploratory Data Analysis

Statistical Inferences

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0    10.0    65.0   613.7   362.0 30409.0
## [1] 352
## [1] 1761.297
## [1] 613.6745
## [1] 65
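
The four unlabeled values printed after the summary are consistent with the interquartile range (362 - 10 = 352), what is presumably the standard deviation, the mean and the median of `number`. The following sketch shows how such figures can be produced in base R. To stay self-contained it uses only the ten Acre counts printed in the Data Import section, so its results differ from the full-dataset values above.

```r
# Fire counts for Acre, January 1998-2007, as printed in the Data Import output
number <- c(0, 0, 0, 0, 0, 10, 0, 12, 4, 0)

five_number <- summary(number)  # min, quartiles, median, mean, max
spread_iqr  <- IQR(number)      # third quartile minus first quartile
spread_sd   <- sd(number)       # sample standard deviation
centre_mean <- mean(number)
centre_med  <- median(number)

print(five_number)
print(c(IQR = spread_iqr, SD = spread_sd, mean = centre_mean, median = centre_med))
```

A mean far above the median, as in the full-dataset summary above (613.7 against 65), indicates a strongly right-skewed distribution: most state-months report few fires and a small number report very many.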

By Year

As the bar graph shows, 2004 and 2017 recorded the highest number of fires. Although the number of fires decreased in some years between 2004 and 2017, an overall increasing trend can be observed.
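
A yearly comparison of this kind amounts to summing `number` over all states and months within each year. The following is a minimal sketch with `aggregate()`. The data frame holds toy values that are not taken from the dataset.

```r
# Toy data for illustration only; the values are NOT taken from the dataset
fires <- data.frame(
  year   = c(1998, 1998, 1999, 1999, 2000, 2000),
  state  = rep(c("Acre", "Bahia"), times = 3),
  number = c(5, 10, 20, 30, 8, 2)
)

# Total number of fires per year
yearly <- aggregate(number ~ year, data = fires, FUN = sum)

# Year with the highest total
peak_year <- yearly$year[which.max(yearly$number)]

print(yearly)
print(peak_year)
```

The `yearly` data frame can then be passed to `barplot()` or to `ggplot2::geom_col()` to draw the bar graph.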

By Month

The highest number of fires was recorded in September, closely followed by August and October. The year-wise stacked bar graph shows that this pattern recurs every year.
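
One way to check whether the same month peaks in every year is a year-by-month cross-tabulation, which is also the table a stacked bar graph is drawn from. The following is a sketch with `xtabs()`, using toy values that are not taken from the dataset.

```r
# Toy data for illustration only; the values are NOT taken from the dataset
fires <- data.frame(
  year   = c(1998, 1998, 1998, 1999, 1999, 1999),
  month  = c("August", "September", "January",
             "August", "September", "January"),
  number = c(40, 60, 1, 50, 70, 2)
)

# Calendar-ordered factor so that columns run January..December
fires$month <- factor(fires$month, levels = month.name)

# Rows are years, columns are months, cells are summed fire counts
year_by_month <- xtabs(number ~ year + month, data = fires)

# Overall monthly totals and the month with the highest total
monthly    <- colSums(year_by_month)
peak_month <- names(which.max(monthly))

# Peak month within each individual year
peak_by_year <- colnames(year_by_month)[apply(year_by_month, 1, which.max)]

print(monthly)
print(peak_month)
print(peak_by_year)
```

If `peak_by_year` (or the top few months per year) is the same across rows, the seasonal pattern holds year after year rather than being driven by a few extreme years.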

By State

Mato Grosso, Para and Maranhao appear to be the worst-affected states in Brazil. Mato Grosso was heavily impacted, while Sao Paulo was not significantly impacted.
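
Ranking states by total fires makes statements like these easy to verify. The following is a minimal sketch using toy counts, which are invented for illustration and are not the dataset's values.

```r
# Toy data for illustration only; the counts are NOT taken from the dataset
fires <- data.frame(
  state  = c("Mato Grosso", "Para", "Sao Paulo",
             "Mato Grosso", "Para", "Sao Paulo"),
  number = c(90, 60, 5, 80, 40, 3),
  stringsAsFactors = FALSE
)

# Total fires per state, sorted from most to least affected
by_state <- aggregate(number ~ state, data = fires, FUN = sum)
by_state <- by_state[order(-by_state$number), ]
rownames(by_state) <- NULL

# Each state's share of all fires, in percent
by_state$share <- round(100 * by_state$number / sum(by_state$number), 1)

print(head(by_state, 3))
```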

Forecasting

Univariate forecasting with ARIMA does not seem to yield accurate forecasts of the forest fires. 87,000 fires were reported in 2018 and around 80,000 in 2019. Multivariate forecasting methods should be explored for more accurate predictions.
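
The model code is not shown in this rendering. The sketch below illustrates a univariate ARIMA forecast two years ahead under stated assumptions. It uses base R's `stats::arima()` rather than the forecast package, to avoid extra dependencies. The model order (1, 1, 0) is chosen arbitrarily, and the series is synthetic. With the forecast package installed, `auto.arima()` and `forecast()` offer a comparable workflow with automatic order selection.

```r
set.seed(42)

# Synthetic yearly totals for 1998-2017, in thousands of fires.
# This is a simulated random walk with drift, NOT the real fire counts.
yearly <- ts(30 + cumsum(rnorm(20, mean = 0.5, sd = 3)), start = 1998)

# ARIMA(1,1,0): one autoregressive term on the first-differenced series.
# The order is an arbitrary choice for illustration.
fit <- arima(yearly, order = c(1, 1, 0), method = "ML")

# Point forecasts and standard errors for the next two years
fc <- predict(fit, n.ahead = 2)

# Approximate 95% prediction interval under a normality assumption
lower <- fc$pred - 1.96 * fc$se
upper <- fc$pred + 1.96 * fc$se

print(cbind(forecast = fc$pred, lower = lower, upper = upper))
```

With only 20 yearly observations the prediction intervals tend to be wide, which is one reason a univariate model can struggle here. Comparing the interval with the counts later reported for the forecast years is a simple check of forecast accuracy.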

Conclusion

Although the number of forest fires has zigzagged over the past two decades, the months of August to October consistently record the highest number of fires every year. Fires are scarce for much of the year because wet weather prevents them from starting and spreading. From July to November, however, activity typically increases because of the dry season.