Introduction

There are 6,454 observations in the data set. Each row is the number of forest fires reported in one Brazilian state for a given month and year, so a row is a count rather than a single fire. The series covers roughly 20 years (1998 to 2017). The data were obtained from the official website of the Brazilian government. My goal is to see which states have the most forest fires. Should states with more forest fires increase their firefighting staff? If so, when?

Analysis

library(ggplot2)
library(dplyr)
library(knitr)
library(stringr)

forest_fire<-read.csv("https://raw.githubusercontent.com/Sizzlo/Data-Project-Proposal/master/amazon.csv", sep=",")

dim(forest_fire)
## [1] 6454    5
names(forest_fire)
## [1] "year"   "state"  "month"  "number" "date"
head(forest_fire)
##   year state   month number       date
## 1 1998  Acre Janeiro      0 1998-01-01
## 2 1999  Acre Janeiro      0 1999-01-01
## 3 2000  Acre Janeiro      0 2000-01-01
## 4 2001  Acre Janeiro      0 2001-01-01
## 5 2002  Acre Janeiro      0 2002-01-01
## 6 2003  Acre Janeiro     10 2003-01-01
  1. Sum the number of forest fires in each state, sort the totals from highest to lowest, and keep the 10 states with the most fires. Knowing these states lets us be more alert to the areas most prone to forest fires. Mato Grosso has the highest total from 1998 to 2017 (96,246).
fire1 <- forest_fire %>% 
  group_by(state) %>% 
  summarise(Total = round(sum(number))) %>% 
  arrange(desc(Total))

# Keep the 10 states with the largest totals
fire1_top10 <- fire1 %>% 
  slice(1:10)

fire1_top10
## # A tibble: 10 x 2
##    state        Total
##    <fct>        <dbl>
##  1 Mato Grosso  96246
##  2 Paraiba      52436
##  3 Sao Paulo    51121
##  4 Rio          45161
##  5 Bahia        44746
##  6 Piau         37804
##  7 Goias        37696
##  8 Minas Gerais 37475
##  9 Tocantins    33708
## 10 Amazonas     30650
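# Order the bars from highest to lowest total instead of alphabetically
ggplot(fire1_top10, aes(x = reorder(state, -Total), y = Total)) +
  geom_col(fill = "lightblue") +
  labs(x = "State", y = "Total forest fires, 1998-2017")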
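For readers without dplyr, the same total-then-rank step can be sketched in base R with `aggregate()` and `order()`. The data frame `toy` is a hypothetical stand-in with the same `state` and `number` columns as `forest_fire`; its values are made up for illustration and are not from the dataset. The helper name `top_totals` is also hypothetical.

```r
# Hypothetical toy data with the same column names as forest_fire.
# The numbers are illustrative only, not real fire counts.
toy <- data.frame(
  state  = c("A", "A", "B", "B", "C", "C"),
  number = c(10, 5, 3, 4, 20, 1)
)

# Sum `number` within each state, sort descending, keep the n largest totals.
top_totals <- function(df, n = 10) {
  totals <- aggregate(number ~ state, data = df, FUN = sum)
  names(totals)[names(totals) == "number"] <- "Total"
  totals <- totals[order(totals$Total, decreasing = TRUE), ]
  rownames(totals) <- NULL
  head(totals, n)
}

res <- top_totals(toy, n = 2)
res
```

Here the totals are A = 15, B = 7 and C = 21, so the two largest are C and A. Unlike the dplyr version, this sketch does not call `round()` on the sums.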

  2. We can also total the fires by year. 2003 had the most reported fires (42,761), and the table lists the five years with the highest totals.
fire2 <- forest_fire %>% 
  group_by(year) %>% 
  summarise(Total = round(sum(number))) %>% 
  arrange(desc(Total)) %>% 
  slice(1:5)
kable(fire2)
| year | Total |
|-----:|------:|
| 2003 | 42761 |
| 2016 | 42212 |
| 2015 | 41208 |
| 2012 | 40085 |
| 2014 | 39621 |
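The year ranking uses the same sum-then-sort idea. A base-R sketch with `tapply()` returns a named vector whose names are the years, which `sort()`, `head()` and `which.max()` then work on directly. The `toy` data frame is hypothetical, with made-up values rather than the real counts.

```r
# Hypothetical toy data: a few year/number rows (illustrative values only).
toy <- data.frame(
  year   = c(2001, 2001, 2002, 2003, 2003, 2003),
  number = c(4, 6, 12, 1, 2, 3)
)

# Total fires per year as a named vector (names are the years).
year_totals <- tapply(toy$number, toy$year, sum)

# Years ranked from most to fewest fires; keep the top 2 here.
top_years <- head(sort(year_totals, decreasing = TRUE), 2)

# Year with the single highest total.
busiest_year <- names(which.max(year_totals))

top_years
busiest_year
```

With these toy values the yearly totals are 10, 12 and 6, so 2002 ranks first and 2001 second.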
  3. Tidy the data so that only the state and the number of forest fires remain.
test1 <- forest_fire %>% 
  select(state, number) %>% 
  group_by(state)

test1
## # A tibble: 6,454 x 2
## # Groups:   state [23]
##    state number
##    <fct>  <dbl>
##  1 Acre       0
##  2 Acre       0
##  3 Acre       0
##  4 Acre       0
##  5 Acre       0
##  6 Acre      10
##  7 Acre       0
##  8 Acre      12
##  9 Acre       4
## 10 Acre       0
## # ... with 6,444 more rows
  4. Calculate the mean number of forest fires per row (that is, per monthly record) for each state.
meanfire1<-forest_fire %>% 
  group_by(state) %>% 
  summarise(Total=round(mean(number))) %>% 
  arrange(desc(Total))

meanfire1
## # A tibble: 23 x 2
##    state        Total
##    <fct>        <dbl>
##  1 Sao Paulo      214
##  2 Mato Grosso    201
##  3 Bahia          187
##  4 Goias          158
##  5 Piau           158
##  6 Minas Gerais   157
##  7 Tocantins      141
##  8 Amazonas       128
##  9 Ceara          127
## 10 Paraiba        110
## # ... with 13 more rows
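The mean ranking can differ from the total ranking because a state's total equals its mean multiplied by its number of rows. A state that contributes more rows can therefore lead on totals without leading on averages. One quick check is to count rows per state next to the sum and mean. The base-R sketch below uses a hypothetical `toy` data frame in which state "B" has twice as many rows as state "A"; it is an illustration, not the real data.

```r
# Hypothetical toy data: state "B" appears in twice as many rows as "A".
toy <- data.frame(
  state  = c("A", "A", "B", "B", "B", "B"),
  number = c(9, 11, 6, 6, 6, 6)
)

# Row count, total and mean per state in one table.
# table() and tapply() both order states alphabetically, so rows line up.
per_state <- data.frame(
  n_rows = as.vector(table(toy$state)),
  Total  = as.vector(tapply(toy$number, toy$state, sum)),
  Mean   = as.vector(tapply(toy$number, toy$state, mean)),
  row.names = names(table(toy$state))
)

# Total is always Mean * n_rows, so extra rows raise the total.
per_state
```

Here "A" has the higher mean (10 vs 6), but "B" has the higher total (24 vs 20) only because it has twice as many rows.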
  5. This bar chart shows the mean number of forest fires for each state.
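# Sort the bars so the highest mean appears at the top after coord_flip()
ggplot(meanfire1, aes(x = reorder(state, Total), y = Total)) +
  geom_col(fill = "lightblue") +
  labs(x = "State", y = "Average number of fires") +
  coord_flip()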

  6. Find which months have the most forest fires across all states combined.
fire3<-forest_fire %>% 
  group_by(month) %>%
  summarise(Total=round(sum(number))) %>% 
  arrange(desc(Total))

fire3
## # A tibble: 12 x 2
##    month     Total
##    <fct>     <dbl>
##  1 Julho     92326
##  2 Outubro   88682
##  3 Agosto    88050
##  4 Novembro  85508
##  5 Setembro  58578
##  6 Dezembro  57535
##  7 Junho     56011
##  8 Janeiro   47748
##  9 Maio      34731
## 10 Fevereiro 30848
## 11 Março     30717
## 12 Abril     28189
# Translate the month names from Portuguese into English with a lookup vector
month_en <- c(Janeiro = "January", Fevereiro = "February", "Março" = "March",
              Abril = "April", Maio = "May", Junho = "June",
              Julho = "July", Agosto = "August", Setembro = "September",
              Outubro = "October", Novembro = "November", Dezembro = "December")
fire3$month <- unname(month_en[as.character(fire3$month)])

#translated version
fire3
## # A tibble: 12 x 2
##    month     Total
##    <chr>     <dbl>
##  1 July      92326
##  2 October   88682
##  3 August    88050
##  4 November  85508
##  5 September 58578
##  6 December  57535
##  7 June      56011
##  8 January   47748
##  9 May       34731
## 10 February  30848
## 11 March     30717
## 12 April     28189
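Ranking by total hides the seasonal pattern. One way to restore calendar order is to convert the translated names to a factor whose levels are R's built-in `month.name` constant ("January" through "December"). The `fire3_demo` data frame below is a hypothetical stand-in for the translated `fire3`, with made-up totals and only three months, so that the sketch runs on its own.

```r
# Hypothetical stand-in for the translated fire3 table (illustrative totals).
fire3_demo <- data.frame(
  month = c("July", "January", "March"),
  Total = c(300, 100, 50),
  stringsAsFactors = FALSE
)

# month.name is R's built-in vector "January", ..., "December".
# Using it as factor levels gives calendar order instead of alphabetical.
fire3_demo$month <- factor(fire3_demo$month, levels = month.name)

# Sort rows chronologically rather than by Total.
fire3_cal <- fire3_demo[order(fire3_demo$month), ]
fire3_cal
```

Because ggplot2 draws discrete axes in factor-level order, mapping `x = month` after this conversion should plot the months from January to December.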

Conclusion

Based on the graphs and tables, Sao Paulo has the highest average number of forest fires per record (214), while Mato Grosso has the highest total (96,246). Part of this gap may come from how many rows each state has in the data: Mato Grosso's total divided by its mean is roughly 480 rows, while Sao Paulo's is roughly 240, which matches 20 years × 12 months. Over 1998 to 2017, 2003 had the most fires (42,761). Across all states, July had the most fires (92,326). July is winter in the Southern Hemisphere and falls in the dry season across much of central Brazil, so drier conditions, rather than summer heat, are a more plausible explanation.

This analysis therefore suggests hiring additional firefighting staff during high-risk months such as July, and in the states with the most fires, such as Sao Paulo and Mato Grosso.