The data set contains 6454 observations. Each row gives the number of forest fires reported in one Brazilian state during one month of one year. The series covers approximately 20 years (1998 to 2017). The data were obtained from the official website of the Brazilian government. My goal is to see which states have the highest occurrence of forest fires. Should states with a higher occurrence of forest fires increase their firefighting staff? If so, when?
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(knitr)
library(stringr)
forest_fire<-read.csv("https://raw.githubusercontent.com/Sizzlo/Data-Project-Proposal/master/amazon.csv", sep=",")
dim(forest_fire)
## [1] 6454 5
names(forest_fire)
## [1] "year" "state" "month" "number" "date"
head(forest_fire)
## year state month number date
## 1 1998 Acre Janeiro 0 1998-01-01
## 2 1999 Acre Janeiro 0 1999-01-01
## 3 2000 Acre Janeiro 0 2000-01-01
## 4 2001 Acre Janeiro 0 2001-01-01
## 5 2002 Acre Janeiro 0 2002-01-01
## 6 2003 Acre Janeiro 10 2003-01-01
fire1<-forest_fire %>%
group_by(state) %>%
summarise(Total=round(sum(number))) %>%
arrange(desc(Total))
fire1_top10<-fire1 %>%
slice(1:10)
fire1_top10
## # A tibble: 10 x 2
## state Total
## <fct> <dbl>
## 1 Mato Grosso 96246
## 2 Paraiba 52436
## 3 Sao Paulo 51121
## 4 Rio 45161
## 5 Bahia 44746
## 6 Piau 37804
## 7 Goias 37696
## 8 Minas Gerais 37475
## 9 Tocantins 33708
## 10 Amazonas 30650
ggplot(fire1_top10, aes(x=state, y=Total))+geom_bar(fill="lightblue", stat="identity")
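The bar chart above places the states in alphabetical order, which makes the ranking hard to read. A minimal sketch of one way to sort the bars by total with `reorder()`; the data frame `top10` and the column `state_sorted` are hypothetical stand-ins, with the totals re-entered from the table printed above so the example runs without the download.

```r
library(ggplot2)

# Hypothetical stand-in for fire1_top10, re-entered from the table printed above
top10 <- data.frame(
  state = c("Mato Grosso", "Paraiba", "Sao Paulo", "Rio", "Bahia",
            "Piau", "Goias", "Minas Gerais", "Tocantins", "Amazonas"),
  Total = c(96246, 52436, 51121, 45161, 44746,
            37804, 37696, 37475, 33708, 30650),
  stringsAsFactors = FALSE
)

# reorder() returns a factor whose levels are sorted by Total, ascending
top10$state_sorted <- reorder(top10$state, top10$Total)

# coord_flip() draws horizontal bars, so the last level (largest total) is on top
p <- ggplot(top10, aes(x = state_sorted, y = Total)) +
  geom_bar(fill = "lightblue", stat = "identity") +
  labs(x = "state") +
  coord_flip()
print(p)
```

The same idea should carry over to the real data by writing `aes(x = reorder(state, Total), y = Total)` in the original call.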
fire2<-forest_fire %>%
group_by(year) %>%
summarise(Total=round(sum(number))) %>%
arrange(desc(Total)) %>%
slice(1:5)
kable(head(fire2))
| year | Total |
|---|---|
| 2003 | 42761 |
| 2016 | 42212 |
| 2015 | 41208 |
| 2012 | 40085 |
| 2014 | 39621 |
test1<-forest_fire %>%
group_by(state) %>%
select(state, number)
test1
## # A tibble: 6,454 x 2
## # Groups: state [23]
## state number
## <fct> <dbl>
## 1 Acre 0
## 2 Acre 0
## 3 Acre 0
## 4 Acre 0
## 5 Acre 0
## 6 Acre 10
## 7 Acre 0
## 8 Acre 12
## 9 Acre 4
## 10 Acre 0
## # ... with 6,444 more rows
meanfire1<-forest_fire %>%
group_by(state) %>%
summarise(Total=round(mean(number))) %>%
arrange(desc(Total))
meanfire1
## # A tibble: 23 x 2
## state Total
## <fct> <dbl>
## 1 Sao Paulo 214
## 2 Mato Grosso 201
## 3 Bahia 187
## 4 Goias 158
## 5 Piau 158
## 6 Minas Gerais 157
## 7 Tocantins 141
## 8 Amazonas 128
## 9 Ceara 127
## 10 Paraiba 110
## # ... with 13 more rows
ggplot(meanfire1, aes(x=state, y=Total))+geom_bar(fill="lightblue", stat="identity")+ labs(y="Average Total") + coord_flip()
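The ranking by mean differs from the ranking by total because `mean(number)` divides each state's total by its number of rows, and the states need not have the same number of rows. A rough sketch of that arithmetic, using only the rounded figures printed above, so the implied row counts are approximate; the vector names are illustrative.

```r
# Totals and rounded means copied from the two tables printed above
total <- c("Mato Grosso" = 96246, "Paraiba" = 52436,
           "Sao Paulo" = 51121, "Bahia" = 44746)
avg   <- c("Mato Grosso" = 201, "Paraiba" = 110,
           "Sao Paulo" = 214, "Bahia" = 187)

# mean = total / rows, so the row count is roughly total / mean
implied_rows <- round(total / avg)
implied_rows
```

On these figures, Mato Grosso and Paraiba appear to have about twice as many rows as Sao Paulo and Bahia, which would explain why they rank higher by total than by mean. Adding `n = n()` to the `summarise()` call on the full data would give the exact counts.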
fire3<-forest_fire %>%
group_by(month) %>%
summarise(Total=round(sum(number))) %>%
arrange(desc(Total))
fire3
## # A tibble: 12 x 2
## month Total
## <fct> <dbl>
## 1 Julho 92326
## 2 Outubro 88682
## 3 Agosto 88050
## 4 Novembro 85508
## 5 Setembro 58578
## 6 Dezembro 57535
## 7 Junho 56011
## 8 Janeiro 47748
## 9 Maio 34731
## 10 Fevereiro 30848
## 11 Março 30717
## 12 Abril 28189
# Translate the month names from Portuguese into English
fire3$month<-as.character(fire3$month)
fire3$month[fire3$month=="Janeiro"]<- "January"
fire3$month[fire3$month=="Fevereiro"]<- "February"
fire3$month[fire3$month=="Março"]<- "March"
fire3$month[fire3$month=="Abril"]<- "April"
fire3$month[fire3$month=="Maio"]<- "May"
fire3$month[fire3$month=="Junho"]<- "June"
fire3$month[fire3$month=="Julho"]<- "July"
fire3$month[fire3$month=="Agosto"]<- "August"
fire3$month[fire3$month=="Setembro"]<- "September"
fire3$month[fire3$month=="Outubro"]<- "October"
fire3$month[fire3$month=="Novembro"]<- "November"
fire3$month[fire3$month=="Dezembro"]<- "December"
#translated version
fire3
## # A tibble: 12 x 2
## month Total
## <chr> <dbl>
## 1 July 92326
## 2 October 88682
## 3 August 88050
## 4 November 85508
## 5 September 58578
## 6 December 57535
## 7 June 56011
## 8 January 47748
## 9 May 34731
## 10 February 30848
## 11 March 30717
## 12 April 28189
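The twelve assignments above can also be written as one named lookup vector, and converting the result to a factor with calendar-ordered levels keeps the months in sequence in later tables and plots. A minimal sketch under those assumptions; `month_pt`, `month_map`, and the sample input `months_in` are hypothetical names, and `month.name` is the built-in R constant holding the English month names.

```r
# Portuguese month names in calendar order; "\u00e7" is the escaped "ç" in "Março"
month_pt <- c("Janeiro", "Fevereiro", "Mar\u00e7o", "Abril", "Maio", "Junho",
              "Julho", "Agosto", "Setembro", "Outubro", "Novembro", "Dezembro")

# Named lookup: names are Portuguese, values are English
month_map <- setNames(month.name, month_pt)

# Sample input, in the order the first rows of the summary table list them
months_in <- c("Julho", "Outubro", "Agosto", "Novembro", "Mar\u00e7o")

# Index the lookup by name, then drop the names from the result
months_en <- unname(month_map[months_in])

# Factor with calendar-ordered levels, so sorting follows the calendar
months_fct <- factor(months_en, levels = month.name)
sort(months_fct)
```

Applied to the summary table, this would amount to `fire3$month <- unname(month_map[as.character(fire3$month)])`.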
Based on the graphs and the tables, Sao Paulo has the highest average number of forest fires of any state, at about 214 per monthly record, while Mato Grosso has the highest overall total, at 96246. Between 1998 and 2017, the year 2003 had the most fires, with 42761. The month with the highest total was July, with 92326. July is winter in Brazil, so this peak is more plausibly linked to the dry season than to summer heat, although the data here cannot confirm the cause.
States with a high occurrence of forest fires, such as Sao Paulo and Mato Grosso, could therefore use this analysis to justify hiring more firefighting staff during high-fire months such as July.