You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.

Are some Brazilian states more susceptible to forest fires than others?

## Cases

What are the cases, and how many are there?

Each case is the number of forest fires reported in one Brazilian state during one month of one year. There are 6454 observations in the data set.

## Data collection

Describe the method of data collection.

This dataset reports the number of forest fires in Brazil, broken down by state. The series covers approximately 20 years (1998 to 2017). The data were obtained from the official website of the Brazilian government.

## Type of study

What type of study is this (observational/experiment)?

This is an observational study.

## Data Source

If you collected the data, state self-collected. If not, provide a citation/link.
http://dados.gov.br/dataset/sistema-nacional-de-informacoes-florestais-snif

## Response

What is the response variable, and what type is it (numerical/categorical)?

The response variable is the number of forest fires reported, and it is numerical.

## Explanatory

What is the explanatory variable, and what type is it (numerical/categorical)?

The explanatory variable is the state, and it is categorical.

## Relevant summary statistics

Provide summary statistics relevant to your research question. For example, if you're comparing means across groups provide means, SDs, sample sizes of each group.

This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(knitr)
library(stringr)
forest_fire<-read.csv("https://raw.githubusercontent.com/Sizzlo/Data-Project-Proposal/master/amazon.csv", sep=",")
dim(forest_fire)
## [1] 6454 5
names(forest_fire)
## [1] "year" "state" "month" "number" "date"
summary(forest_fire)
##       year              state            month          number
##  Min.   :1998   Rio        : 717   Janeiro  : 541   Min.   :  0.0
##  1st Qu.:2002   Mato Grosso: 478   Abril    : 540   1st Qu.:  3.0
##  Median :2007   Paraiba    : 478   Agosto   : 540   Median : 24.0
##  Mean   :2007   Alagoas    : 240   Fevereiro: 540   Mean   :108.3
##  3rd Qu.:2012   Acre       : 239   Julho    : 540   3rd Qu.:113.0
##  Max.   :2017   Amapa      : 239   Junho    : 540   Max.   :998.0
##                 (Other)    :4063   (Other)  :3213
##          date
##  1998-01-01: 324
##  1999-01-01: 324
##  2000-01-01: 324
##  2001-01-01: 324
##  2002-01-01: 324
##  2003-01-01: 324
##  (Other)   :4510
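Since the research question compares groups, the per-group means, SDs, and sample sizes that the prompt asks for could be computed along the following lines. This is a minimal sketch, not part of the original analysis: `state_stats` is a hypothetical helper name, and it assumes a data frame with the `state` and `number` columns shown above.

```r
library(dplyr)

# Per-state summary of the monthly fire counts: group size, mean, and SD.
# `state_stats` is a hypothetical helper; `df` must have `state` and `number` columns.
state_stats <- function(df) {
  df %>%
    group_by(state) %>%
    summarise(
      n = n(),                   # number of monthly records for the state
      mean_fires = mean(number), # mean fires per record
      sd_fires = sd(number)      # spread of the monthly counts
    ) %>%
    arrange(desc(mean_fires))    # states with the highest mean first
}
```

Calling `kable(head(state_stats(forest_fire), 10))` would then list the ten states with the highest mean monthly count.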
The code below sums the number of forest fires in each state and sorts the totals from highest to lowest. I want to look at the 10 states with the most forest fires. Knowing which areas are most prone to forest fires lets us be more alert in them. We can see that Mato Grosso has the highest total.
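# Total fires per state, sorted from highest to lowest
fire1<-forest_fire %>%
  group_by(state) %>%
  summarise(Total=round(sum(number))) %>%
  arrange(desc(Total))
# Keep the 10 states with the largest totals (row indices start at 1)
fire1_top10<-fire1 %>%
  slice(1:10)
ggplot(fire1_top10, aes(x=state, y=Total))+geom_bar(stat="identity")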
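By default ggplot2 places a discrete x variable in alphabetical order, so the bars do not follow the highest-to-lowest ranking computed above. One possible way to make the chart reflect that ranking is sketched here. `plot_top_states` is a hypothetical helper name, and the axis labels are assumptions added for readability.

```r
library(ggplot2)

# Bar chart with states ordered by their totals rather than alphabetically.
# `plot_top_states` is a hypothetical helper; `totals` needs `state` and `Total` columns.
plot_top_states <- function(totals) {
  ggplot(totals, aes(x = reorder(state, Total), y = Total)) +
    geom_col() +    # geom_col() is equivalent to geom_bar(stat = "identity")
    coord_flip() +  # horizontal bars keep long state names readable
    labs(x = "State", y = "Total reported fires")
}
```

With the objects defined in this report, `plot_top_states(fire1_top10)` would draw the state with the largest total at the top.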
We can also group the data by year to see which years had the most forest fires. 2003 had the highest total, and the table below lists the five years with the most forest fires.
# Total fires per year; keep the 5 years with the largest totals
fire2<-forest_fire %>%
  group_by(year) %>%
  summarise(Total=round(sum(number))) %>%
  arrange(desc(Total)) %>%
  slice(1:5)
kable(fire2)
| year | Total |
|---|---|
| 2003 | 42761 |
| 2016 | 42212 |
| 2015 | 41208 |
| 2012 | 40085 |
| 2014 | 39621 |
ggplot(fire2, aes(x=year, y=Total))+geom_point()
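The plot above shows only the five highest years. To see how they sit within the whole series, the yearly totals could be kept for every year and in chronological order. The following is a minimal sketch under that assumption. `yearly_totals` is a hypothetical helper name, and it expects the `year` and `number` columns of the data set.

```r
library(dplyr)

# Total fires per year, kept in chronological order for a trend plot.
# `yearly_totals` is a hypothetical helper; `df` must have `year` and `number` columns.
yearly_totals <- function(df) {
  df %>%
    group_by(year) %>%
    summarise(Total = sum(number)) %>%
    arrange(year)
}
```

For example, `ggplot(yearly_totals(forest_fire), aes(x=year, y=Total))+geom_line()+geom_point()` would plot the full series as a line with a point for each year.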