This analysis has the objective discover which types of climate events are most harmful with respect to population health. The data that were used in this analisys were gathering from National Wheather Center between 1950 and 2011.Firstly, this text will show a summary of the types of climate ocorrences including main locations. After, it was proposed two indicators for States: Indicator of Fatalities per Event, Indicator of Injury per Event. Those indicators were calculated by States of United States. For last, we calculated the most dangers events. This short article intends to alert policians and public managers about the risk of climate events. Thus, the results should influence in investiment and prevent future deaths or injuries in population.
To read the data frame it was used the function read.csv with parameters: herder=TRUE and sep = “,”.
df_storm = read.csv("repdata_data_StormData.csv", header = TRUE, sep=",")
head(df_storm)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
It was used the library “sqldf”" to aggregate values of data frame. Using group by query it was possible agregate and order data to obtain a dataframe with most common events.
Select EVTYPE, count(*) as qtd from df_storm Group By EVTYPE order by qtd desc, EVTYPE
#df_storm = read.csv("repdata_data_StormData.csv", header = TRUE, sep=",")
library("sqldf")
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
events <- sqldf("Select EVTYPE, count(*) as qtd from df_storm Group By EVTYPE order by qtd desc, EVTYPE")
head(events)
## EVTYPE qtd
## 1 HAIL 288661
## 2 TSTM WIND 219940
## 3 THUNDERSTORM WIND 82563
## 4 TORNADO 60652
## 5 FLASH FLOOD 54277
## 6 FLOOD 25326
To show a graph with most common events will filter the 5 first obs. and aggregate the others using sum funcion.
numbers <- c(events[1:5,]$qtd, sum(events[11:nrow(events),]$qtd))
labels_slices <- c(as.character(events[1:5,]$EVTYPE),"others")
slices <- numbers
lbls <- labels_slices
pie(slices, labels = lbls, main="Main Events Registered")
Again using “sqldf” library. It was calculated two indicators per state:
sum(FATALITIES)/count() sum(INJURIES)/count()
Thus, it was possible to discover the states more unsafe.
library("sqldf")
indicators <- sqldf("Select STATE, count(*) as qtd_events, sum(FATALITIES)/count(*) as fatalities_per_events, sum(INJURIES)/count(*) as injury_per_events from df_storm Group By STATE Order by fatalities_per_events desc")
indicators_i <- sqldf("Select STATE, count(*) as qtd_events, sum(FATALITIES)/count(*) as fatalities_per_events, sum(INJURIES)/count(*) as injury_per_events from df_storm Group By STATE Order by injury_per_events desc")
head(indicators)
## STATE qtd_events fatalities_per_events injury_per_events
## 1 GU 306 0.26470588 1.3594771
## 2 AS 257 0.15953307 0.6381323
## 3 DC 437 0.07093822 0.8764302
## 4 PZ 96 0.05208333 0.0312500
## 5 CA 10780 0.05102041 0.3040816
## 6 IL 28488 0.04988065 0.1952752
head(indicators_i)
## STATE qtd_events fatalities_per_events injury_per_events
## 1 GU 306 0.26470588 1.3594771
## 2 MH 1 0.00000000 1.0000000
## 3 DC 437 0.07093822 0.8764302
## 4 AS 257 0.15953307 0.6381323
## 5 AL 22739 0.03447821 0.3844496
## 6 MA 5652 0.02476999 0.3752654
With library of Graph from R, it was genereted two graphs with most dangerous states using indicators that were just created.
par(mfrow=c(1,2))
g_fatal <- indicators$fatalities_per_events
g_inj <- indicators$injury_per_events
names(g_fatal) <- indicators$STATE
names(g_inj) <- indicators$STATE
barplot(g_fatal, main="Indicator of Fatalities per State")
barplot(g_inj, main="Indicator of Injury per State")
In this section, it was calculated two dataframes with events more fatalities and events with more injury persons.
library("sqldf")
indicators_2 <- sqldf("Select EVTYPE, count(*) as qtd_events, FATALITIES * count(*) as TOTAL_FATAL from df_storm Group By EVTYPE Order by TOTAL_FATAL desc")
head(indicators_2)
## EVTYPE qtd_events TOTAL_FATAL
## 1 EXTREME COLD 655 655
## 2 AVALANCHE 386 386
## 3 RIP CURRENTS 304 304
## 4 EXTREME WINDCHILL 204 204
## 5 RIVER FLOOD 173 173
## 6 HEAVY SURF 84 84
indicators_3 <- sqldf("Select EVTYPE, count(*) as qtd_events, INJURIES * count(*) as TOTAL_INJURY from df_storm Group By EVTYPE Order by TOTAL_INJURY desc")
head(indicators_3)
## EVTYPE qtd_events TOTAL_INJURY
## 1 TSTM WIND 219940 219940
## 2 TORNADO 60652 121304
## 3 WILD/FOREST FIRE 1457 1457
## 4 HEAT 767 767
## 5 RIP CURRENT 470 470
## 6 WATERSPOUT/TORNADO 8 312
For last, it was created two graph with more dangerous events. Using 5 main observation each one and aggregating the rest.
par(mfrow=c(1,2))
numbers <- c(indicators_2[1:5,]$TOTAL_FATAL, sum(indicators_2[11:nrow(indicators_2),]$TOTAL_FATAL))
labels_slices <- c(as.character(indicators_2[1:5,]$EVTYPE),"others")
slices <- numbers
lbls <- labels_slices
pie(slices, labels = lbls, main="Main Fatal Events Registered")
numbers <- c(indicators_3[1:5,]$TOTAL_INJURY, sum(indicators_3[11:nrow(indicators_3),]$TOTAL_INJURY))
labels_slices <- c(as.character(indicators_3[1:5,]$EVTYPE),"others")
slices <- numbers
lbls <- labels_slices
pie(slices, labels = lbls, main="Main Injury Events Registered")
With the data, it is possible to discover that:
The main events that were registed in the data frame were: HAIL,TSTM WIND, THUNDERSTORM WIND, TORNADO and FLASH FLOOD.
The States with more fatalities per event where: GU, AS, DC, PZ and CA.
The States with more injuries per event where: GU, MH, DC, AS, AL and MA.
The events that had more fatalities were: EXTREME COLD, AVALANCHE, RIP CURRENTS, EXTREME WINDCHILL and RIVER FLOOD
The events that had more injuries were : TSTM WIND and TORNADO.