TITLE: ANALYSIS ABOUT HARMFUL CLIMATE EVENTS IN POPULATION HEATH.

SYNOPSIS

This analysis has the objective discover which types of climate events are most harmful with respect to population health. The data that were used in this analisys were gathering from National Wheather Center between 1950 and 2011.Firstly, this text will show a summary of the types of climate ocorrences including main locations. After, it was proposed two indicators for States: Indicator of Fatalities per Event, Indicator of Injury per Event. Those indicators were calculated by States of United States. For last, we calculated the most dangers events. This short article intends to alert policians and public managers about the risk of climate events. Thus, the results should influence in investiment and prevent future deaths or injuries in population.

DATA PROCESSING

READING DATASET - STORMDATA

To read the data frame it was used the function read.csv with parameters: herder=TRUE and sep = “,”.

df_storm = read.csv("repdata_data_StormData.csv", header = TRUE, sep=",")
head(df_storm)

##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6

Calculates: Most common events in data frame

It was used the library “sqldf”" to aggregate values of data frame. Using group by query it was possible agregate and order data to obtain a dataframe with most common events.

Select EVTYPE, count(*) as qtd from df_storm Group By EVTYPE order by qtd desc, EVTYPE

#df_storm = read.csv("repdata_data_StormData.csv", header = TRUE, sep=",")
library("sqldf")

## Loading required package: gsubfn

## Loading required package: proto

## Loading required package: RSQLite

events <- sqldf("Select EVTYPE, count(*) as qtd from df_storm Group By EVTYPE order by qtd desc, EVTYPE")

head(events)

##              EVTYPE    qtd
## 1              HAIL 288661
## 2         TSTM WIND 219940
## 3 THUNDERSTORM WIND  82563
## 4           TORNADO  60652
## 5       FLASH FLOOD  54277
## 6             FLOOD  25326

TOP 5 EVENTS IN DATA BASE

To show a graph with most common events will filter the 5 first obs. and aggregate the others using sum funcion.

numbers       <- c(events[1:5,]$qtd, sum(events[11:nrow(events),]$qtd))
labels_slices <- c(as.character(events[1:5,]$EVTYPE),"others")

slices <- numbers
lbls   <- labels_slices
pie(slices, labels = lbls, main="Main Events Registered")

Most common States with Events and Indicators

Again using “sqldf” library. It was calculated two indicators per state:

sum(FATALITIES)/count() sum(INJURIES)/count()

Thus, it was possible to discover the states more unsafe.

library("sqldf")
indicators <- sqldf("Select STATE, count(*) as qtd_events, sum(FATALITIES)/count(*) as fatalities_per_events, sum(INJURIES)/count(*) as injury_per_events  from df_storm Group By STATE Order by fatalities_per_events desc")

indicators_i <- sqldf("Select STATE, count(*) as qtd_events, sum(FATALITIES)/count(*) as fatalities_per_events, sum(INJURIES)/count(*) as injury_per_events  from df_storm Group By STATE Order by injury_per_events desc")

head(indicators)

##   STATE qtd_events fatalities_per_events injury_per_events
## 1    GU        306            0.26470588         1.3594771
## 2    AS        257            0.15953307         0.6381323
## 3    DC        437            0.07093822         0.8764302
## 4    PZ         96            0.05208333         0.0312500
## 5    CA      10780            0.05102041         0.3040816
## 6    IL      28488            0.04988065         0.1952752

head(indicators_i)

##   STATE qtd_events fatalities_per_events injury_per_events
## 1    GU        306            0.26470588         1.3594771
## 2    MH          1            0.00000000         1.0000000
## 3    DC        437            0.07093822         0.8764302
## 4    AS        257            0.15953307         0.6381323
## 5    AL      22739            0.03447821         0.3844496
## 6    MA       5652            0.02476999         0.3752654

Graph of Indicators

With library of Graph from R, it was genereted two graphs with most dangerous states using indicators that were just created.

par(mfrow=c(1,2))
g_fatal         <- indicators$fatalities_per_events
g_inj           <- indicators$injury_per_events
names(g_fatal)  <- indicators$STATE
names(g_inj)    <- indicators$STATE

barplot(g_fatal, main="Indicator of Fatalities per State")
barplot(g_inj, main="Indicator of Injury per State")

MOST DANGERS EVENTS - FATALITIES

In this section, it was calculated two dataframes with events more fatalities and events with more injury persons.

library("sqldf")
indicators_2 <- sqldf("Select EVTYPE, count(*) as qtd_events, FATALITIES * count(*) as TOTAL_FATAL  from df_storm Group By EVTYPE Order by TOTAL_FATAL desc")

head(indicators_2)

##              EVTYPE qtd_events TOTAL_FATAL
## 1      EXTREME COLD        655         655
## 2         AVALANCHE        386         386
## 3      RIP CURRENTS        304         304
## 4 EXTREME WINDCHILL        204         204
## 5       RIVER FLOOD        173         173
## 6        HEAVY SURF         84          84

indicators_3 <- sqldf("Select EVTYPE, count(*) as qtd_events, INJURIES * count(*) as TOTAL_INJURY  from df_storm Group By EVTYPE Order by TOTAL_INJURY desc")

head(indicators_3)

##               EVTYPE qtd_events TOTAL_INJURY
## 1          TSTM WIND     219940       219940
## 2            TORNADO      60652       121304
## 3   WILD/FOREST FIRE       1457         1457
## 4               HEAT        767          767
## 5        RIP CURRENT        470          470
## 6 WATERSPOUT/TORNADO          8          312

TOP 5 EVENTS IN DATA BASE

For last, it was created two graph with more dangerous events. Using 5 main observation each one and aggregating the rest.

par(mfrow=c(1,2))
numbers       <- c(indicators_2[1:5,]$TOTAL_FATAL, sum(indicators_2[11:nrow(indicators_2),]$TOTAL_FATAL))
labels_slices <- c(as.character(indicators_2[1:5,]$EVTYPE),"others")

slices <- numbers
lbls   <- labels_slices
pie(slices, labels = lbls, main="Main Fatal Events Registered")


numbers       <- c(indicators_3[1:5,]$TOTAL_INJURY, sum(indicators_3[11:nrow(indicators_3),]$TOTAL_INJURY))
labels_slices <- c(as.character(indicators_3[1:5,]$EVTYPE),"others")

slices <- numbers
lbls   <- labels_slices
pie(slices, labels = lbls, main="Main Injury Events Registered")

RESULT

With the data, it is possible to discover that:

The main events that were registed in the data frame were: HAIL,TSTM WIND, THUNDERSTORM WIND, TORNADO and FLASH FLOOD.
The States with more fatalities per event where: GU, AS, DC, PZ and CA.
The States with more injuries per event where: GU, MH, DC, AS, AL and MA.
The events that had more fatalities were: EXTREME COLD, AVALANCHE, RIP CURRENTS, EXTREME WINDCHILL and RIVER FLOOD
The events that had more injuries were : TSTM WIND and TORNADO.

Coursera - Week 4 - Reproducible Research

Marcelo Gadelha

19 de janeiro de 2019

TITLE: ANALYSIS ABOUT HARMFUL CLIMATE EVENTS IN POPULATION HEATH.

SYNOPSIS

DATA PROCESSING

READING DATASET - STORMDATA

Calculates: Most common events in data frame

TOP 5 EVENTS IN DATA BASE

Most common States with Events and Indicators

Graph of Indicators

MOST DANGERS EVENTS - FATALITIES

TOP 5 EVENTS IN DATA BASE

RESULT