1.Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The purpose of this analysis is to:

Determine the events that are most harmful with respect to population health
Determine the events that have the greatest economic consequences

2.Data processing

The purpose of this section is to gather the raw data from the source, and transform it so that the purpose of the analysis can be achieved.

2.1 Download & read data

First, the data has to be downloaded, extracted, and read into R.

url="https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url,"Data.bz2",method = "curl")
raw.data=read.csv(bzfile("Data.bz2"),stringsAsFactors = F)
head(raw.data)

##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
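
Re-running the steps above repeats the download every time. A minimal sketch of one way to avoid that, assuming a hypothetical helper named fetch_once (not part of the original analysis) that only downloads when no local copy exists:

```r
# Hypothetical helper: download a file only if it is not already on disk.
fetch_once <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the compressed file intact on Windows
    download.file(url, destfile, mode = "wb")
  }
  invisible(destfile)
}

# Usage (commented out so that the sketch does not trigger a large download):
# fetch_once(url, "Data.bz2")
# raw.data <- read.csv(bzfile("Data.bz2"), stringsAsFactors = FALSE)
```

The helper returns the file path invisibly, so it can be passed straight to bzfile() if preferred.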

2.2 Extract necessary information

As the output above shows, only some of the columns in raw.data are needed for this analysis. The relevant ones are selected and renamed here.

harmful.data=data.frame(raw.data$EVTYPE,raw.data$FATALITIES,raw.data$INJURIES,
        raw.data$PROPDMG,raw.data$PROPDMGEXP,raw.data$CROPDMG,raw.data$CROPDMGEXP, stringsAsFactors = F)
names(harmful.data)=c("Event","Fatalities","Injuries","PropDmg","PropDmgExp","CropDmg","CropDmgExp")
head(harmful.data)

##     Event Fatalities Injuries PropDmg PropDmgExp CropDmg CropDmgExp
## 1 TORNADO          0       15    25.0          K       0           
## 2 TORNADO          0        0     2.5          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0           
## 6 TORNADO          0        6     2.5          K       0

2.3 Convert property exponentials to numbers

Note that each property damage value is stored in two fields: a numeric field (for example 2.5) and an exponent code (for example K for thousands). The exponent code has to be transformed into a numeric multiplier before the damage can be computed.

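# Blank exponent codes are mapped to 0, as in the crop step, so that as.numeric() does not turn them into NA.
harmful.data$PropDmgExp[harmful.data$PropDmgExp==""]=0
harmful.data$PropDmgExp[harmful.data$PropDmgExp=="0"]=0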
harmful.data$PropDmgExp[harmful.data$PropDmgExp=="+"]=0
harmful.data$PropDmgExp[harmful.data$PropDmgExp=="-"]=0
harmful.data$PropDmgExp[harmful.data$PropDmgExp=="?"]=0
harmful.data$PropDmgExp[harmful.data$PropDmgExp=="1"]=10
harmful.data$PropDmgExp[harmful.data$PropDmgExp=="2"]=100
harmful.data$PropDmgExp[harmful.data$PropDmgExp=="3"]=1000
harmful.data$PropDmgExp[harmful.data$PropDmgExp=="4"]=1e+04
harmful.data$PropDmgExp[harmful.data$PropDmgExp=="5"]=1e+05
harmful.data$PropDmgExp[harmful.data$PropDmgExp=="6"]=1e+06
harmful.data$PropDmgExp[harmful.data$PropDmgExp=="7"]=1e+07
harmful.data$PropDmgExp[harmful.data$PropDmgExp=="8"]=1e+08
harmful.data$PropDmgExp[harmful.data$PropDmgExp=="B"]=1e+09
harmful.data$PropDmgExp[harmful.data$PropDmgExp=="H"]=100
harmful.data$PropDmgExp[harmful.data$PropDmgExp=="h"]=100
harmful.data$PropDmgExp[harmful.data$PropDmgExp=="K"]=1000
harmful.data$PropDmgExp[harmful.data$PropDmgExp=="M"]=1e+06
harmful.data$PropDmgExp[harmful.data$PropDmgExp=="m"]=1e+06
harmful.data$PropDmgExp=as.numeric(harmful.data$PropDmgExp)

harmful.data$Property=harmful.data$PropDmg*harmful.data$PropDmgExp
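
The code-by-code replacement above can also be written as a single lookup. The following is a minimal sketch on toy data, not the method used in this report: exp.lookup and toy are hypothetical names, and treating blank or unrecognised codes as 0 is an assumption. Unlike the code above, toupper() also folds lower-case k and b into K and B.

```r
# Multiplier for each exponent code, mirroring the codes handled above.
exp.lookup <- c("0" = 0, "+" = 0, "-" = 0, "?" = 0,
                "1" = 10, "2" = 100, "3" = 1e3, "4" = 1e4, "5" = 1e5,
                "6" = 1e6, "7" = 1e7, "8" = 1e8,
                "H" = 100, "K" = 1e3, "M" = 1e6, "B" = 1e9)

# Toy values standing in for the PropDmg and PropDmgExp columns.
toy <- data.frame(PropDmg = c(25, 2.5, 1, 3),
                  PropDmgExp = c("K", "m", "B", ""),
                  stringsAsFactors = FALSE)

# Look up each code by name; codes without an entry come back as NA.
multiplier <- unname(exp.lookup[toupper(toy$PropDmgExp)])
multiplier[is.na(multiplier)] <- 0  # assumption: blank or unknown codes count as 0
toy$Property <- toy$PropDmg * multiplier
print(toy)
```

Keeping the mapping in one named vector makes it easier to reuse the same table for the crop exponents.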

2.4 Convert crop exponentials to numbers

The same process has to be done with the crop damage information.

harmful.data$CropDmgExp[harmful.data$CropDmgExp==""]=0
harmful.data$CropDmgExp[harmful.data$CropDmgExp=="0"]=0
harmful.data$CropDmgExp[harmful.data$CropDmgExp=="?"]=0
harmful.data$CropDmgExp[harmful.data$CropDmgExp=="2"]=100
harmful.data$CropDmgExp[harmful.data$CropDmgExp=="k"]=1000
harmful.data$CropDmgExp[harmful.data$CropDmgExp=="K"]=1000
harmful.data$CropDmgExp[harmful.data$CropDmgExp=="m"]=1e+06
harmful.data$CropDmgExp[harmful.data$CropDmgExp=="M"]=1e+06
harmful.data$CropDmgExp[harmful.data$CropDmgExp=="B"]=1e+09
harmful.data$CropDmgExp=as.numeric(harmful.data$CropDmgExp)

harmful.data$Crop=harmful.data$CropDmg*harmful.data$CropDmgExp

2.5 Obtain top events regarding population harmed

Since the database contains a large number of event types, it is useful to work only with the top 10. The code below ranks events by total fatalities plus injuries and keeps the 10 with the highest totals.

harmful.data$Event=as.factor(harmful.data$Event)
harmful.data$TotPop=harmful.data$Fatalities+harmful.data$Injuries
top.population=aggregate(TotPop~Event,harmful.data,"sum")
top.population=top.population[order(top.population$TotPop, decreasing = T), ]
top.population=top.population[1:10,1]
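
To make the ranking step easier to follow, here is a minimal sketch of the same aggregate-then-order pattern on toy data. The data frame toy and the cut-off of 2 are illustrative assumptions, not values from the storm database.

```r
# Toy data: one row per recorded event, with the number of people affected.
toy <- data.frame(Event = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
                  TotPop = c(10, 3, 5, 7, 1, 0),
                  stringsAsFactors = FALSE)

# Sum the affected people per event type.
totals <- aggregate(TotPop ~ Event, toy, sum)

# Sort from most to least harmful and keep the names of the top 2.
totals <- totals[order(totals$TotPop, decreasing = TRUE), ]
top2 <- as.character(head(totals$Event, 2))
print(top2)
```

Using head() instead of indexing with 1:2 avoids NA rows when the data has fewer event types than the requested cut-off.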

2.6 Aggregate population damage

Once the top 10 events are known, the data is aggregated so that each row corresponds to one event. Fatalities and injuries are kept as separate rows, labelled by a Type column, so that they can be distinguished in the plot.

fatalities=aggregate(Fatalities~Event,harmful.data,FUN = "sum")
names(fatalities)=c("Event","Amount")
fatalities$Type="Fatality"
fatalities=fatalities[fatalities$Event%in%top.population,]

injuries=aggregate(Injuries~Event,harmful.data,FUN = "sum")
names(injuries)=c("Event","Amount")
injuries$Type="Injury"
injuries=injuries[injuries$Event%in%top.population,]

population=rbind(injuries,fatalities)
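
The fatalities and injuries blocks above follow the same aggregate, rename, and label pattern. As a hedged sketch, the pattern could be wrapped in a small function. The name to_long and the toy data are hypothetical and not part of the original analysis, and the filtering to the top 10 events is left out for brevity.

```r
# Hypothetical helper: total one column per event and tag the rows with a label.
to_long <- function(data, column, label) {
  out <- aggregate(data[[column]], by = list(Event = data$Event), FUN = sum)
  names(out) <- c("Event", "Amount")
  out$Type <- label
  out
}

# Toy data standing in for harmful.data.
toy <- data.frame(Event = c("TORNADO", "FLOOD", "TORNADO"),
                  Fatalities = c(1, 2, 3),
                  Injuries = c(10, 0, 5),
                  stringsAsFactors = FALSE)

# Stack both measures into one long table, one row per event and type.
population.toy <- rbind(to_long(toy, "Injuries", "Injury"),
                        to_long(toy, "Fatalities", "Fatality"))
print(population.toy)
```

The resulting long format, with one Amount column and one Type column, is the shape that a stacked bar plot with fill=Type expects.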

2.7 Obtain top events regarding property

As with population harm, it is useful to work only with the top 10 events. The code below ranks events by combined property and crop damage and keeps the 10 with the highest totals.

harmful.data$TotProp=harmful.data$Property+harmful.data$Crop
top.property=aggregate(TotProp~Event,harmful.data,"sum")
top.property=top.property[order(top.property$TotProp, decreasing = T), ]
top.property=top.property[1:10,1]

2.8 Aggregate property damage

prop=aggregate(Property~Event,harmful.data,FUN = "sum")
names(prop)=c("Event","Amount")
prop$Type="Property"
prop=prop[prop$Event%in%top.property,]
prop=prop[complete.cases(prop),]

crop=aggregate(Crop~Event,harmful.data,FUN = "sum")
names(crop)=c("Event","Amount")
crop$Type="Crop"
crop=crop[crop$Event%in%top.property,]
crop=crop[complete.cases(crop),]

property=rbind(prop,crop)

3.Results

Two graphs are created. The first shows which events harmed the most people, and the second shows which events caused the greatest property and crop damage.

3.1 Population harmed

A barplot of the top 10 events is created, with injuries and fatalities shown separately.

library(ggplot2)

## Warning: package 'ggplot2' was built under R version 3.2.4

ggplot(population, aes(x=Event, y=Amount, fill=Type))+ 
    geom_bar(stat = "identity")+coord_flip()+labs(y="Affected people", 
    title="Top 10 Harmful Events to Population")+ 
    scale_fill_manual(values = c("green","blue"))

3.2 Property damage

A barplot of the top 10 events is created, with crop and property damage shown separately.

ggplot(property, aes(x=Event, y=Amount, fill=Type))+ 
    geom_bar(stat = "identity")+coord_flip()+labs(y="Damage Cost", 
    title="Top 10 Events by Property Damage Cost")+
    scale_fill_manual(values = c("green","blue"))

3.3 Final results

From these graphs it is possible to conclude that:

Tornadoes are the most harmful events with respect to both fatalities and injuries.
Floods are the events that have the greatest damage cost in both property and crop damage.

Most influential events in the US according to the NOAA Storm Database

Diego Chavez

July 19, 2016