Abstract

Each year, many weather events hit the United States. From 1950 to November 2011, NOAA recorded each weather event together with its consequences for the population and property across the country.

In this analysis we aim to show which weather events are most dangerous to people, and to quantify the economic damage each type of event causes to property and crops in the USA, using the NOAA Storm Database.

Tornadoes are the weather event with the most fatalities and injuries, floods cause the greatest economic losses in property, and hurricanes and/or typhoons mainly affect crops.

Loading and preprocessing Data

Download the file from the web page and load it in RStudio with the following code:

download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
              "repdata-data-StormData.csv.bz2")

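Because the archive is large, it can help to skip the download when the file is already on disk. A minimal sketch, where the helper name `fetch_if_missing` is hypothetical and not part of the original analysis:

```r
# Hypothetical helper: download `url` to `dest` only when `dest` does not exist yet.
# Returns TRUE if a download was attempted, FALSE if the existing file was reused.
fetch_if_missing <- function(url, dest) {
  if (file.exists(dest)) {
    return(FALSE)
  }
  download.file(url, dest, mode = "wb")  # binary mode keeps the .bz2 intact
  TRUE
}

# Example call with the same URL and file name as above (not run here):
# fetch_if_missing(
#   "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#   "repdata-data-StormData.csv.bz2")
```
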
Now load the required packages and the database:

library(ggplot2)
library(gridExtra)
## Loading required package: grid
library(plyr)
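# Keep BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP.
# stringsAsFactors=TRUE keeps the factor columns that levels() relies on later (R >= 4.0 defaults to FALSE).
data <- read.csv("repdata-data-StormData.csv.bz2", stringsAsFactors=TRUE)[,c(2,8,23:28)]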

We just need the variables describing fatalities, injuries and damage to property and crops.

The dimensions of the data set and a summary of its numeric variables are:

dim(data)
## [1] 902297      8
summary(data[,c(3,4,5,7)])
##    FATALITIES          INJURIES            PROPDMG       
##  Min.   :  0.0000   Min.   :   0.0000   Min.   :   0.00  
##  1st Qu.:  0.0000   1st Qu.:   0.0000   1st Qu.:   0.00  
##  Median :  0.0000   Median :   0.0000   Median :   0.00  
##  Mean   :  0.0168   Mean   :   0.1557   Mean   :  12.06  
##  3rd Qu.:  0.0000   3rd Qu.:   0.0000   3rd Qu.:   0.50  
##  Max.   :583.0000   Max.   :1700.0000   Max.   :5000.00  
##     CROPDMG       
##  Min.   :  0.000  
##  1st Qu.:  0.000  
##  Median :  0.000  
##  Mean   :  1.527  
##  3rd Qu.:  0.000  
##  Max.   :990.000

The database has 902297 records, but data collection is not uniform across years; the following histogram shows how the number of records varies by year.

date<-as.Date(data$BGN_DATE,format='%m/%d/%Y %H:%M:%S')
hist(as.numeric(format(date,'%Y')),col='grey',breaks=20,
     main='NOAA Measures by Year',xlab='Year')
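The same per-year counts can also be inspected as a table rather than a histogram. The sketch below uses a small invented vector of dates in the BGN_DATE layout, not the real data:

```r
# Toy dates in the same "m/d/Y H:M:S" layout as BGN_DATE (invented for illustration)
bgn <- c("4/18/1950 0:00:00", "6/8/1951 0:00:00",
         "7/12/1995 0:00:00", "1/1/2006 0:00:00", "3/5/2006 0:00:00")

# Parse the strings and keep only the calendar date
d <- as.Date(bgn, format = "%m/%d/%Y %H:%M:%S")
year <- as.numeric(format(d, "%Y"))

# Number of records per year
per_year <- table(year)
per_year
```
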

Fatalities and Injuries Analysis by Weather Event

Across the United States, which types of events are most harmful with respect to population health?
To answer this question, we calculate the total fatalities for every type of weather event, and do the same for injuries.

FatalHarm<-aggregate(data$FATALITIES,by=list(data$EVTYPE),FUN=sum)
names(FatalHarm)<-c('EVTYPE','FATALITIES')
FatalHarm<-FatalHarm[order(FatalHarm$FATALITIES,decreasing=T),]
Top10Fatal<-FatalHarm[1:10,]
Top10Fatal
##             EVTYPE FATALITIES
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
## 170          FLOOD        470
## 585    RIP CURRENT        368
## 359      HIGH WIND        248
## 19       AVALANCHE        224
InjuHarm<-aggregate(data$INJURIES,by=list(data$EVTYPE),FUN=sum)
names(InjuHarm)<-c('EVTYPE','INJURIES')
InjuHarm<-InjuHarm[order(InjuHarm$INJURIES,decreasing=T),]
Top10Inju<-InjuHarm[1:10,]
Top10Inju
##                EVTYPE INJURIES
## 834           TORNADO    91346
## 856         TSTM WIND     6957
## 170             FLOOD     6789
## 130    EXCESSIVE HEAT     6525
## 464         LIGHTNING     5230
## 275              HEAT     2100
## 427         ICE STORM     1975
## 153       FLASH FLOOD     1777
## 760 THUNDERSTORM WIND     1488
## 244              HAIL     1361
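
The ranking step above follows a sum-by-group, sort, take-the-top-n pattern. As a minimal sketch on an invented data frame (the values are made up, and `top_n_by_event` is a hypothetical helper name), the same pattern can be wrapped in a function and reused for both fatalities and injuries:

```r
# Hypothetical helper: total `value` per event type, sorted descending, top n rows
top_n_by_event <- function(df, value, n = 10) {
  totals <- aggregate(df[[value]], by = list(EVTYPE = df$EVTYPE), FUN = sum)
  names(totals)[2] <- value
  totals <- totals[order(totals[[value]], decreasing = TRUE), ]
  head(totals, n)
}

# Invented toy data, not taken from the NOAA database
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(3, 5, 4, 1, 1),
  INJURIES   = c(10, 0, 20, 2, 1)
)

top_fatal <- top_n_by_event(toy, "FATALITIES", n = 2)
top_inju  <- top_n_by_event(toy, "INJURIES", n = 2)
top_fatal
```
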

The single recorded events with the most fatalities and the most injuries:

maxfatal<-which.max(data$FATALITIES)
maxinjury<-which.max(data$INJURIES)
list(Fatality=data[maxfatal,1:4],Injury=data[maxinjury,1:4])
## $Fatality
##                 BGN_DATE EVTYPE FATALITIES INJURIES
## 198704 7/12/1995 0:00:00   HEAT        583        0
## 
## $Injury
##                 BGN_DATE  EVTYPE FATALITIES INJURIES
## 157885 4/10/1979 0:00:00 TORNADO         42     1700

The following panel plot shows the total number of fatalities and injuries for each weather event in the US.

p1<-ggplot(Top10Fatal, aes(x=reorder(EVTYPE,-FATALITIES),y=FATALITIES,fill=EVTYPE)) + 
   geom_bar(stat="identity")+guides(fill="none")+
   scale_y_continuous("Number of Fatalities") + 
   xlab("Weather Type")+ 
   ggtitle("Total Fatalities\nby Weather in the U.S\n")+
   theme(axis.text.x=element_text(angle=45,hjust=1),
         plot.title=element_text(size=rel(1.5)))

p2<-ggplot(Top10Inju, aes(x=reorder(EVTYPE,-INJURIES), y=INJURIES, fill=EVTYPE)) + 
   geom_bar(stat="identity")+guides(fill="none")+
   scale_y_continuous("Number of Injuries") + 
   xlab("Weather Type")+ 
   ggtitle("Total Injuries\nby Weather in the U.S\n")+
   theme(axis.text.x=element_text(angle=45,hjust=1),
         plot.title=element_text(size=rel(1.5)))

grid.arrange(p1,p2,ncol=2)

Tornadoes are the weather event with the most fatalities and injuries in the USA between 1950 and November 2011.

Economic Damages in Properties and Crops

Across the United States, which types of events have the greatest economic consequences? Before calculating the economic damage, we must tidy the database, because the PROPDMGEXP and CROPDMGEXP variables are factors that encode the magnitude (power of ten) of the damage. See the Record Layout of the database to understand the levels of these factors.

levels(data$PROPDMGEXP)
##  [1] ""  "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
levels(data$CROPDMGEXP)
## [1] ""  "?" "0" "2" "B" "k" "K" "m" "M"
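# Recode the letter magnitudes as powers of ten (H=2, K=3, M=6, B=9)
data$PROPDMGEXP<-mapvalues(data$PROPDMGEXP,warn_missing=F,
                           from=c("B","h","H","K","m","M"),
                           to=c("9","2","2","3","6","6"))

# Recode the crop exponents from CROPDMGEXP itself, not from PROPDMGEXP
data$CROPDMGEXP<-mapvalues(data$CROPDMGEXP,warn_missing=F,
                           from=c("B","k","K","m","M"),
                           to=c("9","3","3","6","6"))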
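
The recoding above turns each letter code into a power of ten, so a damage value is the recorded amount times 10 raised to that exponent. A minimal sketch of that arithmetic on invented values, following the mapping used above (H = 2, K = 3, M = 6, B = 9); `exp_lookup` and `damage_value` are hypothetical names, not part of the original analysis:

```r
# Letter codes mapped to powers of ten, mirroring the recoding above
exp_lookup <- c(H = 2, K = 3, M = 6, B = 9)

# Hypothetical helper: amount * 10^exponent. Codes that are neither a known
# letter nor a single digit (e.g. "", "?", "+", "-") yield NA.
damage_value <- function(amount, code) {
  code <- toupper(as.character(code))
  exponent <- unname(exp_lookup[code])               # NA for unknown codes
  is_digit <- grepl("^[0-9]$", code)
  exponent[is_digit] <- as.numeric(code[is_digit])   # digits are used as-is
  amount * 10^exponent
}

# Invented amounts and codes for illustration
vals <- damage_value(c(25, 2.5, 1, 3, 7), c("K", "m", "B", "2", "?"))
vals
```

Returning NA for unrecognised codes matches the way the analysis drops such rows through `na.rm=T` when summing.
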

Now we can calculate the Economic Damage in Properties and Crops by weather event.

Properties<-subset(data,PROPDMGEXP!="+" & PROPDMGEXP!="-" & PROPDMGEXP!="?",select=c(1:6))
Properties$PROPDMGEXP<-as.numeric(as.character(Properties$PROPDMGEXP))
Properties$DMGVALUED<-Properties$PROPDMG*(10^Properties$PROPDMGEXP)

PropDamage<-aggregate(Properties$DMGVALUED,by=list(Properties$EVTYPE),FUN=sum,na.rm=T)
names(PropDamage)<-c('EVTYPE','DAMAGE')
PropDamage<-PropDamage[order(PropDamage$DAMAGE,decreasing=T),]
Top10Prop<-PropDamage[1:10,]
Top10Prop
##                EVTYPE       DAMAGE
## 169             FLOOD 144657709800
## 409 HURRICANE/TYPHOON  69305840000
## 832           TORNADO  56947380614
## 668       STORM SURGE  43323536000
## 152       FLASH FLOOD  16822673772
## 242              HAIL  15735267456
## 400         HURRICANE  11868319010
## 846    TROPICAL STORM   7703890550
## 970      WINTER STORM   6688497251
## 357         HIGH WIND   5270046260
Crops<-subset(data,CROPDMGEXP!="+" & CROPDMGEXP!="-" & CROPDMGEXP!="?",select=c(1:4,7:8))
Crops$CROPDMGEXP<-as.numeric(as.character(Crops$CROPDMGEXP))
Crops$DMGVALUED<-Crops$CROPDMG*(10^Crops$CROPDMGEXP)

CropDamage<-aggregate(Crops$DMGVALUED,by=list(Crops$EVTYPE),FUN=sum,na.rm=T)
names(CropDamage)<-c('EVTYPE','DAMAGE')
CropDamage<-CropDamage[order(CropDamage$DAMAGE,decreasing=T),]
Top10Crop<-CropDamage[1:10,]
Top10Crop
##                        EVTYPE       DAMAGE
## 400                 HURRICANE 802881916000
## 409         HURRICANE/TYPHOON 732768451330
## 169                     FLOOD  87251972270
## 152               FLASH FLOOD  38865137040
## 832                   TORNADO  28269872233
## 242                      HAIL  15316662250
## 408 HURRICANE OPAL/HIGH WINDS  10000000000
## 854                 TSTM WIND   7684639900
## 357                 HIGH WIND   7174065610
## 955                  WILDFIRE   7173808200

The single recorded events with the greatest economic damage to property and to crops:

maxprop<-which.max(Properties$DMGVALUED)
maxcrop<-which.max(Crops$DMGVALUED)
list(Properties=Properties[maxprop,c(1:4,7)],Crops=Crops[maxcrop,c(1:4,7)])
## $Properties
##                BGN_DATE EVTYPE FATALITIES INJURIES DMGVALUED
## 605953 1/1/2006 0:00:00  FLOOD          0        0  1.15e+11
## 
## $Crops
##                 BGN_DATE    EVTYPE FATALITIES INJURIES DMGVALUED
## 366694 9/15/1999 0:00:00 HURRICANE          0        0     5e+11

The following panel plot shows the economic damage to property and crops for each weather event in the US.

# Divide by 1e9 (one billion) so the values match the axis label
p3<-ggplot(Top10Prop, aes(x=reorder(EVTYPE,-DAMAGE), y=DAMAGE/1e9, fill=EVTYPE)) + 
   geom_bar(stat="identity")+guides(fill="none")+
   scale_y_continuous("Damage Value (Billions of Dollars)") + 
   xlab("Weather Type")+ 
   ggtitle("Economic Damage Properties\nby Weather in the U.S\n")+
   theme(axis.text.x=element_text(angle=45,hjust=1),
         plot.title=element_text(size=rel(1.4)))

p4<-ggplot(Top10Crop, aes(x=reorder(EVTYPE,-DAMAGE), y=DAMAGE/1e9, fill=EVTYPE))+ 
   geom_bar(stat="identity")+guides(fill="none")+
   scale_y_continuous("Damage Value (Billions of Dollars)") + 
   xlab("Weather Type")+ 
   ggtitle("Economic Damage Crops\nby Weather in the U.S\n")+
   theme(axis.text.x=element_text(angle=45,hjust=1),
         plot.title=element_text(size=rel(1.4)))

grid.arrange(p3,p4,ncol=2)

Floods are the weather event with the greatest economic losses in property, while hurricanes and/or typhoons mainly affect crops in the U.S.