Each year, a number of severe weather events hit the United States. From 1950 to November 2011, the National Oceanic and Atmospheric Administration (NOAA) recorded each weather event together with its consequences for the population and for property across the country.
In this analysis we use the NOAA Storm Database to show which weather events are most harmful to people, and to estimate the economic damage that each type of event causes to property and crops in the U.S.
Tornadoes are the event type with the most fatalities and injuries, floods cause the largest economic losses to property, and hurricanes and/or typhoons mainly affect crops.
Download the file from the web page and load it into R with the following code:
download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
"repdata-data-StormData.csv.bz2")
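Because the compressed file is large, re-running the report does not need to fetch it every time. A minimal sketch of a download-once idiom, assuming the file name used above; `fetch_if_missing` is a hypothetical helper, not part of the original analysis:

```r
# Hypothetical helper: download `url` to `destfile` only when the file is not
# already on disk, so that re-knitting the report skips the download.
fetch_if_missing <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" makes sure the compressed file is written as binary
    download.file(url, destfile, mode = "wb")
  }
  invisible(destfile)
}

# Usage with the NOAA file (not run here):
# fetch_if_missing(url, "repdata-data-StormData.csv.bz2")
```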
Now load the required packages and the database:
library(ggplot2)
library(gridExtra)
## Loading required package: grid
library(plyr)
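# columns 2, 8 and 23-28: BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP
data <- read.csv("repdata-data-StormData.csv.bz2", stringsAsFactors = TRUE)[, c(2, 8, 23:28)]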
We only need the begin date, the event type, and the variables describing fatalities, injuries, and damage to property and crops.
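Selecting columns by position (`c(2, 8, 23:28)`) silently breaks if the file layout changes. A minimal sketch of the same selection by column name, using the names shown in the summary output; the small CSV below is made-up toy data standing in for the real file:

```r
# Columns needed for the analysis, selected by name instead of position
keep <- c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Toy CSV standing in for the StormData file (values are invented)
toy <- tempfile(fileext = ".csv")
writeLines(c(
  "STATE__,BGN_DATE,EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP,REMARKS",
  "1,4/18/1950 0:00:00,TORNADO,0,15,25,K,0,,none",
  "1,1/1/2006 0:00:00,FLOOD,0,0,1.5,M,2,K,none"
), toy)

# stringsAsFactors = TRUE keeps the exponent codes as factors, as the
# later levels() and mapvalues() steps expect
full <- read.csv(toy, stringsAsFactors = TRUE)
subset_by_name <- full[, keep]
str(subset_by_name)
```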
The reduced dataset has the following dimensions and summary statistics:
dim(data)
## [1] 902297 8
summary(data[,c(3,4,5,7)])
## FATALITIES INJURIES PROPDMG
## Min. : 0.0000 Min. : 0.0000 Min. : 0.00
## 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.00
## Median : 0.0000 Median : 0.0000 Median : 0.00
## Mean : 0.0168 Mean : 0.1557 Mean : 12.06
## 3rd Qu.: 0.0000 3rd Qu.: 0.0000 3rd Qu.: 0.50
## Max. :583.0000 Max. :1700.0000 Max. :5000.00
## CROPDMG
## Min. : 0.000
## 1st Qu.: 0.000
## Median : 0.000
## Mean : 1.527
## 3rd Qu.: 0.000
## Max. :990.000
The dataset has 902,297 records, but data collection was not uniform over time; the following histogram shows how the number of records varies by year.
date<-as.Date(data$BGN_DATE,format='%m/%d/%Y %H:%M:%S')
hist(as.numeric(format(date,'%Y')),col='grey',breaks=20,
main='NOAA Measures by Year',xlab='Year')
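The histogram bins the years; an exact count of records per year can be obtained with `table()`. A minimal sketch on a few made-up `BGN_DATE` strings in the same `%m/%d/%Y %H:%M:%S` layout as the real data:

```r
# Toy BGN_DATE strings (invented) in the layout used by the NOAA file
bgn <- c("4/18/1950 0:00:00", "4/18/1950 0:00:00", "7/12/1995 0:00:00",
         "1/1/2006 0:00:00", "9/15/1999 0:00:00")

# as.Date() parses the full string but keeps only the calendar date
dates <- as.Date(bgn, format = "%m/%d/%Y %H:%M:%S")

# Number of records per calendar year
per_year <- table(format(dates, "%Y"))
per_year
```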
Across the United States, which types of events are most harmful with respect to population health?
To answer this question, we compute the total fatalities for each event type, and likewise the total injuries.
FatalHarm<-aggregate(data$FATALITIES,by=list(data$EVTYPE),FUN=sum)
names(FatalHarm)<-c('EVTYPE','FATALITIES')
FatalHarm<-FatalHarm[order(FatalHarm$FATALITIES,decreasing=T),]
Top10Fatal<-FatalHarm[1:10,]
Top10Fatal
## EVTYPE FATALITIES
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
## 856 TSTM WIND 504
## 170 FLOOD 470
## 585 RIP CURRENT 368
## 359 HIGH WIND 248
## 19 AVALANCHE 224
InjuHarm<-aggregate(data$INJURIES,by=list(data$EVTYPE),FUN=sum)
names(InjuHarm)<-c('EVTYPE','INJURIES')
InjuHarm<-InjuHarm[order(InjuHarm$INJURIES,decreasing=T),]
Top10Inju<-InjuHarm[1:10,]
Top10Inju
## EVTYPE INJURIES
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
## 275 HEAT 2100
## 427 ICE STORM 1975
## 153 FLASH FLOOD 1777
## 760 THUNDERSTORM WIND 1488
## 244 HAIL 1361
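The fatality and injury blocks above repeat the same aggregate, rename, sort, and subset steps. A minimal sketch of those steps wrapped in one function; `top_events` is a hypothetical helper and the data frame is toy data:

```r
# Hypothetical helper: total `value_col` per event type, keep the n largest
top_events <- function(df, value_col, n = 10) {
  totals <- aggregate(df[[value_col]], by = list(EVTYPE = df$EVTYPE), FUN = sum)
  names(totals)[2] <- value_col
  totals <- totals[order(totals[[value_col]], decreasing = TRUE), ]
  # head() returns fewer rows instead of NA rows when there are < n groups
  head(totals, n)
}

# Toy data (invented values)
toy <- data.frame(
  EVTYPE = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(5, 3, 4, 1, 0),
  INJURIES = c(50, 2, 10, 4, 1)
)
top_events(toy, "FATALITIES", n = 2)
```

With the real data, `top_events(data, "FATALITIES")` and `top_events(data, "INJURIES")` would play the roles of `Top10Fatal` and `Top10Inju`.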
The single records with the most fatalities and the most injuries are:
maxfatal<-which.max(data$FATALITIES)
maxinjury<-which.max(data$INJURIES)
list(Fatality=data[maxfatal,1:4],Injury=data[maxinjury,1:4])
## $Fatality
## BGN_DATE EVTYPE FATALITIES INJURIES
## 198704 7/12/1995 0:00:00 HEAT 583 0
##
## $Injury
## BGN_DATE EVTYPE FATALITIES INJURIES
## 157885 4/10/1979 0:00:00 TORNADO 42 1700
The following panel plot shows the total fatalities and injuries for the ten most harmful event types in the U.S.:
p1<-ggplot(Top10Fatal, aes(x=reorder(EVTYPE,-FATALITIES),y=FATALITIES,fill=EVTYPE)) +
geom_bar(stat="identity")+guides(fill="none")+
scale_y_continuous("Number of Fatalities") +
xlab("Weather Type")+
ggtitle("Total Fatalities\nby Weather in the U.S.\n")+
theme(axis.text.x=element_text(angle=45,hjust=1),
plot.title=element_text(size=rel(1.5)))
p2<-ggplot(Top10Inju, aes(x=reorder(EVTYPE,-INJURIES), y=INJURIES, fill=EVTYPE)) +
geom_bar(stat="identity")+guides(fill="none")+
scale_y_continuous("Number of Injuries") +
xlab("Weather Type")+
ggtitle("Total Injuries\nby Weather in the U.S.\n")+
theme(axis.text.x=element_text(angle=45,hjust=1),
plot.title=element_text(size=rel(1.5)))
grid.arrange(p1,p2,ncol=2)
Tornadoes caused the most fatalities and injuries in the U.S. between 1950 and November 2011.
Across the United States, which types of events have the greatest economic consequences?
Before calculating the economic damage, we must recode the PROPDMGEXP and CROPDMGEXP variables: they are factors whose levels give the order of magnitude (the power of ten) of the amounts in PROPDMG and CROPDMG. See the Record Layout of the database to understand the levels of these factors.
levels(data$PROPDMGEXP)
## [1] "" "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
levels(data$CROPDMGEXP)
## [1] "" "?" "0" "2" "B" "k" "K" "m" "M"
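# Replace letter codes with powers of ten: h/H = hundreds, k/K = thousands,
# m/M = millions, B = billions; digit codes are left unchanged
data$PROPDMGEXP<-mapvalues(data$PROPDMGEXP,warn_missing=F,
from=c("B","h","H","K","m","M"),
to=c("9","2","2","3","6","6"))
# Recode CROPDMGEXP from its own values (not from PROPDMGEXP)
data$CROPDMGEXP<-mapvalues(data$CROPDMGEXP,warn_missing=F,
from=c("B","k","K","m","M"),
to=c("9","3","3","6","6"))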
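The recoding turns each record's damage into dollars as amount × 10^exponent, for example 25 with code "K" becomes 25 × 10^3 = 25,000. A minimal sketch of the same rule as a single lookup function; `decode_damage`, `exp_codes`, and `exp_power` are hypothetical names, and the mapping mirrors the `mapvalues()` calls above:

```r
# Exponent codes and the power of ten each stands for, following the
# recoding above; digit codes "0".."8" are read as the exponent itself
exp_codes <- c("h", "H", "k", "K", "m", "M", "B", as.character(0:8))
exp_power <- c(2, 2, 3, 3, 6, 6, 9, 0:8)

# Hypothetical helper: dollar value = amount * 10^power. match() gives NA
# for codes outside the table ("", "+", "-", "?"), so those records become
# NA and are dropped later by sum(..., na.rm = TRUE)
decode_damage <- function(amount, code) {
  power <- exp_power[match(as.character(code), exp_codes)]
  amount * 10^power
}

decode_damage(c(25, 1.5, 2, 10), factor(c("K", "M", "B", "?")))
```

Using `match()` rather than indexing a named vector matters here: R never matches the empty string `""` against names, whereas `match()` compares values directly.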
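Now we can calculate the economic damage to property and crops by event type, multiplying each damage amount by 10 raised to its exponent code. Records with the invalid codes "+", "-" and "?" are excluded; records with an empty code become NA and are dropped by na.rm=T.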
Properties<-subset(data,PROPDMGEXP!="+" & PROPDMGEXP!="-" & PROPDMGEXP!="?",select=c(1:6))
Properties$PROPDMGEXP<-as.numeric(as.character(Properties$PROPDMGEXP))
Properties$DMGVALUED<-Properties$PROPDMG*(10^Properties$PROPDMGEXP)
PropDamage<-aggregate(Properties$DMGVALUED,by=list(Properties$EVTYPE),FUN=sum,na.rm=T)
names(PropDamage)<-c('EVTYPE','DAMAGE')
PropDamage<-PropDamage[order(PropDamage$DAMAGE,decreasing=T),]
Top10Prop<-PropDamage[1:10,]
Top10Prop
## EVTYPE DAMAGE
## 169 FLOOD 144657709800
## 409 HURRICANE/TYPHOON 69305840000
## 832 TORNADO 56947380614
## 668 STORM SURGE 43323536000
## 152 FLASH FLOOD 16822673772
## 242 HAIL 15735267456
## 400 HURRICANE 11868319010
## 846 TROPICAL STORM 7703890550
## 970 WINTER STORM 6688497251
## 357 HIGH WIND 5270046260
Crops<-subset(data,CROPDMGEXP!="+" & CROPDMGEXP!="-" & CROPDMGEXP!="?",select=c(1:4,7:8))
Crops$CROPDMGEXP<-as.numeric(as.character(Crops$CROPDMGEXP))
Crops$DMGVALUED<-Crops$CROPDMG*(10^Crops$CROPDMGEXP)
CropDamage<-aggregate(Crops$DMGVALUED,by=list(Crops$EVTYPE),FUN=sum,na.rm=T)
names(CropDamage)<-c('EVTYPE','DAMAGE')
CropDamage<-CropDamage[order(CropDamage$DAMAGE,decreasing=T),]
Top10Crop<-CropDamage[1:10,]
Top10Crop
## EVTYPE DAMAGE
## 400 HURRICANE 802881916000
## 409 HURRICANE/TYPHOON 732768451330
## 169 FLOOD 87251972270
## 152 FLASH FLOOD 38865137040
## 832 TORNADO 28269872233
## 242 HAIL 15316662250
## 408 HURRICANE OPAL/HIGH WINDS 10000000000
## 854 TSTM WIND 7684639900
## 357 HIGH WIND 7174065610
## 955 WILDFIRE 7173808200
The single records with the largest property and crop damage are:
maxprop<-which.max(Properties$DMGVALUED)
maxcrop<-which.max(Crops$DMGVALUED)
list(Properties=Properties[maxprop,c(1:4,7)],Crops=Crops[maxcrop,c(1:4,7)])
## $Properties
## BGN_DATE EVTYPE FATALITIES INJURIES DMGVALUED
## 605953 1/1/2006 0:00:00 FLOOD 0 0 1.15e+11
##
## $Crops
## BGN_DATE EVTYPE FATALITIES INJURIES DMGVALUED
## 366694 9/15/1999 0:00:00 HURRICANE 0 0 5e+11
The following panel plot shows the economic damage to property and crops for the ten most damaging event types in the U.S.:
# Divide by 1e9 (not 10e9, which is 10 billion) so the y axis is in billions of dollars
p3<-ggplot(Top10Prop, aes(x=reorder(EVTYPE,-DAMAGE), y=DAMAGE/1e9, fill=EVTYPE)) +
geom_bar(stat="identity")+guides(fill="none")+
scale_y_continuous("Damage Value (Billions of Dollars)") +
xlab("Weather Type")+
ggtitle("Economic Damage Properties\nby Weather in the U.S.\n")+
theme(axis.text.x=element_text(angle=45,hjust=1),
plot.title=element_text(size=rel(1.4)))
p4<-ggplot(Top10Crop, aes(x=reorder(EVTYPE,-DAMAGE), y=DAMAGE/1e9, fill=EVTYPE))+
geom_bar(stat="identity")+guides(fill="none")+
scale_y_continuous("Damage Value (Billions of Dollars)") +
xlab("Weather Type")+
ggtitle("Economic Damage Crops\nby Weather in the U.S.\n")+
theme(axis.text.x=element_text(angle=45,hjust=1),
plot.title=element_text(size=rel(1.4)))
grid.arrange(p3,p4,ncol=2)
Floods caused the largest economic losses to property, while hurricanes and/or typhoons mainly affected crops in the U.S.