Storms and other severe weather events can cause both public health and economic problems for communities and municipalities.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The analysis identifies (1) the types of events that are most harmful to population health across the US and (2) the types of events that have the greatest economic consequences across the US.
Aggregating the data by event type shows that (1) tornadoes are the most harmful events for population health and (2) floods are responsible for the most economic damage.
The data for this project come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The file can be downloaded from the address:
Storm Data https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
There is also some documentation of the database available.
National Weather Service Storm Data Documentation https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf
National Climatic Data Center Storm Events FAQ https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
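The claim about sparser early records can be checked by counting events per year. The following is a minimal sketch, assuming the begin dates are strings in month/day/year format followed by a time, as in the database's BGN_DATE column. The vector `bgn_date` is made-up toy data that stands in for that column.

```r
# Toy stand-in for the BGN_DATE column (hypothetical values).
bgn_date <- c("4/18/1950 0:00:00", "2/20/1951 0:00:00", "11/15/1951 0:00:00")

# Parse the date part; the trailing time text is ignored by the format.
event_year <- as.integer(format(as.Date(bgn_date, format = "%m/%d/%Y"), "%Y"))

# Number of recorded events in each year.
events_per_year <- table(event_year)
print(events_per_year)
```

Applied to the full data set, a table like this would show how the number of recorded events changes from year to year.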
The Storm Data file should be in the R working directory. If it is not already there, the following code downloads it:
if (!file.exists("repdata-data-StormData.csv.bz2")) {
  # mode = "wb" requests a binary transfer so the archive is not corrupted on Windows
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = "repdata-data-StormData.csv.bz2", mode = "wb")
}
The file is then read directly from the compressed archive; read.csv decompresses bzip2 files transparently.
input <- read.csv("repdata-data-StormData.csv.bz2")
head(input)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
unique(input$PROPDMGEXP)
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
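# Translate each PROPDMGEXP code into a numeric multiplier (PROPEXP).
# Codes that are not assigned below keep the default multiplier of 0.
input$PROPEXP <- 0
# Note: PROPVAL is initialized here but not used later; the dollar value is stored in PROPDMGVAL.
input$PROPVAL <- 0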
input$PROPEXP[input$PROPDMGEXP == ""] <- 1
input$PROPEXP[input$PROPDMGEXP == "0"] <- 1
input$PROPEXP[input$PROPDMGEXP == "1"] <- 10
input$PROPEXP[input$PROPDMGEXP == "2"] <- 100
input$PROPEXP[input$PROPDMGEXP == "3"] <- 1000
input$PROPEXP[input$PROPDMGEXP == "4"] <- 10000
input$PROPEXP[input$PROPDMGEXP == "5"] <- 1e+05
input$PROPEXP[input$PROPDMGEXP == "6"] <- 1e+06
input$PROPEXP[input$PROPDMGEXP == "7"] <- 1e+07
input$PROPEXP[input$PROPDMGEXP == "8"] <- 1e+08
input$PROPEXP[input$PROPDMGEXP == "B"] <- 1e+09
input$PROPEXP[input$PROPDMGEXP == "h"] <- 100
input$PROPEXP[input$PROPDMGEXP == "H"] <- 100
input$PROPEXP[input$PROPDMGEXP == "K"] <- 1000
input$PROPEXP[input$PROPDMGEXP == "m"] <- 1e+06
input$PROPEXP[input$PROPDMGEXP == "M"] <- 1e+06
input$PROPEXP[input$PROPDMGEXP == "+"] <- 0
input$PROPEXP[input$PROPDMGEXP == "-"] <- 0
input$PROPEXP[input$PROPDMGEXP == "?"] <- 0
input$PROPDMGVAL <- input$PROPDMG * input$PROPEXP
unique(input$CROPDMGEXP)
## [1] M K m B ? 0 k 2
## Levels: ? 0 2 B k K m M
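# Same translation for crop damage: CROPDMGEXP code to numeric multiplier (CROPEXP).
input$CROPEXP <- 0
# Note: CROPVAL is initialized here but not used later; the dollar value is stored in CROPDMGVAL.
input$CROPVAL <- 0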
input$CROPEXP[input$CROPDMGEXP == ""] <- 1
input$CROPEXP[input$CROPDMGEXP == "0"] <- 1
input$CROPEXP[input$CROPDMGEXP == "2"] <- 100
input$CROPEXP[input$CROPDMGEXP == "B"] <- 1e+09
input$CROPEXP[input$CROPDMGEXP == "k"] <- 1000
input$CROPEXP[input$CROPDMGEXP == "K"] <- 1000
input$CROPEXP[input$CROPDMGEXP == "M"] <- 1e+06
input$CROPEXP[input$CROPDMGEXP == "m"] <- 1e+06
input$CROPEXP[input$CROPDMGEXP == "?"] <- 0
input$CROPDMGVAL <- input$CROPDMG * input$CROPEXP
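The per-code assignments above can be written more compactly with a named lookup vector. The following is a sketch, not the code used for the results in this report; `exp_multiplier` and the toy data frame are hypothetical names introduced here. It follows the same mapping as above: a blank or "0" code means a multiplier of 1, the digits 1 to 8 are powers of ten, H, K, M and B are hundred, thousand, million and billion, and any other code maps to 0. It also accepts lower-case letters for every suffix, which is an assumption that goes slightly beyond the codes listed above.

```r
# Hypothetical helper: convert damage exponent codes to numeric multipliers.
exp_multiplier <- function(code) {
  lookup <- c("0" = 1, "1" = 10, "2" = 100, "3" = 1e3, "4" = 1e4,
              "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8,
              "H" = 100, "K" = 1e3, "M" = 1e6, "B" = 1e9)
  code <- toupper(as.character(code))     # also handles factor input
  mult <- unname(lookup[code])            # NA for codes not in the lookup
  mult[!is.na(code) & code == ""] <- 1    # blank code: value is used as is
  mult[is.na(mult)] <- 0                  # "+", "-", "?" and anything else
  mult
}

# Toy data (made up for illustration).
toy <- data.frame(PROPDMG = c(25, 2.5, 3, 10),
                  PROPDMGEXP = c("K", "m", "", "?"))
toy$PROPDMGVAL <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
print(toy)
```

The same helper could be applied to both PROPDMGEXP and CROPDMGEXP, so the two mappings cannot drift apart.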
fatal <- aggregate(input$FATALITIES, list(input$EVTYPE), sum, na.rm = TRUE)
colnames(fatal) <- c("Event","Fatalities")
fatal[order(-fatal$Fatalities),][1:8,]
## Event Fatalities
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
## 856 TSTM WIND 504
## 170 FLOOD 470
## 585 RIP CURRENT 368
injury <- aggregate(input$INJURIES, list(input$EVTYPE), sum, na.rm = TRUE)
colnames(injury) <- c("Event", "Injuries")
injury[order(-injury$Injuries), ][1:8, ]
## Event Injuries
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
## 275 HEAT 2100
## 427 ICE STORM 1975
## 153 FLASH FLOOD 1777
risk <- aggregate(input$INJURIES + input$FATALITIES, list(input$EVTYPE), sum, na.rm = TRUE)
colnames(risk) <- c("Event", "Health")
plot1 <- risk[order(-risk$Health), ][1:8, ]
plot1$Health <- plot1$Health / 1000  # express totals in thousands
barplot(plot1$Health, las = 2, names.arg = plot1$Event,
        main = "Events with Highest Fatalities & Injuries",
        ylab = "Total fatalities & injuries ('000)", col = "blue")
Tornadoes are by far the most harmful events to population health, followed by excessive heat and thunderstorm wind (TSTM WIND).
propdmg <- aggregate(input$PROPDMGVAL, list(input$EVTYPE), sum, na.rm = TRUE)
colnames(propdmg) <- c("Event","PropDamage")
propdmg[order(-propdmg$PropDamage),][1:8,]
## Event PropDamage
## 170 FLOOD 144657709807
## 411 HURRICANE/TYPHOON 69305840000
## 834 TORNADO 56947380617
## 670 STORM SURGE 43323536000
## 153 FLASH FLOOD 16822673979
## 244 HAIL 15735267513
## 402 HURRICANE 11868319010
## 848 TROPICAL STORM 7703890550
cropdmg <- aggregate(input$CROPDMGVAL, list(input$EVTYPE), sum, na.rm = TRUE)
colnames(cropdmg) <- c("Event","CropDamage")
cropdmg[order(-cropdmg$CropDamage),][1:8,]
## Event CropDamage
## 95 DROUGHT 13972566000
## 170 FLOOD 5661968450
## 590 RIVER FLOOD 5029459000
## 427 ICE STORM 5022113500
## 244 HAIL 3025954473
## 402 HURRICANE 2741910000
## 411 HURRICANE/TYPHOON 2607872800
## 153 FLASH FLOOD 1421317100
propcropdmg <- aggregate(input$PROPDMGVAL + input$CROPDMGVAL, list(input$EVTYPE), sum, na.rm = TRUE)
colnames(propcropdmg) <- c("Event", "PropCropDmg")
plot2 <- propcropdmg[order(-propcropdmg$PropCropDmg), ][1:8, ]
plot2$PropCropDmg <- plot2$PropCropDmg / 1e9  # express totals in billions of dollars
barplot(plot2$PropCropDmg, las = 2, names.arg = plot2$Event,
        main = "Events with Highest Property & Crop Damages",
        ylab = "Total Property & Crop Damages (Billions $)", col = "blue")
Floods had the greatest economic consequences (property plus crop damage), followed by hurricanes/typhoons.
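The aggregate, rename and sort steps are repeated for every measure in this analysis. They could be wrapped in one function; the following is a sketch under that assumption, where `top_events` and the toy data frame are hypothetical and only the EVTYPE and FATALITIES column names come from the data set.

```r
# Hypothetical helper: total one numeric column per event type and
# return the n event types with the largest totals.
top_events <- function(data, value_col, n = 8) {
  totals <- aggregate(data[[value_col]], by = list(Event = data$EVTYPE),
                      FUN = sum, na.rm = TRUE)
  names(totals)[2] <- value_col                     # aggregate() names it "x"
  totals <- totals[order(-totals[[value_col]]), ]   # largest total first
  head(totals, n)
}

# Toy data (made up for illustration).
toy_events <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(3, 1, 2, 4)
)
top_fatal <- top_events(toy_events, "FATALITIES", n = 2)
print(top_fatal)
```

With the full data set, a call such as top_events(input, "FATALITIES") would produce the same kind of ranking as the tables above.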