This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. My major focus are on the top 10 to 30 events that have had ove the course of the year a major economic and health impact. The table displays the summary of the findings. Storms and other severe weather events have different economic impact in property damages and crops. This study is aimed to be transparent and reproducible for further research.
The packages used for the analysis are as follows in order to analyze and display the data. Also the retrieving of the data. and its treatment of dates.
library(data.table)
library(lubridate)
library(ggplot2)
library(GGally)
library(broom)
library(lucr)
library(cowplot)
library(gridExtra)
library(knitr)
dt <- data.table(read.csv("repdata%2Fdata%2FStormData.csv.bz2", stringsAsFactors = FALSE))
dt[,BGN_DATE := mdy_hms(BGN_DATE)]
The cookbook mentions that the Alphabetical characters used for the damage expenses signify magnitude, include “K” for thousands, “M” for millions, and “B” for billions.
#correcting the magnitude of expenses
dt[,PROPDMG:=ifelse(PROPDMGEXP=="K",PROPDMG*1000,ifelse(PROPDMGEXP=="M",PROPDMG*1000000,ifelse(PROPDMGEXP=="B",PROPDMG*1000000000,PROPDMG)))]
dt[,CROPDMG:=ifelse(CROPDMGEXP=="K",CROPDMG*1000,ifelse(CROPDMGEXP=="M",CROPDMG*1000000,ifelse(PROPDMGEXP=="B",CROPDMG*1000000000,CROPDMG)))]
There are 985 events categorized by the National Weather Service. Many different kinds of storms. I have selected those that have been repeated more than 50 times. THe idea was to group similar events, but the task was too large and would not be accurate enough. I only show the names of the events ordered by most frequent to least.
events <- factor(tolower(dt$EVTYPE))
# 985 levels
tab <- table(events)
sort(tab, T)[1:50]
## events
## hail tstm wind thunderstorm wind
## 288661 219942 82564
## tornado flash flood flood
## 60652 54277 25327
## thunderstorm winds high wind lightning
## 20843 20214 15754
## heavy snow heavy rain winter storm
## 15708 11742 11433
## winter weather funnel cloud marine tstm wind
## 7045 6844 6175
## marine thunderstorm wind waterspout strong wind
## 5812 3796 3569
## urban/sml stream fld wildfire blizzard
## 3392 2761 2719
## drought ice storm excessive heat
## 2488 2006 1678
## high winds wild/forest fire frost/freeze
## 1533 1457 1343
## dense fog winter weather/mix tstm wind/hail
## 1293 1104 1028
## extreme cold/wind chill heat high surf
## 1002 767 734
## tropical storm flash flooding extreme cold
## 690 682 657
## coastal flood lake-effect snow flood/flash flood
## 656 636 625
## snow landslide cold/wind chill
## 617 600 539
## fog rip current marine hail
## 538 470 442
## dust storm avalanche wind
## 427 386 346
## rip currents storm surge
## 304 261
The table summary of the data below shows a partial result that will further be analyzed.
summarized <- dt[,.(Fatalities= sum(FATALITIES),Injuries= sum(INJURIES),Crop_Damage = to_currency(sum(CROPDMG), currency_symbol = "$"),Property_Damage = to_currency(sum(PROPDMG), currency_symbol ="$")), by=EVTYPE][order(-Fatalities),]
kable(summarized[Fatalities>1&Injuries>1,])
| EVTYPE | Fatalities | Injuries | Crop_Damage | Property_Damage |
|---|---|---|---|---|
| TORNADO | 5633 | 91346 | $414,953,270.00 | $56,925,660,790.20 |
| EXCESSIVE HEAT | 1903 | 6525 | $492,402,000.00 | $7,753,700.00 |
| FLASH FLOOD | 978 | 1777 | $1,421,317,100.00 | $16,140,812,067.10 |
| HEAT | 937 | 2100 | $1,461,500.40 | $1,797,000.00 |
| LIGHTNING | 816 | 5230 | $12,092,090.00 | $928,659,446.70 |
| TSTM WIND | 504 | 6957 | $554,007,350.00 | $4,484,928,495.00 |
| FLOOD | 470 | 6789 | $5,661,968,450.00 | $144,657,709,807.00 |
| RIP CURRENT | 368 | 232 | $0.00 | $1,000.00 |
| HIGH WIND | 248 | 1137 | $638,571,300.00 | $5,270,046,295.00 |
| AVALANCHE | 224 | 170 | $0.00 | $3,721,800.00 |
| WINTER STORM | 206 | 1321 | $26,944,000.00 | $6,688,497,251.00 |
| RIP CURRENTS | 204 | 297 | $0.00 | $162,000.00 |
| HEAT WAVE | 172 | 309 | $5,550,000.00 | $10,460,050.00 |
| EXTREME COLD | 160 | 231 | $1,292,973,000.00 | $67,737,400.00 |
| THUNDERSTORM WIND | 133 | 1488 | $414,843,050.00 | $3,483,121,284.00 |
| HEAVY SNOW | 127 | 1021 | $134,653,100.00 | $932,589,141.70 |
| EXTREME COLD/WIND CHILL | 125 | 24 | $50,000.00 | $8,648,000.00 |
| STRONG WIND | 103 | 280 | $64,953,500.00 | $175,241,450.00 |
| BLIZZARD | 101 | 805 | $112,060,000.00 | $659,213,950.00 |
| HIGH SURF | 101 | 152 | $0.00 | $89,575,000.00 |
| HEAVY RAIN | 98 | 251 | $733,399,800.00 | $694,248,090.00 |
| EXTREME HEAT | 96 | 155 | $5,000,000.00 | $115,000.00 |
| COLD/WIND CHILL | 95 | 12 | $600,000.00 | $1,990,000.00 |
| ICE STORM | 89 | 1975 | $22,113,505.00 | $3,944,927,860.00 |
| WILDFIRE | 75 | 911 | $295,472,800.00 | $4,765,114,000.00 |
| THUNDERSTORM WINDS | 64 | 908 | $190,650,792.00 | $1,733,459,025.70 |
| HURRICANE/TYPHOON | 64 | 1275 | $2,607,872,800.00 | $69,305,840,000.00 |
| FOG | 62 | 734 | $0.00 | $13,155,500.00 |
| HURRICANE | 61 | 46 | $2,741,910,000.00 | $11,868,319,010.00 |
| TROPICAL STORM | 58 | 340 | $678,346,000.00 | $7,703,890,550.00 |
| HEAVY SURF/HIGH SURF | 42 | 48 | $0.00 | $9,870,000.00 |
| LANDSLIDE | 38 | 52 | $20,017,000.00 | $324,596,000.00 |
| HIGH WINDS | 35 | 302 | $40,720,600.00 | $608,323,748.00 |
| COLD | 35 | 48 | $0.00 | $500,000.00 |
| WINTER WEATHER | 33 | 398 | $15,000,000.00 | $20,866,000.00 |
| TSUNAMI | 33 | 129 | $20,000.00 | $144,062,000.00 |
| URBAN/SML STREAM FLD | 28 | 79 | $8,488,100.00 | $58,309,650.00 |
| WINTER WEATHER/MIX | 28 | 72 | $0.00 | $6,372,000.00 |
| WIND | 23 | 86 | $300,000.00 | $8,684,500.00 |
| DUST STORM | 22 | 440 | $3,100,000.00 | $5,549,000.00 |
| FLASH FLOODING | 19 | 8 | $15,106,060.00 | $307,763,604.00 |
| DENSE FOG | 18 | 342 | $0.00 | $9,674,000.00 |
| FLOOD/FLASH FLOOD | 17 | 15 | $95,034,000.00 | $174,039,009.20 |
| EXTREME WINDCHILL | 17 | 5 | $17,000,000.00 | $755,000.00 |
| HAIL | 15 | 1361 | $3,025,537,890.00 | $15,727,367,052.60 |
| MARINE STRONG WIND | 14 | 22 | $0.00 | $418,330.00 |
| STORM SURGE | 13 | 38 | $5,000.00 | $43,323,536,000.00 |
| WILD/FOREST FIRE | 12 | 545 | $106,796,830.00 | $3,001,829,500.00 |
| UNSEASONABLY WARM | 11 | 17 | $10,000.00 | $0.00 |
| STORM SURGE/TIDE | 11 | 5 | $850,000.00 | $4,641,188,000.00 |
| WINTER STORMS | 10 | 17 | $500,000.00 | $500,000.00 |
| MARINE THUNDERSTORM WIND | 10 | 26 | $50,000.00 | $436,400.00 |
| MARINE TSTM WIND | 9 | 8 | $0.00 | $5,421,000.00 |
| TROPICAL STORM GORDON | 8 | 43 | $500,000.00 | $500,000.00 |
| ROUGH SEAS | 8 | 5 | $0.00 | $0.00 |
| FREEZING RAIN | 7 | 23 | $0.00 | $8,111,500.00 |
| MARINE MISHAP | 7 | 5 | $0.00 | $0.00 |
| STRONG WINDS | 7 | 21 | $5,000,000.00 | $2,314,790.00 |
| HEAVY SURF | 7 | 40 | $0.00 | $1,390,000.00 |
| GLAZE | 7 | 216 | $0.00 | $910,000.00 |
| FLOODING | 6 | 2 | $8,855,500.00 | $108,255,006.00 |
| ICE | 6 | 137 | $0.00 | $12,655,000.00 |
| SNOW | 5 | 29 | $10,000.00 | $14,762,550.00 |
| HIGH SEAS | 5 | 8 | $0.00 | $15,500.00 |
| ICY ROADS | 5 | 31 | $0.00 | $341,200.00 |
| TSTM WIND/HAIL | 5 | 95 | $64,696,250.00 | $44,335,500.00 |
| GUSTY WINDS | 4 | 8 | $200,000.00 | $343,000.00 |
| RAIN/SNOW | 4 | 2 | $0.00 | $0.00 |
| HEAT WAVE DROUGHT | 4 | 15 | $50,000.00 | $200,000.00 |
| Mudslide | 4 | 2 | $0.00 | $0.00 |
| WATERSPOUT | 3 | 29 | $0.00 | $9,353,700.00 |
| COASTAL FLOOD | 3 | 2 | $0.00 | $237,665,560.00 |
| WILD FIRES | 3 | 150 | $0.00 | $624,100,000.00 |
| DRY MICROBURST | 3 | 28 | $15,000.00 | $6,732,600.00 |
| WATERSPOUT/TORNADO | 3 | 42 | $0.00 | $51,110,000.00 |
| HIGH WINDS/SNOW | 3 | 6 | $0.00 | $150,000.00 |
| HIGH WIND AND SEAS | 3 | 20 | $0.00 | $50,000.00 |
| High Surf | 3 | 4 | $0.00 | $380,000.00 |
| RIVER FLOOD | 2 | 2 | $5,029,459,000.00 | $5,118,945,500.00 |
| DUST DEVIL | 2 | 42 | $0.00 | $700,330.00 |
| FREEZING DRIZZLE | 2 | 15 | $0.00 | $75,000.00 |
| RECORD HEAT | 2 | 50 | $0.00 | $0.00 |
| SNOW SQUALL | 2 | 35 | $0.00 | $680,000.00 |
| EXCESSIVE RAINFALL | 2 | 21 | $0.00 | $0.00 |
| MIXED PRECIP | 2 | 26 | $0.00 | $0.00 |
The figure displays all the events which are too long to see in an Q4 paper.
health <- dt[,.(injuries= sum(INJURIES),fatalities= sum(FATALITIES)),by=EVTYPE][injuries>1 & fatalities>1]
health.m <- melt(health)
## Warning in melt.data.table(health): To be consistent with reshape2's melt,
## id.vars and measure.vars are internally guessed when both are 'NULL'. All
## non-numeric/integer/logical type columns are conisdered id.vars, which in
## this case are columns [EVTYPE]. Consider providing at least one of 'id' or
## 'measure' vars in future.
g <- ggplot(health.m, aes(EVTYPE, value/1000)) + xlab("Event types") + ylab("Casualties by the thousands ")
g <- g + geom_bar(aes(fill=variable),stat="identity") +ggtitle("Casualties by all events")
g <- g + theme(axis.text.y=element_text(size=4))+ coord_flip()
g
Fig 1:All Events
From the graph above I further studied the top 3 events, tornado, excessive heat and flashfoods in terms of fatalities. in terms of injuries it is different.
##plotting the most important throughout event the years
tornado <- dt[EVTYPE=="TORNADO",.(injuries= sum(INJURIES),fatalities=sum(FATALITIES)),
by=BGN_DATE]
tornado.m <- melt(tornado,id.vars=1)
g1 <- ggplot(tornado.m, aes(as.numeric(year(BGN_DATE)), value)) + xlab("Event types") + ylab("Casualties") + ggtitle("Yearly Tornado Casualties")
g1 <- g1 + geom_bar(aes(fill=variable),stat="identity")
g1 <- g1 + theme(axis.text.x=element_text(angle=90, hjust =1))
##Excessive Heat
heat <- dt[EVTYPE=="EXCESSIVE HEAT",.(injuries= sum(INJURIES),fatalities=sum(FATALITIES)),
by=BGN_DATE]
heat.m <- melt(heat,id.vars=1)
g2 <- ggplot(heat.m, aes(as.numeric(year(BGN_DATE)), value)) + xlab("Event types") + ylab("Casualties") + ggtitle("Yearly Excessive Heat Casualties")
g2 <- g2 + geom_bar(aes(fill=variable),stat="identity")
g2 <- g2 + theme(axis.text.x=element_text(angle=90, hjust =1))
##Flashfloods
floods <- dt[EVTYPE=="FLASH FLOOD",.(injuries= sum(INJURIES),fatalities=sum(FATALITIES)),
by=BGN_DATE]
floods.m <- melt(floods,id.vars=1)
g3 <- ggplot(floods.m, aes(as.numeric(year(BGN_DATE)), value)) + xlab("Event types") + ylab("Casualties") + ggtitle("Yearly Flash Floods Casualties")
g3 <- g3 + geom_bar(aes(fill=variable),stat="identity")
g3 <- g3 + theme(axis.text.x=element_text(angle=90, hjust =1))
grid.arrange(g1,g2,g3,ncol=2, nrow=2, layout_matrix = rbind(c(1,1), c(2,3)))
Fig 2:combination of casualties per event through time
summarized <- dt[,.(Fatalities= sum(FATALITIES),Injuries= sum(INJURIES),Crop_Damage = sum(CROPDMG),Property_Damage = sum(PROPDMG)), by=EVTYPE][order(-Fatalities),]
croppy <- summarized
proppy <- summarized
croppy <- croppy[order(-Crop_Damage),][1:10,][,.(EVTYPE,Fatalities,Injuries,Crop_Damage = to_currency(Crop_Damage, currency_symbol = "$"),Property_Damage = to_currency(Property_Damage, currency_symbol = "$"))]
proppy <- proppy[order(-Property_Damage),][1:10,][,.(EVTYPE,Fatalities,Injuries,Crop_Damage = to_currency(Crop_Damage, currency_symbol = "$"),Property_Damage = to_currency(Property_Damage, currency_symbol = "$"))]
kable(croppy)
| EVTYPE | Fatalities | Injuries | Crop_Damage | Property_Damage |
|---|---|---|---|---|
| DROUGHT | 0 | 4 | $12,472,566,001.50 | $ 1,046,106,000.00 |
| FLOOD | 470 | 6789 | $ 5,661,968,450.00 | $144,657,709,807.00 |
| RIVER FLOOD | 2 | 2 | $ 5,029,459,000.00 | $ 5,118,945,500.00 |
| HAIL | 15 | 1361 | $ 3,025,537,890.00 | $ 15,727,367,052.60 |
| HURRICANE | 61 | 46 | $ 2,741,910,000.00 | $ 11,868,319,010.00 |
| HURRICANE/TYPHOON | 64 | 1275 | $ 2,607,872,800.00 | $ 69,305,840,000.00 |
| FLASH FLOOD | 978 | 1777 | $ 1,421,317,100.00 | $ 16,140,812,067.10 |
| EXTREME COLD | 160 | 231 | $ 1,292,973,000.00 | $ , 67,737,400.00 |
| FROST/FREEZE | 0 | 0 | $ 1,094,086,000.00 | $ , 9,480,000.00 |
| HEAVY RAIN | 98 | 251 | $ ,733,399,800.00 | $ ,694,248,090.00 |
| * Top ten Property D | amage: |
#reflecting the real dollars to the damages
kable(proppy)
| EVTYPE | Fatalities | Injuries | Crop_Damage | Property_Damage |
|---|---|---|---|---|
| FLOOD | 470 | 6789 | $5,661,968,450.00 | $144,657,709,807.00 |
| HURRICANE/TYPHOON | 64 | 1275 | $2,607,872,800.00 | $ 69,305,840,000.00 |
| TORNADO | 5633 | 91346 | $ ,414,953,270.00 | $ 56,925,660,790.20 |
| STORM SURGE | 13 | 38 | $ , , 5,000.00 | $ 43,323,536,000.00 |
| FLASH FLOOD | 978 | 1777 | $1,421,317,100.00 | $ 16,140,812,067.10 |
| HAIL | 15 | 1361 | $3,025,537,890.00 | $ 15,727,367,052.60 |
| HURRICANE | 61 | 46 | $2,741,910,000.00 | $ 11,868,319,010.00 |
| TROPICAL STORM | 58 | 340 | $ ,678,346,000.00 | $ 7,703,890,550.00 |
| WINTER STORM | 206 | 1321 | $ , 26,944,000.00 | $ 6,688,497,251.00 |
| HIGH WIND | 248 | 1137 | $ ,638,571,300.00 | $ 5,270,046,295.00 |
summarized <- dt[,.(Fatalities= sum(FATALITIES),Injuries= sum(INJURIES),Crop_Damage = sum(CROPDMG),Property_Damage = sum(PROPDMG)), by=EVTYPE][order(-Fatalities),]
summarized.m <- melt(summarized[,c(1,4,5)],id.vars=1)
summarized.m <- summarized.m[value>100000]
g4 <- ggplot(summarized.m, aes(EVTYPE,value/1000000000)) + xlab("Event types") + ylab("Damage Expenses in Billions of Dollars")
g4 <- g4+ geom_bar(aes(fill=variable),stat="identity")+ ggtitle("All Economic Damages by Event")
g4 <- g4 + theme(axis.text.y=element_text(size = 4)) + coord_flip()
#top 30 total economic damages
summarized.m.1 <- summarized.m[,sum(value),by= EVTYPE][order(-V1)][1:30,]
g5 <- ggplot(summarized.m.1, aes(EVTYPE,V1/1000000000)) + xlab("Event types") + ylab("Damage Expenses in Billions of Dollars")
g5 <- g5+ geom_bar(stat="identity") + ggtitle("Top 30 Economic Damages by Event")
g5 <- g5 + theme(axis.text.y=element_text(size = 4)) + coord_flip()
grid.arrange(g4,g5,nrow=2)
Fig 3:Damages Expenses by event