Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
After processing and analysing the data, we ranked the worst weather events across the USA. For each state, we identified which event type is most harmful with respect to population health and which has the greatest economic consequences.
The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site: http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
Some documentation of the database is also available. It describes how some of the variables are constructed and defined:
- National Weather Service Storm Data Documentation
- National Climatic Data Center Storm Events FAQ
Let us start by downloading the data and its documentation. I also included the readme and file layout documents from ire.org.
# Downloading the Data
url <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("repdata_data_StormData.csv.bz2")) {
download.file(url,
destfile = "repdata_data_StormData.csv.bz2", mode = "wb")
}
file.info("repdata_data_StormData.csv.bz2")$mtime
## [1] "2014-09-19 21:23:49 VET"
# Documentation needed
# mode = "wb" keeps the binary .doc and .pdf files intact on Windows
if (!file.exists("PA2 Documentation")) {
dir.create("PA2 Documentation")
download.file("http://ire.org/media/uploads/files/datalibrary/samplefiles/Storm%20Events/readme_08.doc",
destfile = "PA2 Documentation/readme.doc", mode = "wb")
download.file("http://ire.org/media/uploads/files/datalibrary/samplefiles/Storm%20Events/layout08.doc",
destfile = "PA2 Documentation/layout.doc", mode = "wb")
download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf",
destfile = "PA2 Documentation/Documentation.pdf", mode = "wb")
download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf",
destfile = "PA2 Documentation/FAQ.pdf", mode = "wb")
}
# Download timestamps, reported on every run rather than only on the first one
file.info("PA2 Documentation/readme.doc")$mtime
file.info("PA2 Documentation/layout.doc")$mtime
file.info("PA2 Documentation/Documentation.pdf")$mtime
file.info("PA2 Documentation/FAQ.pdf")$mtime
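Now let’s load the data into R. read.csv() reads the bzip2-compressed file directly, so no separate decompression step is needed.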
# Reading the data
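# stringsAsFactors = TRUE reproduces the factor columns shown below on R >= 4.0,
# where read.csv() returns character columns by default
storm_data <- read.csv("repdata_data_StormData.csv.bz2", stringsAsFactors = TRUE)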
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
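To account for this, we first convert the variable BGN_DATE (the date the storm event began) to R's Date class. The records before 1996 are incomplete: before that year the database holds only tornado, thunderstorm wind and hail events. We therefore remove them and keep only events that began on or after 1 January 1996.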
# Preprocessing the data
names(storm_data)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
str(storm_data$BGN_DATE)
## Factor w/ 16335 levels "10/10/1954 0:00:00",..: 6523 6523 4213 11116 1426 1426 1462 2873 3980 3980 ...
storm_data$BGN_DATE <- as.Date(storm_data$BGN_DATE, "%m/%d/%Y")
str(storm_data$BGN_DATE)
## Date[1:902297], format: "1950-04-18" "1950-04-18" "1951-02-20" "1951-06-08" ...
# Keep only events that began on or after 1 January 1996
storm_data <- storm_data[storm_data$BGN_DATE >= as.Date("1996-01-01"), ]
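The date conversion relies on as.Date() ignoring the trailing " 0:00:00" time part once the "%m/%d/%Y" format has been matched. The following minimal sketch checks that behaviour and the 1996 cut-off on a tiny made-up data frame. The toy_dates rows are hypothetical and are not records from the database.

```r
# Hypothetical BGN_DATE strings in the database's "m/d/Y H:MM:SS" layout
toy_dates <- data.frame(BGN_DATE = c("4/18/1950 0:00:00",
                                     "1/6/1996 0:00:00",
                                     "11/30/2011 0:00:00"),
                        stringsAsFactors = FALSE)

# as.Date() parses month/day/year and ignores the trailing time component
toy_dates$BGN_DATE <- as.Date(toy_dates$BGN_DATE, format = "%m/%d/%Y")

# Keep only events that began on or after 1 January 1996
cutoff <- as.Date("1996-01-01")
recent <- toy_dates[toy_dates$BGN_DATE >= cutoff, , drop = FALSE]
print(recent)
```

Only the 1996 and 2011 rows survive the filter. Comparing against an explicit Date object, rather than a character string, makes the cut-off unambiguous.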
To analyse the events across the USA, we use the variables EVTYPE (type of storm event) and STATE (state postal code).
str(storm_data$EVTYPE)
## Factor w/ 985 levels "?","ABNORMALLY DRY",..: 972 830 854 854 854 238 354 854 854 854 ...
sort(unique(storm_data$EVTYPE))
## [1] ABNORMALLY DRY ABNORMALLY WET
## [3] ABNORMAL WARMTH ACCUMULATED SNOWFALL
## [5] AGRICULTURAL FREEZE ASTRONOMICAL HIGH TIDE
## [7] ASTRONOMICAL LOW TIDE AVALANCHE
## [9] Beach Erosion BEACH EROSION
## [11] BITTER WIND CHILL BITTER WIND CHILL TEMPERATURES
## [13] Black Ice BLACK ICE
## [15] BLIZZARD Blizzard Summary
## [17] BLOWING DUST blowing snow
## [19] Blowing Snow BLOW-OUT TIDE
## [21] BLOW-OUT TIDES BRUSH FIRE
## [23] COASTAL EROSION Coastal Flood
## [25] COASTALFLOOD COASTAL FLOOD
## [27] COASTAL FLOOD coastal flooding
## [29] Coastal Flooding COASTAL FLOODING
## [31] COASTAL FLOODING/EROSION COASTAL FLOODING/EROSION
## [33] Coastal Storm COASTALSTORM
## [35] COASTAL STORM Cold
## [37] COLD Cold and Frost
## [39] COLD AND FROST COLD AND SNOW
## [41] Cold Temperature COLD TEMPERATURES
## [43] COLD WEATHER COLD/WIND CHILL
## [45] COLD WIND CHILL TEMPERATURES COOL SPELL
## [47] CSTL FLOODING/EROSION Damaging Freeze
## [49] DAMAGING FREEZE DAM BREAK
## [51] DENSE FOG DENSE SMOKE
## [53] DOWNBURST DRIEST MONTH
## [55] Drifting Snow DROUGHT
## [57] DROWNING DRY
## [59] DRY CONDITIONS DRY MICROBURST
## [61] DRYNESS DRY SPELL
## [63] DRY WEATHER DUST DEVEL
## [65] Dust Devil DUST DEVIL
## [67] DUST STORM Early Frost
## [69] EARLY RAIN Early snowfall
## [71] EARLY SNOWFALL Erosion/Cstl Flood
## [73] Excessive Cold EXCESSIVE HEAT
## [75] EXCESSIVE HEAT/DROUGHT EXCESSIVELY DRY
## [77] EXCESSIVE RAIN EXCESSIVE RAINFALL
## [79] EXCESSIVE SNOW Extended Cold
## [81] Extreme Cold EXTREME COLD
## [83] EXTREME COLD/WIND CHILL EXTREMELY WET
## [85] EXTREME WINDCHILL EXTREME WIND CHILL
## [87] EXTREME WINDCHILL TEMPERATURES FALLING SNOW/ICE
## [89] FIRST FROST FIRST SNOW
## [91] FLASH FLOOD FLASH FLOOD
## [93] FLASH FLOOD/FLOOD FLASH FLOODING
## [95] Flood FLOOD
## [97] Flood/Flash Flood FLOOD/FLASH/FLOOD
## [99] Flood/Strong Wind FOG
## [101] Freeze FREEZE
## [103] Freezing drizzle Freezing Drizzle
## [105] FREEZING DRIZZLE Freezing Fog
## [107] FREEZING FOG Freezing rain
## [109] Freezing Rain FREEZING RAIN
## [111] FREEZING RAIN/SLEET Freezing Spray
## [113] Frost FROST
## [115] Frost/Freeze FROST/FREEZE
## [117] Funnel Cloud FUNNEL CLOUD
## [119] FUNNEL CLOUDS Glaze
## [121] GLAZE gradient wind
## [123] Gradient wind GRADIENT WIND
## [125] GUSTY LAKE WIND GUSTY THUNDERSTORM WIND
## [127] GUSTY THUNDERSTORM WINDS Gusty Wind
## [129] GUSTY WIND GUSTY WIND/HAIL
## [131] GUSTY WIND/HVY RAIN Gusty wind/rain
## [133] Gusty winds Gusty Winds
## [135] GUSTY WINDS HAIL
## [137] Hail(0.75) HAIL/WIND
## [139] HARD FREEZE HAZARDOUS SURF
## [141] HEAT Heatburst
## [143] Heat Wave HEAT WAVE
## [145] Heavy Precipitation Heavy rain
## [147] Heavy Rain HEAVY RAIN
## [149] Heavy Rain and Wind HEAVY RAIN EFFECTS
## [151] HEAVY RAINFALL Heavy Rain/High Surf
## [153] HEAVY RAIN/WIND HEAVY SEAS
## [155] HEAVY SNOW Heavy snow shower
## [157] HEAVY SNOW SQUALLS Heavy Surf
## [159] HEAVY SURF Heavy surf and wind
## [161] HEAVY SURF/HIGH SURF HIGH SEAS
## [163] High Surf HIGH SURF
## [165] HIGH SURF ADVISORIES HIGH SURF ADVISORY
## [167] HIGH SURF ADVISORY HIGH SWELLS
## [169] HIGH SWELLS HIGH WATER
## [171] High Wind HIGH WIND
## [173] HIGH WIND (G40) HIGH WINDS
## [175] Hot and Dry HOT SPELL
## [177] HOT WEATHER HURRICANE
## [179] Hurricane Edouard HURRICANE/TYPHOON
## [181] HYPERTHERMIA/EXPOSURE Hypothermia/Exposure
## [183] HYPOTHERMIA/EXPOSURE ICE
## [185] Ice Fog ICE JAM
## [187] Ice jam flood (minor ICE ON ROAD
## [189] ICE PELLETS ICE ROADS
## [191] Ice/Snow ICE/SNOW
## [193] ICE STORM Icestorm/Blizzard
## [195] Icy Roads ICY ROADS
## [197] Lake Effect Snow LAKE EFFECT SNOW
## [199] LAKE-EFFECT SNOW LAKESHORE FLOOD
## [201] LANDSLIDE LANDSLIDES
## [203] Landslump LANDSLUMP
## [205] LANDSPOUT LATE FREEZE
## [207] LATE SEASON HAIL LATE SEASON SNOW
## [209] Late-season Snowfall Late Season Snowfall
## [211] LATE SNOW LIGHT FREEZING RAIN
## [213] LIGHTNING LIGHTNING
## [215] Light snow Light Snow
## [217] LIGHT SNOW Light Snowfall
## [219] Light Snow/Flurries LIGHT SNOW/FREEZING PRECIP
## [221] LOCALLY HEAVY RAIN Marine Accident
## [223] MARINE HAIL MARINE HIGH WIND
## [225] MARINE STRONG WIND MARINE THUNDERSTORM WIND
## [227] MARINE TSTM WIND Metro Storm, May 26
## [229] Microburst Mild and Dry Pattern
## [231] Minor Flooding MIXED PRECIP
## [233] Mixed Precipitation MIXED PRECIPITATION
## [235] MODERATE SNOW MODERATE SNOWFALL
## [237] MONTHLY PRECIPITATION Monthly Rainfall
## [239] MONTHLY RAINFALL Monthly Snowfall
## [241] MONTHLY SNOWFALL MONTHLY TEMPERATURE
## [243] Mountain Snows Mudslide
## [245] MUDSLIDE MUD SLIDE
## [247] MUDSLIDE/LANDSLIDE Mudslides
## [249] MUDSLIDES NONE
## [251] NON SEVERE HAIL NON-SEVERE WIND DAMAGE
## [253] NON TSTM WIND NON-TSTM WIND
## [255] NORTHERN LIGHTS No Severe Weather
## [257] Other OTHER
## [259] PATCHY DENSE FOG PATCHY ICE
## [261] Prolong Cold PROLONG COLD
## [263] PROLONGED RAIN PROLONG WARMTH
## [265] RAIN Rain Damage
## [267] RAIN (HEAVY) RAIN/SNOW
## [269] Record Cold RECORD COLD
## [271] RECORD COLD RECORD COOL
## [273] Record dry month RECORD DRYNESS
## [275] Record Heat RECORD HEAT
## [277] Record High RECORD LOW RAINFALL
## [279] Record May Snow RECORD PRECIPITATION
## [281] RECORD RAINFALL RECORD SNOW
## [283] RECORD SNOWFALL Record temperature
## [285] RECORD TEMPERATURE Record Temperatures
## [287] RECORD TEMPERATURES RECORD WARM
## [289] RECORD WARM TEMPS. Record Warmth
## [291] RECORD WARMTH Record Winter Snow
## [293] RED FLAG CRITERIA RED FLAG FIRE WX
## [295] REMNANTS OF FLOYD RIP CURRENT
## [297] RIP CURRENTS RIVER FLOOD
## [299] River Flooding RIVER FLOODING
## [301] ROCK SLIDE ROGUE WAVE
## [303] ROUGH SEAS ROUGH SURF
## [305] Saharan Dust SAHARAN DUST
## [307] Seasonal Snowfall SEICHE
## [309] SEVERE THUNDERSTORM SEVERE THUNDERSTORMS
## [311] SLEET SLEET/FREEZING RAIN
## [313] SLEET STORM small hail
## [315] Small Hail SMALL HAIL
## [317] Sml Stream Fld SMOKE
## [319] Snow SNOW
## [321] Snow Accumulation SNOW ADVISORY
## [323] Snow and Ice SNOW AND ICE
## [325] Snow and sleet SNOW AND SLEET
## [327] SNOW/BLOWING SNOW SNOW DROUGHT
## [329] SNOW/FREEZING RAIN SNOW/ICE
## [331] SNOWMELT FLOODING SNOW SHOWERS
## [333] SNOW/SLEET SNOW SQUALL
## [335] Snow squalls Snow Squalls
## [337] SNOW SQUALLS STORM SURGE
## [339] STORM SURGE/TIDE STREET FLOODING
## [341] Strong Wind STRONG WIND
## [343] STRONG WIND GUST Strong winds
## [345] Strong Winds STRONG WINDS
## [347] Summary August 10 Summary August 11
## [349] Summary August 17 Summary August 21
## [351] Summary August 2-3 Summary August 28
## [353] Summary August 4 Summary August 7
## [355] Summary August 9 Summary Jan 17
## [357] Summary July 23-24 Summary June 18-19
## [359] Summary June 5-6 Summary June 6
## [361] Summary: Nov. 16 Summary: Nov. 6-7
## [363] Summary: Oct. 20-21 Summary: October 31
## [365] Summary of April 12 Summary of April 13
## [367] Summary of April 21 Summary of April 27
## [369] Summary of April 3rd Summary of August 1
## [371] Summary of July 11 Summary of July 2
## [373] Summary of July 22 Summary of July 26
## [375] Summary of July 29 Summary of July 3
## [377] Summary of June 10 Summary of June 11
## [379] Summary of June 12 Summary of June 13
## [381] Summary of June 15 Summary of June 16
## [383] Summary of June 18 Summary of June 23
## [385] Summary of June 24 Summary of June 3
## [387] Summary of June 30 Summary of June 4
## [389] Summary of June 6 Summary of March 14
## [391] Summary of March 23 Summary of March 24
## [393] SUMMARY OF MARCH 24-25 SUMMARY OF MARCH 27
## [395] SUMMARY OF MARCH 29 Summary of May 10
## [397] Summary of May 13 Summary of May 14
## [399] Summary of May 22 Summary of May 22 am
## [401] Summary of May 22 pm Summary of May 26 am
## [403] Summary of May 26 pm Summary of May 31 am
## [405] Summary of May 31 pm Summary of May 9-10
## [407] Summary: Sept. 18 Summary Sept. 25-26
## [409] Summary September 20 Summary September 23
## [411] Summary September 3 Summary September 4
## [413] Temperature record Thundersnow shower
## [415] THUNDERSTORM THUNDERSTORMS
## [417] Thunderstorm Wind THUNDERSTORM WIND
## [419] THUNDERSTORM WIND (G40) Tidal Flooding
## [421] TIDAL FLOODING TORNADO
## [423] TORNADO DEBRIS Torrential Rainfall
## [425] TROPICAL DEPRESSION TROPICAL STORM
## [427] TSTM TSTM HEAVY RAIN
## [429] Tstm Wind TSTM WIND
## [431] TSTM WIND TSTM WIND 40
## [433] TSTM WIND (41) TSTM WIND 45
## [435] TSTM WIND AND LIGHTNING TSTM WIND (G35)
## [437] TSTM WIND (G40) TSTM WIND (G45)
## [439] TSTM WIND G45 TSTM WIND (G45)
## [441] TSTM WIND (G45) TSTM WIND/HAIL
## [443] TSTM WINDS TSTM WND
## [445] TSUNAMI TYPHOON
## [447] Unseasonable Cold UNSEASONABLY COLD
## [449] UNSEASONABLY COOL UNSEASONABLY COOL & WET
## [451] UNSEASONABLY DRY UNSEASONABLY HOT
## [453] UNSEASONABLY WARM UNSEASONABLY WARM AND DRY
## [455] UNSEASONABLY WARM & WET UNSEASONABLY WARM/WET
## [457] UNSEASONABLY WARM YEAR UNSEASONABLY WET
## [459] UNSEASONAL LOW TEMP UNSEASONAL RAIN
## [461] UNUSUALLY COLD UNUSUALLY LATE SNOW
## [463] UNUSUALLY WARM UNUSUAL/RECORD WARMTH
## [465] UNUSUAL WARMTH Urban flood
## [467] Urban Flood URBAN FLOOD
## [469] Urban Flooding URBAN/SMALL STRM FLDG
## [471] URBAN/SML STREAM FLD URBAN/SML STREAM FLDG
## [473] URBAN/STREET FLOODING VERY DRY
## [475] VERY WARM VOG
## [477] Volcanic Ash VOLCANIC ASH
## [479] VOLCANIC ASHFALL Volcanic Ash Plume
## [481] VOLCANIC ERUPTION WAKE LOW WIND
## [483] WALL CLOUD WARM WEATHER
## [485] WATERSPOUT WATERSPOUT
## [487] WATERSPOUTS wet micoburst
## [489] WET MICROBURST Wet Month
## [491] Wet Year Whirlwind
## [493] WHIRLWIND WILDFIRE
## [495] WILD/FOREST FIRE Wind
## [497] WIND WIND
## [499] WIND ADVISORY WIND AND WAVE
## [501] WIND CHILL Wind Damage
## [503] WIND DAMAGE WIND GUSTS
## [505] WINDS WINTER MIX
## [507] WINTER STORM Winter Weather
## [509] WINTER WEATHER WINTER WEATHER MIX
## [511] WINTER WEATHER/MIX WINTERY MIX
## [513] Wintry mix Wintry Mix
## [515] WINTRY MIX WND
## 985 Levels: ? ABNORMALLY DRY ABNORMALLY WET ... WND
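The listing shows that EVTYPE is not standardized. The same kind of event appears with different capitalization (Flood and FLOOD) and with abbreviations (TSTM WIND versus THUNDERSTORM WIND). Some labels print identically, such as the two COASTAL FLOODING/EROSION entries, which presumably differ only in whitespace. The analysis uses the labels as they are, so the counts for one physical event type can be split across several labels.

The following is a minimal cleanup sketch. It assumes that upper-casing, collapsing whitespace and expanding the TSTM abbreviation are acceptable merges. The helper name normalize_evtype is hypothetical.

```r
# Hypothetical helper: merge EVTYPE labels that differ only in case,
# whitespace, or the "TSTM" abbreviation for "THUNDERSTORM"
normalize_evtype <- function(evtype) {
  x <- toupper(as.character(evtype))          # "Flood" -> "FLOOD"
  x <- gsub("[[:space:]]+", " ", x)           # collapse runs of whitespace
  x <- trimws(x)                              # drop leading/trailing spaces
  x <- gsub("\\bTSTM\\b", "THUNDERSTORM", x, perl = TRUE)  # expand abbreviation
  x
}

raw_labels <- c("Flood", "FLOOD", " COASTAL FLOODING/EROSION",
                "COASTAL FLOODING/EROSION", "TSTM WIND", "Thunderstorm Wind")
print(sort(unique(normalize_evtype(raw_labels))))
```

Applying it, for example with storm_data$EVTYPE <- normalize_evtype(storm_data$EVTYPE), would change which labels are merged, so the per-state rankings would have to be recomputed afterwards.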
str(storm_data$STATE)
## Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
head(unique(storm_data$STATE))
## [1] AL AK AZ AR CO CA
## 72 Levels: AK AL AM AN AR AS AZ CA CO CT DC DE FL GA GM GU HI IA ID ... XX
To measure the harm to population health, we use the variables FATALITIES and INJURIES.
summary(storm_data$FATALITIES)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 0.00 0.00 0.01 0.00 158.00
summary(storm_data$INJURIES)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 0.0 0.0 0.1 0.0 1150.0
Let’s create a data frame with the variables of interest, adding a TOT_HARM column that holds the sum of fatalities and injuries for each record.
health_data = data.frame(storm_data$STATE,
as.character(storm_data$EVTYPE),
storm_data$FATALITIES,
storm_data$INJURIES)
health_data = cbind(health_data,
storm_data$FATALITIES+storm_data$INJURIES)
colnames(health_data) <- c("STATE",
"EVTYPE",
"FATALITIES",
"INJURIES",
"TOT_HARM")
str(health_data)
## 'data.frame': 653530 obs. of 5 variables:
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 516 levels "ABNORMALLY DRY",..: 507 422 431 431 431 136 172 431 431 431 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 0 0 ...
## $ INJURIES : num 0 0 0 0 0 0 0 0 0 0 ...
## $ TOT_HARM : num 0 0 0 0 0 0 0 0 0 0 ...
summary(health_data)
## STATE EVTYPE FATALITIES
## TX : 51335 HAIL :207715 Min. : 0.00
## KS : 38649 TSTM WIND :128662 1st Qu.: 0.00
## OK : 26980 THUNDERSTORM WIND: 81402 Median : 0.00
## MO : 25802 FLASH FLOOD : 50999 Mean : 0.01
## IA : 22712 FLOOD : 24247 3rd Qu.: 0.00
## IL : 21215 TORNADO : 23154 Max. :158.00
## (Other):466837 (Other) :137351
## INJURIES TOT_HARM
## Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.0 1st Qu.: 0.0
## Median : 0.0 Median : 0.0
## Mean : 0.1 Mean : 0.1
## 3rd Qu.: 0.0 3rd Qu.: 0.0
## Max. :1150.0 Max. :1308.0
##
NOAA often leaves fields blank until finalized numbers are available. For instance, the damage amount is listed as $0 for a hail storm that damaged the cars and roofs of at least a third of Columbia, MO, in March 2006. A zero therefore does not always mean that no harm occurred. However, records with zero harm add nothing to the totals, so we keep only those events where FATALITIES + INJURIES > 0.
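# Keep only records with at least one fatality or injury (TOT_HARM > 0);
# column 4 alone (INJURIES) would drop fatality-only records
health_data <- health_data[health_data$TOT_HARM > 0, ]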
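A tiny made-up example shows why the filter has to use the combined TOT_HARM column (FATALITIES + INJURIES) rather than INJURIES alone: a record with a death but no injuries must survive the filter. The rows below are hypothetical.

```r
# Hypothetical records: a fatality-only event, an injury-only event,
# and an event with no casualties
toy_health <- data.frame(
  STATE      = c("TX", "TX", "MO"),
  EVTYPE     = c("RIP CURRENT", "TORNADO", "HAIL"),
  FATALITIES = c(1, 0, 0),
  INJURIES   = c(0, 3, 0)
)
toy_health$TOT_HARM <- toy_health$FATALITIES + toy_health$INJURIES

# Filtering on INJURIES alone drops the fatality-only record
by_injuries <- toy_health[toy_health$INJURIES > 0, ]

# Filtering on the total keeps every record that caused any harm
by_total <- toy_health[toy_health$TOT_HARM > 0, ]

print(nrow(by_injuries))
print(nrow(by_total))
```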
Next, we compute the total harm caused by each event type in each state since 1996.
library(plyr)
sum_health_data = ddply(health_data,
.(STATE,EVTYPE),
summarise,
SUM_HARM=sum(TOT_HARM)
)
Then, for each state, we find the event type that caused the most harm.
ranked_health_data <- ddply(sum_health_data,
.(STATE),
summarise,
EVTYPE=EVTYPE[which.max(SUM_HARM)],
MAX_HARM=max(SUM_HARM)
)
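plyr is retired and only receives the changes needed to stay on CRAN. For readers who prefer not to depend on it, the following base-R sketch performs the same two steps. It first sums the harm for each state and event type with aggregate(), then picks the event type with the largest sum in each state using split() and which.max(). It runs on hypothetical toy data rather than on health_data.

```r
# Hypothetical per-record data (already filtered to TOT_HARM > 0)
toy_events <- data.frame(
  STATE    = c("TX", "TX", "TX", "AL", "AL"),
  EVTYPE   = c("FLOOD", "TORNADO", "FLOOD", "TORNADO", "HEAT"),
  TOT_HARM = c(10, 7, 5, 30, 2),
  stringsAsFactors = FALSE
)

# Step 1: total harm for every (STATE, EVTYPE) combination
sum_harm <- aggregate(TOT_HARM ~ STATE + EVTYPE, data = toy_events, FUN = sum)
names(sum_harm)[names(sum_harm) == "TOT_HARM"] <- "SUM_HARM"

# Step 2: within each state, keep the event type with the largest total
top_by_state <- do.call(rbind, lapply(split(sum_harm, sum_harm$STATE), function(d) {
  i <- which.max(d$SUM_HARM)
  data.frame(STATE = d$STATE[i], EVTYPE = d$EVTYPE[i],
             MAX_HARM = d$SUM_HARM[i], stringsAsFactors = FALSE)
}))

# Rank states by the harm of their worst event type
top_by_state <- top_by_state[order(-top_by_state$MAX_HARM), ]
rownames(top_by_state) <- NULL
print(top_by_state)
```

On the real data, health_data would take the place of toy_events.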
Let’s take a look at the result stored in ranked_health_data.
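library(lattice)
# type = "h" draws one vertical line per state; a connected line ("l") would
# suggest a trend across alphabetically ordered states
xyplot(MAX_HARM ~ STATE, data = ranked_health_data,
main="Most harmful event type per state: population health", type="h",
xlab="STATE",
ylab="Fatalities + injuries",
scales=list(x=list(rot=90)))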
ranked_health_data[order(-ranked_health_data$MAX_HARM),]
## STATE EVTYPE MAX_HARM
## 51 TX FLOOD 6368
## 30 MO EXCESSIVE HEAT 3609
## 2 AL TORNADO 3471
## 50 TN TORNADO 2227
## 42 OK TORNADO 1745
## 5 AR TORNADO 1507
## 14 GA TORNADO 1265
## 31 MS TORNADO 955
## 13 FL HURRICANE/TYPHOON 815
## 33 NC TORNADO 746
## 8 CA WILDFIRE 654
## 20 IN TORNADO 584
## 23 LA TORNADO 501
## 22 KY TORNADO 484
## 19 IL TORNADO 478
## 21 KS TORNADO 476
## 53 VA TORNADO 462
## 17 IA TORNADO 458
## 44 PA EXCESSIVE HEAT 456
## 26 MD EXCESSIVE HEAT 450
## 52 UT WINTER STORM 432
## 25 MA TORNADO 406
## 15 GU HURRICANE/TYPHOON 334
## 11 DC EXCESSIVE HEAT 322
## 37 NJ EXCESSIVE HEAT 318
## 28 MI HEAT 315
## 41 OH TORNADO 301
## 29 MN TORNADO 274
## 48 SC TORNADO 252
## 49 SD TORNADO 227
## 9 CO LIGHTNING 224
## 57 WI TORNADO 214
## 40 NY LIGHTNING 199
## 7 AZ FLASH FLOOD 168
## 6 AS TSUNAMI 161
## 35 NE TORNADO 131
## 59 WY WINTER STORM 123
## 34 ND BLIZZARD 103
## 36 NH LIGHTNING 79
## 18 ID THUNDERSTORM WIND 74
## 12 DE HIGH SURF 63
## 27 ME LIGHTNING 59
## 43 OR HIGH WIND 53
## 56 WA HIGH WIND 53
## 10 CT LIGHTNING 52
## 38 NM TORNADO 52
## 39 NV FLOOD 52
## 1 AK ICE STORM 34
## 32 MT WILD/FOREST FIRE 33
## 58 WV LIGHTNING 33
## 3 AM MARINE THUNDERSTORM WIND 27
## 16 HI STRONG WIND 21
## 4 AN MARINE STRONG WIND 19
## 47 RI LIGHTNING 18
## 55 VT TSTM WIND 17
## 45 PR HEAVY RAIN 12
## 46 PZ MARINE STRONG WIND 5
## 24 LM MARINE THUNDERSTORM WIND 3
## 54 VI LIGHTNING 2
For instance, in Texas the most harmful event type is flooding, which has caused 6368 casualties (deaths plus injuries) since 1996.
head(sort(table(ranked_health_data$EVTYPE),decreasing=T))
##
## TORNADO LIGHTNING EXCESSIVE HEAT FLOOD
## 22 8 5 2
## HIGH WIND HURRICANE/TYPHOON
## 2 2
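According to this table, tornadoes have been the most harmful event type for population health in 22 states since 1996, well ahead of lightning (8 states) and excessive heat (5 states). Note that this counts states, not the number of tornadoes.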
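To assess the economic consequences, we use the property damage (PROPDMG) and crop damage (CROPDMG) amounts together with their multipliers (PROPDMGEXP and CROPDMGEXP):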
summary(storm_data$PROPDMG)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 0 0 12 1 5000
table(storm_data$PROPDMGEXP)
##
##             -      ?      +      0      1      2      3      4      5
## 276185      0      0      0      1      0      0      0      0      0
##      6      7      8      B      h      H      K      m      M
##      0      0      0     32      0      0 369938      0   7374
summary(storm_data$CROPDMG)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 0.0 0.0 1.8 0.0 990.0
table(storm_data$CROPDMGEXP)
##
##             ?      0      2      B      k      K      m      M
## 373069      0      0      0      4      0 278686      0   1771
Many records have an empty multiplier; for these, the damage amount was populated with zero because the information was unavailable. Let’s make a data frame with the variables of interest and keep only the records with non-zero property or crop damage.
economic_data = data.frame(storm_data$STATE,
as.character(storm_data$EVTYPE),
storm_data$PROPDMG,
storm_data$PROPDMGEXP,
storm_data$CROPDMG,
storm_data$CROPDMGEXP
)
names(economic_data) <- c("STATE", "EVTYPE", "PROPDMG",
"PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
# Keep only records with some property or crop damage
economic_data <- economic_data[economic_data$PROPDMG > 0 | economic_data$CROPDMG > 0, ]
summary(economic_data)
## STATE EVTYPE PROPDMG
## TX : 15834 TSTM WIND :61317 Min. : 0
## IA : 13173 THUNDERSTORM WIND:42914 1st Qu.: 2
## OH : 10828 HAIL :22611 Median : 8
## MS : 9980 FLASH FLOOD :18724 Mean : 39
## AL : 9372 TORNADO :12147 3rd Qu.: 25
## GA : 9100 FLOOD : 9409 Max. :5000
## (Other):126238 (Other) :27403
## PROPDMGEXP CROPDMG CROPDMGEXP
## K :183449 Min. : 0.0 :97949
## M : 7363 1st Qu.: 0.0 K :94815
## : 3681 Median : 0.0 M : 1759
## B : 32 Mean : 6.2 B : 2
## - : 0 3rd Qu.: 0.0 ? : 0
## ? : 0 Max. :990.0 0 : 0
## (Other): 0 (Other): 0
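Each event’s damage in dollars is obtained by multiplying the amount by its multiplier: “K” stands for thousands, “M” for millions and “B” for billions. Empty multipliers are converted to 0.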
unique(economic_data$PROPDMGEXP)
## [1] K M B
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
x <- gsub(pattern = "K",replacement = 1000,
x = economic_data$PROPDMGEXP)
x <- gsub(pattern = "M",replacement = 1000000,
x)
x <- gsub(pattern = "B",replacement = 1000000000,
x)
x <- as.integer(x)
x[is.na(x)] <- 0
unique(x)
## [1] 1e+03 1e+06 0e+00 1e+09
PROPCASH = economic_data$PROPDMG * x
summary(PROPCASH)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00e+00 2.00e+03 1.00e+04 1.89e+06 3.00e+04 1.15e+11
unique(economic_data$CROPDMGEXP)
## [1] K M B
## Levels: ? 0 2 B k K m M
y <- gsub(pattern = "K",replacement = 1000,
x = economic_data$CROPDMGEXP)
y <- gsub(pattern = "M",replacement = 1000000,
y)
y <- gsub(pattern = "B",replacement = 1000000000,
y)
y <- as.integer(y)
y[is.na(y)] <- 0
unique(y)
## [1] 1e+03 0e+00 1e+06 1e+09
CROPCASH = economic_data$CROPDMG * y
summary(CROPCASH)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00e+00 0.00e+00 0.00e+00 1.79e+05 0.00e+00 1.51e+09
TOTCASH = CROPCASH + PROPCASH
economic_data = cbind(economic_data,TOTCASH)
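The gsub() chains above can also be written as a single lookup table. This makes the mapping explicit and handles both damage columns with one helper. The sketch below assumes that only the K, M and B codes (thousands, millions and billions, as listed in the Storm Data documentation) produce a non-zero multiplier. Every other code, including an empty one, maps to 0, as in the original code.

The helper also treats lower-case codes like upper-case ones, which goes beyond the original. After the 1996 filter only K, M, B and empty values remain, so this makes no difference for this data set. The name exp_multiplier is hypothetical.

```r
# Hypothetical helper: map damage exponent codes to dollar multipliers.
# K = thousands, M = millions, B = billions; any other code (including "")
# maps to 0, mirroring the original NA -> 0 step.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  m <- unname(lookup[toupper(as.character(code))])
  m[is.na(m)] <- 0
  m
}

# Toy example: damage amounts with their exponent codes
dmg  <- c(25, 1.5, 2, 10)
code <- c("K", "M", "B", "")
cash <- dmg * exp_multiplier(code)
print(cash)
```

With this helper, the property amounts could be computed as economic_data$PROPDMG * exp_multiplier(economic_data$PROPDMGEXP), and the crop amounts likewise.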
The total damage of an event is the sum of its property damage and crop damage. Let’s calculate the total damage caused by each event type in each state since 1996, and then find the most damaging event type in each state.
sum_economic_data = ddply(economic_data,
.(STATE,EVTYPE),
summarise,
SUM_TOTCASH=sum(TOTCASH)
)
ranked_economic_data <- ddply(sum_economic_data,
.(STATE),
summarise,
EVTYPE=EVTYPE[which.max(SUM_TOTCASH)],
MAX_CASH=max(SUM_TOTCASH)
)
ranked_economic_data[order(-ranked_economic_data$MAX_CASH),]
## STATE EVTYPE MAX_CASH
## 8 CA FLOOD 1.170e+11
## 24 LA STORM SURGE 3.174e+10
## 13 FL HURRICANE/TYPHOON 2.855e+10
## 35 MS HURRICANE/TYPHOON 1.501e+10
## 57 TX DROUGHT 6.722e+09
## 37 NC HURRICANE 6.405e+09
## 2 AL TORNADO 5.032e+09
## 56 TN FLOOD 4.249e+09
## 38 ND FLOOD 3.970e+09
## 34 MO TORNADO 3.655e+09
## 18 IA FLOOD 2.908e+09
## 7 AZ HAIL 2.829e+09
## 50 PR HURRICANE 2.275e+09
## 41 NJ FLOOD 2.112e+09
## 44 NY FLASH FLOOD 1.755e+09
## 46 OK TORNADO 1.740e+09
## 5 AR TORNADO 1.561e+09
## 21 IN FLOOD 1.533e+09
## 42 NM WILD/FOREST FIRE 1.510e+09
## 39 NE HAIL 1.472e+09
## 9 CO HAIL 1.456e+09
## 48 PA FLASH FLOOD 1.406e+09
## 33 MN FLOOD 1.393e+09
## 45 OH FLASH FLOOD 1.258e+09
## 63 WI FLASH FLOOD 1.125e+09
## 61 VT FLOOD 1.083e+09
## 14 GA TORNADO 9.750e+08
## 20 IL FLASH FLOOD 8.196e+08
## 22 KS TORNADO 7.415e+08
## 47 OR FLOOD 7.294e+08
## 43 NV FLOOD 6.757e+08
## 23 KY HAIL 6.082e+08
## 16 GU TYPHOON 6.011e+08
## 30 MD TROPICAL STORM 5.392e+08
## 59 VA HURRICANE/TYPHOON 5.266e+08
## 64 WV FLASH FLOOD 4.762e+08
## 29 MA TORNADO 4.602e+08
## 32 MI TORNADO 3.354e+08
## 58 UT FLOOD 3.319e+08
## 31 ME ICE STORM 3.182e+08
## 62 WA FLOOD 2.120e+08
## 1 AK FLOOD 1.571e+08
## 17 HI FLASH FLOOD 1.562e+08
## 53 SC ICE STORM 1.482e+08
## 11 DC TROPICAL STORM 1.276e+08
## 54 SD HAIL 1.244e+08
## 36 MT HAIL 1.222e+08
## 19 ID FLOOD 1.141e+08
## 65 WY HAIL 1.130e+08
## 52 RI FLOOD 9.286e+07
## 6 AS TSUNAMI 8.102e+07
## 40 NH ICE STORM 6.493e+07
## 10 CT TROPICAL STORM 6.000e+07
## 12 DE COASTAL FLOOD 4.010e+07
## 60 VI HURRICANE 2.822e+07
## 3 AM WATERSPOUT 5.102e+06
## 15 GM MARINE TSTM WIND 3.226e+06
## 26 LM MARINE TSTM WIND 1.205e+06
## 28 LS MARINE TSTM WIND 4.000e+05
## 4 AN MARINE THUNDERSTORM WIND 1.690e+05
## 51 PZ MARINE STRONG WIND 7.600e+04
## 27 LO MARINE TSTM WIND 5.000e+04
## 49 PK MARINE HIGH WIND 3.100e+04
## 25 LE MARINE THUNDERSTORM WIND 2.500e+04
## 55 SL MARINE TSTM WIND 1.500e+04
# type = "h" draws one vertical line per state instead of joining states with a line
xyplot(MAX_CASH ~ STATE, data = ranked_economic_data,
main="Most harmful event type per state: economic damage", type="h",
xlab="STATE",
ylab="Total damage (USD)",
scales=list(x=list(rot=90)))
head(sort(table(ranked_economic_data$EVTYPE),decreasing=T))
##
## FLOOD TORNADO FLASH FLOOD HAIL
## 15 8 7 7
## MARINE TSTM WIND HURRICANE
## 5 3
The tables show that floods are the event type with the greatest economic consequences in the largest number of states (15), followed by tornadoes (8), flash floods (7) and hail (7). California shows the largest loss from a single event type since 1996: floods there account for approximately $1.170e+11 in property and crop damage.
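A state's figure can be driven largely by a single record. The PROPCASH summary above reports a maximum of about 1.15e+11 for one record, close to California's entire flood total of 1.170e+11. Listing the largest records before relying on the ranking is therefore a cheap check.

The following minimal base-R sketch uses a hypothetical helper, top_records(), on made-up toy data. On the real data it could be called as top_records(economic_data, "TOTCASH", n = 10).

```r
# Hypothetical helper: return the n rows with the largest values in `col`
top_records <- function(df, col, n = 5) {
  ord <- order(df[[col]], decreasing = TRUE)
  head(df[ord, , drop = FALSE], n)
}

# Toy data: one very large record next to several ordinary ones
toy_damage <- data.frame(
  STATE   = c("CA", "CA", "LA", "TX", "CA"),
  EVTYPE  = c("FLOOD", "FLOOD", "STORM SURGE", "DROUGHT", "WILDFIRE"),
  TOTCASH = c(1e11, 2e9, 3e10, 6e9, 1e9),
  stringsAsFactors = FALSE
)

top <- top_records(toy_damage, "TOTCASH", n = 2)
print(top)

# Share of each state's total contributed by its single largest record
state_total <- tapply(toy_damage$TOTCASH, toy_damage$STATE, sum)
state_max   <- tapply(toy_damage$TOTCASH, toy_damage$STATE, max)
print(round(state_max / state_total, 3))
```

A share close to 1 means that one record dominates the state's figure. Such a record is worth checking against the original report before drawing conclusions.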