Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
We will explore the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database and answer some basic questions about severe weather events and their relationship with population health and economic consequences. The events in the database start in the year 1950 and end in November 2011.
packages <- c("maps", "knitr", "scatterplot3d", "fields")
diff <- setdiff(packages, rownames(installed.packages()))
if (length(diff > 0)) {
install.packages(diff)
}
library(scatterplot3d)
library(fields)
library(maps)
Loading the data and showing names of columns
if (!exists("sdata")) {
sdata <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))
}
names(sdata)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
Damage data is recorded with two variables: PROPDMG (number) and PROPDMGEXP (exponent). We will define a function that transforms this into a single value:
union(levels(sdata$PROPDMGEXP), levels(sdata$CROPDMGEXP))
## [1] "" "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M" "k"
realmoney <- function(x, exp) {
if (is.integer(exp))
return(x * 10^exp)
if (exp %in% c("", "-"))
return(x)
if (exp %in% c("b", "B"))
return(x * 1e+09)
if (exp %in% c("m", "M"))
return(x * 1e+06)
if (exp %in% c("h", "H"))
return(x * 1e+06)
if (exp %in% c("k", "K"))
return(x * 1000)
return(ceiling(x))
}
sdata$PROP <- mapply(realmoney, x = sdata$PROPDMG, exp = sdata$PROPDMGEXP)
sdata$CROP <- mapply(realmoney, x = sdata$CROPDMG, exp = sdata$CROPDMGEXP)
sdata$COST <- sdata$PROP + sdata$CROP
How many injuries has each event type provoked? (30 most harmful event types showed)
totinj <- aggregate(sdata$INJURIES, by = list(EVTYPE = sdata$EVTYPE), sum)
totinjnz <- totinj[which(totinj$x > 0), ]
head(totinjnz[with(totinjnz, order(-x)), ], n = 30)
## EVTYPE x
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
## 275 HEAT 2100
## 427 ICE STORM 1975
## 153 FLASH FLOOD 1777
## 760 THUNDERSTORM WIND 1488
## 244 HAIL 1361
## 972 WINTER STORM 1321
## 411 HURRICANE/TYPHOON 1275
## 359 HIGH WIND 1137
## 310 HEAVY SNOW 1021
## 957 WILDFIRE 911
## 786 THUNDERSTORM WINDS 908
## 30 BLIZZARD 805
## 188 FOG 734
## 955 WILD/FOREST FIRE 545
## 117 DUST STORM 440
## 978 WINTER WEATHER 398
## 89 DENSE FOG 342
## 848 TROPICAL STORM 340
## 278 HEAT WAVE 309
## 376 HIGH WINDS 302
## 586 RIP CURRENTS 297
## 676 STRONG WIND 280
## 290 HEAVY RAIN 251
## 585 RIP CURRENT 232
## 140 EXTREME COLD 231
We conclude that Tornados are the most dangerous events for the population
How much damage (both property and cost) has each event type provoked? (30 most costly event types showed)
totcost <- aggregate(sdata$COST, by = list(EVTYPE = sdata$EVTYPE), sum)
head(totcost[with(totcost, order(-x)), ], n = 30)
## EVTYPE x
## 170 FLOOD 1.503e+11
## 411 HURRICANE/TYPHOON 7.191e+10
## 834 TORNADO 5.735e+10
## 670 STORM SURGE 4.332e+10
## 244 HAIL 1.876e+10
## 153 FLASH FLOOD 1.756e+10
## 95 DROUGHT 1.502e+10
## 402 HURRICANE 1.461e+10
## 590 RIVER FLOOD 1.015e+10
## 427 ICE STORM 8.967e+09
## 848 TROPICAL STORM 8.382e+09
## 972 WINTER STORM 6.715e+09
## 359 HIGH WIND 5.909e+09
## 957 WILDFIRE 5.061e+09
## 856 TSTM WIND 5.039e+09
## 671 STORM SURGE/TIDE 4.642e+09
## 760 THUNDERSTORM WIND 3.898e+09
## 409 HURRICANE OPAL 3.192e+09
## 955 WILD/FOREST FIRE 3.109e+09
## 298 HEAVY RAIN/SEVERE WEATHER 2.500e+09
## 786 THUNDERSTORM WINDS 1.947e+09
## 842 TORNADOES, TSTM WIND, HAIL 1.602e+09
## 290 HEAVY RAIN 1.428e+09
## 140 EXTREME COLD 1.361e+09
## 604 SEVERE THUNDERSTORM 1.206e+09
## 212 FROST/FREEZE 1.104e+09
## 310 HEAVY SNOW 1.067e+09
## 464 LIGHTNING 9.408e+08
## 30 BLIZZARD 7.713e+08
## 376 HIGH WINDS 6.490e+08
We conclude that Floods are the most expensive events in terms of total damage (property and crop)
This is a comparative map of the spatial distribution of injuries by state.
First we match the states:
stateinj <- aggregate(sdata$INJURIES, by = list(state = sdata$STATE), sum)
data(state.fips)
mat2 <- match(state.fips$abb, stateinj$state)
stateinj <- stateinj[mat2, ]
Now we create the map:
col1 <- heat.colors(100)
logsc <- log(stateinj$x)
mi <- min(logsc)
ma <- max(logsc)
colores <- col1[ceiling(1 + 99 * (logsc - mi)/(ma - mi))]
map("state", fill = TRUE, col = colores, boundary = FALSE)
title("Total injuries")
ticks <- c(500, 1000, 5000, 10000, 15000)
image.plot(legend.only = TRUE, zlim = c(mi, ma), nlevel = 100, col = col1, axis.args = list(at = log(ticks),
labels = ticks))
This is the number of injuries by state:
unique(stateinj)
## state x
## 2 AL 8742
## 7 AZ 968
## 5 AR 5550
## 8 CA 3278
## 9 CO 1004
## 10 CT 897
## 12 DE 338
## 11 DC 383
## 13 FL 5918
## 14 GA 5061
## 19 ID 273
## 20 IL 5563
## 21 IN 4720
## 18 IA 2892
## 22 KS 3449
## 23 KY 3480
## 24 LA 3215
## 33 ME 177
## 32 MD 1537
## 31 MA 2121
## 35 MI 4586
## 36 MN 2282
## 38 MS 6675
## 37 MO 8998
## 39 MT 181
## 42 NE 1471
## 46 NV 232
## 43 NH 195
## 44 NJ 1152
## 45 NM 385
## 47 NY 1340
## 40 NC 3415
## 41 ND 608
## 48 OH 7112
## 49 OK 5710
## 50 OR 225
## 51 PA 3223
## 57 RI 48
## 58 SC 1786
## 59 SD 868
## 62 TN 5202
## 63 TX 17667
## 64 UT 1070
## 67 VT 71
## 65 VA 1703
## 68 WA 753
## 70 WV 363
## 69 WI 2309
## 71 WY 432
This is a comparative map of the spatial distribution of damage cost by state (US dollars).
First we match the states:
statecost <- aggregate(sdata$COST, by = list(state = sdata$STATE), sum)
data(state.fips)
mat2 <- match(state.fips$abb, statecost$state)
statecost <- statecost[mat2, ]
Now we create the map:
col1 <- heat.colors(100)
logsc <- log(statecost$x)
mi <- min(logsc)
ma <- max(logsc)
colores <- col1[ceiling(1 + 99 * (logsc - mi)/(ma - mi))]
map("state", fill = TRUE, col = colores, boundary = FALSE)
title("Total Cost $")
ticks <- c(1e+08, 1e+09, 1e+10, 1e+11)
image.plot(legend.only = TRUE, zlim = c(mi, ma), nlevel = 100, col = col1, axis.args = list(at = log(ticks),
labels = ticks))
This is the cost by state in dollars:
unique(statecost)
## state x
## 2 AL 1.785e+10
## 7 AZ 3.961e+09
## 5 AR 4.557e+09
## 8 CA 1.271e+11
## 9 CO 2.890e+09
## 10 CT 7.616e+08
## 12 DE 1.790e+08
## 11 DC 1.578e+08
## 13 FL 4.541e+10
## 14 GA 6.028e+09
## 19 ID 2.675e+08
## 20 IL 1.409e+10
## 21 IN 4.891e+09
## 18 IA 1.019e+10
## 22 KS 5.062e+09
## 23 KY 3.038e+09
## 24 LA 6.130e+10
## 33 ME 5.623e+08
## 32 MD 1.300e+09
## 31 MA 1.280e+09
## 35 MI 2.687e+09
## 36 MN 5.702e+09
## 38 MS 3.642e+10
## 37 MO 7.932e+09
## 39 MT 4.053e+08
## 42 NE 5.300e+09
## 46 NV 8.408e+08
## 43 NH 2.204e+08
## 44 NJ 3.295e+09
## 45 NM 1.981e+09
## 47 NY 4.972e+09
## 40 NC 1.028e+10
## 41 ND 5.866e+09
## 48 OH 7.250e+09
## 49 OK 6.720e+09
## 50 OR 1.065e+09
## 51 PA 5.423e+09
## 57 RI 1.206e+08
## 58 SC 1.268e+09
## 59 SD 8.521e+08
## 62 TN 6.583e+09
## 63 TX 3.394e+10
## 64 UT 7.982e+08
## 67 VT 1.538e+09
## 65 VA 2.532e+09
## 68 WA 1.413e+09
## 70 WV 1.027e+09
## 69 WI 4.203e+09
## 71 WY 2.221e+08