This report is about the problems like fatalities, injuries, property damage and crop damage resulting from severe weather events. The aim is to prove which of the events produce the worst consequences. In order to acomplishe this I will expore the U.S. National Oceanic and Atmosferic Administrations’ storm database. The data was dowloaded from the course web site storm data.
The downloaded data is in the working directory as a comma-separated-value file compressed via the bzip2 algorithm and is read with:
stormdata <- read.csv("stormdata") # it takes a few minutes to read
dim(stormdata) # check on the file
## [1] 902297 37
The records are from 1950 to November 2011. At the beginning there was recorded only one type of event. Lately there were more types of events recorded. I will look for the years with the most number of records. Besides, in terms of the value of the damages inflation makes the comparison from 1950 to 2011 difficult to do. It is supposed that the technical means for communication and record of the events are much better in the last years.
stormdata <- tbl_df(stormdata) # prints nicer
# create a new variable - the year of the record
stormdata <- mutate(stormdata, date = as.Date(BGN_DATE, "%m/%d/%Y"),
year = as.numeric(substr(date, 1, 4)))
mydata <- select(stormdata, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP,
CROPDMG, CROPDMGEXP, year) # select the relevant variables
rm(stormdata) # free some memory
y <- as.data.frame(table(mydata$year)) # this shows the number of records in each year
names(y) <- c("year", "events")
with(y, plot(year, events, main = "Number of records per year", xlab = "year", ylab = "number of records"))
The above plot shows that in the mid 1990’s the number of record per year has dramatically increased.
Selecting the start year to be used in the analyse:
quantile(y$events)
## 0% 25% 50% 75% 100%
## 223.00 2388.25 5766.00 28502.50 62174.00
startyear <- filter(y, events >quantile(y$events)[[4]])[1,1]
startyear
## [1] 1996
## 62 Levels: 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 ... 2011
Starting with 1996 the number of events recorded each year is much greater than in the previous years. I will base the analysis on the events recorded from 1996 to 2011:
mydata <- filter(mydata, year >= 1996)
dim(mydata)
## [1] 653530 8
This shows that about 2/3 of the total number of records are in the selected interval. We are interested only in the events that caused damages like fatalities, injuries, property damage, crop damage, so I will filter the respective records:
mydata <- filter(mydata, FATALITIES>0 | INJURIES>0 | PROPDMG>0 | CROPDMG>0)
For property damages the value results by multiplying from PROPDMG variable with the magnitude coded in the variable PROPDMGEXP as K for thousands, M for millions, B for billions. The value will be stored in the variable PROPDMGVAL. The same with crop damages, which will be stored in the variable CROPDMGVAL:
expdf <- tbl_df(cbind(c("", "0", "K", "M", "B"), c(0, 0, 3, 6, 9)))
names(expdf) <- c("symbol", "exp")
mydata <- merge(mydata, expdf, by.x = "PROPDMGEXP", by.y = "symbol") %>%
mutate(PROPDMGVAL = PROPDMG*10^as.numeric(exp))%>%
select(-PROPDMG, -PROPDMGEXP, -exp)
mydata <- merge(mydata, expdf, by.x = "CROPDMGEXP", by.y = "symbol") %>%
mutate(CROPDMGVAL = CROPDMG*10^as.numeric(exp))%>%
select(-CROPDMG, -CROPDMGEXP, -exp)
The fatalities, injuries, property damage and crop damage are summarized across the selected period for each type of event. The yearly average is computed for each event.
years <- 16 # the number of years, set by hand (!)
damages <- summarize(group_by(mydata, EVTYPE), fatalities = sum(FATALITIES)/years,
injuries = sum(INJURIES)/years,
propdmg = sum(PROPDMGVAL)/years,
cropdmg = sum(CROPDMGVAL)/years)
summary(damages)
## EVTYPE fatalities injuries
## HIGH SURF ADVISORY : 1 Min. : 0.0000 Min. : 0.00
## FLASH FLOOD : 1 1st Qu.: 0.0000 1st Qu.: 0.00
## TSTM WIND : 1 Median : 0.0000 Median : 0.00
## TSTM WIND (G45) : 1 Mean : 2.4583 Mean : 16.32
## AGRICULTURAL FREEZE : 1 3rd Qu.: 0.1875 3rd Qu.: 0.75
## ASTRONOMICAL HIGH TIDE: 1 Max. :112.3125 Max. :1291.69
## (Other) :216
## propdmg cropdmg
## Min. :0.000e+00 Min. : 0
## 1st Qu.:2.660e+02 1st Qu.: 0
## Median :3.906e+03 Median : 0
## Mean :1.033e+08 Mean : 9783989
## 3rd Qu.:2.129e+05 3rd Qu.: 859
## Max. :8.997e+09 Max. :835472875
##
First I will show the events producing above the average damages, in descending order for each type of damage:
topfatalities <- arrange(filter(damages, fatalities > mean(fatalities)),desc(fatalities))
topfatalities
## # A tibble: 28 × 5
## EVTYPE fatalities injuries propdmg cropdmg
## <fctr> <dbl> <dbl> <dbl> <dbl>
## 1 EXCESSIVE HEAT 112.3125 399.4375 482731.2 30775125.00
## 2 TORNADO 94.4375 1291.6875 1538559106.9 17714063.12
## 3 FLASH FLOOD 55.4375 104.6250 951387744.4 83431356.25
## 4 LIGHTNING 40.6875 258.8125 46442317.5 431152.50
## 5 FLOOD 25.8750 422.3750 8996552096.9 310923650.00
## 6 RIP CURRENT 21.2500 13.0625 62.5 0.00
## 7 TSTM WIND 15.0625 226.8125 279876652.5 34619709.38
## 8 HEAT 14.8125 76.3750 95000.0 11031.25
## 9 HIGH WIND 14.6875 67.6875 327991272.5 39597581.25
## 10 AVALANCHE 13.9375 9.7500 231987.5 0.00
## # ... with 18 more rows
topinjuries <- arrange(filter(damages, injuries > mean(injuries)),desc(injuries))
topinjuries
## # A tibble: 23 × 5
## EVTYPE fatalities injuries propdmg cropdmg
## <fctr> <dbl> <dbl> <dbl> <dbl>
## 1 TORNADO 94.4375 1291.6875 1538559106.9 17714063.12
## 2 FLOOD 25.8750 422.3750 8996552096.9 310923650.00
## 3 EXCESSIVE HEAT 112.3125 399.4375 482731.2 30775125.00
## 4 LIGHTNING 40.6875 258.8125 46442317.5 431152.50
## 5 TSTM WIND 15.0625 226.8125 279876652.5 34619709.38
## 6 FLASH FLOOD 55.4375 104.6250 951387744.4 83431356.25
## 7 THUNDERSTORM WIND 8.1250 87.5000 211415902.5 24895687.50
## 8 WINTER STORM 11.9375 80.7500 95796453.1 746500.00
## 9 HURRICANE/TYPHOON 4.0000 79.6875 4331615000.0 162992050.00
## 10 HEAT 14.8125 76.3750 95000.0 11031.25
## # ... with 13 more rows
toppropdmg <- arrange(filter(damages, propdmg > mean(propdmg)),desc(propdmg))
toppropdmg
## # A tibble: 15 × 5
## EVTYPE fatalities injuries propdmg cropdmg
## <fctr> <dbl> <dbl> <dbl> <dbl>
## 1 FLOOD 25.8750 422.3750 8996552097 310923650.0
## 2 HURRICANE/TYPHOON 4.0000 79.6875 4331615000 162992050.0
## 3 STORM SURGE 0.1250 2.3125 2699596000 312.5
## 4 TORNADO 94.4375 1291.6875 1538559107 17714063.1
## 5 FLASH FLOOD 55.4375 104.6250 951387744 83431356.2
## 6 HAIL 0.4375 44.5625 912196464 154751840.6
## 7 HURRICANE 3.8125 2.8750 738301188 171338125.0
## 8 TROPICAL STORM 3.5625 21.1250 477654722 42356937.5
## 9 HIGH WIND 14.6875 67.6875 327991273 39597581.2
## 10 WILDFIRE 4.6875 56.9375 297416688 18467050.0
## 11 STORM SURGE/TIDE 0.6875 0.3125 290074250 53125.0
## 12 TSTM WIND 15.0625 226.8125 279876653 34619709.4
## 13 ICE STORM 5.1250 19.8750 227640551 978750.0
## 14 THUNDERSTORM WIND 8.1250 87.5000 211415903 24895687.5
## 15 WILD/FOREST FIRE 0.7500 34.0625 187611406 6673895.6
topcropdmg <- arrange(filter(damages, cropdmg > mean(cropdmg)),desc(cropdmg))
topcropdmg
## # A tibble: 16 × 5
## EVTYPE fatalities injuries propdmg cropdmg
## <fctr> <dbl> <dbl> <dbl> <dbl>
## 1 DROUGHT 0.0000 0.2500 65381312.5 835472875
## 2 FLOOD 25.8750 422.3750 8996552096.9 310923650
## 3 HURRICANE 3.8125 2.8750 738301188.1 171338125
## 4 HURRICANE/TYPHOON 4.0000 79.6875 4331615000.0 162992050
## 5 HAIL 0.4375 44.5625 912196463.8 154751841
## 6 FLASH FLOOD 55.4375 104.6250 951387744.4 83431356
## 7 EXTREME COLD 7.0625 4.9375 1235025.0 80560813
## 8 FROST/FREEZE 0.0000 0.0000 592500.0 68380375
## 9 HEAVY RAIN 5.8750 14.3750 36554027.5 45510613
## 10 TROPICAL STORM 3.5625 21.1250 477654721.9 42356938
## 11 HIGH WIND 14.6875 67.6875 327991272.5 39597581
## 12 TSTM WIND 15.0625 226.8125 279876652.5 34619709
## 13 EXCESSIVE HEAT 112.3125 399.4375 482731.2 30775125
## 14 THUNDERSTORM WIND 8.1250 87.5000 211415902.5 24895688
## 15 WILDFIRE 4.6875 56.9375 297416687.5 18467050
## 16 TORNADO 94.4375 1291.6875 1538559106.9 17714063
The events that are producing fatalities, injuries, property damage and crop damage all above the average are listed below:
topdmg <- Reduce(intersect, list(topfatalities$EVTYPE, topinjuries$EVTYPE, toppropdmg$EVTYPE, topcropdmg$EVTYPE))
topdmg
## [1] "TORNADO" "FLASH FLOOD" "FLOOD"
## [4] "TSTM WIND" "HIGH WIND" "THUNDERSTORM WIND"
## [7] "WILDFIRE" "HURRICANE/TYPHOON" "TROPICAL STORM"
The results above outline the most damageing events for each of the main damage type alone or for all four together.