1. Synopsis

This report is about the problems like fatalities, injuries, property damage and crop damage resulting from severe weather events. The aim is to prove which of the events produce the worst consequences. In order to acomplishe this I will expore the U.S. National Oceanic and Atmosferic Administrations’ storm database. The data was dowloaded from the course web site storm data.

2. Data processing

2.1 Reading the data

The downloaded data is in the working directory as a comma-separated-value file compressed via the bzip2 algorithm and is read with:

stormdata <- read.csv("stormdata") # it takes a few minutes to read
dim(stormdata) # check on the file
## [1] 902297     37

2.2 Selecting the relevant records

The records are from 1950 to November 2011. At the beginning there was recorded only one type of event. Lately there were more types of events recorded. I will look for the years with the most number of records. Besides, in terms of the value of the damages inflation makes the comparison from 1950 to 2011 difficult to do. It is supposed that the technical means for communication and record of the events are much better in the last years.

stormdata <- tbl_df(stormdata) # prints nicer

# create a new variable - the year of the record
stormdata <- mutate(stormdata, date = as.Date(BGN_DATE, "%m/%d/%Y"),
                     year = as.numeric(substr(date, 1, 4)))
mydata <- select(stormdata, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP,
                    CROPDMG, CROPDMGEXP, year) # select the relevant variables
rm(stormdata) # free some memory
y <- as.data.frame(table(mydata$year)) # this shows the number of records in each year
names(y) <- c("year", "events")
with(y, plot(year, events, main = "Number of records per year", xlab = "year", ylab = "number of records"))

The above plot shows that in the mid 1990’s the number of record per year has dramatically increased.

Selecting the start year to be used in the analyse:

quantile(y$events)
##       0%      25%      50%      75%     100% 
##   223.00  2388.25  5766.00 28502.50 62174.00
startyear <- filter(y, events >quantile(y$events)[[4]])[1,1]
startyear
## [1] 1996
## 62 Levels: 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 ... 2011

Starting with 1996 the number of events recorded each year is much greater than in the previous years. I will base the analysis on the events recorded from 1996 to 2011:

mydata <- filter(mydata, year >= 1996)
dim(mydata)
## [1] 653530      8

This shows that about 2/3 of the total number of records are in the selected interval. We are interested only in the events that caused damages like fatalities, injuries, property damage, crop damage, so I will filter the respective records:

mydata <- filter(mydata, FATALITIES>0 | INJURIES>0 | PROPDMG>0 | CROPDMG>0)

2.3 Computing the value of damages

For property damages the value results by multiplying from PROPDMG variable with the magnitude coded in the variable PROPDMGEXP as K for thousands, M for millions, B for billions. The value will be stored in the variable PROPDMGVAL. The same with crop damages, which will be stored in the variable CROPDMGVAL:

expdf <- tbl_df(cbind(c("", "0", "K", "M", "B"), c(0, 0, 3, 6, 9)))
names(expdf) <- c("symbol", "exp")

mydata <- merge(mydata, expdf, by.x = "PROPDMGEXP", by.y = "symbol") %>%
        mutate(PROPDMGVAL = PROPDMG*10^as.numeric(exp))%>%
        select(-PROPDMG, -PROPDMGEXP, -exp)

mydata <- merge(mydata, expdf, by.x = "CROPDMGEXP", by.y = "symbol") %>%
        mutate(CROPDMGVAL = CROPDMG*10^as.numeric(exp))%>%
        select(-CROPDMG, -CROPDMGEXP, -exp)

2.4 Summarizing the damages

The fatalities, injuries, property damage and crop damage are summarized across the selected period for each type of event. The yearly average is computed for each event.

years <- 16 # the number of years, set by hand (!)
damages <- summarize(group_by(mydata, EVTYPE), fatalities = sum(FATALITIES)/years,
                                injuries = sum(INJURIES)/years,
                                propdmg = sum(PROPDMGVAL)/years,
                                cropdmg = sum(CROPDMGVAL)/years)
summary(damages)
##                     EVTYPE      fatalities          injuries      
##     HIGH SURF ADVISORY :  1   Min.   :  0.0000   Min.   :   0.00  
##   FLASH FLOOD          :  1   1st Qu.:  0.0000   1st Qu.:   0.00  
##   TSTM WIND            :  1   Median :  0.0000   Median :   0.00  
##   TSTM WIND (G45)      :  1   Mean   :  2.4583   Mean   :  16.32  
##  AGRICULTURAL FREEZE   :  1   3rd Qu.:  0.1875   3rd Qu.:   0.75  
##  ASTRONOMICAL HIGH TIDE:  1   Max.   :112.3125   Max.   :1291.69  
##  (Other)               :216                                       
##     propdmg             cropdmg         
##  Min.   :0.000e+00   Min.   :        0  
##  1st Qu.:2.660e+02   1st Qu.:        0  
##  Median :3.906e+03   Median :        0  
##  Mean   :1.033e+08   Mean   :  9783989  
##  3rd Qu.:2.129e+05   3rd Qu.:      859  
##  Max.   :8.997e+09   Max.   :835472875  
## 

3. Results

First I will show the events producing above the average damages, in descending order for each type of damage:

3.1 Above the average fatalities

topfatalities <- arrange(filter(damages, fatalities > mean(fatalities)),desc(fatalities))
topfatalities
## # A tibble: 28 × 5
##            EVTYPE fatalities  injuries      propdmg      cropdmg
##            <fctr>      <dbl>     <dbl>        <dbl>        <dbl>
## 1  EXCESSIVE HEAT   112.3125  399.4375     482731.2  30775125.00
## 2         TORNADO    94.4375 1291.6875 1538559106.9  17714063.12
## 3     FLASH FLOOD    55.4375  104.6250  951387744.4  83431356.25
## 4       LIGHTNING    40.6875  258.8125   46442317.5    431152.50
## 5           FLOOD    25.8750  422.3750 8996552096.9 310923650.00
## 6     RIP CURRENT    21.2500   13.0625         62.5         0.00
## 7       TSTM WIND    15.0625  226.8125  279876652.5  34619709.38
## 8            HEAT    14.8125   76.3750      95000.0     11031.25
## 9       HIGH WIND    14.6875   67.6875  327991272.5  39597581.25
## 10      AVALANCHE    13.9375    9.7500     231987.5         0.00
## # ... with 18 more rows

3.2 Above the average injuries

topinjuries <- arrange(filter(damages, injuries > mean(injuries)),desc(injuries))
topinjuries
## # A tibble: 23 × 5
##               EVTYPE fatalities  injuries      propdmg      cropdmg
##               <fctr>      <dbl>     <dbl>        <dbl>        <dbl>
## 1            TORNADO    94.4375 1291.6875 1538559106.9  17714063.12
## 2              FLOOD    25.8750  422.3750 8996552096.9 310923650.00
## 3     EXCESSIVE HEAT   112.3125  399.4375     482731.2  30775125.00
## 4          LIGHTNING    40.6875  258.8125   46442317.5    431152.50
## 5          TSTM WIND    15.0625  226.8125  279876652.5  34619709.38
## 6        FLASH FLOOD    55.4375  104.6250  951387744.4  83431356.25
## 7  THUNDERSTORM WIND     8.1250   87.5000  211415902.5  24895687.50
## 8       WINTER STORM    11.9375   80.7500   95796453.1    746500.00
## 9  HURRICANE/TYPHOON     4.0000   79.6875 4331615000.0 162992050.00
## 10              HEAT    14.8125   76.3750      95000.0     11031.25
## # ... with 13 more rows

3.3 Above the average property damage

toppropdmg <- arrange(filter(damages, propdmg > mean(propdmg)),desc(propdmg))
toppropdmg
## # A tibble: 15 × 5
##               EVTYPE fatalities  injuries    propdmg     cropdmg
##               <fctr>      <dbl>     <dbl>      <dbl>       <dbl>
## 1              FLOOD    25.8750  422.3750 8996552097 310923650.0
## 2  HURRICANE/TYPHOON     4.0000   79.6875 4331615000 162992050.0
## 3        STORM SURGE     0.1250    2.3125 2699596000       312.5
## 4            TORNADO    94.4375 1291.6875 1538559107  17714063.1
## 5        FLASH FLOOD    55.4375  104.6250  951387744  83431356.2
## 6               HAIL     0.4375   44.5625  912196464 154751840.6
## 7          HURRICANE     3.8125    2.8750  738301188 171338125.0
## 8     TROPICAL STORM     3.5625   21.1250  477654722  42356937.5
## 9          HIGH WIND    14.6875   67.6875  327991273  39597581.2
## 10          WILDFIRE     4.6875   56.9375  297416688  18467050.0
## 11  STORM SURGE/TIDE     0.6875    0.3125  290074250     53125.0
## 12         TSTM WIND    15.0625  226.8125  279876653  34619709.4
## 13         ICE STORM     5.1250   19.8750  227640551    978750.0
## 14 THUNDERSTORM WIND     8.1250   87.5000  211415903  24895687.5
## 15  WILD/FOREST FIRE     0.7500   34.0625  187611406   6673895.6

3.4 Above the average crop damage

topcropdmg <- arrange(filter(damages, cropdmg > mean(cropdmg)),desc(cropdmg))
topcropdmg
## # A tibble: 16 × 5
##               EVTYPE fatalities  injuries      propdmg   cropdmg
##               <fctr>      <dbl>     <dbl>        <dbl>     <dbl>
## 1            DROUGHT     0.0000    0.2500   65381312.5 835472875
## 2              FLOOD    25.8750  422.3750 8996552096.9 310923650
## 3          HURRICANE     3.8125    2.8750  738301188.1 171338125
## 4  HURRICANE/TYPHOON     4.0000   79.6875 4331615000.0 162992050
## 5               HAIL     0.4375   44.5625  912196463.8 154751841
## 6        FLASH FLOOD    55.4375  104.6250  951387744.4  83431356
## 7       EXTREME COLD     7.0625    4.9375    1235025.0  80560813
## 8       FROST/FREEZE     0.0000    0.0000     592500.0  68380375
## 9         HEAVY RAIN     5.8750   14.3750   36554027.5  45510613
## 10    TROPICAL STORM     3.5625   21.1250  477654721.9  42356938
## 11         HIGH WIND    14.6875   67.6875  327991272.5  39597581
## 12         TSTM WIND    15.0625  226.8125  279876652.5  34619709
## 13    EXCESSIVE HEAT   112.3125  399.4375     482731.2  30775125
## 14 THUNDERSTORM WIND     8.1250   87.5000  211415902.5  24895688
## 15          WILDFIRE     4.6875   56.9375  297416687.5  18467050
## 16           TORNADO    94.4375 1291.6875 1538559106.9  17714063

3.5 The most devastating events

The events that are producing fatalities, injuries, property damage and crop damage all above the average are listed below:

topdmg <- Reduce(intersect, list(topfatalities$EVTYPE, topinjuries$EVTYPE, toppropdmg$EVTYPE, topcropdmg$EVTYPE))
topdmg
## [1] "TORNADO"           "FLASH FLOOD"       "FLOOD"            
## [4] "TSTM WIND"         "HIGH WIND"         "THUNDERSTORM WIND"
## [7] "WILDFIRE"          "HURRICANE/TYPHOON" "TROPICAL STORM"

The results above outline the most damageing events for each of the main damage type alone or for all four together.