Storm events in the U.S. from 1950 to 2011

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Synopsis

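This report explores the NOAA storm database, which covers weather events recorded in the United States from 1950 to 2011, to identify which event types are most harmful to population health and which have the greatest economic consequences. Tornadoes account for the most fatalities, injuries and property damage, while hail accounts for the most crop damage. The damage figures are sums of the PROPDMG and CROPDMG columns as recorded.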

Data Processing

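# Packages used throughout: dplyr (%>%, group_by, summarise), lubridate (year),
# ggplot2 (plots) and treemap (treemaps)
library(dplyr)
library(lubridate)
library(ggplot2)
library(treemap)

storm<-read.csv("project/FStormData.csv")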
str(storm)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436774 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

The dataset contains 902,297 observations of 37 variables.

Because I’ll use the year of each event to explore and aggregate the data, I convert the variable “BGN_DATE” to a date-time format and then extract the year.

storm$BGN_DATE<- as.POSIXct(storm$BGN_DATE,format="%m/%d/%Y %H:%M:%S",tz=Sys.timezone())
storm$BGN_Year<-year(storm$BGN_DATE)
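
The conversion can be checked on a few literal values before it is applied to the full column. The sketch below uses base R only (`format(..., "%Y")` in place of `lubridate::year`) and a hypothetical vector `sample_dates` in the same month/day/year layout as BGN_DATE. The fixed `tz = "UTC"` is an assumption made here so that the result does not depend on the local time zone.

```r
# Hypothetical sample values in the same layout as BGN_DATE
sample_dates <- c("4/18/1950 0:00:00", "11/15/1951 0:00:00", "1/1/1966 0:00:00")

# Parse to date-time; UTC is an assumption to keep the result machine-independent
parsed <- as.POSIXct(sample_dates, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Extract the year as an integer (same intent as lubridate::year)
years <- as.integer(format(parsed, "%Y"))
print(years)
```

Any value that does not match the format comes back as NA, so `sum(is.na(parsed))` is a quick way to spot rows that failed to parse.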

Events by state

Total number of weather events by state

  storm %>%
  group_by(STATE) %>% 
  summarise(Number_of_Events=n()) %>% 
  arrange(desc(Number_of_Events)) %>% 
  head(10)
## # A tibble: 10 × 2
##     STATE Number_of_Events
##    <fctr>            <int>
## 1      TX            83728
## 2      KS            53440
## 3      OK            46802
## 4      MO            35648
## 5      IA            31069
## 6      NE            30271
## 7      IL            28488
## 8      AR            27102
## 9      NC            25351
## 10     GA            25259

Total number of injuries by state

  storm %>%
  group_by(STATE) %>% 
  summarise(Number_of_Injuries=sum(INJURIES)) %>% 
  arrange(desc(Number_of_Injuries)) %>% 
  head(10)
## # A tibble: 10 × 2
##     STATE Number_of_Injuries
##    <fctr>              <dbl>
## 1      TX              17667
## 2      MO               8998
## 3      AL               8742
## 4      OH               7112
## 5      MS               6675
## 6      FL               5918
## 7      OK               5710
## 8      IL               5563
## 9      AR               5550
## 10     TN               5202

Total number of fatalities by state

  storm %>%
  group_by(STATE) %>% 
  summarise(Number_of_Fatalities=sum(FATALITIES)) %>% 
  arrange(desc(Number_of_Fatalities)) %>% 
  head(10)
## # A tibble: 10 × 2
##     STATE Number_of_Fatalities
##    <fctr>                <dbl>
## 1      IL                 1421
## 2      TX                 1366
## 3      PA                  846
## 4      AL                  784
## 5      MO                  754
## 6      FL                  746
## 7      MS                  555
## 8      CA                  550
## 9      AR                  530
## 10     TN                  521

Texas has the highest number of recorded events, but Illinois has the highest number of fatalities. About three times as many events were recorded in Texas as in Illinois, yet Illinois still recorded more fatalities. These figures deserve further exploration: which kinds of events occur more often in Texas than in Illinois, what the population of each state is, and whether the affected areas are rural or urban.
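
One way to put the two states on a comparable footing is to express fatalities per 1,000 recorded events. The sketch below is only an illustration: it copies the Texas and Illinois totals from the state tables into a small data frame (`state_totals` and `fatalities_per_1000` are names chosen here) and does not account for population or event type.

```r
# Totals copied from the state tables in this report
state_totals <- data.frame(
  STATE = c("TX", "IL"),
  events = c(83728, 28488),
  fatalities = c(1366, 1421)
)

# Fatalities per 1,000 recorded events
state_totals$fatalities_per_1000 <-
  1000 * state_totals$fatalities / state_totals$events

print(state_totals)
```

With these figures Illinois comes out at roughly three times the Texas rate, which is consistent with the observation above.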

Events most harmful to population health

Top 10 event types

  storm %>%
  group_by(EVTYPE) %>% 
  summarise(Number_of_Events=n()) %>% 
  arrange(desc(Number_of_Events)) %>% 
  head(10)
## # A tibble: 10 × 2
##                EVTYPE Number_of_Events
##                <fctr>            <int>
## 1                HAIL           288661
## 2           TSTM WIND           219940
## 3   THUNDERSTORM WIND            82563
## 4             TORNADO            60652
## 5         FLASH FLOOD            54277
## 6               FLOOD            25326
## 7  THUNDERSTORM WINDS            20843
## 8           HIGH WIND            20212
## 9           LIGHTNING            15754
## 10         HEAVY SNOW            15708

Top 10 event types by number of injuries

  storm %>%
  group_by(EVTYPE) %>% 
  summarise(Number_of_Injuries=sum(INJURIES)) %>% 
  arrange(desc(Number_of_Injuries)) %>% 
  head(10)
## # A tibble: 10 × 2
##               EVTYPE Number_of_Injuries
##               <fctr>              <dbl>
## 1            TORNADO              91346
## 2          TSTM WIND               6957
## 3              FLOOD               6789
## 4     EXCESSIVE HEAT               6525
## 5          LIGHTNING               5230
## 6               HEAT               2100
## 7          ICE STORM               1975
## 8        FLASH FLOOD               1777
## 9  THUNDERSTORM WIND               1488
## 10              HAIL               1361

Top 10 event types by number of fatalities

  storm %>%
  group_by(EVTYPE) %>% 
  summarise(Number_of_Fatalities=sum(FATALITIES)) %>% 
  arrange(desc(Number_of_Fatalities)) %>% 
  head(10)
## # A tibble: 10 × 2
##            EVTYPE Number_of_Fatalities
##            <fctr>                <dbl>
## 1         TORNADO                 5633
## 2  EXCESSIVE HEAT                 1903
## 3     FLASH FLOOD                  978
## 4            HEAT                  937
## 5       LIGHTNING                  816
## 6       TSTM WIND                  504
## 7           FLOOD                  470
## 8     RIP CURRENT                  368
## 9       HIGH WIND                  248
## 10      AVALANCHE                  224

After this brief exploration, tornadoes and excessive heat stand out as the deadliest event types, while hail and thunderstorm winds are the most common.
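
The event-type table lists TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS as separate rows, although they appear to describe the same kind of event. The sketch below shows one possible way to merge such variants before counting. The helper `normalise_evtype` and its mapping are assumptions made for illustration, they cover only these three spellings, and the counts are copied from the table above.

```r
# Counts copied from the "Top 10 event types" table
counts <- data.frame(
  EVTYPE = c("HAIL", "TSTM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS"),
  n = c(288661, 219940, 82563, 20843),
  stringsAsFactors = FALSE
)

# Hypothetical normalisation rule: upper-case, trim whitespace, and map the
# three thunderstorm wind spellings to a single label
normalise_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  x <- sub("^TSTM WIND$", "THUNDERSTORM WIND", x)
  sub("^THUNDERSTORM WINDS$", "THUNDERSTORM WIND", x)
}

counts$EVTYPE_clean <- normalise_evtype(counts$EVTYPE)

# Re-aggregate on the cleaned label and sort by descending count
merged <- aggregate(n ~ EVTYPE_clean, data = counts, FUN = sum)
merged <- merged[order(-merged$n), ]
print(merged)
```

With these counts the merged thunderstorm wind label totals 323,346 events, more than HAIL (288,661), so the ranking of the most common events depends on whether such variants are merged.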

Event types with the greatest economic consequences

Top 10 event types causing the most property damage

  storm %>%
  group_by(EVTYPE) %>% 
  summarise(Tot_Property_Damage=sum(PROPDMG)) %>% 
  arrange(desc(Tot_Property_Damage)) %>% 
  head(10)
## # A tibble: 10 × 2
##                EVTYPE Tot_Property_Damage
##                <fctr>               <dbl>
## 1             TORNADO           3212258.2
## 2         FLASH FLOOD           1420124.6
## 3           TSTM WIND           1335965.6
## 4               FLOOD            899938.5
## 5   THUNDERSTORM WIND            876844.2
## 6                HAIL            688693.4
## 7           LIGHTNING            603351.8
## 8  THUNDERSTORM WINDS            446293.2
## 9           HIGH WIND            324731.6
## 10       WINTER STORM            132720.6

Top 10 event types causing the most crop damage

  storm %>%
  group_by(EVTYPE) %>% 
  summarise(Tot_Crop_Damage=sum(CROPDMG)) %>% 
  arrange(desc(Tot_Crop_Damage)) %>% 
  head(10)
## # A tibble: 10 × 2
##                EVTYPE Tot_Crop_Damage
##                <fctr>           <dbl>
## 1                HAIL       579596.28
## 2         FLASH FLOOD       179200.46
## 3               FLOOD       168037.88
## 4           TSTM WIND       109202.60
## 5             TORNADO       100018.52
## 6   THUNDERSTORM WIND        66791.45
## 7             DROUGHT        33898.62
## 8  THUNDERSTORM WINDS        18684.93
## 9           HIGH WIND        17283.21
## 10         HEAVY RAIN        11122.80

Hail is the most devastating weather event in terms of crop damage, whereas tornadoes cause the most property damage.
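
The totals above sum PROPDMG and CROPDMG as recorded. The dataset also carries the unit columns PROPDMGEXP and CROPDMGEXP, in which "K", "M" and "B" stand for thousands, millions and billions of dollars. The sketch below shows one way to convert a damage value to dollars before summing. The helper `exp_to_multiplier`, the toy rows, and the choice to treat every other code as a multiplier of 1 are assumptions made for illustration, not part of the analysis above.

```r
# Hypothetical rows; in the real data these columns are PROPDMG and PROPDMGEXP
toy <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "FLOOD", "FLOOD"),
  PROPDMG = c(25, 2.5, 1.2, 500),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Map the unit code to a multiplier; codes other than K, M, B fall back to 1
# (this fallback is an assumption, other choices are possible)
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  m <- lookup[toupper(as.character(code))]
  m[is.na(m)] <- 1
  unname(m)
}

# Damage in dollars, then total per event type
toy$damage_usd <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
totals <- aggregate(damage_usd ~ EVTYPE, data = toy, FUN = sum)
print(totals)
```

Applying the same conversion to the full dataset may change the ranking of event types, since a single event recorded in billions outweighs many events recorded in thousands.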

While exploring this dataset I formed an assumption that I’d like to verify: that the ratios fatalities/number of events and injuries/number of events have decreased over the years. A decrease would suggest that we are now better prepared to anticipate and respond to weather events, and so reduce civilian casualties.

Total events recorded from 1960 to 2011

aggrVictims<-storm %>% group_by(BGN_Year) %>%
                       summarise(NbEvents=n(),totFatalities=sum(FATALITIES),
                       ratioEventFat = sum(FATALITIES)/n(),totInjuries=sum(INJURIES),ratioEventInj=sum(INJURIES)/n())

# Years before 1960 are excluded to avoid outliers in the ratios: my guess is that
# only events causing fatalities or injuries were recorded in the early years.
  aggrVictims  %>% 
  filter(BGN_Year>=1960) %>%
  ggplot( aes(x=BGN_Year,y=NbEvents,fill=NbEvents)) + 
  geom_bar(stat="identity") +
  labs(x="Year", y="Number of events",title= "Number of events from 1960 to 2011") + 
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))

The number of recorded events has increased significantly since 1960; many events in the earlier years were probably never recorded.

Ratio of injuries to number of events recorded from 1960 to 2011

  aggrVictims %>% 
  filter(BGN_Year>=1960) %>%
  ggplot( aes(x=BGN_Year,y=ratioEventInj,fill=ratioEventInj)) + 
  geom_bar(stat="identity") +
  labs(x="Year", y="Ratio Injuries",title= "Ratio Injuries per Event from 1960 to 2011") + 
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))

Ratio of fatalities to number of events recorded from 1960 to 2011

  aggrVictims %>% 
  filter(BGN_Year>=1960) %>%
  ggplot( aes(x=BGN_Year,y=ratioEventFat,fill=ratioEventFat)) + 
  geom_bar(stat="identity") +
  labs(x="Year", y="Ratio Fatalities",title= "Ratio Fatalities per Event from 1960 to 2011") + 
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))

Apart from a couple of peaks, the ratio of injuries per event tends to decrease over time. For fatalities the trend is less obvious, but a decrease is still visible. However, I can’t draw any firm conclusion from this result: in my opinion, only major events were recorded in the past, whereas all events are recorded now, which by itself lowers the ratio of fatalities to number of events.
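
A visual impression of a downward trend can be complemented by the slope of a least-squares line through the yearly ratios. The sketch below uses made-up values in a hypothetical data frame `yearly` rather than the real `aggrVictims` table, and it says nothing about the recording bias discussed above. It only illustrates how the sign of the slope could be read.

```r
# Hypothetical yearly ratios (injuries per recorded event), for illustration only
yearly <- data.frame(
  BGN_Year = 1960:1969,
  ratioEventInj = c(0.90, 0.80, 0.85, 0.60, 0.70, 0.50, 0.40, 0.45, 0.30, 0.20)
)

# Least-squares line: ratio as a linear function of year
fit <- lm(ratioEventInj ~ BGN_Year, data = yearly)

# A negative slope means the ratio falls, on average, from one year to the next
slope <- unname(coef(fit)["BGN_Year"])
print(slope)
```

The same call could be run on `aggrVictims` filtered to years from 1960 onwards, with `ratioEventFat` substituted to examine fatalities.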

Results

Health Impact

aggrVictimsEvtType<-storm %>% group_by(EVTYPE) %>%
                              summarise(totFatalities=sum(FATALITIES))

aggrVictimsEvtType$label<-paste(aggrVictimsEvtType$EVTYPE, aggrVictimsEvtType$totFatalities, sep = "\n ")

treemap(aggrVictimsEvtType,
        index="label",
        vSize="totFatalities",
        vColor="totFatalities",
        type="value",
        title="Total Fatalities caused by weather events in the U.S. since 1950")

Economic Impact

# Note: PROPDMG is summed as recorded, without applying the PROPDMGEXP unit code
aggrPrpDmgcEvtType<-storm %>% group_by(EVTYPE) %>%
                              summarise(totPropDmg=round(sum(PROPDMG)/1000000,2))

aggrPrpDmgcEvtType$label<-paste(aggrPrpDmgcEvtType$EVTYPE, paste(aggrPrpDmgcEvtType$totPropDmg,"M$"), sep = "\n ")

treemap(aggrPrpDmgcEvtType,
        index="label",
        vSize="totPropDmg",
        vColor="totPropDmg",
        type="value",
        title="Total Property damage caused by weather events in the U.S. since 1950")

# Note: CROPDMG is summed as recorded, without applying the CROPDMGEXP unit code
aggrCropDmgcEvtType<-storm %>% group_by(EVTYPE) %>%
                              summarise(totCropDmg=round(sum(CROPDMG)/1000,2))

aggrCropDmgcEvtType$label<-paste(aggrCropDmgcEvtType$EVTYPE, paste(aggrCropDmgcEvtType$totCropDmg,"K$"), sep = "\n ")

treemap(aggrCropDmgcEvtType,
        index="label",
        vSize="totCropDmg",
        vColor="totCropDmg",
        type="value",
        title="Total Crop damage caused by weather events in the U.S. since 1950")

Conclusion

The top 3 deadliest event types are tornado, excessive heat and flash flood. Tornado and flash flood are also among the top 3 event types causing the most property damage, together with thunderstorm wind. Finally, the top 3 event types causing the most crop damage are hail, flash flood and flood.