Population and Economic Impact Analysis of Weather Events in the United States (1950 - 2011)

Synopsis

This analysis identifies the 10 weather event types with the greatest impact over roughly the past 60 years (1950 to 2011). Population health impact is measured as the total number of fatalities and the total number of injuries. Economic impact is measured using property damage and crop damage as indicators. Tornadoes have the largest population and economic impact over the period. The next largest combined impacts come from thunderstorm wind (recorded as TSTM WIND) and floods.

Data Processing

  • Loading required packages
library(dplyr)
library(ggplot2)
library(reshape2)
# The unzipped data file should be available in the current working directory
filepath <- file.path(getwd(), "repdata_data_StormData.csv")
stormdata <- read.csv(filepath, stringsAsFactors = FALSE)
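Because the load step depends on the unzipped file being present, a small guard can make a missing file easier to diagnose. The following is a minimal sketch only; the helper name `load_storm_data` is hypothetical and is not part of the original analysis.

```r
# Hypothetical helper (not from the original analysis): read the storm data
# CSV, stopping early with a clear message if the file is missing.
load_storm_data <- function(path) {
  if (!file.exists(path)) {
    stop("Storm data file not found: ", path)
  }
  read.csv(path, stringsAsFactors = FALSE)
}

# Usage, assuming the unzipped file is in the current working directory:
# stormdata <- load_storm_data(file.path(getwd(), "repdata_data_StormData.csv"))
```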
  • Extracting the columns needed to evaluate which event types are most harmful to population health
  • Calculating the combined total of fatalities and injuries
  • Summarizing the data by event type
  • Keeping only the records with a total count > 0
  • Arranging the data set in descending order of total count
stormsub <- select(stormdata, EVTYPE, FATALITIES, INJURIES) %>% 
            group_by(EVTYPE) %>% 
            mutate(TOTAL = FATALITIES + INJURIES) %>% 
            summarise(fatalities = sum(FATALITIES), 
                      injuries = sum(INJURIES), 
                      total = sum(TOTAL)) %>% 
            filter(total > 0) %>% 
            arrange(desc(total))
  • Identifying top 10 most impactful events based on total count
maxfatalandinjuries <- top_n(stormsub, 10, wt = total)
maxfatalandinjuries
## Source: local data frame [10 x 4]
## 
##               EVTYPE fatalities injuries total
##                (chr)      (dbl)    (dbl) (dbl)
## 1            TORNADO       5633    91346 96979
## 2     EXCESSIVE HEAT       1903     6525  8428
## 3          TSTM WIND        504     6957  7461
## 4              FLOOD        470     6789  7259
## 5          LIGHTNING        816     5230  6046
## 6               HEAT        937     2100  3037
## 7        FLASH FLOOD        978     1777  2755
## 8          ICE STORM         89     1975  2064
## 9  THUNDERSTORM WIND        133     1488  1621
## 10      WINTER STORM        206     1321  1527
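The table above lists TSTM WIND and THUNDERSTORM WIND as separate event types, so their counts are split across rows. One possible way to merge such spelling variants before summarizing is sketched below. The helper `normalize_evtype`, its pattern, and the toy data are illustrative assumptions, not part of the original analysis, and base R is used so the sketch runs without extra packages.

```r
# Hypothetical helper: map a few spelling variants of EVTYPE onto one label.
# The single pattern below is illustrative, not an exhaustive cleaning scheme.
normalize_evtype <- function(evtype) {
  x <- toupper(trimws(evtype))
  x <- gsub("\\s+", " ", x)  # collapse repeated whitespace
  x[grepl("^(TSTM|THUNDERSTORM) WINDS?$", x)] <- "THUNDERSTORM WIND"
  x
}

# Made-up toy rows standing in for the EVTYPE, FATALITIES and INJURIES columns
toy <- data.frame(
  EVTYPE = c("TSTM WIND", "Thunderstorm Winds", "THUNDERSTORM WIND", "TORNADO"),
  FATALITIES = c(1, 2, 3, 4),
  INJURIES = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

toy$EVTYPE <- normalize_evtype(toy$EVTYPE)

# Sum both measures per normalized event type
merged <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
merged
```

Applying a mapping like this before group_by(EVTYPE) would change the ranking shown above, so the published table should be read as reflecting the raw labels.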
  • Melting the data set into long format to plot a bar graph of the consolidated data
  • Plotting the data set
meltmaxfatalandinjuries <- melt(maxfatalandinjuries, 
                                id.vars = "EVTYPE",
                                variable.name = "type",
                                value.name = "count")
plot1 <- ggplot(meltmaxfatalandinjuries, 
               aes(x = EVTYPE, y = count))
plot1 + geom_bar( stat = "identity", 
                  position = "dodge", 
                  aes(fill = as.factor(type)), 
                  color = "steelblue") + 
        labs(fill = "Type of Impact") +
        coord_flip() +
        labs(x = "Event Type", y = "Count") +
        ggtitle("Total Population Impact")
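ggplot2 orders a character x variable alphabetically, so the bars in the plot above do not follow the ranking in the table. A minimal sketch of one way to order them by total is to convert EVTYPE to a factor whose levels are sorted by total. The data frame `toy` is a three-row stand-in for maxfatalandinjuries, using totals from the table above.

```r
# Three-row stand-in for maxfatalandinjuries (totals taken from the table above)
toy <- data.frame(
  EVTYPE = c("FLOOD", "TORNADO", "LIGHTNING"),
  total = c(7259, 96979, 6046),
  stringsAsFactors = FALSE
)

# Sort factor levels by ascending total; with coord_flip() the last level
# is drawn at the top, so the largest total ends up as the top bar.
toy$EVTYPE <- factor(toy$EVTYPE, levels = toy$EVTYPE[order(toy$total)])
levels(toy$EVTYPE)
```

If the same conversion is applied to maxfatalandinjuries before melting, the plotting code can stay unchanged.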

  • Extracting the columns needed to evaluate the economic impact
  • Calculating the combined total of property damage and crop damage (the raw PROPDMG and CROPDMG values are summed as-is; the PROPDMGEXP and CROPDMGEXP magnitude columns are selected but not applied)
  • Summarizing the data by event type
  • Keeping only the records with a total > 0
  • Arranging the data set in descending order of total
stormsub1 <- select(stormdata, EVTYPE, PROPDMG:CROPDMGEXP) %>% 
             group_by(EVTYPE) %>% 
             mutate(TOTAL = PROPDMG + CROPDMG) %>% 
             summarise(propertydamage = sum(PROPDMG), 
                       cropdamage = sum(CROPDMG), 
                       total = sum(TOTAL)) %>% 
            filter(total > 0) %>% 
            arrange(desc(total))
  • Identifying top 10 most impactful events based on total count
maxpropandcrop <- top_n(stormsub1, 10, wt = total)
maxpropandcrop
## Source: local data frame [10 x 4]
## 
##                EVTYPE propertydamage cropdamage     total
##                 (chr)          (dbl)      (dbl)     (dbl)
## 1             TORNADO      3212258.2  100018.52 3312276.7
## 2         FLASH FLOOD      1420124.6  179200.46 1599325.1
## 3           TSTM WIND      1335965.6  109202.60 1445168.2
## 4                HAIL       688693.4  579596.28 1268289.7
## 5               FLOOD       899938.5  168037.88 1067976.4
## 6   THUNDERSTORM WIND       876844.2   66791.45  943635.6
## 7           LIGHTNING       603351.8    3580.61  606932.4
## 8  THUNDERSTORM WINDS       446293.2   18684.93  464978.1
## 9           HIGH WIND       324731.6   17283.21  342014.8
## 10       WINTER STORM       132720.6    1978.99  134699.6
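The code above selects PROPDMGEXP and CROPDMGEXP but sums the raw PROPDMG and CROPDMG values, so the figures in the table are not dollar amounts. A minimal sketch of applying the magnitude codes before summing follows. It assumes K, M and B mean thousands, millions and billions, and treats every other code as a multiplier of 1, which is a simplification. The helper `exp_multiplier` and the toy rows are hypothetical.

```r
# Hypothetical helper: convert a magnitude code to a numeric multiplier.
# Assumed coding: K = thousands, M = millions, B = billions.
# Any other code (blank, digits, symbols) is treated as 1 in this sketch.
exp_multiplier <- function(code) {
  codes <- c("K", "M", "B")
  values <- c(1e3, 1e6, 1e9)
  mult <- values[match(toupper(as.character(code)), codes)]
  mult[is.na(mult)] <- 1
  mult
}

# Made-up toy rows standing in for the damage columns of stormdata
toy <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "FLOOD"),
  PROPDMG = c(25, 2.5, 5),
  PROPDMGEXP = c("K", "M", "B"),
  CROPDMG = c(0, 10, 1),
  CROPDMGEXP = c("", "k", "M"),
  stringsAsFactors = FALSE
)

# Scale each raw value by its multiplier, then sum per event type
toy$propertydamage <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
toy$cropdamage <- toy$CROPDMG * exp_multiplier(toy$CROPDMGEXP)
damage <- aggregate(cbind(propertydamage, cropdamage) ~ EVTYPE, data = toy, FUN = sum)
damage$total <- damage$propertydamage + damage$cropdamage
damage
```

With scaled values, dividing by 10^6 for the plot would give millions of dollars; the ranking of event types could differ from the table above.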
  • Melting the data set into long format to plot a bar graph of the consolidated data
  • Plotting the data set
meltmaxpropandcrop <- melt(maxpropandcrop, 
                           id.vars = "EVTYPE",
                           variable.name = "type",
                           value.name = "cost")

plot2 <- ggplot(meltmaxpropandcrop, 
               aes(x = EVTYPE, y = cost/10^6))
plot2 + geom_bar( stat = "identity", 
                  position = "dodge", 
                  aes(fill = as.factor(type))) + 
        labs(fill = "Damage Type") +
        coord_flip() +
        labs(x = "Event Type", y = "Total Cost (Million USD)") +
        ggtitle("Total Economic Impact of Weather Events")

Results

  • Tornadoes are the most severe cause of both population health and economic consequences. After tornadoes, the next two categories are wind and flooding. Further analysis of this data by location and over time may yield additional insights.
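As a starting point for the time-based analysis suggested above, the sketch below totals fatalities per year and event type. It assumes a date column named BGN_DATE stored as text in month/day/year form; neither that column nor its format appears in the code in this report, and the toy rows are made up for illustration.

```r
# Made-up toy rows; BGN_DATE and its format are assumptions about the data set
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "6/8/1950 0:00:00", "5/25/2011 0:00:00"),
  EVTYPE = c("TORNADO", "TORNADO", "TORNADO"),
  FATALITIES = c(0, 1, 3),
  stringsAsFactors = FALSE
)

# Parse the date part (the trailing time is ignored) and extract the year
toy$year <- as.integer(format(as.Date(toy$BGN_DATE, format = "%m/%d/%Y"), "%Y"))

# Total fatalities per year and event type
by_year <- aggregate(FATALITIES ~ year + EVTYPE, data = toy, FUN = sum)
by_year
```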