Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. Here, we will explore the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

We will try to address the following questions: Across the United States, which types of events are most harmful with respect to population health? Across the United States, which types of events have the greatest economic consequences?

Data Processing

Load the required libraries and read the data into R. We also look at the first few rows to get a feel for what information the data set contains.

library(ggplot2)
library(dplyr)

StormData1<-read.csv(file="repdata_data_StormData.csv.bz2",header=TRUE)
head(StormData1)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6

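Now we start cleaning the data. First we add two columns, pd and cd, that hold property damage and crop damage in dollars: PROPDMG and CROPDMG are scaled by the multiplier encoded in PROPDMGEXP and CROPDMGEXP (H = hundreds, K = thousands, M = millions, B = billions). Rows with any other exponent code keep a damage of 0. Then we drop the columns that are not required for the analysis by making a subset with the event type and its consequences.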

# Multipliers for the exponent codes; rows with any other code keep a damage of 0
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Property damage in dollars
StormData1$pd <- 0
pidx <- StormData1$PROPDMGEXP %in% names(multiplier)
StormData1$pd[pidx] <- StormData1$PROPDMG[pidx] * multiplier[as.character(StormData1$PROPDMGEXP[pidx])]

# Crop damage in dollars
StormData1$cd <- 0
cidx <- StormData1$CROPDMGEXP %in% names(multiplier)
StormData1$cd[cidx] <- StormData1$CROPDMG[cidx] * multiplier[as.character(StormData1$CROPDMGEXP[cidx])]
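
The conversion above recognises only the upper-case codes H, K, M and B; rows with any other code keep a damage of 0. As a minimal sketch, and assuming that lower-case codes such as "k" or "m" should be treated like their upper-case counterparts (an assumption, not something the code above does), a named lookup vector can do the scaling in one step. The helper `scale_damage` and the toy data frame are hypothetical and only for illustration.

```r
# Hypothetical helper: scale a damage value by its exponent code.
# Assumption: lower-case codes mean the same as upper-case ones;
# any unrecognised code (including an empty string) yields 0.
scale_damage <- function(value, exp_code) {
  multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  m <- unname(multiplier[toupper(as.character(exp_code))])
  m[is.na(m)] <- 0
  value * m
}

# Toy data, not taken from the storm database
toy <- data.frame(PROPDMG = c(25, 2.5, 3, 1, 7),
                  PROPDMGEXP = c("K", "m", "B", "", "?"),
                  stringsAsFactors = FALSE)
toy$pd <- scale_damage(toy$PROPDMG, toy$PROPDMGEXP)
```

On the real data the same helper could be called as `scale_damage(StormData1$PROPDMG, StormData1$PROPDMGEXP)`; whether treating lower-case codes this way is appropriate is a judgment call left to the analyst.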

# Keep only the columns needed for the analysis
StormData2 <- StormData1[, c("STATE", "EVTYPE", "FATALITIES", "INJURIES", "pd", "cd")]
# Combine fatalities and injuries into HEALTHCON
StormData2$HEALTHCON <- StormData2$FATALITIES + StormData2$INJURIES
# Combine property and crop damage into ECONOCON
StormData2$ECONOCON <- StormData2$pd + StormData2$cd
StormData2 <- StormData2[, c("STATE", "EVTYPE", "HEALTHCON", "ECONOCON")]
names(StormData2) <- c("STATE", "EVENTTYPE", "HEALTHCON", "ECONOCON")

Have a look at the top few rows of this new data set to check if any further cleaning is required.

head(StormData2)
##   STATE EVENTTYPE HEALTHCON ECONOCON
## 1    AL   TORNADO        15    25000
## 2    AL   TORNADO         0     2500
## 3    AL   TORNADO         2    25000
## 4    AL   TORNADO         2     2500
## 5    AL   TORNADO         2     2500
## 6    AL   TORNADO         6     2500
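
One way to check whether further cleaning is needed is to count missing and negative values in the two consequence columns. The sketch below does this on a toy data frame that copies the six rows printed above; the names `toy`, `na_counts` and `neg_counts` are hypothetical, and on the real data one would use `StormData2` in place of `toy`.

```r
# Toy copy of the six rows shown by head(StormData2)
toy <- data.frame(STATE = rep("AL", 6),
                  EVENTTYPE = rep("TORNADO", 6),
                  HEALTHCON = c(15, 0, 2, 2, 2, 6),
                  ECONOCON = c(25000, 2500, 25000, 2500, 2500, 2500))

num_cols <- c("HEALTHCON", "ECONOCON")

# Number of missing values per column
na_counts <- colSums(is.na(toy[, num_cols]))

# Number of negative values per column (counts and damages should not be negative)
neg_counts <- colSums(toy[, num_cols] < 0, na.rm = TRUE)
```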

Results

To find out which event types caused the most harm to the population and the most economic damage, we sum the health and economic consequences by event type. Since the data set lists many event types, we keep only the top 10 in each case.

HealthConData <-aggregate(HEALTHCON ~ EVENTTYPE, data = StormData2, sum)
EconoConData <-aggregate(ECONOCON ~ EVENTTYPE, data = StormData2, sum)

HealthConData <- HealthConData[order(-HealthConData$HEALTHCON), ][1:10, ]
EconoConData <- EconoConData[order(-EconoConData$ECONOCON), ][1:10, ]
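
Since dplyr is already loaded, the same aggregate-then-rank step can also be written as a pipeline. This is a sketch on a toy data frame (the values and the names `toy` and `top_health` are made up for illustration), not the code used to produce the results in this report.

```r
library(dplyr)

# Toy data, not taken from the storm database
toy <- data.frame(EVENTTYPE = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
                  HEALTHCON = c(15, 3, 2, 0, 4),
                  stringsAsFactors = FALSE)

# Sum by event type, sort in descending order, keep the top 2
top_health <- toy %>%
  group_by(EVENTTYPE) %>%
  summarise(HEALTHCON = sum(HEALTHCON)) %>%
  arrange(desc(HEALTHCON)) %>%
  head(2)
```

Replacing `head(2)` with `head(10)` and `toy` with `StormData2` would give a top-10 table comparable to the one built with `aggregate()` and `order()`.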

For a clear comparison, let's plot these events against their consequences for population health and for the economy.

ggplot(HealthConData, aes(x = EVENTTYPE, y = HEALTHCON)) +
  geom_bar(stat = "identity", fill = "blue", colour = "black") +
  xlab("Event Type") +
  ylab("Health Damage") +
  ggtitle("Top 10 Weather Events by Population Health Consequences") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

ggplot(EconoConData, aes(x = EVENTTYPE, y = ECONOCON)) +
  geom_bar(stat = "identity", fill = "red", colour = "black") +
  xlab("Event Type") +
  ylab("Economic Damage") +
  ggtitle("Top 10 Weather Events by Economic Consequences") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
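
ggplot2 orders a character x-axis alphabetically (and a factor by its levels), so the bars in the plots above are not sorted by size. A minimal sketch, assuming one wants the bars in descending order, wraps the event type in `reorder()` inside `aes()`. The toy values and the name `p` are made up for illustration.

```r
library(ggplot2)

# Toy data, not taken from the storm database
toy <- data.frame(EVENTTYPE = c("FLOOD", "HAIL", "TORNADO"),
                  HEALTHCON = c(7, 1, 17))

# reorder() sorts the event types by descending HEALTHCON;
# geom_col() is equivalent to geom_bar(stat = "identity")
p <- ggplot(toy, aes(x = reorder(EVENTTYPE, -HEALTHCON), y = HEALTHCON)) +
  geom_col(fill = "blue", colour = "black") +
  labs(x = "Event Type", y = "Health Damage") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
```

Printing `p` draws the chart with the tallest bar first, which makes the ranking easier to read than the alphabetical order.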

Thus we see that tornadoes have the greatest effect on population health (fatalities plus injuries), and floods have the greatest economic consequences (property plus crop damage).