Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. Here, we will explore the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
We will try to address the following questions: Across the United States, which types of events are most harmful with respect to population health? Across the United States, which types of events have the greatest economic consequences?
Load the required libraries and read the data into R. We then look at the first few rows to get a feel for the information available in the data set.
library(ggplot2)
library(dplyr)
StormData1<-read.csv(file="repdata_data_StormData.csv.bz2",header=TRUE)
head(StormData1)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
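Beyond head(), the overall shape of the data and the codes stored in the exponent columns can be checked with dim(), str() and table(). The sketch below uses a small made-up data frame (the name toy and all of its values are assumptions for illustration, not rows from the NOAA file) so that it runs without the data file.

```r
# Toy stand-in for a few columns of the storm data (values are made up).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL", "FLOOD"),
  PROPDMG    = c(25, 2.5, 1, 3),
  PROPDMGEXP = c("K", "M", "", "k"),
  stringsAsFactors = FALSE
)

# Number of rows and columns.
dims <- dim(toy)
print(dims)

# Column names, types and the first few values of each column.
str(toy)

# Count how often each exponent code occurs, including the empty string.
exp_counts <- table(toy$PROPDMGEXP, useNA = "ifany")
print(exp_counts)
```

Running the same three calls on StormData1 would show which exponent codes need to be handled during cleaning.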
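Now we start cleaning the data. First we add two columns, pd and cd, holding property damage and crop damage in dollars: each damage figure is multiplied by the value of its exponent code (H = hundred, K = thousand, M = million, B = billion), and rows with any other code are left at 0. Then we drop the columns that are not required for the analysis by making a subset with the state, the event type and its consequences.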
StormData1$pd <- 0
StormData1[StormData1$PROPDMGEXP == "H", ]$pd <- StormData1[StormData1$PROPDMGEXP == "H", ]$PROPDMG * 100
StormData1[StormData1$PROPDMGEXP == "K", ]$pd <- StormData1[StormData1$PROPDMGEXP == "K", ]$PROPDMG * 1000
StormData1[StormData1$PROPDMGEXP == "M", ]$pd <- StormData1[StormData1$PROPDMGEXP == "M", ]$PROPDMG * 1000000
StormData1[StormData1$PROPDMGEXP == "B", ]$pd <- StormData1[StormData1$PROPDMGEXP == "B", ]$PROPDMG * 1000000000
StormData1$cd <- 0
StormData1[StormData1$CROPDMGEXP == "H", ]$cd <- StormData1[StormData1$CROPDMGEXP == "H", ]$CROPDMG * 100
StormData1[StormData1$CROPDMGEXP == "K", ]$cd <- StormData1[StormData1$CROPDMGEXP == "K", ]$CROPDMG * 1000
StormData1[StormData1$CROPDMGEXP == "M", ]$cd <- StormData1[StormData1$CROPDMGEXP == "M", ]$CROPDMG * 1000000
StormData1[StormData1$CROPDMGEXP == "B", ]$cd <- StormData1[StormData1$CROPDMGEXP == "B", ]$CROPDMG * 1000000000
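# Keep only the columns needed for the analysis
StormData2 <- StormData1[, c("STATE", "EVTYPE", "FATALITIES", "INJURIES", "pd", "cd")]
# Combine fatalities and injuries into one health total (renamed HEALTHCON below)
StormData2$FATALITIES <- StormData2$FATALITIES + StormData2$INJURIES
# Combine property and crop damage into one economic total (renamed ECONOCON below)
StormData2$pd <- StormData2$pd + StormData2$cd
# Drop the now-redundant INJURIES (column 4) and cd (column 6), then rename
StormData2 <- StormData2[, -c(4, 6)]
names(StormData2) <- c("STATE", "EVENTTYPE", "HEALTHCON", "ECONOCON")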
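The exponent conversion above can also be written once as a lookup, which avoids repeating one assignment per code. This is only a sketch on made-up values: the names toy, multiplier and to_dollars are hypothetical and do not appear in the analysis, and the lookup covers the same four codes with every other code mapped to 0.

```r
# Toy stand-in for the damage columns (values are made up).
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 3, 7),
  PROPDMGEXP = c("K", "M", "", "B", "k"),
  stringsAsFactors = FALSE
)

# Exponent code -> multiplier, covering the same four codes as the analysis.
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Convert a damage value and its exponent code to dollars.
# Codes missing from the lookup (blank, lowercase, digits) give 0.
to_dollars <- function(value, code) {
  m <- unname(multiplier[match(code, names(multiplier))])
  m[is.na(m)] <- 0
  value * m
}

toy$pd <- to_dollars(toy$PROPDMG, toy$PROPDMGEXP)
print(toy)
```

Note that the lowercase "k" in the last toy row contributes 0, exactly as it would in the code above. If lowercase codes should count, passing toupper(code) to match() would be one way to include them, at the cost of changing the totals.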
We look at the first few rows of this new data set to check whether any further cleaning is required.
head(StormData2)
## STATE EVENTTYPE HEALTHCON ECONOCON
## 1 AL TORNADO 15 25000
## 2 AL TORNADO 0 2500
## 3 AL TORNADO 2 25000
## 4 AL TORNADO 2 2500
## 5 AL TORNADO 2 2500
## 6 AL TORNADO 6 2500
To find out which event types caused the most harm to the population and which caused the most economic damage, we sum the health and economic consequences by event type. Since the data set lists a large number of event types, we keep only the top 10 in each case.
HealthConData <-aggregate(HEALTHCON ~ EVENTTYPE, data = StormData2, sum)
EconoConData <-aggregate(ECONOCON ~ EVENTTYPE, data = StormData2, sum)
HealthConData <- HealthConData[order(-HealthConData$HEALTHCON), ][1:10, ]
EconoConData <- EconoConData[order(-EconoConData$ECONOCON), ][1:10, ]
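The aggregate-then-sort pattern can be tried on a small made-up table first. In this sketch the data frame toy and its values are assumptions for illustration, and head() is used in place of the [1:10, ] index, since head() returns fewer rows instead of NA rows when there are fewer groups than requested.

```r
# Toy stand-in for StormData2 (values are made up).
toy <- data.frame(
  EVENTTYPE = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD", "HEAT"),
  HEALTHCON = c(15, 2, 6, 0, 1, 4),
  stringsAsFactors = FALSE
)

# Sum the health consequences within each event type.
totals <- aggregate(HEALTHCON ~ EVENTTYPE, data = toy, FUN = sum)

# Sort by total, largest first, and keep the top two event types.
top2 <- head(totals[order(-totals$HEALTHCON), ], 2)
print(top2)
```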
For a clear comparison, let's plot these events and their consequences for population health and for the economy.
ggplot(HealthConData, aes(x = EVENTTYPE, y = HEALTHCON)) +
  geom_bar(stat = "identity", fill = "blue", col = "black") +
  xlab("Event Type") + ylab("Health Damage") +
  ggtitle("Top 10 Weather Events by Population Health Consequences") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
ggplot(EconoConData, aes(x = EVENTTYPE, y = ECONOCON)) +
  geom_bar(stat = "identity", fill = "red", col = "black") +
  xlab("Event Type") + ylab("Economic Damage") +
  ggtitle("Top 10 Weather Events by Economic Consequences") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
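In the charts above the bars appear in alphabetical order of event type. One possible refinement, sketched here on made-up totals, is to order the bars by size with reorder() so that the largest event type comes first. The data frame toy and the variable bar_order are hypothetical names used only for this illustration.

```r
# Toy totals standing in for HealthConData (values are made up).
toy <- data.frame(
  EVENTTYPE = c("FLOOD", "HEAT", "TORNADO"),
  HEALTHCON = c(30, 20, 100),
  stringsAsFactors = FALSE
)

# reorder() returns a factor whose levels are sorted by the second argument;
# negating the totals puts the largest value first.
toy$EVENTTYPE <- reorder(toy$EVENTTYPE, -toy$HEALTHCON)
bar_order <- levels(toy$EVENTTYPE)
print(bar_order)

# Draw the chart only if ggplot2 is installed; bars follow the factor levels.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = EVENTTYPE, y = HEALTHCON)) +
    geom_col(fill = "blue", colour = "black") +
    labs(x = "Event Type", y = "Health Damage")
  print(p)
}
```

ggplot2 places discrete x values in the order of the factor levels, so reordering the factor is enough to reorder the bars.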
Thus we see that tornadoes cause the most harm to population health (fatalities and injuries combined), while floods have the greatest economic consequences (property and crop damage combined).