This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
In this report, I identify the event type most harmful to population health and the event type with the greatest economic consequences. Based on the analysis below, I conclude that tornadoes had the greatest impact on both: 96,979 combined fatalities and injuries, and about $3,312,277,000 in property and crop damage when the recorded damage values are read in thousands of dollars.
DATA
The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:
Storm Data [47Mb]
There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
DATA PROCESSING
First, I load the original “repdata_data_StormData.csv.bz2” file into a data frame called “data” using read.csv(bzfile()), and then list its column names.
data <- read.csv(bzfile("repdata_data_StormData.csv.bz2"))
names(data)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
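As a side illustration of the loading step, the following minimal sketch shows that read.csv() can read through a bzfile() connection without decompressing the file by hand. It uses a small toy data frame and a temporary file, both of which are assumptions made for the example and are not part of the storm data set.

```r
# Toy data standing in for the real storm file (values are made up)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  FATALITIES = c(2, 0),
                  INJURIES = c(5, 1))

# Write the toy data through a bzip2 connection to a temporary file
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back the same way the report reads the storm data
toy_back <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)

# Setting a column class to "NULL" skips that column while reading,
# which can save memory when only a few variables are needed
toy_small <- read.csv(bzfile(tmp),
                      colClasses = c("character", "numeric", "NULL"))
names(toy_small)
```

With the full file, the same colClasses idea could be used to keep only the columns the analysis needs, although the report itself reads every column.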
RESULTS
The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. I use the database to answer the questions below and show the code for my entire analysis.
Question 1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
For this question, I use both FATALITIES and INJURIES to measure the impact on population health. First, I create a new variable, health, that sums the two.
# Combined count of people killed or injured in each event
data$health <- data$FATALITIES + data$INJURIES
Then, I aggregate the data based on the event types.
aggdata_health <- aggregate(health ~ EVTYPE, data = data, sum)
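To make the aggregation step concrete, here is a small sketch on toy data (the rows are invented for illustration). It shows how the formula interface of aggregate() sums one column within each level of another, and how order() then ranks the groups from largest to smallest.

```r
# Toy events; the counts are made up for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(1, 0, 2, 0, 3),
  INJURIES   = c(10, 2, 5, 0, 1)
)

# Same construction as in the report: fatalities plus injuries
toy$health <- toy$FATALITIES + toy$INJURIES

# One row per event type, with the health column summed within each type
agg <- aggregate(health ~ EVTYPE, data = toy, sum)

# Rank event types from most to least harmful
ranked <- agg[order(-agg$health), ]
ranked
```

Note that the formula interface drops rows with missing values in the listed variables by default, so NA handling is implicit in this step.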
Finally, I take the five most harmful event types, group all remaining types as "others", and show the result in a pie chart.
health_top5 <- head(aggdata_health[order(-aggdata_health$health), ], 5)
others <- data.frame(EVTYPE = as.factor("others"),
health = sum(aggdata_health$health) - sum(health_top5$health))
health_top6 <- rbind(health_top5, others)
health_top6
## EVTYPE health
## 834 TORNADO 96979
## 130 EXCESSIVE HEAT 8428
## 856 TSTM WIND 7461
## 170 FLOOD 7259
## 464 LIGHTNING 6046
## 1 others 29500
pie(health_top6$health, labels=health_top6$EVTYPE,
main="Top 5 Harmful Events and Others", radius = 1)
The pie chart shows that tornadoes were the most harmful event type for population health, with 96,979 combined fatalities and injuries, more than all of the other event types together.
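The comparison with all other event types can be checked directly from the table printed above. The sketch below re-enters those six totals by hand (the vector name health_totals is introduced here only for illustration) and computes each slice's share of the pie.

```r
# Totals copied from the health_top6 output above
health_totals <- c(TORNADO = 96979, `EXCESSIVE HEAT` = 8428,
                   `TSTM WIND` = 7461, FLOOD = 7259,
                   LIGHTNING = 6046, others = 29500)

# Everything that is not a tornado, added together
non_tornado <- sum(health_totals) - health_totals[["TORNADO"]]

# Percentage share of each slice, rounded to one decimal place
share <- round(100 * health_totals / sum(health_totals), 1)

non_tornado
share
```

Tornadoes account for 96,979 of the 155,673 recorded fatalities and injuries, or roughly 62%, against 58,694 for every other event type combined.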
Question 2: Across the United States, which types of events have the greatest economic consequences?
For the economic consequences question, I turn to two other variables, PROPDMG and CROPDMG, which record property damage and crop damage, respectively. As before, I create a new variable, econ, that sums the two.
data$econ <- data$PROPDMG + data$CROPDMG
Next, I aggregate the data based on the event types.
aggdata_econ <- aggregate(econ ~ EVTYPE, data = data, sum)
Finally, I show the five event types with the largest losses in a bar chart.
econ_top5 <- head(aggdata_econ[order(-aggdata_econ$econ), ], 5)
econ_top5
## EVTYPE econ
## 834 TORNADO 3312277
## 153 FLASH FLOOD 1599325
## 856 TSTM WIND 1445168
## 244 HAIL 1268290
## 170 FLOOD 1067976
barplot(econ_top5$econ,
col = c("red", "green", "green", "green", "green"),
names.arg = econ_top5$EVTYPE, cex.names = 0.65,
xlab = "Event Types", ylab = "Loss, $1K",
main="Top 5 Loss Events")
The bar plot likewise shows tornadoes with the largest economic loss, about $3,312,277,000 when the summed values are read in thousands of dollars.
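The sum above reads every PROPDMG and CROPDMG value in the same unit. The data set also carries the PROPDMGEXP and CROPDMGEXP columns, which the storm data documentation uses as magnitude codes (K for thousands, M for millions, B for billions). The following is a minimal sketch, on invented toy rows, of how those codes could be applied before summing. The helper exp_to_multiplier() is hypothetical, and treating blank or unrecognized codes as a multiplier of 1 is an assumption, not a rule taken from the documentation.

```r
# Toy rows; the damage figures are made up for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL", "FLOOD"),
  PROPDMG    = c(25, 1.5, 500, 10),
  PROPDMGEXP = c("K", "B", "K", "M"),
  CROPDMG    = c(0, 2, 50, 0),
  CROPDMGEXP = c("", "M", "K", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map a magnitude code to a numeric multiplier
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1  # assumption: blank or unknown codes mean "as recorded"
  unname(mult)
}

# Damage in dollars, property plus crops
toy$econ_usd <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * exp_to_multiplier(toy$CROPDMGEXP)

# Aggregate and rank exactly as in the report
agg_usd <- aggregate(econ_usd ~ EVTYPE, data = toy, sum)
agg_usd[order(-agg_usd$econ_usd), ]
```

Applying the same idea to the full data frame would require deciding how to handle the less common codes found in the real columns, so any totals obtained this way depend on that choice.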