SYNOPSIS

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

This report identifies the event type most harmful to population health and the event type with the greatest economic consequences. Based on the analysis, I conclude that tornadoes (EVTYPE "TORNADO") had the greatest impact on both: 96,979 fatalities and injuries combined, and about $3,312,277,000 in property and crop losses.

DATA
The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:

Storm Data [47Mb]

There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.

National Weather Service Storm Data Documentation

National Climatic Data Center Storm Events FAQ

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
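The claim about sparser early years can be checked by counting events per calendar year. The following is only a minimal sketch on toy values: the "month/day/year time" layout assumed for BGN_DATE and the sample dates themselves are illustrative assumptions, not taken from the data set.

```r
# Toy stand-in for the BGN_DATE column; the "m/d/Y H:M:S" layout is an assumption.
bgn_date <- c("4/18/1950 0:00:00", "6/8/1951 0:00:00",
              "11/15/2011 0:00:00", "11/28/2011 0:00:00", "3/2/2011 0:00:00")

# Parse only the date part (trailing time text is ignored) and extract the year.
year <- as.integer(format(as.Date(bgn_date, format = "%m/%d/%Y"), "%Y"))

# Count events per year; on real data this table could be passed to barplot().
events_per_year <- table(year)
print(events_per_year)
```

On the full data frame, the same steps would presumably start from data$BGN_DATE once the file has been loaded.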

DATA PROCESSING
I load the original “repdata_data_StormData.csv.bz2” file into a data frame called “data” using read.csv(bzfile()), then list its column names.

data <- read.csv(bzfile("repdata_data_StormData.csv.bz2"))
names(data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
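To illustrate how read.csv(bzfile()) reads a bzip2-compressed CSV without a separate decompression step, here is a small round-trip sketch. It uses a toy data frame and a temporary file (both made up for illustration) rather than the NOAA file.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (toy data, not the NOAA file).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  FATALITIES = c(2, 0),
                  INJURIES = c(10, 1))
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the bzfile() connection, decompresses on the fly, and closes it.
toy_back <- read.csv(bzfile(path), stringsAsFactors = FALSE)
str(toy_back)

# Remove the temporary file.
unlink(path)
```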

RESULTS
The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. I use the database to answer the questions below and show the code for my entire analysis.

Question 1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

For this question, I use both FATALITIES and INJURIES to measure the impact on population health. First, I create a new variable that adds the two together.

data$health <- data$FATALITIES + data$INJURIES

Then, I aggregate the data based on the event types.

aggdata_health <- aggregate(health ~ EVTYPE, data = data, sum)
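As a small illustration of the sum-then-aggregate step, the sketch below applies the same two calls to a toy data frame. The values are invented for illustration only; the column names mirror those used in the report.

```r
# Toy data with the same column names as the storm data (values are made up).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "LIGHTNING"),
  FATALITIES = c(1, 0, 3, 1),
  INJURIES   = c(20, 2, 15, 0)
)

# Combined health impact per record.
toy$health <- toy$FATALITIES + toy$INJURIES

# Total health impact per event type, sorted from largest to smallest.
agg <- aggregate(health ~ EVTYPE, data = toy, FUN = sum)
agg <- agg[order(-agg$health), ]
print(agg)
```

Note that the formula interface of aggregate() drops rows with missing values by default, so records with an NA in health or EVTYPE would not contribute to the totals.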

Finally, I take the five most harmful event types, group all remaining types as “others”, and show the result in a pie chart.

health_top5 <- head(aggdata_health[order(-aggdata_health$health), ], 5)
others <- data.frame(EVTYPE = as.factor("others"), 
                     health = sum(aggdata_health$health) - sum(health_top5$health))
health_top6 <- rbind(health_top5, others)
health_top6
##             EVTYPE health
## 834        TORNADO  96979
## 130 EXCESSIVE HEAT   8428
## 856      TSTM WIND   7461
## 170          FLOOD   7259
## 464      LIGHTNING   6046
## 1           others  29500
pie(health_top6$health, labels = health_top6$EVTYPE, 
    main = "Top 5 Harmful Events and Others", radius = 1)

The pie chart shows that tornadoes had the most harmful population health consequences: 96,979 fatalities and injuries combined, more than all other event types put together.
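The "top five plus others" table could also be built by a small reusable helper that reports each slice's percentage share. This is only a sketch: the function name top_n_with_others and the toy input are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: keep the n largest rows and lump the rest into "others".
top_n_with_others <- function(agg, value, n = 5) {
  ranked <- agg[order(-agg[[value]]), ]
  top <- head(ranked, n)
  top$EVTYPE <- as.character(top$EVTYPE)

  # One extra row holding the total of everything outside the top n.
  rest <- data.frame(EVTYPE = "others", stringsAsFactors = FALSE)
  rest[[value]] <- sum(ranked[[value]]) - sum(top[[value]])

  out <- rbind(top[, c("EVTYPE", value)], rest)
  # Percentage share of each row, rounded to one decimal place.
  out$share <- round(100 * out[[value]] / sum(out[[value]]), 1)
  out
}

# Toy input (values are made up).
toy <- data.frame(EVTYPE = c("A", "B", "C", "D"), health = c(50, 30, 15, 5))
res <- top_n_with_others(toy, "health", n = 2)
print(res)
```

Called as top_n_with_others(aggdata_health, "health"), the helper should give the same six rows as health_top6, with the share column available for pie labels.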

Question 2: Across the United States, which types of events have the greatest economic consequences?

For the economic consequences question, I turn to two other variables, PROPDMG and CROPDMG, which record property damage and crop damage, respectively. As before, I create a new variable that sums the two.

data$econ <- data$PROPDMG + data$CROPDMG

Then, I aggregate the data based on the event types.

aggdata_econ <- aggregate(econ ~ EVTYPE, data = data, sum)

Finally, I show the five event types with the largest losses in a bar chart.

econ_top5 <- head(aggdata_econ[order(-aggdata_econ$econ), ], 5)
econ_top5
##          EVTYPE    econ
## 834     TORNADO 3312277
## 153 FLASH FLOOD 1599325
## 856   TSTM WIND 1445168
## 244        HAIL 1268290
## 170       FLOOD 1067976
barplot(econ_top5$econ, 
        col = c("red", "green", "green", "green", "green"),
        names.arg = econ_top5$EVTYPE, cex.names = 0.65,
        xlab = "Event Types", ylab = "Loss, $1K",
        main="Top 5 Loss Events")

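The bar plot above also shows tornadoes as the costliest event type, with about $3,312,277,000 in losses when every PROPDMG and CROPDMG value is read as thousands of dollars (the plot's "$1K" unit). The PROPDMGEXP and CROPDMGEXP magnitude codes are not applied in this sum.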
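The storm data documentation describes PROPDMGEXP and CROPDMGEXP as magnitude codes ("K" for thousands, "M" for millions, "B" for billions). A possible refinement, sketched here on toy data, would scale each damage value by its code before summing. The helper name exp_multiplier is hypothetical, and treating blank or unrecognised codes as a multiplier of 1 is an assumption rather than something the documentation specifies.

```r
# Hypothetical helper: map magnitude codes to numeric multipliers.
exp_multiplier <- function(code) {
  code <- toupper(as.character(code))
  mult <- rep(1, length(code))   # assumption: blank/unknown codes mean no scaling
  mult[code %in% "K"] <- 1e3
  mult[code %in% "M"] <- 1e6
  mult[code %in% "B"] <- 1e9
  mult
}

# Toy data with the same column names as the storm data (values are made up).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "FLOOD"),
  PROPDMG    = c(25, 2.5, 500),
  PROPDMGEXP = c("K", "B", ""),
  CROPDMG    = c(0, 10, 0),
  CROPDMGEXP = c("", "M", ""),
  stringsAsFactors = FALSE
)

# Damage in dollars: value times the multiplier implied by its code.
toy$econ_usd <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP) +
                toy$CROPDMG * exp_multiplier(toy$CROPDMGEXP)

agg <- aggregate(econ_usd ~ EVTYPE, data = toy, FUN = sum)
print(agg)
```

Applied to the real data, this scaling could change the ranking, since a single value coded "B" outweighs many values coded "K".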