This analysis is based on storm and weather-event data provided by the U.S. National Oceanic and Atmospheric Administration (NOAA). The data record when and where each event happened, along with estimates of fatalities, injuries, and property and crop damage. The goal of this analysis is to find out which event types affect public health and the economy most heavily. After analyzing the data for the period from August 2001 to August 2004, we conclude that: for public health, excessive heat caused the most fatalities, while tornadoes caused the most injuries; for the economy, hurricanes/typhoons caused the most property damage, while tornadoes caused the most crop damage. Overall, tornadoes and hurricanes/typhoons were the events with the greatest impact on public health and on the economy, respectively, in the U.S. from August 2001 to August 2004.

Data Processing

Load the data

The data were downloaded from the NOAA Storm Data file on Thu Aug 21, 2014. We keep the file in its original compressed .csv.bz2 form to save disk space, and we pass a colClasses vector to read.csv() so that only the nine columns we need are loaded, each with an explicit type ("NULL" skips a column), which shortens loading time. Because of limited computing resources, we read only a slice of 100,000 observations, starting after the first 700,000 lines of the file.

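#download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",destfile="repdata-data-StormData.csv.bz2")
# Column classes: "NULL" skips a column, so only 9 of the 37 columns are kept
classes<-rep("NULL",37)
classes[c(23,24,25,27)]<-"numeric"   # FATALITIES, INJURIES, PROPDMG, CROPDMG
classes[c(1,8,26,28)]<-"factor"      # STATE__, EVTYPE, PROPDMGEXP, CROPDMGEXP
classes[c(2)]<-"character"           # BGN_DATE
# header=FALSE: after skipping 700,000 lines, the next line is data, not column names
Storm <- read.csv("repdata-data-StormData.csv.bz2", skip=700000, nrows=100000, header=FALSE, colClasses=classes)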
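To illustrate how colClasses and skip interact, here is a minimal sketch on a made-up four-column CSV (the column names A to D and all values are hypothetical, not taken from the storm file). An entry of "NULL" drops the column entirely, and when skip jumps past the header line, header = FALSE keeps read.csv() from consuming the first remaining data row as column names.

```r
# Toy CSV standing in for the (much larger) storm file; the data are made up
csv <- "A,B,C,D\n1,x,2.5,y\n2,z,3.5,w\n"

# "NULL" tells read.csv to skip a column; the other entries fix each column's type
classes <- c("numeric", "NULL", "numeric", "character")

# Read the whole toy file: column B is never loaded
toy <- read.csv(text = csv, colClasses = classes)
str(toy)

# Skip the header line and read a single data row; header = FALSE stops
# read.csv from treating that row as column names
slice <- read.csv(text = csv, skip = 1, nrows = 1, header = FALSE,
                  colClasses = classes)
slice
```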

Data formatting

The columns relevant to the analysis are:

  • STATE__: a numeric code identifying the state
  • BGN_DATE: the date the event began
  • EVTYPE: a string giving the event type
  • FATALITIES: the number of direct and indirect weather-related fatalities
  • INJURIES: the number of direct and indirect weather-related injuries
  • PROPDMG: the estimated property damage, in the units given by PROPDMGEXP
  • PROPDMGEXP: the multiplier for PROPDMG, one of “”, “K” (thousand), “M” (million), or “B” (billion)
  • CROPDMG: the estimated crop damage, in the units given by CROPDMGEXP
  • CROPDMGEXP: the multiplier for CROPDMG, one of “”, “K” (thousand), or “M” (million)

For convenience we rename the above columns as follows:

CNames<-c("State","Date","Event","Fatalities","Injuries","PropDamage","PropDamageExp","CropDamage","CropDamageExp")
colnames(Storm)<-CNames

We also convert the dates to R's POSIXlt date-time class with strptime(); the raw values are month/day/year strings such as "4/18/1950 0:00:00":

Storm$Date<-strptime(Storm$Date,format="%m/%d/%Y %H:%M:%S")
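A quick way to check a format string before applying it to the full column is to parse a couple of sample values. This sketch assumes BGN_DATE is stored as month/day/year followed by a time, as in "4/18/1950 0:00:00"; the second string is made up. Since %e and %d both denote the day of the month, a format containing neither %m nor %b never parses the month, and ?strptime states that an unspecified month is taken to be the current one.

```r
# Two sample strings in the assumed BGN_DATE layout (month/day/year time)
x <- c("4/18/1950 0:00:00", "12/31/2004 0:00:00")

# %m = month, %d = day, %Y = four-digit year; tz is fixed so the result is reproducible
d <- strptime(x, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Print as ISO dates to confirm month and day landed in the right fields
format(d, "%Y-%m-%d")
```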

Data Examination

We check the start and end dates of the selected data:

##Examine the Dates
Storm$Date[1]
## [1] "2001-08-22 EDT"
Storm$Date[100000]
## [1] "2004-08-17 EDT"

So we analyze the impacts of weather-related events in the U.S. from August 2001 to August 2004.
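Printing the first and last rows gives the date span only if the file is sorted by date. A sketch with made-up dates shows that range() returns the earliest and latest values regardless of row order; it also accepts the POSIXlt values produced by strptime().

```r
# Made-up dates that are deliberately not in chronological order
d <- as.Date(c("2002-05-01", "2001-03-15", "2004-01-20", "2003-11-02"))

# First and last elements depend on row order...
c(first = d[1], last = d[length(d)])

# ...while range() returns the true earliest and latest dates
range(d)
```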

We also list the state codes present in the sample, to check that it covers the U.S. broadly rather than only a few states:

##Examine the States distribution
levels(Storm$State)
##  [1] "1.00"  "10.00" "11.00" "12.00" "13.00" "15.00" "16.00" "17.00"
##  [9] "18.00" "19.00" "2.00"  "20.00" "21.00" "22.00" "23.00" "24.00"
## [17] "25.00" "26.00" "27.00" "28.00" "29.00" "30.00" "31.00" "32.00"
## [25] "33.00" "34.00" "35.00" "36.00" "37.00" "38.00" "39.00" "4.00" 
## [33] "40.00" "41.00" "42.00" "44.00" "45.00" "46.00" "47.00" "48.00"
## [41] "49.00" "5.00"  "50.00" "51.00" "53.00" "54.00" "55.00" "56.00"
## [49] "6.00"  "60.00" "66.00" "72.00" "78.00" "8.00"  "81.00" "83.00"
## [57] "84.00" "85.00" "86.00" "87.00" "88.00" "89.00" "9.00"  "90.00"
## [65] "91.00" "92.00" "93.00" "94.00" "95.00"
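levels() only shows which codes occur, not how evenly the events are spread among them. One way to check the spread is to count events per code; the sketch below does this with table() and prop.table() on made-up codes written in the same "NN.00" form as Storm$State.

```r
# Made-up state codes in the same "NN.00" form as Storm$State
state <- factor(c("1.00", "48.00", "6.00", "48.00", "1.00", "48.00"))

# Count events per state code, largest first
counts <- sort(table(state), decreasing = TRUE)
counts

# Share of events per state code, in percent
round(100 * prop.table(counts), 1)
```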

Data analysis

Question 1: Which types of events are most harmful with respect to population health?

To determine which events have the greatest impact on public health, we sum Fatalities and Injuries by Event:

pub.health.sum<-aggregate(cbind(Fatalities, Injuries)~Event,data=Storm,FUN=sum)
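For readers unfamiliar with the formula interface, this sketch runs the same aggregate(cbind(...) ~ Event) call on a made-up data frame, then sorts and keeps the top rows. Two details are worth knowing: the formula method of aggregate() drops rows with NA in any of the variables by default (na.action = na.omit), and head() returns six rows unless n is given.

```r
# Made-up event records
toy <- data.frame(
  Event      = c("TORNADO", "HEAT", "TORNADO", "HEAT", "FLOOD"),
  Fatalities = c(2, 5, 1, 0, 4),
  Injuries   = c(10, 1, 4, 2, 0)
)

# One row per Event, with Fatalities and Injuries each summed
agg <- aggregate(cbind(Fatalities, Injuries) ~ Event, data = toy, FUN = sum)

# Sort by Fatalities (descending) and keep the top 2 rows
top <- head(agg[order(-agg$Fatalities), ], n = 2)
top
```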

The events with the most fatalities (top six rows) are:

Fsum<-pub.health.sum[with(pub.health.sum, order(-Fatalities)),]
head(Fsum)
##             Event Fatalities Injuries
## 21 EXCESSIVE HEAT        214      626
## 85        TORNADO        139     2347
## 26    FLASH FLOOD        127      147
## 59      LIGHTNING        121      664
## 71   RIP CURRENTS         56       96
## 70    RIP CURRENT         54       48
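The table lists RIP CURRENTS (56 fatalities) and RIP CURRENT (54) separately, although they appear to be the same event type recorded under two labels; combined they would total 110 and still rank fifth among the rows shown. A minimal normalization sketch, using a hypothetical mapping rule, might look like this:

```r
# A few event labels, including the plural/singular pair from the table
evt <- c("RIP CURRENTS", "RIP CURRENT", "TSTM WIND", " tornado ")

# Normalize case and strip surrounding whitespace
norm <- toupper(trimws(evt))

# Hypothetical mapping rule: fold the plural label into the singular one
norm <- sub("^RIP CURRENTS$", "RIP CURRENT", norm)
norm
```

A real cleanup would need a fuller mapping table, since EVTYPE contains many spelling variants.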

The events with the most injuries (top six rows) are:

Isum<-pub.health.sum[with(pub.health.sum, order(-Injuries)),]
head(Isum)
##                 Event Fatalities Injuries
## 85            TORNADO        139     2347
## 52  HURRICANE/TYPHOON         30     1114
## 59          LIGHTNING        121      664
## 89          TSTM WIND         44      649
## 21     EXCESSIVE HEAT        214      626
## 102          WILDFIRE         24      303

Question 2: Which types of events have the greatest economic consequences?

To determine which events have the greatest economic impact, we sum PropDamage and CropDamage by Event. Because the values carry different exponents, sums across exponents are not directly comparable; we therefore keep only the records with the highest exponent in each category: “B” for PropDamage and “M” for CropDamage.
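Keeping only the highest exponent discards every record reported in smaller units. An alternative, not used in this analysis, is to convert each value to dollars before summing. The sketch below assumes the exponent codes are only "", "K", "M" and "B", as described in the column list; the lookup table mult and the helper to_dollars() are hypothetical names, and any other code yields NA rather than a guessed multiplier.

```r
# Hypothetical lookup; assumes the exponent codes are "", "K", "M" or "B"
mult <- c(K = 1e3, M = 1e6, B = 1e9)

to_dollars <- function(value, exp) {
  # Accept factors or characters, and treat "k" and "K" alike
  key <- toupper(as.character(exp))
  # An empty code means no scaling; unknown codes give NA instead of a guess
  m <- ifelse(key == "", 1, mult[key])
  value * m
}

to_dollars(c(16.75, 2053, 5), c("B", "K", ""))
```

With every row in dollars, the aggregate() calls could sum all records instead of a single exponent class.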

Prop.sum<-aggregate(PropDamage~Event+PropDamageExp,data=Storm,subset=(PropDamageExp=="B"),FUN=sum)
Psum<-Prop.sum[with(Prop.sum, order(-PropDamage)),]
head(Psum)
##               Event PropDamageExp PropDamage
## 3 HURRICANE/TYPHOON             B      16.75
## 2         HIGH WIND             B       1.30
## 4          WILDFIRE             B       1.04
## 1       FLASH FLOOD             B       1.00
Crop.sum<-aggregate(CropDamage~Event+CropDamageExp,data=Storm,subset=(CropDamageExp=="M"),FUN=sum)
Csum<-Crop.sum[with(Crop.sum, order(-CropDamage)),]
head(Csum)
##                Event CropDamageExp CropDamage
## 33           TORNADO             K       2053
## 28       FLASH FLOOD             K       1970
## 31         HIGH WIND             K       1800
## 29             FLOOD             K       1060
## 32 HURRICANE/TYPHOON             K       1001
## 35         TSTM WIND             K        950

Exploratory Analysis

We use pie charts to compare the top events in their impact on public health and the economy. Each chart shows shares among the top events only, not among all weather-related events.
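Because the pies show shares within the top events only, writing the percentage into each label makes that explicit. This sketch reuses the top-five fatality totals from the Fsum output above; the label layout is just one possible choice.

```r
# Top-5 fatality totals copied from the Fsum output above
vals <- c(214, 139, 127, 121, 56)
labs <- c("EXCESSIVE HEAT", "TORNADO", "FLASH FLOOD", "LIGHTNING", "RIP CURRENTS")

# Share of each event within these five (not within all events)
pct <- round(100 * vals / sum(vals), 1)

# One label per slice, with its share appended
pie(vals, labels = paste0(labs, " (", pct, "%)"), col = rainbow(length(vals)),
    main = "Top 5 events by total fatalities (share of top 5)")
```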

## Pie charts of the top 5 events by total fatalities and injuries
pie(Fsum$Fatalities[1:5],labels=Fsum$Event[1:5],col=rainbow(5),main="Pie Chart of top 5 total fatalities by event",radius=1)

[Figure: Pie Chart of top 5 total fatalities by event]

pie(Isum$Injuries[1:5],labels=Isum$Event[1:5],col=rainbow(5),main="Pie Chart of top 5 total injuries by event",radius=1)

[Figure: Pie Chart of top 5 total injuries by event]

pie(Psum$PropDamage,labels=Psum$Event,col=rainbow(4),main="Pie Chart of top 4 total property damages by event",radius=1)

[Figure: Pie Chart of top 4 total property damages by event]

pie(Csum$CropDamage[1:5],labels=Csum$Event[1:5],col=rainbow(5),main="Pie Chart of top 5 total crop damages by event",radius=1)

[Figure: Pie Chart of top 5 total crop damages by event]

Result

From the above analysis, we draw the following conclusions for the U.S. from August 2001 to August 2004: