This analysis is based on storm and weather-event data provided by the U.S. National Oceanic and Atmospheric Administration (NOAA). The data record when and where each event occurred, together with estimates of fatalities, injuries, and property and crop damage. The goal of this analysis is to find out which types of events are most harmful to public health and to the economy. After analyzing the data for the period from August 2001 to August 2004, we conclude that, for public health, excessive heat caused the most fatalities while tornadoes caused the most injuries; for the economy, hurricanes/typhoons caused the most property damage while tornadoes caused the most crop damage. Overall, tornadoes had the greatest impact on public health, and hurricanes/typhoons the greatest impact on the economy, in the U.S. from August 2001 to August 2004.
The data were downloaded from the Storm Data database on Thu Aug 21, 2014. We keep the downloaded file in its compressed .csv.bz2 form to save disk space, and we pass colClasses to read.csv() so that only the needed columns are read, with every other column marked "NULL" and skipped, which reduces loading time. Because of limited computing resources, we read only a slice of 100,000 observations, starting after the first 700,000 lines of the file.
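# Download the compressed file once (commented out so it is not re-fetched on every run)
#download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",destfile="repdata-data-StormData.csv.bz2")
# Read only the 9 relevant columns of the 37; "NULL" tells read.csv() to skip a column
classes<-rep("NULL",37)
classes[c(23,24,25,27)]<-"numeric"  # FATALITIES, INJURIES, PROPDMG, CROPDMG
classes[c(1,8,26,28)]<-"factor"     # STATE__, EVTYPE, PROPDMGEXP, CROPDMGEXP
classes[2]<-"character"             # BGN_DATE, parsed into dates below
# header=FALSE: after skipping 700,000 lines the first line read is data, not the header
Storm <- read.csv("repdata-data-StormData.csv.bz2", header=FALSE, skip=700000, nrows=100000, colClasses=classes)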
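The reading strategy can be tried on a tiny inline CSV before touching the full file. The sketch below is illustrative only: the three-column text and its values are hypothetical, and it simply shows how colClasses = "NULL", skip, nrows and header = FALSE interact in read.csv().

```r
# Hypothetical miniature CSV: a header line followed by four data lines
csv_text <- "STATE,EVTYPE,FATALITIES
AL,TORNADO,1
TX,FLOOD,0
OK,HAIL,0
KS,TORNADO,2"

# Skip the EVTYPE column entirely; read STATE as factor, FATALITIES as numeric
classes <- c("factor", "NULL", "numeric")

# skip = 2 drops the header plus the first data line, so header = FALSE is
# required; otherwise the next data line would be consumed as column names
slice <- read.csv(text = csv_text, header = FALSE, skip = 2, nrows = 2,
                  colClasses = classes)
colnames(slice) <- c("State", "Fatalities")
print(slice)
```

Only the TX and OK rows are returned, with two columns, because the skipped column never enters memory.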
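The columns considered relevant to the analysis are STATE__ (numeric state code), BGN_DATE (event start date), EVTYPE (event type), FATALITIES, INJURIES, PROPDMG and PROPDMGEXP (property damage and its magnitude code), and CROPDMG and CROPDMGEXP (crop damage and its magnitude code).
For convenience we rename them as follows: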
CNames<-c("State","Date","Event","Fatalities","Injuries","PropDamage","PropDamageExp","CropDamage","CropDamageExp")
colnames(Storm)<-CNames
We also convert the dates into R's POSIXlt date-time format:
# BGN_DATE is stored as month/day/year followed by a time, so the month is read with %m
Storm$Date<-strptime(Storm$Date,format="%m/%d/%Y %H:%M:%S")
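A quick way to confirm a date format is to parse a few sample strings and inspect the result. The sketch below uses hypothetical BGN_DATE-style strings (month/day/year, then a time of day). Per the R documentation for strptime(), an unspecified month is filled with the current one, so a format that never reads the month can silently return the wrong month rather than an error.

```r
# Hypothetical BGN_DATE-style strings (month/day/year, then a time of day)
raw <- c("4/18/1950 0:00:00", "12/3/2001 0:00:00")

# %m reads the month and %d the day; leading zeros are optional on input
parsed <- strptime(raw, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Keep only the calendar date, which is all the analysis needs
dates <- as.Date(parsed)
print(dates)
```

Both strings come back with their true month (April and December), which is a simple sanity check to run before trusting the first and last dates of the slice.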
We check the start and end dates of the selected data:
##Examine the Dates
Storm$Date[1]
## [1] "2001-08-22 EDT"
Storm$Date[100000]
## [1] "2004-08-17 EDT"
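So we will analyze the impacts of weather-related events in the U.S. from August 2001 to August 2004.
We list the state codes in which the events took place to check that the data cover the U.S. geographically. Note that the State column holds numeric state codes (STATE__), so the levels below are codes rather than state abbreviations.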
##Examine the States distribution
levels(Storm$State)
## [1] "1.00" "10.00" "11.00" "12.00" "13.00" "15.00" "16.00" "17.00"
## [9] "18.00" "19.00" "2.00" "20.00" "21.00" "22.00" "23.00" "24.00"
## [17] "25.00" "26.00" "27.00" "28.00" "29.00" "30.00" "31.00" "32.00"
## [25] "33.00" "34.00" "35.00" "36.00" "37.00" "38.00" "39.00" "4.00"
## [33] "40.00" "41.00" "42.00" "44.00" "45.00" "46.00" "47.00" "48.00"
## [41] "49.00" "5.00" "50.00" "51.00" "53.00" "54.00" "55.00" "56.00"
## [49] "6.00" "60.00" "66.00" "72.00" "78.00" "8.00" "81.00" "83.00"
## [57] "84.00" "85.00" "86.00" "87.00" "88.00" "89.00" "9.00" "90.00"
## [65] "91.00" "92.00" "93.00" "94.00" "95.00"
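levels() only lists which state codes occur; it says nothing about how evenly the events are spread among them. Counting rows per code with table() gives the distribution. The sketch below uses a small hypothetical vector of codes; on the real data the same calls would apply to Storm$State.

```r
# Hypothetical state codes in the same "n.00" form as the STATE__ column
state <- factor(c("48.00", "48.00", "20.00", "1.00", "48.00", "20.00"))

# Number of events recorded per state code, most frequent first
counts <- sort(table(state), decreasing = TRUE)
print(counts)

# Share of all events (in percent) contributed by each state code
share <- round(100 * counts / sum(counts), 1)
print(share)
```

A few codes dominating the percentages would indicate that the slice is geographically skewed.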
To determine which event types have the greatest impact on public health, we sum Fatalities and Injuries by Event:
pub.health.sum<-aggregate(cbind(Fatalities, Injuries)~Event,data=Storm,FUN=sum)
The event types that caused the most fatalities are listed below (head() prints the top six):
Fsum<-pub.health.sum[with(pub.health.sum, order(-Fatalities)),]
head(Fsum)
## Event Fatalities Injuries
## 21 EXCESSIVE HEAT 214 626
## 85 TORNADO 139 2347
## 26 FLASH FLOOD 127 147
## 59 LIGHTNING 121 664
## 71 RIP CURRENTS 56 96
## 70 RIP CURRENT 54 48
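RIP CURRENTS and RIP CURRENT appear as separate rows although they describe the same event type, so their fatalities are split between two entries. One way to merge such variants is to normalize the labels before aggregating. The mapping below is a hypothetical example covering only this pair, applied to made-up totals; a full cleanup would need a longer, manually checked list.

```r
# Hypothetical per-event-type totals, including spelling and case variants
totals <- data.frame(
  Event = c("RIP CURRENTS", "RIP CURRENT", "TORNADO", " tornado"),
  Fatalities = c(2, 3, 5, 1)
)

# Normalize case and surrounding whitespace first
clean <- toupper(trimws(totals$Event))

# Map known variants onto one canonical label (illustrative mapping only)
synonyms <- c("RIP CURRENTS" = "RIP CURRENT")
hit <- clean %in% names(synonyms)
clean[hit] <- synonyms[clean[hit]]
totals$Event <- clean

# Re-aggregate so each canonical event type appears once
merged <- aggregate(Fatalities ~ Event, data = totals, FUN = sum)
print(merged)
```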
The event types that caused the most injuries are listed below (head() prints the top six):
Isum<-pub.health.sum[with(pub.health.sum, order(-Injuries)),]
head(Isum)
## Event Fatalities Injuries
## 85 TORNADO 139 2347
## 52 HURRICANE/TYPHOON 30 1114
## 59 LIGHTNING 121 664
## 89 TSTM WIND 44 649
## 21 EXCESSIVE HEAT 214 626
## 102 WILDFIRE 24 303
To determine which event types have the greatest economic impact, we sum PropDamage and CropDamage by Event. The damage amounts are scaled by a magnitude code in the corresponding Exp column ("K" for thousands, "M" for millions, "B" for billions of dollars), and these codes differ between records; hence we only consider the records with the highest code in each category: "B" for PropDamage and "M" for CropDamage.
Prop.sum<-aggregate(PropDamage~Event+PropDamageExp,data=Storm,subset=(PropDamageExp=="B"),FUN=sum)
Psum<-Prop.sum[with(Prop.sum, order(-PropDamage)),]
head(Psum)
## Event PropDamageExp PropDamage
## 3 HURRICANE/TYPHOON B 16.75
## 2 HIGH WIND B 1.30
## 4 WILDFIRE B 1.04
## 1 FLASH FLOOD B 1.00
# Keep only crop-damage records whose magnitude code is "M" (millions)
Crop.sum<-aggregate(CropDamage~Event+CropDamageExp,data=Storm,subset=(CropDamageExp=="M"),FUN=sum)
Csum<-Crop.sum[with(Crop.sum, order(-CropDamage)),]
head(Csum)
## Event CropDamageExp CropDamage
## 33 TORNADO K 2053
## 28 FLASH FLOOD K 1970
## 31 HIGH WIND K 1800
## 29 FLOOD K 1060
## 32 HURRICANE/TYPHOON K 1001
## 35 TSTM WIND K 950
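Keeping only the highest magnitude code discards every record stored with a smaller code. An alternative, sketched below, converts every record to dollars before summing. It assumes that "K", "M" and "B" (in either case) mean thousands, millions and billions of dollars and, as a further assumption, treats blank or unrecognized codes as a multiplier of 1. The helper to_dollars() and the sample records are hypothetical.

```r
# Hypothetical helper: scale a damage value by its magnitude code.
# Assumes K = thousand, M = million, B = billion (case-insensitive);
# blank or unrecognized codes are treated as a multiplier of 1.
to_dollars <- function(value, exp) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  code <- toupper(trimws(as.character(exp)))
  mult <- unname(multipliers[code])
  mult[is.na(mult)] <- 1
  value * mult
}

# Hypothetical records mixing different magnitude codes
dmg <- data.frame(
  Event = c("FLOOD", "FLOOD", "HAIL", "TORNADO"),
  PropDamage = c(2.5, 300, 50, 1.2),
  PropDamageExp = c("B", "M", "K", "b")
)

# Convert to dollars, then total and rank by event type
dmg$PropDollars <- to_dollars(dmg$PropDamage, dmg$PropDamageExp)
by_event <- aggregate(PropDollars ~ Event, data = dmg, FUN = sum)
by_event <- by_event[order(-by_event$PropDollars), ]
print(by_event)
```

Here FLOOD ranks first only because its "B" and "M" records are added on a common dollar scale, which a filter on a single code would miss.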
We use pie charts to show how the top event types compare with one another for each measure of impact on public health and the economy; note that each chart shows shares among the top events only, not among all weather-related events.
## Pie charts of the top event types by fatalities, injuries, property damage and crop damage
pie(Fsum$Fatalities[1:5],labels=Fsum$Event[1:5],col=rainbow(5),main="Pie Chart of top 5 total fatalities by event",radius=1)
pie(Isum$Injuries[1:5],labels=Isum$Event[1:5],col=rainbow(5),main="Pie Chart of top 5 total injuries by event",radius=1)
pie(Psum$PropDamage,labels=Psum$Event,col=rainbow(4),main="Pie Chart of top 4 total property damages by event",radius=1)
pie(Csum$CropDamage[1:5],labels=Csum$Event[1:5],col=rainbow(5),main="Pie Chart of top 5 total crop damages by event",radius=1)
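A pie chart of the top five shows each event's share among those five only. To report each event's share of the total across all event types, the percentages can be computed directly. The sketch assumes a data frame shaped like pub.health.sum (columns Event and Fatalities); the values are hypothetical.

```r
# Hypothetical totals per event type
health <- data.frame(
  Event = c("HEAT", "TORNADO", "FLOOD", "HAIL"),
  Fatalities = c(40, 30, 20, 10)
)

# Share of all fatalities, in percent, across every event type
health$Pct <- round(100 * health$Fatalities / sum(health$Fatalities), 1)

# Share within the top 2 only, which is what a pie of the top 2 displays
top <- head(health[order(-health$Fatalities), ], 2)
top$PctOfTop <- round(100 * top$Fatalities / sum(top$Fatalities), 1)
print(top)
```

HEAT accounts for 40% of all fatalities in this toy data but about 57% of the slice drawn in a top-two pie, which is why the two percentages should not be confused.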
From the above analysis, we draw the following conclusions about the U.S. from August 2001 to August 2004: