This analysis identifies the types of severe weather events that are most harmful to population health and those with the greatest economic consequences.
The report is intended for a government or municipal manager who might be responsible for preparing for severe weather events. It presents the code and graphs used to identify the events that have caused the most harm to human health, and the events that have had the greatest economic consequences, in the United States. All graphs are based on data from the NOAA Storm Database.
Loading data into R:
The code below loads the repdata-data-StormData.csv file into a variable ‘data’. It then subsets this into a variable ‘health’ containing only the EVTYPE, FATALITIES and INJURIES columns, and converts the event names to lower case so that differently capitalized entries can be grouped together.
data<-read.csv("/Users/arushigulati/Desktop/Coursera/Peer Assessment 2 Reproducible Research/repdata-data-StormData.csv")
health<-subset(data, select=c(EVTYPE, FATALITIES, INJURIES))
health$EVTYPE<-tolower(health$EVTYPE)
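The three loading steps can also be wrapped in a small function so they are easy to rerun. This is a minimal sketch: load_health() is a hypothetical helper, and the tiny inline CSV stands in for the full StormData file.

```r
# Hypothetical helper: load the CSV, keep the health columns, lower-case EVTYPE.
# Arguments are passed straight to read.csv(), so either a file path or text= works.
load_health <- function(...) {
  data <- read.csv(..., stringsAsFactors = FALSE)
  # Keep only the columns needed for the health analysis
  health <- subset(data, select = c(EVTYPE, FATALITIES, INJURIES))
  # Lower-case event names so "TORNADO" and "Tornado" fall into one group
  health$EVTYPE <- tolower(health$EVTYPE)
  health
}

# Made-up demo input with the same column names as the real file
demo_csv <- paste("EVTYPE,FATALITIES,INJURIES,PROPDMG",
                  "TORNADO,1,5,10",
                  "Tornado,2,0,0",
                  "HEAT,0,3,0",
                  sep = "\n")
health_demo <- load_health(text = demo_csv)
print(health_demo)
```

With the real data the call would be load_health("repdata-data-StormData.csv"), adjusting the path to wherever the file is stored.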
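The code below combines similar event types into broader categories. The replacements run in order, so a later pattern can overwrite an earlier label: for example, "tstm wind" also contains "wind" and is therefore relabeled "strong winds" by a later line.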
health$EVTYPE[grep("tstm", health$EVTYPE)]<-"tstm wind"
health$EVTYPE[grep("coastal", health$EVTYPE)]<-"coastal flooding"
health$EVTYPE[grep("drought", health$EVTYPE)]<-"drought"
health$EVTYPE[grep("microburst", health$EVTYPE)]<-"microburst"
health$EVTYPE[grep("heat", health$EVTYPE)]<-"extreme heat"
health$EVTYPE[grep("warm", health$EVTYPE)]<-"extreme heat"
health$EVTYPE[grep("cold", health$EVTYPE)]<-"extreme cold"
health$EVTYPE[grep("winter", health$EVTYPE)]<-"extreme cold"
health$EVTYPE[grep("wintry", health$EVTYPE)]<-"extreme cold"
health$EVTYPE[grep("freeze", health$EVTYPE)]<-"extreme cold"
health$EVTYPE[grep("low temperature", health$EVTYPE)]<-"extreme cold"
health$EVTYPE[grep("flash", health$EVTYPE)]<-"floods"
health$EVTYPE[grep("flood", health$EVTYPE)]<-"floods"
health$EVTYPE[grep("fld", health$EVTYPE)]<-"floods"
health$EVTYPE[grep("fog", health$EVTYPE)]<-"fog"
health$EVTYPE[grep("freezing", health$EVTYPE)]<-"freezing rain"
health$EVTYPE[grep("glaze", health$EVTYPE)]<-"glaze"
health$EVTYPE[grep("gusty", health$EVTYPE)]<-"strong winds"
health$EVTYPE[grep("heavy", health$EVTYPE)]<-"heavy rain/snow/surf"
health$EVTYPE[grep("high", health$EVTYPE)]<-"heavy rain/snow/surf"
health$EVTYPE[grep("hurricane", health$EVTYPE)]<-"hurricane"
health$EVTYPE[grep("ice", health$EVTYPE)]<-"ice"
health$EVTYPE[grep("icy", health$EVTYPE)]<-"ice"
health$EVTYPE[grep("storm", health$EVTYPE)]<-"storms"
health$EVTYPE[grep("dust", health$EVTYPE)]<-"storms"
health$EVTYPE[grep("wind", health$EVTYPE)]<-"strong winds"
health$EVTYPE[grep("thunderstorm", health$EVTYPE)]<-"thunderstorm"
health$EVTYPE[grep("landslide", health$EVTYPE)]<-"landslide"
health$EVTYPE[grep("lightning", health$EVTYPE)]<-"lightning"
health$EVTYPE[grep("tropical", health$EVTYPE)]<-"tropical storm"
health$EVTYPE[grep("hypothermia", health$EVTYPE)]<-"hypothermia/exposure"
health$EVTYPE[grep("marine", health$EVTYPE)]<-"marine accident/wind"
health$EVTYPE[grep("mudslide", health$EVTYPE)]<-"mudslide"
health$EVTYPE[grep("rain", health$EVTYPE)]<-"heavy rain/snow/surf"
health$EVTYPE[grep("current", health$EVTYPE)]<-"heavy rain/snow/surf"
health$EVTYPE[grep("rough", health$EVTYPE)]<-"heavy rain/snow/surf"
health$EVTYPE[grep("rogue", health$EVTYPE)]<-"heavy rain/snow/surf"
health$EVTYPE[grep("snow", health$EVTYPE)]<-"heavy rain/snow/surf"
health$EVTYPE[grep("tornado", health$EVTYPE)]<-"tornado"
health$EVTYPE[grep("fire", health$EVTYPE)]<-"wildfire"
health$EVTYPE[grep("avalan", health$EVTYPE)]<-"avalanche"
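The same ordered replacements can be expressed as a table of pattern and label pairs, which makes the order-dependence described above easy to see. This is a sketch: relabel() and the short rule list are hypothetical and only illustrate the mechanism; they are not the full rule set used in this analysis.

```r
# Hypothetical helper: apply (pattern -> label) rules top to bottom, like the
# sequence of grep() assignments above. Later rules can overwrite earlier labels.
relabel <- function(x, rules) {
  for (pattern in names(rules)) {
    x[grepl(pattern, x)] <- rules[[pattern]]
  }
  x
}

# A small subset of the rules, in the same relative order as the original lines
rules <- c(tstm = "tstm wind", flood = "floods",
           heat = "extreme heat", wind = "strong winds")
events <- c("tstm wind g45", "flash flood", "heat wave")
relabeled <- relabel(events, rules)
print(relabeled)
# relabeled is c("strong winds", "floods", "extreme heat"):
# "tstm wind" was overwritten by the later "wind" rule
```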
The code below creates two data frames, ‘agg1’ and ‘agg2’, holding the total fatalities and the total injuries for each event type, and combines them with cbind() into a single data frame ‘agg’. The result is then sorted so that the events with the most fatalities (ties broken by the most injuries) appear first.
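agg1<- aggregate(health$FATALITIES ~ health$EVTYPE, FUN=sum)   # total fatalities per event type
agg2<- aggregate(health$INJURIES ~ health$EVTYPE, FUN=sum)   # total injuries per event type
col<-c("Event","Injuries")
colnames(agg2)<-col
col1<-c("Event","Fatalities")
colnames(agg1)<-col1
agg<-cbind(agg1,agg2)   # rows line up because both aggregates group the same EVTYPE values in the same sorted order
agg<-subset(agg, select = c(Event, Fatalities, Injuries))   # keep a single Event column alongside the two totals
ordered<-agg[order(-agg$Fatalities, -agg$Injuries),]   # most fatalities first, ties broken by injuries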
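The two separate aggregations and the cbind() step can be collapsed into a single aggregate() call, because the formula interface accepts cbind() on its left-hand side. This sketch runs on a small made-up data frame (health_demo) rather than the real data; it is an alternative formulation, not the code used to produce the results below.

```r
# Made-up input with the same columns as 'health'
health_demo <- data.frame(
  EVTYPE = c("tornado", "floods", "tornado", "extreme heat"),
  FATALITIES = c(2, 1, 3, 4),
  INJURIES = c(10, 0, 5, 1),
  stringsAsFactors = FALSE
)

# Sum both columns per event type in one call
agg_demo <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE,
                      data = health_demo, FUN = sum)
names(agg_demo) <- c("Event", "Fatalities", "Injuries")

# Same ordering rule as above: most fatalities first, ties broken by injuries
ordered_demo <- agg_demo[order(-agg_demo$Fatalities, -agg_demo$Injuries), ]
print(ordered_demo)
```

Because both totals come from one call, there is no need to rely on two separate results having their rows in the same order.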
Finally, the graph may be plotted.
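The plotting code for this graph is not included here, so the following is only one possible way to chart the top events as grouped bars with base R's barplot(). Both helper names are hypothetical, and the demo data frame is made up; it only mirrors the columns of ‘ordered’.

```r
# Hypothetical helper: build a 2 x n matrix (rows = measures, columns = events),
# the layout barplot() uses for grouped bars.
top_health_matrix <- function(ordered, n = 5) {
  top <- head(ordered, n)
  m <- t(as.matrix(top[, c("Fatalities", "Injuries")]))
  colnames(m) <- top$Event
  m
}

# Hypothetical helper: draw fatalities and injuries side by side per event
plot_top_health <- function(ordered, n = 5) {
  m <- top_health_matrix(ordered, n)
  barplot(m, beside = TRUE, legend.text = rownames(m),
          xlab = "Events", ylab = "Count",
          main = "Fatalities and Injuries by Event Type")
}

# Made-up input with the same columns as 'ordered'
demo <- data.frame(Event = c("tornado", "extreme heat", "floods"),
                   Fatalities = c(5, 4, 1), Injuries = c(15, 1, 0),
                   stringsAsFactors = FALSE)
health_matrix <- top_health_matrix(demo, n = 2)
print(health_matrix)
```

With the real data the chart would be drawn with plot_top_health(ordered). Since injuries far outnumber fatalities, a log scale (barplot's log = "y") or two separate panels may be easier to read.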
Tornadoes caused the most harm, with 5636 fatalities and 91407 injuries. Extreme heat caused 3172 fatalities and 9228 injuries, floods caused 1557 fatalities and 8685 injuries, and heavy rain/snow/surf caused 1317 fatalities and 3752 injuries.
Preparing data for the economic analysis:
The code below reuses the data loaded into the variable ‘data’ earlier. It first subsets the data into a variable ‘economic’ containing only the EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP columns. It then keeps only the rows with a non-empty CROPDMGEXP value (‘crop’) and, from those, only the events whose property damage is recorded in billions, i.e. with PROPDMGEXP equal to "B" (‘agg3’).
economic<-subset(data, select=c(PROPDMG,PROPDMGEXP, EVTYPE, CROPDMG, CROPDMGEXP))
crop<-subset (economic, economic$CROPDMGEXP!="")
agg3<-subset(crop, crop$PROPDMGEXP=="B")
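Filtering on PROPDMGEXP == "B" keeps only damage recorded in billions. An alternative, not used in this analysis, is to convert every damage value to dollars using its exponent code. This sketch assumes the NOAA convention that K, M and B mean thousands, millions and billions; treating every other code as a multiplier of 1 is an additional simplifying assumption, and both helper names are hypothetical.

```r
# Hypothetical helper: map damage exponent codes to numeric multipliers.
# Only K, M and B are mapped; any other code (including "") becomes 1.
exp_to_multiplier <- function(code) {
  code <- toupper(code)
  mult <- rep(1, length(code))
  mult[code %in% "K"] <- 1e3
  mult[code %in% "M"] <- 1e6
  mult[code %in% "B"] <- 1e9
  mult
}

# Hypothetical helper: damage value times its multiplier, in US dollars
damage_usd <- function(value, code) value * exp_to_multiplier(code)

dollars <- damage_usd(c(2.5, 115, 10), c("B", "B", "k"))
print(dollars)
```

Applied to the full data, something like damage_usd(economic$PROPDMG, economic$PROPDMGEXP) + damage_usd(economic$CROPDMG, economic$CROPDMGEXP) would give combined property and crop damage per event.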
Further processing
The code below converts all text under EVTYPE to lower case and uses the grep() function to combine similar events into one category.
agg3$EVTYPE<-tolower(agg3$EVTYPE)
agg3$EVTYPE[grep("storm", agg3$EVTYPE)]<-"Storm"
agg3$EVTYPE[grep("hurricane", agg3$EVTYPE)]<-"Hurricane/Typhoon"
agg3$EVTYPE[grep("flood", agg3$EVTYPE)]<-"Flood"
agg3$EVTYPE[grep("tornado", agg3$EVTYPE)]<-"Tornado"
agg4<- aggregate(agg3$PROPDMG ~ agg3$EVTYPE, FUN=sum)
col<-c("Event","PROPDMG")
colnames(agg4)<-col
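The ‘agg4’ data frame is then sorted so that the events causing the most property damage come first. The top 5 events are copied into another data frame, which is transposed and plotted with the barplot() function. Because only rows with PROPDMGEXP equal to "B" were kept, the summed PROPDMG values are in billions of dollars.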
ordered2<-agg4[order(-agg4$PROPDMG),]
ord2<-head(ordered2, n=5)
sub3<-subset(ord2, select=c(PROPDMG))
sub4<-t(sub3)
colnames(sub4)<-ord2$Event   # take the labels from the data so they always match the sorted bar order
barplot(sub4, xlab="Events", ylab="Property Damage (billions of USD)", main="Economic Consequences: Property Damage")
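The transpose step is needed only because barplot() is given a data frame column; it also accepts a plain numeric vector, with bar labels supplied through names.arg. This sketch shows that variant; top_n_damage() and plot_top_damage() are hypothetical helpers, and the demo data frame is made up with the same columns as ‘agg4’.

```r
# Hypothetical helper: sort by PROPDMG (largest first) and keep the top n rows
top_n_damage <- function(agg4, n = 5) {
  head(agg4[order(-agg4$PROPDMG), ], n)
}

# Hypothetical helper: plot a numeric vector directly, labelling bars by event
plot_top_damage <- function(agg4, n = 5) {
  top <- top_n_damage(agg4, n)
  barplot(top$PROPDMG, names.arg = top$Event,
          xlab = "Events", ylab = "Property Damage (billions of USD)",
          main = "Economic Consequences: Property Damage")
}

# Made-up input with the same columns as 'agg4'
demo_agg4 <- data.frame(Event = c("Storm", "Flood", "Tornado"),
                        PROPDMG = c(40, 120, 15),
                        stringsAsFactors = FALSE)
top_demo <- top_n_damage(demo_agg4, n = 2)
print(top_demo)
```

With the real data, plot_top_damage(agg4) would draw the five largest totals with labels taken from the Event column, so no hard-coded names are needed.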
Among the events with property damage recorded in billions, floods caused the most property damage, followed by hurricanes/typhoons, tornadoes, storms, and hail.