This report was prepared to fulfill the requirements of the Coursera course “Reproducible Research” (week 4 project).
The goal of the assignment is to explore the NOAA Storm Database and examine the effects of severe weather events on public health and on the economy.
As a result, the 10 weather event types with the highest impact on public health (injuries and fatalities) and the 10 with the greatest economic consequences (crop and property damage) are plotted.
The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The file can be downloaded from the course web site; the URL is the one assigned to fileUrl in the code below.
There is also some documentation of the database available. It explains how some of the variables are constructed/defined:
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
The following code is used to download and process the data:
fileUrl<- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("stormdata.csv.bz2")) {
  download.file(fileUrl, destfile = "stormdata.csv.bz2", mode = "wb")  # binary mode keeps the archive intact on Windows
}
library(dplyr)
library(ggplot2)
library(grid)
library(gridExtra)
storm<- read.csv(bzfile("stormdata.csv.bz2"))
storm <- as_tibble(storm)  # tbl_df() is deprecated in current dplyr
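read.csv can read the compressed file directly through a bzfile connection, so no separate decompression step is needed. A minimal, self-contained sketch of that round trip follows; the toy data frame and the temporary file are made up for illustration and are not part of the storm data.

```r
# Toy data standing in for the storm file (values are invented).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), INJURIES = c(3, 0))

# Write a bzip2-compressed CSV to a temporary path.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back through bzfile(), the same way stormdata.csv.bz2 is read.
toy_back <- read.csv(bzfile(tmp))
unlink(tmp)
```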
To turn the values related to economic consequences into usable dollar amounts, we first have to convert the exponent codes (such as "K" or "M") into numeric powers of ten. The following function does this for a single code.
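TurnExp <- function(x) {
  x <- as.character(x)  # also works if the column was read as a factor
  if (x %in% c("H", "h")) {
    return(2)  # hundreds
  } else if (x %in% c("k", "K")) {
    return(3)  # thousands
  } else if (x %in% c("m", "M")) {
    return(6)  # millions
  } else if (x %in% c("b", "B")) {
    return(9)  # billions
  } else if (x %in% c("-", "?", "+", "")) {
    return(0)  # symbols and blanks: no multiplier
  } else if (grepl("^[0-9]+$", x)) {
    return(as.numeric(x))  # a digit is used directly as the power of ten
  } else {
    stop("Unrecognised exponent code: ", x)
  }
}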
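Because TurnExp handles one value at a time, it has to be called once per row. A vectorized variant is sketched below as a possible alternative, under the assumption that the only codes present are the ones TurnExp already handles. The names exp_lookup and turn_exp_vec are hypothetical and not part of the original analysis.

```r
# Hypothetical vectorized counterpart of TurnExp (illustrative names).
exp_lookup <- c(h = 2, k = 3, m = 6, b = 9)

turn_exp_vec <- function(x) {
  x <- tolower(as.character(x))
  out <- rep(NA_real_, length(x))

  # Letter codes: look the power of ten up in the named vector.
  is_letter <- x %in% names(exp_lookup)
  out[is_letter] <- unname(exp_lookup[x[is_letter]])

  # Digit codes: used directly as the power of ten.
  is_digit <- grepl("^[0-9]+$", x)
  out[is_digit] <- as.numeric(x[is_digit])

  # Symbols and blanks: no multiplier.
  out[x %in% c("-", "?", "+", "")] <- 0

  if (anyNA(out)) stop("Unrecognised exponent code")
  out
}

turn_exp_vec(c("K", "m", "", "5", "?", "B"))  # 3 6 0 5 0 9
```

On a column with several hundred thousand rows, a lookup of this kind avoids one function call per row; whether that matters depends on how often the analysis is rerun.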
The function is then applied to the exponent columns, and two new datasets are created that give the total crop damage and the total property damage for each weather event type:
new_CROPEXP <- sapply(storm$CROPDMGEXP, FUN = TurnExp, USE.NAMES = FALSE)
new_PROPEXP <- sapply(storm$PROPDMGEXP, FUN = TurnExp, USE.NAMES = FALSE)
Crop_weather <- storm %>% mutate(Crop = CROPDMG * 10^new_CROPEXP) %>%
  group_by(EVTYPE) %>% summarise(Crop_total = sum(Crop)) %>% arrange(desc(Crop_total))
Prop_weather <- storm %>% mutate(Prop = PROPDMG * 10^new_PROPEXP) %>%
  group_by(EVTYPE) %>% summarise(Prop_total = sum(Prop)) %>% arrange(desc(Prop_total))
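As a worked example of the arithmetic, a damage value of 25 with exponent code "K" becomes 25 * 10^3 = 25,000 dollars. The sketch below repeats the calculation on a toy data frame with made-up values, using base R's aggregate in place of the dplyr pipeline; the column PROPEXP is a hypothetical stand-in for the already converted exponent.

```r
# Toy rows (invented values) showing how the damage value and exponent combine.
toy <- data.frame(
  EVTYPE  = c("FLOOD", "FLOOD", "HAIL"),
  PROPDMG = c(25, 1.5, 2),
  PROPEXP = c(3, 6, 3)  # already converted: K -> 3, M -> 6
)
toy$Prop <- toy$PROPDMG * 10^toy$PROPEXP  # 25000, 1500000, 2000

# Base-R counterpart of group_by(EVTYPE) %>% summarise(sum(Prop)),
# followed by sorting in descending order of total damage.
prop_total <- aggregate(Prop ~ EVTYPE, data = toy, FUN = sum)
prop_total <- prop_total[order(-prop_total$Prop), ]
```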
The top 10 weather events with the highest impact on public health are plotted using the following code:
injury <- storm%>%group_by(EVTYPE)%>%summarise(Injury=sum(INJURIES))
injury_sorted <- injury %>% arrange(desc(Injury))
injury_top_10 <- injury_sorted[1:10,]
fatality <- storm%>%group_by(EVTYPE)%>%summarise(Fatality=sum(FATALITIES))
fatality_sorted <- fatality %>% arrange(desc(Fatality))
fatality_top_10 <- fatality_sorted[1:10,]
p1 <- ggplot(data = injury_top_10, aes(reorder(EVTYPE, Injury), Injury)) + geom_bar(fill = "red", stat = "identity") + coord_flip() +
  labs(title = "Health impact of weather events - Top 10", y = "Total number of Injuries", x = "Weather event")
p2 <- ggplot(data = fatality_top_10, aes(reorder(EVTYPE, Fatality), Fatality)) + geom_bar(fill = "blue", stat = "identity") + coord_flip() +
  labs(y = "Total number of Fatalities", x = "Weather event")
grid.arrange(p1,p2,nrow=2)
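The group, sort, and take-the-top-10 steps are the same for injuries and fatalities. As a sketch of how they could be factored out, the hypothetical helper top_events below performs them with base R on a toy data frame (the values are made up and the helper is not part of the original analysis).

```r
# Hypothetical helper: total of `column` per event type, largest first.
top_events <- function(data, column, n = 10) {
  totals <- tapply(data[[column]], data$EVTYPE, sum)  # sum per event type
  totals <- sort(totals, decreasing = TRUE)
  head(data.frame(EVTYPE = names(totals), Total = as.vector(totals)), n)
}

toy <- data.frame(
  EVTYPE   = c("TORNADO", "HEAT", "TORNADO", "FLOOD"),
  INJURIES = c(10, 4, 5, 1)
)
top2 <- top_events(toy, "INJURIES", n = 2)
```

Calling the helper once with "INJURIES" and once with "FATALITIES" on the full dataset would yield the two tables that feed the plots.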
The top 10 weather events with the greatest economic consequences are plotted using the following code:
p3 <- ggplot(data = head(Crop_weather, 10), aes(reorder(EVTYPE, Crop_total), y = log10(Crop_total))) + geom_bar(fill = "green", stat = "identity") + coord_flip() +
  labs(title = "Economic consequences of weather events - Top 10", y = "Crop damage in dollars (log10)", x = "Weather event")
p4 <- ggplot(data = head(Prop_weather, 10), aes(reorder(EVTYPE, Prop_total), y = log10(Prop_total))) + geom_bar(fill = "yellow", stat = "identity") + coord_flip() +
  labs(y = "Property damage in dollars (log10)", x = "Weather event")
grid.arrange(p3,p4,nrow=2)