This report is the final project of the Coursera Reproducible Research course. The purpose of the project is to explore the NOAA Storm Database and analyze the impact of weather events on population and property.
The first analysis identifies which types of severe weather events are most harmful to population health, measured by injuries and fatalities.
The second analysis shows the economic consequences of severe weather events for property and crops.
The results show that tornadoes had the highest impact on population health. Floods had the highest impact on property damage, and drought had the highest impact on crop damage.
The data can be downloaded from the course web site:
The variables corresponding to the analysis are:
The database contains data from 1950 until November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete. There is also some documentation of the database available:
Downloading the file
fileUrl<-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
directory<-"raw_data.csv"
# mode="wb" keeps the bz2-compressed file intact on Windows
download.file(fileUrl,directory,mode="wb")
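The download step can be made repeatable so that re-knitting the report does not fetch the file again. The following is a minimal sketch; `download_once` is a hypothetical helper name, not part of the original report.

```r
# Hypothetical helper (not in the original report): download only when the
# destination file is missing, so re-running the report reuses the local copy.
download_once <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" writes the compressed file in binary mode
    download.file(url, dest, mode = "wb")
  }
  invisible(dest)
}

# Usage with the URL and file name defined above:
# download_once(fileUrl, "raw_data.csv")
```

If the file is already present, the function returns its path without touching the network.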
library(plyr)
library(dplyr)
library(ggplot2)
library(grid)
library(gridExtra)
Loading the data
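# stringsAsFactors=TRUE keeps the text columns as factors, which levels() below relies on
data<-read.table("raw_data.csv",header=TRUE, sep=",",stringsAsFactors=TRUE)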
Subsetting the main Data for Analysis
storm_data<-data[,c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
Processing the data
To show the impact of weather on the population, the data were summarized by event type and sorted in decreasing order of fatalities and of injuries.
fatalities<-storm_data%>%
group_by(EVTYPE)%>%
summarise(FATALITIES=sum(FATALITIES))
fatalities<-fatalities[order(fatalities$FATALITIES,decreasing=TRUE),]
injuries<-storm_data%>%
group_by(EVTYPE)%>%
summarise(INJURIES=sum(INJURIES))
injuries<-injuries[order(injuries$INJURIES,decreasing=TRUE),]
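The group-then-sort pattern used above can be illustrated on a small example. This is a sketch only: the data frame `toy` and its values are invented, and it uses base R `aggregate` in place of dplyr so that it runs without extra packages.

```r
# Toy data standing in for storm_data (values invented for illustration)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0)
)

# Sum fatalities within each event type
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Order from the largest total to the smallest
totals <- totals[order(totals$FATALITIES, decreasing = TRUE), ]
totals
```

Here TORNADO totals 5, HEAT 4 and FLOOD 1, so the rows come out in that order.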
Setting the right units
The damage exponents are stored in separate columns that encode the magnitude with letters (h, H = hundred; k, K = thousand; m, M = million; B = billion) or digits. To convert each code to a numeric multiplier, letters and digits are replaced by the corresponding power of ten (a digit d becomes 10^d), an empty code becomes 1, and the remaining symbols (-, ?, +) become 0. The multipliers are stored in new columns.
levels of PROPDMGEXP
levels(storm_data$PROPDMGEXP)
## [1] "" "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
storm_data$PROPEXP[storm_data$PROPDMGEXP == "" ]<-1
storm_data$PROPEXP[storm_data$PROPDMGEXP == "-" ]<-0
storm_data$PROPEXP[storm_data$PROPDMGEXP == "?" ]<-0
storm_data$PROPEXP[storm_data$PROPDMGEXP == "+" ]<-0
storm_data$PROPEXP[storm_data$PROPDMGEXP == 0 ]<-1
storm_data$PROPEXP[storm_data$PROPDMGEXP == 1 ]<-10^1
storm_data$PROPEXP[storm_data$PROPDMGEXP == 2 ]<-10^2
storm_data$PROPEXP[storm_data$PROPDMGEXP == 3 ]<-10^3
storm_data$PROPEXP[storm_data$PROPDMGEXP == 4 ]<-10^4
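storm_data$PROPEXP[storm_data$PROPDMGEXP == 5 ]<-10^5
# "6" is one of the levels listed above; without a multiplier those rows would stay NA
storm_data$PROPEXP[storm_data$PROPDMGEXP == 6 ]<-10^6
storm_data$PROPEXP[storm_data$PROPDMGEXP == 7 ]<-10^7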
storm_data$PROPEXP[storm_data$PROPDMGEXP == 8 ]<-10^8
storm_data$PROPEXP[storm_data$PROPDMGEXP == "B" ]<-10^9
storm_data$PROPEXP[storm_data$PROPDMGEXP == "h" | storm_data$PROPDMGEXP == "H" ]<-10^2
storm_data$PROPEXP[storm_data$PROPDMGEXP == "k" | storm_data$PROPDMGEXP == "K" ]<-10^3
storm_data$PROPEXP[storm_data$PROPDMGEXP == "m" | storm_data$PROPDMGEXP == "M" ]<-10^6
levels of CROPDMGEXP
levels(storm_data$CROPDMGEXP)
## [1] "" "?" "0" "2" "B" "k" "K" "m" "M"
storm_data$CROPEXP[storm_data$CROPDMGEXP == ""]<-1
storm_data$CROPEXP[storm_data$CROPDMGEXP == "?"]<-0
storm_data$CROPEXP[storm_data$CROPDMGEXP == 0]<-1
storm_data$CROPEXP[storm_data$CROPDMGEXP == 2]<-10^2
storm_data$CROPEXP[storm_data$CROPDMGEXP == "B"]<-10^9
storm_data$CROPEXP[storm_data$CROPDMGEXP == "k" | storm_data$CROPDMGEXP == "K"]<-10^3
storm_data$CROPEXP[storm_data$CROPDMGEXP == "m" | storm_data$CROPDMGEXP == "M"]<-10^6
Creating the right exponential value
storm_data$PROPDMGCOST<-storm_data$PROPDMG*(as.numeric(storm_data$PROPEXP))
storm_data$CROPDMGCOST<-storm_data$CROPDMG*(as.numeric(storm_data$CROPEXP))
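The same code-to-multiplier rules can also be written as a single lookup table, which makes it harder to leave a code out. This is a sketch under the report's stated rules, not the method actually used above; `exp_lookup` and `exp_to_multiplier` are hypothetical names.

```r
# Named vector mapping each exponent code to a multiplier.
# Assumptions follow the report's rules: a digit d means 10^d, letters are
# case-insensitive, an empty code means 1, and other symbols mean 0.
exp_lookup <- c(
  "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9,
  setNames(10^(0:8), as.character(0:8))
)

# Hypothetical helper: convert a vector of codes to numeric multipliers
exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))
  mult <- unname(exp_lookup[code])   # NA for codes not in the lookup
  mult[code == ""] <- 1              # blank code: value is taken as is
  mult[is.na(mult)] <- 0             # "-", "?", "+" and anything else
  mult
}

# Worked example: a damage value of 25 with code "K" is 25 * 10^3
25 * exp_to_multiplier("K")
```

With this helper, the cost column could be computed in one step, for example `storm_data$PROPDMG * exp_to_multiplier(storm_data$PROPDMGEXP)`.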
Processing the data
To show the impact of weather on the economy, the data were summarized by event type and sorted in decreasing order of property damage and of crop damage.
propdmg<-storm_data%>%
group_by(EVTYPE)%>%
summarise(PROPDMGCOST=sum(PROPDMGCOST))
propdmg<-propdmg[order(propdmg$PROPDMGCOST,decreasing=TRUE),]
cropdmg<-storm_data%>%
group_by(EVTYPE)%>%
summarise(CROPDMGCOST=sum(CROPDMGCOST))
cropdmg<-cropdmg[order(cropdmg$CROPDMGCOST,decreasing=TRUE),]
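Because `sum()` returns `NA` as soon as one element is `NA`, a single row without a multiplier is enough to blank out the total for its whole event type. A quick sanity check before summarising can catch this. The sketch below uses invented toy values rather than the storm data.

```r
# Toy cost column in which one row has no multiplier (values invented)
toy <- data.frame(
  EVTYPE      = c("TORNADO", "TORNADO", "FLOOD"),
  PROPDMGCOST = c(2e6, NA, 5e5)
)

# One NA is enough to make the group total NA
tornado_total <- sum(toy$PROPDMGCOST[toy$EVTYPE == "TORNADO"])
tornado_total

# Count rows without a cost before summarising; ideally this is 0
n_missing <- sum(is.na(toy$PROPDMGCOST))
n_missing

# Event types whose totals would be affected
affected <- unique(as.character(toy$EVTYPE[is.na(toy$PROPDMGCOST)]))
affected
```

Running the same two checks on `storm_data$PROPDMGCOST` and `storm_data$CROPDMGCOST` would show whether any exponent code was left without a multiplier.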
Across the United States, 985 types of events have been recorded. This report shows the top 10 weather events affecting population health (injuries and deaths). Tornadoes caused the highest number of both fatalities and injuries.
report_f<-head(fatalities,10)
report_f
## # A tibble: 10 x 2
## EVTYPE FATALITIES
## <fctr> <dbl>
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
report_i<-head(injuries,10)
report_i
## # A tibble: 10 x 2
## EVTYPE INJURIES
## <fctr> <dbl>
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
plot_f <- ggplot(data=head(fatalities,10), aes(x=reorder(EVTYPE,FATALITIES),y=FATALITIES)) +
coord_flip()+geom_bar(fill="purple",stat="identity",width=0.5) +
labs(title ="Top 10 Events by Impact on Population Health",
x = "Event Type", y = "Total Number of Fatalities")
plot_i <- ggplot(data=head(injuries,10), aes(x=reorder(EVTYPE,INJURIES),y=INJURIES)) +
coord_flip()+geom_bar(fill="turquoise",stat="identity",width=0.5) +
xlab("Event Type")+ylab("Total Number of Injuries")
grid.arrange(plot_f, plot_i, nrow =2)
Regarding the cost of property damage, floods produced the highest losses, followed by hurricanes/typhoons and storm surges. For crops, the highest damage costs were caused by drought, followed by floods and river floods.
report_prop<-head(propdmg,10)
report_prop
## # A tibble: 10 x 2
## EVTYPE PROPDMGCOST
## <fctr> <dbl>
## 1 FLOOD 144657709807
## 2 HURRICANE/TYPHOON 69305840000
## 3 STORM SURGE 43323536000
## 4 FLASH FLOOD 16822673979
## 5 HAIL 15735267513
## 6 HURRICANE 11868319010
## 7 TROPICAL STORM 7703890550
## 8 WINTER STORM 6688497251
## 9 HIGH WIND 5270046260
## 10 RIVER FLOOD 5118945500
report_crop<-head(cropdmg,10)
report_crop
## # A tibble: 10 x 2
## EVTYPE CROPDMGCOST
## <fctr> <dbl>
## 1 DROUGHT 13972566000
## 2 FLOOD 5661968450
## 3 RIVER FLOOD 5029459000
## 4 ICE STORM 5022113500
## 5 HAIL 3025954473
## 6 HURRICANE 2741910000
## 7 HURRICANE/TYPHOON 2607872800
## 8 FLASH FLOOD 1421317100
## 9 EXTREME COLD 1292973000
## 10 FROST/FREEZE 1094086000
plot_prop <- ggplot(data=head(propdmg,10), aes(x=reorder(EVTYPE,PROPDMGCOST),y=PROPDMGCOST)) +
coord_flip()+geom_bar(fill="purple",stat="identity",width=0.5) +
labs(title ="Top 10 Events by Damage Cost",
x = "Event Type", y = "Property Damage Cost (USD)")
plot_crop <- ggplot(data=head(cropdmg,10), aes(x=reorder(EVTYPE,CROPDMGCOST),y=CROPDMGCOST)) +
coord_flip()+geom_bar(fill="turquoise",stat="identity",width=0.5) +
xlab("Event Type")+ylab("Crop Damage Cost (USD)")
grid.arrange(plot_prop, plot_crop, nrow =2)