This report studies the consequences of severe weather events for population health and the economic damage they cause. The data is taken from the NOAA Storm Database. Health impact is recorded as injuries and fatalities; economic loss is recorded as crop damage and property damage. Records are broken down by type of peril and by geographical administration within the United States.
The analysis summarizes the total number of people affected and the total economic damage caused by severe weather events, aggregated by peril across the entire United States. These results identify which types of events harm population health the most and which cause the most damage.
The data is downloaded from the NOAA Storm Database and saved in the data folder of the working directory.
The raw data is read into the following variable.
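The download step itself is not shown in this report. A minimal sketch of how it could be made repeatable is given here; `fetch_if_missing` is a hypothetical helper, not part of the original analysis, and the URL in the usage comment is a placeholder rather than a real link.

```r
# Hypothetical helper (not part of the original report): fetch the file only
# when it is not already present locally, creating the target folder if needed.
fetch_if_missing <- function(url, dest) {
  dir.create(dirname(dest), showWarnings = FALSE, recursive = TRUE)
  if (!file.exists(dest)) {
    # mode = "wb" keeps the compressed file intact on Windows
    download.file(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

# Usage (the URL is a placeholder; substitute the actual download link):
# fetch_if_missing("<storm-data-url>", "./data/repdata_data_StormData.csv.bz2")
```

Skipping the download when the file already exists avoids re-fetching a large archive every time the report is knitted.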
noaa <- read.csv("./data/repdata_data_StormData.csv.bz2")
A few exploratory checks are performed on the data, such as:
class(noaa)
## [1] "data.frame"
head(noaa)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
# summary(noaa)
The raw data needs to be cleaned and processed before it can be used in the analysis. The major issue concerns the economic damage: the values are given in USD, thousand USD, million USD or billion USD, depending on the exponent code stored alongside each value. All values therefore need to be converted to a single unit so that they can be processed consistently. The processed data is saved in a new variable, to which new columns are added to (1) hold the factor by which each damage value is scaled and (2) hold the actual economic loss in USD.
noaa_new <- noaa
noaa_new["dmg_prop"] <- 0
noaa_new["dmg_crop"] <- 0
noaa_new["prop_factor"] <- 1
noaa_new["crop_factor"] <- 1
noaa_new$prop_factor[noaa_new$PROPDMGEXP == "k" | noaa_new$PROPDMGEXP == "K"] = 1000
noaa_new$prop_factor[noaa_new$PROPDMGEXP == "m" | noaa_new$PROPDMGEXP == "M"] = 1000000
noaa_new$prop_factor[noaa_new$PROPDMGEXP == "b" | noaa_new$PROPDMGEXP == "B"] = 1000000000
noaa_new$crop_factor[noaa_new$CROPDMGEXP == "k" | noaa_new$CROPDMGEXP == "K"] = 1000
noaa_new$crop_factor[noaa_new$CROPDMGEXP == "m" | noaa_new$CROPDMGEXP == "M"] = 1000000
noaa_new$crop_factor[noaa_new$CROPDMGEXP == "b" | noaa_new$CROPDMGEXP == "B"] = 1000000000
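The same scaling can be written more compactly with a named lookup vector. The sketch below is an alternative formulation, not the code used in this report; `exp_to_factor` is a hypothetical name. Like the code above, it treats any code other than K, M or B (in either case) as a factor of 1, which is an assumption worth checking against the codes actually present in the data.

```r
# Sketch: map exponent codes to multipliers with a named lookup vector.
# Codes other than K, M and B (upper or lower case) fall back to 1,
# matching the default used in the report's own code.
exp_to_factor <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  f <- unname(lookup[toupper(as.character(code))])
  f[is.na(f)] <- 1  # unmatched or missing codes
  f
}

# Example call on a mix of known and unknown codes
exp_to_factor(c("K", "m", "", "B", "5"))
```

With such a helper, the two factor columns could be filled with `exp_to_factor(noaa_new$PROPDMGEXP)` and `exp_to_factor(noaa_new$CROPDMGEXP)`.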
Two sets of results are sought:
types of events that affected population health the most
types of events that caused the largest economic damage
The top 5 types of events affecting the most people, in terms of injuries and fatalities, are plotted in the following bar plot. The number of people affected is disaggregated by injuries and fatalities (labeled Casualties in the plot and table).
noaa_inj <- with(noaa_new, aggregate(INJURIES, list(Peril = EVTYPE), FUN = "sum")) # injuries
noaa_fat <- with(noaa_new, aggregate(FATALITIES, list(Peril = EVTYPE), FUN = "sum")) # fatalities
noaa_pop <- merge(noaa_inj, noaa_fat, by = "Peril", all = TRUE)
noaa_pop["People_Affected"] <- noaa_pop$x.x+noaa_pop$x.y
noaa_pop_order <- noaa_pop[with(noaa_pop, order(People_Affected, decreasing = TRUE)), ]
pop_matrix <- t(as.matrix(noaa_pop_order[seq(1,5),c(2,3,4)]))
colnames(pop_matrix) <- as.character(noaa_pop_order[seq(1,5),1])
rownames(pop_matrix) <- c("Injuries","Casualties","Total_Population_Affected")
barplot(pop_matrix[c(1,2),],cex.names = 0.75, cex.axis = 0.75,
main="Five Most Health-Affecting Perils",
xlab="Peril", ylab="Number of People Affected",
col=c("green","blue"),
legend.text = rownames(pop_matrix[c(1,2),]))
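The aggregate-then-merge pattern above can also be expressed as a single `aggregate` call using the formula interface. The sketch below uses a small toy data frame (`toy`, with made-up values) in place of `noaa_new`; it is an alternative, not the code that produced the results in this report. Note that the formula interface drops rows with missing values by default.

```r
# Toy data standing in for noaa_new (values are illustrative only)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "FLOOD"),
  INJURIES   = c(15, 0, 2, 3),
  FATALITIES = c(1, 2, 0, 0)
)

# Sum both columns per event type in a single call
toy_pop <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)
toy_pop$People_Affected <- toy_pop$INJURIES + toy_pop$FATALITIES

# Order by total number of people affected, largest first
toy_pop <- toy_pop[order(toy_pop$People_Affected, decreasing = TRUE), ]
```

This keeps the original column names (`INJURIES`, `FATALITIES`) instead of the `x.x` and `x.y` names produced by `merge`.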
The injuries and fatalities (labeled Casualties) caused by the top 5 events are summarized in the table below.
library(knitr)
## Warning: package 'knitr' was built under R version 3.2.5
kable(pop_matrix, digits=0)
| | TORNADO | EXCESSIVE HEAT | TSTM WIND | FLOOD | LIGHTNING |
|---|---|---|---|---|---|
| Injuries | 91346 | 6525 | 6957 | 6789 | 5230 |
| Casualties | 5633 | 1903 | 504 | 470 | 816 |
| Total_Population_Affected | 96979 | 8428 | 7461 | 7259 | 6046 |
The top 5 types of events causing the most economic loss, in terms of crop and property loss, are plotted in the following bar plot. The economic damage is disaggregated by crop loss and property loss and is expressed in million USD.
noaa_new$dmg_prop <- noaa_new$prop_factor*noaa_new$PROPDMG/10^6
noaa_new$dmg_crop <- noaa_new$crop_factor*noaa_new$CROPDMG/10^6
noaa_crop <- with(noaa_new, aggregate(dmg_crop, list(Peril = EVTYPE), FUN = "sum"))
noaa_prop <- with(noaa_new, aggregate(dmg_prop, list(Peril = EVTYPE), FUN = "sum"))
noaa_damage <- merge(noaa_crop, noaa_prop, by = "Peril", all = TRUE)
noaa_damage["Total_Damage"] <- noaa_damage$x.x+noaa_damage$x.y
noaa_damage_order <- noaa_damage[with(noaa_damage, order(Total_Damage, decreasing = TRUE)), ]
damage_matrix <- t(as.matrix(noaa_damage_order[seq(1,5),c(2,3,4)]))
colnames(damage_matrix) <- as.character(noaa_damage_order[seq(1,5),1])
rownames(damage_matrix) <- c("Crop","Property","Total_Damage")
barplot(damage_matrix[c(1,2),],cex.names = 0.75, cex.axis = 0.75,
main="Five Most Damage-causing Perils",
xlab="Peril", ylab="Damage [M USD]",
col=c("green","blue"),
legend.text = rownames(damage_matrix[c(1,2),]))
The economic loss caused by the top 5 events, in million USD and split by crop loss and property loss, is summarized in the table below.
library(knitr)
kable(damage_matrix, digits=0)
| | FLOOD | HURRICANE/TYPHOON | TORNADO | STORM SURGE | HAIL |
|---|---|---|---|---|---|
| Crop | 5662 | 2608 | 415 | 0 | 3026 |
| Property | 144658 | 69306 | 56937 | 43324 | 15732 |
| Total_Damage | 150320 | 71914 | 57352 | 43324 | 18758 |
Based on our analysis, tornadoes are the type of event with the highest impact on population health, with a total of 96,979 people affected (injuries and fatalities combined), and floods are the type of event with the highest economic damage, with a total of about 150,320 million USD in crop and property damage combined.
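Numbers inserted into report text can switch to scientific notation when they are large. One way to keep them readable is to format them explicitly before inserting them. The sketch below uses the two headline values of this report; the variable names are illustrative and not taken from the original code.

```r
# Values taken from the results of this report
people_affected   <- 96979      # tornado: injuries + fatalities
flood_damage_musd <- 150319.68  # flood: total damage in million USD

# Fixed notation with thousands separators
people_txt <- format(people_affected, big.mark = ",", scientific = FALSE)
damage_txt <- format(round(flood_damage_musd), big.mark = ",", scientific = FALSE)
```

The resulting strings can then be used in the text in place of the raw numeric values.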