The following document describes a brief analysis performed on data collected from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, as part of the Johns Hopkins University Data Science course at Coursera.
The analysis aims to identify the weather-related events with the greatest impact in terms of both health (injuries and fatalities) and economic losses.
The conclusions of the analysis are the following:
Across the United States, the events most harmful to overall population health (injuries and deaths combined) are tornadoes, which have caused more than 100,000 injuries over roughly the last 60 years. They are followed by heat waves, wildfires and tropical storms, which are consistently harmful to the population.
In terms of economic losses derived from weather conditions, hurricanes, floods and storms cause the highest property damage, at around 200 million dollars in losses per event. Regarding crops, the most harmful conditions are extreme cold and wet weather, at around 100 million dollars in losses per event.
This section covers the initial data extraction, the data treatments, and all the aggregation and transformation steps carried out before producing the analysis results.
Data is downloaded from this link and loaded into R:
data_raw <- read.csv("repdata-data-StormData.csv")
Additional information on the dataset can be found in the following links:
Libraries loaded:
library(plyr)
library(ggplot2)
Many of the event types refer to the same kind of weather condition. After some exploration of the most representative events, some of them are aggregated under common categories, including:
data <- data_raw
data[which(data[,"EVTYPE"] %in% c("TORNADO","TORNADO F0","TORNADOES, TSTM WIND, HAIL"), arr.ind=TRUE), "EVTYPE"] <- "TORNADO"
data[which(data[,"EVTYPE"] %in% c("HURRICANE ERIN","HURRICANE OPAL","HURRICANE OPAL/HIGH WINDS","HURRICANE/TYPHOON","TYPHOON"), arr.ind=TRUE), "EVTYPE"] <- "HURRICANE"
data[which(data[,"EVTYPE"] %in% c("HEAVY RAIN","HEAVY RAINS", "RAIN","HEAVY RAIN/SEVERE WEATHER"), arr.ind=TRUE), "EVTYPE"] <- "HEAVY RAIN"
data[which(data[,"EVTYPE"] %in% c("HAIL","ICE STORM","HAIL 1.75)","HAILSTORM"), arr.ind=TRUE), "EVTYPE"] <- "HAIL"
data[which(data[,"EVTYPE"] %in% c("LIGHTING","LIGHTNING","LIGHTNING AND HEAVY RAIN"), arr.ind=TRUE), "EVTYPE"] <- "LIGHTNING"
data[which(data[,"EVTYPE"] %in% c("FLASH FLOOD","FLOOD","FLOODING","FLASH FLOODING/THUNDERSTORM WI","FLASH FLOODING","RIVER FLOOD"), arr.ind=TRUE), "EVTYPE"] <- "FLOOD"
data[which(data[,"EVTYPE"] %in% c("THUNDERSTORM WIND","THUNDERSTORM WINDS","THUNDERSTORM WINS","THUNDERSTORM WINDS/HAIL","THUNDERSTORM WINDS HAIL","THUNDERSTORM WINDS LIGHTNING","SEVERE THUNDERSTORM"), arr.ind=TRUE), "EVTYPE"] <- "THUNDERSTORM WIND"
data[which(data[,"EVTYPE"] %in% c("FROST/FREEZE","EXTREME COLD","DAMAGING FREEZE","Damaging Freeze","Extreme Cold","Early Frost", "FREEZE", "Freeze"), arr.ind=TRUE), "EVTYPE"] <- "EXTREME COLD"
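The same regrouping can also be written as a lookup table, which keeps the category definitions in one place. The following is a minimal sketch on a toy vector, not the code used for this report: the names `evtype_groups` and `normalize_evtype` are hypothetical, and only a few of the groups above are reproduced.

```r
# Hypothetical lookup: canonical category -> raw EVTYPE labels it absorbs.
# Only a subset of the groups used in the report is shown here.
evtype_groups <- list(
  "TORNADO"   = c("TORNADO", "TORNADO F0", "TORNADOES, TSTM WIND, HAIL"),
  "HURRICANE" = c("HURRICANE ERIN", "HURRICANE OPAL", "HURRICANE/TYPHOON", "TYPHOON"),
  "FLOOD"     = c("FLASH FLOOD", "FLOOD", "FLOODING", "RIVER FLOOD")
)

# Replace every raw label found in a group by the group's name.
# Labels that belong to no group are returned unchanged.
normalize_evtype <- function(evtype, groups) {
  evtype <- as.character(evtype)  # work on strings, not factor levels
  for (category in names(groups)) {
    evtype[evtype %in% groups[[category]]] <- category
  }
  evtype
}

toy <- c("TORNADO F0", "TYPHOON", "RIVER FLOOD", "HAIL")
normalized <- normalize_evtype(toy, evtype_groups)
print(normalized)
```

Because the groups are disjoint, the order in which they are applied does not matter in this sketch.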
For the economic data, there are discrepancies in the unit in which the economic loss amount is recorded. The following code is applied in order to obtain a final, comparable loss amount for both property and crop losses:
data$PROPDAMAGE <- 0
aux0 <- which(data$PROPDMGEXP %in% c("0",""))
# Exponent "0" or blank: the recorded amount is taken as is (multiplier 10^0)
data[aux0,]$PROPDAMAGE <- data[aux0,]$PROPDMG
aux1 <- which(data$PROPDMGEXP == "1")
data[aux1,]$PROPDAMAGE <- data[aux1,]$PROPDMG * 10
aux2 <- which(data$PROPDMGEXP %in% c("H","h","2"))
data[aux2,]$PROPDAMAGE <- data[aux2,]$PROPDMG * 10**2
aux3 <- which(data$PROPDMGEXP %in% c("K","k","3"))
data[aux3,]$PROPDAMAGE <- data[aux3,]$PROPDMG * 10**3
aux4 <- which(data$PROPDMGEXP == "4")
data[aux4,]$PROPDAMAGE <- data[aux4,]$PROPDMG * 10**4
aux5 <- which(data$PROPDMGEXP == "5")
data[aux5,]$PROPDAMAGE <- data[aux5,]$PROPDMG * 10**5
aux6 <- which(data$PROPDMGEXP %in% c("M","m","6"))
data[aux6,]$PROPDAMAGE <- data[aux6,]$PROPDMG * 10**6
aux7 <- which(data$PROPDMGEXP == "7")
data[aux7,]$PROPDAMAGE <- data[aux7,]$PROPDMG * 10**7
aux8 <- which(data$PROPDMGEXP == "8")
data[aux8,]$PROPDAMAGE <- data[aux8,]$PROPDMG * 10**8
aux9 <- which(data$PROPDMGEXP == "B")
data[aux9,]$PROPDAMAGE <- data[aux9,]$PROPDMG * 10**9
data$CROPDAMAGE <- 0
aux0 <- which(data$CROPDMGEXP %in% c("0",""))
# Exponent "0" or blank: the recorded amount is taken as is (multiplier 10^0)
data[aux0,]$CROPDAMAGE <- data[aux0,]$CROPDMG
aux1 <- which(data$CROPDMGEXP == "1")
data[aux1,]$CROPDAMAGE <- data[aux1,]$CROPDMG * 10
aux2 <- which(data$CROPDMGEXP %in% c("H","h","2"))
data[aux2,]$CROPDAMAGE <- data[aux2,]$CROPDMG * 10**2
aux3 <- which(data$CROPDMGEXP %in% c("K","k","3"))
data[aux3,]$CROPDAMAGE <- data[aux3,]$CROPDMG * 10**3
aux4 <- which(data$CROPDMGEXP == "4")
data[aux4,]$CROPDAMAGE <- data[aux4,]$CROPDMG * 10**4
aux5 <- which(data$CROPDMGEXP == "5")
data[aux5,]$CROPDAMAGE <- data[aux5,]$CROPDMG * 10**5
aux6 <- which(data$CROPDMGEXP %in% c("M","m","6"))
data[aux6,]$CROPDAMAGE <- data[aux6,]$CROPDMG * 10**6
aux7 <- which(data$CROPDMGEXP == "7")
data[aux7,]$CROPDAMAGE <- data[aux7,]$CROPDMG * 10**7
aux8 <- which(data$CROPDMGEXP == "8")
data[aux8,]$CROPDAMAGE <- data[aux8,]$CROPDMG * 10**8
aux9 <- which(data$CROPDMGEXP == "B")
data[aux9,]$CROPDAMAGE <- data[aux9,]$CROPDMG * 10**9
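The unit conversion above amounts to multiplying each amount by a power of ten chosen from its exponent code. A compact way to express that idea is sketched below on toy data. The helper name `exp_multiplier` is hypothetical, and two assumptions are made: a blank code means the amount is already in dollars, and unrecognised codes (for example "?") contribute zero, in line with the damage columns being initialised to 0. Unlike the code above, this sketch also accepts a lowercase "b".

```r
# Hypothetical helper: map an exponent code to a power-of-ten multiplier.
exp_multiplier <- function(code) {
  lookup <- c("H" = 2, "K" = 3, "M" = 6, "B" = 9)
  code <- toupper(as.character(code))
  power <- rep(NA_real_, length(code))

  # Single digits 0-8 are used directly as the exponent.
  is_digit <- grepl("^[0-8]$", code)
  power[is_digit] <- as.numeric(code[is_digit])

  # Letter codes are translated through the lookup vector.
  is_letter <- code %in% names(lookup)
  power[is_letter] <- lookup[code[is_letter]]

  # Assumption: a blank code means the amount is already in dollars.
  power[code == ""] <- 0

  multiplier <- 10^power
  # Assumption: unrecognised codes contribute nothing to the total.
  multiplier[is.na(multiplier)] <- 0
  multiplier
}

toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 3, 7),
  PROPDMGEXP = c("K", "M", "B", "", "?"),
  stringsAsFactors = FALSE
)
toy$PROPDAMAGE <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
print(toy)
```

The same function could be applied to the crop columns, since they use the same exponent codes.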
In order to analyze the effect on the population health, the following variables are considered and extracted from the large dataset:
All the information relates to events occurring within the US, and there is only one observation per event:
health <- data[,c("EVTYPE","FATALITIES","INJURIES")]
##    EVTYPE FATALITIES INJURIES
## 1 TORNADO          0       15
## 2 TORNADO          0        0
## 3 TORNADO          0        2
## 4 TORNADO          0        2
## 5 TORNADO          0        2
## 6 TORNADO          0        6
Injuries and fatalities are considered in both absolute terms (totals) and relative terms (averages per event). The top 10 event types are kept for each measure.
health_sum <- aggregate(cbind(FATALITIES,INJURIES) ~ EVTYPE,health, sum)
inj_sum_t10<- health_sum[with(health_sum, order(-INJURIES)),][1:10,]
fat_sum_t10<- health_sum[with(health_sum, order(-FATALITIES)),][1:10,]
health_mean <- aggregate(cbind(FATALITIES,INJURIES) ~ EVTYPE, health,mean)
inj_mean_t10<- health_mean[with(health_mean, order(-INJURIES)),][1:10,]
fat_mean_t10<- health_mean[with(health_mean, order(-FATALITIES)),][1:10,]
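The four rankings above follow one pattern: aggregate a measure by event type, sort in decreasing order, and keep the first rows. A minimal sketch of that pattern as a reusable function is shown below on toy data. The helper name `top_n_by` and the toy values are hypothetical and are not part of the report.

```r
# Hypothetical helper: aggregate column `value` by column `group` using `fun`,
# then return the n rows with the largest aggregated value.
top_n_by <- function(df, value, group, fun, n = 10) {
  agg <- aggregate(df[[value]], by = list(df[[group]]), FUN = fun)
  names(agg) <- c(group, value)
  agg <- agg[order(-agg[[value]]), ]
  head(agg, n)
}

# Toy data, invented for illustration only.
toy <- data.frame(
  EVTYPE   = c("TORNADO", "TORNADO", "TORNADO", "HEAT", "FLOOD", "FLOOD"),
  INJURIES = c(15, 0, 3, 12, 1, 5),
  stringsAsFactors = FALSE
)

top_sum  <- top_n_by(toy, "INJURIES", "EVTYPE", sum,  n = 2)
top_mean <- top_n_by(toy, "INJURIES", "EVTYPE", mean, n = 2)
print(top_sum)
print(top_mean)
```

In the toy data the two rankings differ: the frequent event type leads the totals, while the rare but severe one leads the per-event averages. This is the reason both views are reported.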
Data is classified and aggregated into report tables in order to produce the final results:
d1 <- data.frame(event = inj_sum_t10$EVTYPE, values=inj_sum_t10$INJURIES, effect="Injuries", analysis = "Sum")
d2 <- data.frame(event = fat_sum_t10$EVTYPE, values=fat_sum_t10$FATALITIES, effect="Fatalities", analysis = "Sum")
h.tops <- rbind(d1,d2)
d3 <- data.frame(event = inj_mean_t10$EVTYPE, values=inj_mean_t10$INJURIES, effect="Injuries", analysis = "Mean")
d4 <- data.frame(event = fat_mean_t10$EVTYPE, values=fat_mean_t10$FATALITIES, effect="Fatalities", analysis = "Mean")
h.topm <- rbind(d3,d4)
In order to analyze the economic impact, the following variables are considered and extracted from the large dataset:
expense <- data[,c("EVTYPE","CROPDAMAGE","PROPDAMAGE")]
##    EVTYPE CROPDAMAGE PROPDAMAGE
## 1 TORNADO          0      25000
## 2 TORNADO          0       2500
## 3 TORNADO          0      25000
## 4 TORNADO          0       2500
## 5 TORNADO          0       2500
## 6 TORNADO          0       2500
Losses are considered in both absolute terms (totals) and relative terms (averages per event). The top 5 event types are kept for each measure.
expense_sum <- aggregate(cbind(CROPDAMAGE,PROPDAMAGE) ~ EVTYPE,expense, sum)
crop_sum_t10<- expense_sum[with(expense_sum, order(-CROPDAMAGE)),][1:5,]
prop_sum_t10<- expense_sum[with(expense_sum, order(-PROPDAMAGE)),][1:5,]
expense_mean <- aggregate(cbind(CROPDAMAGE,PROPDAMAGE) ~ EVTYPE,expense, mean)
crop_mean_t10<- expense_mean[with(expense_mean, order(-CROPDAMAGE)),][1:5,]
prop_mean_t10<- expense_mean[with(expense_mean, order(-PROPDAMAGE)),][1:5,]
Data is classified and aggregated into report tables in order to produce the final results:
d1 <- data.frame(event = crop_sum_t10$EVTYPE, values=crop_sum_t10$CROPDAMAGE/1000000000, effect="Crop Damage", analysis = "Sum")
d2 <- data.frame(event = prop_sum_t10$EVTYPE, values=prop_sum_t10$PROPDAMAGE/1000000000, effect="Property Damage", analysis = "Sum")
d3 <- data.frame(event = crop_mean_t10$EVTYPE, values=crop_mean_t10$CROPDAMAGE/1000000000, effect="Crop Damage", analysis = "Mean")
d4 <- data.frame(event = prop_mean_t10$EVTYPE, values=prop_mean_t10$PROPDAMAGE/1000000000, effect="Property Damage", analysis = "Mean")
e.top <- rbind(d1,d2,d3,d4)
Tornado_inj <- h.tops[h.tops$event == "TORNADO" & h.tops$effect == "Injuries",]$values
Tornado_fat <- h.tops[h.tops$event == "TORNADO" & h.tops$effect == "Fatalities",]$values
In absolute terms, tornadoes have been the main contributor to injuries and deaths, with a total of about 105,000 injuries (1.05152 × 10^5) and 7922 fatalities recorded from 1950 to 2011.
For the rest of the top 10 events, the total recorded public health effects are summarised in the following chart:
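Large totals like the injury count above can end up in scientific notation when a numeric value is inlined in a knitted report. A minimal sketch of formatting such values with thousands separators before inlining them is given below. It assumes the report is generated with knitr, and the variable names and values are illustrative stand-ins for the totals computed from the data.

```r
# Illustrative totals; in the report these would come from the aggregated table.
injuries_total   <- 105152
fatalities_total <- 7922

# format() with big.mark inserts thousands separators and returns a string,
# so the value is printed exactly as written instead of in scientific notation.
injuries_label   <- format(injuries_total, big.mark = ",")
fatalities_label <- format(fatalities_total, big.mark = ",")

print(injuries_label)
print(fatalities_label)
```

The resulting strings can then be used in the inline expression in place of the raw numbers.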
ggplot(h.tops[h.tops$event!= "TORNADO",], aes(x=event,y=values, fill= effect)) +
facet_grid(analysis ~ effect, scales= "free") +
geom_bar(stat="identity", colour= "black") +
theme(axis.text.x= element_text(angle= 90, hjust= 1))+
scale_y_continuous(name = "number of injuries/deaths from 1950 to 2011")
Tornadoes are followed by heat waves, which have the second most harmful effect in both injuries and fatalities.
There are also many injuries related to wind conditions.
In relative terms (average injuries and fatalities per event), the top 10 contributors to public health harm are:
ggplot(h.topm, aes(x=event,y=values, fill= effect)) +
facet_grid(analysis ~ effect, scales= "free") +
geom_bar(stat="identity", colour= "black") +
theme(axis.text.x= element_text(angle= 90, hjust= 1)) +
scale_y_continuous(name = "number of injuries/deaths per event")
In relative terms, the conditions that consistently produce injuries are heat waves, wildfires and storms, with more than 40 people injured per event.
In terms of economic damage, the analysis shows the following absolute (total recorded) and relative (per event) economic losses for the top 5 event types:
ggplot(e.top , aes(x=event,y=values, fill= effect)) +
facet_grid(analysis ~ ., scales= "free") +
geom_bar(stat="identity",colour = "black") +
theme(axis.text.x= element_text(angle= 90, hjust= 1)) +
scale_y_continuous(name = "economic loss ($bill)")
As can be seen, the greatest economic losses from weather-related property destruction come from hurricanes, floods and storms, at around 200 million dollars in losses per event.
Regarding crops, the most harmful conditions are extreme cold and wet weather, at around 100 million dollars in losses per event.