Weather Event Impact Analysis

Synopsis

This document describes a brief analysis of data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, carried out as part of the Johns Hopkins University Data Science course on Coursera.

The analysis aims to identify the weather-related events with the greatest impact on population health (injuries and fatalities) and on economic losses.

The conclusions of the analysis are the following:

  1. Across the United States, the most harmful events for overall population health (including injuries and deaths) are tornadoes, which produced more than 100,000 injuries over the last 60 years. They are followed by heat waves, wildfires and tropical storms, which are consistently harmful to the population.

  2. In terms of economic losses derived from weather conditions, hurricanes, floods and storms are the events causing the greatest property damage, at around $200 million in losses per event. Regarding crops, the most harmful conditions are extreme cold and wet weather, at around $100 million in losses per event.

Data Processing

This section covers the initial data extraction, the data treatments, and all the aggregation and transformation steps performed before producing the analysis results.

Data is downloaded from this link and loaded into R:

data_raw <- read.csv("repdata-data-StormData.csv")

Additional information on the dataset can be found in the following links:

Libraries loaded:

library(plyr)
library(ggplot2)

Data Treatments

Many of the event types refer to the same kind of weather condition. After some exploration of the most representative events, some of them are aggregated under common categories, including:

  • Tornadoes
  • Hail
  • Hurricanes
  • Rain
  • Lightning
  • Floods
  • Thunderstorm winds

data <- data_raw

data[which(data[,"EVTYPE"] %in% c("TORNADO","TORNADO F0","TORNADOES, TSTM WIND, HAIL"), arr.ind=TRUE), "EVTYPE"] <- "TORNADO"
data[which(data[,"EVTYPE"] %in% c("HURRICANE ERIN","HURRICANE OPAL","HURRICANE OPAL/HIGH WINDS","HURRICANE/TYPHOON","TYPHOON"), arr.ind=TRUE), "EVTYPE"] <- "HURRICANE"
data[which(data[,"EVTYPE"] %in% c("HEAVY RAIN","HEAVY RAINS", "RAIN","HEAVY RAIN/SEVERE WEATHER"), arr.ind=TRUE), "EVTYPE"] <- "HEAVY RAIN"
data[which(data[,"EVTYPE"] %in% c("HAIL","ICE STORM","HAIL 1.75)","HAILSTORM"), arr.ind=TRUE), "EVTYPE"] <- "HAIL"
data[which(data[,"EVTYPE"] %in% c("LIGHTING","LIGHTNING","LIGHTNING AND HEAVY RAIN"), arr.ind=TRUE), "EVTYPE"] <- "LIGHTNING"
data[which(data[,"EVTYPE"] %in% c("FLASH FLOOD","FLOOD","FLOODING","FLASH FLOODING/THUNDERSTORM WI","FLASH FLOODING","RIVER FLOOD"), arr.ind=TRUE), "EVTYPE"] <- "FLOOD"
data[which(data[,"EVTYPE"] %in% c("THUNDERSTORM WIND","THUNDERSTORM WINDS","THUNDERSTORM WINS","THUNDERSTORM WINDS/HAIL","THUNDERSTORM WINDS HAIL","THUNDERSTORM WINDS LIGHTNING","SEVERE THUNDERSTORM"), arr.ind=TRUE), "EVTYPE"] <- "THUNDERSTORM WIND"
data[which(data[,"EVTYPE"] %in% c("FROST/FREEZE","EXTREME COLD","DAMAGING FREEZE","Damaging Freeze","Extreme Cold","Early Frost", "FREEZE", "Freeze"), arr.ind=TRUE), "EVTYPE"] <- "EXTREME COLD"
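
The manual grouping above can also be expressed as a lookup table. The following is a minimal sketch on a toy vector, not the code used in this report; the names `event_groups` and `normalize_events` are hypothetical, and only a subset of the variants listed above is included.

```r
# Hypothetical lookup: canonical category -> raw EVTYPE variants
# (a subset of the variants handled in the code above)
event_groups <- list(
  "TORNADO"   = c("TORNADO", "TORNADO F0", "TORNADOES, TSTM WIND, HAIL"),
  "HURRICANE" = c("HURRICANE ERIN", "HURRICANE OPAL", "HURRICANE/TYPHOON", "TYPHOON"),
  "FLOOD"     = c("FLASH FLOOD", "FLOOD", "FLOODING", "RIVER FLOOD")
)

# Replace every listed variant with its canonical category;
# values that appear in no group are left unchanged
normalize_events <- function(evtype, groups) {
  evtype <- as.character(evtype)
  for (category in names(groups)) {
    evtype[evtype %in% groups[[category]]] <- category
  }
  evtype
}

# Toy input, invented for illustration
toy_events <- c("TORNADO F0", "TYPHOON", "RIVER FLOOD", "HIGH WIND")
normalized <- normalize_events(toy_events, event_groups)
print(normalized)
```

Keeping the mapping in one list makes it easier to review which raw labels end up in which category, and to add further variants later.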

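For the economic data, the unit in which each loss amount is expressed is not consistent: the PROPDMGEXP and CROPDMGEXP columns hold a magnitude code (a digit, or a letter such as H, K, M or B) that applies to the corresponding amount. The following code is applied in order to obtain a final, comparable loss amount for both property and crop losses: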

data$PROPDAMAGE <- 0
aux0 <- which(data$PROPDMGEXP %in% c("0",""))
# Codes "0" and "" carry no multiplier, so the raw amount is used as is
data[aux0,]$PROPDAMAGE <- data[aux0,]$PROPDMG
aux1 <- which(data$PROPDMGEXP == "1")
data[aux1,]$PROPDAMAGE <- data[aux1,]$PROPDMG * 10
aux2 <- which(data$PROPDMGEXP %in% c("H","h","2"))
data[aux2,]$PROPDAMAGE <- data[aux2,]$PROPDMG * 10**2
aux3 <- which(data$PROPDMGEXP %in% c("K","k","3"))
data[aux3,]$PROPDAMAGE <- data[aux3,]$PROPDMG * 10**3
aux4 <- which(data$PROPDMGEXP == "4")
data[aux4,]$PROPDAMAGE <- data[aux4,]$PROPDMG * 10**4
aux5 <- which(data$PROPDMGEXP == "5")
data[aux5,]$PROPDAMAGE <- data[aux5,]$PROPDMG * 10**5
aux6 <- which(data$PROPDMGEXP %in% c("M","m","6"))
data[aux6,]$PROPDAMAGE <- data[aux6,]$PROPDMG * 10**6
aux7 <- which(data$PROPDMGEXP == "7")
data[aux7,]$PROPDAMAGE <- data[aux7,]$PROPDMG * 10**7
aux8 <- which(data$PROPDMGEXP == "8")
data[aux8,]$PROPDAMAGE <- data[aux8,]$PROPDMG * 10**8
aux9 <- which(data$PROPDMGEXP == "B")
data[aux9,]$PROPDAMAGE <- data[aux9,]$PROPDMG * 10**9   


data$CROPDAMAGE <- 0
aux0 <- which(data$CROPDMGEXP %in% c("0",""))
# Codes "0" and "" carry no multiplier, so the raw amount is used as is
data[aux0,]$CROPDAMAGE <- data[aux0,]$CROPDMG
aux1 <- which(data$CROPDMGEXP == "1")
data[aux1,]$CROPDAMAGE <- data[aux1,]$CROPDMG * 10
aux2 <- which(data$CROPDMGEXP %in% c("H","h","2"))
data[aux2,]$CROPDAMAGE <- data[aux2,]$CROPDMG * 10**2
aux3 <- which(data$CROPDMGEXP %in% c("K","k","3"))
data[aux3,]$CROPDAMAGE <- data[aux3,]$CROPDMG * 10**3
aux4 <- which(data$CROPDMGEXP == "4")
data[aux4,]$CROPDAMAGE <- data[aux4,]$CROPDMG * 10**4
aux5 <- which(data$CROPDMGEXP == "5")
data[aux5,]$CROPDAMAGE <- data[aux5,]$CROPDMG * 10**5
aux6 <- which(data$CROPDMGEXP %in% c("M","m","6"))
data[aux6,]$CROPDAMAGE <- data[aux6,]$CROPDMG * 10**6
aux7 <- which(data$CROPDMGEXP == "7")
data[aux7,]$CROPDAMAGE <- data[aux7,]$CROPDMG * 10**7
aux8 <- which(data$CROPDMGEXP == "8")
data[aux8,]$CROPDAMAGE <- data[aux8,]$CROPDMG * 10**8
aux9 <- which(data$CROPDMGEXP == "B")
data[aux9,]$CROPDAMAGE <- data[aux9,]$CROPDMG * 10**9  
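
The same conversion can be written more compactly with a named lookup vector. This is a minimal sketch on toy values, not the code used in this report; `exp_multiplier` and `to_amount` are hypothetical names, and it assumes that an empty code means a multiplier of 1 and that any unrecognised code yields 0.

```r
# Hypothetical lookup of multipliers for the magnitude codes handled above
exp_multiplier <- c(
  "0" = 1,   "1" = 1e1, "2" = 1e2, "3" = 1e3, "4" = 1e4,
  "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8,
  "H" = 1e2, "h" = 1e2, "K" = 1e3, "k" = 1e3,
  "M" = 1e6, "m" = 1e6, "B" = 1e9
)

# Convert a raw amount and its magnitude code into a plain dollar amount
to_amount <- function(value, exponent) {
  key <- as.character(exponent)
  key[!is.na(key) & key == ""] <- "0"   # assumption: empty code means multiplier 1
  mult <- unname(exp_multiplier[key])   # NA for codes not in the lookup
  mult[is.na(mult)] <- 0                # assumption: unrecognised codes contribute 0
  value * mult
}

# Toy values, invented for illustration
toy_value    <- c(25, 2.5, 1, 3, 7)
toy_exponent <- c("K", "M", "B", "", "?")
amounts <- to_amount(toy_value, toy_exponent)
print(amounts)
```

Because the function takes the amount and the code as arguments, the same call can serve both the property and the crop columns.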

Impact on Population Health

In order to analyze the effect on the population health, the following variables are considered and extracted from the large dataset:

  • Type of event
  • Number of fatalities
  • Number of injuries

All the information relates to events occurring within the US, and there is one observation per event:

health <- data[,c("EVTYPE","FATALITIES","INJURIES")]
head(health)
##    EVTYPE FATALITIES INJURIES
## 1 TORNADO          0       15
## 2 TORNADO          0        0
## 3 TORNADO          0        2
## 4 TORNADO          0        2
## 5 TORNADO          0        2
## 6 TORNADO          0        6

Injuries and fatalities are considered in both absolute and relative terms. The top 10 event types are considered for each effect.

  1. Absolute: sum

health_sum <- aggregate(cbind(FATALITIES,INJURIES) ~ EVTYPE, health, sum)
inj_sum_t10 <- health_sum[with(health_sum, order(-INJURIES)),][1:10,]
fat_sum_t10 <- health_sum[with(health_sum, order(-FATALITIES)),][1:10,]

  2. Relative: mean

health_mean <- aggregate(cbind(FATALITIES,INJURIES) ~ EVTYPE, health, mean)
inj_mean_t10 <- health_mean[with(health_mean, order(-INJURIES)),][1:10,]
fat_mean_t10 <- health_mean[with(health_mean, order(-FATALITIES)),][1:10,]
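
To make the aggregation step concrete, here is a minimal sketch of the same sum, mean and top-N pattern on a toy data frame. The values are invented for illustration and are not taken from the storm database; `toy`, `top_n` and `inj_top` are hypothetical names.

```r
# Toy data, invented for illustration
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "HEAT", "FLOOD"),
  FATALITIES = c(0, 2, 4, 6, 1),
  INJURIES   = c(15, 5, 30, 10, 2)
)

# One row per event type: totals and per-event averages
toy_sum  <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, toy, sum)
toy_mean <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, toy, mean)

# Top event types by total injuries
top_n <- 2
inj_top <- head(toy_sum[order(-toy_sum$INJURIES), ], top_n)
print(inj_top)
```

Using `head(x, n)` instead of indexing with `[1:n, ]` avoids rows of `NA` when there are fewer than `n` event types.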

The data is then classified and aggregated into report tables to produce the final results:

d1 <- data.frame(event = inj_sum_t10$EVTYPE, values=inj_sum_t10$INJURIES, effect="Injuries", analysis = "Sum")
d2 <- data.frame(event = fat_sum_t10$EVTYPE, values=fat_sum_t10$FATALITIES, effect="Fatalities", analysis = "Sum")
h.tops <- rbind(d1,d2)


d3 <- data.frame(event = inj_mean_t10$EVTYPE, values=inj_mean_t10$INJURIES, effect="Injuries", analysis = "Mean")
d4 <- data.frame(event = fat_mean_t10$EVTYPE, values=fat_mean_t10$FATALITIES, effect="Fatalities", analysis = "Mean")
h.topm <- rbind(d3,d4)

Economic Impact

To analyze the economic impact, the following variables are considered and extracted from the full dataset:

  • Type of event
  • Property damage expenses
  • Crop damage expenses

expense <- data[,c("EVTYPE","CROPDAMAGE","PROPDAMAGE")]
head(expense)
##    EVTYPE CROPDAMAGE PROPDAMAGE
## 1 TORNADO          0      25000
## 2 TORNADO          0       2500
## 3 TORNADO          0      25000
## 4 TORNADO          0       2500
## 5 TORNADO          0       2500
## 6 TORNADO          0       2500

Losses are considered in both absolute and relative terms. The top 5 event types are considered for each.

  1. Absolute: sum

# Note: despite the _t10 suffix, these objects hold the top 5 rows
expense_sum <- aggregate(cbind(CROPDAMAGE,PROPDAMAGE) ~ EVTYPE, expense, sum)
crop_sum_t10 <- expense_sum[with(expense_sum, order(-CROPDAMAGE)),][1:5,]
prop_sum_t10 <- expense_sum[with(expense_sum, order(-PROPDAMAGE)),][1:5,]

  2. Relative: mean

expense_mean <- aggregate(cbind(CROPDAMAGE,PROPDAMAGE) ~ EVTYPE, expense, mean)
crop_mean_t10 <- expense_mean[with(expense_mean, order(-CROPDAMAGE)),][1:5,]
prop_mean_t10 <- expense_mean[with(expense_mean, order(-PROPDAMAGE)),][1:5,]

The data is then classified and aggregated into report tables to produce the final results:

d1 <- data.frame(event = crop_sum_t10$EVTYPE, values=crop_sum_t10$CROPDAMAGE/1000000000, effect="Crop Damage", analysis = "Sum")
d2 <- data.frame(event = prop_sum_t10$EVTYPE, values=prop_sum_t10$PROPDAMAGE/1000000000, effect="Property Damage", analysis = "Sum")
d3 <- data.frame(event = crop_mean_t10$EVTYPE, values=crop_mean_t10$CROPDAMAGE/1000000000, effect="Crop Damage", analysis = "Mean")
d4 <- data.frame(event = prop_mean_t10$EVTYPE, values=prop_mean_t10$PROPDAMAGE/1000000000, effect="Property Damage", analysis = "Mean")
e.top <- rbind(d1,d2,d3,d4)
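
The four near-identical `data.frame` calls above could be generated by a small helper. This is a minimal sketch on toy values, not the code used in this report; `top_table` and its arguments are hypothetical, and the `scale` argument stands in for the division by one billion.

```r
# Hypothetical helper: top-n rows of an aggregated table by one column,
# returned in the long format (event, values, effect, analysis) used for plotting
top_table <- function(agg, column, effect, analysis, n = 5, scale = 1) {
  top <- head(agg[order(-agg[[column]]), ], n)
  data.frame(event = top$EVTYPE, values = top[[column]] / scale,
             effect = effect, analysis = analysis)
}

# Toy aggregated losses in dollars, invented for illustration
toy_sum <- data.frame(
  EVTYPE     = c("A", "B", "C"),
  CROPDAMAGE = c(2e9, 5e9, 1e9),
  PROPDAMAGE = c(9e9, 3e9, 6e9)
)

# Top 2 event types per effect, expressed in billions
toy_top <- rbind(
  top_table(toy_sum, "CROPDAMAGE", "Crop Damage",     "Sum", n = 2, scale = 1e9),
  top_table(toy_sum, "PROPDAMAGE", "Property Damage", "Sum", n = 2, scale = 1e9)
)
print(toy_top)
```

With such a helper, the number of rows and the unit conversion are stated once per call rather than repeated in every expression.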

Results

Impact on Population Health

Tornado_inj <- h.tops[h.tops$event == "TORNADO" & h.tops$effect == "Injuries",]$values
Tornado_fat <- h.tops[h.tops$event == "TORNADO" & h.tops$effect == "Fatalities",]$values

The main absolute contributor to injuries and deaths on record has been tornadoes, producing a total of 105,152 injuries and 7,922 fatalities recorded from 1950 to 2011.

For the rest of the top 10 events, the total recorded public health effects are summarised in the following chart:
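
Large totals such as the injury count above can be rendered in scientific notation when a number is embedded directly in the text. A minimal sketch of formatting such a value with a thousands separator in base R follows; the value is hard-coded here for illustration, whereas in the report it would come from `Tornado_inj`.

```r
# Hard-coded for illustration; in the report this value is taken from h.tops
tornado_inj <- 105152

# Fixed notation with a thousands separator instead of scientific notation
formatted <- format(tornado_inj, big.mark = ",", scientific = FALSE)
print(formatted)
```

The result is a character string, so it is suited to text output but not to further arithmetic.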

ggplot(h.tops[h.tops$event!= "TORNADO",], aes(x=event,y=values, fill= effect)) + 
  facet_grid(analysis ~ effect, scales= "free") +
  geom_bar(stat="identity", colour= "black")  +
  theme(axis.text.x= element_text(angle= 90, hjust= 1))+
  scale_y_continuous(name = "number of injuries/deaths from 1950 to 2011")

Tornadoes are followed by heat waves, which have the second most harmful effect in terms of both injuries and fatalities.

There are also many injuries related to wind conditions.

In relative terms (average injuries and fatalities per event), the top 10 contributors are:

ggplot(h.topm, aes(x=event,y=values, fill= effect)) + 
  facet_grid(analysis ~ effect, scales= "free") +
  geom_bar(stat="identity", colour= "black")  +
  theme(axis.text.x= element_text(angle= 90, hjust= 1)) +
  scale_y_continuous(name = "number of injuries/deaths per event")

In relative terms, the conditions that most consistently produce injuries are heat waves, wildfires and storms, with more than 40 people injured per event.

Economic Impact

In terms of economic damage, the analysis shows the following absolute (total recorded losses) and relative (average per event) economic losses for the top 5 event types:

ggplot(e.top , aes(x=event,y=values, fill= effect)) + 
  facet_grid(analysis ~ ., scales= "free") +
  geom_bar(stat="identity",colour = "black") +
  theme(axis.text.x= element_text(angle= 90, hjust= 1)) + 
  scale_y_continuous(name = "economic loss ($bill)")

As can be seen, the greatest economic losses from property destruction caused by weather conditions come from hurricanes, floods and storms, at around $200 million in losses per event.

Regarding crops, the most harmful conditions are extreme cold and wet weather, at around $100 million in losses per event.