Synopsis

This data analysis identifies which type of severe weather event in the United States (i) is the most harmful to population health and (ii) has the greatest economic consequences. For the former, we total the fatalities and injuries attributed to each type of weather event. For the latter, we total the cost of property and crop damage attributed to each type. These findings can help in preparing for severe weather events and in prioritising resources across different types of events. We find that tornadoes have the greatest impact on population health, while floods cause the most economic damage.

Data Processing

This project explores the US National Oceanic and Atmospheric Administration’s storm database. The database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries and property damage. This section details the steps by which we clean and process the data for further analysis.

  1. Load the dataset
original_data <- read.csv("repdata-data-StormData.csv.bz2")
head(original_data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
  2. Simplify the dataset by keeping only the columns of use
subset_data <- original_data[, c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
head(subset_data)
##             BGN_DATE  EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP
## 1  4/18/1950 0:00:00 TORNADO          0       15    25.0          K
## 2  4/18/1950 0:00:00 TORNADO          0        0     2.5          K
## 3  2/20/1951 0:00:00 TORNADO          0        2    25.0          K
## 4   6/8/1951 0:00:00 TORNADO          0        2     2.5          K
## 5 11/15/1951 0:00:00 TORNADO          0        2     2.5          K
## 6 11/15/1951 0:00:00 TORNADO          0        6     2.5          K
##   CROPDMG CROPDMGEXP
## 1       0           
## 2       0           
## 3       0           
## 4       0           
## 5       0           
## 6       0
  3. Clean the dataset: determine the exact magnitude of property and crop damage
# Map exponent codes to multipliers: K = 1e3, M = 1e6, B = 1e9.
# Any other code (blank, lowercase, digit or symbol) becomes 0, so those rows contribute no damage.
subset_data$PROPDMGEXP <- ifelse(subset_data$PROPDMGEXP == "K", 1000, ifelse(subset_data$PROPDMGEXP == "M", 1000000, ifelse(subset_data$PROPDMGEXP == "B", 1000000000, 0)))
subset_data$CROPDMGEXP <- ifelse(subset_data$CROPDMGEXP == "K", 1000, ifelse(subset_data$CROPDMGEXP == "M", 1000000, ifelse(subset_data$CROPDMGEXP == "B", 1000000000, 0)))
subset_data$propdamage <- subset_data$PROPDMG*subset_data$PROPDMGEXP
subset_data$cropdamage <- subset_data$CROPDMG*subset_data$CROPDMGEXP
  4. Finalise the dataset by summing fatalities and injuries, and property and crop damage, to analyse the consequences for population health and the economy respectively
subset_data$HEALTH <- subset_data$FATALITIES + subset_data$INJURIES
subset_data$ECONOMY <- subset_data$propdamage + subset_data$cropdamage
final_data <- subset_data[, c("BGN_DATE", "EVTYPE", "HEALTH", "ECONOMY")]
head(final_data)
##             BGN_DATE  EVTYPE HEALTH ECONOMY
## 1  4/18/1950 0:00:00 TORNADO     15   25000
## 2  4/18/1950 0:00:00 TORNADO      0    2500
## 3  2/20/1951 0:00:00 TORNADO      2   25000
## 4   6/8/1951 0:00:00 TORNADO      2    2500
## 5 11/15/1951 0:00:00 TORNADO      2    2500
## 6 11/15/1951 0:00:00 TORNADO      6    2500
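
The multiplier conversion above counts only the uppercase codes K, M and B; every other code becomes zero. As a minimal sketch of an alternative, assuming one wants lowercase codes to carry the same meaning as uppercase ones, a named lookup vector can replace the nested ifelse calls. The helper name exp_to_multiplier and the toy codes are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: map exponent codes to numeric multipliers.
# Assumption: lowercase codes mean the same as uppercase ones;
# any unrecognised code (blank, digits, symbols) still maps to 0.
exp_to_multiplier <- function(codes) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(codes))]  # NA where no name matches
  mult[is.na(mult)] <- 0
  unname(mult)
}

# Toy input for illustration, not taken from the storm database
toy_codes <- c("K", "m", "B", "", "5")
toy_multipliers <- exp_to_multiplier(toy_codes)
```

Whether lowercase codes should be counted is a judgment call about the source data, and doing so may change the damage totals reported here.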

Results

We total fatalities and injuries by type of weather event to measure each type's consequence for population health. The figure below shows the 10 weather events with the highest combined fatalities and injuries across the US.

health <- tapply(final_data$HEALTH, final_data$EVTYPE, FUN=sum)
descending_health <- sort(health, decreasing = TRUE)
top10_health <- head(descending_health, n=10)
par(mar=c(4,8,2,1))
barplot(top10_health, horiz = TRUE, xlab = "Sum of Fatalities and Injuries", main = "Top 10 Weather Events Affecting Health", cex.names=0.7, las=1)

The weather event with the greatest impact on population health is the tornado.
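
To make the aggregation pattern concrete, the following sketch applies the same tapply, sort and head steps to a small made-up data frame. The name toy_data and its values are illustrative assumptions, not figures from the storm database.

```r
# Made-up data for illustration only
toy_data <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  HEALTH = c(15, 3, 2, 0, 1)
)

# Sum HEALTH within each event type
toy_totals <- tapply(toy_data$HEALTH, toy_data$EVTYPE, FUN = sum)

# Rank from largest to smallest and keep the top two
toy_top <- head(sort(toy_totals, decreasing = TRUE), n = 2)
toy_top
```

Because tapply groups on the exact EVTYPE label, two spellings of the same event would be totalled separately.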

We total property and crop damage by type of weather event to measure each type's consequence for the economy. The figure below shows the 10 weather events with the highest combined property and crop damage across the US.

damage <- tapply(final_data$ECONOMY, final_data$EVTYPE, FUN=sum)
descending_damage <- sort(damage, decreasing = TRUE)
top10_damage <- head(descending_damage, n=10)
par(mar=c(4,8,2,1))
barplot(top10_damage, horiz = TRUE, xlab = "Sum of Property and Crop Damage ($)", main = "Top 10 Weather Events Affecting Economy", cex.names=0.7, las=1)

The weather event with the greatest impact on the economy is the flood.