This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
In the following sections, I take a quick look at the whole dataset and then focus on the damage that weather events cause to population health and to the economy.

Load the required package (plyr provides the mutate() function used below):

library(plyr)

Reading the dataset and quick summary

First of all, I download the data using the URL and read it into R as a data frame.

filename <- "repdata_data_StormData.csv.bz2"
fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists(filename)){
        download.file(fileURL, filename, method = "curl")
}
stormData <- read.csv(filename, sep = ",", header = TRUE)

To get a first impression of the data, we can look at the column names to see what kind of data we have by calling the “names()” function:

names(stormData)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

We can also look at a summary of every column:

summary(stormData)
##     STATE__       BGN_DATE           BGN_TIME          TIME_ZONE        
##  Min.   : 1.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.:19.0   Class :character   Class :character   Class :character  
##  Median :30.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :31.2                                                           
##  3rd Qu.:45.0                                                           
##  Max.   :95.0                                                           
##                                                                         
##      COUNTY       COUNTYNAME           STATE              EVTYPE         
##  Min.   :  0.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.: 31.0   Class :character   Class :character   Class :character  
##  Median : 75.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :100.6                                                           
##  3rd Qu.:131.0                                                           
##  Max.   :873.0                                                           
##                                                                          
##    BGN_RANGE          BGN_AZI           BGN_LOCATI          END_DATE        
##  Min.   :   0.000   Length:902297      Length:902297      Length:902297     
##  1st Qu.:   0.000   Class :character   Class :character   Class :character  
##  Median :   0.000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :   1.484                                                           
##  3rd Qu.:   1.000                                                           
##  Max.   :3749.000                                                           
##                                                                             
##    END_TIME           COUNTY_END COUNTYENDN       END_RANGE       
##  Length:902297      Min.   :0    Mode:logical   Min.   :  0.0000  
##  Class :character   1st Qu.:0    NA's:902297    1st Qu.:  0.0000  
##  Mode  :character   Median :0                   Median :  0.0000  
##                     Mean   :0                   Mean   :  0.9862  
##                     3rd Qu.:0                   3rd Qu.:  0.0000  
##                     Max.   :0                   Max.   :925.0000  
##                                                                   
##    END_AZI           END_LOCATI            LENGTH              WIDTH         
##  Length:902297      Length:902297      Min.   :   0.0000   Min.   :   0.000  
##  Class :character   Class :character   1st Qu.:   0.0000   1st Qu.:   0.000  
##  Mode  :character   Mode  :character   Median :   0.0000   Median :   0.000  
##                                        Mean   :   0.2301   Mean   :   7.503  
##                                        3rd Qu.:   0.0000   3rd Qu.:   0.000  
##                                        Max.   :2315.0000   Max.   :4400.000  
##                                                                              
##        F               MAG            FATALITIES          INJURIES        
##  Min.   :0.0      Min.   :    0.0   Min.   :  0.0000   Min.   :   0.0000  
##  1st Qu.:0.0      1st Qu.:    0.0   1st Qu.:  0.0000   1st Qu.:   0.0000  
##  Median :1.0      Median :   50.0   Median :  0.0000   Median :   0.0000  
##  Mean   :0.9      Mean   :   46.9   Mean   :  0.0168   Mean   :   0.1557  
##  3rd Qu.:1.0      3rd Qu.:   75.0   3rd Qu.:  0.0000   3rd Qu.:   0.0000  
##  Max.   :5.0      Max.   :22000.0   Max.   :583.0000   Max.   :1700.0000  
##  NA's   :843563                                                           
##     PROPDMG         PROPDMGEXP           CROPDMG         CROPDMGEXP       
##  Min.   :   0.00   Length:902297      Min.   :  0.000   Length:902297     
##  1st Qu.:   0.00   Class :character   1st Qu.:  0.000   Class :character  
##  Median :   0.00   Mode  :character   Median :  0.000   Mode  :character  
##  Mean   :  12.06                      Mean   :  1.527                     
##  3rd Qu.:   0.50                      3rd Qu.:  0.000                     
##  Max.   :5000.00                      Max.   :990.000                     
##                                                                           
##      WFO             STATEOFFIC         ZONENAMES            LATITUDE   
##  Length:902297      Length:902297      Length:902297      Min.   :   0  
##  Class :character   Class :character   Class :character   1st Qu.:2802  
##  Mode  :character   Mode  :character   Mode  :character   Median :3540  
##                                                           Mean   :2875  
##                                                           3rd Qu.:4019  
##                                                           Max.   :9706  
##                                                           NA's   :47    
##    LONGITUDE        LATITUDE_E     LONGITUDE_       REMARKS         
##  Min.   :-14451   Min.   :   0   Min.   :-14455   Length:902297     
##  1st Qu.:  7247   1st Qu.:   0   1st Qu.:     0   Class :character  
##  Median :  8707   Median :   0   Median :     0   Mode  :character  
##  Mean   :  6940   Mean   :1452   Mean   :  3509                     
##  3rd Qu.:  9605   3rd Qu.:3549   3rd Qu.:  8735                     
##  Max.   : 17124   Max.   :9706   Max.   :106220                     
##                   NA's   :40                                        
##      REFNUM      
##  Min.   :     1  
##  1st Qu.:225575  
##  Median :451149  
##  Mean   :451149  
##  3rd Qu.:676723  
##  Max.   :902297  
## 

Data processing

We extract and process the data needed to answer the following two questions.

Q1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

To answer this question, we first need to extract the relevant data. Looking through the column names, only “FATALITIES” and “INJURIES” relate to population health, so we sum those counts by event type:

fatal_byType <- aggregate(FATALITIES ~ EVTYPE, stormData, sum)
injury_byType <- aggregate(INJURIES ~ EVTYPE, stormData, sum )

Sort the data in descending order:

fatal_byType_Sorted <- fatal_byType[order(-fatal_byType$FATALITIES),]
injury_byType_Sorted <- injury_byType[order(-injury_byType$INJURIES),]
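
To make the aggregate-then-sort pattern concrete, here is a minimal sketch on a small made-up data frame. The `toy` values are invented for illustration and are not taken from the storm database; only the column names mirror the real data.

```r
# Toy data: made-up values, NOT from the NOAA storm database
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0)
)

# Sum fatalities within each event type (one row per EVTYPE)
toy_byType <- aggregate(FATALITIES ~ EVTYPE, toy, sum)

# Negating the column makes order() sort from largest to smallest
toy_sorted <- toy_byType[order(-toy_byType$FATALITIES), ]
toy_sorted
```

The row names printed next to each event type are the row positions in the unsorted aggregate, which is why the result tables in this report show numbers such as 834 beside TORNADO.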

Q2: Across the United States, which types of events have the greatest economic consequences?

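In the same way, we first extract the relevant columns, “PROPDMG” (property damage) and “CROPDMG” (crop damage). One important detail is the accompanying “PROPDMGEXP” and “CROPDMGEXP” columns, which hold a magnitude code for each amount: H for hundreds, K for thousands, M for millions, and B for billions. The code below multiplies each amount by the factor for its code: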

stormData <- mutate(stormData, propertyDMG = ifelse(toupper(PROPDMGEXP) =='H', PROPDMG*1e+02, 
                                                        ifelse(toupper(PROPDMGEXP) =='K', PROPDMG*1e+03, 
                                                                ifelse(toupper(PROPDMGEXP) == 'M', PROPDMG*1e+06, 
                                                                        ifelse(toupper(PROPDMGEXP) == 'B', PROPDMG*1e+09, PROPDMG)))))
## Scale crop damage by its exponent code; any other code leaves CROPDMG unscaled
stormData <- mutate(stormData, cropDMG = ifelse(toupper(CROPDMGEXP) =='H', CROPDMG*1e+02, 
                                                       ifelse(toupper(CROPDMGEXP) =='K', CROPDMG*1e+03, 
                                                              ifelse(toupper(CROPDMGEXP) == 'M', CROPDMG*1e+06, 
                                                                     ifelse(toupper(CROPDMGEXP) == 'B', CROPDMG*1e+09, CROPDMG)))))
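
The same exponent scaling can also be written with a named lookup vector, which may be easier to read and reuse than nested ifelse() calls. The sketch below is only an illustration: `exp_multiplier` is a hypothetical helper that is not part of the original analysis, and the `toy` values are made up. It assumes, as the code above does, that any code other than H, K, M, or B leaves the amount unscaled.

```r
# Hypothetical helper (not in the original analysis): exponent code -> multiplier
exp_multiplier <- function(exp_code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  # Index by name; codes without a matching name come back as NA
  mult <- lookup[toupper(as.character(exp_code))]
  # Assumption: blank or unrecognized codes mean "no scaling"
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Toy data: made-up amounts and codes, NOT from the NOAA storm database
toy <- data.frame(DMG = c(25, 2.5, 1, 10, 7),
                  EXP = c("K", "m", "B", "", "?"))
toy$value <- toy$DMG * exp_multiplier(toy$EXP)
toy
```

Because the helper takes the code column as its argument, one definition would serve both PROPDMGEXP and CROPDMGEXP. Running `table(stormData$PROPDMGEXP)` first is a quick way to see which codes actually occur before deciding how to treat the unrecognized ones.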

Since both kinds of damage matter equally, in addition to looking at them separately, I also add them up to get the total economic damage. As before, we sum these numbers by event type and sort them in descending order.

## Extracting and sum over types
propertyDMG_byType <- aggregate(propertyDMG ~ EVTYPE, stormData, sum)
cropDMG_byType <- aggregate(cropDMG ~ EVTYPE, stormData, sum)
totalDMG_byType <- merge(propertyDMG_byType, cropDMG_byType, by = "EVTYPE")
totalDMG_byType$TOTALDMG <- totalDMG_byType$propertyDMG + totalDMG_byType$cropDMG
## Sorting
propertyDMG_byType_Sorted <- propertyDMG_byType[order(-propertyDMG_byType$propertyDMG),]
cropDMG_byType_Sorted <- cropDMG_byType[order(-cropDMG_byType$cropDMG),]
totalDMG_byType_Sorted <- totalDMG_byType[order(-totalDMG_byType$TOTALDMG),]
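
As a possible shortcut, aggregate() can sum several columns at once when they are wrapped in cbind() on the left-hand side of the formula, which avoids the separate merge() step. The sketch below shows this on a made-up data frame; the `toy` values are invented, and this is an alternative formulation rather than the code used to produce the results in this report.

```r
# Toy data: made-up values, NOT from the NOAA storm database
toy <- data.frame(
  EVTYPE      = c("FLOOD", "DROUGHT", "FLOOD", "HAIL"),
  propertyDMG = c(100, 5, 50, 20),
  cropDMG     = c(10, 80, 5, 0)
)

# Sum both damage columns per event type in a single call
toy_total <- aggregate(cbind(propertyDMG, cropDMG) ~ EVTYPE, toy, sum)

# Total damage is the sum of the two per-type sums
toy_total$TOTALDMG <- toy_total$propertyDMG + toy_total$cropDMG

# Sort from largest to smallest total
toy_total <- toy_total[order(-toy_total$TOTALDMG), ]
toy_total
```

Note that merge() keeps only the keys present in both inputs by default (all = FALSE), so the two-step version relies on every event type appearing in both aggregates.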

Results

Q1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Here I show the top 10 weather events that cause fatalities and injuries:

fatal_byType_Sorted[1:10,]
##             EVTYPE FATALITIES
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
## 170          FLOOD        470
## 585    RIP CURRENT        368
## 359      HIGH WIND        248
## 19       AVALANCHE        224
injury_byType_Sorted[1:10,]
##                EVTYPE INJURIES
## 834           TORNADO    91346
## 856         TSTM WIND     6957
## 170             FLOOD     6789
## 130    EXCESSIVE HEAT     6525
## 464         LIGHTNING     5230
## 275              HEAT     2100
## 427         ICE STORM     1975
## 153       FLASH FLOOD     1777
## 760 THUNDERSTORM WIND     1488
## 244              HAIL     1361

To get a more intuitive impression, we can draw two bar plots showing the number of fatalities and injuries caused by the top 10 events:

par(mfrow = c(1, 2))
par(mar = c(10, 4, 4, 2), cex = 0.8, cex.main = 1.2, cex.lab = 1.2)
## las = 3 draws the event names vertically so they fit in the enlarged bottom margin
barplot(fatal_byType_Sorted$FATALITIES[1:10], names.arg = fatal_byType_Sorted$EVTYPE[1:10], col = 'blue', las = 3,
        main = 'Top 10 Weather Events for Fatalities', ylab = 'Number of Fatalities')
barplot(injury_byType_Sorted$INJURIES[1:10], names.arg = injury_byType_Sorted$EVTYPE[1:10], col = 'green', las = 3,
        main = 'Top 10 Weather Events for Injuries', ylab = 'Number of Injuries')

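The plots above make it clear that TORNADO is the event type most harmful to population health. In the top 10 tables it accounts for roughly 3 times the fatalities of the second-ranked event (EXCESSIVE HEAT) and about 13 times the injuries of the second-ranked event (TSTM WIND).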

Q2: Across the United States, which types of events have the greatest economic consequences?

Here I show the top 10 weather events by property damage, by crop damage, and by total damage:

propertyDMG_byType_Sorted[1:10,]
##                EVTYPE  propertyDMG
## 170             FLOOD 144657709807
## 411 HURRICANE/TYPHOON  69305840000
## 834           TORNADO  56937160779
## 670       STORM SURGE  43323536000
## 153       FLASH FLOOD  16140812067
## 244              HAIL  15732267543
## 402         HURRICANE  11868319010
## 848    TROPICAL STORM   7703890550
## 972      WINTER STORM   6688497251
## 359         HIGH WIND   5270046295
cropDMG_byType_Sorted[1:10,]
##                EVTYPE     cropDMG
## 95            DROUGHT 13972567047
## 170             FLOOD  5662327861
## 590       RIVER FLOOD  5029470938
## 427         ICE STORM  5022154924
## 244              HAIL  3026276714
## 402         HURRICANE  2741917138
## 411 HURRICANE/TYPHOON  2607874471
## 153       FLASH FLOOD  1422066007
## 140      EXTREME COLD  1292980301
## 212      FROST/FREEZE  1094086000
totalDMG_byType_Sorted[1:10,]
##                EVTYPE  propertyDMG      cropDMG     TOTALDMG
## 170             FLOOD 144657709807 5.662328e+09 150320037668
## 411 HURRICANE/TYPHOON  69305840000 2.607874e+09  71913714471
## 834           TORNADO  56937160779 4.175364e+08  57354697199
## 670       STORM SURGE  43323536000 2.408588e+04  43323560086
## 244              HAIL  15732267543 3.026277e+09  18758544256
## 153       FLASH FLOOD  16140812067 1.422066e+09  17562878074
## 95            DROUGHT   1046106000 1.397257e+10  15018673047
## 402         HURRICANE  11868319010 2.741917e+09  14610236148
## 590       RIVER FLOOD   5118945500 5.029471e+09  10148416438
## 427         ICE STORM   3944927860 5.022155e+09   8967082784

To get a more intuitive impression, we can draw bar plots showing the damage caused by the top 10 events in each category:

par(mfrow = c(1, 3))
par(mar = c(10, 4, 4, 2), cex = 0.8, cex.main = 1.2, cex.lab = 1.2)
## las = 3 draws the event names vertically so they fit in the enlarged bottom margin
barplot(propertyDMG_byType_Sorted$propertyDMG[1:10], names.arg = propertyDMG_byType_Sorted$EVTYPE[1:10], col = 'blue', las = 3,
        main = 'Top 10 Property damages', ylab = 'Property Damages')
barplot(cropDMG_byType_Sorted$cropDMG[1:10], names.arg = cropDMG_byType_Sorted$EVTYPE[1:10], col = 'green', las = 3,
        main = 'Top 10 Crop damages', ylab = 'Crop Damages')
barplot(totalDMG_byType_Sorted$TOTALDMG[1:10], names.arg = totalDMG_byType_Sorted$EVTYPE[1:10], col = 'orange', las = 3,
        main = 'Top 10 Total damages', ylab = 'Total Damages')

As shown in the plots, FLOOD causes the most property damage (about 145 billion dollars), while DROUGHT causes the most crop damage (about 14 billion dollars). Overall, FLOOD has the greatest economic consequences, with about 150 billion dollars in total damage.