The Effect of Weather Events on Population Health and Economy Based on the NOAA Storm Database

Synopsis: The goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. The report aims to identify which types of events are most harmful to population health (in terms of the number of fatalities and injuries) and to the economy (in terms of damage to property and crops). Based on this information, better forecasting and evacuation systems can be designed to reduce damage in the future.

Data Processing

First, I downloaded the file from the link given on the assignment page. Because the file is in bz2 format, I read it through a bzfile() connection, which decompresses it on the fly. Since reading the file takes a long time, I set the knitr chunk option cache = TRUE to save time during further debugging. I then used the str() and summary() functions to get an overview of the data.

if(!file.exists("StormData.csv.bz2")){
  download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "StormData.csv.bz2")}
data <- read.csv(bzfile("StormData.csv.bz2"),stringsAsFactors = FALSE)
# it takes a while to read 
str(data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
summary(data)
##     STATE__       BGN_DATE           BGN_TIME          TIME_ZONE        
##  Min.   : 1.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.:19.0   Class :character   Class :character   Class :character  
##  Median :30.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :31.2                                                           
##  3rd Qu.:45.0                                                           
##  Max.   :95.0                                                           
##                                                                         
##      COUNTY     COUNTYNAME           STATE              EVTYPE         
##  Min.   :  0   Length:902297      Length:902297      Length:902297     
##  1st Qu.: 31   Class :character   Class :character   Class :character  
##  Median : 75   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :101                                                           
##  3rd Qu.:131                                                           
##  Max.   :873                                                           
##                                                                        
##    BGN_RANGE      BGN_AZI           BGN_LOCATI          END_DATE        
##  Min.   :   0   Length:902297      Length:902297      Length:902297     
##  1st Qu.:   0   Class :character   Class :character   Class :character  
##  Median :   0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :   1                                                           
##  3rd Qu.:   1                                                           
##  Max.   :3749                                                           
##                                                                         
##    END_TIME           COUNTY_END COUNTYENDN       END_RANGE  
##  Length:902297      Min.   :0    Mode:logical   Min.   :  0  
##  Class :character   1st Qu.:0    NA's:902297    1st Qu.:  0  
##  Mode  :character   Median :0                   Median :  0  
##                     Mean   :0                   Mean   :  1  
##                     3rd Qu.:0                   3rd Qu.:  0  
##                     Max.   :0                   Max.   :925  
##                                                              
##    END_AZI           END_LOCATI            LENGTH           WIDTH     
##  Length:902297      Length:902297      Min.   :   0.0   Min.   :   0  
##  Class :character   Class :character   1st Qu.:   0.0   1st Qu.:   0  
##  Mode  :character   Mode  :character   Median :   0.0   Median :   0  
##                                        Mean   :   0.2   Mean   :   8  
##                                        3rd Qu.:   0.0   3rd Qu.:   0  
##                                        Max.   :2315.0   Max.   :4400  
##                                                                       
##        F               MAG          FATALITIES     INJURIES     
##  Min.   :0        Min.   :    0   Min.   :  0   Min.   :   0.0  
##  1st Qu.:0        1st Qu.:    0   1st Qu.:  0   1st Qu.:   0.0  
##  Median :1        Median :   50   Median :  0   Median :   0.0  
##  Mean   :1        Mean   :   47   Mean   :  0   Mean   :   0.2  
##  3rd Qu.:1        3rd Qu.:   75   3rd Qu.:  0   3rd Qu.:   0.0  
##  Max.   :5        Max.   :22000   Max.   :583   Max.   :1700.0  
##  NA's   :843563                                                 
##     PROPDMG      PROPDMGEXP           CROPDMG       CROPDMGEXP       
##  Min.   :   0   Length:902297      Min.   :  0.0   Length:902297     
##  1st Qu.:   0   Class :character   1st Qu.:  0.0   Class :character  
##  Median :   0   Mode  :character   Median :  0.0   Mode  :character  
##  Mean   :  12                      Mean   :  1.5                     
##  3rd Qu.:   0                      3rd Qu.:  0.0                     
##  Max.   :5000                      Max.   :990.0                     
##                                                                      
##      WFO             STATEOFFIC         ZONENAMES            LATITUDE   
##  Length:902297      Length:902297      Length:902297      Min.   :   0  
##  Class :character   Class :character   Class :character   1st Qu.:2802  
##  Mode  :character   Mode  :character   Mode  :character   Median :3540  
##                                                           Mean   :2875  
##                                                           3rd Qu.:4019  
##                                                           Max.   :9706  
##                                                           NA's   :47    
##    LONGITUDE        LATITUDE_E     LONGITUDE_       REMARKS         
##  Min.   :-14451   Min.   :   0   Min.   :-14455   Length:902297     
##  1st Qu.:  7247   1st Qu.:   0   1st Qu.:     0   Class :character  
##  Median :  8707   Median :   0   Median :     0   Mode  :character  
##  Mean   :  6940   Mean   :1452   Mean   :  3509                     
##  3rd Qu.:  9605   3rd Qu.:3549   3rd Qu.:  8735                     
##  Max.   : 17124   Max.   :9706   Max.   :106220                     
##                   NA's   :40                                        
##      REFNUM      
##  Min.   :     1  
##  1st Qu.:225575  
##  Median :451149  
##  Mean   :451149  
##  3rd Qu.:676723  
##  Max.   :902297  
## 
# Check how many distinct weather event types there are
eventType <- unique(data$EVTYPE)
length(eventType)
## [1] 985
# 985 distinct event types in total
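
The compressed-file reading step described under Data Processing can be tried on a small scale. The following is a minimal sketch, assuming a tiny made-up table rather than the real Storm Database; the names tmp, toy and toyBack are illustrative only.

```r
# Write a tiny made-up table to a temporary bz2-compressed CSV file
tmp <- tempfile(fileext = ".csv.bz2")
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  FATALITIES = c(1, 0),
                  stringsAsFactors = FALSE)
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() accepts the bzfile() connection directly,
# so no separate decompression step is needed
toyBack <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)
str(toyBack)

# Remove the temporary file
unlink(tmp)
```

The same pattern applies to the full file; only the read time differs, which is why caching the chunk helps.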

Results

First question: Which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

The impact on population health can be inferred from the FATALITIES and INJURIES columns, which hold the number of fatalities and the number of injuries for each event. Both are numeric, as the str() output above shows (FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...; INJURIES: num 15 0 2 2 2 6 1 0 14 0 ...).

# Use tapply to calculate the total number of fatalities and injuries per weather type
populationDamage <- with(data, tapply(FATALITIES + INJURIES, EVTYPE, sum))
# Sort the totals in descending order
populationDamage <- populationDamage[order(populationDamage, decreasing = TRUE)]
# Make a bar plot of the five most harmful event types
barplot(head(populationDamage, n = 5), main = "Population Damage caused by weather type", xlab = "Type of weather", ylab = "Sum of Fatalities and injuries number", col = "gold", cex.names = 0.7)

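[Figure: bar plot "Population Damage caused by weather type" for the five event types with the highest sum of fatalities and injuries]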

As the bar plot shows, tornadoes cause the most harm to population health, measured as the combined number of fatalities and injuries.
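
The aggregation pattern used for this question (sum per event type with tapply(), then sort in descending order) can be tried on a handful of rows. The sketch below uses made-up numbers, not values from the NOAA database; toy and toyHarm are illustrative names.

```r
# Toy data (made-up values, not from the NOAA database)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(1, 0, 2, 3, 1),
  INJURIES   = c(15, 2, 6, 0, 4),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries within each event type
toyHarm <- with(toy, tapply(FATALITIES + INJURIES, EVTYPE, sum))

# Sort the totals in descending order
toyHarm <- toyHarm[order(toyHarm, decreasing = TRUE)]
print(toyHarm)
# TORNADO = 24, FLOOD = 7, HEAT = 3
```

Adding the two columns before aggregating weights one fatality the same as one injury, which is the convention this report follows.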


Second question: Which types of events have the greatest economic consequences?

The economic damage can be inferred from PROPDMG and CROPDMG, which stand for property damage and crop damage. The unit of each damage figure is given in the columns PROPDMGEXP and CROPDMGEXP.

# First check the unique values of PROPDMGEXP and CROPDMGEXP
unique(data$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
unique(data$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"
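
unique() lists the distinct unit codes but not how often each one occurs. As a small sketch, table() gives the frequencies, which helps judge how much the unusual codes matter. The vector below is made up for illustration and is not taken from the database.

```r
# Made-up sample of unit codes (not taken from the database)
codes <- c("K", "K", "M", "", "k", "B", "?", "K", "")

# Count each code, treating lower and upper case as the same unit
codeCounts <- table(toupper(codes))

# Show the most frequent codes first
print(codeCounts[order(codeCounts, decreasing = TRUE)])
```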

A data transformation is necessary to bring all amounts to a common unit.

B is for billion -> 10^9

M is for million -> 10^6

K is for thousand -> 10^3

H is for hundred -> 10^2

For readability, we convert all damage costs to million USD and save them in the new variables PROPDMG2 and CROPDMG2.
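
The conversion can be checked by hand on a few amounts. The sketch below uses a named lookup vector in base R instead of the car package, and covers only the four letter codes listed above; expLookup and toMillionUSD are hypothetical names introduced for illustration.

```r
# Exponent for each letter code, as listed in the text: B = 9, M = 6, K = 3, H = 2
expLookup <- c(B = 9, M = 6, K = 3, H = 2)

# Hypothetical helper: convert an amount and its unit code to million USD.
# Lower-case codes are treated like upper-case ones; any other code yields NA.
toMillionUSD <- function(amount, unitCode) {
  exponent <- expLookup[toupper(unitCode)]
  unname(amount * 10^exponent / 10^6)
}

toMillionUSD(25, "K")    # 25 thousand USD = 0.025 million USD
toMillionUSD(2.5, "m")   # 2.5 million USD = 2.5 million USD
toMillionUSD(1.2, "B")   # 1.2 billion USD = 1200 million USD
```

For example, the first record in the data has PROPDMG 25 with PROPDMGEXP "K", that is 25 * 10^3 / 10^6 = 0.025 million USD.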

# Use recode() from the car package to convert B-M-K-H to the exponents 9-6-3-2,
# then convert the result to numeric. Digit codes ("0" to "8") are not recoded and
# are used as exponents as they are. Note that '+', '-', '?' and '' are mapped to
# exponent 1, i.e. a multiplier of 10.
library(car)
expRecodes <- "'B'=9; 'b'=9; 'M'=6; 'm'=6; 'K'=3; 'k'=3; 'H'=2; 'h'=2; '+'=1; '-'=1; '?'=1; ''=1"

data$PROPDMGEXP2 <- as.numeric(recode(data$PROPDMGEXP, expRecodes))
# Property damage in million USD
data$PROPDMG2 <- data$PROPDMG * (10^data$PROPDMGEXP2) / (10^6)

data$CROPDMGEXP2 <- as.numeric(recode(data$CROPDMGEXP, expRecodes))
# Crop damage in million USD
data$CROPDMG2 <- data$CROPDMG * (10^data$CROPDMGEXP2) / (10^6)

economicDamage <- with(data,tapply(CROPDMG2+PROPDMG2, EVTYPE,sum))
economicDamage <- economicDamage[order(economicDamage, decreasing=TRUE)]
head(economicDamage,n=5)
##             FLOOD HURRICANE/TYPHOON           TORNADO       STORM SURGE 
##            150320             71914             57362             43324 
##              HAIL 
##             18761
barplot(head(economicDamage, n= 5),main="Total Economic Damage caused by weather type in million USD", xlab="Type of weather", ylab="Sum of Crop damage and Property damage in million USD",col="gold",cex.names=0.7)

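[Figure: bar plot "Total Economic Damage caused by weather type in million USD" for the five event types with the highest sum of crop and property damage]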

Based on the bar plot, floods cause the most severe economic damage, followed by hurricanes/typhoons and tornadoes.