Synopsis

This report analyses the NOAA Storm Database (1950 - 2011) to determine the effects of severe weather events. We examine fatalities and injuries to assess the harm to population health, and we examine the property and crop damage of each event to assess the economic impact.

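The data reveal that tornadoes are the most harmful event type with respect to population health, with 96,979 fatalities and injuries combined, and that floods have the greatest economic consequences, with about 144.7 billion dollars in property damage.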

Data Introduction

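According to the National Weather Service Instruction, the NOAA Storm Database is an official publication of the National Oceanic and Atmospheric Administration (NOAA) which documents the occurrence of storms and other significant weather phenomena, including those intense enough to cause loss of life, injuries, significant property damage, or disruption to commerce.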

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

Data Processing

First, we download and read the compressed database:

library(dplyr)
library(knitr)

# Create directory 'data'
if (!file.exists('data'))
{
    dir.create('data')
}

# Download the bzip2 database
fileUrl <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
if (!file.exists('./data/StormData.csv.bz2'))
{
    download.file(fileUrl, destfile='./data/StormData.csv.bz2',method='curl')
    dateDownloaded <- date()
}

data <- read.csv(bzfile('./data/StormData.csv.bz2'))

Let’s check the structure of the database:

str(data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels ""," Christiansburg",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels ""," CANTON"," TULIA",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","%SD",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","\t","\t\t",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
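As a rough check on how record coverage changes over time, the BGN_DATE column shown above can be tallied by year. The following is a minimal sketch on a few made-up dates rather than the full database; the names bgn_date, event_year, and events_per_year are illustrative and not part of the original analysis.

```r
# Toy stand-in for as.character(data$BGN_DATE); these dates are made up
# for illustration and are not taken from the database.
bgn_date <- c("4/18/1950 0:00:00", "2/20/1951 0:00:00",
              "6/8/1951 0:00:00", "11/28/2011 0:00:00")

# The format describes only the date part, so the trailing time is ignored.
event_year <- as.integer(format(as.Date(bgn_date, format = "%m/%d/%Y"), "%Y"))

# Number of recorded events per year
events_per_year <- table(event_year)
print(events_per_year)
```

On the full data set, the same steps would start from as.character(data$BGN_DATE), since the column is read as a factor here.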

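According to the official National Weather Service Instruction, the following columns are the most important for our analysis: EVTYPE (event type), FATALITIES, INJURIES, PROPDMG and PROPDMGEXP (property damage and its unit), and CROPDMG and CROPDMGEXP (crop damage and its unit).

The official document states that, for the unit of property and crop damage, ‘K’ means thousand, ‘M’ means million, and ‘B’ means billion.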

Let’s check the characters in PROPDMGEXP and CROPDMGEXP columns:

unique(sort(data$PROPDMGEXP))
##  [1]   - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
unique(sort(data$CROPDMGEXP))
## [1]   ? 0 2 B k K m M
## Levels:  ? 0 2 B k K m M

We can see that there are some characters other than ‘K’, ‘M’, ‘B’ and their lower-case forms. We treat ‘H’ or ‘h’ as hundred, and records with any other character are assigned a damage value of 0.

Because we want to find which event type caused the greatest economic impact, we first convert the property damage from its compressed form (value plus unit) to a dollar value:

# Convert PROPDMG to dollars; unrecognised unit codes keep a value of 0
data$REALDMG <- 0
# 'H'/'h' means hundred, so the multiplier is 100
hundred <- data$PROPDMGEXP %in% c('H', 'h')
data$REALDMG[hundred] <- 100 * data$PROPDMG[hundred]
thousand <- data$PROPDMGEXP %in% c('K', 'k')
data$REALDMG[thousand] <- 1000 * data$PROPDMG[thousand]
million <- data$PROPDMGEXP %in% c('M', 'm')
data$REALDMG[million] <- 1000000 * data$PROPDMG[million]
billion <- data$PROPDMGEXP %in% c('B', 'b')
data$REALDMG[billion] <- 1000000000 * data$PROPDMG[billion]
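The unit conversion can also be written as a single lookup, which makes it easy to apply the same rule to crop damage. This is only a sketch under the rules stated above (‘H’ hundred, ‘K’ thousand, ‘M’ million, ‘B’ billion, anything else 0); the helper exp_to_multiplier, the toy data frame, and the column name CROPREAL are hypothetical and do not appear in the original analysis.

```r
# Hypothetical helper: map a damage unit code to its numeric multiplier.
# Codes are matched case-insensitively; unrecognised codes map to 0.
exp_to_multiplier <- function(exp_code) {
    multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
    m <- multipliers[toupper(as.character(exp_code))]
    m[is.na(m)] <- 0
    unname(m)
}

# Toy rows standing in for the CROPDMG / CROPDMGEXP columns (made-up values)
toy <- data.frame(CROPDMG    = c(2.5, 10, 3, 7, 4),
                  CROPDMGEXP = c("K", "m", "", "?", "B"))

# Dollar value = reported value times the multiplier of its unit code
toy$CROPREAL <- toy$CROPDMG * exp_to_multiplier(toy$CROPDMGEXP)
print(toy)
```

Applied to the real data, the same call would take data$CROPDMG and data$CROPDMGEXP (or the PROPDMG pair) in place of the toy columns.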

We then group the data by event type and sum the FATALITIES and INJURIES columns for each group, so that we can see which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health.

event_group<-group_by(data, EVTYPE)
total_fatainj<-summarize(event_group, sum(INJURIES, na.rm=TRUE), sum(FATALITIES, na.rm=TRUE), sum(INJURIES, FATALITIES, na.rm=TRUE))
colnames(total_fatainj)<-c("event","inj","fata","inj_fata")

total_fatainj<-total_fatainj[order(total_fatainj$inj_fata, decreasing = TRUE),]

total_fatainj<-total_fatainj[1:5,]
# give ID to each event
total_fatainj$id<-1:nrow(total_fatainj)

# Plot the combined total so the y-axis matches its label
plot(total_fatainj$id, total_fatainj$inj_fata, xlab="ID of events", ylab="Fatalities and Injuries", main="Total fatalities and injuries for each event type")

head(total_fatainj)
## Source: local data frame [5 x 5]
## 
##            event   inj fata inj_fata id
## 1        TORNADO 91346 5633    96979  1
## 2 EXCESSIVE HEAT  6525 1903     8428  2
## 3      TSTM WIND  6957  504     7461  3
## 4          FLOOD  6789  470     7259  4
## 5      LIGHTNING  5230  816     6046  5

The figure and table above show the five most harmful event types. In particular, the most harmful event type is TORNADO, which caused a total of 96,979 fatalities and injuries in the United States from 1950 to 2011.
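Because the scatter plot identifies events only by ID, a bar chart labelled with the event names may be easier to read. The following is a minimal sketch that re-enters the five totals from the table above by hand; the names top5 and bar_mids are illustrative and not part of the original analysis.

```r
# Top five event types and their combined injuries and fatalities,
# copied from the table printed above
top5 <- data.frame(
    event    = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING"),
    inj_fata = c(96979, 8428, 7461, 7259, 6046)
)

# barplot() returns the x positions of the bar midpoints;
# las = 2 rotates the event names so they do not overlap
bar_mids <- barplot(top5$inj_fata, names.arg = top5$event,
                    las = 2, cex.names = 0.7,
                    ylab = "Fatalities and Injuries",
                    main = "Total fatalities and injuries for each event type")
```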

We then summarize the economic impact of each event type, so that we can see which types of events have the greatest economic consequences.

total_dmg<-summarize(event_group, sum(REALDMG,na.rm=TRUE))
colnames(total_dmg)<-c("event","total_dmg")

total_dmg<-total_dmg[order(total_dmg$total_dmg, decreasing = TRUE),]
total_dmg<-total_dmg[1:5,]

# give ID to each event
total_dmg$id<-1:nrow(total_dmg)

# Row with the largest total damage (compare against the column, not the whole data frame)
mostdmg<-total_dmg[total_dmg$total_dmg==max(total_dmg$total_dmg),]
plot(total_dmg$id, total_dmg$total_dmg, xlab="ID of events", ylab="Economic Impact ($)", main="Total property damage for each event type")

head(total_dmg)
## Source: local data frame [5 x 3]
## 
##                event    total_dmg id
## 1              FLOOD 144657709800  1
## 2  HURRICANE/TYPHOON  69305840000  2
## 3            TORNADO  56937160480  3
## 4        STORM SURGE  43323536000  4
## 5 THUNDERSTORM WINDS  21735952850  5

The figure and table above show that the greatest economic consequences are caused by FLOOD, which caused a total of 144,657,709,800 dollars in property damage in the United States from 1950 to 2011.
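The ranking can be cross-checked without dplyr by using base R's aggregate() and order(). This is a sketch on a small made-up data frame, assuming the same EVTYPE and REALDMG columns as above; the names toy, totals, and top are illustrative only.

```r
# Made-up rows standing in for the EVTYPE and REALDMG columns
toy <- data.frame(
    EVTYPE  = c("FLOOD", "TORNADO", "FLOOD", "HAIL"),
    REALDMG = c(1000, 500, 250, 100)
)

# Sum the damage for each event type
totals <- aggregate(REALDMG ~ EVTYPE, data = toy, FUN = sum)

# Sort from largest to smallest and keep at most the top five
totals <- totals[order(totals$REALDMG, decreasing = TRUE), ]
top <- head(totals, 5)
print(top)
```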

Results

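According to the data processing section above, tornadoes are the event type most harmful to population health, causing a total of 96,979 fatalities and injuries from 1950 to 2011, and floods have the greatest economic consequences, causing 144,657,709,800 dollars in property damage over the same period.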