Synopsis

In this report, I analyse the NOAA Storm Events data to answer two questions:

1. Across the United States, which types of events are most harmful with respect to population health?

2. Across the United States, which types of events have the greatest economic consequences?

The raw data span 1950 to November 2011 and contain 902297 observations of 37 variables each. According to the documentation, only 7 of these variables are relevant to the two questions: they record the event type, fatalities, injuries, and economic damage. This analysis focuses on them only.

Data Processing

Importing data

Settings and libraries that may be used

  library("knitr") 
  library("ggplot2")
  library("plotrix")
  library("lubridate")
  library("chron")
  library("dplyr")
  library("tidyr")
  library("data.table")
  library("datasets")
  library("lattice")
  library("xtable")
  options(rpubs.upload.method = "internal")
  options(RCurlOptions = list(verbose = FALSE, capath = system.file("CurlSSL", "cacert.pem", package = "RCurl"), ssl.verifypeer = FALSE))
  opts_chunk$set(warning = F,error = F, message = F)

Reading raw data

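# `open` expects a mode string such as "r", not a file name; leaving it out
# lets read.csv open the bzip2 connection, decompress on the fly, and close it
DataFile <- bzfile(description = "repdata_data_StormData.csv.bz2")
Table <- read.csv(DataFile)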
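Reading a bzip2-compressed CSV needs no separate decompression step. The following is a minimal sketch on a throwaway file; the temporary file and its two rows are made up for illustration and are not part of the storm data.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (illustrative data only)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0)),
          con, row.names = FALSE)
close(con)

# read.csv opens the connection, decompresses while reading, and closes it again
demo <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)
print(demo)
```

Passing the file name directly, as in read.csv(tmp), should work as well, because R detects the compression when it opens the file.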

Because only 7 of the 37 variables are used, the others can safely be dropped. Furthermore, PROPDMGEXP and CROPDMGEXP can be eliminated by folding them into the PROPDMG and CROPDMG amounts as multipliers.

Table<- Table[c("EVTYPE","FATALITIES", "INJURIES","PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

Because the documentation defines only “B”, “K”, and “M”, I assume that “b”, “k”, and “m” mean the same as their upper-case forms. All other symbols are undefined, so the amount is left unscaled (multiplier of 1).

  # Map an exponent code to its numeric multiplier (1 for undefined codes)
  clean <- function(x){
    y <- rep(1, length(x))
    y[x %in% c("B","b")] <- 1000000000
    y[x %in% c("M","m")] <- 1000000
    y[x %in% c("K","k")] <- 1000
    return (y)
    }
  # Coerce to numeric and replace missing values with 0
  clear <- function(x){
    y <- as.numeric(x)
    y[is.na(y)] <- 0
    return (y)
    }
  Table$ECONOMIC<-clear(Table$PROPDMG)*clean(Table$PROPDMGEXP)+
                  clear(Table$CROPDMG)*clean(Table$CROPDMGEXP)
  Table <- Table[c("EVTYPE","FATALITIES", "INJURIES","ECONOMIC")]
  str(Table)
## 'data.frame':    902297 obs. of  4 variables:
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ ECONOMIC  : num  25000 2500 25000 2500 2500 2500 2500 2500 25000 25000 ...
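As a quick check of the multiplier rule, the sketch below applies the same K/M/B mapping to a few made-up amounts. The helper name exp_multiplier and the sample values are assumptions for illustration only; this is an alternative formulation using a named lookup vector, not the code used in the analysis.

```r
# Hypothetical helper: look up the multiplier for each exponent code.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(as.character(code))])
  mult[is.na(mult)] <- 1  # undefined or blank codes leave the amount unscaled
  mult
}

# Made-up amounts and codes, chosen to cover each case
amount <- c(25, 2.5, 1.2, 10)
code   <- c("K", "k", "B", "")
scaled <- amount * exp_multiplier(code)
print(scaled)  # 25000, 2500, 1.2e9 and 10
```

For example, an amount of 25 with code "K" becomes 25000, which agrees with the first ECONOMIC value in the str output.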

The next problem is that there are too many event types (985 levels). I merge related types into a few broad categories and put unclear names under "Other".

  Table$EVTYPE <- toupper(as.character(Table$EVTYPE))
  Table<-Table[!grepl("SUMMARY",Table$EVTYPE),]
  Table$EVTYPE[grepl("DUST|ASH|VOLCANIC|VOG", Table$EVTYPE)] <- "Dust/Vocano"
  Table$EVTYPE[grepl("TSTM|THUNDERSTORM|STORM|*SPOUT|HURRICANE|TYPHOON|TORNADO|TORNDAO|DOWNBURST|MICROBURST|WIND|WND|GUSTNADO", Table$EVTYPE)] <- "Storm/Tornado/Wind"
  Table$EVTYPE[grepl("WARMTH|HEAT|HOT|WARM|HIGH TEMPERATURE|HYPERTHERMIA|HYPOTHERMIA|DRY|DRIEST|DROUGHT|FIRES|FIRE|WILDFIRE|RED FLAG",Table$EVTYPE)] <- "Heat/Dry/Fire"
  Table$EVTYPE[grepl("FROST|COLD|SLEET|FREEZE|WINTER|WINTRY|FREEZING|ICY|LOW TEMP|COOL|ICE|GLAZE|SNOW|BLIZZARD|HAIL",Table$EVTYPE)] <- "Cold/Hail/Snow"
  Table$EVTYPE[grepl("LIGHTNING|LIGHTING|LIGNTNING",Table$EVTYPE)]<-"Lightning"
  Table$EVTYPE[grepl("CURRENT|COASTAL|BEACH|TIDE|TIDES|TSUNAMI|SURF|WAVES|WAVE|SEAS|SWELL|SWELL|MARINE",Table$EVTYPE)]<-"Ocean conditions"
  Table$EVTYPE[grepl("WET|WETNESS|RAIN|PRECIPATATION|PRECIPITATION|PRECIP|SHOWER|SHOWERS|FLOOD|MUD|FLOODING|SEICHE|FLD|STREAM|FLOYD|DROWNING|DAM|RISING WATER|TURBULENCE|HIGH WATER|*SLIDE",Table$EVTYPE)] <- "Rain/Wet/Flood"
  Table$EVTYPE[grepl("FOG|CLOUD|SMOKE|FUNNEL",Table$EVTYPE)] <- "Fog/Smoke"
  Table$EVTYPE[grepl("RECORD|OTHER|DEPRESSION|SOUTHEAST|TEMPERATURE|NO*|URBAN|COUNTY|EXCESSIVE|MIX|HIGH|[?]",Table$EVTYPE)] <- "Other"
  Table <- summarise(group_by(Table,EVTYPE),ECONOMIC_CONSEQUENCES = sum(ECONOMIC),INJURIES = sum(INJURIES),FATALITIES=sum(FATALITIES))
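The reassignments run in sequence, so the first pattern that matches an event name decides its category, and later patterns never see that row again. A small sketch on made-up event names (not taken from the data set) illustrates this with a shortened version of the patterns:

```r
# Toy event names, for illustration only
ev <- c("TSTM WIND", "WINTER STORM", "FLASH FLOOD", "EXCESSIVE HEAT", "APACHE COUNTY")

ev[grepl("STORM|WIND", ev)]  <- "Storm/Tornado/Wind"
ev[grepl("HEAT", ev)]        <- "Heat/Dry/Fire"
ev[grepl("WINTER|SNOW", ev)] <- "Cold/Hail/Snow"  # "WINTER STORM" is already relabelled
ev[grepl("FLOOD", ev)]       <- "Rain/Wet/Flood"
ev[grepl("NO*|COUNTY", ev)]  <- "Other"           # NO* matches any name containing "N"
print(ev)
```

Two things are worth noting. "WINTER STORM" ends up in Storm/Tornado/Wind because "STORM" is tested before "WINTER". Also, in a regular expression NO* means "N followed by zero or more O", so it matches every remaining upper-case name that contains an "N"; the mixed-case category labels are not affected because grepl is case-sensitive by default.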

Now that the raw data are processed, the next step is to answer the two questions.

Results

Which types of events are most harmful with respect to population health?

The numbers of injuries and fatalities are hard to combine into a single figure, so I do not merge them. Instead, I split the question into two sub-questions and find the most harmful type of event for fatalities and for injuries separately.

FATALITIES

  pie(Table$FATALITIES,col=rainbow(nrow(Table)),main="Number of fatalities")
  legend("topright",Table$EVTYPE,title="Type of events",cex=0.8,fill=rainbow(nrow(Table)))

INJURIES

  Tmp <-Table[rev(order(Table$INJURIES)),c("EVTYPE","INJURIES")]
  colnames(Tmp) <- c("Type of events","Number of injuries")
  kable(Tmp,format="html") 
Type of events        Number of injuries
Storm/Tornado/Wind                108091
Heat/Dry/Fire                      10855
Rain/Wet/Flood                      7235
Lightning                           5231
Cold/Hail/Snow                      4644
Dust/Vocano                         2285
Fog/Smoke                           1079
Ocean conditions                     933
Other                                175
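To put the injury counts in proportion, the sketch below recomputes each category's share and the ratio between the top two categories. The counts are copied from the table above; the variable names are mine and are not part of the analysis code.

```r
# Injury counts copied from the table above
injuries <- c(
  "Storm/Tornado/Wind" = 108091, "Heat/Dry/Fire"    = 10855,
  "Rain/Wet/Flood"     = 7235,   "Lightning"        = 5231,
  "Cold/Hail/Snow"     = 4644,   "Dust/Vocano"      = 2285,
  "Fog/Smoke"          = 1079,   "Ocean conditions" = 933,
  "Other"              = 175
)

share <- round(100 * prop.table(injuries), 1)  # percentage of all injuries
ratio <- injuries[[1]] / injuries[[2]]         # first category vs. second

print(share)
print(round(ratio, 2))
```

Storm/Tornado/Wind accounts for roughly 77% of the 140528 recorded injuries, about ten times the count for Heat/Dry/Fire.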

Which types of events have the greatest economic consequences?

  Tmp <- Table[rev(order(Table$ECONOMIC_CONSEQUENCES)),c("EVTYPE","ECONOMIC_CONSEQUENCES")]
  colnames(Tmp) <- c("Type of events","Economic Damages($)")
  kable(Tmp,format="html") 
Type of events        Economic damages ($)
Storm/Tornado/Wind            241616734559
Rain/Wet/Flood                165473075475
Heat/Dry/Fire                  24848387160
Cold/Hail/Snow                 24347105722
Dust/Vocano                    18450563467
Lightning                        945824537
Ocean conditions                 709588710
Fog/Smoke                         23124100
Other                              8438750
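Raw dollar totals with twelve digits are hard to compare by eye. As a small sketch, the figures from the table above can be converted to billions of dollars, along with the combined share of the two largest categories. The variable names are illustrative and not part of the analysis code.

```r
# Damage totals in dollars, copied from the table above
damage <- c(
  "Storm/Tornado/Wind" = 241616734559, "Rain/Wet/Flood"   = 165473075475,
  "Heat/Dry/Fire"      = 24848387160,  "Cold/Hail/Snow"   = 24347105722,
  "Dust/Vocano"        = 18450563467,  "Lightning"        = 945824537,
  "Ocean conditions"   = 709588710,    "Fog/Smoke"        = 23124100,
  "Other"              = 8438750
)

billions      <- round(damage / 1e9, 1)                            # e.g. 241.6
top_two_share <- round(100 * sum(damage[1:2]) / sum(damage), 1)    # in percent

print(billions)
print(top_two_share)
```

On these figures, Storm/Tornado/Wind and Rain/Wet/Flood together account for about 85% of the total damage of roughly 476 billion dollars.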

Conclusion

On all three measures (fatalities, injuries, and economic damage), Storm/Tornado/Wind is the most harmful type of event. It accounts for about half of all fatalities, and the number of injuries it caused is roughly ten times that of the second-ranked type (Heat/Dry/Fire). On the other hand, although Heat/Dry/Fire ranks second only to Storm/Tornado/Wind in injuries and fatalities, its economic damage is far smaller than that of Rain/Wet/Flood.