Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The data analysis identifies (1) the types of events that are most harmful with respect to population health across the US and (2) the types of events that have the greatest economic consequences across the US.

Aggregating the data by event type shows that (1) tornadoes are the most harmful events for population health and (2) floods are responsible for the most economic damage.

Data Processing

Introduction To The Data

The data for this project come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The file can be downloaded from the address:

Storm Data https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

There is also some documentation of the database available.

National Weather Service Storm Data Documentation https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf

National Climatic Data Center Storm Events FAQ https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
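The uneven coverage across years can be checked by counting events per year. The following is a minimal sketch on a few made-up rows; it assumes only that BGN_DATE uses the "month/day/year hour:minute:second" layout seen in the data, and the names sample_events, event_dates and events_per_year are illustrative.

```r
# Toy rows in the same "m/d/Y H:M:S" layout as the BGN_DATE column
# (hypothetical sample, not the real data).
sample_events <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "2/20/1951 0:00:00",
               "11/15/1951 0:00:00", "6/8/2011 0:00:00"),
  stringsAsFactors = FALSE
)

# Parse the date and keep only the four-digit year.
event_dates <- as.Date(sample_events$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")
event_year <- as.integer(format(event_dates, "%Y"))

# Count events per year.
events_per_year <- table(event_year)
print(events_per_year)
```

Applied to the full data set, the same count would show how the number of recorded events changes from year to year.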

The code below expects the Storm Data file in the R working directory. If the file is not there, it is downloaded first.

Downloading The Data

if (!file.exists("repdata-data-StormData.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = "repdata-data-StormData.csv.bz2")
}

Loading The Data

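read.csv reads the bzip2-compressed file directly, so it does not need to be decompressed first. The first six rows are shown below.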

input <- read.csv("repdata-data-StormData.csv.bz2")
head(input)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
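Only a handful of the columns shown above are used in this analysis. As a hedged sketch (not the approach taken in this report), read.csv can skip unneeded columns through its colClasses argument. The example builds a tiny compressed file of its own, so the column layout and the names tmp and small are assumptions made for illustration.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (hypothetical data).
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES,INJURIES,REMARKS",
             "TORNADO,0,15,some long text",
             "FLOOD,1,0,more long text"), con)
close(con)

# read.csv decompresses the file transparently. A colClasses entry of "NULL"
# skips that column entirely (here the fourth column, REMARKS).
small <- read.csv(tmp,
                  colClasses = c("character", "numeric", "numeric", "NULL"),
                  stringsAsFactors = FALSE)
str(small)
unlink(tmp)
```

For the real file, colClasses would need one entry per column, in the order the columns appear in the file.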

Cleaning The Property Damage Data

Finding unique values in PROPDMGEXP

unique(input$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M

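Assigning a multiplier to each property exponent code: a blank counts as 1, digits are read as powers of ten, H, K, M and B stand for hundred, thousand, million and billion, and the symbols +, - and ? are given 0, which discards the damage figure for those rows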

# Multiplier for each PROPDMGEXP code; codes not assigned below keep the default of 0
input$PROPEXP <- 0

input$PROPEXP[input$PROPDMGEXP == ""] <- 1
input$PROPEXP[input$PROPDMGEXP == "0"] <- 1
input$PROPEXP[input$PROPDMGEXP == "1"] <- 10
input$PROPEXP[input$PROPDMGEXP == "2"] <- 100
input$PROPEXP[input$PROPDMGEXP == "3"] <- 1000
input$PROPEXP[input$PROPDMGEXP == "4"] <- 10000
input$PROPEXP[input$PROPDMGEXP == "5"] <- 1e+05
input$PROPEXP[input$PROPDMGEXP == "6"] <- 1e+06
input$PROPEXP[input$PROPDMGEXP == "7"] <- 1e+07
input$PROPEXP[input$PROPDMGEXP == "8"] <- 1e+08
input$PROPEXP[input$PROPDMGEXP == "B"] <- 1e+09
input$PROPEXP[input$PROPDMGEXP == "h"] <- 100
input$PROPEXP[input$PROPDMGEXP == "H"] <- 100
input$PROPEXP[input$PROPDMGEXP == "K"] <- 1000
input$PROPEXP[input$PROPDMGEXP == "m"] <- 1e+06
input$PROPEXP[input$PROPDMGEXP == "M"] <- 1e+06
input$PROPEXP[input$PROPDMGEXP == "+"] <- 0
input$PROPEXP[input$PROPDMGEXP == "-"] <- 0
input$PROPEXP[input$PROPDMGEXP == "?"] <- 0

Calculating the property damage value

input$PROPDMGVAL <- input$PROPDMG * input$PROPEXP
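The same mapping can be written more compactly with a named lookup vector. This is a sketch of an alternative, not the code used in this report; exp_lookup and exp_to_multiplier are hypothetical names, and the sample values are made up.

```r
# Multipliers as assigned above; codes missing from the table ("+", "-", "?")
# fall back to 0, matching the treatment above.
exp_lookup <- c("0" = 1, "1" = 10, "2" = 1e2, "3" = 1e3, "4" = 1e4,
                "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8,
                "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9)

exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))  # fold "h", "k", "m" into upper case
  mult <- unname(exp_lookup[code])     # NA where the code is not in the table
  mult[is.na(mult)] <- 0               # unrecognized codes contribute nothing
  mult[code == ""] <- 1                # blank exponent: value taken as-is
  mult
}

# Hypothetical sample rows
dmg  <- c(25, 2.5, 3, 10, 7)
code <- c("K", "m", "B", "", "?")
dmg * exp_to_multiplier(code)
```

Because the function takes the code column as an argument, the same lookup could also serve CROPDMGEXP, whose codes are a subset of those listed here.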

Cleaning The Crop Damage Data

Finding unique values in CROPDMGEXP

unique(input$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M

Assigning values for the crop exponent data

# Multiplier for each CROPDMGEXP code; codes not assigned below keep the default of 0
input$CROPEXP <- 0

input$CROPEXP[input$CROPDMGEXP == ""] <- 1
input$CROPEXP[input$CROPDMGEXP == "0"] <- 1
input$CROPEXP[input$CROPDMGEXP == "2"] <- 100
input$CROPEXP[input$CROPDMGEXP == "B"] <- 1e+09
input$CROPEXP[input$CROPDMGEXP == "k"] <- 1000
input$CROPEXP[input$CROPDMGEXP == "K"] <- 1000
input$CROPEXP[input$CROPDMGEXP == "M"] <- 1e+06
input$CROPEXP[input$CROPDMGEXP == "m"] <- 1e+06
input$CROPEXP[input$CROPDMGEXP == "?"] <- 0

Calculating the crop damage value

input$CROPDMGVAL <- input$CROPDMG * input$CROPEXP

Results

Finding The Event With Highest Health Risk

# Total fatalities per event type, eight highest
fatal <- aggregate(input$FATALITIES, list(input$EVTYPE), sum, na.rm = TRUE)
colnames(fatal) <- c("Event", "Fatalities")
fatal[order(-fatal$Fatalities), ][1:8, ]
##              Event Fatalities
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
## 170          FLOOD        470
## 585    RIP CURRENT        368
# Total injuries per event type, eight highest
injury <- aggregate(input$INJURIES, list(input$EVTYPE), sum, na.rm = TRUE)
colnames(injury) <- c("Event", "Injuries")
injury[order(-injury$Injuries), ][1:8, ]
##              Event Injuries
## 834        TORNADO    91346
## 856      TSTM WIND     6957
## 170          FLOOD     6789
## 130 EXCESSIVE HEAT     6525
## 464      LIGHTNING     5230
## 275           HEAT     2100
## 427      ICE STORM     1975
## 153    FLASH FLOOD     1777
# Fatalities and injuries combined per event type, eight highest, in thousands
risk <- aggregate(input$INJURIES + input$FATALITIES, list(input$EVTYPE), sum, na.rm = TRUE)
colnames(risk) <- c("Event", "Health")
plot1 <- risk[order(-risk$Health), ][1:8, ]
plot1$Health <- plot1$Health / 1000


barplot(plot1$Health, las = 2, names.arg = plot1$Event, main = "Events with Highest Fatalities & Injuries", ylab = "Total fatalities & injuries ('000) ", col = "blue")

Tornadoes are by far the most harmful events to population health, with 5,633 fatalities and 91,346 injuries. They are followed by excessive heat and thunderstorm wind (TSTM WIND).
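Fatalities and injuries can also be summed in a single aggregate call through the formula interface. The sketch below uses a small made-up data frame with the same column names as the storm data; events and health are illustrative names, and this is an alternative to the separate calls above rather than the method used in this report.

```r
# Hypothetical sample rows with the same column names as the storm data
events <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 0, 1, 3, 1),
  INJURIES   = c(10, 4, 5, 0, 2)
)

# Sum both measures per event type in a single call
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = events, FUN = sum)
health$Total <- health$FATALITIES + health$INJURIES

# Most harmful event types first
health <- health[order(-health$Total), ]
print(health)
```

The formula interface drops rows with missing values by default, which plays a role similar to na.rm = TRUE in the calls above.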

Finding The Event With Highest Damages

# Total property damage (US dollars) per event type, eight highest
propdmg <- aggregate(input$PROPDMGVAL, list(input$EVTYPE), sum, na.rm = TRUE)
colnames(propdmg) <- c("Event", "PropDamage")
propdmg[order(-propdmg$PropDamage), ][1:8, ]
##                 Event   PropDamage
## 170             FLOOD 144657709807
## 411 HURRICANE/TYPHOON  69305840000
## 834           TORNADO  56947380617
## 670       STORM SURGE  43323536000
## 153       FLASH FLOOD  16822673979
## 244              HAIL  15735267513
## 402         HURRICANE  11868319010
## 848    TROPICAL STORM   7703890550
# Total crop damage (US dollars) per event type, eight highest
cropdmg <- aggregate(input$CROPDMGVAL, list(input$EVTYPE), sum, na.rm = TRUE)
colnames(cropdmg) <- c("Event", "CropDamage")
cropdmg[order(-cropdmg$CropDamage), ][1:8, ]
##                 Event  CropDamage
## 95            DROUGHT 13972566000
## 170             FLOOD  5661968450
## 590       RIVER FLOOD  5029459000
## 427         ICE STORM  5022113500
## 244              HAIL  3025954473
## 402         HURRICANE  2741910000
## 411 HURRICANE/TYPHOON  2607872800
## 153       FLASH FLOOD  1421317100
# Property and crop damage combined per event type, eight highest, in billions of dollars
propcropdmg <- aggregate(input$PROPDMGVAL + input$CROPDMGVAL, list(input$EVTYPE), sum, na.rm = TRUE)
colnames(propcropdmg) <- c("Event", "PropCropDmg")
plot2 <- propcropdmg[order(-propcropdmg$PropCropDmg), ][1:8, ]
plot2$PropCropDmg <- plot2$PropCropDmg / 1e9

barplot(plot2$PropCropDmg, las = 2, names.arg = plot2$Event, main = "Events with Highest Property & Crop Damages", ylab = "Total Property & Crop Damages (Billions $)", col = "blue")

Floods caused the greatest economic damage, about $150 billion in combined property and crop losses. Hurricanes/typhoons rank second with about $72 billion.
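The combined totals for the two leading event types can be checked by adding the property and crop figures listed in the tables above. This is a small arithmetic sketch; the vectors prop, crop and combined are illustrative names.

```r
# Property and crop totals copied from the tables above (US dollars)
prop <- c(FLOOD = 144657709807, `HURRICANE/TYPHOON` = 69305840000)
crop <- c(FLOOD = 5661968450,   `HURRICANE/TYPHOON` = 2607872800)

# Combined damage in billions of dollars, rounded to one decimal place
combined <- round((prop + crop) / 1e9, 1)
print(combined)
```

This gives roughly 150.3 billion dollars for floods and 71.9 billion dollars for hurricanes/typhoons.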