Storm Impact Analysis on Economy and the Population in US during the period from 1950 to 2011 based on NOAA Storm Data

=======================================================================

Synopsis

This report presents an analysis about the impact of weather events on the population health and economy of US for the time period 1950 to 2011. We will address the damage on economy by reporting the economic damage calculated as property damage and crop damage measured in US dollars, and address the damage on public health by sum the number of injures and fatalites in the events. In the end, we illustrate the impact of different types of events on these two aspects with two barchart containing two measurements for each aspect. We found that the tornado is most harmful for population health, and the flash flood resulting in greatest loss on economic value.

Data

The data comes in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size.

Storm Data [47Mb]

#download packages if needed
list.of.packages <- c("plyr","grid","ggplot2","utils","R.utils","data.table","reshape2")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)

#load packages and functions
library("plyr")
library("grid")
library("ggplot2")
library("utils")
library("R.utils")
library("data.table")
library("reshape2")


# -- set a seed

set.seed(11)

# Download and unzip input file if it does not exist in current directory

fl <- "repdata-data-StormData.csv"
fl.zip <- paste0(fl, ".bz2")
url<-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

if (file.exists(fl) == FALSE) {
    if (file.exists(fl.zip) == FALSE) {
        download.file(url, destfile = fl.zip, method = "curl",
                      quiet = TRUE)    
    }
    bunzip2(fl.zip)
}

1. The next step is to read in the data.

StormData <- read.csv("repdata-data-StormData.csv",
                      stringsAsFactors = FALSE)

Initial exploration shows that this is a fairly large dataset, which includes some attributes that are not useful to our specific analysis.

dim(StormData)
## [1] 902297     37
names(StormData)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
  • 2- Variables we keep for analysis
Variabl Name Description
state Abbervation for states in US
evtype Event type
fatalites The number of people died
injures The number of people injuured
propdmg The amount of property damge (measured in money)
propdmgexp The unit of damge (B,M,K,H)
cropdmg The amount of corp damge (measured in money)
cropdmgexp The unit of damge (B,M,K,H)
# attributes

attributes <- c("STATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP","CROPDMG", "CROPDMGEXP")

pData <- StormData[, attributes]
colnames(pData) <- c("state", "evtype", "fatalites", "injures", "propdmg", "propdmgexp","cropdmg", "cropdmgexp")
names(pData)
## [1] "state"      "evtype"     "fatalites"  "injures"    "propdmg"   
## [6] "propdmgexp" "cropdmg"    "cropdmgexp"

3 Data cleanning

3.1.1 Event tpye

  - Create a column as the unified name of that eventype 
  - The hierachy looks like as follow: 
    - Convection
      - lightning
      - tornado
      - thunderstorm Wind
      - hail
    - ExtremeTemp
      - cold
      - heat
    - Marine
      - costal Storm
      - sunami
      - runami
      - rip Current
    - Flood
      - flash Flood
      - river Flood
    - Tropical Cyclones
      - tropical Storm
      - hurricane
    - Other
      - drought
      - dust Storm
      - dust Devil
      - rain
      

The unified event name and filtr expression:

Event Filter expression
Lightning LIGHTNING
Tornado NADO,FUNNEL,WATERSPOUT“| Thunderstorm wind | THUNDER,STORM,WIND | Hail | HAIL | Cold | BLIZZARD|WINTER|COLD|LOW TEMP|RECORD LOW|SNOW|ICE | Heat | HEAT|WARM|RECORD HIGH | Costal Storm | COSTAL STORM | Sunami | SUNAMI | Rip current |RIP CURRENT Flash flood | FLASH FLOOD,FLD,FLOOD | River flood | RIVER FLOOD , URBAN FLOOD | Tropical Storm |TROPICAL STORM,TROPICAL| Hurricane | HURRICANE | Drought | DROUGHT | Dust storm | DUST STORM | Dust devil | DUST DEVIL | Rain | RAIN |
# Function : Give each eventype an unified event description
deliverEvtUnifiedName <- function(dd) {
    evtMatcher <- data.frame(reg = c("NADO|FUNNEL|WATERSPOUT", "THUNDER|STORM|WIND", 
        "HAIL", "FROST|FREEZ|BLIZZARD|WINTER|COLD|LOW TEMP|RECORD LOW|SNOW|ICE", 
        "HEAT|WARM|RECORD HIGH", "COSTAL STORM", "SUNAMI", "RIP CURRENT", "FLASH FLOOD|FLD|FLOOD", 
        "RIVER FLOOD|URBAN FLOOD", "TROPICAL STORM|TROPICAL", "HURRICANE", "DROUGHT", 
        "DUST STORM", "DUST DEVIL", "RAIN", "LIGHTNING"))
    
    factorId <- c("Tornado", "Thunderstorm wind", "Hail", "Cold", "Heat", "Costal Storm", 
        "Sunami", "Rip current", "Flash flood", "River flood", "Tropical Storm", 
        "Hurricane", "Drought", "Dust storm", "Dust devil", "Rain", "Ligntning")
    
    for (i in 1:nrow(evtMatcher)) {
        indexFit <- grep(evtMatcher[i, "reg"], toupper(dd[, "evtype"]))
        if (length(indexFit) > 0) {
            dd[indexFit, "event"] <- factorId[i]
            
        }
    }
    return(dd)
}

Use the function to filter and put value into “event”

pData$event <- ("-")

# Give each event an unified id factor (17+1 factors)
pData <- deliverEvtUnifiedName(pData)
otherIndex <- grep("-", pData[, "event"])
pData[otherIndex, "event"] <- "Other"
pData$event <- as.factor(pData$event)

# See the event and number of that
table(pData$event)
## 
##              Cold           Drought        Dust devil        Dust storm 
##             46131              2512               150               429 
##       Flash flood              Hail              Heat         Hurricane 
##             85530            290400              2969               288 
##         Ligntning             Other              Rain       Rip current 
##             15776              9558             12238               777 
##       River flood            Sunami Thunderstorm wind           Tornado 
##               569                20            362662             71531 
##    Tropical Storm 
##               757

3.1.2 Create an exact amounts for property and corp damage

  • Fomula I : propdmgValue = propdmg * propdmgexp
  • Fomula II : cropdmgValue = cropdmg * cropdmgexp

          propdmgexp   |  unit   
       ----------------| ----------------
          B            |  1,000,000,000
          M            |      1,000,000   
          K            |          1,000 
          H            |            100
          NA or BLANK  |              1

Function : Caculate the property and crop damg by ‘dmg’ and ‘exp’ variable

# Function : Caculate each entry the property and crop damge value by 'dmg' and 'exp' variables
deliverActualDmgValue <- function(dd) {
    unit <- data.frame(cha = c("B", "M", "K", "H"), val = c(1e+09, 1e+06, 1000, 
        100))
    multi <- c(1e+09, 1e+06, 1000, 100)
    
    for (i in 1:nrow(unit)) {
        # index that match the unit
        indexCrodFit <- grep(unit[i, "cha"], toupper(dd[, "cropdmgexp"]))
        
        if (length(indexCrodFit) > 0) {
            
            # Caculate the actual value
            dd[indexCrodFit, "cropdmgvalue"] <- multi[i] * dd[indexCrodFit, 
                "cropdmg"]
            
        }
        
        # Same procudure for property damage
        indexProdFit <- grep(unit[i, "cha"], toupper(dd[, "propdmgexp"]))
        
        if (length(indexProdFit) > 0) {
            
            dd[indexProdFit, "propdmgvalue"] <- multi[i] * dd[indexProdFit, 
                "propdmg"]
        }
        
    }
    return(dd)
}
# Default value of the damage equals to variable 'dmg'
pData$cropdmgvalue <- pData$cropdmg
pData$propdmgvalue <- pData$propdmg

pData <- deliverActualDmgValue(pData)

summary of crop and property damage

summary(pData$cropdmgvalue)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.000e+00 0.000e+00 0.000e+00 5.442e+04 0.000e+00 5.000e+09
summary(pData$propdmgvalue)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.000e+00 0.000e+00 0.000e+00 4.736e+05 5.000e+02 1.150e+11

Add new variables: Damge on economic, Damge on population

  • Damage on economic : ecodmgvalue = propdmagevalue + corpdmgvalue
  • Damage on population health : pephealthdmg = propdmagevalue + corpdmgvalue
pData$ecodmgvalue <- pData$cropdmgvalue + pData$propdmgvalue
pData$pephealthdmg <- pData$injures + pData$fatalites
summary(pData$ecodmgvalue)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.00e+00 0.00e+00 0.00e+00 5.28e+05 1.00e+03 1.15e+11

3.2 Data Aggeratiion

Caculate the Sum of propdmgvalue, cropdmgvalue,injures,fatalites for each event

library(reshape2)

t <- aggregate(cbind(propdmgvalue, cropdmgvalue, injures, fatalites) ~ event,pData, FUN = sum)
tidy <- melt(t, id.var = "event", variable.name = "variable")
colnames(tidy) <- c("event", "damagetype", "value")
print(tidy)
##                event   damagetype        value
## 1               Cold propdmgvalue  12683339763
## 2            Drought propdmgvalue   1046306000
## 3         Dust devil propdmgvalue       719130
## 4         Dust storm propdmgvalue      5599000
## 5        Flash flood propdmgvalue 162312938980
## 6               Hail propdmgvalue  17619991567
## 7               Heat propdmgvalue     20125750
## 8          Hurricane propdmgvalue  84756180010
## 9          Ligntning propdmgvalue    939007947
## 10             Other propdmgvalue   9577576400
## 11              Rain propdmgvalue   3265197192
## 12       Rip current propdmgvalue       163000
## 13       River flood propdmgvalue   5259639600
## 14            Sunami propdmgvalue    144062000
## 15 Thunderstorm wind propdmgvalue  64968720255
## 16           Tornado propdmgvalue  57002958829
## 17    Tropical Storm propdmgvalue   7716127550
## 18              Cold cropdmgvalue   8730107950
## 19           Drought cropdmgvalue  13972621780
## 20        Dust devil cropdmgvalue            0
## 21        Dust storm cropdmgvalue      3600000
## 22       Flash flood cropdmgvalue   7216670200
## 23              Hail cropdmgvalue   3114212873
## 24              Heat cropdmgvalue    904423500
## 25         Hurricane cropdmgvalue   5515292800
## 26         Ligntning cropdmgvalue     12097090
## 27             Other cropdmgvalue    572173030
## 28              Rain cropdmgvalue    919315800
## 29       Rip current cropdmgvalue            0
## 30       River flood cropdmgvalue   5058774000
## 31            Sunami cropdmgvalue        20000
## 32 Thunderstorm wind cropdmgvalue   1975024138
## 33           Tornado cropdmgvalue    414963020
## 34    Tropical Storm cropdmgvalue    694896000
## 35              Cold      injures         6350
## 36           Drought      injures           19
## 37        Dust devil      injures           43
## 38        Dust storm      injures          440
## 39       Flash flood      injures         8680
## 40              Hail      injures         1467
## 41              Heat      injures         9228
## 42         Hurricane      injures         1328
## 43         Ligntning      injures         5232
## 44             Other      injures         3565
## 45              Rain      injures          305
## 46       Rip current      injures          529
## 47       River flood      injures            3
## 48            Sunami      injures          129
## 49 Thunderstorm wind      injures        11388
## 50           Tornado      injures        91439
## 51    Tropical Storm      injures          383
## 52              Cold    fatalites         1088
## 53           Drought    fatalites            6
## 54        Dust devil    fatalites            2
## 55        Dust storm    fatalites           22
## 56       Flash flood    fatalites         1547
## 57              Hail    fatalites           45
## 58              Heat    fatalites         3172
## 59         Hurricane    fatalites          135
## 60         Ligntning    fatalites          817
## 61             Other    fatalites          657
## 62              Rain    fatalites          114
## 63       Rip current    fatalites          577
## 64       River flood    fatalites            5
## 65            Sunami    fatalites           33
## 66 Thunderstorm wind    fatalites         1220
## 67           Tornado    fatalites         5639
## 68    Tropical Storm    fatalites           66

4- Results

4.1 The most harmful weather events for affecting public health

  • Extract rows that contains info about impact on public
library("ggplot2")
damgeOnHealth <- tidy[grep("fatalites|injures", tidy[, "damagetype"]), ]

ggplot(damgeOnHealth, aes(x = reorder(event, value), y = value, fill = factor(damagetype, 
    labels = c("Injures", "Fatalites")))) + 
  geom_bar(stat = "identity") + 
  labs(title = "Top harmful weather event for populaiton health", 
    x = "Event", y = "Damage on populaiton health (number of people)") + scale_fill_manual(values = c("#F7B388", 
    "#C70000")) + 
  guides(fill = guide_legend(title = "Type of damage")) +
  xlab("Weather Event") + 
    theme(axis.text = element_text(size = 12, colour = "#1F4178"), axis.title = element_text(size = 14, 
        colour = "#3A3E42", face = "bold"),
        title = element_text(size = 16, 
        colour = "#282B2E", face = "bold")) + coord_flip()

4.2 The most harmful weather events Impacting the economy

  • Extract rows that contains info about impact on the economy
damgeOnEconomic <- tidy[grep("cropdmgvalue|propdmgvalue", tidy[, "damagetype"]),]
ggplot(damgeOnEconomic, aes(x = reorder(event, value), 
                            y = value, fill = factor(damagetype, 
    labels = c("Property", "Crop")))) + geom_bar(stat = "identity") + labs(title = "Top harmful weather event for economy ", 
    x = "Event", y = "Damage on economic value (US dollors)") + scale_fill_manual(values = c("#7D5E35","#407D47")) +
  guides(fill = guide_legend(title = "Type of damage")) +
  xlab("Weather Event") + theme(axis.text = element_text(size = 12, colour = "#1F4178"), axis.title = element_text(size = 14, 
        colour = "#3A3E42", face = "bold"),
        title = element_text(size = 16, 
        colour = "#282B2E", face = "bold")) + 
        coord_flip()

5- Conclusion

From the results, we can see : - 1. The most harmful weather event for population health is tornado

    1. The most harmful weather event for economy is flash flood

References

This analysis was completed as an assignment in Reproducible Research,part of the Data Science Specialization offered by Johns Hopkins University via the Coursera platform.

The NCDC’s Storm Center FAQ and the National Weather Service’s documentation explain the definitions of each variable and how the data was compiled.

Additional data can be obtained from the Storm Events database.

          ---------------------------------