Synopsis

This analysis identifies the types of severe weather events that have the greatest impact on public health and the economy. We use the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which records the characteristics of major storms and weather events in the United States from 1950 to 2011.
To evaluate the impact on public health, we count the number of fatalities and injuries. The negative effect on the economy is measured by property and crop damage (actual dollar amounts).
From these data, we found that tornadoes result in the largest number of fatalities and injuries, and floods have the greatest economic consequences.

Important note: some variables in the NOAA storm database contain misspelled and non-standardized values (e.g. weather event type “TORNDAO” vs “TORNADO”) or possible duplicates (e.g. weather event types “TORNADO”, “TORNADO F2”, “TORNADOES”, “TORNADOS”). Altering or merging these values is outside the scope of this analysis (it is arguably a subject for separate research), so we process the data “as is”.
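To see how such variants split a single phenomenon across several labels, the sketch below tabulates a small hand-made vector built from the spellings quoted in the note. The vector `evtype` and the pattern `^TORN` are illustrative assumptions, not part of the analysis, and nothing is merged or corrected.

```r
# Hand-made sample using the spellings quoted in the note (not read from the database)
evtype <- c("TORNADO", "TORNDAO", "TORNADO F2", "TORNADOES", "TORNADOS",
            "TORNADO", "FLOOD")

# Each distinct spelling is counted as its own event type
type_counts <- table(evtype)
print(type_counts)

# List the distinct labels that start with "TORN" without altering them
tornado_like <- sort(unique(evtype[grepl("^TORN", evtype)]))
print(tornado_like)
```

In this toy sample, five different labels all refer to tornadoes, which is the kind of fragmentation the note warns about.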

Data Processing

The data for this analysis come in the form of a comma-separated-value (CSV) file compressed via the bzip2 algorithm.
The following resources were used to obtain the data itself and additional documentation:

Reading the raw data

# Load data
storm_data <- read.csv( bzfile("repdata_data_StormData.csv.bz2") )

colnames( storm_data )
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
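As a small illustration of why no separate decompression step is needed, the sketch below writes a tiny data frame to a bzip2-compressed CSV in a temporary file and reads it back through a `bzfile()` connection. The file and the data frame `toy` are made up for the example.

```r
# Temporary file standing in for the real archive (hypothetical path)
csv_path <- tempfile(fileext = ".csv.bz2")

# Made-up records, not taken from the storm database
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0))

# Write the CSV through a bzip2-compressing connection
con <- bzfile(csv_path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the connection, decompresses on the fly, and closes it
round_trip <- read.csv(bzfile(csv_path))
print(round_trip)

unlink(csv_path)
```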

We identified the following variables of interest:

  1. EVTYPE - type of weather event
  2. FATALITIES - number of fatalities caused by the weather event
  3. INJURIES - number of injuries caused by the weather event
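  4. PROPDMG - property damage amount, to be scaled by PROPDMGEXP
  5. PROPDMGEXP - magnitude code (multiplier) for the property damage amount
  6. CROPDMG - crop damage amount, to be scaled by CROPDMGEXP
  7. CROPDMGEXP - magnitude code (multiplier) for the crop damage amount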

Health information dataset

This dataset is created in preparation for the analysis of weather-related fatalities and injuries. We keep only the events in which at least one fatality or injury occurred, then sum both counts for each type of weather event.

# Extracting selected data (where either fatalities or injuries occurred) and calculating the total
PublicHealth_data <- aggregate( cbind(FATALITIES, INJURIES) ~ EVTYPE, data=storm_data, FUN=sum, subset= FATALITIES>0 | INJURIES>0 )

str( PublicHealth_data )
## 'data.frame':    220 obs. of  3 variables:
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 18 19 29 30 42 44 49 54 56 57 ...
##  $ FATALITIES: num  1 224 1 101 1 1 0 3 2 1 ...
##  $ INJURIES  : num  0 170 24 805 1 13 2 2 0 0 ...
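The aggregation step above can be tried on a toy data frame to see what the `subset` argument and the `cbind()` formula do. The values below are made up for illustration; only the column names follow the storm database.

```r
# Made-up records; two of them have neither fatalities nor injuries
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(2, 0, 1, 0, 3),
  INJURIES   = c(10, 0, 5, 0, 1)
)

# Sum both columns per event type, keeping only rows with some harm recorded
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy,
                    FUN = sum, subset = FATALITIES > 0 | INJURIES > 0)
print(health)
```

Event types whose rows are all filtered out (here, "HAIL") do not appear in the result at all, which is why the aggregated dataset has fewer rows than there are event types in the raw data.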

Economic information dataset

This dataset is created in preparation for analyzing the types of severe weather events that have the greatest impact on the economy in terms of property and crop damage. We extract these numbers from the original dataset and transform the magnitude value. The total is then calculated for each type of weather event.

# Extracting selected data (events where damage to either property or crops occurred)
Economic_data <- subset( storm_data, subset= PROPDMG>0 | CROPDMG>0, select= c(EVTYPE,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP) )

# Magnitude codes and their multipliers (each code maps to the multiplier at the same position)
exp_levels <- c("-","?","+","","0","1","2","h","H","3","k","K","4","5","6","m","M","7","8","9","b","B")
exp_labels <- c(0,0,0,1,1,10,10^2,10^2,10^2,10^3,10^3,10^3,10^4,10^5,10^6,10^6,10^6,10^7,10^8,10^9,10^9,10^9)

# Transforming the value of magnitude for property damage
Economic_data$PROPDMGEXP <- factor( Economic_data$PROPDMGEXP, levels=exp_levels, labels=exp_labels )

# Transforming the value of magnitude for crop damage
Economic_data$CROPDMGEXP <- factor( Economic_data$CROPDMGEXP, levels=exp_levels, labels=exp_labels )

# Calculating the cost of property and crop damage for each recorded event
Economic_data$PROPDMG <- Economic_data$PROPDMG * as.numeric( as.character( Economic_data$PROPDMGEXP ) )
Economic_data$CROPDMG <- Economic_data$CROPDMG * as.numeric( as.character( Economic_data$CROPDMGEXP ) )

# Calculating the total for each type of weather event
Economic_data <- aggregate( cbind(PROPDMG, CROPDMG, TotalDMG=PROPDMG+CROPDMG) ~ EVTYPE, data=Economic_data, FUN=sum, subset= PROPDMG>0 | CROPDMG>0 )

str( Economic_data )
## 'data.frame':    429 obs. of  4 variables:
##  $ EVTYPE  : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 1 3 5 6 9 14 15 16 17 19 ...
##  $ PROPDMG : num  200000 50000 8100000 8000 5000 ...
##  $ CROPDMG : num  0 0 0 0 0 ...
##  $ TotalDMG: num  200000 50000 8100000 8000 5000 ...
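The magnitude conversion above relies on two details that are easy to miss: `factor()` accepts the same label for several levels (this assumes R 3.4.0 or later), and the multiplier must be recovered with `as.numeric(as.character(...))` rather than `as.numeric()` alone. The sketch below shows both on made-up records with a shortened code list.

```r
# Made-up records: damage amount and magnitude code
toy <- data.frame(PROPDMG    = c(25, 2.5, 10, 7),
                  PROPDMGEXP = c("K", "m", "", "?"))

# A shortened version of the code list; the same label may appear more than once
codes       <- c("?", "", "k", "K", "m", "M")
multipliers <- c(0, 1, 10^3, 10^3, 10^6, 10^6)

exp_factor <- factor(toy$PROPDMGEXP, levels = codes, labels = multipliers)

# as.character() recovers the label text, which is then parsed as a number
multiplier <- as.numeric(as.character(exp_factor))

# as.numeric() on the factor itself returns internal level codes, not multipliers
level_code <- as.numeric(exp_factor)

toy$cost <- toy$PROPDMG * multiplier
print(toy)
print(level_code)
```

With this mapping, an amount of 25 with code "K" becomes 25,000 dollars, while a record with the code "?" contributes nothing.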

Results

As noted earlier, we neither fix misspelled and non-standardized values nor merge possible duplicates.
These values inflate the number of distinct weather event types, so summary statistics computed across event types (such as the mean and median) would be skewed.
For this reason, we consider only the ten most significant types of weather events in terms of consequences.
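The effect of unmerged variants on summary statistics can be seen on a toy example. The per-type totals below are made up; the point is only that many small variant labels pull the mean and median down, while a top-n ranking is largely unaffected.

```r
# Made-up per-type totals, including three low-count spelling variants
fatalities <- c("TORNADO" = 100, "FLOOD" = 40,
                "TORNADOES" = 2, "TORNADO F2" = 1, "TORNDAO" = 1)

# Summary statistics across event types are dominated by the variants
type_mean   <- mean(fatalities)
type_median <- median(fatalities)
print(c(mean = type_mean, median = type_median))

# Ranking and keeping the top entries ignores the long tail of variants
top2 <- head(sort(fatalities, decreasing = TRUE), n = 2)
print(top2)
```

The ranking still undercounts "TORNADO" by the amounts filed under its variants, so a top-n view reduces the problem rather than removing it.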

Impact on public health

This section provides information about weather events that are most harmful to population health.

# Get the top 10 event types where most fatalities occurred
print( head( PublicHealth_data[ order(PublicHealth_data$FATALITIES, decreasing = TRUE), c("EVTYPE","FATALITIES") ], n=10 ), row.names=FALSE)
##          EVTYPE FATALITIES
##         TORNADO       5633
##  EXCESSIVE HEAT       1903
##     FLASH FLOOD        978
##            HEAT        937
##       LIGHTNING        816
##       TSTM WIND        504
##           FLOOD        470
##     RIP CURRENT        368
##       HIGH WIND        248
##       AVALANCHE        224

# Get the top 10 event types where most injuries occurred
print( head( PublicHealth_data[ order(PublicHealth_data$INJURIES, decreasing = TRUE), c("EVTYPE","INJURIES") ], n=10 ), row.names=FALSE)
##             EVTYPE INJURIES
##            TORNADO    91346
##          TSTM WIND     6957
##              FLOOD     6789
##     EXCESSIVE HEAT     6525
##          LIGHTNING     5230
##               HEAT     2100
##          ICE STORM     1975
##        FLASH FLOOD     1777
##  THUNDERSTORM WIND     1488
##               HAIL     1361

# Make a figure of two bar plots (fatalities and injuries by the type of weather event)
par( mfcol=c(1,2) , mar=c(9,6,4,2) , mgp=c(4, 1, 0) )

with( head( PublicHealth_data[ order(PublicHealth_data$FATALITIES, decreasing = TRUE), ], n=10 ),
      barplot( FATALITIES, names=EVTYPE, main="Fatalities by\nevent type", cex.names = 0.7, las=2,
               xlab="", ylab="Fatalities" ) )
title( xlab="Weather event" , mgp=c(7, 1, 0) )

with( head( PublicHealth_data[ order(PublicHealth_data$INJURIES, decreasing = TRUE), ], n=10 ),
      barplot( INJURIES, names=EVTYPE, main="Injuries by\nevent type", cex.names = 0.7 , las=2,
               xlab="", ylab="Injuries" ) )
title( xlab="Weather event" , mgp=c(8, 1, 0) )

box(which = "outer", lty = "solid" )

From the plot above, we can see that tornadoes result in the largest number of fatalities and injuries.
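When the plotting code is run outside a knitted report, the two-panel layout set with `par()` stays active for later figures. A minimal sketch of one way to handle this, on made-up totals, is to save the previous settings returned by `par()` and restore them afterwards. Drawing to `pdf(NULL)` is only there so the example also runs without a display.

```r
# Made-up totals, not taken from the storm database
toy <- data.frame(EVTYPE     = c("FLOOD", "TORNADO", "HAIL", "HEAT"),
                  FATALITIES = c(40, 100, 5, 60),
                  INJURIES   = c(300, 900, 120, 80))

# Null PDF device: nothing is written to disk
pdf(NULL)

# par() returns the previous values of the settings it changes
old_par <- par(mfcol = c(1, 2), mar = c(9, 6, 4, 2))

# Left panel: top 3 event types by fatalities
top_fatal <- head(toy[order(toy$FATALITIES, decreasing = TRUE), ], n = 3)
bar_pos <- barplot(top_fatal$FATALITIES, names.arg = top_fatal$EVTYPE,
                   las = 2, ylab = "Fatalities")

# Right panel: top 3 event types by injuries
top_inj <- head(toy[order(toy$INJURIES, decreasing = TRUE), ], n = 3)
barplot(top_inj$INJURIES, names.arg = top_inj$EVTYPE,
        las = 2, ylab = "Injuries")

# Restore the single-panel layout and close the device
par(old_par)
dev.off()
```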

Economic consequences

This section provides information about property and crop damage caused by severe weather events.

# Get the top 10 event types of the greatest property damage
print( head( Economic_data[ order(Economic_data$PROPDMG, decreasing = TRUE), c("EVTYPE","PROPDMG") ], n=10 ), row.names=FALSE)
##             EVTYPE      PROPDMG
##              FLOOD 144657709807
##  HURRICANE/TYPHOON  69305840000
##            TORNADO  56947380616
##        STORM SURGE  43323536000
##        FLASH FLOOD  16822673978
##               HAIL  15735267513
##          HURRICANE  11868319010
##     TROPICAL STORM   7703890550
##       WINTER STORM   6688497251
##          HIGH WIND   5270046260

# Get the top 10 event types of the greatest crop damage
print( head( Economic_data[ order(Economic_data$CROPDMG, decreasing = TRUE), c("EVTYPE","CROPDMG") ], n=10 ), row.names=FALSE)
##             EVTYPE     CROPDMG
##            DROUGHT 13972566000
##              FLOOD  5661968450
##        RIVER FLOOD  5029459000
##          ICE STORM  5022113500
##               HAIL  3025954473
##          HURRICANE  2741910000
##  HURRICANE/TYPHOON  2607872800
##        FLASH FLOOD  1421317100
##       EXTREME COLD  1292973000
##       FROST/FREEZE  1094086000

# Make a figure of two bar plots (property and crop damage by the type of weather event)
par( mfcol=c(1,2) , mar=c(9,6,4,2) , mgp=c(4, 1, 0) )

with( head( Economic_data[ order(Economic_data$PROPDMG, decreasing = TRUE), ], n=10 ),
      barplot( PROPDMG/10^6, names=EVTYPE, main="Damage to property\nby event type", cex.names = 0.7, las=2,
               xlab="", ylab="Damage (million dollars)" ) )
title( xlab="Weather event" , mgp=c(8, 1, 0) )

with( head( Economic_data[ order(Economic_data$CROPDMG, decreasing = TRUE), ], n=10 ),
      barplot( CROPDMG/10^6, names=EVTYPE, main="Damage to crops\nby event type", cex.names = 0.7 , las=2,
               xlab="", ylab="Damage (million dollars)" ) )
title( xlab="Weather event" , mgp=c(8, 1, 0) )

box(which = "outer", lty = "solid" )

From the plot above, we can see that floods cause the most damage to property, while drought causes the most damage to crops.


The total cost incurred as a result of severe weather is shown below.

# Get the top 10 event types causing the greatest total damage
print( head( Economic_data[ order(Economic_data$TotalDMG, decreasing = TRUE), c("EVTYPE","TotalDMG") ], n=10 ), row.names=FALSE)
##             EVTYPE     TotalDMG
##              FLOOD 150319678257
##  HURRICANE/TYPHOON  71913712800
##            TORNADO  57362333886
##        STORM SURGE  43323541000
##               HAIL  18761221986
##        FLASH FLOOD  18243991078
##            DROUGHT  15018672000
##          HURRICANE  14610229010
##        RIVER FLOOD  10148404500
##          ICE STORM   8967041360

# Make a bar plot (total cost of damage by the type of weather event)
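# Use a single panel, in case the two-panel layout from the previous figure is still active
par( mfcol=c(1,1) , mar=c(9,8,5,2) , mgp=c(6, 1, 0) )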

with( head( Economic_data[ order(Economic_data$TotalDMG, decreasing = TRUE), ], n=10 ),
      barplot( TotalDMG/10^6, names=EVTYPE, main="Total damage cost\nby event type", cex.names = 0.7, las=2,
               xlab="", ylab="Damage (million dollars)" ) )
title( xlab="Weather event" , mgp=c(7, 1, 0) )

box(which = "outer", lty = "solid" )

From the plot above, we can see that floods account for the greatest total damage cost.
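As a quick consistency check, the totals can be recomputed from the property and crop figures for the five event types that appear in all three top-ten tables above. The numbers below are copied from those tables; the vector names are chosen for this sketch.

```r
# Event types present in the property, crop, and total damage tables
events <- c("FLOOD", "HURRICANE/TYPHOON", "HAIL", "FLASH FLOOD", "HURRICANE")

# Values copied from the printed tables (dollars)
prop  <- c(144657709807, 69305840000, 15735267513, 16822673978, 11868319010)
crop  <- c(  5661968450,  2607872800,  3025954473,  1421317100,  2741910000)
total <- c(150319678257, 71913712800, 18761221986, 18243991078, 14610229010)

# Property plus crop damage should reproduce the reported total for each type
check <- data.frame(EVTYPE = events, recomputed = prop + crop, reported = total,
                    match = (prop + crop) == total)
print(check)
```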