This analysis identifies the types of severe weather events that have the greatest impact on public health and the economy. We use the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which records characteristics of major storms and weather events in the United States, for the period from 1950 to 2011.
To evaluate the impact on public health, we count the number of fatalities and injuries. The negative effect on the economy is measured by property and crop damage (actual dollar amounts).
From these data, we found that tornadoes result in the largest number of fatalities and injuries, and floods have the greatest economic consequences.
Important note: some variables in the NOAA storm database contain misspelled and non-standardized values (e.g. weather event type “TORNDAO” vs. “TORNADO”) or possible duplicates (e.g. weather event types “TORNADO”, “TORNADO F2”, “TORNADOES”, “TORNADOS”). Correcting or merging these values is outside the scope of this analysis (it is arguably a topic for separate research), so we process the data “as is”.
The data for this analysis come in the form of a comma-separated-value (CSV) file compressed via the bzip2 algorithm.
The following resources were used to obtain the data itself and additional documentation:
# Load data
storm_data <- read.csv( bzfile("repdata_data_StormData.csv.bz2") )
colnames( storm_data )
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
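As a quick check of the loading step, the sketch below writes a tiny made-up CSV to a bzip2-compressed temporary file and reads it back with the same `read.csv( bzfile(...) )` call. The file and its contents are hypothetical and serve only to show that base R decompresses such files transparently.

```r
# Hypothetical toy data; not taken from the NOAA database
toy <- data.frame( EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(2, 1) )

# Write the toy data to a bzip2-compressed CSV in a temporary location
tmp <- tempfile( fileext = ".csv.bz2" )
con <- bzfile( tmp, open = "w" )
write.csv( toy, con, row.names = FALSE )
close( con )

# Read it back the same way the real file is read above
toy_back <- read.csv( bzfile( tmp ) )
print( toy_back )

# Remove the temporary file
unlink( tmp )
```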
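We identified the following variables of interest: EVTYPE (type of weather event), FATALITIES and INJURIES (public health impact), PROPDMG and PROPDMGEXP (property damage and its magnitude code), and CROPDMG and CROPDMGEXP (crop damage and its magnitude code).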
This dataset is created in preparation for the analysis of weather-related fatalities and injuries. We extract these numbers from the original dataset. Then the total is calculated for each type of weather event.
# Extracting selected data (where either fatalities or injuries occurred) and calculating the total
PublicHealth_data <- aggregate( cbind(FATALITIES, INJURIES) ~ EVTYPE, data=storm_data, FUN=sum, subset= FATALITIES>0 | INJURIES>0 )
str( PublicHealth_data )
## 'data.frame': 220 obs. of 3 variables:
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 18 19 29 30 42 44 49 54 56 57 ...
## $ FATALITIES: num 1 224 1 101 1 1 0 3 2 1 ...
## $ INJURIES : num 0 170 24 805 1 13 2 2 0 0 ...
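To make the aggregation step concrete, here is a minimal sketch on a made-up data frame (the rows are hypothetical). It uses the same formula interface: `cbind()` on the left-hand side sums several columns at once, and `subset` drops rows with no casualties before grouping.

```r
# Hypothetical toy records; column names mirror the storm data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(3, 0, 1, 0),
  INJURIES   = c(10, 5, 0, 0)
)

# Sum both columns per event type, keeping only rows with at least one casualty
toy_totals <- aggregate( cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy,
                         FUN = sum, subset = FATALITIES > 0 | INJURIES > 0 )
print( toy_totals )
```

The "HAIL" row has neither fatalities nor injuries, so it is filtered out before the sums are computed and does not appear in the result.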
This dataset is created in preparation for analyzing the types of severe weather events that have the greatest impact on the economy in terms of property and crop damage. We extract these numbers from the original dataset and transform the magnitude value. The total is then calculated for each type of weather event.
# Extracting selected data (events where damage to either property or crops occurred)
Economic_data <- subset( storm_data, subset= PROPDMG>0 | CROPDMG>0, select= c(EVTYPE,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP) )
# Magnitude codes and their multipliers (the same mapping applies to property and crop damage)
exp_levels <- c("-","?","+","","0","1","2","h","H","3","k","K","4","5","6","m","M","7","8","9","b","B")
exp_labels <- c(0,0,0,1,1,10,10^2,10^2,10^2,10^3,10^3,10^3,10^4,10^5,10^6,10^6,10^6,10^7,10^8,10^9,10^9,10^9)
# Transforming the value of magnitude for property damage
Economic_data$PROPDMGEXP <- factor( Economic_data$PROPDMGEXP, levels=exp_levels, labels=exp_labels )
# Transforming the value of magnitude for crop damage
Economic_data$CROPDMGEXP <- factor( Economic_data$CROPDMGEXP, levels=exp_levels, labels=exp_labels )
# Calculating the cost of property and crop damage for each recorded event
Economic_data$PROPDMG <- Economic_data$PROPDMG * as.numeric( as.character( Economic_data$PROPDMGEXP ) )
Economic_data$CROPDMG <- Economic_data$CROPDMG * as.numeric( as.character( Economic_data$CROPDMGEXP ) )
# Calculating the total for each type of weather event
Economic_data <- aggregate( cbind(PROPDMG, CROPDMG, TotalDMG=PROPDMG+CROPDMG) ~ EVTYPE, data=Economic_data, FUN=sum, subset= PROPDMG>0 | CROPDMG>0 )
str( Economic_data )
## 'data.frame': 429 obs. of 4 variables:
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 1 3 5 6 9 14 15 16 17 19 ...
## $ PROPDMG : num 200000 50000 8100000 8000 5000 ...
## $ CROPDMG : num 0 0 0 0 0 ...
## $ TotalDMG: num 200000 50000 8100000 8000 5000 ...
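An alternative way to express the magnitude conversion, shown here only as a sketch and not as the method used in this report, is a named lookup vector. The toy data frame, the `multiplier` vector and the `PROPDMG_USD` column are hypothetical names introduced for illustration, and only a subset of the magnitude codes is included.

```r
# Hypothetical toy records; the values are made up
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Named lookup vector: magnitude code -> multiplier (subset of the codes above)
multiplier <- c( "h" = 1e2, "H" = 1e2, "k" = 1e3, "K" = 1e3,
                 "m" = 1e6, "M" = 1e6, "b" = 1e9, "B" = 1e9 )

# Look up each code; codes absent from the vector (including "") yield NA
m <- unname( multiplier[ toy$PROPDMGEXP ] )

# Treat an empty code as a multiplier of 1, matching the mapping used above
m[ toy$PROPDMGEXP == "" ] <- 1

# Damage in dollars for each record
toy$PROPDMG_USD <- toy$PROPDMG * m
print( toy )
```

An empty string never matches a name when indexing in R, which is why the empty code is handled in a separate step.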
As noted earlier, we don’t fix misspelled and non-standardized values, nor do we merge possible duplicates.
These unmerged values inflate the number of distinct event types, so per-type mean and median values would be skewed.
For this reason, we consider only the ten most significant (in terms of consequences) types of weather events.
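To illustrate why unmerged labels matter, the sketch below uses made-up counts for three spellings mentioned in the note at the start of this report. Each spelling is treated as its own event type, so the totals are split across rows rather than combined. The numbers are hypothetical.

```r
# Hypothetical counts; only the labels come from the note on misspelled values
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADOES", "TORNDAO", "TORNADO"),
  FATALITIES = c(5, 2, 1, 4)
)

# Each distinct spelling becomes a separate group
by_label <- aggregate( FATALITIES ~ EVTYPE, data = toy, FUN = sum )
print( by_label )

# List every distinct label that starts with "TORN"
variants <- unique( grep( "^TORN", as.character(toy$EVTYPE), value = TRUE ) )
print( variants )
```

Here the 12 fatalities are spread over three rows, and only the correctly spelled "TORNADO" row would be considered when ranking event types.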
This section provides information about weather events that are most harmful to population health.
# Get the top 10 event types where most fatalities occurred
print( head( PublicHealth_data[ order(PublicHealth_data$FATALITIES, decreasing = TRUE), c("EVTYPE","FATALITIES") ], n=10 ), row.names=FALSE)
## EVTYPE FATALITIES
## TORNADO 5633
## EXCESSIVE HEAT 1903
## FLASH FLOOD 978
## HEAT 937
## LIGHTNING 816
## TSTM WIND 504
## FLOOD 470
## RIP CURRENT 368
## HIGH WIND 248
## AVALANCHE 224
# Get the top 10 event types where most injuries occurred
print( head( PublicHealth_data[ order(PublicHealth_data$INJURIES, decreasing = TRUE), c("EVTYPE","INJURIES") ], n=10 ), row.names=FALSE)
## EVTYPE INJURIES
## TORNADO 91346
## TSTM WIND 6957
## FLOOD 6789
## EXCESSIVE HEAT 6525
## LIGHTNING 5230
## HEAT 2100
## ICE STORM 1975
## FLASH FLOOD 1777
## THUNDERSTORM WIND 1488
## HAIL 1361
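The ranking used for both tables can be sketched on made-up totals (the values are hypothetical): `order()` with `decreasing = TRUE` sorts the rows, and `head()` keeps the first `n` of them.

```r
# Hypothetical per-type totals
toy <- data.frame(
  EVTYPE   = c("HAIL", "TORNADO", "FLOOD", "HEAT"),
  INJURIES = c(12, 90, 40, 25)
)

# Sort by injuries in descending order and keep the top two rows
top2 <- head( toy[ order(toy$INJURIES, decreasing = TRUE), c("EVTYPE","INJURIES") ], n = 2 )
print( top2, row.names = FALSE )
```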
# Make a figure of two bar plots (fatalities and injuries by the type of weather event)
par( mfcol=c(1,2) , mar=c(9,6,4,2) , mgp=c(4, 1, 0) )
with( head( PublicHealth_data[ order(PublicHealth_data$FATALITIES, decreasing = TRUE), ], n=10 ),
barplot( FATALITIES, names=EVTYPE, main="Fatalities by\nevent type", cex.names = 0.7, las=2,
xlab="", ylab="Fatalities" ) )
title( xlab="Weather event" , mgp=c(7, 1, 0) )
with( head( PublicHealth_data[ order(PublicHealth_data$INJURIES, decreasing = TRUE), ], n=10 ),
barplot( INJURIES, names=EVTYPE, main="Injuries by\nevent type", cex.names = 0.7 , las=2,
xlab="", ylab="Injuries" ) )
title( xlab="Weather event" , mgp=c(8, 1, 0) )
box(which = "outer", lty = "solid" )
From the plot above, we can see that tornadoes result in the largest number of fatalities and injuries.
This section provides information about property and crop damage caused by severe weather events.
# Get the top 10 event types with the greatest property damage
print( head( Economic_data[ order(Economic_data$PROPDMG, decreasing = TRUE), c("EVTYPE","PROPDMG") ], n=10 ), row.names=FALSE)
## EVTYPE PROPDMG
## FLOOD 144657709807
## HURRICANE/TYPHOON 69305840000
## TORNADO 56947380616
## STORM SURGE 43323536000
## FLASH FLOOD 16822673978
## HAIL 15735267513
## HURRICANE 11868319010
## TROPICAL STORM 7703890550
## WINTER STORM 6688497251
## HIGH WIND 5270046260
# Get the top 10 event types with the greatest crop damage
print( head( Economic_data[ order(Economic_data$CROPDMG, decreasing = TRUE), c("EVTYPE","CROPDMG") ], n=10 ), row.names=FALSE)
## EVTYPE CROPDMG
## DROUGHT 13972566000
## FLOOD 5661968450
## RIVER FLOOD 5029459000
## ICE STORM 5022113500
## HAIL 3025954473
## HURRICANE 2741910000
## HURRICANE/TYPHOON 2607872800
## FLASH FLOOD 1421317100
## EXTREME COLD 1292973000
## FROST/FREEZE 1094086000
# Make a figure of two bar plots (property and crop damage by the type of weather event)
par( mfcol=c(1,2) , mar=c(9,6,4,2) , mgp=c(4, 1, 0) )
with( head( Economic_data[ order(Economic_data$PROPDMG, decreasing = TRUE), ], n=10 ),
barplot( PROPDMG/10^6, names=EVTYPE, main="Damage to property\nby event type", cex.names = 0.7, las=2,
xlab="", ylab="Damage (million dollars)" ) )
title( xlab="Weather event" , mgp=c(8, 1, 0) )
with( head( Economic_data[ order(Economic_data$CROPDMG, decreasing = TRUE), ], n=10 ),
barplot( CROPDMG/10^6, names=EVTYPE, main="Damage to crops\nby event type", cex.names = 0.7 , las=2,
xlab="", ylab="Damage (million dollars)" ) )
title( xlab="Weather event" , mgp=c(8, 1, 0) )
box(which = "outer", lty = "solid" )
From the plot above, we can see that floods cause the most damage to property, while drought causes the most damage to crops.
The total cost incurred as a result of severe weather is shown below.
# Get the top 10 event types causing the greatest total damage
print( head( Economic_data[ order(Economic_data$TotalDMG, decreasing = TRUE), c("EVTYPE","TotalDMG") ], n=10 ), row.names=FALSE)
## EVTYPE TotalDMG
## FLOOD 150319678257
## HURRICANE/TYPHOON 71913712800
## TORNADO 57362333886
## STORM SURGE 43323541000
## HAIL 18761221986
## FLASH FLOOD 18243991078
## DROUGHT 15018672000
## HURRICANE 14610229010
## RIVER FLOOD 10148404500
## ICE STORM 8967041360
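As a consistency check, the totals in this table should equal the sum of the property and crop figures reported in the two earlier tables. The sketch below re-enters those printed values by hand for the event types that appear in all three tables; the `check` data frame and its column names are introduced here for illustration only.

```r
# Values copied from the printed property, crop and total damage tables
check <- data.frame(
  EVTYPE   = c("FLOOD", "HURRICANE/TYPHOON", "HAIL", "FLASH FLOOD"),
  PROPDMG  = c(144657709807, 69305840000, 15735267513, 16822673978),
  CROPDMG  = c(5661968450, 2607872800, 3025954473, 1421317100),
  TotalDMG = c(150319678257, 71913712800, 18761221986, 18243991078)
)

# TRUE for every row where property + crop damage reproduces the reported total
check$consistent <- check$PROPDMG + check$CROPDMG == check$TotalDMG
print( check[ , c("EVTYPE", "consistent") ], row.names = FALSE )
```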
# Make a bar plot (total cost of damage by the type of weather event)
par( mar=c(9,8,5,2) , mgp=c(6, 1, 0) )
with( head( Economic_data[ order(Economic_data$TotalDMG, decreasing = TRUE), ], n=10 ),
barplot( TotalDMG/10^6, names=EVTYPE, main="Total damage cost\nby event type", cex.names = 0.7, las=2,
xlab="", ylab="Damage (million dollars)" ) )
title( xlab="Weather event" , mgp=c(7, 1, 0) )
box(which = "outer", lty = "solid" )
From the plot above, we can see that floods account for the greatest total damage cost.