This analysis explores the National Oceanic and Atmospheric Administration (NOAA) storm database. The database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. This report analyses the data to discover which weather events have the greatest economic and public health impact.
The data and documentation for this report: Data: Storm Data [47Mb]. Documentation: National Weather Service Storm Data Documentation.
Load Library
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Load the data. Data must be in working directory:
NOAA_data <- read.csv("repdata-data-StormData.csv.bz2")
View the data variables
names(NOAA_data)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
Exploring the property damage and crop damage variables. Each observation has a column recording a multiplier for its damage figure: PROPDMGEXP for property damage and CROPDMGEXP for crop damage. These two columns hold abbreviated multipliers: H (Hundred), K (Thousand), M (Million) and B (Billion). The code below converts each letter to its power of ten and sets any other non-numeric code to 0.
## Cleaning PROPDMGEXP Column
NOAA_data$PROPDMGEXP <- gsub("[Hh]", "2", NOAA_data$PROPDMGEXP)
NOAA_data$PROPDMGEXP <- gsub("[Kk]", "3", NOAA_data$PROPDMGEXP)
NOAA_data$PROPDMGEXP <- gsub("[Mm]", "6", NOAA_data$PROPDMGEXP)
NOAA_data$PROPDMGEXP <- gsub("[Bb]", "9", NOAA_data$PROPDMGEXP)
NOAA_data$PROPDMGEXP <- gsub("[-+?]", "0", NOAA_data$PROPDMGEXP)
NOAA_data$PROPDMGEXP <- as.numeric(NOAA_data$PROPDMGEXP)
## Warning: NAs introduced by coercion
NOAA_data$PROPDMGEXP[is.na(NOAA_data$PROPDMGEXP)] <- 0
## Cleaning CROPDMGEXP Column
NOAA_data$CROPDMGEXP <- gsub("[Hh]", "2", NOAA_data$CROPDMGEXP)
NOAA_data$CROPDMGEXP <- gsub("[Kk]", "3", NOAA_data$CROPDMGEXP)
NOAA_data$CROPDMGEXP <- gsub("[Mm]", "6", NOAA_data$CROPDMGEXP)
NOAA_data$CROPDMGEXP <- gsub("[Bb]", "9", NOAA_data$CROPDMGEXP)
NOAA_data$CROPDMGEXP <- gsub("[-+?]", "0", NOAA_data$CROPDMGEXP)
NOAA_data$CROPDMGEXP <- as.numeric(NOAA_data$CROPDMGEXP)
## Warning: NAs introduced by coercion
NOAA_data$CROPDMGEXP[is.na(NOAA_data$CROPDMGEXP)] <- 0
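The same mapping can also be written as a single lookup, which keeps the rule for each code in one place. The following is a minimal sketch, not the code used for this report: `exp_lookup` and `to_exponent` are hypothetical names, and the fallback of 0 for unrecognised codes is assumed to mirror the cleaning steps above.

```r
# Hypothetical lookup table: letter code -> power of ten.
exp_lookup <- c(H = 2, K = 3, M = 6, B = 9)

# Hypothetical helper: convert a vector of exponent codes to numeric exponents.
to_exponent <- function(code) {
  code <- toupper(trimws(as.character(code)))
  # Letter codes come from the lookup table; unmatched codes give NA.
  out <- unname(exp_lookup[code])
  # Single-digit codes ("0" to "9") are used as exponents directly.
  is_digit <- grepl("^[0-9]$", code)
  out[is_digit] <- as.numeric(code[is_digit])
  # Anything else ("", "+", "-", "?") falls back to 0.
  out[is.na(out)] <- 0
  out
}

codes <- c("K", "m", "B", "h", "5", "", "+", "?")
to_exponent(codes)
```

Applied to a column, this would read `NOAA_data$PROPDMGEXP <- to_exponent(NOAA_data$PROPDMGEXP)`.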
Creating two new variables (PROPDMGVAR and CROPDMGVAR) containing the total values of property and crop damages.
## Computing the damage values from the reported figure and its exponent
NOAA_data <- mutate(NOAA_data,
                    PROPDMGVAR = PROPDMG * (10 ^ PROPDMGEXP),
                    CROPDMGVAR = CROPDMG * (10 ^ CROPDMGEXP))
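As a quick check of the arithmetic, the calculation can be tried on a few rows by hand. The sketch below uses made-up values (not rows from the NOAA file) with exponents as they would look after cleaning: 3 for K, 6 for M and 9 for B.

```r
# Toy rows, invented for illustration only.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2),
  PROPDMGEXP = c(3, 6, 9)   # cleaned exponents for K, M and B
)

# Damage value = reported figure * 10 ^ exponent.
toy$PROPDMGVAR <- toy$PROPDMG * 10 ^ toy$PROPDMGEXP
toy$PROPDMGVAR
```

A reported figure of 25 with code K therefore becomes 25,000, and 1.2 with code B becomes 1.2 billion.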
1. Across the United States, which types of events are most harmful with respect to population health?
Summarise the variables (Fatalities and Injuries) by the type of weather event
POP <- summarise(group_by(NOAA_data, EVTYPE), TOTAL_FATALITIES = sum(FATALITIES), TOTAL_INJURIES = sum(INJURIES))
Create a variable of Total Injuries and Fatalities
POP <- mutate(POP, TOTAL_LOSS = TOTAL_FATALITIES + TOTAL_INJURIES)
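The grouped totals can be cross-checked without dplyr. The sketch below repeats the same steps (sum by event type, add a combined column, sort in descending order) in base R on a small invented data frame; the `toy` and `totals` names and all values are assumptions for illustration.

```r
# Toy data, invented for illustration only.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 0, 5, 1, 2)
)

# Sum fatalities and injuries within each event type.
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Combined count of people harmed, then sort from largest to smallest.
totals$TOTAL_LOSS <- totals$FATALITIES + totals$INJURIES
totals <- totals[order(-totals$TOTAL_LOSS), ]
totals
```

If the base R and dplyr versions disagree on the real data, missing values in FATALITIES or INJURIES would be the first thing to check, since `sum()` returns NA when any input is NA.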
Output showing (in descending order) total fatalities by weather event type
arrange(POP, desc(TOTAL_FATALITIES))[1:10, 1:2]
## Source: local data frame [10 x 2]
##
## EVTYPE TOTAL_FATALITIES
## (fctr) (dbl)
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
Output showing (in descending order) total injuries by weather event type
arrange(POP, desc(TOTAL_INJURIES))[1:10, c(1, 3)]
## Source: local data frame [10 x 2]
##
## EVTYPE TOTAL_INJURIES
## (fctr) (dbl)
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
Output showing (in descending order) combined injuries and fatalities by weather event type
arrange(POP, desc(TOTAL_LOSS))[1:10, c(1, 4)]
## Source: local data frame [10 x 2]
##
## EVTYPE TOTAL_LOSS
## (fctr) (dbl)
## 1 TORNADO 96979
## 2 EXCESSIVE HEAT 8428
## 3 TSTM WIND 7461
## 4 FLOOD 7259
## 5 LIGHTNING 6046
## 6 HEAT 3037
## 7 FLASH FLOOD 2755
## 8 ICE STORM 2064
## 9 THUNDERSTORM WIND 1621
## 10 WINTER STORM 1527
A plot showing the combined injuries and fatalities for the ten most harmful event types
TOTAL_POP <- arrange(POP, desc(TOTAL_LOSS))[1:10, c(1, 4)]
par(mar=c(11,5,1,1))
barplot(height = TOTAL_POP$TOTAL_LOSS, names.arg = TOTAL_POP$EVTYPE, main = 'Fatalities and Injuries', las=2)
2. Across the United States, which types of events have the greatest economic consequences?
Summarise the variables (Property and Crop Damage) by the type of weather event
ECON <- summarise(group_by(NOAA_data, EVTYPE), TOTAL_PROPDMG = sum(PROPDMGVAR), TOTAL_CROPDMG = sum(CROPDMGVAR))
Create a variable of total property and crop damage
ECON <- mutate(ECON, TOTAL_ECON_LOSS = TOTAL_PROPDMG + TOTAL_CROPDMG)
Output showing (in descending order) total property damage by weather event type
arrange(ECON, desc(TOTAL_PROPDMG))[1:10, 1:2]
## Source: local data frame [10 x 2]
##
## EVTYPE TOTAL_PROPDMG
## (fctr) (dbl)
## 1 FLOOD 144657709807
## 2 HURRICANE/TYPHOON 69305840000
## 3 TORNADO 56947380676
## 4 STORM SURGE 43323536000
## 5 FLASH FLOOD 16822673978
## 6 HAIL 15735267513
## 7 HURRICANE 11868319010
## 8 TROPICAL STORM 7703890550
## 9 WINTER STORM 6688497251
## 10 HIGH WIND 5270046295
Output showing (in descending order) total crop damage by weather event type
arrange(ECON, desc(TOTAL_CROPDMG))[1:10, c(1, 3)]
## Source: local data frame [10 x 2]
##
## EVTYPE TOTAL_CROPDMG
## (fctr) (dbl)
## 1 DROUGHT 13972566000
## 2 FLOOD 5661968450
## 3 RIVER FLOOD 5029459000
## 4 ICE STORM 5022113500
## 5 HAIL 3025954473
## 6 HURRICANE 2741910000
## 7 HURRICANE/TYPHOON 2607872800
## 8 FLASH FLOOD 1421317100
## 9 EXTREME COLD 1292973000
## 10 FROST/FREEZE 1094086000
Output showing (in descending order) combined property and crop damage by weather event type
arrange(ECON, desc(TOTAL_ECON_LOSS))[1:10, c(1, 4)]
## Source: local data frame [10 x 2]
##
## EVTYPE TOTAL_ECON_LOSS
## (fctr) (dbl)
## 1 FLOOD 150319678257
## 2 HURRICANE/TYPHOON 71913712800
## 3 TORNADO 57362333946
## 4 STORM SURGE 43323541000
## 5 HAIL 18761221986
## 6 FLASH FLOOD 18243991078
## 7 DROUGHT 15018672000
## 8 HURRICANE 14610229010
## 9 RIVER FLOOD 10148404500
## 10 ICE STORM 8967041360
A plot showing the combined property and crop damage for the ten most costly event types
TOTAL_ECON <- arrange(ECON, desc(TOTAL_ECON_LOSS))[1:10, c(1, 4)]
par(mar=c(11,5,1,1))
barplot(height = TOTAL_ECON$TOTAL_ECON_LOSS, names.arg = TOTAL_ECON$EVTYPE, main = 'Property and Crop Damage', las=2)
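Totals of this size are hard to read on an axis, so one option is to rescale them to billions of US dollars before plotting. The sketch below is only an illustration: it copies the first three totals from the combined damage table above into a hypothetical `total_econ_loss` vector rather than using the full ECON data frame.

```r
# First three totals copied from the combined damage table above.
total_econ_loss <- c(
  "FLOOD"             = 150319678257,
  "HURRICANE/TYPHOON" = 71913712800,
  "TORNADO"           = 57362333946
)

# Rescale to billions of US dollars, rounded to one decimal place.
billions <- round(total_econ_loss / 1e9, 1)
billions

# Same bar plot as above, with a labelled axis in readable units.
par(mar = c(11, 5, 2, 1))
barplot(height = billions, las = 2,
        ylab = "Damage (billions of USD)",
        main = "Property and Crop Damage")
```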
This analysis shows that the weather events causing the greatest public health and economic problems are: