EVALUATION OF FATALITIES, INJURIES, AND DAMAGE TO BOTH PROPERTY AND
CROPS BY TYPE OF EVENT IN THE UNITED STATES FROM 1950 TO 2011
SYNOPSIS:
The event type that is responsible for the most fatalities and
injuries is the tornado. After this, the events most responsible for
fatality are heat, flood, winter weather, and lightning, while those
most responsible for injury are thunderstorm wind, heat, flood, and
winter weather.
The event type that is responsible for the most property damage is
the tornado. After this, the events most responsible for property damage
are thunderstorm wind, flood, hail, and lightning.
The event most responsible for crop damage is hail. After this, the
event types most responsible for crop damage are flood, thunderstorm
wind, tornado, and drought.
Property damage is greatest in the month of January. From 1950 to 2011,
yearly property damage from storms increased roughly exponentially.
DATA PROCESSING - PART 1
First, I read the data into R and converted the date in column 2 to the Date class.
setwd("C:/Users/user/Dropbox/Education/Coursera/JH Data Science/REPRODUCIBLE RESEARCH/COURSE PROJECT 5.2/")
stormData<-read.csv("dataset.csv")
stormData[,2]<-as.Date(stormData[,2], format="%m/%d/%Y")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Next, I consolidated the EVTYPE (event type) values, grouping event
types with similar names under a single label. For example, I placed
“EXCESSIVE HEAT”, “HEAT WAVE”, and “EXTREME HEAT” together under “HEAT”.
stormData <- stormData %>%
mutate(EVTYPE = case_when(
EVTYPE=="FLASH FLOOD" ~ "FLOOD",
EVTYPE=="EXCESSIVE HEAT" ~ "HEAT",
EVTYPE=="HEAT WAVE" ~ "HEAT",
EVTYPE=="EXTREME HEAT" ~ "HEAT",
EVTYPE=="THUNDERSTORM WINDS" ~ "THUNDERSTORM WIND",
EVTYPE=="TSTM WIND" ~ "THUNDERSTORM WIND",
EVTYPE=="RIP CURRENT"~"RIP CURRENTS",
EVTYPE=="HEAVY SURF/HIGH SURF"~"HIGH SURF",
EVTYPE=="WIND"~"STRONG WIND",
EVTYPE=="HIGH WIND"~"STRONG WIND",
EVTYPE=="HIGH WINDS"~"STRONG WIND",
EVTYPE=="TORNADOES, TSTM WIND, HAIL"~"TORNADO",
EVTYPE=="TROPICAL STORM"~"HURRICANE/TYPHOON",
EVTYPE=="HURRICANE"~"HURRICANE/TYPHOON",
EVTYPE=="WILDFIRE"~"WILD/FOREST FIRE",
EVTYPE=="EXTREME COLD/WIND CHILL"~"WINTER WEATHER",
EVTYPE=="WINTER STORM"~"WINTER WEATHER",
EVTYPE=="EXTREME COLD"~"WINTER WEATHER",
EVTYPE=="WINTER WEATHER/MIX"~"WINTER WEATHER",
EVTYPE=="COLD"~"WINTER WEATHER",
EVTYPE=="COLD/WIND CHILL"~"WINTER WEATHER",
EVTYPE=="ICE STORM"~"WINTER WEATHER",
EVTYPE=="HEAVY SNOW"~"WINTER WEATHER",
EVTYPE=="BLIZZARD"~"WINTER WEATHER",
EVTYPE=="FLASH FLOODING"~"FLOOD",
EVTYPE=="FLOOD/FLASH FLOOD"~"FLOOD",
EVTYPE=="RIVER FLOOD"~"FLOOD",
EVTYPE=="URBAN FLOOD"~"FLOOD",
EVTYPE=="COASTAL FLOOD"~"FLOOD",
EVTYPE=="URBAN FLOODING"~"FLOOD",
EVTYPE=="FLOODING"~"FLOOD",
EVTYPE=="URBAN/SML STREAM FLD"~"FLOOD",
EVTYPE=="COASTAL FLOODING"~"FLOOD",
EVTYPE=="FLASH FLOOD/FLOOD"~"FLOOD",
TRUE ~ EVTYPE # Keep other values unchanged
))
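The same kind of consolidation can also be written with a named lookup vector, which keeps the mapping as data rather than as a long chain of conditions. The following is only a minimal sketch on toy input: the names `evtype_map` and `consolidate_evtype` are hypothetical, and the map holds just a few of the pairs listed above.

```r
# Hypothetical lookup: raw EVTYPE label -> consolidated label (subset only).
evtype_map <- c(
  "FLASH FLOOD"    = "FLOOD",
  "EXCESSIVE HEAT" = "HEAT",
  "HEAT WAVE"      = "HEAT",
  "TSTM WIND"      = "THUNDERSTORM WIND"
)

# Recode a vector of labels; labels absent from the map are kept unchanged.
consolidate_evtype <- function(evtype, map = evtype_map) {
  evtype <- as.character(evtype)
  recoded <- unname(map[evtype])          # NA where no mapping exists
  ifelse(is.na(recoded), evtype, recoded)
}

# Toy input, not taken from the storm data set.
recoded_toy <- consolidate_evtype(c("FLASH FLOOD", "HEAT WAVE", "HAIL"))
print(recoded_toy)
```

With this layout, adding a new synonym means adding one entry to the vector instead of another `case_when` branch.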
DATA PROCESSING - PART 2
Next, I calculated the property damage in dollars for each event by
multiplying PROPDMG by the multiplier encoded in PROPDMGEXP (for
example, "K" stands for 1,000 and "M" for 1,000,000).
# Column 38 stores the numeric multiplier derived from PROPDMGEXP
stormData[stormData$PROPDMGEXP=="", 38]<-"1"
stormData[stormData$PROPDMGEXP=="-", 38]<-"1"
stormData[stormData$PROPDMGEXP=="?", 38]<-"1"
stormData[stormData$PROPDMGEXP=="+", 38]<-"1"
stormData[stormData$PROPDMGEXP=="0", 38]<-"1"
stormData[stormData$PROPDMGEXP=="1", 38]<-"10"
stormData[stormData$PROPDMGEXP=="2", 38]<-"100"
stormData[stormData$PROPDMGEXP=="3", 38]<-"1000"
stormData[stormData$PROPDMGEXP=="4", 38]<-"10000"
stormData[stormData$PROPDMGEXP=="5", 38]<-"100000"
stormData[stormData$PROPDMGEXP=="6", 38]<-"1000000"
stormData[stormData$PROPDMGEXP=="7", 38]<-"10000000"
stormData[stormData$PROPDMGEXP=="8", 38]<-"100000000"
stormData[stormData$PROPDMGEXP=="B", 38]<-"1000000000"
stormData[stormData$PROPDMGEXP=="h", 38]<-"100"
stormData[stormData$PROPDMGEXP=="H", 38]<-"100"
stormData[stormData$PROPDMGEXP=="K", 38]<-"1000"
stormData[stormData$PROPDMGEXP=="m", 38]<-"1000000"
stormData[stormData$PROPDMGEXP=="M", 38]<-"1000000"
stormData[,38]<-as.numeric(stormData[,38])
## Warning: NAs introduced by coercion
# Column 39: property damage in dollars (PROPDMG in column 25 times the multiplier)
stormData[,39]<-as.numeric(stormData[,25])*stormData[,38]
## Warning: NAs introduced by coercion
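The exponent conversion can be sketched as a single lookup function applied to a named column, which avoids relying on column positions. This is an illustration under stated assumptions, not the code used for the results here: `exp_to_multiplier` and `PROP_DOLLARS` are hypothetical names, lower-case and upper-case codes are treated alike, and unknown codes return NA.

```r
# Map a damage exponent code to a numeric multiplier, following the
# conversions listed above. Assumption of this sketch: codes are
# upper-cased first, and codes missing from the table yield NA.
exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))
  lookup <- c("-" = 1, "?" = 1, "+" = 1, "0" = 1,
              "1" = 10, "2" = 100, "3" = 1e3, "4" = 1e4, "5" = 1e5,
              "6" = 1e6, "7" = 1e7, "8" = 1e8,
              "H" = 100, "K" = 1e3, "M" = 1e6, "B" = 1e9)
  out <- unname(lookup[code])
  out[!is.na(code) & code == ""] <- 1   # blank exponent: multiplier of 1
  out
}

# Toy rows, not taken from the storm data set.
toy <- data.frame(PROPDMG = c(25, 2.5, 10),
                  PROPDMGEXP = c("K", "M", ""))
toy$PROP_DOLLARS <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
print(toy)
```

The blank code is handled separately because a named vector cannot be indexed by an empty name.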
Next, I calculated the crop damages for the various events by
multiplying CROPDMG by the conversion for CROPDMGEXP.
# Column 40 stores the numeric multiplier derived from CROPDMGEXP
stormData[stormData$CROPDMGEXP=="", 40]<-"1"
stormData[stormData$CROPDMGEXP=="?", 40]<-"1"
stormData[stormData$CROPDMGEXP=="0", 40]<-"1"
stormData[stormData$CROPDMGEXP=="2", 40]<-"100"
stormData[stormData$CROPDMGEXP=="B", 40]<-"1000000000"
stormData[stormData$CROPDMGEXP=="k", 40]<-"1000"
stormData[stormData$CROPDMGEXP=="K", 40]<-"1000"
stormData[stormData$CROPDMGEXP=="m", 40]<-"1000000"
stormData[stormData$CROPDMGEXP=="M", 40]<-"1000000"
stormData[,40]<-as.numeric(stormData[,40])
## Warning: NAs introduced by coercion
stormData[,41]<-as.numeric(stormData[,27])*stormData[,40]
## Warning: NAs introduced by coercion
RESULTS - PART 1
Next, I totaled the fatalities for each event type and printed the 15
event types with the most fatalities.
stormData <- stormData %>%
mutate(FATALITIES = as.numeric(FATALITIES)) # Ensure numeric
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `FATALITIES = as.numeric(FATALITIES)`.
## Caused by warning:
## ! NAs introduced by coercion
fatalStormData <- stormData %>%
group_by(EVTYPE) %>%
summarise(Total_Fatalities = sum(FATALITIES, na.rm = TRUE))
fatalStormDataOrdered <- fatalStormData %>%
arrange(desc(Total_Fatalities))
print(fatalStormDataOrdered[1:15, ])
## # A tibble: 15 × 2
## EVTYPE Total_Fatalities
## <chr> <dbl>
## 1 TORNADO 5618
## 2 HEAT 3108
## 3 FLOOD 1538
## 4 WINTER WEATHER 999
## 5 LIGHTNING 816
## 6 N 812
## 7 THUNDERSTORM WIND 701
## 8 RIP CURRENTS 572
## 9 WNW 560.
## 10 S 544
## 11 NW 500.
## 12 WSW 491
## 13 SSE 425
## 14 STRONG WIND 409
## 15 NNE 262
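The coercion warning above means that some FATALITIES entries could not be parsed as numbers, and labels such as N or WNW in the table look more like compass directions than event types, which may point to rows whose fields were shifted when the file was read. A minimal sketch for listing such rows before converting a column is shown below; `non_numeric_rows` is a hypothetical helper and the data frame is toy input.

```r
# Return the rows whose `column` holds a non-missing value that
# as.numeric() cannot parse (blank strings are flagged as well).
non_numeric_rows <- function(df, column) {
  raw <- as.character(df[[column]])
  parsed <- suppressWarnings(as.numeric(raw))
  df[!is.na(raw) & is.na(parsed), , drop = FALSE]
}

# Toy rows, not taken from the storm data set.
toy <- data.frame(
  EVTYPE = c("TORNADO", "N", "HEAT"),
  FATALITIES = c("3", "TORNADO", "1"),
  stringsAsFactors = FALSE
)
bad <- non_numeric_rows(toy, "FATALITIES")
print(bad)
```

Inspecting the flagged rows first makes it easier to decide whether to repair the import or to drop those records.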
Next, I totaled the injuries for each event type and printed the 15
event types with the most injuries.
stormData <- stormData %>%
mutate(INJURIES = as.numeric(INJURIES)) # Convert to numeric
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `INJURIES = as.numeric(INJURIES)`.
## Caused by warning:
## ! NAs introduced by coercion
injuredStormData <- stormData %>%
group_by(EVTYPE) %>%
summarise(Total_Injured = sum(INJURIES, na.rm = TRUE))
injuredStormDataOrdered<-injuredStormData %>%
arrange(desc(Total_Injured))
print(injuredStormDataOrdered[1:15, ])
## # A tibble: 15 × 2
## EVTYPE Total_Injured
## <chr> <dbl>
## 1 TORNADO 90671
## 2 THUNDERSTORM WIND 9353
## 3 HEAT 9089
## 4 FLOOD 8674
## 5 WINTER WEATHER 5907
## 6 LIGHTNING 5230
## 7 STRONG WIND 1805
## 8 HURRICANE/TYPHOON 1661
## 9 WILD/FOREST FIRE 1456
## 10 HAIL 1361
## 11 FOG 734
## 12 RIP CURRENTS 529
## 13 DUST STORM 440
## 14 DENSE FOG 342
## 15 HEAVY RAIN 251
The event type that is responsible for the most fatalities and
injuries is the tornado. After this, the events most responsible for
fatality are heat, flood, winter weather, and lightning, while those
most responsible for injury are thunderstorm wind, heat, flood, and
winter weather.
RESULTS - PART 2
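Next, I totaled the property damage for each event type and printed the
15 event types with the highest totals. These totals sum the raw PROPDMG
values, without the PROPDMGEXP multiplier, so they are not dollar
amounts.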
# Convert PROPDMG to numeric
stormData <- stormData %>%
mutate(PROPDMG = as.numeric(PROPDMG))
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `PROPDMG = as.numeric(PROPDMG)`.
## Caused by warning:
## ! NAs introduced by coercion
# Now perform aggregation
propdmgStormDataOrdered <- stormData %>%
group_by(EVTYPE) %>%
summarise(Total_Prop_Damage = sum(PROPDMG, na.rm = TRUE)) %>%
arrange(desc(Total_Prop_Damage))
# Print top 15 results
print(propdmgStormDataOrdered[1:15,])
## # A tibble: 15 × 2
## EVTYPE Total_Prop_Damage
## <chr> <dbl>
## 1 TORNADO 3208665.
## 2 THUNDERSTORM WIND 2658988.
## 3 FLOOD 2450035
## 4 HAIL 687863.
## 5 LIGHTNING 603352.
## 6 STRONG WIND 446001.
## 7 WINTER WEATHER 375942.
## 8 WILD/FOREST FIRE 123804.
## 9 HURRICANE/TYPHOON 69777.
## 10 HEAVY RAIN 50842.
## 11 STORM SURGE 19393.
## 12 LANDSLIDE 18962.
## 13 LAKE-EFFECT SNOW 14141
## 14 WATERSPOUT 9354.
## 15 FOG 8850.
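Next, I totaled the crop damage for each event type and printed the 15
event types with the highest totals. These totals sum the raw CROPDMG
values, without the CROPDMGEXP multiplier, so they are not dollar
amounts.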
# Convert CROPDMG to numeric
stormData <- stormData %>%
mutate(CROPDMG = as.numeric(CROPDMG))
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `CROPDMG = as.numeric(CROPDMG)`.
## Caused by warning:
## ! NAs introduced by coercion
# Now perform aggregation
cropdmgStormDataOrdered <- stormData %>%
group_by(EVTYPE) %>%
summarise(Total_Crop_Damage = sum(CROPDMG, na.rm = TRUE)) %>%
arrange(desc(Total_Crop_Damage))
# Print top 15 results
print(cropdmgStormDataOrdered[1:15,])
## # A tibble: 15 × 2
## EVTYPE Total_Crop_Damage
## <chr> <dbl>
## 1 HAIL 579596.
## 2 FLOOD 365338.
## 3 THUNDERSTORM WIND 194679.
## 4 TORNADO 100021.
## 5 DROUGHT 33899.
## 6 STRONG WIND 20960.
## 7 HURRICANE/TYPHOON 16037.
## 8 WINTER WEATHER 12792.
## 9 HEAVY RAIN 11123.
## 10 WILD/FOREST FIRE 8554.
## 11 FROST/FREEZE 7034.
## 12 TSTM WIND/HAIL 4357.
## 13 LIGHTNING 3581.
## 14 HIGH WINDS/COLD 2005
## 15 SMALL HAIL 1732.
The event type that is responsible for the most property damage is
the tornado. After this, the events most responsible for property damage
are thunderstorm wind, flood, hail, and lightning.
The event most responsible for crop damage is hail. After this, the
event types most responsible for crop damage are flood, thunderstorm
wind, tornado, and drought.
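The two aggregations above sum the raw PROPDMG and CROPDMG figures. A ranking by dollar amount would sum the multiplied values instead. The sketch below shows the idea on toy data; `PROP_DOLLARS` is a hypothetical column name standing in for the dollar-valued property damage, and the numbers are made up for illustration.

```r
# Toy rows, not taken from the storm data set.
toy <- data.frame(
  EVTYPE = c("TORNADO", "HAIL", "TORNADO", "FLOOD"),
  PROP_DOLLARS = c(2.5e6, 25000, NA, 1e5)
)

# The formula interface of aggregate() drops rows with NA by default.
totals <- aggregate(PROP_DOLLARS ~ EVTYPE, data = toy, FUN = sum)

# Sort from highest to lowest total damage.
ranked <- totals[order(-totals$PROP_DOLLARS), ]
print(ranked)
```

Because one large "B" (billions) record can outweigh thousands of "K" records, a ranking by dollars may differ from a ranking by raw values.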
PLOT OF PROPERTY DAMAGE BY CALENDAR YEAR
stormData[,42]<-as.numeric(format(stormData[,2],"%Y"))
propDmgPerYear<-aggregate(stormData[39], by=stormData[42], sum)
propDmgPerYear[,2]<-log10(propDmgPerYear[,2])
plot(propDmgPerYear, main="log10 Property Damage by Year - Plot", xlab = "Year", ylab = "log10 Property Damage (in Dollars)", type="l")
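One way to check for exponential growth is to fit a straight line to log10 damage against year: exponential growth appears as a roughly constant slope, and 10 raised to that slope is the yearly growth factor. The sketch below uses synthetic yearly totals built to grow by a factor of 1.5 per year, so it shows the method only and says nothing about the storm data; all names in it are hypothetical.

```r
# Synthetic yearly totals that grow by a constant factor (not the storm data).
yearly <- data.frame(YEAR = 2000:2009)
yearly$DAMAGE <- 1e6 * 1.5^(yearly$YEAR - 2000)

# Linear fit on the log10 scale; the slope is log10 of the yearly growth factor.
fit <- lm(log10(DAMAGE) ~ YEAR, data = yearly)
slope <- unname(coef(fit)["YEAR"])
growth_factor <- 10^slope
print(growth_factor)
```

On real data, the residuals of such a fit indicate how well a single exponential trend describes the series.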

PLOT OF PROPERTY DAMAGE BY MONTH OF THE YEAR
stormData[,43]<-as.numeric(format(stormData[,2],"%m"))
propDmgPerMonth<-aggregate(stormData[39], by=stormData[43], sum)
propDmgPerMonth[,2]<-propDmgPerMonth[,2]/10^9
plot(propDmgPerMonth, main="Property Damage by Month - Plot", xlab = "Month of Year", ylab = "Property Damage (in Billions of Dollars)", type="l")
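The month with the highest total can also be read off numerically rather than from the plot. The sketch below uses toy data and hypothetical column names (`DATE`, `PROP_DOLLARS`, `MONTH`). Note that `sum()` returns NA for any group containing an NA unless missing values are dropped; the formula interface of `aggregate()` used here drops such rows by default.

```r
# Toy rows, not taken from the storm data set.
toy <- data.frame(
  DATE = as.Date(c("2001-01-15", "2001-01-20", "2001-06-03", "2002-06-10")),
  PROP_DOLLARS = c(5e6, NA, 1e6, 2e6)
)

# Extract the calendar month (1-12) from each date.
toy$MONTH <- as.numeric(format(toy$DATE, "%m"))

# Total damage per month; rows with NA damage are dropped by the formula method.
monthly <- aggregate(PROP_DOLLARS ~ MONTH, data = toy, FUN = sum)

# Month with the largest total.
peak_month <- monthly$MONTH[which.max(monthly$PROP_DOLLARS)]
print(peak_month)
```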
