An analysis of the National Weather Service storm data found that, among almost a thousand event types (985 distinct `EVTYPE` labels), the event types that cause the most population health damage per event are Heat Wave; TROPICAL STORM GORDON; TORNADOES, TSTM WIND, HAIL; COLD AND SNOW; THUNDERSTORMW; HIGH WIND AND SEAS; HEAT WAVE DROUGHT; SNOW/HIGH WINDS; and WINTER STORM HIGH WINDS. These nine appear in the top 10 by both the mean and the median of a weighted fatality-and-injury score. The top 10 event types that cause the most economic damage per event are TROPICAL STORM GORDON; COASTAL EROSION; HEAVY RAIN AND FLOOD; RIVER AND STREAM FLOOD; Landslump; DUST STORM/HIGH WINDS; HIGH WINDS/COLD; FOREST FIRES; BLIZZARD/WINTER STORM; and FLASH FLOOD/. We also found that the event types that cause the most fatalities overlap little with those that cause the most injuries.
setwd("~/Desktop/JHU/Reproducible_Research/PA2")
df <- read.csv("repdata-data-StormData.csv")
head(df)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
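# keep only the columns needed for this analysis (selected by name rather than by position)
df <- df[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]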
summary(df)
## EVTYPE FATALITIES INJURIES PROPDMG
## HAIL :288661 Min. : 0 Min. : 0.0 Min. : 0
## TSTM WIND :219940 1st Qu.: 0 1st Qu.: 0.0 1st Qu.: 0
## THUNDERSTORM WIND: 82563 Median : 0 Median : 0.0 Median : 0
## TORNADO : 60652 Mean : 0 Mean : 0.2 Mean : 12
## FLASH FLOOD : 54277 3rd Qu.: 0 3rd Qu.: 0.0 3rd Qu.: 0
## FLOOD : 25326 Max. :583 Max. :1700.0 Max. :5000
## (Other) :170878
## PROPDMGEXP CROPDMG CROPDMGEXP
## :465934 Min. : 0.0 :618413
## K :424665 1st Qu.: 0.0 K :281832
## M : 11330 Median : 0.0 M : 1994
## 0 : 216 Mean : 1.5 k : 21
## B : 40 3rd Qu.: 0.0 0 : 19
## 5 : 28 Max. :990.0 B : 9
## (Other): 84 (Other): 9
str(df)
## 'data.frame': 902297 obs. of 7 variables:
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
names(df)
## [1] "EVTYPE" "FATALITIES" "INJURIES" "PROPDMG" "PROPDMGEXP"
## [6] "CROPDMG" "CROPDMGEXP"
event_fatality_mean <- tapply(df$FATALITIES, df$EVTYPE, mean, na.rm = TRUE)
sorted_event_fatality_mean <- sort(event_fatality_mean, decreasing = TRUE)
sorted_event_fatality_mean[1:10]
## TORNADOES, TSTM WIND, HAIL COLD AND SNOW
## 25.000 14.000
## TROPICAL STORM GORDON RECORD/EXCESSIVE HEAT
## 8.000 5.667
## EXTREME HEAT HEAT WAVE DROUGHT
## 4.364 4.000
## HIGH WIND/SEAS MARINE MISHAP
## 4.000 3.500
## WINTER STORMS Heavy surf and wind
## 3.333 3.000
So, the 10 event types with the highest average fatalities are TORNADOES, TSTM WIND, HAIL; COLD AND SNOW; TROPICAL STORM GORDON; RECORD/EXCESSIVE HEAT; EXTREME HEAT; HEAT WAVE DROUGHT; HIGH WIND/SEAS; MARINE MISHAP; WINTER STORMS; and Heavy surf and wind, with average fatalities of 25, 14, 8, 5.67, 4.36, 4, 4, 3.5, 3.33, and 3, respectively. (Note that "TORNADOES, TSTM WIND, HAIL" is a single event type label.)
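The ranking step above can be tried on a small made-up data frame. The sketch below wraps the `tapply`/`sort` pattern in a helper; the name `rank_events` and the toy values are assumptions for illustration and are not part of the original analysis.

```r
# Toy data (made up for illustration): three event types, five records
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HAIL", "HEAT", "HEAT"),
  FATALITIES = c(2, 0, 0, 5, 3)
)

# Hypothetical helper: rank event types by a per-type summary statistic
rank_events <- function(values, groups, stat = mean, n = 10) {
  per_type <- tapply(values, groups, stat, na.rm = TRUE)
  ranked <- sort(per_type, decreasing = TRUE)
  # keep at most n entries, even if there are fewer than n event types
  ranked[seq_len(min(n, length(ranked)))]
}

top_mean <- rank_events(toy$FATALITIES, toy$EVTYPE, stat = mean, n = 2)
print(top_mean)
```

Passing `stat = median` instead would give the median-based ranking with the same helper.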
event_fatality_median <- tapply(df$FATALITIES, df$EVTYPE, median, na.rm = TRUE)
sorted_event_fatality_median <- sort(event_fatality_median, decreasing = TRUE)
sorted_event_fatality_median[1:10]
## TORNADOES, TSTM WIND, HAIL COLD AND SNOW
## 25.0 14.0
## TROPICAL STORM GORDON HEAT WAVE DROUGHT
## 8.0 4.0
## HIGH WIND/SEAS MARINE MISHAP
## 4.0 3.5
## Heavy surf and wind HIGH WIND AND SEAS
## 3.0 3.0
## HEAT WAVES RIP CURRENTS/HEAVY SURF
## 2.5 2.5
So, the 10 event types with the highest median fatalities are TORNADOES, TSTM WIND, HAIL; COLD AND SNOW; TROPICAL STORM GORDON; HEAT WAVE DROUGHT; HIGH WIND/SEAS; MARINE MISHAP; Heavy surf and wind; HIGH WIND AND SEAS; HEAT WAVES; and RIP CURRENTS/HEAVY SURF, with median fatalities of 25, 14, 8, 4, 4, 3.5, 3, 3, 2.5, and 2.5, respectively.
event_injuries_mean <- tapply(df$INJURIES, df$EVTYPE, mean, na.rm = TRUE)
sorted_event_injuries_mean <- sort(event_injuries_mean, decreasing = TRUE)
sorted_event_injuries_mean[1:10]
## Heat Wave TROPICAL STORM GORDON WILD FIRES
## 70.00 43.00 37.50
## THUNDERSTORMW HIGH WIND AND SEAS SNOW/HIGH WINDS
## 27.00 20.00 18.00
## GLAZE/ICE STORM HEAT WAVE DROUGHT WINTER STORM HIGH WINDS
## 15.00 15.00 15.00
## HURRICANE/TYPHOON
## 14.49
So, the 10 event types with the highest average injuries are Heat Wave; TROPICAL STORM GORDON; WILD FIRES; THUNDERSTORMW; HIGH WIND AND SEAS; SNOW/HIGH WINDS; GLAZE/ICE STORM; HEAT WAVE DROUGHT; WINTER STORM HIGH WINDS; and HURRICANE/TYPHOON, with average injuries of 70, 43, 37.5, 27, 20, 18, 15, 15, 15, and 14.49, respectively.
event_injuries_median <- tapply(df$INJURIES, df$EVTYPE, median, na.rm = TRUE)
sorted_event_injuries_median <- sort(event_injuries_median, decreasing = TRUE)
sorted_event_injuries_median[1:10]
## Heat Wave TROPICAL STORM GORDON THUNDERSTORMW
## 70 43 27
## HIGH WIND AND SEAS SNOW/HIGH WINDS GLAZE/ICE STORM
## 20 18 15
## HEAT WAVE DROUGHT WINTER STORM HIGH WINDS NON-SEVERE WIND DAMAGE
## 15 15 7
## TORNADO F2
## 4
So, the 10 event types with the highest median injuries are Heat Wave; TROPICAL STORM GORDON; THUNDERSTORMW; HIGH WIND AND SEAS; SNOW/HIGH WINDS; GLAZE/ICE STORM; HEAT WAVE DROUGHT; WINTER STORM HIGH WINDS; NON-SEVERE WIND DAMAGE; and TORNADO F2, with median injuries of 70, 43, 27, 20, 18, 15, 15, 15, 7, and 4, respectively.
intersect(names(sorted_event_fatality_median[1:10]), names(sorted_event_fatality_mean[1:10]))
## [1] "TORNADOES, TSTM WIND, HAIL" "COLD AND SNOW"
## [3] "TROPICAL STORM GORDON" "HEAT WAVE DROUGHT"
## [5] "HIGH WIND/SEAS" "MARINE MISHAP"
## [7] "Heavy surf and wind"
intersect(names(sorted_event_injuries_median[1:10]), names(sorted_event_injuries_mean[1:10]))
## [1] "Heat Wave" "TROPICAL STORM GORDON"
## [3] "THUNDERSTORMW" "HIGH WIND AND SEAS"
## [5] "SNOW/HIGH WINDS" "GLAZE/ICE STORM"
## [7] "HEAT WAVE DROUGHT" "WINTER STORM HIGH WINDS"
intersect(names(sorted_event_fatality_mean[1:10]), names(sorted_event_injuries_mean[1:10]))
## [1] "TROPICAL STORM GORDON" "HEAT WAVE DROUGHT"
intersect(names(sorted_event_fatality_median[1:10]), names(sorted_event_injuries_median[1:10]))
## [1] "TROPICAL STORM GORDON" "HEAT WAVE DROUGHT" "HIGH WIND AND SEAS"
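The four comparisons above each come down to counting how many names two top-10 lists share. A minimal sketch of that count, using made-up named vectors; the helper name `top_overlap` is hypothetical and not part of the original analysis.

```r
# Made-up named vectors standing in for two sorted per-type summaries
by_mean   <- c(A = 9, B = 7, C = 5, D = 1)
by_median <- c(A = 8, C = 6, D = 2, B = 0)

# Hypothetical helper: how many of the top-n names two rankings share
top_overlap <- function(rank_a, rank_b, n = 10) {
  top_a <- names(rank_a)[seq_len(min(n, length(rank_a)))]
  top_b <- names(rank_b)[seq_len(min(n, length(rank_b)))]
  length(intersect(top_a, top_b))
}

shared <- top_overlap(by_mean, by_median, n = 3)
print(shared)
```

With the vectors from this report, `top_overlap(sorted_event_fatality_mean, sorted_event_injuries_mean)` should return 2, matching the two names printed for that comparison.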
From the analysis above and Figure 1 (panels 1 and 2), for either fatalities or injuries, the ranking by mean does not differ much from the ranking by median (7 or 8 of the top 10 event types are shared). But for either mean or median, the ranking by fatalities differs greatly from the ranking by injuries (only 2 or 3 of the top 10 are shared). Therefore, a weighted score that combines both is used below (weight 1 on fatalities, 0.5 on injuries):
df$FATA_INJU <- df$FATALITIES + 0.5 * df$INJURIES
event_fatality_injury_median <- tapply(df$FATA_INJU, df$EVTYPE, median, na.rm = TRUE)
event_fatality_injury_mean <- tapply(df$FATA_INJU, df$EVTYPE, mean, na.rm = TRUE)
sorted_event_fatality_injury_median <- sort(event_fatality_injury_median, decreasing = TRUE)
sorted_event_fatality_injury_mean <- sort(event_fatality_injury_mean, decreasing = TRUE)
sorted_event_fatality_injury_median[1:10]
## Heat Wave TROPICAL STORM GORDON
## 35.0 29.5
## TORNADOES, TSTM WIND, HAIL COLD AND SNOW
## 25.0 14.0
## THUNDERSTORMW HIGH WIND AND SEAS
## 13.5 13.0
## HEAT WAVE DROUGHT SNOW/HIGH WINDS
## 11.5 9.0
## WINTER STORM HIGH WINDS GLAZE/ICE STORM
## 8.5 7.5
sorted_event_fatality_injury_mean[1:10]
## Heat Wave TROPICAL STORM GORDON
## 35.0 29.5
## TORNADOES, TSTM WIND, HAIL WILD FIRES
## 25.0 19.5
## COLD AND SNOW THUNDERSTORMW
## 14.0 13.5
## HIGH WIND AND SEAS HEAT WAVE DROUGHT
## 13.0 11.5
## SNOW/HIGH WINDS WINTER STORM HIGH WINDS
## 9.0 8.5
x <- intersect(names(sorted_event_fatality_injury_median[1:10]), names(sorted_event_fatality_injury_mean[1:10]))
So, the event types that appear in the top 10 of the weighted score by both median and mean are the following nine: Heat Wave; TROPICAL STORM GORDON; TORNADOES, TSTM WIND, HAIL; COLD AND SNOW; THUNDERSTORMW; HIGH WIND AND SEAS; HEAT WAVE DROUGHT; SNOW/HIGH WINDS; and WINTER STORM HIGH WINDS. The median-based top 10 additionally contains GLAZE/ICE STORM, and the mean-based top 10 additionally contains WILD FIRES.
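The weighted score used above is easy to check by hand. The sketch below restates it as a function with the injury weight as a parameter; the helper name `health_score` is hypothetical, and the second record is made up for illustration.

```r
# Hypothetical helper: fatalities count fully, injuries count with weight w
health_score <- function(fatalities, injuries, w = 0.5) {
  fatalities + w * injuries
}

# First record shown by head(df): 0 fatalities, 15 injuries
score_first <- health_score(0, 15)
# A made-up record with 2 fatalities and 4 injuries
score_toy <- health_score(2, 4)
print(c(score_first, score_toy))
```

The weight 0.5 is a modelling choice; a different `w` can change which event types rank highest.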
#### Here are the line plots
```r
par(mfrow = c(3, 1))
x <- 1:length(event_injuries_median)
plot(x, event_fatality_median, type = "l", col = "green", lwd = 2, xlab = "",
ylab = "fatalities", cex.axis = 1.5, cex.lab = 1.5)
points(x, event_fatality_mean, lty = 3, type = "l", col = "red", lwd = 2)
legend("topleft", legend = c("mean", "median"), lty = c(3, 1), col = c("red",
"green"), lwd = 3, cex = 1.5)
title("Figure 1. number of fatalities and/or injuries under different event types",
cex.main = 2)
plot(x, event_injuries_median, type = "l", col = "green", lwd = 2, xlab = "",
ylab = "injuries", cex.axis = 1.5, cex.lab = 1.5)
points(x, event_injuries_mean, lty = 3, type = "l", col = "red", lwd = 2)
legend("topleft", legend = c("mean", "median"), lty = c(3, 1), col = c("red",
"green"), lwd = 3, cex = 1.5)
plot(x, event_fatality_injury_median, type = "l", col = "green", lwd = 2, xlab = "event type",
ylab = "fatalities and injuries", cex.axis = 1.5, cex.lab = 1.5)
points(x, event_fatality_injury_mean, lty = 3, type = "l", col = "red", lwd = 2)
legend("topleft", legend = c("mean", "median"), lty = c(3, 1), col = c("red",
"green"), lwd = 3, cex = 1.5)
df$EconDMG <- df$PROPDMG + df$CROPDMG
event_EconDMG_mean <- tapply(df$EconDMG, df$EVTYPE, mean, na.rm = TRUE)
event_EconDMG_median <- tapply(df$EconDMG, df$EVTYPE, median, na.rm = TRUE)
sorted_event_EconDMG_mean <- sort(event_EconDMG_mean, decreasing = TRUE)
sorted_event_EconDMG_median <- sort(event_EconDMG_median, decreasing = TRUE)
intersect(names(sorted_event_EconDMG_mean[1:10]), names(sorted_event_EconDMG_median[1:10]))
## [1] "TROPICAL STORM GORDON" "COASTAL EROSION"
## [3] "HEAVY RAIN AND FLOOD" "RIVER AND STREAM FLOOD"
## [5] "Landslump" "DUST STORM/HIGH WINDS"
## [7] "HIGH WINDS/COLD" "FOREST FIRES"
## [9] "BLIZZARD/WINTER STORM" "FLASH FLOOD/"
x <- 1:length(event_EconDMG_median)
plot(x, event_EconDMG_median, type = "l", col = "green", lwd = 2, xlab = "",
ylab = "economic damage", cex.axis = 1.5, cex.lab = 1.5)
points(x, event_EconDMG_mean, lty = 3, type = "l", col = "red", lwd = 2)
legend("topleft", legend = c("mean", "median"), lty = c(3, 1), col = c("red",
"green"), lwd = 3, cex = 1.5)
title("Figure 2. number of economic damages under different event types", cex.main = 1)
```
Therefore, the mean-based and median-based top-10 lists contain the same 10 event types, and the 10 event types that cause the most economic damage per event are TROPICAL STORM GORDON; COASTAL EROSION; HEAVY RAIN AND FLOOD; RIVER AND STREAM FLOOD; Landslump; DUST STORM/HIGH WINDS; HIGH WINDS/COLD; FOREST FIRES; BLIZZARD/WINTER STORM; and FLASH FLOOD/.
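The `EconDMG` column above adds the raw `PROPDMG` and `CROPDMG` figures; the `PROPDMGEXP` and `CROPDMGEXP` columns are not applied. A minimal sketch of how they could be applied follows. It assumes the common reading that K, M, and B stand for thousands, millions, and billions, and it treats every other code as a multiplier of 1; both choices are assumptions, and the helper name `exp_multiplier` is hypothetical.

```r
# Hypothetical helper: map an exponent code to a numeric multiplier.
# Assumption: K/M/B (either case) mean 1e3/1e6/1e9; anything else maps to 1.
exp_multiplier <- function(code) {
  code <- toupper(as.character(code))
  mult <- c(K = 1e3, M = 1e6, B = 1e9)[code]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Made-up records for illustration
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1),
  PROPDMGEXP = c("K", "M", ""),
  CROPDMG    = c(0, 10, 3),
  CROPDMGEXP = c("", "k", "B")
)

# Scale each damage figure by its multiplier before summing
toy$EconDMG <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * exp_multiplier(toy$CROPDMGEXP)
print(toy$EconDMG)
```

Applying multipliers like these to the full data set would likely change the economic ranking, since a value of 25 with code K and a value of 25 with code B are otherwise counted as equal.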
So, from the analysis above, the event types that cause the most population health damage per event are Heat Wave; TROPICAL STORM GORDON; TORNADOES, TSTM WIND, HAIL; COLD AND SNOW; THUNDERSTORMW; HIGH WIND AND SEAS; HEAT WAVE DROUGHT; SNOW/HIGH WINDS; and WINTER STORM HIGH WINDS. The top 10 event types that cause the most economic damage per event are TROPICAL STORM GORDON; COASTAL EROSION; HEAVY RAIN AND FLOOD; RIVER AND STREAM FLOOD; Landslump; DUST STORM/HIGH WINDS; HIGH WINDS/COLD; FOREST FIRES; BLIZZARD/WINTER STORM; and FLASH FLOOD/.