The National Oceanic and Atmospheric Administration (NOAA) keeps track of the amount of damage caused by natural events both physically and financially. Events are categorized into 16 discrete categories (Tornado, SNOW/ICE, STORMS, etc). The physical damage is tracked for properties and crops separately. This reports looks at the overall damage caused by these events by adding up the number of fatalities and injuries per category. In the report, I also take a look at the top 20 events recorded nationally. For financial harm, I computed the top 20 events that caused property damage and crop damage. This summary includes all the data process and computations note needed to reproduce the analysis. The results section includes the key data tables and figures all in one place.
As part of the data processing, I like to make sure that I have the needed libraries loaded (see “Loading the appropriate libraries) and that I access the file directly from the server (see”Getting the stormdata file from the coursera site").
To be able to answer the questions in this assignment, data variables need to be created (columns added to the data frame) by combining data from the existing data frame. Those steps are also shown below.
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
##
## between, first, last
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
##
## Attaching package: 'mice'
## The following object is masked from 'package:stats':
##
## filter
## The following objects are masked from 'package:base':
##
## cbind, rbind
##
## Attaching package: 'pastecs'
## The following objects are masked from 'package:dplyr':
##
## first, last
## The following objects are masked from 'package:data.table':
##
## first, last
url <-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfile <-("~/Desktop/SantanaHardDisk/visualizationcourse/R_practice_files/reproducibility/week4/StormData.csv")
download.file(url, destfile)
stormdata <- read.table("StormData.csv", sep = ",", header = TRUE)
head(stormdata[1:5,1:4])
## STATE__ BGN_DATE BGN_TIME TIME_ZONE
## 1 1 4/18/1950 0:00:00 0130 CST
## 2 1 4/18/1950 0:00:00 0145 CST
## 3 1 2/20/1951 0:00:00 1600 CST
## 4 1 6/8/1951 0:00:00 0900 CST
## 5 1 11/15/1951 0:00:00 1500 CST
To help with the analysis it is helpful to create a more manegeable subset of the data.
SUBstormdata <- select(stormdata, c(1:8, 23:28))
head(SUBstormdata)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 0 15 25.0 K 0
## 2 0 0 2.5 K 0
## 3 0 2 25.0 K 0
## 4 0 2 2.5 K 0
## 5 0 2 2.5 K 0
## 6 0 6 2.5 K 0
Determining the Healthy Impact by using the sum of the fatalities and injuries for that state.
The code below adds a column called ‘healthimpact’ by adding the fatalities and injuries together.
SUBstormdata$healthimpact <- SUBstormdata$FATALITIES + SUBstormdata$INJURIES
summary(SUBstormdata)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE
## Min. : 1.0 Length:902297 Length:902297 Length:902297
## 1st Qu.:19.0 Class :character Class :character Class :character
## Median :30.0 Mode :character Mode :character Mode :character
## Mean :31.2
## 3rd Qu.:45.0
## Max. :95.0
## COUNTY COUNTYNAME STATE EVTYPE
## Min. : 0.0 Length:902297 Length:902297 Length:902297
## 1st Qu.: 31.0 Class :character Class :character Class :character
## Median : 75.0 Mode :character Mode :character Mode :character
## Mean :100.6
## 3rd Qu.:131.0
## Max. :873.0
## FATALITIES INJURIES PROPDMG PROPDMGEXP
## Min. : 0.0000 Min. : 0.0000 Min. : 0.00 Length:902297
## 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.00 Class :character
## Median : 0.0000 Median : 0.0000 Median : 0.00 Mode :character
## Mean : 0.0168 Mean : 0.1557 Mean : 12.06
## 3rd Qu.: 0.0000 3rd Qu.: 0.0000 3rd Qu.: 0.50
## Max. :583.0000 Max. :1700.0000 Max. :5000.00
## CROPDMG CROPDMGEXP healthimpact
## Min. : 0.000 Length:902297 Min. : 0.0000
## 1st Qu.: 0.000 Class :character 1st Qu.: 0.0000
## Median : 0.000 Mode :character Median : 0.0000
## Mean : 1.527 Mean : 0.1725
## 3rd Qu.: 0.000 3rd Qu.: 0.0000
## Max. :990.000 Max. :1742.0000
Now, I will calculate the total number of cases (sum of Health Impact) by Event Type.
SUBevtype <- group_by(SUBstormdata, SUBstormdata$EVTYPE)
SUBtableevtype <- summarise(SUBevtype, healthimpact = sum(healthimpact))
## `summarise()` ungrouping output (override with `.groups` argument)
SUBevtypes_top20 <- head(arrange(SUBtableevtype, desc(healthimpact)), 20)
print(SUBevtypes_top20)
## # A tibble: 20 x 2
## `SUBstormdata$EVTYPE` healthimpact
## <chr> <dbl>
## 1 TORNADO 96979
## 2 EXCESSIVE HEAT 8428
## 3 TSTM WIND 7461
## 4 FLOOD 7259
## 5 LIGHTNING 6046
## 6 HEAT 3037
## 7 FLASH FLOOD 2755
## 8 ICE STORM 2064
## 9 THUNDERSTORM WIND 1621
## 10 WINTER STORM 1527
## 11 HIGH WIND 1385
## 12 HAIL 1376
## 13 HURRICANE/TYPHOON 1339
## 14 HEAVY SNOW 1148
## 15 WILDFIRE 986
## 16 THUNDERSTORM WINDS 972
## 17 BLIZZARD 906
## 18 FOG 796
## 19 RIP CURRENT 600
## 20 WILD/FOREST FIRE 557
Visualizing the information is always helpful. Below is the same data in a barplot.
color <- brewer.pal(5, "Blues")
barplot(SUBevtypes_top20$healthimpact, main = "Top 20 Health Impact Events",
names.arg = SUBevtypes_top20$`SUBstormdata$EVTYPE`, cex.names = 0.6,
col=color,
xlab = "Event Type Count")
This graph indicates that TORNADO has had the most significant health impact among the top 20 events. In fact, it is approximately 12 times higher than the next highest impact event.
Next, conversion values for propdmg and cropdmg variables where they are set to below values for propdmgexp values are included in two different columns. The conversation values used are: 1000 for K, 1000,000 for M, 1000,000,000 for B and 0. It is not clear what conversion value to use for the rest of the variables.
SUBstormdata$propconv <- 0
SUBstormdata$cropconv <- 0
head(SUBstormdata)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP healthimpact
## 1 0 15 25.0 K 0 15
## 2 0 0 2.5 K 0 0
## 3 0 2 25.0 K 0 2
## 4 0 2 2.5 K 0 2
## 5 0 2 2.5 K 0 2
## 6 0 6 2.5 K 0 6
## propconv cropconv
## 1 0 0
## 2 0 0
## 3 0 0
## 4 0 0
## 5 0 0
## 6 0 0
The table below shows that there are 465858 row where PRPDMEXP=0–>0 indicates that there is no impact and thus they do not need to be included in the analysis.
table(filter(SUBstormdata, SUBstormdata$PROPDMGEXP == "")[,11])
##
## 0 0.41 1 2 3 4 5 6 7 8 9
## 465858 1 4 7 16 9 11 6 3 2 3
## 10 20 35 75
## 8 4 1 1
Removing entries where the PRPDMGEXP = 0
SUBstormdata[(SUBstormdata$PROPDMGEXP == "K"),][,16] <- 1000
head(filter(SUBstormdata, SUBstormdata$PROPDMGEXP == "K"))
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP healthimpact
## 1 0 15 25.0 K 0 15
## 2 0 0 2.5 K 0 0
## 3 0 2 25.0 K 0 2
## 4 0 2 2.5 K 0 2
## 5 0 2 2.5 K 0 2
## 6 0 6 2.5 K 0 6
## propconv cropconv
## 1 1000 0
## 2 1000 0
## 3 1000 0
## 4 1000 0
## 5 1000 0
## 6 1000 0
SUBstormdata[(SUBstormdata$PROPDMGEXP == "M"),][,16] <- 1000000
head(filter(SUBstormdata, SUBstormdata$PROPDMGEXP == "M"))
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 2/13/1952 0:00:00 2030 CST 43 CULLMAN AL TORNADO
## 2 1 2/13/1952 0:00:00 2030 CST 9 BLOUNT AL TORNADO
## 3 1 4/18/1953 0:00:00 1700 CST 81 LEE AL TORNADO
## 4 1 4/24/1955 0:00:00 0545 CST 103 MORGAN AL TORNADO
## 5 1 10/16/1955 0:00:00 1840 CST 31 COFFEE AL TORNADO
## 6 1 10/16/1955 0:00:00 1840 CST 45 DALE AL TORNADO
## FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP healthimpact
## 1 0 3 2.5 M 0 3
## 2 0 3 2.5 M 0 3
## 3 6 195 2.5 M 0 201
## 4 5 20 2.5 M 0 25
## 5 0 0 2.5 M 0 0
## 6 0 5 2.5 M 0 5
## propconv cropconv
## 1 1e+06 0
## 2 1e+06 0
## 3 1e+06 0
## 4 1e+06 0
## 5 1e+06 0
## 6 1e+06 0
SUBstormdata[(SUBstormdata$PROPDMGEXP == "B"),][,16] <- 1000000000
head(filter(SUBstormdata, SUBstormdata$PROPDMGEXP == "B"))
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY
## 1 1 3/12/1993 0:00:00 2200 CST 0
## 2 1 10/4/1995 0:00:00 1200 CST 0
## 3 12 10/3/1995 0:00:00 1200 CST 0
## 4 12 3/12/1993 0:00:00 2200 EST 0
## 5 12 10/4/1995 0:00:00 0000 EST 0
## 6 17 8/31/1993 0:00:00 0000 CST 1
## COUNTYNAME STATE EVTYPE
## 1 ALZ001>018 AL WINTER STORM
## 2 ALZ001>050 AL HURRICANE OPAL/HIGH WINDS
## 3 FLZ001 - 002 - 003 - 004 - 005 - 006 FL HURRICANE OPAL
## 4 FLZ001>023 FL TORNADOES, TSTM WIND, HAIL
## 5 FLZ007>019 - 026>029 - 034 FL HURRICANE OPAL
## 6 ADAMS, CALHOUN AND JERSEY IL RIVER FLOOD
## FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP healthimpact
## 1 4 0 5.0 B 0.0 4
## 2 2 0 0.1 B 10.0 M 2
## 3 1 0 2.1 B 5.0 M 1
## 4 25 0 1.6 B 2.5 M 25
## 5 0 0 1.0 B 0.0 0
## 6 0 0 5.0 B 5.0 B 0
## propconv cropconv
## 1 1e+09 0
## 2 1e+09 0
## 3 1e+09 0
## 4 1e+09 0
## 5 1e+09 0
## 6 1e+09 0
SUBstormdata[!(SUBstormdata$PROPDMGEXP %in% c("K","M","B")),][,16] <- 0
head(SUBstormdata[!(SUBstormdata$PROPDMGEXP %in% c("K","M","B")),])
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 54 1 3/21/1955 0:00:00 1800 CST 73 JEFFERSON AL
## 55 1 3/25/1955 0:00:00 1610 CST 103 MORGAN AL
## 56 1 3/25/1955 0:00:00 1700 CST 83 LIMESTONE AL
## 57 1 4/6/1955 0:00:00 0600 CST 73 JEFFERSON AL
## 58 1 4/6/1955 0:00:00 1730 CST 57 FAYETTE AL
## 59 1 4/6/1955 0:00:00 1816 CST 9 BLOUNT AL
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 54 TSTM WIND 0 0 0 0
## 55 HAIL 0 0 0 0
## 56 HAIL 0 0 0 0
## 57 TSTM WIND 0 0 0 0
## 58 HAIL 0 0 0 0
## 59 TSTM WIND 0 0 0 0
## healthimpact propconv cropconv
## 54 0 0 0
## 55 0 0 0
## 56 0 0 0
## 57 0 0 0
## 58 0 0 0
## 59 0 0 0
Showing the CROPDMGEXP cells with value of 0
table(filter(SUBstormdata, SUBstormdata$CROPDMGEXP == "")[,11])
##
## 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.1 0.11
## 473376 901 75 1604 3 547 24 3 2 1221 27
## 0.12 0.13 0.15 0.16 0.17 0.2 0.21 0.22 0.25 0.3 0.35
## 6 1 39 1 2 415 2 1 1783 198 6
## 0.4 0.45 0.5 0.51 0.55 0.6 0.65 0.66 0.7 0.75 0.76
## 77 2 4551 2 2 55 3 1 34 306 10
## 0.78 0.8 0.81 0.84 0.85 0.86 0.9 0.95 0.99 1 1.01
## 1 77 1 1 12 1 30 1 1 9225 3
## 1.02 1.03 1.04 1.05 1.06 1.07 1.1 1.11 1.13 1.14 1.15
## 3 2 4 9 4 2 55 2 5 2 13
## 1.16 1.18 1.2 1.24 1.25 1.26 1.27 1.28 1.29 1.3 1.32
## 1 1 101 3 50 3 1 3 2 36 2
## 1.33 1.34 1.35 1.37 1.38 1.4 1.41 1.43 1.45 1.46 1.47
## 1 1 2 2 1 35 1 2 2 2 2
## 1.48 1.49 1.5 1.51 1.53 1.54 1.55 1.57 1.58 1.59 1.6
## 2 2 959 3 1 1 3 2 1 2 32
## 1.61 1.62 1.65 1.68 1.69 1.7 1.71 1.72 1.73 1.74 1.75
## 2 2 4 2 1 28 1 2 1 1 12
## 1.76 1.77 1.78 1.79 1.8 1.85 1.86 1.87 1.88 1.9 1.92
## 1 1 1 1 38 2 1 1 2 19 1
## 1.95 1.99 2 2.01 2.03 2.05 2.09 2.1 2.16 2.19 2.2
## 6 8 9672 1 3 4 1 21 2 1 21
## 2.22 2.25 2.27 2.3 2.32 2.33 2.35 2.36 2.38 2.4 2.41
## 1 9 1 8 2 1 1 1 2 17 1
## 2.45 2.46 2.48 2.5 2.52 2.53 2.54 2.55 2.57 2.58 2.6
## 2 1 1 7841 1 1 1 3 1 2 11
## 2.65 2.66 2.69 2.7 2.73 2.75 2.77 2.78 2.8 2.81 2.82
## 2 1 1 13 1 2 1 2 11 1 1
## 2.9 2.95 3 3.02 3.04 3.05 3.1 3.13 3.15 3.17 3.2
## 8 2 5686 1 2 4 5 1 2 1 16
## 3.24 3.25 3.3 3.31 3.37 3.4 3.43 3.45 3.47 3.5 3.53
## 1 5 15 1 1 5 2 1 1 216 2
## 3.54 3.55 3.57 3.6 3.64 3.65 3.68 3.7 3.71 3.72 3.74
## 1 3 1 8 2 1 1 2 1 1 1
## 3.75 3.78 3.8 3.83 3.9 3.92 3.94 3.96 4 4.02 4.1
## 1 1 6 1 5 2 1 1 1771 2 8
## 4.12 4.15 4.2 4.22 4.25 4.26 4.3 4.36 4.38 4.4 4.43
## 2 3 10 1 3 1 5 4 1 9 1
## 4.44 4.5 4.51 4.52 4.57 4.6 4.65 4.7 4.71 4.8 4.85
## 10 76 1 1 1 2 1 2 1 4 1
## 4.9 4.96 5 5.05 5.08 5.09 5.1 5.13 5.15 5.16 5.2
## 5 2 20197 2 1 1 4 1 1 1 5
## 5.24 5.27 5.3 5.4 5.42 5.47 5.5 5.51 5.55 5.58 5.6
## 1 1 3 6 1 1 34 1 13 1 4
## 5.7 5.75 5.8 5.9 5.94 5.99 6 6.06 6.07 6.1 6.14
## 4 1 4 2 1 2 896 1 1 2 1
## 6.2 6.25 6.3 6.32 6.34 6.4 6.45 6.5 6.51 6.55 6.6
## 2 3 2 1 1 4 2 22 1 1 3
## 6.68 6.7 6.75 6.8 6.9 7 7.05 7.15 7.2 7.29 7.3
## 1 3 2 2 1 689 2 1 3 1 2
## 7.35 7.45 7.5 7.51 7.55 7.6 7.64 7.7 7.72 7.75 7.8
## 1 2 122 1 1 5 1 5 1 1 3
## 7.9 8 8.09 8.1 8.2 8.25 8.3 8.37 8.4 8.43 8.45
## 3 1501 1 2 1 1 2 1 2 1 1
## 8.5 8.6 8.7 8.8 8.85 8.87 8.9 8.97 9 9.06 9.17
## 24 1 3 2 1 1 1 1 184 1 1
## 9.2 9.3 9.31 9.5 9.51 9.6 9.72 9.76 9.77 9.8 9.9
## 1 2 1 8 1 1 1 1 1 1 4
## 10 10.05 10.1 10.15 10.2 10.25 10.3 10.4 10.5 10.72 10.8
## 12072 1 1 1 5 1 2 2 7 1 1
## 10.88 11 11.02 11.1 11.15 11.16 11.18 11.26 11.5 11.6 11.62
## 1 142 2 3 1 1 1 1 7 2 2
## 11.7 11.83 11.85 12 12.05 12.2 12.4 12.5 12.7 12.71 12.8
## 3 1 1 681 1 1 1 10 1 1 3
## 12.9 13 13.25 13.3 13.36 13.4 13.47 13.5 13.53 13.8 13.95
## 2 97 1 3 1 1 1 4 1 1 1
## 14 14.2 14.25 14.28 14.3 14.4 14.5 14.6 14.96 14.98 15
## 95 2 2 1 1 2 2 1 1 1 4737
## 15.3 15.5 15.75 16 16.05 16.1 16.2 16.25 16.5 16.6 16.74
## 1 6 1 85 1 1 2 1 2 2 1
## 16.9 16.93 16.96 17 17.03 17.3 17.5 17.6 17.7 17.75 17.8
## 2 1 1 86 1 1 6 1 1 1 1
## 17.9 18 18.05 18.5 18.54 18.8 18.97 19 19.2 19.3 19.5
## 1 216 1 2 1 1 1 30 1 1 4
## 19.64 19.77 19.9 19.94 20 20.02 20.1 20.4 20.5 21 21.1
## 1 1 1 8 5227 2 1 1 1 32 1
## 21.2 21.3 21.5 21.7 21.88 22 22.14 22.18 22.2 22.5 22.7
## 1 1 6 1 1 93 1 1 1 2 1
## 22.75 22.88 23 23.2 23.23 23.5 23.7 24 24.5 24.7 25
## 1 1 46 1 2 2 1 34 2 1 13121
## 25.13 25.5 25.52 26 26.2 26.3 26.5 26.87 27 27.5 28
## 1 2 1 26 1 1 1 1 41 1 47
## 28.5 28.68 29 29.96 30 30.06 30.3 30.5 31 31.3 31.5
## 3 1 20 1 2447 1 1 1 17 1 4
## 31.52 31.95 32 32.2 32.22 32.5 33 33.5 34 34.89 35
## 1 2 20 1 1 1 21 1 16 1 657
## 36 36.2 37 37.4 37.5 38 38.5 39 39.6 40 40.2
## 14 1 16 1 9 26 2 4 1 1347 1
## 41 41.7 42 42.31 42.4 43 43.6 44 44.6 44.7 44.72
## 7 1 21 1 1 9 1 12 1 1 1
## 45 45.5 45.7 46 46.5 46.8 47 47.3 47.5 48 48.02
## 246 1 2 11 1 1 12 1 1 9 1
## 49 49.94 49.98 50 50.02 50.1 51 51.5 52 53 53.8
## 8 1 1 8484 1 1 8 1 6 7 1
## 54 54.1 54.9 55 55.08 55.6 55.9 56 56.54 57 57.12
## 3 1 1 134 1 1 1 7 1 6 1
## 58 59 60 61 61.98 62 63 64 65 65.5 66
## 9 2 578 6 1 10 11 6 112 1 3
## 66.5 66.9 67 68 69 70 71 71.5 72 72.7 73
## 1 1 4 8 2 341 4 1 3 1 7
## 74 74.25 75 75.3 76 76.3 77 78 78.2 79 79.2
## 2 1 1149 1 3 1 3 4 1 2 1
## 79.98 80 81 82 82.5 83 84 85 86 86.6 87
## 1 499 5 7 1 4 1 78 1 1 2
## 87.8 88 88.15 88.5 89 89.5 90 90.43 91 92 93
## 1 5 1 1 3 1 154 1 1 4 2
## 94 94.5 95 96 97 98 98.26 99 99.97 100 100.02
## 2 1 39 4 3 4 1 5 1 3251 1
## 100.03 100.5 101 102 103 104 105 106 106.72 107 108
## 1 1 5 3 1 3 20 4 1 1 7
## 108.63 109 110 111 112 113 114 115 116 117 118
## 1 1 76 1 2 3 7 25 2 1 4
## 120 121.7 122 123 124.9 125 126 127 127.2 128 130
## 150 1 2 1 1 224 2 3 1 2 58
## 130.02 132 133 134 134.8 135 136 137.9 138 138.6 139
## 1 4 1 1 1 25 1 1 1 1 1
## 140 141 142 143 144 145 146 146.5 147 148 148.25
## 40 1 1 1 1 9 1 2 3 1 1
## 149 149.58 149.85 150 151 151.4 153 153.55 154 155 158
## 2 1 1 1077 1 1 1 1 1 11 2
## 159 159.5 160 160.8 161 161.11 162 163 163.5 164 165
## 1 1 55 1 2 1 1 1 1 1 17
## 166 166.5 167 168 170 171 172 173 174.4 175 176
## 3 1 1 1 30 1 2 2 1 104 3
## 177 178 178.4 179 179.4 179.5 179.61 180 183 183.5 184
## 2 1 1 1 1 1 7 41 1 1 1
## 185 186 187 189 190 192 193 195 196 198.5 200
## 6 1 1 1 20 1 1 9 1 1 1168
## 201 202 203 204 205 206 207 209 210 212 213
## 1 2 2 1 5 1 2 1 25 1 2
## 214 215 218 219 220 224 225 227 229 229.9 230
## 1 6 1 1 21 2 46 3 1 1 18
## 231 233 235 237 240 242 243 245 246 246.1 250
## 2 1 5 2 18 1 2 3 1 1 7294
## 250.03 253 255 257 257.95 259 260 261 262 265 266
## 1 1 1 1 2 1 11 1 1 5 1
## 269 270 270.75 271 275 277 278 278.6 279 280 280.1
## 1 7 1 1 48 1 1 1 2 12 1
## 283 284 285 286 287.18 288 290 294 295 297.08 299.88
## 1 2 1 1 1 1 9 1 3 1 1
## 300 303 304 305 308 310 312 315 320 322.2 324
## 573 1 1 3 1 6 1 4 13 1 3
## 325 327 328 330 331 335 337 340 343 345 346
## 31 1 1 13 1 1 1 8 1 1 1
## 347 348 350 352 354 355 357 358 359 360 363
## 1 2 168 1 1 2 1 1 1 9 1
## 365 367 370 373 375 378 380 381 382 382.5 385
## 3 1 4 1 17 1 12 1 1 2 2
## 390 400 405 410 411 411.14 413.5 415 420 425 430
## 1 312 1 3 1 1 1 1 4 9 1
## 431.72 432 435 437 438 440 442 445 450 451 459
## 1 1 1 2 1 3 1 1 74 1 1
## 460 460.56 463 465 467 470 475 476 478 479 480
## 3 1 1 4 2 3 6 2 1 2 7
## 482 485 490 493 499.92 499.96 500 500.01 500.4 501 502
## 2 3 2 1 1 1 2491 2 1 1 2
## 502.7 505 510 510.07 515 518 520 524 525 529 530
## 1 1 6 1 3 1 3 1 9 1 2
## 531.1 534 535 536 540 545 547 550 552 554 557
## 1 1 2 1 4 1 3 35 1 1 1
## 560 561 565 570 570.45 571 575 580 585 586 590
## 3 1 1 24 1 1 7 3 2 1 2
## 592 594 595 600 602 604 605 610 613 617 619
## 1 1 2 130 1 2 1 2 1 1 1
## 620 623 625 630 632 635 640 643 645 645.15 650
## 2 1 5 3 1 1 5 1 2 1 31
## 655 660 661 662 665 670 675 680 690 693.4 696.4
## 1 1 1 1 3 3 2 3 1 1 1
## 700 701 702 706 710 713 720 724 725 733.4 738
## 94 1 1 1 1 1 4 1 1 1 1
## 740 745 746 750 755 758.25 760 762 766 770 775
## 1 1 1 202 1 1 1 1 1 2 4
## 777.8 780 785 787 792.15 800 806.77 810 815 820 824
## 1 2 1 1 1 116 1 3 1 1 1
## 825 830 840 850 868.5 870 875 880 887 888 890
## 5 1 1 21 1 1 1 2 1 1 1
## 900 910 915 920 925 930 932 935 936 940 950
## 52 2 1 2 3 2 2 1 1 1 8
## 952.5 954 955 960 968 970 971 973 975 979 988
## 1 1 1 2 1 1 1 1 2 1 1
## 990 995 996
## 1 1 1
This goes through the same process removing entries where CROPDMGEXP = 0
SUBstormdata[(SUBstormdata$CROPDMGEXP == "K"),][,17] <- 1000
head(filter(SUBstormdata, SUBstormdata$CROPDMGEXP == "K"))
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 6/26/1994 0:00:00 2025 CST 83 LIMESTONE AL
## 2 1 3/24/1994 0:00:00 1410 CST 7 BIBB AL
## 3 1 3/24/1994 0:00:00 1425 CST 7 BIBB AL
## 4 1 11/28/1994 0:00:00 0030 CST 9 BLOUNT AL
## 5 1 3/24/1994 0:00:00 1804 CST 17 CHAMBERS AL
## 6 1 7/9/1995 0:00:00 1610 CST 19 CHEROKEE AL
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG
## 1 THUNDERSTORM WINDS 0 0 5 M 500
## 2 THUNDERSTORM WINDS 0 0 50 K 50
## 3 THUNDERSTORM WINDS 0 2 500 K 50
## 4 TORNADO 0 5 500 K 5
## 5 TORNADO 0 0 500 K 50
## 6 THUNDERSTORM WINDS/HAIL 0 0 5 K 15
## CROPDMGEXP healthimpact propconv cropconv
## 1 K 0 1e+06 1000
## 2 K 0 1e+03 1000
## 3 K 2 1e+03 1000
## 4 K 5 1e+03 1000
## 5 K 0 1e+03 1000
## 6 K 0 1e+03 1000
SUBstormdata[(SUBstormdata$CROPDMGEXP == "M"),][,17] <- 1000000
head(filter(SUBstormdata, SUBstormdata$CROPDMGEXP == "M"))
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY
## 1 1 10/4/1995 0:00:00 1200 CST 0
## 2 1 8/3/1995 0:00:00 0900 CST 0
## 3 1 10/3/1995 0:00:00 1200 CST 0
## 4 1 7/3/1994 0:00:00 0000 CST 77
## 5 1 7/3/1994 0:00:00 0000 CST 77
## 6 1 3/27/1994 0:00:00 1500 CST 107
## COUNTYNAME STATE
## 1 ALZ001>050 AL
## 2 ALZ051 - 052 - 053 - 055 - 056 - 059 - 061 - 062 - 063 - 064 AL
## 3 ALZ051>064 AL
## 4 LAUDERDALE AL
## 5 LAUDERDALE AL
## 6 PICKENS AL
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG
## 1 HURRICANE OPAL/HIGH WINDS 2 0 0.1 B 10
## 2 HURRICANE ERIN 0 0 25.0 M 1
## 3 HURRICANE OPAL 0 0 48.0 M 4
## 4 FLOODING 2 0 50.0 M 5
## 5 HEAVY RAIN 2 0 50.0 M 5
## 6 THUNDERSTORM WINDS 0 0 500.0 K 5
## CROPDMGEXP healthimpact propconv cropconv
## 1 M 2 1e+09 1e+06
## 2 M 0 1e+06 1e+06
## 3 M 0 1e+06 1e+06
## 4 M 2 1e+06 1e+06
## 5 M 2 1e+06 1e+06
## 6 M 0 1e+03 1e+06
SUBstormdata[(SUBstormdata$CROPDMGEXP == "B"),][,17] <- 1000000000
head(filter(SUBstormdata, SUBstormdata$CROPDMGEXP == "B"))
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY
## 1 1 8/20/1995 0:00:00 0000 CST 121
## 2 17 8/31/1993 0:00:00 0000 CST 1
## 3 19 8/1/1995 0:00:00 0000 CST 0
## 4 19 9/21/1995 0:00:00 2300 CST 0
## 5 28 2/9/1994 0:00:00 0000 CST 0
## 6 28 8/29/2005 0:00:00 08:00:00 AM CST 18
## COUNTYNAME
## 1 TALLADEGA
## 2 ADAMS, CALHOUN AND JERSEY
## 3 IAZ004>011 - 015 - 019 - 023>030 - 033>042 - 044>054 - 057>068 - 070>078 - 081>089 - 092>099
## 4 IAZ004>011 - 015>019 - 023>030 - 033>039 - 044>050 - 057>062 - 070>075 - 081>086 - 092>097
## 5 MSZ001 - 023 - 025 - 026 - 034
## 6 MSZ018>019 - 025>066 - 072>074
## STATE EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG
## 1 AL HEAT 0 0 0.00 0.40
## 2 IL RIVER FLOOD 0 0 5.00 B 5.00
## 3 IA DROUGHT 0 0 0.00 0.50
## 4 IA FREEZE 0 0 0.00 0.20
## 5 MS ICE STORM 0 0 500.00 K 5.00
## 6 MS HURRICANE/TYPHOON 15 104 5.88 B 1.51
## CROPDMGEXP healthimpact propconv cropconv
## 1 B 0 0e+00 1e+09
## 2 B 0 1e+09 1e+09
## 3 B 0 0e+00 1e+09
## 4 B 0 0e+00 1e+09
## 5 B 0 1e+03 1e+09
## 6 B 119 1e+09 1e+09
SUBstormdata[!(SUBstormdata$CROPDMGEXP %in% c("K","M","B")),][,17] <- 0
head(SUBstormdata[!(SUBstormdata$CROPDMGEXP %in% c("K","M","B")),])
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP healthimpact
## 1 0 15 25.0 K 0 15
## 2 0 0 2.5 K 0 0
## 3 0 2 25.0 K 0 2
## 4 0 2 2.5 K 0 2
## 5 0 2 2.5 K 0 2
## 6 0 6 2.5 K 0 6
## propconv cropconv
## 1 1000 0
## 2 1000 0
## 3 1000 0
## 4 1000 0
## 5 1000 0
## 6 1000 0
After cleaning up the data frame we are now ready to calculate the prop and crop damange amount. This can be achieved by multiplying the propdmg and crop dmg values by propconv and cropconv respectively. After this we are able to compute the top events that had maximum prop and crop damage.
SUBstormdata$PROPDMGAMT <- 0
SUBstormdata$CROPDMGAMT <- 0
head(SUBstormdata)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP healthimpact
## 1 0 15 25.0 K 0 15
## 2 0 0 2.5 K 0 0
## 3 0 2 25.0 K 0 2
## 4 0 2 2.5 K 0 2
## 5 0 2 2.5 K 0 2
## 6 0 6 2.5 K 0 6
## propconv cropconv PROPDMGAMT CROPDMGAMT
## 1 1000 0 0 0
## 2 1000 0 0 0
## 3 1000 0 0 0
## 4 1000 0 0 0
## 5 1000 0 0 0
## 6 1000 0 0 0
Now that the variable columns have been created, we can run the analysis to find the top 20 events that had the most impact.
SUBstormdata$PROPDMGAMT <- SUBstormdata$propconv*SUBstormdata$PROPDMG
SUBstormdata$CROPDMGAMT <- SUBstormdata$cropconv*SUBstormdata$CROPDMG
grpbyevtype <- group_by(SUBstormdata,SUBstormdata$EVTYPE)
storm_prop_impact <- summarise(grpbyevtype,propimpact = sum(PROPDMGAMT))
## `summarise()` ungrouping output (override with `.groups` argument)
storm_crop_impact <- summarise(grpbyevtype,cropimpact = sum(CROPDMGAMT))
## `summarise()` ungrouping output (override with `.groups` argument)
storm_top20_propimpact<- head(arrange(storm_prop_impact,desc(propimpact)),20)
print(storm_top20_propimpact)
## # A tibble: 20 x 2
## `SUBstormdata$EVTYPE` propimpact
## <chr> <dbl>
## 1 FLOOD 144657709800
## 2 HURRICANE/TYPHOON 69305840000
## 3 TORNADO 56925660480
## 4 STORM SURGE 43323536000
## 5 FLASH FLOOD 16140811510
## 6 HAIL 15727366720
## 7 HURRICANE 11868319010
## 8 TROPICAL STORM 7703890550
## 9 WINTER STORM 6688497250
## 10 HIGH WIND 5270046260
## 11 RIVER FLOOD 5118945500
## 12 WILDFIRE 4765114000
## 13 STORM SURGE/TIDE 4641188000
## 14 TSTM WIND 4484928440
## 15 ICE STORM 3944927810
## 16 THUNDERSTORM WIND 3483121140
## 17 HURRICANE OPAL 3152846000
## 18 WILD/FOREST FIRE 3001829500
## 19 HEAVY RAIN/SEVERE WEATHER 2500000000
## 20 THUNDERSTORM WINDS 1733452850
storm_top20_cropimpact<- head(arrange(storm_crop_impact,desc(cropimpact)),20)
print(storm_top20_cropimpact)
## # A tibble: 20 x 2
## `SUBstormdata$EVTYPE` cropimpact
## <chr> <dbl>
## 1 DROUGHT 13972566000
## 2 FLOOD 5661968450
## 3 RIVER FLOOD 5029459000
## 4 ICE STORM 5022113500
## 5 HAIL 3025537450
## 6 HURRICANE 2741910000
## 7 HURRICANE/TYPHOON 2607872800
## 8 FLASH FLOOD 1421317100
## 9 EXTREME COLD 1292973000
## 10 FROST/FREEZE 1094086000
## 11 HEAVY RAIN 733399800
## 12 TROPICAL STORM 678346000
## 13 HIGH WIND 638571300
## 14 TSTM WIND 554007350
## 15 EXCESSIVE HEAT 492402000
## 16 FREEZE 446225000
## 17 TORNADO 414953110
## 18 THUNDERSTORM WIND 414843050
## 19 HEAT 401461500
## 20 WILDFIRE 295472800
To compute the events that had the most economic impact, I added up the property and crop damage.
SUBstormdata$ECONtot <- 0
SUBstormdata$ECONtot <- SUBstormdata$PROPDMGAMT + SUBstormdata$CROPDMGAMT
grpbyevtype <- group_by(SUBstormdata,SUBstormdata$EVTYPE)
storm_econ_impact<- summarise(grpbyevtype,econimpact = sum(ECONtot))
## `summarise()` ungrouping output (override with `.groups` argument)
top20_econ <-head(arrange(storm_econ_impact, desc(econimpact)),20)
Overall, the table below shows the top 20 events are are most harmful with respect to population health.
print(SUBevtypes_top20)
## # A tibble: 20 x 2
## `SUBstormdata$EVTYPE` healthimpact
## <chr> <dbl>
## 1 TORNADO 96979
## 2 EXCESSIVE HEAT 8428
## 3 TSTM WIND 7461
## 4 FLOOD 7259
## 5 LIGHTNING 6046
## 6 HEAT 3037
## 7 FLASH FLOOD 2755
## 8 ICE STORM 2064
## 9 THUNDERSTORM WIND 1621
## 10 WINTER STORM 1527
## 11 HIGH WIND 1385
## 12 HAIL 1376
## 13 HURRICANE/TYPHOON 1339
## 14 HEAVY SNOW 1148
## 15 WILDFIRE 986
## 16 THUNDERSTORM WINDS 972
## 17 BLIZZARD 906
## 18 FOG 796
## 19 RIP CURRENT 600
## 20 WILD/FOREST FIRE 557
The table below shows the top 20 events with the biggest property damage.
print(storm_top20_propimpact)
## # A tibble: 20 x 2
## `SUBstormdata$EVTYPE` propimpact
## <chr> <dbl>
## 1 FLOOD 144657709800
## 2 HURRICANE/TYPHOON 69305840000
## 3 TORNADO 56925660480
## 4 STORM SURGE 43323536000
## 5 FLASH FLOOD 16140811510
## 6 HAIL 15727366720
## 7 HURRICANE 11868319010
## 8 TROPICAL STORM 7703890550
## 9 WINTER STORM 6688497250
## 10 HIGH WIND 5270046260
## 11 RIVER FLOOD 5118945500
## 12 WILDFIRE 4765114000
## 13 STORM SURGE/TIDE 4641188000
## 14 TSTM WIND 4484928440
## 15 ICE STORM 3944927810
## 16 THUNDERSTORM WIND 3483121140
## 17 HURRICANE OPAL 3152846000
## 18 WILD/FOREST FIRE 3001829500
## 19 HEAVY RAIN/SEVERE WEATHER 2500000000
## 20 THUNDERSTORM WINDS 1733452850
This table shows the top 20 events with the biggest crop damage.
print(storm_top20_cropimpact)
## # A tibble: 20 x 2
## `SUBstormdata$EVTYPE` cropimpact
## <chr> <dbl>
## 1 DROUGHT 13972566000
## 2 FLOOD 5661968450
## 3 RIVER FLOOD 5029459000
## 4 ICE STORM 5022113500
## 5 HAIL 3025537450
## 6 HURRICANE 2741910000
## 7 HURRICANE/TYPHOON 2607872800
## 8 FLASH FLOOD 1421317100
## 9 EXTREME COLD 1292973000
## 10 FROST/FREEZE 1094086000
## 11 HEAVY RAIN 733399800
## 12 TROPICAL STORM 678346000
## 13 HIGH WIND 638571300
## 14 TSTM WIND 554007350
## 15 EXCESSIVE HEAT 492402000
## 16 FREEZE 446225000
## 17 TORNADO 414953110
## 18 THUNDERSTORM WIND 414843050
## 19 HEAT 401461500
## 20 WILDFIRE 295472800
Finally, this last table shows the total economic impact, combining the property and crop financial damage.
print(top20_econ)
## # A tibble: 20 x 2
## `SUBstormdata$EVTYPE` econimpact
## <chr> <dbl>
## 1 FLOOD 150319678250
## 2 HURRICANE/TYPHOON 71913712800
## 3 TORNADO 57340613590
## 4 STORM SURGE 43323541000
## 5 HAIL 18752904170
## 6 FLASH FLOOD 17562128610
## 7 DROUGHT 15018672000
## 8 HURRICANE 14610229010
## 9 RIVER FLOOD 10148404500
## 10 ICE STORM 8967041310
## 11 TROPICAL STORM 8382236550
## 12 WINTER STORM 6715441250
## 13 HIGH WIND 5908617560
## 14 WILDFIRE 5060586800
## 15 TSTM WIND 5038935790
## 16 STORM SURGE/TIDE 4642038000
## 17 THUNDERSTORM WIND 3897964190
## 18 HURRICANE OPAL 3161846000
## 19 WILD/FOREST FIRE 3108626330
## 20 HEAVY RAIN/SEVERE WEATHER 2500000000
This visualization includes the Top 20 Overall Events and the Top 20 Economic Impact Events.
par(mfrow = c(1, 2))
barplot(SUBevtypes_top20$healthimpact, main = "Top 20 Overall Events",
names.arg = SUBevtypes_top20$`SUBstormdata$EVTYPE`, cex.names = 0.5,
col=color,
ylab = "Event Type Count")
barplot(top20_econ$econimpact, main="Top 20 Economic Impact Events",
names.arg = top20_econ$`SUBstormdata$EVTYPE`, cex.names=0.6,
col=color,
xlab = "Event Type Count")