This report describes the severe weather events that had the greatest impact on public health and the economy across the United States between 1996 and 2011. The data come from the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. For each event, the database records estimates of fatalities, injuries, and property and crop damage, which are the variables of interest here. We restrict the analysis to 1996 to 2011 because earlier records cover only a few event types, and the full range of event types has been recorded since January 1996. From the analysis, we identify the top ten weather event types for each variable of interest and summarise the results graphically.
The raw data file describes the characteristics of major storms and weather events in the United States from 1950 to 2011, including estimates of fatalities, injuries, and property and crop damage. We keep only the records from 1996 onwards.
The data come as a comma-separated values (CSV) file compressed with the bzip2 algorithm to reduce its size. Download the file into your working directory. read.csv() can read the .bz2 archive directly, so there is no need to unzip it first.
stormdata <- read.csv("repdata_data_StormData.csv.bz2")
dim(stormdata)
## [1] 902297 37
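read.csv() opens a file path through R's file() connection, which detects bzip2 compression when reading. As a result, the .bz2 archive can be passed in as-is. The sketch below demonstrates this on a small throw-away file. The two-row data frame and the temporary file are illustrative only and are not part of the NOAA data.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (made-up illustrative data)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(2, 0)),
          con, row.names = FALSE)
close(con)

# read.csv() decompresses the file transparently; no separate unzip step is needed
demo <- read.csv(tmp, stringsAsFactors = FALSE)
str(demo)
unlink(tmp)
```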
The dataset has 902,297 records on 37 variables. We check the first few rows to get a feel for the data, and then convert the variable names to lower case for easier handling.
head(stormdata)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
names(stormdata) <- tolower(names(stormdata))
names(stormdata)
## [1] "state__" "bgn_date" "bgn_time" "time_zone" "county"
## [6] "countyname" "state" "evtype" "bgn_range" "bgn_azi"
## [11] "bgn_locati" "end_date" "end_time" "county_end" "countyendn"
## [16] "end_range" "end_azi" "end_locati" "length" "width"
## [21] "f" "mag" "fatalities" "injuries" "propdmg"
## [26] "propdmgexp" "cropdmg" "cropdmgexp" "wfo" "stateoffic"
## [31] "zonenames" "latitude" "longitude" "latitude_e" "longitude_"
## [36] "remarks" "refnum"
We subset the dataset to the variables of interest: the begin date, the event type, and the fatality, injury, and damage columns.
stormdata1 <- stormdata[, c(2,8,23:28)]
head(stormdata1)
## bgn_date evtype fatalities injuries propdmg propdmgexp
## 1 4/18/1950 0:00:00 TORNADO 0 15 25.0 K
## 2 4/18/1950 0:00:00 TORNADO 0 0 2.5 K
## 3 2/20/1951 0:00:00 TORNADO 0 2 25.0 K
## 4 6/8/1951 0:00:00 TORNADO 0 2 2.5 K
## 5 11/15/1951 0:00:00 TORNADO 0 2 2.5 K
## 6 11/15/1951 0:00:00 TORNADO 0 6 2.5 K
## cropdmg cropdmgexp
## 1 0
## 2 0
## 3 0
## 4 0
## 5 0
## 6 0
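Selecting columns by position, as in c(2, 8, 23:28), silently picks the wrong variables if the column order ever changes. Selecting by name is a more robust alternative. The sketch below uses a small stand-in data frame whose column names mirror the lower-cased storm data. The values are made up, and `toy` and `keep` are hypothetical names used only for illustration.

```r
# Stand-in for the lower-cased storm data (values are made up for illustration)
toy <- data.frame(state__ = 1, bgn_date = "4/18/1950 0:00:00", evtype = "TORNADO",
                  fatalities = 0, injuries = 15, propdmg = 25, propdmgexp = "K",
                  cropdmg = 0, cropdmgexp = "", refnum = 1)

# Name the variables of interest explicitly instead of relying on column positions
keep <- c("bgn_date", "evtype", "fatalities", "injuries",
          "propdmg", "propdmgexp", "cropdmg", "cropdmgexp")
toy1 <- toy[, keep]
names(toy1)
```

Applied to the report's data, this would be stormdata[, keep].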
Because the analysis covers only 1996 to 2011, we keep the records that begin on or after 1 January 1996. To do this, we first convert bgn_date (read in here as a factor) to a date-time of class POSIXct using strptime() and as.POSIXct():
class(stormdata1$bgn_date)
## [1] "factor"
stormdata1$bgn_date <- as.POSIXct(strptime(stormdata1$bgn_date, "%m/%d/%Y"))
stormdata2 <- subset(stormdata1, bgn_date >= as.POSIXct("1996-01-01"), select = -bgn_date)
dim(stormdata2)
## [1] 653530 7
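strptime() processes each string only as far as the format requires and ignores any trailing characters, so the " 0:00:00" suffix in bgn_date does no harm. The sketch below applies the same conversion to three sample strings in the dataset's layout and keeps only dates from 1 January 1996 onwards. It assumes UTC throughout so that the parsed dates and the cutoff share a time zone. The sample strings are illustrative only.

```r
# Sample bgn_date strings in the same "m/d/Y H:M:S" layout as the storm data
dates_raw <- c("4/18/1950 0:00:00", "12/31/1995 0:00:00", "1/1/1996 0:00:00")

# Parse only the date part; strptime() ignores the trailing time characters
dates <- as.POSIXct(strptime(dates_raw, "%m/%d/%Y", tz = "UTC"), tz = "UTC")

# Keep events that began on or after 1 January 1996
cutoff <- as.POSIXct("1996-01-01", tz = "UTC")
kept <- dates_raw[dates >= cutoff]
kept
```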
The dataset now has 653,530 records on 7 variables. Next, we aggregate the data by weather event type to obtain the total population health impact and the total economic impact of each type.
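A formula such as fatalities ~ evtype tells aggregate() to split the rows by event type and apply sum to each group. The made-up records below show the shape of the result. By default, the formula interface drops rows with NA in any of the named variables (na.action = na.omit).

```r
# Made-up records: two tornadoes and one hail event
events <- data.frame(evtype = c("TORNADO", "HAIL", "TORNADO"),
                     fatalities = c(2, 0, 3))

# Sum fatalities within each event type; groups come back sorted by evtype
fatal_demo <- aggregate(fatalities ~ evtype, data = events, FUN = sum)
fatal_demo
```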
# aggregate data by event type
fatal <- aggregate(fatalities ~ evtype, stormdata2, sum)
injury <- aggregate(injuries ~ evtype, stormdata2, sum)
# property damage data: convert the magnitude code to a power of ten
# (H = hundred = 10^2, K = thousand = 10^3, M = million = 10^6, B = billion = 10^9);
# spaces and the symbols -, + and ? become 0 (10^0 = 1); empty codes become NA and are set to 0 below
stormdata2$propdmgexp <- gsub("\\ |\\-|\\+|\\?", "0", stormdata2$propdmgexp)
stormdata2$propdmgexp <- gsub("[Hh]", "2", stormdata2$propdmgexp)
stormdata2$propdmgexp <- gsub("[Kk]", "3", stormdata2$propdmgexp)
stormdata2$propdmgexp <- gsub("[Mm]", "6", stormdata2$propdmgexp)
stormdata2$propdmgexp <- gsub("[Bb]", "9", stormdata2$propdmgexp)
stormdata2$propdmgexp <- as.numeric(stormdata2$propdmgexp)
head(stormdata2$propdmgexp)
## [1] 3 3 3 3 3 NA
sum(is.na(stormdata2$propdmgexp))
## [1] 276185
stormdata2$propdmgexp[is.na(stormdata2$propdmgexp)] <- 0
# compute total property damage value
stormdata2$total_propdmg <- stormdata2$propdmg * (10 ^ stormdata2$propdmgexp)
# crop damage data: same power-of-ten mapping as for property damage
# (H = 10^2, K = 10^3, M = 10^6, B = 10^9)
stormdata2$cropdmgexp <- gsub("\\ |\\-|\\+|\\?", "0", stormdata2$cropdmgexp)
stormdata2$cropdmgexp <- gsub("[Hh]", "2", stormdata2$cropdmgexp)
stormdata2$cropdmgexp <- gsub("[Kk]", "3", stormdata2$cropdmgexp)
stormdata2$cropdmgexp <- gsub("[Mm]", "6", stormdata2$cropdmgexp)
stormdata2$cropdmgexp <- gsub("[Bb]", "9", stormdata2$cropdmgexp)
stormdata2$cropdmgexp <- as.numeric(stormdata2$cropdmgexp)
head(stormdata2$cropdmgexp)
## [1] 3 NA NA NA NA NA
sum(is.na(stormdata2$cropdmgexp))
## [1] 373069
stormdata2$cropdmgexp[is.na(stormdata2$cropdmgexp)] <- 0
# compute total crop damage value
stormdata2$total_cropdmg <- stormdata2$cropdmg * (10 ^ stormdata2$cropdmgexp)
## aggregate the economic impact data by event type
propdmg <- aggregate(total_propdmg ~ evtype, stormdata2, sum)
cropdmg <- aggregate(total_cropdmg ~ evtype, stormdata2, sum)
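The magnitude columns use letter codes for powers of ten: H = hundred, K = thousand, M = million, and B = billion. A named lookup vector is a compact alternative to a chain of gsub() calls, and it keeps the mapping in one place where it is easy to check. The sketch below follows the same assumptions as the code above: blanks and the symbols -, + and ? count as a multiplier of 1, and bare digits are read as exponents. The helper decode_exp() is hypothetical, and the damage values are made up.

```r
# Multiplier for each letter code (lookups are done after toupper())
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: turn magnitude codes into numeric multipliers
decode_exp <- function(code) {
  code <- toupper(trimws(as.character(code)))
  out <- rep(1, length(code))                       # blanks and -, +, ? count as 10^0
  is_letter <- code %in% names(multiplier)
  out[is_letter] <- multiplier[code[is_letter]]
  is_digit <- grepl("^[0-9]$", code)
  out[is_digit] <- 10 ^ as.numeric(code[is_digit])  # bare digits read as exponents
  out
}

# Made-up damage records
dmg <- c(25, 2.5, 1.2, 3)
exp_code <- c("K", "m", "B", "")
dmg * decode_exp(exp_code)
```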
Across the United States, which types of events are most harmful with respect to population health?
# get top10 event with highest fatalities
fatal10 <- fatal[order(-fatal$fatalities), ][1:10, ]
print(fatal10)
## evtype fatalities
## 81 EXCESSIVE HEAT 1797
## 426 TORNADO 1511
## 98 FLASH FLOOD 887
## 224 LIGHTNING 651
## 102 FLOOD 414
## 300 RIP CURRENT 340
## 434 TSTM WIND 241
## 147 HEAT 237
## 177 HIGH WIND 235
## 16 AVALANCHE 223
# get top10 event with highest injuries
injury10 <- injury[order(-injury$injuries), ][1:10, ]
print(injury10)
## evtype injuries
## 426 TORNADO 20667
## 102 FLOOD 6758
## 81 EXCESSIVE HEAT 6391
## 224 LIGHTNING 4141
## 434 TSTM WIND 3629
## 98 FLASH FLOOD 1674
## 421 THUNDERSTORM WIND 1400
## 507 WINTER STORM 1292
## 185 HURRICANE/TYPHOON 1275
## 147 HEAT 1222
## plot a graph showing the results
par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(fatal10$fatalities, las = 3, names.arg = fatal10$evtype, main = "Weather Events With The Top 10 Highest Fatalities",
ylab = "number of fatalities", col = "red")
barplot(injury10$injuries, las = 3, names.arg = injury10$evtype, main = "Weather Events With the Top 10 Highest Injuries",
ylab = "number of injuries", col = "blue")
The plots above show that excessive heat (1,797 deaths) and tornadoes (1,511 deaths) are the leading causes of fatalities. Tornadoes also cause by far the most injuries across the U.S. (20,667), about three times as many as the next event types, floods and excessive heat.
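Sorting with order(-x) and then taking [1:10, ] returns rows filled with NA if there are fewer than ten event types. Using head(..., n) avoids that. The sketch below wraps the sorting step in a hypothetical helper, top_n_by(), which is not from any package, and applies it to made-up injury counts.

```r
# Hypothetical helper: the n rows with the largest values in column `col`
top_n_by <- function(df, col, n = 10) {
  head(df[order(df[[col]], decreasing = TRUE), , drop = FALSE], n)
}

# Made-up injury totals per event type
injury_demo <- data.frame(evtype = c("HAIL", "TORNADO", "FLOOD"),
                          injuries = c(40, 900, 120))
top2 <- top_n_by(injury_demo, "injuries", n = 2)
top2
```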
Across the United States, which types of events have the greatest economic consequences?
# get top 10 events with highest property damage
propdmg10 <- propdmg[order(-propdmg$total_propdmg), ][1:10, ]
propdmg10
## evtype total_propdmg
## 434 TSTM WIND 1.327560e+12
## 98 FLASH FLOOD 1.235587e+12
## 426 TORNADO 1.175044e+12
## 102 FLOOD 9.266942e+11
## 421 THUNDERSTORM WIND 8.597370e+11
## 142 HAIL 5.648957e+11
## 224 LIGHTNING 4.883073e+11
## 177 HIGH WIND 3.127640e+11
## 507 WINTER STORM 1.255047e+11
## 161 HEAVY SNOW 8.884809e+10
# get top 10 events with highest crop damage
cropdmg10 <- cropdmg[order(-cropdmg$total_cropdmg), ][1:10, ]
cropdmg10
## evtype total_cropdmg
## 142 HAIL 496361429670
## 98 FLASH FLOOD 159892875010
## 102 FLOOD 147003227780
## 434 TSTM WIND 108665795250
## 426 TORNADO 89935203490
## 421 THUNDERSTORM WIND 66331332000
## 63 DROUGHT 21958346620
## 177 HIGH WIND 16651916910
## 154 HEAVY RAIN 10260517910
## 122 FROST/FREEZE 5947088140
## plot a graph showing the results
# set the two-panel layout again so this plot does not rely on earlier par() settings
par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(propdmg10$total_propdmg, las = 3, names.arg = propdmg10$evtype, main = "Events With Highest Property Damage",
ylab = "Total property damage (USD)", col = "red")
barplot(cropdmg10$total_cropdmg, las = 3, names.arg = cropdmg10$evtype, main = "Events With Highest Crop Damage",
ylab = "Total crop damage (USD)", col = "blue")
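The tables and plots above show TSTM WIND (thunderstorm wind, which also appears separately as THUNDERSTORM WIND) and FLASH FLOOD with the highest property damage, and HAIL and FLASH FLOOD with the highest crop damage. These totals, however, were produced with the K and M codes swapped (K treated as 10^6 and M as 10^3). That swap inflates K-coded damages and deflates M-coded damages by a factor of 1,000, so the rankings should be recomputed with K = 10^3 and M = 10^6 before they are relied on.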
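The economic question is about overall consequences, but the tables rank property and crop damage separately. One way to rank event types by their combined damage is to merge the two aggregated tables on evtype. The sketch below does this with made-up totals that have the same shape as propdmg and cropdmg. It uses all = TRUE so that event types appearing in only one table are kept, and it assumes a missing side means zero damage.

```r
# Made-up aggregated totals in the same shape as propdmg and cropdmg
prop_demo <- data.frame(evtype = c("FLOOD", "HAIL", "TORNADO"),
                        total_propdmg = c(5e9, 1e9, 2e9))
crop_demo <- data.frame(evtype = c("DROUGHT", "FLOOD", "HAIL"),
                        total_cropdmg = c(3.5e9, 1e9, 2e9))

# Full outer join on event type; a missing side is treated as zero damage
econ <- merge(prop_demo, crop_demo, by = "evtype", all = TRUE)
econ[is.na(econ)] <- 0
econ$total_econ <- econ$total_propdmg + econ$total_cropdmg

# Rank event types by combined property and crop damage
econ <- econ[order(econ$total_econ, decreasing = TRUE), ]
econ
```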