The weather data comes from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) archives and covers the period from the 1950s to date. However, the range of weather events monitored was limited prior to 1996, so the following assessment covers only the period from 1996 onward. The majority of property damage (which has the highest economic impact) results from flooding, avalanche, smoke, hail, and high-wind events (both tornadoes and thunderstorm winds), whereas by far the largest damage to agriculture is caused by drought. The majority of fatalities were caused by excessive heat and tornadoes.
As mentioned, the data was filtered to cover 1996 onwards. The “EVTYPE” (event type) field was then matched against the recognized list of event types. Finally, each event's true value (the damage amount scaled by its exponent code) was calculated and compared.
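# Loading and processing data:
library(dplyr)      # select(), filter(), group_by(), summarise(), %>%
library(stringdist) # amatch()
library(ggplot2)    # ggplot()
library(gridExtra)  # grid.arrange()
library(knitr)      # kable()
file <- "repdata_data_StormData.csv.bz2" # Compressed (bzip2) file
# system.time(bunzip2(file, remove = FALSE)) # Optional: time the decompression (bunzip2() comes from the R.utils package)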
file2 <- "repdata_data_StormData.csv" # Unzipped file name
raw_data <- read.csv(file2,header = TRUE, sep = ",") # Load data
file_eventname <- "ExtremeEvent_list.txt" # Event list from NOAA
eventname <- read.table(file_eventname, sep = ",") # Loading list
# Subsetting the data:
raw_data$Date2 <- as.Date(as.character(raw_data$BGN_DATE), "%m/%d/%Y %H:%M:%S") #creating a new column of corrected dates
data1996 <- raw_data %>% select(STATE,EVTYPE,FATALITIES,INJURIES,PROPDMG, PROPDMGEXP,CROPDMG,CROPDMGEXP,Date2) %>%
filter(as.numeric(format(Date2, "%Y")) >= 1996) # Keep the required columns and events from 1996 onwards (numeric year comparison)
data1996_prop <- filter(data1996, PROPDMG > 0) # Keep only rows with property damage
data1996_crop <- filter(data1996, CROPDMG > 0) # Keep only rows with crop damage
data1996_people <- filter(data1996, FATALITIES > 0 | INJURIES > 0) # Keep only rows with fatalities or injuries
# Convert a damage exponent code into a multiplier: H, K, M, B (either case) or a digit n meaning 10^n
exp_multiplier <- function(code) {
  code <- toupper(as.character(code)) # as.character() guards against factor columns
  power <- unname(c(H = 2, K = 3, M = 6, B = 9)[code]) # Letter codes; NA for anything else
  digit <- suppressWarnings(as.numeric(code)) # Digit codes such as "0"; NA for anything else
  power <- ifelse(is.na(power), digit, power)
  power[is.na(power)] <- 0 # Assumption: blank or unrecognised codes leave the amount unscaled
  10^power
}
data1996_prop$Total <- data1996_prop$PROPDMG * exp_multiplier(data1996_prop$PROPDMGEXP) # Total amount for property damage
data1996_crop$Total <- data1996_crop$CROPDMG * exp_multiplier(data1996_crop$CROPDMGEXP) # Total amount for crop damage
maxProp <- data1996_prop[which.max(data1996_prop$Total),]
maxCrop <- data1996_crop[which.max(data1996_crop$Total),]
# Match each EVTYPE to the closest official event name (maxDist = 25 is very permissive, so nearly every label receives a match)
MatchProp <- amatch(data1996_prop$EVTYPE, eventname$V1, method = 'osa', maxDist = 25) # Index of the closest official name
data1996_prop$NewName <- as.character(eventname$V1)[MatchProp] # Reassigning names to storms
MatchCrop <- amatch(data1996_crop$EVTYPE, eventname$V1, method = 'osa', maxDist = 25) # Index of the closest official name
data1996_crop$NewName <- as.character(eventname$V1)[MatchCrop] # Reassigning names to storms
Matchpeople <- amatch(data1996_people$EVTYPE, eventname$V1, method = 'osa', maxDist = 25) # Index of the closest official name
data1996_people$NewName <- as.character(eventname$V1)[Matchpeople] # Reassigning names to storms
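The name-matching step depends heavily on the maximum allowed distance. The following is a minimal sketch of that idea, not the method used in this report: it swaps in base R's `adist()` (generalised Levenshtein distance) for `stringdist::amatch()`, and the helper `match_event()`, its `max_dist` argument, and the short list of official names are all hypothetical and chosen only for illustration.

```r
# Hypothetical helper: map each raw label to the closest official event name.
# Uses base R's adist(), not stringdist::amatch().
match_event <- function(raw, official, max_dist = 3) {
  # Compare in upper case so that "Flood" and "FLOOD" count as identical
  d <- adist(toupper(raw), toupper(official))
  best <- apply(d, 1, which.min)               # closest official name per raw label
  best_dist <- d[cbind(seq_along(raw), best)]  # distance to that closest name
  # Labels further away than max_dist stay unmatched (NA)
  ifelse(best_dist <= max_dist, official[best], NA_character_)
}

official <- c("Flood", "Flash Flood", "Tornado", "Excessive Heat")  # illustrative subset
raw <- c("FLOOD", "TORNADOS", "FLASH FLOODS", "UNSEASONABLY WARM")

match_event(raw, official, max_dist = 3)
# "Flood" "Tornado" "Flash Flood" NA

match_event(raw, official, max_dist = 25)  # every label is forced onto some official name
```

With a tight threshold the unrelated label stays unmatched, whereas a threshold as loose as 25 assigns it to whichever official name happens to be nearest. It may therefore be worth inspecting the loosely matched labels before trusting the per-event totals.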
It is clear from the summaries below that, while both property damage and agricultural damage are driven by flooding, drought has a particularly significant impact on the agricultural sector.
# filter on top offenders:
data_prop_sort <- data1996_prop %>% group_by(NewName) %>%
summarise(total = sum(Total))
data_crop_sort <- data1996_crop %>% group_by(NewName) %>%
summarise(total = sum(Total))
bigOff_prop <- data_prop_sort[order(-data_prop_sort$total),] # Order property damage, largest first
bigOff_crop <- data_crop_sort[order(-data_crop_sort$total),] # Order agricultural damage, largest first
# Bar graph on economic loss:
p1 <- ggplot(aes(x= NewName, y = Total), data = data1996_prop) +
geom_bar(position = "stack", stat = "identity") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "none") +
ggtitle("Financial property damage per environmental catastrophe") + labs(x = "Event Type")
p2 <- ggplot(aes(x= NewName, y = Total), data = data1996_crop) +
geom_bar(position = "stack", stat = "identity") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "bottom") +
ggtitle("Financial agricultural damage per environmental catastrophe") + labs(x = "Event Type")
grid.arrange(p1,p2,nrow = 2)
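The bar charts above are built from one row per event, with the rows stacked into a single bar per event type. A minimal sketch of the same per-type totals, computed explicitly with base R's `aggregate()` rather than the dplyr pipeline used in this report, could look like the following. The rows are made-up illustrative values, not NOAA figures.

```r
# Made-up illustrative rows (not NOAA figures): one row per recorded event
damage <- data.frame(
  NewName = c("Flood", "Hail", "Flood", "Drought", "Hail"),
  Total   = c(2e6, 5e5, 3e6, 4e6, 2.5e5)
)

# Sum the damage per event type, then sort so the costliest type comes first
totals <- aggregate(Total ~ NewName, data = damage, FUN = sum)
totals <- totals[order(-totals$Total), ]
totals
```

Plotting a pre-aggregated table like `totals` gives bars of the same height as stacking the raw rows, while drawing far fewer rectangles.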
The majority of fatalities result from severe heat, with tornadoes a close second, while the vast majority of injuries result from tornadoes. This ties in with the heavy property damage caused by the same severe weather.
# Fatalities:
data_people_sort <- data1996_people %>% group_by(NewName) %>% summarise(Fatalities = sum(FATALITIES))
bigOff_people <- data_people_sort[order(-data_people_sort$Fatalities),] #order fatalities
# Injuries:
data_people_sort2 <- data1996_people %>% group_by(NewName) %>% summarise(Injury = sum(INJURIES))
MinOff_people <- data_people_sort2[order(-data_people_sort2$Injury),]
# Bar graph on People loss:
c1 <- ggplot(aes(x= NewName, y = Fatalities), data = bigOff_people) +
geom_bar(position = "stack", stat = "identity") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
ggtitle("Fatalities per environmental catastrophe") + labs(x = "Event Type")
c2 <- ggplot(aes(x= NewName, y = Injury), data = MinOff_people) +
geom_bar(position = "stack", stat = "identity") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position = "bottom") +
ggtitle("Injuries per environmental catastrophe") + labs(x = "Event Type")
grid.arrange(c1,c2,nrow = 2)
The table below outlines the fatalities and injuries per state. Texas records the highest totals for both, although these are raw counts that are not adjusted for population or area.
data_people_sum <- data1996_people %>% group_by(STATE) %>%summarise(Fatality = sum(FATALITIES), Injury = sum(INJURIES))
data_people_sum <- data_people_sum[order(data_people_sum$Fatality),]
kable(data_people_sum,caption = "Fatalities and Injuries by State")
| STATE | Fatality | Injury |
|---|---|---|
| GM | 1 | 0 |
| LS | 1 | 0 |
| PH | 1 | 0 |
| LM | 4 | 2 |
| PZ | 5 | 3 |
| RI | 6 | 25 |
| VI | 7 | 2 |
| AM | 10 | 30 |
| AN | 12 | 23 |
| DC | 13 | 369 |
| VT | 19 | 41 |
| ME | 22 | 130 |
| DE | 24 | 255 |
| NH | 24 | 139 |
| HI | 33 | 81 |
| MA | 34 | 687 |
| CT | 35 | 172 |
| SD | 36 | 473 |
| AS | 41 | 164 |
| ND | 41 | 265 |
| ID | 42 | 173 |
| NE | 42 | 350 |
| MT | 52 | 150 |
| WY | 52 | 309 |
| IA | 61 | 984 |
| NM | 61 | 168 |
| AK | 62 | 104 |
| WV | 67 | 124 |
| MN | 72 | 513 |
| OR | 72 | 201 |
| GU | 81 | 416 |
| NV | 89 | 205 |
| MI | 110 | 1195 |
| WI | 110 | 806 |
| PR | 111 | 50 |
| VA | 114 | 902 |
| KY | 117 | 850 |
| WA | 119 | 258 |
| UT | 130 | 979 |
| SC | 131 | 559 |
| IN | 133 | 835 |
| KS | 140 | 845 |
| MD | 141 | 1293 |
| LA | 144 | 812 |
| CO | 147 | 662 |
| NJ | 147 | 936 |
| OH | 158 | 895 |
| GA | 160 | 1666 |
| MS | 160 | 1217 |
| AZ | 175 | 635 |
| OK | 219 | 2375 |
| AR | 228 | 1656 |
| NC | 263 | 1378 |
| NY | 268 | 908 |
| TN | 327 | 2385 |
| AL | 449 | 3707 |
| PA | 492 | 1450 |
| CA | 498 | 2769 |
| MO | 533 | 5960 |
| FL | 544 | 2884 |
| IL | 586 | 1328 |
| TX | 756 | 9222 |
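
Because the table is sorted in ascending order, the highest counts sit at the bottom. As a small sketch, the five highest-fatality rows copied from the table can be re-sorted in descending order so that the worst-affected states appear first. The object names `by_state` and `top_states` are hypothetical.

```r
# Five rows copied from the table above
by_state <- data.frame(
  STATE    = c("CA", "MO", "FL", "IL", "TX"),
  Fatality = c(498, 533, 544, 586, 756),
  Injury   = c(2769, 5960, 2884, 1328, 9222)
)

# Sort by fatalities, largest first, and show the top three
top_states <- by_state[order(-by_state$Fatality), ]
head(top_states, 3)
```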
Therefore, it is possible to conclude that the most severe weather events in relation to property are flooding, hail and high wind, while the greatest impact on agriculture comes from drought. It is worth noting that the majority of events are not massively impactful; however, an occasional event results in significant damage. One example is the record CA, FLOOD, 0, 0, 115, B, 32.5, M, 2006-01-01, which has a total property damage of 1.15 × 10^11 (115 billion). The most significant impacts on human health come from heat and from tornadoes, with tornadoes having the highest impact on injuries.
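As a worked check of the example event, the reported amounts can be scaled by their exponent codes. This is a small sketch that assumes the letter codes carry the same multipliers as in the processing step (H = 10^2, K = 10^3, M = 10^6, B = 10^9). The names `multiplier`, `prop_total` and `crop_total` are illustrative.

```r
# Multipliers for the letter codes handled in the processing step
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Values taken from the example event: PROPDMG = 115 with "B", CROPDMG = 32.5 with "M"
prop_total <- 115  * multiplier[["B"]]
crop_total <- 32.5 * multiplier[["M"]]

format(prop_total, scientific = TRUE)  # "1.15e+11"
format(crop_total, scientific = TRUE)  # "3.25e+07"
```

The property figure matches the total quoted for the event, and the crop damage for the same event is several orders of magnitude smaller.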