Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
This report uses the above data to answer two questions: “Across the United States, which types of events are most harmful with respect to population health?” and “Across the United States, which types of events have the greatest economic consequences?” To address the first question, a stacked bar plot shows the total number of injuries and fatalities for the top ten event types. To address the second question, a second stacked bar plot shows the total property and crop damage for the top ten event types.
library(ggplot2)
library(reshape2)
## Warning: package 'reshape2' was built under R version 4.5.3
library(knitr)
data <- read.csv("repdata_data_StormData.csv")
head(data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
summary(data)
##  STATE__        BGN_DATE           BGN_TIME           TIME_ZONE
##  Min.   : 1.0   Length:902297      Length:902297      Length:902297
##  1st Qu.:19.0   Class :character   Class :character   Class :character
##  Median :30.0   Mode  :character   Mode  :character   Mode  :character
##  Mean   :31.2
##  3rd Qu.:45.0
##  Max.   :95.0
##
##  COUNTY          COUNTYNAME         STATE              EVTYPE
##  Min.   :  0.0   Length:902297      Length:902297      Length:902297
##  1st Qu.: 31.0   Class :character   Class :character   Class :character
##  Median : 75.0   Mode  :character   Mode  :character   Mode  :character
##  Mean   :100.6
##  3rd Qu.:131.0
##  Max.   :873.0
##
##  BGN_RANGE          BGN_AZI            BGN_LOCATI         END_DATE
##  Min.   :   0.000   Length:902297      Length:902297      Length:902297
##  1st Qu.:   0.000   Class :character   Class :character   Class :character
##  Median :   0.000   Mode  :character   Mode  :character   Mode  :character
##  Mean   :   1.484
##  3rd Qu.:   1.000
##  Max.   :3749.000
##
##  END_TIME           COUNTY_END  COUNTYENDN     END_RANGE
##  Length:902297      Min.   :0   Mode:logical   Min.   :  0.0000
##  Class :character   1st Qu.:0   NA's:902297    1st Qu.:  0.0000
##  Mode  :character   Median :0                  Median :  0.0000
##                     Mean   :0                  Mean   :  0.9862
##                     3rd Qu.:0                  3rd Qu.:  0.0000
##                     Max.   :0                  Max.   :925.0000
##
##  END_AZI            END_LOCATI         LENGTH              WIDTH
##  Length:902297      Length:902297      Min.   :   0.0000   Min.   :   0.000
##  Class :character   Class :character   1st Qu.:   0.0000   1st Qu.:   0.000
##  Mode  :character   Mode  :character   Median :   0.0000   Median :   0.000
##                                        Mean   :   0.2301   Mean   :   7.503
##                                        3rd Qu.:   0.0000   3rd Qu.:   0.000
##                                        Max.   :2315.0000   Max.   :4400.000
##
##  F                MAG               FATALITIES          INJURIES
##  Min.   :0.00     Min.   :    0.0   Min.   :  0.00000   Min.   :   0.0000
##  1st Qu.:0.00     1st Qu.:    0.0   1st Qu.:  0.00000   1st Qu.:   0.0000
##  Median :1.00     Median :   50.0   Median :  0.00000   Median :   0.0000
##  Mean   :0.91     Mean   :   46.9   Mean   :  0.01678   Mean   :   0.1557
##  3rd Qu.:1.00     3rd Qu.:   75.0   3rd Qu.:  0.00000   3rd Qu.:   0.0000
##  Max.   :5.00     Max.   :22000.0   Max.   :583.00000   Max.   :1700.0000
##  NA's   :843563
##  PROPDMG            PROPDMGEXP         CROPDMG            CROPDMGEXP
##  Min.   :   0.00   Length:902297      Min.   :  0.000   Length:902297
##  1st Qu.:   0.00   Class :character   1st Qu.:  0.000   Class :character
##  Median :   0.00   Mode  :character   Median :  0.000   Mode  :character
##  Mean   :  12.06                      Mean   :  1.527
##  3rd Qu.:   0.50                      3rd Qu.:  0.000
##  Max.   :5000.00                      Max.   :990.000
##
##  WFO                STATEOFFIC         ZONENAMES          LATITUDE
##  Length:902297      Length:902297      Length:902297      Min.   :   0
##  Class :character   Class :character   Class :character   1st Qu.:2802
##  Mode  :character   Mode  :character   Mode  :character   Median :3540
##                                                           Mean   :2875
##                                                           3rd Qu.:4019
##                                                           Max.   :9706
##                                                           NA's   :47
##  LONGITUDE        LATITUDE_E     LONGITUDE_       REMARKS
##  Min.   :-14451   Min.   :   0   Min.   :-14455   Length:902297
##  1st Qu.:  7247   1st Qu.:   0   1st Qu.:     0   Class :character
##  Median :  8707   Median :   0   Median :     0   Mode  :character
##  Mean   :  6940   Mean   :1452   Mean   :  3509
##  3rd Qu.:  9605   3rd Qu.:3549   3rd Qu.:  8735
##  Max.   : 17124   Max.   :9706   Max.   :106220
##                   NA's   :40
##  REFNUM
##  Min.   :     1
##  1st Qu.:225575
##  Median :451149
##  Mean   :451149
##  3rd Qu.:676723
##  Max.   :902297
##
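The summary shows that the damage magnitudes `PROPDMGEXP` and `CROPDMGEXP` are stored as character codes rather than numbers. The conversion step later in this report recognizes only K (thousands), M (millions) and B (billions). Before relying on it, it can help to see which other codes occur. Below is a minimal sketch of such a check. The helper name `other_exponent_codes` and the toy codes are hypothetical illustrations, not part of the original analysis.

```r
# Hypothetical helper: tabulate the non-blank magnitude codes that are not
# K, M or B (in either case), i.e. the codes a K/M/B-only conversion leaves unscaled.
other_exponent_codes <- function(expon) {
  codes <- toupper(expon)
  # Drop missing and blank codes, which are the usual "no multiplier" case
  codes <- codes[!is.na(codes) & codes != ""]
  table(codes[!codes %in% c("K", "M", "B")])
}

# Hypothetical toy codes, not taken from the NOAA data
toy_codes <- c("K", "h", "", "2", "m", "?", "H", NA)
other_exponent_codes(toy_codes)
```

On the real data, `other_exponent_codes(data$PROPDMGEXP)` and `other_exponent_codes(data$CROPDMGEXP)` would list the records that carry such codes. The conversion used in this report counts those records at face value.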
# Keep only the event type and the health and damage columns
data_sub <- subset(data, select = c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))
#Data Prepping
events <- data_sub$EVTYPE
events_factors <- factor(events)
fatalities <- data_sub$FATALITIES
injuries <- data_sub$INJURIES
fatalities_sum <- aggregate(fatalities, list(events_factors), sum)
injuries_sum <- aggregate(injuries, list(events_factors), sum)
names(fatalities_sum) <- c("Event", "Count"); names(injuries_sum) <- c("Event", "Count")
pop_health <- data.frame(fatalities_sum$Event, injuries_sum$Count, fatalities_sum$Count)
names(pop_health) <- c("Event", "Injuries", "Fatalities")
# Keep the 10 event types with the largest combined number of injuries and fatalities
pop_health <- pop_health[order(-(pop_health$Injuries + pop_health$Fatalities)), ][1:10, ]
pop_health <- melt(pop_health, id.vars = "Event")
# Stacked bars, ordered from the largest to the smallest total (injuries + fatalities)
ggplot(data = pop_health, aes(x = reorder(Event, -value, FUN = sum), y = value, fill = variable)) +
  geom_col() +
  labs(title = "Harmful Weather Measured by Fatalities & Injuries 1950 - 2011",
       y = "Number of People", x = "Weather Event", fill = "Harm") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
The plot shows that tornadoes are by far the most harmful weather event type, with more injuries and fatalities than any other event type.
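One way to put a number on how far ahead tornadoes are is to compute each event type's share of all recorded injuries and fatalities. The sketch below is an illustrative addition, not part of the original analysis. The helper name `harm_share` and the toy records are hypothetical.

```r
# Hypothetical helper: share of combined injuries and fatalities per event type,
# sorted from largest to smallest.
harm_share <- function(events, fatalities, injuries) {
  # split() groups the per-record totals by event type; vapply() sums each group
  totals <- vapply(split(fatalities + injuries, events), sum, numeric(1))
  sort(totals / sum(totals), decreasing = TRUE)
}

# Hypothetical toy records, not taken from the NOAA data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 1, 0),
  INJURIES   = c(10, 3, 2, 1),
  stringsAsFactors = FALSE
)
shares <- harm_share(toy$EVTYPE, toy$FATALITIES, toy$INJURIES)
# TORNADO accounts for 16 of the 20 people harmed in the toy data, a share of 0.8
round(shares, 2)
```

Applied to the full data, the same call would take `data_sub$EVTYPE`, `data_sub$FATALITIES` and `data_sub$INJURIES`.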
#Data Prepping
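# Convert a damage coefficient and its magnitude code into US dollars:
# K = thousands, M = millions, B = billions (upper or lower case).
# Any other code, including a blank or NA, leaves the coefficient unscaled.
convertUnits <- function(coeff, expon){
  if (is.na(expon)){
    as.numeric(coeff)
  }
  else if (toupper(expon) == "K"){
    as.numeric(coeff) * 10^3
  }
  else if (toupper(expon) == "M"){
    as.numeric(coeff) * 10^6
  }
  else if (toupper(expon) == "B"){
    as.numeric(coeff) * 10^9
  }
  else{
    as.numeric(coeff)
  }
}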
property_dam <- apply(data_sub[, c('PROPDMG', 'PROPDMGEXP')], 1, function(y) convertUnits(y['PROPDMG'], y['PROPDMGEXP']))
crop_dam <- apply(data_sub[, c('CROPDMG', 'CROPDMGEXP')], 1, function(y) convertUnits(y['CROPDMG'], y['CROPDMGEXP']))
property_dam_sum <- aggregate(property_dam, list(events_factors), sum)
crop_dam_sum <- aggregate(crop_dam, list(events_factors), sum)
names(property_dam_sum) <- c("Event", "Count"); names(crop_dam_sum) <- c("Event", "Count")
economics <- data.frame(property_dam_sum$Event, crop_dam_sum$Count, property_dam_sum$Count)
names(economics) <- c("Event", "Crop_Damage", "Property_Damage")
# Keep the 10 event types with the largest combined property and crop damage
economics <- economics[order(-(economics$Crop_Damage + economics$Property_Damage)), ][1:10, ]
economics <- melt(economics, id.vars = "Event")
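# Stacked bars in billions of US dollars, ordered from the largest to the smallest total damage
ggplot(data = economics, aes(x = reorder(Event, -value, FUN = sum), y = value / 10^9, fill = variable)) +
  geom_col() +
  labs(title = "Harmful Weather Measured by Property & Crop Damage 1950 - 2011",
       y = "Damage (billions of US dollars)", x = "Weather Event", fill = "Damage") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))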
Among the event types shown, floods and hurricanes/typhoons cause the most property damage, while droughts cause the most crop damage.
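Two possible refinements of the damage step are sketched below under stated assumptions. First, `convertUnits` is applied row by row through `apply()`, which can be slow on roughly 900,000 records. A vectorized lookup applies the same K/M/B rule to whole columns at once. Second, knitr is loaded but never used, so the ranked totals behind the plot could also be shown as a table. The helper names `convert_units_vec` and `damage_table` are hypothetical, and the toy records are made up for illustration.

```r
# Hypothetical vectorized replacement for convertUnits(): look up every
# magnitude code at once instead of calling a function per row.
# K, M, B (any case) scale by 1e3, 1e6, 1e9; any other code, "" or NA -> 1.
convert_units_vec <- function(coeff, expon) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(multipliers[toupper(expon)])
  mult[is.na(mult)] <- 1
  as.numeric(coeff) * mult
}

# Hypothetical helper: top-n event types by combined property and crop
# damage, expressed in billions of US dollars.
damage_table <- function(events, property, crop, n = 10) {
  prop_b <- vapply(split(property, events), sum, numeric(1)) / 1e9
  crop_b <- vapply(split(crop, events), sum, numeric(1)) / 1e9
  out <- data.frame(
    Event      = names(prop_b),
    Property_B = unname(prop_b),
    Crop_B     = unname(crop_b[names(prop_b)]),
    stringsAsFactors = FALSE
  )
  out$Total_B <- out$Property_B + out$Crop_B
  out <- out[order(-out$Total_B), ]
  rownames(out) <- NULL
  head(out, n)
}

# Hypothetical toy records, not taken from the NOAA data
toy <- data.frame(
  EVTYPE     = c("FLOOD", "DROUGHT", "FLOOD", "HAIL"),
  PROPDMG    = c(2, 100, 500, 300),
  PROPDMGEXP = c("B", "M", "m", "M"),
  CROPDMG    = c(100, 1.5, 0, 200),
  CROPDMGEXP = c("M", "B", "", "K"),
  stringsAsFactors = FALSE
)
tab <- damage_table(
  toy$EVTYPE,
  convert_units_vec(toy$PROPDMG, toy$PROPDMGEXP),
  convert_units_vec(toy$CROPDMG, toy$CROPDMGEXP)
)
tab
```

In the report itself, the table could be rendered with the already-loaded knitr package, for example `kable(damage_table(data_sub$EVTYPE, convert_units_vec(data_sub$PROPDMG, data_sub$PROPDMGEXP), convert_units_vec(data_sub$CROPDMG, data_sub$CROPDMGEXP)), digits = 2)`.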