Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
Questions:
This data analysis addresses the following questions:
1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
The analysis below shows that tornadoes cause the largest numbers of both fatalities and injuries, and that heavy wind and floods have the greatest economic impact.
The data are downloaded from the URL stored in strSource in the code below.
There are over 900 different event types in the dataset. Many of them are different names for the same kind of event, for example “Thundrestorm” and “TSM”, or “Strong wind” and “Strong WND”. To ensure that similar events are counted together, all events have been classified into 8 groups:
1. Flood
2. Hail
3. Heavy rain / heavy snow
4. Heavy wind
5. Snow
6. Thunderstorm
7. Tornado
8. Other
Disclaimer: categorizing these event types lies beyond the scope of this exercise. It was performed with hierarchical clustering (the hclust method), and the results of this classification are stored in the dsEvents.csv file.
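The distance measure and number of clusters used for dsEvents.csv are not stated, so the following is only a minimal sketch of how event names could be grouped with hclust. It assumes Levenshtein (edit) distance from base R's adist() and complete linkage, and the event names are a hypothetical toy sample. Edit distance alone would not group abbreviations such as “TSM” with “Thundrestorm”, so the resulting clusters would likely still need manual review.

```r
# Hypothetical toy sample of EVTYPE values (the real dataset has 900+)
events <- c("TSTM WIND", "TSTM WINDS", "FLOOD", "FLOODS ", "flood")

# Normalize case and surrounding whitespace so trivially different spellings coincide
normalized <- unique(toupper(trimws(events)))

# Pairwise Levenshtein distances between the names (utils::adist)
d <- adist(normalized)
dimnames(d) <- list(normalized, normalized)

# Complete-linkage hierarchical clustering, cut into k groups.
# k = 2 only because this toy sample contains two underlying event types.
tree <- hclust(as.dist(d), method = "complete")
groups <- cutree(tree, k = 2)

print(groups)
```

On the full dataset, k (or a cut height passed as h to cutree()) would have to be chosen by inspecting the dendrogram.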
To prepare the data, we perform the following steps:
1. Download it.
2. Unpack and read it.
3. Format dates and create a separate YEAR column.
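```r
library(dplyr)      # select(), filter(), mutate(), group_by(), summarise(), %>%
library(ggplot2)    # ggplot(), geom_line(), facet_wrap()
library(lubridate)  # year()

strSource <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
strName <- "data.csv.bz2"

# 1. Download the compressed file once (binary mode avoids corruption on Windows)
if (!file.exists(strName)) {
  download.file(strSource, strName, mode = "wb")
}

# 2. read.csv() decompresses the .bz2 file transparently; cache it across re-runs
if (!exists("dsSource")) {
  dsSource <- read.csv(strName)
}

# Keep only the columns needed for the analysis
ds <- dsSource[!is.na(dsSource$BGN_DATE),
               c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")]

# 3. Parse the begin date (the trailing time part is ignored) and add a YEAR column
ds$BGN_DATE <- as.Date(ds$BGN_DATE, format = "%m/%d/%Y")
ds$YEAR <- year(ds$BGN_DATE)

# Event-type classification from the hclust step (read.csv2 expects ";" separators)
dsEvents <- read.csv2("dsEvents.csv")
```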
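As a small check of the date step, the snippet below parses two sample strings, assuming BGN_DATE values follow a "month/day/year time" layout such as "4/18/1950 0:00:00". It also shows a base-R way to obtain the year if lubridate is not available.

```r
# Sample strings assumed to match the BGN_DATE layout
raw <- c("4/18/1950 0:00:00", "11/15/2011 0:00:00")

# as.Date() reads the leading "%m/%d/%Y" part; strptime ignores trailing characters
dates <- as.Date(raw, format = "%m/%d/%Y")

# Base-R alternative to lubridate::year(): format the year and convert to integer
years <- as.integer(format(dates, "%Y"))
print(years)
```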
The number of people injured or killed by each type of event, per year, is shown below on a log10 scale (injuries in red, fatalities in blue).
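```r
# Attach the event group to each record via the EVTYPE -> Event lookup
dsTemp <- merge(ds, dsEvents, by.x = "EVTYPE", by.y = "Event", all.x = TRUE)

# Total injuries and fatalities per event group and year
dsTemp <- dsTemp %>%
  select(FATALITIES, INJURIES, YEAR, Group.name) %>%
  filter(!is.na(Group.name) & !is.na(YEAR)) %>%
  group_by(Group.name, YEAR) %>%
  summarise(Injuries = sum(INJURIES), Fatalities = sum(FATALITIES), .groups = "drop")

# Colour is mapped inside aes() so the legend is drawn;
# years with zero counts give log10(0) = -Inf and are dropped from the lines
ggplot(dsTemp, aes(YEAR)) +
  geom_line(aes(y = log10(Injuries), colour = "Injuries")) +
  geom_line(aes(y = log10(Fatalities), colour = "Fatalities")) +
  scale_colour_manual(name = "Impacts",
                      values = c(Injuries = "red", Fatalities = "blue")) +
  ylab("log10(number of people)") +
  facet_wrap(~Group.name)
```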
The chart shows that tornadoes cause the largest numbers of both fatalities and injuries.
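Because the chart uses a log scale and is split by year, a single ranking table can make the comparison easier to read. The sketch below sums injuries and fatalities over all years for each group and sorts by the total. It uses base R's aggregate() rather than dplyr, and dsHarm holds hypothetical illustrative numbers in the shape of the yearly summary above, not values from the NOAA data.

```r
# Hypothetical yearly summary with the same columns as the summary above
dsHarm <- data.frame(
  Group.name = c("Tornado", "Tornado", "Flood", "Flood", "Hail"),
  YEAR       = c(1995, 1996, 1995, 1996, 1995),
  Injuries   = c(100, 150, 20, 30, 5),
  Fatalities = c(10, 12, 3, 4, 0)
)

# Sum over all years within each event group
totals <- aggregate(cbind(Injuries, Fatalities) ~ Group.name, data = dsHarm, FUN = sum)

# Rank groups by combined harm (injuries + fatalities), largest first
totals$Total <- totals$Injuries + totals$Fatalities
totals <- totals[order(totals$Total, decreasing = TRUE), ]
print(totals)
```

Applied to the real yearly summary, the same three steps would turn the visual reading of the chart into an explicit ranking.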
Each event may cause two types of damage:
1. Damage to property (PROPDMG)
2. Damage to crops (CROPDMG)
Summing these two values gives the total economic impact of each event. The chart below shows this total per year for each event group:
```r
dsTemp <- merge(ds, dsEvents, by.x = "EVTYPE", by.y = "Event", all.x = TRUE)

# Total damage (property + crops, raw recorded values) per event group and year
dsTemp <- dsTemp %>%
  select(PROPDMG, CROPDMG, YEAR, Group.name) %>%
  mutate(DMG = PROPDMG + CROPDMG) %>%
  filter(!is.na(Group.name) & !is.na(YEAR)) %>%
  group_by(Group.name, YEAR) %>%
  summarise(IMPACT = sum(DMG), .groups = "drop")

ggplot(dsTemp, aes(YEAR)) +
  geom_line(aes(y = IMPACT / 1000), colour = "red") +
  ylab("Total damage / 1000") +
  facet_wrap(~Group.name)
```
The chart shows that events linked to floods and heavy wind have the greatest economic impact.
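The totals above add the raw PROPDMG and CROPDMG values. In the NOAA file these columns are accompanied by PROPDMGEXP and CROPDMGEXP, which hold magnitude codes such as K, M and B, so raw sums mix thousands with millions and billions. The sketch below converts damage to dollars before summing. It assumes the commonly used mapping H = 100, K = 1e3, M = 1e6, B = 1e9 and treats every other code (blank, digits, symbols) as a multiplier of 1. exp_multiplier is a hypothetical helper, and toy holds illustrative rows, not real records. Whether this changes the ranking above has not been checked here.

```r
# Hypothetical helper: map magnitude codes to dollar multipliers.
# Assumed mapping: H = hundreds, K = thousands, M = millions, B = billions;
# any other code (blank, NA, digits, symbols) falls back to 1.
exp_multiplier <- function(code) {
  code <- toupper(trimws(as.character(code)))
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  out <- unname(lookup[code])
  out[is.na(out)] <- 1
  out
}

# Illustrative rows in the shape of the StormData damage columns
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 10, 0),
  PROPDMGEXP = c("K", "M", "", "B"),
  CROPDMG    = c(0, 1, 0, 5),
  CROPDMGEXP = c("", "k", "", "M")
)

# Damage in dollars = property * multiplier + crops * multiplier
toy$DMG_USD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * exp_multiplier(toy$CROPDMGEXP)
print(toy$DMG_USD)
```

In the pipeline above, the same conversion would replace the plain PROPDMG + CROPDMG sum inside mutate(), after adding the two EXP columns to the initial column selection.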