---
title: "Weather-related events: an analysis of public health and economic loss"
author: "Arouna"
date: "15 December 2015"
output: html_document
---
This analysis studies the public health impact and the economic loss caused by severe weather events in the United States. The data come from the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database, which covers the period from 1950 to 2011; the analysis itself focuses on the years 2001 to 2011. The data can be downloaded from the source web site. For more information, see the following resources:
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ
The principal results, for the period 2001 to 2011, are the following: tornadoes are the most harmful events with respect to population health, causing the greatest number of fatalities and the greatest number of injuries. Concerning the economic consequences, floods are the weather events that cause the greatest economic loss.
library(data.table)
library(dplyr)
library(ggplot2)
We load the dataset directly from the source URL where the file is stored. This ensures that the data source is never manipulated by hand.
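resourceURL <- "http://d396qusza40orc.cloudfront.net/repdata/data/StormData.csv.bz2"
downloadedFile <- "StormData.csv.bz2"
# Download the compressed file only if it is not already in the working directory
if (!file.exists(downloadedFile)) {
    download.file(resourceURL, destfile = downloadedFile)
}
# read.csv() reads the .bz2 archive directly, without manual decompression
data <- read.csv(downloadedFile, header = TRUE, stringsAsFactors = FALSE)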
The initial dataset contains 902297 observations and 37 variables, not all of which are needed in our analysis.
We retain only 8 of the 37 variables of the initial dataset: BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP. Since BGN_DATE is a character variable, we then create a new date variable, EVDATE, from it; EVDATE replaces BGN_DATE in the retained set.
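The conversion of BGN_DATE relies on the fact that `as.Date()` (through `strptime()`) only parses the part of the string covered by the format and ignores trailing characters, so the time component stored in BGN_DATE can stay in place. A minimal sketch, assuming BGN_DATE values look like the two made-up strings below:

```r
# Made-up BGN_DATE strings in the "month/day/year hour:min:sec" layout
bgn <- c("4/18/1950 0:00:00", "11/30/2011 0:00:00")

# Only "%m/%d/%Y" is parsed; the trailing " 0:00:00" is ignored by strptime()
evdate <- as.Date(bgn, format = "%m/%d/%Y")
print(evdate)
# [1] "1950-04-18" "2011-11-30"
```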
Among all 902297 observations, we exclude those for which all four variables of interest in this study, FATALITIES, INJURIES, PROPDMG and CROPDMG, are equal to zero. The number of observations is reduced to 254633.
# Parse the event date, keep the 8 variables of interest and drop the records
# with no fatality, no injury, no property damage and no crop damage
mydata <- data %>%
    mutate(EVDATE = as.Date(BGN_DATE, format = "%m/%d/%Y")) %>%
    select(EVDATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
    mutate(useobservation = ifelse((INJURIES == 0) & (FATALITIES == 0) & (PROPDMG == 0) & (CROPDMG == 0), 0, 1)) %>%
    filter(useobservation == 1) %>%
    select(-useobservation)
In the preceding code chunk, useobservation is a dummy variable that takes the value 1 if at least one of the 4 variables of interest is not equal to 0, and the value 0 otherwise.
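The effect of the useobservation dummy can be checked on a tiny data frame. In this sketch the four records are hypothetical toy values, not rows from the NOAA file; only the first one has neither casualties nor damage, so it is the only one dropped. The sketch also shows that a single `filter()` with `|` gives the same rows without the helper column.

```r
library(dplyr)

# Hypothetical toy records: only the HAIL row has all four variables equal to 0
toy <- data.frame(
  EVTYPE     = c("HAIL", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(0, 2, 0, 0),
  INJURIES   = c(0, 5, 0, 1),
  PROPDMG    = c(0, 250, 10, 0),
  CROPDMG    = c(0, 0, 3, 0),
  stringsAsFactors = FALSE
)

# Same rule as the chunk above: the dummy is 0 only when all four values are 0
kept <- toy %>%
  mutate(useobservation = ifelse(INJURIES == 0 & FATALITIES == 0 &
                                   PROPDMG == 0 & CROPDMG == 0, 0, 1)) %>%
  filter(useobservation == 1) %>%
  select(-useobservation)

# Equivalent one-step filter: keep a row if any of the four values is non-zero
kept2 <- toy %>%
  filter(INJURIES != 0 | FATALITIES != 0 | PROPDMG != 0 | CROPDMG != 0)

print(kept$EVTYPE)   # TORNADO, FLOOD and HEAT are kept
```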
The original data frame (902297 observations and 37 variables) is then removed from memory.
rm(data)
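For this analysis, all damage amounts are expressed in $ million, using the magnitude codes stored in PROPDMGEXP and CROPDMGEXP ("K" for thousands, "M" for millions, "B" for billions). After this conversion, we keep 6 variables: EVDATE, EVTYPE, FATALITIES, INJURIES, PROPDMG (in $ million) and CROPDMG (in $ million).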
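# Express damage in $ million: "K" (thousands) is multiplied by 0.001, "M" (millions)
# is left as is, "B" (billions) is multiplied by 1000. Codes are upper-cased first so
# that lowercase "k" and "m" are treated like "K" and "M"; other codes are left unchanged.
mydata2 <- mydata %>%
    mutate(PROPDMGEXP = toupper(PROPDMGEXP), CROPDMGEXP = toupper(CROPDMGEXP)) %>%
    mutate(PROPDMG = ifelse(PROPDMGEXP == "K", PROPDMG * 0.001, PROPDMG),
           PROPDMG = ifelse(PROPDMGEXP == "B", PROPDMG * 1000, PROPDMG),
           CROPDMG = ifelse(CROPDMGEXP == "K", CROPDMG * 0.001, CROPDMG),
           CROPDMG = ifelse(CROPDMGEXP == "B", CROPDMG * 1000, CROPDMG)) %>%
    select(EVDATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, CROPDMG)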
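As a worked example of the conversion to $ million: 25 with code "K" is $25,000, i.e. 0.025 $ million; 3.5 with code "M" stays 3.5; 1.2 with code "B" is 1200 $ million. The sketch below applies these rules to hypothetical values with `dplyr::case_when()`. Unlike the chunk above, which leaves any other magnitude code unchanged, this sketch returns `NA` for an unrecognised code (here a hypothetical "H") so such rows can be inspected; that choice is an assumption, not part of the original analysis.

```r
library(dplyr)

# Hypothetical damage amounts with their magnitude codes
dmg <- data.frame(PROPDMG    = c(25, 3.5, 1.2, 50),
                  PROPDMGEXP = c("K", "M", "B", "H"),
                  stringsAsFactors = FALSE)

# K = thousands, M = millions, B = billions; result expressed in $ million
dmg <- dmg %>%
  mutate(PROPDMG_MILLION = case_when(
    toupper(PROPDMGEXP) == "K" ~ PROPDMG * 0.001,
    toupper(PROPDMGEXP) == "M" ~ PROPDMG,
    toupper(PROPDMGEXP) == "B" ~ PROPDMG * 1000,
    TRUE ~ NA_real_   # assumption: flag undocumented codes instead of guessing
  ))

dmg$PROPDMG_MILLION   # 0.025, 3.5, 1200 and NA
```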
The conversion to $ million used here does not take inflation into account: a dollar in 1951 is not worth a dollar in 2011, and the dataset contains no information about inflation. As a simple way to lessen the effect of inflation, we restrict the analysis to the most recent part of the dataset, namely the events that began on or after 1 January 2001 (2001 to 2011). This restriction reduces, without eliminating, the long-term effect of inflation on our results, and it also means that the analysis relies on more recent data.
# Keep events from 2001 onward and compute totals by event type
mydata3 <- mydata2 %>%
    filter(EVDATE > as.Date("2000-12-31")) %>%
    group_by(EVTYPE) %>%
    summarize(PROPDMG = sum(PROPDMG), CROPDMG = sum(CROPDMG),
              Injuries = sum(INJURIES), Fatalities = sum(FATALITIES))
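The date filter and the aggregation can be checked on a few hypothetical records. The strict inequality `EVDATE > as.Date("2000-12-31")` excludes an event dated 31 December 2000 and keeps one dated 1 January 2001; the remaining rows are then summed per event type. The values below are made up for illustration only.

```r
library(dplyr)

# Hypothetical events; the first one falls on 2000-12-31 and must be excluded
ev <- data.frame(
  EVDATE     = as.Date(c("2000-12-31", "2001-01-01", "2005-06-15", "2011-04-27")),
  EVTYPE     = c("FLOOD", "FLOOD", "TORNADO", "TORNADO"),
  PROPDMG    = c(100, 20, 5, 40),
  CROPDMG    = c(0, 1, 0, 0),
  INJURIES   = c(0, 0, 3, 10),
  FATALITIES = c(0, 0, 1, 2),
  stringsAsFactors = FALSE
)

summary_ev <- ev %>%
  filter(EVDATE > as.Date("2000-12-31")) %>%   # keeps 2001-01-01 onward
  group_by(EVTYPE) %>%
  summarize(PROPDMG = sum(PROPDMG), CROPDMG = sum(CROPDMG),
            Injuries = sum(INJURIES), Fatalities = sum(FATALITIES))

summary_ev   # FLOOD: PROPDMG 20; TORNADO: PROPDMG 45, 13 injuries, 3 fatalities
```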
From the mydata3 data frame, we find the 10 events associated with the greatest number of injuries:
injuries <- mydata3 %>% mutate(Events = EVTYPE) %>% select(Events, Injuries) %>% arrange(desc(Injuries)) %>% top_n(10,Injuries)
From the mydata3 data frame, we find the 10 events associated with the greatest number of fatalities:
fatalities <- mydata3 %>% mutate(Events = EVTYPE) %>% select(Events, Fatalities) %>% arrange(desc(Fatalities)) %>% top_n(10,Fatalities)
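One property of `top_n()` worth keeping in mind: rows tied at the cut-off are all kept, so a "top 10" can contain more than 10 rows. The sketch below shows this on hypothetical totals with a tie, and the `slice_max()` alternative (available in dplyr 1.0.0 and later) that can return exactly `n` rows with `with_ties = FALSE`.

```r
library(dplyr)

# Hypothetical totals with a tie at the cut-off (B and C both have 30)
tot <- data.frame(Events   = c("A", "B", "C", "D"),
                  Injuries = c(50, 30, 30, 10),
                  stringsAsFactors = FALSE)

# top_n() keeps every row tied at the cut-off: asking for 2 returns 3 rows
with_ties <- top_n(tot, 2, Injuries)

# slice_max() can drop the extra tied rows and return exactly 2 rows
exactly_two <- slice_max(tot, Injuries, n = 2, with_ties = FALSE)

nrow(with_ties)     # 3
nrow(exactly_two)   # 2
```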
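From the mydata3 data frame, we find the 10 events that cause the greatest economic loss, accounting for both property and crop damage. The total cost is expressed in $ billions, i.e. the sum of property and crop damage (in $ million) divided by 1000.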
cost <- mydata3 %>% mutate(Events = EVTYPE, totalCost=(PROPDMG + CROPDMG)/1000) %>% select(Events, totalCost) %>% arrange(desc(totalCost)) %>% top_n(10,totalCost)
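The unit change in totalCost is simple arithmetic: for an event type with 1500 $ million of property damage and 500 $ million of crop damage, the total is (1500 + 500) / 1000 = 2 $ billion. A minimal check on hypothetical per-event totals:

```r
library(dplyr)

# Hypothetical per-event totals, in $ million
costs <- data.frame(EVTYPE  = c("FLOOD", "HAIL"),
                    PROPDMG = c(1500, 200),
                    CROPDMG = c(500, 300),
                    stringsAsFactors = FALSE)

# Property plus crop damage, divided by 1000 to go from $ million to $ billion
costs <- costs %>% mutate(totalCost = (PROPDMG + CROPDMG) / 1000)
costs$totalCost
# [1] 2.0 0.5
```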
The two graphics below display the 10 most harmful weather events. Among these events, tornadoes are the most harmful with respect to both fatalities and injuries.
injuriesp <- ggplot(injuries, aes(x = reorder(Events, -Injuries), y = Injuries)) +
geom_bar(stat = "identity", fill="steelblue")+
geom_text(aes(label=Injuries), vjust=1.6, color="white", size=3.5)+
theme(axis.text.x = element_text(angle = 45, hjust = 1))+
xlab("Events type") +
ylab("Number of injuries") +
ggtitle("Top 10 events causing most injuries in US from 2001 to 2011")
injuriesp
fatalitiesp <- ggplot(fatalities, aes(x = reorder(Events, -Fatalities), y = Fatalities)) +
geom_bar(stat = "identity", fill="steelblue")+
geom_text(aes(label=Fatalities), vjust=1.6, color="white", size=3.5)+
theme(axis.text.x = element_text(angle = 45, hjust = 1))+
xlab("Events type") +
ylab("Number of deaths") +
ggtitle("Top 10 events causing most fatalities in US from 2001 to 2011")
fatalitiesp
The graphic below displays the 10 weather events with the greatest economic consequences. Among these events, floods cause the greatest economic loss.
Costp <- ggplot(cost, aes(x = reorder(Events, -totalCost), y = totalCost)) +
geom_bar(stat = "identity", fill="steelblue")+
geom_text(aes(label=format(totalCost,digits = 3)), vjust=1.6, color="white", size=3.5)+
theme(axis.text.x = element_text(angle = 45, hjust = 1))+
xlab("Events type") +
ylab("Total economic loss in $ billions") +
ggtitle("Top 10 events causing most economic loss in US from 2001 to 2011")
Costp
Among all the weather events, tornadoes are the most harmful with respect to population health, causing the greatest number of fatalities and the greatest number of injuries in the United States. Concerning the economic consequences, floods are the weather events that cause the greatest economic loss.