author: “A. Mopa”
date: “December 15, 2015”
This analysis studies the public health impact and the economic loss caused by severe weather events in the United States over the period from 1950 to 2011. The data used in this study come from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database and can be downloaded from the source website. For more information, see the following resources:
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ
The principal results are as follows. Tornadoes are the most harmful events with respect to population health: they cause both the greatest number of fatalities and the greatest number of injuries. With respect to economic consequences, floods are the weather events that cause the greatest economic loss.
library(data.table)
library(dplyr)
library(ggplot2)
We load the dataset directly from the source URL where the file is stored on the web. This strategy ensures that there is no manual manipulation of the data source.
resourceURL <- "http://d396qusza40orc.cloudfront.net/repdata/data/StormData.csv.bz2"
downlodedfile <- "StormData.csv.bz2"
if (!file.exists(downlodedfile)) {
download.file(resourceURL, destfile = downlodedfile)
}
data <- read.table('StormData.csv.bz2', header = TRUE, sep = ',')
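The loading step works without a separate decompression step because base R opens a bzip2-compressed file transparently when it is given the file path. The following is a minimal sketch of that behavior on a small made-up file; the temporary file and its two rows are assumptions for illustration, not part of the storm data.

```r
# Write a tiny made-up CSV into a bzip2-compressed temporary file
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
write.csv(
  data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 0)),
  con,
  row.names = FALSE
)
close(con)

# read.table() detects the compression and reads the file like a plain CSV
toy <- read.table(tmp, header = TRUE, sep = ",")
dim(toy)
```

On the real data, the same check with dim(data) is a quick way to confirm the number of observations and variables after loading.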
The initial dataset contains 902297 observations and 37 variables. We do not need all these observations and variables in our analysis.
We decide to retain only 8 variables of interest from the initial 37. These variables are: BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP. We then create a new date variable, EVDATE, based on the character variable BGN_DATE.
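The date conversion can be sketched on a single value. The sample string below is an assumption about how BGN_DATE entries look (a date followed by a time stamp); as.Date() parses the part that matches the format and ignores the trailing text.

```r
# Assumed example of a BGN_DATE entry: month/day/year followed by a time
bgn_date <- "4/18/1950 0:00:00"

# Only the month/day/year part is parsed; the time is dropped
ev_date <- as.Date(bgn_date, format = "%m/%d/%Y")
format(ev_date)
```

The result is a Date object, which sorts and filters chronologically, unlike the original character string.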
Among all 902297 observations, we exclude those in which the four outcome variables of interest in this study (FATALITIES, INJURIES, PROPDMG and CROPDMG) are all equal to 0. This reduces the number of observations to 254633.
The original data frame, which contains 902297 observations and 37 variables, is removed from memory once this subset has been created.
mydata <-data %>% mutate(EVDATE = as.Date(BGN_DATE, format = "%m/%d/%Y")) %>% select(EVDATE, EVTYPE, FATALITIES,INJURIES,PROPDMG,PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
mutate(useobservation=ifelse((INJURIES==0) & (FATALITIES ==0) & (PROPDMG == 0) & (CROPDMG ==0),0,1)) %>%
filter((useobservation==1)) %>% select(-useobservation)
In the preceding code chunk, useobservation is a dummy variable that takes the value 1 if at least one of the 4 variables of interest is not equal to 0, and 0 otherwise.
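The same rule can be written as a single filter condition, without the intermediate dummy variable. The sketch below uses a small made-up data frame (the rows are assumptions for illustration) in which only two events report any harm or damage.

```r
library(dplyr)

# Toy data (made up): only TORNADO and FLOOD have a non-zero measure
toy <- data.frame(
  EVTYPE     = c("HAIL", "TORNADO", "FLOOD", "HIGH WIND"),
  FATALITIES = c(0, 2, 0, 0),
  INJURIES   = c(0, 10, 0, 0),
  PROPDMG    = c(0, 25, 5, 0),
  CROPDMG    = c(0, 0, 1, 0)
)

# Keep a row if at least one of the four measures is not equal to 0
kept <- toy %>%
  filter(FATALITIES != 0 | INJURIES != 0 | PROPDMG != 0 | CROPDMG != 0)

kept$EVTYPE
```

Both forms select the same rows; the dummy variable simply makes the selection rule explicit as a column before it is dropped.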
The original data frame containing 37 variables is no longer needed, so we remove it from memory.
rm(data)
For this analysis we decide to express all damage in $million. After this conversion, we restrict the number of variables to 6: EVDATE, EVTYPE, FATALITIES, INJURIES, PROPDMG (in $million) and CROPDMG (in $million).
# Rescale damage to $ million: "K" (thousands) * 0.001, "B" (billions) * 1000;
# amounts with any other exponent code are left unchanged
mydata2 <- mydata %>%
  mutate(PROPDMG = ifelse(PROPDMGEXP == "K", PROPDMG * 0.001, PROPDMG)) %>%
  mutate(PROPDMG = ifelse(PROPDMGEXP == "B", PROPDMG * 1000, PROPDMG)) %>%
  mutate(CROPDMG = ifelse(CROPDMGEXP == "K", CROPDMG * 0.001, CROPDMG)) %>%
  mutate(CROPDMG = ifelse(CROPDMGEXP == "B", CROPDMG * 1000, CROPDMG)) %>%
  select(EVDATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, CROPDMG)
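The conversion above only rescales amounts flagged "K" or "B" and treats every other amount as already being in $million. A stricter variant is sketched below as one possible alternative, not as the method used in this report. The helper name exp_to_million is hypothetical, and the sketch assumes that lowercase codes mean the same as uppercase ones and that any code other than K, M or B denotes plain dollars.

```r
# Hypothetical helper: multiplier that converts an amount to $ million
# from its exponent code. ASSUMPTION: codes other than K, M, B (in either
# case) are treated as plain dollars.
exp_to_million <- function(exp_code) {
  code <- toupper(as.character(exp_code))
  mult <- rep(1e-6, length(code))  # default: amount is in dollars
  mult[code == "K"] <- 1e-3        # thousands of dollars
  mult[code == "M"] <- 1           # millions of dollars
  mult[code == "B"] <- 1e3         # billions of dollars
  mult
}

# Toy amounts (made up) with mixed exponent codes
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 500),
  PROPDMGEXP = c("K", "m", "B", "")
)
toy$PROPDMG_million <- toy$PROPDMG * exp_to_million(toy$PROPDMGEXP)
toy
```

Under these assumptions, an amount of 500 with an empty code becomes 0.0005 $million rather than 500 $million, so the choice of default can change the totals and should be checked against the storm data documentation.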
After converting the damage to $million, we can evaluate the impact of severe weather events in terms of economic loss and public health. To this end, we group the observations in mydata2 by event type and, for each group, sum the property damage and crop damage (in $million) and the numbers of injuries and fatalities. From these totals, we can extract the top ten weather events with respect to economic consequences as well as public health in the United States.
mydata3 <- mydata2 %>% group_by(EVTYPE) %>% summarize(PROPDMG=sum(PROPDMG),CROPDMG= sum(CROPDMG),Injuries=sum(INJURIES),Fatalities=sum(FATALITIES))
From the mydata3 data frame, we can find the 10 events associated with the greatest number of injuries:
injuriesdf <- mydata3 %>% mutate(Events = EVTYPE) %>% select(Events, Injuries) %>% arrange(desc(Injuries)) %>% top_n(10,Injuries)
From the mydata3 data frame, we can find the 10 events associated with the greatest number of fatalities:
fatalitiesdf <- mydata3 %>% mutate(Events = EVTYPE) %>% select(Events, Fatalities) %>% arrange(desc(Fatalities)) %>% top_n(10,Fatalities)
From the mydata3 data frame, we can find the 10 events that cause the greatest economic loss, accounting for both property and crop damage. The total is divided by 1000 to express it in $ billions:
lossdf <- mydata3 %>% mutate(Events = EVTYPE, totalCost=(PROPDMG + CROPDMG)/1000) %>% select(Events, totalCost) %>% arrange(desc(totalCost)) %>% top_n(10,totalCost)
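The three chunks above follow one pattern: aggregate by event type, then keep the largest totals. The sketch below reproduces that pattern on a small made-up data frame. It uses slice_max() instead of top_n(), which is an assumption about the installed dplyr version (1.0.0 or later); with with_ties = FALSE it returns exactly the requested number of rows, whereas top_n() keeps tied rows and can therefore return more.

```r
library(dplyr)

# Toy observations (made up for illustration)
toy <- data.frame(
  EVTYPE   = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD", "HEAT"),
  INJURIES = c(10, 2, 5, 1, 3, 4)
)

# Total injuries per event type
totals <- toy %>%
  group_by(EVTYPE) %>%
  summarize(Injuries = sum(INJURIES))

# Keep the 3 event types with the largest totals, in decreasing order
top3 <- totals %>%
  slice_max(order_by = Injuries, n = 3, with_ties = FALSE) %>%
  arrange(desc(Injuries))

top3
```

In the report itself the same idea is applied with n = 10 to injuries, fatalities and total cost.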
The following two plots display the 10 weather events that are most harmful to population health.
Among the 10 most harmful weather events in the United States, tornadoes cause the greatest number of injuries.
injuriesp <- ggplot(injuriesdf, aes(x = reorder(Events, -Injuries), y = Injuries)) +
geom_bar(stat = "identity", fill="steelblue")+
geom_text(aes(label=Injuries), vjust=1.6, color="white", size=3.5)+
theme(axis.text.x = element_text(angle = 45, hjust = 1))+
xlab("Event type") +
ylab("Number of injuries") +
ggtitle("Top 10 events causing the most injuries in the United States")
injuriesp
Among the 10 most harmful weather events in the United States, tornadoes also cause the greatest number of fatalities.
fatalitiesp <- ggplot(fatalitiesdf, aes(x = reorder(Events, -Fatalities), y = Fatalities)) +
geom_bar(stat = "identity", fill="steelblue")+
geom_text(aes(label=Fatalities), vjust=1.6, color="white", size=3.5)+
theme(axis.text.x = element_text(angle = 45, hjust = 1))+
xlab("Event type") +
ylab("Number of deaths") +
ggtitle("Top 10 events causing the most fatalities in the United States")
fatalitiesp
The following plot displays the 10 weather events with the greatest economic consequences. Among these events, floods cause the greatest economic loss.
Costp <- ggplot(lossdf, aes(x = reorder(Events, -totalCost), y = totalCost)) +
geom_bar(stat = "identity", fill="steelblue")+
geom_text(aes(label=format(totalCost,digits = 3)), vjust=1.6, color="white", size=3.5)+
theme(axis.text.x = element_text(angle = 45, hjust = 1))+
xlab("Event type") +
ylab("Total economic loss in $ billions") +
ggtitle("Top 10 events causing the most economic loss in the United States")
Costp
Among all weather events in the United States, tornadoes are the most harmful with respect to population health, causing both the greatest number of fatalities and the greatest number of injuries. With respect to economic consequences, floods are the weather events that cause the greatest economic loss.