title: Weather-related events: an analysis of public health and economic loss
author: "Arouna"
date: "15 December 2015"
output: html_document

Synopsis

This analysis studies the public health impact and the economic loss caused by severe weather events in the United States. The data come from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which covers the period from 1950 to 2011, and can be downloaded from the source web site. For more information, see the following resources:

National Weather Service Storm Data Documentation

National Climatic Data Center Storm Events FAQ

Here are the principal results: tornadoes are the most harmful events with respect to population health, causing the greatest number of both fatalities and injuries. In terms of economic consequences, floods are the weather events which cause the greatest economic loss.

Data Processing

Packages used in this analysis

library(data.table)
library(dplyr)
library(ggplot2)

Loading the data

Here we decide to load the dataset directly from the source URL where the file is stored on the web. This strategy ensures that there is no manual manipulation of the data source.

resourceURL <- "http://d396qusza40orc.cloudfront.net/repdata/data/StormData.csv.bz2"
downlodedfile <- "StormData.csv.bz2"
if (!file.exists(downlodedfile)) {
    download.file(resourceURL, destfile = downlodedfile)
}

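# read.csv() only treats double quotes as quote characters and disables comment characters,
# so apostrophes and '#' in free-text fields do not break the parsing
data <- read.csv(downlodedfile, header = TRUE)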
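
Reading a bzip2-compressed CSV without decompressing it first can be tried out on a small file. The following is a minimal sketch on made-up rows, not on the NOAA file; the names tmp, con and toy are hypothetical. It relies on the fact that R detects the compression when the file is opened for reading, and that read.csv() treats only double quotes as quote characters.

```r
# Write a tiny CSV into a bzip2-compressed temporary file (made-up rows)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c(
  "EVTYPE,REMARKS",
  'TORNADO,"Roof damaged, trees down"',
  "FLOOD,\"River's banks overflowed\""
), con)
close(con)

# The compressed file is read directly; the comma and the apostrophe
# inside the quoted REMARKS fields are kept as ordinary text
toy <- read.csv(tmp, stringsAsFactors = FALSE)
unlink(tmp)

print(toy)
```

With the defaults of read.table(), an apostrophe would instead be interpreted as the start of a quoted string, which is why read.csv() is usually the safer choice for files with free-text columns.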

Data transformation

The initial dataset contains 902297 observations and 37 variables. We do not need all these observations and variables in our analysis.

Variables selection

We decide to retain only 8 variables of interest out of the initial 37: BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP. We then create a new date variable, EVDATE, from the character variable BGN_DATE; EVDATE replaces BGN_DATE in the retained set.

Selecting observations

Among the 902297 observations, we decide to exclude those for which all four impact variables of interest in this study are equal to 0. These variables are FATALITIES, INJURIES, PROPDMG and CROPDMG. The number of observations is thereby reduced to 254633.

mydata <- data %>%
  mutate(EVDATE = as.Date(BGN_DATE, format = "%m/%d/%Y")) %>%
  select(EVDATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
  mutate(useobservation = ifelse((INJURIES == 0) & (FATALITIES == 0) & (PROPDMG == 0) & (CROPDMG == 0), 0, 1)) %>%
  filter(useobservation == 1) %>%
  select(-useobservation)

In the preceding code chunk, useobservation is a dummy variable which takes the value 1 if at least one of the 4 variables of interest is not equal to 0, and 0 otherwise.
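
The same selection rule can be illustrated on a few made-up rows. This is a minimal sketch in base R, so that it runs without extra packages; the names toy, impact_cols, has_impact and kept are hypothetical and the values are invented for illustration.

```r
# Made-up observations: only HAIL has no recorded impact at all
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD", "HEAT"),
  FATALITIES = c(0, 0, 0, 2),
  INJURIES   = c(3, 0, 0, 0),
  PROPDMG    = c(0, 0, 25, 0),
  CROPDMG    = c(0, 0, 0, 0),
  stringsAsFactors = FALSE
)

impact_cols <- c("FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")

# TRUE when at least one of the four impact variables is not equal to 0
has_impact <- rowSums(toy[, impact_cols] != 0) > 0

kept <- toy[has_impact, ]
print(kept$EVTYPE)
```

Here the logical vector has_impact plays the role of the dummy variable: a row is kept as soon as one impact variable is non-zero.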

The original data frame containing 37 variables is then removed from memory.

rm(data)

Converting to Million USD

For this analysis we decide to express all damage in $million. After this conversion, we restrict the number of variables to 6: EVDATE, EVTYPE, FATALITIES, INJURIES, PROPDMG (in $million) and CROPDMG (in $million).

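# Express damage in $ million: "K" (thousands) is scaled down, "B" (billions) is scaled up.
# Values with any other exponent code, including "M", are left unchanged.
mydata2 <- mydata %>%
  mutate(PROPDMG = ifelse(PROPDMGEXP == "K", PROPDMG * 0.001, PROPDMG)) %>%
  mutate(PROPDMG = ifelse(PROPDMGEXP == "B", PROPDMG * 1000, PROPDMG)) %>%
  mutate(CROPDMG = ifelse(CROPDMGEXP == "K", CROPDMG * 0.001, CROPDMG)) %>%
  mutate(CROPDMG = ifelse(CROPDMGEXP == "B", CROPDMG * 1000, CROPDMG)) %>%
  select(EVDATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, CROPDMG)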
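
The exponent-based conversion can also be written as a lookup table. The following is a minimal sketch on made-up values, in base R; the names to_million, toy, DMG, EXP and DMG_million are hypothetical. Two choices in it are assumptions of the sketch and not part of this analysis: lower-case codes are treated like their upper-case equivalents, and unrecognised codes give NA instead of being left unchanged.

```r
# Multipliers that turn a damage figure into $ million
# (K = thousands, M = millions, B = billions)
to_million <- c(K = 1e-3, M = 1, B = 1e3)

# Made-up damage figures with their exponent codes
toy <- data.frame(
  DMG = c(25, 2.5, 1.2, 10, 7),
  EXP = c("K", "M", "B", "k", "?"),
  stringsAsFactors = FALSE
)

# Look up the multiplier by name; codes that are not in the table give NA
# (assumption of this sketch: lower-case codes are upper-cased first)
mult <- unname(to_million[toupper(toy$EXP)])
toy$DMG_million <- toy$DMG * mult

print(toy)
```

Returning NA for unknown codes makes them easy to count and inspect before deciding how to treat them.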

Processing data

The conversion of damage to $million used here does not take inflation into account: $1 in 1951 is not worth the same as $1 in 2011, and the dataset does not contain any information about inflation. We therefore need a simple strategy to lessen the effect of inflation on our analysis. To do this, we restrict the analysis to events from 2001 to 2011, roughly the last 10 years of the dataset. This restriction reduces, but does not remove, the long-term effect of inflation, and it also gives us a more recent dataset.

mydata3 <- mydata2 %>% filter(EVDATE>as.Date("2000-12-31")) %>% group_by(EVTYPE) %>% summarize(PROPDMG=sum(PROPDMG),CROPDMG= sum(CROPDMG),Injuries=sum(INJURIES),Fatalities=sum(FATALITIES))
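
The date restriction can be checked on a few hand-made dates. This is a minimal sketch in base R; the names raw, dates and keep are hypothetical, and the strings only imitate the "month/day/year time" layout assumed for BGN_DATE.

```r
# Hand-made date strings in the layout assumed for BGN_DATE
raw <- c("4/18/1950 0:00:00", "12/31/2000 0:00:00",
         "1/1/2001 0:00:00", "11/30/2011 0:00:00")

# Only the part matched by the format is parsed; the trailing time is ignored
dates <- as.Date(raw, format = "%m/%d/%Y")

# Same condition as in the filter above: strictly after 31 December 2000
keep <- dates > as.Date("2000-12-31")

print(data.frame(raw, dates, keep))
```

Because the comparison is strict, an event dated 31 December 2000 is excluded and the window starts on 1 January 2001.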

From the mydata3 data frame, we can find the 10 events associated with the greatest number of injuries:

injuries <- mydata3 %>% mutate(Events = EVTYPE) %>% select(Events, Injuries) %>% arrange(desc(Injuries)) %>% top_n(10,Injuries)

From the mydata3 data frame, we can find the 10 events associated with the greatest number of fatalities:

fatalities <- mydata3 %>% mutate(Events = EVTYPE) %>% select(Events, Fatalities) %>% arrange(desc(Fatalities)) %>% top_n(10,Fatalities)

From the mydata3 data frame, we can find the 10 events which cause the greatest economic loss, accounting for both property and crop damage. The total cost is expressed in $ billion.

cost <- mydata3 %>% mutate(Events = EVTYPE, totalCost=(PROPDMG + CROPDMG)/1000) %>% select(Events, totalCost) %>% arrange(desc(totalCost)) %>% top_n(10,totalCost)
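
The aggregate-then-rank pattern used for the three rankings above can be illustrated on made-up figures. This is a minimal sketch in base R rather than dplyr; the names toy, totals, ranked and top2 are hypothetical and the damage values (in $ million) are invented.

```r
# Made-up damage records in $ million
toy <- data.frame(
  EVTYPE  = c("FLOOD", "TORNADO", "FLOOD", "HAIL", "TORNADO"),
  PROPDMG = c(1500, 300, 500, 40, 200),
  CROPDMG = c(100, 0, 50, 10, 0),
  stringsAsFactors = FALSE
)

# Sum property and crop damage per event type
totals <- aggregate(cbind(PROPDMG, CROPDMG) ~ EVTYPE, data = toy, FUN = sum)

# Total cost in $ billion, as in the cost data frame above
totals$totalCost <- (totals$PROPDMG + totals$CROPDMG) / 1000

# Sort by decreasing total cost and keep the first two event types
ranked <- totals[order(-totals$totalCost), ]
top2 <- head(ranked, 2)

print(top2)
```

One difference worth knowing: head() always returns exactly the requested number of rows, whereas top_n() keeps ties and can therefore return more than 10 rows when several event types share the same value.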

Results

  1. The most harmful weather events with respect to population health across the United States

The 2 graphics below display the top 10 most harmful weather events. Among these events, tornadoes are the most harmful with respect to both fatalities and injuries.

injuriesp <- ggplot(injuries, aes(x = reorder(Events, -Injuries), y = Injuries)) +
  geom_bar(stat = "identity", fill="steelblue")+
  geom_text(aes(label=Injuries), vjust=1.6, color="white", size=3.5)+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))+
  xlab("Events type") +
  ylab("Number of injuries") +
  ggtitle("Top 10 events causing most injuries in US from 2001 to 2011")
injuriesp

fatalitiesp <- ggplot(fatalities, aes(x = reorder(Events, -Fatalities), y = Fatalities)) +
  geom_bar(stat = "identity", fill="steelblue")+
  geom_text(aes(label=Fatalities), vjust=1.6, color="white", size=3.5)+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))+
  xlab("Events type") +
  ylab("Number of deaths") +
  ggtitle("Top 10 events causing most fatalities in US from 2001 to 2011")
fatalitiesp

  2. Across the United States, flood events cause the greatest economic loss.

The graphic below displays the top 10 weather events with the greatest economic consequences. Among these events, floods cause the greatest economic loss.

Costp <- ggplot(cost, aes(x = reorder(Events, -totalCost), y = totalCost)) +
  geom_bar(stat = "identity", fill="steelblue")+
  geom_text(aes(label=format(totalCost,digits = 3)), vjust=1.6, color="white", size=3.5)+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))+
  xlab("Events type") +
  ylab("Total economic loss in $ billions") +
  ggtitle("Top 10 events causing most economic loss in US from 2001 to 2011")
Costp

Conclusion

Among all the weather events, tornadoes are the most harmful with respect to population health, causing the greatest number of fatalities and the greatest number of injuries in the United States. In terms of economic consequences, floods are the weather events which cause the greatest economic loss.