Title: “Harmful Weather Events Analysis”


The analysis is structured in three steps:
(1) Synopsis
(2) Data Processing
(3) Results

1 Synopsis

After a phase of data exploration (e.g. structure of the dataset, number of observations, factors and levels), the dataset was simplified by selecting a limited number of variables. A preliminary data transformation task is required to express the economic damages (Property and Crop) in thousands of dollars.
The observations are then grouped by event type, and the health indicators (Fatalities, Injuries) are both summed and averaged to get a full set of KPIs to analyze. In order to adopt a single indicator for “Health Harmfulness”, the correlation between injuries and fatalities was analyzed; given the linear correlation between them, the number of fatalities was chosen to rank the event types. A similar analysis was run for economic damages, where the KPI for ranking the event types was built by adding crop and property damages.

2 Data Processing

Data are downloaded, extracted and then loaded into the “stormData” data frame.
To speed up file reading, the part of the following code chunk that downloads the file is commented out. Please make sure you have placed the “storm_data.csv.bz2” file in the main directory.
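The reading step can be sketched as follows. This is a minimal sketch, not the original chunk: `load_storm_data` is a hypothetical helper name, and the download step is left out because it is not shown here. It relies on `read.csv()` reading bzip2-compressed files directly.

```r
# Hypothetical helper (not part of the original analysis): read the
# compressed CSV, by default from the working directory.
load_storm_data <- function(path = "storm_data.csv.bz2") {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  # read.csv() opens the file through file(), which detects bzip2
  # compression, so no manual extraction is needed.
  read.csv(path, stringsAsFactors = FALSE)
}
```

With the file in place, `stormData <- load_storm_data()` would create the data frame used in the rest of the analysis.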

Quick Data Exploration

The following code chunk provides a quick exploration of the data.

str(stormData)
summary(stormData)
head(stormData,5)

DataSet Reduction

The following code chunk cleans up the “stormData” data frame by removing all unnecessary variables.
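A reduction of this kind might look like the sketch below. The toy data frame stands in for the real “stormData”, and the list of retained columns is an assumption: EVTYPE, PROPDMG, PROPDMGEXP and CROPDMG are used later in this report, while FATALITIES, INJURIES, CROPDMGEXP and REMARKS are assumed column names used for illustration.

```r
# Toy stand-in for stormData; the real dataset has many more columns.
stormData <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD"),
  FATALITIES = c(3, 0),
  INJURIES   = c(10, 1),
  PROPDMG    = c(25, 5),
  PROPDMGEXP = c("K", "M"),
  CROPDMG    = c(0, 2),
  CROPDMGEXP = c("", "K"),
  REMARKS    = c("free text", "free text"),
  stringsAsFactors = FALSE
)

# Assumed set of variables needed for the health and economic analysis.
keepCols <- c("EVTYPE", "FATALITIES", "INJURIES",
              "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Keep only the selected columns; everything else is dropped.
TinystormData <- stormData[, keepCols]
```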

Data Transformation

This section harmonizes the Property Damage information (i.e. it converts the values to thousands of dollars, “.000 $”). Based on the factor levels, the multiplier corresponding to each symbol in the PROPDMGEXP column was identified using the corresponding values on this site, where single events have been compared.

The following chunk of code creates two vectors (one for damages to Property and one for damages to Crop), each holding the coefficient (in k$) to be applied to the columns “propdmg” and “cropdmg”.

# Build up a conversion table (propConvTab): exponent symbol -> coefficient in k$
listsymbol <- c("H","h","K","k","M","m","B","b","+","-","?","0","1","2","3","4","5","6","7","8","")
listcoeff <- c(1/10,1/10,1,1,1000,1000,1000000,1000000,1/1000,0,0,10/1000,10/1000,10/1000,10/1000,10/1000,10/1000,10/1000,10/1000,10/1000,0)
propConvTab <- data.frame(sym=listsymbol, coeff=listcoeff)

# Pre-allocate one coefficient vector per damage type, with one entry per observation
l <- length(TinystormData$PROPDMGEXP)
PropCoeff <- vector("numeric", l)
CropCoeff <- vector("numeric", l)
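One possible way to fill the coefficient vectors is a vectorized lookup with `match()`, sketched below on toy values. This is an illustration, not necessarily the method used in the original analysis: `lookup_coeff` is a hypothetical helper, the conversion table is abbreviated, and mapping unknown symbols to 0 is an assumption.

```r
# Abbreviated conversion table (symbol -> coefficient in k$).
propConvTab <- data.frame(
  sym   = c("H", "K", "M", "B", ""),
  coeff = c(1/10, 1, 1000, 1000000, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the coefficient for each exponent symbol.
lookup_coeff <- function(symbols, convTab) {
  coeff <- convTab$coeff[match(as.character(symbols), convTab$sym)]
  coeff[is.na(coeff)] <- 0  # assumption: symbols missing from the table count as 0
  coeff
}

# Toy observations: damage figure and its exponent symbol ("Z" is not in the table).
PROPDMG    <- c(25, 2, 10, 1, 5)
PROPDMGEXP <- c("K", "M", "", "B", "Z")

PropCoeff <- lookup_coeff(PROPDMGEXP, propConvTab)
# coefficients: 1, 1000, 0, 1000000, 0

# Damage in thousands of dollars, e.g. 2 with symbol "M" becomes 2 * 1000 = 2000 k$.
NormPropDamage <- PROPDMG * PropCoeff
```

The same helper could be applied to the crop exponent column to obtain “CropCoeff”.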

The vectors are then added to the data frame “TinystormData”.

The last step of the data transformation process multiplies the coefficients identified in the previous step by the variables “propdmg” and “cropdmg”. This step yields the value of damages to properties and crops in thousands of dollars. Property and crop damages are then added to get the overall amount of damages per event.

library(dplyr)
TinystormDataN<-cbind(TinystormData,PropCoeff,CropCoeff)
AnalysisDS<-TinystormDataN %>% 
        mutate(NormPropDamage = PROPDMG*PropCoeff,NormCropDamage=CROPDMG*CropCoeff, TotalDamage=NormPropDamage+NormCropDamage)

Data Preparation for question 1

To answer question 1, data are grouped by event type to calculate the total and average number of fatalities and injuries per event type, leaving out event types that caused no harm to the population. Event types are then ranked by total number of fatalities.
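The grouping described above can be sketched with dplyr as follows. The names `PopHealth`, `Top20PopHealth`, `TotFat`, `TotInj` and `LogTotFat` are the ones used in the Results section; the column names FATALITIES and INJURIES, the averages `AvgFat` and `AvgInj`, the base-10 logarithm and the toy data are assumptions made for illustration.

```r
library(dplyr)

# Toy stand-in for AnalysisDS; FATALITIES and INJURIES are assumed column names.
AnalysisDS <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(3, 1, 2, 0),
  INJURIES   = c(10, 4, 1, 0),
  stringsAsFactors = FALSE
)

PopHealth <- AnalysisDS %>%
  group_by(EVTYPE) %>%
  # Total and average fatalities and injuries per event type
  summarise(TotFat = sum(FATALITIES), AvgFat = mean(FATALITIES),
            TotInj = sum(INJURIES),   AvgInj = mean(INJURIES)) %>%
  # Drop event types that caused neither fatalities nor injuries
  filter(TotFat > 0 | TotInj > 0) %>%
  # Rank by total fatalities
  arrange(desc(TotFat)) %>%
  # Assumption: base-10 logarithm for the log-scale plot
  mutate(LogTotFat = log10(TotFat))

Top20PopHealth <- head(PopHealth, 20)
```

Event types with injuries but no fatalities would get a `LogTotFat` of `-Inf`; since they rank last by fatalities, they are unlikely to reach the top 20 of the full dataset.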

Data Preparation for question 2

To answer question 2, data are grouped by event type to calculate the total value of damages, leaving out event types that caused no damage to properties or crops.
Event types are then ranked by total value of damages.

library(dplyr)
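# Total damage (k$) per event type, dropping types with no damage, ranked in descending order
econHarm <- AnalysisDS %>%
        dplyr::group_by(EVTYPE) %>%
        dplyr::summarise(TotDam = sum(TotalDamage)) %>%
        dplyr::filter(TotDam != 0) %>%
        dplyr::arrange(desc(TotDam))
Top20econHarm <- head(econHarm, 20)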

3 Results

Results are presented to respond to the 2 questions raised:

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

The following diagram shows a significant correlation between fatalities and injuries; we can therefore consider it appropriate to take fatalities as a proxy for identifying the most harmful event types.

library(ggplot2)
# Scatter plot of total injuries vs. total fatalities per event type, with a linear fit
p <- ggplot(data=PopHealth, aes(x=TotInj, y=TotFat))
p <- p + geom_jitter() + stat_smooth(method="lm", se=FALSE) + coord_cartesian(xlim = c(0,10000), ylim = c(0,2000))
# Assign the titled plot back to p so that the title and axis labels are displayed
p <- p + ggtitle("Regression analysis between Injuries and Fatalities for each type of event") +
        xlab("Total Number of Injuries") + ylab("Total Number of Fatalities")
p
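The strength of the linear relationship between injuries and fatalities can also be quantified with Pearson's correlation coefficient. The sketch below uses made-up totals in place of the real “PopHealth” values, so the resulting coefficient says nothing about the actual dataset.

```r
# Toy totals per event type (illustrative values, not taken from the dataset).
PopHealthToy <- data.frame(
  TotInj = c(10, 50, 200, 1000, 5000),
  TotFat = c(1, 6, 18, 110, 480)
)

# Pearson correlation: values close to 1 indicate a strong positive linear relationship.
r <- cor(PopHealthToy$TotInj, PopHealthToy$TotFat, method = "pearson")
```

On the real data, the same call on `PopHealth$TotInj` and `PopHealth$TotFat` would give a single number to support the choice of fatalities as the ranking indicator.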

The following diagram highlights the 20 most harmful event types (most harmful in terms of fatalities).
The horizontal axis shows the logarithm of the number of fatalities, because the two most harmful event types cause fatalities on a much larger scale than the others.

p1 <- ggplot(data=Top20PopHealth,
            aes(x=LogTotFat, y=reorder(factor(EVTYPE), LogTotFat)))
p1 <- p1 + geom_point()
p1 <- p1 + ggtitle("Top 20 Harmful Type of Event for population Health") +
        xlab("Total Fatalities in log scale") + ylab("Most Harmful Event Type")
p1

Across the United States, which types of events have the greatest economic consequences?

The following diagram highlights the 20 most harmful event types in economic terms.
The horizontal axis shows the total value of damages (property plus crop, in thousands of dollars); the two most harmful event types cause damages on a much larger scale than the others.

p2 <- ggplot(data=Top20econHarm,
            aes(x=TotDam, y=reorder(factor(EVTYPE), TotDam)))
p2 <- p2 + geom_point()
p2 <- p2 + ggtitle("Top 20 Harmful Type of Event for economic consequences") +
        xlab("Total economic damages") + ylab("Most Harmful Event Type")
p2