The data is analysed by first reading it into R as a data frame. To analyse the damage to human health, the data frame is summarised by event type to give the total fatalities and injuries for each event type. Event types with comparatively little damage are then filtered out to make the final plots more readable. A metric for the total damage to human health is also calculated as a weighted sum of fatalities and injuries, in which a fatality counts twice as much as an injury. The total financial damage is calculated by first expanding the damage values using their exponent codes (e.g. a value of 2.5 with exponent k becomes 2500) before summarising them in a similar way to the human health data. In this case the metric for total damage is simply the sum of the property and crop damage. Both datasets are also melted into long format before plotting, which makes them easier to use with plotting libraries (in this case ggplot2). Finally, the datasets are arranged in descending order of total damage to make them easier to read when presented in tabular format.
library(dplyr)
library(ggplot2)
library(reshape2)

Reads the dataset into R.

setwd("C:/Users/gargi roy/Documents/Coursera/Reproducible/course_project_2")
rawData <- read.csv("./repdata_data_StormData.csv", sep = ",")
rawData <- rawData %>% group_by(EVTYPE)
Summarises the relevant variables (FATALITIES and INJURIES) by event type to make them easier to plot. Also filters out event types with insignificantly small totals (the "noise") to make the plot more readable.
summarisedData <- rawData %>%
  group_by(EVTYPE) %>%
  summarise(totalFatalities = sum(FATALITIES), totalInjuries = sum(INJURIES))
humanDamageData <- summarisedData %>%
  filter(totalFatalities>10, totalInjuries>10)
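As a rough illustration of what the summarise-then-filter step does, here is a minimal sketch on made-up numbers. Only the column names mirror the storm dataset; the data frame `toy` and its values are invented for the example.

```r
library(dplyr)

# Made-up records; only the column names mirror the storm dataset
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HAIL", "FLOOD", "FLOOD"),
  FATALITIES = c(8, 7, 0, 12, 3),
  INJURIES   = c(40, 25, 2, 9, 1)
)

# One row per event type, holding the summed fatalities and injuries
toySummary <- toy %>%
  group_by(EVTYPE) %>%
  summarise(totalFatalities = sum(FATALITIES),
            totalInjuries   = sum(INJURIES))

# Both conditions must hold, so an event type with many fatalities but
# too few injuries (or the reverse) is dropped as well
toyFiltered <- toySummary %>%
  filter(totalFatalities > 10, totalInjuries > 10)

print(toyFiltered)
```

In this toy data FLOOD has 15 fatalities but exactly 10 injuries, so it fails the strict `> 10` test on injuries and only TORNADO survives. The same effect applies to the real data: an event type must exceed both thresholds to appear in the plots.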
Estimates the total damage to human health as a function of totalFatalities and totalInjuries. I chose to weight them such that a fatality is worth twice as much as an injury.
humanDamageData <- humanDamageData %>%
  mutate(totalHumanDamage = 2*totalFatalities + totalInjuries) %>%
  arrange(desc(totalHumanDamage))

Calculates the total property and crop damage using the PROPDMGEXP and CROPDMGEXP columns before filtering out datapoints with low values to make the plot more readable.
propDamageData <- rawData %>%
  mutate(completePropDamage = ifelse(tolower(PROPDMGEXP) == "k", PROPDMG * 1000,
                              ifelse(tolower(PROPDMGEXP) == "m", PROPDMG * 1000000,
                              ifelse(tolower(PROPDMGEXP) == "b", PROPDMG * 1000000000, PROPDMG)))) %>%
  mutate(completeCropDamage = ifelse(tolower(CROPDMGEXP) == "k", CROPDMG * 1000,
                              ifelse(tolower(CROPDMGEXP) == "m", CROPDMG * 1000000,
                              ifelse(tolower(CROPDMGEXP) == "b", CROPDMG * 1000000000, CROPDMG)))) %>%
  select(EVTYPE, completePropDamage, completeCropDamage) %>%
  mutate(totalDamage = completePropDamage + completeCropDamage) %>%
  filter(completePropDamage > 1000000, completeCropDamage > 1000000)
propDamageData <- propDamageData %>%
  group_by(EVTYPE) %>%
  summarise(totalPropDamage = sum(completePropDamage),
            totalCropDamage = sum(completeCropDamage),
            totalDamage = sum(totalDamage))
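The nested `ifelse()` calls above map an exponent code to a multiplier. The same mapping can be sketched with a named lookup vector, which may be easier to read and extend. The helper name `expandDamage` is hypothetical and not part of the original analysis; like the code above, it leaves a value unchanged when the exponent is anything other than k, m or b.

```r
# Hypothetical helper: expands a damage value using its exponent code.
# Only k, m and b are recognised (case-insensitively); any other code
# falls back to a multiplier of 1, matching the final ifelse() branch.
expandDamage <- function(value, exponent) {
  multipliers <- c(k = 1e3, m = 1e6, b = 1e9)
  factorFor <- multipliers[tolower(as.character(exponent))]
  factorFor[is.na(factorFor)] <- 1
  value * unname(factorFor)
}

# 2.5 with exponent "K" becomes 2500; an empty exponent leaves 7 as 7
expandDamage(c(2.5, 10, 3, 7), c("K", "m", "B", ""))
```

Inside the pipeline this could be used as `mutate(completePropDamage = expandDamage(PROPDMG, PROPDMGEXP))`, assuming the columns have the types that `read.csv` produces for this file.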
Plots only the fatality and injury data to help visualise any overall trends.
##Melts the data to make it a long dataset (which can be plotted in ggplot2)
fatalityAndInjuryData <- melt(humanDamageData %>% select(EVTYPE, totalFatalities, totalInjuries), id="EVTYPE")
g <- ggplot(fatalityAndInjuryData, aes(EVTYPE, value)) +
  facet_grid(variable~., scales="free") +
  geom_point() +
  theme(axis.text.x = element_text(angle=90, hjust=1)) +
  ggtitle("Plot showing the total number of fatalities and injuries for different events")
print(g)
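For readers unfamiliar with melting, the following sketch shows what `melt()` does to a small wide table. The data frame `wide` and its values are made up for the example; only the column names follow the ones used above.

```r
library(reshape2)

# Made-up wide table: one row per event type, one column per measure
wide <- data.frame(
  EVTYPE          = c("TORNADO", "FLOOD"),
  totalFatalities = c(15, 12),
  totalInjuries   = c(65, 30)
)

# Long format: one row per (event type, measure) pair, with the measure
# name in `variable` and its number in `value`
long <- melt(wide, id.vars = "EVTYPE")
print(long)
```

The result has four rows and the columns `EVTYPE`, `variable` and `value`. This is the shape that `facet_grid(variable~.)` relies on to draw one panel per measure.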
Plots the earlier calculated metric for total human health damage. This helps to visualise the data from the two earlier plots and any trends between them.
g <- ggplot(humanDamageData, aes(EVTYPE, totalHumanDamage)) +
  geom_point() +
  theme(axis.text.x = element_text(angle=90, hjust=1)) +
  ggtitle("Plot showing the total human damage for different events (based on the earlier calculated metric)")
print(g)
Displays the 10 events which are most damaging to human health to help inform any decision making (by providing precise information)
head(arrange(humanDamageData, desc(totalHumanDamage)), 10)
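The ranking above depends on the chosen 2:1 weighting of fatalities to injuries. As a minimal sketch of that sensitivity, the following compares the ordering under two weights on made-up totals. The function `rankByHumanDamage` and the data frame `toyTotals` are hypothetical and exist only for this illustration.

```r
# Made-up totals for two fictional event types
toyTotals <- data.frame(
  EVTYPE          = c("A", "B"),
  totalFatalities = c(100, 10),
  totalInjuries   = c(50, 200)
)

# Hypothetical helper: returns event types ordered by the weighted metric
# fatalityWeight * totalFatalities + totalInjuries, largest first
rankByHumanDamage <- function(data, fatalityWeight) {
  score <- fatalityWeight * data$totalFatalities + data$totalInjuries
  as.character(data$EVTYPE[order(score, decreasing = TRUE)])
}

rankByHumanDamage(toyTotals, 2)  # weighting used in this report: A scores 250, B scores 220
rankByHumanDamage(toyTotals, 1)  # equal weighting: A scores 150, B scores 210
```

With a weight of 2 event A ranks first, while with equal weighting event B does, so the top-10 table should be read with the weighting in mind.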
####Property Damage
Plots the property and crop damage data side by side to help visualise any overall trends
##Melts the data to make it a long dataset (which can be plotted in ggplot2)
meltedDamageData <- melt(propDamageData %>% select(-totalDamage), id="EVTYPE")
g <- ggplot(meltedDamageData, aes(EVTYPE, value)) +
  facet_grid(variable~., scales="free") +
  geom_point() +
  theme(axis.text.x = element_text(angle=90, hjust=1)) +
  ggtitle("Plot showing the total property and crop damage for different events")
print(g)
Displays the 10 events which cause the most overall financial impact to help inform any decision making (by providing precise information)
head(arrange(propDamageData, desc(totalDamage), desc(totalPropDamage), desc(totalCropDamage)),10)
Based on the above plots, we find that flood and hurricane/typhoon cause the most property damage, while drought and flood cause the most crop damage, in the United States from 1995 to 2011.
From these data, we find that excessive heat and tornado are most harmful with respect to population health, while flood, drought, and hurricane/typhoon have the greatest economic consequences.