From a brief analysis of the storm data set published by the National Weather Service, we conclude that heat is the most harmful event type with respect to population health, while Tropical Storm Gordon has the greatest economic consequences (both judged by averages per event).
The data set is fairly large, so we extract only the columns needed for the analysis: “EVTYPE” (the event type), “FATALITIES” (the number of deaths), “INJURIES” (the number of injuries), “PROPDMG” (property damage) and “CROPDMG” (crop damage) caused by each event.
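As a rough illustration of this column selection, the sketch below uses a tiny hypothetical data frame. Only the five column names come from the storm data; the rows, the values and the extra `OTHER` column are invented for the example.

```r
# Hypothetical toy data standing in for StormData.csv; all values are made up.
storm_toy <- data.frame(
  EVTYPE     = c("EVENT_A", "EVENT_B", "EVENT_A"),
  FATALITIES = c(3, 0, 5),
  INJURIES   = c(10, 2, 4),
  PROPDMG    = c(0, 25, 0),
  CROPDMG    = c(0, 0, 1),
  OTHER      = c("x", "y", "z"),   # hypothetical column we do not need
  stringsAsFactors = FALSE
)

# Keep only the columns needed for the analysis (base R equivalent of select()).
needed <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")
storm_small <- storm_toy[, needed]
str(storm_small)
```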
library(dplyr)
library(ggplot2)
# Read the raw storm data, keeping text columns as character rather than factor
StormData <- read.csv("StormData.csv", stringsAsFactors = FALSE)
First, the data need cleaning. The file contains a large amount of messy records, so we restrict the analysis to its first 546,500 rows.
StormData <- StormData[1:546500, ]
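One caveat with a fixed row range such as `[1:546500, ]`: in R, asking for rows beyond `nrow()` does not raise an error but appends rows filled with `NA`. The sketch below demonstrates this on a small hypothetical data frame and shows a guarded alternative; the helper `keep_first` is hypothetical and not part of the original analysis.

```r
# Hypothetical 5-row data frame standing in for StormData.
df <- data.frame(x = 1:5)

# Indexing past the last row silently pads the result with NA rows.
padded <- df[1:8, , drop = FALSE]
print(nrow(padded))           # 8
print(sum(is.na(padded$x)))   # 3

# Hypothetical guarded helper: keeps at most n rows and never pads.
keep_first <- function(data, n) data[seq_len(min(n, nrow(data))), , drop = FALSE]
safe <- keep_first(df, 8)
print(nrow(safe))             # 5
```

`head(StormData, 546500)` behaves like the guarded helper: it never returns more rows than the data frame has.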
# Subset: event type with fatality and injury counts
FatalityData <- select(StormData, EVTYPE, FATALITIES, INJURIES)
# Subset: event type with property and crop damage
dmgData <- select(StormData, EVTYPE, PROPDMG, CROPDMG)
# Convert each column to the appropriate class
FatalityData$FATALITIES <- as.numeric(FatalityData$FATALITIES)
FatalityData$INJURIES <- as.numeric(FatalityData$INJURIES)
FatalityData$EVTYPE <- as.factor(FatalityData$EVTYPE)
dmgData$PROPDMG <- as.numeric(dmgData$PROPDMG)
dmgData$CROPDMG <- as.numeric(dmgData$CROPDMG)
dmgData$EVTYPE <- as.factor(dmgData$EVTYPE)
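The conversions above rely on `as.numeric()`. A minimal sketch of how it treats messy input, using hypothetical values: anything it cannot parse becomes `NA`, which the later `mean(..., na.rm = TRUE)` calls then skip. It also shows why reading the file with `stringsAsFactors = FALSE` matters, since `as.numeric()` on a factor returns the internal level codes instead of the values.

```r
# Hypothetical raw values, as a messy text column might look after read.csv().
raw <- c("3", "0", "12", "unknown")

# as.numeric() turns anything it cannot parse into NA (normally with a warning).
converted <- suppressWarnings(as.numeric(raw))
print(converted)                      # 3 0 12 NA

# na.rm = TRUE, as used in the summaries, skips those NAs.
print(mean(converted, na.rm = TRUE))  # 5

# Pitfall if a column had been read as a factor: as.numeric() gives level codes.
f <- factor(c("10", "5"))
print(as.numeric(f))                  # 1 2 (level codes, not the values)
print(as.numeric(as.character(f)))    # 10 5
```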
# Give the columns descriptive names
names(FatalityData) <- c("Event_Type", "Fatalities", "Injuries")
names(dmgData) <- c("Event_Type", "Property_Damage", "Crop_Damage")
To find the event type most harmful to population health, we use the fatality data (FatalityData), first grouping it by event type.
FatalityData <- group_by(FatalityData, Event_Type)
# Mean fatalities and injuries per event type, ignoring missing values
Smmry_FatalityData <- summarise(FatalityData,
                                Mean_Fatality = mean(Fatalities, na.rm = TRUE),
                                Mean_Injuries = mean(Injuries, na.rm = TRUE))
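For readers who prefer base R, the same per-event-type means can be sketched with `tapply()`. This is an equivalent alternative, not the method used above, and the toy data are invented. It also shows the effect of `na.rm = TRUE`: the missing injury value is skipped instead of turning the group mean into `NA`.

```r
# Hypothetical toy fatality data; all values are invented.
toy <- data.frame(
  Event_Type = c("HEAT", "HEAT", "TORNADO", "TORNADO", "TORNADO"),
  Fatalities = c(10, 14, 1, 0, 2),
  Injuries   = c(20, NA, 5, 3, 4),
  stringsAsFactors = FALSE
)

# Per-group means, skipping NAs (analogous to group_by() + summarise()).
mean_fat <- tapply(toy$Fatalities, toy$Event_Type, mean, na.rm = TRUE)
mean_inj <- tapply(toy$Injuries, toy$Event_Type, mean, na.rm = TRUE)

smmry <- data.frame(
  Event_Type    = names(mean_fat),
  Mean_Fatality = as.vector(mean_fat),
  Mean_Injuries = as.vector(mean_inj[names(mean_fat)]),
  stringsAsFactors = FALSE
)
print(smmry)
```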
Since we only need the most harmful events, we keep only the event types whose mean fatalities per event exceed 4 and whose mean injuries per event exceed 5.
Smmry_FatalityData <- Smmry_FatalityData[Smmry_FatalityData$Mean_Fatality > 4 &
                                         Smmry_FatalityData$Mean_Injuries > 5, ]
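Two details of this filter are worth seeing on a small example with hypothetical event names and values. First, `&` requires both thresholds, so an event with many deaths but few injuries is dropped. Second, a group whose values were all `NA` gets a mean of `NaN`; the comparison then yields `NA`, and bracket indexing turns that into an all-`NA` row, whereas `subset()` treats `NA` as false.

```r
# Hypothetical per-event summary; names and values are invented.
smmry <- data.frame(
  Event_Type    = c("EVENT_A", "EVENT_B", "EVENT_C", "EVENT_D"),
  Mean_Fatality = c(12, 0.1, 6, NaN),   # EVENT_D: all inputs were NA
  Mean_Injuries = c(15, 0.3, 2, 9),
  stringsAsFactors = FALSE
)

cond <- smmry$Mean_Fatality > 4 & smmry$Mean_Injuries > 5
print(cond)                  # TRUE FALSE FALSE NA (EVENT_C fails the injury test)

bracket <- smmry[cond, ]
print(nrow(bracket))         # 2: EVENT_A plus an all-NA row

kept <- subset(smmry, Mean_Fatality > 4 & Mean_Injuries > 5)
print(kept$Event_Type)       # "EVENT_A"
```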
To find the event type with the greatest economic consequences, we use the damage data (dmgData) and repeat the same grouping, summarising and filtering steps.
dmgData <- group_by(dmgData, Event_Type)
# Mean property and crop damage per event type, ignoring missing values
Smmry_dmgData <- summarise(dmgData,
                           Mean_ProDmg = mean(Property_Damage, na.rm = TRUE),
                           Mean_CropDmg = mean(Crop_Damage, na.rm = TRUE))
Smmry_dmgData <- Smmry_dmgData[Smmry_dmgData$Mean_ProDmg > 100 &
                               Smmry_dmgData$Mean_CropDmg > 100, ]
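Because the damage summary repeats the fatality code almost line for line, one option is to wrap the "group, average, filter" pattern in a function. The sketch below is a hypothetical base R helper (`summarise_and_filter` is not part of the original analysis) that keeps only the groups whose every mean exceeds its threshold; the toy damage data are invented.

```r
# Hypothetical helper: per-group means of `cols`, then keep the groups whose
# mean for every column exceeds the matching threshold.
summarise_and_filter <- function(data, group, cols, thresholds) {
  groups <- sort(unique(data[[group]]))
  means <- lapply(cols, function(col) {
    m <- tapply(data[[col]], data[[group]], mean, na.rm = TRUE)
    as.vector(m[groups])               # align by group name
  })
  names(means) <- cols
  out <- data.frame(Group = groups, means, stringsAsFactors = FALSE)
  # Combine one logical vector per column with AND; which() drops NA results.
  keep <- Reduce(`&`, Map(function(col, thr) out[[col]] > thr, cols, thresholds))
  out[which(keep), , drop = FALSE]
}

# Invented toy damage data.
dmg_toy <- data.frame(
  Event_Type      = c("A", "A", "B", "C"),
  Property_Damage = c(200, 400, 50, 150),
  Crop_Damage     = c(150, 250, 500, 120),
  stringsAsFactors = FALSE
)
res <- summarise_and_filter(dmg_toy, "Event_Type",
                            c("Property_Damage", "Crop_Damage"), c(100, 100))
print(res)   # groups A (means 300, 200) and C (means 150, 120)
```

The same call with the fatality columns and thresholds `c(4, 5)` would mirror the earlier filter.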
First, let’s take a look at the two summaries we obtained.
Smmry_FatalityData
## Source: local data frame [3 x 3]
##
## Event_Type Mean_Fatality Mean_Injuries
## (fctr) (dbl) (dbl)
## 1 EXTREME HEAT 4.363636 7.045455
## 2 HEAT 12.206897 15.137931
## 3 TROPICAL STORM GORDON 8.000000 43.000000
Smmry_dmgData
## Source: local data frame [4 x 3]
##
## Event_Type Mean_ProDmg Mean_CropDmg
## (fctr) (dbl) (dbl)
## 1 HIGH WINDS/COLD 122.0000 401.0000
## 2 HURRICANE FELIX 250.0000 250.0000
## 3 TROPICAL STORM GORDON 500.0000 500.0000
## 4 WINTER STORMS 166.6667 166.6667
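Among the event types that pass the filter, “HEAT” has the highest mean number of fatalities per event (about 12.2), while “TROPICAL STORM GORDON” has the highest mean number of injuries (43). A plot makes the comparison easier to see.
The damage summary also shows that Tropical Storm Gordon has both the largest mean property damage and the largest mean crop damage (500 each).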
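These readings can be checked programmatically. The sketch below copies the values printed in the two summaries above into plain data frames and uses `which.max()` to find the top event type for each measure; the combined damage column is simply the sum of the two means, added here for illustration.

```r
# Values copied from the printed summaries above.
fatal <- data.frame(
  Event_Type    = c("EXTREME HEAT", "HEAT", "TROPICAL STORM GORDON"),
  Mean_Fatality = c(4.363636, 12.206897, 8.000000),
  Mean_Injuries = c(7.045455, 15.137931, 43.000000),
  stringsAsFactors = FALSE
)
dmg <- data.frame(
  Event_Type   = c("HIGH WINDS/COLD", "HURRICANE FELIX",
                   "TROPICAL STORM GORDON", "WINTER STORMS"),
  Mean_ProDmg  = c(122, 250, 500, 166.6667),
  Mean_CropDmg = c(401, 250, 500, 166.6667),
  stringsAsFactors = FALSE
)

top_fatal  <- fatal$Event_Type[which.max(fatal$Mean_Fatality)]  # "HEAT"
top_injury <- fatal$Event_Type[which.max(fatal$Mean_Injuries)]  # "TROPICAL STORM GORDON"

# Sum of the two mean damage columns per event type.
dmg$Total <- dmg$Mean_ProDmg + dmg$Mean_CropDmg
top_damage <- dmg$Event_Type[which.max(dmg$Total)]              # "TROPICAL STORM GORDON"
print(c(top_fatal, top_injury, top_damage))
```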
# Reshape the fatality summary to long format and plot it
n_fat <- nrow(Smmry_FatalityData)
PlotData <- data.frame(Event_Type = tolower(rep(Smmry_FatalityData$Event_Type, 2)),
                       Mean_Data = c(Smmry_FatalityData$Mean_Fatality,
                                     Smmry_FatalityData$Mean_Injuries),
                       Type = factor(rep(c("Mean_Fatalities", "Mean_Injuries"),
                                         each = n_fat)))
g <- ggplot(PlotData, aes(x = Event_Type, y = Mean_Data, fill = Type))
g + geom_bar(stat = "identity", position = "dodge")
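The reshaping step stacks the chosen columns one under another and labels each block with its column name. A minimal base R sketch of that wide-to-long step is below; `to_long` is a hypothetical helper, and the input reuses the values printed in the fatality summary. Deriving the repeat count from `nrow()` keeps the labels aligned with the values however many event types survive the filter.

```r
# Hypothetical helper: stack `value_cols` into one column, labelled by Type.
to_long <- function(df, id, value_cols) {
  n <- nrow(df)
  data.frame(
    Event_Type = tolower(rep(df[[id]], times = length(value_cols))),
    Mean_Data  = unlist(df[value_cols], use.names = FALSE),  # columns in order
    Type       = factor(rep(value_cols, each = n), levels = value_cols),
    stringsAsFactors = FALSE
  )
}

# Values copied from the printed fatality summary.
fatal <- data.frame(
  Event_Type    = c("EXTREME HEAT", "HEAT", "TROPICAL STORM GORDON"),
  Mean_Fatality = c(4.363636, 12.206897, 8.000000),
  Mean_Injuries = c(7.045455, 15.137931, 43.000000),
  stringsAsFactors = FALSE
)
long <- to_long(fatal, "Event_Type", c("Mean_Fatality", "Mean_Injuries"))
print(long)   # 6 rows: 3 fatality means followed by 3 injury means
```

Since `[[` and `[` behave the same on the tibbles returned by `summarise()`, the helper should also accept `Smmry_FatalityData` or `Smmry_dmgData` directly, with the result passed to the same `ggplot()` call.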
# Reshape the damage summary to long format and plot it
n_dmg <- nrow(Smmry_dmgData)
PlotData2 <- data.frame(Event_Type = tolower(rep(Smmry_dmgData$Event_Type, 2)),
                        Mean_Data = c(Smmry_dmgData$Mean_ProDmg,
                                      Smmry_dmgData$Mean_CropDmg),
                        Type = factor(rep(c("Mean_Property_Damage", "Mean_Crop_Damage"),
                                          each = n_dmg)))
g <- ggplot(PlotData2, aes(x = Event_Type, y = Mean_Data, fill = Type))
g + geom_bar(stat = "identity", position = "dodge")
The plots show that Tropical Storm Gordon causes the most economic damage and the most injuries per event, whereas heat is responsible for the most fatalities per event.