Synopsis

A brief analysis of the storm data set obtained from the National Weather Service suggests that heat is the most harmful event type with respect to population health, while Tropical Storm Gordon has the greatest economic consequences.

The data set is large, so we extract only the columns needed for the analysis: “EVTYPE” is the event type, “FATALITIES” is the number of fatalities, “INJURIES” is the number of injuries, “PROPDMG” is the property damage, and “CROPDMG” is the crop damage caused by the event.

Data Processing

Load the data

library(dplyr)
library(ggplot2)
StormData <- read.csv("StormData.csv", stringsAsFactors = FALSE)

First we need to clean the data. The data set contains many messy records, so we keep only the first 546,500 rows.

StormData <- StormData[1: 546500, ]
#The subset of the data for fatalities and injuries
FatalityData <- select(StormData, EVTYPE, FATALITIES, INJURIES) 
#The subset of the data for economic damage
dmgData <- select(StormData, EVTYPE, PROPDMG, CROPDMG)
#Change the class of column
FatalityData$FATALITIES <- as.numeric(FatalityData$FATALITIES)
FatalityData$INJURIES <- as.numeric(FatalityData$INJURIES)
FatalityData$EVTYPE <- as.factor(FatalityData$EVTYPE)

dmgData$PROPDMG <- as.numeric(dmgData$PROPDMG)
dmgData$CROPDMG <- as.numeric(dmgData$CROPDMG)
dmgData$EVTYPE <- as.factor(dmgData$EVTYPE)

#Change the name of each column
names(FatalityData) <- c("Event_Type", "Fatalities", "Injuries")
names(dmgData) <- c("Event_Type", "Property_Damage", "Crop_Damage")

Fatalities Data Processing

To determine the most harmful event type with respect to population health, we use the fatality data set.

FatalityData <- group_by(FatalityData, Event_Type)
#Aggregate the data through summarise()
Smmry_FatalityData <- summarise(FatalityData,
                                Mean_Fatality = mean(Fatalities, na.rm = TRUE),
                                Mean_Injuries = mean(Injuries, na.rm = TRUE))

Since we only need the most harmful events, we keep only the event types whose mean fatalities exceed 4 and whose mean injuries exceed 5.

Smmry_FatalityData <- Smmry_FatalityData[Smmry_FatalityData$Mean_Fatality > 4 &
                                         Smmry_FatalityData$Mean_Injuries > 5, ]
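
The threshold filter above depends on hand-picked cutoffs. As a rough illustration of how it behaves, and of a ranking-based alternative, here is a minimal sketch on a toy table. The data frame `toy`, its values, and the names `harmful` and `top_fatal` are made up for this example and are not part of the storm data.

```r
# Toy summary table; the values are invented for illustration only.
toy <- data.frame(
  Event_Type    = c("A", "B", "C", "D"),
  Mean_Fatality = c(0.1, 6.0, 4.5, 9.0),
  Mean_Injuries = c(0.5, 2.0, 7.0, 12.0),
  stringsAsFactors = FALSE
)

# Threshold filter with the same shape as the subset used in the report:
# a row is kept only if it passes both cutoffs.
harmful <- toy[toy$Mean_Fatality > 4 & toy$Mean_Injuries > 5, ]

# Alternative: rank by mean fatalities and keep the top two rows,
# which avoids choosing a cutoff by hand.
top_fatal <- head(toy[order(toy$Mean_Fatality, decreasing = TRUE), ], 2)

print(harmful)
print(top_fatal)
```

Note that the two approaches can select different rows: here event "B" ranks second by mean fatalities but is dropped by the threshold filter because its mean injuries are low.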

Economic-Damage Data Processing

To determine the event type with the greatest economic damage, we use the damage data set and repeat the steps above, keeping the event types whose mean property damage and mean crop damage both exceed 100.

dmgData <- group_by(dmgData, Event_Type)
#Aggregate the data through summarise()
Smmry_dmgData <- summarise(dmgData, 
                           Mean_ProDmg = mean(Property_Damage, na.rm = TRUE),
                           Mean_CropDmg = mean(Crop_Damage, na.rm = TRUE))

Smmry_dmgData <- Smmry_dmgData[Smmry_dmgData$Mean_ProDmg > 100 &
                               Smmry_dmgData$Mean_CropDmg > 100, ]
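
Because the fatality and damage steps repeat the same group-and-average pattern, one possible way to avoid the duplication is a small helper. The sketch below is only an illustration: the function `mean_by_event` and the toy data frame are hypothetical, and it uses base R's `aggregate()` instead of the `group_by()`/`summarise()` calls used in this report.

```r
# Hypothetical helper: mean of each value column per group, ignoring NAs.
# Uses base R aggregate(); this is not the dplyr code used in the report.
mean_by_event <- function(df, group_col, value_cols) {
  aggregate(df[value_cols], by = df[group_col], FUN = mean, na.rm = TRUE)
}

# Toy data, invented for illustration only.
toy_dmg <- data.frame(
  Event_Type      = c("HEAT", "HEAT", "FLOOD"),
  Property_Damage = c(10, 20, 5),
  Crop_Damage     = c(NA, 4, 1),
  stringsAsFactors = FALSE
)

toy_summary <- mean_by_event(toy_dmg, "Event_Type",
                             c("Property_Damage", "Crop_Damage"))
print(toy_summary)
```

The same call could then be reused with the fatality columns by changing only the `value_cols` argument.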

Results

First let’s take a look at the data sets we obtained.

Smmry_FatalityData
## Source: local data frame [3 x 3]
## 
##              Event_Type Mean_Fatality Mean_Injuries
##                  (fctr)         (dbl)         (dbl)
## 1          EXTREME HEAT      4.363636      7.045455
## 2                  HEAT     12.206897     15.137931
## 3 TROPICAL STORM GORDON      8.000000     43.000000
Smmry_dmgData
## Source: local data frame [4 x 3]
## 
##              Event_Type Mean_ProDmg Mean_CropDmg
##                  (fctr)       (dbl)        (dbl)
## 1       HIGH WINDS/COLD    122.0000     401.0000
## 2       HURRICANE FELIX    250.0000     250.0000
## 3 TROPICAL STORM GORDON    500.0000     500.0000
## 4         WINTER STORMS    166.6667     166.6667

It seems that “HEAT” has the highest mean fatalities per event, while “TROPICAL STORM GORDON” has the highest mean injuries. A plot makes the comparison easier to see.

In addition, according to the damage data, Tropical Storm Gordon has the greatest economic consequences, with the highest mean property damage and the highest mean crop damage.

#Reshape and plot the fatalities dataframe
PlotData <- data.frame(Event_Type = tolower(rep(Smmry_FatalityData$Event_Type, 2)),
                       Mean_Data  = c(Smmry_FatalityData$Mean_Fatality, 
                                      Smmry_FatalityData$Mean_Injuries),
                       Type = factor(rep(c("Mean_Fatalities","Mean_Injuries"), each = 3)))
g <- ggplot(PlotData, aes(x = Event_Type,
                          y = Mean_Data,
                          fill = Type))
g + geom_bar(stat = "identity", position = "dodge")

#Reshape and plot the economic damage dataframe
PlotData2 <- data.frame(Event_Type = tolower(rep(Smmry_dmgData$Event_Type, 2)),
                        Mean_Data  = c(Smmry_dmgData$Mean_ProDmg,
                                       Smmry_dmgData$Mean_CropDmg),
                        Type = factor(rep(c("Mean_Property_Damage","Mean_Crop_Damage"),
                                          each = 4)))

g <- ggplot(PlotData2, aes(x = Event_Type,
                           y = Mean_Data,
                           fill = Type))
g + geom_bar(stat = "identity", position = "dodge")

The plots above show that Tropical Storm Gordon caused the greatest economic damage and the highest mean injuries, whereas heat is responsible for the highest mean fatalities.
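
The same conclusion can be cross-checked numerically rather than read off the plots. The sketch below re-enters the values printed in the Results tables by hand and picks the maximum of each column with `which.max()`. The names `fatality_summary`, `damage_summary`, and `worst` are chosen for this example only.

```r
# Values copied from the printed summary tables in the Results section.
fatality_summary <- data.frame(
  Event_Type    = c("EXTREME HEAT", "HEAT", "TROPICAL STORM GORDON"),
  Mean_Fatality = c(4.363636, 12.206897, 8.000000),
  Mean_Injuries = c(7.045455, 15.137931, 43.000000),
  stringsAsFactors = FALSE
)
damage_summary <- data.frame(
  Event_Type   = c("HIGH WINDS/COLD", "HURRICANE FELIX",
                   "TROPICAL STORM GORDON", "WINTER STORMS"),
  Mean_ProDmg  = c(122.0000, 250.0000, 500.0000, 166.6667),
  Mean_CropDmg = c(401.0000, 250.0000, 500.0000, 166.6667),
  stringsAsFactors = FALSE
)

# Event type with the largest value in each column.
worst <- c(
  fatalities = fatality_summary$Event_Type[which.max(fatality_summary$Mean_Fatality)],
  injuries   = fatality_summary$Event_Type[which.max(fatality_summary$Mean_Injuries)],
  property   = damage_summary$Event_Type[which.max(damage_summary$Mean_ProDmg)],
  crop       = damage_summary$Event_Type[which.max(damage_summary$Mean_CropDmg)]
)
print(worst)
```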