In this report, the Storm Data set from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database is explored to identify the effects of different weather events on public health and the economy. The events in the database start in 1950 and end in November 2011.
The report uses the dplyr package for data processing, so it is loaded first.
library(dplyr)
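# Read every column as character; types are converted explicitly later
rawdata <- read.csv("repdata_data_Stormdata.csv/repdata_data_Stormdata.csv",
                    header = TRUE, colClasses = "character")
# Unmodified copy reserved for the economic analysis
copyrawdata <- rawdata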
Here, the data is loaded from the working directory. The analysis is broken down into two sections, covering:

1) The effect of natural disaster events on population health across the United States.
2) The economic consequences of natural disaster events across the United States.

Because the health analysis converts column types in ‘rawdata’ in place, an unmodified copy of the raw data was saved as ‘copyrawdata’ so that each question can be processed independently.
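R data frames have copy-on-modify semantics, so the later type conversions on ‘rawdata’ leave ‘copyrawdata’ with its original character columns. The minimal sketch below illustrates this on a small hypothetical data frame; the names `raw` and `raw_copy` are illustrative and not part of the report.

```r
# Hypothetical two-row stand-in for the raw NOAA data (all columns character)
raw <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  FATALITIES = c("5", "0"),
                  stringsAsFactors = FALSE)

# Assignment creates a logical copy; R duplicates the data only when one side changes
raw_copy <- raw

# Convert a column in the original data frame only
raw$FATALITIES <- as.numeric(raw$FATALITIES)

class(raw$FATALITIES)       # "numeric"
class(raw_copy$FATALITIES)  # "character"
```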
# Convert the event type to a factor and the casualty counts to numbers
rawdata$EVTYPE <- factor(rawdata$EVTYPE)
rawdata$FATALITIES <- as.numeric(rawdata$FATALITIES)
rawdata$INJURIES <- as.numeric(rawdata$INJURIES)
# Group the records by event type
FatalitiesGroups <- rawdata %>% group_by(EVTYPE)
InjuriesGroups <- rawdata %>% group_by(EVTYPE)
# Total fatalities and total injuries for each event type
FatalitiesSummary <- FatalitiesGroups %>% summarise(Total.Fatalities = sum(FATALITIES))
InjuriesSummary <- InjuriesGroups %>% summarise(Total.Injuries = sum(INJURIES))
# Drop event types with a total of zero
FatalitiesSummary <- FatalitiesSummary[FatalitiesSummary$Total.Fatalities != 0, ]
InjuriesSummary <- InjuriesSummary[InjuriesSummary$Total.Injuries != 0, ]
# Sort each summary in descending order
InjuriesSummary <- InjuriesSummary %>% arrange(desc(Total.Injuries))
FatalitiesSummary <- FatalitiesSummary %>% arrange(desc(Total.Fatalities))
# merge() performs an inner join: only event types with non-zero
# fatalities AND non-zero injuries are kept
mergedframe <- merge.data.frame(FatalitiesSummary, InjuriesSummary, by = "EVTYPE")
finalmergedframe <- mutate(mergedframe, Injuries.and.Fatalities = Total.Injuries + Total.Fatalities)
finalmergedframe <- finalmergedframe %>% arrange(desc(Injuries.and.Fatalities))
totalframe <- finalmergedframe %>% select(EVTYPE, Injuries.and.Fatalities)
# Re-create the factor so it only has levels for the merged event types
mergedframe$EVTYPE <- factor(mergedframe$EVTYPE)
In this data processing section, the columns are converted to the correct data types and the records are grouped by event type (EVTYPE) in two data frames, one for fatalities and one for injuries. The fatalities and injuries are then summed for each event type, and event types with a total of 0 are removed. Both summaries are sorted in descending order and merged by event type; because the merge keeps only matching rows, only event types with non-zero totals for both fatalities and injuries remain. The combined total of injuries and fatalities is then added, and a data frame containing only the event type and this combined total is created, giving an ordered list of the most harmful event types.
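The grouping, summing and merging steps can also be sketched in base R on a few hypothetical records. This sketch totals both columns in one pass with `aggregate()`, then contrasts the inner join that `merge()` performs by default (as in the report) with an outer join (`all = TRUE`) that would also keep event types with only fatalities or only injuries. The toy data and variable names are assumptions for illustration only.

```r
# Hypothetical records standing in for EVTYPE, FATALITIES and INJURIES
storm <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "HAIL", "FOG"),
  FATALITIES = c(2, 1, 3, 0, 0),
  INJURIES   = c(10, 4, 0, 2, 0),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries for each event type in a single pass
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = storm, FUN = sum)

# Mirror the report: separate summaries with zero totals removed
fatal  <- totals[totals$FATALITIES != 0, c("EVTYPE", "FATALITIES")]
injury <- totals[totals$INJURIES != 0, c("EVTYPE", "INJURIES")]

# Inner join (merge() default): only event types present in both summaries
in_both <- merge(fatal, injury, by = "EVTYPE")

# Outer join: event types present in either summary; missing totals become 0
in_either <- merge(fatal, injury, by = "EVTYPE", all = TRUE)
in_either[is.na(in_either)] <- 0
in_either$TOTAL <- in_either$FATALITIES + in_either$INJURIES
ranked <- in_either[order(in_either$TOTAL, decreasing = TRUE), ]

in_both$EVTYPE  # "TORNADO"
ranked$EVTYPE   # "TORNADO" "HEAT" "HAIL"
```

With the inner join, HEAT (fatalities only) and HAIL (injuries only) disappear from the combined ranking; the outer join keeps them.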
# Keep only the event type and the damage columns for the economic analysis
DataofInterest <- copyrawdata %>% select("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
# Convert the exponent codes K, M and B to numeric multipliers.
# Digit codes pass through as their own value; any other code
# (blank, lowercase letters, symbols) cannot be coerced and becomes NA.
lettertonum <- function(columnname) {
  as.numeric(gsub("K", "1000",
                  gsub("M", "1000000", gsub("B", "1000000000", columnname))))
}
DataofInterest$PROPDMGEXP <- lettertonum(DataofInterest$PROPDMGEXP)
## Warning in lettertonum(DataofInterest$PROPDMGEXP): NAs introduced by coercion
DataofInterest$CROPDMGEXP <- lettertonum(DataofInterest$CROPDMGEXP)
## Warning in lettertonum(DataofInterest$CROPDMGEXP): NAs introduced by coercion
# Remove rows whose exponent code could not be converted
DataofInterest <- DataofInterest[!is.na(DataofInterest$PROPDMGEXP), ]
DataofInterest <- DataofInterest[!is.na(DataofInterest$CROPDMGEXP), ]
# Convert the damage values to numbers and apply the multipliers
DataofInterest$PROPDMG <- as.numeric(DataofInterest$PROPDMG)
DataofInterest$CROPDMG <- as.numeric(DataofInterest$CROPDMG)
DataofInterest$COMBINEDPROPDMG <- DataofInterest$PROPDMG * DataofInterest$PROPDMGEXP
DataofInterest$COMBINEDCROPDMG <- DataofInterest$CROPDMG * DataofInterest$CROPDMGEXP
# Drop rows where EITHER the property or the crop damage is zero
DataofInterest <- DataofInterest[!(DataofInterest$COMBINEDPROPDMG == 0 | DataofInterest$COMBINEDCROPDMG == 0), ]
DataofInterest$TotalDamage <- DataofInterest$COMBINEDPROPDMG + DataofInterest$COMBINEDCROPDMG
# Total damage per event type, sorted in descending order.
# This object is named summary; calls such as summary(model) still reach the
# base function, because R skips non-function objects when looking up a function.
DataofInterest$EVTYPE <- factor(DataofInterest$EVTYPE)
groups <- DataofInterest %>% group_by(EVTYPE)
summary <- groups %>% summarise(EventTotal = sum(TotalDamage))
summary <- summary %>% arrange(desc(EventTotal))
First, a new data frame called ‘DataofInterest’ is created, containing only EVTYPE and the property and crop damage columns (PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP).
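A function called ‘lettertonum’ was also created to convert the exponent letters to their corresponding multipliers. According to the data documentation, K, M and B denote thousands, millions and billions, and the function uses these values to convert both exponent columns. Blank entries and codes the function does not handle, such as lowercase letters and symbols, cannot be converted to numbers and become NA values (hence the coercion warnings); these rows are then removed. Next, each DMG value is multiplied by its EXP value to give the actual damage value. Rows where either the property or the crop damage is 0 are removed, and the remaining property and crop values are added to give TotalDamage.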
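As an alternative to the chained gsub() calls, the exponent conversion can be written as a named lookup vector. The sketch below uses a hypothetical helper, `exp_to_multiplier`, which is not part of the report. It applies the documented K, M and B multipliers case-insensitively, so lowercase codes such as "m" and "k" are kept rather than dropped; unlike ‘lettertonum’, it maps every other code, including digits, to NA. Which codes to keep is an analysis choice and changes which rows survive the NA filter.

```r
# Hypothetical helper: map damage exponent codes to numeric multipliers.
# Only K, M and B (in either case) are recognised; all other codes give NA.
exp_to_multiplier <- function(codes) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  # Indexing a named vector by an unknown (or empty) name returns NA
  unname(lookup[toupper(codes)])
}

codes <- c("K", "m", "B", "", "?", "5")
multipliers <- exp_to_multiplier(codes)
multipliers              # values: 1000, 1e6, 1e9, NA, NA, NA

# Number of rows the NA filter would then remove
sum(is.na(multipliers))  # 3
```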
After this, the records are grouped by event type, and a new column called EventTotal sums the total damage for each type of event into a data frame called summary. This is then sorted in descending order.
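The zero filter keeps a record only when both its property and its crop damage are non-zero, so records with only one kind of damage do not contribute to EventTotal. The sketch below contrasts that filter with one that keeps any record with non-zero total damage, then totals damage per event type with base R's `tapply()`. The toy values are hypothetical.

```r
# Hypothetical records with damage already multiplied out
dmg <- data.frame(
  EVTYPE          = c("FLOOD", "FLOOD", "HAIL", "TORNADO"),
  COMBINEDPROPDMG = c(5e6, 2e5, 0, 1e6),
  COMBINEDCROPDMG = c(1e6, 0, 3e4, 0),
  stringsAsFactors = FALSE
)
dmg$TotalDamage <- dmg$COMBINEDPROPDMG + dmg$COMBINEDCROPDMG

# Filter used in the report: drop rows where EITHER damage value is zero
both_nonzero <- dmg[!(dmg$COMBINEDPROPDMG == 0 | dmg$COMBINEDCROPDMG == 0), ]

# Alternative filter: keep any row with some damage
any_damage <- dmg[dmg$TotalDamage > 0, ]

# Total damage per event type, largest first
event_totals <- sort(tapply(any_damage$TotalDamage, any_damage$EVTYPE, sum),
                     decreasing = TRUE)

nrow(both_nonzero)  # 1 (only the first FLOOD record)
event_totals        # FLOOD 6.2e6, TORNADO 1e6, HAIL 3e4
```

Under the report's filter, HAIL and TORNADO would not appear at all in this toy example, which shows how strongly the choice of filter can shape the ranking.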
In this section the results obtained from the data processing will be used to address the effects of the natural disaster events on population health and economic welfare.
totalframe[1:5, ]
## EVTYPE Injuries.and.Fatalities
## 1 TORNADO 96979
## 2 EXCESSIVE HEAT 8428
## 3 TSTM WIND 7461
## 4 FLOOD 7259
## 5 LIGHTNING 6046
FatalitiesSummary[1:5,]
## # A tibble: 5 × 2
## EVTYPE Total.Fatalities
## <fct> <dbl>
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
InjuriesSummary[1:5,]
## # A tibble: 5 × 2
## EVTYPE Total.Injuries
## <fct> <dbl>
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
These tables show the five event types responsible for i) the most combined fatalities and injuries, ii) the most fatalities, and iii) the most injuries. Tornadoes rank first in all three lists by a wide margin.
model <- lm(log10(mergedframe$Total.Injuries) ~ log10(mergedframe$Total.Fatalities))
summary(model)
##
## Call:
## lm(formula = log10(mergedframe$Total.Injuries) ~ log10(mergedframe$Total.Fatalities))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.41451 -0.47632 -0.06813 0.55550 1.45787
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.47632 0.10042 4.743 6.71e-06 ***
## log10(mergedframe$Total.Fatalities) 1.02004 0.07017 14.537 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6379 on 104 degrees of freedom
## Multiple R-squared: 0.6702, Adjusted R-squared: 0.667
## F-statistic: 211.3 on 1 and 104 DF, p-value: < 2.2e-16
# One colour per event type; both axes use the log10 scale of the model
plot(log10(mergedframe$Total.Fatalities), log10(mergedframe$Total.Injuries), pch = 19,
     col = rainbow(length(levels(mergedframe$EVTYPE))),
     xlab = "log10( Total Fatalities )", ylab = "log10( Total Injuries )")
abline(model, col = "black", lwd = 3)
Figure 1 - Scatter plot of log10(Injuries) against log10(Fatalities) for each type of natural disaster event in the data set.
This figure examines whether there is a relationship between injuries and fatalities across event types, i.e. whether an event type that caused more injuries is also likely to have caused more fatalities. A linear regression model is fitted to the log-transformed totals, and its summary is included in the results above. The slope of about 1.02 (p < 2e-16) and adjusted R-squared of 0.667 indicate a strong positive association: roughly two-thirds of the variation in log10 injuries across event types is accounted for by log10 fatalities. Only event types with non-zero fatalities and non-zero injuries are included, since the logarithm of 0 is undefined.
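Back-transforming the fitted model makes the relationship easier to read. Using the coefficient estimates from the regression summary, with I the total injuries and F the total fatalities for an event type:

```latex
\log_{10} I = 0.47632 + 1.02004\,\log_{10} F
\quad\Longrightarrow\quad
I = 10^{0.47632}\,F^{1.02004} \approx 2.99\,F^{1.02}
```

For example, an event type with F = 100 fatalities would be predicted to have log10 I = 0.47632 + 1.02004 × 2 = 2.5164, i.e. about 330 injuries. Because the slope is close to 1, injuries grow roughly in proportion to fatalities, at about three injuries per fatality. This is an average trend fitted across event types, not a prediction for any individual event.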
summary(DataofInterest$COMBINEDCROPDMG)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.000e+01 5.000e+03 1.500e+04 1.724e+06 1.000e+05 5.000e+09
summary(DataofInterest$COMBINEDPROPDMG)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00e+01 5.00e+03 2.00e+04 1.11e+07 1.00e+05 1.15e+11
summary[1:10, ]
## # A tibble: 10 × 2
## EVTYPE EventTotal
## <fct> <dbl>
## 1 FLOOD 126044533500
## 2 HURRICANE/TYPHOON 29348117800
## 3 HURRICANE 10498188000
## 4 RIVER FLOOD 10108369000
## 5 ICE STORM 5108614000
## 6 FLASH FLOOD 4309101392
## 7 HAIL 3838339690
## 8 TORNADO 2335763950
## 9 HURRICANE OPAL 2157000000
## 10 HIGH WIND 1918571300
These show the summary statistics of the combined property and crop damage per record, followed by the 10 event types that caused the most combined property and crop damage.
hist(summary$EventTotal, col = "blue", ylim = c(0, 100),
     main = "Histogram of Total Expense for\nEach Natural Disaster Event",
     xlab = "Total Expense")
Figure 2 - Histogram of the total expense (crop damage and property damage) for all the event types.
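The damage values are heavily right-skewed: per-record crop damage has a mean of about 1.7 million against a median of 15,000, and the largest EventTotal (FLOOD) is more than four times the second largest. On a linear axis this is likely to compress most event types into the lowest bins of Figure 2. A minimal sketch of one alternative, plotting the histogram on a log10 scale, is below. The helper name `log_damage_hist` is hypothetical, and the example input is the ten EventTotal values printed above.

```r
# Hypothetical helper: histogram of per-event totals on a log10 scale
log_damage_hist <- function(totals, plot = TRUE) {
  totals <- totals[totals > 0]  # log10 is undefined for zero
  h <- hist(log10(totals), plot = FALSE)
  if (plot) {
    plot(h, col = "blue",
         main = "Histogram of log10 Total Expense\nfor Each Natural Disaster Event",
         xlab = "log10(Total Expense)")
  }
  invisible(h)
}

# The ten largest EventTotal values from the summary table
top10 <- c(126044533500, 29348117800, 10498188000, 10108369000, 5108614000,
           4309101392, 3838339690, 2335763950, 2157000000, 1918571300)
h <- log_damage_hist(top10, plot = FALSE)
h$counts
```

Applied to the full table, this would be called as log_damage_hist(summary$EventTotal).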