In this report I describe the harmful impact of weather events on (1) public health and (2) property and crops.
The hypothesis is that some types of weather events are far more harmful than others. To test this, I use data from the NOAA storm database. The data show that tornadoes are the most harmful to public health: they cause the largest combined number of injuries and fatalities.
Floods cause the most damage to property and crops. They are followed by hurricanes/typhoons, tornadoes and storm surges, and these four event types have the largest damage totals.
The data come from the U.S. National Oceanic and Atmospheric Administration (NOAA) storm database, compiled by the National Weather Service. The recorded events start in 1950 and end in November 2011.
destfile <- "RR_CP2_stormdata.csv.bz2"
# Download the compressed archive only if it is not already present.
if (!file.exists(destfile)){
fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileURL, destfile=destfile)
}
storm_data<-read.csv(destfile)
library(ggplot2)
library(plyr)
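The period covered by the data can be checked from the event begin dates. The sketch below assumes that BGN_DATE is stored as text such as "4/18/1950 0:00:00" (month/day/year followed by a time); the three strings are made up for illustration:

```r
# Made-up begin dates in the assumed BGN_DATE text format
bgn_date <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00", "6/5/1987 0:00:00")
# as.Date() parses the month/day/year part and ignores the trailing time
dates <- as.Date(bgn_date, format = "%m/%d/%Y")
# Earliest and latest date
date_range <- range(dates)
print(date_range)
```

On the full data set the same idea would read `range(as.Date(storm_data$BGN_DATE, format = "%m/%d/%Y"))`, provided the column really has that format.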
Now we will prepare data for both questions.
#### Reading and preparing data for public health impact
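# Sum injuries and fatalities per event type
sdaggfat <- aggregate(INJURIES + FATALITIES ~ EVTYPE, data = storm_data, FUN = sum)
# Sort by the summed count (a vector, hence [[2]]) and keep the top 5
t5 <- head(sdaggfat[order(-sdaggfat[[2]]), ], 5)
colnames(t5) <- c("EVTYPE", "COUNT")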
The data contain many event types, so we show only the top 5 to identify the event type with the largest combined number of injuries and fatalities.
The top 5 event types:
t5
## EVTYPE COUNT
## 834 TORNADO 96979
## 130 EXCESSIVE HEAT 8428
## 856 TSTM WIND 7461
## 170 FLOOD 7259
## 464 LIGHTNING 6046
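The aggregate-and-sort step can be checked on a tiny made-up data frame, where the expected ranking is easy to work out by hand (TORNADO 16, FLOOD 6, HEAT 5, HAIL 1):

```r
# Made-up toy records with injuries and fatalities
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD", "HEAT"),
  INJURIES   = c(10, 2, 5, 1, 0, 4),
  FATALITIES = c(1, 1, 0, 0, 3, 1)
)
# Sum injuries + fatalities per event type, as in the analysis
agg <- aggregate(INJURIES + FATALITIES ~ EVTYPE, data = toy, FUN = sum)
colnames(agg) <- c("EVTYPE", "COUNT")
# Sort in decreasing order and keep the top 3
top3 <- head(agg[order(-agg$COUNT), ], 3)
print(top3)
```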
Let's prepare the property and crop damage data to answer the second question:
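# Keep rows with any damage, and only the event type and damage columns
substorm <- storm_data[storm_data$PROPDMG > 0 | storm_data$CROPDMG > 0,
                       c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]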
This keeps only the records with non-zero property or crop damage, and only the columns needed for the analysis: the event type, the damage values and their exponents.
The magnitudes of the damage values are stored in the PROPDMGEXP and CROPDMGEXP columns. Their codes are confusing and no description of them is provided, so I mapped them according to my own interpretation. Where a code's meaning was unclear, I kept the damage value as it is (multiplier 1).
unique(substorm$PROPDMGEXP)
## [1] K M B m + 0 5 6 4 h 2 7 3 H -
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
unique(substorm$CROPDMGEXP)
## [1] M K m B ? 0 k
## Levels: ? 0 2 B k K m M
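Mapping of the exponent codes to numeric multipliers:
# Property damage: each code -> multiplier; codes with unclear meaning -> 1
mapPROPDMG <- mapvalues(substorm$PROPDMGEXP,
  c("K","M","", "B","m","+","0","5","6","?","4","2","3","h","7","H","-","1","8"),
  c(1e3,1e6, 1, 1e9,1e6, 1, 1, 1e5,1e6, 1, 1e4,1e2,1e3, 1, 1e7,1e2, 1, 10, 1e8))
# Crop damage: the same approach for the CROPDMGEXP codes
mapCROPDMG <- mapvalues(substorm$CROPDMGEXP,
  c("","M","K","m","B","?","0","k","2"),
  c( 1,1e6,1e3,1e6,1e9, 1, 1, 1e3,1e2))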
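The mapping table follows a mostly regular pattern: a digit n stands for 10^n, and the letters K, M and B stand for thousand, million and billion. As an alternative, here is a minimal rule-based sketch in base R. `exp_multiplier()` is a hypothetical helper, not part of plyr, and unlike the table above it treats "h" as hundred (the table maps "h" to 1 and "H" to 100):

```r
# Hypothetical helper: convert exponent codes to numeric multipliers.
# Assumed rule: digit n -> 10^n; H/h -> 1e2; K/k -> 1e3; M/m -> 1e6; B/b -> 1e9;
# any other code ("", "?", "+", "-") -> 1, i.e. keep the damage value as it is.
exp_multiplier <- function(code) {
  code <- toupper(as.character(code))  # works for factors and character vectors
  mult <- rep(1, length(code))
  is_digit <- grepl("^[0-9]$", code)
  mult[is_digit] <- 10^as.numeric(code[is_digit])
  letter_map <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  is_letter <- code %in% names(letter_map)
  mult[is_letter] <- letter_map[code[is_letter]]
  mult
}
print(exp_multiplier(c("K", "m", "5", "", "?", "B")))
```

Applied to the data, this would be used as `substorm$PROPDMG * exp_multiplier(substorm$PROPDMGEXP)`. A rule avoids listing every code by hand, at the cost of silently accepting codes the author might prefer to treat as doubtful.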
Finally, compute the property and crop damage in dollars separately, and their total per event type:
# Convert the mapped codes to numbers; as.character() makes this work
# whether the columns were read as factors or as character vectors
substorm$PROPTOTALDMG <- as.numeric(as.character(mapPROPDMG)) * substorm$PROPDMG
substorm$CROPTOTALDMG <- as.numeric(as.character(mapCROPDMG)) * substorm$CROPDMG
substorm$TOTALDMG <- substorm$PROPTOTALDMG + substorm$CROPTOTALDMG
# Sum the total damage per event type and sort in decreasing order
agg_dmg <- aggregate(TOTALDMG ~ EVTYPE, data = substorm, FUN = sum)
ord_agg_dmg <- agg_dmg[order(-agg_dmg$TOTALDMG), ]
head(ord_agg_dmg)
## EVTYPE TOTALDMG
## 72 FLOOD 150319678257
## 197 HURRICANE/TYPHOON 71913712800
## 354 TORNADO 57362333946
## 299 STORM SURGE 43323541000
## 116 HAIL 18761221986
## 59 FLASH FLOOD 18243991078
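The multiplier lookup, the product with the damage value and the per-event sum can be checked end to end on a few made-up records. This sketch swaps in base R's `match()` for `plyr::mapvalues()` and uses a reduced code table; both are assumptions for illustration:

```r
# Made-up toy records in the same layout as substorm
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(2.5, 100, 3, 0),
  PROPDMGEXP = c("B", "K", "M", ""),
  CROPDMG    = c(0, 0, 50, 20),
  CROPDMGEXP = c("", "", "K", "K"),
  stringsAsFactors = FALSE
)
# Reduced lookup table: exponent code -> multiplier
codes <- c("", "K", "M", "B")
mults <- c(1, 1e3, 1e6, 1e9)
# match() gives the position of each code in the lookup table
prop_mult <- mults[match(toy$PROPDMGEXP, codes)]
crop_mult <- mults[match(toy$CROPDMGEXP, codes)]
toy$TOTALDMG <- toy$PROPDMG * prop_mult + toy$CROPDMG * crop_mult
# Total damage per event type, largest first
agg <- aggregate(TOTALDMG ~ EVTYPE, data = toy, FUN = sum)
agg <- agg[order(-agg$TOTALDMG), ]
print(agg)
```

Any code missing from the lookup table yields NA, so on real data the table must list every code that `unique()` reports.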
The data are now ready for presenting the results.
The first plot shows the 5 event types most harmful to public health. Each bar is the combined total of injuries and fatalities; the two are not shown separately.
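# One fill colour per bar
jColors <- c('chartreuse3', 'cornflowerblue', 'darkgoldenrod1', 'peachpuff3',
             'mediumorchid2')
# reorder() sorts the bars from the highest to the lowest count
p1 <- ggplot(t5, aes(reorder(EVTYPE, -COUNT), COUNT)) + geom_col(fill = jColors) +
  labs(x = "Event Type", y = "Injuries + fatalities",
       title = "Top 5 event types by injuries and fatalities")
print(p1)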
TORNADO has by far the largest impact on public health (96,979 injuries and fatalities). The runner-up is EXCESSIVE HEAT (8,428).
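ggplot2 places the categories of a discrete x-axis in factor-level order, which for character data is alphabetical, so bars are not sorted by height unless the levels are set explicitly. A minimal base R sketch with made-up counts shows how `reorder()` sets the levels by decreasing value:

```r
# Made-up counts for three event types
ev <- data.frame(EVTYPE = c("FLOOD", "HEAT", "TORNADO"), COUNT = c(70, 80, 900))
# reorder() makes EVTYPE a factor whose levels are sorted by -COUNT,
# i.e. from the largest count to the smallest
ev$EVTYPE <- reorder(ev$EVTYPE, -ev$COUNT)
print(levels(ev$EVTYPE))
```

Inside a plot the same idea is written as `aes(reorder(EVTYPE, -COUNT), COUNT)`.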
The second plot shows the 5 event types that caused the most damage to property and crops. Each bar is the total damage; property and crop damage are not shown separately.
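tp5 <- head(ord_agg_dmg, 5)
# Plot damage in billions of dollars, bars sorted from highest to lowest
p2 <- ggplot(tp5, aes(reorder(EVTYPE, -TOTALDMG), TOTALDMG / 1e9)) + geom_col(fill = jColors) +
  labs(x = "Event Type", y = "Total damage (billion USD)",
       title = "Top 5 event types by property and crop damage")
print(p2)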
FLOOD has the largest impact on property and crops. The runner-up is HURRICANE/TYPHOON, followed by TORNADO.