Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
*National Weather Service Storm Data Documentation
*National Climatic Data Center Storm Events FAQ
p2 <- read.csv("~/repdata-data-StormData.csv")
p2.1<-p2[,c(8,22:27)]
p2.1$EVTYPE<-as.factor(p2.1$EVTYPE)
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.2.3
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
p2.1grp<-group_by(p2.1,as.factor(p2.1$EVTYPE))
p2.2grp<-p2.1grp[,c(2:8)]
The preceding code reads the storm data into R and stores it in the variable p2. Next, the data is reduced to 7 columns: EVTYPE, MAG, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, and CROPDMG. This reduced version of the data set is stored in p2.1, so that p2 remains unchanged. The EVTYPE variable is also converted to a factor so that it can be used for grouping.
Next, the “dplyr” package is loaded and the group_by() function is called. The new data set, grouped by EVTYPE, is stored in p2.1grp. The final line drops the original EVTYPE column, which duplicates the grouping column, and stores the result in p2.2grp.
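As a minimal sketch of the same reduction, the columns could also be selected by name rather than by position, which keeps the code valid if the column order of the CSV ever differs. The toy data frame `storms` and its values are made up for illustration and are not taken from the NOAA file.

```r
# Toy stand-in for the storm data (hypothetical values, not NOAA records)
storms <- data.frame(
  STATE      = c("AL", "AL", "TX", "TX"),
  EVTYPE     = c("TORNADO", "HAIL", "HAIL", "FLOOD"),
  MAG        = c(0, 75, 100, 0),
  FATALITIES = c(2, 0, 0, 1),
  INJURIES   = c(15, 0, 1, 0),
  PROPDMG    = c(25, 0, 5, 50),
  PROPDMGEXP = c("K", "", "K", "M"),
  CROPDMG    = c(0, 0, 10, 0),
  stringsAsFactors = FALSE
)

# Keep only the seven columns used in the analysis, selected by name
keep <- c("EVTYPE", "MAG", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG")
storms_small <- storms[, keep]

# Convert the event type to a factor so it can serve as a grouping variable
storms_small$EVTYPE <- as.factor(storms_small$EVTYPE)

str(storms_small)
```

Selecting by name is slightly longer than `c(8,22:27)`, but it documents which variables are kept directly in the code.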
summary(p2.2grp)
## MAG FATALITIES INJURIES
## Min. : 0.0 Min. : 0.0000 Min. : 0.0000
## 1st Qu.: 0.0 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median : 50.0 Median : 0.0000 Median : 0.0000
## Mean : 46.9 Mean : 0.0168 Mean : 0.1557
## 3rd Qu.: 75.0 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Max. :22000.0 Max. :583.0000 Max. :1700.0000
##
## PROPDMG PROPDMGEXP CROPDMG
## Min. : 0.00 :465934 Min. : 0.000
## 1st Qu.: 0.00 K :424665 1st Qu.: 0.000
## Median : 0.00 M : 11330 Median : 0.000
## Mean : 12.06 0 : 216 Mean : 1.527
## 3rd Qu.: 0.50 B : 40 3rd Qu.: 0.000
## Max. :5000.00 5 : 28 Max. :990.000
## (Other): 84
## as.factor(p2.1$EVTYPE)
## HAIL :288661
## TSTM WIND :219940
## THUNDERSTORM WIND: 82563
## TORNADO : 60652
## FLASH FLOOD : 54277
## FLOOD : 25326
## (Other) :170878
From the summary above, there appear to be extreme outliers: for most variables the first three quartiles are 0 or very low, while the maximum is far larger and heavily skews the data. This may imply that some specific event types have the potential to do far more damage than others.
Therefore further investigation is required.
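One hedged way to probe the skew noted above is to measure how many records report zero and to compare the largest single record per event type. The sketch below uses a made-up data frame `toy`; the numbers are illustrative only and do not come from the storm database.

```r
# Hypothetical records: most report zero fatalities, a few are large
toy <- data.frame(
  EVTYPE     = c("HAIL", "HAIL", "HAIL", "TORNADO", "TORNADO", "HEAT"),
  FATALITIES = c(0, 0, 0, 0, 12, 40)
)

# Share of records with no fatalities: a high value explains zero quartiles
zero_share <- mean(toy$FATALITIES == 0)

# Largest single-event fatality count per event type
max_by_type <- tapply(toy$FATALITIES, toy$EVTYPE, max)

# Median versus mean: a mean far above the median signals right skew
center <- c(median = median(toy$FATALITIES), mean = mean(toy$FATALITIES))

print(zero_share)
print(sort(max_by_type, decreasing = TRUE))
print(center)
```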
# Mean of each outcome by event type; tapply() names each result by EVTYPE level
p2.2grp_fatal_means <- tapply(X = p2.2grp$FATALITIES, INDEX = p2.2grp[,7], FUN = mean)
p2.2grp_injury_means <- tapply(X = p2.2grp$INJURIES, INDEX = p2.2grp[,7], FUN = mean)
p2.2grp_propdmg_means <- tapply(X = p2.2grp$PROPDMG, INDEX = p2.2grp[,7], FUN = mean)
p2.2grp_cropdmg_means <- tapply(X = p2.2grp$CROPDMG, INDEX = p2.2grp[,7], FUN = mean)
# Bind the four mean vectors column-wise; the row names carry the event types
bindedmeans <- cbind(p2.2grp_fatal_means, p2.2grp_injury_means, p2.2grp_propdmg_means, p2.2grp_cropdmg_means)
evnt_means_df <- as.data.frame(bindedmeans)
# Row sums: casualties (columns 1-2), economic loss (columns 3-4), and all four
evnt_avg_sum <- as.data.frame(cbind(rowSums(x = evnt_means_df[,c(1,2)]), rowSums(x = evnt_means_df[,c(3,4)]), rowSums(x = evnt_means_df)))
names(evnt_avg_sum) <- c("Average_Population_Casualties", "Average_Economic_Loss", "Average_Total")
evnt_avg_sum$EventType <- row.names(x = evnt_avg_sum)
# Rank event types by each combined mean, largest first
Economy <- arrange(evnt_avg_sum, desc(Average_Economic_Loss))
Casualties <- arrange(evnt_avg_sum, desc(Average_Population_Casualties))
The above code will do the following:
*calculate means indexed by event type
+fatality count means
+injury count means
+property damage means ($)
+crop damage means ($)
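The next step binds the mean vectors into a single data frame named evnt_means_df. Row sums then combine the fatality and injury means into Average_Population_Casualties and the property and crop damage means into Average_Economic_Loss, and arrange() sorts the event types by each of these in descending order, giving Casualties and Economy.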
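The steps above can be sketched on a small made-up data frame. This version uses base R's `aggregate()` instead of four separate `tapply()` calls, which is a substitution for illustration rather than the code used in this report; `toy`, `by_casualties`, `by_economy`, and all values are hypothetical.

```r
# Hypothetical event records (not NOAA data)
toy <- data.frame(
  EVTYPE     = c("HAIL", "HAIL", "TORNADO", "TORNADO", "FLOOD"),
  FATALITIES = c(0, 0, 2, 4, 1),
  INJURIES   = c(0, 2, 10, 20, 0),
  PROPDMG    = c(5, 15, 100, 300, 50),
  CROPDMG    = c(10, 0, 0, 0, 25)
)

# Mean of each outcome per event type, one row per type
means <- aggregate(cbind(FATALITIES, INJURIES, PROPDMG, CROPDMG) ~ EVTYPE,
                   data = toy, FUN = mean)

# Combine the means into casualty and economic totals
means$Average_Population_Casualties <- means$FATALITIES + means$INJURIES
means$Average_Economic_Loss         <- means$PROPDMG + means$CROPDMG

# Sort by each total, largest first
by_casualties <- means[order(-means$Average_Population_Casualties), ]
by_economy    <- means[order(-means$Average_Economic_Loss), ]

print(by_casualties[, c("EVTYPE", "Average_Population_Casualties")])
print(by_economy[, c("EVTYPE", "Average_Economic_Loss")])
```

Keeping the event type as an ordinary column, rather than as row names, means it survives sorting without a separate copy step.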
Now we observe the following:
library(lattice)
Casualties$EventType[1:10]->x1
Casualties$Average_Population_Casualties[1:10]->y1
names(y1)<-x1
barchart(y1,main = "Mean Combined Fatality and Injury Count (per event)",xlab = "")
Economy$EventType[1:10]->x2
Economy$Average_Economic_Loss[1:10]->y2
names(y2)<-x2
barchart(y2,main = "Combined Property and Crop Damage (per $1000)", xlab = "")
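The top-ten selection used for both charts follows the same pattern, so it could be wrapped in a small helper. The sketch below is one possible form, assuming a table that is already sorted in descending order; the function `top_named`, the data frame `ranked`, and its values are hypothetical and not part of the analysis above.

```r
library(lattice)

# Hypothetical ranked table, already sorted from largest to smallest
ranked <- data.frame(
  EventType = c("TORNADO", "FLOOD", "HAIL", "HEAT"),
  Average_Population_Casualties = c(18, 4, 1, 0.5)
)

# Take the first n rows and return the values named by event type
top_named <- function(df, value_col, n = 10) {
  top <- head(df, n)
  setNames(top[[value_col]], top$EventType)
}

y <- top_named(ranked, "Average_Population_Casualties", n = 3)

# barchart() on a named numeric vector draws one bar per name
plt <- barchart(y, main = "Top event types (toy data)", xlab = "")
print(plt)
```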