The data analyzed here come from the National Weather Service Storm Data documentation and were obtained through the Coursera class. The data were downloaded as a bzip2 file and read directly, without first unzipping to a CSV file (R can read it!). After observing that “evtype” listed the same types of events (storms) under different names, I unified the names of the events that had harmful or economic consequences. I then created tables of the 10 most important causes of injuries and of fatalities to evaluate the harmful consequences, and a figure containing both graphics. To evaluate the economic impact, I converted the damage exponent codes to numeric multipliers, created tables of the 10 event types with the greatest property and crop damage, and produced a second figure.
Loading the data
# read.csv applies CSV-appropriate quoting and comment defaults and reads the .bz2 file directly
data <- read.csv("repdata-data-StormData.csv.bz2", header = TRUE)
Processing the data
Event types were recorded in different forms; for example, some records were labelled hurricane and others hurricane/typhoon. I attempted to unify the most important event types by converting the names to lower case, removing leading spaces, and applying the other steps shown in the chunk.
names(data) <- tolower(names(data))
# Keep evtype as character while recoding, so that assigning a new label never yields an invalid factor level
data$evtype <- tolower(as.character(data$evtype))
# Strip leading spaces
data$evtype <- sub("^ +", "", data$evtype)
tornado <- grep("tornado", data$evtype)
data$evtype[tornado] <- "tornado"
wind <- (grep("wind", data$evtype))
data$evtype[wind] <- "wind storm"
flood <- grep("flood", data$evtype)
data$evtype[flood] <- "flood"
heat <- grep("heat", data$evtype)
data$evtype[heat] <- "heat"
lightning <- grep("lightning", data$evtype)
data$evtype[lightning] <- "lightning"
hail <- grep("hail", data$evtype)
data$evtype[hail] <- "hail"
ice <- grep("ice", data$evtype)
data$evtype[ice] <- "ice storm"
winter <- grep("winter", data$evtype)
data$evtype[winter] <- "winter storm"
hurricane <- grep("hurricane", data$evtype)
data$evtype[hurricane] <- "hurricane/typhoon"
typhoon <- grep("typhoon", data$evtype)
data$evtype[typhoon] <- "hurricane/typhoon"
snow <- grep("snow", data$evtype)
data$evtype[snow] <- "snow"
rcurrent <- grep("rip current", data$evtype)
data$evtype[rcurrent] <- "rip current"
cold <- grep("cold", data$evtype)
data$evtype[cold] <- "cold"
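The recoding steps above all follow the same pattern (find a keyword, overwrite the label), so they can also be driven from a lookup table. The following is only a minimal sketch on toy labels, not the code used for the results in this report; `recode_map` is a hypothetical name and covers just a few of the keywords.

```r
# Toy event labels standing in for data$evtype (illustrative values only).
evtype <- c("TORNADO F1", " tstm wind", "Flash Flood", "hurricane erin",
            "typhoon", "drought")

# Keyword -> unified label, applied in order; `recode_map` is a hypothetical name.
recode_map <- c(
  "tornado"   = "tornado",
  "wind"      = "wind storm",
  "flood"     = "flood",
  "hurricane" = "hurricane/typhoon",
  "typhoon"   = "hurricane/typhoon"
)

# Work on a character vector so new labels are never rejected as
# invalid factor levels.
evtype <- sub("^ +", "", tolower(evtype))
for (pattern in names(recode_map)) {
  evtype[grepl(pattern, evtype, fixed = TRUE)] <- recode_map[[pattern]]
}

evtype
```

As in the chunk above, the order of the keywords matters: a label that contains two keywords is assigned to whichever keyword is applied first.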
Evaluating injuries by type of event
First, I created a table with the 10 most important causes of injuries.
injbytype <- aggregate(data$injuries, list(data$evtype), sum, na.rm=T)
names(injbytype) <- c("evtype", "suminjuries")
inj_by_type_ord <- injbytype[order(injbytype$suminjuries, decreasing=T),]
table1 <- data.frame(inj_by_type_ord[1:10,], row.names=NULL)
The next step was creating a table of the 10 most important causes of fatalities.
fatbytype <- aggregate(data$fatalities, list(data$evtype), sum, na.rm=T)
names(fatbytype) <- c("evtype", "sumfatalities")
fat_by_type_ord <- fatbytype[order(fatbytype$sumfatalities, decreasing=T),]
table2 <- data.frame(fat_by_type_ord[1:10,], row.names=NULL)
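The injuries and fatalities tables are built with the same three steps: aggregate, sort, keep the first rows. A small helper could express this once. This is a sketch under assumptions: `top_events` is a hypothetical function that is not part of the original analysis, and the data below are toy values.

```r
# Hypothetical helper: total `value` per event type, largest first, top n rows.
top_events <- function(value, evtype, n = 10) {
  totals <- aggregate(value, list(evtype = evtype), sum, na.rm = TRUE)
  names(totals)[2] <- "total"
  totals <- totals[order(totals$total, decreasing = TRUE), ]
  data.frame(head(totals, n), row.names = NULL)
}

# Toy data (illustrative values only, not taken from the storm database).
toy <- data.frame(
  evtype   = c("tornado", "heat", "tornado", "flood", "heat"),
  injuries = c(10, 3, 5, NA, 4)
)
res <- top_events(toy$injuries, toy$evtype, n = 2)
res
```

With the real data, a call such as `top_events(data$injuries, data$evtype)` would play the role of `table1`, with the second column named `total` instead of `suminjuries`.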
Finally, I created a figure with both results, which is shown in the results section.
Economic consequences
To evaluate the economic consequences, I had to convert the “exponent” variables to numbers, so I wrote a function to perform this conversion. I also created new variables with the total property damage and the total crop damage.
data$propdmgexp <- toupper(data$propdmgexp)
data$cropdmgexp <- toupper(data$cropdmgexp)
# Convert one damage exponent code to its numeric multiplier (NA if the code is not recognised)
setexp <- function(x) {
  if (x == "0") {
    x <- 1
  } else if (x == "1") {
    x <- 10
  } else if (x == "2") {
    x <- 100
  } else if (x == "3") {
    x <- 1000
  } else if (x == "K") {
    x <- 1000
  } else if (x == "4") {
    x <- 10000
  } else if (x == "5") {
    x <- 100000
  } else if (x == "6") {
    x <- 1000000
  } else if (x == "M") {
    x <- 1000000
  } else if (x == "7") {
    x <- 10000000
  } else if (x == "8") {
    x <- 100000000
  } else if (x == "B") {
    x <- 1000000000
  } else {
    x <- NA
  }
  x
}
data$propexp2 <- sapply(data[,"propdmgexp"], setexp)
data$propdmgtotal <- data$propexp2*data$propdmg
data$cropexp2 <- sapply(data[,"cropdmgexp"], setexp)
data$cropdmgtotal <- data$cropexp2*data$cropdmg
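The same conversion can be written as a vectorised lookup instead of calling a function once per row. This is a minimal sketch on toy values, assuming the same code-to-multiplier mapping as `setexp`; the names `exp_lookup`, `dmg`, and `dmgexp` are illustrative and do not appear in the analysis.

```r
# Multiplier for each exponent code, mirroring the mapping used in setexp().
exp_lookup <- c("0" = 1, "1" = 10, "2" = 1e2, "3" = 1e3, "K" = 1e3,
                "4" = 1e4, "5" = 1e5, "6" = 1e6, "M" = 1e6,
                "7" = 1e7, "8" = 1e8, "B" = 1e9)

# Toy columns standing in for propdmg / propdmgexp (illustrative values only).
dmg    <- c(25, 2.5, 1, 3, 7)
dmgexp <- c("K", "m", "B", "", "?")

# match() converts the whole column at once; codes that are not in the
# table (here "" and "?") give NA, as they do with setexp().
multiplier <- unname(exp_lookup[match(toupper(dmgexp), names(exp_lookup))])
total <- dmg * multiplier
total
```

Note that, as with `setexp`, rows whose exponent is blank or unrecognised get an `NA` total and are therefore dropped by the later sums with `na.rm=T`.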
Next, I created a table of property damage by event type.
totalcostpropdmg <- aggregate(data$propdmgtotal, list(data$evtype), sum, na.rm=T)
names(totalcostpropdmg) <- c("evtype", "sumtotalcost")
cost_prop_ord <- totalcostpropdmg[order(totalcostpropdmg$sumtotalcost, decreasing=T),]
table3 <- data.frame(cost_prop_ord[1:10,], row.names=NULL)
I also created a table of crop damage by event type.
totalcostcropdmg <- aggregate(data$cropdmgtotal, list(data$evtype), sum, na.rm=T)
names(totalcostcropdmg) <- c("evtype", "sumtotalcost")
cost_crop_ord <- totalcostcropdmg[order(totalcostcropdmg$sumtotalcost, decreasing=T),]
table4 <- data.frame(cost_crop_ord[1:10,], row.names=NULL)
Figure 2, shown in the results section, was created from these tables.
The first question was:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Table 1 shows the 10 event types that caused the most injuries. The leading cause of injuries was tornadoes.
Table 1. Sum of injuries by type of event.
table1edited <- table1
names(table1edited) <- c("Type of event", "Sum of injuries")
table1edited
##        Type of event Sum of injuries
## 1            tornado           91407
## 2         wind storm           11498
## 3               heat            9224
## 4              flood            8604
## 5          lightning            5232
## 6          ice storm            2164
## 7       winter storm            1876
## 8               hail            1371
## 9  hurricane/typhoon            1333
## 10              snow            1111
Table 2 shows that tornadoes are also the most important cause of death.
Table 2. Sum of fatalities by type of event.
table2edited <- table2
names(table2edited) <- c("Type of event", "Sum of fatalities")
table2edited
##    Type of event Sum of fatalities
## 1        tornado              5661
## 2           heat              3138
## 3          flood              1525
## 4     wind storm              1426
## 5      lightning               817
## 6    rip current               577
## 7   winter storm               277
## 8      avalanche               224
## 9           cold               215
## 10          snow               159
Figure 1 presents the previous results graphically.
par(mar= c(5.1,9,4.1,2.1))
par(mfcol=c(1,2))
grap1names <- as.factor(table1[1:10,1])
barplot(table1$suminjuries, names.arg= grap1names, las=2, horiz=T, col= rainbow(10), main= "Fig 1A. Injuries")
grap2names <- as.factor(table2[1:10,1])
barplot(table2$sumfatalities, names.arg= grap2names, las=2, horiz=T, col= rainbow(10), xlim= c(0,6000), main= "Fig 1B. Fatalities")
Figure 1. Panel 1A shows the sum of injuries and panel 1B shows the sum of fatalities.
The second question was:
Across the United States, which types of events have the greatest economic consequences?
Table 3 shows the 10 event types with the highest property damage costs, in US dollars.
table3edited <- table3
names(table3edited) <- c("Type of event", "Total property damage (USD)")
table3edited
##        Type of event Total property damage (USD)
## 1              flood                   1.682e+11
## 2  hurricane/typhoon                   8.526e+10
## 3            tornado                   5.860e+10
## 4        storm surge                   4.332e+10
## 5         wind storm                   1.635e+10
## 6               hail                   1.598e+10
## 7     tropical storm                   7.704e+09
## 8       winter storm                   6.717e+09
## 9           wildfire                   4.765e+09
## 10  storm surge/tide                   4.641e+09
Table 4 shows the 10 event types with the highest crop damage costs, in US dollars.
table4edited <- table4
names(table4edited) <- c("Type of event", "Total crop damage (USD)")
table4edited
##        Type of event Total crop damage (USD)
## 1            drought               1.397e+10
## 2              flood               1.227e+10
## 3  hurricane/typhoon               5.506e+09
## 4          ice storm               5.022e+09
## 5               hail               3.047e+09
## 6         wind storm               2.157e+09
## 7               cold               1.409e+09
## 8       frost/freeze               1.094e+09
## 9               heat               9.045e+08
## 10        heavy rain               7.334e+08
Figure 2 presents the results of the previous tables graphically.
par(mar= c(5.1,9,4.1,2.1))
par(mfcol=c(1,2))
grap3names <- as.factor(table3[1:10,1])
barplot(table3$sumtotalcost, names.arg= grap3names, las=2, horiz=T, col= rainbow(10), main= "A. Property damage (USD)")
grap4names <- as.factor(table4[1:10,1])
barplot(table4$sumtotalcost, names.arg= grap4names, las=2, horiz=T, col= rainbow(10), main= "B. Crop damage (USD)")
Figure 2. Panel A shows the costs related to property damage and panel B shows the costs related to crop damage.
These results identify the 10 event types on which efforts to prevent health and economic damage should focus.