The storm data provided by NOAA was analyzed to determine the health and economic consequences of severe weather events. The data is categorized by event type, such as “Tornado” and “Hail”. Fatalities and injuries were combined into a composite index in which injuries were weighted by 0.25. Property and crop damage were likewise combined, by simple summation, into a total damage measure. Tornadoes were found to be the most harmful event type, with the largest negative health consequences and the largest total economic damage. Hail caused the most crop damage, although tornadoes caused the higher total economic damage.
Reading in the data and exploring it
setwd("F:/RR/Proj2")
dat <- read.csv("repdata-data-StormData.csv")
str(dat)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436774 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
Aggregating FATALITIES and INJURIES to answer the relevant questions
pop_health_r <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = dat, FUN = sum)
Removing rows where both fatalities and injuries are 0
pop_health <- pop_health_r[pop_health_r$FATALITIES != 0 | pop_health_r$INJURIES != 0, ]
Create a composite pop_health_effect measure by adding fatalities and injuries, with injuries weighted by 0.25
pop_health$pop_health_effect <- pop_health$FATALITIES + 0.25 * pop_health$INJURIES
Sort by decreasing measure
pop_health <- pop_health[order(pop_health$pop_health_effect, decreasing = TRUE),]
Subset the 10 most significant EVTYPE categories as ranked by the measure
pop_health_sig <- pop_health[1:10, ]
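The aggregation, filtering, weighting, and ranking steps above can be checked by hand on a small data frame. The following is a minimal sketch: the `toy` data and its values are invented for illustration and are not taken from the NOAA file, and `top_n` is a hypothetical name for the cutoff.

```r
# Toy data: invented values, not from the NOAA dataset.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HAIL", "FLOOD", "FOG"),
  FATALITIES = c(2, 1, 0, 2, 0),
  INJURIES   = c(10, 6, 4, 0, 0)
)

# Sum fatalities and injuries per event type.
toy_health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE,
                        data = toy, FUN = sum)

# Drop event types with no recorded fatalities or injuries.
toy_health <- toy_health[toy_health$FATALITIES != 0 |
                           toy_health$INJURIES != 0, ]

# Composite index: fatalities plus injuries weighted by 0.25.
toy_health$pop_health_effect <-
  toy_health$FATALITIES + 0.25 * toy_health$INJURIES

# Rank by the index and keep the top entries.
# head() also behaves sensibly when fewer than top_n rows remain.
toy_health <- toy_health[order(toy_health$pop_health_effect,
                               decreasing = TRUE), ]
top_n <- 2  # hypothetical cutoff; the report uses 10
toy_sig <- head(toy_health, top_n)
print(toy_sig)
```

In this toy case TORNADO has 3 fatalities and 16 injuries, so its index is 3 + 0.25 * 16 = 7. The 0.25 weight is the report's own choice; a different weight could reorder event types whose fatalities and injuries are of similar size.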
Aggregating Property and Crop damage by EVTYPE
damage_r <- aggregate(cbind(PROPDMG, CROPDMG) ~ EVTYPE, data = dat, sum)
Removing rows where both damages are 0
damage <- damage_r[damage_r$PROPDMG != 0 | damage_r$CROPDMG != 0, ]
Create a combined damage measure by summing up property and crop damage
damage$Total_dmg <- damage$PROPDMG + damage$CROPDMG
Sort by decreasing value of damages
damage <- damage[order(damage$Total_dmg, decreasing = TRUE),]
Subset the 10 most significant EVTYPE categories as ranked by total damage
damage_sig <- damage[1:10,]
par(mfrow = c(1,3))
fatalities <- pop_health_sig$FATALITIES
names(fatalities) <- pop_health_sig$EVTYPE
barplot(fatalities, col = 'red', main = "FATALITIES")
injuries <- pop_health_sig$INJURIES
names(injuries) <- pop_health_sig$EVTYPE
barplot(injuries, col = 'green', main = "INJURIES")
health_index <- pop_health_sig$pop_health_effect
names(health_index) <- pop_health_sig$EVTYPE
barplot(health_index, col = 'blue', main = "HEALTH INDEX")
The plots above show that tornadoes have the largest negative effect on population health among the selected categories.
par(mfrow = c(1,3))
prop_dmg <- damage_sig$PROPDMG
names(prop_dmg) <- damage_sig$EVTYPE
barplot(prop_dmg, col = 'red', main = "Property Damage")
crop_dmg <- damage_sig$CROPDMG
names(crop_dmg) <- damage_sig$EVTYPE
barplot(crop_dmg, col = 'green', main = "Crop Damage")
total_dmg <- damage_sig$Total_dmg
names(total_dmg) <- damage_sig$EVTYPE
barplot(total_dmg, col = 'blue', main = "Total Damage")
Although hail causes the most crop damage, tornadoes cause the highest overall economic damage.
It was suggested that the effects be analyzed by the EVTYPE categories as given; no comment was made on the quality of the data or on whether any cleaning was needed. Some categories appear to describe the same event type but are recorded slightly differently (typos, alternative spellings, etc.). A future analysis should do more work to consolidate these categories.
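One possible starting point for that consolidation is to normalize the labels before aggregating. The sketch below is illustrative only: `normalize_evtype` is a hypothetical helper, the example labels are invented to show the kinds of variants meant above, and the two pattern rules are assumptions rather than a complete or validated mapping.

```r
# Hypothetical helper: normalize EVTYPE labels before aggregating.
normalize_evtype <- function(x) {
  # Uniform case and no leading/trailing whitespace.
  x <- toupper(trimws(as.character(x)))
  # Collapse runs of whitespace into a single space.
  x <- gsub("[[:space:]]+", " ", x)
  # Illustrative pattern rules (assumptions, not an exhaustive mapping).
  x[grepl("^TORNADO", x)] <- "TORNADO"
  x[grepl("^(THUNDERSTORM|TSTM) WINDS?", x)] <- "THUNDERSTORM WIND"
  x
}

# Invented example labels, not copied from the dataset.
raw <- c("TORNADO", " tornado ", "TORNADOES",
         "TSTM WIND", "THUNDERSTORM  WINDS", "HAIL")
clean <- normalize_evtype(raw)
print(table(clean))
```

Applied to the real data, this would be run as `dat$EVTYPE <- normalize_evtype(dat$EVTYPE)` before the `aggregate` calls. Any rule set of this kind should be checked against the actual list of EVTYPE levels, since a broad prefix pattern can merge categories that ought to stay separate.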