SYNOPSIS

The storm data provided by NOAA was analyzed to determine the population health and economic consequences of severe weather events. The events are categorized by type, such as “Tornado” and “Hail”. Fatalities and injuries were combined into a composite health index, with injuries weighted by 0.25. Property and crop damage were likewise summed into a total damage figure. Tornadoes were found to be the most harmful event type, with the largest health impact and the largest total economic damage. Hail caused the most crop damage, although tornadoes caused a higher total amount of economic damage.

DATA PROCESSING

Reading in the data and exploring it

setwd("F:/RR/Proj2")
dat <- read.csv("repdata-data-StormData.csv")
str(dat)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436774 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

Population health processing and analysis

Aggregating FATALITIES and INJURIES to answer the relevant questions

pop_health_r <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = dat, sum)

Removing rows where both fatalities and injuries are 0

pop_health <- pop_health_r[pop_health_r$FATALITIES != 0 | pop_health_r$INJURIES != 0, ]

Create a composite pop_health_effect measure: fatalities plus injuries, with injuries weighted by 0.25

pop_health$pop_health_effect <- pop_health$FATALITIES + 0.25 * pop_health$INJURIES
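To make the weighting concrete, the sketch below applies the same rule to a small made-up table. The three event labels and their counts are hypothetical and are not taken from the NOAA file.

```r
# Toy totals, made up for illustration only (not NOAA values)
toy <- data.frame(
  EVTYPE     = c("A", "B", "C"),
  FATALITIES = c(10, 4, 0),
  INJURIES   = c(8, 40, 12),
  stringsAsFactors = FALSE
)

# Same rule as in the analysis: fatalities count fully,
# each injury counts as one quarter
toy$pop_health_effect <- toy$FATALITIES + 0.25 * toy$INJURIES

# Rank event types by the composite index, largest first
toy_ranked <- toy[order(toy$pop_health_effect, decreasing = TRUE), ]
print(toy_ranked)
```

In this toy table, B ranks above A despite having fewer fatalities, because its 40 injuries contribute 10 to the index.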

Sort by decreasing measure

pop_health <- pop_health[order(pop_health$pop_health_effect, decreasing = TRUE),]

Subset the 10 most significant EVTYPE categories according to the measure

pop_health_sig <- pop_health[1:10, ]

Economic damage processing and analysis

Aggregating Property and Crop damage by EVTYPE

damage_r <- aggregate(cbind(PROPDMG, CROPDMG) ~ EVTYPE, data = dat, sum)

Removing rows where both damages are 0

damage <- damage_r[damage_r$PROPDMG != 0 | damage_r$CROPDMG != 0, ]

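Create a combined damage measure by summing the property and crop damage totals. The PROPDMG and CROPDMG values are used as recorded; the PROPDMGEXP and CROPDMGEXP columns are not applied.

damage$Total_dmg <- damage$PROPDMG + damage$CROPDMG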
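The PROPDMGEXP and CROPDMGEXP columns listed in the str() output hold magnitude codes for the damage figures. In NOAA's Storm Data convention, "K", "M" and "B" stand for thousands, millions and billions. The sketch below shows one way such codes could be turned into dollar amounts before aggregating. The helper exp_multiplier and the toy rows are hypothetical, and treating every other code (blank, "-", "?", "+", digits) as a multiplier of 1 is an assumption made here, not something the original analysis does.

```r
# Hypothetical helper (not part of the original analysis): map a
# magnitude code to a numeric multiplier. Only K/M/B are mapped;
# all other codes fall back to 1, which is an assumption.
exp_multiplier <- function(code) {
  code <- toupper(trimws(as.character(code)))
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[code])   # unmatched codes become NA
  mult[is.na(mult)] <- 1
  mult
}

# Toy rows, made up for illustration only (not NOAA values)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 1.5, 500),
  PROPDMGEXP = c("K", "M", ""),
  CROPDMG    = c(0, 2, 10),
  CROPDMGEXP = c("", "K", "K"),
  stringsAsFactors = FALSE
)

# Scale each damage figure by its multiplier, then add the two
toy$prop_usd  <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
toy$crop_usd  <- toy$CROPDMG * exp_multiplier(toy$CROPDMGEXP)
toy$total_usd <- toy$prop_usd + toy$crop_usd
print(toy[, c("EVTYPE", "prop_usd", "crop_usd", "total_usd")])
```

Applied to the full data, the scaled columns would be computed on dat before the aggregate() call. The resulting ranking may differ from the one obtained from the raw sums.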

Sort by decreasing value of damages

damage <- damage[order(damage$Total_dmg, decreasing = TRUE),]

Subset the 10 most significant EVTYPE categories according to total damage

damage_sig <- damage[1:10,]

RESULTS

Population health consequences

par(mfrow = c(1,3))

fatalities <- pop_health_sig$FATALITIES
names(fatalities) <- pop_health_sig$EVTYPE
barplot(fatalities, col = 'red', main = "FATALITIES")

injuries <- pop_health_sig$INJURIES
names(injuries) <- pop_health_sig$EVTYPE
barplot(injuries, col = 'green', main = "INJURIES")

health_index <- pop_health_sig$pop_health_effect
names(health_index) <- pop_health_sig$EVTYPE
barplot(health_index, col = 'blue', main = "HEALTH INDEX")

The plots above show that, among the ten categories selected, tornadoes have the largest negative effect on population health.

Economic consequences

par(mfrow = c(1,3))

prop_dmg <- damage_sig$PROPDMG
names(prop_dmg) <- damage_sig$EVTYPE
barplot(prop_dmg, col = 'red', main = "Property Damage")

crop_dmg <- damage_sig$CROPDMG
names(crop_dmg) <- damage_sig$EVTYPE
barplot(crop_dmg, col = 'green', main = "Crop Damage")

total_dmg <- damage_sig$Total_dmg
names(total_dmg) <- damage_sig$EVTYPE
barplot(total_dmg, col = 'blue', main = "Total Damage")

Except for crop damage, where hail ranks highest, the largest economic damages are also caused by tornadoes.

CAVEAT

The analysis uses the EVTYPE categories as given, as was suggested for this data. No guidance was provided on the quality of the data or on whether any cleaning was needed. Some categories are probably the same event type recorded slightly differently (typos, alternative spellings, etc.). A future analysis should consolidate such categories before aggregating.
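As a starting point for that consolidation, the sketch below normalises case and whitespace in the labels. The helper clean_evtype is hypothetical. Of the sample labels, only "   HIGH SURF ADVISORY" (with its leading spaces) appears in the str() output above; the others are made up for illustration. This handles only trivial variants. Typos and alternative spellings would need an explicit mapping.

```r
# Hypothetical helper: normalise event-type labels before aggregating
clean_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))  # drop padding, unify case
  x <- gsub("[[:space:]]+", " ", x)      # collapse repeated whitespace
  x
}

# Sample labels; only the first is taken from the str() output,
# the rest are illustrative
labels <- c("   HIGH SURF ADVISORY", "High  Surf Advisory",
            "TORNADO", "tornado ", "HAIL")

cleaned <- clean_evtype(labels)
print(unique(cleaned))

# Number of distinct categories before and after cleaning
n_before <- length(unique(labels))
n_after  <- length(unique(cleaned))
```

On the real data, the cleaned labels could be assigned back with dat$EVTYPE <- clean_evtype(dat$EVTYPE) before the aggregate() calls.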