Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This study addresses the following two questions:

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

This study aggregates the fatalities and injuries to identify the events that cause the most harm to population health. Similarly, property damages and crop damages are aggregated to identify the events that have the greatest economic consequences.

Loading and Processing the Raw Data:

The input data is taken from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Reading the raw data:

We first read the data with read.csv. For efficiency, only the columns needed for the analysis are read; every other column is skipped by setting its colClasses entry to "NULL". Columns extracted: EVTYPE, FATALITIES, INJURIES, PROPDMG, CROPDMG.

# Packages used later in the report
library(dplyr)    # arrange(), desc()
library(ggplot2)  # qplot(), theme()
library(xtable)   # xtable()

# Of the 37 columns, keep 8 (EVTYPE), 23-25 (FATALITIES, INJURIES, PROPDMG) and 27 (CROPDMG).
# "NULL" skips a column; NA lets read.csv choose the type.
col_classes<-rep("NULL",37)
col_classes[c(8,23,24,25,27)]<-NA
data<-read.csv("repdata%2Fdata%2FStormData.csv.bz2",colClasses = col_classes)
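Counting column positions by hand is easy to get wrong. As a minimal sketch, the same kind of selection can be derived from the header by column name. The temporary file, its four columns, and the names `wanted`, `header` and `subset_data` are illustrative assumptions, not part of the NOAA file or the original analysis.

```r
# Toy CSV standing in for the storm data (hypothetical contents).
tmp <- tempfile(fileext = ".csv")
writeLines(c("STATE,EVTYPE,FATALITIES,REMARKS",
             "AL,TORNADO,0,none",
             "AL,hail,2,none"), tmp)

wanted <- c("EVTYPE", "FATALITIES")

# Read a single row just to learn the column names.
header <- names(read.csv(tmp, nrows = 1))

# "NULL" skips a column; NA lets read.csv choose the type.
col_classes <- ifelse(header %in% wanted, NA, "NULL")

subset_data <- read.csv(tmp, colClasses = col_classes)
str(subset_data)
```

Selecting by name keeps the call readable and continues to work if the column order of the input ever changes.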

Data Cleaning:

Event types (EVTYPE) are recorded inconsistently: some records use lower case and others upper case. So that the same event is not counted under two labels, all event types are converted to upper case.

data$EVTYPE<-toupper(data$EVTYPE)
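Case conversion alone does not merge spelling variants: the result tables later in this report list TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS as separate rows. The sketch below shows, on a toy vector, how such variants could be folded together. The extra steps are not part of the original analysis, and the sample labels are hypothetical.

```r
# Toy event labels (hypothetical sample).
evtype <- c("Tornado", " TSTM WIND", "Thunderstorm Winds",
            "THUNDERSTORM WIND", "hail")

# Step used in this report: normalise case.
evtype <- toupper(evtype)

# Optional extra steps, NOT performed in the original analysis:
evtype <- trimws(evtype)  # drop leading/trailing spaces
evtype <- sub("^TSTM WIND$", "THUNDERSTORM WIND", evtype)
evtype <- sub("^THUNDERSTORM WINDS$", "THUNDERSTORM WIND", evtype)

table(evtype)
```

Applying a mapping like this before aggregation would change the rankings, so the tables in this report should be read as results of case normalisation only.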

Data Processing:

Health-related data

  1. Sum FATALITIES for each event type with tapply.
  2. Sum INJURIES for each event type with tapply.
  3. Create a data frame, health_concerning_data, containing the event type, FATALITIES and INJURIES.
  4. Add a column total_affected as the sum of FATALITIES and INJURIES.
  5. Sort the data frame by total_affected in descending order and extract the top 10 events.

# Per-event totals (tapply returns a vector named by EVTYPE)
fatalities_data<-tapply(data$FATALITIES,data$EVTYPE,sum)
injury_data<-tapply(data$INJURIES,data$EVTYPE,sum)
health_concerning_data<-data.frame(event=names(fatalities_data),fatalities=fatalities_data,injured=injury_data)
health_concerning_data$total_affected<-health_concerning_data$fatalities+health_concerning_data$injured
# arrange() and desc() come from the dplyr package
health_concerning_data_sorted<-arrange(health_concerning_data,desc(total_affected))
major_health_concerning_data<-head(health_concerning_data_sorted,10)
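The steps above can be traced on a small example. This is a minimal sketch with made-up records and base R only (order() stands in for dplyr's arrange(desc(...))); the toy values and the names `toy`, `health` and `health_sorted` are hypothetical.

```r
# Toy records (hypothetical values, not taken from the NOAA data).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(2, 0, 1, 3, 0),
  INJURIES   = c(10, 1, 5, 4, 2)
)

# Per-event sums; tapply returns a vector named by event type.
fatalities <- tapply(toy$FATALITIES, toy$EVTYPE, sum)
injuries   <- tapply(toy$INJURIES, toy$EVTYPE, sum)

health <- data.frame(event      = names(fatalities),
                     fatalities = as.vector(fatalities),
                     injured    = as.vector(injuries))
health$total_affected <- health$fatalities + health$injured

# Base-R equivalent of arrange(desc(total_affected)) followed by head().
health_sorted <- health[order(-health$total_affected), ]
head(health_sorted, 2)
```

Both tapply calls group by the same EVTYPE column, so their results are in the same event order and can be combined column by column.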

Economy-related data

  1. Sum PROPDMG for each event type with tapply.
  2. Sum CROPDMG for each event type with tapply.
  3. Create a data frame, economy_concerning_data, containing the event type, PROPDMG and CROPDMG.
  4. Add a column total_damages as the sum of PROPDMG and CROPDMG.
  5. Sort the data frame by total_damages in descending order and extract the top 10 events.

# Per-event totals of the PROPDMG and CROPDMG values as recorded
prop_damage_data<-tapply(data$PROPDMG,data$EVTYPE,sum)
crop_damage_data<-tapply(data$CROPDMG,data$EVTYPE,sum)
economy_concerning_data<-data.frame(event=names(prop_damage_data),prop_damages=prop_damage_data,crop_damages=crop_damage_data)
economy_concerning_data$total_damages<-economy_concerning_data$prop_damages+economy_concerning_data$crop_damages
# arrange() and desc() come from the dplyr package
economy_concerning_data_sorted<-arrange(economy_concerning_data,desc(total_damages))
major_economy_concerning_data<-head(economy_concerning_data_sorted,10)
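The sums above add the PROPDMG and CROPDMG figures exactly as recorded. In the NOAA file each figure is paired with a magnitude code in PROPDMGEXP or CROPDMGEXP (for example K for thousands, M for millions, B for billions), and those two columns are not among the five read in this report. A minimal sketch of how such codes could be applied is given below. The helper name `exp_to_multiplier` and the toy values are hypothetical, and treating every other code as a multiplier of 1 is an assumption.

```r
# Hypothetical helper: map a magnitude code to a numeric multiplier.
# Assumption: codes other than K, M, B (in any case) count as 1.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(code)]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Toy records (hypothetical values, not taken from the NOAA data).
toy <- data.frame(
  EVTYPE     = c("FLOOD", "HAIL", "FLOOD"),
  PROPDMG    = c(2.5, 10, 1),
  PROPDMGEXP = c("M", "K", ""),
  stringsAsFactors = FALSE
)

toy$prop_dollars <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
tapply(toy$prop_dollars, toy$EVTYPE, sum)
```

Scaling before aggregation could reorder the ranking, since a single entry coded in billions outweighs many entries coded in thousands.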

Results:

1. Across the United States, which types of events are most harmful with respect to population health?

These are the top 10 events that cause the most harm to population health, ranked by combined fatalities and injuries:

print(xtable(major_health_concerning_data), type="html",html.table.attributes="border='1' width=80%")
    event              fatalities   injured  total_affected
 1  TORNADO               5633.00  91346.00        96979.00
 2  EXCESSIVE HEAT        1903.00   6525.00         8428.00
 3  TSTM WIND              504.00   6957.00         7461.00
 4  FLOOD                  470.00   6789.00         7259.00
 5  LIGHTNING              816.00   5230.00         6046.00
 6  HEAT                   937.00   2100.00         3037.00
 7  FLASH FLOOD            978.00   1777.00         2755.00
 8  ICE STORM               89.00   1975.00         2064.00
 9  THUNDERSTORM WIND      133.00   1488.00         1621.00
10  WINTER STORM           206.00   1321.00         1527.00

Plot of events that cause most harm with respect to population health:

qplot(x=event, y=total_affected,data=major_health_concerning_data,col=event,main = "Events causing most harm to population health",ylab="Injuries + fatalities",xlab="Events")+theme(axis.text.x = element_text(angle=60, hjust=1))
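Because event is a character column, the x axis of the plot is laid out alphabetically rather than by impact. One way to order it by total_affected is to convert event to a factor whose levels follow the sorted totals, since ggplot2 draws a discrete axis in factor-level order. The sketch below uses three rows from the table above and base R only; the names `plot_data` and `ordered_events` are illustrative.

```r
# Three rows copied from the health table in this report.
plot_data <- data.frame(
  event          = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND"),
  total_affected = c(96979, 8428, 7461),
  stringsAsFactors = FALSE
)

# Factor levels in descending order of total_affected.
ordered_events <- plot_data$event[order(-plot_data$total_affected)]
plot_data$event <- factor(plot_data$event, levels = ordered_events)

levels(plot_data$event)
```

Passing a data frame prepared this way to the existing plotting call would leave the plotted values unchanged and alter only the left-to-right order of the events.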

2. Across the United States, which types of events have the greatest economic consequences?

These are the top 10 events that have the greatest economic consequences, ranked by the combined PROPDMG and CROPDMG totals:

print(xtable(major_economy_concerning_data), type="html",html.table.attributes="border='1' width=80%")
    event               prop_damages  crop_damages  total_damages
 1  TORNADO               3212258.16     100018.52     3312276.68
 2  FLASH FLOOD           1420124.59     179200.46     1599325.05
 3  TSTM WIND             1335995.61     109202.60     1445198.21
 4  HAIL                   688693.38     579596.28     1268289.66
 5  FLOOD                  899938.48     168037.88     1067976.36
 6  THUNDERSTORM WIND      876844.17      66791.45      943635.62
 7  LIGHTNING              603351.78       3580.61      606932.39
 8  THUNDERSTORM WINDS     446293.18      18684.93      464978.11
 9  HIGH WIND              324731.56      17283.21      342014.77
10  WINTER STORM           132720.59       1978.99      134699.58

Plot of events that have the greatest economic consequences:

qplot(x=event, y=total_damages,data=major_economy_concerning_data,col=event,main="Events causing greatest economic consequences",ylab="Crop & Property damages",xlab="Events")+theme(axis.text.x = element_text(angle=60, hjust=1))