You need to install the “reshape2” and “ggplot2” packages to run this code.

Synopsis


In this report, we explore the NOAA Storm Database from 1950-2011.

The objective is to answer the following :
1. Across the U.S., which types of events have the greatest impact on population health;
2. Across the U.S., which types of events have the greatest economic impact.

We found that for 1950-1992 only 3 broad categories of events were captured, whereas for 1993-2011 all 11 broad categories were captured.
For a meaningful comparison across event types, only data for 1993-2011 were used.

Considering the data from 1993-2011, we found that :
1. Personal Health Impact
a. The events that cause the most personal health impact are : tornado, heat, flood and wintry weather.
b. Heat causes the highest fatalities, while tornado causes the highest injuries.

2. Economic Impact
a. The events that cause the most economic impact are : flood, hurricane/storm, tornado, rain and hail.
b. Flood causes the highest property damage, while hail causes the highest crop damage.


Data Processing



1 : Loading the data

We first read the csv file into a data frame. The file is comma-delimited and missing values are coded as “?”.

DF <- read.csv(bzfile("repdata_data_StormData.csv.bz2"), na.strings="?")


After reading, we preview the data frame (DF).

dim(DF)
## [1] 902297     37
head(DF)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6


The columns we are interested in are FATALITIES, INJURIES, PROPDMG & CROPDMG. We check each of these columns for NA values.

sum(is.na(DF$FATALITIES))
## [1] 0
sum(is.na(DF$INJURIES))
## [1] 0
sum(is.na(DF$PROPDMG))
## [1] 0
sum(is.na(DF$CROPDMG))
## [1] 0


From the above, we see that there are no missing values for the 4 columns that we are concerned with.
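
The same check can be written as a single call. The following is a minimal sketch on a small made-up data frame; the `toy` object and its values are invented for illustration and are not taken from the storm data.

```r
# Toy stand-in for DF; the values are invented for illustration.
toy <- data.frame(FATALITIES = c(0, 1, NA),
                  INJURIES   = c(15, 0, 2),
                  PROPDMG    = c(25, 2.5, 25),
                  CROPDMG    = c(0, 0, 0))

# Count the missing values in every column of interest at once.
cols <- c("FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")
na_counts <- colSums(is.na(toy[, cols]))
print(na_counts)
```

Applied to the real data, `colSums(is.na(DF[, cols]))` should report the same four zeros as the separate calls above.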

2a : Cleaning the data (EVTYPE)

We then clean up the data as follows :

- Convert all event names to lowercase.
- Drop rows whose EVTYPE contains “summary”, “apache county”, “southeast” or “monthly”.
- Re-classify EVTYPE into 11 broad categories.
- Each broad category absorbs the EVTYPE values that contain the following keywords :

1. sea/coast - surf, swell, sea, wave, marine, seiche, beach, coastal, coastal flood, dam, tidal flood, storm surge, blow out tide, low tide, high tide, tsunami, rip current, red flag
2. flood - flash, flood, rapidly rising/high water, urban, small stream, drowning
3. hail - hail
4. rain - torrential, thunderstorm, heavy/excessive rain/shower, wet, metro storm, tropical depression
5. storm/hurr - hurricane, typhoon, tropical storm, tstm, floyd
6. lightning - lightning
7. tornado - wall cloud, tornado, water/land spout, funnel
8. wintry - wintry, thundersnow, blizzard, heavy snow, snow, freeze, frost, ice, sleet, freezing rain, glaze, low temperature, cold, cool, wind chill, hypothermia, icy
9. wind - wind, gust, microburst, downburst
10. heat - heat, hot, high/record temperature, warm, hyperthermia, drought, below normal precipitation, dry, driest, fire, smoke
11. others - precipitation, heavy mix, severe turbulence, northern lights, record high/low, none, no severe weather, mild pattern, high, excessive, other, dust, fog, avalanche, land slide/slump, volcanic, vog

DF$EVTYPE <- tolower(DF$EVTYPE)

# Drop summary and other non-event rows.
DF <- DF[!grepl("summary(.*)", DF$EVTYPE),]
DF <- DF[!grepl("apache county", DF$EVTYPE),]
DF <- DF[!grepl("southeast", DF$EVTYPE),]
DF <- DF[!grepl("monthly(.*)", DF$EVTYPE),]

DF[grepl("(.*)surf(.*)", DF$EVTYPE)==T,8] <- "sea/coast"
DF[grepl("(.*)swell(.*)", DF$EVTYPE)==T,8] <- "sea/coast"
DF[grepl("(.*)sea(.*)", DF$EVTYPE)==T,8] <- "sea/coast"
DF[grepl("(.*)wave(.*)", DF$EVTYPE)==T,8] <- "sea/coast"
DF[grepl("marine(.*)", DF$EVTYPE)==T,8] <- "sea/coast"
DF[grepl("seiche(.*)", DF$EVTYPE)==T,8] <- "sea/coast"

DF[grepl("beach(.*)", DF$EVTYPE)==T,8] <- "sea/coast"
DF[grepl("coastal(.*)", DF$EVTYPE)==T,8] <- "sea/coast"
DF[grepl("c(.*)st(.*)l(.*)flood(.*)", DF$EVTYPE)==T,8] <- "sea/coast"
DF[grepl("dam(.*)", DF$EVTYPE)==T,8] <- "sea/coast"
DF[grepl("tidal flood(.*)", DF$EVTYPE)==T,8] <- "sea/coast"
DF[grepl("storm surge(.*)", DF$EVTYPE)==T,8] <- "sea/coast"
DF[grepl("blow(.*)out tide(.*)", DF$EVTYPE)==T,8] <- "sea/coast"
DF[grepl("(.*)low tide(.*)", DF$EVTYPE)==T,8] <- "sea/coast"
DF[grepl("(.*)high tide(.*)", DF$EVTYPE)==T,8] <- "sea/coast"
DF[grepl("(.*)tsunami(.*)", DF$EVTYPE)==T,8] <- "sea/coast"
DF[grepl("(.*)rip current(.*)", DF$EVTYPE)==T,8] <- "sea/coast"
DF[grepl("(.*)red flag(.*)", DF$EVTYPE)==T,8] <- "sea/coast"

DF[grepl("(.*)flood(.*)", DF$EVTYPE)==T,8] <- "flood"
DF[grepl("(.*)rapidly rising water(.*)", DF$EVTYPE)==T,8] <- "flood"
DF[grepl("(.*)flash fl(.*)", DF$EVTYPE)==T,8] <- "flood"
DF[grepl("(.*)urban(.*)", DF$EVTYPE)==T,8] <- "flood"
DF[grepl("(.*)sm(.*)stream(.*)", DF$EVTYPE)==T,8] <- "flood"
DF[grepl("(.*)high water(.*)", DF$EVTYPE)==T,8] <- "flood"
DF[grepl("drowning", DF$EVTYPE)==T,8] <- "flood"

DF[grepl("hail(.*)", DF$EVTYPE)==T,8] <- "hail"

DF[grepl("torrential rain(.*)", DF$EVTYPE)==T,8] <- "rain"
DF[grepl("t(.*)u(.*)e(.*)storm(.*)", DF$EVTYPE)==T,8] <- "rain"
DF[grepl("h(.*)vy rain(.*)", DF$EVTYPE)==T,8] <- "rain"
DF[grepl("excessive rain(.*)", DF$EVTYPE)==T,8] <- "rain"
DF[grepl("heavy shower(.*)", DF$EVTYPE)==T,8] <- "rain"
DF[grepl("(.*)wet(.*)", DF$EVTYPE)==T,8] <- "rain"
DF[grepl("metro storm", DF$EVTYPE)==T,8] <- "rain"
DF[grepl("(.*)rain(.*)", DF$EVTYPE)==T,8] <- "rain"
DF[grepl("tropical depression(.*)", DF$EVTYPE)==T,8] <- "rain"

DF[grepl("hurricane(.*)", DF$EVTYPE)==T,8] <- "storm/hurr"
DF[grepl("typhoon(.*)", DF$EVTYPE)==T,8] <- "storm/hurr"
DF[grepl("tropical storm(.*)", DF$EVTYPE)==T,8] <- "storm/hurr"
DF[grepl("tstm(.*)", DF$EVTYPE)==T,8] <- "storm/hurr"
DF[grepl("(.*)floyd(.*)", DF$EVTYPE)==T,8] <- "storm/hurr"

DF[grepl("lig(.*)t(.*)ing(.*)", DF$EVTYPE)==T,8] <- "lightning"

DF[grepl("wall cloud(.*)", DF$EVTYPE)==T,8] <- "tornado"
DF[grepl("torn(.*)o(.*)", DF$EVTYPE)==T,8] <- "tornado"
DF[grepl("wa(.*)ter(.*)spout(.*)", DF$EVTYPE)==T,8] <- "tornado"
DF[grepl("(.*)funnel(.*)", DF$EVTYPE)==T,8] <- "tornado"
DF[grepl("land(.*)spout(.*)", DF$EVTYPE)==T,8] <- "tornado"

DF[grepl("wint(.*)r(.*)", DF$EVTYPE)==T,8] <- "wintry"
DF[grepl("thunder(.*)w(.*)", DF$EVTYPE)==T,8] <- "wintry"
DF[grepl("(.*)blizzard(.*)", DF$EVTYPE)==T,8] <- "wintry"
DF[grepl("heavy(.*)snow(.*)", DF$EVTYPE)==T,8] <- "wintry"
DF[grepl("(.*)snow(.*)", DF$EVTYPE)==T,8] <- "wintry"
DF[grepl("(.*)freez(.*)", DF$EVTYPE)==T,8] <- "wintry"
DF[grepl("(.*)frost(.*)", DF$EVTYPE)==T,8] <- "wintry"
DF[grepl("(.*)ice(.*)", DF$EVTYPE)==T,8] <- "wintry"
DF[grepl("(.*)sleet(.*)", DF$EVTYPE)==T,8] <- "wintry"
DF[grepl("(.*)freezing rain(.*)", DF$EVTYPE)==T,8] <- "wintry"
DF[grepl("(.*)glaze(.*)", DF$EVTYPE)==T,8] <- "wintry"
DF[grepl("(.*)icy(.*)", DF$EVTYPE)==T,8] <- "wintry"

DF[grepl("low temperature(.*)", DF$EVTYPE)==T,8] <- "wintry"
DF[grepl("(.*)cold(.*)", DF$EVTYPE)==T,8] <- "wintry"
DF[grepl("(.*)cool(.*)", DF$EVTYPE)==T,8] <- "wintry"
DF[grepl("(.*)wind chil(.*)", DF$EVTYPE)==T,8] <- "wintry"
DF[grepl("hypothermia(.*)", DF$EVTYPE)==T,8] <- "wintry"

DF[grepl("(.*)wind(.*)", DF$EVTYPE)==T,8] <- "wind"
DF[grepl("gust(.*)", DF$EVTYPE)==T,8] <- "wind"
DF[grepl("wnd", DF$EVTYPE)==T,8] <- "wind"
DF[grepl("(.*)mic(.*)oburst", DF$EVTYPE)==T,8] <- "wind"
DF[grepl("(.*)down(.*)burst", DF$EVTYPE)==T,8] <- "wind"

DF[grepl("heat(.*)", DF$EVTYPE)==T,8] <- "heat"
DF[grepl("hot(.*)", DF$EVTYPE)==T,8] <- "heat"
DF[grepl("high temperature(.*)", DF$EVTYPE)==T,8] <- "heat"
DF[grepl("record temperature(.*)", DF$EVTYPE)==T,8] <- "heat"
DF[grepl("temperature record(.*)", DF$EVTYPE)==T,8] <- "heat"
DF[grepl("(.*)warm(.*)", DF$EVTYPE)==T,8] <- "heat"
DF[grepl("hyperthermia(.*)", DF$EVTYPE)==T,8] <- "heat"

DF[grepl("drought(.*)", DF$EVTYPE)==T,8] <- "heat"
DF[grepl("below normal precipitation", DF$EVTYPE)==T,8] <- "heat"
DF[grepl("dry(.*)", DF$EVTYPE)==T,8] <- "heat"
DF[grepl("driest(.*)", DF$EVTYPE)==T,8] <- "heat"

DF[grepl("(.*)fire(.*)", DF$EVTYPE)==T,8] <- "heat"
DF[grepl("(.*)smoke", DF$EVTYPE)==T,8] <- "heat"

DF[grepl("(.*)precip(.*)", DF$EVTYPE)==T,8] <- "others"
DF[grepl("heavy mix", DF$EVTYPE)==T,8] <- "others"
DF[grepl("severe turbulence", DF$EVTYPE)==T,8] <- "others"
DF[grepl("northern lights", DF$EVTYPE)==T,8] <- "others"
DF[grepl("record high", DF$EVTYPE)==T,8] <- "others"
DF[grepl("record low", DF$EVTYPE)==T,8] <- "others"
DF[grepl("none", DF$EVTYPE)==T,8] <- "others"
DF[grepl("no severe weather", DF$EVTYPE)==T,8] <- "others"
DF[grepl("mild pattern", DF$EVTYPE)==T,8] <- "others"
DF[grepl("high", DF$EVTYPE)==T,8] <- "others"
DF[grepl("excessive", DF$EVTYPE)==T,8] <- "others"
DF[grepl("other", DF$EVTYPE)==T,8] <- "others"

DF[grepl("dust(.*)", DF$EVTYPE)==T,8] <- "others"

DF[grepl("fog(.*)", DF$EVTYPE)==T,8] <- "others"

DF[grepl("avalanc(.*)e(.*)", DF$EVTYPE)==T,8] <- "others"
DF[grepl("(.*)slide(.*)", DF$EVTYPE)==T,8] <- "others"
DF[grepl("land(.*)slump(.*)", DF$EVTYPE)==T,8] <- "others"

DF[grepl("volcanic(.*)", DF$EVTYPE)==T,8] <- "others"
DF[grepl("vog(.*)", DF$EVTYPE)==T,8] <- "others"
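
The long run of assignments above could also be expressed as a lookup table of patterns. The sketch below is an alternative layout, not the method used in this report: `rules`, `classify_events` and `toy_events` are hypothetical names, only a few of the keywords are shown, and the `"others"` default is an assumption. Unlike the sequential assignments above, where a later pattern can overwrite an earlier label, this sketch lets the first matching rule win, so results on the real data may differ.

```r
# Ordered rules: the first category whose pattern matches wins.
# Only a handful of the report's keywords are listed here.
rules <- list(
  "sea/coast" = c("surf", "rip current", "storm surge"),
  "flood"     = c("flood", "urban", "high water"),
  "hail"      = c("hail"),
  "tornado"   = c("torn.*o", "funnel", "spout"),
  "heat"      = c("heat", "drought")
)

# Hypothetical helper: map raw event names to broad categories.
classify_events <- function(evtype, rules, default = "others") {
  evtype   <- tolower(evtype)
  out      <- rep(default, length(evtype))
  assigned <- rep(FALSE, length(evtype))
  for (category in names(rules)) {
    # Combine the keywords of one category into a single regular expression.
    pattern <- paste(rules[[category]], collapse = "|")
    hit <- grepl(pattern, evtype) & !assigned
    out[hit] <- category
    assigned <- assigned | hit
  }
  out
}

toy_events <- c("TORNADO", "Flash Flood", "HAIL 1.75",
                "Rip Currents", "excessive heat", "fog")
toy_classes <- classify_events(toy_events, rules)
print(toy_classes)
```

Keeping the keywords in one list makes it easier to compare the code against the keyword list given earlier in this section.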


2b : Cleaning the data (BGN_DATE)

After that, we pick up only the YEAR from the “BGN_DATE” field.

DF$PERIOD <- as.character(DF$BGN_DATE)
DF$PERIOD <- gsub(" 0:00:00","",DF$PERIOD)
DF$PERIOD <- sapply(strsplit(DF$PERIOD, "/"), "[", 3)
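
The year can also be pulled out by parsing the date rather than splitting the string. This is a minimal sketch on three sample strings in the same layout as BGN_DATE; the `bgn_date` and `years` names are illustrative only.

```r
# Sample values in the same "m/d/Y H:M:S" layout as BGN_DATE.
bgn_date <- c("4/18/1950 0:00:00", "11/15/1951 0:00:00", "6/8/2011 0:00:00")

# as.Date() parses the leading date; the trailing time text is ignored.
years <- format(as.Date(bgn_date, format = "%m/%d/%Y"), "%Y")
print(years)
```

Parsing has the side benefit that a malformed date becomes NA instead of silently producing a wrong year.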


Next, we make a contingency table of the YEAR with the types of event (EVTYPE), and print out the contingency table.

YEAR <- table(DF$PERIOD, DF$EVTYPE)
YEAR
##       
##        flood  hail  heat lightning others  rain sea/coast storm/hurr
##   1950     0     0     0         0      0     0         0          0
##   1951     0     0     0         0      0     0         0          0
##   1952     0     0     0         0      0     0         0          0
##   1953     0     0     0         0      0     0         0          0
##   1954     0     0     0         0      0     0         0          0
##   1955     0   360     0         0      0     0         0        421
##   1956     0   401     0         0      0     0         0        735
##   1957     0   479     0         0      0     0         0        775
##   1958     0   706     0         0      0     0         0        899
##   1959     0   531     0         0      0     0         0        652
##   1960     0   581     0         0      0     0         0        719
##   1961     0   722     0         0      0     0         0        752
##   1962     0   886     0         0      0     0         0        830
##   1963     0   652     0         0      0     0         0        823
##   1964     0   679     0         0      0     0         0        909
##   1965     0   805     0         0      0     0         0       1055
##   1966     0   732     0         0      0     0         0       1050
##   1967     0   764     0         0      0     0         0        958
##   1968     0  1068     0         0      0     0         0       1529
##   1969     0   766     0         0      0     0         0       1510
##   1970     0   721     0         0      0     0         0       1794
##   1971     0   964     0         0      0     0         0       1544
##   1972     0   681     0         0      0     0         0        712
##   1973     0  1098     0         0      0     0         0       2166
##   1974     0  1660     0         0      0     0         0       2603
##   1975     0  1374     0         0      0     0         0       2639
##   1976     0  1091     0         0      0     0         0       1742
##   1977     0  1083     0         0      0     0         0       1723
##   1978     0  1024     0         0      0     0         0       1758
##   1979     0  1315     0         0      0     0         0       2046
##   1980     0  1993     0         0      0     0         0       3181
##   1981     0  1494     0         0      0     0         0       2193
##   1982     0  2381     0         0      0     0         0       3570
##   1983     0  2334     0         0      0     0         0       4993
##   1984     0  2749     0         0      0     0         0       3566
##   1985     0  3379     0         0      0     0         0       3827
##   1986     0  3512     0         0      0     0         0       4365
##   1987     0  2416     0         0      0     0         0       4256
##   1988     0  2537     0         0      0     0         0       3947
##   1989     0  3778     0         0      0     0         0       5711
##   1990     0  3618     0         0      0     0         0       6064
##   1991     0  4811     0         0      0     0         0       6503
##   1992     0  5687     0         0      0     0         0       6443
##   1993  1579  4216    30       467     25  3889        52         15
##   1994  1868  6733    57      1010     50  8001        73         54
##   1995  3021  8370   211      1083    123 10731       212        305
##   1996  4551 10855   218       914     85   399       159      10045
##   1997  3984  8801   145       841    142   395       173       9869
##   1998  4933 12730   452       901    101   718       277      13627
##   1999  3397 10236   750       863    144   520       186      10378
##   2000  3560 11372   700       907    152   646       172      12171
##   2001  3850 12389   487       880    206   513       371      11762
##   2002  4167 12689   656       875    161   363      1325      11849
##   2003  4912 13911   424       741    188   938      1653      12036
##   2004  5704 13142   188       705    217   741      1551      11963
##   2005  4325 13788   368       864    221   858      1422      12397
##   2006  3851 16638   709       840    267  1549      1418      13176
##   2007  5494 12711   726       719    346 13877      1253         25
##   2008  6123 17546   623       766    307 17651      1502        181
##   2009  6069 13313   471       721    225 14472      1270         11
##   2010  6700 10922   814       867    340 16947      1517         34
##   2011  7182 17761  1587       801    320 22743      1953        162
##       
##        tornado  wind wintry
##   1950     223     0      0
##   1951     269     0      0
##   1952     272     0      0
##   1953     492     0      0
##   1954     609     0      0
##   1955     632     0      0
##   1956     567     0      0
##   1957     930     0      0
##   1958     608     0      0
##   1959     630     0      0
##   1960     645     0      0
##   1961     772     0      0
##   1962     673     0      0
##   1963     493     0      0
##   1964     760     0      0
##   1965     995     0      0
##   1966     606     0      0
##   1967     966     0      0
##   1968     715     0      0
##   1969     650     0      0
##   1970     700     0      0
##   1971     963     0      0
##   1972     775     0      0
##   1973    1199     0      0
##   1974    1123     0      0
##   1975     962     0      0
##   1976     935     0      0
##   1977     922     0      0
##   1978     875     0      0
##   1979     918     0      0
##   1980     972     0      0
##   1981     830     0      0
##   1982    1181     0      0
##   1983     995     0      0
##   1984    1020     0      0
##   1985     773     0      0
##   1986     849     0      0
##   1987     695     0      0
##   1988     773     0      0
##   1989     921     0      0
##   1990    1264     0      0
##   1991    1208     0      0
##   1992    1404     0      0
##   1993     895   497    942
##   1994    1516   586    681
##   1995    1755   901   1257
##   1996    1693  1230   2046
##   1997    1769   899   1658
##   1998    2147   910   1321
##   1999    2113  1123   1571
##   2000    1748  1123   1909
##   2001    1831   992   1671
##   2002    1480  1022   1694
##   2003    2121   855   1973
##   2004    2571   846   1735
##   2005    1968   912   2061
##   2006    1860  1792   1934
##   2007    1853  1965   4320
##   2008    2483  3069   5412
##   2009    1941  2666   4658
##   2010    2119  2568   5333
##   2011    2921  2547   4197


From the above contingency table, we observe that :
1. From 1950 to 1954, only data for tornado were captured.
2. From 1955 to 1992, only data for hail, storm/hurr and tornado were captured.
3. From 1993 to 2011, data for all 11 broad categories of events were captured.

For meaningful comparison of all types of events, we choose to use only data for 1993-2011.

2c : Further Cleaning of Data (subset for Year 1993-2011)

So then, we categorize the data into “1950-1992” and “1993-2011”. Following that, we subset the data for 1993-2011 only.

DF$PERIOD <- gsub("19[5-8].","1950",DF$PERIOD)
DF$PERIOD <- gsub("199[0-2]","1950",DF$PERIOD)
DF$PERIOD <- gsub("199[3-9]","1993",DF$PERIOD)
DF$PERIOD <- gsub("20.[0-9]","1993",DF$PERIOD)

DF$PERIOD <- gsub("1950","1950-1992",DF$PERIOD)
DF$PERIOD <- gsub("1993","1993-2011",DF$PERIOD)

DF <- DF[DF$PERIOD=="1993-2011",]
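
The same two period labels can be derived with a numeric comparison instead of pattern substitution. A minimal sketch, assuming the year is available as a four-digit string; `years` and `period` are illustrative names.

```r
# A few sample years on both sides of the 1993 cut-off.
years <- c("1950", "1989", "1992", "1993", "2005", "2011")

# Label each year by comparing it with the first fully covered year.
period <- ifelse(as.integer(years) >= 1993, "1993-2011", "1950-1992")
print(period)

# Keep only the later period, as done for DF above.
recent <- years[period == "1993-2011"]
```

The comparison states the cut-off year once, which makes it easy to change.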


Result



From this point on, only data from 1993-2011 were considered.

The analysis covers the entire United States.


Personal Health Impact is measured by the columns FATALITIES and INJURIES.
Economic Impact is measured by the columns PROPDMG and CROPDMG.

Re-shaping the data

We then re-shape the data by finding the sum of each of the columns : FATALITIES, INJURIES, PROPDMG, CROPDMG, segregated by events.

# DF already holds only the 1993-2011 rows, so no further subsetting is needed.
# Start from the table of event categories, then overwrite and add
# one column of sums per measure (tapply returns them in the same order).
sumDF <- as.data.frame(table(DF$EVTYPE))
sumDF[,2] <- data.frame(tapply(DF$FATALITIES,DF$EVTYPE,sum))
sumDF[,3] <- data.frame(tapply(DF$INJURIES,DF$EVTYPE,sum))
sumDF[,4] <- data.frame(tapply(DF$PROPDMG,DF$EVTYPE,sum))
sumDF[,5] <- data.frame(tapply(DF$CROPDMG,DF$EVTYPE,sum))
sumDF[,6] <- data.frame("1993-2011")

colnames(sumDF) <- c("EVTYPE","FATALITIES","INJURIES","PROPERTY","CROP","PERIOD")


Then we print the table of the sum of FATALITIES, INJURIES, PROPDMG, CROPDMG by each event.

sumDF
##        EVTYPE FATALITIES INJURIES PROPERTY   CROP    PERIOD
## 1       flood       1552     8673  2444573 367245 1993-2011
## 2        hail         40     1066   699166 585957 1993-2011
## 3        heat       3048    10444   131181  44632 1993-2011
## 4   lightning        817     5231   603397   3581 1993-2011
## 5      others        375     1815    47851   2673 1993-2011
## 6        rain        313     2757  1386979  98286 1993-2011
## 7   sea/coast       1098     1478    61835   1862 1993-2011
## 8  storm/hurr        443     5350  1411757 127306 1993-2011
## 9     tornado       1627    23403  1401014 100027 1993-2011
## 10       wind        445     1874   452839  21714 1993-2011
## 11     wintry       1107     6674   419397  24547 1993-2011
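
A per-event table of this shape can also be produced in one step with `aggregate()`. The sketch below runs on a small invented data frame (`toy` and its values are not from the storm data) and is offered as an alternative, not as the code that produced the table above.

```r
# Toy stand-in for DF with invented values.
toy <- data.frame(
  EVTYPE     = c("flood", "heat", "flood", "tornado", "heat"),
  FATALITIES = c(1, 3, 0, 2, 4),
  INJURIES   = c(5, 0, 2, 10, 1)
)

# Sum both measures within each event category.
toy_sums <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE,
                      data = toy, FUN = sum)
print(toy_sums)
```

The formula interface keeps the event names and their sums in the same data frame, so no column needs to be filled in afterwards.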


Melting the data to facilitate plotting

We then use the “reshape2” library to melt the data frame into :

1. sumDF1 : PERIOD, EVTYPE, HEALTH, NUM_LIVES
2. sumDF2 : PERIOD, EVTYPE, ECONOMIC, AMOUNT_DMG

sumDF1 - sums the number of lives affected by Personal HEALTH, taking into consideration FATALITIES & INJURIES.
sumDF2 - sums the amount of ECONOMIC damages, taking into consideration PROPDMG (property damage) & CROPDMG (crop damage).

The purpose is to allow facetting in ggplot.

library(reshape2)
sumDF1 <- melt(sumDF, id.vars=c("PERIOD","EVTYPE"), measure.vars=c("FATALITIES","INJURIES"), variable.name="HEALTH", value.name = "NUM_LIVES")
sumDF2 <- melt(sumDF, id.vars=c("PERIOD","EVTYPE"), measure.vars=c("PROPERTY","CROP"), variable.name="ECONOMIC", value.name = "AMOUNT_DMG")
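
To make the melted ("long") layout concrete, the sketch below builds the same kind of table by hand in base R for a two-row toy input. The `toy` and `long` objects and their values are invented for illustration; in the report itself the work is done by `melt()`.

```r
# Wide toy table: one row per event, one column per measure.
toy <- data.frame(EVTYPE     = c("flood", "heat"),
                  FATALITIES = c(1, 7),
                  INJURIES   = c(7, 1))

# Long layout: one row per (event, measure) pair, which is what
# faceting by measure needs.
long <- data.frame(
  EVTYPE    = rep(toy$EVTYPE, times = 2),
  HEALTH    = rep(c("FATALITIES", "INJURIES"), each = nrow(toy)),
  NUM_LIVES = c(toy$FATALITIES, toy$INJURIES)
)
print(long)
```

Each measure column of the wide table becomes a block of rows in the long table, with the former column name stored in HEALTH.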


Plotting the Impact on Personal Health

Finally, we use ggplot to get the barplots.

We first look at the Impact on Personal Health - by plotting Types of Events against Number of Lives Affected, facetted by HEALTH (FATALITIES/INJURIES)

library(ggplot2)
# Events are ordered by decreasing mean of NUM_LIVES across the two measures.
h <- ggplot(sumDF1, aes(x = reorder(EVTYPE, -NUM_LIVES), y = NUM_LIVES))
# margins = TRUE adds an "(all)" panel that combines both measures.
h <- h + geom_bar(stat = "identity") + facet_grid(HEALTH ~ ., margins = TRUE)
h <- h + labs(y = "Number of Lives Affected")
h <- h + labs(x = "Event")
h <- h + ggtitle(expression(atop("Impact on Population Health", atop("Across United States from 1993-2011",""))))
h <- h + theme(axis.text.x = element_text(angle = 90, hjust = 0, vjust = 1))
print(h)

[Figure: bar plot of Number of Lives Affected by Event, facetted by FATALITIES and INJURIES, across United States from 1993-2011]

Observations from the Plot on Personal Health Impact

  1. The events that cause the most personal health impact are : tornado, heat and flood.
  2. Heat causes the highest fatalities, while tornado causes the highest injuries.



Plotting the Economic Impact

Next, we look at the Economic Consequences - by plotting Types of Events against Amount of Damage, facetted by ECONOMIC (PROPDMG/CROPDMG)

# Events are ordered by decreasing mean of AMOUNT_DMG across the two measures.
e <- ggplot(sumDF2, aes(x = reorder(EVTYPE, -AMOUNT_DMG), y = AMOUNT_DMG))
# margins = TRUE adds an "(all)" panel that combines both measures.
e <- e + geom_bar(stat = "identity") + facet_grid(ECONOMIC ~ ., margins = TRUE)
e <- e + labs(y = "Amount of Damage")
e <- e + labs(x = "Event")
e <- e + ggtitle(expression(atop("Economic Consequences", atop("Across United States from 1993-2011",""))))
e <- e + theme(axis.text.x = element_text(angle = 90, hjust = 0, vjust = 1))
print(e)

[Figure: bar plot of Amount of Damage by Event, facetted by PROPERTY and CROP, across United States from 1993-2011]

Observations from the Plot on Economic Impact

  1. The events that cause the most economic impact are : flood, hurricane/storm, tornado, rain and hail.
  2. Flood causes the highest property damage, while hail causes the highest crop damage.

End of Report