Analysis of population health and economic damages in the USA NOAA Storm Database

This work was created for Peer Assessment 2 of the Reproducible Research course on Coursera (Jul 2014). In this assignment we address questions about both the public health and the economic problems caused by severe weather events by exploring the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The data analysis must address the following basic questions:

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

Synopsis

To answer these questions, the analysis rests on the following general considerations:

  1. The USA NOAA Storm Database contains data from 1950 to 2011, but in the earlier years there are generally fewer events recorded, most likely due to a lack of good records. Despite that, the data cleaning and transformation process treats all the data, applying some modifications that correct this situation and filter out inappropriate data. These steps are described in detail in the Getting and cleaning data section.
  2. There are 985 different event types in the USA NOAA Storm Database, which implies that an event type classification process must be introduced to reduce the number of categories used in the analysis. This categorization process leads to a classification with 14 event types, summarized in the following list:

    1. TORNADO : All event types that include the word TORNADO. This includes terms like : TORNADO F0, TORNADOS.
    2. HURRICANE : All event types that include the words HURRICANE and TYPHOON. This includes terms like : HURRICANE/TYPHOON, HURRICANE OPAL.
    3. WIND : All event types that include the word WIND. This includes terms like : WINDS, HIGH WINDS 82, HEAVY SNOW/HIGH WIND, HIGH WIND DAMAGE, WINTER STORM/HIGH WINDS, TRONG WIND, HIGH WINDS HEAVY RAINS, HIGH WINDS DUST STORM, TSTM WIND 51.
    4. FIRE : All event types that include the word FIRE. This includes terms like : WILDFIRE, WILD/FOREST FIRE, WILD FIRES, GRASS FIRES.
    5. STORM : All event types that include the words STORM, GLAZE, HAIL, WETNESS, LIGHTNING, RAIN and BLIZZARD. This includes terms like : LIGHTING, HAIL, WINTER STORM, LIGHTNING/HEAVY RAIN, BLIZZARD WEATHER, BLIZZARD/FREEZING RAIN, HAIL DAMAGE, STORM SURGE/TIDE.
    6. COLD : All event types that include the words COLD, LOW TEMPERATURE, WINTRY, WINTER and FREEZE. This includes terms like : COLD/WIND CHILL, PROLONG COLD, SEVERE COLD, COOL AND WET, RECORD COLD, COLD TEMPERATURES.
    7. SNOW : All event types that include the word SNOW. This includes terms like : HEAVY SNOWPACK, ICE/SNOW, SNOWSTORM, ICE STORM AND SNOW, HEAVY SNOW/SLEET, WET SNOW, SNOW ADVISORY.
    8. FLOOD : All event types that include the words FLOOD and STREAM. This includes terms like : FLOODS, RURAL FLOOD, FLASH FLOODS, URBAN/SMALL STREAM FLOODING, HEAVY STREAM FLOODING, URBAN FLOODING, LAKESHORE FLOOD, RIVER AND STREAM FLOOD.
    9. HEAT : All event types that include the words HEAT and HOT. This includes terms like : EXCESSIVE HEAT, SEVERE HEAT, HEAT WAVE, HOT WEATHER.
    10. SURF : All event types that include the words SURF, SEAS, MARINE, CURRENT and TSUNAMI. This includes terms like : HIGH SURF, TSUNAMI.
    11. FOG : All event types that include the word FOG. This includes terms like : FREEZING FOG, PATCHY DENSE FOG.
    12. DRY : All event types that include the words DRY and DROUGHT.
    13. AVALANCHE : All event types that include the words LANDSLIDE, LAND and AVALANCHE.
    14. ICE : All event types that include the words ICE, ICY and FROST. This includes terms like : ICE FLOES, FROST\FREEZE, ICE JAM, FIRST FROST.

    Note that the order of precedence in the substitution process is given by the previous list, so an event type called HURRICANE OPAL/HIGH WINDS is assigned to the HURRICANE class instead of the WIND class, which comes later in the list. We also emphasize that, before applying the classification process, we convert all event types to upper-case strings so that the substitution can be applied robustly. More details about this process are shown in the Data Processing section.

  3. The database contains 902,297 records, but we consider incomplete any record that has a zero value for the specific analysis variable (injuries/fatalities/damages). Therefore, to determine how much the data cleaning process retains, we compute for each variable a ratio of information captured and a ratio of records captured, giving the user a general view of how much information is used in the exploratory analysis.

  4. As the client is interested in how the quantities are distributed around the country, we create several USA maps with the data aggregated to the state capitol locations. This requires loading additional geographical data (the state capitol locations from a CSV file and the state polygons) to construct these maps.

The next sections develop the report step by step.

Data Processing

Loading libraries.

To execute the R code, the following libraries must be loaded.

library(ggplot2)
library(gridExtra)
library(maps)
library(xtable)
library(reshape)
library(R.utils)  # provides bunzip2(), used when reading the database

Configure data filtering and visualization.

The filter coefficient (noaa.data.filter.alfa) is used together with the mean value obtained for each analysis variable to select the event types that will be considered. This process keeps all the event types whose total injuries/fatalities/damages exceed (1-noaa.data.filter.alfa)*mean.

noaa.data.filter.alfa <- 0.95

The following parameter sets the number of states shown in the data tables.

noaa.data.states.table.size <- 30

The proposed value of 0.95 for noaa.data.filter.alfa achieves very good results in the event type classification, but the user can experiment with other values.
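As a minimal sketch of how this filter behaves, the toy example below applies the threshold (1 - alfa) * mean to a handful of per-event totals and reports the percentage of the overall total that the kept event types capture. The `totals` data frame and its values are invented for illustration and are not taken from the database.

```r
# Hypothetical per-event totals; the values are made up for illustration.
totals <- data.frame(
  Event    = c("TORNADO", "FLOOD", "FOG", "DUST DEVIL"),
  Injuries = c(900, 90, 9, 1),
  stringsAsFactors = FALSE
)
alfa <- 0.95

# Threshold: (1 - alfa) * mean of the per-event totals.
threshold <- (1 - alfa) * mean(totals$Injuries)

# Keep only the event types whose total exceeds the threshold.
kept <- totals[totals$Injuries > threshold, ]

# Percentage of the overall total captured by the kept event types.
captured.ratio <- 100 * sum(kept$Injuries) / sum(totals$Injuries)
```

In this toy case the two smallest event types are dropped while most of the total is retained, which is the trade-off the coefficient is meant to control.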

Load USA states capitols location data.

In this section we load the state capitol geographical locations from the file 'state_locations.csv' and the state polygon geographical data (library maps). We also change the capitol location of some states (like Alaska) in order to create nicer map plots.

## Read USA states capitols latitude and longitude.
states.relocations <- c('AK','HI','AS','PR','VI')
states.capitols.locations <- read.csv('state_locations.csv',  header=TRUE)
states.capitols.locations$State <- as.factor(states.capitols.locations$state)
states.capitols.locations[which(states.capitols.locations$state %in% states.relocations),]$longitude <- -120
latitude = 20
for (index in 1:length(states.relocations)) {
  states.capitols.locations$latitude[states.capitols.locations$state == states.relocations[index]] <- latitude
  latitude <- latitude + 2
}

## Read the USA states polygon maps.
states.poligon.data <- map_data("state")
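The relocation loop above can also be written without an explicit loop. The following is only a sketch on a toy data frame (the `locations` rows and coordinates are illustrative, not the contents of 'state_locations.csv'); it produces the same layout as the loop: longitude -120 and latitudes 20, 22, 24, ... in the order of the relocation list.

```r
# Hypothetical capitol locations; coordinates are illustrative only.
locations <- data.frame(
  state     = c("AK", "TX", "HI", "PR"),
  latitude  = c(58.3, 30.3, 21.3, 18.5),
  longitude = c(-134.4, -97.7, -157.9, -66.1),
  stringsAsFactors = FALSE
)
relocations <- c("AK", "HI", "AS", "PR", "VI")

# Position of each state in the relocation list (NA if it is not relocated).
slot <- match(locations$state, relocations)
moved <- !is.na(slot)

# Same layout as the loop: fixed longitude, latitude growing by 2 per slot.
locations$longitude[moved] <- -120
locations$latitude[moved] <- 20 + 2 * (slot[moved] - 1)
```

States that are not in the relocation list keep their original coordinates.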

Read USA NOAA Storm Database.

At this point we read the USA NOAA Storm Database, unzipping the compressed file first only if needed.

## Read NOAA Storm Database.
csv.col.classes = c("NULL", "character", rep("NULL", 4), "character", "character", rep("NULL", 14), "numeric", "numeric", "numeric", "character", "numeric", "character", rep("NULL", 3), "numeric", "numeric", rep("NULL", 3), "numeric")
if (!file.exists('repdata-data-StormData.csv')) {
  ## bunzip2() is provided by the R.utils package.
  bunzip2('repdata-data-StormData.csv.bz2')
}
noaa.data <- read.csv('repdata-data-StormData.csv',  header=TRUE, colClasses=csv.col.classes) # , nrows=1000000
names(noaa.data)[1]<-"DATE"

Before the cleaning process we convert some columns of the loaded data to more suitable types (a date column and factor columns) to speed up later computations.

## Convert some data columns to their proper data types.
noaa.data$DATE <- as.Date(noaa.data$DATE, "%m/%d/%Y %H:%M:%S")
noaa.data$STATE<- as.factor(noaa.data$STATE)
noaa.data$EVTYPE <- as.factor(noaa.data$EVTYPE)
noaa.data$PROPDMGEXP<- as.factor(noaa.data$PROPDMGEXP)
noaa.data$CROPDMGEXP<- as.factor(noaa.data$CROPDMGEXP)

We also compute new columns: one with the month and year of each event (SHORTDATE column), used in the temporal data plots, and one with the event damages in million dollars (DAMAGES column), computed under the assumption that CROPDMG info takes precedence over PROPDMG info, as commented in the documentation of the database.

## Add a short date field that includes the month and year (it gives better plots than the full date).
noaa.data$SHORTDATE <-  as.Date(cut(noaa.data$DATE,  breaks = "month"))

## Add a DAMAGES column that contains the total damages related to the event in million dollars. CROPDMG info takes precedence over PROPDMG info.
noaa.data$PROPDMG_TOTAL <- ifelse(noaa.data$PROPDMGEXP == 'B', 1e3*noaa.data$PROPDMG, ifelse(noaa.data$PROPDMGEXP == 'M', noaa.data$PROPDMG, ifelse(noaa.data$PROPDMGEXP == 'm', noaa.data$PROPDMG, ifelse(noaa.data$PROPDMGEXP == 'K', 1e-3*noaa.data$PROPDMG, ifelse(noaa.data$PROPDMGEXP == 'h', 1e-4*noaa.data$PROPDMG, 0)))) ) 
noaa.data$CROPDMG_TOTAL <- ifelse(noaa.data$CROPDMGEXP == 'B', 1e3*noaa.data$CROPDMG, ifelse(noaa.data$CROPDMGEXP == 'M', noaa.data$CROPDMG, ifelse(noaa.data$CROPDMGEXP == 'm',noaa.data$CROPDMG, ifelse(noaa.data$CROPDMGEXP == 'K', 1e-3*noaa.data$CROPDMG, ifelse(noaa.data$CROPDMGEXP == 'h', 1e-4*noaa.data$CROPDMG, 0)))) ) 
noaa.data$DAMAGES <- ifelse(noaa.data$CROPDMG_TOTAL > 0, noaa.data$CROPDMG_TOTAL, noaa.data$PROPDMG_TOTAL) 
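The nested ifelse() calls above map each exponent code to a multiplier that converts the figure to million dollars. As an alternative sketch (not the code used in this report), the same mapping can be expressed as a named lookup vector. The helper name `to.millions` is hypothetical, and any code not listed is treated as zero, as in the original expressions.

```r
# Multipliers that convert a damage figure to million dollars,
# mirroring the exponent codes handled by the nested ifelse() calls.
exp.to.millions <- c(B = 1e3, M = 1, m = 1, K = 1e-3, h = 1e-4)

# Hypothetical helper (not part of the original analysis).
to.millions <- function(value, exponent) {
  multiplier <- exp.to.millions[as.character(exponent)]
  multiplier[is.na(multiplier)] <- 0  # unknown or empty codes count as zero
  unname(value * multiplier)
}

# Toy input: 2.5 billion, 10 million, 500 thousand and an unknown code.
toy.damages <- to.millions(c(2.5, 10, 500, 7), c("B", "M", "K", "?"))
```

A lookup vector keeps the list of recognised codes in one place, so adding another code would only require one new entry.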

Getting and cleaning data.

The first step in this process is the event type classification, that is, creating the definitive list of event types used in the analysis. To carry out this task we apply the following steps.

1.- Compute the NOAA Storm Database aggregation sum of injuries/fatalities/damages by event type.

## Compute the NOAA Storm Database aggregation sum of injuries/fatalities/damages by event type.
noaa.data.injuries.by.eventype.unfilter <- aggregate(noaa.data$INJURIES,by=list(noaa.data$EVTYPE), sum) 
names(noaa.data.injuries.by.eventype.unfilter)[1]<-"Event"
names(noaa.data.injuries.by.eventype.unfilter)[2]<-"Injuries"
noaa.data.fatalities.by.eventype.unfilter <- aggregate(noaa.data$FATALITIES,by=list(noaa.data$EVTYPE), sum) 
names(noaa.data.fatalities.by.eventype.unfilter)[1]<-"Event"
names(noaa.data.fatalities.by.eventype.unfilter)[2]<-"Fatalities"
noaa.data.damages.by.eventype.unfilter <- aggregate(noaa.data$DAMAGES,by=list(noaa.data$EVTYPE), sum) 
names(noaa.data.damages.by.eventype.unfilter)[1]<-"Event"
names(noaa.data.damages.by.eventype.unfilter)[2]<-"Damages"

2.- Delete the event types with zero injuries/fatalities/damages.

## Delete the event types with zero injuries/fatalities/damages.
noaa.data.injuries.by.eventype.unfilter <- noaa.data.injuries.by.eventype.unfilter[noaa.data.injuries.by.eventype.unfilter$Injuries > 0,]
noaa.data.damages.by.eventype.unfilter <- noaa.data.damages.by.eventype.unfilter[noaa.data.damages.by.eventype.unfilter$Damages > 0,]
noaa.data.fatalities.by.eventype.unfilter <- noaa.data.fatalities.by.eventype.unfilter[noaa.data.fatalities.by.eventype.unfilter$Fatalities > 0,]

3.- Compute the mean of the remaining event types.

## Compute the mean of the remaining event types.
noaa.data.injuries.by.eventype.unfilter.mean <- mean(noaa.data.injuries.by.eventype.unfilter$Injuries)
noaa.data.fatalities.by.eventype.unfilter.mean <- mean(noaa.data.fatalities.by.eventype.unfilter$Fatalities)
noaa.data.damages.by.eventype.unfilter.mean <- mean(noaa.data.damages.by.eventype.unfilter$Damages)

4.- Apply the filter coefficient, so we extract data with values over (1-alfa)*mean.

## Apply the filter coefficient noaa.data.filter.alfa and the mean to extract the event types that will be considered.
noaa.data.injuries.by.eventype.unfilter <- noaa.data.injuries.by.eventype.unfilter[noaa.data.injuries.by.eventype.unfilter$Injuries > (1-noaa.data.filter.alfa)*noaa.data.injuries.by.eventype.unfilter.mean ,]
noaa.data.fatalities.by.eventype.unfilter <- noaa.data.fatalities.by.eventype.unfilter[noaa.data.fatalities.by.eventype.unfilter$Fatalities > (1-noaa.data.filter.alfa)*noaa.data.fatalities.by.eventype.unfilter.mean,]
noaa.data.damages.by.eventype.unfilter <- noaa.data.damages.by.eventype.unfilter[noaa.data.damages.by.eventype.unfilter$Damages > (1-noaa.data.filter.alfa)*noaa.data.damages.by.eventype.unfilter.mean,]

5.- Calculate the ratios of information captured (the sum of the captured information divided by the total sum, as a percentage).

## Calculate the ratios of information captured.
noaa.data.injuries.total <- sum(noaa.data$INJURIES)
noaa.data.injuries.by.eventype.unfilter.total <- sum(noaa.data.injuries.by.eventype.unfilter$Injuries)
noaa.data.injuries.total.ratio <- 100*(noaa.data.injuries.by.eventype.unfilter.total / noaa.data.injuries.total)
noaa.data.fatalities.total <- sum(noaa.data$FATALITIES)
noaa.data.fatalities.by.eventype.unfilter.total <- sum(noaa.data.fatalities.by.eventype.unfilter$Fatalities)
noaa.data.fatalities.total.ratio <- 100*(noaa.data.fatalities.by.eventype.unfilter.total / noaa.data.fatalities.total)
noaa.data.damages.total <- sum(noaa.data$DAMAGES)
noaa.data.damages.by.eventype.unfilter.total <- sum(noaa.data.damages.by.eventype.unfilter$Damages)
noaa.data.damages.total.ratio <- 100*(noaa.data.damages.by.eventype.unfilter.total / noaa.data.damages.total)

With the filter coefficient noaa.data.filter.alfa =0.95 we have captured:

To create an initial event type list we concatenate the event type lists obtained for injuries, fatalities and damages into a single list of unique events.

## Create the list of events to apply the filter. 
noaa.data.injuries.evtypes <- toupper(as.character(noaa.data.injuries.by.eventype.unfilter$Event))
noaa.data.fatalities.evtypes <- toupper(as.character(noaa.data.fatalities.by.eventype.unfilter$Event))
noaa.data.damages.evtypes <- toupper(as.character(noaa.data.damages.by.eventype.unfilter$Event))
noaa.data.evtypes.unfilter <- unique(c(noaa.data.injuries.evtypes, noaa.data.fatalities.evtypes, noaa.data.damages.evtypes))

Before creating the definitive event type list we filter the data using the candidate event type list.

## Filter the data by considering only the list of events obtained. 
noaa.data.injuries <- noaa.data[which(noaa.data$EVTYPE %in% noaa.data.evtypes.unfilter) ,]
noaa.data.fatalities <- noaa.data[which(noaa.data$EVTYPE %in% noaa.data.evtypes.unfilter) ,]
noaa.data.damages <- noaa.data[which(noaa.data$EVTYPE %in% noaa.data.evtypes.unfilter) ,]

Now we delete the events with zero injuries/fatalities/damages, which we consider incomplete cases for each specific variable.

## Delete the events with zero injuries/fatalities/damages. 
noaa.data.injuries <- noaa.data.injuries[noaa.data.injuries$INJURIES > 0 ,]
noaa.data.fatalities <- noaa.data.fatalities[noaa.data.fatalities$FATALITIES > 0 ,]
noaa.data.damages <- noaa.data.damages[noaa.data.damages$DAMAGES > 0 ,]

Finally let's compute the ratio of rows that we have captured for the analysis:

## Compute the ratio of the number of rows with information that have been captured.
noaa.data.injuries.total.count <- nrow(noaa.data[noaa.data$INJURIES > 0 ,])
noaa.data.fatalities.total.count <- nrow(noaa.data[noaa.data$FATALITIES > 0 ,])
noaa.data.damages.total.count <- nrow(noaa.data[noaa.data$DAMAGES  > 0 ,])
noaa.data.injuries.count <- nrow(noaa.data.injuries)
noaa.data.fatalities.count <- nrow(noaa.data.fatalities)
noaa.data.damages.count <- nrow(noaa.data.damages)
noaa.data.injuries.ratio.count <- 100*(noaa.data.injuries.count / noaa.data.injuries.total.count )
noaa.data.fatalities.ratio.count <- 100*(noaa.data.fatalities.count / noaa.data.fatalities.total.count)
noaa.data.damages.ratio.count <- 100*(noaa.data.damages.count / noaa.data.damages.total.count)

The ratios of rows captured are:

To obtain the definitive list of event types we count the words that appear in the current event type list.

## Create the table that counts the words present in the event types.
noaa.data.events.words.frequency <- data.frame(table(do.call(c, lapply(noaa.data.evtypes.unfilter, function(x) unlist(strsplit(gsub("[/,]", " ", x), " "))))))
names(noaa.data.events.words.frequency)[1] <- "Word"
names(noaa.data.events.words.frequency)[2] <- "Frequency"
noaa.data.events.words.frequency.candidates <- noaa.data.events.words.frequency[with(noaa.data.events.words.frequency, order(-Frequency)), ][1:50,]
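To make the word counting step concrete, here is a small sketch of the same idea on three invented event types (the `evtypes` vector is illustrative only): separators are replaced by spaces, the strings are split into words, and the words are tabulated and sorted by frequency.

```r
# Toy event types, invented for illustration.
evtypes <- c("HIGH WIND", "HEAVY SNOW/HIGH WIND", "FLASH FLOOD")

# Replace '/' and ',' with spaces, split on spaces, and count each word.
words <- unlist(strsplit(gsub("[/,]", " ", evtypes), " "))
frequency <- as.data.frame(table(Word = words), responseName = "Frequency")

# Most frequent words first.
frequency <- frequency[order(-frequency$Frequency), ]
```

Note that two consecutive separators (for example a slash followed by a space) yield an empty string when splitting, which is one plausible reason for a blank word to show up in such a table.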

The first 50 word candidates by frequency are:

     Word          Frequency
89   WIND                 12
25   FLOOD                11
7    COLD                  8
36   HEAT                  7
37   HEAVY                 7
62   STORM                 7
23   FLASH                 5
38   HIGH                  5
39   HURRICANE             5
66   SURF                  5
83   WEATHER               5
92   WINTER                5
20   EXTREME               4
26   FLOODING              4
47   MARINE                4
51   RAIN                  4
61   SNOW                  4
69   THUNDERSTORM          4
74   TSTM                  4
91   WINDS                 4
2    AND                   3
19   EXCESSIVE             3
29   FREEZE                3
31   FROST                 3
34   HAIL                  3
49   MIX                   3
53   RECORD                3
54   RIP                   3
65   STRONG                3
1                          2
5    CHILL                 2
6    COASTAL               2
10   CURRENTS              2
27   FOG                   2
40   ICE                   2
58   SEAS                  2
59   SEVERE                2
67   SURGE                 2
71   TORNADO               2
73   TROPICAL              2
76   TYPHOON               2
77   UNSEASONABLY          2
79   WARM                  2
86   WILD                  2
3    AVALANCHE             1
4    BLIZZARD              1
8    CONDITIONS            1
9    CURRENT               1
11   DAMAGING              1
12   DENSE                 1

After analysing this word frequency table (the analysis itself is out of the scope of this work) we created the substitution list presented in the Synopsis, using the following event type substitution information.

## Create the list of substitutions for the event type classification.
noaa.data.fields.to.substitute <- c('(.*)TORNADO(.*)', '(.*)HURRICANE(.*)', '(.*)TYPHOON(.*)', '(.*)WIND(.*)', '(.*)FIRE(.*)', '(.*)STORM(.*)', '(.*)GLAZE(.*)', '(.*)HAIL(.*)','(.*)WETNESS(.*)', '(.*)LIGHTNING(.*)', '(.*)RAIN(.*)', '(.*)BLIZZARD(.*)', '(.*)COLD(.*)', '(.*)LOW TEMPERATURE(.*)', '(.*)WINTRY(.*)', '(.*)WINTER(.*)', '(.*)FREEZE(.*)', '(.*)SNOW(.*)', '(.*)FLOOD(.*)', '(.*)STREAM(.*)', '(.*)HEAT(.*)', '(.*)HOT(.*)', '(.*)SURF(.*)', '(.*)SEAS(.*)', '(.*)MARINE(.*)', '(.*)CURRENT(.*)', '(.*)TSUNAMI(.*)', '(.*)FOG(.*)', '(.*)DRY(.*)', '(.*)DROUGHT(.*)', '(.*)LANDSLIDE(.*)', '(.*)AVALANCHE(.*)', '(.*)LAND(.*)', '(.*)ICE(.*)', '(.*)ICY(.*)', '(.*)FROST(.*)')
noaa.data.fields.substitutes <- c('TORNADO','HURRICANE','HURRICANE', 'WIND', 'FIRE', 'STORM', 'STORM', 'STORM', 'STORM', 'STORM', 'STORM', 'STORM', 'COLD', 'COLD', 'COLD', 'COLD', 'COLD', 'SNOW', 'FLOOD', 'FLOOD', 'HEAT', 'HEAT', 'SURF', 'SURF', 'SURF', 'SURF', 'SURF', 'FOG', 'DRY', 'DRY', 'AVALANCHE', 'AVALANCHE', 'AVALANCHE', 'ICE', 'ICE', 'ICE')
noaa.data.fields.substitutions <- data.frame(cbind(noaa.data.fields.to.substitute, noaa.data.fields.substitutes))
names(noaa.data.fields.substitutions)[1]<-"substitution"
names(noaa.data.fields.substitutions)[2]<-"substitute"
noaa.data.fields.substitutions.len <- nrow(noaa.data.fields.substitutions)

## Create the unique (no repeats) definitive list of event types.
noaa.data.evtypes <- unique(noaa.data.fields.substitutes)

At this point we apply the substitution list to generate the final data:

## Apply the substitution list.
for (row in 1:noaa.data.fields.substitutions.len) {
  data.substitution <- noaa.data.fields.substitutions[row,]
  substitution <- as.character(data.substitution[,1])
  substitute <- as.character(data.substitution[,2])
  noaa.data.injuries$EVTYPE <- gsub(substitution, substitute, noaa.data.injuries$EVTYPE)
  noaa.data.fatalities$EVTYPE <- gsub(substitution, substitute, noaa.data.fatalities$EVTYPE)
  noaa.data.damages$EVTYPE <- gsub(substitution, substitute, noaa.data.damages$EVTYPE)
}
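The effect of the rule order can be checked on a few strings. The sketch below uses a shortened, illustrative rule list (three of the patterns above) and invented event types; it is not the full classification. Because each pattern replaces the whole string with the class name, an event type matched by an earlier rule can no longer be matched by a later one.

```r
# A shortened, illustrative rule list in precedence order.
patterns     <- c("(.*)HURRICANE(.*)", "(.*)WIND(.*)", "(.*)SNOW(.*)")
replacements <- c("HURRICANE", "WIND", "SNOW")

# Toy event types, upper-cased before matching.
evtypes <- toupper(c("Hurricane Opal/High Winds",
                     "HEAVY SNOW/HIGH WIND",
                     "WET SNOW"))

# Apply the rules one by one; earlier rules take precedence.
for (i in seq_along(patterns)) {
  evtypes <- gsub(patterns[i], replacements[i], evtypes)
}
```

With this order, the first string ends up in the HURRICANE class and the second in the WIND class, even though both also contain words that belong to later rules.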

Finally, we convert the event type column to a factor to speed up later operations.

## Add factors.
noaa.data.injuries$EVTYPE <- as.factor(noaa.data.injuries$EVTYPE)
noaa.data.fatalities$EVTYPE <- as.factor(noaa.data.fatalities$EVTYPE)
noaa.data.damages$EVTYPE <- as.factor(noaa.data.damages$EVTYPE)

Data computations.

Now that we have filtered the data, we execute the code that provides the information needed to answer the proposed questions. This task is carried out in the following steps:

1.- First we compute the aggregation sum of injuries/fatalities/damages by event type and the ratio of fatalities to injuries, which gives us qualitative information about how dangerous the event types are.

## Compute the NOAA Storm Database aggregation sum of injuries/fatalities/damages by event type.
noaa.data.injuries.by.eventype <- aggregate(noaa.data.injuries$INJURIES,by=list(noaa.data.injuries$EVTYPE), sum) 
names(noaa.data.injuries.by.eventype)[1]<-"Event"
names(noaa.data.injuries.by.eventype)[2]<-"Injuries"
noaa.data.fatalities.by.eventype <- aggregate(noaa.data.fatalities$FATALITIES,by=list(noaa.data.fatalities$EVTYPE), sum) 
names(noaa.data.fatalities.by.eventype)[1]<-"Event"
names(noaa.data.fatalities.by.eventype)[2]<-"Fatalities"
noaa.data.damages.by.eventype <- aggregate(noaa.data.damages$DAMAGES,by=list(noaa.data.damages$EVTYPE), sum) 
names(noaa.data.damages.by.eventype)[1]<-"Event"
names(noaa.data.damages.by.eventype)[2]<-"Damages"

## Compute the danger ratio (fatalities / injuries) per event type.
noaa.data.event.dangerous <- merge(noaa.data.fatalities.by.eventype, noaa.data.injuries.by.eventype, by.x="Event", by.y="Event")
noaa.data.event.dangerous$ratio <- noaa.data.event.dangerous$Fatalities / noaa.data.event.dangerous$Injuries

2.- The next step is to compute the aggregation sum of injuries/fatalities/damages by event type and state.

## Compute the NOAA Storm Database aggregation sum of injuries/fatalities/damages by event type and state.
noaa.data.injuries.by.eventype.state <- aggregate(noaa.data.injuries$INJURIES,by=list(noaa.data.injuries$STATE,noaa.data.injuries$EVTYPE),sum)
names(noaa.data.injuries.by.eventype.state)[1]<-"State"
names(noaa.data.injuries.by.eventype.state)[2]<-"Event"
names(noaa.data.injuries.by.eventype.state)[3]<-"Injuries"
noaa.data.fatalities.by.eventype.state <- aggregate(noaa.data.fatalities$FATALITIES,by=list(noaa.data.fatalities$STATE,noaa.data.fatalities$EVTYPE),sum)
names(noaa.data.fatalities.by.eventype.state)[1]<-"State"
names(noaa.data.fatalities.by.eventype.state)[2]<-"Event"
names(noaa.data.fatalities.by.eventype.state)[3]<-"Fatalities"
noaa.data.damages.by.eventype.state <- aggregate(noaa.data.damages$DAMAGES,by=list(noaa.data.damages$STATE,noaa.data.damages$EVTYPE),sum)
names(noaa.data.damages.by.eventype.state)[1]<-"State"
names(noaa.data.damages.by.eventype.state)[2]<-"Event"
names(noaa.data.damages.by.eventype.state)[3]<-"Damages"

3.- Now we obtain the aggregation sum of injuries/fatalities/damages by state.

## Compute the NOAA Storm Database aggregation sum of injuries/fatalities/damages by state.
noaa.data.injuries.by.state <- aggregate(noaa.data.injuries$INJURIES,by=list(noaa.data.injuries$STATE),sum)
names(noaa.data.injuries.by.state)[1]<-"State"
names(noaa.data.injuries.by.state)[2]<-"Injuries"
noaa.data.fatalities.by.state <- aggregate(noaa.data.fatalities$FATALITIES,by=list(noaa.data.fatalities$STATE),sum)
names(noaa.data.fatalities.by.state)[1]<-"State"
names(noaa.data.fatalities.by.state)[2]<-"Fatalities"
noaa.data.damages.by.state <- aggregate(noaa.data.damages$DAMAGES,by=list(noaa.data.damages$STATE),sum)
names(noaa.data.damages.by.state)[1]<-"State"
names(noaa.data.damages.by.state)[2]<-"Damages"

## Compute statistical values (mean and standard deviation) by state.
noaa.data.injuries.by.state.mean <- mean(noaa.data.injuries.by.state$Injuries)
noaa.data.fatalities.by.state.mean <- mean(noaa.data.fatalities.by.state$Fatalities)
noaa.data.damages.by.state.mean <- mean(noaa.data.damages.by.state$Damages)
noaa.data.injuries.by.state.sd <- sd(noaa.data.injuries.by.state$Injuries)
noaa.data.fatalities.by.state.sd <- sd(noaa.data.fatalities.by.state$Fatalities)
noaa.data.damages.by.state.sd <- sd(noaa.data.damages.by.state$Damages)

4.- We bind the information of the geographical location of the state capitols to the data obtained in order to generate the USA map plots.

## Bind the location (latitude and longitude) of the state capitols.
noaa.data.injuries.by.eventype.state <- merge(noaa.data.injuries.by.eventype.state, states.capitols.locations, by.x="State", by.y="State")
noaa.data.fatalities.by.eventype.state <- merge(noaa.data.fatalities.by.eventype.state, states.capitols.locations, by.x="State", by.y="State")
noaa.data.damages.by.eventype.state <- merge(noaa.data.damages.by.eventype.state, states.capitols.locations, by.x="State", by.y="State")

5.- Let's compute the aggregation sum of injuries/fatalities by event type and short date, that is, month and year of the event.

## Compute the NOAA Storm Database aggregation sum of injuries/fatalities by short date (month and year) and event type.
noaa.data.injuries.by.date <- aggregate(noaa.data.injuries$INJURIES,by=list(noaa.data.injuries$SHORTDATE, noaa.data.injuries$EVTYPE), sum) 
names(noaa.data.injuries.by.date)[1]<-"Date"
names(noaa.data.injuries.by.date)[2]<-"Event"
names(noaa.data.injuries.by.date)[3]<-"Injuries"
noaa.data.fatalities.by.date <- aggregate(noaa.data.fatalities$FATALITIES,by=list(noaa.data.fatalities$SHORTDATE, noaa.data.fatalities$EVTYPE), sum) 
names(noaa.data.fatalities.by.date)[1]<-"Date"
names(noaa.data.fatalities.by.date)[2]<-"Event"
names(noaa.data.fatalities.by.date)[3]<-"Fatalities"

6.- Finally we create the data tables used to print relevant information in the exploratory analysis.

## Compute the tables of the most affected states and the number of injuries/fatalities/damages per event type.
noaa.data.injuries.by.eventype.state.melted <- melt(noaa.data.injuries.by.eventype.state, id=c("State", "Event"), measure.vars=c("Injuries"))
names(noaa.data.injuries.by.eventype.state.melted)[4] <- "Injuries"
noaa.data.injuries.by.eventype.state.melted <- cast(noaa.data.injuries.by.eventype.state.melted, State ~ Event, fill=FALSE, value = "Injuries")
noaa.data.injuries.by.eventype.state.melted$TOTAL <- rowSums(noaa.data.injuries.by.eventype.state.melted[,2:14])
noaa.data.injuries.by.eventype.state.table <-noaa.data.injuries.by.eventype.state.melted[order(-noaa.data.injuries.by.eventype.state.melted$TOTAL), ][1:noaa.data.states.table.size,]

noaa.data.fatalities.by.eventype.state.melted <- melt(noaa.data.fatalities.by.eventype.state, id=c("State", "Event"), measure.vars=c("Fatalities"))
names(noaa.data.fatalities.by.eventype.state.melted)[4] <- 'Fatalities'
noaa.data.fatalities.by.eventype.state.melted <- cast(noaa.data.fatalities.by.eventype.state.melted, State ~ Event, fill=FALSE, value = "Fatalities")
noaa.data.fatalities.by.eventype.state.melted$TOTAL <- rowSums(noaa.data.fatalities.by.eventype.state.melted[,2:14])
noaa.data.fatalities.by.eventype.state.table <-noaa.data.fatalities.by.eventype.state.melted[order(-noaa.data.fatalities.by.eventype.state.melted$TOTAL), ][1:noaa.data.states.table.size,]

noaa.data.damages.by.eventype.state.melted <- melt(noaa.data.damages.by.eventype.state, id=c("State", "Event"), measure.vars=c("Damages"))
names(noaa.data.damages.by.eventype.state.melted)[4] <- "Damages"
noaa.data.damages.by.eventype.state.melted <- cast(noaa.data.damages.by.eventype.state.melted, State ~ Event, fill=FALSE, value = "Damages")
noaa.data.damages.by.eventype.state.melted$TOTAL <- rowSums(noaa.data.damages.by.eventype.state.melted[,2:14])
noaa.data.damages.by.eventype.state.table <-noaa.data.damages.by.eventype.state.melted[order(-noaa.data.damages.by.eventype.state.melted$TOTAL), ][1:noaa.data.states.table.size,]
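The tables above rely on the event columns sitting in positions 2:14 of the cast result. As an alternative sketch that does not depend on the number of event columns, the same State by Event layout can be built with base R's xtabs(). This is not the method used in this report, and the `long` data frame below is invented for illustration.

```r
# Toy long-format totals, invented for illustration.
long <- data.frame(
  State    = c("TX", "TX", "OK", "FL"),
  Event    = c("TORNADO", "FLOOD", "TORNADO", "HURRICANE"),
  Injuries = c(10, 4, 7, 2)
)

# State x Event table; combinations with no data become 0.
counts <- xtabs(Injuries ~ State + Event, data = long)

# Row totals over every event column, whatever their number.
totals <- rowSums(counts)

# States ordered by total, largest first.
ranking <- counts[order(-totals), , drop = FALSE]
```

Taking the first rows of `ranking` would then play the role of the noaa.data.states.table.size cut applied above.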

Results

In this section we explore the results and present the answers to the proposed questions.

Across the United States, which types of events are most harmful with respect to population health?

To answer the first question we start by visualizing the evolution over time and the event type distribution of the population health data, that is, injuries and fatalities. The next figure shows these results.

## Plot the time evolution of total injuries and fatalities.
## Axis labels are set with xlab()/ylab() rather than passed as arguments to ggplot()/qplot().
g.plot.injuries.evolution <- ggplot() + xlab("Date") + ylab("Injuries") + scale_colour_discrete(name="Event type")
g.plot.injuries.evolution <- g.plot.injuries.evolution + geom_line(data=noaa.data.injuries.by.date, aes(x=Date, y=Injuries, colour=Event))
g.plot.injuries.evolution <- g.plot.injuries.evolution + ggtitle("Injuries by date and event type.")
g.plot.injuries.evolution <- g.plot.injuries.evolution + guides(colour=FALSE, fill=FALSE)
g.plot.fatalities.evolution <- ggplot() + xlab("Date") + ylab("Fatalities") + ggtitle("Fatalities by date and event type.") + scale_colour_discrete(name="Event type")
g.plot.fatalities.evolution <- g.plot.fatalities.evolution + geom_line(data=noaa.data.fatalities.by.date, aes(x=Date, y=Fatalities, colour=Event))
g.plot.fatalities.evolution <- g.plot.fatalities.evolution + guides(fill=FALSE)

## Plot the total injuries/fatalities  by event type.
g.plot.injuries.by.event.type <- ggplot(data=noaa.data.injuries.by.eventype , aes(x=Event, y=Injuries, colour=Event, fill=Event))
g.plot.injuries.by.event.type <- g.plot.injuries.by.event.type + geom_bar(stat="identity") 
g.plot.injuries.by.event.type <- g.plot.injuries.by.event.type + xlab("Event type") + ylab("Injuries")  + ggtitle("Total injuries by event type.") 
g.plot.injuries.by.event.type <- g.plot.injuries.by.event.type + labs(fill="Event type") + guides(colour=FALSE, fill=FALSE)
g.plot.fatalities.by.event.type <- ggplot(data=noaa.data.fatalities.by.eventype , aes(x=Event, y=Fatalities, colour=Event, fill=Event))
g.plot.fatalities.by.event.type <- g.plot.fatalities.by.event.type + geom_bar(stat="identity") 
g.plot.fatalities.by.event.type <- g.plot.fatalities.by.event.type + xlab("Event type") + ylab("Fatalities")  + ggtitle("Total fatalities by event type.") 
g.plot.fatalities.by.event.type <- g.plot.fatalities.by.event.type + labs(fill="Event type") + guides(colour=FALSE)

## Plot grid.
grid.arrange(g.plot.injuries.evolution, g.plot.fatalities.evolution, g.plot.injuries.by.event.type, g.plot.fatalities.by.event.type, nrow = 2, ncol = 2, main = "Injuries and fatalities summary.")

From this plot we can observe several major characteristics of the data:

  1. The TORNADO event type is the principal cause of injuries and fatalities. It appears throughout the whole time span, but occasionally causes large peaks in the time evolution. This suggests that some tornadoes have a much larger effect than others, so if information about the types of tornadoes can be obtained, a detailed analysis could reveal new information.
  2. Event types WIND and STORM also appear throughout the whole time span but cause no large peaks, so they accumulate data with less variation than the other event types.
  3. Event types HEAT and FLOOD appear occasionally and cause a considerable increase in the number of injuries and fatalities, but very concentrated in time.
  4. Event types AVALANCHE and SURF have the highest ratios of fatalities to injuries, which implies that these kinds of events are qualitatively (not quantitatively) more dangerous.
  5. Event type HURRICANE causes fewer injuries/fatalities than expected given the nature of the event, but we must also consider that this kind of event can be predicted, which helps to save lives.
  6. The difficulty of predicting AVALANCHE and SURF events could also be a good clue to understand why these kinds of events have the highest danger ratios, and how important the availability of prediction models is for reducing the impact on population health.

The danger ratios (fatalities / injuries) for the event types are:

    Event      ratio
1   AVALANCHE   1.18
11  SURF        0.88
6   HEAT        0.34
2   COLD        0.32
4   FLOOD       0.18
9   SNOW        0.13
13  WIND        0.12
10  STORM       0.12
7   HURRICANE   0.10
5   FOG         0.07
8   ICE         0.07
12  TORNADO     0.06
3   FIRE        0.06
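
A ratio table of this kind could be computed along the following lines. This is a minimal sketch, not the code used in the report: the data frame `health`, its column names and its numbers are made up for illustration and are not taken from the NOAA data.

```r
# Illustrative totals only; these numbers are invented, not NOAA values.
health <- data.frame(
  Event      = c("AVALANCHE", "TORNADO", "HEAT"),
  Injuries   = c(100, 5000, 900),
  Fatalities = c(120, 300, 300)
)

# Fatalities per injury, rounded to two decimals as in the table above.
health$ratio <- round(health$Fatalities / health$Injuries, 2)

# Sort from the highest ratio to the lowest and keep the two columns shown.
danger.ratio <- health[order(-health$ratio), c("Event", "ratio")]
print(danger.ratio)
```

A ratio above 1 means that an event type has recorded more fatalities than injuries, which is how AVALANCHE stands out in the table.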

Now that we have a general view of the event types and their time evolution with respect to population health, we focus on the distribution of injuries and fatalities across the USA states. The next plot shows this distribution on a USA map.

## Plot the map with the most harmful events by state by injuries.
g.plot.injuries <- ggplot()
g.plot.injuries <- g.plot.injuries + geom_polygon(data=states.poligon.data , aes(x=long, y=lat, group = group), colour="white", fill="#eeeecc" ) 
g.plot.injuries <- g.plot.injuries + geom_point(data=noaa.data.injuries.by.eventype.state, aes(x=longitude, y=latitude, size=Injuries, colour=Event), shape = 1) 
g.plot.injuries <- g.plot.injuries + geom_text(data=noaa.data.injuries.by.eventype.state, hjust=0.5, vjust=-0.5, aes(x=longitude, y=latitude, label=State), colour="#333333", size=4)
g.plot.injuries <- g.plot.injuries + labs(size="Total injuries" , color="Event type") + scale_size(range = c(1, 15)) # + theme(legend.text=element_text(size=4))
g.plot.injuries <- g.plot.injuries + xlab("Longitude") + ylab("Latitude")  + ggtitle("State distribution of total injuries by event type.") 

## Plot the map with the most harmful events by state by fatalities
g.plot.fatalities <- ggplot()
g.plot.fatalities <- g.plot.fatalities + geom_polygon(data=states.poligon.data , aes(x=long, y=lat, group = group), colour="white", fill="#eeeecc" ) 
g.plot.fatalities <- g.plot.fatalities + geom_point(data=noaa.data.fatalities.by.eventype.state, aes(x=longitude, y=latitude, size=Fatalities, colour=Event), shape = 1) 
g.plot.fatalities <- g.plot.fatalities + geom_text(data=noaa.data.fatalities.by.eventype.state, hjust=0.5, vjust=-0.5, aes(x=longitude, y=latitude, label=state), colour="#333333", size=4)
g.plot.fatalities <- g.plot.fatalities + labs(size="Total fatalities" , color="Event type") + scale_size(range = c(1, 15)) # + theme(legend.text=element_text(size=4))
g.plot.fatalities <- g.plot.fatalities + xlab("Longitude") + ylab("Latitude")  + ggtitle("State distribution of total fatalities by event type.") 

# Plot grid
grid.arrange(g.plot.injuries, g.plot.fatalities,  ncol = 2, main = "State distribution of events that are most harmful with respect to population health")

plot of chunk unnamed-chunk-30

From this plot we can conclude that the southern and mid-eastern regions, formed by the states TX, LA, OK, AR, MO, IA, MN, WI, IL, TN, AL, GA, MI, KY, OH, FL, PA and NC, concentrate the events with the greatest impact on population health.

The mean and standard deviation of injuries and fatalities by state are:

Injuries

Fatalities
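
Summary statistics of this kind can be obtained with base R. The sketch below is only an illustration: the data frame name `fatalities.by.state` is an assumption, and it holds just three state totals copied from the fatalities table in this section, so the result is not the statistic for all states.

```r
# Three state totals copied from the fatalities table in this section
# (IL, TX, PA); a full analysis would use every state.
fatalities.by.state <- data.frame(
  State      = c("IL", "TX", "PA"),
  Fatalities = c(1421, 1363, 837)
)

state.mean <- mean(fatalities.by.state$Fatalities)
state.sd   <- sd(fatalities.by.state$Fatalities)  # sample standard deviation (n - 1)

cat(sprintf("mean = %.2f, sd = %.2f\n", state.mean, state.sd))
```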

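The following table shows the top 30 states by number of injuries. Note that the TOTAL column, as printed, is the sum of the columns AVALANCHE through TORNADO and does not include WIND.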

State AVALANCHE COLD DRY FIRE FLOOD FOG HEAT HURRICANE ICE SNOW STORM SURF TORNADO WIND TOTAL
46 TX 0.00 175.00 0.00 146.00 6926.00 4.00 787.00 17.00 0.00 1.00 584.00 8.00 8207.00 763.00 16855.00
26 MO 0.00 21.00 0.00 5.00 41.00 0.00 4185.00 0.00 0.00 2.00 70.00 0.00 4330.00 329.00 8654.00
2 AL 0.00 0.00 0.00 0.00 15.00 91.00 73.00 0.00 0.00 0.00 177.00 4.00 7929.00 451.00 8289.00
37 OH 0.00 1.00 0.00 0.00 33.00 33.00 112.00 0.00 0.00 32.00 2060.00 0.00 4438.00 399.00 6709.00
27 MS 0.00 0.00 0.00 0.00 9.00 0.00 5.00 105.00 0.00 0.00 58.00 0.00 6244.00 252.00 6421.00
11 FL 0.00 0.00 0.00 105.00 4.00 47.00 11.00 812.00 0.00 0.00 993.00 258.00 3343.00 311.00 5573.00
38 OK 0.00 6.00 4.00 29.00 175.00 0.00 219.00 0.00 0.00 0.00 119.00 0.00 4829.00 329.00 5381.00
3 AR 0.00 1.00 0.00 4.00 42.00 0.00 7.00 0.00 0.00 0.00 135.00 0.00 5116.00 245.00 5305.00
16 IL 0.00 6.00 0.00 0.00 31.00 0.00 594.00 0.00 0.00 30.00 146.00 0.00 4145.00 611.00 4952.00
45 TN 0.00 0.00 0.00 0.00 45.00 0.00 1.00 0.00 0.00 1.00 143.00 0.00 4748.00 260.00 4938.00
12 GA 0.00 0.00 0.00 10.00 26.00 0.00 3.00 0.00 0.00 1.00 673.00 6.00 3926.00 415.00 4645.00
17 IN 0.00 4.00 0.00 0.00 13.00 0.00 84.00 0.00 0.00 3.00 92.00 4.00 4224.00 296.00 4424.00
24 MI 0.00 30.00 0.00 4.00 7.00 1.00 594.00 0.00 0.00 7.00 178.00 9.00 3362.00 394.00 4192.00
29 NC 11.00 4.00 0.00 0.00 29.00 16.00 17.00 25.00 0.00 2.00 478.00 14.00 2536.00 266.00 3132.00
18 KS 0.00 11.00 0.00 4.00 24.00 25.00 17.00 0.00 0.00 18.00 256.00 0.00 2721.00 373.00 3076.00
19 KY 0.00 4.00 0.00 0.00 27.00 0.00 152.00 0.00 0.00 5.00 45.00 0.00 2806.00 441.00 3039.00
20 LA 0.00 0.00 0.00 3.00 6.00 76.00 3.00 3.00 0.00 0.00 158.00 2.00 2676.00 288.00 2927.00
6 CA 25.00 96.00 0.00 1128.00 110.00 570.00 265.00 0.00 0.00 57.00 387.00 192.00 88.00 310.00 2918.00
40 PA 0.00 234.00 0.00 0.00 166.00 16.00 366.00 0.00 67.00 303.00 409.00 0.00 1241.00 395.00 2802.00
14 IA 0.00 0.00 0.00 2.00 138.00 0.00 22.00 0.00 0.00 0.00 199.00 0.00 2208.00 310.00 2569.00
25 MN 0.00 6.00 0.00 2.00 40.00 0.00 0.00 0.00 0.00 0.00 116.00 2.00 1976.00 140.00 2142.00
52 WI 0.00 54.00 0.00 3.00 19.00 2.00 79.00 0.00 0.00 21.00 190.00 0.00 1601.00 255.00 1969.00
21 MA 0.00 8.00 0.00 0.00 5.00 0.00 0.00 0.00 0.00 1.00 175.00 2.00 1758.00 151.00 1949.00
43 SC 0.00 0.00 0.00 0.00 19.00 13.00 20.00 3.00 0.00 0.00 147.00 4.00 1314.00 256.00 1520.00
22 MD 0.00 5.00 0.00 2.00 29.00 38.00 545.00 0.00 0.00 0.00 472.00 2.00 314.00 125.00 1407.00
48 VA 0.00 51.00 0.00 4.00 16.00 0.00 252.00 4.00 0.00 1.00 141.00 13.00 914.00 285.00 1396.00
31 NE 0.00 1.00 0.00 0.00 4.00 2.00 0.00 0.00 15.00 0.00 164.00 0.00 1158.00 127.00 1344.00
47 UT 25.00 0.00 0.00 6.00 41.00 32.00 0.00 0.00 0.00 259.00 523.00 0.00 91.00 93.00 977.00
33 NJ 0.00 8.00 0.00 10.00 198.00 17.00 304.00 0.00 0.00 12.00 189.00 34.00 70.00 307.00 842.00
7 CO 49.00 15.00 0.00 11.00 64.00 5.00 0.00 0.00 0.00 15.00 412.00 0.00 261.00 155.00 832.00
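
A wide state-by-event table like the one above can be built from long-format records with `xtabs`. The following is a sketch under stated assumptions: the data frame `injuries.long` and its column names are hypothetical, and it contains only a few values copied from the TX and MO rows for three event types. Summing the event columns by name rather than by position keeps every event type in the total.

```r
# Long-format records; values copied from the TX and MO rows of the table
# above, restricted to three event types.
injuries.long <- data.frame(
  State    = c("TX", "TX", "TX", "MO", "MO", "MO"),
  Event    = c("FLOOD", "TORNADO", "WIND", "FLOOD", "TORNADO", "WIND"),
  Injuries = c(6926, 8207, 763, 41, 4330, 329)
)

# Cross-tabulate: one row per state, one column per event type.
wide <- as.data.frame.matrix(xtabs(Injuries ~ State + Event, data = injuries.long))

# Sum over the event columns selected by name, so none is dropped by mistake.
event.cols <- setdiff(names(wide), "TOTAL")
wide$TOTAL <- rowSums(wide[, event.cols])

# Sort states from the highest total to the lowest.
wide <- wide[order(-wide$TOTAL), ]
print(wide)
```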

The following table shows the top 30 states by number of fatalities.

State AVALANCHE COLD FIRE FLOOD FOG HEAT HURRICANE ICE SNOW STORM SURF TORNADO WIND TOTAL
16 IL 0.00 15.00 0.00 27.00 0.00 983.00 0.00 0.00 7.00 33.00 1.00 203.00 152.00 1421.00
46 TX 0.00 23.00 24.00 253.00 0.00 298.00 6.00 0.00 3.00 130.00 23.00 538.00 65.00 1363.00
40 PA 0.00 18.00 0.00 86.00 0.00 510.00 0.00 0.00 5.00 50.00 29.00 82.00 57.00 837.00
2 AL 0.00 9.00 0.00 19.00 1.00 22.00 2.00 0.00 0.00 40.00 26.00 617.00 46.00 782.00
26 MO 0.00 4.00 0.00 88.00 0.00 233.00 0.00 0.00 0.00 25.00 0.00 388.00 15.00 753.00
11 FL 0.00 2.00 1.00 10.00 6.00 11.00 47.00 0.00 0.00 154.00 282.00 186.00 45.00 744.00
27 MS 0.00 3.00 0.00 16.00 0.00 26.00 16.00 0.00 0.00 17.00 0.00 450.00 26.00 554.00
6 CA 35.00 24.00 39.00 78.00 32.00 118.00 0.00 0.00 14.00 61.00 98.00 0.00 40.00 539.00
3 AR 0.00 4.00 0.00 61.00 0.00 35.00 0.00 0.00 0.00 21.00 0.00 379.00 29.00 529.00
45 TN 0.00 5.00 0.00 59.00 0.00 39.00 0.00 0.00 0.00 24.00 0.00 368.00 26.00 521.00
38 OK 0.00 5.00 5.00 38.00 0.00 87.00 0.00 0.00 1.00 14.00 0.00 296.00 12.00 458.00
37 OH 0.00 15.00 0.00 54.00 5.00 31.00 0.00 0.00 3.00 39.00 0.00 191.00 64.00 402.00
24 MI 0.00 16.00 0.00 9.00 1.00 23.00 0.00 0.00 1.00 21.00 22.00 243.00 58.00 394.00
29 NC 4.00 26.00 0.00 66.00 1.00 15.00 31.00 0.00 3.00 43.00 40.00 126.00 38.00 393.00
17 IN 0.00 5.00 0.00 43.00 1.00 16.00 0.00 0.00 1.00 15.00 11.00 252.00 47.00 391.00
18 KS 0.00 3.00 0.00 24.00 5.00 15.00 0.00 0.00 9.00 43.00 0.00 236.00 20.00 355.00
36 NY 1.00 3.00 0.00 60.00 2.00 100.00 0.00 1.00 7.00 25.00 38.00 22.00 78.00 337.00
12 GA 0.00 1.00 0.00 43.00 0.00 13.00 0.00 0.00 0.00 45.00 1.00 180.00 44.00 327.00
20 LA 0.00 2.00 0.00 13.00 1.00 62.00 2.00 0.00 0.00 31.00 2.00 156.00 36.00 305.00
52 WI 0.00 11.00 1.00 10.00 12.00 98.00 0.00 0.00 1.00 13.00 0.00 96.00 35.00 277.00
19 KY 0.00 0.00 0.00 60.00 0.00 6.00 0.00 2.00 5.00 14.00 0.00 125.00 23.00 235.00
43 SC 0.00 28.00 0.00 8.00 1.00 41.00 1.00 0.00 0.00 27.00 20.00 59.00 31.00 216.00
5 AZ 0.00 0.00 0.00 63.00 0.00 58.00 0.00 0.00 3.00 57.00 0.00 3.00 22.00 206.00
33 NJ 0.00 2.00 0.00 25.00 0.00 48.00 0.00 0.00 1.00 25.00 38.00 1.00 38.00 178.00
25 MN 0.00 2.00 0.00 18.00 0.00 15.00 0.00 0.00 0.00 18.00 0.00 99.00 16.00 168.00
48 VA 1.00 9.00 0.00 47.00 0.00 6.00 5.00 0.00 3.00 31.00 5.00 36.00 22.00 165.00
7 CO 52.00 5.00 0.00 15.00 1.00 0.00 0.00 0.00 9.00 62.00 0.00 5.00 13.00 162.00
22 MD 0.00 5.00 0.00 13.00 1.00 100.00 0.00 0.00 0.00 26.00 1.00 7.00 9.00 162.00
14 IA 0.00 4.00 0.00 9.00 0.00 5.00 0.00 0.00 3.00 23.00 0.00 81.00 15.00 140.00
51 WA 36.00 4.00 4.00 8.00 1.00 3.00 0.00 4.00 8.00 16.00 1.00 6.00 47.00 138.00

From these tables we have that:

  1. FLOOD injuries and fatalities are concentrated in TX, OK, IA, MO, NJ, CA and PA.
  2. HEAT injuries and fatalities are concentrated in TX, PA, MO, MI, OK, NJ and IL.
  3. TORNADO and WIND injuries and fatalities are distributed over the mid east and south east USA states.
  4. STORM injuries and fatalities are concentrated in OH, TX, and FL.
  5. HURRICANE injuries and fatalities are concentrated in MS, NC and FL.
  6. AVALANCHE injuries are concentrated in CO, WA and CA.
  7. FIRE injuries and fatalities are concentrated in TX, FL, OK, GA and CA.
  8. SNOW and COLD injuries and fatalities are concentrated in TX, PA, IL, MI and CA.
  9. FOG injuries and fatalities are concentrated in AL, WI and CA.
  10. ICE injuries are concentrated in PA.
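
Observations like these can be read off programmatically by ranking the states within each event type. The sketch below is an assumption about how one might do it, not the report's own code: `top.states` is a hypothetical helper, and the data are injury counts copied from the injuries table above for two event types and four states.

```r
# Injury counts copied from the injuries table above (two event types, four states).
injuries <- data.frame(
  State    = rep(c("TX", "MO", "PA", "NJ"), times = 2),
  Event    = rep(c("FLOOD", "HEAT"), each = 4),
  Injuries = c(6926, 41, 166, 198,    # FLOOD
               787, 4185, 366, 304)   # HEAT
)

# For each event type, order the states by injuries and keep the first n.
top.states <- function(data, n = 2) {
  lapply(split(data, data$Event), function(d) {
    head(d$State[order(-d$Injuries)], n)
  })
}

top <- top.states(injuries)
print(top)
```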

Across the United States, which types of events have the greatest economic consequences?

To explore the economic consequences by state and event type we present the next plot, where the left figure shows the distribution of damages across the states on a USA map and the right figure shows the distribution of damages by event type.

## Plot the map with the most economics consequences events by state.
g.plot.damages.map <- ggplot()
g.plot.damages.map <- g.plot.damages.map + geom_polygon(data=states.poligon.data , aes(x=long, y=lat, group = group), colour="white", fill="#eeeecc" ) 
g.plot.damages.map <- g.plot.damages.map + geom_point(data=noaa.data.damages.by.eventype.state, aes(x=longitude, y=latitude, size=Damages, colour=Event), shape = 1) 
g.plot.damages.map <- g.plot.damages.map + geom_text(data=noaa.data.damages.by.eventype.state, hjust=0.5, vjust=-0.5, aes(x=longitude, y=latitude, label=State), colour="#333333", size=4)
g.plot.damages.map <- g.plot.damages.map + labs(size="Total damages" , color="Event type") + scale_size(range = c(1, 15)) # + theme(legend.text=element_text(size=4))
g.plot.damages.map <- g.plot.damages.map + xlab("Longitude") + ylab("Latitude")  + ggtitle("State distribution of total damages (in USA million dollars) by event type.") 

## Plot the total damages  by event type.
g.plot.damages.by.event.type <- ggplot(data=noaa.data.damages.by.eventype.state , aes(x=Event, y=Damages, colour=Event, fill=Event))
g.plot.damages.by.event.type <- g.plot.damages.by.event.type + geom_bar(stat="identity") + labs(fill="Event type") + guides(colour=FALSE)
g.plot.damages.by.event.type <- g.plot.damages.by.event.type + xlab("Event type") + ylab("Total damages")  + ggtitle("Total damages (in USA million dollars)  by event type.") 

# Plot grid
grid.arrange(g.plot.damages.map, g.plot.damages.by.event.type,  ncol = 2,  main = "Total damages across USA by event type.")

plot of chunk unnamed-chunk-33

From this plot we have that:

  1. The STORM event type is the principal cause of damages, although it was not one of the principal causes of harm to population health.
  2. The HURRICANE, TORNADO and FLOOD event types cause almost all of the non-STORM damages.
  3. The most dangerous events, AVALANCHE and SURF, have a low economic impact.
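
The share of damages attributable to each event type can be computed as sketched below. The vector `damages` is a hypothetical name, and it holds only four values copied from the LA row of the damages table in this section (in millions of dollars), so the percentages describe that subset rather than the whole country.

```r
# Damages in millions of dollars, copied from the LA row of the damages table.
damages <- c(STORM = 35153.38, HURRICANE = 21581.67,
             TORNADO = 1190.41, FLOOD = 774.04)

# Percentage of the subset total contributed by each event type.
share <- round(100 * damages / sum(damages), 1)

print(sort(share, decreasing = TRUE))
```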

The mean and standard deviation of damages by state, in millions of dollars, are:

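The following table shows the top 30 states by damages, in millions of dollars. Note that the TOTAL column, as printed, is the sum of the columns AVALANCHE through TORNADO and does not include WIND.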

State AVALANCHE COLD DRY FIRE FLOOD FOG HEAT HURRICANE ICE SNOW STORM SURF TORNADO WIND TOTAL
20 LA 0.00 57.87 587.43 3.59 774.04 0.25 0.11 21581.67 0.00 0.30 35153.38 0.00 1190.41 151.88 59349.06
46 TX 0.00 21.63 6523.63 608.01 1152.64 0.38 0.25 4143.14 0.00 31.23 14717.59 0.00 3645.37 945.42 30843.87
27 MS 0.00 0.24 3.15 0.00 1453.85 0.00 0.08 9808.55 0.00 4.71 16441.58 0.00 2074.88 268.26 29787.03
11 FL 0.00 874.11 100.00 410.64 1738.89 1.73 0.00 13220.81 0.00 0.00 1136.34 7.62 1753.92 2647.04 19244.06
2 AL 0.13 52.01 0.10 0.10 1423.76 1.50 400.10 1049.29 0.00 0.00 5134.01 8.10 6146.15 150.81 14215.25
6 CA 228.59 1001.77 0.05 3894.29 2240.97 13.79 492.41 0.00 0.00 2.37 1140.50 61.65 113.76 743.86 9190.15
14 IA 0.00 298.56 2654.78 1.10 3081.56 0.36 6.49 0.00 51.00 76.19 662.70 0.00 2130.58 400.07 8963.33
16 IL 0.05 29.42 284.57 2.12 6065.67 0.10 0.46 0.00 0.00 5.62 142.76 0.00 1748.02 633.95 8278.79
26 MO 0.00 11.10 30.84 2.11 1122.43 0.00 1.25 0.00 0.00 18.63 1231.97 0.00 4740.33 256.09 7158.66
37 OH 0.00 11.12 200.00 0.00 1476.91 0.56 6.10 0.00 0.00 72.62 1838.91 0.00 2263.07 862.75 5869.29
38 OK 0.00 0.96 1097.91 15.76 212.42 0.06 0.01 0.00 0.00 7.45 944.85 0.00 3318.66 1020.26 5598.09
12 GA 0.10 160.50 717.29 27.78 630.16 0.00 0.00 7.70 0.00 0.59 533.94 0.02 3225.99 357.80 5304.06
30 ND 0.00 0.67 0.00 0.60 4137.39 0.00 0.00 0.00 0.01 0.68 984.25 0.00 119.99 422.33 5243.59
29 NC 30.74 3.61 13.07 2.60 484.26 0.05 0.00 2383.06 0.00 4.04 628.97 0.00 1549.76 251.96 5100.17
40 PA 0.07 32.39 539.40 1.41 2197.75 0.00 0.01 0.00 0.57 92.80 161.70 0.00 1786.52 240.48 4812.61
45 TN 0.00 0.30 0.00 0.10 3113.27 0.00 0.00 0.00 0.50 5.45 47.27 0.00 1516.48 206.89 4683.36
25 MN 0.00 0.05 0.00 4.03 1452.39 0.02 2.13 0.00 0.00 0.12 1106.75 0.00 1871.19 865.10 4436.67
31 NE 0.00 313.36 720.05 6.69 193.19 0.49 8.08 0.00 0.18 5.10 1480.83 0.00 1666.98 347.94 4394.95
18 KS 0.00 35.38 155.95 2.65 461.42 0.12 0.11 0.00 0.00 59.44 962.80 0.00 2613.65 268.12 4291.51
36 NY 0.50 7.82 100.20 0.10 3150.70 0.00 0.00 0.00 11.00 175.80 264.50 39.20 456.43 602.64 4206.24
3 AR 0.08 1.08 2.65 8.43 536.32 0.00 0.00 7.70 0.00 50.76 914.56 0.00 2568.74 127.00 4090.31
17 IN 0.00 7.88 73.00 0.00 1152.26 0.47 1.00 0.00 0.00 3.39 114.56 0.00 2523.43 122.82 3875.99
5 AZ 0.01 1.83 0.00 222.72 123.84 0.00 0.00 0.00 0.00 0.11 3046.77 0.00 47.76 505.35 3443.02
33 NJ 0.00 0.00 80.00 1.26 2858.31 0.00 0.00 1.15 0.00 19.10 104.99 4.00 78.61 82.54 3147.43
7 CO 3.47 28.14 0.00 305.88 479.60 0.35 0.00 0.00 0.00 5.74 1551.01 0.00 293.77 85.19 2667.97
19 KY 0.00 5.17 226.00 0.03 806.96 0.00 0.00 0.00 0.00 11.12 655.12 0.00 885.93 345.64 2590.33
52 WI 0.00 15.64 4.53 2.24 1035.47 0.02 0.04 0.00 0.00 0.13 510.46 0.00 833.50 198.32 2402.03
24 MI 0.00 54.58 150.00 7.60 316.29 0.00 0.00 0.00 0.00 11.62 512.05 0.00 1058.69 365.37 2110.82
34 NM 0.00 0.00 14.40 1647.88 69.25 0.01 0.00 0.00 0.00 0.18 122.36 0.00 57.94 23.61 1912.02
48 VA 0.03 6.12 297.48 5.77 408.39 0.80 0.60 60.10 0.00 6.00 155.96 0.08 434.75 153.36 1376.07

From the previous plot and this table we conclude that the south-east region (LA, TX, AL, MS, FL) suffers the largest economic impact from climatic events.