This work was created for Peer Assessment 2 of the Reproducible Research course on Coursera (Jul 2014). In this assignment we address questions about the public health and economic problems caused by severe weather events by exploring the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The data analysis must address the following basic questions:
To answer these questions we rely on the following general considerations, which summarize how the analysis was developed:
The USA NOAA Storm Database contains 985 different event types, so an event type classification process must be introduced to reduce the number of categories used in the analysis. This categorization process leads to a classification with 13 event types, which can be summarized in the following list:
Note that the order of precedence in the substitution process is given by the previous list, so an event type called HURRICANE OPAL/HIGH WINDS is assigned to the HURRICANE class rather than the WIND class, which comes later in the list. We also emphasize that before applying the classification process we convert all event types to upper case strings, so that the substitution can be applied robustly. More details about this process are given in the Data Processing section.
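The precedence rule described above can be illustrated with a minimal sketch. The event names, the three patterns and the variable names below are illustrative assumptions, not the full substitution list used in the report:

```r
# Toy event types, upper-cased first as described above.
events <- toupper(c("Hurricane Opal/High Winds", "HIGH WIND", "Tstm Wind/Hail"))

# Ordered rules: earlier patterns take precedence over later ones.
patterns <- c("(.*)HURRICANE(.*)", "(.*)WIND(.*)", "(.*)HAIL(.*)")
classes  <- c("HURRICANE", "WIND", "STORM")

classified <- events
for (i in seq_along(patterns)) {
  # Each pattern matches the whole string, so the label replaces it entirely.
  classified <- gsub(patterns[i], classes[i], classified)
}
print(classified)
```

Once a string has been replaced by its class label, later patterns no longer match it, which is what makes the order of the list significant.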
The database contains 902,297 records, but for each analysis variable (injuries/fatalities/damages) we treat any record with a zero value in that variable as incomplete. To show how much the data cleaning process retains, we compute for each variable a ratio of information captured and a ratio of records captured, which gives the reader a general view of how much information is used in the exploratory analysis.
Because the client is interested in how the quantities are distributed around the country, we create several USA maps with the data aggregated to the state capital locations. This requires loading two additional CSV files that contain the geographical information needed to build these maps.
In the next sections we develop the report.
To execute the R code the following libraries must be loaded.
library(ggplot2)
library(gridExtra)
library(maps)
library(xtable)
library(reshape)
library(R.utils)  # provides bunzip2(), used below to unpack the database
The filter coefficient (noaa.data.filter.alfa) is combined with the mean value obtained for each analysis variable to select the event types that will be considered. Only the event types whose total injuries/fatalities/damages exceed (1-noaa.data.filter.alfa)*mean are kept.
noaa.data.filter.alfa <- 0.95
This parameter sets the number of states shown in the data tables.
noaa.data.states.table.size <- 30
The proposed value of 0.95 for the filter coefficient gives very good results in the event type classification, but the user can experiment with other values.
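As a small worked example of how the filter coefficient acts, the sketch below applies the (1-alfa)*mean threshold to made-up totals. The data frame `totals` and its values are assumptions for illustration only:

```r
alfa <- 0.95  # same value as noaa.data.filter.alfa

# Made-up totals by event type (not taken from the NOAA data).
totals <- data.frame(Event = c("A", "B", "C", "D"),
                     Injuries = c(1000, 200, 30, 2))

# Event types are kept only when their total exceeds (1 - alfa) * mean.
threshold <- (1 - alfa) * mean(totals$Injuries)
kept <- totals[totals$Injuries > threshold, ]

# Share of the total information retained by the filter, in percent.
captured.ratio <- 100 * sum(kept$Injuries) / sum(totals$Injuries)
print(kept)
print(captured.ratio)
```

With these numbers the mean is 308, so the threshold is about 15.4 and only event type D is discarded, while almost all of the injuries are still captured.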
In this section we load the state capital geographical locations from the file 'state_locations.csv' and the state polygon geographical data (library maps). We also change the capital location of some states (like Alaska) in order to create nicer map plots.
## Read USA state capitals' latitude and longitude.
states.relocations <- c('AK','HI','AS','PR','VI')
states.capitols.locations <- read.csv('state_locations.csv', header=TRUE)
states.capitols.locations$State <- as.factor(states.capitols.locations$state)
states.capitols.locations[which(states.capitols.locations$state %in% states.relocations),]$longitude <- -120
latitude <- 20
for (index in seq_along(states.relocations)) {
  states.capitols.locations$latitude[states.capitols.locations$state == states.relocations[index]] <- latitude
  latitude <- latitude + 2
}
## Read the USA states polygon maps.
states.poligon.data <- map_data("state")
At this point we read the USA NOAA Storm Database, unzipping it first only if needed.
## Read NOAA Storm Database.
csv.col.classes = c("NULL", "character", rep("NULL", 4), "character", "character", rep("NULL", 14), "numeric", "numeric", "numeric", "character", "numeric", "character", rep("NULL", 3), "numeric", "numeric", rep("NULL", 3), "numeric")
if (!file.exists('repdata-data-StormData.csv')) {
  bunzip2('repdata-data-StormData.csv.bz2')
}
noaa.data <- read.csv('repdata-data-StormData.csv', header=TRUE, colClasses=csv.col.classes) # , nrows=1000000
names(noaa.data)[1]<-"DATE"
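As an alternative to the explicit unzip step, base R can read a bzip2-compressed CSV through a `bzfile()` connection. The sketch below is illustrative only: it writes a tiny made-up file to a temporary path (the path and the three columns are assumptions) and reads it back:

```r
# Write a tiny bzip2-compressed CSV to a temporary file (illustrative data).
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, "w")
writeLines(c("STATE,EVTYPE,INJURIES", "TX,TORNADO,3", "AL,FLOOD,0"), con)
close(con)

# Read it back without a separate decompression step.
toy <- read.csv(bzfile(path), header = TRUE)
print(toy)
```

Reading through the connection avoids keeping an uncompressed copy on disk, at the cost of decompressing on every read.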
Before the cleaning process we convert some of the loaded columns to dates and factors in order to speed up the later computations.
## Convert some data columns to the appropriate data type.
noaa.data$DATE <- as.Date(noaa.data$DATE, "%m/%d/%Y %H:%M:%S")
noaa.data$STATE<- as.factor(noaa.data$STATE)
noaa.data$EVTYPE <- as.factor(noaa.data$EVTYPE)
noaa.data$PROPDMGEXP<- as.factor(noaa.data$PROPDMGEXP)
noaa.data$CROPDMGEXP<- as.factor(noaa.data$CROPDMGEXP)
We also compute new columns: a month-and-year column (SHORTDATE) that will be used in the temporal data plots, and an event damages column (DAMAGES) in million dollars, computed under the assumption that CROPDMG information takes precedence over PROPDMG information, as noted in the documentation of the database.
## Add a short date field that includes the month and year (it gives better plots than by date)
noaa.data$SHORTDATE <- as.Date(cut(noaa.data$DATE, breaks = "month"))
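To make the month truncation concrete, here is a minimal sketch on three made-up dates in the same text format as the database. The vector names are assumptions:

```r
# Three illustrative dates in the "%m/%d/%Y %H:%M:%S" format.
dates <- as.Date(c("4/18/1950 0:00:00", "4/30/1950 0:00:00", "5/2/1950 0:00:00"),
                 "%m/%d/%Y %H:%M:%S")

# cut() assigns each date to its month; as.Date() turns the labels back into dates.
short <- as.Date(cut(dates, breaks = "month"))
print(short)
```

Both April dates collapse to the first day of April, so events can be aggregated by month and year.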
## Add a DAMAGES column that contains the total damages related to the event in million dollars. CROPDMG info takes precedence over PROPDMG info.
noaa.data$PROPDMG_TOTAL <- ifelse(noaa.data$PROPDMGEXP == 'B', 1e3*noaa.data$PROPDMG, ifelse(noaa.data$PROPDMGEXP == 'M', noaa.data$PROPDMG, ifelse(noaa.data$PROPDMGEXP == 'm', noaa.data$PROPDMG, ifelse(noaa.data$PROPDMGEXP == 'K', 1e-3*noaa.data$PROPDMG, ifelse(noaa.data$PROPDMGEXP == 'h', 1e-4*noaa.data$PROPDMG, 0)))) )
noaa.data$CROPDMG_TOTAL <- ifelse(noaa.data$CROPDMGEXP == 'B', 1e3*noaa.data$CROPDMG, ifelse(noaa.data$CROPDMGEXP == 'M', noaa.data$CROPDMG, ifelse(noaa.data$CROPDMGEXP == 'm',noaa.data$CROPDMG, ifelse(noaa.data$CROPDMGEXP == 'K', 1e-3*noaa.data$CROPDMG, ifelse(noaa.data$CROPDMGEXP == 'h', 1e-4*noaa.data$CROPDMG, 0)))) )
noaa.data$DAMAGES <- ifelse(noaa.data$CROPDMG_TOTAL > 0, noaa.data$CROPDMG_TOTAL, noaa.data$PROPDMG_TOTAL)
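The nested ifelse calls above map each exponent code to a multiplier in million dollars. A more compact way to express the same mapping is a named lookup vector. This is a sketch of an alternative, not the code used to produce the report, and the helper name `to.millions` is hypothetical:

```r
# Multipliers to million dollars, mirroring the codes handled above.
exp.to.millions <- c(B = 1e3, M = 1, m = 1, K = 1e-3, h = 1e-4)

# Hypothetical helper: convert a damage value and its exponent code to millions.
to.millions <- function(value, exponent) {
  multiplier <- exp.to.millions[as.character(exponent)]
  multiplier[is.na(multiplier)] <- 0  # unknown or empty codes count as zero
  unname(value * multiplier)
}

print(to.millions(c(25, 2.5, 10, 3), c("K", "M", "", "B")))
```

Adding or changing a code then only requires editing the lookup vector.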
The first step in this process is the event type classification, that is, to create the definitive list of event types that will be used in the analysis. To develop this task we will apply the following steps.
1.- Compute the NOAA Storm Database aggregation sum of injuries/fatalities/damages by event type.
## Compute the NOAA Storm Database aggregation sum of injuries/fatalities/damages by event type.
noaa.data.injuries.by.eventype.unfilter <- aggregate(noaa.data$INJURIES,by=list(noaa.data$EVTYPE), sum)
names(noaa.data.injuries.by.eventype.unfilter)[1]<-"Event"
names(noaa.data.injuries.by.eventype.unfilter)[2]<-"Injuries"
noaa.data.fatalities.by.eventype.unfilter <- aggregate(noaa.data$FATALITIES,by=list(noaa.data$EVTYPE), sum)
names(noaa.data.fatalities.by.eventype.unfilter)[1]<-"Event"
names(noaa.data.fatalities.by.eventype.unfilter)[2]<-"Fatalities"
noaa.data.damages.by.eventype.unfilter <- aggregate(noaa.data$DAMAGES,by=list(noaa.data$EVTYPE), sum)
names(noaa.data.damages.by.eventype.unfilter)[1]<-"Event"
names(noaa.data.damages.by.eventype.unfilter)[2]<-"Damages"
2.- Delete the events with zero injuries/fatalities/damages.
## Delete the event types with zero injuries/fatalities/damages.
noaa.data.injuries.by.eventype.unfilter <- noaa.data.injuries.by.eventype.unfilter[noaa.data.injuries.by.eventype.unfilter$Injuries > 0,]
noaa.data.damages.by.eventype.unfilter <- noaa.data.damages.by.eventype.unfilter[noaa.data.damages.by.eventype.unfilter$Damages > 0,]
noaa.data.fatalities.by.eventype.unfilter <- noaa.data.fatalities.by.eventype.unfilter[noaa.data.fatalities.by.eventype.unfilter$Fatalities > 0,]
3.- Compute the mean of the remaining event types.
## Compute the mean of the remaining event types.
noaa.data.injuries.by.eventype.unfilter.mean <- mean(noaa.data.injuries.by.eventype.unfilter$Injuries)
noaa.data.fatalities.by.eventype.unfilter.mean <- mean(noaa.data.fatalities.by.eventype.unfilter$Fatalities)
noaa.data.damages.by.eventype.unfilter.mean <- mean(noaa.data.damages.by.eventype.unfilter$Damages)
4.- Apply the filter coefficient, so we extract data with values over (1-alfa)*mean.
## Apply the filter coefficient noaa.data.filter.alfa and the mean to extract the events that will be considered.
noaa.data.injuries.by.eventype.unfilter <- noaa.data.injuries.by.eventype.unfilter[noaa.data.injuries.by.eventype.unfilter$Injuries > (1-noaa.data.filter.alfa)*noaa.data.injuries.by.eventype.unfilter.mean ,]
noaa.data.fatalities.by.eventype.unfilter <- noaa.data.fatalities.by.eventype.unfilter[noaa.data.fatalities.by.eventype.unfilter$Fatalities > (1-noaa.data.filter.alfa)*noaa.data.fatalities.by.eventype.unfilter.mean,]
noaa.data.damages.by.eventype.unfilter <- noaa.data.damages.by.eventype.unfilter[noaa.data.damages.by.eventype.unfilter$Damages > (1-noaa.data.filter.alfa)*noaa.data.damages.by.eventype.unfilter.mean,]
5.- Calculate the ratios of information captured (division of the total sums and sum of the captured information).
## Calculate the ratios of information captured.
noaa.data.injuries.total <- sum(noaa.data$INJURIES)
noaa.data.injuries.by.eventype.unfilter.total <- sum(noaa.data.injuries.by.eventype.unfilter$Injuries)
noaa.data.injuries.total.ratio <- 100*(noaa.data.injuries.by.eventype.unfilter.total / noaa.data.injuries.total)
noaa.data.fatalities.total <- sum(noaa.data$FATALITIES)
noaa.data.fatalities.by.eventype.unfilter.total <- sum(noaa.data.fatalities.by.eventype.unfilter$Fatalities)
noaa.data.fatalities.total.ratio <- 100*(noaa.data.fatalities.by.eventype.unfilter.total / noaa.data.fatalities.total)
noaa.data.damages.total <- sum(noaa.data$DAMAGES)
noaa.data.damages.by.eventype.unfilter.total <- sum(noaa.data.damages.by.eventype.unfilter$Damages)
noaa.data.damages.total.ratio <- 100*(noaa.data.damages.by.eventype.unfilter.total / noaa.data.damages.total)
With the filter coefficient noaa.data.filter.alfa =0.95 we have captured:
To create an initial event type list we concatenate the event type lists obtained for injuries, fatalities and damages into a single list of events.
## Create the list of events to apply the filter.
noaa.data.injuries.evtypes <- toupper(as.character(noaa.data.injuries.by.eventype.unfilter$Event))
noaa.data.fatalities.evtypes <- toupper(as.character(noaa.data.fatalities.by.eventype.unfilter$Event))
noaa.data.damages.evtypes <- toupper(as.character(noaa.data.damages.by.eventype.unfilter$Event))
noaa.data.evtypes.unfilter <- unique(c(noaa.data.injuries.evtypes, noaa.data.fatalities.evtypes, noaa.data.damages.evtypes))
Before creating the definitive event type list we filter the data using the candidate event type list.
## Filter the data by considering only the list of events obtained.
noaa.data.injuries <- noaa.data[which(noaa.data$EVTYPE %in% noaa.data.evtypes.unfilter) ,]
noaa.data.fatalities <- noaa.data[which(noaa.data$EVTYPE %in% noaa.data.evtypes.unfilter) ,]
noaa.data.damages <- noaa.data[which(noaa.data$EVTYPE %in% noaa.data.evtypes.unfilter) ,]
Now we delete the events with zero injuries/fatalities/damages, which we consider incomplete cases for each specific variable.
## Delete the events with zero injuries/fatalities/damages.
noaa.data.injuries <- noaa.data.injuries[noaa.data.injuries$INJURIES > 0 ,]
noaa.data.fatalities <- noaa.data.fatalities[noaa.data.fatalities$FATALITIES > 0 ,]
noaa.data.damages <- noaa.data.damages[noaa.data.damages$DAMAGES > 0 ,]
Finally let's compute the ratio of rows that we have captured for the analysis:
## Compute the ratio of the number of rows with information that have been captured.
noaa.data.injuries.total.count <- nrow(noaa.data[noaa.data$INJURIES > 0 ,])
noaa.data.fatalities.total.count <- nrow(noaa.data[noaa.data$FATALITIES > 0 ,])
noaa.data.damages.total.count <- nrow(noaa.data[noaa.data$DAMAGES > 0 ,])
noaa.data.injuries.count <- nrow(noaa.data.injuries)
noaa.data.fatalities.count <- nrow(noaa.data.fatalities)
noaa.data.damages.count <- nrow(noaa.data.damages)
noaa.data.injuries.ratio.count <- 100*(noaa.data.injuries.count / noaa.data.injuries.total.count )
noaa.data.fatalities.ratio.count <- 100*(noaa.data.fatalities.count / noaa.data.fatalities.total.count)
noaa.data.damages.ratio.count <- 100*(noaa.data.damages.count / noaa.data.damages.total.count)
The ratios of rows captured are:
To obtain the definitive list of event types we count the words that appear in the current event type list.
## Create the table that counts the words present in the event types.
noaa.data.events.words.frequency <- data.frame(table(do.call(c, lapply(noaa.data.evtypes.unfilter, function(x) unlist(strsplit(gsub("[/,]", " ", x), " "))))))
names(noaa.data.events.words.frequency)[1] <- "Word"
names(noaa.data.events.words.frequency)[2] <- "Frequency"
noaa.data.events.words.frequency.candidates <- noaa.data.events.words.frequency[with(noaa.data.events.words.frequency, order(-Frequency)), ][1:50,]
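The word-count step can be reproduced on a toy vector. The four event names below are illustrative assumptions:

```r
# Illustrative event types (not the full candidate list).
evtypes <- c("TSTM WIND/HAIL", "HIGH WIND", "FLASH FLOOD", "FLOOD")

# Replace separators by spaces, split into words and count them.
words <- unlist(strsplit(gsub("[/,]", " ", evtypes), " "))
freq <- data.frame(table(words))
names(freq) <- c("Word", "Frequency")

# Most frequent words first.
freq <- freq[order(-freq$Frequency), ]
print(freq)
```

Note that consecutive separators in an event name produce empty strings when splitting, which is why a blank word can appear in such a table.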
The first 50 word candidates by frequency are:
| | Word | Frequency |
|---|---|---|
| 89 | WIND | 12 |
| 25 | FLOOD | 11 |
| 7 | COLD | 8 |
| 36 | HEAT | 7 |
| 37 | HEAVY | 7 |
| 62 | STORM | 7 |
| 23 | FLASH | 5 |
| 38 | HIGH | 5 |
| 39 | HURRICANE | 5 |
| 66 | SURF | 5 |
| 83 | WEATHER | 5 |
| 92 | WINTER | 5 |
| 20 | EXTREME | 4 |
| 26 | FLOODING | 4 |
| 47 | MARINE | 4 |
| 51 | RAIN | 4 |
| 61 | SNOW | 4 |
| 69 | THUNDERSTORM | 4 |
| 74 | TSTM | 4 |
| 91 | WINDS | 4 |
| 2 | AND | 3 |
| 19 | EXCESSIVE | 3 |
| 29 | FREEZE | 3 |
| 31 | FROST | 3 |
| 34 | HAIL | 3 |
| 49 | MIX | 3 |
| 53 | RECORD | 3 |
| 54 | RIP | 3 |
| 65 | STRONG | 3 |
| 1 | | 2 |
| 5 | CHILL | 2 |
| 6 | COASTAL | 2 |
| 10 | CURRENTS | 2 |
| 27 | FOG | 2 |
| 40 | ICE | 2 |
| 58 | SEAS | 2 |
| 59 | SEVERE | 2 |
| 67 | SURGE | 2 |
| 71 | TORNADO | 2 |
| 73 | TROPICAL | 2 |
| 76 | TYPHOON | 2 |
| 77 | UNSEASONABLY | 2 |
| 79 | WARM | 2 |
| 86 | WILD | 2 |
| 3 | AVALANCHE | 1 |
| 4 | BLIZZARD | 1 |
| 8 | CONDITIONS | 1 |
| 9 | CURRENT | 1 |
| 11 | DAMAGING | 1 |
| 12 | DENSE | 1 |
After analysing this table of word frequencies (the details are out of the scope of this work) we decided to create the substitution list presented in the Synopsis, using the following event type substitution information.
## Create the list of substitutions for the event type classification.
noaa.data.fields.to.substitute <- c('(.*)TORNADO(.*)', '(.*)HURRICANE(.*)', '(.*)TYPHOON(.*)', '(.*)WIND(.*)', '(.*)FIRE(.*)', '(.*)STORM(.*)', '(.*)GLAZE(.*)', '(.*)HAIL(.*)','(.*)WETNESS(.*)', '(.*)LIGHTNING(.*)', '(.*)RAIN(.*)', '(.*)BLIZZARD(.*)', '(.*)COLD(.*)', '(.*)LOW TEMPERATURE(.*)', '(.*)WINTRY(.*)', '(.*)WINTER(.*)', '(.*)FREEZE(.*)', '(.*)SNOW(.*)', '(.*)FLOOD(.*)', '(.*)STREAM(.*)', '(.*)HEAT(.*)', '(.*)HOT(.*)', '(.*)SURF(.*)', '(.*)SEAS(.*)', '(.*)MARINE(.*)', '(.*)CURRENT(.*)', '(.*)TSUNAMI(.*)', '(.*)FOG(.*)', '(.*)DRY(.*)', '(.*)DROUGHT(.*)', '(.*)LANDSLIDE(.*)', '(.*)AVALANCHE(.*)', '(.*)LAND(.*)', '(.*)ICE(.*)', '(.*)ICY(.*)', '(.*)FROST(.*)')
noaa.data.fields.substitutes <- c('TORNADO','HURRICANE','HURRICANE', 'WIND', 'FIRE', 'STORM', 'STORM', 'STORM', 'STORM', 'STORM', 'STORM', 'STORM', 'COLD', 'COLD', 'COLD', 'COLD', 'COLD', 'SNOW', 'FLOOD', 'FLOOD', 'HEAT', 'HEAT', 'SURF', 'SURF', 'SURF', 'SURF', 'SURF', 'FOG', 'DRY', 'DRY', 'AVALANCHE', 'AVALANCHE', 'AVALANCHE', 'ICE', 'ICE', 'ICE')
noaa.data.fields.substitutions <- data.frame(cbind(noaa.data.fields.to.substitute, noaa.data.fields.substitutes))
names(noaa.data.fields.substitutions)[1]<-"substitution"
names(noaa.data.fields.substitutions)[2]<-"substitute"
noaa.data.fields.substitutions.len <- nrow(noaa.data.fields.substitutions)
## Create the unique (no repeats) definitive list of event types.
noaa.data.evtypes <- unique(noaa.data.fields.substitutes)
At this point we apply the substitution list to generate the final data, and then we convert the event type columns of the data frames to factors to improve the speed:
## Apply the substitution list.
for (row in 1:noaa.data.fields.substitutions.len) {
  data.substitution <- noaa.data.fields.substitutions[row,]
  substitution <- as.character(data.substitution[,1])
  substitute <- as.character(data.substitution[,2])
  noaa.data.injuries$EVTYPE <- gsub(substitution, substitute, noaa.data.injuries$EVTYPE)
  noaa.data.fatalities$EVTYPE <- gsub(substitution, substitute, noaa.data.fatalities$EVTYPE)
  noaa.data.damages$EVTYPE <- gsub(substitution, substitute, noaa.data.damages$EVTYPE)
}
Finally let's factorize the event type column.
## Add factors.
noaa.data.injuries$EVTYPE <- as.factor(noaa.data.injuries$EVTYPE)
noaa.data.fatalities$EVTYPE <- as.factor(noaa.data.fatalities$EVTYPE)
noaa.data.damages$EVTYPE <- as.factor(noaa.data.damages$EVTYPE)
Now that we have filtered the data, we run the code that produces the information needed to answer the proposed questions. This task is developed in the following steps:
1.- First we compute the aggregation sum of injuries/fatalities/damages by event type and the ratio of fatalities to injuries, which gives us qualitative information about how dangerous the event types are.
## Compute the NOAA Storm Database aggregation sum of injuries/fatalities/damages by event type.
noaa.data.injuries.by.eventype <- aggregate(noaa.data.injuries$INJURIES,by=list(noaa.data.injuries$EVTYPE), sum)
names(noaa.data.injuries.by.eventype)[1]<-"Event"
names(noaa.data.injuries.by.eventype)[2]<-"Injuries"
noaa.data.fatalities.by.eventype <- aggregate(noaa.data.fatalities$FATALITIES,by=list(noaa.data.fatalities$EVTYPE), sum)
names(noaa.data.fatalities.by.eventype)[1]<-"Event"
names(noaa.data.fatalities.by.eventype)[2]<-"Fatalities"
noaa.data.damages.by.eventype <- aggregate(noaa.data.damages$DAMAGES,by=list(noaa.data.damages$EVTYPE), sum)
names(noaa.data.damages.by.eventype)[1]<-"Event"
names(noaa.data.damages.by.eventype)[2]<-"Damages"
## Compute the danger ratio (fatalities / injuries) per event type.
noaa.data.event.dangerous <- merge(noaa.data.fatalities.by.eventype, noaa.data.injuries.by.eventype, by.x="Event", by.y="Event")
noaa.data.event.dangerous$ratio <- noaa.data.event.dangerous$Fatalities / noaa.data.event.dangerous$Injuries
2.- The next step is to compute the aggregation sum of injuries/fatalities/damages by event type and state.
## Compute the NOAA Storm Database aggregation sum of injuries/fatalities/damages by event type and state.
noaa.data.injuries.by.eventype.state <- aggregate(noaa.data.injuries$INJURIES,by=list(noaa.data.injuries$STATE,noaa.data.injuries$EVTYPE),sum)
names(noaa.data.injuries.by.eventype.state)[1]<-"State"
names(noaa.data.injuries.by.eventype.state)[2]<-"Event"
names(noaa.data.injuries.by.eventype.state)[3]<-"Injuries"
noaa.data.fatalities.by.eventype.state <- aggregate(noaa.data.fatalities$FATALITIES,by=list(noaa.data.fatalities$STATE,noaa.data.fatalities$EVTYPE),sum)
names(noaa.data.fatalities.by.eventype.state)[1]<-"State"
names(noaa.data.fatalities.by.eventype.state)[2]<-"Event"
names(noaa.data.fatalities.by.eventype.state)[3]<-"Fatalities"
noaa.data.damages.by.eventype.state <- aggregate(noaa.data.damages$DAMAGES,by=list(noaa.data.damages$STATE,noaa.data.damages$EVTYPE),sum)
names(noaa.data.damages.by.eventype.state)[1]<-"State"
names(noaa.data.damages.by.eventype.state)[2]<-"Event"
names(noaa.data.damages.by.eventype.state)[3]<-"Damages"
3.- Now we obtain the aggregation sum of injuries/fatalities/damages by state.
## Compute the NOAA Storm Database aggregation sum of injuries/fatalities/damages by state.
noaa.data.injuries.by.state <- aggregate(noaa.data.injuries$INJURIES,by=list(noaa.data.injuries$STATE),sum)
names(noaa.data.injuries.by.state)[1]<-"State"
names(noaa.data.injuries.by.state)[2]<-"Injuries"
noaa.data.fatalities.by.state <- aggregate(noaa.data.fatalities$FATALITIES,by=list(noaa.data.fatalities$STATE),sum)
names(noaa.data.fatalities.by.state)[1]<-"State"
names(noaa.data.fatalities.by.state)[2]<-"Fatalities"
noaa.data.damages.by.state <- aggregate(noaa.data.damages$DAMAGES,by=list(noaa.data.damages$STATE),sum)
names(noaa.data.damages.by.state)[1]<-"State"
names(noaa.data.damages.by.state)[2]<-"Damages"
## Compute statistical data values by state.
noaa.data.injuries.by.state.mean <- mean(noaa.data.injuries.by.state$Injuries)
noaa.data.fatalities.by.state.mean <- mean(noaa.data.fatalities.by.state$Fatalities)
noaa.data.damages.by.state.mean <- mean(noaa.data.damages.by.state$Damages)
noaa.data.injuries.by.state.sd <- sd(noaa.data.injuries.by.state$Injuries)
noaa.data.fatalities.by.state.sd <- sd(noaa.data.fatalities.by.state$Fatalities)
noaa.data.damages.by.state.sd <- sd(noaa.data.damages.by.state$Damages)
4.- We bind the geographical location of the state capitals to the data obtained in order to generate the USA map plots.
## Bind the location (latitude and longitude) of the capital of each state.
noaa.data.injuries.by.eventype.state <- merge(noaa.data.injuries.by.eventype.state, states.capitols.locations, by.x="State", by.y="State")
noaa.data.fatalities.by.eventype.state <- merge(noaa.data.fatalities.by.eventype.state, states.capitols.locations, by.x="State", by.y="State")
noaa.data.damages.by.eventype.state <- merge(noaa.data.damages.by.eventype.state, states.capitols.locations, by.x="State", by.y="State")
5.- Let's compute the aggregation sum of injuries/fatalities by event type and short date, that is, month and year of the event.
## Compute the NOAA Storm Database aggregation sum of injuries/fatalities by month and year and event type.
noaa.data.injuries.by.date <- aggregate(noaa.data.injuries$INJURIES,by=list(noaa.data.injuries$SHORTDATE, noaa.data.injuries$EVTYPE), sum)
names(noaa.data.injuries.by.date)[1]<-"Date"
names(noaa.data.injuries.by.date)[2]<-"Event"
names(noaa.data.injuries.by.date)[3]<-"Injuries"
noaa.data.fatalities.by.date <- aggregate(noaa.data.fatalities$FATALITIES,by=list(noaa.data.fatalities$SHORTDATE, noaa.data.fatalities$EVTYPE), sum)
names(noaa.data.fatalities.by.date)[1]<-"Date"
names(noaa.data.fatalities.by.date)[2]<-"Event"
names(noaa.data.fatalities.by.date)[3]<-"Fatalities"
6.- Finally we create the data tables that will be used to print relevant information in the exploratory analysis.
## Compute the tables of the most affected states and the number of injuries/fatalities/damages per event type.
noaa.data.injuries.by.eventype.state.melted <- melt(noaa.data.injuries.by.eventype.state, id=c("State", "Event"), measure.vars=c("Injuries"))
names(noaa.data.injuries.by.eventype.state.melted)[4] <- "Injuries"
noaa.data.injuries.by.eventype.state.melted <- cast(noaa.data.injuries.by.eventype.state.melted, State ~ Event, fill=FALSE, value = "Injuries")
noaa.data.injuries.by.eventype.state.melted$TOTAL <- rowSums(noaa.data.injuries.by.eventype.state.melted[,2:14])
noaa.data.injuries.by.eventype.state.table <-noaa.data.injuries.by.eventype.state.melted[order(-noaa.data.injuries.by.eventype.state.melted$TOTAL), ][1:noaa.data.states.table.size,]
noaa.data.fatalities.by.eventype.state.melted <- melt(noaa.data.fatalities.by.eventype.state, id=c("State", "Event"), measure.vars=c("Fatalities"))
names(noaa.data.fatalities.by.eventype.state.melted)[4] <- 'Fatalities'
noaa.data.fatalities.by.eventype.state.melted <- cast(noaa.data.fatalities.by.eventype.state.melted, State ~ Event, fill=FALSE, value = "Fatalities")
noaa.data.fatalities.by.eventype.state.melted$TOTAL <- rowSums(noaa.data.fatalities.by.eventype.state.melted[,2:14])
noaa.data.fatalities.by.eventype.state.table <-noaa.data.fatalities.by.eventype.state.melted[order(-noaa.data.fatalities.by.eventype.state.melted$TOTAL), ][1:noaa.data.states.table.size,]
noaa.data.damages.by.eventype.state.melted <- melt(noaa.data.damages.by.eventype.state, id=c("State", "Event"), measure.vars=c("Damages"))
names(noaa.data.damages.by.eventype.state.melted)[4] <- "Damages"
noaa.data.damages.by.eventype.state.melted <- cast(noaa.data.damages.by.eventype.state.melted, State ~ Event, fill=FALSE, value = "Damages")
noaa.data.damages.by.eventype.state.melted$TOTAL <- rowSums(noaa.data.damages.by.eventype.state.melted[,2:14])
noaa.data.damages.by.eventype.state.table <-noaa.data.damages.by.eventype.state.melted[order(-noaa.data.damages.by.eventype.state.melted$TOTAL), ][1:noaa.data.states.table.size,]
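The reshaping in step 6 turns a long (State, Event, value) table into one row per state with one column per event type plus a total. The sketch below shows the same idea on made-up values using base R `xtabs` instead of the reshape package's `melt`/`cast`. This is a swapped-in technique for illustration, not the method used in the report:

```r
# Made-up long table: one row per state and event type.
long <- data.frame(State = c("TX", "TX", "AL", "AL", "MO"),
                   Event = c("TORNADO", "FLOOD", "TORNADO", "WIND", "HEAT"),
                   Injuries = c(8, 6, 7, 4, 5))

# Cross-tabulate into a wide table; missing combinations become 0.
wide <- as.data.frame.matrix(xtabs(Injuries ~ State + Event, data = long))

# Total over every event column, then sort the states by total.
wide$TOTAL <- rowSums(wide)
wide <- wide[order(-wide$TOTAL), ]
print(wide)
```

Because TOTAL is computed before the column is added, rowSums covers every event column without hard-coding a column range.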
In this section we explore the results and present the answers to the questions that have been proposed.
To answer the first question we start our analysis by visualizing the evolution over time and the event type distribution of the population health data, that is, injuries and fatalities. The next figure shows these results for injuries and fatalities.
## Plot the time evolution of total injuries and fatalities.
g.plot.injuries.evolution <- ggplot(xlab='Date', ylab="Injuries") + scale_colour_discrete(name="Event type")
g.plot.injuries.evolution <- g.plot.injuries.evolution + geom_line(data=noaa.data.injuries.by.date, aes(x=Date, y=Injuries, colour=Event))
g.plot.injuries.evolution <- g.plot.injuries.evolution + guides(fill=FALSE) + ggtitle("Injuries by date and event type.")
g.plot.injuries.evolution <- g.plot.injuries.evolution + guides(colour=FALSE, fill=FALSE)
g.plot.fatalities.evolution <- qplot(xlab='Date', ylab='Fatalities') + ggtitle("Fatalities by date and event type.") + scale_colour_discrete(name="Event type")
g.plot.fatalities.evolution <- g.plot.fatalities.evolution + geom_line(data=noaa.data.fatalities.by.date, aes(x=Date, y=Fatalities, colour=Event))
g.plot.fatalities.evolution <- g.plot.fatalities.evolution + guides(fill=FALSE)
## Plot the total injuries/fatalities by event type.
g.plot.injuries.by.event.type <- ggplot(data=noaa.data.injuries.by.eventype , aes(x=Event, y=Injuries, colour=Event, fill=Event))
g.plot.injuries.by.event.type <- g.plot.injuries.by.event.type + geom_bar(stat="identity")
g.plot.injuries.by.event.type <- g.plot.injuries.by.event.type + xlab("Event type") + ylab("Injuries") + ggtitle("Total injuries by event type.")
g.plot.injuries.by.event.type <- g.plot.injuries.by.event.type + labs(fill="Event type") + guides(colour=FALSE, fill=FALSE)
g.plot.fatalities.by.event.type <- ggplot(data=noaa.data.fatalities.by.eventype , aes(x=Event, y=Fatalities, colour=Event, fill=Event))
g.plot.fatalities.by.event.type <- g.plot.fatalities.by.event.type + geom_bar(stat="identity")
g.plot.fatalities.by.event.type <- g.plot.fatalities.by.event.type + xlab("Event type") + ylab("Fatalities") + ggtitle("Total fatalities by event type.")
g.plot.fatalities.by.event.type <- g.plot.fatalities.by.event.type + labs(fill="Event type") + guides(colour=FALSE)
# Plot grid (recent gridExtra versions name the title argument `top`; older ones used `main`)
grid.arrange(g.plot.injuries.evolution, g.plot.fatalities.evolution, g.plot.injuries.by.event.type, g.plot.fatalities.by.event.type, nrow = 2, ncol = 2, top = "Injuries and fatalities summary.")
From this plot we can observe several major characteristics of the data:
The danger ratios (fatalities / injuries) for the event types are:
| | Event | ratio |
|---|---|---|
| 1 | AVALANCHE | 1.18 |
| 11 | SURF | 0.88 |
| 6 | HEAT | 0.34 |
| 2 | COLD | 0.32 |
| 4 | FLOOD | 0.18 |
| 9 | SNOW | 0.13 |
| 13 | WIND | 0.12 |
| 10 | STORM | 0.12 |
| 7 | HURRICANE | 0.10 |
| 5 | FOG | 0.07 |
| 8 | ICE | 0.07 |
| 12 | TORNADO | 0.06 |
| 3 | FIRE | 0.06 |
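A ratio table like the one above comes from merging the fatalities and injuries totals by event type and dividing one by the other. A minimal sketch with made-up totals (the two data frames are assumptions for illustration):

```r
# Made-up totals by event type.
fatalities <- data.frame(Event = c("AVALANCHE", "TORNADO", "DRY"),
                         Fatalities = c(12, 6, 1))
injuries <- data.frame(Event = c("TORNADO", "AVALANCHE", "FOG"),
                       Injuries = c(100, 10, 4))

# merge() keeps only the event types present in both tables.
danger <- merge(fatalities, injuries, by = "Event")
danger$ratio <- danger$Fatalities / danger$Injuries

# Most dangerous event types first.
danger <- danger[order(-danger$ratio), ]
print(danger)
```

Since merge performs an inner join by default, an event type that appears in only one of the two tables does not get a ratio.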
Now that we have a general view of the event types and their time evolution with respect to population health, we focus on the distribution of injuries and fatalities across the USA states. The next plot shows this distribution on a USA map.
## Plot the map with the most harmful events by state by injuries.
g.plot.injuries <- ggplot()
g.plot.injuries <- g.plot.injuries + geom_polygon(data=states.poligon.data , aes(x=long, y=lat, group = group), colour="white", fill="#eeeecc" )
g.plot.injuries <- g.plot.injuries + geom_point(data=noaa.data.injuries.by.eventype.state, aes(x=longitude, y=latitude, size=Injuries, colour=Event), shape = 1)
g.plot.injuries <- g.plot.injuries + geom_text(data=noaa.data.injuries.by.eventype.state, hjust=0.5, vjust=-0.5, aes(x=longitude, y=latitude, label=State), colour="#333333", size=4)
g.plot.injuries <- g.plot.injuries + labs(size="Total injuries" , color="Event type") + scale_size(range = c(1, 15)) # + theme(legend.text=element_text(size=4))
g.plot.injuries <- g.plot.injuries + xlab("Longitude") + ylab("Latitude") + ggtitle("State distribution of total injuries by event type.")
## Plot the map with the most harmful events by state by fatalities
g.plot.fatalities <- ggplot()
g.plot.fatalities <- g.plot.fatalities + geom_polygon(data=states.poligon.data , aes(x=long, y=lat, group = group), colour="white", fill="#eeeecc" )
g.plot.fatalities <- g.plot.fatalities + geom_point(data=noaa.data.fatalities.by.eventype.state, aes(x=longitude, y=latitude, size=Fatalities, colour=Event), shape = 1)
g.plot.fatalities <- g.plot.fatalities + geom_text(data=noaa.data.fatalities.by.eventype.state, hjust=0.5, vjust=-0.5, aes(x=longitude, y=latitude, label=state), colour="#333333", size=4)
g.plot.fatalities <- g.plot.fatalities + labs(size="Total fatalities" , color="Event type") + scale_size(range = c(1, 15)) # + theme(legend.text=element_text(size=4))
g.plot.fatalities <- g.plot.fatalities + xlab("Longitude") + ylab("Latitude") + ggtitle("State distribution of total fatalities by event type.")
# Plot grid (recent gridExtra versions name the title argument `top`; older ones used `main`)
grid.arrange(g.plot.injuries, g.plot.fatalities, ncol = 2, top = "State distribution of events that are most harmful with respect to population health")
From this plot we can conclude that the south and mid-east regions, formed by the states TX, LA, OK, AR, MO, IA, MN, WI, IL, TN, AL, GA, MI, KY, OH, FL, PA and NC, concentrate the events with the major impact on population health.
The mean and standard deviation of injuries and fatalities by state are:
Injuries
Fatalities
The following table shows the first states by number of injuries.
| | State | AVALANCHE | COLD | DRY | FIRE | FLOOD | FOG | HEAT | HURRICANE | ICE | SNOW | STORM | SURF | TORNADO | WIND | TOTAL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 46 | TX | 0.00 | 175.00 | 0.00 | 146.00 | 6926.00 | 4.00 | 787.00 | 17.00 | 0.00 | 1.00 | 584.00 | 8.00 | 8207.00 | 763.00 | 16855.00 |
| 26 | MO | 0.00 | 21.00 | 0.00 | 5.00 | 41.00 | 0.00 | 4185.00 | 0.00 | 0.00 | 2.00 | 70.00 | 0.00 | 4330.00 | 329.00 | 8654.00 |
| 2 | AL | 0.00 | 0.00 | 0.00 | 0.00 | 15.00 | 91.00 | 73.00 | 0.00 | 0.00 | 0.00 | 177.00 | 4.00 | 7929.00 | 451.00 | 8289.00 |
| 37 | OH | 0.00 | 1.00 | 0.00 | 0.00 | 33.00 | 33.00 | 112.00 | 0.00 | 0.00 | 32.00 | 2060.00 | 0.00 | 4438.00 | 399.00 | 6709.00 |
| 27 | MS | 0.00 | 0.00 | 0.00 | 0.00 | 9.00 | 0.00 | 5.00 | 105.00 | 0.00 | 0.00 | 58.00 | 0.00 | 6244.00 | 252.00 | 6421.00 |
| 11 | FL | 0.00 | 0.00 | 0.00 | 105.00 | 4.00 | 47.00 | 11.00 | 812.00 | 0.00 | 0.00 | 993.00 | 258.00 | 3343.00 | 311.00 | 5573.00 |
| 38 | OK | 0.00 | 6.00 | 4.00 | 29.00 | 175.00 | 0.00 | 219.00 | 0.00 | 0.00 | 0.00 | 119.00 | 0.00 | 4829.00 | 329.00 | 5381.00 |
| 3 | AR | 0.00 | 1.00 | 0.00 | 4.00 | 42.00 | 0.00 | 7.00 | 0.00 | 0.00 | 0.00 | 135.00 | 0.00 | 5116.00 | 245.00 | 5305.00 |
| 16 | IL | 0.00 | 6.00 | 0.00 | 0.00 | 31.00 | 0.00 | 594.00 | 0.00 | 0.00 | 30.00 | 146.00 | 0.00 | 4145.00 | 611.00 | 4952.00 |
| 45 | TN | 0.00 | 0.00 | 0.00 | 0.00 | 45.00 | 0.00 | 1.00 | 0.00 | 0.00 | 1.00 | 143.00 | 0.00 | 4748.00 | 260.00 | 4938.00 |
| 12 | GA | 0.00 | 0.00 | 0.00 | 10.00 | 26.00 | 0.00 | 3.00 | 0.00 | 0.00 | 1.00 | 673.00 | 6.00 | 3926.00 | 415.00 | 4645.00 |
| 17 | IN | 0.00 | 4.00 | 0.00 | 0.00 | 13.00 | 0.00 | 84.00 | 0.00 | 0.00 | 3.00 | 92.00 | 4.00 | 4224.00 | 296.00 | 4424.00 |
| 24 | MI | 0.00 | 30.00 | 0.00 | 4.00 | 7.00 | 1.00 | 594.00 | 0.00 | 0.00 | 7.00 | 178.00 | 9.00 | 3362.00 | 394.00 | 4192.00 |
| 29 | NC | 11.00 | 4.00 | 0.00 | 0.00 | 29.00 | 16.00 | 17.00 | 25.00 | 0.00 | 2.00 | 478.00 | 14.00 | 2536.00 | 266.00 | 3132.00 |
| 18 | KS | 0.00 | 11.00 | 0.00 | 4.00 | 24.00 | 25.00 | 17.00 | 0.00 | 0.00 | 18.00 | 256.00 | 0.00 | 2721.00 | 373.00 | 3076.00 |
| 19 | KY | 0.00 | 4.00 | 0.00 | 0.00 | 27.00 | 0.00 | 152.00 | 0.00 | 0.00 | 5.00 | 45.00 | 0.00 | 2806.00 | 441.00 | 3039.00 |
| 20 | LA | 0.00 | 0.00 | 0.00 | 3.00 | 6.00 | 76.00 | 3.00 | 3.00 | 0.00 | 0.00 | 158.00 | 2.00 | 2676.00 | 288.00 | 2927.00 |
| 6 | CA | 25.00 | 96.00 | 0.00 | 1128.00 | 110.00 | 570.00 | 265.00 | 0.00 | 0.00 | 57.00 | 387.00 | 192.00 | 88.00 | 310.00 | 2918.00 |
| 40 | PA | 0.00 | 234.00 | 0.00 | 0.00 | 166.00 | 16.00 | 366.00 | 0.00 | 67.00 | 303.00 | 409.00 | 0.00 | 1241.00 | 395.00 | 2802.00 |
| 14 | IA | 0.00 | 0.00 | 0.00 | 2.00 | 138.00 | 0.00 | 22.00 | 0.00 | 0.00 | 0.00 | 199.00 | 0.00 | 2208.00 | 310.00 | 2569.00 |
| 25 | MN | 0.00 | 6.00 | 0.00 | 2.00 | 40.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 116.00 | 2.00 | 1976.00 | 140.00 | 2142.00 |
| 52 | WI | 0.00 | 54.00 | 0.00 | 3.00 | 19.00 | 2.00 | 79.00 | 0.00 | 0.00 | 21.00 | 190.00 | 0.00 | 1601.00 | 255.00 | 1969.00 |
| 21 | MA | 0.00 | 8.00 | 0.00 | 0.00 | 5.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | 175.00 | 2.00 | 1758.00 | 151.00 | 1949.00 |
| 43 | SC | 0.00 | 0.00 | 0.00 | 0.00 | 19.00 | 13.00 | 20.00 | 3.00 | 0.00 | 0.00 | 147.00 | 4.00 | 1314.00 | 256.00 | 1520.00 |
| 22 | MD | 0.00 | 5.00 | 0.00 | 2.00 | 29.00 | 38.00 | 545.00 | 0.00 | 0.00 | 0.00 | 472.00 | 2.00 | 314.00 | 125.00 | 1407.00 |
| 48 | VA | 0.00 | 51.00 | 0.00 | 4.00 | 16.00 | 0.00 | 252.00 | 4.00 | 0.00 | 1.00 | 141.00 | 13.00 | 914.00 | 285.00 | 1396.00 |
| 31 | NE | 0.00 | 1.00 | 0.00 | 0.00 | 4.00 | 2.00 | 0.00 | 0.00 | 15.00 | 0.00 | 164.00 | 0.00 | 1158.00 | 127.00 | 1344.00 |
| 47 | UT | 25.00 | 0.00 | 0.00 | 6.00 | 41.00 | 32.00 | 0.00 | 0.00 | 0.00 | 259.00 | 523.00 | 0.00 | 91.00 | 93.00 | 977.00 |
| 33 | NJ | 0.00 | 8.00 | 0.00 | 10.00 | 198.00 | 17.00 | 304.00 | 0.00 | 0.00 | 12.00 | 189.00 | 34.00 | 70.00 | 307.00 | 842.00 |
| 7 | CO | 49.00 | 15.00 | 0.00 | 11.00 | 64.00 | 5.00 | 0.00 | 0.00 | 0.00 | 15.00 | 412.00 | 0.00 | 261.00 | 155.00 | 832.00 |
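A state-by-event table of this kind can be built from long-format records with a cross-tabulation followed by a row total and a sort. The following is a minimal sketch, not the report's own code: `injuries.long` and `injuries.wide` are hypothetical names, and only three event types for two states (values copied from the table above) are used, so the TOTAL column here covers just those three columns.

```r
# Hypothetical long-format records: one row per (state, event type) pair.
# Values copied from the injuries table above for TX and MO.
injuries.long <- data.frame(
  State    = c("TX", "TX", "TX", "MO", "MO", "MO"),
  Event    = c("FLOOD", "HEAT", "TORNADO", "FLOOD", "HEAT", "TORNADO"),
  Injuries = c(6926, 787, 8207, 41, 4185, 4330)
)

# Cross-tabulate: states in rows, event types in columns, injuries summed.
injuries.wide <- as.data.frame.matrix(
  xtabs(Injuries ~ State + Event, data = injuries.long)
)

# Add a total over the event columns and sort by it, largest first.
injuries.wide$TOTAL <- rowSums(injuries.wide)
injuries.wide <- injuries.wide[order(injuries.wide$TOTAL, decreasing = TRUE), ]
print(injuries.wide)
```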
The following table shows the top states by number of fatalities.
| | State | AVALANCHE | COLD | FIRE | FLOOD | FOG | HEAT | HURRICANE | ICE | SNOW | STORM | SURF | TORNADO | WIND | TOTAL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16 | IL | 0.00 | 15.00 | 0.00 | 27.00 | 0.00 | 983.00 | 0.00 | 0.00 | 7.00 | 33.00 | 1.00 | 203.00 | 152.00 | 1421.00 |
| 46 | TX | 0.00 | 23.00 | 24.00 | 253.00 | 0.00 | 298.00 | 6.00 | 0.00 | 3.00 | 130.00 | 23.00 | 538.00 | 65.00 | 1363.00 |
| 40 | PA | 0.00 | 18.00 | 0.00 | 86.00 | 0.00 | 510.00 | 0.00 | 0.00 | 5.00 | 50.00 | 29.00 | 82.00 | 57.00 | 837.00 |
| 2 | AL | 0.00 | 9.00 | 0.00 | 19.00 | 1.00 | 22.00 | 2.00 | 0.00 | 0.00 | 40.00 | 26.00 | 617.00 | 46.00 | 782.00 |
| 26 | MO | 0.00 | 4.00 | 0.00 | 88.00 | 0.00 | 233.00 | 0.00 | 0.00 | 0.00 | 25.00 | 0.00 | 388.00 | 15.00 | 753.00 |
| 11 | FL | 0.00 | 2.00 | 1.00 | 10.00 | 6.00 | 11.00 | 47.00 | 0.00 | 0.00 | 154.00 | 282.00 | 186.00 | 45.00 | 744.00 |
| 27 | MS | 0.00 | 3.00 | 0.00 | 16.00 | 0.00 | 26.00 | 16.00 | 0.00 | 0.00 | 17.00 | 0.00 | 450.00 | 26.00 | 554.00 |
| 6 | CA | 35.00 | 24.00 | 39.00 | 78.00 | 32.00 | 118.00 | 0.00 | 0.00 | 14.00 | 61.00 | 98.00 | 0.00 | 40.00 | 539.00 |
| 3 | AR | 0.00 | 4.00 | 0.00 | 61.00 | 0.00 | 35.00 | 0.00 | 0.00 | 0.00 | 21.00 | 0.00 | 379.00 | 29.00 | 529.00 |
| 45 | TN | 0.00 | 5.00 | 0.00 | 59.00 | 0.00 | 39.00 | 0.00 | 0.00 | 0.00 | 24.00 | 0.00 | 368.00 | 26.00 | 521.00 |
| 38 | OK | 0.00 | 5.00 | 5.00 | 38.00 | 0.00 | 87.00 | 0.00 | 0.00 | 1.00 | 14.00 | 0.00 | 296.00 | 12.00 | 458.00 |
| 37 | OH | 0.00 | 15.00 | 0.00 | 54.00 | 5.00 | 31.00 | 0.00 | 0.00 | 3.00 | 39.00 | 0.00 | 191.00 | 64.00 | 402.00 |
| 24 | MI | 0.00 | 16.00 | 0.00 | 9.00 | 1.00 | 23.00 | 0.00 | 0.00 | 1.00 | 21.00 | 22.00 | 243.00 | 58.00 | 394.00 |
| 29 | NC | 4.00 | 26.00 | 0.00 | 66.00 | 1.00 | 15.00 | 31.00 | 0.00 | 3.00 | 43.00 | 40.00 | 126.00 | 38.00 | 393.00 |
| 17 | IN | 0.00 | 5.00 | 0.00 | 43.00 | 1.00 | 16.00 | 0.00 | 0.00 | 1.00 | 15.00 | 11.00 | 252.00 | 47.00 | 391.00 |
| 18 | KS | 0.00 | 3.00 | 0.00 | 24.00 | 5.00 | 15.00 | 0.00 | 0.00 | 9.00 | 43.00 | 0.00 | 236.00 | 20.00 | 355.00 |
| 36 | NY | 1.00 | 3.00 | 0.00 | 60.00 | 2.00 | 100.00 | 0.00 | 1.00 | 7.00 | 25.00 | 38.00 | 22.00 | 78.00 | 337.00 |
| 12 | GA | 0.00 | 1.00 | 0.00 | 43.00 | 0.00 | 13.00 | 0.00 | 0.00 | 0.00 | 45.00 | 1.00 | 180.00 | 44.00 | 327.00 |
| 20 | LA | 0.00 | 2.00 | 0.00 | 13.00 | 1.00 | 62.00 | 2.00 | 0.00 | 0.00 | 31.00 | 2.00 | 156.00 | 36.00 | 305.00 |
| 52 | WI | 0.00 | 11.00 | 1.00 | 10.00 | 12.00 | 98.00 | 0.00 | 0.00 | 1.00 | 13.00 | 0.00 | 96.00 | 35.00 | 277.00 |
| 19 | KY | 0.00 | 0.00 | 0.00 | 60.00 | 0.00 | 6.00 | 0.00 | 2.00 | 5.00 | 14.00 | 0.00 | 125.00 | 23.00 | 235.00 |
| 43 | SC | 0.00 | 28.00 | 0.00 | 8.00 | 1.00 | 41.00 | 1.00 | 0.00 | 0.00 | 27.00 | 20.00 | 59.00 | 31.00 | 216.00 |
| 5 | AZ | 0.00 | 0.00 | 0.00 | 63.00 | 0.00 | 58.00 | 0.00 | 0.00 | 3.00 | 57.00 | 0.00 | 3.00 | 22.00 | 206.00 |
| 33 | NJ | 0.00 | 2.00 | 0.00 | 25.00 | 0.00 | 48.00 | 0.00 | 0.00 | 1.00 | 25.00 | 38.00 | 1.00 | 38.00 | 178.00 |
| 25 | MN | 0.00 | 2.00 | 0.00 | 18.00 | 0.00 | 15.00 | 0.00 | 0.00 | 0.00 | 18.00 | 0.00 | 99.00 | 16.00 | 168.00 |
| 48 | VA | 1.00 | 9.00 | 0.00 | 47.00 | 0.00 | 6.00 | 5.00 | 0.00 | 3.00 | 31.00 | 5.00 | 36.00 | 22.00 | 165.00 |
| 7 | CO | 52.00 | 5.00 | 0.00 | 15.00 | 1.00 | 0.00 | 0.00 | 0.00 | 9.00 | 62.00 | 0.00 | 5.00 | 13.00 | 162.00 |
| 22 | MD | 0.00 | 5.00 | 0.00 | 13.00 | 1.00 | 100.00 | 0.00 | 0.00 | 0.00 | 26.00 | 1.00 | 7.00 | 9.00 | 162.00 |
| 14 | IA | 0.00 | 4.00 | 0.00 | 9.00 | 0.00 | 5.00 | 0.00 | 0.00 | 3.00 | 23.00 | 0.00 | 81.00 | 15.00 | 140.00 |
| 51 | WA | 36.00 | 4.00 | 4.00 | 8.00 | 1.00 | 3.00 | 0.00 | 4.00 | 8.00 | 16.00 | 1.00 | 6.00 | 47.00 | 138.00 |
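One way to read a table like this is to ask which event type causes the most fatalities in each state. The sketch below does that with `max.col()` on a subset of the rows and columns above. The names `fatalities`, `event.cols` and `Dominant` are hypothetical and introduced only for this illustration.

```r
# Rows copied from the fatalities table above (four states, four event columns).
fatalities <- data.frame(
  State   = c("IL", "TX", "PA", "AL"),
  FLOOD   = c(27, 253, 86, 19),
  HEAT    = c(983, 298, 510, 22),
  TORNADO = c(203, 538, 82, 617),
  WIND    = c(152, 65, 57, 46)
)

# Every column except the state label is an event type.
event.cols <- setdiff(names(fatalities), "State")

# For each state, pick the event column holding the largest count.
# ties.method = "first" makes the choice deterministic when counts are equal.
fatalities$Dominant <- event.cols[
  max.col(as.matrix(fatalities[, event.cols]), ties.method = "first")
]
print(fatalities[, c("State", "Dominant")])
```

For these four states, heat is the leading cause in IL and PA and tornadoes in TX and AL, which also holds when all event columns of the table are considered.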
From these tables we have that:
To explore the economic consequences by state and event type we present the next plot, where the left figure shows the distribution of damages by state on a USA map and the right figure shows the distribution of damages by event type.
```r
library(ggplot2)    # ggplot(), geom_polygon(), geom_point(), geom_bar(), ...
library(gridExtra)  # grid.arrange()

## Plot the map of the events with the largest economic consequences by state.
g.plot.damages.map <- ggplot()
g.plot.damages.map <- g.plot.damages.map + geom_polygon(data=states.poligon.data , aes(x=long, y=lat, group = group), colour="white", fill="#eeeecc" )
g.plot.damages.map <- g.plot.damages.map + geom_point(data=noaa.data.damages.by.eventype.state, aes(x=longitude, y=latitude, size=Damages, colour=Event), shape = 1)
g.plot.damages.map <- g.plot.damages.map + geom_text(data=noaa.data.damages.by.eventype.state, hjust=0.5, vjust=-0.5, aes(x=longitude, y=latitude, label=State), colour="#333333", size=4)
g.plot.damages.map <- g.plot.damages.map + labs(size="Total damages" , color="Event type") + scale_size(range = c(1, 15)) # + theme(legend.text=element_text(size=4))
g.plot.damages.map <- g.plot.damages.map + xlab("Longitude") + ylab("Latitude") + ggtitle("State distribution of total damages (in million USD) by event type.")
## Plot the total damages by event type.
g.plot.damages.by.event.type <- ggplot(data=noaa.data.damages.by.eventype.state , aes(x=Event, y=Damages, colour=Event, fill=Event))
## guides(colour="none") hides the redundant colour legend (colour=FALSE is deprecated).
g.plot.damages.by.event.type <- g.plot.damages.by.event.type + geom_bar(stat="identity") + labs(fill="Event type") + guides(colour="none")
g.plot.damages.by.event.type <- g.plot.damages.by.event.type + xlab("Event type") + ylab("Total damages") + ggtitle("Total damages (in million USD) by event type.")
## Plot grid: both figures side by side (gridExtra >= 2.0 uses `top` instead of `main`).
grid.arrange(g.plot.damages.map, g.plot.damages.by.event.type, ncol = 2, top = "Total damages across USA by event type.")
```
From this plot we have that:
The mean and standard deviation of damages in million dollars by state are:
The following table shows the top states by damages in million dollars.
| | State | AVALANCHE | COLD | DRY | FIRE | FLOOD | FOG | HEAT | HURRICANE | ICE | SNOW | STORM | SURF | TORNADO | WIND | TOTAL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | LA | 0.00 | 57.87 | 587.43 | 3.59 | 774.04 | 0.25 | 0.11 | 21581.67 | 0.00 | 0.30 | 35153.38 | 0.00 | 1190.41 | 151.88 | 59349.06 |
| 46 | TX | 0.00 | 21.63 | 6523.63 | 608.01 | 1152.64 | 0.38 | 0.25 | 4143.14 | 0.00 | 31.23 | 14717.59 | 0.00 | 3645.37 | 945.42 | 30843.87 |
| 27 | MS | 0.00 | 0.24 | 3.15 | 0.00 | 1453.85 | 0.00 | 0.08 | 9808.55 | 0.00 | 4.71 | 16441.58 | 0.00 | 2074.88 | 268.26 | 29787.03 |
| 11 | FL | 0.00 | 874.11 | 100.00 | 410.64 | 1738.89 | 1.73 | 0.00 | 13220.81 | 0.00 | 0.00 | 1136.34 | 7.62 | 1753.92 | 2647.04 | 19244.06 |
| 2 | AL | 0.13 | 52.01 | 0.10 | 0.10 | 1423.76 | 1.50 | 400.10 | 1049.29 | 0.00 | 0.00 | 5134.01 | 8.10 | 6146.15 | 150.81 | 14215.25 |
| 6 | CA | 228.59 | 1001.77 | 0.05 | 3894.29 | 2240.97 | 13.79 | 492.41 | 0.00 | 0.00 | 2.37 | 1140.50 | 61.65 | 113.76 | 743.86 | 9190.15 |
| 14 | IA | 0.00 | 298.56 | 2654.78 | 1.10 | 3081.56 | 0.36 | 6.49 | 0.00 | 51.00 | 76.19 | 662.70 | 0.00 | 2130.58 | 400.07 | 8963.33 |
| 16 | IL | 0.05 | 29.42 | 284.57 | 2.12 | 6065.67 | 0.10 | 0.46 | 0.00 | 0.00 | 5.62 | 142.76 | 0.00 | 1748.02 | 633.95 | 8278.79 |
| 26 | MO | 0.00 | 11.10 | 30.84 | 2.11 | 1122.43 | 0.00 | 1.25 | 0.00 | 0.00 | 18.63 | 1231.97 | 0.00 | 4740.33 | 256.09 | 7158.66 |
| 37 | OH | 0.00 | 11.12 | 200.00 | 0.00 | 1476.91 | 0.56 | 6.10 | 0.00 | 0.00 | 72.62 | 1838.91 | 0.00 | 2263.07 | 862.75 | 5869.29 |
| 38 | OK | 0.00 | 0.96 | 1097.91 | 15.76 | 212.42 | 0.06 | 0.01 | 0.00 | 0.00 | 7.45 | 944.85 | 0.00 | 3318.66 | 1020.26 | 5598.09 |
| 12 | GA | 0.10 | 160.50 | 717.29 | 27.78 | 630.16 | 0.00 | 0.00 | 7.70 | 0.00 | 0.59 | 533.94 | 0.02 | 3225.99 | 357.80 | 5304.06 |
| 30 | ND | 0.00 | 0.67 | 0.00 | 0.60 | 4137.39 | 0.00 | 0.00 | 0.00 | 0.01 | 0.68 | 984.25 | 0.00 | 119.99 | 422.33 | 5243.59 |
| 29 | NC | 30.74 | 3.61 | 13.07 | 2.60 | 484.26 | 0.05 | 0.00 | 2383.06 | 0.00 | 4.04 | 628.97 | 0.00 | 1549.76 | 251.96 | 5100.17 |
| 40 | PA | 0.07 | 32.39 | 539.40 | 1.41 | 2197.75 | 0.00 | 0.01 | 0.00 | 0.57 | 92.80 | 161.70 | 0.00 | 1786.52 | 240.48 | 4812.61 |
| 45 | TN | 0.00 | 0.30 | 0.00 | 0.10 | 3113.27 | 0.00 | 0.00 | 0.00 | 0.50 | 5.45 | 47.27 | 0.00 | 1516.48 | 206.89 | 4683.36 |
| 25 | MN | 0.00 | 0.05 | 0.00 | 4.03 | 1452.39 | 0.02 | 2.13 | 0.00 | 0.00 | 0.12 | 1106.75 | 0.00 | 1871.19 | 865.10 | 4436.67 |
| 31 | NE | 0.00 | 313.36 | 720.05 | 6.69 | 193.19 | 0.49 | 8.08 | 0.00 | 0.18 | 5.10 | 1480.83 | 0.00 | 1666.98 | 347.94 | 4394.95 |
| 18 | KS | 0.00 | 35.38 | 155.95 | 2.65 | 461.42 | 0.12 | 0.11 | 0.00 | 0.00 | 59.44 | 962.80 | 0.00 | 2613.65 | 268.12 | 4291.51 |
| 36 | NY | 0.50 | 7.82 | 100.20 | 0.10 | 3150.70 | 0.00 | 0.00 | 0.00 | 11.00 | 175.80 | 264.50 | 39.20 | 456.43 | 602.64 | 4206.24 |
| 3 | AR | 0.08 | 1.08 | 2.65 | 8.43 | 536.32 | 0.00 | 0.00 | 7.70 | 0.00 | 50.76 | 914.56 | 0.00 | 2568.74 | 127.00 | 4090.31 |
| 17 | IN | 0.00 | 7.88 | 73.00 | 0.00 | 1152.26 | 0.47 | 1.00 | 0.00 | 0.00 | 3.39 | 114.56 | 0.00 | 2523.43 | 122.82 | 3875.99 |
| 5 | AZ | 0.01 | 1.83 | 0.00 | 222.72 | 123.84 | 0.00 | 0.00 | 0.00 | 0.00 | 0.11 | 3046.77 | 0.00 | 47.76 | 505.35 | 3443.02 |
| 33 | NJ | 0.00 | 0.00 | 80.00 | 1.26 | 2858.31 | 0.00 | 0.00 | 1.15 | 0.00 | 19.10 | 104.99 | 4.00 | 78.61 | 82.54 | 3147.43 |
| 7 | CO | 3.47 | 28.14 | 0.00 | 305.88 | 479.60 | 0.35 | 0.00 | 0.00 | 0.00 | 5.74 | 1551.01 | 0.00 | 293.77 | 85.19 | 2667.97 |
| 19 | KY | 0.00 | 5.17 | 226.00 | 0.03 | 806.96 | 0.00 | 0.00 | 0.00 | 0.00 | 11.12 | 655.12 | 0.00 | 885.93 | 345.64 | 2590.33 |
| 52 | WI | 0.00 | 15.64 | 4.53 | 2.24 | 1035.47 | 0.02 | 0.04 | 0.00 | 0.00 | 0.13 | 510.46 | 0.00 | 833.50 | 198.32 | 2402.03 |
| 24 | MI | 0.00 | 54.58 | 150.00 | 7.60 | 316.29 | 0.00 | 0.00 | 0.00 | 0.00 | 11.62 | 512.05 | 0.00 | 1058.69 | 365.37 | 2110.82 |
| 34 | NM | 0.00 | 0.00 | 14.40 | 1647.88 | 69.25 | 0.01 | 0.00 | 0.00 | 0.00 | 0.18 | 122.36 | 0.00 | 57.94 | 23.61 | 1912.02 |
| 48 | VA | 0.03 | 6.12 | 297.48 | 5.77 | 408.39 | 0.80 | 0.60 | 60.10 | 0.00 | 6.00 | 155.96 | 0.08 | 434.75 | 153.36 | 1376.07 |
From the previous plot and this table we have that the south east region (LA, TX, AL, MS, FL) suffers the largest economic impact from severe weather events.
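To put a number on this observation, the sketch below computes the share of damages that these five states account for among the 30 states listed in the damages table. It uses the TOTAL column exactly as printed, so the result is a share of the listed states only, not of the national total. The names `damages.total`, `southeast` and `share` are hypothetical.

```r
# TOTAL column of the damages table above (million dollars), keyed by state.
damages.total <- c(
  LA = 59349.06, TX = 30843.87, MS = 29787.03, FL = 19244.06, AL = 14215.25,
  CA = 9190.15,  IA = 8963.33,  IL = 8278.79,  MO = 7158.66,  OH = 5869.29,
  OK = 5598.09,  GA = 5304.06,  ND = 5243.59,  NC = 5100.17,  PA = 4812.61,
  TN = 4683.36,  MN = 4436.67,  NE = 4394.95,  KS = 4291.51,  NY = 4206.24,
  AR = 4090.31,  IN = 3875.99,  AZ = 3443.02,  NJ = 3147.43,  CO = 2667.97,
  KY = 2590.33,  WI = 2402.03,  MI = 2110.82,  NM = 1912.02,  VA = 1376.07
)

# States named in the conclusion as the south east region.
southeast <- c("LA", "TX", "AL", "MS", "FL")

# Fraction of the listed damages that falls in the south east region.
share <- sum(damages.total[southeast]) / sum(damages.total)
cat(sprintf("South east share of listed damages: %.1f%%\n", 100 * share))
```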