Synopsis

In this report, we present the effects of storms and other severe weather events on public health and on the economy. We collected the data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which contains records starting in 1950.

Loading and Processing the Raw Data

For this part, we first loaded the data as follows:

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:graphics':
## 
##     layout
loadedData <- read.csv('repdata-data-StormData.csv',header = TRUE)

After this step, we selected just the variables of interest: EVTYPE (Event Type), FATALITIES, INJURIES, PROPDMG (Property Damage), PROPDMGEXP (Power of 10 of Property Damage), CROPDMG (Crop Damage) and CROPDMGEXP (Power of 10 of crop damage).

justFatalityData <- select(.data = loadedData, EVTYPE, FATALITIES, INJURIES)
justEconomicalData <- select(.data = loadedData, EVTYPE, PROPDMG, PROPDMGEXP,CROPDMG,CROPDMGEXP)

After doing this, we saw that CROPDMGEXP and PROPDMGEXP were in an inconsistent format: some values were numbers, others were symbols or letters. So we processed them to turn every value into a numeric power of 10. We used sapply for this and then created new variables holding the total value computed from PROPDMGEXP and PROPDMG: FinalPROPDMG = PROPDMG*10^PROPDMGEXP. The same idea was applied to crop damage.

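# Map one exponent code to its power of 10; always returns a number.
getNames <- function(x){
  x <- toupper(as.character(x))  # treat 'k', 'm', 'h', 'b' like their upper-case forms
  if (x=='H'){
    2
  }else if (x=='K'){
    3
  }else if (x=='M'){
    6
  }else if (x=='B'){
    9
  }else if (x %in% c('?','+','-','',' ')){
    0
  }else{
    as.numeric(x)  # digit codes are used directly as the exponent
  }
}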

justEconomicalData$PROPDMGEXP<-sapply(justEconomicalData$PROPDMGEXP,getNames)
justEconomicalData$CROPDMGEXP<-sapply(justEconomicalData$CROPDMGEXP,getNames)
justEconomicalData[,"FinalPROPDMG"]<-justEconomicalData[,"PROPDMG"]*10^justEconomicalData[,"PROPDMGEXP"]
justEconomicalData[,"FinalCROPDMG"]<-justEconomicalData[,"CROPDMG"]*10^justEconomicalData[,"CROPDMGEXP"]
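As a cross-check of the conversion above, the same mapping can be sketched as a vectorized lookup that avoids calling a function once per row. This is only a minimal sketch: expToPower is a hypothetical helper that is not part of the original analysis, and the sample codes are invented for illustration.

```r
# Hypothetical helper (not in the original analysis): vectorized exponent lookup.
expToPower <- function(codes) {
  codes <- toupper(trimws(as.character(codes)))
  lookup <- c(H = 2, K = 3, M = 6, B = 9)
  power <- unname(lookup[codes])            # NA for anything not in the table
  isDigit <- grepl("^[0-9]$", codes)
  power[isDigit] <- as.numeric(codes[isDigit])  # digit codes are the exponent itself
  power[is.na(power)] <- 0                  # '?', '+', '-', '' and blanks
  power
}

# Invented sample values, not taken from the NOAA file.
codes  <- c("K", "m", "", "5", "?", "B", "h")
damage <- c(2.5, 1, 10, 1, 3, 0.5, 4)
powers <- expToPower(codes)
finalDamage <- damage * 10^powers
print(data.frame(codes, damage, powers, finalDamage))
```

For example, a PROPDMG of 2.5 with the code K becomes 2.5 * 10^3 = 2500 dollars.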

Results

Injuries and Fatalities

To analyse injuries and fatalities, we used the aggregate function to get the combined number of injuries and fatalities for each event type. After this, we kept only the event types with more than 3,000 combined fatalities and injuries since 1950.

aggregatedFatalitiesData <- aggregate(FATALITIES + INJURIES ~ EVTYPE  , data=loadedData, FUN = sum)
colnames (aggregatedFatalitiesData) <- c('EventType','FatalitiesInjuries')
justTopFatal <- filter(aggregatedFatalitiesData, FatalitiesInjuries > 3000)
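To show what the aggregate-then-filter step does, here is a small sketch on toy data. The rows and the threshold of 5 are invented for illustration and are not NOAA values; the final sort is an optional extra that the original analysis does not perform.

```r
# Toy data, invented for illustration only (not NOAA values).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 3, 4, 0),
  INJURIES   = c(10, 0, 20, 1, 5)
)

# Sum fatalities plus injuries within each event type.
totals <- aggregate(FATALITIES + INJURIES ~ EVTYPE, data = toy, FUN = sum)
colnames(totals) <- c("EventType", "FatalitiesInjuries")

# Keep event types above an illustrative threshold, largest first.
top <- totals[totals$FatalitiesInjuries > 5, ]
top <- top[order(-top$FatalitiesInjuries), ]
print(top)
```

Here HEAT has a combined total of exactly 5, so it is dropped by the strict comparison, just as event types at or below 3000 are dropped in the real data.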

After doing this, we plotted the data for better visualization.

f <- list(
  family = "Courier New, monospace",
  size = 18,
  color = "#7f7f7f"
)
x <- list(
  title = "Events",
  titlefont = f
)
y <- list(
  title = "Fatalities and Injuries",
  titlefont = f
)

with(justTopFatal,
     plot_ly(
       x = EventType,
       y = FatalitiesInjuries,
       type = "bar"
     ) %>% layout(xaxis = x, yaxis = y))

As we can see, tornadoes are the event type that has caused the most fatalities and injuries in the USA since 1950, with almost 100,000 reported cases.

Economic Effect

After the preprocessing step, we could use aggregate in the same way as before. This time we filtered for the events with an economic impact of over 150 billion dollars since 1950.

aggregatedEconomicData <- aggregate(FinalPROPDMG + FinalCROPDMG ~ EVTYPE  , data=justEconomicalData, FUN = sum)
colnames (aggregatedEconomicData) <- c('EventType','EconomicalImpact')
justTopEconomical <- filter(aggregatedEconomicData, EconomicalImpact > 150000000000)

Finally, we plotted the events that caused the greatest economic impact in the USA.

f <- list(
  family = "Courier New, monospace",
  size = 18,
  color = "#7f7f7f"
)
x <- list(
  title = "Events",
  titlefont = f
)
y <- list(
  title = "Economical Impact in US$",
  titlefont = f
)

with(justTopEconomical,
     plot_ly(
       x = EventType,
       y = EconomicalImpact,
       type = "bar"
     ) %>% layout(xaxis = x, yaxis = y))

As we can see, floods and related events have caused the greatest economic damage in the US since 1950.
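One way to make "related events" concrete is to group event types by name before summing their impact. The sketch below is an assumption about how that could be done, not a step from the original analysis: the grouping rule (any event type containing "FLOOD") is hypothetical, and the numbers are invented.

```r
# Toy totals, invented for illustration (not NOAA figures).
impact <- data.frame(
  EventType        = c("FLOOD", "FLASH FLOOD", "RIVER FLOOD", "TORNADO", "HAIL"),
  EconomicalImpact = c(100, 40, 10, 60, 20)
)

# Hypothetical rule: any event type containing "FLOOD" counts as flood-related.
eventNames <- as.character(impact$EventType)
impact$Group <- ifelse(grepl("FLOOD", eventNames, fixed = TRUE),
                       "FLOOD (all types)", eventNames)

# Sum the impact within each group and list the largest group first.
grouped <- aggregate(EconomicalImpact ~ Group, data = impact, FUN = sum)
grouped <- grouped[order(-grouped$EconomicalImpact), ]
print(grouped)
```

With a grouping like this, the three flood variants are counted as one category instead of competing as separate bars in the chart.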