NOAA Storm Data Analysis

1. Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries and property damage, and preventing such outcomes to the extent possible is a key concern. The goal of this project is to assess which types of weather event have the largest impact on public health and safety and on the economy in the United States.

2. Data Processing

The data for this assignment can be downloaded from the following link: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

2.1. Loading the data

The data was downloaded from the link above and loaded into R with the following code. The R packages required for the analysis are loaded as well.

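dataFile <- "repdata_data_StormData.csv.bz2"
if (!file.exists(dataFile)) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = dataFile, mode = "wb")
}
# read.csv() reads the bzip2-compressed file directly
storm <- read.csv(dataFile)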
dim(storm)
## [1] 902297     37
library(knitr)
## Warning: package 'knitr' was built under R version 3.5.3
library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(grid)
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine

2.2. Extracting the relevant variables

The data frame contains many variables that are neither informative nor useful for this analysis, so only the columns needed (event date, event type, health impact and damage estimates) are kept:

storm <-  storm[ ,c( "BGN_DATE", 
                     "EVTYPE", 
                     "FATALITIES",
                     "INJURIES",
                     "PROPDMG",
                     "PROPDMGEXP",
                     "CROPDMG",
                     "CROPDMGEXP")]

As an initial check, we look at the amount of data recorded by NOAA over time. The date variable BGN_DATE is converted into a new variable, year, so that the recorded events can be counted by their year of occurrence.

storm$year <- as.numeric(format(as.Date(storm$BGN_DATE, "%m/%d/%Y %H:%M:%S"), "%Y"))
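The format string can be checked on a few values before it is applied to every row. The sketch below parses the two BGN_DATE values shown in the printed output further down, plus one illustrative string in the same layout (the 1951 value is made up for the example).

```r
# Two BGN_DATE values taken from the printed output, plus one illustrative value
dates <- c("1/6/1995 0:00:00", "1/22/1995 0:00:00", "11/15/1951 0:00:00")

# Parse month/day/year and the time of day; a Date object keeps only the day
parsed <- as.Date(dates, format = "%m/%d/%Y %H:%M:%S")

# Extract the year as a number, as done for storm$year
years <- as.numeric(format(parsed, "%Y"))
print(years)
```

If the format did not match an entry, as.Date() would return NA for it, so anyNA() on the parsed column is a quick sanity check on the full data.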

The number of events recorded per year is then plotted as a histogram:

hist(storm$year, breaks= 61)

The histogram shows that the number of records grows steadily over the years. This suggests that recording has improved considerably over time, so the early years, with far fewer records, may give a distorted picture of which events are most harmful. To reduce this bias, we analyse only the events recorded from 1994 onwards, when the number of records per year becomes substantially larger.

storm<-storm[storm$year >= 1994, ]
head(storm, n=2)
##                 BGN_DATE        EVTYPE FATALITIES INJURIES PROPDMG
## 187560  1/6/1995 0:00:00 FREEZING RAIN          0        0       0
## 187561 1/22/1995 0:00:00          SNOW          0        0       0
##        PROPDMGEXP CROPDMG CROPDMGEXP year
## 187560                  0            1995
## 187561                  0            1995
dim(storm)
## [1] 702131      9
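The 1994 cut-off was read off the histogram. The same judgement can be made from a table of record counts per year. The sketch below uses a short, made-up vector of years in place of storm$year, and the threshold of 3 records per year is an arbitrary illustrative value, not one used in this analysis.

```r
# Hypothetical years standing in for storm$year
years <- c(1990, 1991, 1991, 1994, 1994, 1994, 1995, 1995, 1995, 1995)

# Number of records per year
counts <- table(years)
print(counts)

# First year whose record count reaches the (arbitrary) threshold
threshold <- 3
cutoff <- min(as.numeric(names(counts)[counts >= threshold]))
print(cutoff)

# Keep only the records from the cut-off year onwards
kept <- years[years >= cutoff]
```

On the real data the threshold would still have to be chosen by judgement; the table only makes that choice explicit.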

2.3. Evaluating the property damage

The exponent codes in PROPDMGEXP are listed and converted to powers of ten (H = 2, K = 3, M = 6, B = 9; the symbols "-", "?" and "+" and blank codes are treated as 0). The property damage is then multiplied by 10^exponent to obtain a dollar amount, and the 8 event types with the largest total damage are selected.

unique(storm$PROPDMGEXP)
##  [1]   B K M m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
# Convert the exponent codes to numeric powers of ten
storm$PROPDMGEXP <- as.character(storm$PROPDMGEXP)
storm$PROPDMGEXP <- gsub("B", "9", storm$PROPDMGEXP)
storm$PROPDMGEXP <- gsub("H|h", "2", storm$PROPDMGEXP)
storm$PROPDMGEXP <- gsub("K", "3", storm$PROPDMGEXP)
storm$PROPDMGEXP <- gsub("M|m", "6", storm$PROPDMGEXP)
# "-", "?" and "+" carry no magnitude; a character class matches them literally
storm$PROPDMGEXP <- gsub("[-?+]", "0", storm$PROPDMGEXP)
storm$PROPDMGEXP <- as.numeric(storm$PROPDMGEXP)
# Blank codes become NA above; treat them as exponent 0 (multiplier 1)
storm$PROPDMGEXP[is.na(storm$PROPDMGEXP)] <- 0

# Damage in US$ = recorded value * 10^exponent
storm$PROPDMGCOR <- storm$PROPDMG * 10^storm$PROPDMGEXP

# Total property damage per event type, keeping the 8 largest
propdmgsum <- aggregate(PROPDMGCOR ~ EVTYPE, data = storm, FUN = sum)
propdmgsum <- propdmgsum[order(propdmgsum$PROPDMGCOR, decreasing = TRUE), ]
propdmgsum <- head(propdmgsum, 8)
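The exponent codes stand for powers of ten (NOAA's documentation uses K for thousands, M for millions and B for billions), so a recorded value of 25 with code K corresponds to US$25,000. The chain of gsub() calls can also be written as one lookup function. The sketch below is a hypothetical helper, exp_to_multiplier(), using the same mapping as this step (H = 2, K = 3, M = 6, B = 9, case-insensitive, a single digit taken as the exponent itself, anything else 0); treating digit codes as exponents is an assumption about how those codes should be read.

```r
# Hypothetical helper: convert NOAA damage exponent codes to multipliers.
# Assumed mapping: H = 2, K = 3, M = 6, B = 9 (case-insensitive),
# a single digit is used as the exponent itself, anything else gives 0.
exp_to_multiplier <- function(code) {
  code <- toupper(trimws(as.character(code)))
  letter_power <- c(H = 2, K = 3, M = 6, B = 9)

  power <- rep(0, length(code))
  is_letter <- code %in% names(letter_power)
  power[is_letter] <- letter_power[code[is_letter]]
  is_digit <- grepl("^[0-9]$", code)
  power[is_digit] <- as.numeric(code[is_digit])

  10^power
}

# Illustrative values, not taken from the data set
dmg <- c(25, 2.5, 1.5, 10, 3)
exp_code <- c("K", "m", "B", "", "?")
dmg_usd <- dmg * exp_to_multiplier(exp_code)
print(dmg_usd)
```

Applied to the data, storm$PROPDMG * exp_to_multiplier(storm$PROPDMGEXP) would give the dollar amounts in one step, and the same helper would work for CROPDMGEXP.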

2.4. Evaluating the crop damage

The exponent codes in CROPDMGEXP are listed and converted to powers of ten in the same way (K = 3, M = 6, B = 9; "?" and blank codes are treated as 0). The crop damage is then multiplied by 10^exponent, and the 8 event types with the largest total crop damage are selected.

unique(storm$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M
# Convert the exponent codes to numeric powers of ten
storm$CROPDMGEXP <- as.character(storm$CROPDMGEXP)
storm$CROPDMGEXP <- gsub("B", "9", storm$CROPDMGEXP)
storm$CROPDMGEXP <- gsub("K|k", "3", storm$CROPDMGEXP)
storm$CROPDMGEXP <- gsub("M|m", "6", storm$CROPDMGEXP)
storm$CROPDMGEXP <- as.numeric(storm$CROPDMGEXP)
## Warning: NAs introduced by coercion
# "?" and blank codes become NA; treat them as exponent 0 (multiplier 1)
storm$CROPDMGEXP[is.na(storm$CROPDMGEXP)] <- 0

# Damage in US$ = recorded value * 10^exponent
storm$CROPDMGCOR <- storm$CROPDMG * 10^storm$CROPDMGEXP

# Total crop damage per event type, keeping the 8 largest
cropdmgsum <- aggregate(CROPDMGCOR ~ EVTYPE, data = storm, FUN = sum)
cropdmgsum <- cropdmgsum[order(cropdmgsum$CROPDMGCOR, decreasing = TRUE), ]
cropdmgsum <- head(cropdmgsum, 8)
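Property and crop damage are ranked separately here. If a single ranking by overall economic cost is wanted, the two per-event totals can be joined and added. The sketch below uses two small made-up tables with the same column names as propdmgsum and cropdmgsum; the numbers are illustrative only.

```r
# Made-up per-event totals with the same columns as propdmgsum / cropdmgsum
prop <- data.frame(EVTYPE = c("FLOOD", "HURRICANE", "TORNADO"),
                   PROPDMGCOR = c(100, 50, 30), stringsAsFactors = FALSE)
crop <- data.frame(EVTYPE = c("DROUGHT", "FLOOD", "HURRICANE"),
                   CROPDMGCOR = c(40, 5, 20), stringsAsFactors = FALSE)

# Outer join keeps event types that appear in only one of the tables
econ <- merge(prop, crop, by = "EVTYPE", all = TRUE)

# A missing total means no recorded damage of that kind
econ$PROPDMGCOR[is.na(econ$PROPDMGCOR)] <- 0
econ$CROPDMGCOR[is.na(econ$CROPDMGCOR)] <- 0

# Combined damage, largest first
econ$TOTALDMG <- econ$PROPDMGCOR + econ$CROPDMGCOR
econ <- econ[order(econ$TOTALDMG, decreasing = TRUE), ]
print(econ)
```

On the real data the join should use the full per-event aggregates, before they are cut down to the top 8; otherwise an event type ranked ninth in one table would silently contribute 0 to its total.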

2.5. Evaluating fatalities

Fatalities are summed by event type, and the 8 event types that caused the most fatalities are selected.

fatalities <- aggregate(FATALITIES ~ EVTYPE, data = storm, FUN = sum)
# Sort in decreasing order before keeping the top 8
fatalities <- fatalities[order(fatalities$FATALITIES, decreasing = TRUE), ]
fatalities <- head(fatalities, 8)

2.6. Evaluating injuries

Injuries are summed by event type, and the 8 event types that caused the most injuries are selected.

injuries <- aggregate(INJURIES ~ EVTYPE, data = storm, FUN = sum)
injuries <- injuries[order(injuries$INJURIES, decreasing = TRUE), ]
injuries <- head(injuries, 8)
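The damage, fatality and injury steps all repeat the same pattern: sum one column by EVTYPE, sort in decreasing order and keep the first 8 rows. A small helper could express this once. top_n_events() below is a hypothetical name, and the toy data frame is made up for illustration. Using head() rather than [1:8, ] also avoids rows of NA when there are fewer than n event types.

```r
# Hypothetical helper: total `value_col` per EVTYPE and keep the n largest
top_n_events <- function(df, value_col, n = 8) {
  totals <- aggregate(df[[value_col]], by = list(EVTYPE = df$EVTYPE), FUN = sum)
  names(totals)[2] <- value_col  # restore the original column name
  totals <- totals[order(totals[[value_col]], decreasing = TRUE), ]
  head(totals, n)
}

# Made-up records for illustration
toy <- data.frame(EVTYPE = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT", "LIGHTNING"),
                  FATALITIES = c(3, 5, 2, 1, 4, 1), stringsAsFactors = FALSE)

top <- top_n_events(toy, "FATALITIES", n = 2)
print(top)
```

With this helper, top_n_events(storm, "INJURIES") would correspond to the injuries table built in this section.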

3. Results

The following plots show the 8 weather event types that caused the most injuries (top) and fatalities (bottom) in the US.

# Bars are ordered by value (largest first) rather than alphabetically
f <- ggplot(data = fatalities, aes(x = reorder(EVTYPE, -FATALITIES), y = FATALITIES)) + xlab("EVENTS") +
  geom_col(color = "black", fill = "black") +
  theme_classic() + theme(axis.text.x = element_text(size = 6)) +
  geom_text(aes(label = FATALITIES), vjust = 1.6, color = "white", size = 2)
i <- ggplot(data = injuries, aes(x = reorder(EVTYPE, -INJURIES), y = INJURIES)) + xlab("EVENTS") +
  geom_col(color = "black", fill = "white") +
  theme_classic() + theme(axis.text.x = element_text(size = 6)) +
  geom_text(aes(label = INJURIES), vjust = 1.6, color = "black", size = 2)
grid.arrange(i, f, nrow = 2)

The following plots show the 8 weather event types that caused the most crop damage (top) and property damage (bottom) in the US.

# Bars are ordered by value; the crop plot is named cr so it does not shadow c()
p <- ggplot(data = propdmgsum, aes(x = reorder(EVTYPE, -PROPDMGCOR), y = PROPDMGCOR)) +
  xlab("EVENTS") + ylab("PROPERTY DAMAGE (US$)") +
  geom_col(color = "black", fill = "lightblue") +
  theme_classic() + theme(axis.text.x = element_text(size = 4.5)) +
  geom_text(aes(label = PROPDMGCOR), vjust = 1.6, color = "black", size = 2)
cr <- ggplot(data = cropdmgsum, aes(x = reorder(EVTYPE, -CROPDMGCOR), y = CROPDMGCOR)) +
  xlab("EVENTS") + ylab("CROP DAMAGE (US$)") +
  geom_col(color = "black", fill = "tomato1") +
  theme_classic() + theme(axis.text.x = element_text(size = 4.5)) +
  geom_text(aes(label = CROPDMGCOR), vjust = 1.6, color = "black", size = 2)
grid.arrange(cr, p, nrow = 2)