Synopsis

This report examines how storms and other severe weather events affect communities and municipalities across the United States. Two aspects are considered: public health and the economy. For each aspect, the damage caused by each type of event is established and the most harmful events are identified.

Data processing

Loading packages

Some steps rely on functions from R packages, which must be loaded before the analysis runs.

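library(R.utils)  # bunzip2()
library(plyr)     # mapvalues(); loaded before dplyr so that it does not mask dplyr functions
library(dplyr)    # arrange(), desc()
library(ggplot2)  # plots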

Getting the dataset

The code in this section assumes the following working environment:

  • The directory that contains the .Rmd file also contains a folder called data.

  • The csv file required for the analysis is stored in the data folder.

# Locate the data folder next to this .Rmd (getSourceEditorContext() needs an interactive RStudio session)
fileDirectory <- dirname(rstudioapi::getSourceEditorContext()$path)
dataDirectory <- file.path(fileDirectory, "data")
archivePath <- file.path(dataDirectory, "repdata_data_StormData.csv.bz2")
# mode = "wb" keeps the binary archive intact on Windows
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = archivePath, mode = "wb")
bunzip2(archivePath, overwrite = TRUE, remove = FALSE)
storms <- read.csv(file.path(dataDirectory, "repdata_data_StormData.csv"))
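
The download step above runs every time the report is executed. As a minimal sketch, assuming that caching the archive is acceptable, the download could be skipped when the file is already present. The helper name fetchIfMissing is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: download a file only when it is not already stored locally.
fetchIfMissing <- function(url, destDir, fileName) {
  # Create the target folder if needed (silent when it already exists)
  dir.create(destDir, showWarnings = FALSE, recursive = TRUE)
  destPath <- file.path(destDir, fileName)
  if (!file.exists(destPath)) {
    # mode = "wb" keeps binary archives intact on Windows
    download.file(url, destfile = destPath, mode = "wb")
  }
  destPath
}

# Usage (not run here, since it needs network access):
# archivePath <- fetchIfMissing(
#   "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#   "data", "repdata_data_StormData.csv.bz2")
# storms <- read.csv(archivePath)  # read.csv can read a .bz2 file directly
```

Because read.csv can open bzip2-compressed files directly, the separate decompression step becomes optional with this approach.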

Reviewing the dataset

Once the data is loaded, it can be inspected to get a sense of its content.

head(storms)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6

The first six rows give an overview of the structure of the dataset. Even though many of the variables are empty in these rows, they show how some columns are formatted and how the columns relate to each other.

dim(storms)
## [1] 902297     37

The dataset has 902297 rows (data entries) and 37 columns (variables).

summary(storms)
##     STATE__       BGN_DATE           BGN_TIME          TIME_ZONE        
##  Min.   : 1.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.:19.0   Class :character   Class :character   Class :character  
##  Median :30.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :31.2                                                           
##  3rd Qu.:45.0                                                           
##  Max.   :95.0                                                           
##                                                                         
##      COUNTY       COUNTYNAME           STATE              EVTYPE         
##  Min.   :  0.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.: 31.0   Class :character   Class :character   Class :character  
##  Median : 75.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :100.6                                                           
##  3rd Qu.:131.0                                                           
##  Max.   :873.0                                                           
##                                                                          
##    BGN_RANGE          BGN_AZI           BGN_LOCATI          END_DATE        
##  Min.   :   0.000   Length:902297      Length:902297      Length:902297     
##  1st Qu.:   0.000   Class :character   Class :character   Class :character  
##  Median :   0.000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :   1.484                                                           
##  3rd Qu.:   1.000                                                           
##  Max.   :3749.000                                                           
##                                                                             
##    END_TIME           COUNTY_END COUNTYENDN       END_RANGE       
##  Length:902297      Min.   :0    Mode:logical   Min.   :  0.0000  
##  Class :character   1st Qu.:0    NA's:902297    1st Qu.:  0.0000  
##  Mode  :character   Median :0                   Median :  0.0000  
##                     Mean   :0                   Mean   :  0.9862  
##                     3rd Qu.:0                   3rd Qu.:  0.0000  
##                     Max.   :0                   Max.   :925.0000  
##                                                                   
##    END_AZI           END_LOCATI            LENGTH              WIDTH         
##  Length:902297      Length:902297      Min.   :   0.0000   Min.   :   0.000  
##  Class :character   Class :character   1st Qu.:   0.0000   1st Qu.:   0.000  
##  Mode  :character   Mode  :character   Median :   0.0000   Median :   0.000  
##                                        Mean   :   0.2301   Mean   :   7.503  
##                                        3rd Qu.:   0.0000   3rd Qu.:   0.000  
##                                        Max.   :2315.0000   Max.   :4400.000  
##                                                                              
##        F               MAG            FATALITIES          INJURIES        
##  Min.   :0.0      Min.   :    0.0   Min.   :  0.0000   Min.   :   0.0000  
##  1st Qu.:0.0      1st Qu.:    0.0   1st Qu.:  0.0000   1st Qu.:   0.0000  
##  Median :1.0      Median :   50.0   Median :  0.0000   Median :   0.0000  
##  Mean   :0.9      Mean   :   46.9   Mean   :  0.0168   Mean   :   0.1557  
##  3rd Qu.:1.0      3rd Qu.:   75.0   3rd Qu.:  0.0000   3rd Qu.:   0.0000  
##  Max.   :5.0      Max.   :22000.0   Max.   :583.0000   Max.   :1700.0000  
##  NA's   :843563                                                           
##     PROPDMG         PROPDMGEXP           CROPDMG         CROPDMGEXP       
##  Min.   :   0.00   Length:902297      Min.   :  0.000   Length:902297     
##  1st Qu.:   0.00   Class :character   1st Qu.:  0.000   Class :character  
##  Median :   0.00   Mode  :character   Median :  0.000   Mode  :character  
##  Mean   :  12.06                      Mean   :  1.527                     
##  3rd Qu.:   0.50                      3rd Qu.:  0.000                     
##  Max.   :5000.00                      Max.   :990.000                     
##                                                                           
##      WFO             STATEOFFIC         ZONENAMES            LATITUDE   
##  Length:902297      Length:902297      Length:902297      Min.   :   0  
##  Class :character   Class :character   Class :character   1st Qu.:2802  
##  Mode  :character   Mode  :character   Mode  :character   Median :3540  
##                                                           Mean   :2875  
##                                                           3rd Qu.:4019  
##                                                           Max.   :9706  
##                                                           NA's   :47    
##    LONGITUDE        LATITUDE_E     LONGITUDE_       REMARKS         
##  Min.   :-14451   Min.   :   0   Min.   :-14455   Length:902297     
##  1st Qu.:  7247   1st Qu.:   0   1st Qu.:     0   Class :character  
##  Median :  8707   Median :   0   Median :     0   Mode  :character  
##  Mean   :  6940   Mean   :1452   Mean   :  3509                     
##  3rd Qu.:  9605   3rd Qu.:3549   3rd Qu.:  8735                     
##  Max.   : 17124   Max.   :9706   Max.   :106220                     
##                   NA's   :40                                        
##      REFNUM      
##  Min.   :     1  
##  1st Qu.:225575  
##  Median :451149  
##  Mean   :451149  
##  3rd Qu.:676723  
##  Max.   :902297  
## 

According to the summary, 18 of the variables are character, 18 are numeric and 1 is logical. Missing values (NAs) should also be taken into account: F is missing in 843563 rows, COUNTYENDN is NA in every row, and LATITUDE and LATITUDE_E have 47 and 40 NAs respectively.
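
The type and NA counts can also be obtained programmatically instead of being read off the summary. The following is a minimal sketch on a small toy data frame (made-up values standing in for storms), assuming the same approach would be applied to the full dataset.

```r
# Toy data frame standing in for `storms` (values are illustrative only)
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(0, 2, 0),
  F = c(3, NA, NA),
  COUNTYENDN = c(NA, NA, NA),
  stringsAsFactors = FALSE
)

# Number of columns per class (character, numeric, logical)
typeCounts <- table(sapply(toy, class))
typeCounts

# Number of NAs per column, keeping only the columns that have any
naCounts <- colSums(is.na(toy))
naCounts[naCounts > 0]
```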

Data processing

The dataset of storms and other weather events is used to answer two specific questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

Setting necessary variables

Only some of the variables in the dataset are needed for this analysis. The columns required to answer the two questions are (in order of appearance):

  • EVTYPE

  • FATALITIES

  • INJURIES

  • PROPDMG

  • PROPDMGEXP

  • CROPDMG

  • CROPDMGEXP
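
The report works on the full data frame, but the columns listed above could be extracted into a smaller one. A minimal sketch on toy data follows; the names neededColumns and stormsSubset are hypothetical.

```r
# Toy data frame with a few extra columns, standing in for `storms`
toy <- data.frame(
  STATE = c("AL", "TX"), EVTYPE = c("TORNADO", "FLOOD"),
  FATALITIES = c(0, 1), INJURIES = c(15, 0),
  PROPDMG = c(25, 2.5), PROPDMGEXP = c("K", "M"),
  CROPDMG = c(0, 10), CROPDMGEXP = c("", "K"),
  REFNUM = 1:2, stringsAsFactors = FALSE
)

# Keep only the seven columns used in the analysis, in order of appearance
neededColumns <- c("EVTYPE", "FATALITIES", "INJURIES",
                   "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
stormsSubset <- toy[, neededColumns]
str(stormsSubset)
```

Working on the subset reduces memory use and makes later summaries easier to read.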

Grouping

Many event types are recorded as variations of a principal event, so the number of distinct values in EVTYPE is large:

summary(unique(storms$EVTYPE))
##    Length     Class      Mode 
##       985 character character

As shown in the summary, there are 985 distinct event types that may pose a risk to public health. Since this number is too large to review in full, the analysis focuses on the top 10 weather events for each measure.
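
Part of this variety comes from inconsistent labels; for example, both TSTM WIND and THUNDERSTORM WIND appear in the data. The report does not merge such labels, but a minimal sketch of how variants could be normalized is shown here. The replacement rule is an assumption chosen for illustration, not an official mapping.

```r
# A few raw labels, including case and whitespace variants
rawTypes <- c("TSTM WIND", " Thunderstorm Wind", "THUNDERSTORM WIND", "Flood", "FLOOD")

# Normalize case and surrounding whitespace
cleanTypes <- toupper(trimws(rawTypes))

# Assumed rule: treat the TSTM abbreviation as THUNDERSTORM
cleanTypes <- sub("^TSTM", "THUNDERSTORM", cleanTypes)

# Compare the number of distinct labels before and after cleaning
length(unique(rawTypes))
length(unique(cleanTypes))
```

Merging labels in this way would change the totals per event type, so any such rule should be reviewed before it is applied to the real data.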

Security aspect

One of the questions concerns the safety of the population during weather events. Two variables record harm to the population:

  • FATALITIES

  • INJURIES

To make the harm to the population easier to compare, the total number of fatalities and of injuries is calculated for each event type:

totalFatalities <- aggregate(storms$FATALITIES, by = list(Events = storms$EVTYPE), FUN = sum)
top10TotalFatalities <- head(arrange(totalFatalities, desc(x)), n = 10)
top10TotalFatalities
##            Events    x
## 1         TORNADO 5633
## 2  EXCESSIVE HEAT 1903
## 3     FLASH FLOOD  978
## 4            HEAT  937
## 5       LIGHTNING  816
## 6       TSTM WIND  504
## 7           FLOOD  470
## 8     RIP CURRENT  368
## 9       HIGH WIND  248
## 10      AVALANCHE  224

As can be seen, tornadoes and excessive heat are the two major threats to people's lives, each with more than a thousand fatalities over the whole data collection period.

totalInjuries <- aggregate(storms$INJURIES, by = list(Events = storms$EVTYPE), FUN = sum)
top10TotalInjuries <- head(arrange(totalInjuries, desc(x)), n = 10)
top10TotalInjuries
##               Events     x
## 1            TORNADO 91346
## 2          TSTM WIND  6957
## 3              FLOOD  6789
## 4     EXCESSIVE HEAT  6525
## 5          LIGHTNING  5230
## 6               HEAT  2100
## 7          ICE STORM  1975
## 8        FLASH FLOOD  1777
## 9  THUNDERSTORM WIND  1488
## 10              HAIL  1361

On the other hand, the top 10 weather events by injuries differ somewhat from the top 10 by fatalities. Tornadoes nevertheless remain the event with the most harmful consequences.

In this respect, tornadoes are the weather events that most threaten public health, followed by excessive heat. Although these two events arise under very different weather conditions, both are worth monitoring for anomalies during their respective seasons of high probability of occurrence.
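
Fatalities and injuries are ranked separately above. As a minimal sketch, assuming that a single combined count is useful (the report itself does not compute one), both measures can be aggregated in one call. The toy values are illustrative only.

```r
# Toy data standing in for `storms` (values are illustrative only)
toy <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(1, 0, 2, 1),
  INJURIES = c(15, 2, 0, 3)
)

# Sum both measures per event type in one aggregate() call
casualties <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Combined count, sorted from most to least harmful
casualties$TOTAL <- casualties$FATALITIES + casualties$INJURIES
casualties <- casualties[order(-casualties$TOTAL), ]
casualties
```

Adding fatalities and injuries weighs them equally, which is a simplification; the separate rankings remain the more careful reading.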

Economy aspect

For the economic aspect, two variables record the damage done to property and to crops (PROPDMG and CROPDMG). The EXP columns (PROPDMGEXP and CROPDMGEXP) are also crucial, because they hold the code that gives the magnitude of each monetary amount. The codes and their multipliers are:

  • H,h = hundreds = 100

  • K,k = kilos = thousands = 1,000

  • M,m = millions = 1,000,000

  • B,b = billions = 1,000,000,000

  • (+) = 1

  • (-) = 0

  • (?) = 0

  • blank/empty character = 0

  • numeric 0..8 = 10
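
The list above can be read as a lookup from code to multiplier. A minimal sketch in base R follows, as an alternative illustration to the mapvalues call used in this report. The names multipliers and toMultiplier are hypothetical.

```r
# Named lookup vector following the multiplier list (letters are matched case-insensitively)
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9, "+" = 1, "-" = 0, "?" = 0)

toMultiplier <- function(exp) {
  exp <- toupper(exp)
  out <- unname(multipliers[exp])          # NA for codes that are not in the lookup
  out[exp %in% as.character(0:8)] <- 10    # numeric codes 0..8 map to 10
  out[exp == ""] <- 0                      # blank/empty code maps to 0
  out
}

toMultiplier(c("K", "m", "", "5", "+"))

# Worked example: a PROPDMG of 25 with code K is 25 * 1,000 = 25,000
25 * toMultiplier("K")
```

Any code outside the list stays NA with this sketch, which makes unexpected values visible instead of silently converting them.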

To obtain actual monetary amounts, the damage to property and crops must be multiplied by the value that corresponds to each code. The totals below are expressed in billions.

storms$PROPDMGEXP <- mapvalues(storms$PROPDMGEXP, from = c("H", "h", "K", "k", "M", "m", "B", "b", "+", "-", "?", "", "0", "1", "2", "3", "4", "5", "6", "7", "8"), to = c(10^2, 10^2, 10^3, 10^3, 10^6, 10^6, 10^9, 10^9, 1, 0, 0, 0, 10, 10, 10, 10, 10, 10, 10, 10, 10))
## The following `from` values were not present in `x`: k, b
storms$PROPDMGEXP <- as.numeric(as.character(storms$PROPDMGEXP))
storms$PROPTOTALDMG <- (storms$PROPDMG * storms$PROPDMGEXP) / 1000000000
totalPropertyDamage <- aggregate(storms$PROPTOTALDMG, by = list(Events = storms$EVTYPE), FUN = sum)
top10TotalPropertyDamage <- head(arrange(totalPropertyDamage, desc(x)), n = 10)
top10TotalPropertyDamage
##               Events          x
## 1              FLOOD 144.657710
## 2  HURRICANE/TYPHOON  69.305840
## 3            TORNADO  56.937163
## 4        STORM SURGE  43.323536
## 5        FLASH FLOOD  16.140815
## 6               HAIL  15.732270
## 7          HURRICANE  11.868319
## 8     TROPICAL STORM   7.703891
## 9       WINTER STORM   6.688497
## 10         HIGH WIND   5.270046

According to the table of the top 10 weather events by property damage, flood causes by far the largest monetary loss to structures, at about 144.7 billion, roughly twice the figure for the next event.

storms$CROPDMGEXP <- mapvalues(storms$CROPDMGEXP, from = c("H", "h", "K", "k", "M", "m", "B", "b", "+", "-", "?", "", "0", "1", "2", "3", "4", "5", "6", "7", "8"), to = c(10^2, 10^2, 10^3, 10^3, 10^6, 10^6, 10^9, 10^9, 1, 0, 0, 0, 10, 10, 10, 10, 10, 10, 10, 10, 10))
## The following `from` values were not present in `x`: H, h, b, +, -, 1, 3, 4, 5, 6, 7, 8
storms$CROPDMGEXP <- as.numeric(as.character(storms$CROPDMGEXP))
storms$CROPTOTALDMG <- (storms$CROPDMG * storms$CROPDMGEXP) / 1000000000
totalCropDamage <- aggregate(storms$CROPTOTALDMG, by = list(Events = storms$EVTYPE), FUN = sum)
top10TotalCropDamage <- head(arrange(totalCropDamage, desc(x)), n = 10)
top10TotalCropDamage
##               Events         x
## 1            DROUGHT 13.972566
## 2              FLOOD  5.661968
## 3        RIVER FLOOD  5.029459
## 4          ICE STORM  5.022113
## 5               HAIL  3.025955
## 6          HURRICANE  2.741910
## 7  HURRICANE/TYPHOON  2.607873
## 8        FLASH FLOOD  1.421317
## 9       EXTREME COLD  1.292973
## 10      FROST/FREEZE  1.094086

As expected, crops are most susceptible to drought, which does not appear among the top 10 events for property damage. Crops and properties are physically different, so different events affect each of them most. Flood, however, ranks first for property and second for crops, which makes it one of the most damaging events in economic terms.
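
Property and crop damage are ranked separately above. As a minimal sketch, assuming a single economic ranking is wanted (the report does not compute one), the two amounts can be added per row before aggregating. The toy values are illustrative and are not taken from the dataset.

```r
# Toy data standing in for `storms` after the damage columns are computed (in billions)
toy <- data.frame(
  EVTYPE = c("FLOOD", "FLOOD", "DROUGHT", "HAIL"),
  PROPTOTALDMG = c(2.0, 1.5, 0.1, 0.4),
  CROPTOTALDMG = c(0.3, 0.2, 1.2, 0.1)
)

# Total economic damage per row, then summed per event type
toy$TOTALDMG <- toy$PROPTOTALDMG + toy$CROPTOTALDMG
totalDamage <- aggregate(TOTALDMG ~ EVTYPE, data = toy, FUN = sum)

# Sort from the most to the least damaging event
totalDamage <- totalDamage[order(-totalDamage$TOTALDMG), ]
totalDamage
```

Summing per row before aggregating avoids combining two separate top 10 lists, which would miss events that appear in only one of them.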

Results

To make the magnitude of the damage easier to compare, the totals for each aspect are plotted below.

Public health

fatalityPlot <- ggplot(top10TotalFatalities, aes(x = Events, y = x)) + geom_bar(stat='identity') + theme(axis.text.x = element_text(size  = 6)) + xlab("Weather events") + ylab("Total fatalities") + labs(title = "Fatalities of weather events")
fatalityPlot

injuryPlot <- ggplot(top10TotalInjuries, aes(x = Events, y = x)) + geom_bar(stat='identity') + theme(axis.text.x = element_text(size  = 4)) + xlab("Weather events") + ylab("Total injuries") + labs(title = "Injuries of weather events")
injuryPlot

As shown above, tornadoes are the most devastating weather event for the population. Specific safety strategies need to be planned for when this event occurs (this does not mean that other natural disasters should be treated as minor).
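
The plots in this report shrink the axis labels to fit ten event names and order the bars alphabetically. As a minimal sketch of an alternative layout, the bars can be ordered by value and drawn horizontally so that the labels stay readable. The data frame below holds the three highest fatality totals from the table shown earlier, and the name orderedPlot is hypothetical.

```r
library(ggplot2)

# Three highest fatality totals, copied from the top 10 table
topFatalities <- data.frame(
  Events = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD"),
  x = c(5633, 1903, 978)
)

# reorder() sorts the bars by value; coord_flip() draws them horizontally
orderedPlot <- ggplot(topFatalities, aes(x = reorder(Events, x), y = x)) +
  geom_col() +
  coord_flip() +
  labs(x = "Weather events", y = "Total fatalities", title = "Fatalities of weather events")
orderedPlot
```

geom_col() is equivalent to geom_bar(stat = 'identity'), so the bar heights are the same as in the original plots.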

Economy aspect

propertyPlot <- ggplot(top10TotalPropertyDamage, aes(x = Events, y = x)) + geom_bar(stat = 'identity') + theme(axis.text.x = element_text(size  = 4)) + xlab("Weather events") + ylab("Monetary property damage (Billions)") + labs(title = "Monetary property damage by weather events")
propertyPlot

cropPlot <- ggplot(top10TotalCropDamage, aes(x = Events, y = x)) + geom_bar(stat = 'identity') + theme(axis.text.x = element_text(size  = 4)) + xlab("Weather events") + ylab("Monetary crop damage (Billions)") + labs(title = "Monetary crop damage by weather events")
cropPlot

In terms of economic loss, flood and drought are the two most damaging natural disasters. The first causes water damage to infrastructure, and the second kills crops through lack of water.