Session Info

R version 4.0.0 (2020-04-24)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_4.0.0 tools_4.0.0

Reproducible Research Peer Graded Assignment: Course Project 2

Synopsis

The goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. The data come as a comma-separated-value (CSV) file compressed with the bzip2 algorithm to reduce its size. The file was downloaded from the course website: Storm Data (47 MB).

Documentation of the database is also available. It explains how some of the variables are constructed and defined.
- National Weather Service Storm Data Documentation
- National Climatic Data Center Storm Events FAQ

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

Data Processing

This section describes (in words and code) how the data were loaded into R and processed for analysis.

Loading the required libraries

library("dplyr")
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library("ggplot2")
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
library(grid)

Download and read the data

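file_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
file_path <- "stormdata.csv.bz2"
# Download the archive only if it is not already present
if (!file.exists(file_path)) {
    download.file(file_url, file_path, method = "curl")
}
# read.csv() reads the bzip2-compressed file directly
stormdata <- read.csv(file_path, stringsAsFactors = FALSE)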

Pre-Analysis of Data

View a snapshot of the first rows and columns of the dataset, the summary of the dataset and the structure of the dataset

#Snapshot of the first rows and columns of dataset 
head(stormdata)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
#Summary of the dataset
summary(stormdata)
##     STATE__       BGN_DATE           BGN_TIME          TIME_ZONE        
##  Min.   : 1.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.:19.0   Class :character   Class :character   Class :character  
##  Median :30.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :31.2                                                           
##  3rd Qu.:45.0                                                           
##  Max.   :95.0                                                           
##                                                                         
##      COUNTY       COUNTYNAME           STATE              EVTYPE         
##  Min.   :  0.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.: 31.0   Class :character   Class :character   Class :character  
##  Median : 75.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :100.6                                                           
##  3rd Qu.:131.0                                                           
##  Max.   :873.0                                                           
##                                                                          
##    BGN_RANGE          BGN_AZI           BGN_LOCATI          END_DATE        
##  Min.   :   0.000   Length:902297      Length:902297      Length:902297     
##  1st Qu.:   0.000   Class :character   Class :character   Class :character  
##  Median :   0.000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :   1.484                                                           
##  3rd Qu.:   1.000                                                           
##  Max.   :3749.000                                                           
##                                                                             
##    END_TIME           COUNTY_END COUNTYENDN       END_RANGE       
##  Length:902297      Min.   :0    Mode:logical   Min.   :  0.0000  
##  Class :character   1st Qu.:0    NA's:902297    1st Qu.:  0.0000  
##  Mode  :character   Median :0                   Median :  0.0000  
##                     Mean   :0                   Mean   :  0.9862  
##                     3rd Qu.:0                   3rd Qu.:  0.0000  
##                     Max.   :0                   Max.   :925.0000  
##                                                                   
##    END_AZI           END_LOCATI            LENGTH              WIDTH         
##  Length:902297      Length:902297      Min.   :   0.0000   Min.   :   0.000  
##  Class :character   Class :character   1st Qu.:   0.0000   1st Qu.:   0.000  
##  Mode  :character   Mode  :character   Median :   0.0000   Median :   0.000  
##                                        Mean   :   0.2301   Mean   :   7.503  
##                                        3rd Qu.:   0.0000   3rd Qu.:   0.000  
##                                        Max.   :2315.0000   Max.   :4400.000  
##                                                                              
##        F               MAG            FATALITIES          INJURIES        
##  Min.   :0.0      Min.   :    0.0   Min.   :  0.0000   Min.   :   0.0000  
##  1st Qu.:0.0      1st Qu.:    0.0   1st Qu.:  0.0000   1st Qu.:   0.0000  
##  Median :1.0      Median :   50.0   Median :  0.0000   Median :   0.0000  
##  Mean   :0.9      Mean   :   46.9   Mean   :  0.0168   Mean   :   0.1557  
##  3rd Qu.:1.0      3rd Qu.:   75.0   3rd Qu.:  0.0000   3rd Qu.:   0.0000  
##  Max.   :5.0      Max.   :22000.0   Max.   :583.0000   Max.   :1700.0000  
##  NA's   :843563                                                           
##     PROPDMG         PROPDMGEXP           CROPDMG         CROPDMGEXP       
##  Min.   :   0.00   Length:902297      Min.   :  0.000   Length:902297     
##  1st Qu.:   0.00   Class :character   1st Qu.:  0.000   Class :character  
##  Median :   0.00   Mode  :character   Median :  0.000   Mode  :character  
##  Mean   :  12.06                      Mean   :  1.527                     
##  3rd Qu.:   0.50                      3rd Qu.:  0.000                     
##  Max.   :5000.00                      Max.   :990.000                     
##                                                                           
##      WFO             STATEOFFIC         ZONENAMES            LATITUDE   
##  Length:902297      Length:902297      Length:902297      Min.   :   0  
##  Class :character   Class :character   Class :character   1st Qu.:2802  
##  Mode  :character   Mode  :character   Mode  :character   Median :3540  
##                                                           Mean   :2875  
##                                                           3rd Qu.:4019  
##                                                           Max.   :9706  
##                                                           NA's   :47    
##    LONGITUDE        LATITUDE_E     LONGITUDE_       REMARKS         
##  Min.   :-14451   Min.   :   0   Min.   :-14455   Length:902297     
##  1st Qu.:  7247   1st Qu.:   0   1st Qu.:     0   Class :character  
##  Median :  8707   Median :   0   Median :     0   Mode  :character  
##  Mean   :  6940   Mean   :1452   Mean   :  3509                     
##  3rd Qu.:  9605   3rd Qu.:3549   3rd Qu.:  8735                     
##  Max.   : 17124   Max.   :9706   Max.   :106220                     
##                   NA's   :40                                        
##      REFNUM      
##  Min.   :     1  
##  1st Qu.:225575  
##  Median :451149  
##  Mean   :451149  
##  3rd Qu.:676723  
##  Max.   :902297  
## 
#Structure of the dataset

str(stormdata)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
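
The synopsis notes that fewer events were recorded in the early years. One way to check this is to extract the year from BGN_DATE and count the events per year. The sketch below is not part of the original analysis: it uses only the six BGN_DATE values shown by head() above as a stand-in for stormdata$BGN_DATE, and the names bgn_date, event_year and events_per_year are illustrative.

```r
# Stand-in for stormdata$BGN_DATE (the six values printed by head() above)
bgn_date <- c("4/18/1950 0:00:00", "4/18/1950 0:00:00", "2/20/1951 0:00:00",
              "6/8/1951 0:00:00", "11/15/1951 0:00:00", "11/15/1951 0:00:00")

# Parse the month/day/year part; the trailing time component is ignored
event_year <- as.integer(format(as.Date(bgn_date, format = "%m/%d/%Y"), "%Y"))

# Count the number of events recorded in each year
events_per_year <- table(event_year)
print(events_per_year)
```

Applied to the full BGN_DATE column, the same table could be plotted to see how the number of recorded events changes over time.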

Create a subset of the data relevant to the population health and economic consequences of severe weather events for analysis

substormdata <- stormdata[,c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]

Data Processing - Population Health

According to the data, population health is affected by injuries and fatalities. Both are summed by event type, and a separate table sorted in descending order is created for each.

pophealth <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = substormdata, FUN = sum)
fatal <- pophealth[order(pophealth$FATALITIES, decreasing = TRUE), ]
injury <- pophealth[order(pophealth$INJURIES, decreasing = TRUE), ]
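
To make the aggregation step concrete, here is a minimal sketch on a small made-up data frame. The data frame toy and its values are hypothetical; only the column names match the storm data.

```r
# Hypothetical mini data set with the same column names as substormdata
toy <- data.frame(
    EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
    FATALITIES = c(1, 0, 2, 4, 1),
    INJURIES   = c(15, 2, 14, 0, 6)
)

# Sum both harm measures per event type (one row per EVTYPE)
toy_health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank the same summary table once by fatalities and once by injuries
toy_fatal  <- toy_health[order(toy_health$FATALITIES, decreasing = TRUE), ]
toy_injury <- toy_health[order(toy_health$INJURIES, decreasing = TRUE), ]

print(toy_fatal)
print(toy_injury)
```

The two rankings can differ: in this toy data HEAT leads on fatalities while TORNADO leads on injuries, which is why the report keeps a separate sorted table for each measure.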

Data Processing - Economic Consequences

Create a function that converts the exponent codes stored in PROPDMGEXP and CROPDMGEXP (h = hundred, k = thousand, m = million, b = billion, or a digit) into the corresponding power of ten.

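# Convert a single exponent code (letter, digit or symbol) into a power of ten
ExponentValue <- function(e) {
    if (e %in% c("h", "H"))
        return(2)                          # hundred
    else if (e %in% c("k", "K"))
        return(3)                          # thousand
    else if (e %in% c("m", "M"))
        return(6)                          # million
    else if (e %in% c("b", "B"))
        return(9)                          # billion
    else if (!is.na(as.numeric(e)))        # digit codes are used as given
        return(as.numeric(e))
    else if (e %in% c("", "-", "?", "+"))
        return(0)                          # blank or symbol: no multiplier
    else {
        stop("Invalid value.")
    }
}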
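
As an alternative, the same mapping can be written as a vectorized lookup, which avoids calling as.numeric() on non-numeric symbols and therefore does not raise coercion warnings. This is only a sketch under the same mapping rules as ExponentValue; the names exp_lookup and exponent_value_vec are illustrative and do not appear in the original analysis.

```r
# Named lookup table: exponent letter -> power of ten
exp_lookup <- c(H = 2, K = 3, M = 6, B = 9)

exponent_value_vec <- function(e) {
    e <- toupper(e)
    is_letter <- e %in% names(exp_lookup)
    is_digit  <- grepl("^[0-9]+$", e)
    is_symbol <- e %in% c("", "-", "?", "+")

    # Mirror the original behaviour: anything unrecognised is an error
    if (any(!(is_letter | is_digit | is_symbol))) {
        stop("Invalid value.")
    }

    out <- rep(0, length(e))                       # symbols and blanks stay 0
    out[is_letter] <- unname(exp_lookup[e[is_letter]])
    out[is_digit]  <- as.numeric(e[is_digit])
    out
}

codes <- c("K", "m", "", "B", "5", "?", "h")
print(exponent_value_vec(codes))
```

Because the function works on a whole character vector at once, it could be applied directly to a column such as substormdata$PROPDMGEXP without sapply().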

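Use the function created previously to calculate the property damage and crop damage in dollars. The "NAs introduced by coercion" warnings can be ignored: they are raised when as.numeric() is applied to the symbols "-", "?" and "+", which the function then maps to an exponent of 0.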

PropertyExpVal <- sapply(substormdata$PROPDMGEXP, FUN=ExponentValue)
## Warning in FUN(X[[i]], ...): NAs introduced by coercion

substormdata$PropertyDamage <- substormdata$PROPDMG * (10 ^ PropertyExpVal)
CropExpVal <- sapply(substormdata$CROPDMGEXP, FUN=ExponentValue)
## Warning in FUN(X[[i]], ...): NAs introduced by coercion

substormdata$CropDamage <- substormdata$CROPDMG * (10 ^ CropExpVal)
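
As a worked example of this calculation, the first two rows shown by head() have PROPDMG values of 25.0 and 2.5 with the exponent code "K", which gives 25.0 * 10^3 = 25,000 and 2.5 * 10^3 = 2,500 dollars. The short sketch below repeats that arithmetic; the third value (1.2 with code "M") is a hypothetical row added for illustration.

```r
# PROPDMG values: the first two come from head(stormdata), the third is hypothetical
propdmg  <- c(25.0, 2.5, 1.2)
# Exponents for the codes "K", "K" and "M"
exponent <- c(3, 3, 6)

# Damage in dollars = reported value * 10 ^ exponent
damage <- propdmg * 10 ^ exponent
print(damage)
```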

Summarize the financial damage for property damage and crop damage based on the event type

EconomicDamage <- aggregate(cbind(PropertyDamage, CropDamage) ~ EVTYPE, data = substormdata, FUN=sum)

Remove events that have no financial impact

EconomicDamage <- EconomicDamage[(EconomicDamage$PropertyDamage > 0 | EconomicDamage$CropDamage > 0), ]

Sort the data in decreasing order to identify the event types with the greatest financial impact.

PropertyDamageSorted <- EconomicDamage[order(EconomicDamage$PropertyDamage, decreasing = TRUE), ]
CropDamageSorted <- EconomicDamage[order(EconomicDamage$CropDamage, decreasing = TRUE), ]

Results

This section presents the results of the analysis.
As required, no more than three figures are presented; a figure may contain multiple plots (i.e. panel plots).
The analysis is published on RPubs.com.

Results - Population Health

Display the top 10 weather events causing injuries.

head(injury[, c("EVTYPE", "INJURIES")],10)
##                EVTYPE INJURIES
## 834           TORNADO    91346
## 856         TSTM WIND     6957
## 170             FLOOD     6789
## 130    EXCESSIVE HEAT     6525
## 464         LIGHTNING     5230
## 275              HEAT     2100
## 427         ICE STORM     1975
## 153       FLASH FLOOD     1777
## 760 THUNDERSTORM WIND     1488
## 244              HAIL     1361

Display the top 10 weather events causing fatalities.

head(fatal[, c("EVTYPE", "FATALITIES")],10)
##             EVTYPE FATALITIES
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
## 170          FLOOD        470
## 585    RIP CURRENT        368
## 359      HIGH WIND        248
## 19       AVALANCHE        224

Create plots of the top 10 weather-related causes of injuries and fatalities in one grid for comparison.

library(ggplot2)
library(gridExtra)
library(grid)

# Top 10 event types by injuries, drawn as horizontal bars
p1 <- ggplot(data=head(injury,10), aes(x=reorder(EVTYPE, INJURIES), y=INJURIES)) +
    geom_bar(fill="red", stat="identity") +
    coord_flip() +
    ylab("Top 10 Weather Related Injuries") +
    xlab("Event type") +
    ggtitle("Injuries versus Fatalities caused by Weather Events") +
    theme_light()

# Top 10 event types by fatalities, drawn as horizontal bars
p2 <- ggplot(data=head(fatal,10), aes(x=reorder(EVTYPE, FATALITIES), y=FATALITIES)) +
    geom_bar(fill="blue", stat="identity") +
    coord_flip() +
    ylab("Top 10 Weather Related Fatalities") +
    xlab("Event type") +
    theme_light()

grid.arrange(p1, p2, nrow=2)

Summary of Analysis - Population Health
According to the results of the data processing, Tornado was the leading cause of both weather-related injuries (91,346) and fatalities (5,633). Other events that appear in the top 10 for both injuries and fatalities are Flood, Excessive Heat, TSTM Wind, Heat, Lightning and Flash Flood.
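
The injuries table lists both TSTM WIND and THUNDERSTORM WIND as separate event types. If these are treated as the same kind of event (an assumption that the analysis above does not make), the labels could be merged before aggregating. The sketch below illustrates the idea using four rows copied from the injuries table; the names injuries_top and merged are illustrative.

```r
# Four rows copied from the top-10 injuries table above
injuries_top <- data.frame(
    EVTYPE   = c("TORNADO", "TSTM WIND", "FLOOD", "THUNDERSTORM WIND"),
    INJURIES = c(91346, 6957, 6789, 1488)
)

# Assumption: "TSTM WIND" is an abbreviation of "THUNDERSTORM WIND"
injuries_top$EVTYPE <- sub("^TSTM WIND$", "THUNDERSTORM WIND", injuries_top$EVTYPE)

# Re-aggregate so that the merged label has a single total
merged <- aggregate(INJURIES ~ EVTYPE, data = injuries_top, FUN = sum)
merged <- merged[order(merged$INJURIES, decreasing = TRUE), ]
print(merged)
```

With the labels merged, thunderstorm wind would account for 6,957 + 1,488 = 8,445 injuries in these rows, still well behind tornadoes.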

Results - Economic Consequences

Display the top 10 weather events causing financial damage to property.

head(PropertyDamageSorted[, c("EVTYPE", "PropertyDamage")], 10)
##                EVTYPE PropertyDamage
## 170             FLOOD   144657709807
## 411 HURRICANE/TYPHOON    69305840000
## 834           TORNADO    56947380677
## 670       STORM SURGE    43323536000
## 153       FLASH FLOOD    16822673979
## 244              HAIL    15735267513
## 402         HURRICANE    11868319010
## 848    TROPICAL STORM     7703890550
## 972      WINTER STORM     6688497251
## 359         HIGH WIND     5270046295

Display the top 10 weather events causing financial damage to crops.

head(CropDamageSorted[, c("EVTYPE", "CropDamage")], 10)
##                EVTYPE  CropDamage
## 95            DROUGHT 13972566000
## 170             FLOOD  5661968450
## 590       RIVER FLOOD  5029459000
## 427         ICE STORM  5022113500
## 244              HAIL  3025954473
## 402         HURRICANE  2741910000
## 411 HURRICANE/TYPHOON  2607872800
## 153       FLASH FLOOD  1421317100
## 140      EXTREME COLD  1292973000
## 212      FROST/FREEZE  1094086000

Create plots of the top 10 weather-related causes of property damage and crop damage in one grid for comparison.

# Top 10 event types by property damage, in billions of dollars
p1 <- ggplot(data=head(PropertyDamageSorted,10), aes(x=reorder(EVTYPE, PropertyDamage), y=PropertyDamage/1e9)) +
    geom_bar(fill="orange", stat="identity") +
    coord_flip() +
    xlab("Event type") +
    ylab("Financial Value of Damages to Property in Billions of Dollars") +
    ggtitle("Property versus Crop Damage caused by Weather Events") +
    theme_light()

# Top 10 event types by crop damage, in billions of dollars
p2 <- ggplot(data=head(CropDamageSorted,10), aes(x=reorder(EVTYPE, CropDamage), y=CropDamage/1e9)) +
    geom_bar(fill="green", stat="identity") +
    coord_flip() +
    xlab("Event type") +
    ylab("Financial Value of Damages to Crops in Billions of Dollars") +
    theme_light()

grid.arrange(p1, p2, ncol=1, nrow =2)

Summary of Analysis - Economic Consequences
According to the results of the data processing, ‘Flood’ caused the most costly property damage (about $144.7 billion) and the second most costly crop damage (about $5.7 billion). The most costly crop damage (about $14.0 billion) was caused by ‘Drought’.
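
The two tables above rank property and crop damage separately. As a rough sketch of how they could be combined, the code below adds the two damage columns for the five event types that appear in both top-10 lists. The dollar values are copied from the tables above; the column TotalBillions and the data frame name damage are illustrative and not part of the original analysis.

```r
# Values copied from the two top-10 tables above (US dollars)
damage <- data.frame(
    EVTYPE         = c("FLOOD", "HURRICANE/TYPHOON", "FLASH FLOOD", "HAIL", "HURRICANE"),
    PropertyDamage = c(144657709807, 69305840000, 16822673979, 15735267513, 11868319010),
    CropDamage     = c(5661968450, 2607872800, 1421317100, 3025954473, 2741910000)
)

# Combined damage, expressed in billions of dollars and rounded to one decimal
damage$TotalBillions <- round((damage$PropertyDamage + damage$CropDamage) / 1e9, 1)

# Rank the event types by combined damage
ranked <- damage[order(damage$TotalBillions, decreasing = TRUE), c("EVTYPE", "TotalBillions")]
print(ranked)
```

In the full analysis the same sum could be computed on EconomicDamage directly, so that event types outside these two top-10 lists are also ranked.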

Conclusion

Overall, tornadoes caused the most injuries and fatalities among weather events, while flood and Hurricane/Typhoon caused the most property damage and drought caused the most crop damage.