Title: Severe storm weather data analysis.

Synopsis

Objectives. To determine which weather events have the most severe impact on public health and on the economy, so that preventive measures can be planned in advance.

Methods. Data were obtained from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The recorded events start in 1950 and end in November 2011. The severity of each weather event was assessed from the recorded estimates of fatalities, injuries, crop damage, and property damage. These estimates are used to answer the following questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

Results. Based on the analysis, tornadoes are indicated to be the most severe event type, causing the most harm to population health as well as property damage. River floods are indicated to cause severe crop damage.

Data processing

The workspace is cleared and the required packages are loaded.

## Warning: package 'R.utils' was built under R version 3.2.5
## Loading required package: R.oo
## Warning: package 'R.oo' was built under R version 3.2.5
## Loading required package: R.methodsS3
## Warning: package 'R.methodsS3' was built under R version 3.2.5
## R.methodsS3 v1.7.1 (2016-02-15) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.20.0 (2016-02-17) successfully loaded. See ?R.oo for help.
## 
## Attaching package: 'R.oo'
## The following objects are masked from 'package:methods':
## 
##     getClasses, getMethods
## The following objects are masked from 'package:base':
## 
##     attach, detach, gc, load, save
## R.utils v2.4.0 (2016-09-13) successfully loaded. See ?R.utils for help.
## 
## Attaching package: 'R.utils'
## The following object is masked from 'package:utils':
## 
##     timestamp
## The following objects are masked from 'package:base':
## 
##     cat, commandArgs, getOption, inherits, isOpen, parse, warnings
## Warning: package 'dplyr' was built under R version 3.2.5
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## Warning: package 'ggplot2' was built under R version 3.2.5
## Warning: package 'gridExtra' was built under R version 3.2.5
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
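
The chunk that produced the messages above is not shown in the report. A minimal sketch of what it may have looked like follows, assuming the four packages named in the output (R.utils, dplyr, ggplot2, gridExtra) are the only ones needed. The check for missing packages and the variable names `pkgs`, `installed`, and `not_installed` are illustrative additions, not part of the original analysis.

```r
# Optional: clear the workspace first, as the text describes.
# rm(list = ls())

# Packages named in the startup messages above.
pkgs <- c("R.utils", "dplyr", "ggplot2", "gridExtra")

# Check which packages are available before attaching them.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
not_installed <- pkgs[!installed]
if (length(not_installed) > 0) {
  message("Install first: ", paste(not_installed, collapse = ", "))
}

# Attach the available packages without the long startup messages.
for (p in pkgs[installed]) {
  suppressPackageStartupMessages(library(p, character.only = TRUE))
}
```

Wrapping `library()` in `suppressPackageStartupMessages()` keeps the masking notices out of the knitted report.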

Download the data file from the website, decompress it, and read it into R

temp <- tempfile()
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",temp)
bunzip2(temp, "StormData.csv", remove = FALSE, skip = TRUE)
## [1] "StormData.csv"
## attr(,"temporary")
## [1] FALSE
Data <- read.csv("StormData.csv")
unlink(temp)
glimpse(Data) # brief overview of the data
## Observations: 902,297
## Variables: 37
## $ STATE__    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ BGN_DATE   <fctr> 4/18/1950 0:00:00, 4/18/1950 0:00:00, 2/20/1951 0:...
## $ BGN_TIME   <fctr> 0130, 0145, 1600, 0900, 1500, 2000, 0100, 0900, 20...
## $ TIME_ZONE  <fctr> CST, CST, CST, CST, CST, CST, CST, CST, CST, CST, ...
## $ COUNTY     <dbl> 97, 3, 57, 89, 43, 77, 9, 123, 125, 57, 43, 9, 73, ...
## $ COUNTYNAME <fctr> MOBILE, BALDWIN, FAYETTE, MADISON, CULLMAN, LAUDER...
## $ STATE      <fctr> AL, AL, AL, AL, AL, AL, AL, AL, AL, AL, AL, AL, AL...
## $ EVTYPE     <fctr> TORNADO, TORNADO, TORNADO, TORNADO, TORNADO, TORNA...
## $ BGN_RANGE  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ BGN_AZI    <fctr> , , , , , , , , , , , , , , , , , , , , , , , , 
## $ BGN_LOCATI <fctr> , , , , , , , , , , , , , , , , , , , , , , , , 
## $ END_DATE   <fctr> , , , , , , , , , , , , , , , , , , , , , , , , 
## $ END_TIME   <fctr> , , , , , , , , , , , , , , , , , , , , , , , , 
## $ COUNTY_END <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ COUNTYENDN <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ END_RANGE  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ END_AZI    <fctr> , , , , , , , , , , , , , , , , , , , , , , , , 
## $ END_LOCATI <fctr> , , , , , , , , , , , , , , , , , , , , , , , , 
## $ LENGTH     <dbl> 14.0, 2.0, 0.1, 0.0, 0.0, 1.5, 1.5, 0.0, 3.3, 2.3, ...
## $ WIDTH      <dbl> 100, 150, 123, 100, 150, 177, 33, 33, 100, 100, 400...
## $ F          <int> 3, 2, 2, 2, 2, 2, 2, 1, 3, 3, 1, 1, 3, 3, 3, 4, 1, ...
## $ MAG        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ FATALITIES <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 4, 0, ...
## $ INJURIES   <dbl> 15, 0, 2, 2, 2, 6, 1, 0, 14, 0, 3, 3, 26, 12, 6, 50...
## $ PROPDMG    <dbl> 25.0, 2.5, 25.0, 2.5, 2.5, 2.5, 2.5, 2.5, 25.0, 25....
## $ PROPDMGEXP <fctr> K, K, K, K, K, K, K, K, K, K, M, M, K, K, K, K, K,...
## $ CROPDMG    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ CROPDMGEXP <fctr> , , , , , , , , , , , , , , , , , , , , , , , , 
## $ WFO        <fctr> , , , , , , , , , , , , , , , , , , , , , , , , 
## $ STATEOFFIC <fctr> , , , , , , , , , , , , , , , , , , , , , , , , 
## $ ZONENAMES  <fctr> , , , , , , , , , , , , , , , , , , , , , , , , 
## $ LATITUDE   <dbl> 3040, 3042, 3340, 3458, 3412, 3450, 3405, 3255, 333...
## $ LONGITUDE  <dbl> 8812, 8755, 8742, 8626, 8642, 8748, 8631, 8558, 874...
## $ LATITUDE_E <dbl> 3051, 0, 0, 0, 0, 0, 0, 0, 3336, 3337, 3402, 3404, ...
## $ LONGITUDE_ <dbl> 8806, 0, 0, 0, 0, 0, 0, 0, 8738, 8737, 8644, 8640, ...
## $ REMARKS    <fctr> , , , , , , , , , , , , , , , , , , , , , , , , 
## $ REFNUM     <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, ...
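
As a side note on the loading step, base R can read a bzip2-compressed CSV directly, because `read.csv` opens the path through `file()`, which detects the compression. The sketch below round-trips a tiny made-up table (not the NOAA data) to illustrate this. That the same shortcut works for the full StormData file is an assumption not verified here.

```r
# Write a small made-up table to a bzip2-compressed CSV file.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(
  data.frame(EVTYPE = c("TORNADO", "FLOOD"), INJURIES = c(15, 2)),
  con,
  row.names = FALSE
)
close(con)

# read.csv decompresses on the fly; no separate bunzip2 step is needed.
roundtrip <- read.csv(tmp)
unlink(tmp)
print(roundtrip)
```

If this holds for the real file, the explicit `bunzip2` call would be optional, at the cost of decompressing on every read.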

From these data, the variables of interest are injuries and fatalities, which directly affect the population, together with property damage and crop damage and their exponent codes (PROPDMGEXP and CROPDMGEXP). The injuries and fatalities are summed by event type.

# group injuries and fatalities by event

Data2 <- aggregate(cbind(INJURIES, FATALITIES)~EVTYPE, data=Data, sum, na.rm=TRUE) 

To address the first question (which types of events are most harmful with respect to population health?), the grouped data are sorted in descending order, once by injuries and once by fatalities, and the top 5 event types are selected from each ordering.

Data3a <- Data2[order(-Data2$INJURIES),]   # sort by descending total injuries
Data3b <- Data2[order(-Data2$FATALITIES),] # sort by descending total fatalities
Data3c <- Data3a[1:5,] # select top 5 by injuries
Data3d <- Data3b[1:5,] # select top 5 by fatalities
head(Data3c)
##             EVTYPE INJURIES FATALITIES
## 834        TORNADO    91346       5633
## 856      TSTM WIND     6957        504
## 170          FLOOD     6789        470
## 130 EXCESSIVE HEAT     6525       1903
## 464      LIGHTNING     5230        816
head(Data3d)
##             EVTYPE INJURIES FATALITIES
## 834        TORNADO    91346       5633
## 130 EXCESSIVE HEAT     6525       1903
## 153    FLASH FLOOD     1777        978
## 275           HEAT     2100        937
## 464      LIGHTNING     5230        816
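
The group, sort, and select steps above can be sketched on a small made-up table, which makes the logic easy to check by hand. The data frame `toy` and the names `totals`, `top_injuries`, and `top_fatalities` are hypothetical, and the top 2 rows are taken instead of the top 5 because the toy table is so small.

```r
# Made-up event records (not NOAA data).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  INJURIES   = c(10, 2, 5, 7, 1, 0),
  FATALITIES = c(1, 0, 2, 4, 1, 0)
)

# Sum both outcomes per event type.
totals <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)

# Sort descending by each outcome and keep the leading rows.
top_injuries   <- head(totals[order(-totals$INJURIES), ], 2)
top_fatalities <- head(totals[order(-totals$FATALITIES), ], 2)

print(top_injuries)
print(top_fatalities)
```

Using `head(x, n)` rather than `x[1:n, ]` avoids rows of `NA` when fewer than `n` event types exist.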

To address the second question (which types of events have the greatest economic consequences?), the damage exponent codes are first converted to numeric multipliers.

# Overview of what we are working on
summary(Data$PROPDMGEXP) 
##             -      ?      +      0      1      2      3      4      5 
## 465934      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      h      H      K      m      M 
##      4      5      1     40      1      6 424665      7  11330
summary(Data$CROPDMGEXP)
##             ?      0      2      B      k      K      m      M 
## 618413      7     19      1      9     21 281832      1   1994
Data$PROPDMGEXP <- ifelse(Data$PROPDMGEXP =="k", 10^3, ifelse(Data$PROPDMGEXP =="K", 10^3,
                ifelse(Data$PROPDMGEXP=="h", 10^2, ifelse(Data$PROPDMGEXP=="H", 10^2,
                ifelse(Data$PROPDMGEXP=="m", 10^6, ifelse(Data$PROPDMGEXP=="M", 10^6,
                ifelse(Data$PROPDMGEXP=="B", 10^9,NA)))))))
Data$CROPDMGEXP <- ifelse(Data$CROPDMGEXP =="k", 10^3, ifelse(Data$CROPDMGEXP =="K", 10^3,
                ifelse(Data$CROPDMGEXP=="h", 10^2, ifelse(Data$CROPDMGEXP=="H", 10^2,
                ifelse(Data$CROPDMGEXP=="m", 10^6, ifelse(Data$CROPDMGEXP=="M", 10^6,
                ifelse(Data$CROPDMGEXP=="B", 10^9,NA)))))))
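
The nested `ifelse` calls above map letter codes to powers of ten and everything else to `NA`. A more compact way to express the same mapping is a named lookup vector, sketched below. The helper name `exp_multiplier` is hypothetical. The input is converted with `as.character` first, because indexing a named vector with a factor would use the factor's integer codes rather than its labels.

```r
# Map exponent codes (h/H, k/K, m/M, B) to numeric multipliers.
# Any other code, including blanks, digits and symbols, becomes NA,
# matching the behaviour of the nested ifelse version.
exp_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  unname(lookup[toupper(as.character(code))])
}

# Made-up sample of codes, stored as a factor like the original columns.
codes <- factor(c("K", "m", "B", "", "h", "?"))
multipliers <- exp_multiplier(codes)
print(multipliers)
```

One small difference from the original code: `toupper` would also accept a lowercase `b`, which the `ifelse` chain does not. No such code appears in the summaries shown above, so this would not change the result on these data.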

To estimate the cost of the damage, the property damage value (PROPDMG) is multiplied by its exponent multiplier (PROPDMGEXP), and the crop damage value (CROPDMG) by its multiplier (CROPDMGEXP). The resulting costs are averaged by event type (the code uses mean), sorted in descending order from the most costly to the least, and the top 5 event types are selected for each damage category.

Data$PropDamage <- Data$PROPDMG * Data$PROPDMGEXP
Data$CropDamage <- Data$CROPDMG * Data$CROPDMGEXP


Data4 <- aggregate(cbind(PropDamage, CropDamage)~EVTYPE, data=Data, mean, na.rm=TRUE)
head(Data4)
##                   EVTYPE  PropDamage CropDamage
## 1 ASTRONOMICAL HIGH TIDE     5000.00       0.00
## 2  ASTRONOMICAL LOW TIDE     1839.08       0.00
## 3              AVALANCHE    15492.21       0.00
## 4               BLIZZARD    54524.11   64328.36
## 5          COASTAL FLOOD   321035.56       0.00
## 6       COASTAL FLOODING 12650000.00   28000.00
Data4a <- Data4[order(-Data4$PropDamage),] # sort by descending property damage
Data4b <- Data4[order(-Data4$CropDamage),] # sort by descending crop damage
Data4c <- Data4a[1:5,] # select top 5 property damage
Data4d <- Data4b[1:5,] # select top 5 crop damage
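
One property of the formula interface of `aggregate` is worth keeping in mind when reading these figures: its default `na.action` is `na.omit`, so a row is dropped entirely if either `PropDamage` or `CropDamage` is `NA`, before `na.rm = TRUE` is ever applied. The sketch below shows the effect on a made-up table and compares it with per-column totals from `tapply`. It uses `sum` rather than `mean` only to make the arithmetic easy to follow. How much this affects the NOAA results has not been checked here.

```r
# Made-up damage records with missing values in different columns.
toy <- data.frame(
  EVTYPE     = c("A", "A", "B", "B"),
  PropDamage = c(100, 300, 50, NA),
  CropDamage = c(NA, 10, 20, 40)
)

# Formula interface: rows 1 and 4 are removed because one column is NA.
by_formula <- aggregate(cbind(PropDamage, CropDamage) ~ EVTYPE,
                        data = toy, FUN = sum, na.rm = TRUE)

# Per-column aggregation keeps every non-missing value.
prop_total <- tapply(toy$PropDamage, toy$EVTYPE, sum, na.rm = TRUE)
crop_total <- tapply(toy$CropDamage, toy$EVTYPE, sum, na.rm = TRUE)

print(by_formula)  # A: 300 and 10, B: 50 and 20
print(prop_total)  # A: 400, B: 50
print(crop_total)  # A: 10,  B: 60
```

Aggregating each damage column separately is one way to avoid losing property records that merely lack a crop estimate, and the reverse.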

Results

The top 5 types of events that are most harmful with respect to population health

colnames(Data3c) <- c('EVTYPE', 'INJURIES', 'FATALITIES')

Injuries <- ggplot(Data3c, aes(x="", y=INJURIES, fill=EVTYPE)) +
  geom_bar(width = 1, stat = "identity") +
  ggtitle('Top 5 Storm Events by Injuries')
pieInjuries <- Injuries + coord_polar("y", start = 0)
 
colnames(Data3d) <- c('EVTYPE', 'INJURIES', 'FATALITIES')
 
Fatalities <- ggplot(Data3d, aes(x="", y=FATALITIES, fill= EVTYPE)) + geom_bar(width = 1, stat = "identity")+  ggtitle('Top 5 Storm Events by Fatalities')
pieFatalities <- Fatalities + coord_polar("y", start = 0)
 

grid.arrange(pieInjuries, pieFatalities, nrow=2, heights=c(2.5, 2.5),
             top="  Storm events with most severe consequences to public health from 1950 to 2011")

The top 5 types of events that have the greatest economic consequences

colnames(Data4c) <- c('EVTYPE', 'PropDamage', 'CropDamage')

Properties <- ggplot(Data4c, aes(x="", y=PropDamage, fill=EVTYPE)) +
  geom_bar(width = 1, stat = "identity") +
  ggtitle('Top 5 Storm Events by cost of property damages')
pieProperties <- Properties + coord_polar("y", start = 0)

colnames(Data4d) <- c('EVTYPE', 'PropDamage', 'CropDamage')

Crop <- ggplot(Data4d, aes(x="", y=CropDamage, fill=EVTYPE)) +
  geom_bar(width = 1, stat = "identity") +
  ggtitle('Top 5 Storm Events by cost of crop damages')
pieCrop <- Crop + coord_polar("y", start = 0)
 
grid.arrange(pieProperties, pieCrop, nrow=2, heights=c(2.5, 2.5),
             top="  Storm events with greatest economic consequences from 1950 to 2011")

Conclusion

Based on the analysis, tornadoes are indicated to be the most severe event type, causing the most harmful consequences for population health as well as property damage. River floods are indicated to cause severe crop damage.