Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Assignment

The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.

Questions

Your data analysis must address the following questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

Consider writing your report as if it were to be read by a government or municipal manager who might be responsible for preparing for severe weather events and will need to prioritize resources for different types of events. However, there is no need to make any specific recommendations in your report.

Data Processing

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site: Storm data.

library(dplyr)
library(tidyr)
library(ggplot2)
library(gridExtra)
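# Machine-specific path: point this at the folder that holds the downloaded file
setwd("C:/Users/ASUS/Documents/COURSERA/Reproducible Research/Project2")
# Read the bzip2-compressed CSV through a bzfile() connection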
StormData <- read.csv(bzfile("repdata_data_StormData.csv.bz2"))
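
To illustrate how a bzip2-compressed CSV is read without unpacking it first, here is a minimal, self-contained sketch. It does not use the storm file: the temporary file and the two toy rows are made up for illustration only.

```r
# Write a tiny CSV through a bzip2 connection, then read it back the same
# way the report reads the full data set.
tmp <- tempfile(fileext = ".csv.bz2")
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(2, 0))

con <- bzfile(tmp, open = "w")          # compressing write connection
write.csv(toy, con, row.names = FALSE)
close(con)

roundtrip <- read.csv(bzfile(tmp))      # read.csv opens and closes the connection
unlink(tmp)                             # remove the temporary file
```

The data frame read back has the same two rows as `toy`, which shows that `bzfile()` handles the decompression transparently.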

Exploring and cleaning the data

summary(StormData)
##     STATE__       BGN_DATE           BGN_TIME          TIME_ZONE        
##  Min.   : 1.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.:19.0   Class :character   Class :character   Class :character  
##  Median :30.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :31.2                                                           
##  3rd Qu.:45.0                                                           
##  Max.   :95.0                                                           
##                                                                         
##      COUNTY       COUNTYNAME           STATE              EVTYPE         
##  Min.   :  0.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.: 31.0   Class :character   Class :character   Class :character  
##  Median : 75.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :100.6                                                           
##  3rd Qu.:131.0                                                           
##  Max.   :873.0                                                           
##                                                                          
##    BGN_RANGE          BGN_AZI           BGN_LOCATI          END_DATE        
##  Min.   :   0.000   Length:902297      Length:902297      Length:902297     
##  1st Qu.:   0.000   Class :character   Class :character   Class :character  
##  Median :   0.000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :   1.484                                                           
##  3rd Qu.:   1.000                                                           
##  Max.   :3749.000                                                           
##                                                                             
##    END_TIME           COUNTY_END COUNTYENDN       END_RANGE       
##  Length:902297      Min.   :0    Mode:logical   Min.   :  0.0000  
##  Class :character   1st Qu.:0    NA's:902297    1st Qu.:  0.0000  
##  Mode  :character   Median :0                   Median :  0.0000  
##                     Mean   :0                   Mean   :  0.9862  
##                     3rd Qu.:0                   3rd Qu.:  0.0000  
##                     Max.   :0                   Max.   :925.0000  
##                                                                   
##    END_AZI           END_LOCATI            LENGTH              WIDTH         
##  Length:902297      Length:902297      Min.   :   0.0000   Min.   :   0.000  
##  Class :character   Class :character   1st Qu.:   0.0000   1st Qu.:   0.000  
##  Mode  :character   Mode  :character   Median :   0.0000   Median :   0.000  
##                                        Mean   :   0.2301   Mean   :   7.503  
##                                        3rd Qu.:   0.0000   3rd Qu.:   0.000  
##                                        Max.   :2315.0000   Max.   :4400.000  
##                                                                              
##        F               MAG            FATALITIES          INJURIES        
##  Min.   :0.0      Min.   :    0.0   Min.   :  0.0000   Min.   :   0.0000  
##  1st Qu.:0.0      1st Qu.:    0.0   1st Qu.:  0.0000   1st Qu.:   0.0000  
##  Median :1.0      Median :   50.0   Median :  0.0000   Median :   0.0000  
##  Mean   :0.9      Mean   :   46.9   Mean   :  0.0168   Mean   :   0.1557  
##  3rd Qu.:1.0      3rd Qu.:   75.0   3rd Qu.:  0.0000   3rd Qu.:   0.0000  
##  Max.   :5.0      Max.   :22000.0   Max.   :583.0000   Max.   :1700.0000  
##  NA's   :843563                                                           
##     PROPDMG         PROPDMGEXP           CROPDMG         CROPDMGEXP       
##  Min.   :   0.00   Length:902297      Min.   :  0.000   Length:902297     
##  1st Qu.:   0.00   Class :character   1st Qu.:  0.000   Class :character  
##  Median :   0.00   Mode  :character   Median :  0.000   Mode  :character  
##  Mean   :  12.06                      Mean   :  1.527                     
##  3rd Qu.:   0.50                      3rd Qu.:  0.000                     
##  Max.   :5000.00                      Max.   :990.000                     
##                                                                           
##      WFO             STATEOFFIC         ZONENAMES            LATITUDE   
##  Length:902297      Length:902297      Length:902297      Min.   :   0  
##  Class :character   Class :character   Class :character   1st Qu.:2802  
##  Mode  :character   Mode  :character   Mode  :character   Median :3540  
##                                                           Mean   :2875  
##                                                           3rd Qu.:4019  
##                                                           Max.   :9706  
##                                                           NA's   :47    
##    LONGITUDE        LATITUDE_E     LONGITUDE_       REMARKS         
##  Min.   :-14451   Min.   :   0   Min.   :-14455   Length:902297     
##  1st Qu.:  7247   1st Qu.:   0   1st Qu.:     0   Class :character  
##  Median :  8707   Median :   0   Median :     0   Mode  :character  
##  Mean   :  6940   Mean   :1452   Mean   :  3509                     
##  3rd Qu.:  9605   3rd Qu.:3549   3rd Qu.:  8735                     
##  Max.   : 17124   Max.   :9706   Max.   :106220                     
##                   NA's   :40                                        
##      REFNUM      
##  Min.   :     1  
##  1st Qu.:225575  
##  Median :451149  
##  Mean   :451149  
##  3rd Qu.:676723  
##  Max.   :902297  
## 

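The summary reports NA's only for COUNTYENDN (all 902,297 rows), F (843,563 rows), LATITUDE (47 rows) and LATITUDE_E (40 rows). Character variables such as BGN_AZI, END_DATE and END_TIME show no NA count because summary() does not report one for character columns; they may still contain empty strings.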
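
Since summary() shows only Length, Class and Mode for character columns, missing values can be counted per column directly. The following is a small sketch on a made-up data frame (the values are illustrative, not taken from the storm data); the same two calls could be applied to StormData.

```r
# Toy data frame with NA's in numeric columns and an empty string in a
# character column.
toy <- data.frame(
  F        = c(NA, 1, NA),
  LATITUDE = c(3540, NA, 2802),
  EVTYPE   = c("HAIL", "", "TORNADO"),
  stringsAsFactors = FALSE
)

# Number of NA's in each column
na_counts <- colSums(is.na(toy))

# Number of empty strings in each column (NA's are ignored)
empty_counts <- sapply(toy, function(col) sum(col == "", na.rm = TRUE))

na_counts
empty_counts
```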

The key variables for the analysis are:

  • EVTYPE: Type of the event.
  • FATALITIES: Number of fatalities from the event.
  • INJURIES: Number of injuries from the event.
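  • PROPDMG: Property damage amount, to be scaled by PROPDMGEXP.
  • CROPDMG: Crop damage amount, to be scaled by CROPDMGEXP.
  • PROPDMGEXP: Magnitude code for property damage (K = thousands, M = millions, B = billions, etc.).
  • CROPDMGEXP: Magnitude code for crop damage (K = thousands, M = millions, B = billions, etc.).

Only these variables are kept, and the distinct magnitude codes are listed:
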
StormData <- StormData %>% select(EVTYPE,FATALITIES,INJURIES,PROPDMG,
                              CROPDMG,PROPDMGEXP,CROPDMGEXP)
unique(StormData$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(StormData$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"

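PROPDMGEXP and CROPDMGEXP mix upper- and lower-case letters, digits and symbols. They are fixed by converting each code to a numeric power of ten: H = 2, K = 3, M = 6, B = 9. Digits are kept as exponents, and empty or symbol codes ("", "+", "-", "?") are set to 0, which leaves the damage amount unscaled.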

StormData$CROPDMGEXP <- toupper(StormData$CROPDMGEXP)
StormData$CROPDMGEXP[StormData$CROPDMGEXP %in% c("", "+", "-", "?")] <- "0"
StormData$CROPDMGEXP[StormData$CROPDMGEXP %in% c("B")] <- "9"
StormData$CROPDMGEXP[StormData$CROPDMGEXP %in% c("M")] <- "6"
StormData$CROPDMGEXP[StormData$CROPDMGEXP %in% c("K")] <- "3"
StormData$CROPDMGEXP[StormData$CROPDMGEXP %in% c("H")] <- "2"

StormData$PROPDMGEXP <- toupper(StormData$PROPDMGEXP)
StormData$PROPDMGEXP[StormData$PROPDMGEXP %in% c("", "+", "-", "?")] <- "0"
StormData$PROPDMGEXP[StormData$PROPDMGEXP %in% c("B")] <- "9"
StormData$PROPDMGEXP[StormData$PROPDMGEXP %in% c("M")] <- "6"
StormData$PROPDMGEXP[StormData$PROPDMGEXP %in% c("K")] <- "3"
StormData$PROPDMGEXP[StormData$PROPDMGEXP %in% c("H")] <- "2"
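
The same recoding rules can also be written once as a lookup and applied to both columns. This is a minimal sketch, not the code used in this report: the helper name exp_to_power is hypothetical, and the sample codes are chosen only to exercise each rule.

```r
# Hypothetical helper: convert magnitude codes to numeric powers of ten,
# following the same rules as the step-by-step recoding in the report.
exp_to_power <- function(codes) {
  codes  <- toupper(codes)
  lookup <- c(H = 2, K = 3, M = 6, B = 9)

  # Digit codes ("0" to "8") convert directly; everything else becomes NA
  power <- suppressWarnings(as.numeric(codes))

  # Letter codes take their value from the lookup table
  is_letter <- codes %in% names(lookup)
  power[is_letter] <- unname(lookup[codes[is_letter]])

  # Remaining codes ("", "+", "-", "?") get exponent 0, i.e. multiplier 1
  power[is.na(power)] <- 0
  power
}

sample_codes  <- c("K", "m", "", "B", "+", "5", "h", "?")
sample_powers <- exp_to_power(sample_codes)
sample_powers
```

Under these rules the sample codes map to 3, 6, 0, 9, 0, 5, 2 and 0. Keeping the mapping in one place avoids repeating the assignments for each column.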

Total values of property damage, crop damage and total damage are calculated using the fixed values of the variables.

StormData$PROPDMGTOTAL <- StormData$PROPDMG * (10 ^ as.numeric(StormData$PROPDMGEXP))
StormData$CROPDMGTOTAL <- StormData$CROPDMG * (10 ^ as.numeric(StormData$CROPDMGEXP))
StormData$DMGTOTAL <- StormData$PROPDMGTOTAL + StormData$CROPDMGTOTAL
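
As a quick check of the arithmetic, the sketch below applies the same formula to three made-up rows (the amounts are illustrative, not taken from the data). The exponent columns are assumed to have been recoded to digit strings already.

```r
# Toy rows: 25 with exponent 3 (thousands), 2.5 with exponent 6 (millions),
# and 10 with exponent 0 (no scaling).
damage <- data.frame(
  PROPDMG    = c(25, 2.5, 10),
  PROPDMGEXP = c("3", "6", "0"),
  CROPDMG    = c(0, 1, 5),
  CROPDMGEXP = c("0", "3", "0"),
  stringsAsFactors = FALSE
)

damage$PROPDMGTOTAL <- damage$PROPDMG * 10^as.numeric(damage$PROPDMGEXP)
damage$CROPDMGTOTAL <- damage$CROPDMG * 10^as.numeric(damage$CROPDMGEXP)
damage$DMGTOTAL     <- damage$PROPDMGTOTAL + damage$CROPDMGTOTAL

damage
```

The property totals come out as 25,000, 2,500,000 and 10, and the combined totals as 25,000, 2,501,000 and 15.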

The 20 most frequent event types and their frequencies:

mis.colores.3 <- colorRampPalette(c("#ff9999", "#99ff99", "#9999ff"))
a <- StormData %>% group_by(EVTYPE) %>% 
  summarise(Frecuencia = n()) 

a <- arrange(a, desc(Frecuencia))

head(a,20) %>% ggplot(aes(x = EVTYPE, y = Frecuencia)) +
        geom_bar(stat = "identity", width = 0.5,
                 fill = mis.colores.3(20)) +
        coord_flip()
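
ggplot2 places a character x variable in alphabetical order, so the bars in this plot are not sorted by frequency. If a sorted plot is wanted, one option is to turn EVTYPE into a factor ordered by Frecuencia with reorder(), as is done for the later plots. The sketch below shows only the reordering step, on toy counts that are not results from the data.

```r
# Toy frequency table (counts are made up for illustration)
a_top <- data.frame(
  EVTYPE     = c("HAIL", "TORNADO", "FLASH FLOOD"),
  Frecuencia = c(300, 60, 50),
  stringsAsFactors = FALSE
)

# reorder() returns a factor whose levels are sorted by ascending Frecuencia
a_top$EVTYPE <- reorder(a_top$EVTYPE, a_top$Frecuencia)

levels(a_top$EVTYPE)
```

The levels run from the least to the most frequent type, which is the order the bars would follow if this factor were passed to ggplot() in place of the character column.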

Results: Analysis of critical weather events

Sum_events <- StormData %>%
    group_by(EVTYPE) %>%
    summarize(SUMFATALITIES = sum(FATALITIES),
              SUMINJURIES = sum(INJURIES),
              SUMPROPDMG = sum(PROPDMGTOTAL),
              SUMCROPDMG = sum(CROPDMGTOTAL),
              TOTALDMG = sum(DMGTOTAL))

head(Sum_events)
## # A tibble: 6 x 6
##   EVTYPE                SUMFATALITIES SUMINJURIES SUMPROPDMG SUMCROPDMG TOTALDMG
##   <chr>                         <dbl>       <dbl>      <dbl>      <dbl>    <dbl>
## 1 "   HIGH SURF ADVISO~             0           0     200000          0   200000
## 2 " COASTAL FLOOD"                  0           0          0          0        0
## 3 " FLASH FLOOD"                    0           0      50000          0    50000
## 4 " LIGHTNING"                      0           0          0          0        0
## 5 " TSTM WIND"                      0           0    8100000          0  8100000
## 6 " TSTM WIND (G45)"                0           0       8000          0     8000
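
The first rows above show event names with leading spaces (for example " TSTM WIND"), which are grouped separately from the same names without the spaces. A possible normalization step, which this report does not apply, is sketched below on made-up rows; the column name EVTYPE_CLEAN is hypothetical, and base R is used to keep the example self-contained.

```r
# Toy rows where the same event type appears with different spacing and case
events <- data.frame(
  EVTYPE     = c(" TSTM WIND", "TSTM WIND", "tstm wind",
                 " FLASH FLOOD", "FLASH FLOOD"),
  FATALITIES = c(0, 2, 1, 0, 3),
  stringsAsFactors = FALSE
)

# Strip surrounding whitespace and unify the case
events$EVTYPE_CLEAN <- toupper(trimws(events$EVTYPE))

raw_groups   <- length(unique(events$EVTYPE))        # groups before cleaning
clean_groups <- length(unique(events$EVTYPE_CLEAN))  # groups after cleaning

# Fatalities summed over the cleaned event types
fatalities_by_type <- tapply(events$FATALITIES, events$EVTYPE_CLEAN, sum)
fatalities_by_type
```

In this toy example five raw labels collapse into two event types. Applying such a step to the full data would change the group totals, so the results reported here refer to the uncleaned EVTYPE values.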

The six event types with the most fatalities, the most injuries, and the most total damage (head() returns six rows by default):

SummStormDataFatality <- arrange(Sum_events, desc(SUMFATALITIES))
FatalityData <- head(SummStormDataFatality)

SummStormDataInjury <- arrange(Sum_events, desc(SUMINJURIES))
InjuryData <- head(SummStormDataInjury)

SummStormDataDamage <- arrange(Sum_events, desc(TOTALDMG))
DamageData <- head(SummStormDataDamage)

FatalityData$EVTYPE <- with(FatalityData, 
                            reorder(EVTYPE, -SUMFATALITIES))
x <- ggplot(FatalityData, aes(EVTYPE, SUMFATALITIES, 
                         label = SUMFATALITIES)) +
        geom_bar(stat = "identity", fill = 6) +
        geom_text(nudge_y = 200) +
        xlab("Event Type") +
        ylab("Total Fatalities") +
        ggtitle("Most Fatal Events") +
        theme(plot.title = element_text(hjust = 0.5))

InjuryData$EVTYPE <- with(InjuryData, 
                          reorder(EVTYPE, -SUMINJURIES))
y <- ggplot(InjuryData, aes(EVTYPE, SUMINJURIES, 
                            label = SUMINJURIES)) +
    geom_bar(stat = "identity", fill = 3) +
    geom_text(nudge_y = 3000) +
    xlab("Event Type") +
    ylab("Total Injuries") +
    ggtitle("Events with Most Injuries") +
    theme(plot.title = element_text(hjust = 0.5))

DamageData$EVTYPE <- with(DamageData, reorder(EVTYPE, -TOTALDMG))
DamageDataLong <- DamageData %>%
        gather(key = "Type", value = "TOTALDAMAGE", 
               c("SUMPROPDMG", "SUMCROPDMG")) %>%
        select(EVTYPE, Type, TOTALDAMAGE)
DamageDataLong$Type[DamageDataLong$Type %in% c("SUMPROPDMG")] <- "Property damage"
DamageDataLong$Type[DamageDataLong$Type %in% c("SUMCROPDMG")] <- "Crop damage"

# Plot
z <- ggplot(DamageDataLong, aes(x = EVTYPE, 
                                y = TOTALDAMAGE, fill = Type)) +
    geom_bar(stat = "identity", position = "stack") +
    xlab("Event Type") +
    ylab("Total Damage") +
    ggtitle("Events with Most Damage") +
    theme(plot.title = element_text(hjust = 0.5), legend.position = "bottom") +
        coord_flip()

grid.arrange(x,y)

z

Questions

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

The plots show that tornadoes cause the most total fatalities and the most total injuries, which makes them the most harmful event type with respect to population health.

  2. Across the United States, which types of events have the greatest economic consequences?

The plot shows that floods cause the greatest total damage (property and crop damage combined).