Introduction

The National Weather Service (NWS) is an agency of the United States government tasked with providing weather forecasts, warnings of hazardous weather, and other weather-related products to organizations and the public for the purposes of protection, safety, and general information. It is part of the National Oceanic and Atmospheric Administration (NOAA), a branch of the Department of Commerce, and is headquartered in Silver Spring, Maryland, just outside Washington, D.C. [https://en.wikipedia.org/wiki/National_Weather_Service]

The NOAA storm database currently contains data from January 1950 to November 2015, as entered by NOAA’s National Weather Service (NWS). Because data collection and processing procedures have changed over time, the period of record available differs by event type. [http://www.ncdc.noaa.gov/stormevents/details.jsp]

Synopsis

This report analyzes and visualizes the impact of severe weather events on public health and the economy in the US, using the NOAA Storm Database from 1950 to 2011. Many severe events result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. From these data, we investigate which types of events are most harmful to the population and which are the most costly.

Data Processing

Library loading

library(R.utils)  #for bunzip2
library(ggplot2)  #for plots
library(plyr)     #for ddply (split-apply-combine aggregation)
library(reshape2) #for melt (reshaping data from wide to long)

Data Load

Read the source .csv file

#Read the bzip2-compressed .csv file into the variable dataLoad
dataLoad <- read.csv(bzfile("repdata-data-StormData.csv.bz2"), strip.white = TRUE)
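
As a minimal sketch of why no separate unzip step is needed, the example below writes a tiny bzip2-compressed CSV to a temporary file and reads it back through a `bzfile()` connection. The toy rows and the names `tmp` and `demo` are illustrative assumptions, not part of the real dataset.

```r
# Build a tiny bzip2-compressed CSV in a temporary file (toy data only)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "  TORNADO ,0", "HAIL,1"), con)
close(con)

# read.csv() opens the connection, decompresses on the fly, and closes it.
# strip.white = TRUE trims blanks around unquoted character fields.
demo <- read.csv(bzfile(tmp), strip.white = TRUE)
print(demo)

unlink(tmp)  # remove the temporary file
```

Note that `strip.white` only affects unquoted fields, so quoted values keep any surrounding blanks.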

Select useful data

Subset the data to the columns needed for the analysis and add new variables.

#Keep only the columns used in this analysis
gCol <- c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
gData <- dataLoad[, gCol]

#First two rows of the selected columns
head(gData,n=2)
##            BGN_DATE  EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG
## 1 4/18/1950 0:00:00 TORNADO          0       15    25.0          K       0
## 2 4/18/1950 0:00:00 TORNADO          0        0     2.5          K       0
##   CROPDMGEXP
## 1           
## 2
#Types of data
str(gData)
## 'data.frame':    902297 obs. of  8 variables:
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
#Extract the year from the begin date
gData$YEAR <- as.integer(format(as.Date(gData$BGN_DATE, "%m/%d/%Y 0:00:00"), "%Y"))

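# Raw sum of the two damage figures; the exponent columns (PROPDMGEXP, CROPDMGEXP)
# are not applied here, so this value is not in dollars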
gData$ECONOMICDMG <- gData$PROPDMG + gData$CROPDMG
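
The year extraction above can be checked on a couple of hand-written values. This is a small sketch with made-up timestamps in the same layout as BGN_DATE; the names `bgn`, `dates`, and `years` are illustrative.

```r
# Toy timestamps in the same "month/day/year 0:00:00" layout as BGN_DATE
bgn <- c("4/18/1950 0:00:00", "11/30/2011 0:00:00")

# Parse to Date (the literal " 0:00:00" is part of the format string),
# then format the year and convert it to an integer
dates <- as.Date(bgn, format = "%m/%d/%Y 0:00:00")
years <- as.integer(format(dates, "%Y"))
print(years)
```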

Data integrity

Find NA Values

The check below counts the NA values in each selected column; none are reported.

#Verifying missing values in the dataset
dataIntegrity <- function(dataframe) {
        for (colName in colnames(dataframe)) {
                #Count the NA values in this column
                NAcount <- sum(is.na(dataframe[, colName]))
                if (NAcount > 0) {
                        message(colName, ":", NAcount, " missing values")
                } else {
                        message(colName, ":", "No missing values")
                }
        }
}
dataIntegrity(gData)
## BGN_DATE:No missing values
## EVTYPE:No missing values
## FATALITIES:No missing values
## INJURIES:No missing values
## PROPDMG:No missing values
## PROPDMGEXP:No missing values
## CROPDMG:No missing values
## CROPDMGEXP:No missing values
## YEAR:No missing values
## ECONOMICDMG:No missing values
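
A more compact way to get the same per-column counts is sketched below on a toy data frame with one deliberately missing value. The data frame `toy` and the name `naCounts` are illustrative assumptions.

```r
# Toy data frame with one missing value (not taken from the dataset)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(0, NA, 2)
)

# is.na() returns a logical matrix; colSums() counts the TRUE values per column
naCounts <- colSums(is.na(toy))
print(naCounts)
```

Applied to the real data, `colSums(is.na(gData))` would return one count per selected column.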

Data aggregation

Sum FATALITIES, ECONOMICDMG, and INJURIES, grouped by YEAR and EVTYPE.

eY <- ddply(gData[, -1], .(YEAR, EVTYPE), .fun = function(x) {
        c(sum(x$FATALITIES), sum(x$ECONOMICDMG), sum(x$INJURIES))
})
names(eY) <- c("YEAR", "EVTYPE", "FATALITIES", "ECONOMICDMG", "INJURIES")
head(eY)
##   YEAR  EVTYPE FATALITIES ECONOMICDMG INJURIES
## 1 1950 TORNADO         70    16999.15      659
## 2 1951 TORNADO         34    10560.99      524
## 3 1952 TORNADO        230    16679.74     1915
## 4 1953 TORNADO        519    19182.20     5131
## 5 1954 TORNADO         36    23367.82      715
## 6 1955    HAIL          0        0.00        0
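
For readers without plyr, the same kind of grouped sum can be sketched with base R's `aggregate()` and a formula. This is an alternative technique, not the method used above, and the toy data frame and the name `byYearType` are assumptions for illustration.

```r
# Toy records: two tornado rows in 1950, one tornado and one hail row in 1951
toy <- data.frame(
  YEAR       = c(1950, 1950, 1951, 1951),
  EVTYPE     = c("TORNADO", "TORNADO", "TORNADO", "HAIL"),
  FATALITIES = c(1, 2, 0, 0),
  INJURIES   = c(10, 5, 3, 0)
)

# Sum both measures within each (YEAR, EVTYPE) group
byYearType <- aggregate(cbind(FATALITIES, INJURIES) ~ YEAR + EVTYPE,
                        data = toy, FUN = sum)
print(byYearType)
```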

The full dataset contains 902297 rows and 37 columns. The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of reliable or complete records.

hist(gData$YEAR, main = "Number of Recorded Events per Year, 1950 to 2011", xlab="Year", breaks = 40)

The histogram above shows that the number of events tracked begins to increase in the mid-1990s.
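
The per-year counts behind a histogram like this can also be inspected numerically. The sketch below uses a made-up vector of years; `years` and `perYear` are illustrative names.

```r
# Toy vector of event years (not real data)
years <- c(1950, 1950, 1995, 1995, 1995, 2011)

# table() counts how many events fall in each year
perYear <- table(years)
print(perYear)
```

On the real data, `table(gData$YEAR)` would give the exact number of recorded events for each year.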

Impact on Public Health

Here we examine the number of fatalities and injuries caused by severe weather events, taking the 20 event types with the highest totals.

Impact on Economy

To determine which types of events (as indicated in the EVTYPE variable) are most harmful with respect to the economy, we aggregate the damage in US dollars by event type, separately for property damage and crop damage. The 20 event types with the highest damage in each category are then selected and plotted.

#Convert the property damage exponent codes (H, K, M, B) to powers of ten
gData$PROPDMGEXP <- as.character(gData$PROPDMGEXP)
gData$PROPDMGEXP[toupper(gData$PROPDMGEXP) == 'H'] <- "2"
gData$PROPDMGEXP[toupper(gData$PROPDMGEXP) == 'K'] <- "3"
gData$PROPDMGEXP[toupper(gData$PROPDMGEXP) == 'M'] <- "6"
gData$PROPDMGEXP[toupper(gData$PROPDMGEXP) == 'B'] <- "9"
gData$PROPDMGEXP <- as.numeric(gData$PROPDMGEXP)
## Warning: NAs introduced by coercion
gData$PROPDMGEXP[is.na(gData$PROPDMGEXP)] <- 0
gData$TOTALPROPDMG <- gData$PROPDMG * 10^gData$PROPDMGEXP
#Convert the crop damage exponent codes in the same way
gData$CROPDMGEXP <- as.character(gData$CROPDMGEXP)
gData$CROPDMGEXP[toupper(gData$CROPDMGEXP) == 'H'] <- "2"
gData$CROPDMGEXP[toupper(gData$CROPDMGEXP) == 'K'] <- "3"
gData$CROPDMGEXP[toupper(gData$CROPDMGEXP) == 'M'] <- "6"
gData$CROPDMGEXP[toupper(gData$CROPDMGEXP) == 'B'] <- "9"
gData$CROPDMGEXP <- as.numeric(gData$CROPDMGEXP)
## Warning: NAs introduced by coercion
gData$CROPDMGEXP[is.na(gData$CROPDMGEXP)] <- 0
gData$TOTALCROPDMG <- gData$CROPDMG * 10^gData$CROPDMGEXP
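
The conversion above amounts to a lookup from exponent code to power of ten. A minimal sketch of that idea as a reusable function follows; the name `expToPower` and the toy values are hypothetical, and the mapping mirrors the code above (H = 2, K = 3, M = 6, B = 9, digits taken literally, anything else treated as 0).

```r
# Map a damage-exponent code to a power of ten (hypothetical helper)
expToPower <- function(code) {
  lookup  <- c(H = 2, K = 3, M = 6, B = 9)
  code    <- toupper(as.character(code))
  power   <- unname(lookup[code])            # NA for codes not in the lookup
  isDigit <- grepl("^[0-9]$", code)
  power[isDigit] <- as.numeric(code[isDigit])
  power[is.na(power)] <- 0                   # "", "+", "-", "?" fall back to 0
  power
}

# Worked example: 25 with "K" is 25 * 10^3 = 25000
dmg   <- c(25, 2.5, 1, 7)
code  <- c("K", "m", "", "?")
total <- dmg * 10^expToPower(code)
print(total)
```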

#Property damage
gSumProp <- aggregate(gData$TOTALPROPDMG, by = list(gData$EVTYPE), "sum")
names(gSumProp) <- c("Event", "Cost")
gSumProp <- gSumProp[order(-gSumProp$Cost), ][1:20, ]

#Crop damage
gSumCrop <- aggregate(gData$TOTALCROPDMG, by = list(gData$EVTYPE), "sum")
names(gSumCrop) <- c("Event", "Cost")
gSumCrop <- gSumCrop[order(-gSumCrop$Cost), ][1:20, ]

#Fatalities
aggFat <- aggregate(gData$FATALITIES, by = list(gData$EVTYPE), "sum")
names(aggFat) <- c("Event", "Fatalities")
aggFat <- aggFat[order(-aggFat$Fatalities), ][1:20,]
aggFat
##                       Event Fatalities
## 834                 TORNADO       5633
## 130          EXCESSIVE HEAT       1903
## 153             FLASH FLOOD        978
## 275                    HEAT        937
## 464               LIGHTNING        816
## 856               TSTM WIND        504
## 170                   FLOOD        470
## 585             RIP CURRENT        368
## 359               HIGH WIND        248
## 19                AVALANCHE        224
## 972            WINTER STORM        206
## 586            RIP CURRENTS        204
## 278               HEAT WAVE        172
## 140            EXTREME COLD        160
## 760       THUNDERSTORM WIND        133
## 310              HEAVY SNOW        127
## 141 EXTREME COLD/WIND CHILL        125
## 676             STRONG WIND        103
## 30                 BLIZZARD        101
## 350               HIGH SURF        101
#Injuries
aggInjury <- aggregate(gData$INJURIES, by = list(gData$EVTYPE), "sum")
names(aggInjury) <- c("Event", "Injuries")
aggInjury <- aggInjury[order(-aggInjury$Injuries), ][1:20,]
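
The four aggregations above follow the same pattern: sum by event type, sort in decreasing order, keep the top rows. One way this could be factored into a helper is sketched below; `topEvents` is a hypothetical name and the toy data are made up.

```r
# Sum one column by EVTYPE and return the n largest totals (hypothetical helper)
topEvents <- function(data, valueCol, n = 20) {
  totals <- aggregate(data[[valueCol]], by = list(Event = data$EVTYPE), FUN = sum)
  names(totals)[2] <- valueCol
  head(totals[order(-totals[[valueCol]]), ], n)
}

toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD"),
  FATALITIES = c(5, 3, 2, 1)
)
res <- topEvents(toy, "FATALITIES", n = 2)
print(res)
```

Unlike indexing with `[1:20, ]`, `head()` does not pad the result with NA rows when fewer than `n` event types exist.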

Results

The four sorted lists below give the top 20 event types by property damage, crop damage, fatalities, and injuries.

Property Damage

gSumProp
##                         Event         Cost
## 170                     FLOOD 144657709807
## 411         HURRICANE/TYPHOON  69305840000
## 834                   TORNADO  56947380677
## 670               STORM SURGE  43323536000
## 153               FLASH FLOOD  16822673979
## 244                      HAIL  15735267513
## 402                 HURRICANE  11868319010
## 848            TROPICAL STORM   7703890550
## 972              WINTER STORM   6688497251
## 359                 HIGH WIND   5270046295
## 590               RIVER FLOOD   5118945500
## 957                  WILDFIRE   4765114000
## 671          STORM SURGE/TIDE   4641188000
## 856                 TSTM WIND   4484928495
## 427                 ICE STORM   3944927860
## 760         THUNDERSTORM WIND   3483122472
## 409            HURRICANE OPAL   3172846000
## 955          WILD/FOREST FIRE   3001829500
## 298 HEAVY RAIN/SEVERE WEATHER   2500000000
## 786        THUNDERSTORM WINDS   1944590859

Crop Damage

gSumCrop
##                 Event        Cost
## 95            DROUGHT 13972566000
## 170             FLOOD  5661968450
## 590       RIVER FLOOD  5029459000
## 427         ICE STORM  5022113500
## 244              HAIL  3025954473
## 402         HURRICANE  2741910000
## 411 HURRICANE/TYPHOON  2607872800
## 153       FLASH FLOOD  1421317100
## 140      EXTREME COLD  1292973000
## 212      FROST/FREEZE  1094086000
## 290        HEAVY RAIN   733399800
## 848    TROPICAL STORM   678346000
## 359         HIGH WIND   638571300
## 856         TSTM WIND   554007350
## 130    EXCESSIVE HEAT   492402000
## 192            FREEZE   446225000
## 834           TORNADO   414953270
## 760 THUNDERSTORM WIND   414843050
## 275              HEAT   401461500
## 957          WILDFIRE   295472800

Fatalities

aggFat
##                       Event Fatalities
## 834                 TORNADO       5633
## 130          EXCESSIVE HEAT       1903
## 153             FLASH FLOOD        978
## 275                    HEAT        937
## 464               LIGHTNING        816
## 856               TSTM WIND        504
## 170                   FLOOD        470
## 585             RIP CURRENT        368
## 359               HIGH WIND        248
## 19                AVALANCHE        224
## 972            WINTER STORM        206
## 586            RIP CURRENTS        204
## 278               HEAT WAVE        172
## 140            EXTREME COLD        160
## 760       THUNDERSTORM WIND        133
## 310              HEAVY SNOW        127
## 141 EXTREME COLD/WIND CHILL        125
## 676             STRONG WIND        103
## 30                 BLIZZARD        101
## 350               HIGH SURF        101

Injuries

aggInjury
##                  Event Injuries
## 834            TORNADO    91346
## 856          TSTM WIND     6957
## 170              FLOOD     6789
## 130     EXCESSIVE HEAT     6525
## 464          LIGHTNING     5230
## 275               HEAT     2100
## 427          ICE STORM     1975
## 153        FLASH FLOOD     1777
## 760  THUNDERSTORM WIND     1488
## 244               HAIL     1361
## 972       WINTER STORM     1321
## 411  HURRICANE/TYPHOON     1275
## 359          HIGH WIND     1137
## 310         HEAVY SNOW     1021
## 957           WILDFIRE      911
## 786 THUNDERSTORM WINDS      908
## 30            BLIZZARD      805
## 188                FOG      734
## 955   WILD/FOREST FIRE      545
## 117         DUST STORM      440
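
The tables above contain some near-duplicate labels, for example RIP CURRENT and RIP CURRENTS, or TSTM WIND, THUNDERSTORM WIND, and THUNDERSTORM WINDS. The analysis in this report keeps them separate. As a sketch only, the labels could be consolidated before aggregating; the function `normaliseEvent` and the choice of which labels to merge are assumptions made for illustration.

```r
# Hypothetical label clean-up: trim blanks, upper-case, merge a few known variants
normaliseEvent <- function(x) {
  x <- toupper(trimws(as.character(x)))
  x <- sub("^TSTM WIND$|^THUNDERSTORM WINDS$", "THUNDERSTORM WIND", x)
  x <- sub("^RIP CURRENTS$", "RIP CURRENT", x)
  x
}

cleaned <- normaliseEvent(c("TSTM WIND", "Rip Currents", "  HAIL"))
print(cleaned)
```

Merging labels in this way would change the totals and possibly the ranking, so any mapping should be checked against the NWS event type definitions first.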

The plots below show (1) fatalities and injuries for the top 20 weather events and (2) property and crop damage by event type.

#Plot of fatalities
barplot(aggFat$Fatalities, names.arg = aggFat$Event, col = 'red',main = 'Selection of Top 20 Weather Events for Fatalities', ylab = 'Nb. of Fatalities')

#Plot of injuries
barplot(aggInjury$Injuries, names.arg = aggInjury$Event, col = 'blue',main = 'Selection of Top 20 Weather Events for Injuries', ylab = 'Nb. of Injuries')

#Merge the property and crop damage totals by event type
fatDamage <- merge(x = gSumProp, y = gSumCrop, by = "Event", all = TRUE)

#Melt to long format (one row per event and damage type)
fatDamage <- melt(fatDamage, id.vars = 'Event')

#Plot with data merged and melted
ggplot(fatDamage, aes(Event, value)) + geom_bar(aes(fill = variable), position = "dodge", stat="identity") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("Event Type") + 
ylab("Damage (Crop and Property), USD (Current)") + ggtitle("Crop and Property Damage by Event Type")
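
Because the two top-20 lists do not contain exactly the same events, merging them with `all = TRUE` leaves NA on the side where an event is missing. The sketch below shows this on toy data and builds the long form with base R rather than `melt`; the data frames, the suffixes, and the labels "Property" and "Crop" are illustrative assumptions.

```r
# Toy top lists: FLOOD is in both, TORNADO and DROUGHT in one each
prop <- data.frame(Event = c("FLOOD", "TORNADO"), Cost = c(100, 50))
crop <- data.frame(Event = c("FLOOD", "DROUGHT"), Cost = c(10, 80))

# all = TRUE keeps events found in only one list; the missing side becomes NA
wide <- merge(prop, crop, by = "Event", all = TRUE,
              suffixes = c(".prop", ".crop"))

# Base R equivalent of melting: one row per (Event, damage type)
long <- data.frame(
  Event    = rep(wide$Event, times = 2),
  variable = rep(c("Property", "Crop"), each = nrow(wide)),
  value    = c(wide$Cost.prop, wide$Cost.crop)
)
print(long)
```

In a dodged bar chart, an NA value means the list simply lacks that event, not that the damage was zero.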

Conclusion

Using the NOAA Storm Database, we find that tornadoes and excessive heat are the most harmful with respect to population health, while floods, hurricanes/typhoons, and droughts have the greatest economic consequences in the United States.