The aim of this report is to identify the consequences of storm and weather events in the United States. For this research the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database is used. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The events in the database start in the year 1950 and end in November 2011.

The two main questions that will be answered in this report are:

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

Loading and processing the data

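We first read in the data from the compressed csv file (a bz2 archive). read.csv decompresses the file on the fly and uses a comma as its default separator, so no extra arguments are needed. We also load the dplyr package, because arrange, mutate and desc are used in the steps below.

library(dplyr)  # provides arrange(), mutate() and desc() used below
data <- read.csv("repdata_data_StormData.csv.bz2")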
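That read.csv handles a bz2-compressed file without extra arguments can be checked on a small made-up file. The sketch below is only an illustration: the file is temporary and the two rows in `toy_in` are invented, not taken from the NOAA data.

```r
# Write a tiny, invented data frame to a bz2-compressed csv file.
toy_in <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0))
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy_in, file = con, row.names = FALSE)
close(con)

# read.csv detects the compression when it opens the file for reading.
toy <- read.csv(tmp)
toy

# Remove the temporary file again.
unlink(tmp)
```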

To explore the dataset we print the column names to see which variables are included in the dataset.

names(data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Next we check whether the data contain NA values by counting them per column.

na_count <- sapply(data, function(y) sum(is.na(y)))
na_count
##    STATE__   BGN_DATE   BGN_TIME  TIME_ZONE     COUNTY COUNTYNAME 
##          0          0          0          0          0          0 
##      STATE     EVTYPE  BGN_RANGE    BGN_AZI BGN_LOCATI   END_DATE 
##          0          0          0          0          0          0 
##   END_TIME COUNTY_END COUNTYENDN  END_RANGE    END_AZI END_LOCATI 
##          0          0     902297          0          0          0 
##     LENGTH      WIDTH          F        MAG FATALITIES   INJURIES 
##          0          0     843563          0          0          0 
##    PROPDMG PROPDMGEXP    CROPDMG CROPDMGEXP        WFO STATEOFFIC 
##          0          0          0          0          0          0 
##  ZONENAMES   LATITUDE  LONGITUDE LATITUDE_E LONGITUDE_    REMARKS 
##          0         47          0         40          0          0 
##     REFNUM 
##          0

For this analysis we need the variables EVTYPE, FATALITIES, INJURIES, PROPDMG and CROPDMG. As can be seen above, these variables do not contain any missing values, so no further handling of missing values is needed.
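The same check can be limited to the five variables the analysis uses and turned into an explicit assertion. This is a minimal sketch on an invented data frame (`toy` and its values are assumptions for illustration); on the real data, `toy` would be replaced by `data`.

```r
# Toy stand-in for the storm data; all values are invented.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(1, 0, 0),
  INJURIES   = c(3, 0, 1),
  PROPDMG    = c(25, 10, 0),
  CROPDMG    = c(0, 5, 2),
  F          = c(2, NA, NA)  # a column with missing values, as in the real data
)

needed <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")

# is.na() on a data frame returns a logical matrix, so colSums() counts NAs per column.
na_needed <- colSums(is.na(toy[, needed]))
na_needed

# Stop with an error if any of the needed variables has missing values.
stopifnot(all(na_needed == 0))
```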

Results

Most harmful events for population health

To identify which types of events are most harmful with respect to population health we look at the number of injuries and fatalities. To get a sense of the number of fatalities and injuries, we summarise those variables.

summary(data$FATALITIES)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##   0.0000   0.0000   0.0000   0.0168   0.0000 583.0000
summary(data$INJURIES)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##    0.0000    0.0000    0.0000    0.1557    0.0000 1700.0000

Hence, the maximum number of deaths per event is 583 and the maximum number of injuries per event is 1700.

First we sum the number of fatalities per type of event, and then arrange in descending order based on the total number of fatalities per event type.

fatalities <- aggregate(list(fatalities = data$FATALITIES), by=list(evtype = data$EVTYPE), FUN=sum)
fatalities <- arrange(fatalities, desc(fatalities))
fatalities[1:10,]
##            evtype fatalities
## 1         TORNADO       5633
## 2  EXCESSIVE HEAT       1903
## 3     FLASH FLOOD        978
## 4            HEAT        937
## 5       LIGHTNING        816
## 6       TSTM WIND        504
## 7           FLOOD        470
## 8     RIP CURRENT        368
## 9       HIGH WIND        248
## 10      AVALANCHE        224

Then we sum the number of injuries per type of event.

injuries <- aggregate(list(injuries = data$INJURIES), by=list(evtype = data$EVTYPE), FUN=sum)
injuries <- arrange(injuries, desc(injuries))
injuries[1:10,]
##               evtype injuries
## 1            TORNADO    91346
## 2          TSTM WIND     6957
## 3              FLOOD     6789
## 4     EXCESSIVE HEAT     6525
## 5          LIGHTNING     5230
## 6               HEAT     2100
## 7          ICE STORM     1975
## 8        FLASH FLOOD     1777
## 9  THUNDERSTORM WIND     1488
## 10              HAIL     1361

The next step is to combine the injuries and fatalities to identify which events are most harmful to population health.

total_harm <- merge(fatalities, injuries, by = "evtype")
total_harm <- mutate(total_harm, total = fatalities + injuries)
total_harm <- arrange(total_harm, desc(total))
barplot(total_harm$total[1:10], names.arg = total_harm$evtype[1:10], las = 2, cex.names = 0.6, main = 'Total number of injuries and fatalities per event type')

So the most harmful event types for population health are tornadoes, excessive heat, thunderstorm wind (TSTM WIND), floods and lightning.
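The tables above list TSTM WIND and THUNDERSTORM WIND as separate event types, although both labels describe thunderstorm wind. Merging such spelling variants before aggregating may change the ranking. The following is a minimal sketch on invented rows; `normalize_evtype` is a hypothetical helper that only covers the thunderstorm wind variants and is not part of the original analysis.

```r
# Hypothetical helper: collapse spelling variants of one event type.
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))
  # Map "TSTM WIND", "THUNDERSTORM WIND" and "THUNDERSTORM WINDS" to one label.
  sub("^(TSTM|THUNDERSTORM) WINDS?$", "THUNDERSTORM WIND", x)
}

# Invented example rows, not taken from the NOAA data.
toy <- data.frame(
  EVTYPE   = c("TSTM WIND", "Thunderstorm Wind", "THUNDERSTORM WINDS", "HAIL"),
  INJURIES = c(4, 2, 1, 3)
)
toy$EVTYPE <- normalize_evtype(toy$EVTYPE)

# Sum injuries per cleaned event type and sort in descending order.
inj <- aggregate(INJURIES ~ EVTYPE, data = toy, FUN = sum)
inj <- inj[order(-inj$INJURIES), ]
inj
```

A fuller clean-up would need a mapping for every variant in EVTYPE; the single regular expression here only shows the idea.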

Events with the greatest economic consequences

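For the economic consequences of the events we look at property damage and crop damage. When there is damage, there are associated monetary costs that go along with it. Note that PROPDMG and CROPDMG are summed as recorded, without applying the magnitude codes in PROPDMGEXP and CROPDMGEXP, so the totals are in unscaled units.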

damage <- aggregate(list(propdmg = data$PROPDMG, cropdmg = data$CROPDMG), by = list(evtype = data$EVTYPE), FUN = sum)
damage <- mutate(damage, total = propdmg+cropdmg)
damage <- arrange(damage, desc(total))
damage[1:10,]
##                evtype   propdmg   cropdmg     total
## 1             TORNADO 3212258.2 100018.52 3312276.7
## 2         FLASH FLOOD 1420124.6 179200.46 1599325.1
## 3           TSTM WIND 1335965.6 109202.60 1445168.2
## 4                HAIL  688693.4 579596.28 1268289.7
## 5               FLOOD  899938.5 168037.88 1067976.4
## 6   THUNDERSTORM WIND  876844.2  66791.45  943635.6
## 7           LIGHTNING  603351.8   3580.61  606932.4
## 8  THUNDERSTORM WINDS  446293.2  18684.93  464978.1
## 9           HIGH WIND  324731.6  17283.21  342014.8
## 10       WINTER STORM  132720.6   1978.99  134699.6

Hence, the events that have the greatest economic consequences are tornadoes, flash floods, thunderstorm wind (TSTM WIND), hail and floods.

barplot(damage$total[1:5], names.arg = damage$evtype[1:5], main = "Total damage per event type", ylab = 'Damage (PROPDMG + CROPDMG, unscaled)')
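The totals above add up PROPDMG and CROPDMG as recorded. The dataset also contains the columns PROPDMGEXP and CROPDMGEXP, which hold magnitude codes for these amounts, so a dollar figure needs each amount multiplied by its factor first. Below is a minimal sketch on invented rows. It assumes that K, M and B stand for thousands, millions and billions, that H stands for hundreds, and that any other code means a factor of 1. `exp_multiplier` is a hypothetical helper, not part of the original analysis.

```r
# Hypothetical helper: turn a magnitude code into a numeric multiplier.
# Assumption: only H, K, M and B are interpreted; blanks, digits and
# symbols are treated as a factor of 1.
exp_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Invented example rows, not taken from the NOAA data.
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(5, 250, 2.5, 10),
  PROPDMGEXP = c("B", "K", "M", ""),
  CROPDMG    = c(0, 0, 1, 50),
  CROPDMGEXP = c("", "", "M", "K")
)

# Scale both damage columns to dollars and add them up per row.
toy$total_usd <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * exp_multiplier(toy$CROPDMGEXP)

# Sum per event type and sort in descending order.
damage_usd <- aggregate(total_usd ~ EVTYPE, data = toy, FUN = sum)
damage_usd <- damage_usd[order(-damage_usd$total_usd), ]
damage_usd
```

In this toy example a single record coded B outweighs everything else, which shows why a ranking based on scaled amounts can differ from one based on the raw values.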