1: Synopsis

The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. The events in the database start in the year 1950 and end in November 2011.

The analysis covers two main questions: which event types cause the most harm to 1. Health (Fatalities & Injuries) and 2. the Economy (Property & Crops).

2: Data Description

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:

There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

3: Data Processing

Harm to Health

The data file is downloaded to the working directory and read into R.

#Read data into a variable; the file must be in the working directory
storm_data <- read.csv("repdata_data_StormData.csv")

#Check out data and column names
head(storm_data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
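Because the source file is distributed as a bzip2-compressed CSV, it does not necessarily have to be decompressed by hand: read.csv() can open a compressed path directly. The sketch below illustrates this on a small made-up data frame. The file name toy_storm.csv.bz2 and the toy values are assumptions for illustration, not part of the original analysis.

```r
# Toy stand-in for the storm file (made-up values; the real file is far larger).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  FATALITIES = c(3, 1))

# Write a bzip2-compressed CSV to a temporary path (hypothetical file name).
bz_path <- file.path(tempdir(), "toy_storm.csv.bz2")
con <- bzfile(bz_path, open = "wt")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path in read mode, where R detects the compression.
toy_back <- read.csv(bz_path, stringsAsFactors = FALSE)
print(toy_back)
```

The same call pattern should apply to the compressed storm file, provided the path passed to read.csv() points at the .bz2 file.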

Since we are only concerned with harm to health at this point, the event type, fatalities, and injuries columns are subset into another variable and then checked.

#Begin by subsetting data to the EVTYPE, FATALITIES, and INJURIES columns and storing it in a variable
sub_data <- c("EVTYPE", "FATALITIES", "INJURIES")

harmful_data <- storm_data[sub_data]

head(harmful_data)
##    EVTYPE FATALITIES INJURIES
## 1 TORNADO          0       15
## 2 TORNADO          0        0
## 3 TORNADO          0        2
## 4 TORNADO          0        2
## 5 TORNADO          0        2
## 6 TORNADO          0        6

Now fatalities and injuries are each summed by event type and stored in new variables, which are then ordered to see which event type is the most harmful.

#Sum of fatalities by each event type and store in variable
fatality <- aggregate(FATALITIES ~ EVTYPE, harmful_data, FUN = sum)

#Sum of injuries by each event type and store in variable
injury <- aggregate(INJURIES ~ EVTYPE, harmful_data, FUN = sum)

#Order fatalities by evtype in descending order
top_fatality <- fatality[order(-fatality$FATALITIES), ][1:6, ]

#Check that the data is in descending order; to look at more event types, change the 6 in the line above to the desired number
head(top_fatality)
##             EVTYPE FATALITIES
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
#Order injuries by evtype in descending order
top_injury <- injury[order(-injury$INJURIES), ][1:6, ]

#Check that the data is in descending order; to look at more event types, change the 6 in the line above to the desired number
head(top_injury)
##             EVTYPE INJURIES
## 834        TORNADO    91346
## 856      TSTM WIND     6957
## 170          FLOOD     6789
## 130 EXCESSIVE HEAT     6525
## 464      LIGHTNING     5230
## 275           HEAT     2100
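The aggregate-then-order pattern used above can be checked on a few made-up rows where the sums are easy to verify by hand. The data frame below is a toy assumption; only the pattern mirrors the analysis.

```r
# Made-up rows: TORNADO sums to 12, HEAT to 2, FLOOD to 1.
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(5, 1, 7, 2, 0),
  stringsAsFactors = FALSE
)

# One row per event type, holding the sum of FATALITIES.
totals <- aggregate(FATALITIES ~ EVTYPE, toy, FUN = sum)

# Negating the column sorts from largest to smallest.
ranked <- totals[order(-totals$FATALITIES), ]

# Keep the two largest event types.
top_n <- head(ranked, 2)
print(top_n)
```

Using head(ranked, n) instead of ranked[1:n, ] avoids rows of NA when fewer than n event types exist.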

Two bar plots show the 6 event types with the greatest fatalities and injuries.
To see which event types rank 7th or later, change the 6 at the place noted in the code above.

#Set parameters for two plots to appear side by side
par(mfrow = c(1, 2))

#Create plot for Fatalities
barplot(top_fatality$FATALITIES, names.arg = top_fatality$EVTYPE, 
        las = 2,
        main = "Event Types with Greatest Fatalities",
        ylab = "Fatalities")

#Create plot for Injuries
barplot(top_injury$INJURIES, names.arg = top_injury$EVTYPE, 
        las = 2,
        main = "Event Types with Greatest Injuries",
        ylab = "Injuries")
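With las = 2 the event names are drawn vertically, so long names such as EXCESSIVE HEAT may be clipped by the default bottom margin. The sketch below shows one possible way to make the number of bars a setting and to widen the margin. The values are copied from the top_fatality output above; n_top and the margin sizes are hypothetical choices, and pdf(NULL) is used only so the example runs without opening a window or writing a file.

```r
# Totals copied from the top_fatality output shown earlier in this report.
fatality_totals <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD",
             "HEAT", "LIGHTNING", "TSTM WIND"),
  FATALITIES = c(5633, 1903, 978, 937, 816, 504),
  stringsAsFactors = FALSE
)

n_top <- 4  # hypothetical setting: how many event types to plot
top <- head(fatality_totals[order(-fatality_totals$FATALITIES), ], n_top)

pdf(NULL)                            # null device; omit in an interactive session
old_par <- par(mar = c(9, 4, 3, 1))  # larger bottom margin for vertical labels
bar_mids <- barplot(top$FATALITIES, names.arg = top$EVTYPE,
                    las = 2,
                    main = "Event Types with Greatest Fatalities",
                    ylab = "Fatalities")
par(old_par)                         # restore the previous margins
invisible(dev.off())
```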

4: Data Processing Part 2

Harm to Economy

The process is the same as before: the data is read in once again and checked. Since we are only concerned with harm to the economy at this point, the property damage, crop damage, property damage exponent, and crop damage exponent columns are subset into another variable for inspection. The event type stays in storm_data, which the later steps use directly.

#Question 2 Which types of events have the greatest economic consequences?  
#This requires us to focus on EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP
storm_data <- read.csv("repdata_data_StormData.csv")

sub_data2 <- c("PROPDMG", "CROPDMG", "PROPDMGEXP", "CROPDMGEXP")

harmful_data2 <- storm_data[sub_data2]

head(harmful_data2)
##   PROPDMG CROPDMG PROPDMGEXP CROPDMGEXP
## 1    25.0       0          K           
## 2     2.5       0          K           
## 3    25.0       0          K           
## 4     2.5       0          K           
## 5     2.5       0          K           
## 6     2.5       0          K

There are multiple symbols in the PROPDMGEXP and CROPDMGEXP columns, so we convert them to numerical multipliers for later use.

#Figure out what values/symbols are in PROPDMGEXP & CROPDMGEXP
unique(harmful_data2$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
unique(storm_data$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M
#Change to all capital letters due to different cases
storm_data$PROPDMGEXP <- toupper(storm_data$PROPDMGEXP)
storm_data$CROPDMGEXP <- toupper(storm_data$CROPDMGEXP)

#See if desired results occur
unique(storm_data$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "+" "0" "5" "6" "?" "4" "2" "3" "H" "7" "-" "1" "8"
unique(storm_data$CROPDMGEXP)
## [1] ""  "M" "K" "B" "?" "0" "2"
#Change all symbols to numbers (to figure out what each symbol should really be, look at https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf)
storm_data[storm_data$PROPDMGEXP %in% c("?", "-", "+"), "PROPDMGEXP"] <- "0"
storm_data[storm_data$CROPDMGEXP %in% c("?"), "CROPDMGEXP"] <- "0"

#Change the property exponents first; symbols with no assignment below ("0" and "8") keep an NA multiplier
storm_data$PROPEXP[storm_data$PROPDMGEXP == "B"] <- 1e+09
storm_data$PROPEXP[storm_data$PROPDMGEXP == "7"] <- 1e+07
storm_data$PROPEXP[storm_data$PROPDMGEXP == "M"] <- 1e+06
storm_data$PROPEXP[storm_data$PROPDMGEXP == "6"] <- 1e+06
storm_data$PROPEXP[storm_data$PROPDMGEXP == "5"] <- 1e+05
storm_data$PROPEXP[storm_data$PROPDMGEXP == "4"] <- 1e+04
storm_data$PROPEXP[storm_data$PROPDMGEXP == "3"] <- 1000
storm_data$PROPEXP[storm_data$PROPDMGEXP == "K"] <- 1000
storm_data$PROPEXP[storm_data$PROPDMGEXP == "H"] <- 100
storm_data$PROPEXP[storm_data$PROPDMGEXP == "2"] <- 100
storm_data$PROPEXP[storm_data$PROPDMGEXP == "1"] <- 10
storm_data$PROPEXP[storm_data$PROPDMGEXP == ""] <- 1

#Now change the crop exponents; "0" has no assignment below and keeps an NA multiplier
storm_data$CROPEXP[storm_data$CROPDMGEXP == "B"] <- 1e+09
storm_data$CROPEXP[storm_data$CROPDMGEXP == "M"] <- 1e+06
storm_data$CROPEXP[storm_data$CROPDMGEXP == "K"] <- 1000
storm_data$CROPEXP[storm_data$CROPDMGEXP == "2"] <- 100
storm_data$CROPEXP[storm_data$CROPDMGEXP == ""] <- 1
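The line-by-line assignments above can also be expressed as a single function, which makes it harder to leave a symbol unmapped. This is only a sketch and not the code that produced the results in this report: exp_to_multiplier is a hypothetical helper, and it assumes that a digit d means 10^d (consistent with the 1 through 7 assignments above), that "?", "-", and "+" are treated like "0", and that a blank means a multiplier of 1. Because it also maps "0" and "8", its totals could differ slightly from those shown here.

```r
# Hypothetical helper: convert exponent symbols to numeric multipliers.
exp_to_multiplier <- function(sym) {
  sym <- toupper(as.character(sym))
  letters_lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- rep(NA_real_, length(sym))

  # Assumption: a single digit d stands for 10^d.
  is_digit <- grepl("^[0-9]$", sym)
  mult[is_digit] <- 10^as.numeric(sym[is_digit])

  # Letter codes: hundreds, thousands, millions, billions.
  is_letter <- sym %in% names(letters_lookup)
  mult[is_letter] <- letters_lookup[sym[is_letter]]

  # Assumption: blank and "?", "-", "+" leave the damage figure unchanged.
  mult[sym %in% c("", "?", "-", "+")] <- 1
  mult
}

print(exp_to_multiplier(c("k", "M", "", "5", "?", "B", "h", "0")))
```

Under these assumptions it could be applied as storm_data$PROPEXP <- exp_to_multiplier(storm_data$PROPDMGEXP), and likewise for the crop column.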

Each damage figure is converted to dollars by multiplying it by its multiplier: property damage by PROPEXP and crop damage by CROPEXP. The dollar amounts are then summed by event type, stored in new variables, and ordered as before.

#Store property damage in dollars in a new column
storm_data$total_property <- storm_data$PROPDMG * storm_data$PROPEXP 

#Store crop damage in dollars in a new column
storm_data$total_crop <- storm_data$CROPDMG * storm_data$CROPEXP

#Sum of property damage and store in variable
property_damage <- aggregate(total_property ~ EVTYPE, storm_data, FUN = sum)

#Sum of crop damage and store in variable
crop_damage <- aggregate(total_crop ~ EVTYPE, storm_data, FUN = sum)

#Order property damage by evtype in descending order
top_property <- property_damage[order(-property_damage$total_property), ][1:6, ]

#Check to see if data is descending correctly
head(top_property)
##                EVTYPE total_property
## 169             FLOOD   144657709807
## 409 HURRICANE/TYPHOON    69305840000
## 832           TORNADO    56947380483
## 668       STORM SURGE    43323536000
## 152       FLASH FLOOD    16822673717
## 242              HAIL    15735267277
#Order crop damage by evtype in descending order
top_crop <- crop_damage[order(-crop_damage$total_crop), ][1:6, ]

#Check to see if data is descending correctly
head(top_crop)
##          EVTYPE  total_crop
## 95      DROUGHT 13972566000
## 169       FLOOD  5661968450
## 588 RIVER FLOOD  5029459000
## 425   ICE STORM  5022113500
## 243        HAIL  3025954453
## 400   HURRICANE  2741910000
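As a worked check of the arithmetic, the first row of the data has PROPDMG = 25.0 with exponent K, which gives 25.0 * 1000 = 25000 dollars. The sketch below repeats this on made-up rows and also shows that aggregate() with a formula drops rows whose value is NA by default (na.action = na.omit), which is what happens to rows whose exponent symbol received no multiplier. The toy data frame is an assumption for illustration.

```r
# Made-up rows; the NA multiplier mimics a symbol with no assignment.
toy <- data.frame(
  EVTYPE  = c("TORNADO", "TORNADO", "FLOOD", "FLOOD"),
  PROPDMG = c(25.0, 2.5, 10, 3),
  PROPEXP = c(1000, 1000, 1e6, NA),
  stringsAsFactors = FALSE
)

# Damage in dollars: 25000, 2500, 1e7, and NA.
toy$total_property <- toy$PROPDMG * toy$PROPEXP

# Rows with an NA total are omitted before summing.
by_type <- aggregate(total_property ~ EVTYPE, toy, FUN = sum)
print(by_type)

# Count how many rows were left out of the sums.
n_dropped <- sum(is.na(toy$total_property))
print(n_dropped)
```

Running sum(is.na(storm_data$total_property)) on the real data would show how many events are excluded from the property totals.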

Two bar plots show the 6 event types with the greatest harm to the economy, separated into property and crop damage.
To see which event types rank 7th or later, change the 6 in the lines that create top_property and top_crop.

#Create one figure with two plots
par(mfrow = c(1, 2))

#Create plot for property damage
barplot(top_property$total_property/(10^9), names.arg = top_property$EVTYPE, 
        las = 2,
        main = "EVTypes with Greatest Property Dmg.",
        ylab = "Damage Cost in Billions")

#Create plot for crop damage
barplot(top_crop$total_crop/(10^9), names.arg = top_crop$EVTYPE, 
        las = 2,
        main = "EVTypes with Greatest Crop Dmg.",
        ylab = "Damage Cost in Billions")

5: Results

Question 1: Which events are most harmful to health? The event type with the most fatalities was the tornado, as shown in the first figure.
Similarly, the event type with the most injuries was also the tornado, as shown in the first figure.

Question 2: Which events are most harmful to the economy? The event type that caused the most property damage was the flood, as shown in the second figure. On the other hand, the event type that caused the most crop damage was the drought, as shown in the second figure.