Synopsis:

Severe weather events cause both public health and economic problems for communities and their residents. Many severe events have resulted in fatalities, injuries, or property damage. Preventing such outcomes is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

This analysis examines the effect of weather events on population health and on property and crops. Bar plots show the weather events with the highest fatalities and injuries (top 8) and the highest property and crop damage (top 7). The results identify tornados as the most harmful weather event type in terms of fatalities and injuries. Flooding and hurricanes cause the greatest property damage, while drought, flooding, and ice storms have the greatest financial impact on crops.

Data Processing:

Load the data

Download the compressed dataset from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 to the working directory, then read it into R as a data frame named ‘data’.

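download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", 
              destfile = "StormData.csv.bz2", mode = "wb")  # binary mode keeps the archive intact on Windows
# read.csv() decompresses the bzip2 file transparently and returns a data frame
data <- read.csv("StormData.csv.bz2", header = TRUE, na.strings = "NA")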
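
The download above runs every time the report is rebuilt. One way to avoid repeated transfers is a small wrapper that downloads only when no local copy exists. This is a minimal sketch: the helper name `fetch_once` and the local file name in the usage comment are assumptions for illustration, not part of the original analysis.

```r
# Hypothetical helper (not in the original analysis): download a file only
# when no local copy exists, so repeated runs skip the network transfer.
fetch_once <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" writes the bzip2 archive in binary mode
    download.file(url, destfile = destfile, mode = "wb")
  }
  invisible(destfile)
}

# Usage (commented out because it needs network access):
# path <- fetch_once(
#   "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#   "StormData.csv.bz2")
# data <- read.csv(path)
```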

Clean the data

According to the documentation for this analysis we will only require the following columns: “EVTYPE”, “FATALITIES”, “INJURIES”, “PROPDMG”, “PROPDMGEXP”, “CROPDMG”, “CROPDMGEXP”. We will store this subset of data in a data frame named ‘data1’.

fields <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", 
           "CROPDMGEXP")
data1 <- data[fields]

Clean up column EVTYPE

The EVTYPE field mixes upper and lower case, which will throw off our analysis if not standardized. We will convert all values to upper case so that case variants of the same event type are counted together.

length(unique(data1$EVTYPE))
## [1] 985
data1$EVTYPE.UP <- toupper(data1$EVTYPE)
length(unique(data1$EVTYPE.UP))
## [1] 898
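
Upper-casing removes case variants, but labels that differ only in whitespace would still count as distinct. The sketch below shows one possible extra normalization step on invented labels. The function name `normalize_evtype` and the toy values are assumptions for illustration and are not taken from the NOAA file.

```r
# Toy labels, invented for illustration (not taken from the NOAA file)
labels <- c("Tornado", "TORNADO", " tornado ", "Flash  Flood", "FLASH FLOOD")

# Hypothetical helper: unify case and whitespace in event-type labels
normalize_evtype <- function(x) {
  x <- toupper(x)                # unify case, as in the step above
  x <- trimws(x)                 # drop leading and trailing whitespace
  gsub("[[:space:]]+", " ", x)   # collapse runs of whitespace to one space
}

clean <- normalize_evtype(labels)
length(unique(labels))  # 5 distinct raw labels
length(unique(clean))   # 2 distinct labels after normalization
```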

Determine Property Damage

First, let’s explore the property damage data.

# Explore the property damage data
unique(data1$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M

The ‘TOTALPD’ field holds the multiplier implied by each event’s property damage exponent (‘PROPDMGEXP’). Rows with an invalid exponent (‘-’, ‘?’, or ‘+’) are assigned a multiplier of ‘0’. The final field ‘CALTOTALPD’ is the calculated property damage in dollars: the property damage value (‘PROPDMG’) multiplied by that multiplier.

# Assign '0' to invalid data
data1$TOTALPD[data1$PROPDMGEXP == "-"] <- 0
data1$TOTALPD[data1$PROPDMGEXP == "?"] <- 0
data1$TOTALPD[data1$PROPDMGEXP == "+"] <- 0

# Assign multipliers to the property data
# ("0" is treated as 1, matching the crop exponent handling below)
data1$TOTALPD[data1$PROPDMGEXP %in% c("", "0")] <- 1
data1$TOTALPD[data1$PROPDMGEXP %in% c("1")] <- 10^1
data1$TOTALPD[data1$PROPDMGEXP %in% c("H", "h", "2")] <- 10^2
data1$TOTALPD[data1$PROPDMGEXP %in% c("K", "3")] <- 10^3
data1$TOTALPD[data1$PROPDMGEXP %in% c("4")] <- 10^4
data1$TOTALPD[data1$PROPDMGEXP %in% c("5")] <- 10^5
data1$TOTALPD[data1$PROPDMGEXP %in% c("m", "M", "6")] <- 10^6
data1$TOTALPD[data1$PROPDMGEXP %in% c("7")] <- 10^7
data1$TOTALPD[data1$PROPDMGEXP %in% c("8")] <- 10^8
data1$TOTALPD[data1$PROPDMGEXP %in% c("B", "9")] <- 10^9

# Calculating the property damage value
data1$CALTOTALPD <- data1$PROPDMG * data1$TOTALPD
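
The exponent-to-multiplier mapping can also be written as a single lookup table, which keeps all codes in one place. The following is a sketch on invented rows rather than the author’s method. The names `exp_multiplier` and `to_multiplier` are hypothetical, and treating ‘0’ and a blank exponent as a multiplier of 1 is an assumption.

```r
# Named vector mapping exponent codes to multipliers. The names are
# hypothetical; treating "0" as a multiplier of 1 is an assumption.
exp_multiplier <- c(
  "-" = 0, "?" = 0, "+" = 0,
  "0" = 1, "1" = 1e1, "2" = 1e2, "H" = 1e2,
  "3" = 1e3, "K" = 1e3, "4" = 1e4, "5" = 1e5,
  "6" = 1e6, "M" = 1e6, "7" = 1e7, "8" = 1e8, "B" = 1e9
)

to_multiplier <- function(code) {
  code <- toupper(as.character(code))   # as.character() also handles factors
  out <- unname(exp_multiplier[code])   # unknown codes give NA
  out[code == ""] <- 1                  # blank exponent is treated as 1 (assumption)
  out
}

# Toy rows, invented for illustration
toy <- data.frame(PROPDMG = c(25, 2.5, 10, 3),
                  PROPDMGEXP = c("K", "m", "", "?"))
toy$CALTOTALPD <- toy$PROPDMG * to_multiplier(toy$PROPDMGEXP)
toy$CALTOTALPD  # 25000, 2500000, 10, 0
```

Unknown codes come back as NA instead of being silently skipped, which makes unmapped exponents easy to count with `sum(is.na(...))`.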

Determine Crop Damage

Next, let’s explore the crop damage data.

# Explore crop damage data
unique(data1$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M

The ‘TOTALCD’ field holds the multiplier implied by each event’s crop damage exponent (‘CROPDMGEXP’). Rows with an invalid exponent (‘?’) are assigned a multiplier of ‘0’. The final field ‘CALTOTALCP’ is the calculated crop damage in dollars: the crop damage value (‘CROPDMG’) multiplied by that multiplier.

# Assign values for the crop exponent data 
data1$TOTALCD[data1$CROPDMGEXP %in% c("0", "")] <- 1
data1$TOTALCD[data1$CROPDMGEXP == "2"] <- 10^2
data1$TOTALCD[data1$CROPDMGEXP %in% c("K", "k")] <- 10^3
data1$TOTALCD[data1$CROPDMGEXP %in% c("M", "m")] <- 10^6
data1$TOTALCD[data1$CROPDMGEXP == "B"] <- 10^9

# Assign '0' to invalid data
data1$TOTALCD[data1$CROPDMGEXP == "?"] <- 0

# Calculate the crop damage value
data1$CALTOTALCP <- data1$CROPDMG * data1$TOTALCD

Aggregate the total property and crop damage by event type

Next we aggregate the fatalities and injuries (individually) by the event type. In addition we aggregate the calculated total property and crop damage by event type.

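# Calculate the totals by event type, grouping on the upper-cased labels
# (EVTYPE.UP) so that case variants of an event are counted together
grouped <- transform(data1, EVTYPE = EVTYPE.UP)
fatality <- aggregate(FATALITIES ~ EVTYPE, grouped, FUN = sum)
injury <- aggregate(INJURIES ~ EVTYPE, grouped, FUN = sum)
pd <- aggregate(CALTOTALPD ~ EVTYPE, grouped, FUN = sum)
cd <- aggregate(CALTOTALCP ~ EVTYPE, grouped, FUN = sum)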
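
One detail of the formula interface of aggregate() is worth checking at this step: rows with a missing value in the summed column are dropped before the function is applied. If any exponent code was left without a multiplier, the calculated damage for that row is NA and it disappears from the totals without a warning. A small sketch on invented data (the column name `DAMAGE` is hypothetical):

```r
# Toy data, invented for illustration
toy <- data.frame(EVTYPE = c("FLOOD", "FLOOD", "HAIL"),
                  DAMAGE = c(100, NA, 50))

# The formula interface uses na.action = na.omit by default, so the NA row
# is dropped before summing: FLOOD totals 100, HAIL totals 50.
totals <- aggregate(DAMAGE ~ EVTYPE, toy, FUN = sum)
totals

# Counting NA values first makes the size of the omission visible.
n_missing <- sum(is.na(toy$DAMAGE))
n_missing  # 1
```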

Plot the events with highest fatalities and highest injuries

Here we find the 8 events with the highest fatalities and the 8 events with the highest injuries. First, each set of totals is sorted in descending order and the top 8 rows are kept. The two lists are then merged and displayed together in a stacked bar graph. Note that merge() keeps only the events that appear in both lists, so the plot may show fewer than 8 events.

# Listing  events with highest fatalities
fatal <- fatality[order(-fatality$FATALITIES), ][1:8, ]
# Listing events with highest injuries
injury1 <- injury[order(-injury$INJURIES), ][1:8, ]

# Keep the events that appear in both top-8 lists, ordered by fatalities
totalloss <- merge(fatal, injury1)
totalloss <- totalloss[order(-totalloss$FATALITIES),]
rownames(totalloss) <- totalloss$EVTYPE
barplot(t(as.matrix(totalloss[,2:3])), las = 2, names.arg=totalloss$EVTYPE, col = c("red", "yellow"), 
        main = "Total Fatalities & Injuries \nfrom Severe Weather Events")
legend("topright", legend = c("Injuries", "Fatalities"), fill = c("yellow", "red"))
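
By default merge() performs an inner join on the shared EVTYPE column, so an event that ranks in only one of the two lists is left out of the plot. The sketch below uses invented top-3 lists to show the difference between the default and `all = TRUE`. The data frame names and values are hypothetical.

```r
# Toy top-3 lists, invented for illustration
fatal_top  <- data.frame(EVTYPE = c("TORNADO", "HEAT", "RIP CURRENT"),
                         FATALITIES = c(50, 30, 10))
injury_top <- data.frame(EVTYPE = c("TORNADO", "HEAT", "ICE STORM"),
                         INJURIES = c(900, 200, 150))

inner <- merge(fatal_top, injury_top)              # only events in both lists
outer <- merge(fatal_top, injury_top, all = TRUE)  # all events, NA where missing

nrow(inner)  # 2
nrow(outer)  # 4
```

With `all = TRUE` the missing counts appear as NA and would need to be filled in (for example from the full totals) before plotting.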

Plot the events with highest Property damage and highest crop damage

Finally, I’ve sorted the total property and crop damage datasets in descending order and kept the top 7 of each. I’ve then graphed these two datasets side by side using the barplot function.

# Find highest property damage
pd1 <- pd[order(-pd$CALTOTALPD), ][1:7, ]

# Find highest crop damage
cd1 <- cd[order(-cd$CALTOTALCP), ][1:7, ]

par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(pd1$CALTOTALPD/(10^9), las = 3, names.arg = pd1$EVTYPE, 
        main = "Highest Property Damages", ylab = "Damage Cost ($ billions)", 
        col = "blue")
barplot(cd1$CALTOTALCP/(10^9), las = 3, names.arg = cd1$EVTYPE, 
        main = "Highest Crop Damages", ylab = "Damage Cost ($ billions)", 
        col = "green")

Results:

Across the United States, tornados, excessive heat, lightning, thunderstorms, and floods are the most harmful to population health. Tornados and excessive heat account for the majority of fatalities across the nation.

In terms of economic consequences, flooding and hurricanes have the greatest impact on property damage, while drought, flooding, and ice storms have the greatest financial impact on crops.