This R program sorts through the National Weather Service's storm data to find which types of events cause the most damage to people and to property.
According to this analysis of the National Weather Service's data, heavy surf and wind causes the most damage when both the human and the financial tolls are considered. The next-largest human toll comes from flood/rain/winds, while the next-largest financial toll comes from microburst winds.
First, we can download the bzip2-compressed (".bz2") file and read it with
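# setInternet2(TRUE)  # Windows-only; only needed on older versions of R
f <- tempfile()
# mode = "wb" keeps the compressed file intact on Windows
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
    f, mode = "wb")
# stringsAsFactors = TRUE reproduces the factor columns shown below (the default changed in R 4.0.0)
srcData <- read.csv(bzfile(f), stringsAsFactors = TRUE)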
Let us look at what we have here with str(), the 'structure' function.
str(srcData)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
We can see some curious variables, such as the factor columns PROPDMGEXP and CROPDMGEXP.
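The PROPDMGEXP and CROPDMGEXP columns, for instance, sit beside the numeric damage columns PROPDMG and CROPDMG. The NWS storm data documentation describes alphabetical codes of this kind as magnitudes ("K" for thousands, "M" for millions, "B" for billions). The sketch below only illustrates, on made-up rows, how such codes could be turned into multipliers. The helper `exp_to_multiplier` and the choice to treat every other code as 1 are assumptions, and the rest of this analysis does not use them.

```r
# Hypothetical helper: map a damage-magnitude code to a numeric multiplier.
# Assumption: any code other than K, M or B (in either case) counts as 1.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1   # unmatched codes (including "") fall back to 1
  unname(mult)
}

# Made-up rows for illustration only.
toy <- data.frame(PROPDMG = c(25, 2.5, 1), PROPDMGEXP = c("K", "M", ""))
toy$PROPDMG_DOLLARS <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
toy
```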
Next, we can condense the data about the cost to humans. I am going to brashly assume that fatalities are 10 times as bad as injuries.
augmented_data <- cbind(srcData, 10 * srcData$FATALITIES + srcData$INJURIES)
Similarly, we can set up a column to focus on the financial implications.
augmented_data <- cbind(augmented_data, srcData$PROPDMG + srcData$CROPDMG)
colnames(augmented_data) <- c(colnames(srcData), "HUMANTOLL", "FINANCIALTOLL")
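To make the two derived columns concrete, here is a small sketch on made-up rows (the numbers are illustrative and are not taken from the dataset). It assigns the columns by name instead of using cbind() and colnames(), which is just another spelling of the same arithmetic. The name `fatality_weight` is hypothetical; its value of 10 is the assumption stated earlier.

```r
# Made-up rows for illustration only.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO"),
  FATALITIES = c(1, 0, 2),
  INJURIES   = c(14, 3, 0),
  PROPDMG    = c(25, 2.5, 10),
  CROPDMG    = c(0, 5, 1)
)

fatality_weight <- 10  # one fatality is weighted as 10 injuries

# Same formulas as in the analysis, assigned directly as named columns.
toy$HUMANTOLL     <- fatality_weight * toy$FATALITIES + toy$INJURIES
toy$FINANCIALTOLL <- toy$PROPDMG + toy$CROPDMG
toy
```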
We will now add up the data according to the different types of storm events.
human_toll_by_event <- by(augmented_data$HUMANTOLL, augmented_data$EVTYPE, sum)
financial_toll_by_event <- by(augmented_data$FINANCIALTOLL, augmented_data$EVTYPE,
sum)
smaller_data_frame <- data.frame(unique(srcData$EVTYPE), as.numeric(human_toll_by_event))
smaller_data_frame <- cbind(smaller_data_frame, as.numeric(financial_toll_by_event))
colnames(smaller_data_frame) <- c("EVTYPE", "HUMANTOLL", "FINANCIALTOLL")
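One thing worth checking at this step is that each label is paired with its own sum. by() returns its results in the order of the factor levels (sorted order for character data), whereas unique() returns values in order of first appearance, so the two orders need not agree. The sketch below, on made-up rows, uses tapply() and takes the labels from names() of the result so that they stay attached to their totals. The variable names are hypothetical.

```r
# Made-up rows; note that "TORNADO" appears first but sorts after "FLOOD".
toy <- data.frame(
  EVTYPE    = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  HUMANTOLL = c(24, 3, 20, 1)
)

# tapply() returns one named sum per event type, in level (sorted) order.
toll_by_event <- tapply(toy$HUMANTOLL, toy$EVTYPE, sum)

# Taking the labels from names() keeps each label next to its own sum.
aligned <- data.frame(
  EVTYPE    = names(toll_by_event),
  HUMANTOLL = as.numeric(toll_by_event)
)
aligned

# For comparison: first-appearance order versus level order.
unique(toy$EVTYPE)
names(toll_by_event)
```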
After sorting this smaller data frame, we will plot only the most significant events (rather than all 985 of them) that affect humans directly.
smaller_data_frame <- smaller_data_frame[order(-smaller_data_frame$HUMANTOLL),
]
plot(smaller_data_frame[1:5, 2], main = "Weather vs. Man", xaxt = "n", xlab = "Weather Event",
ylab = "Human Toll")
axis(side = 1, at = 1:5, labels = as.character(smaller_data_frame[1:5, 1]))
as.character(smaller_data_frame[1:5, 1])
## [1] "Heavy surf and wind" "FLOOD/RAIN/WINDS" "RECORD COLD/FROST"
## [4] "ABNORMAL WARMTH" "MICROBURST WINDS"
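Because the x-axis here is categorical, a bar chart is another way to show a top five. The following is a sketch on made-up totals, assuming a data frame shaped like smaller_data_frame; using head() after order() is an alternative to indexing rows 1:5.

```r
# Made-up totals for illustration only.
toy <- data.frame(
  EVTYPE    = c("A", "B", "C", "D", "E", "F"),
  HUMANTOLL = c(5, 40, 12, 0, 33, 8)
)

# Sort descending by toll and keep the five largest rows.
top5 <- head(toy[order(-toy$HUMANTOLL), ], 5)

# barplot() labels each bar via names.arg; las = 2 rotates long labels.
barplot(top5$HUMANTOLL, names.arg = as.character(top5$EVTYPE),
        main = "Weather vs. Man", xlab = "Weather Event", ylab = "Human Toll",
        las = 2)
```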
Finally, we can look at which weather events affect us the most financially.
smaller_data_frame <- smaller_data_frame[order(-smaller_data_frame$FINANCIALTOLL),
]
# Column 3 is FINANCIALTOLL (column 2 is HUMANTOLL)
plot(smaller_data_frame[1:5, 3], main = "Weather vs. Money", xaxt = "n", xlab = "Weather Event",
ylab = "Financial Toll")
axis(side = 1, at = 1:5, labels = as.character(smaller_data_frame[1:5, 1]))
as.character(smaller_data_frame[1:5, 1])
## [1] "Heavy surf and wind" "MICROBURST WINDS" "ABNORMAL WARMTH"
## [4] "COLD WAVE" "HIGH WINDS 73"