The following code chunk performs:
- Download the file from the URL provided by the instructor.
- Read the file and load it as a data table (dtStorm).
- Display its structure.
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
"stormDB.csv")
dtStorm = as.data.table(read.csv("stormDB.csv"))
str(dtStorm)
## Classes 'data.table' and 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
## - attr(*, ".internal.selfref")=<externalptr>
This report focuses on seven variables (EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG,and CROPDMGEXP) in order to explore the dataset to address the questions stated in the earlier section of the report. For processing efficiencies:
- A data table (dtNet) hosting these variables is created.
- The Event Type (EVTYPE) values are tidied by being ‘standardized’. Slight variations noticed in values get coerced to permitted event type names.
- The dtNet table gets split to two:
- Fatalities and injuries
- Economic impact
- Property or crop damage estimates were recorded along with 3 different dollar magnitude units (B for billions, M for millions and K for thousands). They are normalized in thousands.
- Property and crop damage estimates are combined for a total estimate per event type.
- The “K” was assumed for the estimates not associated with a proper dollar magnitude units.
- Missing values are filtered out from property or crop damage estimate variables.
dtNet = dtStorm %>% select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
dtNet = dtNet %>% mutate(EVTYPE=toupper(EVTYPE))
# clean up the EVTYPE values
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^COLD",EVTYPE),"COLD",NA))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^(EXCESSIVE|EXTREME) HEAT",EVTYPE),
"EXCESSIVE HEAT",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^EXTREME (COLD|WINDCHILL)",EVTYPE),
"EXTREME COLD-WINDCHILL",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("(FREEZE|FROST)",EVTYPE),"FROST-FREEZE",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^FLOOD",EVTYPE),"FLOOD",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^FLASH FLOOD",EVTYPE),"FLASH FLOOD",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl(".FLASH FLOOD",EVTYPE),"FLASH FLOOD",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^HAIL",EVTYPE),"HAIL",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^HEAT",EVTYPE),"HEAT",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^HEAVY RAIN",EVTYPE),"HEAVY RAIN",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^HEAVY SNOW",EVTYPE),"HEAVY SNOW",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^(HEAVY|HIGH) SURF",EVTYPE),"HIGH SURF",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^HIGH WIND",EVTYPE),"HIGH WIND",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^HURRICANE",EVTYPE),"HURRICANE",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^LAKE.EFFECT",EVTYPE),"LAKE-EFFECT SNOW",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^LANDSLIDE",EVTYPE),"LANDSLIDE",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^LIGHTNING",EVTYPE),"LIGHTNING",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^(MAJOR|RIVER) FLOOD",EVTYPE),"FLOOD",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^RIP CURRENT",EVTYPE),"RIP CURRENT",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^STORM SURGE",EVTYPE),"STORM TIDE",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^STRONG WIND",EVTYPE),"STRONG WIND",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^TROPICAL STORM",EVTYPE),"TROPICAL STORM",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^WATERSPOUT",EVTYPE),"WATERSPOUT",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^WILD*.+FIRE",EVTYPE),"WILDFIRE",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^(THUNDERSTORM|TSTM)",EVTYPE),
"THUNDERSTORM",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("THUNDERSTORM$",EVTYPE),
"THUNDERSTORM",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^TORNADO",EVTYPE),"TORNADO",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^WINTER STORM",EVTYPE),"WINTER STORM",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^WINTER WEATHER",EVTYPE),"WINTER WEATHER",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(is.na(EVTYPEtidy),EVTYPE,EVTYPEtidy))
# set up a data table for population health impact
dtPHI = dtNet %>% filter((FATALITIES >0 & !is.na(FATALITIES)) | (INJURIES >0 & !is.na(INJURIES)))
dtPHI = dtPHI %>% select(EVTYPEtidy, FATALITIES, INJURIES)
# set up a data table for economic damages
dmgUnit = c("B","b","M","m","K","k")
Bil = 1000000
Mil = 1000
dtEI = dtNet %>% filter((PROPDMG >0 & !is.na(PROPDMG) & !is.na(PROPDMGEXP) & PROPDMGEXP %in% dmgUnit)|
(CROPDMG >0 & !is.na(CROPDMG) & !is.na(CROPDMGEXP) & CROPDMGEXP %in% dmgUnit))
dtEI = dtEI %>% mutate (PDTOT = paste(PROPDMG,PROPDMGEXP,sep=""),
CDTOT = paste(CROPDMG,CROPDMGEXP,sep=""))
dtEI = dtEI %>% mutate (damageTotal = ifelse(PROPDMGEXP %in% c("B","b"),PROPDMG*Bil,
ifelse(PROPDMGEXP %in% c("M","m"),PROPDMG*Mil,PROPDMG)) +
ifelse(CROPDMGEXP %in% c("B","b"),CROPDMG*Bil,
ifelse(CROPDMGEXP %in% c("M","m"),CROPDMG*Mil,CROPDMG))
)
dtEI = dtEI %>% select(EVTYPEtidy, PDTOT, CDTOT, damageTotal)