Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.
Your data analysis must address the following questions:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
Consider writing your report as if it were to be read by a government or municipal manager who might be responsible for preparing for severe weather events and will need to prioritize resources for different types of events. However, there is no need to make any specific recommendations in your report.
The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site: Storm data.
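The report reads the archive from the working directory. To make a fresh run self-contained, a small helper can download the file only when it is missing. This is a sketch. `fetch_if_missing` and `storm_url` are hypothetical names, and `storm_url` must be set to the "Storm data" link on the course page, which is not reproduced here.

```r
# Hypothetical helper: download `url` to `destfile` only if the file is not already present.
fetch_if_missing <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the binary .bz2 archive intact on Windows
    download.file(url, destfile, mode = "wb")
  }
  invisible(destfile)
}

# Usage (storm_url is a placeholder for the course's "Storm data" link):
# fetch_if_missing(storm_url, "repdata_data_StormData.csv.bz2")
```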
library(dplyr)
library(tidyr)
library(ggplot2)
library(gridExtra)
# setwd("C:/Users/ASUS/Documents/COURSERA/Reproducible Research/Project2")  # author's local path; set the working directory to the folder that holds the .bz2 file
# Reading data
StormData <- read.csv(bzfile("repdata_data_StormData.csv.bz2"))
summary(StormData)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE
## Min. : 1.0 Length:902297 Length:902297 Length:902297
## 1st Qu.:19.0 Class :character Class :character Class :character
## Median :30.0 Mode :character Mode :character Mode :character
## Mean :31.2
## 3rd Qu.:45.0
## Max. :95.0
##
## COUNTY COUNTYNAME STATE EVTYPE
## Min. : 0.0 Length:902297 Length:902297 Length:902297
## 1st Qu.: 31.0 Class :character Class :character Class :character
## Median : 75.0 Mode :character Mode :character Mode :character
## Mean :100.6
## 3rd Qu.:131.0
## Max. :873.0
##
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE
## Min. : 0.000 Length:902297 Length:902297 Length:902297
## 1st Qu.: 0.000 Class :character Class :character Class :character
## Median : 0.000 Mode :character Mode :character Mode :character
## Mean : 1.484
## 3rd Qu.: 1.000
## Max. :3749.000
##
## END_TIME COUNTY_END COUNTYENDN END_RANGE
## Length:902297 Min. :0 Mode:logical Min. : 0.0000
## Class :character 1st Qu.:0 NA's:902297 1st Qu.: 0.0000
## Mode :character Median :0 Median : 0.0000
## Mean :0 Mean : 0.9862
## 3rd Qu.:0 3rd Qu.: 0.0000
## Max. :0 Max. :925.0000
##
## END_AZI END_LOCATI LENGTH WIDTH
## Length:902297 Length:902297 Min. : 0.0000 Min. : 0.000
## Class :character Class :character 1st Qu.: 0.0000 1st Qu.: 0.000
## Mode :character Mode :character Median : 0.0000 Median : 0.000
## Mean : 0.2301 Mean : 7.503
## 3rd Qu.: 0.0000 3rd Qu.: 0.000
## Max. :2315.0000 Max. :4400.000
##
## F MAG FATALITIES INJURIES
## Min. :0.0 Min. : 0.0 Min. : 0.0000 Min. : 0.0000
## 1st Qu.:0.0 1st Qu.: 0.0 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median :1.0 Median : 50.0 Median : 0.0000 Median : 0.0000
## Mean :0.9 Mean : 46.9 Mean : 0.0168 Mean : 0.1557
## 3rd Qu.:1.0 3rd Qu.: 75.0 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Max. :5.0 Max. :22000.0 Max. :583.0000 Max. :1700.0000
## NA's :843563
## PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## Min. : 0.00 Length:902297 Min. : 0.000 Length:902297
## 1st Qu.: 0.00 Class :character 1st Qu.: 0.000 Class :character
## Median : 0.00 Mode :character Median : 0.000 Mode :character
## Mean : 12.06 Mean : 1.527
## 3rd Qu.: 0.50 3rd Qu.: 0.000
## Max. :5000.00 Max. :990.000
##
## WFO STATEOFFIC ZONENAMES LATITUDE
## Length:902297 Length:902297 Length:902297 Min. : 0
## Class :character Class :character Class :character 1st Qu.:2802
## Mode :character Mode :character Mode :character Median :3540
## Mean :2875
## 3rd Qu.:4019
## Max. :9706
## NA's :47
## LONGITUDE LATITUDE_E LONGITUDE_ REMARKS
## Min. :-14451 Min. : 0 Min. :-14455 Length:902297
## 1st Qu.: 7247 1st Qu.: 0 1st Qu.: 0 Class :character
## Median : 8707 Median : 0 Median : 0 Mode :character
## Mean : 6940 Mean :1452 Mean : 3509
## 3rd Qu.: 9605 3rd Qu.:3549 3rd Qu.: 8735
## Max. : 17124 Max. :9706 Max. :106220
## NA's :40
## REFNUM
## Min. : 1
## 1st Qu.:225575
## Median :451149
## Mean :451149
## 3rd Qu.:676723
## Max. :902297
##
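According to summary(), NA values occur only in the numeric columns F (843,563), LATITUDE (47) and LATITUDE_E (40), and in COUNTYENDN, which is entirely NA. summary() does not count missing values in character columns such as BGN_AZI, END_DATE and END_TIME; read.csv stores any gaps there as empty strings rather than NA.
Only the event type, health, and damage variables are needed, so the data set is reduced to these columns: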
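Those counts can be checked directly. The sketch below (the function name `missing_summary` is hypothetical) reports, for each column, the number of NA values and, for character columns, the number of empty or whitespace-only strings, which summary() does not flag.

```r
# Hypothetical helper: per-column counts of NA values and blank strings.
missing_summary <- function(df) {
  n_na <- vapply(df, function(col) sum(is.na(col)), integer(1))
  n_blank <- vapply(df, function(col) {
    # Only character columns can hold "" or whitespace-only values
    if (is.character(col)) sum(!is.na(col) & trimws(col) == "") else 0L
  }, integer(1))
  out <- data.frame(column = names(df), n_na = n_na, n_blank = n_blank,
                    row.names = NULL, stringsAsFactors = FALSE)
  # Keep only columns with at least one missing or blank entry
  out[out$n_na > 0 | out$n_blank > 0, , drop = FALSE]
}

# Usage, on the full data before any columns are dropped:
# missing_summary(StormData)
```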
StormData <- StormData %>% select(EVTYPE,FATALITIES,INJURIES,PROPDMG,
CROPDMG,PROPDMGEXP,CROPDMGEXP)
unique(StormData$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(StormData$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
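PROPDMGEXP and CROPDMGEXP hold the power of ten that multiplies PROPDMG and CROPDMG, but the codes are inconsistent: letters in either case (H = hundreds, K = thousands, M = millions, B = billions), single digits, and symbols such as "+", "-", "?" or blanks. The code below converts every value to a numeric exponent: the letters become 2, 3, 6 and 9, digits are kept as exponents, and blanks and symbols are set to 0 (a multiplier of 1).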
StormData$CROPDMGEXP <- toupper(StormData$CROPDMGEXP)
StormData$CROPDMGEXP[StormData$CROPDMGEXP %in% c("", "+", "-", "?")] <- "0"
StormData$CROPDMGEXP[StormData$CROPDMGEXP %in% c("B")] <- "9"
StormData$CROPDMGEXP[StormData$CROPDMGEXP %in% c("M")] <- "6"
StormData$CROPDMGEXP[StormData$CROPDMGEXP %in% c("K")] <- "3"
StormData$CROPDMGEXP[StormData$CROPDMGEXP %in% c("H")] <- "2"
StormData$PROPDMGEXP <- toupper(StormData$PROPDMGEXP)
StormData$PROPDMGEXP[StormData$PROPDMGEXP %in% c("", "+", "-", "?")] <- "0"
StormData$PROPDMGEXP[StormData$PROPDMGEXP %in% c("B")] <- "9"
StormData$PROPDMGEXP[StormData$PROPDMGEXP %in% c("M")] <- "6"
StormData$PROPDMGEXP[StormData$PROPDMGEXP %in% c("K")] <- "3"
StormData$PROPDMGEXP[StormData$PROPDMGEXP %in% c("H")] <- "2"
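The ten assignments above can be collapsed into one lookup. The function below is a sketch (the name `exp_to_multiplier` is hypothetical) that applies the same rules: H, K, M and B map to exponents 2, 3, 6 and 9 in either case, single digits are used as exponents, and anything else gets exponent 0. It returns the multiplier 10^exponent directly.

```r
# Hypothetical helper: convert raw *DMGEXP codes to dollar multipliers,
# using the same mapping as the recoding above.
exp_to_multiplier <- function(code) {
  code <- toupper(trimws(code))
  letter_power <- c(H = 2, K = 3, M = 6, B = 9)
  power <- numeric(length(code))  # default exponent 0 (blank, "+", "-", "?")
  is_letter <- code %in% names(letter_power)
  power[is_letter] <- letter_power[code[is_letter]]
  is_digit <- !is.na(code) & grepl("^[0-9]$", code)
  power[is_digit] <- as.numeric(code[is_digit])
  10 ^ power
}

# Usage:
# StormData$PROPDMGTOTAL <- StormData$PROPDMG * exp_to_multiplier(StormData$PROPDMGEXP)
```

Because digits pass through unchanged, the helper gives the same multipliers whether it is applied to the raw codes or to the already recoded ones.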
Property and crop damage in dollars are obtained by multiplying each amount by 10 raised to its corrected exponent; their sum is the total economic damage of each record.
StormData$PROPDMGTOTAL <- StormData$PROPDMG * (10 ^ as.numeric(StormData$PROPDMGEXP))
StormData$CROPDMGTOTAL <- StormData$CROPDMG * (10 ^ as.numeric(StormData$CROPDMGEXP))
StormData$DMGTOTAL <- StormData$PROPDMGTOTAL + StormData$CROPDMGTOTAL
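Written out, the three lines above compute, for each record $i$:

```latex
\begin{aligned}
\text{PROPDMGTOTAL}_i &= \text{PROPDMG}_i \times 10^{\text{PROPDMGEXP}_i},\\
\text{CROPDMGTOTAL}_i &= \text{CROPDMG}_i \times 10^{\text{CROPDMGEXP}_i},\\
\text{DMGTOTAL}_i &= \text{PROPDMGTOTAL}_i + \text{CROPDMGTOTAL}_i .
\end{aligned}
```

For instance, a hypothetical record with PROPDMG = 2.5, PROPDMGEXP = "M" (exponent 6) and CROPDMG = 0 contributes $2.5 \times 10^{6} = 2{,}500{,}000$ dollars to DMGTOTAL.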
The 20 most frequent event types:
mis.colores.3 <- colorRampPalette(c("#ff9999", "#99ff99", "#9999ff"))
a <- StormData %>% group_by(EVTYPE) %>%
summarise(Frecuencia = n())
a <- arrange(a, desc(Frecuencia))
# reorder() sorts the bars by count; without it ggplot orders EVTYPE alphabetically
head(a, 20) %>% ggplot(aes(x = reorder(EVTYPE, Frecuencia), y = Frecuencia)) +
geom_col(width = 0.5, fill = mis.colores.3(20)) +
xlab("Event Type") +
ylab("Frequency") +
coord_flip()
Sum_events <- StormData %>%
group_by(EVTYPE) %>%
summarize(SUMFATALITIES = sum(FATALITIES),
SUMINJURIES = sum(INJURIES),
SUMPROPDMG = sum(PROPDMGTOTAL),
SUMCROPDMG = sum(CROPDMGTOTAL),
TOTALDMG = sum(DMGTOTAL))
head(Sum_events)
## # A tibble: 6 x 6
## EVTYPE SUMFATALITIES SUMINJURIES SUMPROPDMG SUMCROPDMG TOTALDMG
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 " HIGH SURF ADVISO~ 0 0 200000 0 200000
## 2 " COASTAL FLOOD" 0 0 0 0 0
## 3 " FLASH FLOOD" 0 0 50000 0 50000
## 4 " LIGHTNING" 0 0 0 0 0
## 5 " TSTM WIND" 0 0 8100000 0 8100000
## 6 " TSTM WIND (G45)" 0 0 8000 0 8000
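The first rows of Sum_events show that EVTYPE is not clean: " FLASH FLOOD" and " TSTM WIND" carry a leading space, so grouping treats them as types distinct from "FLASH FLOOD" and "TSTM WIND", and the totals above are computed on these raw labels. A minimal normalization, sketched below with the hypothetical name `normalize_evtype`, trims whitespace, uppercases, and collapses internal runs of spaces before grouping. It does not merge synonyms such as TSTM WIND and THUNDERSTORM WIND; that would need a hand-built mapping.

```r
# Hypothetical helper: canonicalize event-type labels before grouping.
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))
  gsub("\\s+", " ", x)  # collapse repeated internal whitespace
}

# Small illustration with made-up rows: the two flash-flood labels merge.
events <- data.frame(EVTYPE = c(" FLASH FLOOD", "Flash  Flood", "TORNADO"),
                     FATALITIES = c(1, 2, 5), stringsAsFactors = FALSE)
events$EVTYPE <- normalize_evtype(events$EVTYPE)
totals <- aggregate(FATALITIES ~ EVTYPE, data = events, FUN = sum)
totals  # FLASH FLOOD: 3, TORNADO: 5
```

In the report this would be applied as `StormData$EVTYPE <- normalize_evtype(StormData$EVTYPE)` before building Sum_events.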
The six event types (the head() default) with the most fatalities, the most injuries, and the greatest total damage:
SummStormDataFatality <- arrange(Sum_events, desc(SUMFATALITIES))
FatalityData <- head(SummStormDataFatality)
SummStormDataInjury <- arrange(Sum_events, desc(SUMINJURIES))
InjuryData <- head(SummStormDataInjury)
SummStormDataDamage <- arrange(Sum_events, desc(TOTALDMG))
DamageData <- head(SummStormDataDamage)
FatalityData$EVTYPE <- with(FatalityData,
reorder(EVTYPE, -SUMFATALITIES))
x <- ggplot(FatalityData, aes(EVTYPE, SUMFATALITIES,
label = SUMFATALITIES)) +
geom_bar(stat = "identity", fill = 6) +
geom_text(nudge_y = 200) +
xlab("Event Type") +
ylab("Total Fatalities") +
ggtitle("Most Fatal Events") +
theme(plot.title = element_text(hjust = 0.5))
InjuryData$EVTYPE <- with(InjuryData,
reorder(EVTYPE, -SUMINJURIES))
y <- ggplot(InjuryData, aes(EVTYPE, SUMINJURIES,
label = SUMINJURIES)) +
geom_bar(stat = "identity", fill = 3) +
geom_text(nudge_y = 3000) +
xlab("Event Type") +
ylab("Total Injuries") +
ggtitle("Events Causing Most Injuries") +
theme(plot.title = element_text(hjust = 0.5))
DamageData$EVTYPE <- with(DamageData, reorder(EVTYPE, -TOTALDMG))
DamageDataLong <- DamageData %>%
gather(key = "Type", value = "TOTALDAMAGE",
c("SUMPROPDMG", "SUMCROPDMG")) %>%
select(EVTYPE, Type, TOTALDAMAGE)
DamageDataLong$Type[DamageDataLong$Type %in% c("SUMPROPDMG")] <- "Property damage"
DamageDataLong$Type[DamageDataLong$Type %in% c("SUMCROPDMG")] <- "Crop damage"
# Plot
z <- ggplot(DamageDataLong, aes(x = EVTYPE,
y = TOTALDAMAGE, fill = Type)) +
geom_bar(stat = "identity", position = "stack") +
xlab("Event Type") +
ylab("Total Damage (USD)") +
ggtitle("Events with Most Damage") +
theme(plot.title = element_text(hjust = 0.5), legend.position = "bottom") +
coord_flip()
grid.arrange(x,y)
z
The plots show that tornadoes (EVTYPE "TORNADO") cause the most fatalities and the most injuries of any event type, making them the most harmful events with respect to population health.
By total economic damage (property plus crop), floods (EVTYPE "FLOOD") rank first.