Dataset source: U.S. National Oceanic and Atmospheric Administration’s (NOAA)
This project reflects the most devastating climate events affecting population health and nation’s economy in general. The NOAA storm database used will tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
# Loading libraries required
library(dplyr)
library(lattice)
The data was sourced directly from the NOAA database website:
1. A temporary directory temp was created for storing the dataset temporarily.
2. Dataset was downloaded to the temp directory.
3. Reading the csv file from the bz2 zip file.
4. Disconnecting the database connection.
5. Converting dataset into tibble dataframe from dplyr library for the ease of analysis later.
temp <- tempfile()
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", temp)
data <- read.csv(bzfile(temp, "repdata_data_StormData.csv"), header = TRUE)
unlink(temp)
tbl <- as_tibble(data)
tbl
## # A tibble: 902,297 x 37
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE BGN_RANGE
## <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl>
## 1 1 4/18/19~ 0130 CST 97 MOBILE AL TORNA~ 0
## 2 1 4/18/19~ 0145 CST 3 BALDWIN AL TORNA~ 0
## 3 1 2/20/19~ 1600 CST 57 FAYETTE AL TORNA~ 0
## 4 1 6/8/195~ 0900 CST 89 MADISON AL TORNA~ 0
## 5 1 11/15/1~ 1500 CST 43 CULLMAN AL TORNA~ 0
## 6 1 11/15/1~ 2000 CST 77 LAUDERDALE AL TORNA~ 0
## 7 1 11/16/1~ 0100 CST 9 BLOUNT AL TORNA~ 0
## 8 1 1/22/19~ 0900 CST 123 TALLAPOOSA AL TORNA~ 0
## 9 1 2/13/19~ 2000 CST 125 TUSCALOOSA AL TORNA~ 0
## 10 1 2/13/19~ 2000 CST 57 FAYETTE AL TORNA~ 0
## # ... with 902,287 more rows, and 28 more variables: BGN_AZI <chr>,
## # BGN_LOCATI <chr>, END_DATE <chr>, END_TIME <chr>, COUNTY_END <dbl>,
## # COUNTYENDN <lgl>, END_RANGE <dbl>, END_AZI <chr>, END_LOCATI <chr>,
## # LENGTH <dbl>, WIDTH <dbl>, F <int>, MAG <dbl>, FATALITIES <dbl>,
## # INJURIES <dbl>, PROPDMG <dbl>, PROPDMGEXP <chr>, CROPDMG <dbl>,
## # CROPDMGEXP <chr>, WFO <chr>, STATEOFFIC <chr>, ZONENAMES <chr>,
## # LATITUDE <dbl>, LONGITUDE <dbl>, LATITUDE_E <dbl>, LONGITUDE_ <dbl>,
## # REMARKS <chr>, REFNUM <dbl>
Here we groups our data and sort it out to get the event with highest fatalities and injuries combined:
1. Group by event types.
2. Summarize the data into sum of fatalities and injuries on each types of event group.
3. Filter out empty values.
4. Create a new total column to compute the sum of both fatalities and injuries.
5. Sort the tibble in descending order of total.
health <- tbl %>%
group_by(EVTYPE) %>%
summarize(fatalities = sum(FATALITIES), injuries = sum(INJURIES)) %>%
filter(injuries!=0 & fatalities!=0) %>%
mutate(total = injuries + fatalities) %>%
arrange(desc(total))
head(health)
## # A tibble: 6 x 4
## EVTYPE fatalities injuries total
## <chr> <dbl> <dbl> <dbl>
## 1 TORNADO 5633 91346 96979
## 2 EXCESSIVE HEAT 1903 6525 8428
## 3 TSTM WIND 504 6957 7461
## 4 FLOOD 470 6789 7259
## 5 LIGHTNING 816 5230 6046
## 6 HEAT 937 2100 3037
barchart(fatalities+injuries ~ EVTYPE,
data = health[1:5,],
stack = TRUE,
main = "Population health consequences by climate events",
xlab = "Event type",
ylab = "Total fatalities/ injuries",
auto.key = list(space='right', text=c('Fatalities','Injuries')))
Obviously from both the table and plot, tornado event brings both the highest fatalities and injuries, followed by excessive heat
Here we groups our data and sort it out to get the event with the greatest economic losses in both property and crop:
1. Filter out “B” to get the largest unit in the group. (The idea here is to choose events with both losses in billions instead of millions and thousands).
2. Group by event types.
3. Summarize the data into sum of property and crop losses on each types of event group.
4. Create a new totaldmg column to compute the sum of both property and crop losses.
5. Sort the tibble in descending order of totaldmg.
economy <- tbl %>%
filter(PROPDMGEXP=="B", CROPDMGEXP=="B") %>%
group_by(EVTYPE) %>%
summarize(propdmg = sum(PROPDMG), cropdmg = sum(CROPDMG)) %>%
mutate(totaldmg = propdmg + cropdmg) %>%
arrange(desc(totaldmg))
economy
## # A tibble: 2 x 4
## EVTYPE propdmg cropdmg totaldmg
## <chr> <dbl> <dbl> <dbl>
## 1 RIVER FLOOD 5 5 10
## 2 HURRICANE/TYPHOON 5.88 1.51 7.39
barchart(propdmg+cropdmg~EVTYPE,
data = economy,
stack = TRUE,
main = "Economic consequences by climate events",
xlab = "Event type",
ylab = "Total economic losses",
auto.key = list(space='right', text=c('Property','Crop')))
Although Hurricane/Typhoon brings a greater property damage, River Flood results in a much higher economic losses with property and crop combined.