This report details the weather-related impacts on the health of the United States population and on its economy. The health measures explored are the total numbers of injuries and fatalities resulting from severe weather conditions. The economic measures explored are the crop and property damage caused by these weather conditions.
# Load the packages used throughout the analysis.
library(dplyr)
library(lubridate)
library(ggplot2)

# Check whether the data is available and download it if not.
if (!file.exists("StormData.csv.bz2")) {
  url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(url, "StormData.csv.bz2")
}
stormdata <- read.csv("StormData.csv.bz2")
## Warning in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,
## : EOF within quoted string
The code below does the following:
- Converts the date column from character to Date
- Filters the data to the years 1990 to 2011
- Selects the columns that are relevant for our analysis
- Removes the rows in which none of the selected measures has a value
- Prints the first rows of the subset
data <- transform(stormdata, BGN_DATE = as.Date(BGN_DATE, format = "%m/%d/%Y %H:%M:%S"))
data <- data %>%
  filter(year(BGN_DATE) %in% 1990:2011) %>%
  select(BGN_DATE, EVTYPE, FATALITIES, INJURIES,
         PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
  filter(FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 |
           PROPDMGEXP > 0 | CROPDMG > 0 | CROPDMGEXP > 0)
head(data)
##     BGN_DATE  EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 1990-01-25 TORNADO          0       28     2.5          M       0
## 2 1990-02-03 TORNADO          0        0    25.0          K       0
## 3 1990-02-03 TORNADO          0        0    25.0          K       0
## 4 1990-02-03 TORNADO          0        3     2.5          M       0
## 5 1990-02-03 TORNADO          0        2     2.5          M       0
## 6 1990-02-03 TORNADO          0       15     2.5          M       0
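The date handling in the first two steps can be tried on a few values in isolation. The sketch below is illustrative only: the three date strings are made-up examples written in the same layout as BGN_DATE, and it uses base R's format() in place of lubridate's year() so that it runs without extra packages.

```r
# Made-up example strings in a "month/day/year time" layout (an assumption
# about how BGN_DATE looks in the raw file).
raw_dates <- c("4/18/1950 0:00:00", "1/25/1990 0:00:00", "11/28/2011 0:00:00")

# as.Date() keeps only the calendar date; the time part is dropped.
parsed <- as.Date(raw_dates, format = "%m/%d/%Y %H:%M:%S")

# Base R stand-in for lubridate::year(): extract the year as an integer.
yr <- as.integer(format(parsed, "%Y"))

# Keep only the dates that fall in the 1990 to 2011 window.
keep <- yr %in% 1990:2011
parsed[keep]
```

Only the second and third dates fall inside the window, so the first one is dropped.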
We need to convert the exponent codes in CROPDMGEXP and PROPDMGEXP into numeric multipliers.
These are the possible values of CROPDMGEXP and PROPDMGEXP:
H, h = hundreds = 100
K, k = kilo (thousands) = 1,000
M, m = millions = 1,000,000
B, b = billions = 1,000,000,000
(+) = 1, (-) = 0, (?) = 0
blank/empty character = 0
numeric 0..8 = 10
The code below looks up the values in CROPDMGEXP and PROPDMGEXP and replaces them with their multiplier values.
data <- data %>%
  mutate(CROPDMGEXP = ifelse(CROPDMGEXP %in% c("M", "m"), 10^6,
                      ifelse(CROPDMGEXP == "K", 10^3,
                      ifelse(CROPDMGEXP == "B", 10^9,
                      ifelse(CROPDMGEXP %in% c("", "?", "0"), 0,
                      ifelse(CROPDMGEXP == "2", 10, ""))))),
         PROPDMGEXP = ifelse(PROPDMGEXP %in% c("M", "m"), 10^6,
                      ifelse(PROPDMGEXP == "K", 10^3,
                      ifelse(PROPDMGEXP == "B", 10^9,
                      ifelse(PROPDMGEXP %in% c("", "?", "0"), 0,
                      ifelse(PROPDMGEXP %in% c("0", "5", "6", "4", "2", "3",
                                               "7", "H", "1", "8"),
                             10, ""))))))
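The list of possible values given earlier can also be written as a named lookup vector, which keeps the mapping in one place. This is only a sketch, not the method used for the results in this report: exp_lookup and exp_to_multiplier are hypothetical names, and the sketch follows the list as written (H counts as 100, + counts as 1, and any code not on the list becomes NA).

```r
# Multipliers taken from the list of possible values; codes that are not
# listed are left out on purpose and come back as NA.
exp_lookup <- c(
  "H" = 1e2, "h" = 1e2,
  "K" = 1e3, "k" = 1e3,
  "M" = 1e6, "m" = 1e6,
  "B" = 1e9, "b" = 1e9,
  "+" = 1, "-" = 0, "?" = 0
)
# Digits 0 to 8 all map to a multiplier of 10.
exp_lookup[as.character(0:8)] <- 10

# Hypothetical helper: translate a vector of exponent codes to multipliers.
exp_to_multiplier <- function(code) {
  code <- as.character(code)
  out <- unname(exp_lookup[code])
  # A blank code cannot be matched by name, so it is set to 0 explicitly.
  out[!is.na(code) & code == ""] <- 0
  out
}

exp_to_multiplier(c("K", "m", "B", "", "5", "+", "x"))
```

Leaving unknown codes as NA makes them easy to count with is.na() before any totals are computed.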
The table below presents the event types with the highest total fatalities and injuries.
Inj_fat_data <- data %>%
  group_by(EVTYPE) %>%
  summarise(fatalities = sum(FATALITIES),
            injuries = sum(INJURIES)) %>%
  mutate(total = (fatalities + injuries)) %>%
  arrange(desc(total)) %>%
  slice(1:10)
head(Inj_fat_data)
## # A tibble: 6 × 4
##   EVTYPE         fatalities injuries total
##   <chr>               <dbl>    <dbl> <dbl>
## 1 TORNADO              1134    20080 21214
## 2 EXCESSIVE HEAT       1828     6324  8152
## 3 FLOOD                 377     6658  7035
## 4 LIGHTNING             764     4885  5649
## 5 TSTM WIND             327     5022  5349
## 6 FLASH FLOOD           858     1561  2419
The table above shows that TORNADO has the highest impact when injuries and fatalities are combined. The charts below show how the contribution of the top 10 events differs by type of impact. While TORNADO had the highest impact on injuries, EXCESSIVE HEAT contributed the most to fatalities.
par(mfrow = c(1,3))
pie(Inj_fat_data$total, Inj_fat_data$EVTYPE, col = factor(Inj_fat_data$EVTYPE))
title(main = "Total Impact")
pie(Inj_fat_data$injuries, Inj_fat_data$EVTYPE, col = factor(Inj_fat_data$EVTYPE))
title(main = "Injuries Impact")
pie(Inj_fat_data$fatalities, Inj_fat_data$EVTYPE, col = factor(Inj_fat_data$EVTYPE))
title(main = "Fatalities Impact")
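The comparison between injuries and fatalities can also be checked numerically from the six rows printed in the table above. The sketch below retypes those six rows by hand (top6 is a hypothetical name), so the shares it computes are relative to these six events only, not to the full top 10 used in the charts.

```r
# The six rows printed by head(Inj_fat_data), retyped by hand.
top6 <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "FLOOD",
             "LIGHTNING", "TSTM WIND", "FLASH FLOOD"),
  fatalities = c(1134, 1828, 377, 764, 327, 858),
  injuries = c(20080, 6324, 6658, 4885, 5022, 1561),
  stringsAsFactors = FALSE
)

# Event with the most fatalities and the event with the most injuries.
top6$EVTYPE[which.max(top6$fatalities)]
top6$EVTYPE[which.max(top6$injuries)]

# Percentage share of each event within these six rows.
top6$fatality_share <- round(100 * prop.table(top6$fatalities), 1)
top6$injury_share <- round(100 * prop.table(top6$injuries), 1)
top6
```

The first lookup returns EXCESSIVE HEAT and the second returns TORNADO, which matches the reading of the charts.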
The code below first creates cost variables for crop damage and property damage. It then summarises the total cost of both crop and property damage by the event type that caused it.
data <- transform(data, PROPDMGEXP = as.numeric(PROPDMGEXP),
                  CROPDMGEXP = as.numeric(CROPDMGEXP))
econ_impact <- data %>%
  mutate(CROPCOST = (CROPDMG * CROPDMGEXP),
         PROPCOST = (PROPDMG * PROPDMGEXP)) %>%
  group_by(EVTYPE) %>%
  summarise(CROPCOST = sum(CROPCOST),
            PROPCOST = sum(PROPCOST),
            TOTALCOST = (CROPCOST + PROPCOST)) %>%
  arrange(desc(TOTALCOST)) %>%
  slice(1:10)
head(econ_impact)
## # A tibble: 6 × 4
##   EVTYPE               CROPCOST     PROPCOST    TOTALCOST
##   <chr>                   <dbl>        <dbl>        <dbl>
## 1 FLOOD              4405166450 134822237410 139227403860
## 2 HURRICANE/TYPHOON  2607872800  69305840000  71913712800
## 3 STORM SURGE              5000  43323536000  43323541000
## 4 FLASH FLOOD        1184061100  14058230296  15242291396
## 5 DROUGHT           13935485000   1045992000  14981477000
## 6 HURRICANE          2731410000  11857819010  14589229010
As the graph below and the table above show, floods had the greatest total economic impact, driven mainly by property damage. Among the events listed in the table, drought caused the greatest crop damage.
ggplot(econ_impact, aes(EVTYPE, TOTALCOST/10^9)) +
  geom_bar(stat = "identity", aes(fill = EVTYPE))
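The cost calculation behind the economic table is a damage figure multiplied by its multiplier, then summed per event type. The sketch below works through that arithmetic on three toy rows in base R. The two TORNADO rows reuse the property figures from the first rows printed earlier (2.5 with M, 25 with K), while the FLOOD row and the name toy are made up for illustration.

```r
# Toy rows with the multipliers already converted to numbers.
toy <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "FLOOD"),
  PROPDMG = c(2.5, 25, 10),
  PROPDMGEXP = c(1e6, 1e3, 1e3),
  CROPDMG = c(0, 0, 5),
  CROPDMGEXP = c(0, 0, 1e3),
  stringsAsFactors = FALSE
)

# Cost per row = damage figure * multiplier.
toy$PROPCOST <- toy$PROPDMG * toy$PROPDMGEXP
toy$CROPCOST <- toy$CROPDMG * toy$CROPDMGEXP
toy$TOTALCOST <- toy$PROPCOST + toy$CROPCOST

# Total cost per event type, largest first.
totals <- tapply(toy$TOTALCOST, toy$EVTYPE, sum)
sort(totals, decreasing = TRUE)
```

Here TORNADO totals 2,525,000 and FLOOD totals 15,000. Note that sum() returns NA for a group as soon as one of its rows has an NA multiplier, so passing na.rm = TRUE is worth considering when some codes could not be converted.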