First, I read the data from the original file. Because this analysis focuses only on the effects of severe weather on population health and on the economy, I selected the relevant variables and stored them in the “weather_2” data frame.
library(readr)
library(dplyr)
library(stringr)
library(lubridate)
weather=read_csv("C:/Users/Lenovo/Desktop/R/rdata/repdata_task2/repdata_data_StormData.csv.bz2")
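weather_2=weather %>%
  # Keep only the variables needed for the health and economic analyses
  select(BGN_DATE,COUNTY,COUNTYNAME,STATE,EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP) %>%
  # Drop the constant " 0:00:00" suffix, parse the date, and extract the year
  mutate(BGN_DATE=mdy(str_replace(BGN_DATE," 0:00:00","")),year=year(BGN_DATE))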
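To make the date handling concrete, here is a minimal sketch of the same parsing step applied to two strings. The sample values are hypothetical and only assume the "month/day/year 0:00:00" layout implied by the code above.

```r
library(stringr)
library(lubridate)

# Hypothetical sample values in the same layout as BGN_DATE
sample_dates <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")

# Strip the constant time suffix, then parse month/day/year
parsed <- mdy(str_replace(sample_dates, " 0:00:00", ""))

# Extract the calendar year used later for the yearly summaries
parsed_year <- year(parsed)

print(parsed)
print(parsed_year)
```

If some rows carried a different time suffix, `mdy()` would return `NA` for them with a warning, so counting `NA` values in `BGN_DATE` after this step is a cheap sanity check.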
In this part, I analyze the impact of severe weather on fatalities and injuries.
popuhealth=weather_2 %>%
  group_by(EVTYPE) %>%
  summarise(total_fatal=sum(FATALITIES),total_injury=sum(INJURIES))
top_fatal=popuhealth %>%
  arrange(-total_fatal) %>%
  select(EVTYPE,total_fatal)
head(top_fatal,5)
## # A tibble: 5 × 2
##   EVTYPE         total_fatal
##   <chr>                <dbl>
## 1 TORNADO               5633
## 2 EXCESSIVE HEAT        1903
## 3 FLASH FLOOD            978
## 4 HEAT                   937
## 5 LIGHTNING              816
The table above shows the top five causes of weather-related deaths from 1950 to 2011: tornado, excessive heat, flash flood, heat, and lightning.
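HEAT and EXCESSIVE HEAT appear as separate rows because the totals are grouped on the raw EVTYPE string. A minimal sketch, assuming one wanted to check whether differences in case or spacing split an event type across several labels; the toy data and the `EVTYPE_clean` column are hypothetical and are not part of the analysis above.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data with inconsistent case and whitespace
toy_events <- tibble(
  EVTYPE = c("HEAT", "Heat", " EXCESSIVE HEAT", "EXCESSIVE HEAT", "LIGHTNING"),
  FATALITIES = c(3, 1, 5, 2, 4)
)

cleaned <- toy_events %>%
  # Collapse extra whitespace and upper-case the label before grouping
  mutate(EVTYPE_clean = str_to_upper(str_squish(EVTYPE))) %>%
  group_by(EVTYPE_clean) %>%
  summarise(total_fatal = sum(FATALITIES), .groups = "drop") %>%
  arrange(desc(total_fatal))

print(cleaned)
```

This only merges labels that differ in case or spacing. Deciding whether distinct labels such as HEAT and EXCESSIVE HEAT should be combined is a separate, substantive choice.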
top_injury=popuhealth %>%
  arrange(-total_injury) %>%
  select(EVTYPE,total_injury)
head(top_injury,5)
## # A tibble: 5 × 2
##   EVTYPE         total_injury
##   <chr>                 <dbl>
## 1 TORNADO               91346
## 2 TSTM WIND              6957
## 3 FLOOD                  6789
## 4 EXCESSIVE HEAT         6525
## 5 LIGHTNING              5230
The table above shows the top five causes of injuries: tornado, TSTM wind (thunderstorm wind), flood, excessive heat, and lightning. In summary, three event types rank in the top five for both fatalities and injuries: tornado, excessive heat, and lightning. I created two time series plots to show the fatalities and injuries caused by these three event types over time.
library(ggplot2)
library(patchwork)
# Yearly fatalities for the three event types in both top-five lists
p1=weather_2 %>%
  filter(EVTYPE %in% c("TORNADO","EXCESSIVE HEAT","LIGHTNING")) %>%
  group_by(year,EVTYPE) %>%
  summarize(sum_fatal=sum(FATALITIES),.groups="drop") %>%
  ggplot(aes(x=year,y=sum_fatal,color=EVTYPE))+
  geom_line()+
  labs(x="Year",y="Fatalities")
# Yearly injuries for the same three event types
p2=weather_2 %>%
  filter(EVTYPE %in% c("TORNADO","EXCESSIVE HEAT","LIGHTNING")) %>%
  group_by(year,EVTYPE) %>%
  summarize(sum_injur=sum(INJURIES),.groups="drop") %>%
  ggplot(aes(x=year,y=sum_injur,color=EVTYPE))+
  geom_line()+
  labs(x="Year",y="Injuries")
# Place the two panels side by side with a shared legend
p1+p2+
  plot_annotation(title="Fatalities and injuries caused by the most dangerous weather events, 1950-2011")+
  plot_layout(guides = "collect")&theme(legend.position = "right")
In the second part, I analyze the economic consequences of severe weather, defined as the sum of property damage and crop damage.
weather_2=weather_2 %>%
  # Convert the exponent codes to multipliers: K = thousand, M = million, B = billion.
  # Any other code (including a missing value) maps to 0, so that record adds no damage.
  mutate(PROPDMGEXPn=case_when(
    PROPDMGEXP=="B"~1e9,
    PROPDMGEXP=="K"~1e3,
    PROPDMGEXP %in% c("M","m")~1e6,
    TRUE~0
  )) %>%
  mutate(CROPDMGEXPn=case_when(
    CROPDMGEXP=="B"~1e9,
    CROPDMGEXP %in% c("K","k")~1e3,
    CROPDMGEXP %in% c("M","m")~1e6,
    TRUE~0
  )) %>%
  # Total economic damage per record, in dollars
  mutate(ecosum=PROPDMG*PROPDMGEXPn+CROPDMG*CROPDMGEXPn)
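To make the multiplier arithmetic concrete, here is a minimal sketch on a three-row toy data frame. The helper `exp_to_multiplier()` and the toy values are hypothetical. Unlike the code above, the helper treats the codes case-insensitively, which is an assumption of this sketch.

```r
# Hypothetical helper: map an exponent code to its dollar multiplier.
# Codes other than K, M, B (any case), including NA, map to 0.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(code)])
  mult[is.na(mult)] <- 0
  mult
}

# Hypothetical toy records, not taken from the storm data
toy <- data.frame(
  PROPDMG = c(25, 2.5, 10),
  PROPDMGEXP = c("K", "M", NA),
  CROPDMG = c(0, 1, 5),
  CROPDMGEXP = c(NA, "k", "?")
)

# Same formula as ecosum: property damage plus crop damage, in dollars
toy$ecosum <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * exp_to_multiplier(toy$CROPDMGEXP)

print(toy)
```

The first row works out to 25 × 1,000 = 25,000 dollars and the second to 2.5 × 1,000,000 + 1 × 1,000 = 2,501,000 dollars. The third row contributes nothing because neither of its codes is recognized.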
# Total damage per event type, largest first
economic_sum=weather_2 %>%
  group_by(EVTYPE) %>%
  summarize(ecocons=sum(ecosum,na.rm = TRUE)) %>%
  arrange(-ecocons)
head(economic_sum,5)
## # A tibble: 5 × 2
##   EVTYPE                 ecocons
##   <chr>                    <dbl>
## 1 FLOOD             150319678250
## 2 HURRICANE/TYPHOON  71913712800
## 3 TORNADO            57352113590
## 4 STORM SURGE        43323541000
## 5 HAIL               18758221170
As the table above shows, the five event types with the largest economic consequences are flood, hurricane/typhoon, tornado, storm surge, and hail. The figure below shows how the damage caused by these five event types changed over time.
# Yearly damage for the five costliest event types, on a log10 scale
weather_2 %>%
  filter(EVTYPE %in% c("FLOOD","HURRICANE/TYPHOON","TORNADO","STORM SURGE","HAIL")) %>%
  group_by(year,EVTYPE) %>%
  summarise(ecocons=sum(ecosum,na.rm = TRUE),.groups="drop") %>%
  ggplot(aes(x=year,y=ecocons,color=EVTYPE))+
  geom_line()+
  scale_y_log10()+
  labs(x="Year",y="Damages (property damage + crop damage)",title="The top 5 events that caused the largest economic consequences, 1950-2011")
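One caveat with the log-scaled axis is that `scale_y_log10()` cannot display a yearly total of zero, because log10(0) is negative infinity. A minimal sketch, assuming one wanted to drop zero totals explicitly before plotting; `toy_yearly` and its values are made up for illustration and are not taken from the storm data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical yearly totals, including two years with zero recorded damage
toy_yearly <- tibble(
  year = c(1995, 1996, 1997, 1995, 1996, 1997),
  EVTYPE = rep(c("FLOOD", "HAIL"), each = 3),
  ecocons = c(1e6, 0, 5e7, 2e5, 3e5, 0)
)

# Keep only strictly positive totals so every point has a finite log10 value
plot_data <- toy_yearly %>%
  filter(ecocons > 0)

p <- ggplot(plot_data, aes(x = year, y = ecocons, color = EVTYPE)) +
  geom_line() +
  scale_y_log10() +
  labs(x = "Year", y = "Damages (log10 scale)")

print(plot_data)
```

Dropping the zeros makes the line skip those years rather than plunge off the axis, so the gaps should be read as "no recorded damage" and not as missing data.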