SUMMARY
This report is a project assignment for the Reproducible Research course on Coursera. It explores which severe weather events have the greatest health and economic consequences in the United States from 1950 to 2011. The data were retrieved from the National Oceanic and Atmospheric Administration (NOAA) Storm Database and analyzed in R using RStudio. The results show that tornadoes are the most harmful events for population health and that floods have the greatest economic consequences.
INTRODUCTION
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to estimate the fatalities, injuries, and property and crop damage associated with each type of severe weather event.
DATA PROCESSING
The data were downloaded from the link provided by the course and read into R:
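library(dplyr)  # as_tibble(), %>%, select(), group_by(), summarise(), mutate(), ...
# download the compressed data file only if it is not already present
if (!file.exists("storm.data.bz2")) {
  dlurl <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2Fdf.csv.bz2'
  download.file(dlurl, destfile = 'storm.data.bz2', mode = 'wb')
}
stormdata <- read.csv(bzfile("storm.data.bz2", "r"))  # decompress through a bzfile() connection
df1 <- as_tibble(stormdata)  # convert to a tibble for dplyr processing
names(df1) <- make.names(names(df1), allow_ = FALSE)  # replace "_" with "." in column names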
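Because `make.names()` is called with `allow_ = FALSE`, underscores in column names become dots, which is why later code refers to `BGN.DATE`. A quick check on a few column names assumed to appear in the raw file:

```r
# make.names() with allow_ = FALSE converts "_" to "." in syntactic names
raw.names <- c("BGN_DATE", "EVTYPE", "PROPDMGEXP")  # assumed raw column names
clean.names <- make.names(raw.names, allow_ = FALSE)
print(clean.names)  # "BGN.DATE" "EVTYPE" "PROPDMGEXP"
```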
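# keep only the columns needed for the analysis: event type, fatalities,
# injuries, and the damage columns (PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
use.storm <- df1 %>%
  select(EVTYPE, FATALITIES, INJURIES,
         contains("DMG"))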
# estimate the health consequences of each event type: sum fatalities and
# injuries, compute their total, rank event types by fatalities, injuries,
# and the total, then sort by total, fatalities, and injuries (descending)
health.storm <- use.storm %>%
  select(EVTYPE, FATALITIES, INJURIES) %>%
  group_by(EVTYPE) %>%
  summarise(across(c("FATALITIES", "INJURIES"), ~ sum(.x, na.rm = TRUE))) %>%
  mutate(TOT.HCON = FATALITIES + INJURIES,
         RK.FAT = dense_rank(desc(FATALITIES)),
         RK.INJ = dense_rank(desc(INJURIES)),
         RK.TOT = dense_rank(desc(TOT.HCON))) %>%
  arrange(desc(TOT.HCON),
          desc(FATALITIES),
          desc(INJURIES))
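The grouping, summing, and dense ranking above can be sketched in base R on a small hypothetical data set (the values are made up for illustration, and `aggregate()` plus a hand-written `dense.rank.desc()` helper stand in for the dplyr verbs). Dense ranking gives tied totals the same rank without skipping the next rank, and ties are then ordered by fatalities:

```r
# Hypothetical toy rows standing in for use.storm
storm <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(5, 1, 3, 4, 0),
  INJURIES   = c(20, 2, 10, 6, 7)
)

# Sum fatalities and injuries per event type
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = storm, FUN = sum)
health$TOT.HCON <- health$FATALITIES + health$INJURIES

# Hypothetical helper: dense rank in descending order (ties share a rank,
# no ranks are skipped), analogous to dplyr::dense_rank(desc(x))
dense.rank.desc <- function(x) match(x, sort(unique(x), decreasing = TRUE))
health$RK.TOT <- dense.rank.desc(health$TOT.HCON)

# Sort by total, then fatalities, then injuries (all descending)
health <- health[order(-health$TOT.HCON, -health$FATALITIES, -health$INJURIES), ]
print(health)
```

Here HEAT and FLOOD both total 10 and share rank 2; HEAT is listed first because it has more fatalities.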
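# generate a dataset of property damage by event type:
# sum PROPDMG within each exponent code, convert to US dollars
# (K = thousands, M = millions, B = billions; any other code is counted at face value),
# then sum per event type
PROP.storm <- use.storm %>%
  select(EVTYPE, starts_with("PROP")) %>%
  group_by(EVTYPE, PROPDMGEXP) %>%
  summarise(DAMAGE.SET = sum(PROPDMG), .groups = "drop_last") %>%  # still grouped by EVTYPE
  mutate(PROPDAMAGE = case_when(
    PROPDMGEXP == "K" ~ DAMAGE.SET * 10^3,
    PROPDMGEXP == "M" ~ DAMAGE.SET * 10^6,
    PROPDMGEXP == "B" ~ DAMAGE.SET * 10^9,
    TRUE              ~ DAMAGE.SET)) %>%
  summarise(TOTPROPDMG = sum(PROPDAMAGE))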
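# generate a dataset of crop damage by event type:
# sum CROPDMG within each exponent code, convert to US dollars
# (K = thousands, M = millions, B = billions; any other code is counted at face value),
# then sum per event type
CROP.storm <- use.storm %>%
  select(EVTYPE, starts_with("CROP")) %>%
  group_by(EVTYPE, CROPDMGEXP) %>%
  summarise(DAMAGE.SET = sum(CROPDMG), .groups = "drop_last") %>%  # still grouped by EVTYPE
  mutate(CROPDAMAGE = case_when(
    CROPDMGEXP == "K" ~ DAMAGE.SET * 10^3,
    CROPDMGEXP == "M" ~ DAMAGE.SET * 10^6,
    CROPDMGEXP == "B" ~ DAMAGE.SET * 10^9,
    TRUE              ~ DAMAGE.SET)) %>%
  summarise(TOTCROPDMG = sum(CROPDAMAGE))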
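The exponent-code conversion used for both property and crop damage amounts to multiplying each value by a factor looked up from its code. A minimal base R sketch, assuming hypothetical toy rows and a named lookup vector `mult` in place of the nested conditions; as in the analysis, only upper-case `K`, `M`, and `B` are scaled and every other code keeps a multiplier of 1:

```r
# Hypothetical toy rows mimicking PROPDMG and PROPDMGEXP
dmg <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL", "HAIL"),
  PROPDMG    = c(2.5, 100, 3, 5),
  PROPDMGEXP = c("B", "K", "M", "")
)

# Multipliers for the exponent codes handled in the analysis
mult <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each code's multiplier; codes not in the table (such as "") get 1
factor.for <- function(code) {
  f <- unname(mult[code])  # NA for codes without an entry
  f[is.na(f)] <- 1         # count unknown codes at face value
  f
}

dmg$PROPDAMAGE <- dmg$PROPDMG * factor.for(dmg$PROPDMGEXP)

# Total damage in US dollars per event type
tot <- aggregate(PROPDAMAGE ~ EVTYPE, data = dmg, FUN = sum)
print(tot)
```

Because the lookup is case-sensitive, lower-case codes would also fall back to a multiplier of 1; wrapping the codes in `toupper()` would be one way to treat them as their upper-case equivalents.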
# join the property and crop tables by event type, compute total damage,
# rank event types, and sort by total damage (descending)
DMG.storm <- full_join(PROP.storm, CROP.storm, by = "EVTYPE") %>%
  mutate(TOTDMG = TOTPROPDMG + TOTCROPDMG,
         RK.PROP = dense_rank(desc(TOTPROPDMG)),
         RK.CROP = dense_rank(desc(TOTCROPDMG)),
         RK.DMG = dense_rank(desc(TOTDMG))) %>%
  arrange(desc(TOTDMG))
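A full outer join keeps every event type that appears in either table. Here both tables are built from the same grouped data, so every `EVTYPE` should appear in both and no missing values should arise. The sketch below uses base R's `merge(all = TRUE)` on hypothetical tables where one event type is missing from each side, to show that the gaps become `NA` and would need to be replaced with 0 before adding:

```r
# Hypothetical per-event totals: HAIL has no crop row, DROUGHT no property row
prop <- data.frame(EVTYPE = c("FLOOD", "HAIL"), TOTPROPDMG = c(150, 20))
crop <- data.frame(EVTYPE = c("FLOOD", "DROUGHT"), TOTCROPDMG = c(10, 40))

# Full outer join on EVTYPE (rows from both sides are kept)
dmg <- merge(prop, crop, by = "EVTYPE", all = TRUE)

# The missing side is NA; treat it as zero damage before summing
dmg[is.na(dmg)] <- 0
dmg$TOTDMG <- dmg$TOTPROPDMG + dmg$TOTCROPDMG

# Sort by total damage (descending)
dmg <- dmg[order(-dmg$TOTDMG), ]
print(dmg)
```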
RESULTS
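# count weather events by year of BGN.DATE, which is stored as month/day/year
# (e.g. "4/18/1950 0:00:00"), and compute some descriptive statistics
bgy <- format(as.Date(df1$BGN.DATE, format = "%m/%d/%Y"), "%Y")
t <- table(bgy)
t <- as_tibble(t)  # columns: bgy (year) and n (number of events)
minev <- min(t$n)
maxev <- max(t$n)
avgev <- as.integer(mean(t$n))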
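The yearly counts depend on the format string matching how `BGN.DATE` is stored. A small check, assuming raw values of the form `"4/18/1950 0:00:00"` (month/day/year followed by a time), shows that reading the value as day/month/year fails whenever the day exceeds 12. Such dates become `NA`, and `table()` drops `NA` values by default, so they would be silently left out of the counts:

```r
# A date string in the month/day/year layout assumed for BGN.DATE
x <- "4/18/1950 0:00:00"

# Month/day/year: parses correctly; the trailing time is ignored
good <- as.Date(x, format = "%m/%d/%Y")

# Day/month/year: "18" is read as the month, which is invalid, so the result is NA
bad <- as.Date(x, format = "%d/%m/%Y")

print(format(good, "%Y"))  # "1950"
print(bad)                 # NA
```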
The database records adverse weather events from 1950 to 2011. On average, 5915 events were recorded per year, with a minimum of 71 and a maximum of 25716.
Figure 1 shows the number of weather events recorded each year.
# plot the number of events per year
plot(t, main = "Figure 1: Number of events by year", col.main = 'blue',
     xlab = "Year", ylab = "Number of events")
As Figure 1 shows, the number of recorded weather events increases over the years. In total, 985 types of weather events were recorded in the US from 1950 to 2011.
Together, these events caused 15145 fatalities and 140528 injuries in the US.
Figure 2 presents the 10 weather event types that are most harmful to population health. Tornadoes are clearly the leading cause of fatalities and injuries.
# bar plot of the 10 event types with the largest health consequences;
# columns 4:2 are TOT.HCON, INJURIES and FATALITIES
barplot(t(as.matrix(health.storm[1:10, 4:2])),
        main = "Figure 2: Top 10 events most harmful to population health",
        names.arg = health.storm$EVTYPE[1:10],
        las = 3,
        cex.names = 0.45,
        ylab = "Fatalities/injuries",
        beside = TRUE,
        col = rainbow(3))
legend(20, 50000, c("Total", "Injuries", "Fatalities"),
       fill = rainbow(3))
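The `t(as.matrix(...))` call matters because `barplot()` with `beside = TRUE` draws one group of bars per column of its input. Transposing makes the rows the measures (total, injuries, fatalities) and the columns the event types, so each event gets its own cluster of three bars. A small sketch with hypothetical numbers, drawing to a null PDF device so it runs without a display:

```r
# Hypothetical totals: one row per event type, one column per measure
m <- matrix(c(38, 30, 8,
              10,  6, 4,
              10,  9, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("TORNADO", "HEAT", "FLOOD"),
                            c("TOTAL", "INJURIES", "FATALITIES")))

# Transpose: rows become measures, columns become event types (one group each)
heights <- t(m)

pdf(NULL)  # null device: nothing is written, but plotting calls still run
mids <- barplot(heights, beside = TRUE, col = rainbow(3))
legend("topright", legend = rownames(heights), fill = rainbow(3))
invisible(dev.off())

# With beside = TRUE, barplot() returns bar midpoints with the same shape as heights
print(dim(mids))
```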
Weather events also caused a total of US $427,279,750,338 in property and crop damage from 1950 to 2011. Figure 3 shows the 10 event types with the greatest economic consequences.
# bar plot of the 10 event types with the greatest economic damage;
# columns c(4, 2, 3) are TOTDMG, TOTPROPDMG and TOTCROPDMG
barplot(t(as.matrix(DMG.storm[1:10, c(4, 2, 3)])),
        main = "Figure 3: Top 10 events with the greatest economic consequences",
        names.arg = DMG.storm$EVTYPE[1:10],
        las = 3,
        cex.names = 0.45,
        ylab = "Damage (US $)",
        beside = TRUE,
        col = rainbow(3))
legend(20, 8e10, c("Total", "Property", "Crop"),
       fill = rainbow(3))
Floods are the leading cause of total and property damage, whereas drought causes the most crop damage.
CONCLUSION
Severe weather events have a large impact on population health and the economy. Tornadoes cause the most fatalities and injuries, and floods cause the greatest economic losses, so being alert and prepared is important to minimize these consequences.