This report addresses two questions: which types of weather events have the greatest economic consequences, and which are most harmful to the health of the population of the United States.
In this report, we analyze the impact of different weather events on public health and the economy, based on the storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA), which covers 1950 to 2011. We use the estimates of fatalities, injuries, property damage, and crop damage to decide which types of events are most harmful to population health and the economy. From these data, we found that excessive heat and tornadoes are most harmful with respect to population health, while floods, droughts, and hurricanes/typhoons have the greatest economic consequences.
storm<- read.csv("stormData.csv.bz2")
head(storm)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 0 NA
## 2 0 0 NA
## 3 0 0 NA
## 4 0 0 NA
## 5 0 0 NA
## 6 0 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 14.0 100 3 0 0 15 25.0
## 2 0 2.0 150 2 0 0 0 2.5
## 3 0 0.1 123 2 0 0 2 25.0
## 4 0 0.0 100 2 0 0 2 2.5
## 5 0 0.0 150 2 0 0 2 2.5
## 6 0 1.5 177 2 0 0 6 2.5
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 3040 8812
## 2 K 0 3042 8755
## 3 K 0 3340 8742
## 4 K 0 3458 8626
## 5 K 0 3412 8642
## 6 K 0 3450 8748
## LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3051 8806 1
## 2 0 0 2
## 3 0 0 3
## 4 0 0 4
## 5 0 0 5
## 6 0 0 6
dim(storm)
## [1] 902297 37
peligro<-aggregate(storm$FATALITIES~storm$EVTYPE,FUN = sum)
colnames(peligro)<- c("Event type","Fatalities")
head(peligro)
## Event type Fatalities
## 1 HIGH SURF ADVISORY 0
## 2 COASTAL FLOOD 0
## 3 FLASH FLOOD 0
## 4 LIGHTNING 0
## 5 TSTM WIND 0
## 6 TSTM WIND (G45) 0
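The first rows of the aggregated table all show zero fatalities, so a sorted view is usually more informative. The following is a minimal sketch of that ranking step, assuming a small invented data frame (`toy`) in place of the real storm data; the names `toy`, `totals`, and `ranked` are hypothetical and do not appear in the analysis above.

```r
# Toy stand-in for the storm data; the values are invented for illustration.
toy <- data.frame(
  EVTYPE = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL", "HEAT"),
  FATALITIES = c(3, 0, 2, 1, 0, 4)
)

# Sum fatalities per event type. Passing `data =` keeps the short
# column names EVTYPE and FATALITIES in the result.
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Sort in decreasing order so the deadliest event types come first.
ranked <- totals[order(totals$FATALITIES, decreasing = TRUE), ]
head(ranked, 3)
```

Applied to the real data, the same pattern would list the top event types rather than only the single maximum returned by `which.max`.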
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
peligro2 <- subset(peligro, peligro$Fatalities > 0)
hist(log(peligro2$Fatalities), main = "Histogram of fatalities by event type",
xlab = "log(Fatalities)")
The deadliest event type in the United States is:
moda <- peligro2[which.max(peligro2$Fatalities),c(1,2)]
moda
## Event type Fatalities
## 834 TORNADO 5633
The most common event type in the United States is:
recurrencia<- as.data.frame(table(storm$EVTYPE))
recurrencia2<- recurrencia[which.max(recurrencia$Freq),c(1,2)]
recurrencia2
## Var1 Freq
## 244 HAIL 288661
The following scatter plot, with a fitted linear regression line, shows the relationship between how often an event type is recorded and the total fatalities attributed to it, both on a log scale. Event types that occur more frequently tend to account for more deaths.
library(ggplot2)
colnames(recurrencia)<- c("Event type","frequency")
relacion<- merge(peligro,recurrencia)
ggplot(data = relacion, aes(x = log(frequency), y = log(Fatalities))) +
geom_point(color='blue') +
geom_smooth(method = "lm", se = FALSE)+
labs(title = "Frequency vs. fatalities (log scale)")+
ylab("log(Fatalities)")+xlab("log(Frequency)")
## `geom_smooth()` using formula 'y ~ x'
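To put a number on the trend that the plot shows, the same kind of log-log model can be fitted directly with `lm`. This is a sketch on invented values, not the report's actual result; `toy`, `positive`, and `fit` are hypothetical names. On a log-log scale the slope reads as an elasticity: a slope near 1 would mean fatalities grow roughly in proportion to frequency.

```r
# Toy per-event-type summary; the values are invented for illustration.
toy <- data.frame(
  frequency = c(10, 100, 1000, 10000, 50),
  Fatalities = c(1, 8, 0, 900, 5)
)

# log(0) is -Inf, so keep only event types with at least one fatality.
positive <- subset(toy, Fatalities > 0)

# Ordinary least squares on the log-log scale, the kind of line
# that geom_smooth(method = "lm") draws.
fit <- lm(log(Fatalities) ~ log(frequency), data = positive)

coef(fit)               # intercept and slope
summary(fit)$r.squared  # share of variance explained
```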
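The event type with the largest total property damage in the United States is shown below. Note that PROPDMG is summed as recorded, without applying the PROPDMGEXP magnitude code, so the total is not expressed in dollars: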
Totales<-aggregate(storm$PROPDMG~storm$EVTYPE,FUN = sum)
colnames(Totales)<- c("Event type","Proporcion")
moda2 <- Totales[which.max(Totales$Proporcion),c(1,2)]
moda2
## Event type Proporcion
## 834 TORNADO 3212258
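The total above adds raw `PROPDMG` values, while the data set stores the magnitude separately in `PROPDMGEXP` (`K` for thousands, `M` for millions, `B` for billions). A minimal sketch of converting to dollars before aggregating might look like the following. It uses invented rows, and it assumes that any other code (including a blank) means a multiplier of 1, which is a simplification; `toy`, `multiplier`, `mult`, and `damage_usd` are hypothetical names.

```r
# Toy rows; the values are invented for illustration.
toy <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "FLOOD", "HAIL"),
  PROPDMG = c(25, 2.5, 1.2, 500),
  PROPDMGEXP = c("K", "M", "B", "")
)

# Lookup table for the K/M/B magnitude codes.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Match case-insensitively. Codes not in the table give NA, which is
# replaced by 1 here (an assumption, not a documented rule).
mult <- unname(multiplier[toupper(toy$PROPDMGEXP)])
mult[is.na(mult)] <- 1

# Damage in dollars, then the total per event type, largest first.
toy$damage_usd <- toy$PROPDMG * mult
damage <- aggregate(damage_usd ~ EVTYPE, data = toy, FUN = sum)
damage[order(damage$damage_usd, decreasing = TRUE), ]
```

With the multiplier applied, a few billion-dollar events can outweigh many small ones, so the ranking may differ from the one based on raw `PROPDMG`.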
Event types that are recorded more frequently also tend to accumulate more property damage:
relacion3<- merge(Totales,recurrencia)
ggplot(data = relacion3, aes(x = log(frequency), y = log(Proporcion))) +
geom_point(color='red') +
geom_smooth(method = "lm", se = FALSE)+
labs(title = "Frequency vs. property damage (log scale)")+
ylab("log(Property damage)")+xlab("log(Frequency)")
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 579 rows containing non-finite values (stat_smooth).
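The warning most likely comes from event types whose summed property damage is zero, since `log(0)` is `-Inf` and ggplot2 drops non-finite values before fitting. A small sketch of checking and filtering this explicitly, on invented values (`toy`, `logged`, and `kept` are hypothetical names):

```r
# Toy per-event-type summary; the values are invented for illustration.
toy <- data.frame(
  frequency = c(5, 20, 300),
  Proporcion = c(0, 12.5, 4000)
)

# Count the values that become non-finite after taking logs.
logged <- log(toy$Proporcion)
sum(!is.finite(logged))

# Keep only event types with positive damage before plotting,
# so that no rows are dropped silently.
kept <- toy[toy$Proporcion > 0, ]
kept
```

Filtering up front makes the number of excluded event types an explicit part of the analysis instead of a side effect of the plot.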
We conclude that tornadoes are among the most serious weather hazards in the United States for both public health and the economy: in this analysis they rank first in total fatalities and in summed property damage. For this reason, better ways to prevent and prepare for these events are needed.