Synopsis

  1. Tornadoes are by far the most harmful event type for population health, followed by excessive heat, thunderstorm wind (TSTM WIND), floods and lightning.

  2. Tornadoes also cause the greatest economic damage, followed by flash floods, thunderstorm wind (TSTM WIND), hail and floods.

Data Processing

The following packages were used:

library(plyr)     # load plyr before dplyr so that dplyr's verbs are not masked
library(dplyr)
library(ggplot2)

Load Data:

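fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# Download once; mode = "wb" keeps the binary .bz2 file intact on Windows
if (!file.exists("data.csv.bz2")) download.file(fileUrl, "data.csv.bz2", mode = "wb")
# read.csv() decompresses .bz2 itself; keep codes such as PROPDMGEXP as character
data <- read.csv("data.csv.bz2", stringsAsFactors = FALSE)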
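read.csv() can read the bzip2-compressed file directly because R's file() connection detects the compression when opening it for reading. The sketch below checks this on a tiny made-up file written to a temporary path (the data frame contents are illustrative, not taken from StormData); it also shows that with stringsAsFactors = FALSE a code column such as PROPDMGEXP comes back as character rather than factor.

```r
# Build a tiny, made-up stand-in for StormData.csv.bz2 (not real storm data)
toy <- data.frame(EVTYPE     = c("TORNADO", "HAIL"),
                  PROPDMG    = c(25, 0.5),
                  PROPDMGEXP = c("K", "M"))
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, "w")                 # bzip2-compressed output connection
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() decompresses the file transparently; text columns stay character
toy_back <- read.csv(path, stringsAsFactors = FALSE)
str(toy_back)
```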

Across the United States, which types of events (the EVTYPE variable) are most harmful with respect to population health?

The variables FATALITIES and INJURIES were added together and summed by event type. This gives the total number of deaths and injuries for each event type, which is then used to rank event types by harmfulness.

data_2 <- aggregate(FATALITIES + INJURIES ~ EVTYPE, 
                    FUN = "sum",
                    data=data)

colnames(data_2)[2]<-"v1"
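The formula interface of aggregate() evaluates the left-hand side (here FATALITIES + INJURIES) row by row and sums it within each level of EVTYPE. A minimal sketch on a made-up data frame (the values are illustrative only) shows what the call returns, including the column rename:

```r
# Made-up records: two event types, three rows (not real StormData values)
toy <- data.frame(EVTYPE     = c("TORNADO", "HAIL", "TORNADO"),
                  FATALITIES = c(2, 0, 1),
                  INJURIES   = c(10, 3, 0))

# Sum FATALITIES + INJURIES within each event type
harm <- aggregate(FATALITIES + INJURIES ~ EVTYPE, FUN = "sum", data = toy)
colnames(harm)[2] <- "v1"
harm
# one row per event type, in sorted order: HAIL -> 3, TORNADO -> 13
```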

Inspecting the distribution of these totals shows that most event types have no recorded deaths or injuries at all: even the 70% quantile is 0.

prob <- c(1:10)/10
quantile(data_2$v1, prob=prob)
##     10%     20%     30%     40%     50%     60%     70%     80%     90%    100% 
##     0.0     0.0     0.0     0.0     0.0     0.0     0.0     1.0     7.6 96979.0
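The decile table only bounds the share of event types with zero harm (at least 70%). A one-line sketch computes the exact share instead; here it runs on a made-up vector standing in for data_2$v1, and on the real data the same expression would be mean(data_2$v1 == 0).

```r
# Made-up per-event-type totals standing in for data_2$v1
v1 <- c(0, 0, 0, 0, 0, 0, 0, 1, 12, 96979)

# Share of event types with no recorded fatalities or injuries
share_zero <- mean(v1 == 0)
share_zero
# 0.7 (7 of the 10 made-up totals are zero)
```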

There is also a very large gap between the most harmful event type and the second (96,979 versus 8,428 fatalities and injuries), and a further, smaller jump between the fifth and sixth event types (6,046 versus 3,037, roughly half). For this reason the top 5 event types are selected as the most harmful.

data_2 <- data_2 %>% arrange(desc(v1))
top <- data_2[1:5,]
head(data_2, n=10)
##               EVTYPE    v1
## 1            TORNADO 96979
## 2     EXCESSIVE HEAT  8428
## 3          TSTM WIND  7461
## 4              FLOOD  7259
## 5          LIGHTNING  6046
## 6               HEAT  3037
## 7        FLASH FLOOD  2755
## 8          ICE STORM  2064
## 9  THUNDERSTORM WIND  1621
## 10      WINTER STORM  1527
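The top-5 cutoff rests on the drop between the fifth and sixth event types. One way to make that rule explicit, sketched here as an assumption rather than the procedure used above, is to compute the ratio between consecutive totals and cut where it is largest, skipping the first gap (which is dominated by TORNADO). The totals are copied from the table above.

```r
# Top-10 totals copied from the table above (FATALITIES + INJURIES)
v1 <- c(96979, 8428, 7461, 7259, 6046, 3037, 2755, 2064, 1621, 1527)

# Ratio between each total and the next one down the ranking
ratios <- head(v1, -1) / tail(v1, -1)

# Skip the first gap (TORNADO vs. the rest) and cut at the largest remaining drop
cut_at <- which.max(ratios[-1]) + 1
cut_at
# 5: the largest remaining drop is 6046 -> 3037, so the top 5 are kept
```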

Tornadoes therefore rank first by a wide margin, followed by EXCESSIVE HEAT, TSTM WIND, FLOOD and LIGHTNING.

ggplot(data = top, aes(x = reorder(EVTYPE, -v1), y = v1, fill = v1)) +
  geom_col() +
  # zoom the y axis so the smaller bars stay readable; the TORNADO bar is cut off
  coord_cartesian(ylim = c(4000, 15000)) +
  # label the clipped TORNADO bar with its actual total
  annotate("text", x = 1, y = 14000, label = format(top$v1[1], big.mark = ","),
           color = "white") +
  labs(x = "Event Type",
       y = "Fatalities + Injuries",
       title = "Harmful Events") +
  scale_fill_gradient("Harmfulness", low = "grey", high = "black") +
  theme(axis.text.x = element_text(size = 6.35))

Across the United States, which types of events have the greatest economic consequences?

The damage amounts PROPDMG and CROPDMG are stored separately from their magnitude codes PROPDMGEXP and CROPDMGEXP ("K" for thousands, "M" for millions, "B" for billions, single digits for powers of ten). Two temporary vectors map each code to a numeric multiplier, and multiplying by it gives the damage in dollars. Property and crop damage are then added into a new variable, TOTALDMG, and only EVTYPE and TOTALDMG are kept in a separate data set.

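# Map each magnitude code to a numeric multiplier: digits are powers of ten,
# "h"/"H" are hundreds, and "", "+", "-" and "?" are treated as 1
tempPROPDMG <- mapvalues(data$PROPDMGEXP,
                         c("K","M","", "B","m","+","0","5","6",
                           "?","4","2","3","h","7","H","-","1","8"), 
                         c(1e3,1e6, 1, 1e9,1e6,  1,  1,1e5,1e6,  1,
                           1e4,1e2,1e3,1e2,1e7,1e2,  1, 10,1e8))

tempCROPDMG <- mapvalues(data$CROPDMGEXP,
                         c("","M","K","m","B","?","0","k","2"),
                         c( 1,1e6,1e3,1e6,1e9,1,1,1e3,1e2))

# as.character() first: on a factor, as.numeric() would return level codes
data$PROPTOTALDMG <- as.numeric(as.character(tempPROPDMG)) * data$PROPDMG
data$CROPTOTALDMG <- as.numeric(as.character(tempCROPDMG)) * data$CROPDMG
data$TOTALDMG <- data$PROPTOTALDMG + data$CROPTOTALDMG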
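The same code-to-multiplier step can be written without listing every code and case variant by hand. The sketch below uses a hypothetical helper, exp_to_multiplier() (not part of plyr or the original analysis), that upper-cases the code, looks up letters in a small table, treats single digits as powers of ten and falls back to 1 for anything else, following the mapping described above (with "h" and "H" both read as hundreds).

```r
# Hypothetical helper: convert *DMGEXP codes to numeric multipliers
exp_to_multiplier <- function(code) {
  # Normalise: character (works for factors too), upper case, no stray spaces
  code <- toupper(trimws(as.character(code)))
  # Letter codes: hundreds, thousands, millions, billions
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[code])
  # Single digits are powers of ten ("5" -> 1e5, "0" -> 1)
  is_digit <- !is.na(code) & grepl("^[0-9]$", code)
  mult[is_digit] <- 10^as.numeric(code[is_digit])
  # Anything else ("", "+", "-", "?", NA) falls back to 1
  mult[is.na(mult)] <- 1
  mult
}

exp_to_multiplier(c("K", "m", "", "5", "h", "?"))
# returns c(1e3, 1e6, 1, 1e5, 1e2, 1)
```

With the full data, data$PROPTOTALDMG <- exp_to_multiplier(data$PROPDMGEXP) * data$PROPDMG would stand in for the mapvalues() step; upper-casing once avoids listing "m"/"M" and "k"/"K" separately.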

data_3 <- data %>% select(EVTYPE, TOTALDMG)
data_4 <- aggregate(TOTALDMG ~ EVTYPE, FUN = "sum", data = data_3)
data_4 <- data_4 %>% arrange(desc(TOTALDMG))
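The conversion to dollars depends on the type of the code columns. If PROPDMGEXP or CROPDMGEXP is read as a factor (the read.csv() default before R 4.0.0), mapvalues() relabels the factor levels, and as.numeric() on that factor returns the internal level codes instead of the multipliers. A short sketch on a made-up factor reproduces the pitfall and the usual fix:

```r
# Made-up multipliers stored as a factor, as mapvalues() leaves a factor input
f <- factor(c("1000", "1e+06", "1000"))

wrong <- as.numeric(f)                 # level codes: 1 2 1
right <- as.numeric(as.character(f))   # actual values: 1000 1e+06 1000
wrong
right
```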

Tornadoes also cause the greatest economic damage (property plus crop), followed by FLASH FLOOD, TSTM WIND, HAIL and FLOOD.

ggplot(data = head(data_4, 5), aes(x = reorder(EVTYPE, -TOTALDMG),
                                   y = TOTALDMG, fill = TOTALDMG)) +
  geom_col() +
  labs(x = "Event Type",
       y = "Property + Crop Damage (USD)",
       title = "Economic consequences") +
  scale_fill_gradient("Economic Damage", low = "grey", high = "black") +
  theme(axis.text.x = element_text(size = 6.35))

Results

  1. Population health: tornadoes cause more than ten times as many fatalities and injuries as any other event type; excessive heat, TSTM WIND, flood and lightning complete the top 5.

  2. Economic consequences: tornadoes also cause the most property and crop damage, followed by flash flood, TSTM WIND, hail and flood.