Synopsis

This analysis uses the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to address two major questions:

-Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

-Across the United States, which types of events have the greatest economic consequences?

We analyze the storm database and draw a conclusion for each of the two questions above.

This analysis shows that the most harmful events are TORNADOES, which injured 91346 people and killed 5633, a total of 96979. EXCESSIVE HEAT comes second with a total of 8428. The graphs below show this.

This analysis also shows that the events with the greatest economic consequences across the United States are FLOODS, with about 150 billion dollars in damages, followed by HURRICANES and TYPHOONS with nearly 72 billion dollars.

Data Processing

The following code was used to read the data from the file into R:

library(plyr)     # revalue() and ddply()
library(knitr)
library(ggplot2)  # bar charts
library(grid)
# read.csv() reads the bzip2-compressed file directly
data <- read.csv("repdata_data_StormData.csv.bz2")
head(data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6

The following code was used to transform the data to answer the questions. This step is needed for the second question: the exponent codes K, M and B in the damage columns are replaced by their numeric multipliers (thousand, million and billion).

# Replace the exponent codes with their numeric multipliers (still stored as text)
data$PROPDMGEXP <- revalue(data$PROPDMGEXP, c("K"="1000", "M"="1000000", "B"="1000000000"))
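
Note that the code above only revalues PROPDMGEXP. CROPDMGEXP is later passed to as.numeric() unchanged, so its K, M and B codes become NA. A minimal sketch of one way to map both columns is shown below. It uses a named lookup vector in base R instead of revalue(); the helper name exp_to_multiplier and the toy data frame are hypothetical, and treating lower-case codes like upper-case ones (and anything else as NA) is an assumption, not something the storm database documentation is quoted on here.

```r
# Hypothetical helper (not part of the original analysis): map exponent
# codes to numeric multipliers with a named lookup vector.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  # Assumption: lower-case codes mean the same as upper-case ones.
  # Codes that are not K, M or B (including "") come back as NA.
  unname(lookup[toupper(as.character(code))])
}

# Toy rows standing in for the CROPDMG / CROPDMGEXP columns.
toy <- data.frame(CROPDMG    = c(10, 2.5, 1, 5),
                  CROPDMGEXP = c("K", "m", "B", ""),
                  stringsAsFactors = FALSE)
toy$total_cropdmgexp <- toy$CROPDMG * exp_to_multiplier(toy$CROPDMGEXP)
print(toy)
```

The same helper could be applied to PROPDMGEXP, which would make the two damage columns follow one rule.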

Next we compute the total damage and the total injuries and fatalities for each event type:

data$PROPDMG <- as.numeric(data$PROPDMG)
data$PROPDMGEXP <- as.numeric(data$PROPDMGEXP)
## Warning: NAs introduced by coercion
data$CROPDMG <- as.numeric(data$CROPDMG)
data$CROPDMGEXP <- as.numeric(data$CROPDMGEXP)
## Warning: NAs introduced by coercion
data$FATALITIES<-as.numeric(data$FATALITIES)
data$INJURIES<-as.numeric(data$INJURIES)
data$total_propdmgexp <-data$PROPDMG * data$PROPDMGEXP
data$total_cropdmgexp <-data$CROPDMG * data$CROPDMGEXP
data$total_exp <- rowSums(data[,c("total_propdmgexp", "total_cropdmgexp")], na.rm=TRUE)
summary <- ddply(data,.(EVTYPE), summarize, propdamage = sum(total_exp), injuries= sum(INJURIES), fatalities = sum(FATALITIES), persdamage = sum(INJURIES)+sum(FATALITIES))
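
To make the two key steps above easier to follow, here is a small sketch on made-up rows. It shows how rowSums() with na.rm=TRUE treats missing damage values as zero, and how the per-event totals are formed. It uses base R aggregate() as a stand-in for ddply() so that it runs without plyr; the toy data frame and its values are purely illustrative.

```r
# Made-up rows; NA marks a damage value whose exponent could not be converted.
toy <- data.frame(
  EVTYPE           = c("TORNADO", "FLOOD", "TORNADO", "FLOOD"),
  total_propdmgexp = c(25000, NA, 2500, 1e6),
  total_cropdmgexp = c(NA, NA, 500, 0),
  INJURIES         = c(15, 0, 2, 1),
  FATALITIES       = c(1, 0, 0, 2),
  stringsAsFactors = FALSE
)

# With na.rm = TRUE an NA counts as 0, so a row with both values missing sums to 0.
toy$total_exp <- rowSums(toy[, c("total_propdmgexp", "total_cropdmgexp")],
                         na.rm = TRUE)

# Sum each numeric column within every event type (base R stand-in for ddply).
toy_summary <- aggregate(cbind(total_exp, INJURIES, FATALITIES) ~ EVTYPE,
                         data = toy, FUN = sum)
toy_summary$persdamage <- toy_summary$INJURIES + toy_summary$FATALITIES
print(toy_summary)
```

One consequence worth keeping in mind is that rows with an unconvertible exponent contribute nothing to the total rather than raising an error.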

After summarizing the data, we sort it in decreasing order of combined fatalities and injuries, so that we can see which EVTYPE values are most harmful with respect to population health.

summary <- summary[order(summary$persdamage, decreasing = TRUE),]
head(summary)
##             EVTYPE   propdamage injuries fatalities persdamage
## 834        TORNADO  56925660991    91346       5633      96979
## 130 EXCESSIVE HEAT      7753700     6525       1903       8428
## 856      TSTM WIND   4484928440     6957        504       7461
## 170          FLOOD 144657709800     6789        470       7259
## 464      LIGHTNING    928659366     5230        816       6046
## 275           HEAT      1797000     2100        937       3037

Now we plot the data:

plot1 <- ggplot(data=head(summary), aes(x=EVTYPE, y= persdamage)) + geom_bar(stat='identity') + labs(x = "event type", y = "injuries and fatalities") 

print(plot1)
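
ggplot2 places a character x variable in alphabetical order, so the bars do not follow the sorted table. As a sketch of one possible refinement, reorder() from base R can set the factor levels by value before plotting. The three rows below are copied from the table above; the object names top and p are hypothetical, and the plot is only drawn if ggplot2 is installed.

```r
# Three rows copied from the summary table above.
top <- data.frame(EVTYPE     = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND"),
                  persdamage = c(96979, 8428, 7461),
                  stringsAsFactors = FALSE)

# reorder() sorts the levels by a numeric value; negating it gives descending order.
top$EVTYPE <- reorder(top$EVTYPE, -top$persdamage)
print(levels(top$EVTYPE))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # geom_col() is equivalent to geom_bar(stat = 'identity')
  p <- ggplot(top, aes(x = EVTYPE, y = persdamage)) +
    geom_col() +
    labs(x = "event type", y = "injuries and fatalities")
  print(p)
}
```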

So we can see that tornadoes are the most harmful events with respect to population health. To answer the second question, we sort the data in decreasing order of damage (the propdamage column), so that we can see which EVTYPE values have the greatest economic consequences.

summary <- summary[order(summary$propdamage, decreasing = TRUE),]
head(summary)

Now we plot the data:

plot2 <- ggplot(data=head(summary), aes(x=EVTYPE, y= propdamage)) + geom_bar(stat='identity') + labs(x = "event type", y = "property damage (in $USD)") 
print(plot2)

We can see that FLOODS are the events with the greatest economic consequences, followed by HURRICANES/TYPHOONS.

Results

The following questions were answered by the data analysis:

1. Across the United States, which types of events are most harmful with respect to population health?

The graph makes the answer clear: TORNADOES are the most harmful events for population health, with more than 90,000 people injured or killed.

2. Across the United States, which types of events have the greatest economic consequences?

FLOODS are the events with the greatest economic consequences, followed by HURRICANES/TYPHOONS.