We address two questions using the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database:
-Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
-Across the United States, which types of events have the greatest economic consequences?
The analysis below uses the storm database to draw conclusions about both questions.
The analysis shows that TORNADOES are the most harmful events for population health: they caused 96979 casualties (91346 injuries and 5633 fatalities). EXCESSIVE HEAT is second, with a total of 8428 casualties. The graphs below illustrate this.
The analysis also shows that FLOODS have the greatest economic consequences across the United States, with about 150 billion dollars in damage, followed by HURRICANES and TYPHOONS with nearly 72 billion dollars.
The following code was used to read the data from the file into R:
library(plyr)
## Warning: package 'plyr' was built under R version 4.0.2
library(knitr)
## Warning: package 'knitr' was built under R version 4.0.2
library(ggplot2)
library(grid)
data<-read.csv("repdata_data_StormData.csv.bz2")
head(data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 0 NA
## 2 0 0 NA
## 3 0 0 NA
## 4 0 0 NA
## 5 0 0 NA
## 6 0 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 14.0 100 3 0 0 15 25.0
## 2 0 2.0 150 2 0 0 0 2.5
## 3 0 0.1 123 2 0 0 2 25.0
## 4 0 0.0 100 2 0 0 2 2.5
## 5 0 0.0 150 2 0 0 2 2.5
## 6 0 1.5 177 2 0 0 6 2.5
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 3040 8812
## 2 K 0 3042 8755
## 3 K 0 3340 8742
## 4 K 0 3458 8626
## 5 K 0 3412 8642
## 6 K 0 3450 8748
## LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3051 8806 1
## 2 0 0 2
## 3 0 0 3
## 4 0 0 4
## 5 0 0 5
## 6 0 0 6
The following code transforms the data to answer the second question. The PROPDMGEXP and CROPDMGEXP columns hold the codes K, M, and B (thousands, millions, and billions); we replace them with their numeric multipliers.
data$PROPDMGEXP<- revalue(data$PROPDMGEXP,c("K"="1000", "M"="1000000", "B"="1000000000"))
data$CROPDMGEXP<- revalue(data$CROPDMGEXP,c("K"="1000", "M"="1000000", "B"="1000000000"))
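As an alternative to revalue(), the exponent codes can be mapped with a named lookup vector in base R. The sketch below is only an illustration: toy, exp_multiplier, and the sample values are assumptions, not part of the storm data, and treating lowercase codes like their uppercase counterparts is a choice the analysis above does not make.

```r
# Toy records standing in for the storm data (values are made up).
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 3),
  PROPDMGEXP = c("K", "M", "B", "k"),
  stringsAsFactors = FALSE
)

# Named lookup vector: exponent code -> numeric multiplier.
exp_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Upper-case the codes, then index the lookup by name.
# Codes that are missing from the lookup yield NA.
mult <- unname(exp_multiplier[toupper(toy$PROPDMGEXP)])

# Damage in dollars for each record.
toy$total_propdmg <- toy$PROPDMG * mult
print(toy)
```

Because the multipliers are already numeric, this approach would not need a later as.numeric() conversion of the exponent column.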
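Next we convert the relevant columns to numeric, compute the total damage for each record, and sum the damage, injuries, and fatalities by event type. Exponent codes that are still not numeric after the replacement become NA (hence the coercion warnings), and rowSums(..., na.rm=TRUE) treats them as zero.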
data$PROPDMG <- as.numeric(data$PROPDMG)
data$PROPDMGEXP <- as.numeric(data$PROPDMGEXP)
## Warning: NAs introduced by coercion
data$CROPDMG <- as.numeric(data$CROPDMG)
data$CROPDMGEXP <- as.numeric(data$CROPDMGEXP)
## Warning: NAs introduced by coercion
data$FATALITIES<-as.numeric(data$FATALITIES)
data$INJURIES<-as.numeric(data$INJURIES)
data$total_propdmgexp <-data$PROPDMG * data$PROPDMGEXP
data$total_cropdmgexp <-data$CROPDMG * data$CROPDMGEXP
data$total_exp <- rowSums(data[,c("total_propdmgexp", "total_cropdmgexp")], na.rm=TRUE)
summary <- ddply(data,.(EVTYPE), summarize, propdamage = sum(total_exp), injuries= sum(INJURIES), fatalities = sum(FATALITIES), persdamage = sum(INJURIES)+sum(FATALITIES))
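The per-event totals can also be computed without plyr. The following is a minimal sketch using base R's aggregate() on a made-up data frame; toy and totals are illustrative names and the numbers are invented.

```r
# Toy records standing in for the storm data (values are made up).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "FLOOD"),
  INJURIES   = c(15, 0, 2, 1),
  FATALITIES = c(0, 1, 3, 0),
  total_exp  = c(25000, 1e6, 2500, 5e5),
  stringsAsFactors = FALSE
)

# Sum damage, injuries and fatalities within each event type.
totals <- aggregate(
  cbind(total_exp, INJURIES, FATALITIES) ~ EVTYPE,
  data = toy,
  FUN  = sum
)

# Combined casualty count, matching the persdamage column above.
totals$persdamage <- totals$INJURIES + totals$FATALITIES
print(totals)
```

The formula interface drops rows with missing values by default, so on real data the NA handling should be settled before aggregating.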
After summarizing the data, we sort it in decreasing order of combined injuries and fatalities to see which EVTYPE values are most harmful with respect to population health.
summary <- summary[order(summary$persdamage, decreasing = TRUE),]
head(summary)
## EVTYPE propdamage injuries fatalities persdamage
## 834 TORNADO 56925660991 91346 5633 96979
## 130 EXCESSIVE HEAT 7753700 6525 1903 8428
## 856 TSTM WIND 4484928440 6957 504 7461
## 170 FLOOD 144657709800 6789 470 7259
## 464 LIGHTNING 928659366 5230 816 6046
## 275 HEAT 1797000 2100 937 3037
Now we plot the six event types with the most casualties:
plot1 <- ggplot(data=head(summary), aes(x=EVTYPE, y= persdamage)) + geom_bar(stat='identity') + labs(x = "event type", y = "injuries and fatalities")
print(plot1)
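With a character x variable, ggplot2 draws the bars in alphabetical order rather than by height. One possible way to order them by value is reorder() from base R, sketched below with the casualty totals from the table above; top and EVTYPE_ordered are illustrative names.

```r
# Casualty totals copied from the summary table above.
top <- data.frame(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND",
                 "FLOOD", "LIGHTNING", "HEAT"),
  persdamage = c(96979, 8428, 7461, 7259, 6046, 3037),
  stringsAsFactors = FALSE
)

# reorder() returns a factor whose levels are sorted by the second
# argument; negating the values puts the largest first.
top$EVTYPE_ordered <- reorder(top$EVTYPE, -top$persdamage)
print(levels(top$EVTYPE_ordered))

# In the plot this could be used as:
#   aes(x = reorder(EVTYPE, -persdamage), y = persdamage)
```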
So we can see that tornadoes are the most harmful events with respect to population health. To answer the second question, we sort the data in decreasing order of total damage to see which EVTYPE values have the greatest economic consequences.
summary <- summary[order(summary$propdamage, decreasing = TRUE),]
head(summary)
Now we plot the data:
plot2 <- ggplot(data=head(summary), aes(x=EVTYPE, y= propdamage)) + geom_bar(stat='identity') + labs(x = "event type", y = "property damage (in $USD)")
print(plot2)
We can see that FLOODS are the events with the greatest economic consequences, followed by HURRICANES/TYPHOONS.
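Dollar totals of this size are easier to read in billions. The sketch below rescales a few of the damage totals shown in the summary output earlier; damage and damage_billions are illustrative names.

```r
# Damage totals copied from the summary output above (US dollars).
damage <- c(
  TORNADO     = 56925660991,
  FLOOD       = 144657709800,
  "TSTM WIND" = 4484928440
)

# Convert to billions of dollars and round for display.
damage_billions <- round(damage / 1e9, 2)
print(sort(damage_billions, decreasing = TRUE))

# In the plot, the same scaling could be applied as:
#   aes(x = EVTYPE, y = propdamage / 1e9)
# with the y-axis label changed to say billions of USD.
```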
The following questions were answered by the data analysis:
1.Across the United States, which types of events are most harmful with respect to population health?
This question is easy to answer from the graph: TORNADOES are the most harmful to population health, with more than 90,000 people injured or killed.
2.Across the United States, which types of events have the greatest economic consequences?
FLOODS are the events with the greatest economic consequences, followed by HURRICANES/TYPHOONS.