This analysis uses storm data from the National Weather Service in the United States to show which weather events have the greatest effect on the population and the economy. To examine the effect on the population, the dataset is aggregated by the number of injuries and fatalities per event type. The data show that tornadoes have the greatest effect. As for the economy, the dataset is used to extract the economic cost of different weather events in US dollars. It appears that floods have the highest economic cost.
To answer these questions, first load the data from the CSV file (the needed libraries are loaded where they are first used):
data <- read.csv("repdata-data-StormData.csv")
Next, sum the injuries for each event type, sort in descending order, and keep the top 10.
injury_data <- aggregate(data$INJURIES, list(type = data$EVTYPE), sum)
injury_data <- injury_data[order(-injury_data$x),]
injury_data <- injury_data[1:10,]
Then sum the fatalities for each event type, sort in descending order, and keep the top 10.
fatality_data <- aggregate(data$FATALITIES, list(type = data$EVTYPE), sum)
fatality_data <- fatality_data[order(-fatality_data$x),]
fatality_data <- fatality_data[1:10,]
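The two aggregations above follow the same pattern, so they can also be done in one pass. The following is a minimal sketch on a made-up toy data frame (the `toy` values and the `harm` and `top_injuries` names are illustrative assumptions, not part of the original analysis); it uses the formula interface of `aggregate()` to sum both columns at once.

```r
# Toy stand-in for the storm data; the values are made up for illustration.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HEAT"),
  INJURIES   = c(10, 2, 5, 3, 0, 1),
  FATALITIES = c(1, 0, 2, 4, 1, 0)
)

# Sum both harm columns per event type in a single aggregate() call.
harm <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank by injuries, largest first, and keep the top rows.
# head() returns at most n rows, so it never pads the result with NA rows.
top_injuries <- head(harm[order(-harm$INJURIES), ], n = 2)
print(top_injuries)
```

With the real data, the same call would use `data = data` and `n = 10`. One reason to prefer `head()` over `[1:10, ]` is that the latter yields rows of `NA` when fewer than 10 groups exist.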
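To answer the economic question, we extract the total cost in US dollars from the dataset. First, the exponent codes in PROPDMGEXP and CROPDMGEXP (a mix of letters, digits, and symbols) are converted to numeric multipliers. The damage values are then multiplied by these and divided by 10^9, so that PROPDMGTOTAL and CROPDMGTOTAL are expressed in billions of dollars.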
library(plyr)
unique(data$PROPDMGEXP)
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
data$PROPDMGEXP <- mapvalues(data$PROPDMGEXP, from =
c("K", "M","", "B", "m", "+", "0", "5", "6", "?", "4", "2", "3", "h", "7", "H", "-", "1", "8"), to =
c(10^3, 10^6, 1, 10^9, 10^6, 0,1,10^5, 10^6, 0, 10^4, 10^2, 10^3, 10^2, 10^7, 10^2, 0, 10, 10^8))
data$PROPDMGEXP <- as.numeric(as.character(data$PROPDMGEXP))
PROPDMGTOTAL <- (data$PROPDMG * data$PROPDMGEXP)/1000000000
unique(data$CROPDMGEXP)
## [1] M K m B ? 0 k 2
## Levels: ? 0 2 B k K m M
data$CROPDMGEXP <- mapvalues(data$CROPDMGEXP,
from = c("","M", "K", "m", "B", "?", "0", "k","2"),
to = c(1,10^6, 10^3, 10^6, 10^9, 0, 1, 10^3, 10^2))
data$CROPDMGEXP <- as.numeric(as.character(data$CROPDMGEXP))
CROPDMGTOTAL <- (data$CROPDMG * data$CROPDMGEXP) / 1000000000
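As a cross-check on the conversion above, the same mapping can be written as a small lookup in base R. This is only a sketch: `exp_lookup` and `to_dollars` are hypothetical names introduced here, and the rule that unlisted codes such as "?", "+", and "-" count as 0 simply mirrors the choices made in the `mapvalues` calls.

```r
# Multipliers for the letter codes, mirroring the mapping used above.
exp_lookup <- c(
  "K" = 1e3, "k" = 1e3, "M" = 1e6, "m" = 1e6, "B" = 1e9,
  "H" = 1e2, "h" = 1e2
)

# Hypothetical helper: convert damage values and exponent codes to dollars.
to_dollars <- function(value, code) {
  code <- as.character(code)              # works for factors and characters
  mult <- rep(0, length(code))            # assumption: unknown codes count as 0
  known <- code %in% names(exp_lookup)
  mult[known] <- exp_lookup[code[known]]
  digit <- grepl("^[0-8]$", code)         # a digit d means a multiplier of 10^d
  mult[digit] <- 10^as.numeric(code[digit])
  mult[code %in% ""] <- 1                 # blank code: value is already in dollars
  value * mult
}

dollars <- to_dollars(c(25, 2.5, 1, 3, 7), c("K", "M", "B", "", "?"))
billions <- dollars / 1e9
```

For example, a value of 25 with code "K" is 25 * 10^3 = 25,000 dollars, which is 0.000025 billion dollars, while a value with code "?" contributes nothing.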
The property and crop costs are then added together, summed by event type, and sorted so that the top 10 weather events by total cost are kept.
DMG <- data.frame(EVTYPE = data$EVTYPE, PROP = PROPDMGTOTAL, CROP = CROPDMGTOTAL)
DMG$TOTALDMGBILLION <- DMG$PROP + DMG$CROP
DMGTYPE <- aggregate(DMG$TOTALDMGBILLION, list(TYPE = DMG$EVTYPE), sum)
DMGTYPE <- DMGTYPE[order(-DMGTYPE$x),]
DMGTYPE <- DMGTYPE[1:10,]
Plot of the Total Count of Injuries:
library(ggplot2)
ggplot(injury_data, aes(type, x, group = 1))+geom_point() + geom_line() + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + labs(x = "Event Type", y = "Count", title = "Top 10 Events Causing Injuries")
It clearly shows that tornadoes have the greatest effect with regard to the total number of injuries.
Plot of the total count of Fatalities:
ggplot(fatality_data, aes(type, x, group = 1))+geom_point() + geom_line() + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + labs(x = "Event Type", y = "Count", title = "Top 10 Events Causing Fatalities")
It clearly shows that tornadoes have the greatest effect with regard to the total number of fatalities.
Plot of the total economic damage:
ggplot(DMGTYPE, aes(TYPE, x, group = 1)) + geom_point() + geom_line() + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + labs(x = "Event Type", y = "Total Damage in Billions of Dollars", title = "Top 10 Events Causing The Most Economic Damage in Billions of Dollars")
It appears that floods have the highest total economic cost.
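In the plots above, ggplot2 places the event types along the axis in factor-level (alphabetical) order rather than by rank. One possible way to make the ranking visible is to reorder the levels by the summed value before plotting. The sketch below uses a made-up `toy_top` table with the same column names as `injury_data`; the bar geometry is a substitution suggested here, not the original author's choice.

```r
# Toy top-3 table with the same column names as injury_data above.
# The counts are made up for illustration.
toy_top <- data.frame(
  type = c("FLOOD", "TORNADO", "HEAT"),
  x    = c(20, 150, 60)
)

# Order the factor levels by descending count so the axis follows the ranking.
toy_top$type <- reorder(toy_top$type, -toy_top$x)
levels(toy_top$type)

# Build the plot only if ggplot2 is installed; bars suit unordered categories.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy_top, aes(type, x)) +
    geom_col() +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
    labs(x = "Event Type", y = "Count", title = "Events ordered by count")
}
```

Calling `print(p)` draws the chart. The same `reorder()` step could be applied to `injury_data$type`, `fatality_data$type`, and `DMGTYPE$TYPE` before the existing plotting calls.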