This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
My data analysis here will address the following two primary questions:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
# getting data into R
library(dplyr)   # select(), group_by(), summarise(), arrange() and %>%
library(ggplot2) # bar plots
setwd("C:\\") # setting up the working directory (Windows path; adjust for your system)
if(!file.exists("./stormdata")){dir.create("./stormdata")} # creating a folder
setwd("C:\\stormdata") # moving into the new folder
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2" # setup the url
download.file(url, destfile = "./repdata_data_StormData.csv.bz2", mode = "wb") # download the compressed file in binary mode
storm = read.csv("repdata_data_StormData.csv.bz2") # read.csv reads the .bz2 file directly
storm = as_tibble(storm) # compact data frame for viewing (tbl_df() is deprecated in current dplyr)
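Re-running the download on every execution is slow for a file of this size. A minimal sketch of a cached download follows, assuming a hypothetical helper named `fetch_once` (not part of the original analysis) that skips the download when the destination file already exists. The `downloader` argument is also an assumption, added only so the function can be exercised without network access.

```r
# Hypothetical helper: download `url` to `dest_file` only if no local copy exists.
fetch_once <- function(url, dest_file, downloader = utils::download.file) {
  dest_dir <- dirname(dest_file)
  # Create the target folder if it is missing.
  if (!dir.exists(dest_dir)) dir.create(dest_dir, recursive = TRUE)
  # Skip the download when a local copy is already present.
  if (!file.exists(dest_file)) {
    downloader(url, destfile = dest_file, mode = "wb")
  }
  invisible(dest_file)
}

# Possible usage (commented out so the sketch does not hit the network):
# path <- fetch_once(url, file.path("stormdata", "repdata_data_StormData.csv.bz2"))
# storm <- read.csv(path)
```

Using `file.path()` with a relative folder also avoids the two `setwd()` calls, which tie the script to one machine's drive layout.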
popdamage = select(storm, EVTYPE, FATALITIES, INJURIES) # columns 8, 23 and 24 of the raw data, selected by name
popdamage = popdamage %>% group_by(EVTYPE) %>% summarise(fatality_total = sum(FATALITIES), injury_total = sum(INJURIES))
dim(popdamage) # 985 unique events in storm data
## [1] 985 3
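Some of the 985 labels may differ only in letter case or spacing, which would split one event type across several groups. The sketch below shows one possible normalisation step on a toy vector. `normalize_evtype` is a hypothetical helper and the toy labels are invented for illustration; this step is not applied in the analysis here, so the figures reported in this document are unaffected.

```r
# Hypothetical helper: upper-case, trim, and collapse repeated whitespace.
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  gsub("\\s+", " ", x)
}

# Toy labels (invented for illustration, not taken from the storm data).
raw <- c("Tornado", " TORNADO", "TSTM  WIND", "tstm wind")
length(unique(raw))                    # 4 distinct raw labels
length(unique(normalize_evtype(raw)))  # 2 distinct labels after normalisation
```

This only handles case and whitespace. Merging genuinely different spellings of the same event would need an explicit mapping table.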
byfatality = popdamage %>% arrange(desc(fatality_total),desc(injury_total))
top5fatality = byfatality[1:5,]
## Source: local data frame [5 x 3]
##
## EVTYPE fatality_total injury_total
## (fctr) (dbl) (dbl)
## 1 TORNADO 5633 91346
## 2 EXCESSIVE HEAT 1903 6525
## 3 FLASH FLOOD 978 1777
## 4 HEAT 937 2100
## 5 LIGHTNING 816 5230
top5fatality$EVTYPE <- factor(top5fatality$EVTYPE, levels = top5fatality$EVTYPE[order(top5fatality$fatality_total, decreasing = TRUE)])
plot1 = ggplot(top5fatality, aes(x=EVTYPE, y=fatality_total))+ # bar colours are set in geom_bar below
  geom_bar(stat = "identity", fill = 1:5)+
  labs(x="Event Type",y="Total Fatalities",title="Top 5 event types by total fatalities")+
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) # right-align the angled labels under their bars
plot1
# % of total fatalities accounted for by the top 5 events:
top5fatality_percentage = round((sum(top5fatality$fatality_total)/sum(popdamage$fatality_total))*100,1)
therestfatality_percentage = 100 - top5fatality_percentage
x = c(top5fatality_percentage, therestfatality_percentage)
names(x) = c("Fatalities_%_by_Top5","Fatalities_%_by_theRest")
x
## Fatalities_%_by_Top5 Fatalities_%_by_theRest
## 67.8 32.2
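The same "share of the total held by the top 5" calculation is repeated for fatalities, injuries, and economic damage. As a sketch, it could be wrapped in a small function. `top_n_share` is a hypothetical name, and the toy totals below are invented for illustration only.

```r
# Hypothetical helper: percentage of the overall total held by the n largest values.
top_n_share <- function(totals, n = 5) {
  totals <- sort(totals, decreasing = TRUE)
  round(sum(head(totals, n)) / sum(totals) * 100, 1)
}

# Toy totals (invented): the two largest values hold 70 of 100.
toy <- c(TORNADO = 50, HEAT = 20, FLOOD = 15, WIND = 10, HAIL = 5)
top_n_share(toy, n = 2)  # 70
```

With the real data this would be called as `top_n_share(popdamage$fatality_total)`, assuming `popdamage` has been built as in the code earlier in this document.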
byinjury = popdamage %>% arrange(desc(injury_total),desc(fatality_total))
top5injury = byinjury[1:5,]
## Source: local data frame [5 x 3]
##
## EVTYPE fatality_total injury_total
## (fctr) (dbl) (dbl)
## 1 TORNADO 5633 91346
## 2 TSTM WIND 504 6957
## 3 FLOOD 470 6789
## 4 EXCESSIVE HEAT 1903 6525
## 5 LIGHTNING 816 5230
top5injury$EVTYPE <- factor(top5injury$EVTYPE, levels = top5injury$EVTYPE[order(top5injury$injury_total, decreasing = TRUE)])
plot2 = ggplot(top5injury, aes(x=EVTYPE, y=injury_total))+ # bar colours are set in geom_bar below
  geom_bar(stat = "identity", fill = 1:5)+
  labs(x="Event Type",y="Total Injuries",title="Top 5 event types by total injuries")+
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) # right-align the angled labels under their bars
plot2
# % of total injuries accounted for by the top 5 events:
top5injury_percentage = round((sum(top5injury$injury_total)/sum(popdamage$injury_total))*100,1)
therestinjury_percentage = 100 - top5injury_percentage
y = c(top5injury_percentage, therestinjury_percentage)
names(y) = c("Injuries_%_by_Top5","Injuries_%_by_theRest")
y
## Injuries_%_by_Top5 Injuries_%_by_theRest
## 83.1 16.9
# common events in top 5 Fatalities and Injuries:
common = which(top5fatality$EVTYPE %in% top5injury$EVTYPE)
common
## [1] 1 2 5
top3 = as.vector(top5fatality$EVTYPE[common])
top3
## [1] "TORNADO" "EXCESSIVE HEAT" "LIGHTNING"
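The overlap between the two top-5 lists can also be computed directly with base R's `intersect()`, which avoids the intermediate index vector. The sketch below uses the event names from the two tables in this document; the variable names are chosen here for illustration.

```r
# Event types copied from the two top-5 tables in this document.
top5_fatality_types <- c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING")
top5_injury_types   <- c("TORNADO", "TSTM WIND", "FLOOD", "EXCESSIVE HEAT", "LIGHTNING")

# intersect() keeps the values present in both vectors, in the order of the first.
common_types <- intersect(top5_fatality_types, top5_injury_types)
common_types
```

Because `intersect()` works on values rather than positions, it does not depend on the factor releveling done for the plots.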
ecodamage = select(storm, EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) # columns 8 and 25 to 28 of the raw data, selected by name
ecodamage$PROPDMGEXP = as.character(ecodamage$PROPDMGEXP)
ecodamage$CROPDMGEXP = as.character(ecodamage$CROPDMGEXP)
# Keeping only rows where BOTH the property and crop exponents are recognised (H, K, M, B);
# rows with any other exponent, or a blank one, in either column are dropped.
ecodamage = ecodamage[ecodamage$PROPDMGEXP %in% c("h","H","k","K","m","M","b","B"),]
ecodamage = ecodamage[ecodamage$CROPDMGEXP %in% c("h","H","k","K","m","M","b","B"),]
# conversions for damage expressions (H = hundreds, K = thousands, M = millions, B = billions):
ecodamage$PROPDMGEXP = with(ecodamage, ifelse(PROPDMGEXP %in% c("h","H"), 10^2, ifelse(PROPDMGEXP %in% c("k","K"), 10^3, ifelse(PROPDMGEXP %in% c("m","M"), 10^6, ifelse(PROPDMGEXP %in% c("b","B"), 10^9, NA)))))
ecodamage$CROPDMGEXP = with(ecodamage, ifelse(CROPDMGEXP %in% c("h","H"), 10^2, ifelse(CROPDMGEXP %in% c("k","K"), 10^3, ifelse(CROPDMGEXP %in% c("m","M"), 10^6, ifelse(CROPDMGEXP %in% c("b","B"), 10^9, NA)))))
# adding new column for total damages:
ecodamage$TotalDamages_inMillions = ((ecodamage$PROPDMG*ecodamage$PROPDMGEXP) + (ecodamage$CROPDMG*ecodamage$CROPDMGEXP))/(10^6)
ecodamage = ecodamage %>% group_by(EVTYPE) %>% summarise(TotalDamages_inMillions = round(sum(TotalDamages_inMillions),3)) %>% arrange(desc(TotalDamages_inMillions))
top5ecodamage = ecodamage[1:5,]
## Source: local data frame [5 x 2]
##
## EVTYPE TotalDamages_inMillions
## (fctr) (dbl)
## 1 FLOOD 138007.45
## 2 HURRICANE/TYPHOON 29348.17
## 3 TORNADO 16520.15
## 4 HURRICANE 12405.27
## 5 RIVER FLOOD 10108.37
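The exponent conversion earlier in this section relies on nested `ifelse()` calls, which are easy to get wrong as more codes are added. A lookup table is one alternative. The sketch below assumes a hypothetical helper named `exp_multiplier` and treats H, K, M, and B as hundreds, thousands, millions, and billions; any other code maps to `NA`.

```r
# Hypothetical helper: map damage exponent codes to numeric multipliers.
exp_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  # Indexing a named vector by an unknown name yields NA, so unrecognised codes become NA.
  unname(lookup[toupper(as.character(code))])
}

exp_multiplier(c("k", "M", "B", "h", "?"))
# Toy example (invented values): 2.5 with code "K" plus 1 with code "M", in dollars.
sum(c(2.5, 1) * exp_multiplier(c("K", "M")))
```

A column could then be converted in one step, for example `ecodamage$PROPDMG * exp_multiplier(ecodamage$PROPDMGEXP)`, provided it is applied to the original character codes before they are overwritten.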
top5ecodamage$EVTYPE <- factor(top5ecodamage$EVTYPE, levels = top5ecodamage$EVTYPE[order(top5ecodamage$TotalDamages_inMillions, decreasing = TRUE)])
plot3 = ggplot(top5ecodamage, aes(x=EVTYPE, y=TotalDamages_inMillions))+ # bar colours are set in geom_bar below
  geom_bar(stat = "identity", fill = 1:5)+
  labs(x="Event Type",y="Total Damages (in Millions of $)",title="Top 5 event types by total economic damages")+
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) # right-align the angled labels under their bars
plot3
# % of total economic damages accounted for by the top 5 events:
top5eco_percentage = round((sum(top5ecodamage$TotalDamages_inMillions)/sum(ecodamage$TotalDamages_inMillions))*100,1)
theresteco_percentage = 100 - top5eco_percentage
z = c(top5eco_percentage, theresteco_percentage)
names(z) = c("Economical_Damage_%_by_Top5","Economical_Damage_%_by_theRest")
z
## Economical_Damage_%_by_Top5 Economical_Damage_%_by_theRest
## 78.9 21.1
There are 985 unique event types in the storm data. The two goals are to identify which of these are most harmful to population health, in terms of fatalities and injuries, and which have the greatest economic consequences.
The top 5 event types causing the highest number of fatalities are:
## Source: local data frame [5 x 3]
##
## EVTYPE fatality_total injury_total
## (fctr) (dbl) (dbl)
## 1 TORNADO 5633 91346
## 2 EXCESSIVE HEAT 1903 6525
## 3 FLASH FLOOD 978 1777
## 4 HEAT 937 2100
## 5 LIGHTNING 816 5230
The top 5 event types account for 67.8% of all fatalities.
The top 5 event types causing the highest number of injuries are:
## Source: local data frame [5 x 3]
##
## EVTYPE fatality_total injury_total
## (fctr) (dbl) (dbl)
## 1 TORNADO 5633 91346
## 2 TSTM WIND 504 6957
## 3 FLOOD 470 6789
## 4 EXCESSIVE HEAT 1903 6525
## 5 LIGHTNING 816 5230
The top 5 event types account for 83.1% of all injuries.
The three event types most harmful to population health, appearing in the top 5 for both fatalities and injuries in the US storm data from 1950 to November 2011, are:
1. TORNADO
2. EXCESSIVE HEAT
3. LIGHTNING
TORNADO ranks no. 1 in both the fatality and the injury list.
The top 5 event types causing the highest economic damages, measured as property damage plus crop damage, are:
## Source: local data frame [5 x 2]
##
## EVTYPE TotalDamages_inMillions
## (fctr) (dbl)
## 1 FLOOD 138007.45
## 2 HURRICANE/TYPHOON 29348.17
## 3 TORNADO 16520.15
## 4 HURRICANE 12405.27
## 5 RIVER FLOOD 10108.37
FLOOD is no. 1 in the list, with total damages of about $138,007 million (roughly $138 billion).
The top 5 event types account for 78.9% of the economic damages in the records retained for this analysis.