Storm Data is an official publication of the National Oceanic and Atmospheric Administration (NOAA) that documents the occurrence of storms and other significant weather phenomena having sufficient intensity to cause loss of life, injuries, significant property damage, and/or disruption to commerce. It includes information about each weather event, such as when and where it occurred, and estimates of any fatalities, injuries and property damage. This analysis focuses on visualizing the quantitative impact that various types of severe weather had on two major aspects of people’s lives in the affected areas, and it is split into two parts. The first measures the impact of various types of weather-related events on the number of fatalities and injuries, while the second gives a view of the economic impact (expressed in USD) that such events had in the period covered by the data set.
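The questions this analysis aims to answer are:
1. Which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Which types of events have the greatest economic consequences?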
The Storm Data file that is the subject of this analysis can be found here
The Storm Data documentation is available here
Downloading the Storm Data file and reading it into R
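# Download the compressed Storm Data file only if it is not already present
if (!file.exists("./2FStormData.csv.bz2")) {
  url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(url, destfile = "./2FStormData.csv.bz2", mode = "wb", method = "curl")
}
# read.csv() reads the .bz2-compressed file directly, no manual decompression needed
df <- read.csv("./2FStormData.csv.bz2", stringsAsFactors = FALSE)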
Loading required packages
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(knitr)
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 3.2.4
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
Data set dimensions
dim(df)
## [1] 902297 37
Summary of data set
summary(df)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE
## Min. : 1.0 Length:902297 Length:902297 Length:902297
## 1st Qu.:19.0 Class :character Class :character Class :character
## Median :30.0 Mode :character Mode :character Mode :character
## Mean :31.2
## 3rd Qu.:45.0
## Max. :95.0
##
## COUNTY COUNTYNAME STATE EVTYPE
## Min. : 0.0 Length:902297 Length:902297 Length:902297
## 1st Qu.: 31.0 Class :character Class :character Class :character
## Median : 75.0 Mode :character Mode :character Mode :character
## Mean :100.6
## 3rd Qu.:131.0
## Max. :873.0
##
## BGN_RANGE BGN_AZI BGN_LOCATI
## Min. : 0.000 Length:902297 Length:902297
## 1st Qu.: 0.000 Class :character Class :character
## Median : 0.000 Mode :character Mode :character
## Mean : 1.484
## 3rd Qu.: 1.000
## Max. :3749.000
##
## END_DATE END_TIME COUNTY_END COUNTYENDN
## Length:902297 Length:902297 Min. :0 Mode:logical
## Class :character Class :character 1st Qu.:0 NA's:902297
## Mode :character Mode :character Median :0
## Mean :0
## 3rd Qu.:0
## Max. :0
##
## END_RANGE END_AZI END_LOCATI
## Min. : 0.0000 Length:902297 Length:902297
## 1st Qu.: 0.0000 Class :character Class :character
## Median : 0.0000 Mode :character Mode :character
## Mean : 0.9862
## 3rd Qu.: 0.0000
## Max. :925.0000
##
## LENGTH WIDTH F MAG
## Min. : 0.0000 Min. : 0.000 Min. :0.0 Min. : 0.0
## 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.:0.0 1st Qu.: 0.0
## Median : 0.0000 Median : 0.000 Median :1.0 Median : 50.0
## Mean : 0.2301 Mean : 7.503 Mean :0.9 Mean : 46.9
## 3rd Qu.: 0.0000 3rd Qu.: 0.000 3rd Qu.:1.0 3rd Qu.: 75.0
## Max. :2315.0000 Max. :4400.000 Max. :5.0 Max. :22000.0
## NA's :843563
## FATALITIES INJURIES PROPDMG
## Min. : 0.0000 Min. : 0.0000 Min. : 0.00
## 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.00
## Median : 0.0000 Median : 0.0000 Median : 0.00
## Mean : 0.0168 Mean : 0.1557 Mean : 12.06
## 3rd Qu.: 0.0000 3rd Qu.: 0.0000 3rd Qu.: 0.50
## Max. :583.0000 Max. :1700.0000 Max. :5000.00
##
## PROPDMGEXP CROPDMG CROPDMGEXP
## Length:902297 Min. : 0.000 Length:902297
## Class :character 1st Qu.: 0.000 Class :character
## Mode :character Median : 0.000 Mode :character
## Mean : 1.527
## 3rd Qu.: 0.000
## Max. :990.000
##
## WFO STATEOFFIC ZONENAMES LATITUDE
## Length:902297 Length:902297 Length:902297 Min. : 0
## Class :character Class :character Class :character 1st Qu.:2802
## Mode :character Mode :character Mode :character Median :3540
## Mean :2875
## 3rd Qu.:4019
## Max. :9706
## NA's :47
## LONGITUDE LATITUDE_E LONGITUDE_ REMARKS
## Min. :-14451 Min. : 0 Min. :-14455 Length:902297
## 1st Qu.: 7247 1st Qu.: 0 1st Qu.: 0 Class :character
## Median : 8707 Median : 0 Median : 0 Mode :character
## Mean : 6940 Mean :1452 Mean : 3509
## 3rd Qu.: 9605 3rd Qu.:3549 3rd Qu.: 8735
## Max. : 17124 Max. :9706 Max. :106220
## NA's :40
## REFNUM
## Min. : 1
## 1st Qu.:225575
## Median :451149
## Mean :451149
## 3rd Qu.:676723
## Max. :902297
##
List of variables
names(df)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
After reviewing all of the variables available in the data set, two of them, FATALITIES and INJURIES, were selected as the measures of harm to population health.
From the original data set, we select three variables (EVTYPE, FATALITIES, INJURIES). We then add FATALITIES and INJURIES together in a new Fatalities_and_Injuries column and sort the rows by that column in descending order. The result is stored in the df_x data frame.
df_x <- df %>%
select(EVTYPE, FATALITIES, INJURIES) %>%
mutate(Fatalities_and_Injuries = FATALITIES + INJURIES) %>%
arrange(desc(Fatalities_and_Injuries))
In the next step, we aggregate the Fatalities_and_Injuries variable from df_x by EVTYPE using the sum function. We then sort the results in descending order, keep the top 15 observations and store them in the df_y data frame.
df_y <- aggregate(x=df_x$Fatalities_and_Injuries, by = list(df_x$EVTYPE ), FUN = sum) %>%
select(EVTYPE = Group.1, Fatalities_and_Injuries = x) %>%
arrange( desc(Fatalities_and_Injuries)) %>%
head(15)
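To make the two Part I steps concrete, here is a minimal sketch on a tiny made-up data frame (the `toy`, `agg` and `top` names and all values are hypothetical, not taken from Storm Data). It shows that `aggregate()` names its output columns `Group.1` and `x`, which is why the pipeline above renames them, and it keeps the top 2 rows instead of 15.

```r
# Toy stand-in for the FATALITIES/INJURIES columns of df (values are made up)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(2, 0, 1, 0, 0),
  INJURIES   = c(10, 1, 3, 4, 0),
  stringsAsFactors = FALSE
)

# Per-row total, same as the mutate() step that builds df_x
toy$Fatalities_and_Injuries <- toy$FATALITIES + toy$INJURIES

# aggregate() returns columns named Group.1 (the grouping) and x (the sum)
agg <- aggregate(x = toy$Fatalities_and_Injuries, by = list(toy$EVTYPE), FUN = sum)
names(agg) <- c("EVTYPE", "Fatalities_and_Injuries")

# Sort in descending order and keep the top N event types (N = 2 here)
top <- head(agg[order(-agg$Fatalities_and_Injuries), ], 2)
print(top)
```

With these toy rows, TORNADO (16) ranks first and FLOOD (4) second. An equivalent dplyr formulation would use `group_by(EVTYPE)` followed by `summarise()`, which avoids the renaming step.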
The df_y data frame is the final data processing step for Part I, which identifies the types of events that are most harmful with respect to population health.
# Bar chart of combined fatalities and injuries for the top 15 event types
g <- ggplot(df_y, aes(EVTYPE, Fatalities_and_Injuries, color = EVTYPE, fill = EVTYPE))
g + geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ggtitle("Combined fatalities and injuries by event type")
According to the results visualised above, the top three events that are most harmful with respect to population health are: 1. TORNADO 2. EXCESSIVE HEAT 3. TSTM WIND
For Part II, after reviewing all of the variables available in the data set, two of them, PROPDMG and CROPDMG (property and crop damage), were selected as the measures of economic consequences. These amounts are recorded together with the magnitude codes stored in PROPDMGEXP and CROPDMGEXP.
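The plotting code that follows expects a combined_dmg data frame with EVTYPE and DAMAGEUSD columns, but the steps that build it from PROPDMG/PROPDMGEXP and CROPDMG/CROPDMGEXP are not shown in this report. The sketch below is one possible way to build it and is not the author's method. It assumes that the codes H, K, M and B (in any case) mean hundreds, thousands, millions and billions of USD and treats every other code (blank, digits, "+", "-", "?") as a multiplier of 1. The helpers `exp_to_multiplier` and `build_combined_dmg` are hypothetical names, and the toy rows are made up.

```r
# Hypothetical helper: convert a PROPDMGEXP/CROPDMGEXP code to a USD multiplier.
# Assumption: H/K/M/B (any case) = 1e2/1e3/1e6/1e9; all other codes count as 1.
exp_to_multiplier <- function(exp_code) {
  code <- toupper(trimws(exp_code))
  mult <- rep(1, length(code))
  mult[code == "H"] <- 1e2
  mult[code == "K"] <- 1e3
  mult[code == "M"] <- 1e6
  mult[code == "B"] <- 1e9
  mult
}

# Hypothetical helper: total property + crop damage in USD per EVTYPE,
# sorted in descending order, keeping the top n event types.
build_combined_dmg <- function(data, n = 15) {
  usd <- data$PROPDMG * exp_to_multiplier(data$PROPDMGEXP) +
         data$CROPDMG * exp_to_multiplier(data$CROPDMGEXP)
  out <- aggregate(usd, by = list(data$EVTYPE), FUN = sum)
  names(out) <- c("EVTYPE", "DAMAGEUSD")
  head(out[order(-out$DAMAGEUSD), ], n)
}

# Toy rows with made-up values, only to exercise the conversion
toy <- data.frame(
  EVTYPE     = c("FLOOD", "HAIL", "FLOOD", "TORNADO"),
  PROPDMG    = c(1.5, 250, 2, 10),
  PROPDMGEXP = c("B", "K", "m", ""),
  CROPDMG    = c(0, 5, 0, 0),
  CROPDMGEXP = c("", "M", "", "K"),
  stringsAsFactors = FALSE
)
combined_dmg <- build_combined_dmg(toy, n = 2)
print(combined_dmg)
```

Applied to the full data set, `build_combined_dmg(df)` would return a data frame in the shape the plotting code uses. The resulting ranking depends on how the exponent codes are interpreted, so a different mapping can produce a different top three.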
combined_dmg_plot <- ggplot(combined_dmg, aes(EVTYPE, DAMAGEUSD, color = EVTYPE, fill = EVTYPE)) +
geom_bar(stat = "identity") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
ggtitle("Combined property and crop related damages in USD")
combined_dmg_plot
According to the results visualised above, the top three events that have the greatest economic consequences are: 1. FLOOD 2. HURRICANE/TYPHOON 3. HAIL