Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
In this report, the US National Oceanic and Atmospheric Administration (NOAA) storm database is analyzed to find out which kinds of natural phenomena were the most damaging in the US between 1950 and the end of November 2011, in terms of harm to people and damage to public and private property.
That database tracks the characteristics of major storms and weather events in the US, including when and where they occur, as well as estimates of deaths, injuries, and property damage.
The steps taken in order to generate the results are as follows:
LOADING AND READING THE DATA USING download.file and read.csv
# Download the data directly from the link into ./data (skipped if already present)
if(!file.exists("./data")) {dir.create("./data")}
fileURL<-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destFile<-"./data/StormData.csv.bz2"
if(!file.exists(destFile)) {download.file(fileURL,destfile=destFile,mode="wb")}
# Read the data; read.csv decompresses .bz2 files transparently
restData<-read.csv(destFile)
The data is now loaded. The next step is to take a rough look at the structure of the data using str.
#converting the type of event to a Factor variable
restData<- transform(restData,EVTYPE=as.factor(EVTYPE))
str(restData)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
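We can now see that there are 37 variables in the raw data. For the present analysis we only need a subset of the data, essentially the columns that describe the event type, fatalities, injuries, property damage, and crop damage (together with the exponent codes for the two damage columns):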
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.0.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.0.2
tidydata <- select(restData,EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP)
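The str output earlier shows 985 distinct EVTYPE levels, the first of which (" HIGH SURF ADVISORY") starts with a space. That suggests some labels may differ only by whitespace or letter case. The sketch below is not part of the original analysis; it uses made-up labels (only the first one appears in the real output) to show how trimming and upper-casing could merge such variants before totals are computed.

```r
# Made-up labels that imitate the kind of variation EVTYPE may contain
raw_types <- c(" HIGH SURF ADVISORY", "High Surf Advisory",
               "TORNADO", "tornado ", "TORNADO")

# Strip surrounding whitespace and unify case before converting to a factor
clean_types <- toupper(trimws(raw_types))

n_raw <- length(unique(raw_types))      # distinct labels before cleaning
n_clean <- length(unique(clean_types))  # distinct labels after cleaning
print(c(raw = n_raw, clean = n_clean))
```

Whether such merging is appropriate for the real data is a judgment call, since it would change the per-event totals reported below.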
The next step is to convert the property damage and crop damage to amounts in dollars. I have chosen to store the data related to economic damage in a separate data frame called ecnomicdmg.
The web page “How To Handle Exponent Value of PROPDMGEXP and CROPDMGEXP” explains how to interpret PROPDMGEXP and CROPDMGEXP in order to calculate the property and crop damage for each row of the data.
According to that page, the possible values of CROPDMGEXP and PROPDMGEXP are: H, h, K, k, M, m, B, b, +, -, ?, 0, 1, 2, 3, 4, 5, 6, 7, 8, and the blank character. The equivalences used in this analysis are: H = 10^2 (hundreds), K = 10^3 (thousands), M = 10^6 (millions), and B = 10^9 (billions). Rows with any other value, including the lowercase letters, are left at 0 dollars:
ecnomicdmg<- select(tidydata,EVTYPE,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP)
ecnomicdmg$PROPDMGDOL = 0
ecnomicdmg[ecnomicdmg$PROPDMGEXP == "H", ]$PROPDMGDOL = ecnomicdmg[ecnomicdmg$PROPDMGEXP == "H", ]$PROPDMG * 10^2
ecnomicdmg[ecnomicdmg$PROPDMGEXP == "K", ]$PROPDMGDOL = ecnomicdmg[ecnomicdmg$PROPDMGEXP == "K", ]$PROPDMG * 10^3
ecnomicdmg[ecnomicdmg$PROPDMGEXP == "M", ]$PROPDMGDOL = ecnomicdmg[ecnomicdmg$PROPDMGEXP == "M", ]$PROPDMG * 10^6
ecnomicdmg[ecnomicdmg$PROPDMGEXP == "B", ]$PROPDMGDOL = ecnomicdmg[ecnomicdmg$PROPDMGEXP == "B", ]$PROPDMG * 10^9
ecnomicdmg$CROPDMGDOL = 0
ecnomicdmg[ecnomicdmg$CROPDMGEXP == "H", ]$CROPDMGDOL = ecnomicdmg[ecnomicdmg$CROPDMGEXP == "H", ]$CROPDMG * 10^2
ecnomicdmg[ecnomicdmg$CROPDMGEXP == "K", ]$CROPDMGDOL = ecnomicdmg[ecnomicdmg$CROPDMGEXP == "K", ]$CROPDMG * 10^3
ecnomicdmg[ecnomicdmg$CROPDMGEXP == "M", ]$CROPDMGDOL = ecnomicdmg[ecnomicdmg$CROPDMGEXP == "M", ]$CROPDMG * 10^6
ecnomicdmg[ecnomicdmg$CROPDMGEXP == "B", ]$CROPDMGDOL = ecnomicdmg[ecnomicdmg$CROPDMGEXP == "B", ]$CROPDMG * 10^9
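The conversions above can also be written as a lookup table. The following is a minimal sketch, not the code used for the results in this report: `to_dollars`, `exp_multiplier`, and the `demo` data frame are hypothetical names with made-up values. Unlike the code above, it treats lowercase codes the same as uppercase ones, which is an assumption about intent.

```r
# Multipliers for the exponent codes handled in this report
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert a damage value and its exponent code to dollars
to_dollars <- function(value, exponent) {
  # toupper() lets "k" and "K" share one table entry (assumption, see above)
  mult <- exp_multiplier[toupper(exponent)]
  # Codes missing from the table (blank, "+", "-", "?", digits) yield NA; use 0
  mult[is.na(mult)] <- 0
  unname(value * mult)
}

# Made-up rows for illustration
demo <- data.frame(PROPDMG = c(25, 2.5, 1, 3, 7),
                   PROPDMGEXP = c("K", "M", "", "b", "?"))
demo$PROPDMGDOL <- to_dollars(demo$PROPDMG, demo$PROPDMGEXP)
print(demo)
```

The same helper could be applied to CROPDMG and CROPDMGEXP, which avoids repeating one assignment per code and per column.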
Let’s now take a look at the data
head(ecnomicdmg)
##    EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP PROPDMGDOL CROPDMGDOL
## 1 TORNADO    25.0          K       0                 25000          0
## 2 TORNADO     2.5          K       0                  2500          0
## 3 TORNADO    25.0          K       0                 25000          0
## 4 TORNADO     2.5          K       0                  2500          0
## 5 TORNADO     2.5          K       0                  2500          0
## 6 TORNADO     2.5          K       0                  2500          0
head(tidydata)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0
## 2 TORNADO          0        0     2.5          K       0
## 3 TORNADO          0        2    25.0          K       0
## 4 TORNADO          0        2     2.5          K       0
## 5 TORNADO          0        2     2.5          K       0
## 6 TORNADO          0        6     2.5          K       0
Our data is now filtered and processed. The next step is to answer the questions this analysis set out to address.
The steps taken to determine which event types are most harmful to people:
- Finding the total number of fatalities and injuries, respectively, for every type of event that has occurred
- Plotting bar plots of fatalities and injuries against event type, respectively, for the top 14 values
sumFatalities <- aggregate(FATALITIES~EVTYPE,tidydata,sum)
sumFatalities<- sumFatalities[sumFatalities[,"FATALITIES"]!=0,]
sumFatalities<- arrange(sumFatalities,desc(FATALITIES))
sumFatalities<- sumFatalities[1:14,]
sumFatalities$EVTYPE<- factor(sumFatalities$EVTYPE, levels= sumFatalities$EVTYPE)
sumInj <- aggregate(INJURIES~EVTYPE,tidydata,sum)
sumInj<- sumInj[sumInj[,"INJURIES"]!=0,]
sumInj<- arrange(sumInj,desc(INJURIES))
sumInj<- sumInj[1:14,]
sumInj$EVTYPE<- factor(sumInj$EVTYPE, levels= sumInj$EVTYPE)
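The aggregate, filter, sort, and truncate sequence above is repeated for each measure. As a sketch only, the steps could be wrapped in one base R function. The name `top_events` and the `toy` data frame are hypothetical and are not used elsewhere in this report.

```r
# Hypothetical helper: total a measure per event type and keep the n largest
top_events <- function(data, measure, n = 14) {
  # Sum the chosen column within each event type
  totals <- aggregate(data[[measure]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- measure
  # Drop event types whose total is zero
  totals <- totals[totals[[measure]] != 0, ]
  # Sort in descending order and keep the first n rows
  totals <- totals[order(totals[[measure]], decreasing = TRUE), ]
  totals <- head(totals, n)
  # Fix the factor level order so a bar plot keeps the sorted order
  totals$EVTYPE <- factor(as.character(totals$EVTYPE),
                          levels = as.character(totals$EVTYPE))
  rownames(totals) <- NULL
  totals
}

# Made-up data for illustration
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "HAIL"),
  FATALITIES = c(3, 1, 2, 4, 0)
)
res <- top_events(toy, "FATALITIES", n = 2)
print(res)
```

With the real data, a call such as `top_events(tidydata, "FATALITIES")` would be expected to reproduce sumFatalities, assuming the same column names.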
# Bar plot of total fatalities for the 14 deadliest event types; x labels are rotated via theme()
ggp<-ggplot(sumFatalities,aes(x=EVTYPE,y= FATALITIES))+
  geom_bar(stat = "identity", fill = "red") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("EVENT TYPE") + ylab("FATALITIES") + ggtitle("Number of fatalities by top 14 Weather Events")
ggp
# Bar plot of total injuries for the 14 event types with the most injuries
ggp<-ggplot(sumInj,aes(x=EVTYPE,y= INJURIES))+
  geom_bar(stat = "identity", fill = "orange") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("EVENT TYPE") + ylab("INJURIES") + ggtitle("Number of injuries by top 14 Weather Events")
ggp
These bar graphs show that tornadoes are clearly the most harmful event type in terms of both fatalities and injuries.
The steps taken to determine which event types have the greatest economic consequences:
- Finding the total damage to property and crops, in dollars, for every type of event that has occurred
- Plotting a bar plot of damage in dollars against event type for the top 14 values of damage
#Adding the amounts in dollars for property damage and crop damage
ecnomicdmg$SUMOFPC<-ecnomicdmg$PROPDMGDOL+ecnomicdmg$CROPDMGDOL
#Total damage for each type of weather event
sumpc<- aggregate(SUMOFPC~EVTYPE,ecnomicdmg,sum)
#Arranging the damage in descending order
sumpc<- arrange(sumpc,desc(SUMOFPC))
sumpc<- sumpc[1:14,]
sumpc$EVTYPE<- factor(sumpc$EVTYPE, levels= sumpc$EVTYPE)
# Bar plot of combined property and crop damage (in dollars) for the 14 costliest event types
ggp<-ggplot(sumpc,aes(x=EVTYPE,y= SUMOFPC))+
  geom_bar(stat = "identity", fill = "green") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("EVENT TYPE") + ylab("PROPERTY AND CROP DAMAGE") + ggtitle("Property and crop damage for top 14 weather events")
ggp
This bar graph shows that floods are clearly the most damaging event type in terms of economic consequences for the US.
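A ranking like this can be made more concrete by expressing each event type's total as a share of the overall total. The sketch below uses made-up numbers and placeholder event names, not results from the storm database.

```r
# Made-up totals per event type, already sorted in descending order
totals <- c(EVENT_A = 50, EVENT_B = 30, EVENT_C = 20)

# Percentage of the overall total contributed by each event type
share <- round(100 * totals / sum(totals), 1)
print(share)
```

For the real data, the denominator would need to be the sum over all event types (for example, the SUMOFPC column of ecnomicdmg) rather than only the 14 rows kept for plotting. Otherwise the shares would be overstated.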