Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
We used this data to analyze the cost of these weather events, both the human cost (fatalities and injuries) and the economic cost (property and crop damage).
Based on our analysis, we conclude that tornadoes cause the most human harm in terms of fatalities and injuries, and that floods cause the most economic damage.
The first step is to load the raw data file. We do this using the read.csv function.
basedata <- read.csv("Stormdata.csv") # Reading the data
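If memory or load time is a concern, one option is to skip unneeded columns at read time. The following is a minimal sketch on a small stand-in file; the file contents and the names `tmp` and `toy` are made up for illustration and are not part of the analysis. It relies on the `colClasses` argument of `read.csv`, where the string "NULL" drops a column.

```r
# Hypothetical stand-in for Stormdata.csv with only four columns
tmp <- tempfile(fileext = ".csv")
writeLines(c("STATE,EVTYPE,FATALITIES,REMARKS",
             "AL,TORNADO,0,none",
             "AL,FLOOD,1,none"), tmp)

# "NULL" skips a column while reading; the other entries fix the column type
toy <- read.csv(tmp,
                colClasses = c("NULL", "character", "numeric", "NULL"),
                stringsAsFactors = FALSE)
unlink(tmp)  # remove the temporary file

names(toy)   # only EVTYPE and FATALITIES remain
```

Reading text columns as character rather than factor is a design choice here; the analysis itself keeps the default factors.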
Once the data is read in, we check its structure.
str(basedata)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436774 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
Checking the names of the columns in the data frame
names(basedata)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
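Because the later aggregation steps pass `na.rm = TRUE`, it can be useful to see which columns actually contain missing values (the `str` output suggests COUNTYENDN is entirely NA). A minimal sketch on a hypothetical three-row data frame; with the real data the same call would be applied to `basedata`:

```r
# Made-up rows for illustration only
toy <- data.frame(EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
                  FATALITIES = c(0, NA, 1),
                  COUNTYENDN = c(NA, NA, NA),
                  stringsAsFactors = FALSE)

# is.na() gives a logical matrix; colSums() counts the TRUE values per column
na_counts <- colSums(is.na(toy))
na_counts
```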
I want to use the following columns: EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP.
The column numbers are 8 for the event type and 23 to 28 for the others. I create a fresh data frame with only these columns and all the observations, and then check the dimensions of the new data frame.
storm <- basedata[,c(8,23:28)] # Creating a new data frame with required variables
dim(storm)
## [1] 902297 7
So we are left with 7 variables and 902,297 observations on which to base our analysis.
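Selecting by position works, but it breaks silently if the column order ever changes. As an alternative, the same subset can be taken by name. This is a sketch on a made-up two-row data frame (`toy` and `keep` are illustrative names); the column names themselves come from the `names(basedata)` output.

```r
# Hypothetical miniature of the raw table, with two extra columns
toy <- data.frame(STATE      = c("AL", "AL"),
                  EVTYPE     = c("TORNADO", "FLOOD"),
                  FATALITIES = c(0, 1),
                  INJURIES   = c(15, 0),
                  PROPDMG    = c(25, 2.5),
                  PROPDMGEXP = c("K", "M"),
                  CROPDMG    = c(0, 10),
                  CROPDMGEXP = c("", "K"),
                  REMARKS    = c("", ""),
                  stringsAsFactors = FALSE)

# Columns needed for the human and economic cost analysis
keep <- c("EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

toy_storm <- toy[, keep]  # select by name instead of by position
dim(toy_storm)
```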
Creating a table that aggregates fatalities and injuries by event type
suppressMessages(library(dplyr))
## Warning: package 'dplyr' was built under R version 3.4.2
fatality <- aggregate(cbind(FATALITIES,INJURIES) ~ EVTYPE, data = storm,
sum, na.rm = TRUE)
fatality <- arrange(fatality, desc(FATALITIES + INJURIES))
## Warning: package 'bindrcpp' was built under R version 3.4.2
head(fatality)
## EVTYPE FATALITIES INJURIES
## 1 TORNADO 5633 91346
## 2 EXCESSIVE HEAT 1903 6525
## 3 TSTM WIND 504 6957
## 4 FLOOD 470 6789
## 5 LIGHTNING 816 5230
## 6 HEAT 937 2100
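The `str` output lists 985 event-type levels, including labels with stray whitespace such as " HIGH SURF ADVISORY", so some totals may be split across near-duplicate labels. The analysis does not clean these labels. The sketch below only illustrates, on made-up rows, one possible normalisation (trim and upper-case) before aggregating; it uses base R throughout.

```r
# Made-up rows with inconsistent labels for the same two event types
toy <- data.frame(EVTYPE     = c("TORNADO", " tornado", "Flood", "FLOOD "),
                  FATALITIES = c(1, 2, 0, 3),
                  INJURIES   = c(10, 5, 1, 0),
                  stringsAsFactors = FALSE)

# Strip leading/trailing whitespace and unify the case
toy$EVTYPE <- toupper(trimws(toy$EVTYPE))

# Sum both outcome columns per event type
toy_cost <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE,
                      data = toy, FUN = sum)

# Sort by combined human cost, largest first
toy_cost <- toy_cost[order(-(toy_cost$FATALITIES + toy_cost$INJURIES)), ]
toy_cost
```

Whether merging labels this way is appropriate for the real data is a judgment call; similar-looking labels in the source may be distinct categories.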
Next, I process the data further to find the total value of crop damage and property damage.
I scale the damage figures using the exponent codes given in the columns PROPDMGEXP and CROPDMGEXP.
# checking the column PROPDMGEXP
table(storm$PROPDMGEXP)
##
##             -      ?      +      0      1      2      3      4      5
## 465934      1      8      5    216     25     13      4      4     28
##      6      7      8      B      h      H      K      m      M
##      4      5      1     40      1      6 424665      7  11330
# checking the column CROPDMGEXP
table(storm$CROPDMGEXP)
##
##             ?      0      2      B      k      K      m      M
## 618413      7     19      1      9     21 281832      1   1994
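To get a sense of how much the handling of the unusual codes matters, the counts from the PROPDMGEXP table can be summed directly. This is a small sketch that hard-codes the counts shown in the table; the variable names are illustrative.

```r
# Codes and counts copied from the table(storm$PROPDMGEXP) output
codes  <- c("", "-", "?", "+", "0", "1", "2", "3", "4", "5",
            "6", "7", "8", "B", "h", "H", "K", "m", "M")
counts <- c(465934, 1, 8, 5, 216, 25, 13, 4, 4, 28,
            4, 5, 1, 40, 1, 6, 424665, 7, 11330)

# Blank or letter codes (case-insensitive) versus digit and symbol codes
letter_or_blank <- toupper(codes) %in% c("", "H", "K", "M", "B")

n_total <- sum(counts)                    # all observations
n_other <- sum(counts[!letter_or_blank])  # rows with digit or symbol codes
share_other <- n_other / n_total

n_other
share_other
```

The digit and symbol codes cover 314 of the 902,297 rows, so the choice of multiplier for them affects only a small fraction of the records.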
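Based on these codes, I update the values of property damage and crop damage. These are the two variables I use to estimate the economic cost of the weather events. Letter codes are read as H = hundred, K = thousand, M = million and B = billion, in either case; a digit n is treated as a multiplier of 10^n; a blank code leaves the value unchanged; and values with the codes +, - and ? are set to zero.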
Updating property damage data
storm$PROPDMG[storm$PROPDMGEXP == "K"] <- storm$PROPDMG[storm$PROPDMGEXP == "K"] * 1000
storm$PROPDMG[storm$PROPDMGEXP == "M"] <- storm$PROPDMG[storm$PROPDMGEXP == "M"] * (10^6)
storm$PROPDMG[storm$PROPDMGEXP == "H"] <- storm$PROPDMG[storm$PROPDMGEXP == "H"] * 100
storm$PROPDMG[storm$PROPDMGEXP == "h"] <- storm$PROPDMG[storm$PROPDMGEXP == "h"] * 100
storm$PROPDMG[storm$PROPDMGEXP == ""] <- storm$PROPDMG[storm$PROPDMGEXP == ""] * 1
storm$PROPDMG[storm$PROPDMGEXP == "B"] <- storm$PROPDMG[storm$PROPDMGEXP == "B"] * (10^9)
storm$PROPDMG[storm$PROPDMGEXP == "m"] <- storm$PROPDMG[storm$PROPDMGEXP == "m"] * (10^6)
storm$PROPDMG[storm$PROPDMGEXP == "0"] <- storm$PROPDMG[storm$PROPDMGEXP == "0"] * 1
storm$PROPDMG[storm$PROPDMGEXP == "1"] <- storm$PROPDMG[storm$PROPDMGEXP == "1"] * 10
storm$PROPDMG[storm$PROPDMGEXP == "2"] <- storm$PROPDMG[storm$PROPDMGEXP == "2"] * 100
storm$PROPDMG[storm$PROPDMGEXP == "3"] <- storm$PROPDMG[storm$PROPDMGEXP == "3"] * 1000
storm$PROPDMG[storm$PROPDMGEXP == "4"] <- storm$PROPDMG[storm$PROPDMGEXP == "4"] * (10^4)
storm$PROPDMG[storm$PROPDMGEXP == "5"] <- storm$PROPDMG[storm$PROPDMGEXP == "5"] * (10^5)
storm$PROPDMG[storm$PROPDMGEXP == "6"] <- storm$PROPDMG[storm$PROPDMGEXP == "6"] * (10^6)
storm$PROPDMG[storm$PROPDMGEXP == "7"] <- storm$PROPDMG[storm$PROPDMGEXP == "7"] * (10^7)
storm$PROPDMG[storm$PROPDMGEXP == "8"] <- storm$PROPDMG[storm$PROPDMGEXP == "8"] * (10^8)
storm$PROPDMG[storm$PROPDMGEXP == "+"] <- 0
storm$PROPDMG[storm$PROPDMGEXP == "-"] <- 0
storm$PROPDMG[storm$PROPDMGEXP == "?"] <- 0
Updating Crop Damage Data
storm$CROPDMG[storm$CROPDMGEXP == "M"] <- storm$CROPDMG[storm$CROPDMGEXP == "M"] * (10^6)
storm$CROPDMG[storm$CROPDMGEXP == "K"] <- storm$CROPDMG[storm$CROPDMGEXP == "K"] * 1000
storm$CROPDMG[storm$CROPDMGEXP == "m"] <- storm$CROPDMG[storm$CROPDMGEXP == "m"] * (10^6)
storm$CROPDMG[storm$CROPDMGEXP == "B"] <- storm$CROPDMG[storm$CROPDMGEXP == "B"] * (10^9)
storm$CROPDMG[storm$CROPDMGEXP == "k"] <- storm$CROPDMG[storm$CROPDMGEXP == "k"] * 1000
storm$CROPDMG[storm$CROPDMGEXP == "0"] <- storm$CROPDMG[storm$CROPDMGEXP == "0"] * 1
storm$CROPDMG[storm$CROPDMGEXP == "2"] <- storm$CROPDMG[storm$CROPDMGEXP == "2"] * 100
storm$CROPDMG[storm$CROPDMGEXP == ""] <- storm$CROPDMG[storm$CROPDMGEXP == ""] * 1
storm$CROPDMG[storm$CROPDMGEXP == "?"] <- 0
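The line-by-line updates above can also be expressed as a single lookup. The following is a sketch, not the code used for the results: `exp_to_multiplier` is a hypothetical helper, and it encodes the same multipliers as the assignments above (digits as powers of ten, symbols as zero).

```r
# Hypothetical helper: map an exponent code to its numeric multiplier
exp_to_multiplier <- function(exp_code) {
  codes <- c("", "0", "1", "2", "3", "4", "5", "6", "7", "8",
             "H", "K", "M", "B", "+", "-", "?")
  mult  <- c(1, 10^(0:8), 1e2, 1e3, 1e6, 1e9, 0, 0, 0)
  # as.character() handles factor input; toupper() makes h, k, m match
  # H, K, M; any code not listed above comes back as NA
  mult[match(toupper(as.character(exp_code)), codes)]
}

# Usage on made-up values
dmg  <- c(25, 2.5, 10, 3)
code <- c("K", "M", "", "?")
dmg * exp_to_multiplier(code)  # gives 25000, 2500000, 10 and 0
```

With the real data this would be applied once each to PROPDMG/PROPDMGEXP and CROPDMG/CROPDMGEXP, which keeps the two conversions in sync.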
Checking the data
head(storm)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 0 15 25000 K 0
## 2 TORNADO 0 0 2500 K 0
## 3 TORNADO 0 2 25000 K 0
## 4 TORNADO 0 2 2500 K 0
## 5 TORNADO 0 2 2500 K 0
## 6 TORNADO 0 6 2500 K 0
Creating a new variable that is the sum of crop damage and property damage
storm$ECON <- storm$PROPDMG + storm$CROPDMG
Aggregating economic damage by event type and expressing it in billions of dollars
economic <- aggregate(storm$ECON, by = list(storm$EVTYPE), sum, na.rm = TRUE)
names(economic) <- c("Event", "Damage1")
economic <- arrange(economic, desc(Damage1))
economic$Damage <- as.integer(economic$Damage1/(10^9))
economic$Damage1 <- NULL
head(economic)
## Event Damage
## 1 FLOOD 150
## 2 HURRICANE/TYPHOON 71
## 3 TORNADO 57
## 4 STORM SURGE 43
## 5 HAIL 18
## 6 FLASH FLOOD 18
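Note that `as.integer` truncates toward zero rather than rounding, so the figures in the table are whole billions with the fractional part dropped. The sketch below contrasts truncation with rounding to one decimal place; the dollar totals in it are hypothetical and chosen only to show the difference.

```r
# Hypothetical damage totals in dollars (not taken from the data)
damage_usd <- c(150.9e9, 18.7e9)

truncated <- as.integer(damage_usd / 1e9)  # drops the fractional part
rounded   <- round(damage_usd / 1e9, 1)    # keeps one decimal place

truncated
rounded
```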
For the results, we use the data frames created earlier.
We first display the human cost: the fatalities and injuries associated with each event type.
fatal <- fatality[1:20,] # creating a new data frame with the top reasons
head(fatal)
## EVTYPE FATALITIES INJURIES
## 1 TORNADO 5633 91346
## 2 EXCESSIVE HEAT 1903 6525
## 3 TSTM WIND 504 6957
## 4 FLOOD 470 6789
## 5 LIGHTNING 816 5230
## 6 HEAT 937 2100
Now we plot these totals as a stacked bar chart.
barplot(t(fatal[,-1]), names.arg = fatal$EVTYPE,
cex.names = 0.5, las = 2, cex.axis = 0.7,
beside = FALSE, legend.text = TRUE, # stacked bars; legend taken from the row names
col = c("light blue","pink"),
ylab = "Count", main = "Fatalities and Injuries by Event")
This shows that tornadoes are responsible for by far the most fatalities and injuries, followed by excessive heat, thunderstorm wind (TSTM WIND) and flood.
Now we check the economic cost associated with the events.
economy <- economic[1:20,] # Creating a new data frame with the top reasons
head(economy)
## Event Damage
## 1 FLOOD 150
## 2 HURRICANE/TYPHOON 71
## 3 TORNADO 57
## 4 STORM SURGE 43
## 5 HAIL 18
## 6 FLASH FLOOD 18
Now we use this to create a chart of the economic cost.
barplot(t(economy[,-1]), names.arg = economy$Event,
cex.names = 0.5, las = 2,
col = "green", ylab = "Damage (billions of USD)",
main = "Economic Damage by Event")
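Long event names can be hard to read on a vertical bar chart. One alternative, sketched here under the assumption that a horizontal layout is acceptable, widens the left margin and sets `horiz = TRUE`. The three values are the top rows of the economic damage table; `toy`, `old_par` and `mids` are illustrative names.

```r
# Top three rows of the economic damage table (billions of dollars)
toy <- data.frame(Event  = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO"),
                  Damage = c(150, 71, 57),
                  stringsAsFactors = FALSE)

# Widen the left margin so the event names fit, then restore it afterwards
old_par <- par(mar = c(4, 9, 3, 1))

# rev() puts the largest bar at the top of a horizontal chart
mids <- barplot(rev(toy$Damage), names.arg = rev(toy$Event),
                horiz = TRUE, las = 1, cex.names = 0.7,
                col = "green", xlab = "Damage (billions of USD)",
                main = "Economic Damage by Event")

par(old_par)
```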