Background

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Synopsis

We used this data to analyze the cost of these weather events. We looked at the human cost (fatalities and injuries) and the economic cost (property and crop damage).

Based on our analysis, tornadoes cause the most human harm in terms of fatalities and injuries, and floods cause the most economic damage.

Data processing

The first step is to load the raw data file with the read.csv function.

basedata <- read.csv("Stormdata.csv") # Reading the data 
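Since R 4.0.0, read.csv() no longer converts text columns to factors by default. The Factor columns in the str() output therefore reflect the older default. If you want the same factor columns on a current R, one option is to pass stringsAsFactors = TRUE explicitly. In the sketch below, the two-row inline CSV is invented purely for illustration and stands in for Stormdata.csv.

```r
# Hypothetical two-row sample standing in for Stormdata.csv
sample_csv <- "EVTYPE,FATALITIES,PROPDMGEXP
TORNADO,0,K
HAIL,1,M"

# Make the pre-R-4.0 default explicit so text columns become factors
sample_data <- read.csv(text = sample_csv, stringsAsFactors = TRUE)
str(sample_data)
```

The same argument applies to the real file, e.g. read.csv("Stormdata.csv", stringsAsFactors = TRUE).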

Once the data is loaded, we check its structure.

str(basedata)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436774 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
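The str() output shows that EVTYPE has 985 levels. Some of them carry stray whitespace (for example "   HIGH SURF ADVISORY"), so one event can appear under several spellings. The analysis in this report aggregates the raw codes. The sketch below is a hypothetical helper, not part of the original processing, that normalizes case and whitespace before aggregating.

```r
# Hypothetical helper (not part of the original analysis): collapse spelling
# variants of EVTYPE that differ only in case or whitespace.
normalize_evtype <- function(evtype) {
  x <- toupper(as.character(evtype))  # factor -> character, then upper case
  x <- gsub("[[:space:]]+", " ", x)   # collapse runs of whitespace to one space
  trimws(x)                           # drop leading and trailing blanks
}

normalize_evtype(c("   HIGH SURF ADVISORY", "Tstm Wind", "TSTM  WIND "))
```

This only merges cosmetic variants. Synonyms such as TSTM WIND and THUNDERSTORM WIND would still need an explicit mapping. Applying it (storm$EVTYPE <- normalize_evtype(storm$EVTYPE)) could change the totals reported below.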

Next, we check the column names of the data frame.

names(basedata)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

We keep only the columns needed for the analysis:

- EVTYPE (column 8), for the event type.
- FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP (columns 23 to 28), for the human and economic impact.

We create a new data frame with these columns and all observations, then check its dimensions.

storm <- basedata[,c(8,23:28)] # Creating a new data frame with required variables
dim(storm)
## [1] 902297      7

This leaves 7 variables and 902,297 observations on which to base the analysis.
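Selecting columns by position works but is fragile if the file layout changes. A small sketch selecting the same columns by name instead is shown below. The name vector is copied from the names(basedata) output above, and match() confirms that the kept columns sit at positions 8 and 23 to 28.

```r
# Column names as printed by names(basedata) above
all_names <- c("STATE__", "BGN_DATE", "BGN_TIME", "TIME_ZONE", "COUNTY",
               "COUNTYNAME", "STATE", "EVTYPE", "BGN_RANGE", "BGN_AZI",
               "BGN_LOCATI", "END_DATE", "END_TIME", "COUNTY_END", "COUNTYENDN",
               "END_RANGE", "END_AZI", "END_LOCATI", "LENGTH", "WIDTH",
               "F", "MAG", "FATALITIES", "INJURIES", "PROPDMG",
               "PROPDMGEXP", "CROPDMG", "CROPDMGEXP", "WFO", "STATEOFFIC",
               "ZONENAMES", "LATITUDE", "LONGITUDE", "LATITUDE_E", "LONGITUDE_",
               "REMARKS", "REFNUM")

# Columns used in this analysis
keep <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG",
          "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Positions of the kept columns; these correspond to c(8, 23:28)
match(keep, all_names)
```

With the real data, storm <- basedata[, keep] gives the same 7-column data frame as the positional version.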

Next, we aggregate fatalities and injuries by event type and sort the event types by their combined total.

suppressMessages(library(dplyr))
fatality <- aggregate(cbind(FATALITIES,INJURIES) ~ EVTYPE, data = storm,
                      sum, na.rm = TRUE)
fatality <- arrange(fatality, desc(FATALITIES + INJURIES))
head(fatality)
##           EVTYPE FATALITIES INJURIES
## 1        TORNADO       5633    91346
## 2 EXCESSIVE HEAT       1903     6525
## 3      TSTM WIND        504     6957
## 4          FLOOD        470     6789
## 5      LIGHTNING        816     5230
## 6           HEAT        937     2100
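Base R's rowsum() can compute the same per-event totals by summing the rows of a data frame within groups. The sketch below uses a toy data frame whose event names and counts are invented for illustration and are not NOAA figures. It orders the result by the combined count, as arrange(desc(FATALITIES + INJURIES)) does.

```r
# Toy data standing in for storm (invented values)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(1, 0, 2, 1, 0),
  INJURIES   = c(10, 1, 5, 0, 2)
)

# Sum both columns within each event type; rows are named by event
totals <- rowsum(toy[, c("FATALITIES", "INJURIES")], group = toy$EVTYPE,
                 na.rm = TRUE)

# Order by combined human cost, largest first
totals <- totals[order(-(totals$FATALITIES + totals$INJURIES)), ]
totals
```

The two approaches handle missing values differently. The formula form of aggregate() drops a whole row when any summed column is NA, because its na.action defaults to na.omit. By contrast, rowsum(na.rm = TRUE) skips missing values one at a time.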

Next, we compute the dollar value of property and crop damage. PROPDMG and CROPDMG are stored as a base figure, and the multiplier is encoded in PROPDMGEXP and CROPDMGEXP (for example, K for thousands, M for millions and B for billions). We first tabulate the codes that occur in each column.

# checking the column PROPDMGEXP
table(storm$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5 
## 465934      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      h      H      K      m      M 
##      4      5      1     40      1      6 424665      7  11330
# checking the column CROPDMGEXP
table(storm$CROPDMGEXP)
## 
##             ?      0      2      B      k      K      m      M 
## 618413      7     19      1      9     21 281832      1   1994

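Based on these codes, we convert PROPDMG and CROPDMG to dollar amounts. These two variables are used to estimate the economic cost of the weather events. The codes are interpreted as follows:

- H/h = hundred, K/k = thousand, M/m = million, B = billion.
- Digits 0 to 8 are powers of ten.
- A blank code means a multiplier of 1.
- The ambiguous codes +, - and ? are treated as zero damage.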

Updating Property Damage Data

storm$PROPDMG[storm$PROPDMGEXP == "K"] <- storm$PROPDMG[storm$PROPDMGEXP == "K"] * 1000
storm$PROPDMG[storm$PROPDMGEXP == "M"] <- storm$PROPDMG[storm$PROPDMGEXP == "M"] * (10^6)
storm$PROPDMG[storm$PROPDMGEXP == "H"] <- storm$PROPDMG[storm$PROPDMGEXP == "H"] * 100
storm$PROPDMG[storm$PROPDMGEXP == "h"] <- storm$PROPDMG[storm$PROPDMGEXP == "h"] * 100
storm$PROPDMG[storm$PROPDMGEXP == ""] <- storm$PROPDMG[storm$PROPDMGEXP == ""] * 1
storm$PROPDMG[storm$PROPDMGEXP == "B"] <- storm$PROPDMG[storm$PROPDMGEXP == "B"] * (10^9)
storm$PROPDMG[storm$PROPDMGEXP == "m"] <- storm$PROPDMG[storm$PROPDMGEXP == "m"] * (10^6)
storm$PROPDMG[storm$PROPDMGEXP == "0"] <- storm$PROPDMG[storm$PROPDMGEXP == "0"] * 1
storm$PROPDMG[storm$PROPDMGEXP == "1"] <- storm$PROPDMG[storm$PROPDMGEXP == "1"] * 10
storm$PROPDMG[storm$PROPDMGEXP == "2"] <- storm$PROPDMG[storm$PROPDMGEXP == "2"] * 100
storm$PROPDMG[storm$PROPDMGEXP == "3"] <- storm$PROPDMG[storm$PROPDMGEXP == "3"] * 1000
storm$PROPDMG[storm$PROPDMGEXP == "4"] <- storm$PROPDMG[storm$PROPDMGEXP == "4"] * (10^4)
storm$PROPDMG[storm$PROPDMGEXP == "5"] <- storm$PROPDMG[storm$PROPDMGEXP == "5"] * (10^5)
storm$PROPDMG[storm$PROPDMGEXP == "6"] <- storm$PROPDMG[storm$PROPDMGEXP == "6"] * (10^6)
storm$PROPDMG[storm$PROPDMGEXP == "7"] <- storm$PROPDMG[storm$PROPDMGEXP == "7"] * (10^7)
storm$PROPDMG[storm$PROPDMGEXP == "8"] <- storm$PROPDMG[storm$PROPDMGEXP == "8"] * (10^8)
storm$PROPDMG[storm$PROPDMGEXP == "+"] <- 0
storm$PROPDMG[storm$PROPDMGEXP == "-"] <- 0
storm$PROPDMG[storm$PROPDMGEXP == "?"] <- 0

Updating Crop Damage Data

storm$CROPDMG[storm$CROPDMGEXP == "M"] <- storm$CROPDMG[storm$CROPDMGEXP == "M"] * (10^6)
storm$CROPDMG[storm$CROPDMGEXP == "K"] <- storm$CROPDMG[storm$CROPDMGEXP == "K"] * 1000
storm$CROPDMG[storm$CROPDMGEXP == "m"] <- storm$CROPDMG[storm$CROPDMGEXP == "m"] * (10^6)
storm$CROPDMG[storm$CROPDMGEXP == "B"] <- storm$CROPDMG[storm$CROPDMGEXP == "B"] * (10^9)
storm$CROPDMG[storm$CROPDMGEXP == "k"] <- storm$CROPDMG[storm$CROPDMGEXP == "k"] * 1000
storm$CROPDMG[storm$CROPDMGEXP == "0"] <- storm$CROPDMG[storm$CROPDMGEXP == "0"] * 1
storm$CROPDMG[storm$CROPDMGEXP == "2"] <- storm$CROPDMG[storm$CROPDMGEXP == "2"] * 100
storm$CROPDMG[storm$CROPDMGEXP == ""] <- storm$CROPDMG[storm$CROPDMGEXP == ""] * 1
storm$CROPDMG[storm$CROPDMGEXP == "?"] <- 0
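The two update blocks apply the same code-to-multiplier rule one code at a time. A more compact sketch puts the rule in a single lookup table. The helper name dmg_to_dollars is hypothetical, and the multipliers are copied from the assignments above. match() is used rather than name indexing because R never matches the empty string "" as a name.

```r
# Hypothetical helper: convert a damage figure plus its *DMGEXP code into
# dollars, using the same multipliers as the assignments above.
dmg_to_dollars <- function(value, exp_code) {
  codes <- c("", as.character(0:8), "H", "h", "K", "k",
             "M", "m", "B", "+", "-", "?")
  mult  <- c(1, 10^(0:8), 100, 100, 1e3, 1e3,
             1e6, 1e6, 1e9, 0, 0, 0)
  # match() (unlike name indexing) also matches the empty code ""
  value * mult[match(as.character(exp_code), codes)]
}

dmg_to_dollars(c(25, 2.5, 5, 3), c("K", "M", "", "?"))
```

There are two caveats. First, this would replace the step-by-step assignments and must be applied to the raw values, e.g. storm$PROPDMG <- dmg_to_dollars(storm$PROPDMG, storm$PROPDMGEXP). Second, an unlisted code yields NA here, whereas the assignments above leave such values unchanged.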

Checking the data

head(storm)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15   25000          K       0           
## 2 TORNADO          0        0    2500          K       0           
## 3 TORNADO          0        2   25000          K       0           
## 4 TORNADO          0        2    2500          K       0           
## 5 TORNADO          0        2    2500          K       0           
## 6 TORNADO          0        6    2500          K       0

We then create a new variable, ECON, as the sum of property damage and crop damage.

storm$ECON <- storm$PROPDMG + storm$CROPDMG

Next, we aggregate the economic damage by event type, sort it in descending order and express it in whole billions of dollars.

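economic <- aggregate(storm$ECON, by = list(storm$EVTYPE), sum, na.rm = TRUE)  # total damage per event type
names(economic) <- c("Event", "Damage1")
economic <- arrange(economic, desc(Damage1))            # largest damage first
economic$Damage <- as.integer(economic$Damage1/(10^9))  # billions of USD, truncated to whole numbers
economic$Damage1 <- NULL                                # drop the unscaled column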
head(economic)
##               Event Damage
## 1             FLOOD    150
## 2 HURRICANE/TYPHOON     71
## 3           TORNADO     57
## 4       STORM SURGE     43
## 5              HAIL     18
## 6       FLASH FLOOD     18
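as.integer() truncates toward zero, so the Damage column drops everything after the decimal point. For example, an event with less than 1 billion dollars of damage shows as 0. The sketch below uses invented example totals to compare truncation with round() to one decimal place. Switching to round() would change the printed Damage values, which in this report come from as.integer().

```r
# Invented totals in dollars, for illustration only
totals_usd <- c(150.9e9, 18.2e9, 0.7e9)

truncated <- as.integer(totals_usd / 1e9)  # as in the report: 150, 18, 0
rounded   <- round(totals_usd / 1e9, 1)    # keeps one decimal: 150.9, 18.2, 0.7

data.frame(truncated, rounded)
```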

Results

The results are based on the data frames created above.

Human Cost

We first look at the human cost: the fatalities and injuries associated with each event type.

fatal <- fatality[1:20,] # keep the 20 event types with the highest combined fatalities and injuries
head(fatal)
##           EVTYPE FATALITIES INJURIES
## 1        TORNADO       5633    91346
## 2 EXCESSIVE HEAT       1903     6525
## 3      TSTM WIND        504     6957
## 4          FLOOD        470     6789
## 5      LIGHTNING        816     5230
## 6           HEAT        937     2100

We plot these as a stacked bar chart.

barplot(t(fatal[,-1]), names.arg = as.character(fatal$EVTYPE),
        cex.names = 0.5, las = 2, cex.axis = 0.7,
        beside = FALSE, col = c("light blue", "pink"),
        legend.text = c("Fatalities", "Injuries"),
        ylab = "Count", main = "Fatalities and Injuries by Event")

Tornadoes are responsible for by far the most fatalities and injuries (5,633 deaths and 91,346 injuries). They are followed by excessive heat, thunderstorm wind (TSTM WIND) and flood.
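The ranking in the chart is by fatalities plus injuries. The short check below uses only the six rows printed by head(fatal), with the values copied from that output. It confirms the combined order. It also shows that ranking by fatalities alone would put heat and lightning ahead of thunderstorm wind and flood among these six.

```r
# The six rows printed by head(fatal) above
top6 <- data.frame(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND",
                 "FLOOD", "LIGHTNING", "HEAT"),
  FATALITIES = c(5633, 1903, 504, 470, 816, 937),
  INJURIES   = c(91346, 6525, 6957, 6789, 5230, 2100),
  stringsAsFactors = FALSE
)

top6$TOTAL <- top6$FATALITIES + top6$INJURIES

# Ranking by combined count (the order used in the report)
top6$EVTYPE[order(-top6$TOTAL)]

# Ranking by fatalities alone
top6$EVTYPE[order(-top6$FATALITIES)]
```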

Economic Cost

Next, we look at the economic cost associated with each event type.

economy <- economic[1:20,]  # keep the 20 event types with the highest total damage
head(economy)
##               Event Damage
## 1             FLOOD    150
## 2 HURRICANE/TYPHOON     71
## 3           TORNADO     57
## 4       STORM SURGE     43
## 5              HAIL     18
## 6       FLASH FLOOD     18

We plot the economic cost as a bar chart.

barplot(economy$Damage, names.arg = as.character(economy$Event),
        cex.names = 0.5, las = 2,
        col = "green", ylab = "Damage (billions of USD)",
        main = "Economic Damage in Billions by Event")

Floods cause the largest economic damage (about 150 billion dollars). They are followed by hurricanes/typhoons (about 71 billion) and tornadoes (about 57 billion).