Damage Analysis for Storm and Weather Events

Synopsis

The objective of this project is to analyse storms and other weather events and to find out which are most harmful in terms of damage caused to people and property.

The dataset used for this project is U.S. National Oceanic and Atmospheric Administration’s (NOAA) Storm data.

The data consist of 902297 rows and 37 columns, so we need to process the data to keep only the required columns.

After processing, the data were aggregated to the event-type level and damage estimates were calculated.

The results show that the most harmful event for people is Tornado and the most harmful event for property is Flood.

Data Processing

Data

The dataset used for this project is U.S. National Oceanic and Atmospheric Administration’s (NOAA) Storm data. The data track characteristics of major storms and weather events in the United States, including when and where they occurred, as well as any fatalities, injuries and property damage.

Loading Data

The data is already downloaded and unzipped, and it is in the working directory. The name of the file is “repdata-data-StormData.csv”.

Let’s load the data into R.

fileName <- "repdata-data-StormData.csv"
if(file.exists(fileName)){
    data <- read.csv(fileName)
}else{
    stop("Data is not in the working directory. Please download the data.")
}
dim(data)
## [1] 902297     37
names(data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Processing Data

To answer the questions asked, we need only the event type and the damage corresponding to those events. So let’s keep only the relevant columns and discard the others.

data_small <- data[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
summary(data_small)
##                EVTYPE         FATALITIES          INJURIES        
##  HAIL             :288661   Min.   :  0.0000   Min.   :   0.0000  
##  TSTM WIND        :219940   1st Qu.:  0.0000   1st Qu.:   0.0000  
##  THUNDERSTORM WIND: 82563   Median :  0.0000   Median :   0.0000  
##  TORNADO          : 60652   Mean   :  0.0168   Mean   :   0.1557  
##  FLASH FLOOD      : 54277   3rd Qu.:  0.0000   3rd Qu.:   0.0000  
##  FLOOD            : 25326   Max.   :583.0000   Max.   :1700.0000  
##  (Other)          :170878                                         
##     PROPDMG          PROPDMGEXP        CROPDMG          CROPDMGEXP    
##  Min.   :   0.00          :465934   Min.   :  0.000          :618413  
##  1st Qu.:   0.00   K      :424665   1st Qu.:  0.000   K      :281832  
##  Median :   0.00   M      : 11330   Median :  0.000   M      :  1994  
##  Mean   :  12.06   0      :   216   Mean   :  1.527   k      :    21  
##  3rd Qu.:   0.50   B      :    40   3rd Qu.:  0.000   0      :    19  
##  Max.   :5000.00   5      :    28   Max.   :990.000   B      :     9  
##                    (Other):    84                     (Other):     9

The fields PROPDMGEXP and CROPDMGEXP hold exponent codes such as H, K, M and B. So let’s convert those codes to numeric multipliers and calculate the actual property and crop damage. Rows with any other code are left unchanged. The numeric values for the respective codes are as follows:

* H - 100 or 10^2
* K - 1000 or 10^3
* M - 1000000 or 10^6
* B - 1000000000 or 10^9

# Property Damage Estimation
data_small[data_small$PROPDMGEXP %in% c("H","h"), ]$PROPDMG <- data_small[data_small$PROPDMGEXP %in% c("H","h"),]$PROPDMG * (10^2)
data_small[data_small$PROPDMGEXP %in% c("K","k"), ]$PROPDMG <- data_small[data_small$PROPDMGEXP %in% c("K","k"),]$PROPDMG * (10^3)
data_small[data_small$PROPDMGEXP %in% c("M","m"), ]$PROPDMG <- data_small[data_small$PROPDMGEXP %in% c("M","m"),]$PROPDMG * (10^6)
data_small[data_small$PROPDMGEXP %in% c("B","b"), ]$PROPDMG <- data_small[data_small$PROPDMGEXP %in% c("B","b"),]$PROPDMG * (10^9)

#Crop Damage Estimation
data_small[data_small$CROPDMGEXP %in% c("K","k"), ]$CROPDMG <- data_small[data_small$CROPDMGEXP %in% c("K","k"),]$CROPDMG * (10^3)
data_small[data_small$CROPDMGEXP %in% c("M","m"), ]$CROPDMG <- data_small[data_small$CROPDMGEXP %in% c("M","m"),]$CROPDMG * (10^6)
data_small[data_small$CROPDMGEXP %in% c("B","b"), ]$CROPDMG <- data_small[data_small$CROPDMGEXP %in% c("B","b"),]$CROPDMG * (10^9)
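The code-to-multiplier conversion above can also be sketched as a single lookup. This is only an illustration on made-up data: the helper `exp_multiplier` and the column `PROPDMG_USD` are hypothetical names, not part of the original analysis. Note that, unlike the crop code above, the helper would also apply the H multiplier if it were used on CROPDMGEXP.

```r
# Hypothetical helper (not part of the original analysis): map an exponent
# code to its numeric multiplier. Codes other than H, K, M, B map to 1,
# which leaves the damage figure unchanged.
exp_multiplier <- function(code) {
    lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
    mult <- lookup[toupper(as.character(code))]
    mult[is.na(mult)] <- 1   # blank, digit or unknown codes: no scaling
    unname(mult)
}

# Toy data, made up for illustration only (not taken from the NOAA file)
toy <- data.frame(PROPDMG = c(2.5, 10, 1, 3, 7),
                  PROPDMGEXP = c("K", "m", "B", "", "5"),
                  stringsAsFactors = FALSE)

# Scale every row in one vectorised step
toy$PROPDMG_USD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
toy$PROPDMG_USD
```

A lookup like this keeps the mapping in one place, so adding or changing a code means editing a single vector rather than one assignment per code.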

Results

Harmful Events to Population Health

Let’s aggregate the data to estimate the most harmful events in terms of population health.

library(data.table)
data_small.dt <- data.table(data_small)
data_people <- data_small.dt[,list(FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES), TOTAL = sum(FATALITIES+INJURIES)), by = "EVTYPE"]

Now let’s find the top 10 events that caused the most harm to people’s health.

data_people <- data_people[order(-TOTAL)]
data_people10 <- data_people[1:10,]
data_people10$EVTYPE <- factor(data_people10$EVTYPE, levels = data_people10$EVTYPE)

# Top 10 events harmful for people
data_people10
##                EVTYPE FATALITIES INJURIES TOTAL
##  1:           TORNADO       5633    91346 96979
##  2:    EXCESSIVE HEAT       1903     6525  8428
##  3:         TSTM WIND        504     6957  7461
##  4:             FLOOD        470     6789  7259
##  5:         LIGHTNING        816     5230  6046
##  6:              HEAT        937     2100  3037
##  7:       FLASH FLOOD        978     1777  2755
##  8:         ICE STORM         89     1975  2064
##  9: THUNDERSTORM WIND        133     1488  1621
## 10:      WINTER STORM        206     1321  1527
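To make the aggregate-then-rank steps above concrete, here is a minimal sketch on a small made-up table. It uses base R's `aggregate` instead of data.table so that it runs without extra packages; the data frame `toy` and its values are invented for illustration and are not taken from the NOAA file.

```r
# Toy data, made up for illustration only
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
                  FATALITIES = c(2, 1, 0, 3, 0),
                  INJURIES = c(10, 0, 5, 1, 2),
                  stringsAsFactors = FALSE)

# Sum fatalities and injuries within each event type
toy_people <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE,
                        data = toy, FUN = sum)
toy_people$TOTAL <- toy_people$FATALITIES + toy_people$INJURIES

# Rank event types from most to least harmful
toy_people <- toy_people[order(-toy_people$TOTAL), ]
toy_people
```

In this toy table TORNADO ranks first with a total of 17, followed by HEAT with 4 and FLOOD with 3.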

The following plot shows the top 10 harmful weather events for people.

library(ggplot2)
ggplot(data_people10, aes(x = EVTYPE, y = TOTAL)) + 
    geom_bar(stat = "identity", fill = "red") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    xlab("Event Type") + ylab("Fatalities and Injuries") + ggtitle("Top 10 Harmful Events for People")

Harmful Events to Property

Let’s aggregate the data to estimate the most harmful events in terms of property damage.

data_prop <- data_small.dt[, list(PROPERTY = sum(PROPDMG), CROPS = sum(CROPDMG), TOTAL = sum(PROPDMG+CROPDMG)), by = "EVTYPE"]

Now let’s find the top 10 events that caused the most damage to property and crops.

data_prop <- data_prop[order(-TOTAL)]
data_prop10 <- data_prop[1:10,]
data_prop10 <- data_prop10[, list(EVTYPE, PROPERTY = PROPERTY/(10^9), CROPS = CROPS/(10^9), TOTAL = TOTAL/10^9)]
data_prop10$EVTYPE <- factor(data_prop10$EVTYPE, levels = data_prop10$EVTYPE)

#Top 10 events harmful for Property
data_prop10
##                EVTYPE   PROPERTY      CROPS      TOTAL
##  1:             FLOOD 144.657710  5.6619684 150.319678
##  2: HURRICANE/TYPHOON  69.305840  2.6078728  71.913713
##  3:           TORNADO  56.937161  0.4149533  57.352114
##  4:       STORM SURGE  43.323536  0.0000050  43.323541
##  5:              HAIL  15.732268  3.0259545  18.758222
##  6:       FLASH FLOOD  16.140812  1.4213171  17.562129
##  7:           DROUGHT   1.046106 13.9725660  15.018672
##  8:         HURRICANE  11.868319  2.7419100  14.610229
##  9:       RIVER FLOOD   5.118945  5.0294590  10.148404
## 10:         ICE STORM   3.944928  5.0221135   8.967041

The plot below shows the top 10 harmful events for property and crops.

library(ggplot2)
ggplot(data_prop10, aes(x = EVTYPE, y = TOTAL)) + 
    geom_bar(stat = "identity", fill = "red") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    xlab("Event Type") + ylab("Property and Crop Damage (in Billions)") + ggtitle("Top 10 Harmful Events for Property")