STORM DATA DATABASE ANALYSIS

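1. SYNOPSIS

This report analyzes the U.S. National Weather Service Storm Data database (1950 to November 2011) to identify which types of weather events are most harmful to population health and which have the greatest economic consequences. Tornadoes caused the most fatalities and injuries. Floods caused the most property damage, and drought caused the most crop damage.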

2. Data Processing

2.1 Data

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:

Storm Data [47 MB]
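Because the download is a bzip2-compressed CSV, it can be read without decompressing it first. A minimal sketch, assuming the file has been saved locally; the helper name `read_bz2_csv` and the temporary file are hypothetical and only stand in for the real download:

```r
# Hypothetical helper: read a bzip2-compressed CSV without unpacking it on disk
read_bz2_csv <- function(path) {
  # bzfile() creates a connection that decompresses bzip2 data on the fly;
  # read.csv() opens that connection, reads it, and closes it when done
  read.csv(bzfile(path), stringsAsFactors = FALSE)
}

# Usage example: write a tiny compressed CSV to a temporary file, then read it back
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,0", "HAIL,0"), con)
close(con)

storm_sample <- read_bz2_csv(tmp)
print(dim(storm_sample))
```

On the real data this would be a call such as `read_bz2_csv('StormData.csv.bz2')`, where the file name is an assumption about how the download was saved. Note that `stringsAsFactors = FALSE` keeps text columns as character, while the outputs in this report were produced with factor columns.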

Documentation for the database is also available; it describes how some of the variables are constructed and defined.

National Weather Service Storm Data Documentation https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
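The claim that earlier years contain fewer records can be checked by counting events per year. A minimal sketch, assuming `BGN_DATE` strings in the month/day/year format seen in the `head(df)` output; the helper name `events_per_year` and the toy dates are hypothetical:

```r
# Hypothetical helper: count recorded events per year from BGN_DATE strings
events_per_year <- function(bgn_date) {
  # as.Date() ignores trailing characters (here the "0:00:00" time part)
  # once the format has been matched
  dates <- as.Date(as.character(bgn_date), format = "%m/%d/%Y")
  table(as.integer(format(dates, "%Y")))
}

# Toy dates in the same format as df$BGN_DATE (illustrative only)
bgn <- c("4/18/1950 0:00:00", "4/18/1950 0:00:00", "2/20/1951 0:00:00",
         "5/25/2011 0:00:00", "4/27/2011 0:00:00", "6/9/2011 0:00:00")
counts <- events_per_year(bgn)
print(counts)
```

Applied to the full data set as `events_per_year(df$BGN_DATE)`, the `as.character()` call also handles the factor column shown in the `glimpse(df)` output.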

We load the packages used in this analysis, installing any that are missing.

packages <- c('markdown',  # Renders Markdown (plain-text formatting syntax) to XHTML or other formats
              'knitr',     # Tool for generating dynamic reports in R
              'dplyr',     # Data manipulation
              'lubridate', # Working with date-times and time spans
              'ggplot2',   # Graphics
              'sqldf',     # Running SQL statements on R data frames
              'lattice',   # Data visualization
              'Hmisc'      # Useful functions for data analysis, high-level graphics,
                           # imputing missing values and importing and annotating data sets
)

# installed.packages() returns a matrix whose row names are the installed package names
installed <- packages %in% rownames(installed.packages())

# Install only the packages that are missing
if (any(!installed)) {
  install.packages(packages[!installed])
}
# Attach every package; invisible() suppresses the printed list of return values
invisible(lapply(packages, library, character.only = TRUE))

Loading and preprocessing the data.

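# Author's working directory; change this path to wherever the data file is saved
setwd('F:/1. PROYECTOS DE TRABAJO/RStudio/5. Reproducible Research/Activity 2/')
getwd()
## [1] "F:/1. PROYECTOS DE TRABAJO/RStudio/5. Reproducible Research/Activity 2"
# Keep text columns as factors (the default before R 4.0.0), as in the outputs that follow
df <- read.csv('repdata_data_StormData.csv', stringsAsFactors = TRUE)
# Use English names for dates and times ('English' is a Windows locale name)
Sys.setlocale('LC_TIME', 'English')
## [1] "English_United States.1252"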

General exploratory analysis and data types.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
dim(df)
## [1] 902297     37
head(df)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
glimpse(df)
## Observations: 902,297
## Variables: 37
## $ STATE__    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ BGN_DATE   <fct> 4/18/1950 0:00:00, 4/18/1950 0:00:00, 2/20/1951 0:00:00,...
## $ BGN_TIME   <fct> 0130, 0145, 1600, 0900, 1500, 2000, 0100, 0900, 2000, 20...
## $ TIME_ZONE  <fct> CST, CST, CST, CST, CST, CST, CST, CST, CST, CST, CST, C...
## $ COUNTY     <dbl> 97, 3, 57, 89, 43, 77, 9, 123, 125, 57, 43, 9, 73, 49, 1...
## $ COUNTYNAME <fct> MOBILE, BALDWIN, FAYETTE, MADISON, CULLMAN, LAUDERDALE, ...
## $ STATE      <fct> AL, AL, AL, AL, AL, AL, AL, AL, AL, AL, AL, AL, AL, AL, ...
## $ EVTYPE     <fct> TORNADO, TORNADO, TORNADO, TORNADO, TORNADO, TORNADO, TO...
## $ BGN_RANGE  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ BGN_AZI    <fct> , , , , , , , , , , , , , , , , , , , , , , , , , 
## $ BGN_LOCATI <fct> , , , , , , , , , , , , , , , , , , , , , , , , , 
## $ END_DATE   <fct> , , , , , , , , , , , , , , , , , , , , , , , , , 
## $ END_TIME   <fct> , , , , , , , , , , , , , , , , , , , , , , , , , 
## $ COUNTY_END <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ COUNTYENDN <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ END_RANGE  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ END_AZI    <fct> , , , , , , , , , , , , , , , , , , , , , , , , , 
## $ END_LOCATI <fct> , , , , , , , , , , , , , , , , , , , , , , , , , 
## $ LENGTH     <dbl> 14.0, 2.0, 0.1, 0.0, 0.0, 1.5, 1.5, 0.0, 3.3, 2.3, 1.3, ...
## $ WIDTH      <dbl> 100, 150, 123, 100, 150, 177, 33, 33, 100, 100, 400, 400...
## $ F          <int> 3, 2, 2, 2, 2, 2, 2, 1, 3, 3, 1, 1, 3, 3, 3, 4, 1, 1, 1,...
## $ MAG        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ FATALITIES <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 4, 0, 0, 0,...
## $ INJURIES   <dbl> 15, 0, 2, 2, 2, 6, 1, 0, 14, 0, 3, 3, 26, 12, 6, 50, 2, ...
## $ PROPDMG    <dbl> 25.0, 2.5, 25.0, 2.5, 2.5, 2.5, 2.5, 2.5, 25.0, 25.0, 2....
## $ PROPDMGEXP <fct> K, K, K, K, K, K, K, K, K, K, M, M, K, K, K, K, K, K, K,...
## $ CROPDMG    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CROPDMGEXP <fct> , , , , , , , , , , , , , , , , , , , , , , , , , 
## $ WFO        <fct> , , , , , , , , , , , , , , , , , , , , , , , , , 
## $ STATEOFFIC <fct> , , , , , , , , , , , , , , , , , , , , , , , , , 
## $ ZONENAMES  <fct> , , , , , , , , , , , , , , , , , , , , , , , , , 
## $ LATITUDE   <dbl> 3040, 3042, 3340, 3458, 3412, 3450, 3405, 3255, 3334, 33...
## $ LONGITUDE  <dbl> 8812, 8755, 8742, 8626, 8642, 8748, 8631, 8558, 8740, 87...
## $ LATITUDE_E <dbl> 3051, 0, 0, 0, 0, 0, 0, 0, 3336, 3337, 3402, 3404, 0, 34...
## $ LONGITUDE_ <dbl> 8806, 0, 0, 0, 0, 0, 0, 0, 8738, 8737, 8644, 8640, 0, 85...
## $ REMARKS    <fct> , , , , , , , , , , , , , , , , , , , , , , , , , 
## $ REFNUM     <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1...
sort(names(df))
##  [1] "BGN_AZI"    "BGN_DATE"   "BGN_LOCATI" "BGN_RANGE"  "BGN_TIME"  
##  [6] "COUNTY"     "COUNTY_END" "COUNTYENDN" "COUNTYNAME" "CROPDMG"   
## [11] "CROPDMGEXP" "END_AZI"    "END_DATE"   "END_LOCATI" "END_RANGE" 
## [16] "END_TIME"   "EVTYPE"     "F"          "FATALITIES" "INJURIES"  
## [21] "LATITUDE"   "LATITUDE_E" "LENGTH"     "LONGITUDE"  "LONGITUDE_"
## [26] "MAG"        "PROPDMG"    "PROPDMGEXP" "REFNUM"     "REMARKS"   
## [31] "STATE"      "STATE__"    "STATEOFFIC" "TIME_ZONE"  "WFO"       
## [36] "WIDTH"      "ZONENAMES"

Basic statistics

summary(df)
##     STATE__                  BGN_DATE             BGN_TIME     
##  Min.   : 1.0   5/25/2011 0:00:00:  1202   12:00:00 AM: 10163  
##  1st Qu.:19.0   4/27/2011 0:00:00:  1193   06:00:00 PM:  7350  
##  Median :30.0   6/9/2011 0:00:00 :  1030   04:00:00 PM:  7261  
##  Mean   :31.2   5/30/2004 0:00:00:  1016   05:00:00 PM:  6891  
##  3rd Qu.:45.0   4/4/2011 0:00:00 :  1009   12:00:00 PM:  6703  
##  Max.   :95.0   4/2/2006 0:00:00 :   981   03:00:00 PM:  6700  
##                 (Other)          :895866   (Other)    :857229  
##    TIME_ZONE          COUNTY           COUNTYNAME         STATE       
##  CST    :547493   Min.   :  0.0   JEFFERSON :  7840   TX     : 83728  
##  EST    :245558   1st Qu.: 31.0   WASHINGTON:  7603   KS     : 53440  
##  MST    : 68390   Median : 75.0   JACKSON   :  6660   OK     : 46802  
##  PST    : 28302   Mean   :100.6   FRANKLIN  :  6256   MO     : 35648  
##  AST    :  6360   3rd Qu.:131.0   LINCOLN   :  5937   IA     : 31069  
##  HST    :  2563   Max.   :873.0   MADISON   :  5632   NE     : 30271  
##  (Other):  3631                   (Other)   :862369   (Other):621339  
##                EVTYPE         BGN_RANGE           BGN_AZI      
##  HAIL             :288661   Min.   :   0.000          :547332  
##  TSTM WIND        :219940   1st Qu.:   0.000   N      : 86752  
##  THUNDERSTORM WIND: 82563   Median :   0.000   W      : 38446  
##  TORNADO          : 60652   Mean   :   1.484   S      : 37558  
##  FLASH FLOOD      : 54277   3rd Qu.:   1.000   E      : 33178  
##  FLOOD            : 25326   Max.   :3749.000   NW     : 24041  
##  (Other)          :170878                      (Other):134990  
##          BGN_LOCATI                  END_DATE             END_TIME     
##               :287743                    :243411              :238978  
##  COUNTYWIDE   : 19680   4/27/2011 0:00:00:  1214   06:00:00 PM:  9802  
##  Countywide   :   993   5/25/2011 0:00:00:  1196   05:00:00 PM:  8314  
##  SPRINGFIELD  :   843   6/9/2011 0:00:00 :  1021   04:00:00 PM:  8104  
##  SOUTH PORTION:   810   4/4/2011 0:00:00 :  1007   12:00:00 PM:  7483  
##  NORTH PORTION:   784   5/30/2004 0:00:00:   998   11:59:00 PM:  7184  
##  (Other)      :591444   (Other)          :653450   (Other)    :622432  
##    COUNTY_END COUNTYENDN       END_RANGE           END_AZI      
##  Min.   :0    Mode:logical   Min.   :  0.0000          :724837  
##  1st Qu.:0    NA's:902297    1st Qu.:  0.0000   N      : 28082  
##  Median :0                   Median :  0.0000   S      : 22510  
##  Mean   :0                   Mean   :  0.9862   W      : 20119  
##  3rd Qu.:0                   3rd Qu.:  0.0000   E      : 20047  
##  Max.   :0                   Max.   :925.0000   NE     : 14606  
##                                                 (Other): 72096  
##            END_LOCATI         LENGTH              WIDTH         
##                 :499225   Min.   :   0.0000   Min.   :   0.000  
##  COUNTYWIDE     : 19731   1st Qu.:   0.0000   1st Qu.:   0.000  
##  SOUTH PORTION  :   833   Median :   0.0000   Median :   0.000  
##  NORTH PORTION  :   780   Mean   :   0.2301   Mean   :   7.503  
##  CENTRAL PORTION:   617   3rd Qu.:   0.0000   3rd Qu.:   0.000  
##  SPRINGFIELD    :   575   Max.   :2315.0000   Max.   :4400.000  
##  (Other)        :380536                                         
##        F               MAG            FATALITIES          INJURIES        
##  Min.   :0.0      Min.   :    0.0   Min.   :  0.0000   Min.   :   0.0000  
##  1st Qu.:0.0      1st Qu.:    0.0   1st Qu.:  0.0000   1st Qu.:   0.0000  
##  Median :1.0      Median :   50.0   Median :  0.0000   Median :   0.0000  
##  Mean   :0.9      Mean   :   46.9   Mean   :  0.0168   Mean   :   0.1557  
##  3rd Qu.:1.0      3rd Qu.:   75.0   3rd Qu.:  0.0000   3rd Qu.:   0.0000  
##  Max.   :5.0      Max.   :22000.0   Max.   :583.0000   Max.   :1700.0000  
##  NA's   :843563                                                           
##     PROPDMG          PROPDMGEXP        CROPDMG          CROPDMGEXP    
##  Min.   :   0.00          :465934   Min.   :  0.000          :618413  
##  1st Qu.:   0.00   K      :424665   1st Qu.:  0.000   K      :281832  
##  Median :   0.00   M      : 11330   Median :  0.000   M      :  1994  
##  Mean   :  12.06   0      :   216   Mean   :  1.527   k      :    21  
##  3rd Qu.:   0.50   B      :    40   3rd Qu.:  0.000   0      :    19  
##  Max.   :5000.00   5      :    28   Max.   :990.000   B      :     9  
##                    (Other):    84                     (Other):     9  
##       WFO                                       STATEOFFIC    
##         :142069                                      :248769  
##  OUN    : 17393   TEXAS, North                       : 12193  
##  JAN    : 13889   ARKANSAS, Central and North Central: 11738  
##  LWX    : 13174   IOWA, Central                      : 11345  
##  PHI    : 12551   KANSAS, Southwest                  : 11212  
##  TSA    : 12483   GEORGIA, North and Central         : 11120  
##  (Other):690738   (Other)                            :595920  
##                                                                                                                                                                                                     ZONENAMES     
##                                                                                                                                                                                                          :594029  
##                                                                                                                                                                                                          :205988  
##  GREATER RENO / CARSON CITY / M - GREATER RENO / CARSON CITY / M                                                                                                                                         :   639  
##  GREATER LAKE TAHOE AREA - GREATER LAKE TAHOE AREA                                                                                                                                                       :   592  
##  JEFFERSON - JEFFERSON                                                                                                                                                                                   :   303  
##  MADISON - MADISON                                                                                                                                                                                       :   302  
##  (Other)                                                                                                                                                                                                 :100444  
##     LATITUDE      LONGITUDE        LATITUDE_E     LONGITUDE_    
##  Min.   :   0   Min.   :-14451   Min.   :   0   Min.   :-14455  
##  1st Qu.:2802   1st Qu.:  7247   1st Qu.:   0   1st Qu.:     0  
##  Median :3540   Median :  8707   Median :   0   Median :     0  
##  Mean   :2875   Mean   :  6940   Mean   :1452   Mean   :  3509  
##  3rd Qu.:4019   3rd Qu.:  9605   3rd Qu.:3549   3rd Qu.:  8735  
##  Max.   :9706   Max.   : 17124   Max.   :9706   Max.   :106220  
##  NA's   :47                      NA's   :40                     
##                                            REMARKS           REFNUM      
##                                                :287433   Min.   :     1  
##                                                : 24013   1st Qu.:225575  
##  Trees down.\n                                 :  1110   Median :451149  
##  Several trees were blown down.\n              :   569   Mean   :451149  
##  Trees were downed.\n                          :   446   3rd Qu.:676723  
##  Large trees and power lines were blown down.\n:   432   Max.   :902297  
##  (Other)                                       :588294

3. Results

3.1 Questions

This analysis addresses the following questions:

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Variable selection (reducing the data set to only the columns needed for the analysis).

df2 <- df[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP",
              "CROPDMG", "CROPDMGEXP")]

General exploratory analysis and data types.

library(dplyr)
dim(df2)
## [1] 902297      7
head(df2)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0           
## 2 TORNADO          0        0     2.5          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0           
## 6 TORNADO          0        6     2.5          K       0
glimpse(df2)
## Observations: 902,297
## Variables: 7
## $ EVTYPE     <fct> TORNADO, TORNADO, TORNADO, TORNADO, TORNADO, TORNADO, TO...
## $ FATALITIES <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 4, 0, 0, 0,...
## $ INJURIES   <dbl> 15, 0, 2, 2, 2, 6, 1, 0, 14, 0, 3, 3, 26, 12, 6, 50, 2, ...
## $ PROPDMG    <dbl> 25.0, 2.5, 25.0, 2.5, 2.5, 2.5, 2.5, 2.5, 25.0, 25.0, 2....
## $ PROPDMGEXP <fct> K, K, K, K, K, K, K, K, K, K, M, M, K, K, K, K, K, K, K,...
## $ CROPDMG    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CROPDMGEXP <fct> , , , , , , , , , , , , , , , , , , , , , , , , ,
sort(names(df2))
## [1] "CROPDMG"    "CROPDMGEXP" "EVTYPE"     "FATALITIES" "INJURIES"  
## [6] "PROPDMG"    "PROPDMGEXP"

Basic statistics

summary(df2)
##                EVTYPE         FATALITIES          INJURIES        
##  HAIL             :288661   Min.   :  0.0000   Min.   :   0.0000  
##  TSTM WIND        :219940   1st Qu.:  0.0000   1st Qu.:   0.0000  
##  THUNDERSTORM WIND: 82563   Median :  0.0000   Median :   0.0000  
##  TORNADO          : 60652   Mean   :  0.0168   Mean   :   0.1557  
##  FLASH FLOOD      : 54277   3rd Qu.:  0.0000   3rd Qu.:   0.0000  
##  FLOOD            : 25326   Max.   :583.0000   Max.   :1700.0000  
##  (Other)          :170878                                         
##     PROPDMG          PROPDMGEXP        CROPDMG          CROPDMGEXP    
##  Min.   :   0.00          :465934   Min.   :  0.000          :618413  
##  1st Qu.:   0.00   K      :424665   1st Qu.:  0.000   K      :281832  
##  Median :   0.00   M      : 11330   Median :  0.000   M      :  1994  
##  Mean   :  12.06   0      :   216   Mean   :  1.527   k      :    21  
##  3rd Qu.:   0.50   B      :    40   3rd Qu.:  0.000   0      :    19  
##  Max.   :5000.00   5      :    28   Max.   :990.000   B      :     9  
##                    (Other):    84                     (Other):     9

Reviewing events that cause the most fatalities (The Top-10 Fatalities by Weather Event).

Procedure = sum the fatalities by event type, sort the output in descending order and keep the top 10.
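The same three steps (sum by event type, sort descending, keep the first rows) are repeated for injuries and for both damage types, so they can be wrapped in one function. A minimal sketch on made-up toy data; the helper name `top_n_by_event` is hypothetical and not part of the original analysis:

```r
# Hypothetical helper: total a column by EVTYPE and keep the n largest totals
top_n_by_event <- function(data, column, n = 10) {
  # Build a formula such as FATALITIES ~ EVTYPE from the column name
  totals <- aggregate(as.formula(paste(column, "~ EVTYPE")), data = data, FUN = sum)
  # Sort by the total in descending order, then keep at most n rows
  totals <- totals[order(-totals[[column]]), ]
  head(totals, n)
}

# Toy data (not Storm Data values)
toy <- data.frame(
  EVTYPE = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(3, 0, 2, 1, 0),
  stringsAsFactors = FALSE
)
top2 <- top_n_by_event(toy, "FATALITIES", n = 2)
print(top2)
```

With the real data, `top_n_by_event(df2, "FATALITIES")` should reproduce the Top-10 table above.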

Fatalities <- aggregate(FATALITIES ~ EVTYPE, data = df2, FUN = sum)
Top10_Fatalities <- Fatalities[order(-Fatalities$FATALITIES), ][1:10, ] 
Top10_Fatalities
##             EVTYPE FATALITIES
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
## 170          FLOOD        470
## 585    RIP CURRENT        368
## 359      HIGH WIND        248
## 19       AVALANCHE        224

Reviewing events that cause the most injuries (The Top-10 Injuries by Weather Event).

Procedure = sum the injuries by event type, sort the output in descending order and keep the top 10.
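Fatalities and injuries can also be summed in a single pass by placing `cbind()` on the left-hand side of the `aggregate()` formula. A sketch on made-up toy data (not the Storm Data values):

```r
# Toy data standing in for df2 (illustrative only)
toy <- data.frame(
  EVTYPE = c("TORNADO", "HAIL", "TORNADO", "FLOOD"),
  FATALITIES = c(3, 0, 2, 1),
  INJURIES = c(10, 1, 5, 0),
  stringsAsFactors = FALSE
)
# One aggregate() call returns one summed column per variable inside cbind()
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
# Order by injuries, largest first, to mirror the Top-10 injuries table
health <- health[order(-health$INJURIES), ]
print(health)
```

The formula interface drops rows with `NA` in any listed column (its default `na.action` is `na.omit`); the `summary(df2)` output shows no missing values in either column, so nothing is lost here.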

Injuries <- aggregate(INJURIES ~ EVTYPE, data = df2, FUN = sum)
Top10_Injuries <- Injuries[order(-Injuries$INJURIES), ][1:10, ] 
Top10_Injuries 
##                EVTYPE INJURIES
## 834           TORNADO    91346
## 856         TSTM WIND     6957
## 170             FLOOD     6789
## 130    EXCESSIVE HEAT     6525
## 464         LIGHTNING     5230
## 275              HEAT     2100
## 427         ICE STORM     1975
## 153       FLASH FLOOD     1777
## 760 THUNDERSTORM WIND     1488
## 244              HAIL     1361

Plot of Top 10 Fatalities & Injuries for Weather Event Types (Population Health Impact).

Procedure = plot bar graphs of the top 10 fatalities and injuries.

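# Two panels side by side; the large bottom margin leaves room for the rotated event labels
par(mfrow = c(1, 2), mar = c(11, 5, 5, 2))
# las = 2 draws the axis labels perpendicular to the axes
barplot(Top10_Fatalities$FATALITIES, names.arg = Top10_Fatalities$EVTYPE, las = 2,
        col = 'violet', ylab = 'fatalities', main = 'Top 10 Fatalities')
barplot(Top10_Injuries$INJURIES, names.arg = Top10_Injuries$EVTYPE, las = 2,
        col = 'violet', ylab = 'injuries', main = 'Top 10 Injuries')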

Figure 1: Tornadoes are the weather event responsible for the most fatalities and injuries.

2. Across the United States, which types of events have the greatest economic consequences?

An analysis of the weather events responsible for the greatest economic consequences.

Hypothesis: economic consequences are measured as damage. The two significant types of damage caused by weather events are damage to property and damage to crops.

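Data Exploration & Findings

Damage amounts are stored in two parts: a numeric value (PROPDMG, CROPDMG) and an exponent code (PROPDMGEXP, CROPDMGEXP) giving its magnitude, such as ‘K’ for thousands, ‘M’ for millions and ‘B’ for billions. The exponent columns therefore have to be converted into numeric multipliers before the total property and crop damages can be calculated.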
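Written out, the conversion is a product of the recorded value and a multiplier. Here m(·) is notation introduced only for illustration; it stands for the multiplier assigned to an exponent code, and the worked example uses the first record of the data set (a 1950 tornado with PROPDMG = 25.0 and PROPDMGEXP = K):

```latex
\text{PROPDMGVAL} = \text{PROPDMG} \times m(\text{PROPDMGEXP}), \qquad
\text{CROPDMGVAL} = \text{CROPDMG} \times m(\text{CROPDMGEXP})

m(\text{K}) = 10^{3}, \quad m(\text{M}) = 10^{6}, \quad m(\text{B}) = 10^{9}

\text{Record 1: } 25.0 \times m(\text{K}) = 25.0 \times 10^{3} = 25\,000
```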

Defining & Calculating [Property Damage].

Each level of the property damage exponent (PROPDMGEXP) is listed and assigned a numeric multiplier in a new column, PROPEXP.

Invalid codes (‘+’, ‘-’ and ‘?’) are excluded by assigning them a multiplier of ‘0’.

The property damage value (PROPDMGVAL) is then calculated by multiplying the property damage (PROPDMG) by its multiplier (PROPEXP).
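The one-assignment-per-code approach can also be expressed as a lookup table, which keeps the full mapping in one place and makes a missing code easy to spot. A minimal sketch, assuming the same code-to-multiplier mapping used in this section; the names `prop_codes`, `prop_mult` and `code_to_multiplier` are hypothetical, and the damage values are illustrative:

```r
# Lookup table mirroring this section's mapping; "+", "-" and "?" are invalid (0)
prop_codes <- c("",  "0", "1", "2", "3", "4", "5", "6", "7", "8",
                "h", "H", "K", "m", "M", "B", "+", "-", "?")
prop_mult  <- c(1, 1, 10, 100, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8,
                100, 100, 1e3, 1e6, 1e6, 1e9, 0, 0, 0)
stopifnot(length(prop_codes) == length(prop_mult))

code_to_multiplier <- function(code, codes, multipliers) {
  # match() compares string values, so the empty code "" is matched as well;
  # any code absent from the table comes back as NA
  multipliers[match(as.character(code), codes)]
}

# Illustrative codes and values (not taken from specific records)
exp_codes <- c("K", "K", "M", "", "?")
damage <- c(25.0, 2.5, 1.5, 10, 3)
damage_val <- damage * code_to_multiplier(exp_codes, prop_codes, prop_mult)
print(damage_val)
```

On the real data the equivalent line would be `df2$PROPDMGVAL <- df2$PROPDMG * code_to_multiplier(df2$PROPDMGEXP, prop_codes, prop_mult)`. `match()` is used instead of name indexing because R never matches an empty string `""` when indexing by names.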

unique(df2$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
# Assign a numeric multiplier to each property exponent code
df2$PROPEXP[df2$PROPDMGEXP == "K"] <- 1000
df2$PROPEXP[df2$PROPDMGEXP == "M"] <- 1e+06
df2$PROPEXP[df2$PROPDMGEXP == ""] <- 1
df2$PROPEXP[df2$PROPDMGEXP == "B"] <- 1e+09
df2$PROPEXP[df2$PROPDMGEXP == "m"] <- 1e+06
df2$PROPEXP[df2$PROPDMGEXP == "0"] <- 1
df2$PROPEXP[df2$PROPDMGEXP == "5"] <- 1e+05
df2$PROPEXP[df2$PROPDMGEXP == "6"] <- 1e+06
df2$PROPEXP[df2$PROPDMGEXP == "4"] <- 10000
df2$PROPEXP[df2$PROPDMGEXP == "2"] <- 100
df2$PROPEXP[df2$PROPDMGEXP == "3"] <- 1000
df2$PROPEXP[df2$PROPDMGEXP == "h"] <- 100
df2$PROPEXP[df2$PROPDMGEXP == "7"] <- 1e+07
df2$PROPEXP[df2$PROPDMGEXP == "H"] <- 100
df2$PROPEXP[df2$PROPDMGEXP == "1"] <- 10
df2$PROPEXP[df2$PROPDMGEXP == "8"] <- 1e+08

# Assign 0 to invalid exponent codes
df2$PROPEXP[df2$PROPDMGEXP == "+"] <- 0
df2$PROPEXP[df2$PROPDMGEXP == "-"] <- 0
df2$PROPEXP[df2$PROPDMGEXP == "?"] <- 0

# Calculate the property damage value
df2$PROPDMGVAL <- df2$PROPDMG * df2$PROPEXP

Defining & Calculating [Crop Damage].

Each level of the crop damage exponent (CROPDMGEXP) is listed and assigned a numeric multiplier in a new column, CROPEXP.

The invalid code ‘?’ is excluded by assigning it a multiplier of ‘0’.

The crop damage value (CROPDMGVAL) is then calculated by multiplying the crop damage (CROPDMG) by its multiplier (CROPEXP).
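Any exponent code that is never assigned a multiplier leaves `NA` in the multiplier column, and the formula interface of `aggregate()` later drops those rows without a warning. A quick check before multiplying can catch this. A minimal sketch, assuming the crop mapping used in this section; the names `crop_codes`, `crop_mult` and `unmapped_codes` are hypothetical, and `"X"` is a made-up code included only to trigger the check:

```r
# Lookup table mirroring this section's crop mapping; "?" is invalid (0)
crop_codes <- c("", "0", "2", "k", "K", "m", "M", "B", "?")
crop_mult  <- c(1, 1, 100, 1e3, 1e3, 1e6, 1e6, 1e9, 0)

unmapped_codes <- function(code, codes) {
  # Observed codes (deduplicated) that have no entry in the lookup table
  setdiff(unique(as.character(code)), codes)
}

observed <- c("", "K", "M", "k", "X", "K")
missing_codes <- unmapped_codes(observed, crop_codes)
# Rows whose multiplier would be NA
mult <- crop_mult[match(observed, crop_codes)]
n_missing_rows <- sum(is.na(mult))
print(missing_codes)
print(n_missing_rows)
```

Run against `df2$CROPDMGEXP` (or `df2$PROPDMGEXP` with the property table), an empty result means every observed code has a multiplier.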

unique(df2$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M
# Assign a numeric multiplier to each crop exponent code
df2$CROPEXP[df2$CROPDMGEXP == "M"] <- 1e+06
df2$CROPEXP[df2$CROPDMGEXP == "K"] <- 1000
df2$CROPEXP[df2$CROPDMGEXP == "m"] <- 1e+06
df2$CROPEXP[df2$CROPDMGEXP == "B"] <- 1e+09
df2$CROPEXP[df2$CROPDMGEXP == "0"] <- 1
df2$CROPEXP[df2$CROPDMGEXP == "k"] <- 1000
df2$CROPEXP[df2$CROPDMGEXP == "2"] <- 100
df2$CROPEXP[df2$CROPDMGEXP == ""] <- 1

# Assign 0 to invalid exponent codes
df2$CROPEXP[df2$CROPDMGEXP == "?"] <- 0

# Calculate the crop damage value
df2$CROPDMGVAL <- df2$CROPDMG * df2$CROPEXP

Property Damage Summary.

Procedure = sum the property damage by event type, sort the output in descending order and keep the top 10.
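The same sum, sort and top-n step can be written in base R with `tapply()`, which returns a named result instead of a data frame. A sketch on toy values standing in for `df2$PROPDMGVAL` and `df2$EVTYPE` (illustrative only):

```r
# Toy values (not Storm Data results)
evtype <- c("FLOOD", "TORNADO", "FLOOD", "HAIL", "TORNADO")
prop_val <- c(5e6, 2e6, 1e6, 3e5, NA)

# Sum per event type; na.rm = TRUE skips the missing value
totals <- tapply(prop_val, evtype, sum, na.rm = TRUE)
# Largest totals first, then keep the first two
top <- head(sort(totals, decreasing = TRUE), 2)
print(top)
```

Because the result carries the event types as names, it can be passed straight to `barplot()`, which uses those names as bar labels.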

prop <- aggregate(PROPDMGVAL~EVTYPE,data=df2,FUN=sum,na.rm=TRUE)
prop <- prop[with(prop,order(-PROPDMGVAL)),]
prop <- head(prop, 10)
prop
##                EVTYPE   PROPDMGVAL
## 170             FLOOD 144657709807
## 411 HURRICANE/TYPHOON  69305840000
## 834           TORNADO  56947380617
## 670       STORM SURGE  43323536000
## 153       FLASH FLOOD  16822673979
## 244              HAIL  15735267513
## 402         HURRICANE  11868319010
## 848    TROPICAL STORM   7703890550
## 972      WINTER STORM   6688497251
## 359         HIGH WIND   5270046260

Crop Damage Summary

Procedure = sum the crop damage by event type, sort the output in descending order and keep the top 10.
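This section ranks property and crop damage separately. If a single economic figure per event type is wanted, the two can be added together before ranking. A sketch on made-up toy data; the column name `TOTALDMG` is hypothetical:

```r
# Toy data standing in for df2 (illustrative only)
toy <- data.frame(
  EVTYPE = c("FLOOD", "DROUGHT", "FLOOD", "HAIL"),
  PROPDMGVAL = c(4e6, 0, 1e6, 2e5),
  CROPDMGVAL = c(5e5, 3e6, 0, 1e5),
  stringsAsFactors = FALSE
)
# Sum both damage columns per event type
econ <- aggregate(cbind(PROPDMGVAL, CROPDMGVAL) ~ EVTYPE, data = toy, FUN = sum)
# Combined damage, then rank from largest to smallest
econ$TOTALDMG <- econ$PROPDMGVAL + econ$CROPDMGVAL
econ <- econ[order(-econ$TOTALDMG), ]
print(econ)
```

On the real data this should be computed from `df2` rather than from `prop` and `crop`, since those tables already keep only the top 10 event types each.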

crop <- aggregate(CROPDMGVAL~EVTYPE,data=df2,FUN=sum,na.rm=TRUE)
crop <- crop[with(crop,order(-CROPDMGVAL)),]
crop <- head(crop, 10)
crop
##                EVTYPE  CROPDMGVAL
## 95            DROUGHT 13972566000
## 170             FLOOD  5661968450
## 590       RIVER FLOOD  5029459000
## 427         ICE STORM  5022113500
## 244              HAIL  3025954473
## 402         HURRICANE  2741910000
## 411 HURRICANE/TYPHOON  2607872800
## 153       FLASH FLOOD  1421317100
## 140      EXTREME COLD  1292973000
## 212      FROST/FREEZE  1094086000

Plot of Top 10 Property & Crop damages by Weather Event Types (Economic Consequences).

Procedure = plot bar graphs of the top 10 property and crop damages, in billions.

par(mfrow=c(1,2),mar=c(11,4,4,2))
barplot(prop$PROPDMGVAL/(10^9),names.arg=prop$EVTYPE,las=2,col="blue",ylab="Prop.damage(billions)",main="Top10 Prop.Damages")
barplot(crop$CROPDMGVAL/(10^9),names.arg=crop$EVTYPE,las=2,col="blue",ylab="Crop damage(billions)",main="Top10 Crop.Damages")

Results and Conclusions

Tornadoes caused the most fatalities (5,633) and injuries (91,346). They were followed by excessive heat for fatalities (1,903) and thunderstorm wind (TSTM WIND) for injuries (6,957).

Floods caused the most property damage, whereas drought caused the most crop damage. The second-largest sources of damage were hurricanes/typhoons for property and floods for crops.