Synopsis

The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. This project uses the data for descriptive statistics, to determine which types of major storms and weather events cause:
* the greatest harm to the population (injuries and fatalities);
* the greatest economic consequences (crop and property damage).

The analysis is carried out in R on the NOAA database.

Data Processing

The NOAA database is available online (https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2) as a comma-separated-value file compressed with bzip2. In R, it can be downloaded with download.file and then read into a data frame with read.csv, which decompresses the file on the fly. The next cell reads the data and shows its dimensions, its variable names, and a summary of the EVTYPE variable.

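# Create the data directory if it does not exist yet
if(!dir.exists('./data')) dir.create('./data')

# Binary mode, so the compressed file is not altered on Windows
download.file('https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2','./data/Project2.csv', mode = 'wb')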

df <- read.csv('./data/Project2.csv')
dim(df)
## [1] 902297     37
names(df)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
#str(df)
#head(df)
summary(df$EVTYPE)
##    Length     Class      Mode 
##    902297 character character
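
read.csv decompresses bzip2 input transparently, which is why no separate unzip step appears above. The following is a minimal sketch of that behaviour, using a temporary file with made-up rows instead of the NOAA download (tmp, con, and back are illustrative names, not part of the analysis):

```r
# Write a tiny CSV through a bzip2-compressing connection.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                     INJURIES = c(3, 1)),
          con, row.names = FALSE)
close(con)

# read.csv detects the compression and reads the file directly.
back <- read.csv(tmp)
print(back)

# Remove the temporary file.
unlink(tmp)
```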

Two aggregated data frames are computed, holding the sums of INJURIES and FATALITIES respectively, grouped by ‘EVTYPE’, i.e. the event type.

## Aggregate the data frame on the EVTYPE variable
dfInj <- aggregate(INJURIES ~ EVTYPE, data = df, sum, na.rm = T)
dfFat <- aggregate(FATALITIES ~ EVTYPE, data = df, sum, na.rm = T)

The dfHarm data frame is then formed: its first column holds the event types and its second column the sum of total injuries and fatalities. The data frame is sorted in descending order of this combined count.

dfHarm <- dfInj[,1, drop=F]
dfHarm$Harms <- dfInj$INJURIES + dfFat$FATALITIES
dfHarm <- dfHarm[order(dfHarm$Harms,decreasing=TRUE), ]
head(dfHarm,4)
##             EVTYPE Harms
## 834        TORNADO 96979
## 130 EXCESSIVE HEAT  8428
## 856      TSTM WIND  7461
## 170          FLOOD  7259
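
The two aggregations and their sum can also be expressed in a single aggregate call, which avoids relying on dfInj and dfFat sharing the same row order. The following is a minimal sketch on a small made-up data frame (toy and its values are hypothetical, not taken from the NOAA data):

```r
# Toy stand-in for the storm data; values are invented for illustration only.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  INJURIES   = c(10, 2, 5, 0, 1),
  FATALITIES = c(1, 0, 2, 0, 3)
)

# Sum both columns per event type in one aggregate() call.
toyAgg <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)

# Total harm = injuries + fatalities, then sort from most to least harmful.
toyAgg$Harms <- toyAgg$INJURIES + toyAgg$FATALITIES
toyAgg <- toyAgg[order(toyAgg$Harms, decreasing = TRUE), ]
print(toyAgg)
```

On the real data, the same pattern would presumably apply with df in place of toy.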

A barplot gives a simple visual representation of the number of injuries and fatalities per weather event type. Only the 16 event types with the highest combined counts are plotted.

par(mfrow=c(1,1), mar = c(9.5,4,1,0.5))
barplot(dfHarm$Harms[1:16], names.arg = dfHarm$EVTYPE[1:16],
        cex.axis=0.5, cex.names=0.5,
        col = 'wheat',
        xlab = "Event", ylab = 'Number of Population Harmful Events', 
        main = "Population Harmful Events [Injuries + Fatalities]", las=2)

Tornadoes cause the highest number of fatalities and injuries combined: 96979.

The data for the economic consequences is split into two parts: crop damage and property damage. Each is defined by two variables:
* Crop
  * CROPDMG: numerical value corresponding to the crop damage
  * CROPDMGEXP: categorical value, coded as character, corresponding to the magnitude of the crop damage
* Property
  * PROPDMG: numerical value corresponding to the property damage
  * PROPDMGEXP: categorical value, coded as character, corresponding to the magnitude of the property damage

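The magnitude variables have the following meaning, as encoded by the numericValues function defined in this section:
* '?', '-' or empty: multiplier 0
* '+': multiplier 1
* a digit from 0 to 9: multiplier 10
* 'H' or 'h': multiplier 10^2
* 'K' or 'k': multiplier 10^3
* 'M' or 'm': multiplier 10^6
* 'B' or 'b': multiplier 10^9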

First, a function is implemented that converts the character-coded magnitudes into numerical multipliers according to the meanings above.

numericValues <- function(a){
    if(a == '?' || a == '-' || a == ''){
        return(0)
    }
    if(a == '+'){
        return(1)
    }        
    if(a %in% 0:9){
        return(10)
    }
    if(a == 'H' || a == 'h'){
        return(10**2)
    }
    if(a == 'K' || a == 'k'){
        return(10**3)
    }
    if(a == 'M' || a == 'm'){
        return(10**6)
    }
    if(a == 'B' || a == 'b'){
        return(10**9)
    }    
}

The function is then applied to the two magnitude variables (i.e. PROPDMGEXP and CROPDMGEXP), and the numerical value of the economic impact is computed for both property and crops. Their sum, named PropCrop, is stored together with the event type (i.e. EVTYPE) in the data frame dfEconomic, so that each row holds a meteorological event type and its corresponding economic impact.

dfEconomic <- df[,'EVTYPE', drop=F]

df$PROPDMGEXP <- vapply(df$PROPDMGEXP, numericValues, numeric(1))
dfEconomic$Property <- df$PROPDMG * df$PROPDMGEXP

df$CROPDMGEXP <- vapply(df$CROPDMGEXP, numericValues, numeric(1))
dfEconomic$Crop <- df$CROPDMG * df$CROPDMGEXP

dfEconomic$PropCrop <- dfEconomic$Property + dfEconomic$Crop

head(dfEconomic)
##    EVTYPE Property Crop PropCrop
## 1 TORNADO    25000    0    25000
## 2 TORNADO     2500    0     2500
## 3 TORNADO    25000    0    25000
## 4 TORNADO     2500    0     2500
## 5 TORNADO     2500    0     2500
## 6 TORNADO     2500    0     2500
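
Because numericValues handles one value at a time, vapply calls it once per row of the data. A vectorized lookup can produce the same mapping in a single step. This is only a sketch: expMultiplier is a hypothetical helper name, the mapping copies the rules of numericValues, and, as an added assumption, any unrecognized code is mapped to 0.

```r
# Hypothetical vectorized counterpart of numericValues().
expMultiplier <- function(x) {
  x <- toupper(as.character(x))
  lookup <- c("+" = 1, "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9)
  out <- unname(lookup[x])               # NA for codes not in the lookup
  out[x %in% as.character(0:9)] <- 10    # any single digit maps to 10
  out[is.na(out)] <- 0                   # '?', '-', '' and (assumption) anything else
  out
}

codes <- c("K", "m", "B", "5", "+", "?", "")
print(expMultiplier(codes))

# Worked example: a PROPDMG of 25 with exponent 'K' is 25 * 1000 = 25000 dollars,
# matching the first row of dfEconomic shown above.
print(25 * expMultiplier("K"))
```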

The dfEconomicAgg data frame is obtained from dfEconomic by summing PropCrop within each ‘EVTYPE’ group, i.e. per event type, and is then sorted in decreasing order of total damage.

dfEconomicAgg <- aggregate(PropCrop ~ EVTYPE, data = dfEconomic, sum)
dfEconomicAgg <- dfEconomicAgg[order(dfEconomicAgg$PropCrop,decreasing=TRUE), ]

A barplot gives a simple visual representation of the economic impact per weather event type. Only the 16 event types with the highest economic impact are plotted.

par(mfrow=c(1,1), mar = c(10.5,4,1,0.5))
barplot(dfEconomicAgg$PropCrop[1:16], names.arg = dfEconomicAgg$EVTYPE[1:16],
        cex.axis=0.5, cex.names=0.5,
        col = 'wheat',
        xlab = "Event", ylab = 'Damage Cost $', 
        main = "Economic Impact [Crop + Property]", las=2)

Floods cause the highest economic impact (crops and property): about $150 B.

Results

For harm to the population (the sum of injuries and fatalities), the most damage is caused by tornadoes, which account for approximately 62 percent of the total.

print(paste('Percent of tornado population harmful events effects: ', round(100 * dfHarm$Harms[1]/sum(dfHarm$Harms), 1), '%.'))
## [1] "Percent of tornado population harmful events effects:  62.3 %."

For the economic impact (the sum of crop and property damage, scaled by the corresponding magnitudes), the most damage is caused by floods, which account for approximately 32 percent of the total economic damage.

print(paste('Percent of flood economic damage effects: ', round(100 * dfEconomicAgg$PropCrop[1]/sum(dfEconomicAgg$PropCrop), 1), '%.'))
## [1] "Percent of flood economic damage effects:  31.6 %."

In both cases, the leading event type accounts for a much larger share than any other weather event in the database.

Among the 16 weather events with the greatest economic damage and the 16 with the greatest harm to the population, ten are common to both lists: TORNADO, TSTM WIND, FLOOD, FLASH FLOOD, ICE STORM, WINTER STORM, HIGH WIND, HAIL, HURRICANE/TYPHOON, WILDFIRE. Together they account for about 79% of the injuries and fatalities and about 73% of the economic damage.

commonEvents <- intersect(dfHarm$EVTYPE[1:16], dfEconomicAgg$EVTYPE[1:16])
print(commonEvents)
##  [1] "TORNADO"           "TSTM WIND"         "FLOOD"            
##  [4] "FLASH FLOOD"       "ICE STORM"         "WINTER STORM"     
##  [7] "HIGH WIND"         "HAIL"              "HURRICANE/TYPHOON"
## [10] "WILDFIRE"
print(paste('Percent of first 10 common weather events in terms of population harmful events effects: ', round(100 * sum(dfHarm$Harms[match(commonEvents, dfHarm$EVTYPE)])/sum(dfHarm$Harms), 1), '%.'))
## [1] "Percent of first 10 common weather events in terms of population harmful events effects:  79.1 %."
print(paste('Percent of first 10 common weather events in terms of economic damage effects: ', round(100 * sum(dfEconomicAgg$PropCrop[match(commonEvents, dfEconomicAgg$EVTYPE)])/sum(dfEconomicAgg$PropCrop), 1), '%.'))
## [1] "Percent of first 10 common weather events in terms of economic damage effects:  73 %."
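
The share computations in this section all follow the same pattern: the total for a set of event types, divided by the grand total, times 100. A small helper can make that pattern explicit. This is a sketch under assumptions: shareOf is a hypothetical name that does not appear in the analysis, and the toy data frame holds made-up numbers.

```r
# Hypothetical helper: percentage of a column's total contributed by `events`.
shareOf <- function(agg, events, valueCol, keyCol = "EVTYPE") {
  rows <- agg[[keyCol]] %in% events
  100 * sum(agg[[valueCol]][rows]) / sum(agg[[valueCol]])
}

# Invented totals, for illustration only.
toyShare <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "HAIL"),
  Harms  = c(60, 30, 10)
)

print(shareOf(toyShare, "TORNADO", "Harms"))              # share of one event type
print(shareOf(toyShare, c("TORNADO", "FLOOD"), "Harms"))  # share of a set of event types
```

With the data frames above, a call such as shareOf(dfHarm, commonEvents, "Harms") would be expected to give the combined share of the ten common events.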