# Coursera: Reproducible Research Assignment 2 - NOAA Storm Database analysis for severe weather events

Synopsis

This project explores the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. The database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. We cleanse the data and identify the top 5 most harmful event types, which is where attention needs to be focused.

Observations

There are 985 unique event types (uniqEvents).
There are 37 variables and 902297 observations.
CROPDMGEXP and PROPDMGEXP require cleansing so that the units are uniform.
The CROPDMG and PROPDMG values need to be scaled accordingly.

Work Details

Loading and preprocessing the data

Assumptions

The working directory is set to the current local clone of the GitHub repository for this assignment.
The dataset, repdata-data-StormData.csv.bz2, required for this analysis has already been downloaded to the repository.
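
As a side note on the compressed file named above: read.csv() can usually read a .bz2 file directly, because the underlying file() connection detects the compression. The sketch below checks this on a throwaway temporary file; `tmp` and `demo` are illustrative names and not part of the assignment code.

```r
# Minimal sketch on a throwaway file; tmp and demo are illustrative names.
tmp <- tempfile(fileext = ".csv.bz2")

# Write a tiny bzip2-compressed CSV through a bzfile() connection
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                     FATALITIES = c(1, 0)),
          file = con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which detects the compression itself
demo <- read.csv(tmp, stringsAsFactors = FALSE)
str(demo)
```

If this holds for the real dataset, the manual decompression step can be skipped and the .csv.bz2 path passed straight to read.csv().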
setwd("C:/Users/Helga/Documents/gabriel/cursos/")
data.local = "C:/Users/Helga/Documents/gabriel/cursos/repdata_data_StormData.csv"

# load the data
stormData <- read.csv(data.local, na.strings = "NA", stringsAsFactors = FALSE)

# take a quick look at the structure of the data
summary(stormData)
##     STATE__       BGN_DATE           BGN_TIME          TIME_ZONE        
##  Min.   : 1.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.:19.0   Class :character   Class :character   Class :character  
##  Median :30.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :31.2                                                           
##  3rd Qu.:45.0                                                           
##  Max.   :95.0                                                           
##                                                                         
##      COUNTY     COUNTYNAME           STATE              EVTYPE         
##  Min.   :  0   Length:902297      Length:902297      Length:902297     
##  1st Qu.: 31   Class :character   Class :character   Class :character  
##  Median : 75   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :101                                                           
##  3rd Qu.:131                                                           
##  Max.   :873                                                           
##                                                                        
##    BGN_RANGE      BGN_AZI           BGN_LOCATI          END_DATE        
##  Min.   :   0   Length:902297      Length:902297      Length:902297     
##  1st Qu.:   0   Class :character   Class :character   Class :character  
##  Median :   0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :   1                                                           
##  3rd Qu.:   1                                                           
##  Max.   :3749                                                           
##                                                                         
##    END_TIME           COUNTY_END COUNTYENDN       END_RANGE  
##  Length:902297      Min.   :0    Mode:logical   Min.   :  0  
##  Class :character   1st Qu.:0    NA's:902297    1st Qu.:  0  
##  Mode  :character   Median :0                   Median :  0  
##                     Mean   :0                   Mean   :  1  
##                     3rd Qu.:0                   3rd Qu.:  0  
##                     Max.   :0                   Max.   :925  
##                                                              
##    END_AZI           END_LOCATI            LENGTH           WIDTH     
##  Length:902297      Length:902297      Min.   :   0.0   Min.   :   0  
##  Class :character   Class :character   1st Qu.:   0.0   1st Qu.:   0  
##  Mode  :character   Mode  :character   Median :   0.0   Median :   0  
##                                        Mean   :   0.2   Mean   :   8  
##                                        3rd Qu.:   0.0   3rd Qu.:   0  
##                                        Max.   :2315.0   Max.   :4400  
##                                                                       
##        F               MAG          FATALITIES     INJURIES     
##  Min.   :0        Min.   :    0   Min.   :  0   Min.   :   0.0  
##  1st Qu.:0        1st Qu.:    0   1st Qu.:  0   1st Qu.:   0.0  
##  Median :1        Median :   50   Median :  0   Median :   0.0  
##  Mean   :1        Mean   :   47   Mean   :  0   Mean   :   0.2  
##  3rd Qu.:1        3rd Qu.:   75   3rd Qu.:  0   3rd Qu.:   0.0  
##  Max.   :5        Max.   :22000   Max.   :583   Max.   :1700.0  
##  NA's   :843563                                                 
##     PROPDMG      PROPDMGEXP           CROPDMG       CROPDMGEXP       
##  Min.   :   0   Length:902297      Min.   :  0.0   Length:902297     
##  1st Qu.:   0   Class :character   1st Qu.:  0.0   Class :character  
##  Median :   0   Mode  :character   Median :  0.0   Mode  :character  
##  Mean   :  12                      Mean   :  1.5                     
##  3rd Qu.:   0                      3rd Qu.:  0.0                     
##  Max.   :5000                      Max.   :990.0                     
##                                                                      
##      WFO             STATEOFFIC         ZONENAMES            LATITUDE   
##  Length:902297      Length:902297      Length:902297      Min.   :   0  
##  Class :character   Class :character   Class :character   1st Qu.:2802  
##  Mode  :character   Mode  :character   Mode  :character   Median :3540  
##                                                           Mean   :2875  
##                                                           3rd Qu.:4019  
##                                                           Max.   :9706  
##                                                           NA's   :47    
##    LONGITUDE        LATITUDE_E     LONGITUDE_       REMARKS         
##  Min.   :-14451   Min.   :   0   Min.   :-14455   Length:902297     
##  1st Qu.:  7247   1st Qu.:   0   1st Qu.:     0   Class :character  
##  Median :  8707   Median :   0   Median :     0   Mode  :character  
##  Mean   :  6940   Mean   :1452   Mean   :  3509                     
##  3rd Qu.:  9605   3rd Qu.:3549   3rd Qu.:  8735                     
##  Max.   : 17124   Max.   :9706   Max.   :106220                     
##                   NA's   :40                                        
##      REFNUM      
##  Min.   :     1  
##  1st Qu.:225575  
##  Median :451149  
##  Mean   :451149  
##  3rd Qu.:676723  
##  Max.   :902297  
## 

str(stormData)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

uniqEvents <- unique(stormData$EVTYPE)
numEvents <- length(uniqEvents)
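
Part of the large number of unique event types may come from inconsistent capitalisation or stray whitespace in EVTYPE (mixed-case labels such as "Coastal Flooding" do appear in the data). The sketch below shows one possible normalisation on a made-up vector. This step is not part of the analysis in this report, and `ev` and `ev_clean` are hypothetical names.

```r
# Made-up labels imitating inconsistent EVTYPE spellings (illustrative only)
ev <- c("TSTM WIND", "tstm wind", " TSTM WIND", "Flood", "FLOOD", "HEAT")

length(unique(ev))        # every spelling variant counts as its own type

# Trim surrounding whitespace and upper-case before counting
ev_clean <- toupper(trimws(ev))
length(unique(ev_clean))  # the variants collapse to three event types
```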

Data Cleansing/subsetting for expenses:

Narrow down the window to where the problems are

Reduce the data to the subset of interest: events that caused no fatalities, injuries, or damage are discarded.

sData <- stormData[stormData$FATALITIES > 0 | stormData$INJURIES > 0 | stormData$PROPDMG > 
    0 | stormData$CROPDMG > 0, ]
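
The same filter can be illustrated on a small made-up data frame. The sketch below uses rowSums() over the four harm columns as an equivalent formulation of the chained `|` conditions, assuming those columns contain no NA values. `toy`, `harm_cols`, `keep`, and `toy_sub` are illustrative names.

```r
# Made-up rows; only the columns used by the filter are included
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD", "FOG"),
  FATALITIES = c(1, 0, 0, 0),
  INJURIES   = c(15, 0, 0, 0),
  PROPDMG    = c(25, 0, 2.5, 0),
  CROPDMG    = c(0, 0, 0, 0),
  stringsAsFactors = FALSE
)

harm_cols <- c("FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")

# Keep a row when at least one harm column is positive
keep <- rowSums(toy[, harm_cols] > 0) > 0
toy_sub <- toy[keep, ]
toy_sub$EVTYPE
```

Only the TORNADO and FLOOD rows survive here, because HAIL and FOG have zeros in all four columns.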

Look at the scale of damage

propDmgEXP <- unique (sData$PROPDMGEXP)
cropDmgEXP <- unique (sData$CROPDMGEXP)
table(sData$CROPDMGEXP)
## 
##             ?      0      B      k      K      m      M 
## 152664      6     17      7     21  99932      1   1985
table(sData$PROPDMGEXP)
## 
##             -      +      0      2      3      4      5      6      7 
##  11585      1      5    210      1      1      4     18      3      3 
##      B      h      H      K      m      M 
##     40      1      6 231428      7  11320



# Approach:
# Make the scale uniform: multiply each damage value by the factor its exponent code stands for
# B/b --> billion : 10^9
# M/m --> million : 10^6
# K/k --> thousand: 10^3
# H/h --> hundred : 10^2
# "-", "+", "?" and "" --> 1 (10^0)
# digit 0-8 --> 10^digit
# (note: in R, 10e9 means 10 * 10^9, so the factors are written with ^)


# function to convert DMGEXP character to multiplication number
#       but suppress NA warnings caused by unnecessary 10^character attempts
exp_factor <- function(x){suppressWarnings(
                       ifelse(x %in% as.character(0:8), 10^as.numeric(x),
                       ifelse(x %in% c("b","B"), 10^9,    # billion
                       ifelse(x %in% c("m","M"), 10^6,    # million/mega
                       ifelse(x %in% c("k","K"), 10^3,    # kilo   
                       ifelse(x %in% c("h","H"), 10^2,    # hecto
                       1))))))
                      }
sData$PROPDMG <- sData$PROPDMG*exp_factor(sData$PROPDMGEXP)
sData$CROPDMG <- sData$CROPDMG*exp_factor(sData$CROPDMGEXP)
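
For comparison, the same mapping can be written with a named lookup vector instead of nested ifelse() calls, which avoids the need for suppressWarnings(). This is only a sketch of an alternative, not the function used in this report; `exp_lookup` is a hypothetical name.

```r
# Hypothetical alternative to the nested ifelse(): a named lookup vector
exp_lookup <- function(x) {
  x <- toupper(x)
  mult <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

  out <- rep(1, length(x))             # default factor for "", "-", "+", "?"

  is_digit <- x %in% as.character(0:8) # digit codes mean 10^digit
  out[is_digit] <- 10^as.numeric(x[is_digit])

  is_letter <- x %in% names(mult)      # letter codes come from the lookup
  out[is_letter] <- unname(mult[x[is_letter]])

  out
}

exp_lookup(c("K", "m", "B", "h", "", "?", "2"))
```

Because as.numeric() is applied only to the digit codes, no NA coercion warnings are raised.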

vPropDmg <-tapply(sData$PROPDMG, sData$EVTYPE, sum);
vPropDmg <-vPropDmg[order(vPropDmg, decreasing=TRUE)]
# Top 5 Property Damage Cost in Millions USD
vPropDmgTop5 <- head (vPropDmg/10^6,5)
vPropDmgTop5
##             FLOOD HURRICANE/TYPHOON           TORNADO       STORM SURGE 
##            144658             69306             56947             43324 
##       FLASH FLOOD 
##             16823
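
The sum-by-event-type and ranking pattern used above can be seen in isolation on made-up numbers. The sketch uses sort() as a shorthand for indexing with order(). `toy`, `totals`, and `top` are illustrative names.

```r
# Made-up damage amounts in USD (illustrative only)
toy <- data.frame(
  EVTYPE  = c("FLOOD", "TORNADO", "FLOOD", "HAIL", "TORNADO"),
  PROPDMG = c(2e6, 5e5, 1e6, 1e4, 2.5e5),
  stringsAsFactors = FALSE
)

# Total damage per event type
totals <- tapply(toy$PROPDMG, toy$EVTYPE, sum)

# Largest totals first, then keep the top two
top <- head(sort(totals, decreasing = TRUE), 2)
top / 10^6   # in millions USD
```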

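vCropDmg <- tapply(sData$CROPDMG, sData$EVTYPE, sum)
# sort the crop totals themselves; indexing vPropDmg here instead yields
# mismatched property-damage values, which is what the printout below shows
vCropDmg <- vCropDmg[order(vCropDmg, decreasing = TRUE)]

# Top 5 Crops Damage Cost in Millions USD
vCropDmgTop5 <- head(vCropDmg / 10^6, 5)
vCropDmgTop5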
##  WINTER STORM HIGH WINDS         Coastal Flooding       WATERSPOUT-TORNADO 
##                   60.000                    6.325                    0.015 
## THUNDERSTORM WIND 60 MPH                SNOW/COLD 
##                    0.070                    1.000

Plot depicting Top 5 events causing most of expense

par(mfrow = c(2, 1))
barplot(vCropDmgTop5, xlab = "Event Type", ylab = "Crops Damage in Millions USD")
barplot(vPropDmgTop5, xlab = "Event Type", ylab = "Properties Damage in Millions USD")
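
Long event names tend to overlap or get dropped on the x axis of a default barplot. One possible remedy, sketched below with the property-damage figures printed earlier, is to rotate the labels and widen the bottom margin. `top5`, `op`, and `mids` are illustrative names, and the margin sizes are a guess that may need tuning.

```r
# Property-damage totals in millions USD, copied from the output above
top5 <- c("FLOOD" = 144658, "HURRICANE/TYPHOON" = 69306, "TORNADO" = 56947,
          "STORM SURGE" = 43324, "FLASH FLOOD" = 16823)

op <- par(mar = c(9, 5, 2, 1))   # extra bottom margin for rotated labels
mids <- barplot(top5 / 1000,     # convert millions to billions
                las = 2,         # draw axis labels perpendicular to the axis
                cex.names = 0.8, # slightly smaller bar labels
                ylab = "Property damage (billions USD)")
par(op)                          # restore the previous margins
```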

Top 5 Events causing the most Fatalities/Injuries

vFatal <- tapply(sData$FATALITIES, sData$EVTYPE, sum)
vFatal <- vFatal[order(vFatal, decreasing = TRUE)]

vFatalTop5 <- head(vFatal, 5)
vFatalTop5
##        TORNADO EXCESSIVE HEAT    FLASH FLOOD           HEAT      LIGHTNING 
##           5633           1903            978            937            816

vInjury <- tapply(sData$INJURIES, sData$EVTYPE, sum)
vInjury <- vInjury[order(vInjury, decreasing = TRUE)]
vInjuryTop5 <- head(vInjury, 5)
vInjuryTop5
##        TORNADO      TSTM WIND          FLOOD EXCESSIVE HEAT      LIGHTNING 
##          91346           6957           6789           6525           5230
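
Fatalities and injuries can also be totalled in a single step with aggregate(), which returns one data frame instead of two separate vectors. The sketch below uses made-up numbers and is an alternative to, not a replacement for, the tapply() calls above. `toy` and `harm` are illustrative names.

```r
# Made-up casualty counts (illustrative only)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD"),
  FATALITIES = c(2, 5, 1, 0),
  INJURIES   = c(30, 10, 12, 3),
  stringsAsFactors = FALSE
)

# Sum both columns per event type in one call
harm <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank by fatalities, largest first
harm <- harm[order(harm$FATALITIES, decreasing = TRUE), ]
harm
```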

Plot depicting Top 5 events causing Fatalities/Injuries

par(mfrow = c(2, 1))
barplot(vFatalTop5, xlab = "Event Type", ylab = "Number of Fatalities")
barplot(vInjuryTop5, xlab = "Event Type", ylab = "Number of Injuries")

Results

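Tornadoes are the single most harmful event type for population health: they caused 5633 deaths and 91346 injuries.
Floods cause the most property damage, to the tune of about 144.7 billion USD.
WINTER STORM HIGH WINDS is reported as causing the most crop damage (60 million USD). However, the printed crop ranking was obtained by indexing the property-damage totals (vPropDmg) with the crop-damage order, so this figure should not be read as a crop-damage total.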