Summary

This analysis aims to identify the most harmful meteorological phenomena. To do so we analyze National Weather Service storm data from 1950 to 2011. The event types and damage fields are described in the National Weather Service Instructions.

Although the data set spans 1950 to 2011, the analysis itself uses only the data from 2007 to 2011, because in earlier years fewer events were observed and event types were less standardized.

We will calculate the dollar amounts of the damages, which are expressed as a number and an alphanumeric exponent.

We will then present the events that were most harmful to the population and most damaging to the economy. To determine how harmful an event was to the population, we sum the numbers of injuries and fatalities. To determine its impact on the economy, we sum the damage to property and crops.

Data Processing

Data processing consists of two steps: (1) downloading and reading the data, and (2) summarizing it, which means calculating dollar amounts from the exponents given in the data set and totaling victims per year and event type.

Downloading and reading.

Download storm data and read csv file into data frame “data”.

#download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",destfile = "StormData.bz2")
data <- read.csv("StormData.bz2")
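
The download line above is commented out, presumably to avoid fetching the file on every run. A minimal sketch of one way to make that automatic is shown below. The helper name `fetch_once` and its `downloader` argument are hypothetical and not part of the original report; the function simply skips the download when the destination file already exists.

```r
# Hypothetical helper (not in the original report): download only when the
# local copy is missing. `downloader` is injectable so the logic can be
# exercised without network access; it defaults to utils::download.file.
fetch_once <- function(url, destfile, downloader = utils::download.file) {
  if (!file.exists(destfile)) {
    downloader(url, destfile = destfile, mode = "wb")
  }
  invisible(destfile)
}

# Usage (commented out because it needs network access):
# fetch_once("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#            "StormData.bz2")
# data <- read.csv("StormData.bz2")
```

Since read.csv() reads the bz2-compressed file directly, as the report does, no separate decompression step is needed.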

There are 902297 observations of 37 variables.

str(data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

Summarize, calculate dollar amounts and total victims

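library(dplyr)      # provides %>%, mutate(), group_by(), summarize(), case_when()
library(lubridate)  # provides year()

# Parse the date part of BGN_DATE; the trailing " 0:00:00" is ignored
data$BGN_DATE <- as.Date(data$BGN_DATE, "%m/%d/%Y" )

data <- data %>% mutate(YEAR = year(BGN_DATE))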
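
To illustrate the date step on its own, here is a small base-R sketch using sample values copied from the str() output above. It assumes nothing beyond base R; the `format()` call is an alternative to `lubridate::year()` shown only for comparison, not the method the report uses.

```r
# Sample values copied from the str() output of BGN_DATE.
bgn <- c("4/18/1950 0:00:00", "2/20/1951 0:00:00", "6/8/1951 0:00:00")

# The format stops at %Y, so the trailing " 0:00:00" is ignored.
parsed <- as.Date(bgn, format = "%m/%d/%Y")

# Base-R alternative to lubridate::year(): format the year and convert it.
years <- as.integer(format(parsed, "%Y"))

print(data.frame(bgn, parsed, years))
```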

Create new columns with the actual amount of property and crop damage by multiplying each damage figure by its magnitude (PROPDMGEXP, CROPDMGEXP), where K means thousands, M millions and B billions, as explained in the National Weather Service Instructions.

We will express monetary quantities in millions of dollars.

# Convert an exponent code to its multiplier; any other code (including "") gives NA
function.exptonum = function(x){  case_when(x=="K"~1000, x=="M"~1000000, x=="B"~1000000000) }
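
As a worked example of the conversion, the sketch below does the same arithmetic with a named lookup vector in base R. The names `exp_multiplier` and `exp_to_num` and the sample values are illustrative assumptions, not taken from the report. Like the case_when() version, codes other than K, M and B yield NA.

```r
# Named lookup vector: exponent code -> multiplier.
exp_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper; codes outside K/M/B (including "") give NA,
# which mirrors the behaviour of the case_when() version.
exp_to_num <- function(x) unname(exp_multiplier[as.character(x)])

# Illustrative values: 25 K, 2.5 M, 1.2 B, and 3 with no exponent.
dmg     <- c(25, 2.5, 1.2, 3)
dmg_exp <- c("K", "M", "B", "")

# Damage in millions of dollars, as in the report.
millions <- dmg * exp_to_num(dmg_exp) / 1e6
print(millions)
```

Here 25 with exponent K becomes 0.025 million dollars, and the entry with an empty exponent becomes NA, so it drops out of any sum taken with na.rm = TRUE.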


PropertyDmg <- data %>% group_by(EVTYPE,YEAR) %>%
                      filter(!is.na(PROPDMG)) %>%    
                      mutate(num_propdmgexp = function.exptonum(PROPDMGEXP),
                             prop_dmg = (PROPDMG * num_propdmgexp) / 1e+06) %>%
                      summarize(prop_dmg= sum(prop_dmg, na.rm = TRUE))     



CropDmg <- data %>% group_by(EVTYPE,YEAR) %>%
                      filter(!is.na(CROPDMG)) %>%    
                      mutate(num_cropdmgexp = function.exptonum(CROPDMGEXP),
                             crop_dmg = (CROPDMG * num_cropdmgexp) / 1e+06)%>%
                      summarize(crop_dmg= sum(crop_dmg, na.rm = TRUE)) 
                             



EconomicImpact <- merge(PropertyDmg,CropDmg) %>%
                  mutate (total_dmg = prop_dmg + crop_dmg)
PopulationImpact <- data %>% group_by(EVTYPE,YEAR) %>%
                            summarize(Fatalities = sum(FATALITIES), 
                                      Injured = sum(INJURIES),
                                      Total_victims = sum(FATALITIES) + sum(INJURIES))
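
The grouping logic can be checked on a toy data set. The sketch below uses base R's aggregate() rather than dplyr, purely to stay dependency-free; the data values are invented for illustration.

```r
# Invented toy data: damages already expressed in millions of dollars.
toy <- data.frame(
  EVTYPE   = c("FLOOD", "FLOOD", "HAIL", "HAIL"),
  YEAR     = c(2010, 2010, 2010, 2011),
  prop_dmg = c(1.5, 2.0, 0.3, 0.7),
  crop_dmg = c(0.5, 0.0, 0.1, 0.2)
)

# Sum both damage columns per event type and year.
totals <- aggregate(cbind(prop_dmg, crop_dmg) ~ EVTYPE + YEAR,
                    data = toy, FUN = sum)

# Total economic damage is property plus crop damage.
totals$total_dmg <- totals$prop_dmg + totals$crop_dmg
print(totals)
```

The two FLOOD rows for 2010 collapse into one row, while HAIL keeps one row per year, which is the same shape as EconomicImpact.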

Analysis

Choose period

First we will check whether events have been recorded consistently over time by counting the event types that appear in each year.

table(EconomicImpact$YEAR)
## 
## 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 
##    1    1    1    1    1    3    3    3    3    3    3    3    3    3    3    3 
## 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 
##    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3    3 
## 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 
##    3    3    3    3    3    3    3    3    3    3    3  160  267  387  228  170 
## 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 
##  126  121  112  122   99   51   38   46   50   46   46   46   46   46
unique(EconomicImpact[EconomicImpact$YEAR<1993,]$EVTYPE)
## [1] "HAIL"      "TORNADO"   "TSTM WIND"

We see that before 1993 only hail, thunderstorm wind and tornado events were recorded, and that the number of event types recorded per year settles at a constant 46 only from 2007 onward.

If we analyzed the whole time span, the event types that have been recorded the longest would show a larger accumulated impact. We will therefore perform the subsequent analysis only for the years 2007 to 2011.
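
The count used to choose the period is the number of distinct event types per year. A minimal base-R sketch of that count on invented toy data follows; the variable names are illustrative only.

```r
# Invented toy data: one row per recorded event.
toy <- data.frame(
  YEAR   = c(1990, 1990, 1990, 2010, 2010, 2010, 2010),
  EVTYPE = c("HAIL", "TORNADO", "TORNADO",
             "HAIL", "TORNADO", "FLOOD", "WILDFIRE")
)

# Number of distinct event types recorded in each year.
types_per_year <- tapply(toy$EVTYPE, toy$YEAR,
                         function(x) length(unique(x)))
print(types_per_year)
```

A year with fewer recorded types, like 1990 in this toy set, cannot contribute to the totals of the types it never recorded, which is why summing over the full span favors the long-recorded types.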

Economic2007 <- EconomicImpact[EconomicImpact$YEAR >=2007,]
unique(Economic2007$EVTYPE)
##  [1] "ASTRONOMICAL LOW TIDE"    "AVALANCHE"               
##  [3] "BLIZZARD"                 "COASTAL FLOOD"           
##  [5] "COLD/WIND CHILL"          "DENSE FOG"               
##  [7] "DENSE SMOKE"              "DROUGHT"                 
##  [9] "DUST DEVIL"               "DUST STORM"              
## [11] "EXCESSIVE HEAT"           "EXTREME COLD/WIND CHILL" 
## [13] "FLASH FLOOD"              "FLOOD"                   
## [15] "FREEZING FOG"             "FROST/FREEZE"            
## [17] "FUNNEL CLOUD"             "HAIL"                    
## [19] "HEAT"                     "HEAVY RAIN"              
## [21] "HEAVY SNOW"               "HIGH SURF"               
## [23] "HIGH WIND"                "HURRICANE"               
## [25] "ICE STORM"                "LAKE-EFFECT SNOW"        
## [27] "LAKESHORE FLOOD"          "LANDSLIDE"               
## [29] "LIGHTNING"                "MARINE HAIL"             
## [31] "MARINE HIGH WIND"         "MARINE STRONG WIND"      
## [33] "MARINE THUNDERSTORM WIND" "RIP CURRENT"             
## [35] "SEICHE"                   "SLEET"                   
## [37] "STORM SURGE/TIDE"         "STRONG WIND"             
## [39] "THUNDERSTORM WIND"        "TORNADO"                 
## [41] "TROPICAL DEPRESSION"      "TROPICAL STORM"          
## [43] "TSUNAMI"                  "VOLCANIC ASHFALL"        
## [45] "WATERSPOUT"               "WILDFIRE"                
## [47] "WINTER STORM"             "WINTER WEATHER"

We will do the same to the population impact data.

Population2007 <- PopulationImpact[PopulationImpact$YEAR >=2007,]

Economic Impact

First we list the event types in decreasing order of the total damage they caused over the selected period (2007 to 2011).

library(ggplot2)
library(knitr)

kable(Economic2007 %>% group_by(EVTYPE) %>% 
                 summarize("Property damage" = sum(prop_dmg),
                           "Crop damage" = sum(crop_dmg),
                            Total_damage =sum(total_dmg)) %>%
                  arrange(desc(Total_damage)) %>%
                  head(10), caption= "Top 10 most destructive events")

Top 10 most destructive events (amounts in millions of dollars)

EVTYPE             Property damage  Crop damage  Total_damage
FLOOD                    13969.306     2886.110     16855.416
TORNADO                  14629.324      102.960     14732.284
HAIL                      6098.998      868.793      6967.791
FLASH FLOOD               5040.672      711.942      5752.614
STORM SURGE/TIDE          4640.643        0.850      4641.493
THUNDERSTORM WIND         3373.459      398.102      3771.561
HURRICANE                 2467.600      180.510      2648.110
WILDFIRE                  2190.413       31.094      2221.507
HIGH WIND                 1201.058       91.571      1292.629
FROST/FREEZE                 9.480      931.801       941.281

top4 <-  head(Economic2007 %>% group_by(EVTYPE) %>% 
                               summarize(Property_damage = sum(prop_dmg),
                                         Crop_damage = sum(crop_dmg),
                                         Total_damage =sum(total_dmg)) %>%
                               arrange(desc(Total_damage)),4) %>% 
                               select(EVTYPE)


ggplot(Economic2007[Economic2007$EVTYPE %in% top4$EVTYPE,], aes(YEAR,total_dmg)) +
      geom_line()+
      ylab("Damage /Million $") +
      facet_grid(EVTYPE~.)

Population impact

kable(Population2007 %>% group_by(EVTYPE) %>% 
                 summarize(Fatalities = sum(Fatalities),
                           Injured = sum(Injured),
                           Total_victims =sum(Total_victims)) %>%
                  arrange(desc(Total_victims)) %>%
                  head(10), caption= "Top 10 most harmful to population events")

Top 10 most harmful to population events

EVTYPE             Fatalities  Injured  Total_victims
TORNADO                   863     9608          10471
THUNDERSTORM WIND         130     1391           1521
LIGHTNING                 159      923           1082
EXCESSIVE HEAT            119      880            999
HEAT                      182      702            884
FLASH FLOOD               293      316            609
WILDFIRE                   30      425            455
WINTER WEATHER             30      324            354
RIP CURRENT               207      127            334
FLOOD                     161      171            332

Tornadoes were by far the most harmful events, followed by thunderstorm winds.
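
To put the gap in numbers, the sketch below computes the tornado share of victims among the ten event types listed in the table above. The figures are copied from that table. The share is relative to these ten rows only, not to all event types, since the full totals are not shown in the report.

```r
# Total_victims column copied from the top-10 table.
victims <- c(TORNADO = 10471, `THUNDERSTORM WIND` = 1521, LIGHTNING = 1082,
             `EXCESSIVE HEAT` = 999, HEAT = 884, `FLASH FLOOD` = 609,
             WILDFIRE = 455, `WINTER WEATHER` = 354, `RIP CURRENT` = 334,
             FLOOD = 332)

# Share of the top-10 victims attributable to tornadoes.
tornado_share <- victims[["TORNADO"]] / sum(victims)
print(round(tornado_share, 2))
```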

top4 <-  head(Population2007 %>% group_by(EVTYPE) %>% 
                                 summarize(Fatalities = sum(Fatalities),
                                           Injured = sum(Injured),
                                           Total_victims =sum(Total_victims)) %>%
                                 arrange(desc(Total_victims)),4) %>% 
                                 select(EVTYPE)


ggplot(Population2007[Population2007$EVTYPE %in% top4$EVTYPE,], aes(YEAR,Total_victims)) +
      geom_line()+
      ylab("Victims") +
      facet_grid(EVTYPE~.)

Results

Between 2007 and 2011 the events that caused the most property and crop damage were floods and tornadoes, followed by hail and flash floods.

From 2010 to 2011, both tornadoes and floods appear to have become more devastating.

Regarding harm to the population, counting both injuries and fatalities, tornadoes were by far the most harmful events and also showed a sharp increase from 2010 to 2011.