The present analysis aims to identify the most harmful meteorological phenomena. To do so we will analyze National Weather Service data from 1950 to 2011 (see the National Weather Service Instructions).
To perform the analysis we will use only data from 2007 to 2011, because in prior years fewer events were observed and event types were less standardized.
We will calculate the dollar amounts of the damages, which are expressed as a number and an alphanumeric exponent.
We will then present the events that were most harmful to the population and most damaging to the economy. To determine how harmful an event was to the population, we will sum the numbers of injuries and fatalities. To determine the level of impact on the economy, we will sum the damage to property and crops.
Data processing consists of two steps: first, downloading and reading the data; second, summarizing it by calculating dollar amounts from the exponents given in the dataset and total victims per year and event type.
Download the storm data and read the CSV file into the data frame “data”.
#download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",destfile = "StormData.bz2")
data <- read.csv("StormData.bz2")
There are 902,297 observations of 37 variables.
str(data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
library(dplyr) # provides %>%, mutate, group_by, summarize and case_when used below
library(lubridate)
data$BGN_DATE <- as.Date(data$BGN_DATE, "%m/%d/%Y" )
data <- data %>% mutate(YEAR = year(BGN_DATE))
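As a small check of the date handling above, the sketch below parses two of the `BGN_DATE` strings shown in the `str()` output using base R only. The `%m/%d/%Y` format consumes the date part and the trailing time is ignored. `format(d, "%Y")` is used here in place of `lubridate::year()` purely to keep the example free of dependencies; the variable names `raw`, `d` and `yr` are illustrative.

```r
# Two raw values taken from the str() output above
raw <- c("4/18/1950 0:00:00", "6/8/1951 0:00:00")

# Parse month/day/year; characters after the matched format are ignored
d <- as.Date(raw, format = "%m/%d/%Y")

# Extract the year as an integer (base-R stand-in for lubridate::year)
yr <- as.integer(format(d, "%Y"))

print(d)
print(yr)
```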
Create new columns with the actual amount of property and crop damage by multiplying each value by its magnitude (PROPDMGEXP, CROPDMGEXP), where K means thousands, M millions and B billions, as explained in the National Weather Service Instructions.
We will express monetary quantities in millions of dollars.
function.exptonum = function(x){ case_when(x=="K"~1000, x=="M"~1000000, x=="B"~1000000000) }
PropertyDmg <- data %>% group_by(EVTYPE,YEAR) %>%
filter(!is.na(PROPDMG)) %>%
mutate(num_propdmgexp = function.exptonum(PROPDMGEXP),
prop_dmg = (PROPDMG * num_propdmgexp) / 1e+06) %>%
summarize(prop_dmg= sum(prop_dmg, na.rm = TRUE))
CropDmg <- data %>% group_by(EVTYPE,YEAR) %>%
filter(!is.na(CROPDMG)) %>%
mutate(num_cropdmgexp = function.exptonum(CROPDMGEXP),
crop_dmg = (CROPDMG * num_cropdmgexp) / 1e+06)%>%
summarize(crop_dmg= sum(crop_dmg, na.rm = TRUE))
EconomicImpact <- merge(PropertyDmg,CropDmg) %>%
mutate (total_dmg = prop_dmg + crop_dmg)
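The conversion above can be illustrated on a few toy rows. This is a minimal base-R sketch, not the report's own function: `multipliers`, `exp_to_num` and the toy values are hypothetical names and numbers chosen for illustration. It mirrors the behaviour of a `case_when()` without a default branch, so any code other than K, M or B (including an empty string) yields `NA`, and such rows contribute nothing to a sum taken with `na.rm = TRUE`.

```r
# Multipliers for the exponent codes handled in this report
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: look up the multiplier for each code.
# Codes not in the table (including "") give NA.
exp_to_num <- function(code) unname(multipliers[code])

# Toy rows (illustrative, not taken from the dataset)
dmg  <- c(25, 2.5, 1.2, 10)
code <- c("K", "M", "B", "")

# Damage in millions of dollars
dmg_millions <- dmg * exp_to_num(code) / 1e6
print(dmg_millions)

# The row with an unrecognised code is dropped from the total by na.rm = TRUE
total_millions <- sum(dmg_millions, na.rm = TRUE)
print(total_millions)
```

Whether rows with other exponent codes should be dropped or treated differently is a design choice; the sketch only reproduces the dropping behaviour.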
PopulationImpact <- data %>% group_by(EVTYPE,YEAR) %>%
summarize(Fatalities = sum(FATALITIES),
Injured = sum(INJURIES),
Total_victims = sum(FATALITIES) + sum(INJURIES))
First we will check whether events have been observed regularly over time.
table(EconomicImpact$YEAR)
##
## 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965
## 1 1 1 1 1 3 3 3 3 3 3 3 3 3 3 3
## 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981
## 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997
## 3 3 3 3 3 3 3 3 3 3 3 160 267 387 228 170
## 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011
## 126 121 112 122 99 51 38 46 50 46 46 46 46 46
unique(EconomicImpact[EconomicImpact$YEAR<1993,]$EVTYPE)
## [1] "HAIL" "TORNADO" "TSTM WIND"
We see that before 1993 there are observations only for hail, thunderstorm wind and tornadoes, and that the number of event types recorded per year becomes stable (46, per the table above) only from 2007 onward.
If we analyzed the whole time span, the events observed for the longest time would show a larger accumulated impact simply because they have more years of records. We will therefore perform the subsequent analysis only for the years 2007 to 2011.
Economic2007 <- EconomicImpact[EconomicImpact$YEAR >=2007,]
unique(Economic2007$EVTYPE)
## [1] "ASTRONOMICAL LOW TIDE" "AVALANCHE"
## [3] "BLIZZARD" "COASTAL FLOOD"
## [5] "COLD/WIND CHILL" "DENSE FOG"
## [7] "DENSE SMOKE" "DROUGHT"
## [9] "DUST DEVIL" "DUST STORM"
## [11] "EXCESSIVE HEAT" "EXTREME COLD/WIND CHILL"
## [13] "FLASH FLOOD" "FLOOD"
## [15] "FREEZING FOG" "FROST/FREEZE"
## [17] "FUNNEL CLOUD" "HAIL"
## [19] "HEAT" "HEAVY RAIN"
## [21] "HEAVY SNOW" "HIGH SURF"
## [23] "HIGH WIND" "HURRICANE"
## [25] "ICE STORM" "LAKE-EFFECT SNOW"
## [27] "LAKESHORE FLOOD" "LANDSLIDE"
## [29] "LIGHTNING" "MARINE HAIL"
## [31] "MARINE HIGH WIND" "MARINE STRONG WIND"
## [33] "MARINE THUNDERSTORM WIND" "RIP CURRENT"
## [35] "SEICHE" "SLEET"
## [37] "STORM SURGE/TIDE" "STRONG WIND"
## [39] "THUNDERSTORM WIND" "TORNADO"
## [41] "TROPICAL DEPRESSION" "TROPICAL STORM"
## [43] "TSUNAMI" "VOLCANIC ASHFALL"
## [45] "WATERSPOUT" "WILDFIRE"
## [47] "WINTER STORM" "WINTER WEATHER"
We will do the same to the population impact data.
Population2007 <- PopulationImpact[PopulationImpact$YEAR >=2007,]
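The reason for restricting both data sets to 2007 onward can be shown with made-up numbers. In the sketch below, two hypothetical event types "A" and "B" cause the same damage every year, but "A" has been recorded since 1950 and "B" only since 2007. All names and values are assumptions for illustration only.

```r
# Toy data (made-up numbers): both event types cause 10 units of damage per
# year, but "A" is recorded from 1950 and "B" only from 2007.
toy <- rbind(
  data.frame(EVTYPE = "A", YEAR = 1950:2011, total_dmg = 10),
  data.frame(EVTYPE = "B", YEAR = 2007:2011, total_dmg = 10)
)

# Totals over the whole span favour the type with the longer record
whole_span <- tapply(toy$total_dmg, toy$EVTYPE, sum)

# Totals over a common window are comparable
recent <- toy[toy$YEAR >= 2007, ]
common_window <- tapply(recent$total_dmg, recent$EVTYPE, sum)

print(whole_span)
print(common_window)
```

Over the whole span "A" appears far more damaging than "B", although the two are identical year by year; over the common window their totals match.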
First we will list the event types in decreasing order of the total damage they caused over the selected years (2007 to 2011).
library(ggplot2)
library(knitr)
kable(Economic2007 %>% group_by(EVTYPE) %>%
summarize("Property damage" = sum(prop_dmg),
"Crop damage" = sum(crop_dmg),
Total_damage =sum(total_dmg)) %>%
arrange(desc(Total_damage)) %>%
head(10), caption= "Top 10 most destructive events")
| EVTYPE | Property damage | Crop damage | Total_damage |
|---|---|---|---|
| FLOOD | 13969.306 | 2886.110 | 16855.416 |
| TORNADO | 14629.324 | 102.960 | 14732.284 |
| HAIL | 6098.998 | 868.793 | 6967.791 |
| FLASH FLOOD | 5040.672 | 711.942 | 5752.614 |
| STORM SURGE/TIDE | 4640.643 | 0.850 | 4641.493 |
| THUNDERSTORM WIND | 3373.459 | 398.102 | 3771.561 |
| HURRICANE | 2467.600 | 180.510 | 2648.110 |
| WILDFIRE | 2190.413 | 31.094 | 2221.507 |
| HIGH WIND | 1201.058 | 91.571 | 1292.629 |
| FROST/FREEZE | 9.480 | 931.801 | 941.281 |
top4 <- head(Economic2007 %>% group_by(EVTYPE) %>%
summarize(Property_damage = sum(prop_dmg),
Crop_damage = sum(crop_dmg),
Total_damage =sum(total_dmg)) %>%
arrange(desc(Total_damage)),4) %>%
select(EVTYPE)
ggplot(Economic2007[Economic2007$EVTYPE %in% top4$EVTYPE,], aes(YEAR,total_dmg)) +
geom_line()+
ylab("Damage (million $)") +
facet_grid(EVTYPE~.)
kable(Population2007 %>% group_by(EVTYPE) %>%
summarize(Fatalities = sum(Fatalities),
Injured = sum(Injured),
Total_victims =sum(Total_victims)) %>%
arrange(desc(Total_victims)) %>%
head(10), caption= "Top 10 events most harmful to the population")
| EVTYPE | Fatalities | Injured | Total_victims |
|---|---|---|---|
| TORNADO | 863 | 9608 | 10471 |
| THUNDERSTORM WIND | 130 | 1391 | 1521 |
| LIGHTNING | 159 | 923 | 1082 |
| EXCESSIVE HEAT | 119 | 880 | 999 |
| HEAT | 182 | 702 | 884 |
| FLASH FLOOD | 293 | 316 | 609 |
| WILDFIRE | 30 | 425 | 455 |
| WINTER WEATHER | 30 | 324 | 354 |
| RIP CURRENT | 207 | 127 | 334 |
| FLOOD | 161 | 171 | 332 |
The most harmful events were tornadoes, followed at a distance by thunderstorm winds.
top4 <- head(Population2007 %>% group_by(EVTYPE) %>%
summarize(Fatalities = sum(Fatalities),
Injured = sum(Injured),
Total_victims =sum(Total_victims)) %>%
arrange(desc(Total_victims)),4) %>%
select(EVTYPE)
ggplot(Population2007[Population2007$EVTYPE %in% top4$EVTYPE,], aes(YEAR,Total_victims)) +
geom_line()+
ylab("Victims") +
facet_grid(EVTYPE~.)
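The ranking in the population table depends on the metric chosen. As a rough illustration, the sketch below re-enters the ten rows printed in that table and re-orders them by fatalities alone. It covers only those ten rows, so event types outside the table could still rank higher by fatalities; it is not a full re-analysis of the dataset. The names `victims` and `by_fatalities` are illustrative.

```r
# The ten rows printed in the population table above (2007 to 2011)
victims <- data.frame(
  EVTYPE = c("TORNADO", "THUNDERSTORM WIND", "LIGHTNING", "EXCESSIVE HEAT",
             "HEAT", "FLASH FLOOD", "WILDFIRE", "WINTER WEATHER",
             "RIP CURRENT", "FLOOD"),
  Fatalities = c(863, 130, 159, 119, 182, 293, 30, 30, 207, 161),
  Injured    = c(9608, 1391, 923, 880, 702, 316, 425, 324, 127, 171),
  stringsAsFactors = FALSE
)

# Total victims as defined in this report: fatalities plus injuries
victims$Total_victims <- victims$Fatalities + victims$Injured

# Re-rank the same rows by fatalities alone
by_fatalities <- victims[order(-victims$Fatalities), ]
print(head(by_fatalities, 3))
```

Among these rows tornadoes stay first under either metric, while flash floods and rip currents move up when only fatalities are counted.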
From 2007 to 2011 the events that caused the most property and crop damage were floods and tornadoes, followed by hail and flash floods.
From 2010 to 2011, both tornadoes and floods appear to have become more devastating.
Regarding harm to the population, counting both injuries and deaths, we find that tornadoes were the most harmful by a wide margin, and that their toll also increased sharply from 2010 to 2011.