This analysis examines the storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA) for the years 1950 to November 2011 to determine which weather events have the greatest impact on the economy and on public health. The NOAA database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The analysis shows that tornadoes cause the greatest harm to public health, while floods cause the greatest property damage.
The data used in this analysis was downloaded on October 22, 2016 from this site:
if (!file.exists("./data/StormData.csv.bz2")) {
if (!dir.exists("./data")) {
dir.create("./data")
}
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
"./data/StormData.csv.bz2")
}
and some documentation can be found here:
Data was processed with:
The downloaded data is loaded, without decompression, into a first data frame, storm, whose dimensions must be \(902297 \times 37\)
# read.csv can read a .bz2 file directly, without decompressing it first
storm <- read.csv("./data/StormData.csv.bz2", stringsAsFactors = FALSE)
dim(storm)
## [1] 902297 37
Only a few columns are needed for this analysis, so a second data frame, mystorm, was created with these columns:
# Keep only columns: EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP
mystorm <- storm[, c(8,23:28)]
dim(mystorm)
## [1] 902297 7
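Selecting columns by position depends on the column order of the source file. As a minimal sketch, the same subset can be taken by name; the `toy` data frame and the `keep` vector below are illustrative assumptions and are not part of the original analysis:

```r
# Toy stand-in for the full storm data frame (values are made up for illustration)
toy <- data.frame(
  STATE = c("AL", "TX"),
  EVTYPE = c("TORNADO", "HAIL"),
  FATALITIES = c(0, 1),
  INJURIES = c(15, 0),
  PROPDMG = c(25, 2.5),
  PROPDMGEXP = c("K", "M"),
  CROPDMG = c(0, 0),
  CROPDMGEXP = c("", ""),
  stringsAsFactors = FALSE
)

# Columns needed for the analysis, listed by name (hypothetical helper vector)
keep <- c("EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Fail early if an expected column is missing
stopifnot(all(keep %in% names(toy)))

toy_subset <- toy[, keep]
dim(toy_subset)
## [1] 2 7
```

Listing the names makes the selection self-documenting and keeps it valid if columns are reordered.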
A first glance at the data shows that several variables need cleaning to obtain a consistent dataset:
# data before cleaning
head(unique(mystorm$EVTYPE), 10)
## [1] "TORNADO" "TSTM WIND"
## [3] "HAIL" "FREEZING RAIN"
## [5] "SNOW" "ICE STORM/FLASH FLOOD"
## [7] "SNOW/ICE" "WINTER STORM"
## [9] "HURRICANE OPAL/HIGH WINDS" "THUNDERSTORM WINDS"
unique(mystorm$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
unique(mystorm$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
First, the values of EVTYPE, PROPDMGEXP and CROPDMGEXP are converted to lower case, and the EVTYPE values that do not comply with Table 1 of section 2.1.1 of the National Weather Service Storm Data Documentation are changed.
The analysis to determine which EVTYPE values had to be changed was done manually.
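A manual mapping of this kind can also be written as a named lookup vector, so that each raw value and its replacement appear once. The following is a minimal sketch on a toy data frame; the `recode_map` name and the sample values are assumptions for illustration, not the code used in this analysis:

```r
# Toy data frame standing in for mystorm (values chosen for illustration only)
toy <- data.frame(
  EVTYPE = c("tstm wind", "hail 075", "tornado", "flooding", "tstm wind"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: names are raw values, elements are the cleaned values
recode_map <- c(
  "tstm wind" = "thunderstorm wind",
  "hail 075"  = "hail",
  "flooding"  = "flood"
)

# Replace only the rows whose value appears in the lookup;
# values that are not in the lookup are left unchanged
hit <- toy$EVTYPE %in% names(recode_map)
toy$EVTYPE[hit] <- unname(recode_map[toy$EVTYPE[hit]])

toy$EVTYPE
## [1] "thunderstorm wind" "hail"              "tornado"
## [4] "flood"             "thunderstorm wind"
```

Indexing the column vector with a logical mask is simply a no-op when a raw value does not occur in the data.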
# convert all character values to lower case; trim leading and trailing whitespace from EVTYPE
mystorm$EVTYPE <- tolower(trimws(mystorm$EVTYPE))
mystorm$PROPDMGEXP <- tolower(mystorm$PROPDMGEXP)
mystorm$CROPDMGEXP <- tolower(mystorm$CROPDMGEXP)
# clean event types: fix major typos, abbreviations and ambiguous labels
#
mystorm[mystorm$EVTYPE == 'avalance',]$EVTYPE <- c('avalanche')
mystorm[mystorm$EVTYPE == 'blizzard/winter storm',]$EVTYPE <- c('blizzard')
mystorm[mystorm$EVTYPE == 'coastal flooding/erosion',]$EVTYPE <- c('coastal flood')
mystorm[mystorm$EVTYPE == 'coastal erosion',]$EVTYPE <- c('coastal flood')
mystorm[mystorm$EVTYPE == 'coastal flooding',]$EVTYPE <- c('coastal flood')
mystorm[mystorm$EVTYPE == 'coastal flooding/erosion',]$EVTYPE <- c('coastal flood')
mystorm[mystorm$EVTYPE == 'coastalstorm',]$EVTYPE <- c('coastal storm')
mystorm[mystorm$EVTYPE == 'cold/winds',]$EVTYPE <- c('cold/wind chill')
mystorm[mystorm$EVTYPE == 'drought/excessive heat',]$EVTYPE <- c('drought')
mystorm[mystorm$EVTYPE == 'dust devil waterspout',]$EVTYPE <- c('dust devil')
mystorm[mystorm$EVTYPE == 'dust storm/high winds',]$EVTYPE <- c('dust storm')
mystorm[mystorm$EVTYPE == 'erosion/cstl flood',]$EVTYPE <- c('coastal flood')
mystorm[mystorm$EVTYPE == 'excessive heat',]$EVTYPE <- c('heat')
mystorm[mystorm$EVTYPE == 'extreme cold/wind chill',]$EVTYPE <- c('cold/wind chill')
mystorm[mystorm$EVTYPE == 'extreme heat',]$EVTYPE <- c('heat')
mystorm[mystorm$EVTYPE == 'extreme wind chill',]$EVTYPE <- c('cold/wind chill')
mystorm[mystorm$EVTYPE == 'extreme windchill',]$EVTYPE <- c('cold/wind chill')
mystorm[mystorm$EVTYPE == 'flash flood - heavy rain',]$EVTYPE <- c('flash flood')
mystorm[mystorm$EVTYPE == 'flash flood from ice jams',]$EVTYPE <- c('flash flood')
mystorm[mystorm$EVTYPE == 'flash flood landslides',]$EVTYPE <- c('flash flood')
mystorm[mystorm$EVTYPE == 'flash flood winds',]$EVTYPE <- c('flash flood')
mystorm[mystorm$EVTYPE == 'flash flood/',]$EVTYPE <- c('flash flood')
mystorm[mystorm$EVTYPE == 'flash flood/ street',]$EVTYPE <- c('flash flood')
mystorm[mystorm$EVTYPE == 'flash flood/flood',]$EVTYPE <- c('flash flood')
mystorm[mystorm$EVTYPE == 'flash flood/landslide',]$EVTYPE <- c('flash flood')
mystorm[mystorm$EVTYPE == 'flash flooding',]$EVTYPE <- c('flash flood')
mystorm[mystorm$EVTYPE == 'flash flooding/flood',]$EVTYPE <- c('flash flood')
mystorm[mystorm$EVTYPE == 'flash flooding/thunderstorm wi',]$EVTYPE <- c('flash flood')
mystorm[mystorm$EVTYPE == 'flash floods',]$EVTYPE <- c('flash flood')
mystorm[mystorm$EVTYPE == 'flood & heavy rain',]$EVTYPE <- c('flood')
mystorm[mystorm$EVTYPE == 'flood flash',]$EVTYPE <- c('flood')
mystorm[mystorm$EVTYPE == 'flood/flash',]$EVTYPE <- c('flood')
mystorm[mystorm$EVTYPE == 'flood/flash flood',]$EVTYPE <- c('flood')
mystorm[mystorm$EVTYPE == 'flood/flash/flood',]$EVTYPE <- c('flood')
mystorm[mystorm$EVTYPE == 'flood/flashflood',]$EVTYPE <- c('flood')
mystorm[mystorm$EVTYPE == 'flood/rain/winds',]$EVTYPE <- c('flood')
mystorm[mystorm$EVTYPE == 'flood/river flood',]$EVTYPE <- c('flood')
mystorm[mystorm$EVTYPE == 'flooding',]$EVTYPE <- c('flood')
mystorm[mystorm$EVTYPE == 'flooding/heavy rain',]$EVTYPE <- c('flood')
mystorm[mystorm$EVTYPE == 'floods',]$EVTYPE <- c('flood')
mystorm[mystorm$EVTYPE == 'fog and cold temperatures',]$EVTYPE <- c('fog')
mystorm[mystorm$EVTYPE == 'freeze',]$EVTYPE <- c('frost/freeze')
mystorm[mystorm$EVTYPE == 'freezing fog',]$EVTYPE <- c('freezing fog')
mystorm[mystorm$EVTYPE == 'frost',]$EVTYPE <- c('frost/freeze')
mystorm[mystorm$EVTYPE == 'frost/freeze',]$EVTYPE <- c('frost/freeze')
mystorm[mystorm$EVTYPE == 'frost\\freeze',]$EVTYPE <- c('frost/freeze')
mystorm[mystorm$EVTYPE == 'hail 0.75',]$EVTYPE <- c('hail')
mystorm[mystorm$EVTYPE == 'hail 075',]$EVTYPE <- c('hail')
mystorm[mystorm$EVTYPE == 'hail 100',]$EVTYPE <- c('hail')
mystorm[mystorm$EVTYPE == 'hail 125',]$EVTYPE <- c('hail')
mystorm[mystorm$EVTYPE == 'hail 150',]$EVTYPE <- c('hail')
mystorm[mystorm$EVTYPE == 'hail 175',]$EVTYPE <- c('hail')
mystorm[mystorm$EVTYPE == 'hail 200',]$EVTYPE <- c('hail')
mystorm[mystorm$EVTYPE == 'hail 275',]$EVTYPE <- c('hail')
mystorm[mystorm$EVTYPE == 'hail 450',]$EVTYPE <- c('hail')
mystorm[mystorm$EVTYPE == 'hail 75',]$EVTYPE <- c('hail')
mystorm[mystorm$EVTYPE == 'hail damage',]$EVTYPE <- c('hail')
mystorm[mystorm$EVTYPE == 'hail/wind',]$EVTYPE <- c('hail')
mystorm[mystorm$EVTYPE == 'hail/winds',]$EVTYPE <- c('hail')
mystorm[mystorm$EVTYPE == 'hailstorm',]$EVTYPE <- c('hail')
mystorm[mystorm$EVTYPE == 'hard freeze',]$EVTYPE <- c('frost/freeze')
mystorm[mystorm$EVTYPE == 'heat wave',]$EVTYPE <- c('heat')
mystorm[mystorm$EVTYPE == 'heat wave drought',]$EVTYPE <- c('heat')
mystorm[mystorm$EVTYPE == 'heat waves',]$EVTYPE <- c('heat')
mystorm[mystorm$EVTYPE == 'heavy lake snow',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'heavy precipitation',]$EVTYPE <- c('heavy rain')
mystorm[mystorm$EVTYPE == 'heavy rain and flood',]$EVTYPE <- c('heavy rain')
mystorm[mystorm$EVTYPE == 'heavy rain/high surf',]$EVTYPE <- c('heavy rain')
mystorm[mystorm$EVTYPE == 'heavy rain/lightning',]$EVTYPE <- c('heavy rain')
mystorm[mystorm$EVTYPE == 'heavy rain/severe weather',]$EVTYPE <- c('heavy rain')
mystorm[mystorm$EVTYPE == 'heavy rain/small stream urban',]$EVTYPE <- c('heavy rain')
mystorm[mystorm$EVTYPE == 'heavy rain/snow',]$EVTYPE <- c('heavy rain')
mystorm[mystorm$EVTYPE == 'heavy rains',]$EVTYPE <- c('heavy rain')
mystorm[mystorm$EVTYPE == 'heavy rains/flooding',]$EVTYPE <- c('heavy rain')
mystorm[mystorm$EVTYPE == 'heavy snow and high winds',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'heavy snow and strong winds',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'heavy snow shower',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'heavy snow squalls',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'heavy snow/blizzard',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'heavy snow/blizzard/avalanche',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'heavy snow/freezing rain',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'heavy snow/high winds & flood',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'heavy snow/ice',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'heavy snow/squalls',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'heavy snow/wind',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'heavy snow/winter storm',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'heavy snowpack',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'heavy snow-squalls',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'heavy surf',]$EVTYPE <- c('high surf')
mystorm[mystorm$EVTYPE == 'heavy surf and wind',]$EVTYPE <- c('high surf')
mystorm[mystorm$EVTYPE == 'heavy surf coastal flooding',]$EVTYPE <- c('high surf')
mystorm[mystorm$EVTYPE == 'heavy surf/high surf',]$EVTYPE <- c('high surf')
mystorm[mystorm$EVTYPE == 'high wind (g40)',]$EVTYPE <- c('high wind')
mystorm[mystorm$EVTYPE == 'high wind 48',]$EVTYPE <- c('high wind')
mystorm[mystorm$EVTYPE == 'high wind and seas',]$EVTYPE <- c('high wind')
mystorm[mystorm$EVTYPE == 'high wind damage',]$EVTYPE <- c('high wind')
mystorm[mystorm$EVTYPE == 'high wind/blizzard',]$EVTYPE <- c('high wind')
mystorm[mystorm$EVTYPE == 'high wind/heavy snow',]$EVTYPE <- c('high wind')
mystorm[mystorm$EVTYPE == 'high wind/seas',]$EVTYPE <- c('high wind')
mystorm[mystorm$EVTYPE == 'high winds',]$EVTYPE <- c('high wind')
mystorm[mystorm$EVTYPE == 'high winds heavy rains',]$EVTYPE <- c('high wind')
mystorm[mystorm$EVTYPE == 'high winds/',]$EVTYPE <- c('high wind')
mystorm[mystorm$EVTYPE == 'high winds/coastal flood',]$EVTYPE <- c('high wind')
mystorm[mystorm$EVTYPE == 'high winds/cold',]$EVTYPE <- c('high wind')
mystorm[mystorm$EVTYPE == 'high winds/heavy rain',]$EVTYPE <- c('high wind')
mystorm[mystorm$EVTYPE == 'high winds/snow',]$EVTYPE <- c('high wind')
mystorm[mystorm$EVTYPE == 'hurricane',]$EVTYPE <- c('hurricane (typhoon)')
mystorm[mystorm$EVTYPE == 'hurricane edouard',]$EVTYPE <- c('hurricane (typhoon)')
mystorm[mystorm$EVTYPE == 'hurricane emily',]$EVTYPE <- c('hurricane (typhoon)')
mystorm[mystorm$EVTYPE == 'hurricane erin',]$EVTYPE <- c('hurricane (typhoon)')
mystorm[mystorm$EVTYPE == 'hurricane felix',]$EVTYPE <- c('hurricane (typhoon)')
mystorm[mystorm$EVTYPE == 'hurricane gordon',]$EVTYPE <- c('hurricane (typhoon)')
mystorm[mystorm$EVTYPE == 'hurricane opal',]$EVTYPE <- c('hurricane (typhoon)')
mystorm[mystorm$EVTYPE == 'hurricane opal/high winds',]$EVTYPE <- c('hurricane (typhoon)')
mystorm[mystorm$EVTYPE == 'hurricane/typhoon',]$EVTYPE <- c('hurricane (typhoon)')
mystorm[mystorm$EVTYPE == 'hurricane-generated swells',]$EVTYPE <- c('hurricane (typhoon)')
mystorm[mystorm$EVTYPE == 'hvy rain',]$EVTYPE <- c('heavy rain')
mystorm[mystorm$EVTYPE == 'ice storm/flash flood',]$EVTYPE <- c('ice storm')
mystorm[mystorm$EVTYPE == 'ice/strong winds',]$EVTYPE <- c('ice storm')
mystorm[mystorm$EVTYPE == 'lake effect snow',]$EVTYPE <- c('lake-effect snow')
mystorm[mystorm$EVTYPE == 'landslides',]$EVTYPE <- c('landslide')
mystorm[mystorm$EVTYPE == 'lighting',]$EVTYPE <- c('lightning')
mystorm[mystorm$EVTYPE == 'lightning wauseon',]$EVTYPE <- c('lightning')
mystorm[mystorm$EVTYPE == 'lightning and heavy rain',]$EVTYPE <- c('lightning')
mystorm[mystorm$EVTYPE == 'lightning and thunderstorm win',]$EVTYPE <- c('lightning')
mystorm[mystorm$EVTYPE == 'lightning fire',]$EVTYPE <- c('lightning')
mystorm[mystorm$EVTYPE == 'lightning injury',]$EVTYPE <- c('lightning')
mystorm[mystorm$EVTYPE == 'lightning thunderstorm winds',]$EVTYPE <- c('lightning')
mystorm[mystorm$EVTYPE == 'lightning.',]$EVTYPE <- c('lightning')
mystorm[mystorm$EVTYPE == 'lightning/heavy rain',]$EVTYPE <- c('lightning')
mystorm[mystorm$EVTYPE == 'ligntning',]$EVTYPE <- c('lightning')
mystorm[mystorm$EVTYPE == 'marine tstm wind',]$EVTYPE <- c('marine thunderstorm wind')
mystorm[mystorm$EVTYPE == 'mud slides',]$EVTYPE <- c('mud slide')
mystorm[mystorm$EVTYPE == 'mud slides urban flooding',]$EVTYPE <- c('mud slide')
mystorm[mystorm$EVTYPE == 'mudslide',]$EVTYPE <- c('mud slide')
mystorm[mystorm$EVTYPE == 'mudslides',]$EVTYPE <- c('mud slide')
mystorm[mystorm$EVTYPE == 'record/excessive heat',]$EVTYPE <- c('heat')
mystorm[mystorm$EVTYPE == 'rip currents',]$EVTYPE <- c('rip current')
mystorm[mystorm$EVTYPE == 'rip currents/heavy surf',]$EVTYPE <- c('rip current')
mystorm[mystorm$EVTYPE == 'river flood',]$EVTYPE <- c('flood')
mystorm[mystorm$EVTYPE == 'river flooding',]$EVTYPE <- c('flood')
mystorm[mystorm$EVTYPE == 'rural flood',]$EVTYPE <- c('flood')
mystorm[mystorm$EVTYPE == 'severe thunderstorm',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'severe thunderstorm winds',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'severe thunderstorms',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'sleet/ice storm',]$EVTYPE <- c('sleet')
mystorm[mystorm$EVTYPE == 'small stream flood',]$EVTYPE <- c('flood')
mystorm[mystorm$EVTYPE == 'snow accumulation',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'snow and heavy snow',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'snow and ice',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'snow and ice storm',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'snow freezing rain',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'snow squall',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'snow squalls',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'snow/ bitter cold',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'snow/ ice',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'snow/blowing snow',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'snow/cold',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'snow/freezing rain',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'snow/heavy snow',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'snow/high winds',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'snow/ice',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'snow/ice storm',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'snow/sleet',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'snow/sleet/freezing rain',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'snowmelt flooding',]$EVTYPE <- c('heavy snow')
mystorm[mystorm$EVTYPE == 'storm surge',]$EVTYPE <- c('storm surge/tide')
mystorm[mystorm$EVTYPE == 'strong winds',]$EVTYPE <- c('strong wind')
mystorm[mystorm$EVTYPE == 'thuderstorm winds',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thundeerstorm winds',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderestorm winds',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thundersnow',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm winds',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm damage to',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm hail',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm wind',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm wind (g40)',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm wind 60 mph',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm wind 65 mph',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm wind 65mph',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm wind 98 mph',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm wind g50',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm wind g52',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm wind g55',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm wind g60',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm wind trees',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm wind.',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm wind/ tree',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm wind/ trees',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm wind/awning',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm wind/hail',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm wind/lightning',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm winds',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm winds 13',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm winds 63 mph',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm winds and',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm winds g60',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm winds hail',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm winds lightning',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm winds.',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm winds/ flood',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm winds/flooding',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm winds/funnel clou',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm winds/hail',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm winds53',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm windshail',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm windss',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorm wins',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorms',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorms wind',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstorms winds',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstormw',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstormwinds',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunderstrom wind',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thundertorm winds',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'thunerstorm winds',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'tornado f0',]$EVTYPE <- c('tornado')
mystorm[mystorm$EVTYPE == 'tornado f1',]$EVTYPE <- c('tornado')
mystorm[mystorm$EVTYPE == 'tornado f2',]$EVTYPE <- c('tornado')
mystorm[mystorm$EVTYPE == 'tornado f3',]$EVTYPE <- c('tornado')
mystorm[mystorm$EVTYPE == 'tornadoes',]$EVTYPE <- c('tornado')
mystorm[mystorm$EVTYPE == 'tornadoes, tstm wind, hail',]$EVTYPE <- c('tornado')
mystorm[mystorm$EVTYPE == 'torndao',]$EVTYPE <- c('tornado')
mystorm[mystorm$EVTYPE == 'tropical depression',]$EVTYPE <- c('tropical storm')
mystorm[mystorm$EVTYPE == 'tropical storm alberto',]$EVTYPE <- c('tropical storm')
mystorm[mystorm$EVTYPE == 'tropical storm dean',]$EVTYPE <- c('tropical storm')
mystorm[mystorm$EVTYPE == 'tropical storm gordon',]$EVTYPE <- c('tropical storm')
mystorm[mystorm$EVTYPE == 'tropical storm jerry',]$EVTYPE <- c('tropical storm')
mystorm[mystorm$EVTYPE == 'tstm wind',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'tstm wind (g45)',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'tstm wind (41)',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'tstm wind (g35)',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'tstm wind (g40)',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'tstm wind (g45)',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'tstm wind 40',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'tstm wind 45',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'tstm wind 55',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'tstm wind 65)',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'tstm wind and lightning',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'tstm wind damage',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'tstm wind g45',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'tstm wind g58',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'tstm wind/hail',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'tstm winds',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'tstmw',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'tunderstorm wind',]$EVTYPE <- c('thunderstorm wind')
mystorm[mystorm$EVTYPE == 'typhoon',]$EVTYPE <- c('hurricane (typhoon)')
mystorm[mystorm$EVTYPE == 'urban flood',]$EVTYPE <- c('flood')
mystorm[mystorm$EVTYPE == 'urban flooding',]$EVTYPE <- c('flood')
mystorm[mystorm$EVTYPE == 'urban floods',]$EVTYPE <- c('flood')
mystorm[mystorm$EVTYPE == 'urban small',]$EVTYPE <- c('flood')
mystorm[mystorm$EVTYPE == 'urban/small stream',]$EVTYPE <- c('flood')
mystorm[mystorm$EVTYPE == 'urban/small stream flood',]$EVTYPE <- c('flood')
mystorm[mystorm$EVTYPE == 'urban/sml stream fld',]$EVTYPE <- c('flood')
mystorm[mystorm$EVTYPE == 'waterspout-',]$EVTYPE <- c('waterspout')
mystorm[mystorm$EVTYPE == 'waterspout tornado',]$EVTYPE <- c('waterspout')
mystorm[mystorm$EVTYPE == 'waterspout/ tornado',]$EVTYPE <- c('waterspout')
mystorm[mystorm$EVTYPE == 'waterspout/tornado',]$EVTYPE <- c('waterspout')
mystorm[mystorm$EVTYPE == 'waterspout-tornado',]$EVTYPE <- c('waterspout')
mystorm[mystorm$EVTYPE == 'wild fires',]$EVTYPE <- c('wildfire')
mystorm[mystorm$EVTYPE == 'wild/forest fire',]$EVTYPE <- c('wildfire')
mystorm[mystorm$EVTYPE == 'wild/forest fires',]$EVTYPE <- c('wildfire')
mystorm[mystorm$EVTYPE == 'wildfires',]$EVTYPE <- c('wildfire')
mystorm[mystorm$EVTYPE == 'wind and wave',]$EVTYPE <- c('wind')
mystorm[mystorm$EVTYPE == 'wind damage',]$EVTYPE <- c('wind')
mystorm[mystorm$EVTYPE == 'wind storm',]$EVTYPE <- c('wind')
mystorm[mystorm$EVTYPE == 'wind/hail',]$EVTYPE <- c('wind')
mystorm[mystorm$EVTYPE == 'winds',]$EVTYPE <- c('wind')
mystorm[mystorm$EVTYPE == 'winter storm high winds',]$EVTYPE <- c('winter storm')
mystorm[mystorm$EVTYPE == 'winter storms',]$EVTYPE <- c('winter storm')
mystorm[mystorm$EVTYPE == 'winter weather mix',]$EVTYPE <- c('winter weather')
mystorm[mystorm$EVTYPE == 'winter weather/mix',]$EVTYPE <- c('winter weather')
# fix exponent for property damage
mystorm[mystorm$PROPDMGEXP == "" | mystorm$PROPDMGEXP == "?" | mystorm$PROPDMGEXP == "+" | mystorm$PROPDMGEXP == "-",]$PROPDMGEXP <- "1"
mystorm[mystorm$PROPDMGEXP == "h",]$PROPDMGEXP <- "2"
mystorm[mystorm$PROPDMGEXP == "k",]$PROPDMGEXP <- "3"
mystorm[mystorm$PROPDMGEXP == "m",]$PROPDMGEXP <- "6"
mystorm[mystorm$PROPDMGEXP == "b",]$PROPDMGEXP <- "9"
mystorm$PROPDMGEXP <- as.integer(mystorm$PROPDMGEXP)
unique(mystorm$PROPDMGEXP)
## [1] 3 6 1 9 0 5 4 2 7 8
# Fix exponent for crops damage
mystorm[mystorm$CROPDMGEXP == "" | mystorm$CROPDMGEXP == "?" ,]$CROPDMGEXP <- "1"
mystorm[mystorm$CROPDMGEXP == "k",]$CROPDMGEXP <- "3"
mystorm[mystorm$CROPDMGEXP == "m",]$CROPDMGEXP <- "6"
mystorm[mystorm$CROPDMGEXP == "b",]$CROPDMGEXP <- "9"
mystorm$CROPDMGEXP <- as.integer(mystorm$CROPDMGEXP)
unique(mystorm$CROPDMGEXP)
## [1] 1 6 3 9 0 2
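The exponent cleaning above can be condensed into one small function. This is a sketch under the same conventions as the code above (letters mapped to powers of ten, placeholder symbols mapped to 1); `decode_exp`, `letter_map` and `placeholder` are hypothetical names introduced only for illustration:

```r
# Letter codes and the power of ten each one stands for
letter_map <- c(h = 2L, k = 3L, m = 6L, b = 9L)

# Symbols that the analysis above replaces with exponent 1
placeholder <- c("", "?", "+", "-")

# Hypothetical helper: turn raw exponent codes into integer exponents
decode_exp <- function(x) {
  x <- tolower(x)
  # Digits convert directly; everything else becomes NA for now
  out <- suppressWarnings(as.integer(x))
  is_letter <- x %in% names(letter_map)
  out[is_letter] <- unname(letter_map[x[is_letter]])
  out[x %in% placeholder] <- 1L
  out
}

decode_exp(c("K", "m", "", "B", "h", "5", "?"))
## [1] 3 6 1 9 2 5 1

# Worked example: a damage value of 25 with exponent code "K"
25 * 10^decode_exp("K")
## [1] 25000
```

Note that under this convention a blank exponent multiplies the recorded amount by 10. Treating blanks as exponent 0 is another possible choice and would change the totals.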
Now that mystorm has been cleaned, some aggregate datasets are calculated for use in the final results:
# calculate total injuries by evtype
injuries <- aggregate(mystorm$INJURIES, list(mystorm$EVTYPE), sum)
# calculate total fatalities by evtype
fatalities <- aggregate(mystorm$FATALITIES, list(mystorm$EVTYPE), sum)
names(injuries) <- c("EVTYPE", "VALUES")
names(fatalities) <- c("EVTYPE", "VALUES")
# order by values desc
orderedinjuries <- injuries[order(-injuries$VALUES),]
orderedfatalities <- fatalities[order(-fatalities$VALUES),]
# prepare for total damage
mystorm$property_damage <- mystorm$PROPDMG * 10 ^ mystorm$PROPDMGEXP
mystorm$crop_damage <- mystorm$CROPDMG * 10 ^ mystorm$CROPDMGEXP
mystorm$event_total_damage <- mystorm$property_damage + mystorm$crop_damage
# calculate partial and total damage
dmg <- aggregate(mystorm$event_total_damage, list(mystorm$EVTYPE), sum)
dmgprop <- aggregate(mystorm$property_damage, list(mystorm$EVTYPE), sum)
dmgcrop <- aggregate(mystorm$crop_damage, list(mystorm$EVTYPE), sum)
names(dmg) <- c("EVTYPE","event_total_damage")
names(dmgprop) <- c("EVTYPE","property_total_damage")
names(dmgcrop) <- c("EVTYPE","crop_total_damage")
# order by event_total_damage desc
orddmg <- dmg[order(-dmg$event_total_damage),]
orddmgprop <- dmgprop[order(-dmgprop$property_total_damage),]
orddmgcrop <- dmgcrop[order(-dmgcrop$crop_total_damage),]
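The aggregate-then-order pattern used above can be illustrated on a small example. This sketch uses the formula interface of `aggregate`, which names the result columns directly; the `toy` data and its values are made up for illustration:

```r
# Toy data: a few events with injury counts (made-up values)
toy <- data.frame(
  EVTYPE = c("tornado", "flood", "tornado", "heat", "flood"),
  INJURIES = c(10, 2, 5, 7, 1),
  stringsAsFactors = FALSE
)

# Sum injuries per event type; the formula interface keeps the column names
totals <- aggregate(INJURIES ~ EVTYPE, data = toy, FUN = sum)

# Sort from the largest total to the smallest
ranked <- totals[order(totals$INJURIES, decreasing = TRUE), ]
ranked
```

Here tornado (15) ranks first, followed by heat (7) and flood (3).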
The final results for public health are summarized in the following tables, which show the top five causes of injuries
# show results
knitr::kable(orderedinjuries[1:5,])
| | EVTYPE | VALUES |
|---|---|---|
| 539 | tornado | 91364 |
| 506 | thunderstorm wind | 9509 |
| 178 | heat | 9174 |
| 118 | flood | 6888 |
| 293 | lightning | 5232 |
and fatalities
# show results
knitr::kable(orderedfatalities[1:5,])
| | EVTYPE | VALUES |
|---|---|---|
| 539 | tornado | 5658 |
| 178 | heat | 3134 |
| 114 | flash flood | 1018 |
| 293 | lightning | 817 |
| 506 | thunderstorm wind | 712 |
These results are summarized in these plots
par(mar = c(4,9,1,1), oma = c(2,2,2,2))
barplot(orderedinjuries$VALUES[1:10], names.arg = orderedinjuries$EVTYPE[1:10], horiz = TRUE, las = 1, xlab = "Number of injuries", main = "Top 10 weather events by number of injuries")
mtext("Event sources of injuries", outer = TRUE, side = 2)
barplot(orderedfatalities$VALUES[1:10], names.arg = orderedfatalities$EVTYPE[1:10], horiz = TRUE, las = 1, xlab = "Number of fatalities", main = "Top 10 weather events by number of fatalities")
mtext("Event sources of fatalities", outer = TRUE, side = 2)
As shown, tornadoes are the leading cause of both fatalities and injuries.
The results for economic damage are summarized in the following tables, which show the top five causes of property damage
knitr::kable(orddmgprop[1:5,])
| | EVTYPE | property_total_damage |
|---|---|---|
| 118 | flood | 150275638447 |
| 253 | hurricane (typhoon) | 85356410010 |
| 539 | tornado | 58552152944 |
| 430 | storm surge/tide | 47964724000 |
| 114 | flash flood | 17414733046 |
and crop damage
knitr::kable(orddmgcrop[1:5,])
| | EVTYPE | crop_total_damage |
|---|---|---|
| 69 | drought | 13972571780 |
| 118 | flood | 10945975050 |
| 253 | hurricane (typhoon) | 5516117800 |
| 267 | ice storm | 5022113500 |
| 162 | hail | 3026094650 |
This plot presents the total amount of economic damage
par(mar = c(4,9,1,1), oma = c(2,2,2,2))
barplot(orddmg$event_total_damage[1:10]/1e9, names.arg = orddmg$EVTYPE[1:10], horiz = TRUE, las = 1, main = "Top 10 weather events by total economic damage", xlab = "Billions of dollars")
mtext("Event sources of damage", outer = TRUE, side = 2)
Flood, with over 150 billion dollars in property damage, is by far the most important cause of economic damage from weather events, although for crop damage drought is the most important, with nearly 14 billion dollars.