The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. This project uses the data for descriptive statistics, determining which types of storm and weather event cause:
* the greatest harm to the population (injuries and fatalities);
* the greatest economic consequences (crop and property damage).

The analysis is carried out in R on the NOAA database.
The NOAA database is available online (https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2)
as a comma-separated-value file compressed with bzip2. In R, the file
can be downloaded with download.file and then read into a data frame
with read.csv. The next cell reads the data and shows its dimensions,
its variables, and a summary of the EVTYPE column.
if(!dir.exists('./data')) dir.create('./data')
download.file('https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2','./data/Project2.csv')
df <- read.csv('./data/Project2.csv')
dim(df)
## [1] 902297 37
names(df)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
#str(df)
#head(df)
summary(df$EVTYPE)
## Length Class Mode
## 902297 character character
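The download step can be made repeatable so that re-running the analysis does not fetch the file again. The following is a minimal sketch, not part of the original analysis: the helper name fetchIfMissing and the destination file name are assumptions made for illustration.

```r
# Hypothetical helper: download `url` to `dest` only if `dest` does not exist yet.
fetchIfMissing <- function(url, dest) {
  destDir <- dirname(dest)
  if (!dir.exists(destDir)) dir.create(destDir, recursive = TRUE)
  if (!file.exists(dest)) {
    # mode = "wb" keeps the compressed bytes intact on Windows.
    download.file(url, dest, mode = "wb")
  }
  invisible(dest)
}

# Usage (commented out because it needs network access):
# path <- fetchIfMissing(
#   "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#   "./data/StormData.csv.bz2")
# df <- read.csv(path)  # read.csv() reads bzip2-compressed files directly
```

Keeping the .bz2 extension on the local file is only a naming choice; read.csv detects the compression from the file contents.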
The aggregated data frames holding the total INJURIES and the total
FATALITIES are computed by grouping on EVTYPE, i.e. the event
type.
## Aggregate the data frame on the EVTYPE variable
dfInj <- aggregate(INJURIES ~ EVTYPE, data = df, sum, na.rm = T)
dfFat <- aggregate(FATALITIES ~ EVTYPE, data = df, sum, na.rm = T)
The dfHarm data frame is then formed. Its first column holds the
event types and its second column holds the sum of the total injuries
and fatalities for each type. The data frame is sorted in descending
order of this total harm to the population.
dfHarm <- dfInj[,1, drop=F]
dfHarm$Harms <- dfInj$INJURIES + dfFat$FATALITIES
dfHarm <- dfHarm[order(dfHarm$Harms,decreasing=TRUE), ]
head(dfHarm,4)
## EVTYPE Harms
## 834 TORNADO 96979
## 130 EXCESSIVE HEAT 8428
## 856 TSTM WIND 7461
## 170 FLOOD 7259
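The two aggregations and the row-wise sum can also be done in a single aggregate() call, which avoids relying on the two result data frames having their rows in the same order. The sketch below uses a small invented data frame (toy) rather than the storm data, so its numbers are purely illustrative.

```r
# Toy stand-in for the storm data; the values are invented for illustration.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  INJURIES   = c(10, 2, 5, 0, 1),
  FATALITIES = c(1, 0, 2, 0, 3)
)

# Sum both columns per event type in one aggregate() call.
toyAgg <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)

# Total harm per event type, sorted from most to least harmful.
toyAgg$Harms <- toyAgg$INJURIES + toyAgg$FATALITIES
toyAgg <- toyAgg[order(toyAgg$Harms, decreasing = TRUE), ]
print(toyAgg)
```

With the real data, replacing toy by df would be expected to give the same totals as dfHarm.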
A barplot gives a simple visual comparison of the harm to the population per weather event type. Only the 16 event types with the highest totals are plotted.
par(mfrow=c(1,1), mar = c(9.5,4,1,0.5))
barplot(dfHarm$Harms[1:16], names.arg = dfHarm$EVTYPE[1:16],
cex.axis=0.5, cex.names=0.5,
col = 'wheat',
xlab = "Event", ylab = 'Number of Population Harmful Events',
main = "Population Harmful Events [Injuries + Fatalities]", las=2)
The data for the economic consequences is split into two parts: the
crop damage and the property damage. Each is defined by two variables:
* Crop
  * CROPDMG: numerical value of the crop damage
  * CROPDMGEXP: categorical value, coded as a character, giving the magnitude of the crop damage
* Property
  * PROPDMG: numerical value of the property damage
  * PROPDMGEXP: categorical value, coded as a character, giving the magnitude of the property damage
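The magnitude variables are interpreted as multipliers, as encoded in the function that follows:
* '?', '-' or an empty string: 0
* '+': 1
* a digit from 0 to 9: 10
* 'H' or 'h': 100
* 'K' or 'k': 1,000
* 'M' or 'm': 1,000,000
* 'B' or 'b': 1,000,000,000

A function that converts these character codes into the corresponding numerical multipliers is implemented first.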
numericValues <- function(a){
if(a == '?' || a == '-' || a == ''){
return(0)
}
if(a == '+'){
return(1)
}
if(a %in% 0:9){
return(10)
}
if(a == 'H' || a == 'h'){
return(10**2)
}
if(a == 'K' || a == 'k'){
return(10**3)
}
if(a == 'M' || a == 'm'){
return(10**6)
}
if(a == 'B' || a == 'b'){
return(10**9)
}
}
The numericValues function is then applied to the two magnitude
variables (i.e. PROPDMGEXP and CROPDMGEXP), and the numerical value of
the economic impact is computed for both property and crop. Their sum,
named PropCrop, is stored together with the event type (i.e. EVTYPE)
and the two individual damage values in the data frame dfEconomic, so
that each row gives a meteorological event and its corresponding
economic impact.
dfEconomic <- df[,'EVTYPE', drop=F]
df$PROPDMGEXP <- vapply(df$PROPDMGEXP, numericValues, numeric(1))
dfEconomic$Property <- df$PROPDMG * df$PROPDMGEXP
df$CROPDMGEXP <- vapply(df$CROPDMGEXP, numericValues, numeric(1))
dfEconomic$Crop <- df$CROPDMG * df$CROPDMGEXP
dfEconomic$PropCrop <- dfEconomic$Property + dfEconomic$Crop
head(dfEconomic)
## EVTYPE Property Crop PropCrop
## 1 TORNADO 25000 0 25000
## 2 TORNADO 2500 0 2500
## 3 TORNADO 25000 0 25000
## 4 TORNADO 2500 0 2500
## 5 TORNADO 2500 0 2500
## 6 TORNADO 2500 0 2500
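As an alternative to applying numericValues element by element, the same mapping can be written as a vectorized lookup table. This is a sketch only: the names expMultiplier and toMultiplier are hypothetical, and treating any unrecognised code as 0 is an assumption (numericValues returns nothing for such codes).

```r
# Multipliers keyed by exponent code; intended to mirror numericValues().
expMultiplier <- c(
  "?" = 0, "-" = 0, "+" = 1,
  setNames(rep(10, 10), as.character(0:9)),
  "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9
)

# Hypothetical helper: convert a character vector of codes to multipliers.
toMultiplier <- function(codes) {
  out <- unname(expMultiplier[toupper(codes)])
  # Empty strings and unrecognised codes give NA in the lookup.
  # Assumption: treat them as a multiplier of 0.
  out[is.na(out)] <- 0
  out
}

# Invented example values: damage amounts and their magnitude codes.
damage <- c(25, 2.5, 1)
codes  <- c("K", "k", "M")
damage * toMultiplier(codes)   # 25000, 2500, 1e+06
```

On a column with roughly 900,000 rows, a single vectorized lookup of this kind would typically be faster than calling a function once per element with vapply.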
The dfEconomicAgg data frame is obtained from dfEconomic by summing
PropCrop within each group of the EVTYPE variable, i.e. per event
type. The result is then sorted in decreasing order of total
damage.
dfEconomicAgg <- aggregate(PropCrop ~ EVTYPE, data = dfEconomic, sum)
dfEconomicAgg <- dfEconomicAgg[order(dfEconomicAgg$PropCrop,decreasing=TRUE), ]
A barplot gives a simple visual comparison of the economic impact per weather event type. Only the 16 event types with the highest economic impact are plotted.
par(mfrow=c(1,1), mar = c(10.5,4,1,0.5))
barplot(dfEconomicAgg$PropCrop[1:16], names.arg = dfEconomicAgg$EVTYPE[1:16],
cex.axis=0.5, cex.names=0.5,
col = 'wheat',
xlab = "Event", ylab = 'Damage Cost $',
main = "Economic Impact [Crop + Property]", las=2)
For harm to the population (the sum of injuries and fatalities), the most damage is caused by tornadoes, which account for approximately \(62\) percent of the total.
print(paste('Percent of tornado population harmful events effects: ', round(100 * dfHarm$Harms[1]/sum(dfHarm$Harms), 1), '%.'))
## [1] "Percent of tornado population harmful events effects: 62.3 %."
For the economic impact (the sum of crop and property damage, scaled by the corresponding magnitudes), the most damage is caused by floods, which account for approximately \(32\) percent of the total economic damage.
In both cases, the leading event type accounts for a considerably larger share than the other weather events in the database.
print(paste('Percent of flood economic damage effects: ', round(100 * dfEconomicAgg$PropCrop[1]/sum(dfEconomicAgg$PropCrop), 1), '%.'))
## [1] "Percent of flood economic damage effects: 31.6 %."
Among the 16 weather events with the highest economic damage and the 16 weather events with the highest harm to the population, ten are common to both lists: TORNADO, TSTM WIND, FLOOD, FLASH FLOOD, ICE STORM, WINTER STORM, HIGH WIND, HAIL, HURRICANE/TYPHOON, WILDFIRE. Together they account for about \(79\%\) of the harm to the population and \(73\%\) of the economic damage.
commonEvents <- intersect(dfHarm$EVTYPE[1:16], dfEconomicAgg$EVTYPE[1:16])
print(commonEvents)
## [1] "TORNADO" "TSTM WIND" "FLOOD"
## [4] "FLASH FLOOD" "ICE STORM" "WINTER STORM"
## [7] "HIGH WIND" "HAIL" "HURRICANE/TYPHOON"
## [10] "WILDFIRE"
print(paste('Percent of first 10 common weather events in terms of population harmful events effects: ', round(100 * sum(dfHarm$Harms[match(commonEvents, dfHarm$EVTYPE)])/sum(dfHarm$Harms), 1), '%.'))
## [1] "Percent of first 10 common weather events in terms of population harmful events effects: 79.1 %."
print(paste('Percent of first 10 common weather events in terms of economic damage effects: ', round(100 * sum(dfEconomicAgg$PropCrop[match(commonEvents, dfEconomicAgg$EVTYPE)])/sum(dfEconomicAgg$PropCrop), 1), '%.'))
## [1] "Percent of first 10 common weather events in terms of economic damage effects: 73 %."
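Because each share is a ratio of a part to a total, it has to be multiplied by 100 before being reported as a percentage. A small formatting helper can make that step explicit. The sketch below is illustrative only: pctShare is a hypothetical name and the harms vector contains invented values, not figures from the storm data.

```r
# Hypothetical helper: share of `part` in `total`, formatted as a percentage.
pctShare <- function(part, total) {
  sprintf("%.1f%%", 100 * part / total)
}

# Invented totals per event type, for illustration only.
harms <- c(TORNADO = 18, FLOOD = 6, HAIL = 0)
pctShare(harms[["TORNADO"]], sum(harms))   # "75.0%"
```

With the real data, a call such as pctShare(dfHarm$Harms[1], sum(dfHarm$Harms)) would report the tornado share directly in percent.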