The National Weather Service (NWS) is an agency of the United States government tasked with providing weather forecasts, warnings of hazardous weather, and other weather-related products to organizations and the public for the purposes of protection, safety, and general information. It is part of the National Oceanic and Atmospheric Administration (NOAA) branch of the Department of Commerce and is headquartered in Silver Spring, Maryland, just outside Washington, D.C. [https://en.wikipedia.org/wiki/National_Weather_Service]
The database currently contains data from January 1950 to November 2015, as entered by NOAA’s National Weather Service (NWS). Due to changes in the data collection and processing procedures over time, there are unique periods of record available depending on the event type. [http://www.ncdc.noaa.gov/stormevents/details.jsp]
This report analyzes and visualizes the effects of severe weather events on public health and the economy in the US, using the NOAA Storm Database from 1950 to 2011. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. From these data, we investigate which types of events are most harmful to the population and most costly financially.
library(R.utils) #for bunzip2
library(ggplot2) #for plots
library(plyr) #for count & aggregate method
library(reshape2) #Flexibly restructure and aggregate data using MELT and MERGE
Read the source .csv file
#Read the bzip2-compressed .csv file into the variable dataLoad
dataLoad <- read.csv(bzfile("repdata-data-StormData.csv.bz2"), strip.white = TRUE)
Select useful data
Subset the data to the variables that are needed and add new variables.
#Keep only the columns used in this analysis
gCol <- c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
gData <- dataLoad[, gCol]
#First two rows of the selected columns
head(gData,n=2)
## BGN_DATE EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG
## 1 4/18/1950 0:00:00 TORNADO 0 15 25.0 K 0
## 2 4/18/1950 0:00:00 TORNADO 0 0 2.5 K 0
## CROPDMGEXP
## 1
## 2
#Types of data
str(gData)
## 'data.frame': 902297 obs. of 8 variables:
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
#Formatting date and time
gData$YEAR <- as.integer(format(as.Date(gData$BGN_DATE, "%m/%d/%Y 0:00:00"), "%Y"))
#Create a new variable: raw sum of the property and crop damage figures (the exponent columns are not applied here)
gData$ECONOMICDMG <- gData$PROPDMG + gData$CROPDMG
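As a small illustration of the year extraction above, the sketch below parses two made-up strings in the same format as BGN_DATE. The vector sampleDates is hypothetical and is not taken from the dataset.

```r
# Hypothetical sample values in the same format as BGN_DATE
sampleDates <- c("4/18/1950 0:00:00", "11/30/2011 0:00:00")

# Parse the date part, then format it back to a four-digit year
parsed <- as.Date(sampleDates, format = "%m/%d/%Y 0:00:00")
sampleYears <- as.integer(format(parsed, "%Y"))

print(sampleYears)
```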
Find NA Values
The check below reports no missing values in any of the selected columns.
#Verifying missing values in the dataset
dataIntegrity <- function(dataframe) {
for (colName in colnames(dataframe)) {
#Count the NA entries in this column
NAcount <- sum(is.na(dataframe[,colName]))
if(NAcount > 0) {
message(colName, ":", NAcount, " missing values")
} else {
message(colName, ":", "No missing values")
}
}
}
dataIntegrity(gData)
## BGN_DATE:No missing values
## EVTYPE:No missing values
## FATALITIES:No missing values
## INJURIES:No missing values
## PROPDMG:No missing values
## PROPDMGEXP:No missing values
## CROPDMG:No missing values
## CROPDMGEXP:No missing values
## YEAR:No missing values
## ECONOMICDMG:No missing values
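The same per-column check can also be sketched without an explicit loop. The example below is only an illustration on a made-up data frame (naToy is a hypothetical name); it relies on is.na() returning a logical matrix for a data frame, so that colSums() counts the missing entries per column.

```r
# Hypothetical toy data frame with one missing value in column b
naToy <- data.frame(a = c(1, 2, 3), b = c("x", NA, "z"))

# is.na() gives a logical matrix; colSums() counts TRUE values per column
naCounts <- colSums(is.na(naToy))

print(naCounts)
```

A column with a count of zero has no missing values.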
Sum the measures used in the analysis, grouped by YEAR and EVTYPE
eY <- ddply(
gData[, -1], .(YEAR, EVTYPE),.fun = function(x)
{
return(c(sum(x$FATALITIES), sum(x$ECONOMICDMG), sum(x$INJURIES)))
}
)
names(eY) <- c("YEAR", "EVTYPE", "FATALITIES", "ECONOMICDMG", "INJURIES")
head(eY)
## YEAR EVTYPE FATALITIES ECONOMICDMG INJURIES
## 1 1950 TORNADO 70 16999.15 659
## 2 1951 TORNADO 34 10560.99 524
## 3 1952 TORNADO 230 16679.74 1915
## 4 1953 TORNADO 519 19182.20 5131
## 5 1954 TORNADO 36 23367.82 715
## 6 1955 HAIL 0 0.00 0
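For readers without plyr, a grouped sum of this kind can be sketched with base R's aggregate(). This is an alternative on made-up data, not the method used in this report; the data frame sumToy and its values are hypothetical.

```r
# Hypothetical toy data with the same column names as gData
sumToy <- data.frame(
  YEAR = c(1950, 1950, 1951),
  EVTYPE = c("TORNADO", "TORNADO", "HAIL"),
  FATALITIES = c(1, 2, 0),
  INJURIES = c(10, 5, 0),
  ECONOMICDMG = c(25, 2.5, 0)
)

# Sum the three measures within each YEAR/EVTYPE combination
toySums <- aggregate(
  cbind(FATALITIES, ECONOMICDMG, INJURIES) ~ YEAR + EVTYPE,
  data = sumToy,
  FUN = sum
)

print(toySums)
```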
The full dataset has 902297 rows and 37 columns. The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of reliable or complete records.
hist(gData$YEAR, main = "Histogram of Evolution Data From 1950 to 2011", xlab="Year", breaks = 40)
The histogram above shows that the number of recorded events begins to increase in the mid-1990s.
Next, we examine the number of fatalities and injuries caused by severe weather events, keeping the 20 most harmful event types for each measure.
Using the EVTYPE variable, we also determine which types of events are most harmful to the economy by aggregating the damage in US dollars by event type, separately for property damage and crop damage. The 20 event types with the highest damage in each category are then extracted and plotted.
gData$PROPDMGEXP <- as.character(gData$PROPDMGEXP)
gData$PROPDMGEXP[toupper(gData$PROPDMGEXP) == 'H'] <- "2"
gData$PROPDMGEXP[toupper(gData$PROPDMGEXP) == 'K'] <- "3"
gData$PROPDMGEXP[toupper(gData$PROPDMGEXP) == 'M'] <- "6"
gData$PROPDMGEXP[toupper(gData$PROPDMGEXP) == 'B'] <- "9"
gData$PROPDMGEXP <- as.numeric(gData$PROPDMGEXP)
## Warning: NAs introduced by coercion
gData$PROPDMGEXP[is.na(gData$PROPDMGEXP)] <- 0
gData$TOTALPROPDMG <- gData$PROPDMG * 10^gData$PROPDMGEXP
gData$CROPDMGEXP <- as.character(gData$CROPDMGEXP)
gData$CROPDMGEXP[toupper(gData$CROPDMGEXP) == 'H'] <- "2"
gData$CROPDMGEXP[toupper(gData$CROPDMGEXP) == 'K'] <- "3"
gData$CROPDMGEXP[toupper(gData$CROPDMGEXP) == 'M'] <- "6"
gData$CROPDMGEXP[toupper(gData$CROPDMGEXP) == 'B'] <- "9"
gData$CROPDMGEXP <- as.numeric(gData$CROPDMGEXP)
## Warning: NAs introduced by coercion
gData$CROPDMGEXP[is.na(gData$CROPDMGEXP)] <- 0
gData$TOTALCROPDMG <- gData$CROPDMG * 10^gData$CROPDMGEXP
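The exponent handling above can be summarized as a single helper, shown here as a sketch. The function name expToPower and the sample values are hypothetical; the mapping (H = 2, K = 3, M = 6, B = 9, digits kept as they are, anything else treated as 0) mirrors the assignments in the code above.

```r
# Hypothetical helper: map an exponent code to a power of ten
expToPower <- function(code) {
  code <- toupper(as.character(code))
  lookup <- c(H = 2, K = 3, M = 6, B = 9)
  power <- unname(lookup[code])                  # NA for codes not in lookup
  digits <- suppressWarnings(as.numeric(code))   # numeric codes such as "5"
  power[is.na(power)] <- digits[is.na(power)]
  power[is.na(power)] <- 0                       # "", "?", "+", "-" and similar
  power
}

# Made-up amounts and codes to show the arithmetic
amounts <- c(25, 2.5, 1, 3)
codes <- c("K", "m", "B", "?")
dollars <- amounts * 10^expToPower(codes)

print(dollars)
```

For example, an amount of 25 with the code K becomes 25 * 10^3 = 25000 dollars.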
#Property damage
gSumProp <- aggregate(gData$TOTALPROPDMG, by = list(gData$EVTYPE), "sum")
names(gSumProp) <- c("Event", "Cost")
gSumProp <- gSumProp[order(-gSumProp$Cost), ][1:20, ]
#Crop damage
gSumCrop <- aggregate(gData$TOTALCROPDMG, by = list(gData$EVTYPE), "sum")
names(gSumCrop) <- c("Event", "Cost")
gSumCrop <- gSumCrop[order(-gSumCrop$Cost), ][1:20, ]
#Fatalities
aggFat <- aggregate(gData$FATALITIES, by = list(gData$EVTYPE), "sum")
names(aggFat) <- c("Event", "Fatalities")
aggFat <- aggFat[order(-aggFat$Fatalities), ][1:20,]
#Injuries
aggInjury <- aggregate(gData$INJURIES, by = list(gData$EVTYPE), "sum")
names(aggInjury) <- c("Event", "Injuries")
aggInjury <- aggInjury[order(-aggInjury$Injuries), ][1:20,]
The sorted lists below give the top 20 event types for each measure: property damage and crop damage for the economic impact, then fatalities and injuries for the impact on public health.
Property damage
gSumProp
## Event Cost
## 170 FLOOD 144657709807
## 411 HURRICANE/TYPHOON 69305840000
## 834 TORNADO 56947380677
## 670 STORM SURGE 43323536000
## 153 FLASH FLOOD 16822673979
## 244 HAIL 15735267513
## 402 HURRICANE 11868319010
## 848 TROPICAL STORM 7703890550
## 972 WINTER STORM 6688497251
## 359 HIGH WIND 5270046295
## 590 RIVER FLOOD 5118945500
## 957 WILDFIRE 4765114000
## 671 STORM SURGE/TIDE 4641188000
## 856 TSTM WIND 4484928495
## 427 ICE STORM 3944927860
## 760 THUNDERSTORM WIND 3483122472
## 409 HURRICANE OPAL 3172846000
## 955 WILD/FOREST FIRE 3001829500
## 298 HEAVY RAIN/SEVERE WEATHER 2500000000
## 786 THUNDERSTORM WINDS 1944590859
Crop damage
gSumCrop
## Event Cost
## 95 DROUGHT 13972566000
## 170 FLOOD 5661968450
## 590 RIVER FLOOD 5029459000
## 427 ICE STORM 5022113500
## 244 HAIL 3025954473
## 402 HURRICANE 2741910000
## 411 HURRICANE/TYPHOON 2607872800
## 153 FLASH FLOOD 1421317100
## 140 EXTREME COLD 1292973000
## 212 FROST/FREEZE 1094086000
## 290 HEAVY RAIN 733399800
## 848 TROPICAL STORM 678346000
## 359 HIGH WIND 638571300
## 856 TSTM WIND 554007350
## 130 EXCESSIVE HEAT 492402000
## 192 FREEZE 446225000
## 834 TORNADO 414953270
## 760 THUNDERSTORM WIND 414843050
## 275 HEAT 401461500
## 957 WILDFIRE 295472800
Fatalities
aggFat
## Event Fatalities
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
## 856 TSTM WIND 504
## 170 FLOOD 470
## 585 RIP CURRENT 368
## 359 HIGH WIND 248
## 19 AVALANCHE 224
## 972 WINTER STORM 206
## 586 RIP CURRENTS 204
## 278 HEAT WAVE 172
## 140 EXTREME COLD 160
## 760 THUNDERSTORM WIND 133
## 310 HEAVY SNOW 127
## 141 EXTREME COLD/WIND CHILL 125
## 676 STRONG WIND 103
## 30 BLIZZARD 101
## 350 HIGH SURF 101
Injuries
aggInjury
## Event Injuries
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
## 275 HEAT 2100
## 427 ICE STORM 1975
## 153 FLASH FLOOD 1777
## 760 THUNDERSTORM WIND 1488
## 244 HAIL 1361
## 972 WINTER STORM 1321
## 411 HURRICANE/TYPHOON 1275
## 359 HIGH WIND 1137
## 310 HEAVY SNOW 1021
## 957 WILDFIRE 911
## 786 THUNDERSTORM WINDS 908
## 30 BLIZZARD 805
## 188 FOG 734
## 955 WILD/FOREST FIRE 545
## 117 DUST STORM 440
The following plots show the results, starting with fatalities and injuries for the top 20 weather events.
#Plot of the fatalities
barplot(aggFat$Fatalities, names.arg = aggFat$Event, col = 'red',main = 'Selection of Top 20 Weather Events for Fatalities', ylab = 'Number of Fatalities')
#Plot of the injuries
barplot(aggInjury$Injuries, names.arg = aggInjury$Event, col = 'blue',main = 'Selection of Top 20 Weather Events for Injuries', ylab = 'Number of Injuries')
#Merging Sum of properties and crop
fatDamage <- merge(x = gSumProp, y = gSumCrop, by = "Event", all = TRUE)
#Merge and melt
fatDamage <- melt(fatDamage, id.vars = 'Event')
#Plot with data merged and melted
ggplot(fatDamage, aes(Event, value)) + geom_bar(aes(fill = variable), position = "dodge", stat="identity") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("Event Type") +
ylab("Damage (Crop and Properties), USD (Current)") + ggtitle("Crop and Property Damage by Event Type")
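To make the shape of the merged and melted data easier to follow, here is a sketch on made-up values. It uses base R only, building the long format by hand rather than calling melt(), and all names and numbers (propTop, cropTop, wide, long) are hypothetical. Because the two top-20 lists are built independently, an event that appears on only one list gets NA for the other measure after merging with all = TRUE.

```r
# Hypothetical stand-ins for the property and crop top lists
propTop <- data.frame(Event = c("FLOOD", "TORNADO"), Cost = c(100, 50))
cropTop <- data.frame(Event = c("DROUGHT", "FLOOD"), Cost = c(30, 10))

# Full outer join: the shared column name Cost becomes Cost.x and Cost.y
wide <- merge(x = propTop, y = cropTop, by = "Event", all = TRUE)

# Long format by hand: one row per Event/measure pair
long <- data.frame(
  Event = rep(wide$Event, times = 2),
  variable = rep(c("Cost.x", "Cost.y"), each = nrow(wide)),
  value = c(wide$Cost.x, wide$Cost.y)
)

print(wide)
print(long)
```

Rows with an NA value have no bar to draw, and ggplot2 typically omits them with a warning.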
Using the NOAA Storm Database, we find that tornadoes and excessive heat are the most harmful events with respect to population health, while floods, droughts, and hurricanes/typhoons have the greatest economic consequences in the United States.