NOAA Storm Database Severe Weather Analysis
NOAA maintains a database of severe weather events. The database goes back to the 1950s, but data collection was only standardized in 1996. The data covers major weather events and their impact on human lives, their economic impact, and location data. By analyzing the data one can gain an understanding of the impact that different weather events have. Governments and private individuals can then use this to prepare for future events and to understand which events have the greatest impact. This analysis aims to identify which types of event have the greatest impact on population health and the greatest economic impact.
The data processing section walks through the steps taken to process the data and create the plots used in the analysis.
Start by loading the libraries.
library(plyr)
## Warning: package 'plyr' was built under R version 3.2.2
Now we need to read in the data.
StormData <- read.csv("repdata-data-StormData.csv.bz2")
head(StormData)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
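The read.csv() call above was given the .bz2 file directly, with no separate decompression step. The sketch below illustrates why that works, using a tiny made-up file (the temporary file and its two rows are hypothetical, not part of the storm data): R detects the compression when it opens the file for reading.

```r
# Write a small bzip2-compressed CSV to a temporary file (toy data only)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "FLOOD,1"), con)
close(con)

# read.csv() opens the compressed file directly
toy <- read.csv(tmp, stringsAsFactors = FALSE)
unlink(tmp)

toy
```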
After analyzing the data structure and reading the information at http://www.ncdc.noaa.gov/stormevents/details.jsp (as found on the course forums), it is clear we should only investigate incidents from 1996 onward. Earlier records cover only a few event types, mainly tornados, so including them would give tornados an unusually high share of injuries and damage. This still leaves about 16 years of data to work with, since the file ends in 2011. Time permitting, we can look back at the pre-1996 data and see whether it is roughly in line with the post-1996 data.
To extract the data from 1996 onward, first convert the dates and check that everything converted properly.
StormData$BGN_DATE <- as.Date(StormData$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")
class(StormData$BGN_DATE)
## [1] "Date"
sum(is.na(StormData$BGN_DATE))
## [1] 0
Now subset the data to the date range we want to work with.
WorkingData <- subset(StormData, BGN_DATE >= as.Date("1996-01-01"))
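As a small illustration of the conversion and subsetting steps, the sketch below uses a made-up three-row data frame (toy and cutoff are hypothetical names, not part of the analysis). It shows that as.Date() with this format string keeps only the calendar date, and that the comparison keeps rows on or after 1 January 1996.

```r
# Hypothetical toy data mimicking the BGN_DATE text format in the raw file
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "1/6/1996 0:00:00", "12/31/1995 0:00:00"),
  EVTYPE   = c("TORNADO", "WINTER STORM", "HAIL"),
  stringsAsFactors = FALSE
)

# as.Date() parses the date part; the trailing time is not kept
toy$BGN_DATE <- as.Date(toy$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")

# Keep rows on or after the cutoff date
cutoff <- as.Date("1996-01-01")
toy_recent <- subset(toy, BGN_DATE >= cutoff)

# Confirm the retained date range
range(toy_recent$BGN_DATE)
```

Running range() on the real WorkingData$BGN_DATE in the same way would be a quick check that the subset covers the intended years.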
First we investigate which events are most harmful to population health, i.e. fatalities and injuries.
HumanFatal <- aggregate(FATALITIES ~ EVTYPE,WorkingData,sum)
HumanInjury <- aggregate(INJURIES ~ EVTYPE,WorkingData,sum)
HumanImpact <- merge(HumanInjury, HumanFatal, by = "EVTYPE")
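The two aggregate() calls followed by merge() can also be written as a single call with cbind() on the left-hand side of the formula. The sketch below compares both forms on made-up data (toy, two_step and one_step are hypothetical names). One assumption worth checking on real data: the formula interface drops rows with missing values by default, so the two forms could differ if either column contained NAs.

```r
# Hypothetical toy data with the same column names as the storm data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(2, 1, 0, 0),
  INJURIES   = c(10, 5, 3, 0),
  stringsAsFactors = FALSE
)

# Two-step form: one aggregate() per measure, then merge on EVTYPE
fatal    <- aggregate(FATALITIES ~ EVTYPE, toy, sum)
injury   <- aggregate(INJURIES ~ EVTYPE, toy, sum)
two_step <- merge(injury, fatal, by = "EVTYPE")

# One-step form: cbind() sums both columns per event type in one call
one_step <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, toy, sum)

two_step
one_step
```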
We now have a data set of event types with the number of injuries and fatalities each has caused. Viewing it with View(HumanImpact) in RStudio shows that many event types have 0 for fatalities or injuries. So let's subset the data further, keeping only event types with at least one injury and at least one fatality.
HumanImpact <- subset(HumanImpact, (INJURIES > 0 & FATALITIES > 0))
We now want a combined total of injuries and fatalities to sort on.
HumanImpact$Combined <- HumanImpact$INJURIES + HumanImpact$FATALITIES
HumanImpact <- HumanImpact[order(-HumanImpact$Combined),]
Taking a look at the data, we can clearly see that tornados have quite the impact!
head(HumanImpact)
## EVTYPE INJURIES FATALITIES Combined
## 426 TORNADO 20667 1511 22178
## 81 EXCESSIVE HEAT 6391 1797 8188
## 102 FLOOD 6758 414 7172
## 224 LIGHTNING 4141 651 4792
## 434 TSTM WIND 3629 241 3870
## 98 FLASH FLOOD 1674 887 2561
Let's plot the data as a stacked bar chart, which is surprisingly hard! First we grab the top ten events and re-factor the event names.
plotdata <- HumanImpact[1:10, 1:4]
plotdata$EVTYPE <- factor(plotdata$EVTYPE)
We then need to reshape the data so that it can be drawn as a bar chart. barplot() needs a matrix for stacked bars, so first create a new data frame.
Injury <- c(plotdata$INJURIES)
Death <- c(plotdata$FATALITIES)
HumanDF <- data.frame(Injury, Death)
Then transpose the data frame with t() and apply the event names as column names so that the x-axis is properly labeled. The resulting HumanDF matrix will be used to plot the data in the results section.
HumanDF <- t(HumanDF)
colnames(HumanDF) <- plotdata$EVTYPE
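The reason for the transpose is that barplot() draws one bar per matrix column and stacks the rows within it. A minimal sketch on made-up numbers (toy and m are hypothetical names) shows the shape the plotting step expects:

```r
# Hypothetical toy data: one row per event type
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD"),
  INJURIES   = c(15, 3),
  FATALITIES = c(3, 1),
  stringsAsFactors = FALSE
)

# After t(), each event type is a column (one bar) and
# each measure is a row (one segment of the stack)
m <- t(as.matrix(toy[, c("INJURIES", "FATALITIES")]))
colnames(m) <- as.character(toy$EVTYPE)

m
dim(m)
```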
Now we move onto the second question, regarding economic consequences. First we need to take a look at the data:
head(WorkingData,1)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 248768 1 1996-01-06 08:00:00 PM CST 1 ALZ001>038 AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE
## 248768 WINTER STORM 0 1/7/1996 0:00:00
## END_TIME COUNTY_END COUNTYENDN END_RANGE END_AZI END_LOCATI
## 248768 03:00:00 PM 0 NA 0
## LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG
## 248768 0 0 NA 0 0 0 380 K 38
## CROPDMGEXP WFO STATEOFFIC
## 248768 K BMX ALABAMA, Central
## ZONENAMES
## 248768 LAUDERDALE - LAUDERDALE - COLBERT - FRANKLIN - LAWRENCE - LIMESTONE - MADISON - MORGAN - MARSHALL - JACKSON - DEKALB - MARION - LAMAR - FAYETTE - WINSTON - WALKER - CULLMAN - BLOUNT - ETOWAH - CALHOUN - CHEROKEE - CLEBURNE - PICKENS - TUSCALOOSA - JEFFERSON - SHELBY - ST. CLAIR - TALLADEGA - CLAY - RANDOLPH - SUMTER - GREENE - HALE - PERRY - BIBB - CHILTON - COOSA - TALLAPOOSA - CHAMBERS
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_
## 248768 0 0 0 0
## REMARKS
## 248768 A winter storm brought a mixture of freezing rain, sleet, and snow to the northern two-thirds of Alabama. Precipitation began as freezing rain and sleet but quickly changed to snow. The precipitation coated roads and caused serious travel problems across the northern sections of thestate that lasted into Monday morning (the 8th). Some higher elevations of the northeast corner of Alabama had travel problems into Tuesday. Amounts were generally light with the highest snowfall reported at Huntsville International Airport with 2 inches. Most other locations across North Alabama reported one-quarter of an inch to an inch and a half. On Sunday the 7th, one fatality occurred in an automobile/train collision in Calhoun County that was attributed to icy roads. The teenage driver of the car was not wearing a seat belt and was thrown from the vehicle.
## REFNUM
## 248768 248768
From this we can identify four columns that we need, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP, in addition to EVTYPE. So let's extract these columns into a new working variable called EconData.
EconData <- data.frame(factor(WorkingData$EVTYPE), WorkingData$PROPDMG, WorkingData$PROPDMGEXP, WorkingData$CROPDMG, WorkingData$CROPDMGEXP)
colnames(EconData) <- c("EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")
head(EconData)
## EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 WINTER STORM 380 K 38 K
## 2 TORNADO 100 K 0
## 3 TSTM WIND 3 K 0
## 4 TSTM WIND 5 K 0
## 5 TSTM WIND 2 K 0
## 6 HAIL 0 0
We can see lots of zero values, so let's filter these out. Note that the condition below keeps only rows that report both property damage and crop damage.
EconData <- subset(EconData, (PROPDMG > 0 & CROPDMG > 0))
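To make the effect of the & condition concrete, the sketch below applies it to four made-up rows (toy, both and either are hypothetical names) and compares it with a looser | condition. This only illustrates how the two filters differ; the report itself uses the & form.

```r
# Hypothetical toy rows covering each combination of zero and non-zero damage
toy <- data.frame(
  EVTYPE  = c("FLOOD", "HAIL", "TORNADO", "DROUGHT"),
  PROPDMG = c(200, 0, 100, 0),
  CROPDMG = c(25, 0, 0, 50),
  stringsAsFactors = FALSE
)

# The filter used in this report: both damage types must be non-zero
both <- subset(toy, PROPDMG > 0 & CROPDMG > 0)

# A looser alternative: at least one damage type is non-zero
either <- subset(toy, PROPDMG > 0 | CROPDMG > 0)

nrow(both)
nrow(either)
```

With the & form, events that damaged property but no crops (or the reverse) do not contribute to the totals, which is worth keeping in mind when reading the economic results.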
Now, according to the Storm Data Export document, the EXP columns hold a multiplier (if any) to apply to the damage amount, so let's map these codes to numbers using the mapvalues function from plyr and then convert them to numeric.
EconData$PROPDMGEXP <- as.character(EconData$PROPDMGEXP)
EconData$CROPDMGEXP <- as.character(EconData$CROPDMGEXP)
EconData$PROPDMGEXP <- mapvalues(EconData$PROPDMGEXP,c("K","M","B"),c(1000,1000000,1000000000))
EconData$CROPDMGEXP <- mapvalues(EconData$CROPDMGEXP,c("K","M","B"),c(1000,1000000,1000000000))
EconData$PROPDMGEXP <- as.numeric(EconData$PROPDMGEXP)
EconData$CROPDMGEXP <- as.numeric(EconData$CROPDMGEXP)
Now find the total value of the damage by multiplying the damage values by the multipliers in PROPDMGEXP and CROPDMGEXP.
EconData$PROPDMG <- EconData$PROPDMG * EconData$PROPDMGEXP
EconData$CROPDMG <- EconData$CROPDMG * EconData$CROPDMGEXP
head(EconData)
## EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 WINTER STORM 380000 1000 38000 1000
## 58 HAIL 2000 1000 1000 1000
## 59 HAIL 2000 1000 2000 1000
## 60 HAIL 15000 1000 10000 1000
## 62 HAIL 5000 1000 2000 1000
## 64 FLASH FLOOD 200000 1000 25000 1000
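The same multiplier step can be sketched in base R with a named lookup vector instead of mapvalues. The values below are made up for illustration (exp_codes, amounts, multiplier and scaled are hypothetical names), and the sketch assumes only the codes K, M and B occur; any other code would come back as NA, which makes unexpected codes easy to spot.

```r
# Hypothetical damage amounts and their exponent codes
exp_codes <- c("K", "M", "B", "K")
amounts   <- c(380, 1.5, 2, 0.5)

# Named vector acting as a lookup table from code to multiplier
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Index the lookup by code, then scale the amounts
scaled <- amounts * unname(multiplier[exp_codes])
scaled

# A code that is not in the lookup yields NA rather than a silent guess
unname(multiplier["H"])
```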
We will now repeat the same process used to prepare the injury data:
Convert EVTYPE to a factor again to drop the unused levels
Aggregate the damages by EVTYPE
Add up the property and crop damages to get a total
Sort by that total, largest first
Grab the top 10 items for plotting
EconData$EVTYPE <- factor(EconData$EVTYPE)
EconPlot <- aggregate(cbind(PROPDMG,CROPDMG) ~ EVTYPE, EconData, sum)
EconPlot$TotalDMG <- EconPlot$PROPDMG + EconPlot$CROPDMG
EconPlot <- EconPlot[order(-EconPlot$TotalDMG),]
EconPlot <- EconPlot[1:10, ]
Then we make a data frame consisting of PROPDMG and CROPDMG, transpose it, and then add the event names to the columns so they appear on the bar chart.
EconDF <- data.frame(EconPlot$PROPDMG,EconPlot$CROPDMG)
EconDF <- t(EconDF)
colnames(EconDF) <- EconPlot$EVTYPE
Plotting the data by event type, we can see tornados are the biggest threat with respect to population health.
par(mar=c(7,5,1,1))
barplot(as.matrix(HumanDF),cex.names = .6,col=c("black","red"),las = 2,ylab = "Count of Injuries/Fatalities", legend = c("Injuries","Fatalities"),mgp=c(3,.5,0))
Taking a look at the data, it is easy to see that tornados on their own are the biggest threat. However, a tornado is itself part of a severe storm, which could also bring lightning, flash flooding and thunderstorm winds, all of which appear in the top 10. So in general storms appear to be the biggest threat to life. They are difficult to prepare for because they are difficult to predict, and little planning is possible for a tornado. Most regions in “tornado alley” are well prepared and have warning systems in place for tornados.
As far as planning goes, excessive heat is one area where harm could actually be prevented and the human impact reduced. Limits on outdoor activity and required breaks for outdoor workers could lower this number. More analysis of the specifics in the data is required.
par(mar=c(7,7,1,1))
barplot(as.matrix(EconDF),cex.names = .6,col=c("black","red"),las = 2,ylab = "Nominal Sum of Damages USD$", legend = c("Property Damage","Crop Damage"),mgp=c(5,.5,0))
For economic consequences, flooding is the clear “winner” here. Flooding far exceeds the next two items, which in turn are often a cause of flooding themselves! Preparing for flooding really boils down to investigating flood zones and preventing or restricting building in those areas. Human nature being what it is, people will continue to live in flood zones, but specific precautions can be taken to prevent damage where possible.
The above report could be broken down some more. The specific areas that could be addressed are:
EVTYPE could be cleaned up even further: there are instances of “HURRICANE/TYPHOON” and “HURRICANE” which could/should be combined. This is debatable, as I filtered the dataset to post-1996 data, which should have corrected this. Taking a look at the unique values, we can see that HURRICANE stands alone; it could be corrected and then combined. Others could be combined as well (WINTER STORM & HEAVY SNOW). However, the decision to filter to post-1996 data was made to prevent this issue and gain a greater understanding of what is causing the greatest harm.
unique(EconData$EVTYPE)
## [1] WINTER STORM HAIL FLASH FLOOD
## [4] TORNADO TSTM WIND TSTM WIND/HAIL
## [7] HIGH WIND URBAN/SML STREAM FLD WILD/FOREST FIRE
## [10] Heavy Rain/High Surf FLOOD HEAVY RAIN
## [13] HURRICANE STORM SURGE River Flooding
## [16] DROUGHT HEAVY SNOW LIGHTNING
## [19] ICE STORM RIVER FLOOD Frost/Freeze
## [22] BLIZZARD EXTREME COLD TYPHOON
## [25] TROPICAL STORM GUSTY WINDS DUST STORM
## [28] FREEZE DRY MICROBURST SMALL HAIL
## [31] WILDFIRE HURRICANE/TYPHOON STRONG WIND
## [34] LANDSLIDE EXCESSIVE HEAT THUNDERSTORM WIND
## [37] FROST/FREEZE STORM SURGE/TIDE TSUNAMI
## 39 Levels: BLIZZARD DROUGHT DRY MICROBURST DUST STORM ... WINTER STORM
The REMARKS field could yield more information about specific events, such as the EXCESSIVE HEAT item. During my initial data exploration there appeared to be a wealth of information regarding the hows and whys of the injuries and deaths. This data could be explored to find the actual cause of death per EVTYPE, which would give a better idea of where to focus efforts when planning for these events.
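For the EVTYPE point above, a minimal sketch of what such a cleanup might look like is shown below, using a few of the labels from the unique() output. The merge rules are assumptions chosen for illustration only (for example, whether TYPHOON belongs with HURRICANE/TYPHOON is a judgment call), and evtype and clean are hypothetical names.

```r
# A few labels taken from the unique(EconData$EVTYPE) output
evtype <- c("HURRICANE", "HURRICANE/TYPHOON", "TYPHOON", "River Flooding",
            "RIVER FLOOD", "Frost/Freeze", "FROST/FREEZE", "TSTM WIND",
            "THUNDERSTORM WIND")

# Normalize case and surrounding whitespace first
clean <- toupper(trimws(evtype))

# Illustrative merge rules only; the right groupings need domain judgment
clean[clean %in% c("HURRICANE", "TYPHOON")] <- "HURRICANE/TYPHOON"
clean[clean == "TSTM WIND"]                 <- "THUNDERSTORM WIND"
clean[clean == "RIVER FLOODING"]            <- "RIVER FLOOD"

sort(unique(clean))
```

Applied before the aggregation step, a mapping like this would fold the variant spellings into a single total per event type.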