In this report we describe weather events in the USA using the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The goal is to identify which kinds of events are most dangerous to people (fatalities and injuries) and which cause the most economic damage (property and crop damage). We work with 902,297 events and find that tornadoes, tropical storms, and heat-related events cause, on average, more fatalities, injuries, and property and crop damage than other events.
We obtained the data from the U.S. National Oceanic and Atmospheric Administration (NOAA). The database tracks characteristics of major storms and weather events and gives estimates of fatalities, injuries, and property and crop damage.
We first read the data from a bz2-compressed file, which R can read directly. The decompressed raw data is a delimited file that uses commas to separate fields.
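# Create the data directory on first run
if (!file.exists("./data")) {
  dir.create("./data")
}
# Download the compressed file only if it is not already present
if (!file.exists("./data/StormData.csv.bz2")) {
  fileUrl1 <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(fileUrl1, destfile = "./data/StormData.csv.bz2", method = "curl")
}
# stringsAsFactors = TRUE keeps evtype a factor on R >= 4.0, so nlevels() below works
storm <- read.csv("./data/StormData.csv.bz2", stringsAsFactors = TRUE)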
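As a small illustration of the point that R reads bz2 files directly, the sketch below writes a toy two-row CSV through `bzfile()` and reads it back with a plain `read.csv()` call. The toy data and the temporary file are assumptions for illustration, not part of the storm data.

```r
# Toy data standing in for the storm file (hypothetical values)
toy <- data.frame(EVTYPE = c("HAIL", "TORNADO"), FATALITIES = c(0, 2))

# Write the CSV through a bzip2-compressing connection
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() detects the compression and decompresses on the fly
back <- read.csv(tmp)
print(back)
```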
The documentation lists 44 different event types, but the data contains 985. Because we are looking for the event types most harmful to population health and those with the greatest economic consequences, we do not reduce the number of evtype levels at this stage.
The variables of interest are evtype, fatalities, injuries, propdmg and cropdmg. We summarize these variables and then create a dataset containing only them.
dim(storm)
## [1] 902297 37
names(storm) <- tolower(names(storm))
nlevels(storm$evtype)
## [1] 985
# Many levels; we may have to clean the data down to the 44 documented levels,
# summing levels whose similar names denote the same event.
# No NAs, though, which is good.
summary(storm[,c("evtype","fatalities","injuries","propdmg","cropdmg")])
## evtype fatalities injuries propdmg
## HAIL :288661 Min. : 0 Min. : 0.0 Min. : 0
## TSTM WIND :219940 1st Qu.: 0 1st Qu.: 0.0 1st Qu.: 0
## THUNDERSTORM WIND: 82563 Median : 0 Median : 0.0 Median : 0
## TORNADO : 60652 Mean : 0 Mean : 0.2 Mean : 12
## FLASH FLOOD : 54277 3rd Qu.: 0 3rd Qu.: 0.0 3rd Qu.: 0
## FLOOD : 25326 Max. :583 Max. :1700.0 Max. :5000
## (Other) :170878
## cropdmg
## Min. : 0.0
## 1st Qu.: 0.0
## Median : 0.0
## Mean : 1.5
## 3rd Qu.: 0.0
## Max. :990.0
##
There are no missing values in these variables.
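The absence of an NA's row in the summary output is what supports this. A more direct check is to count NA values per column; the sketch below does so on a toy data frame whose column names mirror the report's lower-cased names (the values, including the one deliberate NA, are made up).

```r
# Toy stand-in for the storm data (hypothetical values, one NA on purpose)
toy <- data.frame(
  evtype     = c("TORNADO", "HAIL", "FLOOD"),
  fatalities = c(2, 0, NA),
  injuries   = c(10, 0, 1),
  propdmg    = c(25, 5, 50),
  cropdmg    = c(0, 1, 0)
)

# Count missing values in each column; all zeros would mean nothing is missing
na_counts <- colSums(is.na(toy))
print(na_counts)
```

On the real data the same call would be applied to the five variables of interest in `storm`.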
We create a new dataset containing only the variables of interest.
storm1 <- storm[,c("evtype","fatalities","injuries",
"propdmg","cropdmg")]
Our first step is to summarize the data to find out which types of events are the most harmful. For this we create two data frames: one with the totals of fatalities, injuries, property damage and crop damage by event type, and another with the averages.
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
grouped <- group_by(storm1,evtype)
totales <- summarise(grouped,
fatalities = sum(fatalities),
injuries = sum(injuries),
propdmg = sum(propdmg),
cropdmg = sum(cropdmg),
Nevent =n()
)
medias <- summarise(grouped,
fatalities = mean(fatalities),
injuries = mean(injuries),
propdmg = mean(propdmg),
cropdmg = mean(cropdmg),
Nevent =n()
)
We show the top 10 event types for each measure.
# fatalities
(fatalitiesTot <- arrange(totales, desc(fatalities))[,c("evtype","fatalities","Nevent")])
## Source: local data frame [985 x 3]
##
## evtype fatalities Nevent
## 1 TORNADO 5633 60652
## 2 EXCESSIVE HEAT 1903 1678
## 3 FLASH FLOOD 978 54277
## 4 HEAT 937 767
## 5 LIGHTNING 816 15754
## 6 TSTM WIND 504 219940
## 7 FLOOD 470 25326
## 8 RIP CURRENT 368 470
## 9 HIGH WIND 248 20212
## 10 AVALANCHE 224 386
## .. ... ... ...
(fatalitiesMean <- arrange(medias, desc(fatalities))[,c("evtype","fatalities","Nevent")])
## Source: local data frame [985 x 3]
##
## evtype fatalities Nevent
## 1 TORNADOES, TSTM WIND, HAIL 25.000 1
## 2 COLD AND SNOW 14.000 1
## 3 TROPICAL STORM GORDON 8.000 1
## 4 RECORD/EXCESSIVE HEAT 5.667 3
## 5 EXTREME HEAT 4.364 22
## 6 HEAT WAVE DROUGHT 4.000 1
## 7 HIGH WIND/SEAS 4.000 1
## 8 MARINE MISHAP 3.500 2
## 9 WINTER STORMS 3.333 3
## 10 Heavy surf and wind 3.000 1
## .. ... ... ...
By total fatalities, the most harmful events are tornadoes and excessive heat. By mean fatalities per event, the top ranks go to a combined tornado, TSTM wind and hail event, to cold and snow, and to the individual tropical storm named Gordon, each recorded only once. This indicates that tornadoes and heat events are common causes of weather-related death.
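The mean ranking above is led by event types with Nevent = 1, so a single record decides the average. One possible way to make the means more comparable is to rank only event types with a minimum number of records. The sketch below shows this on made-up data in base R; the threshold `min_events` is a hypothetical choice, not something used in this report.

```r
# Toy data: one rare, deadly event type and two frequent ones (hypothetical values)
toy <- data.frame(
  evtype     = c("TORNADO", "TORNADO", "TORNADO", "COLD AND SNOW", "HEAT", "HEAT"),
  fatalities = c(3, 0, 9, 14, 4, 2)
)
min_events <- 2  # hypothetical threshold

# Mean fatalities and number of records per event type
means  <- tapply(toy$fatalities, toy$evtype, mean)
counts <- tapply(toy$fatalities, toy$evtype, length)
stats  <- data.frame(
  evtype     = names(means),
  fatalities = as.vector(means),
  Nevent     = as.vector(counts)
)

# Keep only types with enough records, then sort by mean fatalities
frequent <- stats[stats$Nevent >= min_events, ]
frequent <- frequent[order(-frequent$fatalities), ]
print(frequent)
```

With this filter the single COLD AND SNOW record no longer outranks the frequent event types.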
(injuriesTot <- arrange(totales, desc(injuries))[,c("evtype","injuries","Nevent")])
## Source: local data frame [985 x 3]
##
## evtype injuries Nevent
## 1 TORNADO 91346 60652
## 2 TSTM WIND 6957 219940
## 3 FLOOD 6789 25326
## 4 EXCESSIVE HEAT 6525 1678
## 5 LIGHTNING 5230 15754
## 6 HEAT 2100 767
## 7 ICE STORM 1975 2006
## 8 FLASH FLOOD 1777 54277
## 9 THUNDERSTORM WIND 1488 82563
## 10 HAIL 1361 288661
## .. ... ... ...
(injuriesMean <- arrange(medias, desc(injuries))[,c("evtype","injuries","Nevent")])
## Source: local data frame [985 x 3]
##
## evtype injuries Nevent
## 1 Heat Wave 70.00 1
## 2 TROPICAL STORM GORDON 43.00 1
## 3 WILD FIRES 37.50 4
## 4 THUNDERSTORMW 27.00 1
## 5 HIGH WIND AND SEAS 20.00 1
## 6 SNOW/HIGH WINDS 18.00 2
## 7 GLAZE/ICE STORM 15.00 1
## 8 HEAT WAVE DROUGHT 15.00 1
## 9 WINTER STORM HIGH WINDS 15.00 1
## 10 HURRICANE/TYPHOON 14.49 88
## .. ... ... ...
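Again, tornadoes lead by a wide margin in total injuries, followed by TSTM WIND (an abbreviation of thunderstorm wind), floods and excessive heat. By mean injuries per event, the top ranks go to event types recorded only once or a few times, such as Heat Wave and TROPICAL STORM GORDON.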
(propdmgTot <- arrange(totales, desc(propdmg))[,c("evtype","propdmg","Nevent")])
## Source: local data frame [985 x 3]
##
## evtype propdmg Nevent
## 1 TORNADO 3212258 60652
## 2 FLASH FLOOD 1420125 54277
## 3 TSTM WIND 1335966 219940
## 4 FLOOD 899938 25326
## 5 THUNDERSTORM WIND 876844 82563
## 6 HAIL 688693 288661
## 7 LIGHTNING 603352 15754
## 8 THUNDERSTORM WINDS 446293 20843
## 9 HIGH WIND 324732 20212
## 10 WINTER STORM 132721 11433
## .. ... ... ...
(propdmgMean <- arrange(medias, desc(propdmg))[,c("evtype","propdmg","Nevent")])
## Source: local data frame [985 x 3]
##
## evtype propdmg Nevent
## 1 COASTAL EROSION 766 1
## 2 HEAVY RAIN AND FLOOD 600 1
## 3 RIVER AND STREAM FLOOD 600 2
## 4 Landslump 570 1
## 5 BLIZZARD/WINTER STORM 500 1
## 6 FLASH FLOOD/ 500 1
## 7 FLASH FLOODING/THUNDERSTORM WI 500 1
## 8 FLOOD/RIVER FLOOD 500 1
## 9 FROST\\FREEZE 500 1
## 10 HEAVY PRECIPITATION 500 1
## .. ... ... ...
By total property damage, tornadoes and floods are the worst events, while by mean the worst are coastal erosion and heavy rain and flood.
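Note that the propdmg values ranked above are raw magnitudes. In the NOAA file each one is paired with a magnitude code in the propdmgexp column (name after lower-casing), where K, M and B denote thousands, millions and billions. The sketch below converts toy values to dollars; treating every other code as a multiplier of 1 is an assumption made here for simplicity.

```r
# Toy rows (hypothetical values); propdmgexp holds the magnitude code
toy <- data.frame(
  propdmg    = c(25, 2.5, 1, 10),
  propdmgexp = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Documented magnitude codes and their multipliers
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each code; codes not in the table give NA
mult <- unname(multiplier[toupper(toy$propdmgexp)])
mult[is.na(mult)] <- 1  # assumption: unknown or empty code means no scaling

toy$propdmg_usd <- toy$propdmg * mult
print(toy)
```

The same lookup would apply to cropdmg with its cropdmgexp column.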
(cropdmgTot <- arrange(totales, desc(cropdmg))[,c("evtype","cropdmg","Nevent")])
## Source: local data frame [985 x 3]
##
## evtype cropdmg Nevent
## 1 HAIL 579596 288661
## 2 FLASH FLOOD 179200 54277
## 3 FLOOD 168038 25326
## 4 TSTM WIND 109203 219940
## 5 TORNADO 100019 60652
## 6 THUNDERSTORM WIND 66791 82563
## 7 DROUGHT 33899 2488
## 8 THUNDERSTORM WINDS 18685 20843
## 9 HIGH WIND 17283 20212
## 10 HEAVY RAIN 11123 11723
## .. ... ... ...
(cropdmgMean <- arrange(medias, desc(cropdmg))[,c("evtype","cropdmg","Nevent")])
## Source: local data frame [985 x 3]
##
## evtype cropdmg Nevent
## 1 DUST STORM/HIGH WINDS 500.0 1
## 2 FOREST FIRES 500.0 1
## 3 TROPICAL STORM GORDON 500.0 1
## 4 HIGH WINDS/COLD 401.0 5
## 5 HURRICANE FELIX 250.0 2
## 6 River Flooding 241.4 5
## 7 WINTER STORMS 166.7 3
## 8 EXCESSIVE WETNESS 142.0 1
## 9 Frost/Freeze 100.0 1
## 10 TYPHOON 75.0 11
## .. ... ... ...
By total crop damage, the worst event is hail, which is also very common (288,661 records), followed by floods and TSTM WIND (thunderstorm wind). The means again highlight individual events with high crop damage.
We plot the totals rather than the means, because the means are dominated by event types recorded only once or a few times.
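par(mfrow = c(2, 2))  # 2 x 2 grid of panels
tam <- 0.7            # text scaling shared by the four charts
top <- 1:10           # rows to plot: the ten largest totals
# $ extraction gives dotchart() plain vectors even when the tables are dplyr tbl_df objects
dotchart(x = fatalitiesTot$fatalities[top], labels = as.character(fatalitiesTot$evtype[top]),
         pch = 19, main = "Fatalities by event", xlab = "Total deaths", cex = tam)
dotchart(x = injuriesTot$injuries[top], labels = as.character(injuriesTot$evtype[top]),
         pch = 19, main = "Injuries by event", xlab = "Total injuries", cex = tam)
# Damage totals are raw propdmg/cropdmg sums, not scaled by propdmgexp/cropdmgexp
dotchart(x = propdmgTot$propdmg[top], labels = as.character(propdmgTot$evtype[top]),
         pch = 19, main = "Property damage by event", xlab = "Total propdmg (unscaled)", cex = tam)
dotchart(x = cropdmgTot$cropdmg[top], labels = as.character(cropdmgTot$evtype[top]),
         pch = 19, main = "Crop damage by event", xlab = "Total cropdmg (unscaled)", cex = tam)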
As the figure shows, tornadoes are the most harmful events for population health and property damage. For farmers, hail is the greatest concern.
Heat-related events rank second (EXCESSIVE HEAT) and fourth (HEAT) by fatalities. Floods rank third in deaths and in injuries, and second in property and crop damage.
To sharpen the analysis, the database needs cleaning, especially the EVTYPE variable: it has many distinct values, and those that refer to the same event should be merged.
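A minimal sketch of what such a cleaning step could look like follows. It normalizes case and whitespace and then merges labels with regular expressions. The grouping rules and target names are hypothetical choices for illustration, not the official NOAA event types, and a real mapping would need many more rules.

```r
# A few raw labels of the kind seen in the tables above
raw <- c("TSTM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS",
         " Heat Wave", "EXCESSIVE HEAT", "FLASH FLOOD/", "HAIL")

# Step 1: normalize case and surrounding whitespace
clean <- toupper(trimws(raw))

# Step 2: merge labels that refer to the same kind of event
# (hypothetical rules; labels matching no rule keep their cleaned name)
group <- clean
group[grepl("TSTM|THUNDERSTORM", clean)] <- "THUNDERSTORM WIND"
group[grepl("HEAT", clean)]              <- "HEAT"
group[grepl("FLOOD", clean)]             <- "FLOOD"

print(table(group))
```

Totals and means would then be computed by `group` instead of the raw evtype.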