Synopsis

In this report we describe weather events in the USA using the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The goal is to identify which kinds of events are most dangerous for people (fatalities and injuries) and which cause the most damage in economic terms (property and crop damage). We worked with 902,297 events and found that tornadoes, tropical storms and heat-related events cause (on average) more fatalities, injuries, and property and crop damage than other events.

Data processing

We obtained the data from the U.S. National Oceanic and Atmospheric Administration (NOAA). This database tracks characteristics of major storms and weather events, and gives estimates of fatalities, injuries, and property and crop damage.

Download and load data into R

We first read the data from a bz2-compressed file. R can read this kind of file directly. The raw (decompressed) data is a delimited file that uses commas to separate fields.

if(!file.exists("./data")){
    dir.create("./data")
}

if(!file.exists("./data/StormData.csv.bz2")){
    fileUrl1 <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
    
    download.file(fileUrl1, destfile ="./data/StormData.csv.bz2" ,method = "curl" )
    
}
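# stringsAsFactors = TRUE keeps evtype a factor (the default became FALSE in R 4.0.0)
storm <- read.csv("./data/StormData.csv.bz2", stringsAsFactors = TRUE)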
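
As a quick check that R reads bz2 files without manual decompression, the sketch below writes a tiny compressed file and reads it back. The file contents and the names `tmp` and `toy` are made up for illustration and are not part of the storm data.

```r
# Write a small bz2-compressed CSV to a temporary file (illustrative data only)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "HAIL,0"), con)
close(con)

# read.csv() detects the compression and reads the file directly
toy <- read.csv(tmp)
print(toy)

# Remove the temporary file
unlink(tmp)
```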

The documentation lists 44 different event types, but the data contains 985 distinct evtype values. Because we are only looking for the event types that are most harmful to population health and those with the greatest economic consequences, we do not reduce the number of evtype levels at this stage.

The variables of interest are evtype, fatalities, injuries, propdmg and cropdmg. We summarize these variables and create a dataset containing only them.

dim(storm)
## [1] 902297     37
names(storm) <- tolower(names(storm))
nlevels(storm$evtype)
## [1] 985
# a lot of levels; perhaps we'll have to clean the data down to the 44 documented levels,
# summing entries with similar names that indicate the same event

# no NA's, good
summary(storm[,c("evtype","fatalities","injuries","propdmg","cropdmg")])
##                evtype         fatalities     injuries         propdmg    
##  HAIL             :288661   Min.   :  0   Min.   :   0.0   Min.   :   0  
##  TSTM WIND        :219940   1st Qu.:  0   1st Qu.:   0.0   1st Qu.:   0  
##  THUNDERSTORM WIND: 82563   Median :  0   Median :   0.0   Median :   0  
##  TORNADO          : 60652   Mean   :  0   Mean   :   0.2   Mean   :  12  
##  FLASH FLOOD      : 54277   3rd Qu.:  0   3rd Qu.:   0.0   3rd Qu.:   0  
##  FLOOD            : 25326   Max.   :583   Max.   :1700.0   Max.   :5000  
##  (Other)          :170878                                                
##     cropdmg     
##  Min.   :  0.0  
##  1st Qu.:  0.0  
##  Median :  0.0  
##  Mean   :  1.5  
##  3rd Qu.:  0.0  
##  Max.   :990.0  
## 

There are no missing values in these variables.
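
The absence of an NA row in the summary can also be confirmed with an explicit count. The sketch below uses a made-up data frame, `toy`, with one missing value inserted on purpose. On the real data the same call would be applied to the five columns of interest in `storm`.

```r
# Toy stand-in for the storm data (values are made up)
toy <- data.frame(
  evtype     = c("TORNADO", "HAIL", "FLOOD"),
  fatalities = c(3, 0, NA),
  injuries   = c(10, 0, 2)
)

# Count missing values per column; all zeros would mean there are no NA's
na_counts <- colSums(is.na(toy))
print(na_counts)
```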

We create a new dataset containing only the variables of interest.

storm1 <- storm[,c("evtype","fatalities","injuries",
                         "propdmg","cropdmg")]

Results

Our first step is to summarize the data to find out which types of events are the most harmful. For this we create two data frames: one with the totals of fatalities, injuries, property damage and crop damage by event type, and another with the averages.

library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
grouped <- group_by(storm1,evtype)

totales <- summarise(grouped, 
                        fatalities = sum(fatalities),
                        injuries = sum(injuries),
                        propdmg = sum(propdmg),
                        cropdmg = sum(cropdmg),
                        Nevent =n()
                        )


medias <- summarise(grouped, 
                        fatalities = mean(fatalities),
                        injuries = mean(injuries),
                        propdmg = mean(propdmg),
                        cropdmg = mean(cropdmg),
                        Nevent =n()
                        )

Sum and means of fatalities

We show the 10 highest values.

# fatalities
(fatalitiesTot <- arrange(totales, desc(fatalities))[,c("evtype","fatalities","Nevent")])
## Source: local data frame [985 x 3]
## 
##            evtype fatalities Nevent
## 1         TORNADO       5633  60652
## 2  EXCESSIVE HEAT       1903   1678
## 3     FLASH FLOOD        978  54277
## 4            HEAT        937    767
## 5       LIGHTNING        816  15754
## 6       TSTM WIND        504 219940
## 7           FLOOD        470  25326
## 8     RIP CURRENT        368    470
## 9       HIGH WIND        248  20212
## 10      AVALANCHE        224    386
## ..            ...        ...    ...
(fatalitiesMean <- arrange(medias, desc(fatalities))[,c("evtype","fatalities","Nevent")])
## Source: local data frame [985 x 3]
## 
##                        evtype fatalities Nevent
## 1  TORNADOES, TSTM WIND, HAIL     25.000      1
## 2               COLD AND SNOW     14.000      1
## 3       TROPICAL STORM GORDON      8.000      1
## 4       RECORD/EXCESSIVE HEAT      5.667      3
## 5                EXTREME HEAT      4.364     22
## 6           HEAT WAVE DROUGHT      4.000      1
## 7              HIGH WIND/SEAS      4.000      1
## 8               MARINE MISHAP      3.500      2
## 9               WINTER STORMS      3.333      3
## 10        Heavy surf and wind      3.000      1
## ..                        ...        ...    ...

In total, the most harmful events are tornadoes and excessive heat. On average, the highest values belong to a combined tornado record (TORNADOES, TSTM WIND, HAIL), cold and snow, and one particular tropical storm named Gordon, each of which was recorded only once. This indicates that tornadoes and heat events are a common cause of weather-related deaths.
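
Because the highest means come from event types with a single record, it can help to look at means only for frequently recorded types. The sketch below recomputes means from four rows copied from the tables above. The cutoff `min_events` is a hypothetical choice for illustration, not something used in this analysis.

```r
# Totals copied from the tables above (four event types only)
tot <- data.frame(
  evtype     = c("TORNADO", "EXCESSIVE HEAT",
                 "TORNADOES, TSTM WIND, HAIL", "COLD AND SNOW"),
  fatalities = c(5633, 1903, 25, 14),
  Nevent     = c(60652, 1678, 1, 1)
)
tot$mean_fatalities <- tot$fatalities / tot$Nevent

# Hypothetical cutoff: ignore event types recorded fewer than 100 times
min_events <- 100
frequent <- tot[tot$Nevent >= min_events, ]

# Sort the remaining types by mean fatalities, highest first
frequent <- frequent[order(-frequent$mean_fatalities), ]
print(frequent)
```

With the single-record types excluded, excessive heat ranks above tornadoes in deaths per event, even though tornadoes have the larger total.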

Sum and means of injuries

(injuriesTot <- arrange(totales, desc(injuries))[,c("evtype","injuries","Nevent")])
## Source: local data frame [985 x 3]
## 
##               evtype injuries Nevent
## 1            TORNADO    91346  60652
## 2          TSTM WIND     6957 219940
## 3              FLOOD     6789  25326
## 4     EXCESSIVE HEAT     6525   1678
## 5          LIGHTNING     5230  15754
## 6               HEAT     2100    767
## 7          ICE STORM     1975   2006
## 8        FLASH FLOOD     1777  54277
## 9  THUNDERSTORM WIND     1488  82563
## 10              HAIL     1361 288661
## ..               ...      ...    ...
(injuriesMean <- arrange(medias, desc(injuries))[,c("evtype","injuries","Nevent")])
## Source: local data frame [985 x 3]
## 
##                     evtype injuries Nevent
## 1                Heat Wave    70.00      1
## 2    TROPICAL STORM GORDON    43.00      1
## 3               WILD FIRES    37.50      4
## 4            THUNDERSTORMW    27.00      1
## 5       HIGH WIND AND SEAS    20.00      1
## 6          SNOW/HIGH WINDS    18.00      2
## 7          GLAZE/ICE STORM    15.00      1
## 8        HEAT WAVE DROUGHT    15.00      1
## 9  WINTER STORM HIGH WINDS    15.00      1
## 10       HURRICANE/TYPHOON    14.49     88
## ..                     ...      ...    ...

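Again, tornadoes, heat events and tropical storms are among the most harmful, both in total and in mean number of injuries. Note that TSTM WIND is an abbreviation of thunderstorm wind, so it refers to the same kind of event as THUNDERSTORM WIND rather than to tropical storms.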

Sum and means of property damage

(propdmgTot <- arrange(totales, desc(propdmg))[,c("evtype","propdmg","Nevent")])
## Source: local data frame [985 x 3]
## 
##                evtype propdmg Nevent
## 1             TORNADO 3212258  60652
## 2         FLASH FLOOD 1420125  54277
## 3           TSTM WIND 1335966 219940
## 4               FLOOD  899938  25326
## 5   THUNDERSTORM WIND  876844  82563
## 6                HAIL  688693 288661
## 7           LIGHTNING  603352  15754
## 8  THUNDERSTORM WINDS  446293  20843
## 9           HIGH WIND  324732  20212
## 10       WINTER STORM  132721  11433
## ..                ...     ...    ...
(propdmgMean <- arrange(medias, desc(propdmg))[,c("evtype","propdmg","Nevent")])
## Source: local data frame [985 x 3]
## 
##                            evtype propdmg Nevent
## 1                 COASTAL EROSION     766      1
## 2            HEAVY RAIN AND FLOOD     600      1
## 3          RIVER AND STREAM FLOOD     600      2
## 4                       Landslump     570      1
## 5           BLIZZARD/WINTER STORM     500      1
## 6                    FLASH FLOOD/     500      1
## 7  FLASH FLOODING/THUNDERSTORM WI     500      1
## 8               FLOOD/RIVER FLOOD     500      1
## 9                   FROST\\FREEZE     500      1
## 10            HEAVY PRECIPITATION     500      1
## ..                            ...     ...    ...

For total property damage, tornadoes and floods are the worst events, while on average the worst are coastal erosion and heavy rain and flood.

Sum and means of crop damage

(cropdmgTot <- arrange(totales, desc(cropdmg))[,c("evtype","cropdmg","Nevent")])
## Source: local data frame [985 x 3]
## 
##                evtype cropdmg Nevent
## 1                HAIL  579596 288661
## 2         FLASH FLOOD  179200  54277
## 3               FLOOD  168038  25326
## 4           TSTM WIND  109203 219940
## 5             TORNADO  100019  60652
## 6   THUNDERSTORM WIND   66791  82563
## 7             DROUGHT   33899   2488
## 8  THUNDERSTORM WINDS   18685  20843
## 9           HIGH WIND   17283  20212
## 10         HEAVY RAIN   11123  11723
## ..                ...     ...    ...
(cropdmgMean <- arrange(medias, desc(cropdmg))[,c("evtype","cropdmg","Nevent")])
## Source: local data frame [985 x 3]
## 
##                   evtype cropdmg Nevent
## 1  DUST STORM/HIGH WINDS   500.0      1
## 2           FOREST FIRES   500.0      1
## 3  TROPICAL STORM GORDON   500.0      1
## 4        HIGH WINDS/COLD   401.0      5
## 5        HURRICANE FELIX   250.0      2
## 6         River Flooding   241.4      5
## 7          WINTER STORMS   166.7      3
## 8      EXCESSIVE WETNESS   142.0      1
## 9           Frost/Freeze   100.0      1
## 10               TYPHOON    75.0     11
## ..                   ...     ...    ...

For total crop damage the worst event is hail, which is also a very common event (288,661 records). It is followed by floods and by storm-related events such as thunderstorm wind (TSTM WIND). The means pick out particular events with high crop damage.

Figure

We plot the totals instead of the means, because the highest means come mostly from event types that were recorded only once or a few times.

par(mfrow = c(2, 2))  # 2 x 2 grid of panels

tam <- 0.7  # text size shared by the four panels

# `$` extracts plain vectors: dotchart() needs a numeric vector, not a data frame
dotchart(x = fatalitiesTot$fatalities[1:10], labels = as.character(fatalitiesTot$evtype[1:10]),
         pch = 19, main = "Fatalities by event", xlab = "Total deaths", cex = tam)

dotchart(x = injuriesTot$injuries[1:10], labels = as.character(injuriesTot$evtype[1:10]),
         pch = 19, main = "Injuries by event", xlab = "Total injuries", cex = tam)

dotchart(x = propdmgTot$propdmg[1:10], labels = as.character(propdmgTot$evtype[1:10]),
         pch = 19, main = "Property damage by event", xlab = "Total, in $", cex = tam)

dotchart(x = cropdmgTot$cropdmg[1:10], labels = as.character(cropdmgTot$evtype[1:10]),
         pch = 19, main = "Crop damage by event", xlab = "Total, in $", cex = tam)

[Figure: dot charts of the 10 event types with the highest totals of fatalities, injuries, property damage and crop damage]

As the figure shows, tornadoes are the most harmful events for population health and for property damage. For farmers, hail is the greatest concern.

Heat-related events rank second (EXCESSIVE HEAT) and fourth (HEAT) in fatalities. Floods rank third in deaths (FLASH FLOOD) and in injuries (FLOOD), and flash floods rank second in both property and crop damage.

Next steps

To sharpen the analysis we have to clean the database, especially the EVTYPE variable. There are many different EVTYPE values, and we must merge those that refer to the same event.
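
A minimal sketch of what such cleaning could look like is given below, using a few raw labels taken from the tables above. The helper `clean_evtype` and its four rules are assumptions for illustration. A real cleaning step would need many more rules and a decision about which labels count as the same event (for example, whether HEAT and HEAT WAVE should be merged).

```r
# A few raw labels taken from the tables above
raw <- c("TSTM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS",
         "Heat Wave", "HEAT", "FLASH FLOOD/")

# Hypothetical helper: normalise labels with a handful of simple rules
clean_evtype <- function(x) {
  x <- toupper(trimws(x))                # uniform case, no outer spaces
  x <- gsub("[/ ]+$", "", x)             # drop trailing slashes or spaces
  x <- gsub("^TSTM", "THUNDERSTORM", x)  # expand the TSTM abbreviation
  x <- gsub("WINDS$", "WIND", x)         # use the singular form
  x
}

cleaned <- clean_evtype(raw)
print(table(cleaned))
```

Here the three thunderstorm wind spellings collapse into one label, so their totals could then be summed as a single event type.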