Analysis of US Weather Events

Which events are harmful to people and have the biggest economic consequences

## Loading libraries
library(dplyr)
library(lubridate)

Synopsis

Storms are severe weather events that harm people and damage buildings. It would be of great value if we could predict weather, and especially storms. Before prediction, a precise analysis of historical weather data may give us some insight into the damage and loss caused by severe events. Using the National Weather Storm Data Documentation provided by the U.S. National Oceanic and Atmospheric Administration (NOAA, www.noaa.gov), we will answer two questions:

  • Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  • Across the United States, which types of events have the greatest economic consequences?

Data processing and correlated problems

The data come as a .bz2 archive, which can be downloaded at: http://www1.ncdc.noaa.gov/pub/data/swdi/stormevents/csvfiles/

Additional Information about the content of the file can be seen at:

Reading data from file

   URL <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
   download.file(URL, destfile = "test.bz2", method="curl")
   storm<-read.csv("test.bz2")
   ## Number of distinct event types named in the database
    evlen<- length(unique(storm$EVTYPE))
   ## Fatalities and injuries are summarized in two columns of the database named: FATALITIES, INJURIES
   
   ## reduce dataframe to necessary variables
    storm1<-storm[,c(1,2,7,8,23:28)]
   ## Change BGN_DATE from factor to date (values are stored as month/day/year)
    storm1$BGN_DATE<-as.Date(storm1$BGN_DATE, format="%m/%d/%Y")
   ## Extract year
    storm1$year<-year(storm1$BGN_DATE)

Problems:

Anyone reading this analysis needs to understand the problems that arise from the quality of the data. There are 985 types of events in the file.

The data are very inconsistent and are spread over 61 years. Because the value of money changes over such a period, economic figures from different years are difficult to compare directly. Instead of converting the economic consequences to a common base, I will use another approach.

For each question, the top 10 events of each year in each category (harm to people, economic consequences) are extracted.

The event types behind these “top events” are then classified.

Data processing for the first question (harm to people)

Harm to people is recorded as “Fatalities” and “Injuries”. If the two were strongly correlated with each other, it would be enough to analyze one of them.

                ## Plotting FATALITIES versus INJURIES
                plot(storm1$INJURIES,storm1$FATALITIES,main="Correlation between INJURIES and FATALITIES", xlab="INJURIES", ylab="FATALITIES")

                ## Correlation
                cr<- cor(storm1$INJURIES,storm1$FATALITIES)

The correlation is 0.3216808, so injuries and fatalities are only weakly correlated. Consequently, I will look at both effects separately.
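
To put the coefficient in perspective, here is a short sketch that uses only the value reported above. Squaring the correlation coefficient gives the share of variance that a linear fit of one variable on the other would explain. The variable names are illustrative.

```r
# Correlation coefficient reported in the text
r <- 0.3216808

# Share of variance in one variable explained by a linear fit on the other
r_squared <- r^2
round(r_squared, 3)
## [1] 0.103
```

Roughly 10 percent of explained variance supports treating the two measures separately.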

First for INJURIES

        ## Split by year
                str_sph<-split(storm1,storm1$year)
        ## order by INJURIES
                str_spordh<-lapply(str_sph,function(x) x[order(-x$INJURIES),])
        ## extract first 10 events (with highest number of injuries)
                str_spord10h<-lapply(str_spordh, function(x) x[1:10,])
        ## Concatenate split dataframes
                storm_eventh<-do.call(rbind,str_spord10h)
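
The split, order, extract and concatenate steps can be tried on a small example. The following is a minimal sketch on made-up data, not the storm database; the helper name `top_n_per_year` and the toy values are assumptions for illustration only.

```r
# Toy data: hypothetical values, not taken from the storm database
toy <- data.frame(
  year     = c(1950, 1950, 1950, 1951, 1951),
  EVTYPE   = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "TORNADO"),
  INJURIES = c(5, 0, 12, 3, 7),
  stringsAsFactors = FALSE
)

# Keep the n rows with the highest value of `column` within each year
top_n_per_year <- function(df, column, n) {
  parts <- split(df, df$year)
  parts <- lapply(parts, function(x) {
    x <- x[order(-x[[column]]), ]
    # Unlike x[1:n, ], head() adds no NA rows when a year has fewer than n rows
    head(x, n)
  })
  do.call(rbind, parts)
}

top2 <- top_n_per_year(toy, "INJURIES", 2)
top2$INJURIES
## [1] 12  5  7  3
```

Using `head()` instead of a fixed index range is a small design choice: it only matters for years that contain fewer events than the requested number.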

What do the causing events in this subset look like? `r table(droplevels(storm_eventh$EVTYPE))`

Most of them seem to be “wind related”, so an extra column is added that marks wind-related events.

## make extra column marking all wind related events (EVTYPE contains STORM, TORN or HURR)
storm_eventh$type<-ifelse(grepl("STORM|TORN|HURR",storm_eventh$EVTYPE),"windrelated","")
## Sum of windrelated events
s10_inj<-sum(storm_eventh$type=="windrelated")
## Total number of top 10 events and share of windrelated events in percent
l10_inj<-nrow(storm_eventh)
l10p_inj<-round((s10_inj/l10_inj) *100)
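
To make the classification rule concrete, the sketch below applies the same three substrings to a handful of event names. The selection of names is illustrative; the single alternation pattern is an equivalent way of writing the three `grepl()` calls combined with `|`.

```r
# Example event names (illustrative selection)
ev <- c("TORNADO", "THUNDERSTORM WIND", "HURRICANE/TYPHOON",
        "ICE STORM", "EXCESSIVE HEAT", "HIGH WIND")

# One alternation pattern is equivalent to OR-ing three separate grepl() calls
is_wind <- grepl("STORM|TORN|HURR", ev)
setNames(is_wind, ev)
```

Because the rule is a plain substring match, "ICE STORM" is counted as wind related while "HIGH WIND" is not. This should be kept in mind when reading the percentages.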

# The events extracted above are now ordered by decreasing injuries
inj_order<-storm_eventh[order(-storm_eventh$INJURIES),]
# First 50 are extracted
inj_order<-inj_order[1:50,]

Results for INJURIES

Of the top 10 events by injuries in each year, 499 are wind related.

The total number of top 10 events is 620, corresponding to a rate of 80 percent.

The 50 severe weather events with the most injured persons are caused by:

table(droplevels(inj_order$EVTYPE))
## 
##    EXCESSIVE HEAT              HEAT HURRICANE/TYPHOON         ICE STORM 
##                 3                 4                 1                 1 
##           TORNADO 
##                41

Second for Fatalities

        ## order by FATALITIES
                str_spordhf<-lapply(str_sph,function(x) x[order(-x$FATALITIES),])
        ## extract first 10 events ( with highest number of fatalities)
                str_spord10hf<-lapply(str_spordhf, function(x) x[1:10,])
        ## Concatenate split dataframes
                storm_eventhf<-do.call(rbind,str_spord10hf)
        
## make extra column marking all wind related events
storm_eventhf$type<-ifelse(grepl("STORM|TORN|HURR",storm_eventhf$EVTYPE),"windrelated","")
## Sum of windrelated events
s10_fat<-sum(storm_eventhf$type=="windrelated")
l10_fat<-nrow(storm_eventhf)
l10p_fat<-round((s10_fat/l10_fat) *100)

# Order the fatality top 10 events by decreasing fatalities
fat_order<-storm_eventhf[order(-storm_eventhf$FATALITIES),]
# First 50 are extracted
fat_order<-fat_order[1:50,]

Results for FATALITIES

Of the top 10 events by fatalities in each year, 396 are wind related.

The total number of top 10 events is 620, corresponding to a rate of 64 percent.

The 50 severe weather events with the most fatalities are caused by:

table(droplevels(fat_order$EVTYPE))
## 
##                 BLIZZARD           EXCESSIVE HEAT                     HEAT 
##                        1                        3                        1 
##               HEAVY SNOW MARINE THUNDERSTORM WIND             RIP CURRENTS 
##                        1                        1                        1 
##        THUNDERSTORM WIND                  TORNADO 
##                        1                       41

Data processing for the second question (economic consequences)

## Function for dealing with PROPDMG and PROPDMGEXP
## Only k, m and b exponents are translated, because the question is about the most harmful events;
## any other exponent leaves PROPDMG unchanged
## The function translates PROPDMGEXP into a multiplier and multiplies it with PROPDMG, giving "Total Damage" $tdmg
        exfun<-function(x,y){
            if (x=="k"|x=="K"){
                y*1000
            }else if (x=="m"|x=="M"){
                y*1000000
            }else if (x=="b"|x=="B"){
                y*1000000000
            }else{
                y
            }
        }

        storm1$tdmg<-mapply(exfun,storm1$PROPDMGEXP,storm1$PROPDMG)
        ## Split by year
                str_sp<-split(storm1,storm1$year)
        ## order by tdmg
                str_spord<-lapply(str_sp,function(x) x[order(-x$tdmg),])
        ## extract first 10 events (with highest tdmg)
                str_spord10<-lapply(str_spord, function(x) x[1:10,])
        ## Concatenate split dataframes
                storm_event<-do.call(rbind,str_spord10)
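
The exponent translation done by `exfun` can also be written as a vectorized lookup, which avoids calling a function once per row. This is a minimal sketch under the same rule (k, m and b are translated, everything else leaves the value unchanged). The toy values and the names `dmg`, `exp_code`, `mult` and `factor_by_row` are assumptions for illustration.

```r
# Toy values (hypothetical), mirroring PROPDMG and PROPDMGEXP
dmg <- c(25, 2.5, 1, 10, 3)
exp_code <- c("K", "m", "B", "", "5")

# Named lookup table for the exponents handled by exfun
mult <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each code in upper case; codes not in the table yield NA and fall back to 1
factor_by_row <- unname(mult[toupper(exp_code)])
factor_by_row[is.na(factor_by_row)] <- 1

tdmg <- dmg * factor_by_row
tdmg
```

The empty code and the code "5" keep their original values, matching the final `else` branch of `exfun`.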
       
## make extra column marking all wind related events
storm_event$type<-ifelse(grepl("STORM|TORN|HURR",storm_event$EVTYPE),"windrelated","")
## Sum of windrelated events
s10<-sum(storm_event$type=="windrelated")
l10<-nrow(storm_event)
l10p<-round((s10/l10) *100)

If we look at the ten most expensive events in each year since 1950, we find that 518 of the 620 events in total are wind related.

This is equal to 84 percent.

Results summarized

From the Storm database three parameters

  • harm(injuries)
  • harm(fatalities)
  • economic consequences

were calculated.

For all three parameters, the type of event causing the highest amount of “damage” was determined. As a result, I found “windrelated” events to be the major cause of damage for all three parameters. Of the top 10 events for each year and each parameter, the following share was caused by “wind”:

  • harm(injuries) 80 percent
  • harm(fatalities) 64 percent
  • economic consequences 84 percent
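
The three percentages follow directly from the counts reported in the results sections. The sketch below repeats that arithmetic with the same rounding formula used for `l10p_inj`, `l10p_fat` and `l10p`; the vector names are illustrative.

```r
# Counts of wind related events reported in the results sections
wind_related <- c(injuries = 499, fatalities = 396, economic = 518)
# Total number of top 10 events (10 per year)
total_top10 <- 620

# Share of wind related events in percent, rounded to whole numbers
share_percent <- round(wind_related / total_top10 * 100)
share_percent
##   injuries fatalities   economic 
##         80         64         84
```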

Conclusion

Wind-related events are the most harmful for the US, causing harm to people and the greatest economic consequences.