Synopsis

In this analysis we investigate how weather events have affected human casualties as well as damage to property and crops. The weather events with the highest numbers of fatalities and injuries, and those causing the most property and crop damage, are presented in the Results sections.

Loading the data

The data are available at this link: Storm Data. The documentation for the data is here: Storm Data Documentation.

library(dplyr); library(ggplot2); library(gridExtra); library(grid)  # packages used below
data <- read.csv("repdata_data_StormData.csv.bz2", sep = ",")

PART ONE

Processing the data

Define weather_harmful, which keeps only the columns for event type, injuries and fatalities:

weather_harmful <- data[, c("EVTYPE", "INJURIES", "FATALITIES")]

Rows with zero injuries and zero fatalities are excluded from the data set.

weather_harmful <- weather_harmful[(weather_harmful$INJURIES != 0 | weather_harmful$FATALITIES != 0 ),]

The event type column is not clean. As a first step, all event names are converted to lower case, and rows with the same event type are then combined by summing their injuries and fatalities.

weather_harmful$EVTYPE <- tolower(weather_harmful$EVTYPE)
weather_harmful <- weather_harmful %>% group_by(EVTYPE) %>% summarise(INJURIES = sum(INJURIES), FATALITIES = sum(FATALITIES))
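
To illustrate why lower-casing matters, here is a small sketch on made-up rows. The toy data frame and its values are hypothetical and not taken from the storm data set. It uses base R's aggregate instead of dplyr purely so that it runs without extra packages; the grouping logic is the same.

```r
# Hypothetical toy rows; values are invented for illustration only.
toy <- data.frame(
  EVTYPE     = c("Heat", "HEAT", "heat", "Flood"),
  INJURIES   = c(1, 2, 3, 4),
  FATALITIES = c(0, 1, 0, 2),
  stringsAsFactors = FALSE
)

# Normalize case so that "Heat", "HEAT" and "heat" count as one event type.
toy$EVTYPE <- tolower(toy$EVTYPE)

# Base-R counterpart of group_by(EVTYPE) followed by summarise(sum(...)).
toy_sum <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)
print(toy_sum)
```

Without the tolower step the three spellings of heat would remain three separate groups.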

In order to further clean the data, a vector of events is created.

events <- c( "tornado", "cold","flood","snow","fire","dust" ,"rain","blizzard","beach erosion", 
            "dry", "wind", "thunderstorm","hail", "heat", "winter", "hurrican", "storm") 

I define the update_terms function, which takes a character vector data and checks which elements contain the string x. Every matching element is replaced in full by x. This is used later to replace, for example, high wind, high wind 48 and winds with the entry wind from the events vector.

I also define the update_events function, which applies update_terms for a given event to the EVTYPE column of weather_harmful.

update_terms <- function(x, data) replace(data, grepl(x, data), x)
update_events <- function(ev) mapply(update_terms, ev, weather_harmful$EVTYPE)

A for loop then calls update_events for every event in the events vector.

for (event in events){
  weather_harmful$EVTYPE <- update_events(event)
}
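
The effect of this loop can be seen on a handful of made-up labels. The sketch below reuses update_terms as defined above; the labels vector and the shortened keyword list are hypothetical and chosen only for illustration. Because grepl and replace are vectorized, the helper is called on the whole vector directly here.

```r
# Same helper as in the text: any element containing x is replaced by x.
update_terms <- function(x, data) replace(data, grepl(x, data), x)

# Hypothetical raw labels, for illustration only.
labels <- c("high wind", "high winds 48", "thunderstorm winds",
            "flash flood", "lightning")

# Apply the keywords in order; a label is overwritten by the first keyword it matches.
for (kw in c("flood", "wind", "thunderstorm")) {
  labels <- update_terms(kw, labels)
}
print(labels)
```

Note that the order of the keywords matters: "thunderstorm winds" ends up as "wind" because wind is applied before thunderstorm, and the events vector above uses the same relative order. Labels that match no keyword, such as "lightning", are left unchanged.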

Through the process explained above, similar values in the EVTYPE column now carry exactly the same name as the matching entry in the events vector. The data are therefore aggregated again by combining rows with the same event type, which reduces the data set to 72 rows.

weather_harmful <- weather_harmful %>% group_by(EVTYPE) %>% summarise(INJURIES = sum(INJURIES), FATALITIES = sum(FATALITIES))

Results

The 8 event types with the highest fatalities and the 8 with the highest injuries are selected for the plots.

# Top 8 event types for FATALITIES
Fatality <- head(weather_harmful[order(weather_harmful$FATALITIES, decreasing = TRUE),],8)

# Top 8 event types for INJURIES
Injury <- head(weather_harmful[order(weather_harmful$INJURIES, decreasing = TRUE),],8)
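
The order/head idiom used here can be checked on a tiny example. The data frame below is hypothetical, with invented numbers, and selects the top 2 rows instead of the top 8.

```r
# Hypothetical totals per event type; numbers are invented.
toy <- data.frame(
  EVTYPE     = c("a", "b", "c", "d"),
  FATALITIES = c(5, 40, 12, 33),
  stringsAsFactors = FALSE
)

# Sort rows by FATALITIES, largest first, then keep the first 2 rows.
top2 <- head(toy[order(toy$FATALITIES, decreasing = TRUE), ], 2)
print(top2$EVTYPE)
```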

The following plots show the weather events with the highest fatalities and the highest injuries in the USA.

p1 <- ggplot(Fatality, aes(x = reorder(EVTYPE, FATALITIES), y = FATALITIES, fill = FATALITIES)) +
        geom_bar(stat = "identity") +
        coord_flip() +
        ylab("Total number of FATALITIES") +
        xlab("Event type") +
        scale_fill_gradient(low="#ff8080", high="#B30000") +
        theme(legend.position="none")
p2 <- ggplot(Injury, aes(x = reorder(EVTYPE, INJURIES), y = INJURIES, fill = INJURIES)) +
        geom_bar(stat = "identity") +
        coord_flip() +
        ylab("Total number of INJURIES") +
        xlab("Event type") +
        scale_fill_gradient(low="#ff8080", high="#B30000") +
        theme(legend.position="none")
title <- textGrob("Total Casualties from weather events in the USA (1950-2011)",gp = gpar(fontsize = 14,font=3)) 
grid.arrange(p1, p2, top = title)

PART TWO

Processing the data

Define weather_damage, which keeps only the columns for event type, property damage and crop damage:

weather_damage <- data[, c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

Rows with zero property damage and zero crop damage are excluded from the data set.

weather_damage <- weather_damage[(weather_damage$PROPDMG != 0 | weather_damage$CROPDMG != 0 ),]

Next, inspect the levels of CROPDMGEXP and PROPDMGEXP, and check CROPDMG and PROPDMG for NA values:

summary(factor(weather_damage$CROPDMGEXP))
##             ?      0      B      k      K      m      M 
## 145037      6     17      7     21  97960      1   1982
summary(factor(weather_damage$PROPDMGEXP))
##             -      +      0      2      3      4      5      6      7      B 
##   4357      1      5    209      1      1      4     18      3      2     40 
##      h      H      K      m      M 
##      1      6 229057      7  11319
sum(is.na(weather_damage$CROPDMG))
## [1] 0
sum(is.na(weather_damage$PROPDMG))
## [1] 0

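The dmgfactor function is defined to replace the codes in CROPDMGEXP and PROPDMGEXP with numerical multipliers. Letter codes map to powers of ten, a blank or "0" maps to 1, the symbols "?", "-" and "+" map to 0, and any other digit is used directly as the multiplier.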

dmgfactor <- function(x){
  if (x == "k" | x == "K") return (1000)
  else if (x == "m" | x == "M") return (1000000)
  else if (x == "0" | x == "") return(1)
  else if (x == "h" | x == "H") return(100)
  else if (x == "B") return(1000000000)
  else if (x == "?" | x == "-" | x == "+" ) return(0)
  else return(as.numeric(x))
}

The new columns Propdagame and Cropdagame hold the property and crop damage in dollars:

weather_damage$Propdagame <- weather_damage$PROPDMG  * sapply(weather_damage$PROPDMGEXP,dmgfactor)
weather_damage$Cropdagame <- weather_damage$CROPDMG  * sapply(weather_damage$CROPDMGEXP,dmgfactor)
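
As a worked example of this conversion, the sketch below applies the same multiplier rules to a few made-up magnitude/exponent pairs. It uses a named lookup vector instead of the if/else chain; this is an alternative formulation, not the code used in this analysis, and it covers only the letter codes and the blank or "0" case. The names mult, dmg, exp_code and f are hypothetical, and unlike dmgfactor the lookup also accepts a lower-case "b".

```r
# Multipliers for the letter codes, following the dmgfactor rules above.
mult <- c(h = 1e2, k = 1e3, m = 1e6, b = 1e9)

# Hypothetical magnitude/exponent pairs, for illustration only.
dmg      <- c(25, 2.5, 1.2, 10)
exp_code <- c("K", "m", "B", "")

# Look up the multiplier by lower-cased code (unknown codes give NA),
# then treat a blank or "0" code as "no scaling".
f <- unname(mult[tolower(exp_code)])
f[exp_code %in% c("", "0")] <- 1

# Damage in dollars: 25 K = 25,000; 2.5 m = 2.5 million; 1.2 B = 1.2 billion.
print(dmg * f)
```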

As in part one, the event type column is not clean. All event names are converted to lower case, and rows with the same event type are then combined by summing their damage values.

weather_damage$EVTYPE <- tolower(weather_damage$EVTYPE)
weather_damage <- weather_damage %>% group_by(EVTYPE) %>% summarise(Propdagame = sum(Propdagame), 
                                                                    Cropdagame = sum(Cropdagame))

As in part one, the update_events function updates the event types in the EVTYPE column, this time in weather_damage. A for loop calls update_events for every event in the events vector.

update_events <- function(ev) mapply(update_terms, ev, weather_damage$EVTYPE)
for (event in events){
  weather_damage$EVTYPE <- update_events(event)
}

Through the process explained above, similar values in the EVTYPE column now carry exactly the same name as the matching entry in the events vector. The data are therefore aggregated again by combining rows with the same event type.

weather_damage <- weather_damage %>% group_by(EVTYPE) %>% summarise(Propdagame = sum(Propdagame), 
                                                                    Cropdagame = sum(Cropdagame))

Results

The 8 event types with the highest property damage and the 8 with the highest crop damage are selected for the plots.

# Top 8 event types for property damage
Prop_damage <- head(weather_damage[order(weather_damage$Propdagame, decreasing = TRUE),],8)

# Top 8 event types for crop damage
Crop_damage <- head(weather_damage[order(weather_damage$Cropdagame, decreasing = TRUE),],8)

The cost of damage is converted to billions of dollars.

Prop_damage$Propdagame <- Prop_damage$Propdagame/1000000000
Crop_damage$Cropdagame <- Crop_damage$Cropdagame/1000000000

The following plots show the total cost for the weather events with the highest property damage and the highest crop damage in the USA.

p3 <- ggplot(Prop_damage, aes(x = reorder(EVTYPE, Propdagame), y = Propdagame, fill = Propdagame))+
              geom_bar(stat = "identity") +
              coord_flip() +
              ylab("Total cost from property damage in billion USD") +
              xlab("Event type") +
              scale_fill_gradient(low="#f6c3ab", high="#e9692c") +
              theme(legend.position="none")

p4 <- ggplot(Crop_damage, aes(x = reorder(EVTYPE, Cropdagame), y = Cropdagame, fill = Cropdagame))+
              geom_bar(stat = "identity") +
              coord_flip() +
              ylab("Total cost from crop damage in billion USD") +
              xlab("Event type") +
              scale_fill_gradient(low="#f6c3ab", high="#e9692c") +
              theme(legend.position="none")

title <- textGrob("Total cost of damage from weather events in the USA (1950-2011)",gp = gpar(fontsize = 14,font=3)) 
grid.arrange(p3, p4, top = title)