Synopsis:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

Data Processing:

# getting data into R
library(dplyr)   # select(), group_by(), summarise(), arrange() and the %>% pipe
library(ggplot2) # bar plots
setwd("C:\\") # setting up the working directory
if(!file.exists("./stormdata")){dir.create("./stormdata")} # creating a folder
setwd("C:\\stormdata") # setting up the working directory
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2" # setup the url
download.file(url, destfile = "./repdata_data_StormData.csv.bz2", mode = "wb") # download the compressed file in binary mode
storm = read.csv("repdata_data_StormData.csv.bz2") # read.csv reads the bz2-compressed csv directly
storm = as_tibble(storm) # tibble for compact printing (replaces the deprecated tbl_df())

“Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?”

We will look at two variables to determine which event types make the top-5 list in each category. These two variables are:

  1. “FATALITIES”
  2. “INJURIES”

popdamage = select(storm, EVTYPE, FATALITIES, INJURIES) # columns 8, 23 and 24 of the raw data, selected by name

popdamage = popdamage %>% group_by(EVTYPE) %>% summarise(fatality_total = sum(FATALITIES), injury_total = sum(INJURIES))

dim(popdamage) # 985 unique events in storm data
## [1] 985   3

There are 985 unique event types (distinct EVTYPE values) in the storm data. Our goal is to find which of these are most harmful to population health.
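To make the aggregation above concrete, here is a minimal sketch of the same group-and-sum pattern, followed by a descending sort of the kind used to pick a top list. The `toy` data frame and its numbers are made up for illustration and are not taken from the storm data.

```r
library(dplyr)

# Hypothetical toy records, not taken from the storm data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(3, 1, 2, 0, 4),
  INJURIES   = c(10, 0, 5, 2, 1),
  stringsAsFactors = FALSE
)

# One row per event type, with fatalities and injuries summed over all records
toy_totals <- toy %>%
  group_by(EVTYPE) %>%
  summarise(fatality_total = sum(FATALITIES),
            injury_total   = sum(INJURIES)) %>%
  arrange(desc(fatality_total), desc(injury_total))

toy_totals
# Row order: TORNADO (5, 15), HEAT (5, 1), FLOOD (0, 2)
```

TORNADO and HEAT tie on fatalities in this toy example, so the second sort key (injuries) decides their order.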

First, we rank the 985 event types by total fatalities (the “FATALITIES” variable) and keep the top 5.

byfatality = popdamage %>% arrange(desc(fatality_total),desc(injury_total))
top5fatality = byfatality[1:5,]

RESULTS: The top-5 events causing the highest number of fatalities are:

## Source: local data frame [5 x 3]
## 
##           EVTYPE fatality_total injury_total
##           (fctr)          (dbl)        (dbl)
## 1        TORNADO           5633        91346
## 2 EXCESSIVE HEAT           1903         6525
## 3    FLASH FLOOD            978         1777
## 4           HEAT            937         2100
## 5      LIGHTNING            816         5230

# order the factor levels by total so the bars are drawn in descending order
top5fatality$EVTYPE <- factor(top5fatality$EVTYPE, levels = top5fatality$EVTYPE[order(top5fatality$fatality_total, decreasing = TRUE)])
plot1 = ggplot(top5fatality, aes(x=EVTYPE, y=fatality_total))+
        geom_bar(stat = "identity", fill = 1:5)+
        labs(x="Event Type",y="Total Fatalities",title="Top 5 event types by total Fatalities")+
        theme(axis.text.x = element_text(angle = 45, hjust = 1))
plot1

# % of total fatalities accounted for by the top 5 events:
top5fatality_percentage = round((sum(top5fatality$fatality_total)/sum(popdamage$fatality_total))*100,1)
therestfatality_percentage = 100 - top5fatality_percentage
x = c(top5fatality_percentage, therestfatality_percentage)
names(x) = c("Fatalities_%_by_Top5","Fatalities_%_by_theRest")
x
##    Fatalities_%_by_Top5 Fatalities_%_by_theRest 
##                    67.8                    32.2

RESULTS: Top-5 events account for 67.8% of all fatalities.
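The share calculation above divides the sum of the top-5 totals by the sum over all event types. As a rough sketch, it can be wrapped in a small helper. The name `top_share` and the toy vector are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: percentage of sum(x) contributed by its n largest values
top_share <- function(x, n = 5) {
  x <- sort(x, decreasing = TRUE)
  round(sum(head(x, n)) / sum(x) * 100, 1)
}

toy_totals <- c(50, 30, 10, 5, 5)   # made-up totals, not from the storm data
toy_share  <- top_share(toy_totals, n = 2)
toy_share                           # (50 + 30) / 100 * 100 = 80

# Numerator of the reported share: the five fatality totals from the table above
top5_sum <- sum(c(5633, 1903, 978, 937, 816))
top5_sum                            # 10267
```

On the real data, `top_share(popdamage$fatality_total)` should reproduce the percentage computed above, assuming `popdamage` is built as in this report.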

Next, we rank the 985 event types by total injuries (the “INJURIES” variable) and keep the top 5.

byinjury = popdamage %>% arrange(desc(injury_total),desc(fatality_total))
top5injury = byinjury[1:5,]

RESULTS: The top-5 events causing the highest number of injuries are:

## Source: local data frame [5 x 3]
## 
##           EVTYPE fatality_total injury_total
##           (fctr)          (dbl)        (dbl)
## 1        TORNADO           5633        91346
## 2      TSTM WIND            504         6957
## 3          FLOOD            470         6789
## 4 EXCESSIVE HEAT           1903         6525
## 5      LIGHTNING            816         5230

# order the factor levels by total so the bars are drawn in descending order
top5injury$EVTYPE <- factor(top5injury$EVTYPE, levels = top5injury$EVTYPE[order(top5injury$injury_total, decreasing = TRUE)])
plot2 = ggplot(top5injury, aes(x=EVTYPE, y=injury_total))+
        geom_bar(stat = "identity", fill = 1:5)+
        labs(x="Event Type",y="Total Injuries",title="Top 5 event types by total Injuries")+
        theme(axis.text.x = element_text(angle = 45, hjust = 1))
plot2

# % of total injuries accounted for by the top 5 events:
top5injury_percentage = round((sum(top5injury$injury_total)/sum(popdamage$injury_total))*100,1)
therestinjury_percentage = 100 - top5injury_percentage
y = c(top5injury_percentage, therestinjury_percentage)
names(y) = c("Injuries_%_by_Top5","Injuries_%_by_theRest")
y
##    Injuries_%_by_Top5 Injuries_%_by_theRest 
##                  83.1                  16.9

RESULTS: Top-5 events account for 83.1% of all Injuries.

Some events appear in the top 5 for both fatalities and injuries. These are the most harmful overall, since they rank near the top of both lists.

# common events in top 5 Fatalities and Injuries:
common = which(top5fatality$EVTYPE %in% top5injury$EVTYPE)
common
## [1] 1 2 5
top3 = as.vector(top5fatality$EVTYPE[common])
top3
## [1] "TORNADO"        "EXCESSIVE HEAT" "LIGHTNING"

RESULTS: Based on US storm data from 1950 to November 2011, the three event types most harmful to population health, in terms of both fatalities and injuries, are:

  1. TORNADO
  2. EXCESSIVE HEAT
  3. LIGHTNING

“TORNADO” is the no. 1 event in both the “FATALITY” and “INJURY” lists, so it causes the most harm.
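The overlap between the two top-5 lists can also be checked with base R's `intersect()`. This is a small sketch with the event names typed in from the two result tables above; the variable names are made up for illustration.

```r
# Event types copied from the two top-5 tables reported above
top5_fatality_types <- c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING")
top5_injury_types   <- c("TORNADO", "TSTM WIND", "FLOOD", "EXCESSIVE HEAT", "LIGHTNING")

# Elements present in both vectors, in the order of the first vector
common_types <- intersect(top5_fatality_types, top5_injury_types)
common_types
# "TORNADO" "EXCESSIVE HEAT" "LIGHTNING"
```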

“Across the United States, which types of events have the greatest economic consequences?”

We will look at two variables to determine which event types make the top-5 list for economic consequences. These two variables are:

  1. “PROPDMG”: Property Damage
  2. “CROPDMG”: Crop Damage

ecodamage = select(storm, EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) # columns 8 and 25 to 28 of the raw data, selected by name
ecodamage$PROPDMGEXP = as.character(ecodamage$PROPDMGEXP)
ecodamage$CROPDMGEXP = as.character(ecodamage$CROPDMGEXP)

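# Ignoring data with unknown damage expressions.
# Note: a row is kept only if BOTH exponent columns hold a recognised code, so rows
# with a blank or unknown code in either column are dropped from the totals.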
ecodamage = ecodamage[ecodamage$PROPDMGEXP %in% c("h","H","k","K","m","M","b","B"),]
ecodamage = ecodamage[ecodamage$CROPDMGEXP %in% c("h","H","k","K","m","M","b","B"),]
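To see what the two filters above do when applied one after the other, here is a small sketch on made-up rows. The `toy` data frame is hypothetical; only the list of accepted codes comes from the analysis.

```r
# Codes accepted by the filters above
valid_codes <- c("h", "H", "k", "K", "m", "M", "b", "B")

# Hypothetical rows, not taken from the storm data
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "HAIL"),
  PROPDMG    = c(10, 5, 0),
  PROPDMGEXP = c("M", "K", ""),
  CROPDMG    = c(2, 0, 3),
  CROPDMGEXP = c("K", "", "K"),
  stringsAsFactors = FALSE
)

# Same effect as applying the two filters in sequence
kept <- toy[toy$PROPDMGEXP %in% valid_codes & toy$CROPDMGEXP %in% valid_codes, ]
kept$EVTYPE              # only "FLOOD" survives
nrow(toy) - nrow(kept)   # 2 rows dropped
```

In this toy case the TORNADO row has property damage but a blank crop code, and the HAIL row is the reverse, so both are removed. How many real records are affected in the storm data is not shown here.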

# conversions for damage expressions (h/H read as hundreds, k/K = 10^3, m/M = 10^6, b/B = 10^9):
ecodamage$PROPDMGEXP = with(ecodamage, ifelse(PROPDMGEXP %in% c("h","H"), 10^2, ifelse(PROPDMGEXP %in% c("k","K"), 10^3, ifelse(PROPDMGEXP %in% c("m","M"), 10^6, ifelse(PROPDMGEXP %in% c("b","B"), 10^9, NA_real_)))))

ecodamage$CROPDMGEXP = with(ecodamage, ifelse(CROPDMGEXP %in% c("h","H"), 10^2, ifelse(CROPDMGEXP %in% c("k","K"), 10^3, ifelse(CROPDMGEXP %in% c("m","M"), 10^6, ifelse(CROPDMGEXP %in% c("b","B"), 10^9, NA_real_)))))

# adding new column for total damages:
ecodamage$TotalDamages_inMillions = ((ecodamage$PROPDMG*ecodamage$PROPDMGEXP) + (ecodamage$CROPDMG*ecodamage$CROPDMGEXP))/(10^6)
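As a worked example of the conversion and the total-damage formula, the sketch below uses a named lookup vector instead of nested `ifelse()` calls. The toy rows and the name `exp_multiplier` are hypothetical, and reading "h" as hundreds is an assumption.

```r
# Assumed multipliers for the exponent codes ("h" as hundreds is an assumption)
exp_multiplier <- c(h = 1e2, k = 1e3, m = 1e6, b = 1e9)

# Hypothetical rows, not taken from the storm data
toy <- data.frame(
  PROPDMG = c(25, 2.5), PROPDMGEXP = c("K", "B"),
  CROPDMG = c(1.5, 500), CROPDMGEXP = c("M", "k"),
  stringsAsFactors = FALSE
)

# Look up each code case-insensitively, then convert to dollars
prop_dollars <- toy$PROPDMG * unname(exp_multiplier[tolower(toy$PROPDMGEXP)])
crop_dollars <- toy$CROPDMG * unname(exp_multiplier[tolower(toy$CROPDMGEXP)])

total_millions <- (prop_dollars + crop_dollars) / 1e6
total_millions
# Row 1: (25 * 1e3 + 1.5 * 1e6) / 1e6 = 1.525
# Row 2: (2.5 * 1e9 + 500 * 1e3) / 1e6 = 2500.5
```

A code that is missing from the lookup vector returns `NA`, which makes unrecognised codes easy to spot before summing.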

ecodamage = ecodamage %>% group_by(EVTYPE) %>% summarise(TotalDamages_inMillions = round(sum(TotalDamages_inMillions),3)) %>% arrange(desc(TotalDamages_inMillions))

top5ecodamage = ecodamage[1:5,]

RESULTS: The top-5 events causing the highest economic damage (property damage plus crop damage) are:

## Source: local data frame [5 x 2]
## 
##              EVTYPE TotalDamages_inMillions
##              (fctr)                   (dbl)
## 1             FLOOD               138007.45
## 2 HURRICANE/TYPHOON                29348.17
## 3           TORNADO                16520.15
## 4         HURRICANE                12405.27
## 5       RIVER FLOOD                10108.37

“FLOOD” is no. 1 in the list, with total damages of $138,007 million (about $138 billion).

Plot of the top-5 events by economic damage:

# order the factor levels by total so the bars are drawn in descending order
top5ecodamage$EVTYPE <- factor(top5ecodamage$EVTYPE, levels = top5ecodamage$EVTYPE[order(top5ecodamage$TotalDamages_inMillions, decreasing = TRUE)])
plot3 = ggplot(top5ecodamage, aes(x=EVTYPE, y=TotalDamages_inMillions))+
        geom_bar(stat = "identity", fill = 1:5)+
        labs(x="Event Type",y="Total Damages (in Millions of $)",title="Top 5 event types by total Economic Damages")+
        theme(axis.text.x = element_text(angle = 45, hjust = 1))
plot3

# % of total economic damages accounted for by the top 5 events:
top5eco_percentage = round((sum(top5ecodamage$TotalDamages_inMillions)/sum(ecodamage$TotalDamages_inMillions))*100,1)
theresteco_percentage = 100 - top5eco_percentage

z = c(top5eco_percentage, theresteco_percentage)
names(z) = c("Economical_Damage_%_by_Top5","Economical_Damage_%_by_theRest")
z
##    Economical_Damage_%_by_Top5 Economical_Damage_%_by_theRest 
##                           78.9                           21.1

RESULTS: Top-5 events account for 78.9% of all economic damages (among the rows kept after filtering on the damage expressions).

FINAL RESULTS & SUMMARY:

  1. There are 985 unique event types in the storm data. Our two goals are to find which of these are most harmful to population health (fatalities and injuries) and which have the greatest economic consequences.

  2. Top-5 events causing the highest number of fatalities are:

## Source: local data frame [5 x 3]
## 
##           EVTYPE fatality_total injury_total
##           (fctr)          (dbl)        (dbl)
## 1        TORNADO           5633        91346
## 2 EXCESSIVE HEAT           1903         6525
## 3    FLASH FLOOD            978         1777
## 4           HEAT            937         2100
## 5      LIGHTNING            816         5230

  3. Top-5 events account for 67.8% of all fatalities.

  4. Top-5 events causing the highest number of injuries are:

## Source: local data frame [5 x 3]
## 
##           EVTYPE fatality_total injury_total
##           (fctr)          (dbl)        (dbl)
## 1        TORNADO           5633        91346
## 2      TSTM WIND            504         6957
## 3          FLOOD            470         6789
## 4 EXCESSIVE HEAT           1903         6525
## 5      LIGHTNING            816         5230

  5. Top-5 events account for 83.1% of all injuries.

  6. Based on US storm data from 1950 to November 2011, the three event types most harmful to population health, in terms of both fatalities and injuries, are:

    1. TORNADO
    2. EXCESSIVE HEAT
    3. LIGHTNING

  7. “TORNADO” is no. 1 in both the “FATALITY” and “INJURY” lists.

  8. Top-5 events causing the highest economic damage (property damage plus crop damage) are:

## Source: local data frame [5 x 2]
## 
##              EVTYPE TotalDamages_inMillions
##              (fctr)                   (dbl)
## 1             FLOOD               138007.45
## 2 HURRICANE/TYPHOON                29348.17
## 3           TORNADO                16520.15
## 4         HURRICANE                12405.27
## 5       RIVER FLOOD                10108.37

  9. “FLOOD” is no. 1 in the list, with total damages of $138,007 million (about $138 billion).

  10. Top-5 events account for 78.9% of all economic damages.