Exploring the US National Oceanic and Atmospheric Administration’s (NOAA) Storm Database: Impacts of Severe Weather on the Economy and Health

ASSIGNMENT

The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.

SYNOPSIS

Severe weather can harm both public health and property. This analysis explores the dataset to determine which types of weather events have the largest impact on population health and on the economy.

Question 1

Across the United States, which types of events (e.g. flood, lightning, tornado) are most harmful with respect to population health?

Data Processing

  1. Download the data from the URL.
fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileURL, "./repdata-data-StormData.csv.bz2")
  2. Now create a data frame.
dataframe <- read.table(bzfile("./repdata-data-StormData.csv.bz2"), sep = ",", header=TRUE)
  3. Let’s take a look at our data.
head(dataframe)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
  4. We only want the health-related data for our question, so we extract the columns we need.
healthData <- dataframe[,c("EVTYPE","FATALITIES","INJURIES")]
head(healthData)
##    EVTYPE FATALITIES INJURIES
## 1 TORNADO          0       15
## 2 TORNADO          0        0
## 3 TORNADO          0        2
## 4 TORNADO          0        2
## 5 TORNADO          0        2
## 6 TORNADO          0        6
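
The download step above fetches the file on every run. A minimal sketch of a guard that downloads only when the file is missing is shown here; the helper name `fetch_once` is hypothetical and is not part of the original analysis, and the sketch assumes the destination directory is writable.

```r
# Hypothetical helper: download `url` to `dest` only if `dest` does not exist yet.
fetch_once <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the compressed file intact on Windows
    download.file(url, dest, mode = "wb")
  }
  invisible(dest)
}

# Usage with the URL from step 1 (left commented out: it needs network access
# and loads the full 902,297-row dataset):
# path <- fetch_once(fileURL, "./repdata-data-StormData.csv.bz2")
# dataframe <- read.csv(path)   # read.csv() can read .bz2 files directly
```

With this guard, re-running the report reuses the local copy instead of downloading it again.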

Now let’s find the top 10 event types that affect the health of the population, measured by the number of fatalities and injuries.

Fatalities

1. Number of Fatalities by top 10 conditions.

aggData <- aggregate(FATALITIES ~ EVTYPE,data = healthData,sum)
head(aggData)
##                  EVTYPE FATALITIES
## 1    HIGH SURF ADVISORY          0
## 2         COASTAL FLOOD          0
## 3           FLASH FLOOD          0
## 4             LIGHTNING          0
## 5             TSTM WIND          0
## 6       TSTM WIND (G45)          0

But remember, we need only the top 10, not all of them.

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 3.4.4
## -- Attaching packages ------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 2.2.1     v purrr   0.2.4
## v tibble  1.4.2     v dplyr   0.7.4
## v tidyr   0.7.2     v stringr 1.2.0
## v readr   1.1.1     v forcats 0.2.0
## Warning: package 'ggplot2' was built under R version 3.4.4
## -- Conflicts ---------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
topData <- head(arrange(aggData,desc(FATALITIES)), n = 10)
head(topData)
##           EVTYPE FATALITIES
## 1        TORNADO       5633
## 2 EXCESSIVE HEAT       1903
## 3    FLASH FLOOD        978
## 4           HEAT        937
## 5      LIGHTNING        816
## 6      TSTM WIND        504
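
The aggregate-then-rank steps above are repeated later for injuries. As a sketch only, they could be wrapped in one base R function; the name `top_events` and the toy data below are hypothetical and are not taken from the storm database.

```r
# Hypothetical helper: total `column` per event type and return the `n` largest.
top_events <- function(data, column, n = 10) {
  totals <- aggregate(data[[column]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- column
  # Sort by the summed column, largest first
  ranked <- totals[order(totals[[column]], decreasing = TRUE), ]
  rownames(ranked) <- NULL
  head(ranked, n)
}

# Toy data for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 0, 5, 1, 2)
)

top_events(toy, "FATALITIES", n = 2)
```

On the real data, the equivalent calls would be `top_events(healthData, "FATALITIES")` and `top_events(healthData, "INJURIES")`.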

Now that we have found the top 10 weather conditions causing the most fatalities, let’s create our first bar chart.

library(ggplot2)
ggplot(topData, aes(x = EVTYPE, y = FATALITIES)) + 
  geom_bar(stat = "identity", fill = "darkgreen") + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
  xlab("Event Type") + ylab("Fatalities") + ggtitle("Number of fatalities by top 10 Weather Events")
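
By default ggplot2 places the bars in the order of the factor levels of EVTYPE, which is alphabetical unless the levels are set explicitly, so the chart does not reflect the ranking. One possible way to sort the bars by count is `reorder()`. The sketch below uses only the six rows printed above, and the data frame name `top6` is hypothetical.

```r
# First six rows of topData as printed above
top6 <- data.frame(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD",
                 "HEAT", "LIGHTNING", "TSTM WIND"),
  FATALITIES = c(5633, 1903, 978, 937, 816, 504)
)

# reorder() sets the factor levels by a second variable;
# negating that variable sorts the levels in descending order
top6$EVTYPE <- reorder(top6$EVTYPE, -top6$FATALITIES)
levels(top6$EVTYPE)

# Draw the chart only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(top6, aes(x = EVTYPE, y = FATALITIES)) +
    geom_bar(stat = "identity", fill = "darkgreen") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
    xlab("Event Type") + ylab("Fatalities")
  print(p)
}
```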

Injuries

2. Number of Injuries by top 10 conditions.

aggData1 <- aggregate(INJURIES ~ EVTYPE,data = healthData,sum)
head(aggData1)
##                  EVTYPE INJURIES
## 1    HIGH SURF ADVISORY        0
## 2         COASTAL FLOOD        0
## 3           FLASH FLOOD        0
## 4             LIGHTNING        0
## 5             TSTM WIND        0
## 6       TSTM WIND (G45)        0

Now let’s find the top 10 weather conditions causing the most injuries.

topData1 <- head(arrange(aggData1,desc(INJURIES)), n = 10)
head(topData1)
##           EVTYPE INJURIES
## 1        TORNADO    91346
## 2      TSTM WIND     6957
## 3          FLOOD     6789
## 4 EXCESSIVE HEAT     6525
## 5      LIGHTNING     5230
## 6           HEAT     2100

Now that we have found the top 10 weather conditions causing the most injuries, let’s create a bar chart.

ggplot(topData1,aes(x=EVTYPE,y=INJURIES)) +
  geom_bar(stat = "identity",fill="darkgreen") +
  theme(axis.text.x = element_text(angle = 90,hjust = 1))+
  xlab("Event Type") + ylab("INJURIES") + ggtitle("Number of Injuries by top 10 Weather Conditions")

Result

Tornadoes are the most harmful event type for population health, causing the highest number of both fatalities (5,633) and injuries (91,346).

Question 2

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to the economy?

economyData <- dataframe[,c("EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
head(economyData)
##    EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO    25.0          K       0           
## 2 TORNADO     2.5          K       0           
## 3 TORNADO    25.0          K       0           
## 4 TORNADO     2.5          K       0           
## 5 TORNADO     2.5          K       0           
## 6 TORNADO     2.5          K       0
tail(economyData)
##                EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 902292 WINTER WEATHER       0          K       0          K
## 902293      HIGH WIND       0          K       0          K
## 902294      HIGH WIND       0          K       0          K
## 902295      HIGH WIND       0          K       0          K
## 902296       BLIZZARD       0          K       0          K
## 902297     HEAVY SNOW       0          K       0          K

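Convert the H, K, M, and B exponent codes (hundreds, thousands, millions, billions) into multipliers to calculate property damage. Rows with any other code keep the initial value of 0.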

economyData$PROPDMGFIG = 0
economyData[economyData$PROPDMGEXP == "H", ]$PROPDMGFIG = economyData[economyData$PROPDMGEXP == "H", ]$PROPDMG * 10^2
economyData[economyData$PROPDMGEXP == "K", ]$PROPDMGFIG = economyData[economyData$PROPDMGEXP == "K", ]$PROPDMG * 10^3
economyData[economyData$PROPDMGEXP == "M", ]$PROPDMGFIG = economyData[economyData$PROPDMGEXP == "M", ]$PROPDMG * 10^6
economyData[economyData$PROPDMGEXP == "B", ]$PROPDMGFIG = economyData[economyData$PROPDMGEXP == "B", ]$PROPDMG * 10^9

Convert the H, K, M, and B exponent codes in the same way to calculate crop damage.

economyData$CROPDMGFIG = 0
economyData[economyData$CROPDMGEXP == "H",]$CROPDMGFIG =
  economyData[economyData$CROPDMGEXP == "H",]$CROPDMG * 10^2
economyData[economyData$CROPDMGEXP == "K",]$CROPDMGFIG = 
  economyData[economyData$CROPDMGEXP == "K",]$CROPDMG * 10^3
economyData[economyData$CROPDMGEXP == "M",]$CROPDMGFIG = 
  economyData[economyData$CROPDMGEXP == "M",]$CROPDMG * 10^6
economyData[economyData$CROPDMGEXP == "B",]$CROPDMGFIG = 
  economyData[economyData$CROPDMGEXP == "B",]$CROPDMG * 10^9

head(economyData)
##    EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP PROPDMGFIG CROPDMGFIG
## 1 TORNADO    25.0          K       0                 25000          0
## 2 TORNADO     2.5          K       0                  2500          0
## 3 TORNADO    25.0          K       0                 25000          0
## 4 TORNADO     2.5          K       0                  2500          0
## 5 TORNADO     2.5          K       0                  2500          0
## 6 TORNADO     2.5          K       0                  2500          0
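
The eight assignments above all follow one pattern: look up a multiplier for the exponent code and multiply. A compact sketch of that pattern with a named lookup vector is shown here; the names `exp_multiplier` and `scale_damage` and the toy data are hypothetical. Like the code above, it treats any code other than H, K, M, or B as 0.

```r
# Multipliers for the four codes handled in this analysis
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: scale `amount` by the multiplier for `code`.
# Codes outside H/K/M/B (including the empty string) give 0.
scale_damage <- function(amount, code) {
  mult <- unname(exp_multiplier[as.character(code)])
  mult[is.na(mult)] <- 0
  amount * mult
}

# Toy data for illustration only
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 3, 7),
  PROPDMGEXP = c("K", "M", "B", "", "H")
)
toy$PROPDMGFIG <- scale_damage(toy$PROPDMG, toy$PROPDMGEXP)
toy
```

On the real data, the same helper would be applied once to the PROPDMG/PROPDMGEXP pair and once to the CROPDMG/CROPDMGEXP pair.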

Let’s find the top 10 weather conditions by combined property and crop damage.

damageData <- aggregate(PROPDMGFIG + CROPDMGFIG ~ EVTYPE,data = economyData,sum)
head(damageData)
##                  EVTYPE PROPDMGFIG + CROPDMGFIG
## 1    HIGH SURF ADVISORY                  200000
## 2         COASTAL FLOOD                       0
## 3           FLASH FLOOD                   50000
## 4             LIGHTNING                       0
## 5             TSTM WIND                 8100000
## 6       TSTM WIND (G45)                    8000

Rename the columns for readability.

colnames(damageData) <- c("EVTYPE","Damages")
topData2 <- head(arrange(damageData,desc(Damages)), n = 10)
head(topData2)
##              EVTYPE      Damages
## 1             FLOOD 150319678250
## 2 HURRICANE/TYPHOON  71913712800
## 3           TORNADO  57340613590
## 4       STORM SURGE  43323541000
## 5              HAIL  18752904670
## 6       FLASH FLOOD  17562128610

Plot property and crop damage for the top 10 weather conditions.

ggplot(topData2,aes(x=EVTYPE,y=Damages))+
  geom_bar(stat = "identity",fill="darkgreen")+
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("Event Type") + ylab("Damages") +
  ggtitle("Property & Crop Damages by top 10 Weather Conditions")

Result

Floods cause the greatest combined property and crop damage, with a total of about 150 billion in this dataset.