This analysis examines the impact of natural events such as tornadoes, avalanches, and thunderstorms on the US economy and public health. The aim is to identify which of the events recorded between 1950 and 2011 had the greatest effect on population health and on the economy of the US. The data come from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. Economic impact is measured as the total property damage and crop damage attributed to each type of event; impact on population health is measured through the injuries and fatalities each type of event caused.
Packages:
The analysis uses the “dplyr” and “ggplot2” packages.
#loading libraries
library(ggplot2)
library(dplyr)
After downloading the data from the NOAA storm database, we load it into R so that it can be explored and analyzed.
#loading data
storm_data<- read.csv("repdata_data_StormData.csv")
As a next step, let’s get a quick summary of the dataset:
#exploring a bit
summary(storm_data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE
## Min. : 1.0 Length:902297 Length:902297 Length:902297
## 1st Qu.:19.0 Class :character Class :character Class :character
## Median :30.0 Mode :character Mode :character Mode :character
## Mean :31.2
## 3rd Qu.:45.0
## Max. :95.0
##
## COUNTY COUNTYNAME STATE EVTYPE
## Min. : 0.0 Length:902297 Length:902297 Length:902297
## 1st Qu.: 31.0 Class :character Class :character Class :character
## Median : 75.0 Mode :character Mode :character Mode :character
## Mean :100.6
## 3rd Qu.:131.0
## Max. :873.0
##
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE
## Min. : 0.000 Length:902297 Length:902297 Length:902297
## 1st Qu.: 0.000 Class :character Class :character Class :character
## Median : 0.000 Mode :character Mode :character Mode :character
## Mean : 1.484
## 3rd Qu.: 1.000
## Max. :3749.000
##
## END_TIME COUNTY_END COUNTYENDN END_RANGE
## Length:902297 Min. :0 Mode:logical Min. : 0.0000
## Class :character 1st Qu.:0 NA's:902297 1st Qu.: 0.0000
## Mode :character Median :0 Median : 0.0000
## Mean :0 Mean : 0.9862
## 3rd Qu.:0 3rd Qu.: 0.0000
## Max. :0 Max. :925.0000
##
## END_AZI END_LOCATI LENGTH WIDTH
## Length:902297 Length:902297 Min. : 0.0000 Min. : 0.000
## Class :character Class :character 1st Qu.: 0.0000 1st Qu.: 0.000
## Mode :character Mode :character Median : 0.0000 Median : 0.000
## Mean : 0.2301 Mean : 7.503
## 3rd Qu.: 0.0000 3rd Qu.: 0.000
## Max. :2315.0000 Max. :4400.000
##
## F MAG FATALITIES INJURIES
## Min. :0.00 Min. : 0.0 Min. : 0.00000 Min. : 0.0000
## 1st Qu.:0.00 1st Qu.: 0.0 1st Qu.: 0.00000 1st Qu.: 0.0000
## Median :1.00 Median : 50.0 Median : 0.00000 Median : 0.0000
## Mean :0.91 Mean : 46.9 Mean : 0.01678 Mean : 0.1557
## 3rd Qu.:1.00 3rd Qu.: 75.0 3rd Qu.: 0.00000 3rd Qu.: 0.0000
## Max. :5.00 Max. :22000.0 Max. :583.00000 Max. :1700.0000
## NA's :843563
## PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## Min. : 0.00 Length:902297 Min. : 0.000 Length:902297
## 1st Qu.: 0.00 Class :character 1st Qu.: 0.000 Class :character
## Median : 0.00 Mode :character Median : 0.000 Mode :character
## Mean : 12.06 Mean : 1.527
## 3rd Qu.: 0.50 3rd Qu.: 0.000
## Max. :5000.00 Max. :990.000
##
## WFO STATEOFFIC ZONENAMES LATITUDE
## Length:902297 Length:902297 Length:902297 Min. : 0
## Class :character Class :character Class :character 1st Qu.:2802
## Mode :character Mode :character Mode :character Median :3540
## Mean :2875
## 3rd Qu.:4019
## Max. :9706
## NA's :47
## LONGITUDE LATITUDE_E LONGITUDE_ REMARKS
## Min. :-14451 Min. : 0 Min. :-14455 Length:902297
## 1st Qu.: 7247 1st Qu.: 0 1st Qu.: 0 Class :character
## Median : 8707 Median : 0 Median : 0 Mode :character
## Mean : 6940 Mean :1452 Mean : 3509
## 3rd Qu.: 9605 3rd Qu.:3549 3rd Qu.: 8735
## Max. : 17124 Max. :9706 Max. :106220
## NA's :40
## REFNUM
## Min. : 1
## 1st Qu.:225575
## Median :451149
## Mean :451149
## 3rd Qu.:676723
## Max. :902297
##
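The summary above covers all 37 columns, while the rest of the analysis only touches seven of them. As a minimal sketch of narrowing the data down before going further, the snippet below subsets a toy data frame; `toy` and its values are made up for illustration and merely stand in for `storm_data`.

```r
# Toy stand-in for storm_data (values are invented for illustration only)
toy <- data.frame(
  STATE      = c("AL", "TX", "OK"),
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(0, 1, 0),
  INJURIES   = c(15, 0, 2),
  PROPDMG    = c(25, 2.5, 0),
  PROPDMGEXP = c("K", "M", ""),
  CROPDMG    = c(0, 10, 5),
  CROPDMGEXP = c("", "K", "K"),
  REMARKS    = c("", "", ""),
  stringsAsFactors = FALSE
)

# The seven columns used later in this analysis
needed <- c("EVTYPE", "FATALITIES", "INJURIES",
            "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Keep only those columns and inspect their types
toy_small <- toy[, needed]
str(toy_small)
```

Applied to the real data, the same subsetting would be `storm_data[, needed]`, which makes `summary()` and `str()` far easier to read.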
Findings after exploring the data
Exploring the dataset revealed an important detail about two variables needed for the analysis: “PROPDMG” and “CROPDMG” do not hold the actual damage values. Each must be scaled by an exponent code stored in a companion variable, “PROPDMGEXP” for property damage and “CROPDMGEXP” for crop damage, where “H” means hundred, “K” thousand, “M” million, and “B” billion.
To work with actual values we convert both variables using these four codes (in either upper or lower case); a record with any other exponent code is assigned a damage value of 0.
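Before applying this rule it may be worth checking how many records carry a code other than “H”, “K”, “M”, or “B”, since those are the records whose damage becomes 0. The sketch below counts codes in a made-up vector; `codes` is hypothetical and stands in for `storm_data$PROPDMGEXP`.

```r
# Made-up exponent codes standing in for storm_data$PROPDMGEXP
codes <- c("K", "k", "M", "", "B", "0", "+", "h", "K")

# Count each code after upper-casing, as the conversion does
counts <- table(toupper(codes), useNA = "ifany")

# Codes the conversion recognises; everything else is set to 0
recognised <- c("H", "K", "M", "B")
unrecognised <- counts[!(names(counts) %in% recognised)]

print(counts)
print(unrecognised)
```

On the real column the call would be `table(toupper(storm_data$PROPDMGEXP))`, and the same check applies to `CROPDMGEXP`.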
Property damage
#data pre processing
#property dmg
prop_dmg <- storm_data %>%
  mutate(PROPDMGEXP = toupper(PROPDMGEXP),
         PROPDMG = case_when(PROPDMGEXP == "K" ~ PROPDMG * 1e+03,
                             PROPDMGEXP == "M" ~ PROPDMG * 1e+06,
                             PROPDMGEXP == "B" ~ PROPDMG * 1e+09,
                             PROPDMGEXP == "H" ~ PROPDMG * 1e+02,
                             TRUE ~ 0)) %>%
  select(EVTYPE, PROPDMG)
head(prop_dmg)
## EVTYPE PROPDMG
## 1 TORNADO 25000
## 2 TORNADO 2500
## 3 TORNADO 25000
## 4 TORNADO 2500
## 5 TORNADO 2500
## 6 TORNADO 2500
We reconstruct the crop damage variable in the same way.
#crop dmg
crop_dmg <- storm_data %>%
  mutate(CROPDMGEXP = toupper(CROPDMGEXP),
         CROPDMG = case_when(CROPDMGEXP == "H" ~ CROPDMG * 1e+02,
                             CROPDMGEXP == "K" ~ CROPDMG * 1e+03,
                             CROPDMGEXP == "M" ~ CROPDMG * 1e+06,
                             CROPDMGEXP == "B" ~ CROPDMG * 1e+09,
                             TRUE ~ 0)) %>%
  select(EVTYPE, CROPDMG)
head(crop_dmg)
## EVTYPE CROPDMG
## 1 TORNADO 0
## 2 TORNADO 0
## 3 TORNADO 0
## 4 TORNADO 0
## 5 TORNADO 0
## 6 TORNADO 0
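The property and crop conversions differ only in their column names, so the rule could also be written once as a small helper. The sketch below is one possible base R version built on a named lookup vector; `to_dollars` is a hypothetical name, not a function from any package, and the input values are made up.

```r
# Hypothetical helper: scale a damage value by its exponent code.
# Codes other than H, K, M, B (in any case) yield 0, matching the rule above.
to_dollars <- function(value, exponent) {
  multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  # Look up each upper-cased code; codes not in the table come back as NA
  m <- unname(multiplier[toupper(exponent)])
  m[is.na(m)] <- 0
  value * m
}

# Made-up example values
dmg <- c(25, 2.5, 1, 3, 7, 4)
exp <- c("K", "m", "B", "h", "+", "")
converted <- to_dollars(dmg, exp)
print(converted)
```

With such a helper, both conversions would reduce to calls like `to_dollars(PROPDMG, PROPDMGEXP)` and `to_dollars(CROPDMG, CROPDMGEXP)` inside `mutate()`.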
The data are now ready for further analysis.
We first look at which natural events affect population health the most.
As a first step, let’s find the five events with the most fatalities and plot them.
#fatalities
pop_fatality <- aggregate(FATALITIES ~ EVTYPE, data = storm_data, FUN = sum)
tp_fat <- pop_fatality[order(-pop_fatality$FATALITIES), ][1:5, ]
ggplot(data = tp_fat, mapping = aes(x = EVTYPE, y = FATALITIES, fill = EVTYPE)) +
  geom_col(color = "black") +
  ylim(0, 6000) +
  coord_polar(theta = "y") +
  labs(title = "Events most Harmful to Population Health",
       subtitle = "Top 5 Events with most Fatalities") +
  xlab("EVENTS")
Observation
The plot above shows the five natural events with the most recorded fatalities.
Tornadoes caused the most fatalities; the other four events in the top five are lightning, heat, flash flood, and excessive heat.
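Because ggplot2 arranges a character x variable alphabetically, the ranking within the top five is easier to read from a sorted table than from the plot. The sketch below shows the aggregate-then-sort step on a toy data frame; `toy` and its numbers are made up and stand in for `storm_data`.

```r
# Toy stand-in for storm_data (invented values)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT", "HAIL"),
  FATALITIES = c(5, 2, 3, 1, 4, 0),
  stringsAsFactors = FALSE
)

# Sum fatalities per event type, then sort from highest to lowest
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
ranked <- totals[order(-totals$FATALITIES), ]

# Keep the top 3 here (the analysis itself keeps the top 5)
top_n <- head(ranked, 3)

# Optional: order the factor levels by total so a plot follows the ranking
top_n$EVTYPE <- reorder(top_n$EVTYPE, top_n$FATALITIES)
print(top_n)
```

Printing `tp_fat` in the analysis itself would give the same kind of ranked table for the real data.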
Next we find the five events that caused the most injuries.
#injuries
pop_injuries <- aggregate(INJURIES ~ EVTYPE, data = storm_data, FUN = sum)
tp_inj <- pop_injuries[order(-pop_injuries$INJURIES), ][1:5, ]
ggplot(data = tp_inj, mapping = aes(x = EVTYPE, y = INJURIES, fill = EVTYPE)) +
  geom_col(color = "black") +
  ylim(0, 100000) +
  coord_polar(theta = "y") +
  labs(title = "Events most Harmful to Population Health",
       subtitle = "Top 5 Events with most Injuries") +
  xlab("EVENTS")
Observation
The plot above shows the five natural events with the most recorded injuries.
Tornadoes caused the most injuries; the other four events in the top five are TSTM wind, lightning, flood, and excessive heat.
Conclusion
Taken together, the two observations show that tornadoes are the leading cause of harm to population health.
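One way to check this conclusion in a single table is to total fatalities and injuries side by side for each event type. The sketch below does that on a toy data frame; `toy` and its numbers are made up and stand in for `storm_data`.

```r
# Toy stand-in for storm_data (invented values)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(5, 2, 3, 1, 4),
  INJURIES   = c(50, 10, 30, 5, 0),
  stringsAsFactors = FALSE
)

# Sum both health measures per event type in one call
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Sort by fatalities first, then by injuries
health <- health[order(-health$FATALITIES, -health$INJURIES), ]
print(health)
```

An event that tops both columns, as the tornado row does in this toy example, supports the kind of conclusion drawn above.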
In this sub-section we look at the natural events that had the most significant impact on the economy.
For this analysis I’ve selected the five events with the highest recorded property damage.
# property Damage
agg_prop <- aggregate(PROPDMG ~ EVTYPE, data = prop_dmg, FUN = sum)
tp_prop <- agg_prop[order(-agg_prop$PROPDMG), ][1:5, ]
ggplot(data = tp_prop, mapping = aes(x = EVTYPE, y = PROPDMG / 10000000, colour = EVTYPE)) +
  geom_bar(stat = "identity", fill = "white") +
  coord_flip() +
  labs(title = "Events with most Impact to U.S. Economy",
       subtitle = "Top 5 Events with highest impact on property damage") +
  xlab("EVENTS") +
  ylab("Property Damage (tens of millions of USD)")
Observation
The plot above shows the five natural events with the highest recorded property damage.
Floods caused the most property damage, followed by hurricane/typhoon, tornado, storm surge, and flash flood.
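The plot divides the totals by 10,000,000, so its damage axis is in tens of millions of dollars. Expressing the totals in billions may read more naturally; the sketch below shows that rescaling on made-up totals, where `toy_totals` stands in for `tp_prop` and the numbers are not results from the data.

```r
# Made-up totals standing in for tp_prop (not real results)
toy_totals <- data.frame(
  EVTYPE  = c("FLOOD", "TORNADO", "HAIL"),
  PROPDMG = c(1.5e11, 5.0e10, 2.5e9),
  stringsAsFactors = FALSE
)

# Rescale dollars to billions of dollars
toy_totals$PROPDMG_BN <- toy_totals$PROPDMG / 1e9

# Build a readable label with one decimal place
toy_totals$label <- sprintf("%.1f bn", toy_totals$PROPDMG_BN)
print(toy_totals)
```

In the plot, this would amount to using `y = PROPDMG / 1e9` together with an axis label that names the unit.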
Similarly, for this section I’ve taken the five natural events with the highest recorded crop damage.
#crop dmg
agg_crop <- aggregate(CROPDMG ~ EVTYPE, data = crop_dmg, FUN = sum)
tp_crop <- agg_crop[order(-agg_crop$CROPDMG), ][1:5, ]
ggplot(data = tp_crop, aes(x = EVTYPE, y = CROPDMG / 10000000, colour = EVTYPE)) +
  geom_bar(stat = "identity", fill = "white") +
  coord_flip() +
  xlab("Events") +
  ylab("Crop Damage (tens of millions of USD)") +
  labs(title = "Events with most Impact to U.S. Economy",
       subtitle = "Top 5 Events with highest impact on crop damage")
Observation
The plot above shows the five natural events with the highest recorded crop damage.
Drought caused the most crop damage, followed by flood, river flood, ice storm, and hail.
Conclusion
The two observations are interestingly different: flood accounts for the highest property damage, whereas drought accounts for the highest crop damage.
In my view, the main reason for this difference is probably the geographic location of the affected areas.
That was a quick but captivating analysis, and it concludes here.
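Since economic impact was defined at the outset as property damage plus crop damage, the two per-event totals can also be combined into one ranking. The sketch below merges two toy tables; `prop` and `crop` stand in for `agg_prop` and `agg_crop`, and all numbers are made up.

```r
# Toy stand-ins for agg_prop and agg_crop (invented values)
prop <- data.frame(EVTYPE = c("FLOOD", "DROUGHT", "TORNADO"),
                   PROPDMG = c(100, 5, 60),
                   stringsAsFactors = FALSE)
crop <- data.frame(EVTYPE = c("FLOOD", "DROUGHT", "HAIL"),
                   CROPDMG = c(10, 40, 3),
                   stringsAsFactors = FALSE)

# Full outer join so events present in only one table are kept
econ <- merge(prop, crop, by = "EVTYPE", all = TRUE)

# An event missing from one table contributed no damage of that kind
econ[is.na(econ)] <- 0

# Total damage per event, sorted from highest to lowest
econ$TOTAL <- econ$PROPDMG + econ$CROPDMG
econ <- econ[order(-econ$TOTAL), ]
print(econ)
```

A combined ranking like this shows whether an event that leads only one of the two categories still ranks highly overall.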