Synopsis

This analysis examines the impact of natural events such as tornadoes, avalanches and thunderstorms on the US economy and public health. The aim is to find out which of the events recorded between 1950 and 2011 had the greatest effect on population health and on the economy of the US. The analysis uses the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. Economic impact is measured by the property damage and crop damage attributed to each event type. Similarly, impact on population health is measured by the injuries and fatalities attributed to each event type.

Packages

The analysis uses the “dplyr” and “ggplot2” packages.

#loading libraries
library(ggplot2)
library(dplyr)

Loading Dataset

After downloading the data from the NOAA storm database, we load it into R so that we can explore and analyze it.

#loading data
storm_data<- read.csv("repdata_data_StormData.csv")

For the next step, let’s get a quick summary of our dataset…

#exploring a bit
summary(storm_data)
##     STATE__       BGN_DATE           BGN_TIME          TIME_ZONE        
##  Min.   : 1.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.:19.0   Class :character   Class :character   Class :character  
##  Median :30.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :31.2                                                           
##  3rd Qu.:45.0                                                           
##  Max.   :95.0                                                           
##                                                                         
##      COUNTY       COUNTYNAME           STATE              EVTYPE         
##  Min.   :  0.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.: 31.0   Class :character   Class :character   Class :character  
##  Median : 75.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :100.6                                                           
##  3rd Qu.:131.0                                                           
##  Max.   :873.0                                                           
##                                                                          
##    BGN_RANGE          BGN_AZI           BGN_LOCATI          END_DATE        
##  Min.   :   0.000   Length:902297      Length:902297      Length:902297     
##  1st Qu.:   0.000   Class :character   Class :character   Class :character  
##  Median :   0.000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :   1.484                                                           
##  3rd Qu.:   1.000                                                           
##  Max.   :3749.000                                                           
##                                                                             
##    END_TIME           COUNTY_END COUNTYENDN       END_RANGE       
##  Length:902297      Min.   :0    Mode:logical   Min.   :  0.0000  
##  Class :character   1st Qu.:0    NA's:902297    1st Qu.:  0.0000  
##  Mode  :character   Median :0                   Median :  0.0000  
##                     Mean   :0                   Mean   :  0.9862  
##                     3rd Qu.:0                   3rd Qu.:  0.0000  
##                     Max.   :0                   Max.   :925.0000  
##                                                                   
##    END_AZI           END_LOCATI            LENGTH              WIDTH         
##  Length:902297      Length:902297      Min.   :   0.0000   Min.   :   0.000  
##  Class :character   Class :character   1st Qu.:   0.0000   1st Qu.:   0.000  
##  Mode  :character   Mode  :character   Median :   0.0000   Median :   0.000  
##                                        Mean   :   0.2301   Mean   :   7.503  
##                                        3rd Qu.:   0.0000   3rd Qu.:   0.000  
##                                        Max.   :2315.0000   Max.   :4400.000  
##                                                                              
##        F               MAG            FATALITIES           INJURIES        
##  Min.   :0.00     Min.   :    0.0   Min.   :  0.00000   Min.   :   0.0000  
##  1st Qu.:0.00     1st Qu.:    0.0   1st Qu.:  0.00000   1st Qu.:   0.0000  
##  Median :1.00     Median :   50.0   Median :  0.00000   Median :   0.0000  
##  Mean   :0.91     Mean   :   46.9   Mean   :  0.01678   Mean   :   0.1557  
##  3rd Qu.:1.00     3rd Qu.:   75.0   3rd Qu.:  0.00000   3rd Qu.:   0.0000  
##  Max.   :5.00     Max.   :22000.0   Max.   :583.00000   Max.   :1700.0000  
##  NA's   :843563                                                            
##     PROPDMG         PROPDMGEXP           CROPDMG         CROPDMGEXP       
##  Min.   :   0.00   Length:902297      Min.   :  0.000   Length:902297     
##  1st Qu.:   0.00   Class :character   1st Qu.:  0.000   Class :character  
##  Median :   0.00   Mode  :character   Median :  0.000   Mode  :character  
##  Mean   :  12.06                      Mean   :  1.527                     
##  3rd Qu.:   0.50                      3rd Qu.:  0.000                     
##  Max.   :5000.00                      Max.   :990.000                     
##                                                                           
##      WFO             STATEOFFIC         ZONENAMES            LATITUDE   
##  Length:902297      Length:902297      Length:902297      Min.   :   0  
##  Class :character   Class :character   Class :character   1st Qu.:2802  
##  Mode  :character   Mode  :character   Mode  :character   Median :3540  
##                                                           Mean   :2875  
##                                                           3rd Qu.:4019  
##                                                           Max.   :9706  
##                                                           NA's   :47    
##    LONGITUDE        LATITUDE_E     LONGITUDE_       REMARKS         
##  Min.   :-14451   Min.   :   0   Min.   :-14455   Length:902297     
##  1st Qu.:  7247   1st Qu.:   0   1st Qu.:     0   Class :character  
##  Median :  8707   Median :   0   Median :     0   Mode  :character  
##  Mean   :  6940   Mean   :1452   Mean   :  3509                     
##  3rd Qu.:  9605   3rd Qu.:3549   3rd Qu.:  8735                     
##  Max.   : 17124   Max.   :9706   Max.   :106220                     
##                   NA's   :40                                        
##      REFNUM      
##  Min.   :     1  
##  1st Qu.:225575  
##  Median :451149  
##  Mean   :451149  
##  3rd Qu.:676723  
##  Max.   :902297  
## 

Findings after exploring the data

Exploring the dataset revealed an important detail about two of the variables we need, “PROPDMG” and “CROPDMG”: they do not hold the actual dollar values. Each value has to be scaled by an exponent code stored in a companion variable, where “H” means hundred, “K” thousand, “M” million and “B” billion. The codes for property damage are in “PROPDMGEXP” and the codes for crop damage are in “CROPDMGEXP”.
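One quick way to see which exponent codes actually occur is to tabulate them. The sketch below uses a small made-up data frame (`toy`) in place of `storm_data`, so the counts are illustrative only. The same `table()` call could be pointed at `storm_data$PROPDMGEXP` or `storm_data$CROPDMGEXP`.

```r
# Toy stand-in for storm_data; the values are illustrative only
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 10, 3, 7),
  PROPDMGEXP = c("K", "k", "M", "", "5"),
  stringsAsFactors = FALSE
)

# Count how often each exponent code occurs, ignoring case
exp_counts <- table(toupper(toy$PROPDMGEXP), useNA = "ifany")
print(exp_counts)
```

Codes that show up here but are not one of “H”, “K”, “M” or “B” (the empty string and “5” in the toy data) are the ones that need a decision before the damage values can be compared.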

Reconstructing the data

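To work with the actual values we need to restructure the data: each damage figure is multiplied by the factor for its exponent code (“H”, “K”, “M” or “B”, in either upper or lower case). Any record with another code, or with no code at all, is given a damage value of 0.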
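As a small worked example of this rule, here is a minimal sketch that uses a named lookup vector rather than `case_when`. The helper name `to_dollars` and the input values are made up for illustration. It is an alternative way to write the same conversion, not the code used in this report.

```r
# Multipliers for the four recognised exponent codes
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert a damage figure and its exponent code to dollars
to_dollars <- function(value, exponent) {
  m <- multiplier[toupper(exponent)]  # NA for codes outside H/K/M/B
  m[is.na(m)] <- 0                    # unrecognised or missing codes count as zero
  unname(value * m)
}

# 25 with "K" is 25,000; 2.5 with "k" is 2,500; 1.5 with "B" is 1.5 billion;
# 3 with no code is treated as 0
converted <- to_dollars(c(25, 2.5, 1.5, 3), c("K", "k", "B", ""))
print(converted)
```

Keeping the multipliers in one vector means the same helper could be applied to both the property and the crop columns.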

Property damage

#data pre processing
#property dmg
prop_dmg<- storm_data%>%
  mutate(PROPDMGEXP = toupper(PROPDMGEXP),
         PROPDMG = case_when(PROPDMGEXP == "K" ~ PROPDMG*1e+03,
                             PROPDMGEXP == "M" ~ PROPDMG*1e+06,
                             PROPDMGEXP == "B" ~ PROPDMG*1e+09,
                             PROPDMGEXP == "H" ~ PROPDMG*1e+02,
                             TRUE ~ 0))%>%
  select(EVTYPE,PROPDMG)
head(prop_dmg)
##    EVTYPE PROPDMG
## 1 TORNADO   25000
## 2 TORNADO    2500
## 3 TORNADO   25000
## 4 TORNADO    2500
## 5 TORNADO    2500
## 6 TORNADO    2500

In a similar way, we reconstruct the crop damage variable.

#crop dmg
crop_dmg<- storm_data%>%
  mutate(CROPDMGEXP = toupper(CROPDMGEXP),
         CROPDMG = case_when(CROPDMGEXP == "H" ~ CROPDMG*1e+02,
                             CROPDMGEXP == "K" ~ CROPDMG*1e+03,
                             CROPDMGEXP == "M" ~ CROPDMG*1e+06,
                             CROPDMGEXP == "B" ~ CROPDMG*1e+09,
                             TRUE ~ 0))%>%
  select(EVTYPE,CROPDMG)
head(crop_dmg)
##    EVTYPE CROPDMG
## 1 TORNADO       0
## 2 TORNADO       0
## 3 TORNADO       0
## 4 TORNADO       0
## 5 TORNADO       0
## 6 TORNADO       0

Now our data is ready for further analysis…

Impact On Population Health

Now we’ll see which natural events affect population health the most.

As a first step, let’s plot the top 5 events with the most fatalities.

#fatalities
pop_fatality<- aggregate(FATALITIES~EVTYPE, data = storm_data, FUN = sum)
tp_fat<- pop_fatality[order(-pop_fatality$FATALITIES),][1:5,]
ggplot(data = tp_fat, mapping = aes(x = EVTYPE, y = FATALITIES, fill = EVTYPE))+
  geom_col(color = "black")+
  ylim(0, 6000)+
  coord_polar(theta = "y")+
  labs(title = "Events most Harmful to Population Health",
       subtitle = "Top 5 Events with most Fatalities")+
  xlab("EVENTS")

Observation

The plot above shows the 5 natural events with the most recorded fatalities.

Tornadoes caused the most fatalities. The other four events in the top 5 are lightning, heat, flash flood and excessive heat.

Next, we’ll get the top 5 events that caused the most injuries…

#injuries
pop_injuries<- aggregate(INJURIES~EVTYPE, data = storm_data, FUN = sum)
tp_inj<- pop_injuries[order(-pop_injuries$INJURIES),][1:5,]
ggplot(data = tp_inj, mapping = aes(x = EVTYPE, y = INJURIES, fill = EVTYPE))+
  geom_col(color = "black")+
  ylim(0,100000)+
  coord_polar(theta = "y")+
  labs(title = "Events most Harmful to Population Health",
       subtitle = "Top 5 Events with most Injuries")+
  xlab("EVENTS")

Observation

The plot above shows the 5 natural events with the most recorded injuries.

Tornadoes caused the most injuries. The other four events in the top 5 are TSTM wind, lightning, flood and excessive heat.
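Both plots above rely on the same two steps: total a measure per event type, then keep the five largest totals. A minimal sketch of how those steps could be wrapped in one function is shown below. The helper name `top_events` and the toy data are made up for illustration.

```r
# Hypothetical helper: total one measure per event type and keep the n largest
top_events <- function(data, measure, n = 5) {
  totals <- aggregate(data[[measure]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- measure                   # aggregate() names the column "x"
  head(totals[order(-totals[[measure]]), ], n)  # sort descending, keep the top n
}

# Toy stand-in for storm_data; the values are illustrative only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(3, 1, 2, 0, 2),
  stringsAsFactors = FALSE
)

top2 <- top_events(toy, "FATALITIES", n = 2)
print(top2)
```

With the real data, a call such as `top_events(storm_data, "INJURIES")` would be expected to return the same rows as `tp_inj`.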

Conclusion

Taking both observations together, the event type that has caused the most harm to population health is the tornado.

Impact on Economy

In this part of the analysis we’ll go through the natural events that have had the most significant impact on the economy.

First, I’ve selected the top 5 events with the highest recorded property damage.

# property Damage
agg_prop<- aggregate(PROPDMG~EVTYPE, data = prop_dmg, FUN = sum)
tp_prop<-agg_prop[order(-agg_prop$PROPDMG),][1:5,]
ggplot(data = tp_prop, mapping = aes(x = EVTYPE, y=PROPDMG/10000000, colour =EVTYPE))+
  geom_bar(stat = "identity", fill = "white")+
  coord_flip()+
  labs(title = "Events with most Impact to U.S. Economy",
       subtitle = "Top 5 Events with highest impact on property damage")+
  xlab("EVENTS")+
  ylab("Property Damage (tens of millions of USD)")

Observation

The plot above shows the 5 natural events with the highest recorded property damage.

Floods caused the most property damage, followed by hurricane/typhoon, tornado, storm surge and flash flood.

Similarly, for this part of the analysis I’ve taken the top 5 natural events with the highest recorded crop damage.

#crop dmg
agg_crop<- aggregate(CROPDMG~EVTYPE, data = crop_dmg, FUN = sum)
tp_crop<- agg_crop[order(-agg_crop$CROPDMG),][1:5,]
ggplot(data = tp_crop, aes(x= EVTYPE, y = CROPDMG/10000000, colour = EVTYPE))+
  geom_bar(stat = "identity", fill = "white")+
  coord_flip()+
  xlab("Events")+
  ylab("Crop Damage (tens of millions of USD)")+
  labs(title = "Events with most Impact to U.S. Economy",
       subtitle = "Top 5 Events with highest impact on crop damage")

Observation

The plot above shows the 5 natural events with the highest recorded crop damage.

Drought caused the most crop damage, followed by flood, river flood, ice storm and hail.
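The two damage types are looked at separately above. If a single economic figure per event type is wanted, the converted property and crop values can be summed together. The sketch below shows one way to do that on made-up numbers. It assumes both converted columns sit in the same data frame, which is not how `prop_dmg` and `crop_dmg` are built in this report, and the column name `TOTALDMG` is hypothetical.

```r
# Toy per-record damage in dollars; the numbers are illustrative, not NOAA totals
toy <- data.frame(
  EVTYPE  = c("FLOOD", "DROUGHT", "FLOOD", "HAIL"),
  PROPDMG = c(5e6, 1e5, 2e6, 3e5),
  CROPDMG = c(1e6, 4e6, 0, 2e5),
  stringsAsFactors = FALSE
)

# Sum both damage types per event type, then add them into one total
combined <- aggregate(cbind(PROPDMG, CROPDMG) ~ EVTYPE, data = toy, FUN = sum)
combined$TOTALDMG <- combined$PROPDMG + combined$CROPDMG

# Rank event types by combined damage, largest first
combined <- combined[order(-combined$TOTALDMG), ]
print(combined)
```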

Conclusion

The two observations are interestingly different: floods account for the most property damage, while drought accounts for the most crop damage.

My guess is that the difference comes from where each kind of damage occurs geographically, although this analysis did not test that.

That concludes this quick analysis.