Analyzing the impact of severe weather events on public health and the economy

Synopsis

The data for this analysis come from the course website, and the documentation comes from the National Weather Service website. Guided by that documentation, we carry out an exploratory analysis to find which types of weather events cause the most harm, measured by property damage, crop damage, injuries, and fatalities. We report the top 15 event types by fatalities and by injuries, and the top 25 by property damage and by crop damage. Each record includes the event's begin and end dates, and these show that the National Weather Service recorded far more events from 1990 onward. Because the more recent data are more complete, we restrict the analysis to events that began in 1990 or later.

Data Processing

Load the required libraries.

library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(gridExtra)
library(ggplot2)

Load the data. Reading the compressed file takes some time, so the code skips it if stormData is already in the workspace.

# Reading the compressed file is slow, so skip it if stormData is already loaded
if (!exists("stormData")) {
    stormData <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))
}
dim(stormData)
## [1] 902297     37
colnames(stormData)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

The stormData dataset contains 902,297 observations of 37 variables. We add a year variable to see how many weather events were recorded each year and how far back the storm database goes.

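# BGN_DATE is stored as text in month/day/year hour:minute:second form; extract the year
stormData$year <- as.numeric(format(as.Date(stormData$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))

hist(stormData$year, breaks = 50, main = "Storm events recorded per year", xlab = "Year")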

The histogram shows that the storm database has records going back to 1950, but far fewer events were recorded in the early decades. We therefore keep only events from 1990 onward.

stormData <- stormData[stormData$year>=1990,]
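The date parsing and the year filter can be checked on a few hand-written strings before running them on all 902,297 rows. A minimal sketch; the dates below are made-up values in the same month/day/year layout as BGN_DATE, not records from the storm data.

```r
# Made-up BGN_DATE-style strings (month/day/year hour:minute:second)
dates <- c("4/18/1950 0:00:00", "11/15/1995 0:00:00", "6/3/2011 0:00:00", "1/2/1995 0:00:00")

# Same conversion as above: parse the date, then keep only the four-digit year
years <- as.numeric(format(as.Date(dates, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
print(years)

# Count events per year and flag the rows the 1990 filter would keep
print(table(years))
keep <- years >= 1990
print(keep)
```

Only the 1950 record would be dropped by the filter.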

Some EVTYPE values are not among the 48 official event types listed in the National Weather Service documentation. The commented-out filter below would restrict the analysis to that official list. It is not applied, so the results that follow still include non-standard names such as TSTM WIND and HURRICANE ERIN.

# Not applied. EVTYPE values are upper case, so if enabled the match must ignore case
#stormData <- filter(stormData, toupper(EVTYPE) %in% toupper(c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood", "Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke", "Drought", "Dust Devil", "Dust Storm", "Excessive Heat", "Extreme Cold/Wind Chill", "Flash Flood", "Flood", "Freezing Fog", "Frost/Freeze", "Funnel Cloud", "Hail", "Heat", "Heavy Rain", "Heavy Snow", "High Surf", "High Wind", "Hurricane/Typhoon", "Ice Storm", "Lakeshore Flood", "Lake-Effect Snow", "Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet", "Storm Tide", "Strong Wind", "Thunderstorm Wind", "Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather")))
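EVTYPE values in the data are written in upper case (for example TORNADO), while the official event list uses mixed case, so an exact %in% comparison would miss the upper-case values. A minimal sketch of a case-insensitive comparison, using made-up EVTYPE values and a three-name subset of the official list; trimming surrounding spaces with trimws() is an extra assumption about how messy the raw values are.

```r
# Three names from the official event list (mixed case, as in the documentation)
officialTypes <- c("Flash Flood", "Thunderstorm Wind", "Tornado")

# Made-up raw EVTYPE values: upper case, a stray space, and an abbreviation
evtype <- c("TORNADO", " FLASH FLOOD", "TSTM WIND", "Tornado")

# Exact matching only succeeds when case and spacing agree
exactMatch <- evtype %in% officialTypes

# Upper-casing both sides and trimming spaces makes the comparison case-insensitive
caseInsensitive <- toupper(trimws(evtype)) %in% toupper(officialTypes)

print(data.frame(evtype, exactMatch, caseInsensitive))
```

Abbreviations such as TSTM WIND still do not match; mapping them to an official name would need an explicit lookup table.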

After exploring the data, we focus on seven variables: EVTYPE (event type), FATALITIES, INJURIES, PROPDMG and PROPDMGEXP (property damage and its magnitude code), and CROPDMG and CROPDMGEXP (crop damage and its magnitude code).

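PROPDMGEXP and CROPDMGEXP give the magnitude of the corresponding damage value: B for billions, M for millions, K for thousands, and H for hundreds. To put all damage estimates on a single scale in dollars, we add fields that hold the power of ten for each code and multiply PROPDMG and CROPDMG by 10 raised to that power. Any other code, including lower-case letters, keeps the default power of 0, so its damage value is used as recorded.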

# Add columns for the power-of-ten exponent and the converted damage (default 0)
stormData <- stormData %>%
        mutate(PropertyCostMeasure = 0, PropertyDamage = 0, CropCostMeasure = 0, CropDamage = 0)

# Property damage: B = billions, M = millions, K = thousands, H = hundreds
stormData$PropertyCostMeasure[stormData$PROPDMGEXP == "B"] <- 9
stormData$PropertyCostMeasure[stormData$PROPDMGEXP == "M"] <- 6
stormData$PropertyCostMeasure[stormData$PROPDMGEXP == "K"] <- 3
stormData$PropertyCostMeasure[stormData$PROPDMGEXP == "H"] <- 2
stormData$PropertyDamage <- stormData$PROPDMG * 10^stormData$PropertyCostMeasure

# Crop damage uses the same magnitude codes
stormData$CropCostMeasure[stormData$CROPDMGEXP == "B"] <- 9
stormData$CropCostMeasure[stormData$CROPDMGEXP == "M"] <- 6
stormData$CropCostMeasure[stormData$CROPDMGEXP == "K"] <- 3
stormData$CropCostMeasure[stormData$CROPDMGEXP == "H"] <- 2
stormData$CropDamage <- stormData$CROPDMG * 10^stormData$CropCostMeasure
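The same conversion can be written more compactly with a named lookup vector that maps each magnitude code to its power of ten. A minimal sketch on made-up records (not from the storm data), assuming the same four codes B, M, K, and H:

```r
# Powers of ten for the magnitude codes
exponentLookup <- c(B = 9, M = 6, K = 3, H = 2)

# Made-up example records, not taken from the storm data
toy <- data.frame(PROPDMG = c(2.5, 10, 150, 3),
                  PROPDMGEXP = c("B", "M", "K", "?"),
                  stringsAsFactors = FALSE)

# Look up each code; codes missing from the table (here "?") come back as NA
exponent <- unname(exponentLookup[toy$PROPDMGEXP])
exponent[is.na(exponent)] <- 0

# Damage in dollars = recorded value * 10^exponent
toy$PropertyDamage <- toy$PROPDMG * 10^exponent
print(toy)
```

If PROPDMGEXP is a factor (the read.csv default before R 4.0.0), convert it with as.character() before the lookup, because indexing by a factor uses its integer codes rather than its labels.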

Next, we aggregate the data by event type, summing property damage, crop damage, fatalities, and injuries within each type.

SummaryStormData<- stormData %>%
        group_by(EVTYPE) %>%
        summarise(TotalPropertyDamage = sum(PropertyDamage,na.rm=TRUE),
                  TotalCropDamage = sum(CropDamage,na.rm=TRUE),
                  TotalFATALITIES = sum(FATALITIES,na.rm=TRUE),
                  TotalINJURIES = sum(INJURIES,na.rm=TRUE)
        )
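The same per-event totals can be computed in base R with aggregate(). A minimal sketch on made-up rows; note that the formula interface of aggregate() drops any row containing NA by default, whereas sum(..., na.rm = TRUE) above keeps the row and ignores only the missing value.

```r
# Made-up event rows, not taken from the storm data
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
                  FATALITIES = c(2, 0, 3, 1),
                  INJURIES = c(10, 4, 5, 0),
                  stringsAsFactors = FALSE)

# Sum both casualty columns within each event type (one row per EVTYPE)
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
print(totals)
```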

For each variable of interest, we create a summary dataset of the event types with the largest totals: the top 15 for fatalities and injuries, and the top 25 for property damage and crop damage.

Fatalities

# Keep the 15 event types with the most fatalities
summaryFatalities <- SummaryStormData %>%
        select(EVTYPE, TotalFATALITIES) %>%
        arrange(desc(TotalFATALITIES), EVTYPE) %>%
        top_n(15, TotalFATALITIES)
# Order the factor levels by decreasing total so the bars plot from largest to smallest
summaryFatalities <- transform(summaryFatalities,
                               EVTYPE = reorder(EVTYPE, -TotalFATALITIES))

Injuries

# Keep the 15 event types with the most injuries
summaryTotalINJURIES <- SummaryStormData %>%
        select(EVTYPE, TotalINJURIES) %>%
        arrange(desc(TotalINJURIES), EVTYPE) %>%
        top_n(15, TotalINJURIES)
# Order the factor levels by decreasing total so the bars plot from largest to smallest
summaryTotalINJURIES <- transform(summaryTotalINJURIES,
                                  EVTYPE = reorder(EVTYPE, -TotalINJURIES))

Property Damage

# Keep the 25 event types with the most property damage
summaryPropertyDamage <- SummaryStormData %>%
        select(EVTYPE, TotalPropertyDamage) %>%
        arrange(desc(TotalPropertyDamage), EVTYPE) %>%
        top_n(25, TotalPropertyDamage)
# Order the factor levels by decreasing total so the bars plot from largest to smallest
summaryPropertyDamage <- transform(summaryPropertyDamage,
                                   EVTYPE = reorder(EVTYPE, -TotalPropertyDamage))

Crop Damage

# Keep the 25 event types with the most crop damage
summaryCropDamage <- SummaryStormData %>%
        select(EVTYPE, TotalCropDamage) %>%
        arrange(desc(TotalCropDamage), EVTYPE) %>%
        top_n(25, TotalCropDamage)
# Order the factor levels by decreasing total so the bars plot from largest to smallest
summaryCropDamage <- transform(summaryCropDamage,
                               EVTYPE = reorder(EVTYPE, -TotalCropDamage))
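The four summary blocks above repeat the same steps with a different column. A hypothetical base-R helper, topEvents() (not part of the original analysis), sketches how they could be folded into one function. Unlike top_n(), it breaks ties at the cut-off by row order instead of keeping every tied row.

```r
# Hypothetical helper: keep the n event types with the largest values of `col`
# and order the EVTYPE factor levels by decreasing value for plotting.
topEvents <- function(df, col, n) {
    ranked <- df[order(df[[col]], decreasing = TRUE), c("EVTYPE", col)]
    top <- head(ranked, n)
    lv <- as.character(top$EVTYPE)
    top$EVTYPE <- factor(lv, levels = lv)
    rownames(top) <- NULL
    top
}

# Usage on made-up totals, not taken from the storm data
toy <- data.frame(EVTYPE = c("HAIL", "FLOOD", "TORNADO", "HEAT"),
                  TotalFATALITIES = c(1, 5, 9, 3),
                  stringsAsFactors = FALSE)
res <- topEvents(toy, "TotalFATALITIES", 2)
print(res)
```

Applied to the real summary, a call such as topEvents(SummaryStormData, "TotalPropertyDamage", 25) should play the role of summaryPropertyDamage, apart from the tie handling noted above.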

We plot each summary as a bar chart with the ggplot2 plotting system.

# Bar heights are the summed totals; bar order follows the EVTYPE factor levels
fatalitiesPlot <- ggplot(summaryFatalities, aes(x = EVTYPE, y = TotalFATALITIES)) +
        geom_col() +
        scale_y_continuous("Number of Fatalities") +
        theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
        xlab("Storm Type") +
        ggtitle("Weather events in the U.S.\n from 1990 - 2011")

InjuriesPlot <- ggplot(summaryTotalINJURIES, aes(x = EVTYPE, y = TotalINJURIES)) +
        geom_col() +
        scale_y_continuous("Number of Injuries") +
        theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
        xlab("Storm Type") +
        ggtitle("Weather events in the U.S.\n from 1990 - 2011")

PropertyDamagePlot <- ggplot(summaryPropertyDamage, aes(x = EVTYPE, y = TotalPropertyDamage)) +
        geom_col() +
        scale_y_continuous("Property Damage (US dollars)") +
        theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
        xlab("Storm Type") +
        ggtitle("Weather events in the U.S.\n from 1990 - 2011")

CropDamagePlot <- ggplot(summaryCropDamage, aes(x = EVTYPE, y = TotalCropDamage)) +
        geom_col() +
        scale_y_continuous("Crop Damage (US dollars)") +
        theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
        xlab("Storm Type") +
        ggtitle("Weather events in the U.S.\n from 1990 - 2011")

Impact on Public Health

To assess the impact on public health, we plot fatalities and injuries by event type side by side and compare the events.

grid.arrange(fatalitiesPlot,InjuriesPlot,ncol=2,widths=c(1,1))

Excessive heat and tornadoes cause the most fatalities (1,903 and 1,752 respectively), while tornadoes cause by far the most injuries (26,674), followed by floods and excessive heat.

Impact on Economy

To assess the economic impact, we compare the total property damage and crop damage caused by each event type.

grid.arrange(PropertyDamagePlot,CropDamagePlot,ncol=2,widths=c(1,1))

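Floods and hurricanes cause the most property damage (about $144.7 billion for FLOOD and $69.3 billion for HURRICANE/TYPHOON), while drought and floods cause the most crop damage (about $14.0 billion for DROUGHT and $5.7 billion for FLOOD).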

Results

summaryCropDamage
##                EVTYPE TotalCropDamage
## 1             DROUGHT     13972566000
## 2               FLOOD      5661968450
## 3         RIVER FLOOD      5029459000
## 4           ICE STORM      5022113500
## 5                HAIL      3025537890
## 6           HURRICANE      2741910000
## 7   HURRICANE/TYPHOON      2607872800
## 8         FLASH FLOOD      1421317100
## 9        EXTREME COLD      1292973000
## 10       FROST/FREEZE      1094086000
## 11         HEAVY RAIN       733399800
## 12     TROPICAL STORM       678346000
## 13          HIGH WIND       638571300
## 14          TSTM WIND       554007350
## 15     EXCESSIVE HEAT       492402000
## 16             FREEZE       446225000
## 17            TORNADO       414953270
## 18  THUNDERSTORM WIND       414843050
## 19               HEAT       401461500
## 20           WILDFIRE       295472800
## 21    DAMAGING FREEZE       262100000
## 22 THUNDERSTORM WINDS       190650792
## 23  EXCESSIVE WETNESS       142000000
## 24     HURRICANE ERIN       136010000
## 25         HEAVY SNOW       134653100
summaryPropertyDamage
##                        EVTYPE TotalPropertyDamage
## 1                       FLOOD        144657709807
## 2           HURRICANE/TYPHOON         69305840000
## 3                 STORM SURGE         43323536000
## 4                     TORNADO         30447015620
## 5                 FLASH FLOOD         16140812067
## 6                        HAIL         15727367548
## 7                   HURRICANE         11868319010
## 8              TROPICAL STORM          7703890550
## 9                WINTER STORM          6688497251
## 10                  HIGH WIND          5270046295
## 11                RIVER FLOOD          5118945500
## 12                   WILDFIRE          4765114000
## 13           STORM SURGE/TIDE          4641188000
## 14                  TSTM WIND          4484928495
## 15                  ICE STORM          3944927860
## 16          THUNDERSTORM WIND          3483121284
## 17             HURRICANE OPAL          3152846020
## 18           WILD/FOREST FIRE          3001829500
## 19  HEAVY RAIN/SEVERE WEATHER          2500000000
## 20         THUNDERSTORM WINDS          1733461006
## 21 TORNADOES, TSTM WIND, HAIL          1600000000
## 22        SEVERE THUNDERSTORM          1205360000
## 23                    DROUGHT          1046106000
## 24                 HEAVY SNOW           932589142
## 25                  LIGHTNING           928659447
summaryFatalities
##               EVTYPE TotalFATALITIES
## 1     EXCESSIVE HEAT            1903
## 2            TORNADO            1752
## 3        FLASH FLOOD             978
## 4               HEAT             937
## 5          LIGHTNING             816
## 6              FLOOD             470
## 7        RIP CURRENT             368
## 8          TSTM WIND             327
## 9          HIGH WIND             248
## 10         AVALANCHE             224
## 11      WINTER STORM             206
## 12      RIP CURRENTS             204
## 13         HEAT WAVE             172
## 14      EXTREME COLD             160
## 15 THUNDERSTORM WIND             133
summaryTotalINJURIES
##               EVTYPE TotalINJURIES
## 1            TORNADO         26674
## 2              FLOOD          6789
## 3     EXCESSIVE HEAT          6525
## 4          LIGHTNING          5230
## 5          TSTM WIND          5022
## 6               HEAT          2100
## 7          ICE STORM          1975
## 8        FLASH FLOOD          1777
## 9  THUNDERSTORM WIND          1488
## 10      WINTER STORM          1321
## 11 HURRICANE/TYPHOON          1275
## 12              HAIL          1139
## 13         HIGH WIND          1137
## 14        HEAVY SNOW          1021
## 15          WILDFIRE           911
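The dollar totals in the tables above are easier to compare when expressed in billions. A minimal sketch, using four totals copied from those tables (the vector names are labels chosen for this example):

```r
# Totals copied from the Results tables, in US dollars
damage <- c(FLOOD_property = 144657709807,
            HURRICANE_TYPHOON_property = 69305840000,
            DROUGHT_crop = 13972566000,
            FLOOD_crop = 5661968450)

# Divide by 1e9 and round to one decimal place
billions <- round(damage / 1e9, 1)
print(billions)
```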

Conclusion

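For events recorded from 1990 onward, floods cause the most damage to property and drought causes the most damage to crops, while excessive heat and tornadoes cause the most harm to public health. Because similar event types were not merged (for example TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS), the totals for some kinds of events may be split across several categories.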