Here we get the data from the course website and the documentation from the National Weather Service website. Based on the documentation, we do an exploratory analysis to see which kinds of weather events cause the most damage, and show the top events affecting public health and the economy using four measures: property damage, crop damage, injuries, and fatalities (the top 15 for fatalities and injuries, the top 25 for property and crop damage). Each event is recorded with a start date and an end date. Using the start date, we can see that the National Weather Service recorded far more events from 1990 onwards. Since most of the data falls in that period, we restrict our analysis to events from 1990 onwards.
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
require(gridExtra)
## Loading required package: gridExtra
library(ggplot2)
if (!"stormData" %in% ls()) {
stormData <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))
}
dim(stormData)
## [1] 902297 37
colnames(stormData)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
The StormData dataset contains 902297 observations and 37 variables. Let us add a year attribute to see how many weather events were recorded each year and how far back the storm database goes.
stormData$year <- as.numeric(format(as.Date(stormData$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
hist(stormData$year,breaks=50)
According to the histogram, the storm database has records going back to 1950, but far more from 1990 onwards, so we focus on that period. Filter the data to keep events from 1990 onwards.
stormData <- stormData[stormData$year>=1990,]
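As a numeric cross-check on the histogram, the records per year can also be tabulated. The following is a minimal sketch on a small made-up data frame; the `toy` data frame and its dates are invented for illustration and do not come from the storm database.

```r
# Toy stand-in for stormData; the dates below are invented for illustration.
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "6/1/1989 0:00:00",
               "1/6/1996 0:00:00", "11/28/2011 0:00:00"),
  stringsAsFactors = FALSE
)

# Same year extraction as in the report: parse the date, keep the year.
toy$year <- as.numeric(format(as.Date(toy$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))

# Count of records per year.
yearCounts <- table(toy$year)

# Keep only records from 1990 onwards.
recent <- toy[toy$year >= 1990, ]
```

On the real data, `table(stormData$year)` would give the same counts that the histogram shows as bars.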
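Some EVTYPE values are not on the official list of event types. The filter below would restrict the analysis to the official list, but it is commented out and therefore not applied; the summaries that follow still include unofficial labels such as TSTM WIND and HURRICANE ERIN.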
#stormData<- filter(stormData, EVTYPE %in% c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood", "Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke", "Drought", "Dust Devil", "Dust Storm", "Excessive Heat", "Extreme Cold/Wind Chill", "Flash Flood", "Flood", "Freezing Fog", "Frost/Freeze", "Funnel Cloud", "Hail", "Heat", "Heavy Rain", "Heavy Snow", "High Surf", "High Wind", "Hurricane/Typhoon", "Ice Storm", "Lakeshore Flood", "Lake-Effect Snow", "Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet", "Storm Tide", "Strong Wind", "Thunderstorm Wind", "Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather") )
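The commented-out filter lists the official names in Title Case, while the EVTYPE values printed in the summaries are in upper case, so an exact `%in%` match would drop every row shown there. A minimal sketch of a case-insensitive match follows; `validEvents` is only a short excerpt of the list and `toy` is made-up data, both assumptions for illustration.

```r
# Assumption: a short excerpt of the official list, in Title Case as in the report.
validEvents <- c("Flood", "Tornado", "Excessive Heat", "Thunderstorm Wind")

# Toy EVTYPE values in the upper-case style seen in the summaries.
toy <- data.frame(
  EVTYPE = c("FLOOD", "TSTM WIND", "TORNADO", "HURRICANE ERIN"),
  stringsAsFactors = FALSE
)

# Compare in upper case on both sides so that "FLOOD" matches "Flood".
isValid <- toupper(toy$EVTYPE) %in% toupper(validEvents)
validToy <- toy[isValid, , drop = FALSE]
```

Even with this change, abbreviated or one-off labels such as TSTM WIND and HURRICANE ERIN would still be excluded unless they are first mapped to an official name.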
After exploring the data, we focus on the following variables for the analysis: EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP.
PROPDMGEXP and CROPDMGEXP give the unit of the corresponding damage amount. For our analysis we need all amounts on a single scale, so we convert billions (B), millions (M), thousands (K), and hundreds (H) into raw numbers by adding a few helper fields. Any other code is treated as an exponent of 0, so the amount is used as is.
# Start every row with exponent 0, so codes other than B, M, K, H leave the amount unchanged
stormData <- stormData %>%
  mutate(PropertyCostMeasure = 0, PropertyDamage = 0, CropCostMeasure = 0, CropDamage = 0)
# Map the property damage unit letter to a power of ten
stormData$PropertyCostMeasure[stormData$PROPDMGEXP == "B"] <- 9
stormData$PropertyCostMeasure[stormData$PROPDMGEXP == "M"] <- 6
stormData$PropertyCostMeasure[stormData$PROPDMGEXP == "K"] <- 3
stormData$PropertyCostMeasure[stormData$PROPDMGEXP == "H"] <- 2
stormData$PropertyDamage <- stormData$PROPDMG * 10^stormData$PropertyCostMeasure
# Same mapping for crop damage
stormData$CropCostMeasure[stormData$CROPDMGEXP == "B"] <- 9
stormData$CropCostMeasure[stormData$CROPDMGEXP == "M"] <- 6
stormData$CropCostMeasure[stormData$CROPDMGEXP == "K"] <- 3
stormData$CropCostMeasure[stormData$CROPDMGEXP == "H"] <- 2
stormData$CropDamage <- stormData$CROPDMG * 10^stormData$CropCostMeasure
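The letter-to-exponent conversion can also be written once as a lookup table and reused for both property and crop damage. This is a sketch only: `expLookup` and `toDollars` are hypothetical names that do not appear in the original report, and treating lower-case letters like their upper-case forms is an assumption (the code above matches upper-case letters only).

```r
# Named lookup vector: unit letter -> power of ten.
expLookup <- c(H = 2, K = 3, M = 6, B = 9)

# Hypothetical helper (not part of the original report).
toDollars <- function(value, expCode) {
  # Upper-case the code so "k" and "K" are treated alike (an assumption).
  power <- expLookup[toupper(as.character(expCode))]
  # Codes missing from the lookup (blank, digits, "?") give NA; use 10^0 for them.
  power[is.na(power)] <- 0
  unname(value * 10^power)
}

# Example call with made-up amounts and unit codes.
amounts <- toDollars(c(25, 2.5, 10, 3), c("K", "M", "", "B"))
```

On the real data this would be used as `toDollars(stormData$PROPDMG, stormData$PROPDMGEXP)`, and likewise for the crop columns.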
Let us aggregate the data by event type to see which events have the largest impact and what kind of impact it is.
SummaryStormData<- stormData %>%
group_by(EVTYPE) %>%
summarise(TotalPropertyDamage = sum(PropertyDamage,na.rm=TRUE),
TotalCropDamage = sum(CropDamage,na.rm=TRUE),
TotalFATALITIES = sum(FATALITIES,na.rm=TRUE),
TotalINJURIES = sum(INJURIES,na.rm=TRUE)
)
For each variable we are interested in, we create a summary data set with the top events affecting that variable: the top 15 for fatalities and injuries, and the top 25 for property and crop damage.

##Fatalities
summaryFatalities <- SummaryStormData %>%
select(EVTYPE, TotalFATALITIES)%>%
arrange(desc(TotalFATALITIES),EVTYPE)%>%
top_n(15)
## Selecting by TotalFATALITIES
summaryFatalities <- transform(summaryFatalities,
EVTYPE = reorder(EVTYPE,
order(TotalFATALITIES, decreasing = TRUE)))
summaryTotalINJURIES <- SummaryStormData %>%
select(EVTYPE, TotalINJURIES)%>%
arrange(desc(TotalINJURIES),EVTYPE)%>%
top_n(15)
## Selecting by TotalINJURIES
summaryTotalINJURIES <- transform(summaryTotalINJURIES,
EVTYPE = reorder(EVTYPE,
order(TotalINJURIES, decreasing = TRUE)))
summaryPropertyDamage <- SummaryStormData %>%
select(EVTYPE, TotalPropertyDamage)%>%
arrange(desc(TotalPropertyDamage),EVTYPE)%>%
top_n(25)
## Selecting by TotalPropertyDamage
summaryPropertyDamage <- transform(summaryPropertyDamage,
EVTYPE = reorder(EVTYPE,
order(TotalPropertyDamage, decreasing = TRUE)))
summaryCropDamage <- SummaryStormData %>%
select(EVTYPE, TotalCropDamage)%>%
arrange(desc(TotalCropDamage),EVTYPE)%>%
top_n(25)
## Selecting by TotalCropDamage
summaryCropDamage <- transform(summaryCropDamage,
EVTYPE = reorder(EVTYPE,
order(TotalCropDamage, decreasing = TRUE)))
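The calls to `top_n()` above do not name a ranking column, so dplyr ranks by the last column of each data set, as the "Selecting by" messages report. A minimal base R sketch that names the ranking column explicitly is shown below; `topBy` is a hypothetical helper, and the toy totals are copied from the fatalities summary in this report.

```r
# Hypothetical helper (not in the original report): first n rows of df
# after sorting by the named column in decreasing order.
topBy <- function(df, column, n) {
  ranked <- df[order(df[[column]], decreasing = TRUE), , drop = FALSE]
  head(ranked, n)
}

# A few totals copied from the fatalities summary in this report.
toy <- data.frame(
  EVTYPE = c("HEAT", "TORNADO", "LIGHTNING", "EXCESSIVE HEAT", "FLASH FLOOD"),
  TotalFATALITIES = c(937, 1752, 816, 1903, 978),
  stringsAsFactors = FALSE
)

top3 <- topBy(toy, "TotalFATALITIES", 3)
```

Unlike `top_n()`, which keeps all rows tied at the cutoff, `head()` returns exactly `n` rows, so results can differ when totals are tied.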
Let us plot these variables using the ggplot2 plotting system.
fatalitiesPlot <- qplot(EVTYPE, data = summaryFatalities , weight = TotalFATALITIES, geom = "bar", binwidth = 1,width=350) +
scale_y_continuous("Number of Fatalities") +
theme(axis.text.x = element_text(angle = 90,
hjust = 1)) + xlab("Storm Type") +
ggtitle("Weather events in the U.S.\n from 1990 - 2011")
InjuriesPlot <- qplot(EVTYPE, data = summaryTotalINJURIES , weight = TotalINJURIES, geom = "bar", binwidth = 1) +
scale_y_continuous("Number of Injuries") +
theme(axis.text.x = element_text(angle = 90,
hjust = 1)) + xlab("Storm Type") +
ggtitle("Weather events in the U.S.\n from 1990 - 2011")
PropertyDamagePlot <- qplot(EVTYPE, data = summaryPropertyDamage , weight = TotalPropertyDamage, geom = "bar", binwidth = 1) +
scale_y_continuous("Property Damage") +
theme(axis.text.x = element_text(angle = 90,
hjust = 1)) + xlab("Storm Type") +
ggtitle("Weather events in the U.S.\n from 1990 - 2011")
CropDamagePlot <- qplot(EVTYPE, data = summaryCropDamage , weight = TotalCropDamage, geom = "bar", binwidth = 1) +
scale_y_continuous("Crop Damage") +
theme(axis.text.x = element_text(angle = 90,
hjust = 1)) + xlab("Storm Type") +
ggtitle("Weather events in the U.S.\n from 1990 - 2011")
Draw the fatalities and injuries plots and compare the events.

##Impact on public health

Here we check the impact on public health by comparing the number of fatalities and injuries in the following diagram.
grid.arrange(fatalitiesPlot,InjuriesPlot,ncol=2,widths=c(1,1))
Based on the diagram above, excessive heat and tornadoes cause the most fatalities, while tornadoes and floods cause the most injuries.
Here we check the impact of these events on the economy by looking at the property damage and crop damage caused by these storm events.
grid.arrange(PropertyDamagePlot,CropDamagePlot,ncol=2,widths=c(1,1))
Based on the diagram above, floods and hurricanes cause the most property damage, while drought and floods cause the most crop damage.
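Property and crop damage are summarised separately in this report. To compare event types on overall economic cost, the two totals can be added. The sketch below does this for three event types whose totals are copied from the summary tables printed in this report; `TotalEconomicDamage` is a hypothetical column that is not part of the original analysis.

```r
# Totals copied from the property and crop damage summaries in this report.
damage <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE/TYPHOON", "DROUGHT"),
  TotalPropertyDamage = c(144657709807, 69305840000, 1046106000),
  TotalCropDamage = c(5661968450, 2607872800, 13972566000),
  stringsAsFactors = FALSE
)

# Hypothetical combined measure: property damage plus crop damage.
damage$TotalEconomicDamage <- damage$TotalPropertyDamage + damage$TotalCropDamage

# Sort so the costliest event type comes first.
combined <- damage[order(damage$TotalEconomicDamage, decreasing = TRUE), ]
```

For these three rows, property damage dominates the combined figure, so drought ranks below floods and hurricanes despite leading on crop damage.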
summaryCropDamage
## EVTYPE TotalCropDamage
## 1 DROUGHT 13972566000
## 2 FLOOD 5661968450
## 3 RIVER FLOOD 5029459000
## 4 ICE STORM 5022113500
## 5 HAIL 3025537890
## 6 HURRICANE 2741910000
## 7 HURRICANE/TYPHOON 2607872800
## 8 FLASH FLOOD 1421317100
## 9 EXTREME COLD 1292973000
## 10 FROST/FREEZE 1094086000
## 11 HEAVY RAIN 733399800
## 12 TROPICAL STORM 678346000
## 13 HIGH WIND 638571300
## 14 TSTM WIND 554007350
## 15 EXCESSIVE HEAT 492402000
## 16 FREEZE 446225000
## 17 TORNADO 414953270
## 18 THUNDERSTORM WIND 414843050
## 19 HEAT 401461500
## 20 WILDFIRE 295472800
## 21 DAMAGING FREEZE 262100000
## 22 THUNDERSTORM WINDS 190650792
## 23 EXCESSIVE WETNESS 142000000
## 24 HURRICANE ERIN 136010000
## 25 HEAVY SNOW 134653100
summaryPropertyDamage
## EVTYPE TotalPropertyDamage
## 1 FLOOD 144657709807
## 2 HURRICANE/TYPHOON 69305840000
## 3 STORM SURGE 43323536000
## 4 TORNADO 30447015620
## 5 FLASH FLOOD 16140812067
## 6 HAIL 15727367548
## 7 HURRICANE 11868319010
## 8 TROPICAL STORM 7703890550
## 9 WINTER STORM 6688497251
## 10 HIGH WIND 5270046295
## 11 RIVER FLOOD 5118945500
## 12 WILDFIRE 4765114000
## 13 STORM SURGE/TIDE 4641188000
## 14 TSTM WIND 4484928495
## 15 ICE STORM 3944927860
## 16 THUNDERSTORM WIND 3483121284
## 17 HURRICANE OPAL 3152846020
## 18 WILD/FOREST FIRE 3001829500
## 19 HEAVY RAIN/SEVERE WEATHER 2500000000
## 20 THUNDERSTORM WINDS 1733461006
## 21 TORNADOES, TSTM WIND, HAIL 1600000000
## 22 SEVERE THUNDERSTORM 1205360000
## 23 DROUGHT 1046106000
## 24 HEAVY SNOW 932589142
## 25 LIGHTNING 928659447
summaryFatalities
## EVTYPE TotalFATALITIES
## 1 EXCESSIVE HEAT 1903
## 2 TORNADO 1752
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 FLOOD 470
## 7 RIP CURRENT 368
## 8 TSTM WIND 327
## 9 HIGH WIND 248
## 10 AVALANCHE 224
## 11 WINTER STORM 206
## 12 RIP CURRENTS 204
## 13 HEAT WAVE 172
## 14 EXTREME COLD 160
## 15 THUNDERSTORM WIND 133
summaryTotalINJURIES
## EVTYPE TotalINJURIES
## 1 TORNADO 26674
## 2 FLOOD 6789
## 3 EXCESSIVE HEAT 6525
## 4 LIGHTNING 5230
## 5 TSTM WIND 5022
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 WINTER STORM 1321
## 11 HURRICANE/TYPHOON 1275
## 12 HAIL 1139
## 13 HIGH WIND 1137
## 14 HEAVY SNOW 1021
## 15 WILDFIRE 911
From the analysis, we conclude that floods cause the most property damage and drought the most crop damage, while excessive heat and tornadoes do the most harm to public health.