The United States is affected by a variety of severe weather events. This analysis estimates the impact of those events using official data collected by the National Oceanic and Atmospheric Administration (NOAA) from 1950 to 2011. It explores the weather events that affected various parts of the United States. The results are intended as input for government or municipal managers who may be responsible for preparing for severe weather events.
Here we read the data file and prepare the data for analysis.
if (!exists("stormData")) {
# ignore blank lines, if any
stormData <- read.csv(bzfile("repdata-data-StormData.csv.bz2"), header = TRUE, blank.lines.skip = TRUE)
}
head(stormData)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
To validate damages year by year, the dates must be converted to Date format. We also create a subset of the storm data that keeps only the rows in which FATALITIES, INJURIES, PROPDMG and CROPDMG are all greater than zero.
# keep only rows where all four measures are greater than zero
cleanStormData <- stormData[stormData$FATALITIES > 0 & stormData$INJURIES > 0 &
                            stormData$PROPDMG > 0 & stormData$CROPDMG > 0, ]
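The date conversion mentioned above does not appear in the code. A minimal sketch of one way to do it, assuming BGN_DATE holds strings such as "4/18/1950 0:00:00" as in the head() output; the toy data frame `demo` and the column name BGN_YEAR are hypothetical and used only for illustration:

```r
# Toy rows mimicking the BGN_DATE format seen in head(stormData)
demo <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "11/15/1951 0:00:00"),
  stringsAsFactors = FALSE
)
# Parse month/day/year plus the time part into a Date
demo$BGN_DATE <- as.Date(demo$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")
# Hypothetical helper column: calendar year of each event
demo$BGN_YEAR <- as.integer(format(demo$BGN_DATE, "%Y"))
print(demo)
```

With a year column in place, the same aggregate() calls used later could group by year instead of, or in addition to, EVTYPE.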
To identify which event types are most harmful to population health, we concentrate on FATALITIES and INJURIES, since these two columns feed the population-health analysis.
Aggregate FATALITIES by EVTYPE and sort by FATALITIES (descending) to list the events that are most harmful with respect to population health.
# aggregate FATALITIES by event type
aggreateFacilties <- aggregate(FATALITIES ~ EVTYPE, cleanStormData, FUN = sum)
# sort FATALITIES in descending order
aggreateFacilties <- aggreateFacilties[order(-aggreateFacilties$FATALITIES), ]
head(aggreateFacilties)
## EVTYPE FATALITIES
## 15 TORNADO 190
## 4 FLOOD 58
## 2 EXCESSIVE HEAT 46
## 19 TSUNAMI 32
## 20 WILDFIRE 31
## 3 FLASH FLOOD 23
Plot the five event types with the most FATALITIES.
library(ggplot2)
# use ggplot to generate a bar chart from the aggregated object
ggplot(aggreateFacilties[1:5, ], aes(EVTYPE, FATALITIES)) +
  geom_bar(stat = "identity") +
  labs(title = "Top events by fatalities (population health)", x = "Event Source", y = "Fatalities count")
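In the chart above, ggplot2 places the event types on the x axis in alphabetical order rather than by rank. A small sketch of one way to keep the bars in descending order, assuming a toy data frame `top` (a hypothetical name) filled with the five totals printed above; the plot is drawn only if ggplot2 is installed:

```r
# Toy totals standing in for aggreateFacilties[1:5, ]; values copied from the output above
top <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "EXCESSIVE HEAT", "TSUNAMI", "WILDFIRE"),
  FATALITIES = c(190, 58, 46, 32, 31),
  stringsAsFactors = FALSE
)
# Fix the factor levels in descending order of FATALITIES so the x axis follows the ranking
top$EVTYPE <- factor(top$EVTYPE, levels = top$EVTYPE[order(-top$FATALITIES)])
print(levels(top$EVTYPE))

# Draw the chart only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(top, aes(EVTYPE, FATALITIES)) +
    geom_bar(stat = "identity") +
    labs(x = "Event Source", y = "Fatalities count")
  print(p)
}
```

The same factor-level approach should carry over to the injuries, property and crop charts.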
Aggregate INJURIES by EVTYPE and sort by INJURIES (descending) to list the events that are most harmful with respect to population health.
# aggregate INJURIES by event type
aggreateInjuries <- aggregate(INJURIES ~ EVTYPE, cleanStormData, FUN = sum)
# sort INJURIES in descending order
aggreateInjuries <- aggreateInjuries[order(-aggreateInjuries$INJURIES), ]
head(aggreateInjuries)
## EVTYPE INJURIES
## 4 FLOOD 2495
## 15 TORNADO 1630
## 12 ICE STORM 1568
## 11 HURRICANE/TYPHOON 884
## 1 BLIZZARD 402
## 5 HEAT 320
Plot the five event types with the most INJURIES.
# use ggplot to generate a bar chart from the aggregated object
ggplot(aggreateInjuries[1:5, ], aes(EVTYPE, INJURIES)) +
  geom_bar(stat = "identity") +
  labs(title = "Top events by injuries (population health)", x = "Event Source", y = "Injuries count")
To identify which event types have the greatest economic consequences, we concentrate on PROPDMG and CROPDMG, since these two columns feed the economic analysis.
Aggregate PROPDMG by EVTYPE and sort by PROPDMG (descending) to list the events with the greatest economic consequences.
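The head() output shows a PROPDMGEXP column next to PROPDMG (and CROPDMGEXP next to CROPDMG), which this analysis does not use. A hedged sketch of how the two could be combined into dollar amounts, assuming the codes K, M and B stand for thousands, millions and billions; the `demo` data frame, its "M" row and the PROPDMG_USD column are hypothetical and only illustrate the idea:

```r
# Toy rows; the first two PROPDMG/PROPDMGEXP pairs follow head(stormData), the third is made up
demo <- data.frame(
  PROPDMG = c(25.0, 2.5, 1.2),
  PROPDMGEXP = c("K", "K", "M"),
  stringsAsFactors = FALSE
)
# Assumed multipliers: K = thousands, M = millions, B = billions
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)
# Look up each code; codes missing from the table (blank or anything else) fall back to 1
factorByRow <- unname(multiplier[toupper(demo$PROPDMGEXP)])
factorByRow[is.na(factorByRow)] <- 1
# Hypothetical column: property damage in dollars
demo$PROPDMG_USD <- demo$PROPDMG * factorByRow
print(demo)
```

Totals computed on the scaled column could rank events differently from totals on raw PROPDMG, so the two are not interchangeable.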
# aggregate PROPDMG by event type
aggreateProperty <- aggregate(PROPDMG ~ EVTYPE, cleanStormData, FUN = sum)
# sort PROPDMG in descending order
aggreateProperty <- aggreateProperty[order(-aggreateProperty$PROPDMG), ]
head(aggreateProperty)
## EVTYPE PROPDMG
## 15 TORNADO 6888.06
## 8 HIGH WIND 1747.89
## 13 THUNDERSTORM WIND 800.00
## 14 THUNDERSTORM WINDS 755.00
## 4 FLOOD 720.50
## 16 TROPICAL STORM 678.47
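The output above lists THUNDERSTORM WIND and THUNDERSTORM WINDS as separate event types, which splits their totals. A minimal sketch of one way to merge such near-duplicate labels before aggregating, assuming that trimming whitespace, upper-casing and collapsing a trailing "WINDS" to "WIND" is acceptable; the `demo` data frame and the EVTYPE_CLEAN column are hypothetical:

```r
# Toy rows using labels and totals copied from the output above
demo <- data.frame(
  EVTYPE = c("THUNDERSTORM WIND", "THUNDERSTORM WINDS", "HIGH WIND"),
  PROPDMG = c(800.00, 755.00, 1747.89),
  stringsAsFactors = FALSE
)
# Normalise case and surrounding whitespace, then collapse the plural form
clean <- toupper(trimws(demo$EVTYPE))
clean <- sub("WINDS$", "WIND", clean)
demo$EVTYPE_CLEAN <- clean
# Aggregate on the cleaned label instead of the raw one
merged <- aggregate(PROPDMG ~ EVTYPE_CLEAN, demo, FUN = sum)
print(merged)
```

Real EVTYPE values contain many more variants than this rule covers, so any cleaning rule should be checked against the full list of labels.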
Plot the five event types with the highest PROPDMG totals.
# use ggplot to generate a bar chart from the aggregated object
ggplot(aggreateProperty[1:5, ], aes(EVTYPE, PROPDMG)) +
  geom_bar(stat = "identity") +
  labs(title = "Top events by property damage (economic consequences)", x = "Event Source", y = "Property damage total")
Aggregate CROPDMG by EVTYPE and sort by CROPDMG (descending) to list the events with the greatest economic consequences.
# aggregate CROPDMG by event type
aggreateCrop <- aggregate(CROPDMG ~ EVTYPE, cleanStormData, FUN = sum)
# sort CROPDMG in descending order
aggreateCrop <- aggreateCrop[order(-aggreateCrop$CROPDMG), ]
head(aggreateCrop)
## EVTYPE CROPDMG
## 15 TORNADO 5313.3
## 4 FLOOD 1746.5
## 3 FLASH FLOOD 1686.5
## 13 THUNDERSTORM WIND 600.0
## 9 HIGH WINDS 500.0
## 17 TROPICAL STORM GORDON 500.0
Plot the five event types with the highest CROPDMG totals.
# use ggplot to generate a bar chart from the aggregated object
ggplot(aggreateCrop[1:5, ], aes(EVTYPE, CROPDMG)) +
  geom_bar(stat = "identity") +
  labs(title = "Top events by crop damage (economic consequences)", x = "Event Source", y = "Crop damage total")
# rm(stormData, cleanStormData, aggreateFacilties, aggreateInjuries, aggreateProperty, aggreateCrop)
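The four aggregate-and-sort steps above follow the same pattern, so they could be folded into one function. A sketch under that assumption; `topEvents` and the toy `demo` data frame are hypothetical names and not part of the original analysis:

```r
# Hypothetical helper: sum one numeric column per EVTYPE and return the top n rows
topEvents <- function(data, column, n = 5) {
  totals <- aggregate(data[[column]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- column                            # restore the measure's name
  totals <- totals[order(-totals[[column]]), ]          # descending by total
  head(totals, n)
}

# Toy data to show the call; the values are made up
demo <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(3, 1, 2, 4),
  stringsAsFactors = FALSE
)
result <- topEvents(demo, "FATALITIES", n = 2)
print(result)
```

Called as `topEvents(cleanStormData, "INJURIES")`, it would be expected to produce the same kind of table as the injuries step, limited to five rows.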