Mahmoud Shaaban
Here we report on the effect of storms and other severe weather events on population health and the economy across the USA. The data used here were collected by NOAA. A few transformations were applied to the original data set, mainly to group the data by event type and then to show the total effects by state and county.
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The data were first read from the provided original file, which has a .csv.bz2 extension. No further transformation was done on the data at this step. The data set is stored in the dat object for further analysis, and the code below explores the dimensions and structure of dat.
# reading data
dat <- read.csv("repdata-data-StormData.csv.bz2")
dim(dat)
## [1] 902297 37
str(dat)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "10/10/1954 0:00:00",..: 6523 6523 4213 11116 1426 1426 1462 2873 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "000","0000","00:00:00 AM",..: 212 257 2645 1563 2524 3126 122 1563 3126 3126 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels "?","ABNORMALLY DRY",..: 830 830 830 830 830 830 830 830 830 830 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels "","E","Eas","EE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels "","?","(01R)AFB GNRY RNG AL",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","10/10/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels "","?","0000",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels "","(0E4)PAYSON ARPT",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels "","2","43","9V9",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436781 levels ""," "," "," ",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
To summarize the effect of the different weather events on population health, a few transformations were done using the dplyr package. First, we summed the injuries and fatalities by event type and reported the ten event types with the largest effect on population health, ranked by total injuries with total fatalities shown alongside. Then we show the same totals for each state and county.
The following chunk of code sums injuries and fatalities by event type and reports the 10 event types with the largest impact on population health across the USA.
# summary of injuries and fatalities with different types of events
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
event <- dat %>%
group_by(EVTYPE) %>%
summarize(injuries = sum(INJURIES), fatalities = sum(FATALITIES))
topinj <- arrange(event, desc(injuries))
head(topinj, 10) # top ten event types by total injuries
## Source: local data frame [10 x 3]
##
## EVTYPE injuries fatalities
## (fctr) (dbl) (dbl)
## 1 TORNADO 91346 5633
## 2 TSTM WIND 6957 504
## 3 FLOOD 6789 470
## 4 EXCESSIVE HEAT 6525 1903
## 5 LIGHTNING 5230 816
## 6 HEAT 2100 937
## 7 ICE STORM 1975 89
## 8 FLASH FLOOD 1777 978
## 9 THUNDERSTORM WIND 1488 133
## 10 HAIL 1361 15
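The table above lists TSTM WIND and THUNDERSTORM WIND as separate event types, so the same kind of event is split across several rows. The sketch below shows one possible way to fold such spellings together before summing. It uses base R on a small made-up data frame, and the helper clean_evtype and its two patterns are hypothetical illustrations, not part of the original analysis or a complete cleaning scheme.

```r
# Toy rows standing in for a few records of dat; the numbers are made up.
toy <- data.frame(
  EVTYPE   = c("TSTM WIND", "THUNDERSTORM WIND", "Thunderstorm Winds",
               " TORNADO", "HAIL"),
  INJURIES = c(5, 3, 2, 12, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: upper-case, trim whitespace, and map the
# thunderstorm-wind spellings onto a single label.
clean_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  x <- sub("^TSTM WINDS?$", "THUNDERSTORM WIND", x)
  x <- sub("^THUNDERSTORM WINDS$", "THUNDERSTORM WIND", x)
  x
}

toy$EVTYPE_CLEAN <- clean_evtype(toy$EVTYPE)

# Sum injuries per cleaned event type and sort from largest to smallest.
injuries_by_type <- aggregate(INJURIES ~ EVTYPE_CLEAN, data = toy, FUN = sum)
injuries_by_type <- injuries_by_type[order(-injuries_by_type$INJURIES), ]
print(injuries_by_type)
```

Applied to the full data set, a step like this would probably change the ranking below TORNADO, since the thunderstorm wind rows would be combined. The real EVTYPE column has 985 levels and would need more patterns than the two shown here.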
The next chunk of code sums injuries and fatalities by state and event type, then reports the total injuries and fatalities for each state in two bar graphs. The top graph shows total injuries for each state over the whole period in which the data were collected. The bottom graph shows the total number of fatalities for each state over the same period.
# summary of injuries and fatalities across the states caused by different events
states <- dat %>%
group_by(STATE, EVTYPE) %>%
summarize(fatalities = sum(FATALITIES),
injuries = sum(INJURIES))
totals <- states %>% group_by(STATE) %>% summarize(injuries = sum(injuries), fatalities = sum(fatalities))
totals$total <- totals$injuries + totals$fatalities
par(mfrow=c(2,1), mar = c(2,2,5,2), cex.axis = 0.5)
# main must be an argument of barplot(), not of with(), for the title to be drawn
with(totals,
     barplot(injuries,
             names.arg = STATE,
             col = "red",
             main = "Number of injuries per state"))
with(totals,
     barplot(fatalities,
             names.arg = STATE,
             col = "blue",
             main = "Number of fatalities per state"))
The third chunk of code sums injuries and fatalities by county and event type, stores the result in the object counties, and then prints the first few rows of that object.
# summary of injuries and fatalities across the counties caused by different events
counties <- dat %>%
group_by(COUNTYNAME, EVTYPE) %>%
summarize(fatalities = sum(FATALITIES),
injuries = sum(INJURIES))
head(counties)
## Source: local data frame [6 x 4]
## Groups: COUNTYNAME [1]
##
## COUNTYNAME EVTYPE fatalities injuries
## (fctr) (fctr) (dbl) (dbl)
## 1 Coastal Flood 0 0
## 2 Coastal Flooding 0 0
## 3 FLASH FLOOD 0 0
## 4 Funnel Cloud 0 0
## 5 FUNNEL CLOUD 0 0
## 6 HAIL 0 0
Similar transformations using the dplyr package were done to summarize the effects of severe weather events on the economy across the USA. The data were first grouped by event type, and damage was reported in terms of property and crop damage. The event types with the largest effect are reported first, followed by a summary of the effect on each state and county. The totals sum the raw PROPDMG and CROPDMG values and do not apply the PROPDMGEXP and CROPDMGEXP columns.
The following chunk of code sums property and crop damage by event type and reports the 10 event types with the largest impact on the economy of the USA, ranked by property damage.
# summary of property and crop damage with different types of events
library(dplyr)
eventdmg <- dat %>%
group_by(EVTYPE) %>%
summarize(property = sum(PROPDMG), crop = sum(CROPDMG))
topdmg <- arrange(eventdmg, desc(property))
head(topdmg, 10)
## Source: local data frame [10 x 3]
##
## EVTYPE property crop
## (fctr) (dbl) (dbl)
## 1 TORNADO 3212258.2 100018.52
## 2 FLASH FLOOD 1420124.6 179200.46
## 3 TSTM WIND 1335965.6 109202.60
## 4 FLOOD 899938.5 168037.88
## 5 THUNDERSTORM WIND 876844.2 66791.45
## 6 HAIL 688693.4 579596.28
## 7 LIGHTNING 603351.8 3580.61
## 8 THUNDERSTORM WINDS 446293.2 18684.93
## 9 HIGH WIND 324731.6 17283.21
## 10 WINTER STORM 132720.6 1978.99
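The totals above sum PROPDMG and CROPDMG as they are, while the str() output shows companion columns PROPDMGEXP and CROPDMGEXP. These are commonly read as magnitude codes for the damage figures. The sketch below shows how such codes could be applied before summing. It assumes that H, K, M and B mean hundreds, thousands, millions and billions, and treats every other code as a multiplier of 1. That mapping, the helper exp_multiplier, and the toy values are all assumptions for illustration.

```r
# Toy rows; the values are illustrative only.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 2.5, 1, 500),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: translate an exponent code into a multiplier.
# Assumed mapping; codes not listed (including "") fall back to 1.
exp_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Scale each damage figure to dollars, then sum per event type.
toy$property_usd <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
damage_by_type <- aggregate(property_usd ~ EVTYPE, data = toy, FUN = sum)
damage_by_type <- damage_by_type[order(-damage_by_type$property_usd), ]
print(damage_by_type)
```

In the toy data FLOOD ranks first once the codes are applied, although TORNADO has the largest raw PROPDMG among the rows that carry a code. The same kind of reordering may occur in the full data set.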
The next chunk of code sums the damage by state and event type, then reports the totals for each state in two bar graphs. The top graph shows total property damage for each state over the whole period in which the data were collected. The bottom graph shows total crop damage for each state over the same period.
# summary of property and crop damage across the states caused by different events
statesdmg <- dat %>%
group_by(STATE, EVTYPE) %>%
summarize(property = sum(PROPDMG),
crop = sum(CROPDMG))
totalsdmg <- statesdmg %>%
group_by(STATE) %>%
summarize(property = sum(property),
crop = sum(crop))
head(totalsdmg)
## Source: local data frame [6 x 3]
##
## STATE property crop
## (fctr) (dbl) (dbl)
## 1 AK 33995.51 205.00
## 2 AL 363606.66 9666.94
## 3 AM 5653.80 50.00
## 4 AN 294.00 0.00
## 5 AR 361121.58 25819.13
## 6 AS 2954.50 1564.00
totalsdmg$total <- totalsdmg$property + totalsdmg$crop
par(mfrow=c(2,1), mar = c(2,2,5,2), cex.axis = 0.5)
# main must be an argument of barplot(), not of with(), for the title to be drawn
with(totalsdmg,
     barplot(property,
             names.arg = STATE,
             col = "red",
             main = "Amount of property damage per state"))
with(totalsdmg,
     barplot(crop,
             names.arg = STATE,
             col = "blue",
             main = "Amount of crop damage per state"))
Lastly, a chunk of code sums property and crop damage by county and event type, stores the result in the object countiesdmg, and then prints the first few rows after sorting by property damage.
# summary of property and crop damage across the counties caused by different events
countiesdmg <- dat %>%
group_by(COUNTYNAME, EVTYPE) %>%
summarize(property = sum(PROPDMG),
crop = sum(CROPDMG))
head(arrange(countiesdmg, desc(property)))
## Source: local data frame [6 x 4]
## Groups: COUNTYNAME [1]
##
## COUNTYNAME EVTYPE property crop
## (fctr) (fctr) (dbl) (dbl)
## 1 Coastal Flood 270 0
## 2 FLASH FLOOD 100 0
## 3 HEAVY RAIN 45 0
## 4 TORNADO 15 0
## 5 Coastal Flooding 5 0
## 6 Funnel Cloud 0 0
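The output above still reports Groups: COUNTYNAME [1], and every row shown has an empty county name. This suggests the sort was applied inside each county group and not across the whole table. The sketch below drops the grouping with ungroup() before arrange(), assuming the intent was to rank county and event pairs globally. The toy data frame is made up for the example.

```r
library(dplyr)

# Toy rows; county names and values are illustrative only.
toy <- data.frame(
  COUNTYNAME = c("A", "A", "B", "B"),
  EVTYPE     = c("FLOOD", "HAIL", "FLOOD", "TORNADO"),
  PROPDMG    = c(10, 5, 300, 40),
  stringsAsFactors = FALSE
)

ranked <- toy %>%
  group_by(COUNTYNAME, EVTYPE) %>%
  summarize(property = sum(PROPDMG)) %>%
  ungroup() %>%            # drop the remaining COUNTYNAME grouping
  arrange(desc(property))  # sort across all counties

head(ranked)
```

Recent dplyr versions ignore grouping in arrange() by default, so the explicit ungroup() mainly matters for older versions such as the one that produced the output above. It is harmless in either case.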