R version 4.0.0 (2020-04-24)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.0.0 tools_4.0.0
The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The file was downladed from the couse website: Storm Data(47Mb).
There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.
- National Weather Service Storm Data Documentation
- National Climatic Data Center Storm Events FAQ
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
This sections seeks to describe (in words and code) how the data were loaded into R and processed for analysis.
library("dplyr")
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library("ggplot2")
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
library(grid)
file_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
file_path <- "stormdata.csv.bz2"
download.file(file_url, file_path, method = "curl")
stormdata <- read.csv(file_path, stringsAsFactors = FALSE)
View a snapshot of the first rows and columns of the dataset, the summary of the dataset and the structure of the dataset
#Snapshot of the first rows and columns of dataset
head(stormdata)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 0 NA
## 2 0 0 NA
## 3 0 0 NA
## 4 0 0 NA
## 5 0 0 NA
## 6 0 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 14.0 100 3 0 0 15 25.0
## 2 0 2.0 150 2 0 0 0 2.5
## 3 0 0.1 123 2 0 0 2 25.0
## 4 0 0.0 100 2 0 0 2 2.5
## 5 0 0.0 150 2 0 0 2 2.5
## 6 0 1.5 177 2 0 0 6 2.5
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 3040 8812
## 2 K 0 3042 8755
## 3 K 0 3340 8742
## 4 K 0 3458 8626
## 5 K 0 3412 8642
## 6 K 0 3450 8748
## LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3051 8806 1
## 2 0 0 2
## 3 0 0 3
## 4 0 0 4
## 5 0 0 5
## 6 0 0 6
#Summary of the dataset
summary(stormdata)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE
## Min. : 1.0 Length:902297 Length:902297 Length:902297
## 1st Qu.:19.0 Class :character Class :character Class :character
## Median :30.0 Mode :character Mode :character Mode :character
## Mean :31.2
## 3rd Qu.:45.0
## Max. :95.0
##
## COUNTY COUNTYNAME STATE EVTYPE
## Min. : 0.0 Length:902297 Length:902297 Length:902297
## 1st Qu.: 31.0 Class :character Class :character Class :character
## Median : 75.0 Mode :character Mode :character Mode :character
## Mean :100.6
## 3rd Qu.:131.0
## Max. :873.0
##
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE
## Min. : 0.000 Length:902297 Length:902297 Length:902297
## 1st Qu.: 0.000 Class :character Class :character Class :character
## Median : 0.000 Mode :character Mode :character Mode :character
## Mean : 1.484
## 3rd Qu.: 1.000
## Max. :3749.000
##
## END_TIME COUNTY_END COUNTYENDN END_RANGE
## Length:902297 Min. :0 Mode:logical Min. : 0.0000
## Class :character 1st Qu.:0 NA's:902297 1st Qu.: 0.0000
## Mode :character Median :0 Median : 0.0000
## Mean :0 Mean : 0.9862
## 3rd Qu.:0 3rd Qu.: 0.0000
## Max. :0 Max. :925.0000
##
## END_AZI END_LOCATI LENGTH WIDTH
## Length:902297 Length:902297 Min. : 0.0000 Min. : 0.000
## Class :character Class :character 1st Qu.: 0.0000 1st Qu.: 0.000
## Mode :character Mode :character Median : 0.0000 Median : 0.000
## Mean : 0.2301 Mean : 7.503
## 3rd Qu.: 0.0000 3rd Qu.: 0.000
## Max. :2315.0000 Max. :4400.000
##
## F MAG FATALITIES INJURIES
## Min. :0.0 Min. : 0.0 Min. : 0.0000 Min. : 0.0000
## 1st Qu.:0.0 1st Qu.: 0.0 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median :1.0 Median : 50.0 Median : 0.0000 Median : 0.0000
## Mean :0.9 Mean : 46.9 Mean : 0.0168 Mean : 0.1557
## 3rd Qu.:1.0 3rd Qu.: 75.0 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Max. :5.0 Max. :22000.0 Max. :583.0000 Max. :1700.0000
## NA's :843563
## PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## Min. : 0.00 Length:902297 Min. : 0.000 Length:902297
## 1st Qu.: 0.00 Class :character 1st Qu.: 0.000 Class :character
## Median : 0.00 Mode :character Median : 0.000 Mode :character
## Mean : 12.06 Mean : 1.527
## 3rd Qu.: 0.50 3rd Qu.: 0.000
## Max. :5000.00 Max. :990.000
##
## WFO STATEOFFIC ZONENAMES LATITUDE
## Length:902297 Length:902297 Length:902297 Min. : 0
## Class :character Class :character Class :character 1st Qu.:2802
## Mode :character Mode :character Mode :character Median :3540
## Mean :2875
## 3rd Qu.:4019
## Max. :9706
## NA's :47
## LONGITUDE LATITUDE_E LONGITUDE_ REMARKS
## Min. :-14451 Min. : 0 Min. :-14455 Length:902297
## 1st Qu.: 7247 1st Qu.: 0 1st Qu.: 0 Class :character
## Median : 8707 Median : 0 Median : 0 Mode :character
## Mean : 6940 Mean :1452 Mean : 3509
## 3rd Qu.: 9605 3rd Qu.:3549 3rd Qu.: 8735
## Max. : 17124 Max. :9706 Max. :106220
## NA's :40
## REFNUM
## Min. : 1
## 1st Qu.:225575
## Median :451149
## Mean :451149
## 3rd Qu.:676723
## Max. :902297
##
#Structure of the dataset
str(stormdata)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
Create a subset of the data relevant to the population health and economic consequences of severe weather events for analysis
substormdata <- stormdata[,c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
According to the data, popluation health is affected by injuries and fatalities. These are first summarized in decending order and variables are created for each.
pophealth <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = substormdata, FUN=sum)
fatal <- pophealth[order(pophealth$FATALITIES, decreasing = T), ]
injury <- pophealth[order(pophealth$INJURIES, decreasing = T), ]
Create a function that converts the letter value of exponential values (h = hundred, k = thousand, m = million, b = billion) into numbers.
ExponentValue <- function(e) {
if (e %in% c("h", "H"))
return(2)
else if (e %in% c("k", "K"))
return(3)
else if (e %in% c("m", "M"))
return(6)
else if (e %in% c("b", "B"))
return(9)
else if (!is.na(as.numeric(e)))
return(as.numeric(e))
else if (e %in% c("", "-", "?", "+"))
return(0)
else {
stop("Invalid value.")
}
}
Use the function created previously to calculate the property damage and crop damage. Ignore warnings for NAs introduced by coercion.
PropertyExpVal <- sapply(substormdata$PROPDMGEXP, FUN=ExponentValue)
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
substormdata$PropertyDamage <- substormdata$PROPDMG * (10 ** PropertyExpVal)
CropExpVal <- sapply(substormdata$CROPDMGEXP, FUN=ExponentValue)
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
substormdata$CropDamage <- substormdata$CROPDMG * (10 ** CropExpVal)
Summarize the financial damage for property damage and crop damage based on the event type
EconomicDamage <- aggregate(cbind(PropertyDamage, CropDamage) ~ EVTYPE, data = substormdata, FUN=sum)
Remove events that have no financial impact
EconomicDamage <- EconomicDamage[(EconomicDamage$PropertyDamage > 0 | EconomicDamage$CropDamage > 0), ]
Sort the data in a decreasing order to be able to better identify the values that has most financial impact
PropertyDamageSorted <- EconomicDamage[order(EconomicDamage$PropertyDamage, decreasing = T), ]
CropDamageSorted <- EconomicDamage[order(EconomicDamage$CropDamage, decreasing = T), ]
This section presents the results to the analysis conducted
As required, there are no more than three figures presented, however figures are allowed to have multiple plots in them (ie panel plots)
The analysis is published on RPubs.com
Display the top 10 weather events causing injuries.
head(injury[, c("EVTYPE", "INJURIES")],10)
## EVTYPE INJURIES
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
## 275 HEAT 2100
## 427 ICE STORM 1975
## 153 FLASH FLOOD 1777
## 760 THUNDERSTORM WIND 1488
## 244 HAIL 1361
Display the top 10 events causing fatalitites.
head(fatal[, c("EVTYPE", "FATALITIES")],10)
## EVTYPE FATALITIES
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
## 856 TSTM WIND 504
## 170 FLOOD 470
## 585 RIP CURRENT 368
## 359 HIGH WIND 248
## 19 AVALANCHE 224
Create plots of the top 10 weather realated injuries and fatalaties in one grid for comparison.
library(ggplot2)
library(gridExtra)
library(grid)
p1 <- ggplot(data=head(injury,10), aes(x=reorder(EVTYPE, INJURIES), y=INJURIES)) +
geom_bar(fill= "red",stat="identity") +
coord_flip(xlim = NULL, ylim = NULL, expand = TRUE, clip = "on") +
ylab("Top 10 Weather Related Injuries") +
xlab("Event type") +
ggtitle("Injuries versus Fatalities casued by Weather Events") +
theme_light()
p2 <- ggplot(data=head(fatal,10), aes(x=reorder(EVTYPE, FATALITIES), y=FATALITIES, add = TRUE)) +
geom_bar(fill= "blue",stat="identity") +
coord_flip(xlim = NULL, ylim = NULL, expand = TRUE, clip = "on") +
ylab("Top 10 Weather Related Fatalities") +
xlab("Event type") +
theme_light()
grid.arrange(p1, p2, nrow=2)
Summary of Analysis - Polulation Health
According to the results of the data processing, Tornado was the number one event type to cause weather related injuries and fatalities. Other events common to both injuries and fatalities within the top 10 included Flood, Excessive Heat, TSTM Wind, Heat, Lightning and Flash Flood.
Display the top 10 weather events causing financial damage to property.
head(PropertyDamageSorted[, c("EVTYPE", "PropertyDamage")], 10)
## EVTYPE PropertyDamage
## 170 FLOOD 144657709807
## 411 HURRICANE/TYPHOON 69305840000
## 834 TORNADO 56947380677
## 670 STORM SURGE 43323536000
## 153 FLASH FLOOD 16822673979
## 244 HAIL 15735267513
## 402 HURRICANE 11868319010
## 848 TROPICAL STORM 7703890550
## 972 WINTER STORM 6688497251
## 359 HIGH WIND 5270046295
Display the top 10 weather events causing financial damage to property.
head(CropDamageSorted[, c("EVTYPE", "CropDamage")], 10)
## EVTYPE CropDamage
## 95 DROUGHT 13972566000
## 170 FLOOD 5661968450
## 590 RIVER FLOOD 5029459000
## 427 ICE STORM 5022113500
## 244 HAIL 3025954473
## 402 HURRICANE 2741910000
## 411 HURRICANE/TYPHOON 2607872800
## 153 FLASH FLOOD 1421317100
## 140 EXTREME COLD 1292973000
## 212 FROST/FREEZE 1094086000
Create plots of the top 10 weather realated injuries and fatalities in one grid for comparison.
p1 <- ggplot(data=head(PropertyDamageSorted,10), aes(x=reorder(EVTYPE, PropertyDamage), y= PropertyDamage/1000000000, fill=PropertyDamage)) +
geom_bar(fill="orange", stat="identity") +
coord_flip() +
xlab("Event type") +
ylab("Financial Value of damages to Property in Billion Dollars") +
ggtitle("Property versus Crop Damage casued by Weather Events") +
theme_light()
p2 <- ggplot(data=head(CropDamageSorted,10), aes(x=reorder(EVTYPE, CropDamage), y= CropDamage/1000000000, fill=CropDamage)) +
geom_bar(fill="green", stat="identity") +
coord_flip() +
xlab("Event type") +
ylab("Financial Value of damages to Crops in Billion Dollars") +
theme_light()
grid.arrange(p1, p2, ncol=1, nrow =2)
Summary of Analysis - Economic Consequences
According to the results of the data processing, the event ‘Flood’ accounted for the most costly property damage while it was the second costly for damages to crops. The most costly damages to crops was caused by ‘Drought’.
Overall, Tornadoes accounted for the highest costs in relation to injuries and fatalities caused by weather events while drought, Hurricane/Typhoon and flood accounted for the highest costs related to crop and property damages.