Storms and other severe weather events cause serious public health and economic problems. Analyzing records of these events helps identify where support and preparation are most needed to prevent such outcomes.
This project explores the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
First, clean the environment and set up the working directory:
rm(list = ls())
setwd("C:Files-Hopkins-week4")
#DATA PROCESSING
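Download the data file if it is not already present, then read it:
# Download only when the compressed file is missing from the working directory
if (!file.exists("StormData.csv.bz2")) {
  fileURL <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
  download.file(fileURL, destfile = 'StormData.csv.bz2', method = 'curl')
}
# bzfile() lets read.csv() read the bz2 archive without unpacking it first
NoaaData <- read.csv(bzfile('StormData.csv.bz2'), header = TRUE, stringsAsFactors = FALSE)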
Load the libraries for tidying and plotting:
require(dplyr)
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
require(lubridate)
## Loading required package: lubridate
## Warning: package 'lubridate' was built under R version 4.1.1
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
require(ggplot2)
## Loading required package: ggplot2
Summary of the data:
summary(NoaaData)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE
## Min. : 1.0 Length:902297 Length:902297 Length:902297
## 1st Qu.:19.0 Class :character Class :character Class :character
## Median :30.0 Mode :character Mode :character Mode :character
## Mean :31.2
## 3rd Qu.:45.0
## Max. :95.0
##
## COUNTY COUNTYNAME STATE EVTYPE
## Min. : 0.0 Length:902297 Length:902297 Length:902297
## 1st Qu.: 31.0 Class :character Class :character Class :character
## Median : 75.0 Mode :character Mode :character Mode :character
## Mean :100.6
## 3rd Qu.:131.0
## Max. :873.0
##
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE
## Min. : 0.000 Length:902297 Length:902297 Length:902297
## 1st Qu.: 0.000 Class :character Class :character Class :character
## Median : 0.000 Mode :character Mode :character Mode :character
## Mean : 1.484
## 3rd Qu.: 1.000
## Max. :3749.000
##
## END_TIME COUNTY_END COUNTYENDN END_RANGE
## Length:902297 Min. :0 Mode:logical Min. : 0.0000
## Class :character 1st Qu.:0 NA's:902297 1st Qu.: 0.0000
## Mode :character Median :0 Median : 0.0000
## Mean :0 Mean : 0.9862
## 3rd Qu.:0 3rd Qu.: 0.0000
## Max. :0 Max. :925.0000
##
## END_AZI END_LOCATI LENGTH WIDTH
## Length:902297 Length:902297 Min. : 0.0000 Min. : 0.000
## Class :character Class :character 1st Qu.: 0.0000 1st Qu.: 0.000
## Mode :character Mode :character Median : 0.0000 Median : 0.000
## Mean : 0.2301 Mean : 7.503
## 3rd Qu.: 0.0000 3rd Qu.: 0.000
## Max. :2315.0000 Max. :4400.000
##
## F MAG FATALITIES INJURIES
## Min. :0.0 Min. : 0.0 Min. : 0.0000 Min. : 0.0000
## 1st Qu.:0.0 1st Qu.: 0.0 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median :1.0 Median : 50.0 Median : 0.0000 Median : 0.0000
## Mean :0.9 Mean : 46.9 Mean : 0.0168 Mean : 0.1557
## 3rd Qu.:1.0 3rd Qu.: 75.0 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Max. :5.0 Max. :22000.0 Max. :583.0000 Max. :1700.0000
## NA's :843563
## PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## Min. : 0.00 Length:902297 Min. : 0.000 Length:902297
## 1st Qu.: 0.00 Class :character 1st Qu.: 0.000 Class :character
## Median : 0.00 Mode :character Median : 0.000 Mode :character
## Mean : 12.06 Mean : 1.527
## 3rd Qu.: 0.50 3rd Qu.: 0.000
## Max. :5000.00 Max. :990.000
##
## WFO STATEOFFIC ZONENAMES LATITUDE
## Length:902297 Length:902297 Length:902297 Min. : 0
## Class :character Class :character Class :character 1st Qu.:2802
## Mode :character Mode :character Mode :character Median :3540
## Mean :2875
## 3rd Qu.:4019
## Max. :9706
## NA's :47
## LONGITUDE LATITUDE_E LONGITUDE_ REMARKS
## Min. :-14451 Min. : 0 Min. :-14455 Length:902297
## 1st Qu.: 7247 1st Qu.: 0 1st Qu.: 0 Class :character
## Median : 8707 Median : 0 Median : 0 Mode :character
## Mean : 6940 Mean :1452 Mean : 3509
## 3rd Qu.: 9605 3rd Qu.:3549 3rd Qu.: 8735
## Max. : 17124 Max. :9706 Max. :106220
## NA's :40
## REFNUM
## Min. : 1
## 1st Qu.:225575
## Median :451149
## Mean :451149
## 3rd Qu.:676723
## Max. :902297
##
str(NoaaData)
#RESULTS
Which types of events are most harmful to population health?
The fatalities.
The following table lists the 20 event types with the highest total number of fatalities over the period covered by the database:
NoFatalities <- aggregate(NoaaData$FATALITIES, by = list(NoaaData$EVTYPE), "sum")
names(NoFatalities) <- c("Event", "Fatalities")
TotalFatalitiesSorted <- NoFatalities[order(-NoFatalities$Fatalities), ][1:20, ]
TotalFatalitiesSorted
## Event Fatalities
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
## 856 TSTM WIND 504
## 170 FLOOD 470
## 585 RIP CURRENT 368
## 359 HIGH WIND 248
## 19 AVALANCHE 224
## 972 WINTER STORM 206
## 586 RIP CURRENTS 204
## 278 HEAT WAVE 172
## 140 EXTREME COLD 160
## 760 THUNDERSTORM WIND 133
## 310 HEAVY SNOW 127
## 141 EXTREME COLD/WIND CHILL 125
## 676 STRONG WIND 103
## 30 BLIZZARD 101
## 350 HIGH SURF 101
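The ranking above follows a sum-then-sort pattern: total a column within each event type, order the totals in decreasing order, and keep the first rows. The sketch below repeats that pattern on a small made-up data frame so each step can be checked by hand. The `toy` data and its values are hypothetical and are not taken from the NOAA file.

```r
# Toy data: hypothetical event records, not taken from the NOAA file
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HEAT"),
  FATALITIES = c(3, 1, 2, 2, 0, 1),
  stringsAsFactors = FALSE
)

# Sum fatalities within each event type
totals <- aggregate(toy$FATALITIES, by = list(toy$EVTYPE), FUN = sum)
names(totals) <- c("Event", "Fatalities")

# Sort in decreasing order of the total and keep the top n rows (n = 2 here)
top <- head(totals[order(-totals$Fatalities), ], 2)
top
```

For data without missing values, the formula form `aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)` gives the same totals and keeps the original column names, which removes the need for the `names()` step.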
The injuries.
The following table lists the 20 event types with the highest total number of injuries over the period covered by the database:
NoInjuries <- aggregate(NoaaData$INJURIES, by = list(NoaaData$EVTYPE), "sum")
names(NoInjuries) <- c("Event", "Injuries")
TotalInjuriesSorted <- NoInjuries[order(-NoInjuries$Injuries), ][1:20, ]
TotalInjuriesSorted
## Event Injuries
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
## 275 HEAT 2100
## 427 ICE STORM 1975
## 153 FLASH FLOOD 1777
## 760 THUNDERSTORM WIND 1488
## 244 HAIL 1361
## 972 WINTER STORM 1321
## 411 HURRICANE/TYPHOON 1275
## 359 HIGH WIND 1137
## 310 HEAVY SNOW 1021
## 957 WILDFIRE 911
## 786 THUNDERSTORM WINDS 908
## 30 BLIZZARD 805
## 188 FOG 734
## 955 WILD/FOREST FIRE 545
## 117 DUST STORM 440
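Several labels in the two tables appear to describe the same kind of event under different spellings, for example TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS, or RIP CURRENT and RIP CURRENTS. Because the totals are grouped by the exact EVTYPE string, such variants are counted as separate events. A minimal sketch of how they could be merged before aggregating is shown below. The helper name `normalize_evtype` and its mapping rules are illustrative assumptions, not an official NOAA mapping.

```r
# Hypothetical helper: collapse a few spelling variants seen in the tables above.
# The rules are illustrative assumptions, not an official NOAA mapping.
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))                                   # unify case, drop stray spaces
  x <- sub("^TSTM WIND$", "THUNDERSTORM WIND", x)           # abbreviation
  x <- sub("^THUNDERSTORM WINDS$", "THUNDERSTORM WIND", x)  # plural
  x <- sub("^RIP CURRENTS$", "RIP CURRENT", x)              # plural
  x <- sub("^WILD/FOREST FIRE$", "WILDFIRE", x)             # alternative label
  x
}

# Made-up labels to exercise the helper
labels <- c("TSTM WIND", "THUNDERSTORM WINDS", "Rip Currents", " WILDFIRE", "HAIL")
cleaned <- normalize_evtype(labels)
cleaned
```

Applied to the real data, this would mean aggregating by `normalize_evtype(NoaaData$EVTYPE)` instead of `NoaaData$EVTYPE`. Whether any two labels really denote the same event type is a judgment call that should be checked against the NOAA documentation.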
Fatalities and injuries in a single plot:
The following plot shows the 20 event types with the most fatalities (left) and the most injuries (right):
par(mfrow = c(1, 2), mar = c(10, 4, 2, 2), las = 3, cex = 0.7, cex.main = 1.4, cex.lab = 1.2)
barplot(TotalFatalitiesSorted$Fatalities, names.arg = TotalFatalitiesSorted$Event, col = 'yellow',
main = 'Top 20 Weather Events for Fatalities', ylab = 'Number of Fatalities', ylim= c(0, 6000))
barplot(TotalInjuriesSorted$Injuries, names.arg = TotalInjuriesSorted$Event, col = 'green',
main = 'Top 20 Weather Events for Injuries', ylab = 'Number of Injuries', ylim= c(0, 100000))
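Which types of events have the greatest economic consequences?
The following analysis measures the economic consequences in terms of property damage and crop damage. The totals sum the raw PROPDMG and CROPDMG values without applying the magnitude codes in PROPDMGEXP and CROPDMGEXP, so they are not dollar amounts.
First, calculate the property and crop damage totals separately.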
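The PROPDMG and CROPDMG columns have companion columns, PROPDMGEXP and CROPDMGEXP (both visible in the summary output), that encode the magnitude of each figure. A hedged sketch of how the figures could be converted to dollars before summing is given below. It assumes that K, M and B stand for thousands, millions and billions, and it treats every other code, including blanks, as a multiplier of 1. That fallback is a simplifying assumption, and the helper name `exp_multiplier` and the toy data are hypothetical.

```r
# Hypothetical helper: translate a magnitude code into a numeric multiplier.
# Assumption: K = 1e3, M = 1e6, B = 1e9 (case-insensitive); every other
# code, including blanks, is treated as 1.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(code)])  # lookup by name; unknown codes give NA
  mult[is.na(mult)] <- 1                 # fallback for unknown or empty codes
  mult
}

# Toy data: made-up records, not taken from the NOAA file
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "TORNADO", "HAIL"),
  PROPDMG    = c(2.5, 500, 1.2, 10),
  PROPDMGEXP = c("B", "K", "M", ""),
  stringsAsFactors = FALSE
)

# Scale each record, then aggregate and sort as in the rest of the analysis
toy$PropertyUSD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
dollars <- aggregate(toy$PropertyUSD, by = list(toy$EVTYPE), FUN = sum)
names(dollars) <- c("Event", "PropertyUSD")
ranked <- dollars[order(-dollars$PropertyUSD), ]
ranked
```

In the toy data the tornado records have the larger raw values (500 and 1.2 against 2.5), yet the single flood record ranks first once the codes are applied. Scaled totals can therefore order events differently from raw sums.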
The property:
CostProperty <- aggregate(NoaaData$PROPDMG, by = list(NoaaData$EVTYPE), "sum")
names(CostProperty) <- c("Event", "Property")
TotalCostPropertySorted <- CostProperty[order(-CostProperty$Property), ][1:20, ]
TotalCostPropertySorted
## Event Property
## 834 TORNADO 3212258.16
## 153 FLASH FLOOD 1420124.59
## 856 TSTM WIND 1335965.61
## 170 FLOOD 899938.48
## 760 THUNDERSTORM WIND 876844.17
## 244 HAIL 688693.38
## 464 LIGHTNING 603351.78
## 786 THUNDERSTORM WINDS 446293.18
## 359 HIGH WIND 324731.56
## 972 WINTER STORM 132720.59
## 310 HEAVY SNOW 122251.99
## 957 WILDFIRE 84459.34
## 427 ICE STORM 66000.67
## 676 STRONG WIND 62993.81
## 376 HIGH WINDS 55625.00
## 290 HEAVY RAIN 50842.14
## 848 TROPICAL STORM 48423.68
## 955 WILD/FOREST FIRE 39344.95
## 164 FLASH FLOODING 28497.15
## 919 URBAN/SML STREAM FLD 26051.94
The crop:
TotalCrop <- aggregate(NoaaData$CROPDMG, by = list(NoaaData$EVTYPE), "sum")
names(TotalCrop) <- c("Event", "Crop")
TotalCropSorted <- TotalCrop[order(-TotalCrop$Crop), ][1:20, ]
TotalCropSorted
## Event Crop
## 244 HAIL 579596.28
## 153 FLASH FLOOD 179200.46
## 170 FLOOD 168037.88
## 856 TSTM WIND 109202.60
## 834 TORNADO 100018.52
## 760 THUNDERSTORM WIND 66791.45
## 95 DROUGHT 33898.62
## 786 THUNDERSTORM WINDS 18684.93
## 359 HIGH WIND 17283.21
## 290 HEAVY RAIN 11122.80
## 212 FROST/FREEZE 7034.14
## 140 EXTREME COLD 6121.14
## 848 TROPICAL STORM 5899.12
## 402 HURRICANE 5339.31
## 164 FLASH FLOODING 5126.05
## 411 HURRICANE/TYPHOON 4798.48
## 957 WILDFIRE 4364.20
## 873 TSTM WIND/HAIL 4356.65
## 955 WILD/FOREST FIRE 4189.54
## 464 LIGHTNING 3580.61
Next, plot the property and crop damage totals side by side:
The plot shows the top 20 event types for property damage (left) and crop damage (right).
par(mfrow = c(1, 2), mar = c(10, 4, 2, 2), las = 3, cex = 0.5, cex.main = 1.4, cex.lab = 1.2)
barplot(TotalCostPropertySorted$Property, names.arg = TotalCostPropertySorted$Event, col = 'brown',
        main = 'Top 20 Weather Events for Property Damage', ylab = 'Amount of Property Damage', ylim = c(0, 3500000))
barplot(TotalCropSorted$Crop, names.arg = TotalCropSorted$Event, col = 'green',
        main = 'Top 20 Weather Events for Crop Damage', ylab = 'Amount of Crop Damage', ylim = c(0, 3500000))
Total damage, adding property and crop damage together
The following table shows the combined property and crop damage total for each of the top 20 event types.
SuperTotalCost <- aggregate(NoaaData$CROPDMG+NoaaData$PROPDMG, by = list(NoaaData$EVTYPE), "sum")
names(SuperTotalCost) <- c("Event", "TotalCost")
SuperTotalCostSorted <- SuperTotalCost[order(-SuperTotalCost$TotalCost), ][1:20, ]
SuperTotalCostSorted
## Event TotalCost
## 834 TORNADO 3312276.68
## 153 FLASH FLOOD 1599325.05
## 856 TSTM WIND 1445168.21
## 244 HAIL 1268289.66
## 170 FLOOD 1067976.36
## 760 THUNDERSTORM WIND 943635.62
## 464 LIGHTNING 606932.39
## 786 THUNDERSTORM WINDS 464978.11
## 359 HIGH WIND 342014.77
## 972 WINTER STORM 134699.58
## 310 HEAVY SNOW 124417.71
## 957 WILDFIRE 88823.54
## 427 ICE STORM 67689.62
## 676 STRONG WIND 64610.71
## 290 HEAVY RAIN 61964.94
## 376 HIGH WINDS 57384.60
## 848 TROPICAL STORM 54322.80
## 955 WILD/FOREST FIRE 43534.49
## 95 DROUGHT 37997.67
## 164 FLASH FLOODING 33623.20
Top 20 weather events for total damage
The following plot displays the top 20 weather events by total damage.
par(mfrow = c(1,1), mar = c(10, 4, 2, 2), las = 3, cex = 0.7, cex.main = 1.4, cex.lab = 1.2)
barplot(SuperTotalCostSorted$TotalCost, names.arg = SuperTotalCostSorted$Event, col = 'red',
        main = 'Top 20 Weather Events for Total Damage', ylab = 'Amount of Total Damage', ylim = c(0, 3500000))
#CONCLUSIONS
Tornadoes are the weather events with the greatest impact nationwide, both on population health (fatalities and injuries) and on the damage totals computed above.