In this assignment, the NOAA storm data on weather events are explored. The data provide information on weather events such as hail, thunderstorms, drought and others across the United States. The analysis identifies the most harmful weather events by number of injuries and fatalities, and also examines the property and crop losses caused by these events.
The following library is loaded for data processing:
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
The data for the analysis were obtained as a bzip2-compressed comma-separated values file. It was downloaded and stored locally as follows (the file stays compressed on disk despite its .csv name):
if (!file.exists("data")) {
  dir.create("data")
}
file_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# mode = "wb" keeps the compressed bytes intact on Windows
download.file(file_url, destfile = "./data/Dataset.csv", mode = "wb")
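The download step runs again every time the report is rebuilt. A minimal sketch of a guard that skips the download when the file is already on disk is shown here; `fetch_once` is a hypothetical helper name, not part of the original analysis.

```r
# Hypothetical helper: download a file only when it is not already on disk.
fetch_once <- function(url, destfile) {
  # Create the target directory if it is missing
  if (!dir.exists(dirname(destfile))) {
    dir.create(dirname(destfile), recursive = TRUE)
  }
  if (!file.exists(destfile)) {
    # mode = "wb" writes the bytes unchanged (matters on Windows)
    download.file(url, destfile = destfile, mode = "wb")
  }
  invisible(destfile)
}
```

With this helper, the call would be `fetch_once(file_url, "./data/Dataset.csv")`.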
The stored dataset is read (read.csv detects the bzip2 compression and decompresses the file transparently) and its structure is inspected as follows:
Weather <- read.csv("./data/Dataset.csv")
str(Weather)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
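There are 902297 observations. The EVTYPE column contains many different names for similar weather events, so the events are grouped into broader categories by keyword matching. Events that match no keyword remain in 'Other', and when an event name matches several keywords the last matching assignment wins: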
Weather$Event <- 'Other'
Weather$Event[grep('tide', Weather$EVTYPE, ignore.case = TRUE)] <- 'Tide'
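# NOTE: 'avalance' does not match the correctly spelled "AVALANCHE", so those events stay in 'Other'
Weather$Event[grep('avalance', Weather$EVTYPE, ignore.case = TRUE)] <- 'Avalanche'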
Weather$Event[grep('blizzard', Weather$EVTYPE, ignore.case = TRUE)] <- 'Blizzard'
Weather$Event[grep('cold', Weather$EVTYPE, ignore.case = TRUE)] <- 'Cold'
Weather$Event[grep('heat', Weather$EVTYPE, ignore.case = TRUE)] <- 'Heat'
Weather$Event[grep('flood', Weather$EVTYPE, ignore.case = TRUE)] <- 'Flood'
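# NOTE: 'lighting' does not match the correctly spelled "LIGHTNING", so those events stay in 'Other'
Weather$Event[grep('lighting', Weather$EVTYPE, ignore.case = TRUE)] <- 'Lighting'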
Weather$Event[grep('drought', Weather$EVTYPE, ignore.case = TRUE)] <- 'Drought'
Weather$Event[grep('storm', Weather$EVTYPE, ignore.case = TRUE)] <- 'Storm'
Weather$Event[grep('tornado', Weather$EVTYPE, ignore.case = TRUE)] <- 'Tornado'
Weather$Event[grep('wildfire', Weather$EVTYPE, ignore.case = TRUE)] <- 'Wildfire'
Weather$Event[grep('smoke', Weather$EVTYPE, ignore.case = TRUE)] <- 'Smoke'
Weather$Event[grep('hail', Weather$EVTYPE, ignore.case = TRUE)] <- 'Hail'
Weather$Event[grep('rain', Weather$EVTYPE, ignore.case = TRUE)] <- 'Rain'
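The keywords 'avalance' and 'lighting' used above cannot match the standard spellings "AVALANCHE" and "LIGHTNING". A minimal sketch of patterns that accept both spellings is given here on a made-up vector of event names (the vector is an assumption for illustration, not taken from the data). Applying such patterns to the full dataset would move events out of 'Other' and change the totals reported in this analysis.

```r
# Toy event names (assumed for illustration), with two spellings per event
evtype <- c("LIGHTNING", "LIGHTING", "AVALANCHE", "AVALANCE", "HAIL")

# The literal keyword misses the standard spelling
grepl("lighting", "LIGHTNING", ignore.case = TRUE)   # FALSE

# "n?" and "h?" make one letter optional, so both spellings match
is_lightning <- grepl("lightn?ing", evtype, ignore.case = TRUE)
is_avalanche <- grepl("avalanch?e", evtype, ignore.case = TRUE)

event <- rep("Other", length(evtype))
event[is_lightning] <- "Lightning"
event[is_avalanche] <- "Avalanche"
print(event)
```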
The total number of fatalities per event category is computed and sorted in decreasing order as follows:
health_fatal <- tapply(Weather$FATALITIES, Weather$Event, sum)
health_fatal <- sort(health_fatal, decreasing = TRUE)
head(health_fatal)
## Tornado Other Heat Flood Storm Cold
## 5636 3427 3132 1524 633 451
Similarly, the total number of injuries per event category is computed as follows:
health_injuries <- tapply(Weather$INJURIES, Weather$Event, sum)
health_injuries <- sort(health_injuries, decreasing = TRUE)
head(health_injuries)
## Tornado Other Heat Flood Storm Hail
## 91407 20791 9209 8602 6691 1467
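Fatalities and injuries can also be summarised in a single table. The following is a minimal sketch using base R `aggregate` on a small made-up data frame (`toy` and its values are assumptions for illustration); it is an alternative presentation, not the method used in this analysis.

```r
# Made-up data with the same column names as the storm dataset
toy <- data.frame(
  Event      = c("Tornado", "Heat", "Tornado", "Flood", "Heat"),
  FATALITIES = c(3, 2, 1, 0, 4),
  INJURIES   = c(10, 0, 5, 2, 1)
)

# Sum both harm columns per event category in one call
harm <- aggregate(cbind(FATALITIES, INJURIES) ~ Event, data = toy, FUN = sum)

# Order the categories by fatalities, largest first
harm <- harm[order(-harm$FATALITIES), ]
print(harm)
```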
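Now, let us look at the damage to property and crops. The magnitude of each damage figure is coded in PROPDMGEXP and CROPDMGEXP as ‘k’, ‘m’ or ‘b’, which correspond to thousands, millions and billions. Records with any other code (blank, digits or symbols) receive a multiplier of 0 in the code below and therefore do not contribute to the totals. The damage values are computed as follows: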
# Property loss
Weather$Prop_dam <- 0
Weather$Prop_dam[grep('k', Weather$PROPDMGEXP, ignore.case = TRUE)] <- 10^3
Weather$Prop_dam[grep('m', Weather$PROPDMGEXP, ignore.case = TRUE)] <- 10^6
Weather$Prop_dam[grep('b', Weather$PROPDMGEXP, ignore.case = TRUE)] <- 10^9
Weather$P_value <- Weather$PROPDMG*Weather$Prop_dam
Prop_dam <- tapply(Weather$P_value, Weather$Event, sum)
Prop_dam <- sort(Prop_dam, decreasing = TRUE)
head(Prop_dam)
## Flood Other Storm Tornado Hail Wildfire
## 167513743320 102255776340 72812914700 56993097730 17619990720 4865614000
# Crop damage
Weather$Crop_dam <- 0
Weather$Crop_dam[grep('k', Weather$CROPDMGEXP, ignore.case = TRUE)] <- 10^3
Weather$Crop_dam[grep('m', Weather$CROPDMGEXP, ignore.case = TRUE)] <- 10^6
Weather$Crop_dam[grep('b', Weather$CROPDMGEXP, ignore.case = TRUE)] <- 10^9
Weather$C_value <- Weather$CROPDMG*Weather$Crop_dam
Crop_dam <- tapply(Weather$C_value, Weather$Event, sum)
Crop_dam <- sort(Crop_dam, decreasing = TRUE)
head(Crop_dam)
## Drought Flood Other Storm Hail Cold
## 13972621780 12266926100 9280152420 6406789800 3114212850 1416765500
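The exponent handling above can be expressed as a single lookup. The sketch below uses a hypothetical helper `exp_multiplier` (not part of the original analysis) that maps 'K', 'M' and 'B' in either case to their multipliers and everything else to 0. It matches codes exactly instead of by substring, which gives the same result for single-character codes.

```r
# Hypothetical helper: map exponent codes to multipliers; unknown codes give 0
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  # Look up by name after normalising case; unmatched codes yield NA
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 0
  unname(mult)
}

codes  <- c("K", "m", "B", "", "?", "5")
amount <- c(25, 2.5, 1, 10, 10, 10)
amount * exp_multiplier(codes)
```

On the full data this would be used as `Weather$PROPDMG * exp_multiplier(Weather$PROPDMGEXP)`.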
The processed data are used to analyse the level of damage to property and life.
Damage to life, in the form of fatalities and injuries computed from the processed data, is plotted here:
barplot(health_injuries[1:10], xlab = 'Event', ylab = 'Injuries', main = 'Injuries', las = 2)
barplot(health_fatal[1:10], xlab = 'Event', ylab = 'Fatalities', main = 'Fatalities', las = 2)
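The fatalities data show that tornadoes caused the highest number of fatalities (5636). Among the named categories they are followed by heat (3132) and floods (1524), while the catch-all 'Other' category ranks second overall with 3427.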
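Similarly, the total monetary loss due to these weather events is plotted as follows:
# Crop_dam and Prop_dam are sorted in different orders, so align them by event name before adding
damage <- Prop_dam + Crop_dam[names(Prop_dam)]
damage <- sort(damage, decreasing = TRUE)
barplot(damage[1:10], xlab = 'Event', ylab = 'Monetary loss', main = 'Monetary damage', las = 2)
Based on the property and crop totals above, floods caused the largest combined economic loss (about 180 billion dollars), followed by the catch-all 'Other' category and storm events. Drought leads crop damage only.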
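When two named vectors are sorted differently, `+` adds them element by element in position order and keeps the names of the first operand, so the totals can end up under the wrong labels. A minimal sketch with made-up numbers (`prop` and `crop` are assumptions for illustration) shows the difference between adding by position and adding by name:

```r
# Made-up totals, each sorted in its own decreasing order
prop <- sort(c(Flood = 100, Tornado = 50, Drought = 1), decreasing = TRUE)
crop <- sort(c(Flood = 10, Tornado = 1, Drought = 20), decreasing = TRUE)

# Positional addition: Drought (crop) is paired with Flood (prop)
by_position <- crop + prop
print(by_position)

# Name-based addition: reorder crop to match prop before adding
by_name <- prop + crop[names(prop)]
print(by_name)
```

In the positional result the label Drought carries the flood property total, which is why aligning by name matters before ranking the events.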