The objective of this project is to analyse storms and other severe weather events and to find out which are most harmful in terms of damage caused to people and property.
The dataset used for this project is U.S. National Oceanic and Atmospheric Administration’s (NOAA) Storm data.
The data consist of 902,297 rows and 37 columns, so the data are first reduced to only the columns required for the analysis.
After processing, the data were aggregated by event type and damage estimates were calculated.
The results show that the most harmful event for people (fatalities and injuries combined) is Tornado, and the most harmful event for property (property and crop damage combined) is Flood.
The NOAA Storm data track characteristics of major storms and weather events in the United States, including when and where they occurred, as well as any fatalities, injuries and property damage.
The data have already been downloaded and unzipped into the working directory. The file is named “repdata-data-StormData.csv”.
Let's load the data into R.
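fileName <- "repdata-data-StormData.csv"
# Stop early with a clear message if the CSV is not in the working directory
if (file.exists(fileName)) {
  # stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0),
  # which is what the summary() output below reflects
  data <- read.csv(fileName, stringsAsFactors = TRUE)
} else {
  stop("Data is not in the working directory. Please download the data.")
}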
dim(data)
## [1] 902297 37
names(data)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
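As an aside to the loading step above: if you still have the compressed archive rather than the unzipped CSV, `read.csv()` can read a bzip2-compressed file directly, because R's `file()` connection detects gzip, bzip2 and xz compression when reading in text mode. The sketch below demonstrates this on a small hypothetical data frame written to a temporary `.bz2` file, so it runs without the NOAA download; the rows are invented for illustration and are not part of this analysis.

```r
# Hypothetical toy rows; not taken from the NOAA file
toy <- data.frame(
  EVTYPE  = c("TORNADO", "HAIL"),
  PROPDMG = c(2.5, 25),
  stringsAsFactors = FALSE
)

# Write the toy data to a temporary bzip2-compressed CSV file
tmp <- tempfile(fileext = ".csv.bz2")
write.csv(toy, bzfile(tmp), row.names = FALSE)

# read.csv() opens the path through file(), which detects the bzip2 compression
back <- read.csv(tmp, stringsAsFactors = FALSE)
print(back)

# Remove the temporary file
unlink(tmp)
```

Reading the archive directly saves disk space, at the cost of decompressing on every load.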
To answer these questions, we only need the event type and the health and economic damage associated with each event, so let's keep the relevant columns and discard the rest.
data_small <- data[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
summary(data_small)
## EVTYPE FATALITIES INJURIES
## HAIL :288661 Min. : 0.0000 Min. : 0.0000
## TSTM WIND :219940 1st Qu.: 0.0000 1st Qu.: 0.0000
## THUNDERSTORM WIND: 82563 Median : 0.0000 Median : 0.0000
## TORNADO : 60652 Mean : 0.0168 Mean : 0.1557
## FLASH FLOOD : 54277 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## FLOOD : 25326 Max. :583.0000 Max. :1700.0000
## (Other) :170878
## PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## Min. : 0.00 :465934 Min. : 0.000 :618413
## 1st Qu.: 0.00 K :424665 1st Qu.: 0.000 K :281832
## Median : 0.00 M : 11330 Median : 0.000 M : 1994
## Mean : 12.06 0 : 216 Mean : 1.527 k : 21
## 3rd Qu.: 0.50 B : 40 3rd Qu.: 0.000 0 : 19
## Max. :5000.00 5 : 28 Max. :990.000 B : 9
## (Other): 84 (Other): 9
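The fields PROPDMGEXP and CROPDMGEXP hold multiplier codes (H, K, M, B) for PROPDMG and CROPDMG. So let's convert these codes to numbers and calculate the actual property and crop damage in dollars. The numeric values for the codes are as follows:
* H - 100 or 10^2
* K - 1000 or 10^3
* M - 1000000 or 10^6
* B - 1000000000 or 10^9

Codes are matched case-insensitively; rows with any other code (for example a blank or a digit) are left unscaled. The crop damage code below handles K, M and B only.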
# Property damage estimation: scale PROPDMG by the multiplier coded in PROPDMGEXP
data_small[data_small$PROPDMGEXP %in% c("H","h"), ]$PROPDMG <- data_small[data_small$PROPDMGEXP %in% c("H","h"),]$PROPDMG * (10^2)
data_small[data_small$PROPDMGEXP %in% c("K","k"), ]$PROPDMG <- data_small[data_small$PROPDMGEXP %in% c("K","k"),]$PROPDMG * (10^3)
data_small[data_small$PROPDMGEXP %in% c("M","m"), ]$PROPDMG <- data_small[data_small$PROPDMGEXP %in% c("M","m"),]$PROPDMG * (10^6)
data_small[data_small$PROPDMGEXP %in% c("B","b"), ]$PROPDMG <- data_small[data_small$PROPDMGEXP %in% c("B","b"),]$PROPDMG * (10^9)
# Crop damage estimation: scale CROPDMG by the multiplier coded in CROPDMGEXP
data_small[data_small$CROPDMGEXP %in% c("K","k"), ]$CROPDMG <- data_small[data_small$CROPDMGEXP %in% c("K","k"),]$CROPDMG * (10^3)
data_small[data_small$CROPDMGEXP %in% c("M","m"), ]$CROPDMG <- data_small[data_small$CROPDMGEXP %in% c("M","m"),]$CROPDMG * (10^6)
data_small[data_small$CROPDMGEXP %in% c("B","b"), ]$CROPDMG <- data_small[data_small$CROPDMGEXP %in% c("B","b"),]$CROPDMG * (10^9)
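The one-assignment-per-code approach above can also be written as a single lookup against a named vector of multipliers. The sketch below is an alternative, not the code used for the results in this report; it runs on hypothetical toy rows and, like the code above, leaves unrecognised codes unscaled. For example, PROPDMG = 25 with code K becomes 25 × 10^3 = 25,000 dollars.

```r
# Hypothetical toy rows standing in for PROPDMG / PROPDMGEXP (not from the NOAA file)
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.5, 10, 3),
  PROPDMGEXP = c("K", "m", "B", "", "5"),
  stringsAsFactors = FALSE
)

# Multiplier for each exponent code; toupper() makes matching case-insensitive
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Look up each row's multiplier; codes not in the table (e.g. "" or "5") give NA
mult <- multiplier[toupper(toy$PROPDMGEXP)]

# Leave unrecognised codes unscaled, matching the behaviour of the code above
mult[is.na(mult)] <- 1

toy$PROPDMG_USD <- toy$PROPDMG * unname(mult)
print(toy)
```

The same `multiplier` vector can be reused for CROPDMGEXP, which keeps the code-to-value mapping in one place.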
Let's aggregate the data by event type to find the events most harmful to population health.
library(data.table)
data_small.dt <- data.table(data_small)
# Sum fatalities and injuries for each event type
data_people <- data_small.dt[, list(FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES), TOTAL = sum(FATALITIES + INJURIES)), by = "EVTYPE"]
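The data.table expression above groups rows by `EVTYPE` and sums each column within each group. For readers without data.table, the same grouped sum can be sketched in base R with `aggregate()`; the toy rows below are hypothetical, and this is an alternative rather than the method used in this report.

```r
# Hypothetical toy rows (not from the NOAA file)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(2, 0, 1, 0, 0),
  INJURIES   = c(10, 1, 4, 3, 0),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries within each event type
by_event <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Combined health impact per event type
by_event$TOTAL <- by_event$FATALITIES + by_event$INJURIES
print(by_event)
```

One difference to keep in mind: the formula interface of `aggregate()` drops rows containing `NA` by default (`na.action = na.omit`), whereas `sum()` in the data.table call returns `NA` for any group that contains one.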
Now let's find the 10 events that caused the most fatalities and injuries combined.
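# Sort by combined fatalities and injuries, descending, and keep the top 10
data_people <- data_people[order(-TOTAL)]
data_people10 <- data_people[1:10, ]
# Fix the factor level order so the plot draws the bars in descending order
data_people10$EVTYPE <- factor(data_people10$EVTYPE, levels = data_people10$EVTYPE)
# Top 10 events harmful for people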
data_people10
## EVTYPE FATALITIES INJURIES TOTAL
## 1: TORNADO 5633 91346 96979
## 2: EXCESSIVE HEAT 1903 6525 8428
## 3: TSTM WIND 504 6957 7461
## 4: FLOOD 470 6789 7259
## 5: LIGHTNING 816 5230 6046
## 6: HEAT 937 2100 3037
## 7: FLASH FLOOD 978 1777 2755
## 8: ICE STORM 89 1975 2064
## 9: THUNDERSTORM WIND 133 1488 1621
## 10: WINTER STORM 206 1321 1527
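The ranking step above sorts by `TOTAL`, keeps the first ten rows and then rebuilds `EVTYPE` as a factor whose levels follow that order; ggplot2 places discrete bars in factor-level order, which would otherwise be alphabetical. A minimal base R sketch of the same idea, on hypothetical totals:

```r
# Hypothetical per-event totals (not from the NOAA file)
totals <- data.frame(
  EVTYPE = c("HAIL", "TORNADO", "FLOOD", "LIGHTNING"),
  TOTAL  = c(5, 40, 12, 7),
  stringsAsFactors = FALSE
)
n <- 3

# Sort descending by TOTAL and keep the first n rows
top <- head(totals[order(-totals$TOTAL), ], n)

# Set the factor levels in sorted order instead of the default alphabetical order
top$EVTYPE <- factor(top$EVTYPE, levels = top$EVTYPE)
print(levels(top$EVTYPE))
```

`head()` simply returns fewer rows when there are fewer than `n` events, whereas indexing a data frame with `1:n` would pad the result with `NA` rows.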
The following plot shows the 10 weather events most harmful to people.
library(ggplot2)
ggplot(data_people10, aes(x = EVTYPE, y = TOTAL)) +
geom_bar(stat = "identity", fill = "red") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
xlab("Event Type") + ylab("Fatalities and Injuries") + ggtitle("Top 10 Harmful Events for People")
Let's aggregate the data to estimate the most harmful events in terms of economic damage (property and crops).
data_prop <- data_small.dt[, list(PROPERTY = sum(PROPDMG), CROPS = sum(CROPDMG), TOTAL = sum(PROPDMG+CROPDMG)), by = "EVTYPE"]
Now let's find the 10 events that caused the most property and crop damage combined.
# Sort by combined property and crop damage, descending, and keep the top 10
data_prop <- data_prop[order(-TOTAL)]
data_prop10 <- data_prop[1:10, ]
# Express dollar amounts in billions
data_prop10 <- data_prop10[, list(EVTYPE, PROPERTY = PROPERTY/(10^9), CROPS = CROPS/(10^9), TOTAL = TOTAL/(10^9))]
data_prop10$EVTYPE <- factor(data_prop10$EVTYPE, levels = data_prop10$EVTYPE)
# Top 10 events harmful for property and crops (billions of USD)
data_prop10
## EVTYPE PROPERTY CROPS TOTAL
## 1: FLOOD 144.657710 5.6619684 150.319678
## 2: HURRICANE/TYPHOON 69.305840 2.6078728 71.913713
## 3: TORNADO 56.937161 0.4149533 57.352114
## 4: STORM SURGE 43.323536 0.0000050 43.323541
## 5: HAIL 15.732268 3.0259545 18.758222
## 6: FLASH FLOOD 16.140812 1.4213171 17.562129
## 7: DROUGHT 1.046106 13.9725660 15.018672
## 8: HURRICANE 11.868319 2.7419100 14.610229
## 9: RIVER FLOOD 5.118945 5.0294590 10.148404
## 10: ICE STORM 3.944928 5.0221135 8.967041
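The table above ranks events by combined damage. To see which event leads each column separately, the sketch below copies the ten printed rows (values in billions of USD) into a data frame and picks the maximum per column. Only the printed values are used; the name `top10` is hypothetical.

```r
# The ten rows printed above, in billions of USD (copied from the table output)
top10 <- data.frame(
  EVTYPE   = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE", "HAIL",
               "FLASH FLOOD", "DROUGHT", "HURRICANE", "RIVER FLOOD", "ICE STORM"),
  PROPERTY = c(144.657710, 69.305840, 56.937161, 43.323536, 15.732268,
               16.140812, 1.046106, 11.868319, 5.118945, 3.944928),
  CROPS    = c(5.6619684, 2.6078728, 0.4149533, 0.0000050, 3.0259545,
               1.4213171, 13.9725660, 2.7419100, 5.0294590, 5.0221135),
  stringsAsFactors = FALSE
)

# Event with the largest value in each damage column
leaders <- sapply(top10[c("PROPERTY", "CROPS")],
                  function(col) top10$EVTYPE[which.max(col)])
print(leaders)
```

Because every event outside this table has a combined total no larger than ICE STORM's (about 8.97 billion), and both leaders found here exceed that figure in their own column, no unlisted event can overtake them.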
The plot below shows the 10 events that caused the most property and crop damage.
library(ggplot2)
ggplot(data_prop10, aes(x = EVTYPE, y = TOTAL)) +
geom_bar(stat = "identity", fill = "red") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
xlab("Event Type") + ylab("Property and Crop Damage (billions of USD)") + ggtitle("Top 10 Harmful Events for Property and Crops")