The goal of this work is to determine which types of climate disasters
are the main causes of injuries and fatalities among the US population
and of negative economic consequences.
First, I read the data set, which contains information about injuries,
fatalities and economic damage caused by disasters in the USA, and
looked at some basic summaries. Next, I grouped the data by event type
and calculated the total number of injuries and fatalities for each
type. I then selected the top 10 types by fatalities and by injuries and
created two barplots to illustrate the results. For the second question,
I again grouped the data by type of disaster and calculated the total
dollar losses as a measure of negative economic consequences. Finally,
I selected the top 10 types again and created a barplot to illustrate
them.
First, I’d like to read the data and see what it contains.
df <- read.csv('C:/Users/dmitr/Downloads/coursera/repdata_data_StormData.csv')
print(head(df))
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 0 NA
## 2 0 0 NA
## 3 0 0 NA
## 4 0 0 NA
## 5 0 0 NA
## 6 0 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 14.0 100 3 0 0 15 25.0
## 2 0 2.0 150 2 0 0 0 2.5
## 3 0 0.1 123 2 0 0 2 25.0
## 4 0 0.0 100 2 0 0 2 2.5
## 5 0 0.0 150 2 0 0 2 2.5
## 6 0 1.5 177 2 0 0 6 2.5
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 3040 8812
## 2 K 0 3042 8755
## 3 K 0 3340 8742
## 4 K 0 3458 8626
## 5 K 0 3412 8642
## 6 K 0 3450 8748
## LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3051 8806 1
## 2 0 0 2
## 3 0 0 3
## 4 0 0 4
## 5 0 0 5
## 6 0 0 6
summary(df$FATALITIES)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.0168 0.0000 583.0000
summary(df$INJURIES)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.1557 0.0000 1700.0000
I’d like to start by grouping the injury and fatality data by type of disaster.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
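# Total injuries and fatalities per event type
filtered_data <- df %>%
  group_by(EVTYPE) %>%
  summarise(total_injuries = sum(INJURIES, na.rm = TRUE),
            total_fatalities = sum(FATALITIES, na.rm = TRUE)) %>%
  # Keep only event types that caused both injuries and fatalities
  filter(total_injuries > 0 & total_fatalities > 0)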
Then I want to show the top 10 disasters by injuries and by fatalities.
top_10_injuries <- head(filtered_data$total_injuries[order(-filtered_data$total_injuries)], 10)
top_10_fatalities <- head(filtered_data$total_fatalities[order(-filtered_data$total_fatalities)], 10)
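The same kind of top-N selection can also be written so that the event names stay in the same row as their totals. This is a minimal sketch, assuming dplyr 1.0.0 or later (for `slice_head()`); the `toy` data frame and its numbers are made up purely for illustration and are not taken from the storm data.

```r
library(dplyr)

# Hypothetical toy summary table (illustrative values only, not real results)
toy <- data.frame(
  EVTYPE = c("A", "B", "C", "D"),
  total_injuries = c(5, 40, 12, 0),
  stringsAsFactors = FALSE
)

# Sort by the total in descending order and keep the first n rows;
# names and totals stay in the same row, so they cannot get out of sync
top_n_injuries <- toy %>%
  arrange(desc(total_injuries)) %>%
  slice_head(n = 2)

print(top_n_injuries)
```

With a table like this, the totals and the labels for a barplot can both be taken from the same object instead of calling `order()` twice.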
And finally let’s create barplots.
barplot(top_10_injuries, names.arg = tolower(filtered_data$EVTYPE[order(-filtered_data$total_injuries)][1:10]),
col = "skyblue", main = "Top 10 Disasters by Total Injuries",
xlab = "Type of Disaster", ylab = "Total Injuries")
barplot(top_10_fatalities, names.arg = tolower(filtered_data$EVTYPE[order(-filtered_data$total_fatalities)][1:10]),
col = "skyblue", main = "Top 10 Disasters by Total Fatalities",
xlab = "Type of Disaster", ylab = "Total Fatalities")
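With ten event names on the x-axis, base R may silently drop some labels when they do not fit horizontally. Below is a minimal sketch of one possible workaround, using the standard `las` and `cex.names` arguments of `barplot()` and a wider bottom margin; the totals and labels are made up for illustration.

```r
# Made-up totals and labels for illustration only
totals <- c(120, 80, 45)
labels <- c("tornado", "excessive heat", "flash flood")

# Widen the bottom margin so rotated labels are not clipped
old_par <- par(mar = c(8, 4, 4, 2) + 0.1)

# las = 2 draws axis labels perpendicular to the axis;
# cex.names shrinks the bar labels slightly
mids <- barplot(totals, names.arg = labels, las = 2, cex.names = 0.8,
                col = "skyblue", main = "Example with rotated labels",
                ylab = "Total")

# Restore the previous graphical parameters
par(old_par)
```

The x-axis title is left out here because it would overlap the rotated labels.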
So we can say that tornadoes, wind and floods are the leading causes of injuries, while tornadoes, heat and floods are the leading causes of fatalities.
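Next, I group the economic consequences by type of disaster and calculate the total dollar losses. The PROPDMGEXP and CROPDMGEXP columns hold magnitude codes (H, K, M, B), so they are first converted to numeric multipliers and applied to the damage amounts.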
df$PROPDMGEXP <- recode(df$PROPDMGEXP,'H' = 100, 'K' = 1000, 'M' = 1000000, 'B' = 1000000000, .default = 1)
df$CROPDMGEXP <- recode(df$CROPDMGEXP,'H' = 100, 'K' = 1000, 'M' = 1000000, 'B' = 1000000000, .default = 1)
df$PROP <- df$PROPDMG * df$PROPDMGEXP
df$CROP <- df$CROPDMG * df$CROPDMGEXP
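# Total property, crop and combined damage per event type
economics_all <- df %>%
  group_by(EVTYPE) %>%
  summarise(total_prop = sum(PROP, na.rm = TRUE),
            total_crop = sum(CROP, na.rm = TRUE),
            total_cons = total_prop + total_crop) %>%
  # Keep only event types with both property and crop damage
  filter(total_crop > 0 & total_prop > 0)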
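One thing to watch in the recoding step above: `recode()` matches codes exactly, so a lowercase `'k'` or `'m'` in an exponent column would fall through to `.default = 1`. Below is a minimal sketch of a case-insensitive lookup in base R. The helper name `exp_to_multiplier` and the sample codes are hypothetical, and treating every unrecognised code as 1 is an assumption carried over from the `.default = 1` used above.

```r
# Hypothetical helper: convert damage exponent codes to numeric multipliers
exp_to_multiplier <- function(codes) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  # toupper() makes the match case-insensitive; unknown codes give NA
  mult <- unname(lookup[toupper(as.character(codes))])
  # Assumption: anything unrecognised (blank, digits, symbols) counts as 1
  mult[is.na(mult)] <- 1
  mult
}

# Made-up sample codes for illustration
codes <- c("K", "m", "", "B", "?", "h")
multipliers <- exp_to_multiplier(codes)
print(multipliers)
```

Whether lowercase codes should be treated the same as uppercase ones is a judgment call about the data, so it is worth checking `table(df$PROPDMGEXP)` before deciding.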
Again, I select the top 10.
top_10_cons <- head(economics_all$total_cons[order(-economics_all$total_cons)], 10)
And create a barplot.
barplot(top_10_cons, names.arg = tolower(economics_all$EVTYPE[order(-economics_all$total_cons)][1:10]),
col = "skyblue", main = "Top 10 Disasters by Total Economic Consequences",
xlab = "Type of Disaster", ylab = "Total Economic Consequences")
So floods account for a significant share of the economic losses.