This mini project uses the storm data set available at the following link: [Data Set](https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2). The data set contains information about several natural calamities, and the analysis gives us an intuition about the different types of losses caused by each type of event. From here onward we investigate the data set to find the most dangerous natural calamities, in particular those that have caused huge losses to crops and property.
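library(dplyr)  # provides %>%, mutate(), group_by(), summarise()

# Download the compressed data set only if it is not already present
if (!file.exists("data.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "data.csv.bz2")
}
# read.csv() reads the bz2-compressed file directly
data <- read.csv("data.csv.bz2", header = TRUE)
df1 <- as_tibble(data)  # tbl_df() is deprecated in current dplyr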
### Working with duplicates in the “EVTYPE” column
Here we convert all the strings in “EVTYPE” to lowercase and remove all punctuation characters (replacing them with nothing), so that differently written variants of the same event type collapse into one value. Before working on the duplicates, the number of unique EVTYPE values is:
evtype <- data$EVTYPE
print(length(unique(evtype)))
## [1] 985
After eliminating the duplicates, the number of unique EVTYPE values is:
evtype <- tolower(evtype)
evtype <- gsub("[[:punct:]]", "", evtype)
print(length(unique(evtype)))
## [1] 874
df1$EVTYPE <- evtype
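To see why this cleanup reduces the count, here is a minimal sketch on a small made-up vector (the `raw` values below are hypothetical and not taken from the data set). It applies the same `tolower()` and `gsub()` steps and compares the number of unique values before and after.

```r
# Hypothetical event labels with inconsistent case and punctuation
raw <- c("Flash Flood", "FLASH FLOOD", "flash flood.", "Hail", "HAIL")

# Same normalization as above: lowercase, then strip punctuation
cleaned <- gsub("[[:punct:]]", "", tolower(raw))

n_before <- length(unique(raw))      # 5 distinct spellings
n_after  <- length(unique(cleaned))  # 2 distinct event types
print(c(before = n_before, after = n_after))
```

Note that `[[:punct:]]` leaves spaces in place, so labels that differ only in spacing would still count as distinct.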
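This is a function to convert the exponent codes into the numeric multipliers by which the damage figures are multiplied.
expconvert <- function(a) {
  # Values arrive as text because apply() coerces the column to a character matrix
  a <- toupper(trimws(as.character(a)))
  if (is.na(a)) return(0)
  if (a == "H") return(100)
  if (a == "K") return(1000)
  if (a == "M") return(1000000)
  if (a == "B") return(1000000000)
  # Digit codes arrive as text, so test the string rather than is.numeric()
  if (grepl("^[0-9]+$", a)) return(as.numeric(a))
  # Blank or unrecognised codes map to 0
  return(0)
}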
df1$PROPDMGEXP <- apply(df1["PROPDMGEXP"], MARGIN = 1, FUN = expconvert)
df1$CROPDMGEXP <- apply(df1["CROPDMGEXP"], MARGIN = 1, FUN = expconvert)
df1 <- df1 %>%
  mutate(crop_dmg = CROPDMGEXP * CROPDMG,
         prop_dmg = PROPDMGEXP * PROPDMG)
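As an illustration of the conversion and multiplication steps above, the following sketch uses a named lookup vector instead of a chain of `if` statements. The names `exp_lookup` and `exp_to_multiplier` and the toy values are hypothetical, and treating every code outside h/k/m/b (including a blank) as 0 is an assumption of this sketch.

```r
# Hypothetical lookup table: exponent code -> multiplier
exp_lookup <- c(h = 1e2, k = 1e3, m = 1e6, b = 1e9)

# Vectorised conversion; codes missing from the table become 0 (assumption)
exp_to_multiplier <- function(codes) {
  mult <- unname(exp_lookup[tolower(as.character(codes))])
  mult[is.na(mult)] <- 0
  mult
}

# Toy damage records (made-up values)
toy <- data.frame(PROPDMG = c(25, 2.5, 10, 4),
                  PROPDMGEXP = c("K", "m", "", "B"),
                  stringsAsFactors = FALSE)

# Damage in dollars = reported amount * multiplier
toy$prop_dmg <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
print(toy$prop_dmg)  # values: 25000, 2500000, 0, 4e9
```

The third record shows a consequence worth keeping in mind: under this convention a blank exponent zeroes out the damage amount.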
# Total fatalities per event type, sorted from highest to lowest
fatalities <- df1 %>%
  group_by(EVTYPE) %>%
  summarise(Total_Fatalities = sum(FATALITIES, na.rm = TRUE))
fatalities <- fatalities[order(fatalities$Total_Fatalities, decreasing = TRUE), ]
# Total injuries per event type, sorted from highest to lowest
injuries <- df1 %>%
  group_by(EVTYPE) %>%
  summarise(Total_Injuries = sum(INJURIES, na.rm = TRUE))
injuries <- injuries[order(injuries$Total_Injuries, decreasing = TRUE), ]
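The group, sum, and sort pattern used above can be cross-checked without dplyr. The sketch below does the same kind of aggregation with base R `tapply()` on a small made-up data frame (`toy` and its values are hypothetical).

```r
# Hypothetical toy data standing in for df1
toy <- data.frame(
  EVTYPE = c("tornado", "flood", "tornado", "heat", "flood"),
  FATALITIES = c(3, 1, 4, NA, 2),
  stringsAsFactors = FALSE
)

# Sum fatalities per event type, ignoring missing values
totals <- tapply(toy$FATALITIES, toy$EVTYPE, sum, na.rm = TRUE)

# Order from largest to smallest, as done with order() above
totals <- totals[order(totals, decreasing = TRUE)]
print(totals)
```

Here `na.rm = TRUE` makes the `heat` group sum to 0 rather than `NA`, which is the same choice the `summarise()` calls above make.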
par(mfrow = c(1, 2))
with(head(injuries), barplot(Total_Injuries, names.arg = head(injuries$EVTYPE), las=2, density = 50, main = "Total Injuries", col = "green"))
with(head(fatalities), barplot(Total_Fatalities, names.arg = head(fatalities$EVTYPE), las=2, density = 60, main = "Total Fatalities", col = "brown"))
### Summary
From the above plots we can infer that tornadoes have caused the most injuries and fatalities, followed by tstm wind and excessive heat. So, according to this analysis, tornadoes are the most harmful to the health of the population.
crop <- df1 %>%
  group_by(EVTYPE) %>%
  summarise(total_crop_damage = sum(crop_dmg))
prop <- df1 %>%
  group_by(EVTYPE) %>%
  summarise(total_prop_damage = sum(prop_dmg))
crop <- crop[order(crop$total_crop_damage, decreasing = TRUE), ]
prop <- prop[order(prop$total_prop_damage, decreasing = TRUE), ]
par(mfrow = c(1, 2),mai = c(2,1,1,1))
with(head(crop, 5), barplot(height = total_crop_damage/1000000000, names.arg = head(crop, 5)$EVTYPE, las = 2, density = 60, col = "red", main = "Crop Damage", ylab = "In $ Billion", cex.names = .7))
with(head(prop, 5), barplot(height = total_prop_damage/1000000000, names.arg = head(prop, 5)$EVTYPE, las = 2, density = 60, col = "yellow", main = "Property Damage", ylab = "In $ Billion", cex.names = .7))
From the above plots it is clear that the maximum crop damage is caused by ‘Drought’ and the maximum property damage is caused by ‘Floods’. These events cause huge damage to crops and property, respectively.
Thus, the above analysis gives us an intuition about the losses incurred from the various natural calamities.