The goal of this analysis is to identify which storm event types cause the most health casualties (injuries and fatalities combined) and the most economic damage (crop and property damage combined). Tornadoes came out on top for both health and economic harm by a significant margin. There is no definitive answer as to which event type comes next, especially for health casualties, but on closer analysis floods and their variants caused the most damage to both health and the economy after tornadoes.
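library(dplyr)    # select(), mutate(), group_by(), arrange(), %>%
library(ggplot2)  # ggplot() and friends
df <- read.csv("StormData.csv.bz2")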
## Warning in scan(file = file, what = what, sep = sep, quote = quote, dec =
## dec, : EOF within quoted string
Selecting only the necessary columns
select(df,EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP) -> F_1
Combining fatalities and injuries into a single health measure
mutate(.data=F_1,HealthPopulation=FATALITIES+INJURIES) -> F_1
The exponent columns were uninformative on their own. I used the following website to decipher the unusual factor levels
Writing a function and applying it to the exponent columns to make life a little easier
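# Map a single exponent code to its numeric multiplier
change_var <- function(X){
  nums <- c('0','1','2','3','4','5','6','7','8')
  if (X=='H' || X=='h') {
    return (100)          # hundreds
  } else if (X=='K' || X=='k') {
    return (1000)         # thousands
  } else if (X=='M' || X=='m') {
    return (1000000)      # millions
  } else if (X=='B' || X=='b') {
    return (1000000000)   # billions
  } else if (X %in% nums) {
    return (10)           # digits 0-8 are treated as a multiplier of 10
  } else if (X=='+') {
    return (1)
  } else {
    return (0)            # blank or any other symbol
  }
}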
F_1$PROPDMGEXP <- (sapply(F_1$PROPDMGEXP, change_var))
F_1$CROPDMGEXP <- (sapply(F_1$CROPDMGEXP, change_var))
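As a rough illustration of the same mapping, the sketch below uses a named lookup vector so the conversion runs on a whole column at once instead of element by element. The names `exp_lookup` and `exp_multiplier` are hypothetical and not part of the original analysis. The multipliers are copied from `change_var` above, and the sample codes are made up.

```r
# Hypothetical helper (not part of the original analysis): a vectorized
# version of the change_var mapping, built on a named lookup vector.
exp_lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9, "+" = 1)

exp_multiplier <- function(x) {
  x <- toupper(as.character(x))          # treat 'k' and 'K' alike
  out <- unname(exp_lookup[x])           # NA for any code not in the table
  out[x %in% as.character(0:8)] <- 10    # digits 0-8 map to 10, as in change_var
  out[is.na(out)] <- 0                   # blanks and other symbols map to 0
  out
}

# Made-up sample codes, one for each branch of change_var
codes <- c("K", "m", "5", "+", "", "?")
multipliers <- exp_multiplier(codes)
print(multipliers)
```

Under these assumptions, `exp_multiplier(F_1$PROPDMGEXP)` would be a drop-in alternative to the `sapply` call, provided the column still holds the raw codes.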
Aggregating the economic columns and filtering a second time to keep only the relevant columns
mutate(F_1,Economic_Damage= ((PROPDMG*PROPDMGEXP)+(CROPDMG*CROPDMGEXP))) -> F_2
select(F_2,EVTYPE,HealthPopulation,Economic_Damage) -> Final_Set
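To make the arithmetic behind `Economic_Damage` concrete, here is a small worked sketch on made-up rows (not taken from StormData). It assumes the exponent columns have already been converted to numeric multipliers, and `toy` is a hypothetical data frame used only for illustration.

```r
# Made-up rows (not from StormData) to illustrate the damage arithmetic.
# The exponent columns are assumed to hold numeric multipliers already.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 0),
  PROPDMGEXP = c(1e3, 1e6, 0),    # originally 'K', 'M', and blank
  CROPDMG    = c(0, 10, 5),
  CROPDMGEXP = c(0, 1e3, 1e3)     # originally blank, 'K', 'K'
)

# Same formula as in the analysis: property damage plus crop damage, in dollars
toy$Economic_Damage <- toy$PROPDMG * toy$PROPDMGEXP +
  toy$CROPDMG * toy$CROPDMGEXP
print(toy)
```

A row with a blank exponent contributes nothing for that column, because its multiplier is 0.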
Grouping and Aggregating Rows by Event Type
Final_Set %>% group_by(EVTYPE) %>% summarise_all(sum) -> Grouped
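The grouping step can be sketched on toy data as follows. This version uses base R's `aggregate()` rather than dplyr so that it runs without extra packages. The data values and the names `toy_events`, `grouped_toy` and `top_health` are hypothetical.

```r
# Made-up events (not from StormData) to illustrate grouping by event type.
toy_events <- data.frame(
  EVTYPE           = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  HealthPopulation = c(10, 2, 5, 0, 1),
  Economic_Damage  = c(1e6, 5e5, 2e6, 1e4, 5e5),
  stringsAsFactors = FALSE
)

# Sum both measures within each event type (one row per EVTYPE)
grouped_toy <- aggregate(
  cbind(HealthPopulation, Economic_Damage) ~ EVTYPE,
  data = toy_events,
  FUN  = sum
)

# Rank by health casualties, highest first, and keep the top two
top_health <- head(grouped_toy[order(-grouped_toy$HealthPopulation), ], 2)
print(top_health)
```

The same ordering idea, applied to either summed column, gives the top-10 tables used for the plots.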
Filtering the top 10 storm events that caused the most health casualties and plotting them
arrange(.data=Grouped,desc(HealthPopulation))[1:10,] -> Top10Health
ggplot(aes(x=EVTYPE,y=HealthPopulation,group=1),data=Top10Health)+geom_line(col="blue")+labs(x="Events",y="Health Casualties",title="Health Casualties by Event")+theme(text=element_text(size=6))
Filtering the top 10 storm events that caused the most economic damage and plotting them
arrange(.data=Grouped,desc(Economic_Damage))[1:10,] -> Top10Ecnomy
ggplot(aes(x=EVTYPE,y=Economic_Damage,group=1),data=Top10Ecnomy)+geom_line(col="blue")+labs(x="Events",y="Economic Damage",title="Economic Damage by Event")+theme(text=element_text(size=8))
In both cases, tornadoes caused the most harm, while floods came a distant second.