Storms and other weather conditions severe affect the population health as well as the economic conditions. This can results in severe Fatalities, Injuries and Property Damages which becomes very important to control these events.
An analysis has been done on U.S. National Oceanic and Atmospheric Administration’s (NOAA). The database is from 1950 to Nov 2011. In this analysis we will identify which weather events are severe to population health and economic condition across U.S.
We have used barplot to plot the data and identified that Tornado affects the population health majorly and Flood causes major Economic Damage across U.S.
## Loading required packages
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## Changing the size of digits
options(scipen = 999,digits = 10)
## Loading the data in R
weather_data = read.csv("C:\\Users\\Abhinav Khandelwal\\Desktop\\R_project\\R_Learning_Coursera\\StormData.csv")
head(weather_data,2)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 0 NA
## 2 0 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 14 100 3 0 0 15 25.0
## 2 0 2 150 2 0 0 0 2.5
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 3040 8812
## 2 K 0 3042 8755
## LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3051 8806 1
## 2 0 0 2
For the analysis i need only few columns like “EVTYPE”,“FATALITIES”,“INJURIES”,“PROPDMG”,” PROPDMGEXP”,“CROPDMG”,“CROPDMGEXP”. I will use Select function to extract these columns.
weather_data_cleaned = select(weather_data,"EVTYPE","FATALITIES","INJURIES",contains("DMG"))
## Making all the EVTYPE's in upper case
weather_data_cleaned$EVTYPE = toupper(weather_data_cleaned$EVTYPE)
head(weather_data_cleaned,2)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 0 15 25.0 K 0
## 2 TORNADO 0 0 2.5 K 0
In the dataset the EVTYPE is in character datatype need to change into factor for the analysis
weather_data_cleaned$EVTYPE = as.factor(weather_data_cleaned$EVTYPE)
str(weather_data_cleaned)
## 'data.frame': 902297 obs. of 7 variables:
## $ EVTYPE : Factor w/ 898 levels " HIGH SURF ADVISORY",..: 758 758 758 758 758 758 758 758 758 758 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
As we can see from above that we have 985 EVTYPE. From the str summary you can see that there are 0’s at many places which is not required in our analysis. So we will keep only those rows which will have values.And for the analysis i need to divide the dataset into 2 parts first is related to Population Health and second related to Economic Consequences.
popdata = weather_data_cleaned %>% select(1,2,3) %>% filter(FATALITIES > 0 | INJURIES > 0)
head(popdata)
## EVTYPE FATALITIES INJURIES
## 1 TORNADO 0 15
## 2 TORNADO 0 2
## 3 TORNADO 0 2
## 4 TORNADO 0 2
## 5 TORNADO 0 6
## 6 TORNADO 0 1
econodata = weather_data_cleaned %>% select(1,contains("DMG")) %>% filter(PROPDMG >0 | CROPDMG > 0)
head(econodata)
## EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 25.0 K 0
## 2 TORNADO 2.5 K 0
## 3 TORNADO 25.0 K 0
## 4 TORNADO 2.5 K 0
## 5 TORNADO 2.5 K 0
## 6 TORNADO 2.5 K 0
Let’s see the econodata what all EXP we have in PROP and CROP.
unique(econodata$PROPDMGEXP)
## [1] "K" "M" "B" "m" "" "+" "0" "5" "6" "4" "h" "2" "7" "3" "H" "-"
unique(econodata$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k"
From above, it is quite visible that there are some EXP which are not valid. Valid values are K for kilos, M for millions ,B for billions.
Replacing all EXP which are invalid to 0 and keeping the valid one’s.
econodata = econodata %>% mutate(CROPDMGEXP = case_when(CROPDMGEXP %in% c("K","m","M","B")~CROPDMGEXP,TRUE ~ "0"))
econodata = econodata %>% mutate(PROPDMGEXP = case_when(PROPDMGEXP %in% c("K","k","b","m","M","B")~PROPDMGEXP,TRUE ~ "0"))
Now we will substitute the actual values of EXP in the Econodata.
econodata = econodata %>% mutate(PROPEXP = case_when(PROPDMGEXP == "K" ~ PROPDMG * 1000,PROPDMGEXP == "k" ~ PROPDMG * 1000,PROPDMGEXP == "m" ~ PROPDMG * 1000000,PROPDMGEXP == "M" ~ PROPDMG * 1000000,PROPDMGEXP == "B" ~ PROPDMG * 1000000000,PROPDMGEXP == "b" ~ PROPDMG * 1000000000, TRUE ~ PROPDMG))
head(arrange(econodata,desc(PROPEXP),10))
## EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP PROPEXP
## 1 FLOOD 115.00 B 32.5 M 115000000000
## 2 STORM SURGE 31.30 B 0.0 0 31300000000
## 3 HURRICANE/TYPHOON 16.93 B 0.0 0 16930000000
## 4 STORM SURGE 11.26 B 0.0 0 11260000000
## 5 HURRICANE/TYPHOON 10.00 B 0.0 0 10000000000
## 6 HURRICANE/TYPHOON 7.35 B 0.0 0 7350000000
econodata = econodata %>% mutate(CROPEXP = case_when(CROPDMGEXP == "K" ~ CROPDMG * 1000,CROPDMGEXP == "k" ~ CROPDMG * 1000,CROPDMGEXP == "m" ~ CROPDMG * 1000000,CROPDMGEXP == "M" ~ CROPDMG * 1000000,CROPDMGEXP == "B" ~ CROPDMG * 1000000000,CROPDMGEXP == "b" ~ CROPDMG * 1000000000, TRUE ~ CROPDMG))
head(arrange(econodata,desc(CROPEXP),10))
## EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP PROPEXP CROPEXP
## 1 RIVER FLOOD 5.00 B 5.00 B 5000000000 5000000000
## 2 ICE STORM 500.00 K 5.00 B 500000 5000000000
## 3 HURRICANE/TYPHOON 5.88 B 1.51 B 5880000000 1510000000
## 4 DROUGHT 0.00 0 1.00 B 0 1000000000
## 5 EXTREME COLD 0.00 0 596.00 M 0 596000000
## 6 DROUGHT 0.00 0 578.85 M 0 578850000
Now the cleaning part is done. We will now plot the datasets to answer the questions.
Aggregating the Fatalities and Injuries across the EVTYPES and we will keep only those rows where the value of FATALITIES and INJURIES arae more than 1000 otherwise our plot will be messy.
popdata_total = popdata%>% group_by(EVTYPE) %>% summarise(sum(FATALITIES),sum(INJURIES)) %>% rename(FATALITIES = "sum(FATALITIES)",INJURIES = "sum(INJURIES)") %>% filter(FATALITIES > 1000 | INJURIES > 1000)%>% arrange(desc(FATALITIES),desc(INJURIES)) %>%as.data.frame()
head(popdata_total)
## EVTYPE FATALITIES INJURIES
## 1 TORNADO 5633 91346
## 2 EXCESSIVE HEAT 1903 6525
## 3 FLASH FLOOD 978 1777
## 4 HEAT 937 2100
## 5 LIGHTNING 816 5230
## 6 TSTM WIND 504 6957
Plotting the graph which shows which EVTYPES are more harmful to population health
We will now plot popdata_total to see which weather event is more harmful to population health.
barplot(as.matrix(t(popdata_total[,-1])),beside = TRUE,names.arg = popdata_total$EVTYPE,legend.text = c("FATALITIES","INJURIES"),col = c("grey","lightblue"),ylab = "No. of Injuries / Fatalities",las = 3,cex.axis = 01,cex.names = 0.6,main = "Analysis of Events \nwhich are Harmful Respect to Population Health")
From the above plot we can conclude that Tornado affects the population health majorly.
econodata_total = econodata %>% group_by(EVTYPE) %>% summarise(sum(PROPEXP),sum(CROPEXP)) %>% rename(Property_Damage = "sum(PROPEXP)",Crop_Damage = "sum(CROPEXP)") %>% arrange(desc(Property_Damage),desc(Crop_Damage)) %>% as.data.frame()
head(econodata_total,10)
## EVTYPE Property_Damage Crop_Damage
## 1 FLOOD 144657709807 5661968450
## 2 HURRICANE/TYPHOON 69305840000 2607872800
## 3 TORNADO 56937160779 414953270
## 4 STORM SURGE 43323536000 5000
## 5 FLASH FLOOD 16140812067 1421317100
## 6 HAIL 15732267048 3025537890
## 7 HURRICANE 11868319010 2741910000
## 8 TROPICAL STORM 7703890550 678346000
## 9 WINTER STORM 6688497251 26944000
## 10 HIGH WIND 5270046295 638571300
## As we have more than 390 observations which will be very difficult to see in the plot so we will keep only those records which will have damages in billions of dollars.
econodata_total = econodata_total %>% filter(Property_Damage > 10000000000 | Crop_Damage > 1000000000)
head(econodata_total)
## EVTYPE Property_Damage Crop_Damage
## 1 FLOOD 144657709807 5661968450
## 2 HURRICANE/TYPHOON 69305840000 2607872800
## 3 TORNADO 56937160779 414953270
## 4 STORM SURGE 43323536000 5000
## 5 FLASH FLOOD 16140812067 1421317100
## 6 HAIL 15732267048 3025537890
Plotting the graph which shows which EVTYPES are more harmful to economic consequences
We will now plot econodata_total to see which weather event caused more harm to economic condition in USA. As the values of the damage are bigger so i have divided the values with 1000000000 so the y-axis is in billion dollars.
barplot(t(econodata_total[,-1]/1000000000),names.arg = econodata_total$EVTYPE,las = 3,cex.names = 0.6,cex.axis = 0.7,col = c("brown","grey"),legend.text = TRUE,ylab = "Total Damages in Billion Dollars",main = "Analysis of Events which causes Economic Damage")
From the above we can conclude that Flood caused more Economic Damages (Property and Crop Damages)
From the above plot’s we can conclude that Tornado weather event majorly affects Population Health in U.S. and Flood weather event caused major damage to Economic Conditions across U.S.