The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:
Storm Data [47Mb] https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 #### 1.1 Load the necessary libraries
# Load necessary libraries
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(tidyr)
# Load the data
storm_data <- read.csv("C:\\Users\\Changcheng\\Documents\\repdata_data_StormData.csv", na.strings = c("NA", "N/A", "NULL", " ", "", "?"), stringsAsFactors = FALSE)
# Check the first few rows of the data
head(storm_data)
# Check the dimensions of the data
dim(storm_data)
## [1] 902297 37
# Check the column names
colnames(storm_data)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
# Create a new data frame with only the relevant columns
stormdata_tidy <- storm_data %>%
subset(!is.na(EVTYPE)&(FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)) %>%
select(BGN_DATE, END_DATE, EVTYPE, STATE,FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
mutate(BGN_DATE = as.Date(BGN_DATE, format = "%m/%d/%Y"),
END_DATE = as.Date(END_DATE, format = "%m/%d/%Y"),
)
# Check the structure of the new data frame
str(stormdata_tidy)
## 'data.frame': 254632 obs. of 10 variables:
## $ BGN_DATE : Date, format: "1950-04-18" "1950-04-18" ...
## $ END_DATE : Date, format: NA NA ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr NA NA NA NA ...
# Check the dimensions of the new data frame
dim(stormdata_tidy)
## [1] 254632 10
# Check the summary of the new data frame
summary(stormdata_tidy)
## BGN_DATE END_DATE EVTYPE
## Min. :1950-01-03 Min. :1993-01-01 Length:254632
## 1st Qu.:1997-01-23 1st Qu.:2000-05-24 Class :character
## Median :2002-08-02 Median :2005-03-07 Mode :character
## Mean :2000-06-13 Mean :2004-08-14
## 3rd Qu.:2008-05-07 3rd Qu.:2009-01-07
## Max. :2011-11-30 Max. :2011-11-30
## NA's :50928
## STATE FATALITIES INJURIES PROPDMG
## Length:254632 Min. : 0.00000 Min. : 0.0000 Min. : 0.00
## Class :character 1st Qu.: 0.00000 1st Qu.: 0.0000 1st Qu.: 2.00
## Mode :character Median : 0.00000 Median : 0.0000 Median : 5.00
## Mean : 0.05948 Mean : 0.5519 Mean : 42.75
## 3rd Qu.: 0.00000 3rd Qu.: 0.0000 3rd Qu.: 25.00
## Max. :583.00000 Max. :1700.0000 Max. :5000.00
##
## PROPDMGEXP CROPDMG CROPDMGEXP
## Length:254632 Min. : 0.000 Length:254632
## Class :character 1st Qu.: 0.000 Class :character
## Mode :character Median : 0.000 Mode :character
## Mean : 5.411
## 3rd Qu.: 0.000
## Max. :990.000
##
Clean and tidy the data, including converting the EVTYPE column to uppercase and checking for unique values
# Convert to uppercase
stormdata_tidy$EVTYPE<-toupper(stormdata_tidy$EVTYPE)
# Check the unique values in the EVTYPE column
length(unique(stormdata_tidy$EVTYPE))
## [1] 446
The original EVTYPE column contained 446 unique values. Group the EVTYPE column to a more consistent format
Group the EVTYPE based on Storm Data Event Table as below:
Event Name Designator Event Name Designator
Astronomical Low Tide Z Avalanche Z Blizzard Z Coastal Flood Z Cold/Wind Chill Z Debris Flow C Dense Fog Z Dense Smoke Z Drought Z Dust Devil C Dust Storm Z Excessive Heat Z Extreme Cold/Wind Chill Z Flash Flood C Flood C Frost/Freeze Z Funnel Cloud C Freezing Fog Z Hail C Heat Z Heavy Rain C Heavy Snow Z High Surf Z High Wind Z Hurricane (Typhoon) Z Ice Storm Z Lake-Effect Snow Z Lakeshore Flood Z Lightning C Marine Hail M Marine High Wind M Marine Strong Wind M Marine Thunderstorm Wind M Rip Current Z Seiche Z Sleet Z Storm Surge/Tide Z Strong Wind Z Thunderstorm Wind C Tornado C Tropical Depression Z Tropical Storm Z Tsunami Z Volcanic Ash Z Waterspout M Wildfire Z Winter Storm Z Winter Weather Z
#Clean the EVTYPE column
stormdata_tidy$EVTYPE <- gsub(".*HIGH TIDE.*", "ASTRONOMICAL HIGH TIDE (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*ASTRONOMICAL LOW TIDE.*", "ASTRONOMICAL LOW TIDE (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub('^AVALANCH?E.*', 'AVALANCHE (Z)', stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*BLIZZARD.*", "BLIZZARD (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*COASTAL[/s*]FLOOD.*", "COASTAL FLOOD (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*EROSION.*", "COASTAL EROSION", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^COLD.*", "COLD/WIND CHILL (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^(HYPOTHERMIA|LOW TEMPERATURE).*", "COLD/WIND CHILL (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*COLD$", "COLD/WIND CHILL (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*DENSE FOG.*", "DENSE FOG (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*DENSE SMOKE.*", "DENSE SMOKE (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*DROUGHT.*", "DROUGHT (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*DUST DEVIL.*", "DUST DEVIL (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*DUST STORM.*", "DUST STORM (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*DUST$", "DUST STORM (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*(EXCESSIVE|EXTREME) HEAT.*", "EXCESSIVE HEAT (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*HYPERTHERMIA.*", "EXCESSIVE HEAT (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*EXTREME COLD.*", "EXTREME COLD/WIND CHILL (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*EXTREME WIND.*", "EXTREME COLD/WIND CHILL (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*(FLASH FLOOD|STREAM).*", "FLASH FLOOD (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*RAPIDLY RISING WATER.*", "FLASH FLOOD (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*URBAN.*SMALL", "FLASH FLOOD (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^LAKE.*FLOOD$", "LAKESHORE FLOOD (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^(FLOOD|HIGH WATER).*", "FLOOD (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*FLOOD(S|ING)?$", "FLOOD (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*FREEZING FOG.*", "FREEZING FOG (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*GLAZE.*", "FREEZING FOG (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^FOG*", "FREEZING FOG (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*FROST.*", "FROST/FREEZE (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*FREEZE$", "FROST/FREEZE (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*FUNNEL CLOUD.*", "FUNNEL CLOUD (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^HAIL.*", "HAIL (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("SMALL HAIL.*", "HAIL (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^HEAT.*", "HEAT (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*WARM.*", "HEAT (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("RECORD HEAT.*", "HEAT (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^(HEAVY|HVY|EXCESSIVE|TORRENTIAL|RECORD)? ?(RAIN(FALL)?|SHOWER|PRECIPITATION).*", "HEAVY RAIN (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*(DAM BREAK|RAINSTORM).*", "HEAVY RAIN (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*(HEAVY|EXCESSIVE) SNOW.*", "HEAVY SNOW (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*(SURF|SWELLS).*", "HIGH SURF (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*ROGUE WAVE.*", "HIGH SURF (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^HIGH.*WIND.*", "HIGH WIND (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("HIGH$", "HIGH WIND (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*(HURRICANE|TYPHOON|HIGH WAVE).*", "HURRICANE/TYPHOON (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*IC[E|Y].*", "ICE STORM (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^(HEAVY )?LAKE.*SNOW$", "LAKE-EFFECT SNOW (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^(LIGNTNING|LIGHTN?ING).*", "LIGHTNING (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*MARINE HAIL.*", "MARINE HAIL (M)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*MARINE HIGH WIND.*", "MARINE HIGH WIND (M)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*MARINE STRONG WIND.*", "MARINE STRONG WIND (M)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*MARINE THUNDERSTORM.*", "MARINE THUNDERSTORM WIND (M)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*(SLIDE|LANDSLUMP).*", "DEBRIS FLOW (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*RIP CURRENT.*", "RIP CURRENT (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*SEICHE.*", "SEICHE (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("MARINE (MISHAP|ACCIDENT)", "SEICHE (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*(ROUGH |HEAVY )?SEA.*", "SEICHE (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*SLEET.*", "SLEET (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*MIX.*", "SLEET (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*RAIN.*SNOW.*", "SLEET (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*SNOW.*RAIN.*", "SLEET (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*STORM SURGE.*", "STORM SURGE/TIDE (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*COASTAL ?(STORM|SURGE).*", "STORM SURGE/TIDE (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^(STRONG|GUSTY)? ?WINDS?.*", "STRONG WIND (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^WINDS?.*", "STRONG WIND (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^(SEVERE )?TH?UN?D?E.*", "THUNDERSTORM WIND (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*(TSTM|APACHE COUNTY).*", "THUNDERSTORM WIND (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*(BURST|WHIRLWIND|GUSTNADO).*", "THUNDERSTORM WIND (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^TORN.*", "TORNADO (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("LANDSPOUT.*", "TORNADO (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*(TROPICAL DEPRESSION|GRADIENT WIND).*", "TROPICAL DEPRESSION (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*(TROPICAL STORM|STORM FORCE WINDS).*", "TROPICAL STORM (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*TSUNAMI.*", "TSUNAMI (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*VOLCANIC.*", "VOLCANIC ASH (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^WATERSPOUT.*", "WATERSPOUT (M)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*FIRE.*", "WILDFIRE (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*WINTER WEATHER.*", "WINTER WEATHER (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^(LIGHT|BLOWING|RECORD)? ?SNOW(FALL)?$", "WINTER WEATHER (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("(LIGHT )?FREEZING (RAIN|DRIZZLE|SPRAY)$", "WINTER WEATHER (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^SNOW.*", "WINTER WEATHER (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*WINTER.*STORM.*", "WINTER STORM (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*WET.*", "WET", stormdata_tidy$EVTYPE)
length(unique(stormdata_tidy$EVTYPE))
## [1] 55
After cleaning, the EVTYPE column contains 55 unique values.
top10_health_data <- stormdata_tidy %>%
group_by(EVTYPE) %>%
summarise(Total_Fatalities = sum(FATALITIES, na.rm = TRUE),
Total_Injuries = sum(INJURIES, na.rm = TRUE)) %>%
arrange(desc(Total_Fatalities + Total_Injuries)) %>%
slice(1:10)
top10_health_data
top10_health_data_long <- top10_health_data %>%
pivot_longer(cols = c(Total_Fatalities, Total_Injuries), names_to = "Type", values_to = "Count")
Create a bar plot for the top 10 weather events by health impact, fatalities and injuries with different colors in the same plot
ggplot(top10_health_data_long, aes(x = reorder(EVTYPE, Count), y = Count, fill = Type)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = "Top 10 Weather Events by Health Impact (Fatalities + Injuries)",
x = "Weather Event Type",
y = "Count") +
scale_fill_manual(values = c("Total_Fatalities" = "red", "Total_Injuries" = "blue")) +
theme_minimal()
### Result: The plot shows the top 10 weather events by health impact,
with fatalities in red and injuries in blue. The x-axis represents the
weather event type, while the y-axis represents the count of fatalities
and injuries. The plot is flipped to make it easier to read the event
types. The top 10 weather events are displayed in descending order of
total fatalities and injuries. * The plot shows that “TORNADO (C)” has
the highest total fatalities and injuries, followed by “EXCESSIVE HEAT
(Z)” and “FLOOD (C)”. * The plot also shows that “WINTER WEATHER (Z)”
and “WINTER STORM (Z)” have a significant number of injuries, while
“HAIL (C)” has a high number of fatalities. * The plot also shows that
“WINTER WEATHER (Z)” and “WINTER STORM (Z)” have a significant number of
injuries, while “HAIL (C)” has a high number of fatalities.
Check the structure of stormdata_tidy
str(stormdata_tidy)
## 'data.frame': 254632 obs. of 10 variables:
## $ BGN_DATE : Date, format: "1950-04-18" "1950-04-18" ...
## $ END_DATE : Date, format: NA NA ...
## $ EVTYPE : chr "TORNADO (C)" "TORNADO (C)" "TORNADO (C)" "TORNADO (C)" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr NA NA NA NA ...
stormdata_tidy$CROPDMGEXP[is.na(stormdata_tidy$CROPDMGEXP)] <- "0"
stormdata_tidy$PROPDMGEXP[is.na(stormdata_tidy$PROPDMGEXP)] <- "0"
unique(stormdata_tidy$CROPDMGEXP)
## [1] "0" "M" "K" "m" "B" "k"
unique(stormdata_tidy$PROPDMGEXP)
## [1] "K" "M" "0" "B" "m" "+" "5" "6" "4" "h" "2" "7" "3" "H" "-"
#Create a new column for crop damage
stormdata_tidy$CROPDMGEXP <- toupper(stormdata_tidy$CROPDMGEXP)
unique(stormdata_tidy$CROPDMGEXP)
## [1] "0" "M" "K" "B"
stormdata_tidy$CROPDMGVAL <- stormdata_tidy$CROPDMG
stormdata_tidy$CROPDMGVAL[stormdata_tidy$CROPDMGEXP == "K"] <- stormdata_tidy$CROPDMG[stormdata_tidy$CROPDMGEXP == "K"] * 1000
stormdata_tidy$CROPDMGVAL[stormdata_tidy$CROPDMGEXP == "M"] <- stormdata_tidy$CROPDMG[stormdata_tidy$CROPDMGEXP == "M"] * 1000000
stormdata_tidy$CROPDMGVAL[stormdata_tidy$CROPDMGEXP == "B"] <- stormdata_tidy$CROPDMG[stormdata_tidy$CROPDMGEXP == "B"] * 1000000000
stormdata_tidy$CROPDMGVAL[stormdata_tidy$CROPDMGEXP == "H"] <- stormdata_tidy$CROPDMG[stormdata_tidy$CROPDMGEXP == "H"] * 100
stormdata_tidy$CROPDMGVAL[stormdata_tidy$CROPDMGEXP == "0"] <- stormdata_tidy$CROPDMG[stormdata_tidy$CROPDMGEXP == "0"] * 1
# create a new column for property damage
stormdata_tidy$PROPDMGEXP <- toupper(stormdata_tidy$PROPDMGEXP)
unique(stormdata_tidy$PROPDMGEXP)
## [1] "K" "M" "0" "B" "+" "5" "6" "4" "H" "2" "7" "3" "-"
stormdata_tidy$PROPDMGVAL <- stormdata_tidy$PROPDMG
stormdata_tidy$PROPDMGVAL[stormdata_tidy$PROPDMGEXP == "K"] <- stormdata_tidy$PROPDMG[stormdata_tidy$PROPDMGEXP == "K"] * 1000
stormdata_tidy$PROPDMGVAL[stormdata_tidy$PROPDMGEXP == "M"] <- stormdata_tidy$PROPDMG[stormdata_tidy$PROPDMGEXP == "M"] * 1000000
stormdata_tidy$PROPDMGVAL[stormdata_tidy$PROPDMGEXP == "B"] <- stormdata_tidy$PROPDMG[stormdata_tidy$PROPDMGEXP == "B"] * 1000000000
stormdata_tidy$PROPDMGVAL[stormdata_tidy$PROPDMGEXP == "H"] <- stormdata_tidy$PROPDMG[stormdata_tidy$PROPDMGEXP == "H"] * 100
stormdata_tidy$PROPDMGVAL[stormdata_tidy$PROPDMGEXP == "0"] <- stormdata_tidy$PROPDMG[stormdata_tidy$PROPDMGEXP == "0"] * 1
stormdata_tidy$PROPDMGVAL[stormdata_tidy$PROPDMGEXP == "2"] <- stormdata_tidy$PROPDMG[stormdata_tidy$PROPDMGEXP == "2"] * 100
stormdata_tidy$PROPDMGVAL[stormdata_tidy$PROPDMGEXP == "3"] <- stormdata_tidy$PROPDMG[stormdata_tidy$PROPDMGEXP == "3"] * 1000
stormdata_tidy$PROPDMGVAL[stormdata_tidy$PROPDMGEXP == "4"] <- stormdata_tidy$PROPDMG[stormdata_tidy$PROPDMGEXP == "4"] * 10000
stormdata_tidy$PROPDMGVAL[stormdata_tidy$PROPDMGEXP == "5"] <- stormdata_tidy$PROPDMG[stormdata_tidy$PROPDMGEXP == "5"] * 100000
stormdata_tidy$PROPDMGVAL[stormdata_tidy$PROPDMGEXP == "6"] <- stormdata_tidy$PROPDMG[stormdata_tidy$PROPDMGEXP == "6"] * 1000000
stormdata_tidy$PROPDMGVAL[stormdata_tidy$PROPDMGEXP == "7"] <- stormdata_tidy$PROPDMG[stormdata_tidy$PROPDMGEXP == "7"] * 10000000
stormdata_tidy$PROPDMGVAL[stormdata_tidy$PROPDMGEXP == "+"] <- stormdata_tidy$PROPDMG[stormdata_tidy$PROPDMGEXP == "+"] * 1
stormdata_tidy$PROPDMGVAL[stormdata_tidy$PROPDMGEXP == "-"] <- stormdata_tidy$PROPDMG[stormdata_tidy$PROPDMGEXP == "-"] * 1
stormdata_tidy$PROPDMGVAL[stormdata_tidy$PROPDMGEXP == "?"] <- stormdata_tidy$PROPDMG[stormdata_tidy$PROPDMGEXP == "?"] * 1
# Create a new column for total damage
stormdata_tidy$TOTALDMGVAL <- stormdata_tidy$PROPDMGVAL + stormdata_tidy$CROPDMGVAL
# Check the structure of the new data frame
str(stormdata_tidy)
## 'data.frame': 254632 obs. of 13 variables:
## $ BGN_DATE : Date, format: "1950-04-18" "1950-04-18" ...
## $ END_DATE : Date, format: NA NA ...
## $ EVTYPE : chr "TORNADO (C)" "TORNADO (C)" "TORNADO (C)" "TORNADO (C)" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ FATALITIES : num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP : chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP : chr "0" "0" "0" "0" ...
## $ CROPDMGVAL : num 0 0 0 0 0 0 0 0 0 0 ...
## $ PROPDMGVAL : num 25000 2500 25000 2500 2500 2500 2500 2500 25000 25000 ...
## $ TOTALDMGVAL: num 25000 2500 25000 2500 2500 2500 2500 2500 25000 25000 ...
# Check the summary of the new data frame
summary(stormdata_tidy)
## BGN_DATE END_DATE EVTYPE
## Min. :1950-01-03 Min. :1993-01-01 Length:254632
## 1st Qu.:1997-01-23 1st Qu.:2000-05-24 Class :character
## Median :2002-08-02 Median :2005-03-07 Mode :character
## Mean :2000-06-13 Mean :2004-08-14
## 3rd Qu.:2008-05-07 3rd Qu.:2009-01-07
## Max. :2011-11-30 Max. :2011-11-30
## NA's :50928
## STATE FATALITIES INJURIES PROPDMG
## Length:254632 Min. : 0.00000 Min. : 0.0000 Min. : 0.00
## Class :character 1st Qu.: 0.00000 1st Qu.: 0.0000 1st Qu.: 2.00
## Mode :character Median : 0.00000 Median : 0.0000 Median : 5.00
## Mean : 0.05948 Mean : 0.5519 Mean : 42.75
## 3rd Qu.: 0.00000 3rd Qu.: 0.0000 3rd Qu.: 25.00
## Max. :583.00000 Max. :1700.0000 Max. :5000.00
##
## PROPDMGEXP CROPDMG CROPDMGEXP CROPDMGVAL
## Length:254632 Min. : 0.000 Length:254632 Min. :0.000e+00
## Class :character 1st Qu.: 0.000 Class :character 1st Qu.:0.000e+00
## Mode :character Median : 0.000 Mode :character Median :0.000e+00
## Mean : 5.411 Mean :1.928e+05
## 3rd Qu.: 0.000 3rd Qu.:0.000e+00
## Max. :990.000 Max. :5.000e+09
##
## PROPDMGVAL TOTALDMGVAL
## Min. :0.000e+00 Min. :0.000e+00
## 1st Qu.:2.000e+03 1st Qu.:2.500e+03
## Median :1.000e+04 Median :1.000e+04
## Mean :1.682e+06 Mean :1.875e+06
## 3rd Qu.:3.500e+04 3rd Qu.:5.000e+04
## Max. :1.150e+11 Max. :1.150e+11
##
# Check the first few rows of the new data frame
head(stormdata_tidy)
Check the top 10 weather events by economic impact
top10_economic_data <- stormdata_tidy %>%
group_by(EVTYPE) %>%
summarise(Total_Property_Damage = sum(PROPDMGVAL, na.rm = TRUE),
Total_Crop_Damage = sum(CROPDMGVAL, na.rm = TRUE),
Total_Damage = sum(TOTALDMGVAL, na.rm = TRUE)) %>%
arrange(desc(Total_Damage)) %>%
slice(1:10)
top10_economic_data
top10_economic_data_long <- top10_economic_data %>%
pivot_longer(cols = c(Total_Property_Damage, Total_Crop_Damage), names_to = "Type", values_to = "Count")
# Create a bar plot for the top 10 weather events by economic impact, property damage and crop damage with different colors in the same plot
ggplot(top10_economic_data_long, aes(x = reorder(EVTYPE, Count), y = Count, fill = Type)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = "Top 10 Weather Events by Economic Impact (Property Damage + Crop Damage)",
x = "Weather Event Type",
y = "Count") +
scale_fill_manual(values = c("Total_Property_Damage" = "red", "Total_Crop_Damage" = "blue")) +
theme_minimal()
### Result: The plot shows the top 10 weather events by economic impact,
with property damage in red and crop damage in blue. The x-axis
represents the weather event type, while the y-axis represents the count
of property damage and crop damage. The plot is flipped to make it
easier to read the event types. 1 The top 10 weather events are
displayed in descending order of total property damage and crop damage.
2 The plot shows that “FLOOD (C)” has the highest total property
damage and crop damage, followed by “HURRICANE/TYPHOON (Z)” and “TORNADO
(C)”. *3 The plot also shows that “HAIL (C)” has a significant amount of
property damage, while “DROUGHT (Z)” has a high amount of crop
damage.
The analysis shows that tornadoes have the highest impact on health, with the highest number of fatalities and injuries. Floods have the highest economic impact, with the highest property and crop damage. The analysis also shows that hurricanes/typhoons and excessive heat have a significant impact on both health and economy. 1 The analysis also shows that floods have a significant impact on both health and economy, with a high number of injuries and property damage. 2 The analysis also shows that winter weather events have a significant impact on health, with a high number of injuries, while hail has a high number of fatalities. 3 The analysis also shows that drought has a significant impact on crop damage, with a high amount of crop damage. 4 The analysis also shows that winter weather events have a significant impact on health, with a high number of injuries, while hail has a high number of fatalities.