1. Load the data and related libraries

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:

Storm Data [47Mb]: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

1.1 Load the necessary libraries

# Load necessary libraries
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(tidyr)

1.2 Load the data

# Load the data
storm_data <- read.csv("C:\\Users\\Changcheng\\Documents\\repdata_data_StormData.csv", na.strings = c("NA", "N/A", "NULL", " ", "", "?"), stringsAsFactors = FALSE)
# Check the first few rows of the data
head(storm_data)
# Check the dimensions of the data
dim(storm_data)
## [1] 902297     37
# Check the column names
colnames(storm_data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
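
The hard-coded local path above only works on the author's machine. As a hedged alternative, the sketch below wraps the download and the read in one helper. The function name `load_storm_data` and the demo file are assumptions made for illustration; the `na.strings` values are the ones used above. It relies on `read.csv()` reading bzip2-compressed files directly, so the archive does not need to be unpacked first.

```r
# Hypothetical helper (not part of the original analysis): download the
# archive only if it is missing, then read it without decompressing it.
load_storm_data <- function(path, url = NULL) {
  if (!file.exists(path)) {
    if (is.null(url)) stop("file not found and no url given: ", path)
    download.file(url, destfile = path, mode = "wb")
  }
  read.csv(path,
           na.strings = c("NA", "N/A", "NULL", " ", "", "?"),
           stringsAsFactors = FALSE)
}

# Small self-contained demonstration with a temporary bzip2 file
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,1", "HAIL,0"), con)
close(con)

demo <- load_storm_data(tmp)
print(demo)
```

For the real data, the call would pass a local file name of your choice as `path` and the course URL given above as `url`.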

1.3 Create a new data frame with only the relevant columns

# Keep only events with casualties or damage, select the relevant columns,
# and parse the date columns
stormdata_tidy <- storm_data %>%
  subset(!is.na(EVTYPE) & (FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)) %>%
  select(BGN_DATE, END_DATE, EVTYPE, STATE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
  mutate(BGN_DATE = as.Date(BGN_DATE, format = "%m/%d/%Y"),
         END_DATE = as.Date(END_DATE, format = "%m/%d/%Y"))
# Check the structure of the new data frame
str(stormdata_tidy)
## 'data.frame':    254632 obs. of  10 variables:
##  $ BGN_DATE  : Date, format: "1950-04-18" "1950-04-18" ...
##  $ END_DATE  : Date, format: NA NA ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  NA NA NA NA ...
# Check the dimensions of the new data frame
dim(stormdata_tidy)
## [1] 254632     10
# Check the summary of the new data frame
summary(stormdata_tidy)
##     BGN_DATE             END_DATE             EVTYPE         
##  Min.   :1950-01-03   Min.   :1993-01-01   Length:254632     
##  1st Qu.:1997-01-23   1st Qu.:2000-05-24   Class :character  
##  Median :2002-08-02   Median :2005-03-07   Mode  :character  
##  Mean   :2000-06-13   Mean   :2004-08-14                     
##  3rd Qu.:2008-05-07   3rd Qu.:2009-01-07                     
##  Max.   :2011-11-30   Max.   :2011-11-30                     
##                       NA's   :50928                          
##     STATE             FATALITIES           INJURIES            PROPDMG       
##  Length:254632      Min.   :  0.00000   Min.   :   0.0000   Min.   :   0.00  
##  Class :character   1st Qu.:  0.00000   1st Qu.:   0.0000   1st Qu.:   2.00  
##  Mode  :character   Median :  0.00000   Median :   0.0000   Median :   5.00  
##                     Mean   :  0.05948   Mean   :   0.5519   Mean   :  42.75  
##                     3rd Qu.:  0.00000   3rd Qu.:   0.0000   3rd Qu.:  25.00  
##                     Max.   :583.00000   Max.   :1700.0000   Max.   :5000.00  
##                                                                              
##   PROPDMGEXP           CROPDMG         CROPDMGEXP       
##  Length:254632      Min.   :  0.000   Length:254632     
##  Class :character   1st Qu.:  0.000   Class :character  
##  Mode  :character   Median :  0.000   Mode  :character  
##                     Mean   :  5.411                     
##                     3rd Qu.:  0.000                     
##                     Max.   :990.000                     
## 
1.4 Data Exploration

Convert the EVTYPE column to uppercase, so that labels differing only in case are treated as the same event, and then count the unique values.

# Convert to uppercase
stormdata_tidy$EVTYPE <- toupper(stormdata_tidy$EVTYPE)
# Check the unique values in the EVTYPE column
length(unique(stormdata_tidy$EVTYPE))
## [1] 446
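
Uppercasing removes case differences only. As a minimal sketch, assuming the labels may also carry stray leading, trailing, or repeated spaces (the analysis above does not check for this), the hypothetical helper `normalise_evtype` below also trims and collapses whitespace. The input vector is made-up toy data.

```r
# Toy labels (illustrative, not taken from the data set)
raw_evtype <- c("Tornado", " TORNADO", "tstm wind", "TSTM  WIND", "Hail ")

# Hypothetical helper: uppercase, trim, and collapse repeated whitespace
normalise_evtype <- function(x) {
  x <- toupper(trimws(x))
  gsub("\\s+", " ", x)
}

normalised <- normalise_evtype(raw_evtype)
print(unique(normalised))
```

Counting `unique()` values before and after such a step shows how many apparent categories were only formatting variants.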

2. Data processing

2.1. Clean the EVTYPE column

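After conversion to uppercase, the EVTYPE column contains 446 unique values, many of which are spelling variants of the same event.

Group these values under the official event names from the Storm Data Event Table below, and append the designator in parentheses (C = County/Parish, Z = Zone, M = Marine):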

| Event Name | Designator | Event Name | Designator |
|---|---|---|---|
| Astronomical Low Tide | Z | Hurricane (Typhoon) | Z |
| Avalanche | Z | Ice Storm | Z |
| Blizzard | Z | Lake-Effect Snow | Z |
| Coastal Flood | Z | Lakeshore Flood | Z |
| Cold/Wind Chill | Z | Lightning | C |
| Debris Flow | C | Marine Hail | M |
| Dense Fog | Z | Marine High Wind | M |
| Dense Smoke | Z | Marine Strong Wind | M |
| Drought | Z | Marine Thunderstorm Wind | M |
| Dust Devil | C | Rip Current | Z |
| Dust Storm | Z | Seiche | Z |
| Excessive Heat | Z | Sleet | Z |
| Extreme Cold/Wind Chill | Z | Storm Surge/Tide | Z |
| Flash Flood | C | Strong Wind | Z |
| Flood | C | Thunderstorm Wind | C |
| Frost/Freeze | Z | Tornado | C |
| Funnel Cloud | C | Tropical Depression | Z |
| Freezing Fog | Z | Tropical Storm | Z |
| Hail | C | Tsunami | Z |
| Heat | Z | Volcanic Ash | Z |
| Heavy Rain | C | Waterspout | M |
| Heavy Snow | Z | Wildfire | Z |
| High Surf | Z | Winter Storm | Z |
| High Wind | Z | Winter Weather | Z |

# Clean the EVTYPE column. The rules run in order, so each pattern sees the
# output of the rules before it.
stormdata_tidy$EVTYPE <- gsub(".*HIGH TIDE.*", "ASTRONOMICAL HIGH TIDE (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*ASTRONOMICAL LOW TIDE.*", "ASTRONOMICAL LOW TIDE (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub('^AVALANCH?E.*', 'AVALANCHE (Z)', stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*BLIZZARD.*", "BLIZZARD (Z)", stormdata_tidy$EVTYPE)
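# Note: "[/s*]" is a character class matching "/", "s" or "*", not optional
# whitespace, so "COASTAL FLOOD" is not matched here and is later grouped
# under "FLOOD (C)" by the generic FLOOD rule.
stormdata_tidy$EVTYPE <- gsub(".*COASTAL[/s*]FLOOD.*", "COASTAL FLOOD (Z)", stormdata_tidy$EVTYPE)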
stormdata_tidy$EVTYPE <- gsub(".*EROSION.*", "COASTAL EROSION", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^COLD.*", "COLD/WIND CHILL (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^(HYPOTHERMIA|LOW TEMPERATURE).*", "COLD/WIND CHILL (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*COLD$", "COLD/WIND CHILL (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*DENSE FOG.*", "DENSE FOG (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*DENSE SMOKE.*", "DENSE SMOKE (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*DROUGHT.*", "DROUGHT (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*DUST DEVIL.*", "DUST DEVIL (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*DUST STORM.*", "DUST STORM (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*DUST$", "DUST STORM (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*(EXCESSIVE|EXTREME) HEAT.*", "EXCESSIVE HEAT (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*HYPERTHERMIA.*", "EXCESSIVE HEAT (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*EXTREME COLD.*", "EXTREME COLD/WIND CHILL (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*EXTREME WIND.*", "EXTREME COLD/WIND CHILL (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*(FLASH FLOOD|STREAM).*", "FLASH FLOOD (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*RAPIDLY RISING WATER.*", "FLASH FLOOD (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*URBAN.*SMALL", "FLASH FLOOD (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^LAKE.*FLOOD$", "LAKESHORE FLOOD (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^(FLOOD|HIGH WATER).*", "FLOOD (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*FLOOD(S|ING)?$", "FLOOD (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*FREEZING FOG.*", "FREEZING FOG (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*GLAZE.*", "FREEZING FOG (Z)", stormdata_tidy$EVTYPE)
# Note: "^FOG*" replaces only the leading "FOG"; any text after it is kept.
stormdata_tidy$EVTYPE <- gsub("^FOG*", "FREEZING FOG (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*FROST.*", "FROST/FREEZE (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*FREEZE$", "FROST/FREEZE (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*FUNNEL CLOUD.*", "FUNNEL CLOUD (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^HAIL.*", "HAIL (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("SMALL HAIL.*", "HAIL (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^HEAT.*", "HEAT (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*WARM.*", "HEAT (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("RECORD HEAT.*", "HEAT (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^(HEAVY|HVY|EXCESSIVE|TORRENTIAL|RECORD)? ?(RAIN(FALL)?|SHOWER|PRECIPITATION).*", "HEAVY RAIN (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*(DAM BREAK|RAINSTORM).*", "HEAVY RAIN (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*(HEAVY|EXCESSIVE) SNOW.*", "HEAVY SNOW (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*(SURF|SWELLS).*", "HIGH SURF (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*ROGUE WAVE.*", "HIGH SURF (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^HIGH.*WIND.*", "HIGH WIND (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("HIGH$", "HIGH WIND (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*(HURRICANE|TYPHOON|HIGH WAVE).*", "HURRICANE/TYPHOON (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*IC[EY].*", "ICE STORM (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^(HEAVY )?LAKE.*SNOW$", "LAKE-EFFECT SNOW (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^(LIGNTNING|LIGHTN?ING).*", "LIGHTNING (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*MARINE HAIL.*", "MARINE HAIL (M)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*MARINE HIGH WIND.*", "MARINE HIGH WIND (M)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*MARINE STRONG WIND.*", "MARINE STRONG WIND (M)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*MARINE THUNDERSTORM.*", "MARINE THUNDERSTORM WIND (M)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*(SLIDE|LANDSLUMP).*", "DEBRIS FLOW (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*RIP CURRENT.*", "RIP CURRENT (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*SEICHE.*", "SEICHE (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("MARINE (MISHAP|ACCIDENT)", "SEICHE (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*(ROUGH |HEAVY )?SEA.*", "SEICHE (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*SLEET.*", "SLEET (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*MIX.*", "SLEET (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*RAIN.*SNOW.*", "SLEET (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*SNOW.*RAIN.*", "SLEET (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*STORM SURGE.*", "STORM SURGE/TIDE (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*COASTAL ?(STORM|SURGE).*", "STORM SURGE/TIDE (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^(STRONG|GUSTY)? ?WINDS?.*", "STRONG WIND (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^WINDS?.*", "STRONG WIND (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^(SEVERE )?TH?UN?D?E.*", "THUNDERSTORM WIND (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*(TSTM|APACHE COUNTY).*", "THUNDERSTORM WIND (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*(BURST|WHIRLWIND|GUSTNADO).*", "THUNDERSTORM WIND (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^TORN.*", "TORNADO (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("LANDSPOUT.*", "TORNADO (C)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*(TROPICAL DEPRESSION|GRADIENT WIND).*", "TROPICAL DEPRESSION (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*(TROPICAL STORM|STORM FORCE WINDS).*", "TROPICAL STORM (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*TSUNAMI.*", "TSUNAMI (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*VOLCANIC.*", "VOLCANIC ASH (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^WATERSPOUT.*", "WATERSPOUT (M)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*FIRE.*", "WILDFIRE (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*WINTER WEATHER.*", "WINTER WEATHER (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^(LIGHT|BLOWING|RECORD)? ?SNOW(FALL)?$", "WINTER WEATHER (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("(LIGHT )?FREEZING (RAIN|DRIZZLE|SPRAY)$", "WINTER WEATHER (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub("^SNOW.*", "WINTER WEATHER (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*WINTER.*STORM.*", "WINTER STORM (Z)", stormdata_tidy$EVTYPE)
stormdata_tidy$EVTYPE <- gsub(".*WET.*", "WET", stormdata_tidy$EVTYPE)
length(unique(stormdata_tidy$EVTYPE))
## [1] 55

After cleaning, the EVTYPE column contains 55 unique values.
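
The long run of `gsub()` calls above can also be written as an ordered rule table applied in a loop, which keeps each pattern next to its replacement and makes the order dependence explicit. The following is a minimal sketch with only four of the rules and made-up input values; the names `evtype_rules` and `apply_evtype_rules` are assumptions for illustration, not part of the original analysis.

```r
# Ordered pattern -> replacement pairs (a small illustrative subset)
evtype_rules <- c(
  ".*BLIZZARD.*"     = "BLIZZARD (Z)",
  "^AVALANCH?E.*"    = "AVALANCHE (Z)",
  "^TORN.*"          = "TORNADO (C)",
  ".*(TSTM|BURST).*" = "THUNDERSTORM WIND (C)"
)

# Apply every rule in order; later rules see the output of earlier ones
apply_evtype_rules <- function(x, rules) {
  for (pattern in names(rules)) {
    x <- gsub(pattern, rules[[pattern]], x)
  }
  x
}

raw <- c("TORNADO F2", "AVALANCE", "TSTM WIND", "GROUND BLIZZARD", "HAIL")
cleaned <- apply_evtype_rules(raw, evtype_rules)
print(cleaned)
```

Values that match no rule, such as "HAIL" in this toy input, pass through unchanged, which is also how the original sequence of calls behaves.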

3. Data processing for health impact

3.1. Calculate the total number of fatalities and injuries for each event type and choose the top 10 event types with the highest total fatalities and injuries.

top10_health_data <- stormdata_tidy %>%
  group_by(EVTYPE) %>%
  summarise(Total_Fatalities = sum(FATALITIES, na.rm = TRUE),
            Total_Injuries = sum(INJURIES, na.rm = TRUE)) %>%
  arrange(desc(Total_Fatalities + Total_Injuries)) %>%
  slice(1:10)

3.2 Check the top 10 health data

top10_health_data
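
As a dependency-free cross-check of the dplyr pipeline, the same ranking can be sketched in base R with `aggregate()` and `order()`. The data frame below is made-up toy data, and the column `CASUALTIES` is a name introduced here for illustration.

```r
# Toy data (illustrative values, not taken from the storm data set)
toy <- data.frame(
  EVTYPE     = c("TORNADO (C)", "TORNADO (C)", "FLOOD (C)", "HEAT (Z)", "HAIL (C)"),
  FATALITIES = c(2, 1, 1, 4, 0),
  INJURIES   = c(10, 5, 3, 2, 1),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries per event type
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank by combined casualties and keep the top two
totals$CASUALTIES <- totals$FATALITIES + totals$INJURIES
ranked <- totals[order(-totals$CASUALTIES), ]
top2 <- head(ranked, 2)
print(top2)
```

Running the same steps on `stormdata_tidy` with `head(ranked, 10)` should reproduce the event types in `top10_health_data`, assuming no ties at the cut-off.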

3.3 Create a bar plot for the top 10 weather events by health impact, fatalities and injuries with different colors in the same plot

top10_health_data_long <- top10_health_data %>%
  pivot_longer(cols = c(Total_Fatalities, Total_Injuries), names_to = "Type", values_to = "Count")

# Plot fatalities and injuries as stacked bars, one bar per event type

ggplot(top10_health_data_long, aes(x = reorder(EVTYPE, Count), y = Count, fill = Type)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(title = "Top 10 Weather Events by Health Impact (Fatalities + Injuries)",
       x = "Weather Event Type",
       y = "Count") +
  scale_fill_manual(values = c("Total_Fatalities" = "red", "Total_Injuries" = "blue")) +
  theme_minimal()

Result:

The plot shows the top 10 weather events by health impact, with fatalities in red and injuries in blue. Because the plot is flipped to make the event types easier to read, the event types appear on the vertical axis and the counts of fatalities and injuries on the horizontal axis. The events are displayed in descending order of total fatalities and injuries.

* “TORNADO (C)” has the highest total fatalities and injuries, followed by “EXCESSIVE HEAT (Z)” and “FLOOD (C)”.
* “WINTER WEATHER (Z)” and “WINTER STORM (Z)” have a significant number of injuries, while “HAIL (C)” has a high number of fatalities.

4 Data analysis for economic impact

4.1. Convert the PROPDMGEXP and CROPDMGEXP columns to numeric values

# Check the structure of stormdata_tidy
str(stormdata_tidy)
## 'data.frame':    254632 obs. of  10 variables:
##  $ BGN_DATE  : Date, format: "1950-04-18" "1950-04-18" ...
##  $ END_DATE  : Date, format: NA NA ...
##  $ EVTYPE    : chr  "TORNADO (C)" "TORNADO (C)" "TORNADO (C)" "TORNADO (C)" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  NA NA NA NA ...

4.2 Fill the missing values in the PROPDMGEXP and CROPDMGEXP columns with “0”

stormdata_tidy$CROPDMGEXP[is.na(stormdata_tidy$CROPDMGEXP)] <- "0"
stormdata_tidy$PROPDMGEXP[is.na(stormdata_tidy$PROPDMGEXP)] <- "0"

4.3 Check the unique values in the CROPDMGEXP and PROPDMGEXP columns for the conversion

unique(stormdata_tidy$CROPDMGEXP)
## [1] "0" "M" "K" "m" "B" "k"
unique(stormdata_tidy$PROPDMGEXP)
##  [1] "K" "M" "0" "B" "m" "+" "5" "6" "4" "h" "2" "7" "3" "H" "-"

4.4 Create new columns for crop and property damage

# Create a new column for crop damage
stormdata_tidy$CROPDMGEXP <- toupper(stormdata_tidy$CROPDMGEXP)
unique(stormdata_tidy$CROPDMGEXP)
## [1] "0" "M" "K" "B"
stormdata_tidy$CROPDMGVAL <- stormdata_tidy$CROPDMG
stormdata_tidy$CROPDMGVAL[stormdata_tidy$CROPDMGEXP == "K"] <- stormdata_tidy$CROPDMG[stormdata_tidy$CROPDMGEXP == "K"] * 1000
stormdata_tidy$CROPDMGVAL[stormdata_tidy$CROPDMGEXP == "M"] <- stormdata_tidy$CROPDMG[stormdata_tidy$CROPDMGEXP == "M"] * 1000000
stormdata_tidy$CROPDMGVAL[stormdata_tidy$CROPDMGEXP == "B"] <- stormdata_tidy$CROPDMG[stormdata_tidy$CROPDMGEXP == "B"] * 1000000000
stormdata_tidy$CROPDMGVAL[stormdata_tidy$CROPDMGEXP == "H"] <- stormdata_tidy$CROPDMG[stormdata_tidy$CROPDMGEXP == "H"] * 100
stormdata_tidy$CROPDMGVAL[stormdata_tidy$CROPDMGEXP == "0"] <- stormdata_tidy$CROPDMG[stormdata_tidy$CROPDMGEXP == "0"] * 1

# Create a new column for property damage
stormdata_tidy$PROPDMGEXP <- toupper(stormdata_tidy$PROPDMGEXP)
unique(stormdata_tidy$PROPDMGEXP)
##  [1] "K" "M" "0" "B" "+" "5" "6" "4" "H" "2" "7" "3" "-"
stormdata_tidy$PROPDMGVAL <- stormdata_tidy$PROPDMG
stormdata_tidy$PROPDMGVAL[stormdata_tidy$PROPDMGEXP == "K"] <- stormdata_tidy$PROPDMG[stormdata_tidy$PROPDMGEXP == "K"] * 1000
stormdata_tidy$PROPDMGVAL[stormdata_tidy$PROPDMGEXP == "M"] <- stormdata_tidy$PROPDMG[stormdata_tidy$PROPDMGEXP == "M"] * 1000000
stormdata_tidy$PROPDMGVAL[stormdata_tidy$PROPDMGEXP == "B"] <- stormdata_tidy$PROPDMG[stormdata_tidy$PROPDMGEXP == "B"] * 1000000000
stormdata_tidy$PROPDMGVAL[stormdata_tidy$PROPDMGEXP == "H"] <- stormdata_tidy$PROPDMG[stormdata_tidy$PROPDMGEXP == "H"] * 100
stormdata_tidy$PROPDMGVAL[stormdata_tidy$PROPDMGEXP == "0"] <- stormdata_tidy$PROPDMG[stormdata_tidy$PROPDMGEXP == "0"] * 1
stormdata_tidy$PROPDMGVAL[stormdata_tidy$PROPDMGEXP == "2"] <- stormdata_tidy$PROPDMG[stormdata_tidy$PROPDMGEXP == "2"] * 100
stormdata_tidy$PROPDMGVAL[stormdata_tidy$PROPDMGEXP == "3"] <- stormdata_tidy$PROPDMG[stormdata_tidy$PROPDMGEXP == "3"] * 1000
stormdata_tidy$PROPDMGVAL[stormdata_tidy$PROPDMGEXP == "4"] <- stormdata_tidy$PROPDMG[stormdata_tidy$PROPDMGEXP == "4"] * 10000
stormdata_tidy$PROPDMGVAL[stormdata_tidy$PROPDMGEXP == "5"] <- stormdata_tidy$PROPDMG[stormdata_tidy$PROPDMGEXP == "5"] * 100000
stormdata_tidy$PROPDMGVAL[stormdata_tidy$PROPDMGEXP == "6"] <- stormdata_tidy$PROPDMG[stormdata_tidy$PROPDMGEXP == "6"] * 1000000
stormdata_tidy$PROPDMGVAL[stormdata_tidy$PROPDMGEXP == "7"] <- stormdata_tidy$PROPDMG[stormdata_tidy$PROPDMGEXP == "7"] * 10000000
stormdata_tidy$PROPDMGVAL[stormdata_tidy$PROPDMGEXP == "+"] <- stormdata_tidy$PROPDMG[stormdata_tidy$PROPDMGEXP == "+"] * 1
stormdata_tidy$PROPDMGVAL[stormdata_tidy$PROPDMGEXP == "-"] <- stormdata_tidy$PROPDMG[stormdata_tidy$PROPDMGEXP == "-"] * 1
stormdata_tidy$PROPDMGVAL[stormdata_tidy$PROPDMGEXP == "?"] <- stormdata_tidy$PROPDMG[stormdata_tidy$PROPDMGEXP == "?"] * 1
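
The per-code assignments above can be condensed into a single lookup. The sketch below uses a named vector with the same multipliers as the code above; the names `exp_multiplier` and `damage_value` are assumptions for illustration. Codes missing from the table leave the value unscaled, which mirrors the original code's starting point of copying PROPDMG or CROPDMG unchanged.

```r
# Multiplier for each exponent code, as used in the assignments above
exp_multiplier <- c(
  "0" = 1, "+" = 1, "-" = 1, "?" = 1,
  "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9,
  "2" = 1e2, "3" = 1e3, "4" = 1e4, "5" = 1e5, "6" = 1e6, "7" = 1e7
)

# Hypothetical helper: scale a damage figure by its exponent code
damage_value <- function(dmg, exp_code) {
  code <- toupper(exp_code)
  code[is.na(code)] <- "0"            # missing code: treat as "0"
  mult <- unname(exp_multiplier[code])
  mult[is.na(mult)] <- 1              # unknown code: leave the value unscaled
  dmg * mult
}

demo_damage <- damage_value(c(25, 2.5, 1, 3, 7), c("K", "m", "B", NA, "5"))
print(demo_damage)
```

With such a helper, both new columns could be filled with one call each, for example `damage_value(stormdata_tidy$PROPDMG, stormdata_tidy$PROPDMGEXP)`.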

4.5 Create a new column for total damage and check the summary of the new data frame

# Create a new column for total damage
stormdata_tidy$TOTALDMGVAL <- stormdata_tidy$PROPDMGVAL + stormdata_tidy$CROPDMGVAL
# Check the structure of the new data frame
str(stormdata_tidy)
## 'data.frame':    254632 obs. of  13 variables:
##  $ BGN_DATE   : Date, format: "1950-04-18" "1950-04-18" ...
##  $ END_DATE   : Date, format: NA NA ...
##  $ EVTYPE     : chr  "TORNADO (C)" "TORNADO (C)" "TORNADO (C)" "TORNADO (C)" ...
##  $ STATE      : chr  "AL" "AL" "AL" "AL" ...
##  $ FATALITIES : num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES   : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG    : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP : chr  "K" "K" "K" "K" ...
##  $ CROPDMG    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP : chr  "0" "0" "0" "0" ...
##  $ CROPDMGVAL : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ PROPDMGVAL : num  25000 2500 25000 2500 2500 2500 2500 2500 25000 25000 ...
##  $ TOTALDMGVAL: num  25000 2500 25000 2500 2500 2500 2500 2500 25000 25000 ...
# Check the summary of the new data frame
summary(stormdata_tidy)
##     BGN_DATE             END_DATE             EVTYPE         
##  Min.   :1950-01-03   Min.   :1993-01-01   Length:254632     
##  1st Qu.:1997-01-23   1st Qu.:2000-05-24   Class :character  
##  Median :2002-08-02   Median :2005-03-07   Mode  :character  
##  Mean   :2000-06-13   Mean   :2004-08-14                     
##  3rd Qu.:2008-05-07   3rd Qu.:2009-01-07                     
##  Max.   :2011-11-30   Max.   :2011-11-30                     
##                       NA's   :50928                          
##     STATE             FATALITIES           INJURIES            PROPDMG       
##  Length:254632      Min.   :  0.00000   Min.   :   0.0000   Min.   :   0.00  
##  Class :character   1st Qu.:  0.00000   1st Qu.:   0.0000   1st Qu.:   2.00  
##  Mode  :character   Median :  0.00000   Median :   0.0000   Median :   5.00  
##                     Mean   :  0.05948   Mean   :   0.5519   Mean   :  42.75  
##                     3rd Qu.:  0.00000   3rd Qu.:   0.0000   3rd Qu.:  25.00  
##                     Max.   :583.00000   Max.   :1700.0000   Max.   :5000.00  
##                                                                              
##   PROPDMGEXP           CROPDMG         CROPDMGEXP          CROPDMGVAL       
##  Length:254632      Min.   :  0.000   Length:254632      Min.   :0.000e+00  
##  Class :character   1st Qu.:  0.000   Class :character   1st Qu.:0.000e+00  
##  Mode  :character   Median :  0.000   Mode  :character   Median :0.000e+00  
##                     Mean   :  5.411                      Mean   :1.928e+05  
##                     3rd Qu.:  0.000                      3rd Qu.:0.000e+00  
##                     Max.   :990.000                      Max.   :5.000e+09  
##                                                                             
##    PROPDMGVAL         TOTALDMGVAL       
##  Min.   :0.000e+00   Min.   :0.000e+00  
##  1st Qu.:2.000e+03   1st Qu.:2.500e+03  
##  Median :1.000e+04   Median :1.000e+04  
##  Mean   :1.682e+06   Mean   :1.875e+06  
##  3rd Qu.:3.500e+04   3rd Qu.:5.000e+04  
##  Max.   :1.150e+11   Max.   :1.150e+11  
## 
# Check the first few rows of the new data frame
head(stormdata_tidy)

Calculate the total property, crop, and combined damage for each event type and keep the 10 event types with the highest total damage

top10_economic_data <- stormdata_tidy %>%
  group_by(EVTYPE) %>%
  summarise(Total_Property_Damage = sum(PROPDMGVAL, na.rm = TRUE),
            Total_Crop_Damage = sum(CROPDMGVAL, na.rm = TRUE),
            Total_Damage = sum(TOTALDMGVAL, na.rm = TRUE)) %>%
  arrange(desc(Total_Damage)) %>%
  slice(1:10)

4.6 Check the top 10 economic data

top10_economic_data

4.7 Create a bar plot for the top 10 weather events by economic impact, property damage and crop damage with different colors in the same plot

top10_economic_data_long <- top10_economic_data %>%
  pivot_longer(cols = c(Total_Property_Damage, Total_Crop_Damage), names_to = "Type", values_to = "Count")
# Plot property and crop damage as stacked bars, one bar per event type
ggplot(top10_economic_data_long, aes(x = reorder(EVTYPE, Count), y = Count, fill = Type)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(title = "Top 10 Weather Events by Economic Impact (Property Damage + Crop Damage)",
       x = "Weather Event Type",
       y = "Damage (USD)") +
  scale_fill_manual(values = c("Total_Property_Damage" = "red", "Total_Crop_Damage" = "blue")) +
  theme_minimal()
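
Damage totals of this size are hard to read as raw numbers on an axis. As a minimal sketch, the hypothetical formatter `billions_label` below turns dollar amounts into labels in billions. A function like this can be passed to the `labels` argument of ggplot2's `scale_y_continuous()`, although the plot above does not do so.

```r
# Hypothetical label formatter: dollars -> "$x.xB"
billions_label <- function(x) sprintf("$%.1fB", x / 1e9)

example_labels <- billions_label(c(0, 2.5e9, 1.15e11))
print(example_labels)
```

The same formatter can be reused when printing `top10_economic_data`, so the table and the plot show damage on the same scale.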

Result:

The plot shows the top 10 weather events by economic impact, with property damage in red and crop damage in blue. Because the plot is flipped to make the event types easier to read, the event types appear on the vertical axis and the damage amounts on the horizontal axis.

* The top 10 weather events are displayed in descending order of total property and crop damage.
* “FLOOD (C)” has the highest total property and crop damage, followed by “HURRICANE/TYPHOON (Z)” and “TORNADO (C)”.
* “HAIL (C)” has a significant amount of property damage, while “DROUGHT (Z)” has a high amount of crop damage.

Conclusion:

The analysis shows that tornadoes have the highest impact on health, with the highest number of fatalities and injuries. Floods have the highest economic impact, with the highest property and crop damage. Hurricanes/typhoons and excessive heat have a significant impact on both health and the economy.

* Floods have a significant impact on both health and the economy, with a high number of injuries and high property damage.
* Winter weather events have a significant impact on health, with a high number of injuries, while hail has a high number of fatalities.
* Drought has a significant economic impact through a high amount of crop damage.