Prepared by: Sanny Garin Jr

Data Processing

Reading data

data <- read.csv(bzfile("data/repdata_data_StormData.csv.bz2"))

head(data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
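
Before aggregating, it can help to confirm that the columns used later contain no missing values, since sum() returns NA when any input is NA. The snippet below is a minimal sketch on a made-up toy table; count_missing is a hypothetical helper name, not part of the original analysis.

```r
# Toy stand-in for the storm data; the real `data` object has many more columns.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(0, NA, 2),
  CROPDMG    = c(0, 5, 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count NA values in the columns the analysis relies on.
count_missing <- function(df, cols) {
  colSums(is.na(df[, cols, drop = FALSE]))
}

missing_counts <- count_missing(toy, c("EVTYPE", "FATALITIES", "CROPDMG"))
print(missing_counts)
```

On the real data, the same call would be count_missing(data, c("EVTYPE", "FATALITIES", "CROPDMG")).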

Question 1

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Importing/installing libraries

if (!requireNamespace("dplyr", quietly = TRUE)) {
    install.packages("dplyr")
}

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
if (!requireNamespace("ggplot2", quietly = TRUE)) {
    install.packages("ggplot2")
}

library(ggplot2)

To determine which types of events are most harmful to population health, we need to subset the data.

We selected the columns “EVTYPE” and “FATALITIES”.

We then compute the total number of fatalities for each event type.

Data Transformation

# Group the data by event type
grouped_data <- group_by(data, EVTYPE)
# Sum the fatalities within each event type
summarized_data <- summarize(grouped_data, Total = sum(FATALITIES))
# Sort in descending order of total fatalities
sorted_data <- arrange(summarized_data, desc(Total))
# Keep the top 10 event types by fatalities
Top_10_by_fatalities <- head(sorted_data, 10)
# Print the result
print(Top_10_by_fatalities)
## # A tibble: 10 × 2
##    EVTYPE         Total
##    <chr>          <dbl>
##  1 TORNADO         5633
##  2 EXCESSIVE HEAT  1903
##  3 FLASH FLOOD      978
##  4 HEAT             937
##  5 LIGHTNING        816
##  6 TSTM WIND        504
##  7 FLOOD            470
##  8 RIP CURRENT      368
##  9 HIGH WIND        248
## 10 AVALANCHE        224
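
The group, sum, and sort steps above can also be written as a single pipeline. The following is a minimal sketch on a made-up toy table, not the author's code; na.rm = TRUE is an added assumption so that a missing value would not turn a whole group's total into NA.

```r
library(dplyr)

# Toy rows; column names match the storm data, values are made up.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(3, 1, 2, 0, 3),
  stringsAsFactors = FALSE
)

# Group by event type, total the fatalities, and sort from highest to lowest.
fatalities_by_type <- toy %>%
  group_by(EVTYPE) %>%
  summarize(Total = sum(FATALITIES, na.rm = TRUE)) %>%
  arrange(desc(Total))

print(fatalities_by_type)
```

Replacing toy with data and appending head(10) would give the same kind of table as Top_10_by_fatalities.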

Results

Plotting the data

This bar graph shows that, measured by total fatalities, the most harmful event type in the United States with respect to population health is TORNADO, followed by EXCESSIVE HEAT, FLASH FLOOD, and HEAT.

# Create the bar plot
ggplot(Top_10_by_fatalities, aes(x = reorder(EVTYPE, Total), y = Total)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = Total), vjust = -0.1, hjust = -0.1) + 
  theme_minimal() +
  labs(title = "Top 10 Events by Fatalities", x = "EVTYPE", y = "FATALITIES") +
  coord_flip() +  # Optional: Flip coordinates for better readability if there are many categories
  ylim(0, 5900)

Question 2

Across the United States, which types of events have the greatest economic consequences?

To answer this question, we need to choose a column in the data that reflects economic impact, and then subset the data.

We selected the columns “EVTYPE” and “CROPDMG”.

We then compute the total crop damage for each event type.
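
The data also has a CROPDMGEXP column next to CROPDMG. One possible refinement, not used in this report, is to scale each CROPDMG value by its exponent code before summing. The sketch below assumes the common reading of the codes (K = thousand, M = million, B = billion) and treats any other code as a multiplier of 1, which is a simplification. The helper name crop_damage_usd and the toy values are hypothetical.

```r
# Toy rows; column names match the storm data, values are made up.
toy <- data.frame(
  EVTYPE     = c("HAIL", "DROUGHT", "HAIL", "FLOOD"),
  CROPDMG    = c(50, 2, 10, 5),
  CROPDMGEXP = c("K", "M", "K", ""),
  stringsAsFactors = FALSE
)

# Assumed multipliers: K = thousand, M = million, B = billion.
exp_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert CROPDMG to dollars using the exponent code.
# Codes outside K/M/B (including blank) fall back to a multiplier of 1.
crop_damage_usd <- function(dmg, exp_code) {
  mult <- exp_multiplier[toupper(exp_code)]
  mult[is.na(mult)] <- 1
  dmg * unname(mult)
}

toy$CROPDMG_USD <- crop_damage_usd(toy$CROPDMG, toy$CROPDMGEXP)

# Total scaled damage per event type, largest first.
totals <- sort(tapply(toy$CROPDMG_USD, toy$EVTYPE, sum), decreasing = TRUE)
print(totals)
```

In this toy table a single row coded M outweighs several rows coded K, so a ranking based on scaled values can differ from one based on the raw CROPDMG numbers.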

Data Transformation

# Group the data by event type
grouped_data <- group_by(data, EVTYPE)
# Sum the crop damage within each event type
summarized_data <- summarize(grouped_data, Total = sum(CROPDMG))
# Sort in descending order of total crop damage
sorted_data <- arrange(summarized_data, desc(Total))
# Keep the top 10 event types by crop damage
Top_10_by_cropdmg <- head(sorted_data, 10)
# Print the result
print(Top_10_by_cropdmg)
## # A tibble: 10 × 2
##    EVTYPE               Total
##    <chr>                <dbl>
##  1 HAIL               579596.
##  2 FLASH FLOOD        179200.
##  3 FLOOD              168038.
##  4 TSTM WIND          109203.
##  5 TORNADO            100019.
##  6 THUNDERSTORM WIND   66791.
##  7 DROUGHT             33899.
##  8 THUNDERSTORM WINDS  18685.
##  9 HIGH WIND           17283.
## 10 HEAVY RAIN          11123.
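
The table above lists TSTM WIND, THUNDERSTORM WIND, and THUNDERSTORM WINDS as separate event types. If these are taken to be the same kind of event (an assumption, not something this report establishes), the labels could be merged before grouping. The following is a minimal sketch; normalize_evtype is a hypothetical helper that only handles these three spellings.

```r
# Hypothetical helper: collapse the thunderstorm-wind spellings into one label.
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))               # uniform case, no stray whitespace
  x <- sub("^TSTM", "THUNDERSTORM", x)  # expand the abbreviation
  x <- sub("WINDS$", "WIND", x)         # drop the plural
  x
}

labels <- c("TSTM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS", "HAIL")
print(normalize_evtype(labels))
```

Adding the three rows in the table above gives about 194,700 in CROPDMG units, which would place a merged thunderstorm wind category second, behind HAIL.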

Results

Plotting the data

This bar graph shows which types of events have the greatest economic consequences in the United States, as measured by the CROPDMG totals. The most impactful is HAIL, followed by FLASH FLOOD, FLOOD, and TSTM WIND.

ggplot(Top_10_by_cropdmg, aes(x = reorder(EVTYPE, Total), y = Total)) +
    geom_bar(stat = "identity") +
    geom_text(aes(label = Total), vjust = -0.1, hjust = -0.1) + 
    theme_minimal() +
    labs(title = "Top 10 Events by impact on Crops", x = "EVTYPE", y = "CROPDMG") +
    coord_flip()  + # Optional: Flip coordinates for better readability if there are many categories
    ylim(0, 700000)

Thank you!
