Prepared by: Sanny Garin Jr
Data Processing
Reading data
data <- read.csv(bzfile("data/repdata_data_StormData.csv.bz2"))
head(data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
Question 1
Across the United States, which types of events (as indicated in the
EVTYPE variable) are most harmful with respect to population health?
Importing/installing libraries
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
library(ggplot2)
To determine which types of events are most harmful to population
health, we need to subset the data.
We selected the columns “EVTYPE” and “FATALITIES”.
We then computed the total number of fatalities for each event
type.
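The report does not show the code for this step. A minimal sketch of how the subsetting and totalling described above could be done with dplyr follows. The helper name `total_fatalities_by_event` is hypothetical and not part of the original analysis; the output column is named `Total` only because the plotting code later in this report expects that name.

```r
library(dplyr)

# Hypothetical helper (not from the original report): sum FATALITIES per
# event type and keep the n event types with the largest totals.
total_fatalities_by_event <- function(df, n = 10) {
  df %>%
    select(EVTYPE, FATALITIES) %>%                      # keep only the two columns we need
    group_by(EVTYPE) %>%                                # one group per event type
    summarise(Total = sum(FATALITIES, na.rm = TRUE)) %>% # total fatalities per group
    arrange(desc(Total)) %>%                            # largest totals first
    head(n)                                             # top n rows
}
```

Assuming `data` is the data frame read in above, `Top_10_by_fatalities <- total_fatalities_by_event(data)` would produce an object with the `EVTYPE` and `Total` columns used in the plot.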
Results
Plotting the data
This bar graph shows that, measured by total fatalities, tornadoes
are the most harmful events in the United States with respect to
population health, followed by excessive heat.
# Create the bar plot
ggplot(Top_10_by_fatalities, aes(x = reorder(EVTYPE, Total), y = Total)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = Total), vjust = -0.1, hjust = -0.1) +
  theme_minimal() +
  labs(title = "Top 10 Events by Fatalities", x = "EVTYPE", y = "FATALITIES") +
  coord_flip() +  # Flip coordinates so the event names are easier to read
  ylim(0, 5900)   # Leave room for the value labels

Question 2
Across the United States, which types of events have the greatest
economic consequences?
To answer this question, we need to choose a column in the data that
reflects economic impact, and then subset the data.
We selected the columns “EVTYPE” and “CROPDMG”.
We then computed the total crop damage for each event type.
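The code for this step is also not shown in the report. The following is a sketch under the same assumptions as before: dplyr is used, the helper name `total_cropdmg_by_event` is hypothetical, and the raw `CROPDMG` values are summed as they appear in the data, since that is the only column the text says was selected.

```r
library(dplyr)

# Hypothetical helper (not from the original report): sum the raw CROPDMG
# values per event type and keep the n event types with the largest totals.
total_cropdmg_by_event <- function(df, n = 10) {
  df %>%
    select(EVTYPE, CROPDMG) %>%                       # keep only the two columns we need
    group_by(EVTYPE) %>%                              # one group per event type
    summarise(Total = sum(CROPDMG, na.rm = TRUE)) %>% # total crop damage per group
    arrange(desc(Total)) %>%                          # largest totals first
    head(n)                                           # top n rows
}
```

Assuming `data` is the data frame read in above, `Top_10_by_cropdmg <- total_cropdmg_by_event(data)` would give the object that the plot code refers to.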
Results
Plotting the data
This bar graph shows which types of events have the greatest
economic consequences in the United States, measured by total crop
damage (CROPDMG). The most impactful is hail, followed by flash flood.
# Create the bar plot
ggplot(Top_10_by_cropdmg, aes(x = reorder(EVTYPE, Total), y = Total)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = Total), vjust = -0.1, hjust = -0.1) +
  theme_minimal() +
  labs(title = "Top 10 Events by Impact on Crops", x = "EVTYPE", y = "CROPDMG") +
  coord_flip() +   # Flip coordinates so the event names are easier to read
  ylim(0, 700000)  # Leave room for the value labels

Thank you!