Synopsis
This R Markdown document presents an analysis of the damage caused by
natural calamities in the US. The damage is analysed under two themes:
impact on population health and impact on the economy. The impact on
population health is assessed using two variables, Fatalities and
Injuries. The impact on the economy is assessed using another two
variables, Crop Damage and Property Damage. The results are shown in a
single figure of four plots, each showing the top ten types of natural
calamity for one variable. Overall, tornadoes seem to cause the most
damage across all variables. Hence, proactive steps are required to
minimise the damage from tornadoes.
1. Data Pre-Processing
1.1 Loading all required libraries
The following command loads all required libraries in a single call.
Warnings and package loading messages are hidden.
lib_names <- c("readr","dplyr","tidyr","ggplot2","ggpubr")
lapply(lib_names, require, character.only = TRUE)
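Because `require` returns `FALSE` rather than raising an error when a package is missing, a failed load can go unnoticed in the call above. The following is a minimal sketch of one way to check the result. The helper name `load_packages` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper (not in the original analysis): load each package
# quietly and stop with a clear message if any of them is unavailable.
load_packages <- function(pkgs) {
  loaded <- vapply(
    pkgs,
    function(p) {
      suppressWarnings(
        suppressPackageStartupMessages(require(p, character.only = TRUE))
      )
    },
    logical(1)
  )
  if (!all(loaded)) {
    stop("Could not load: ", paste(pkgs[!loaded], collapse = ", "))
  }
  invisible(loaded)
}

# Example with packages that ship with R
status <- load_packages(c("stats", "utils"))
```

In this report the call would be `load_packages(lib_names)`.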
1.2 Download Required Data to Working Directory
First, we create destfile, the location where the downloaded data will
be stored. Second, we create url, the web link from which the data is
downloaded. Finally, the download.file function downloads the data into
the current working directory as raw_data_P2.zip. The code output is
hidden, and the download can take a few minutes because the data file
is fairly large. We also set cache = TRUE so that rerunning the same
code does not take as long.
destfile <- file.path(getwd(), "raw_data_P2.zip")
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url = url, destfile = destfile)
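Besides chunk caching, the download itself can be skipped when the file is already on disk. The sketch below shows one way to do that. The helper name `download_if_missing` is hypothetical, and `mode = "wb"` is added here as an assumption so that the compressed file is written as binary on every platform.

```r
# Hypothetical helper (not in the original analysis): download only when
# the target file is not already present.
download_if_missing <- function(url, destfile) {
  if (file.exists(destfile)) {
    return(invisible(FALSE))  # file already there, nothing downloaded
  }
  # mode = "wb" writes the compressed file as binary
  download.file(url = url, destfile = destfile, mode = "wb")
  invisible(TRUE)
}

# Usage with the objects defined above:
# download_if_missing(url, destfile)
```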
1.3 Read Data
Generally, a compressed download must be decompressed before it can be
read. However, the read_csv function from the tidyverse can read a
compressed CSV file directly. Note that the source file is
bzip2-compressed (.csv.bz2), even though it was saved above under a
.zip name. We read it here and name the result data1 for further
analysis.
data1 <- read_csv("raw_data_P2.zip")
## Rows: 902297 Columns: 37
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (18): BGN_DATE, BGN_TIME, TIME_ZONE, COUNTYNAME, STATE, EVTYPE, BGN_AZI,...
## dbl (18): STATE__, COUNTY, BGN_RANGE, COUNTY_END, END_RANGE, LENGTH, WIDTH, ...
## lgl (1): COUNTYENDN
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
1.4 Data Cleaning
Tidy format is usually considered clean data. Using the as_tibble
function from the tidyverse, we convert data1 into a tibble and name
the result data2. Since read_csv already returns a tibble, this step
mainly gives the working copy its own name.
data2 <- as_tibble(data1)
data2
## # A tibble: 902,297 x 37
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE BGN_RANGE
## <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl>
## 1 1 4/18/195~ 0130 CST 97 MOBILE AL TORNA~ 0
## 2 1 4/18/195~ 0145 CST 3 BALDWIN AL TORNA~ 0
## 3 1 2/20/195~ 1600 CST 57 FAYETTE AL TORNA~ 0
## 4 1 6/8/1951~ 0900 CST 89 MADISON AL TORNA~ 0
## 5 1 11/15/19~ 1500 CST 43 CULLMAN AL TORNA~ 0
## 6 1 11/15/19~ 2000 CST 77 LAUDERDALE AL TORNA~ 0
## 7 1 11/16/19~ 0100 CST 9 BLOUNT AL TORNA~ 0
## 8 1 1/22/195~ 0900 CST 123 TALLAPOOSA AL TORNA~ 0
## 9 1 2/13/195~ 2000 CST 125 TUSCALOOSA AL TORNA~ 0
## 10 1 2/13/195~ 2000 CST 57 FAYETTE AL TORNA~ 0
## # ... with 902,287 more rows, and 28 more variables: BGN_AZI <chr>,
## # BGN_LOCATI <chr>, END_DATE <chr>, END_TIME <chr>, COUNTY_END <dbl>,
## # COUNTYENDN <lgl>, END_RANGE <dbl>, END_AZI <chr>, END_LOCATI <chr>,
## # LENGTH <dbl>, WIDTH <dbl>, F <dbl>, MAG <dbl>, FATALITIES <dbl>,
## # INJURIES <dbl>, PROPDMG <dbl>, PROPDMGEXP <chr>, CROPDMG <dbl>,
## # CROPDMGEXP <chr>, WFO <chr>, STATEOFFIC <chr>, ZONENAMES <chr>,
## # LATITUDE <dbl>, LONGITUDE <dbl>, LATITUDE_E <dbl>, LONGITUDE_ <dbl>, ...
2. Main Data Processing or Data Analysis
This part contains our main analysis, that is, the experiment with the
variables in the dataset.
2.1 Most Harmful Events to Public Health
The following code extracts the top 10 most harmful event types by
Fatalities and Injuries.
Fatalities <- data2 %>% group_by(EVTYPE) %>% summarise(FATALITIES=sum(FATALITIES)) %>% arrange(desc(FATALITIES)) %>%
head(10) %>% ggplot(aes(x=FATALITIES,y=EVTYPE)) + geom_col() +
ggtitle("FATALITIES")
Injuries <- data2 %>% group_by(EVTYPE) %>% summarise(INJURIES=sum(INJURIES)) %>% arrange(desc(INJURIES)) %>%
head(10) %>% ggplot(aes(x=INJURIES,y=EVTYPE)) + geom_col() +
ggtitle("INJURIES")
pop_h <- ggarrange(Fatalities, Injuries,
labels = c("A","B"),ncol=2,nrow=1)
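To show what the group, sum, sort and top-N steps compute, here is a base R sketch on made-up numbers (the toy data frame and its values are illustrative only, not taken from the storm dataset). It also illustrates a plotting detail: ggplot2 orders a character axis alphabetically, so the sorted row order does not carry over to the bars unless the factor levels are ordered by value, for example with `reorder`.

```r
# Made-up toy data; event names and counts are illustrative only.
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(5, 2, 7, 4, 1, 0),
  stringsAsFactors = FALSE
)

# Sum fatalities per event type, sort descending, keep the top 3
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
top3 <- head(totals[order(-totals$FATALITIES), ], 3)

# Order the factor levels by total so a plot would follow the ranking
top3$EVTYPE <- reorder(top3$EVTYPE, top3$FATALITIES)
levels(top3$EVTYPE)
# [1] "FLOOD"   "HEAT"    "TORNADO"
```

In the ggplot2 calls above, the same effect could be obtained with `aes(x = FATALITIES, y = reorder(EVTYPE, FATALITIES))`, which places the largest bar at the top.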
2.2 Most Costly Events for the Economy
The following code extracts the top 10 event types that cause the
greatest losses to the economy. The variables assessed by event type
are crop damage and property damage.
Property_dmg <- data2 %>% group_by(EVTYPE) %>% summarise(PROPDMG=sum(PROPDMG)) %>% arrange(desc(PROPDMG)) %>%
head(10) %>% ggplot(aes(x=PROPDMG,y=EVTYPE)) + geom_col() +
ggtitle("PROPERTY DAMAGE")
Crop_dmg <- data2 %>% group_by(EVTYPE) %>% summarise(CROPDMG=sum(CROPDMG)) %>% arrange(desc(CROPDMG)) %>%
head(10) %>% ggplot(aes(x=CROPDMG,y=EVTYPE)) + geom_col() +
ggtitle("CROP DAMAGE")
eco_dmg <- ggarrange(Crop_dmg,Property_dmg, labels = c("C","D"),ncol = 2,nrow = 1)
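The dataset pairs PROPDMG and CROPDMG with the columns PROPDMGEXP and CROPDMGEXP (both appear in the column listing in section 1.4). Summing the raw numbers treats every entry as if it were recorded in the same unit. The sketch below shows how the exponent codes might be applied before summing. It assumes that K, M and B stand for thousands, millions and billions and treats any other code as a multiplier of 1; both assumptions should be checked against the dataset documentation. The helper name `dmg_multiplier`, the column `PROPDMG_SCALED` and the toy values are hypothetical.

```r
# Hypothetical helper: map an exponent code to a numeric multiplier.
# Assumption: K = 1e3, M = 1e6, B = 1e9; anything else (including NA) = 1.
dmg_multiplier <- function(exp_code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(exp_code)])
  mult[is.na(mult)] <- 1
  mult
}

# Made-up toy data; values are illustrative only.
toy <- data.frame(
  EVTYPE = c("FLOOD", "TORNADO", "TORNADO", "HAIL"),
  PROPDMG = c(2, 500, 1.5, 10),
  PROPDMGEXP = c("B", "K", "M", NA),
  stringsAsFactors = FALSE
)

# Scale each damage figure by its own multiplier before summing
toy$PROPDMG_SCALED <- toy$PROPDMG * dmg_multiplier(toy$PROPDMGEXP)
scaled_totals <- aggregate(PROPDMG_SCALED ~ EVTYPE, data = toy, FUN = sum)
```

On these made-up numbers the raw PROPDMG total is largest for TORNADO (501.5 versus 2 for FLOOD), while the scaled total is largest for FLOOD, so applying the multipliers can change the ranking.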
3. Results
Using the ggarrange function, we plot all the results in a single
figure.
pop_h1 <- annotate_figure(pop_h, top = text_grob("Population Health", color = "red", size = 20, face = "bold"))
eco_dmg1 <- annotate_figure(eco_dmg, top = text_grob("Economic Damage", color = "red", size = 20, face = "bold"))
ggarrange(pop_h1, eco_dmg1, ncol = 1, nrow = 2)
