Synopsis

This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The analysis of fatalities and injuries identifies tornadoes as the most harmful weather events for the population. At the same time, the greatest property and crop damage was caused by thunderstorm winds.

Data processing

Preparing the database

  1. Setting working directory
options(tinytex.verbose = TRUE)
setwd("C:/Users/rikig/OneDrive/Рабочий стол/project R 5.2")
getwd()

[1] "C:/Users/rikig/OneDrive/Рабочий стол/project R 5.2"

  1. Downloading the data from the link and saving it as a bz2-compressed CSV file
options(tinytex.verbose = TRUE)

StormData <- "repdata%2Fdata%2FStormData.csv.bz2"
# Download only if the file is not already in the working directory
if (!file.exists(StormData)) {
  fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  # Save under the same name that file.exists() checks above
  download.file(fileUrl, destfile = StormData, method = "curl")
}
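As a side note on this step, `read.csv()` can read a bz2-compressed file directly, so the downloaded archive does not need to be unpacked by hand. The sketch below illustrates this with a small made-up data frame and a temporary file; both are assumptions for illustration and are not part of the storm data.

```r
# Toy data standing in for the storm table (values are made up)
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"),
                  FATALITIES = c(2, 0))

# Write the toy data as a bz2-compressed CSV in a temporary location
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() detects the compression and reads the file as plain CSV
toy_back <- read.csv(tmp)
print(toy_back)

# Remove the temporary file
unlink(tmp)
```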

Analysis of population harm

  1. Reading the data and saving it as a table in the environment
options(tinytex.verbose = TRUE)
StormDataTable <- read.csv(StormData)
  1. Selecting the needed columns: type of event, number of fatalities and number of injuries
options(tinytex.verbose = TRUE)
library('dplyr')
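# Preview the table (not the file name) to locate the needed columns
head(StormDataTable)

# Columns 8, 23 and 24 of the table are EVTYPE, FATALITIES and INJURIES
PopulationDamage <- StormDataTable %>% select(EVTYPE, FATALITIES, INJURIES)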
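Selecting columns by name rather than by position keeps the code readable and protects it against changes in column order. A minimal sketch on a made-up four-column data frame (the `toy` table and its values are assumptions for illustration) shows how positions can be looked up from names and that both styles of `select()` pick the same columns:

```r
library(dplyr)

# Toy table with made-up values; only the column names mirror the storm data
toy <- data.frame(STATE = c("AL", "TX"),
                  EVTYPE = c("TORNADO", "HAIL"),
                  FATALITIES = c(0, 1),
                  INJURIES = c(15, 0))

# match() returns the position of each requested column name
positions <- match(c("EVTYPE", "FATALITIES", "INJURIES"), names(toy))
print(positions)

# Selecting by name and by position gives the same columns
by_name <- toy %>% select(EVTYPE, FATALITIES, INJURIES)
by_position <- toy %>% select(all_of(positions))
print(all.equal(by_name, by_position))
```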
  1. Grouping by event name and summing the harm caused by each event type (the sum of fatalities and injuries is saved in one column called "Impact_on_Population"). Then arranging in descending order by impact on population and keeping only the top 5 for analysis
options(tinytex.verbose = TRUE)
PopulationDamage <- PopulationDamage %>%
  group_by(EVTYPE) %>%
  # Total harm per event type: all fatalities plus all injuries
  summarise(Impact_on_Population = sum(FATALITIES, INJURIES)) %>%
  # Most harmful events first, keep only the top 5
  arrange(desc(Impact_on_Population)) %>%
  head(5)
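To make the grouping step concrete, here is a minimal sketch of the same group, sum and top-n pattern on a toy data frame. The records and numbers are made up for illustration and are not taken from the NOAA data.

```r
library(dplyr)

# Toy records (made-up numbers)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(1, 2, 0, 3),
  INJURIES   = c(10, 5, 4, 0)
)

toy_top <- toy %>%
  group_by(EVTYPE) %>%
  # sum() over both columns adds all fatalities and injuries in the group
  summarise(Impact_on_Population = sum(FATALITIES, INJURIES)) %>%
  arrange(desc(Impact_on_Population)) %>%
  head(2)

print(toy_top)
```

In this toy table the two TORNADO rows are combined into a single total of 18 cases, which puts that event type first, ahead of FLOOD with 4.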
  1. Plotting the population harm, with the most harmful events on the left side
options(tinytex.verbose = TRUE)
library(ggplot2)

ggplot(PopulationDamage, aes(x=reorder(EVTYPE, -Impact_on_Population), y=Impact_on_Population)) + 
  geom_bar(stat = "identity", color="darkolivegreen3", fill= "darkolivegreen3") +
  xlab("Weather event") + 
  ylab("Population harm, num of cases")

Analysis of property damage

  1. Reading the data and saving it as a table in the environment
options(tinytex.verbose = TRUE)
StormDataTable <- read.csv(StormData)
  1. Selecting the needed columns: type of event, property damage and crop damage. Note that these amounts are recorded in thousands, millions or billions of dollars, so the columns with the magnitude indicators are selected as well
library('dplyr')

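# Columns 8 and 25 to 28 of the table: event type, damage amounts and their magnitude codes
PropertyDamage <- StormDataTable %>% select(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)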
  1. Converting all damage amounts to thousands of dollars
options(tinytex.verbose = TRUE)
options(scipen=999)

# Damage is converted to thousands of dollars: blank -> 0, 'K' -> as is, 'M' -> x1000.
# Any other code (not only 'B') falls through to the last branch and is multiplied by 1000000.
PropertyDamage$PROPTHOUSANDS <- ifelse(PropertyDamage$PROPDMGEXP == '', 0,
                                ifelse(PropertyDamage$PROPDMGEXP == 'K', PropertyDamage$PROPDMG,
                                ifelse(PropertyDamage$PROPDMGEXP == 'M', PropertyDamage$PROPDMG * 1000,
                                       PropertyDamage$PROPDMG * 1000000)))

PropertyDamage$CROPTHOUSANDS <- ifelse(PropertyDamage$CROPDMGEXP == '', 0,
                                ifelse(PropertyDamage$CROPDMGEXP == 'K', PropertyDamage$CROPDMG,
                                ifelse(PropertyDamage$CROPDMGEXP == 'M', PropertyDamage$CROPDMG * 1000,
                                       PropertyDamage$CROPDMG * 1000000)))
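The nested `ifelse()` calls above send every magnitude code other than blank, 'K' and 'M' to the billions branch. One possible alternative is an explicit lookup table, sketched below on made-up values. The helper `to_thousands()`, the name `exp_to_thousands`, the case-insensitive H/K/M/B mapping and the choice to count unrecognised codes as zero are all assumptions for illustration. This is not the method used in this analysis, and applying it to the real data could change the totals reported here.

```r
# Multipliers expressed in thousands of dollars (assumed mapping)
exp_to_thousands <- c(H = 0.1, K = 1, M = 1000, B = 1000000)

# Hypothetical helper: convert an amount and its magnitude code to thousands
to_thousands <- function(amount, exp_code) {
  # Look up the multiplier by upper-cased code; unknown or blank codes give NA
  mult <- unname(exp_to_thousands[toupper(as.character(exp_code))])
  # Treat blank or unrecognised codes as zero damage (an assumption)
  mult[is.na(mult)] <- 0
  amount * mult
}

# Toy rows with made-up amounts and codes
toy <- data.frame(PROPDMG = c(25, 2.5, 1, 10, 3),
                  PROPDMGEXP = c("K", "M", "B", "", "m"))
toy$PROPTHOUSANDS <- to_thousands(toy$PROPDMG, toy$PROPDMGEXP)
print(toy)
```

With this mapping the lowercase "m" row is treated as millions, whereas the nested `ifelse()` version would multiply it by 1000000.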
  1. Choosing the relevant columns in the database - events, property and crop damage in thousands of dollars
options(tinytex.verbose = TRUE)
PropertyDamage <- PropertyDamage %>% select(EVTYPE, PROPTHOUSANDS, CROPTHOUSANDS)
  1. Grouping by event name and summing the damage caused by each event type. Property and crop damage are added together
options(tinytex.verbose = TRUE)
PropertyDamage <- PropertyDamage %>%
  group_by(EVTYPE) %>%
  # Total damage per event type: property plus crop damage, in thousands of dollars
  summarise(Impact_on_Property = sum(PROPTHOUSANDS, CROPTHOUSANDS)) %>%
  # Most damaging events first, keep only the top 5
  arrange(desc(Impact_on_Property)) %>%
  head(5)
  1. Plotting the property damage, with the most damaging events on the left side
options(tinytex.verbose = TRUE)
library(ggplot2)

ggplot(PropertyDamage, aes(x=reorder(EVTYPE, -Impact_on_Property), y=Impact_on_Property)) + 
  geom_bar(stat = "identity", color="salmon", fill= "salmon") +
  xlab("Weather event") + 
  ylab("Property damage, $K")

Results

The analysis of fatalities and injuries shows that the five most harmful event types for the population are tornado, excessive heat, thunderstorm wind, flood and lightning, with tornado having the largest impact among them. As for property, the greatest damage was caused by thunderstorm winds, hail, tornado, flash flood and lightning, with thunderstorm winds causing the most. Thus, we should pay attention to all of these types of weather events, taking into account that some of them are harmful to either the population or property, and of course the most harmful events (tornado and thunderstorm winds) deserve special attention.