Synopsis

Every year, countries are hit by weather events that cause great human and economic losses. Being prepared, and designing public policies to face these events, is a priority if such losses are to be reduced. This report therefore analyzes weather events in the United States, using the storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA), to estimate which types of events cause the greatest losses in the country.

Data Processing

The following libraries are used:

library(tidyr)
library(dplyr)
library(ggplot2)
library(gridExtra)

First we download the compressed data file and save it in the data folder. We then load it with read.table(), using sep = "," and header = TRUE; read.table() can read the .bz2 file directly, without decompressing it first.

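#Get the data
URL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
filename <- file.path("data", "data.bz2")
# Create the folder and download the file only if they do not exist yet;
# mode = "wb" keeps the compressed file intact on Windows
if (!dir.exists("data")) dir.create("data")
if (!file.exists(filename)) download.file(url = URL, destfile = filename, mode = "wb")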


#Read the data
data <- read.table("data/data.bz2", sep = ",", header =  TRUE)
colnames(data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Inspecting the variables with colnames(data) shows that many of them are not needed for this analysis, so we keep only the event type, the health variables, and the damage variables with their exponents:

# Keep the event type, the health variables and the damage variables
data2 <- data %>% select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

Inspecting data2$PROPDMGEXP and data2$CROPDMGEXP shows that the damage exponents include non-numeric codes, such as "K" for thousands, "M" for millions and "B" for billions. We therefore convert these codes into integer powers of ten.
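Before running the full recoding, it can help to check the intended mapping on a small hand-made vector of codes. The sketch below expresses essentially the same rules as the replace() chain in this analysis, but with a named lookup vector applied case-insensitively; the function name code_to_power() and the sample codes are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: translate damage-exponent codes into powers of ten.
# Letter codes are matched case-insensitively via a lookup table,
# single-digit codes are used as-is, and any other code becomes 0.
code_to_power <- function(code) {
  code <- toupper(as.character(code))
  letters_map <- c(H = 2L, K = 3L, M = 6L, B = 9L)
  power <- unname(letters_map[code])          # NA where code is not a letter code
  digits <- grepl("^[0-9]$", code)
  power[digits] <- as.integer(code[digits])   # digit codes are already exponents
  power[is.na(power)] <- 0L                   # unknown or empty codes: no multiplier
  power
}

code_to_power(c("K", "m", "B", "h", "5", "", "?", "+"))
# [1] 3 6 9 2 5 0 0 0
```

A lookup table keeps all code-to-exponent rules in one place, so supporting a new code means editing one entry instead of adding a replace() call for each column.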

# Recode letter exponents as powers of ten ("H" = 2, "K" = 3, "M" = 6, "B" = 9).
# Digit codes are kept as they are; any other symbol cannot be converted by
# as.integer(), becomes NA (hence the coercion warnings) and is then set to 0.
data3 <- data2 %>% mutate(PROPDMGEXP = replace(PROPDMGEXP,PROPDMGEXP == "K",3),
                          PROPDMGEXP = replace(PROPDMGEXP,PROPDMGEXP == "k",3),
                          PROPDMGEXP = replace(PROPDMGEXP,PROPDMGEXP == "M",6),
                          PROPDMGEXP = replace(PROPDMGEXP,PROPDMGEXP == "m",6),
                          PROPDMGEXP = replace(PROPDMGEXP,PROPDMGEXP == "B",9),
                          PROPDMGEXP = replace(PROPDMGEXP,PROPDMGEXP == "h",2),
                          PROPDMGEXP = replace(PROPDMGEXP,PROPDMGEXP == "H",2),
                          PROPDMGEXP = as.integer(PROPDMGEXP),
                          PROPDMGEXP = replace(PROPDMGEXP,is.na(PROPDMGEXP),0),
                          CROPDMGEXP = replace(CROPDMGEXP,CROPDMGEXP == "K",3),
                          CROPDMGEXP = replace(CROPDMGEXP,CROPDMGEXP == "k",3),
                          CROPDMGEXP = replace(CROPDMGEXP,CROPDMGEXP == "M",6),
                          CROPDMGEXP = replace(CROPDMGEXP,CROPDMGEXP == "m",6),
                          CROPDMGEXP = replace(CROPDMGEXP,CROPDMGEXP == "B",9),
                          CROPDMGEXP = as.integer(CROPDMGEXP),
                          CROPDMGEXP = replace(CROPDMGEXP,is.na(CROPDMGEXP),0),
                          INJURIES = as.integer(INJURIES),
                          FATALITIES = as.integer(FATALITIES))
## Warning in mask$eval_all_mutate(dots[[i]]): NAs introduced by coercion

## Warning in mask$eval_all_mutate(dots[[i]]): NAs introduced by coercion
data4 <- data3 %>% 
  # Convert damages to dollars: value * 10^exponent
  mutate(PROPDMG = PROPDMG*10^PROPDMGEXP,
         CROPDMG = CROPDMG*10^CROPDMGEXP) %>%
  select(-PROPDMGEXP,-CROPDMGEXP) %>%
  # Total health and economic impact per event type
  group_by(EVTYPE) %>%
  summarise(total_fatalities = sum(FATALITIES, na.rm = TRUE),
            total_injuries = sum(INJURIES, na.rm = TRUE),
            total_propdmg = sum(PROPDMG, na.rm = TRUE),
            total_cropdmg = sum(CROPDMG, na.rm = TRUE)) 

After recoding the exponents, the damage in dollars was computed as PROPDMG * 10^PROPDMGEXP for property and CROPDMG * 10^CROPDMGEXP for crops. The data were then grouped by event type (EVTYPE), and fatalities, injuries, and property and crop damage were summed for each event.
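As a worked example of the damage formula: a property damage of 25 with exponent 3 (code "K") is 25 * 10^3 = 25,000 dollars. The base-R sketch below applies the same multiply-then-sum logic to a toy data frame whose values are made up for illustration; it uses rowsum() in place of the group_by()/summarise() pipeline used in this analysis.

```r
# Toy records (illustrative values, not from the NOAA data);
# exponents are already recoded to integers, as in data3.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD"),
  PROPDMG    = c(25, 2.5, 1.2),
  PROPDMGEXP = c(3L, 6L, 9L),
  CROPDMG    = c(0, 10, 500),
  CROPDMGEXP = c(0L, 3L, 6L)
)

# Damage in dollars = value * 10^exponent
toy$PROPDMG <- toy$PROPDMG * 10^toy$PROPDMGEXP
toy$CROPDMG <- toy$CROPDMG * 10^toy$CROPDMGEXP

# Sum both damage columns per event type (rows sorted by event name)
totals <- rowsum(toy[, c("PROPDMG", "CROPDMG")], group = toy$EVTYPE)
totals
```

Here the two toy tornado records add up to 25,000 + 2,500,000 = 2,525,000 dollars of property damage. In the analysis itself, summarise() also totals fatalities and injuries in the same grouping step.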

Results

The results are as follows.

Fatalities and Injuries per event.

As can be seen, tornadoes were the most harmful weather events for people, causing the most fatalities and injuries. Heat-related events and flash floods were also major causes of death, while, after tornadoes, TSTM (thunderstorm) wind and floods caused the most injuries.
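The ranking behind this figure can be sketched by ordering the per-event summary (data4) by one of its totals and keeping the first few rows. The helper top_events() below is a hypothetical illustration, demonstrated on a toy summary with placeholder event names and made-up values.

```r
# Hypothetical helper: return the n event types with the largest value in `column`
top_events <- function(summary_df, column, n = 10) {
  ordered <- summary_df[order(summary_df[[column]], decreasing = TRUE), ]
  head(ordered[, c("EVTYPE", column)], n)
}

# Toy summary with the same column names as data4 (values are made up)
toy_summary <- data.frame(
  EVTYPE           = c("EVENT_A", "EVENT_B", "EVENT_C"),
  total_fatalities = c(5, 40, 12),
  total_injuries   = c(100, 30, 7)
)

top_events(toy_summary, "total_fatalities", n = 2)
top_events(toy_summary, "total_injuries", n = 2)
```

With the real data, top_events(data4, "total_fatalities") would give the table that a ggplot2 bar chart (for example with geom_col()) could display, and two such charts can be placed side by side with gridExtra::grid.arrange().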

Property and crop losses per event.

Unlike the harm to people, where tornadoes dominate, the greatest damage to property and crops was caused by floods and drought.
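Since property and crop damage are both in dollars, they can also be added into a single economic total per event. A minimal base-R sketch of that ranking, on a toy summary that uses data4's damage column names (placeholder event names and made-up values; total_econ is a hypothetical new column):

```r
# Toy summary with data4's damage columns (values in dollars, made up)
toy_summary <- data.frame(
  EVTYPE        = c("EVENT_A", "EVENT_B", "EVENT_C"),
  total_propdmg = c(3e9, 5e8, 2e10),
  total_cropdmg = c(1e10, 2e8, 1e9)
)

# Total economic damage per event = property damage + crop damage
toy_summary$total_econ <- toy_summary$total_propdmg + toy_summary$total_cropdmg

# Sort from most to least damaging and express the total in billions of dollars
ranked <- toy_summary[order(toy_summary$total_econ, decreasing = TRUE), ]
ranked$total_econ_bn <- ranked$total_econ / 1e9
ranked[, c("EVTYPE", "total_econ_bn")]
```

Ranking the combined total answers a slightly different question than the separate property and crop rankings: an event with moderate damage in both categories can outrank one that leads only a single category.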

Conclusion

In conclusion, weather events do not all affect the country in the same way: tornadoes cause the most harm to people, whereas property and crops are hit hardest by drought and floods.