Assignment Description

Overview

Climate change and climate science have been subjects of discussion and debate for at least the last decade. Climate data are currently being collected for areas all over the world, and policy decisions are based on the most recent analyses of data extracted from huge online repositories. Because the electronic production and storage of information keeps growing, quantitative decision making often comes with a feeling of “information overload” or inundation. As an analyst, your job will often be to explore large data sets and develop questions or ideas from visualizations of those data sets (this information is cited from the Lab 2 - Data Exploration and Analysis Laboratory).

Data scientists play a pivotal role in this scenario, as they are tasked with exploring large datasets, synthesizing relevant information, and developing questions or ideas based on their findings. To overcome the challenges of information inundation, data scientists must harness the power of visualization techniques to distill complex datasets into clear, comprehensible, and actionable insights. Effective visualization not only facilitates the extraction of pertinent knowledge but also enables scientists to communicate their findings to policymakers, stakeholders, and the public in an easily digestible manner.

Objective

The aim of this lab is to investigate existing data, conceptualize, design, and develop an information dashboard or presentation that not only delves into the data but also assists in generating questions based on the data exploration. To achieve this goal, several steps must be undertaken:

Determine the aspects of climate change that pique your interest.

Locate, gather, arrange, and condense the required data to formulate your data exploration strategy.

Design and construct a minimum of five suitable visualizations to examine the data and convey the information effectively.

Organize the arrangement of these visualizations within a dashboard (utilizing the flexdashboard package) in a manner that demonstrates your data exploration journey.

Formulate four questions or ideas about climate change derived from your visualizations.

Row

Getting Data

Numerous sources are available to obtain climate data to address your inquiries. One of the most straightforward sources is the NOAA National Centers for Environmental Information (https://www.ncdc.noaa.gov/), which offers a plethora of data types, including regional, global, and marine. Additionally, the NOAA homepage provides links to other websites containing climate data, such as (https://www.climate.gov/), (https://www.weather.gov/), (https://www.drought.gov/drought/), and (https://www.globalchange.gov/). While it is not necessary to utilize all these sources, browsing through them could spark ideas for formulating your questions.

For a more professional approach, numerous R packages are available that grant access to climate data. This method allows for a more streamlined and efficient data acquisition process, simplifying the overall data exploration journey.
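
The NOAA CSV files read later in this dashboard are loaded with read.csv() and a skip argument of 3 or 4, which suggests that each file starts with a few metadata lines before the column header. A minimal sketch of that pattern, using a made-up in-memory file rather than a real download (the line contents and values are assumptions for illustration only):

```r
# Toy stand-in for a NOAA download: three metadata lines, then header and data.
# All text and values here are made up for illustration.
raw_text <- paste(
  "Example title line",
  "Units: example units",
  "Missing: -99",
  "Date,Value,Anomaly",
  "1967,10.5,0.3",
  "1968,9.8,-0.4",
  sep = "\n"
)

# skip = 3 drops the metadata lines so the fourth line becomes the header row.
example <- read.csv(text = raw_text, skip = 3)
str(example)
```

If the number of metadata lines in a real file differs, the skip value has to be adjusted; checking the first lines of the download with readLines() is one way to confirm it.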

library(ggplot2)
library(usmap)
library(rjson)
library(dplyr)

The Dynamics of Global Snow Coverage

snow <- read.csv("https://www.ncei.noaa.gov/access/monitoring/snow-and-ice-extent/snow-cover/namgnld/2/data.csv", skip=4)
snow$date <- as.Date(as.character(snow$Date), format = "%Y")

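# Mean anomaly over the first 20 years of the record (1967 to 1987) and the last 20 years (2002 to 2022).
average_anomaly_1967_20yrs <- mean(subset(snow, Date >= 1967 & Date <= 1987)$Anomaly)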
average_anomaly_2022_20yrs <- mean(subset(snow, Date >= 2002 & Date <= 2022)$Anomaly)
background_cleanup <- theme(panel.grid.major = element_blank(),
                            panel.grid.minor = element_blank(),
                            panel.background = element_blank(),
                            axis.line.x = element_line(color = 'black'),
                            axis.line.y = element_line(color = 'black'),
                            legend.key = element_rect(fill = 'white'))
ggplot(snow, aes(x = date, y = Anomaly)) +
  geom_point(color = "#1f77b4", size = 2, alpha = 0.7) +
  # linewidth replaces size for lines since ggplot2 3.4.0
  geom_hline(yintercept = average_anomaly_1967_20yrs, color = 'forestgreen', linewidth = 1) +
  geom_hline(yintercept = average_anomaly_2022_20yrs, color = "firebrick", linewidth = 1) +
  ggtitle("Global Snow Coverage From 1967 to 2022") + xlab("Year") + ylab("Anomaly") +
  scale_x_date(date_labels = "%Y", date_breaks = "5 years") + background_cleanup +
  theme(plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
        axis.title.x = element_text(size = 12, face = "bold"),
        axis.title.y = element_text(size = 12, face = "bold"),
        axis.text.x = element_text(size = 10),
        axis.text.y = element_text(size = 10),
        legend.position = "none")

Column

The green line indicates the average snow cover anomaly from 1967 to 1987, while the red line represents the same parameter for the period between 2002 and 2022.
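
The two reference lines are plain period means of the Anomaly column. A minimal sketch of that calculation follows; period_mean() is a hypothetical helper that is not part of the original dashboard, and the data frame holds made-up values with the same column names as the snow data.

```r
# Hypothetical helper: mean anomaly for the years start..end (inclusive).
period_mean <- function(df, start, end) {
  in_period <- df$Date >= start & df$Date <= end
  mean(df$Anomaly[in_period], na.rm = TRUE)
}

# Made-up values, for illustration only.
toy <- data.frame(
  Date    = c(1967, 1977, 1987, 2002, 2012, 2022),
  Anomaly = c(0.4, -0.2, 0.1, 0.6, 0.3, 0.9)
)

early_mean  <- period_mean(toy, 1967, 1987)  # would set the green line
recent_mean <- period_mean(toy, 2002, 2022)  # would set the red line
recent_mean - early_mean                     # shift between the two periods
```

Keeping the window boundaries as function arguments makes it harder for the years in the code and the years quoted in the text to drift apart.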

The Comparison of Average Temperatures in the U.S.: 100 Years Ago and Now

Column

Average Temperature 100 Years Ago: 1918 - 1923

avg_t_100y_ago <- read.csv("https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/statewide/mapping/110-tavg-192312-60.csv", skip=3)

colnames(avg_t_100y_ago)[which(names(avg_t_100y_ago) == "Location")] <- "state"
plot_usmap(data = avg_t_100y_ago, values = "Value", color = "gray") + 
  scale_fill_continuous(
    low = "lightblue", high = "red", name = "Average Temperature 100 Years Ago: 1918 - 1923", labels = scales::comma
  ) + theme(legend.position = "top")

Average Temperature in the U.S. During the Last Five Years

avg_t_last5years <- read.csv("https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/statewide/mapping/110-tavg-202301-60.csv", skip=3)

colnames(avg_t_last5years)[which(names(avg_t_last5years) == "Location")] <- "state"

plot_usmap(data = avg_t_last5years, values = "Value", color = "gray") + 
  scale_fill_continuous(
    low = "lightblue", high = "red", name = "Average Temperature in the United States in 2018-2023", labels = scales::comma
  ) + theme(legend.position = "top")
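
Because each map gets its own color scale, small differences between the two periods can be hard to see. One possible alternative, sketched here under the assumption that both tables share a state column and a Value column as in the code above, is to join them and compute the change per state. The three rows are made-up values for illustration only.

```r
# Made-up values for three states, for illustration only.
past   <- data.frame(state = c("Ohio", "Texas", "Utah"), Value = c(50.1, 64.8, 47.5))
recent <- data.frame(state = c("Ohio", "Texas", "Utah"), Value = c(52.0, 66.3, 49.9))

# Join on state so each row holds both periods, then take the difference.
both <- merge(past, recent, by = "state", suffixes = c("_past", "_recent"))
both$change <- both$Value_recent - both$Value_past
both
```

The resulting data frame could then be mapped in the same way as the two maps above, for example with plot_usmap(data = both, values = "change"), so that a single legend shows the change directly.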

Drought Severity Index Comparison Analysis

years <- c("1922", "1972", "2022")
months <- c("01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12")
# Common prefix of the NOAA national PDSI (Palmer Drought Severity Index) JSON files.
base_url <- "https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/national/mapping/110-pdsi-"

full_df <- data.frame()

for (x in seq_along(years)) {
  drought <- rep(0, times=12)
  for (y in seq_along(months)) {
    # One request per year and month: prefix + YYYY + MM + "-1.json"
    input <- fromJSON(file=paste0(base_url, years[x], months[y], "-1.json"))
    drought[y] <- as.double(input$data$`110`$value)
  }
  # Collect the 12 monthly values for this year and append them to the result.
  year_column <- rep(years[x], times=12)
  temp_df <- data.frame(year_column, months, drought)
  full_df <- rbind(full_df, temp_df)
}

names(full_df) <- c("Year", "Month", "DroughtSeverityIndex")

# Turn "01".."12" into an ordered factor labelled January..December (month.name is built into R).
full_df$Month <- factor(full_df$Month, levels = months, labels = month.name)

ggplot() +
  geom_point(data=full_df, mapping=aes(x=Month, y=DroughtSeverityIndex, color=Year, group=Year), size=3, shape=21, fill="white") +
  geom_line(data=full_df, mapping=aes(x=Month, y=DroughtSeverityIndex, color=Year, group=Year), linewidth=1.2, linetype="solid") +
  scale_color_manual(values=c("1922"="darkblue", "1972"="purple", "2022"="darkred")) +
  labs(title="Drought Severity Index by Month in 1922, 1972, and 2022",
       x="Month",
       y="Drought Severity Index") +
  theme_minimal() +
  theme(plot.title=element_text(face="bold", size=16),
        axis.title=element_text(face="bold", size=12),
        axis.text=element_text(size=10),
        axis.text.x = element_text(angle=45, hjust=1),
        legend.position="right",
        legend.title=element_text(face="bold", size=12)) +
  scale_x_discrete(labels = levels(full_df$Month))
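
To back up a visual reading of the line chart with numbers, the index can also be summarized per year. The sketch below assumes a data frame with the same Year and DroughtSeverityIndex columns as full_df, but uses four made-up values per year instead of the downloaded data.

```r
# Made-up index values (four per year), for illustration only.
toy <- data.frame(
  Year = rep(c("1922", "1972", "2022"), each = 4),
  DroughtSeverityIndex = c(-1.2, -0.5,  0.3, -0.8,
                            1.5,  3.1,  0.9,  2.2,
                           -2.0, -1.1, -0.4,  0.2)
)

# Mean index and number of months with a negative index, per year.
mean_index        <- tapply(toy$DroughtSeverityIndex, toy$Year, mean)
months_below_zero <- tapply(toy$DroughtSeverityIndex < 0, toy$Year, sum)

drought_summary <- data.frame(
  Year              = names(mean_index),
  mean_index        = as.numeric(mean_index),
  months_below_zero = as.integer(months_below_zero)
)
drought_summary
```

Applied to full_df, the same two tapply() calls would give the count of negative months for each of the three years.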

Trend Analysis of the National U.S. Average Temperature

avg_temp <- read.csv("https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/national/time-series/110/tavg/ytd/1/1895-2023.csv", skip=3)
avg_temp$Year <- as.integer(substring(as.character(avg_temp$Date),1,4))

ggplot(avg_temp, aes(x=Year, y=Value)) +
  geom_line(color="darkred", linewidth=1.5, linetype="solid") +
  geom_point(size=3, color="steelblue", shape=21, fill="white") +
  # Dashed linear trend line; linewidth replaces size for lines since ggplot2 3.4.0
  stat_smooth(method="lm",
              formula=y ~ x,
              geom="smooth",
              color="darkgreen",
              linetype="dashed",
              linewidth=1) +
  labs(title="National Annual Average Temperature",
       x="Year",
       y="Average Temperature",
       caption="Source: NOAA National Centers for Environmental Information") +
  theme_minimal() +
  theme(plot.title=element_text(face="bold", size=16),
        axis.title=element_text(face="bold", size=12),
        axis.text=element_text(size=10),
        legend.position="bottom",
        plot.caption=element_text(size=8, hjust=1))
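
The dashed trend line in the plot is a linear fit of Value on Year. A short sketch of how the slope of such a fit could be read off numerically is given below; it uses a made-up series with an exact trend of 0.02 units per year rather than the NOAA data, so the numbers are illustrative only.

```r
# Made-up series with an exact linear trend, for illustration only.
toy <- data.frame(Year = 1895:1904)
toy$Value <- 50 + 0.02 * (toy$Year - 1895)

# Same model that stat_smooth(method = "lm", formula = y ~ x) fits.
fit <- lm(Value ~ Year, data = toy)

slope_per_year   <- coef(fit)[["Year"]]
slope_per_decade <- slope_per_year * 10
slope_per_decade
```

Running the same lm() call on avg_temp would give the fitted rate of change in the units of the Value column per year.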

Frequency of Occurrence of Billion-Dollar Disasters During the Last 42 Years

weather <- read.csv("https://www.nodc.noaa.gov/archive/arc0153/0209268/13.13/data/0-data/events-US-1980-2022.csv", skip=1)
weather <- weather %>% mutate(Year = as.integer(substring(Begin.Date, 1, 4)))
weather$Disaster <- factor(weather$Disaster)

ggplot(weather, aes(x = Year, fill = Disaster)) +
  geom_bar(stat = "count", width = 0.8) + 
  scale_fill_brewer(palette = "Set2") + 
  labs(title = "Frequency of Occurrence of Billion-Dollar Disasters, from 1980 to 2022",
       x = "Year",
       y = "Frequency") +
  background_cleanup +
  scale_x_continuous(breaks = seq(min(weather$Year), max(weather$Year), by = 5)) +
  theme(plot.title = element_text(face = "bold", size = 16),
        axis.title = element_text(face = "bold", size = 12),
        axis.text = element_text(size = 10),
        legend.position = "right",
        legend.title = element_text(face = "bold", size = 12))
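
The stacked bars show counts per year, so the peak year and the share of each disaster type can also be computed directly. The sketch below assumes a data frame with Year and Disaster columns like the weather table above; the rows and category names are made up for illustration.

```r
# Made-up events, for illustration only.
toy <- data.frame(
  Year = c(1980, 1980, 1981, 1982, 1982, 1982),
  Disaster = c("Drought", "Flooding", "Severe Storm",
               "Severe Storm", "Flooding", "Severe Storm")
)

# Number of events per year, and the year with the most events.
events_per_year <- table(toy$Year)
peak_year  <- as.integer(names(events_per_year)[which.max(events_per_year)])
peak_count <- as.integer(max(events_per_year))

# Share of each disaster type across all events.
type_share <- prop.table(table(toy$Disaster))

peak_year
peak_count
type_share
```

Applied to the weather table, this gives a quick way to cross-check any peak year and count quoted from the chart.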

Questions and Ideas About Climate Change

What is the effect of climate change on global snow coverage? The chart indicates that snow coverage has increased over the last 20 years. Global warming has caused more of the world's ice to melt, yet over the same period snow coverage has increased.

How has the number of disasters changed between 1980 and 2022? The stacked bar chart indicates a rapid increase in the number of billion-dollar disasters during this period. Starting from a few disasters per year, the annual total reached its maximum in 2018 (23 disasters). Notably, a considerable proportion of the disasters in recent decades are severe storms.

How has the drought situation changed over the last 100 years? Comparing 1922, 1972, and 2022 month by month shows that the index was negative in most months of 2022. In contrast, 1972 saw predominantly positive index values, some even exceeding 3. The negative values indicate that drought severity has become more concerning in recent years. Interestingly, conditions in 1922 were worse overall than in 1972, which suggests that nature can recover and that opportunities still exist to improve today’s drought conditions.

How does climate change affect average temperature in the U.S.? Although the colors on the two maps differ only slightly, the 2018-2023 map is visibly redder than the map for the five-year period 100 years earlier. This suggests that average temperature has risen over time, which is consistent with a significant effect of global warming.

Lab 2 (Climate Change Data Visualization)
