---
title: "Climate Change Dashboard Analysis"
author: "Lipi Thakker"
date: "2026-04-12"
output: 
  flexdashboard::flex_dashboard:
    orientation: rows
    vertical_layout: fill
    theme:
      version: 5
      bootswatch: flatly
    source_code: embed
---

<style>
.section.level1 {
  padding-top: 15px !important;
}

body {
  padding-top: 60px !important;
}

.dashboard-row {
  margin-top: 8px !important;
}

.chart-title {
  font-weight: 600 !important;
}
</style>

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE)

library(flexdashboard)
library(tidyverse)
library(plotly)
library(scales)
library(lubridate)
library(DT)
library(janitor)
library(htmltools)
library(readr)
```

```{r data-prep}
climate_raw <- read_csv("climate_change_dataset.csv", show_col_types = FALSE)

climate_raw <- climate_raw %>%
  clean_names()

names(climate_raw) <- gsub("µ", "u", names(climate_raw))
```


```{r}

# Column names were standardized above; convert placeholder strings to NA
# while values are still text, then coerce every measurement column to numeric
placeholders <- c("Unknown", "unknown", "NAN", "NaN", "99999")

climate <- climate_raw %>%
  mutate(
    across(-c(year, month), as.character),
    across(-c(year, month), ~ replace(.x, .x %in% placeholders, NA)),
    across(-c(year, month), as.numeric),
    date = ymd(paste(year, month, 1, sep = "-"))
  )

num_cols <- climate %>%
  select(-year, -month, -date) %>%
  names()

missing_before <- climate %>%
  summarise(across(all_of(num_cols), ~sum(is.na(.)))) %>%
  pivot_longer(everything(), names_to = "variable", values_to = "missing_before")

# Impute within month first, then use overall mean for any month with all values missing
climate_imp <- climate %>%
  group_by(month) %>%
  mutate(across(all_of(num_cols), ~ifelse(is.na(.), mean(., na.rm = TRUE), .))) %>%
  ungroup() %>%
  mutate(across(all_of(num_cols), ~ifelse(is.nan(.), NA, .))) %>%
  mutate(across(all_of(num_cols), ~ifelse(is.na(.), mean(., na.rm = TRUE), .)))

missing_after <- climate_imp %>%
  summarise(across(all_of(num_cols), ~sum(is.na(.)))) %>%
  pivot_longer(everything(), names_to = "variable", values_to = "missing_after")

missing_summary <- missing_before %>%
  left_join(missing_after, by = "variable") %>%
  mutate(percent_missing = round(missing_before / nrow(climate) * 100, 1)) %>%
  arrange(desc(missing_before))

# KPIs
records_n <- nrow(climate_imp)
years_n <- n_distinct(climate_imp$year)
avg_temp_mean <- mean(climate_imp$`avg_temp_c`, na.rm = TRUE)
co2_mean <- mean(climate_imp$`co2_concentration_ppm`, na.rm = TRUE)
precip_mean <- mean(climate_imp$`precipitation_mm`, na.rm = TRUE)

# Long data for temperature time series
temp_long <- climate_imp %>%
  select(date, year, month, `avg_temp_c`, `max_temp_c`, `min_temp_c`) %>%
  pivot_longer(
    cols = c(`avg_temp_c`, `max_temp_c`, `min_temp_c`),
    names_to = "measure",
    values_to = "value"
  )

# Monthly seasonal profile
seasonal_summary <- climate_imp %>%
  group_by(month) %>%
  summarise(
    avg_temp = mean(`avg_temp_c`),
    avg_precip = mean(`precipitation_mm`),
    avg_humidity = mean(`humidity_percent`),
    .groups = "drop"
  ) %>%
  mutate(month_label = month.abb[month])

# Annual summary
annual_summary <- climate_imp %>%
  group_by(year) %>%
  summarise(
    avg_temp = mean(`avg_temp_c`),
    co2 = mean(`co2_concentration_ppm`),
    precipitation = mean(`precipitation_mm`),
    pm = mean(`particulate_matter_mg_m3`),
    sea_surface = mean(`sea_surface_temp_c`),
    .groups = "drop"
  )

# Correlation data
cor_mat <- climate_imp %>%
  select(all_of(num_cols)) %>%
  cor(use = "pairwise.complete.obs")

cor_df <- as.data.frame(as.table(cor_mat)) %>%
  rename(var1 = Var1, var2 = Var2, correlation = Freq)

# Five variables most strongly correlated with average temperature
# (by absolute r), excluding its correlation with itself
strong_temp <- cor_mat %>%
  as.data.frame() %>%
  rownames_to_column("variable") %>%
  select(variable, avg_temp_c) %>%
  filter(variable != "avg_temp_c") %>%
  arrange(desc(abs(avg_temp_c))) %>%
  slice_head(n = 5)

# For display
climate_table <- climate_imp %>%
  arrange(date) %>%
  mutate(date = format(date, "%b %Y"))
```

Overview
=====================================

Row {data-height=20}
-------------------------------------
### Records
```{r}

valueBox(
  value = comma(records_n),
  caption = "Monthly observations in the dataset",
  icon = "fa-table",
  color = "primary"
)
```

### Years Covered
```{r}
valueBox(
  value = years_n,
  caption = "Years represented",
  icon = "fa-calendar",
  color = "info"
)
```

### Mean Average Temperature
```{r}
valueBox(
  value = paste0(round(avg_temp_mean, 1), " °C"),
  caption = "Overall mean average temperature",
  icon = "fa-temperature-half",
  color = "warning"
)
```

### Mean CO2
```{r}
valueBox(
  value = paste0(round(co2_mean, 1), " ppm"),
  caption = "Average CO2 concentration",
  icon = "fa-industry",
  color = "success"
)
```


Row {data-height=90}
-------------------------------------

### Dashboard purpose
This dashboard analyzes climate change patterns from four angles: temporal trends, seasonal variation, missing-data handling, and inter-variable relationships. The objective is to turn raw climate data into interpretable insights about how temperature, precipitation, and atmospheric conditions change over time.

Missing values were handled in two steps: placeholder entries were first converted to missing values, and each gap was then filled with the mean of the same calendar month. This preserves seasonal patterns and makes comparisons across months more reliable. The dashboard combines trend charts with relationship views, so potential climate shifts can be examined from both perspectives.
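
The placeholder step can be illustrated on a toy vector. This is a minimal base-R sketch rather than the dashboard's own code: `raw_values` is a hypothetical column, and the placeholder strings are the ones the data-prep chunk treats as missing.

```r
# Hypothetical raw column, read as text from the CSV
raw_values <- c("14.2", "Unknown", "13.8", "99999", "NaN", "15.1")

# Strings treated as missing in the data-prep chunk
placeholders <- c("Unknown", "unknown", "NAN", "NaN", "99999")

# Step 1: replace placeholders with NA while the column is still text
cleaned <- replace(raw_values, raw_values %in% placeholders, NA)

# Step 2: coerce to numeric only after the placeholders are gone
values <- as.numeric(cleaned)

sum(is.na(values))          # 3 missing values
mean(values, na.rm = TRUE)  # mean of the three real readings only
```

Doing the replacement before the numeric conversion matters: coercing first would turn `"99999"` into the number 99999, which would then inflate any mean instead of being imputed.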


Row {data-height=180}
-------------------------------------

### Average temperature trend
```{r}

p_avg <- ggplot(climate_imp, aes(x = date, y = avg_temp_c)) +
  geom_line(color = "#14B8A6", linewidth = 1) +
  geom_smooth(method = "loess", se = FALSE, color = "black", linewidth = 1) +
  labs(
    title = "Average temperature trend over time",
    subtitle = "Smoothed trend highlights long-term pattern",
    x = NULL,
    y = "Temperature (°C)"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold")
  )
p_avg
```


### Annual climate indicators

```{r}
annual_long <- annual_summary %>%
  select(year, avg_temp, co2, precipitation) %>%
  pivot_longer(-year, names_to = "metric", values_to = "value")

p_annual <- ggplot(annual_long, aes(x = year, y = value, group = metric)) +
  geom_line(color = "#14B8A6", linewidth = 1) +
  geom_point(size = 2) +
  facet_wrap(
    ~metric,
    scales = "free_y",
    ncol = 1,
    labeller = as_labeller(c(
      avg_temp = "Average temperature (°C)",
      co2 = "CO2 concentration (ppm)",
      precipitation = "Precipitation (mm)"
    ))
  ) +
  labs(
    title = "Annual averages for key indicators",
    x = NULL,
    y = NULL
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold"),
    strip.text = element_text(face = "bold"),
    panel.grid.minor = element_blank()
  )

p_annual
```

### CO2 vs temperature relationship

```{r}

p_scatter <- ggplot(
  climate_imp,
  aes(x = co2_concentration_ppm, y = avg_temp_c)
) +
  geom_point(color = "#14B8A6", size = 2.5, alpha = 0.8) +
  geom_smooth(method = "lm", se = TRUE, color = "#1F2937", linewidth = 1) +
  labs(
    title = "Relationship between CO2 and temperature",
    subtitle = "A weak but positive relationship is observed",
    x = "CO2 concentration (ppm)",
    y = "Average temperature (°C)"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold")
  )

p_scatter
```


Row {data-height=30}
-------------------------------------

### Design rationale
The design of this dashboard follows established data visualization principles to support clarity, interpretability, and effective communication. The layout is top-down: high-level KPIs give a quick overview, and the detailed visualizations below them support deeper analysis. This lets users understand the dataset as a whole before exploring specific trends and relationships.

Time-series line charts represent temperature, CO2, and precipitation over time, because they make trends and changes easy to follow. Seasonal analysis uses aggregated monthly views to emphasize recurring patterns, while a correlation heatmap and a CO2-temperature scatterplot are used to explore relationships between variables. Each chart type was chosen for its fit with the underlying data and the analytical objective.

The design also emphasizes simplicity and readability. Following Tufte’s (2001) principle of minimizing non-essential elements, visual clutter such as excessive gridlines and decorations was removed. The dashboard guidelines of Wexler, Shaffer, and Cotgreave (2017) were applied through consistent color usage, clear labeling, and logical grouping of related visuals. Together, these choices are intended to communicate insights clearly while keeping the dashboard accessible and easy to interpret.

Seasonality
=====================================
### Seasonal interpretation
A monthly view helps reveal recurring climate structure that annual averages alone can hide. Because the dataset is organized by month, imputing missing values within each month is more defensible than using one grand mean, since it preserves seasonal behavior more effectively.
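
A toy comparison makes this argument concrete. The base-R sketch below uses hypothetical data for two calendar months with very different typical temperatures, and fills one missing value first with the grand mean and then with the mean of the same month.

```r
# Hypothetical toy data: two years of a warm month (1) and a cool month (7),
# with the second July value missing
toy <- data.frame(
  month = c(1, 7, 1, 7),
  temp  = c(22, 2, 24, NA)
)

# Option A: fill the gap with one grand mean across all months
grand_fill <- ifelse(is.na(toy$temp), mean(toy$temp, na.rm = TRUE), toy$temp)

# Option B: fill the gap with the mean of the same calendar month
month_mean <- ave(toy$temp, toy$month, FUN = function(x) mean(x, na.rm = TRUE))
month_fill <- ifelse(is.na(toy$temp), month_mean, toy$temp)

grand_fill[4]  # 16: far above the other July reading
month_fill[4]  # 2: in line with the other July reading
```

The grand mean blends warm and cool months, so the filled value fits neither season; the within-month mean keeps the filled value on the seasonal profile.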

Row {data-height=180}
-------------------------------------

### Average temperature by month
```{r}

p_season_temp <- ggplot(
  seasonal_summary,
  aes(
    x = factor(month_label, levels = month.abb),
    y = avg_temp,
    group = 1,
    text = paste0(month_label, "<br>Avg Temp: ", round(avg_temp, 2), " °C")
  )
) +
  geom_line(linewidth = 1) +
  geom_point(size = 2.5) +
  labs(
    title = "Seasonal profile of average temperature",
    x = NULL,
    y = "Temperature (°C)"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 11),
    axis.text.x = element_text(angle = 45, hjust = 1, size = 10),
    axis.text.y = element_text(size = 10),
    axis.title = element_text(size = 10)
  )

ggplotly(p_season_temp, tooltip = "text")
```

### Monthly precipitation and humidity
```{r}
seasonal_long <- seasonal_summary %>%
  select(month_label, avg_precip, avg_humidity) %>%
  pivot_longer(-month_label, names_to = "metric", values_to = "value") %>%
  # Readable legend and tooltip labels that carry each metric's unit
  mutate(metric = recode(metric,
                         avg_precip = "Precipitation (mm)",
                         avg_humidity = "Humidity (%)"))

p_season_other <- ggplot(
  seasonal_long,
  aes(
    x = factor(month_label, levels = month.abb),
    y = value,
    fill = metric,
    text = paste0(
      "Month: ", month_label,
      "<br>Metric: ", metric,
      "<br>Value: ", round(value, 2)
    )
  )
) +
  geom_col(position = "dodge") +
  labs(
    title = "Average monthly precipitation and humidity",
    x = NULL,
    y = "Monthly mean (mm or %)",
    fill = NULL
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(face = "bold", size = 11),
    axis.text.x = element_text(angle = 45, hjust = 1, size = 10),
    axis.text.y = element_text(size = 10),
    axis.title = element_text(size = 10),
    # "side" is not a valid ggplot2 legend position
    legend.position = "right"
  )

ggplotly(p_season_other, tooltip = "text")
```

### Temperature variability by month
```{r}
p_var <- ggplot(
  climate_imp,
  aes(x = factor(month.abb[month], levels = month.abb), y = avg_temp_c)
) +
  geom_boxplot(fill = "#3B82F6", alpha = 0.7) +
  labs(
    title = "Temperature variability by month",
    subtitle = "Shows spread and outliers across months",
    x = NULL,
    y = "Temperature (°C)"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold")
  )

p_var
```

Row {data-height=70}
-------------------------------------


### Key seasonal insight
The analysis reveals clear seasonal patterns in temperature, with higher values observed toward the end of the year and lower values in mid-year months. While average temperature trends follow a consistent seasonal cycle, the variability plot indicates that certain months experience greater fluctuations, suggesting less stability during those periods.

Precipitation and humidity also vary noticeably across months. This is consistent with seasonal conditions being shaped by several interacting factors, although monthly averages alone cannot establish which factors are responsible. Together, these patterns highlight the importance of considering both average trends and variability when analyzing climate behavior.


Missing Data + Relationships
=====================================

### Missing-value handling
The dataset contains both true missing values and placeholder entries (such as "Unknown", "NaN", and "99999") that act as missing data. The "Missing values before imputation" chart shows how many values each variable was missing before imputation. After month-based replacement, with a fallback to the overall mean for any month that had no observed values, the dataset became suitable for comparative analysis.
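
The month-then-overall rule can be sketched in base R as follows. The helper name `impute_month_then_overall` and the toy data are hypothetical. The logic mirrors the data-prep chunk: a month with no observed values yields `NaN` in the first pass and is then filled with the overall mean. As in that chunk, the fallback mean is computed after the first pass, so it already includes the month-imputed values.

```r
# Hypothetical toy data: month 3 has no observed temperature at all
toy <- data.frame(
  month = c(1, 1, 2, 2, 3, 3),
  temp  = c(10, NA, 14, 16, NA, NA)
)

# Hypothetical helper mirroring the two-pass rule in the data-prep chunk
impute_month_then_overall <- function(x, month) {
  # Pass 1: mean of the same calendar month (NaN when the month is all NA)
  month_mean <- ave(x, month, FUN = function(v) mean(v, na.rm = TRUE))
  x <- ifelse(is.na(x), month_mean, x)
  # Pass 2: is.na() is also TRUE for NaN, so remaining gaps get the overall
  # mean of the column as filled so far
  ifelse(is.na(x), mean(x, na.rm = TRUE), x)
}

toy$temp_imp <- impute_month_then_overall(toy$temp, toy$month)
toy$temp_imp  # c(10, 10, 14, 16, 12.5, 12.5)
```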

Column
-------------------------------------

### Correlation heatmap
```{r}
p_cor <- ggplot(
  cor_df,
  aes(
    x = var1,
    y = var2,
    fill = correlation,
    text = paste0(
      var1, " vs ", var2,
      "<br>Correlation: ", round(correlation, 2)
    )
  )
) +
  geom_tile() +
  scale_fill_gradient2(low = "#B2182B", mid = "white", high = "#2166AC", midpoint = 0) +
  labs(
    title = "Correlation of climate indicators",
    x = NULL,
    y = NULL,
    fill = "r"
  ) +
  theme_minimal(base_size = 10) +
  theme(
    plot.title = element_text(face = "bold", size = 9),
    axis.text.x = element_text(angle = 45, hjust = 1, size = 8),
    axis.text.y = element_text(size = 9),
    axis.title = element_text(size = 9)
  )

ggplotly(p_cor, tooltip = "text")
```

### Missing values before imputation
```{r}
p_missing <- ggplot(
  missing_summary,
  aes(
    x = reorder(variable, missing_before),
    y = missing_before,
    text = paste0(
      variable,
      "<br>Missing before: ", missing_before,
      "<br>Percent: ", percent_missing, "%"
    )
  )
) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Missing values by variable before imputation",
    x = NULL,
    y = "Count of missing values"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 9),
    axis.text.x = element_text(angle = 45, hjust = 1, size = 8),
    axis.text.y = element_text(size = 9),
    axis.title = element_text(size = 9)
  )

ggplotly(p_missing, tooltip = "text")
```

### Top correlations with temperature
```{r}
ggplot(strong_temp, aes(x = reorder(variable, avg_temp_c), y = avg_temp_c)) +
  geom_col(fill = "#14B8A6") +
  coord_flip() +
  labs(
    title = "Variables most strongly correlated with average temperature",
    x = NULL,
    y = "Correlation (r)"
  ) +
  theme_minimal(base_size = 14)
```

Row {data-height=150}
-------------------------------------

### Key missing data insight
The dataset contains moderate levels of missing values across several variables, particularly temperature-related fields. After applying month-based imputation, the dataset becomes more consistent and suitable for comparative analysis while preserving seasonal structure.
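
Mean imputation has a side effect to keep in mind when reading the variability charts: every filled value sits exactly at its group mean, so the spread of the completed series shrinks. A minimal base-R sketch with hypothetical readings for a single month:

```r
# Hypothetical readings for one calendar month, two of them missing
observed <- c(18, 22, NA, 26, NA, 14)

# Mean imputation: each gap receives the observed mean (20)
imputed <- ifelse(is.na(observed), mean(observed, na.rm = TRUE), observed)

sd(observed, na.rm = TRUE)  # spread of the four real readings
sd(imputed)                 # smaller, because two points sit exactly at the mean
```

The standard deviation falls even though no real measurement changed. Because the monthly boxplots are drawn from the imputed data, they may understate the true month-to-month variability.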


Data Tables
=====================================

### Notes for interpretation
Because the most recent year may contain only partial-year observations, annual averages should be interpreted with caution. The dashboard is therefore most useful for identifying broad temporal and seasonal patterns rather than making strong causal claims.
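
One way to act on this caveat is to count the distinct months observed in each year and flag any year with fewer than twelve before comparing annual averages. The sketch below is base R with hypothetical dates. In the dashboard, the same check would apply to the `year` and `month` columns of `climate_imp`.

```r
# Hypothetical monthly dates: 2023 is complete, 2024 stops after March
dates <- c(
  seq(as.Date("2023-01-01"), by = "month", length.out = 12),
  seq(as.Date("2024-01-01"), by = "month", length.out = 3)
)
year  <- as.integer(format(dates, "%Y"))
month <- as.integer(format(dates, "%m"))

# Number of distinct calendar months observed in each year
months_per_year <- vapply(split(month, year),
                          function(m) length(unique(m)), integer(1))

# Years covering fewer than 12 months may give biased annual averages
partial_years <- as.integer(names(months_per_year)[months_per_year < 12])
partial_years  # 2024
```

Flagged years could be dropped from the annual chart or annotated, since their averages cover only part of the seasonal cycle.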

Row
-------------------------------------

### Missing-value summary table
```{r}
datatable(
  missing_summary,
  rownames = FALSE,
  options = list(pageLength = 10, scrollX = TRUE),
  colnames = c("Variable", "Missing Before", "Missing After", "% Missing Before")
)
```

### Cleaned climate dataset preview
```{r}
datatable(
  climate_table,
  rownames = FALSE,
  options = list(pageLength = 8, scrollX = TRUE)
)
```

Conclusion
=====================================

Column {data-height=180}
-------------------------------------

<div class="conclusion-card">

### 🔍 Key conclusion

- Climate patterns are shaped by **seasonality** and **multiple interacting variables**
- Temperature follows predictable seasonal trends
- Variability and weak correlations suggest a **complex system**, not a single dominant driver

</div>

<div class="conclusion-card">

### ⚙️ Analytical approach

- The problem was analyzed through:
  - **temporal trends**
  - **seasonal behavior**
  - **inter-variable relationships**
- Missing values were handled using **month-based mean imputation**
- This approach preserves seasonal structure but may smooth natural variability

</div>

<div class="conclusion-card">

### ⚠️ Limitations and future work

- The dataset is limited in **size** and **time coverage**
- With limited data, correlation estimates are imprecise, so weak observed relationships should not be read as evidence that no relationship exists
- Future work could include:
  - larger datasets
  - more climate variables
  - predictive modeling

</div>

Row {data-height=150}
-------------------------------------

<div class="reference-card">

### References
- Tufte, E. R. (2001). *The visual display of quantitative information* (2nd ed.). Graphics Press.
- Wexler, S., Shaffer, J., & Cotgreave, A. (2017). *The big book of dashboards: Visualizing your data using real-world business scenarios*. Wiley.
- Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). *R for data science* (2nd ed.). O’Reilly.
- Intergovernmental Panel on Climate Change (IPCC). (2021). *Climate change 2021: The physical science basis*. Cambridge University Press.
- Knaflic, C. N. (2015). *Storytelling with data: A data visualization guide for business professionals*. Wiley.
- Little, R. J. A., & Rubin, D. B. (2019). *Statistical analysis with missing data* (3rd ed.). Wiley.

</div>

Row {data-height=90}
-------------------------------------

<div style="text-align: center; margin-top: 20px;">

### 🌍 Climate Data Analysis Complete

This dashboard summarizes key climate patterns and insights derived from the dataset.

**Thank you**

</div>