Exploratory Data Analysis of Global Temperature Data

This document outlines the steps taken in the Exploratory Data Analysis (EDA) of a global temperature dataset using R. The analysis involves data loading, cleaning, visualization, and statistical examination to uncover insights about global temperature trends.

Dataset Information:

The dataset was retrieved from Kaggle.

Introduction to the Climate Change Dataset

This dataset, sourced from the Berkeley Earth Surface Temperature Study and affiliated with the Lawrence Berkeley National Laboratory, offers an in-depth look into global temperature trends. It compiles an extensive array of data, with 1.6 billion temperature reports drawn from 16 different archives. The collection underscores the evolution of temperature measurement, from early mercury thermometers to modern electronic devices, and reflects significant efforts in data cleaning and preparation. The dataset is crucial for understanding both historical and contemporary climate patterns, accommodating analyses that range from broad global trends to specific regional details.

The primary file, Global Land and Ocean-and-Land Temperatures (GlobalTemperatures.csv), traces global average temperatures dating back to 1750 for land and 1850 for combined land and ocean readings, including average, maximum, and minimum temperatures along with their 95% confidence intervals. Complementing this are files detailing temperature trends by country, state, major city, and city, each offering unique insights into localized climate behaviors. This dataset not only serves as a vital tool for historical climate trend analysis but also as a means for engaging in the ongoing, critical discourse on climate change.

In this project, I simplify the analysis by using only the global temperature data in “GlobalTemperatures.csv” from the downloaded dataset and running EDA on it.

The following are my steps:

Initial Setup

Loading and Installing Packages

Required R packages for the analysis include dplyr, tidyr, ggplot2, rcompanion, readxl, lubridate, corrplot, and patchwork.

library(dplyr)
library(tidyr)
library(ggplot2)
library(rcompanion)
library(readxl)
library(lubridate)
library(corrplot)
library(patchwork)
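
The heading mentions installing, but the block above only loads packages. A minimal sketch of an install-if-missing step follows. The helper name find_missing_packages is hypothetical, and restricting the install call to interactive sessions is an assumption made so that a scripted run never triggers a download.

```r
required_packages <- c("dplyr", "tidyr", "ggplot2", "rcompanion",
                       "readxl", "lubridate", "corrplot", "patchwork")

# Hypothetical helper: return the packages that cannot be found in the local library
find_missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!available])
}

to_install <- find_missing_packages(required_packages)

# Install only what is missing, and only when run interactively
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Run this once before the library() calls; on a machine that already has every package it does nothing.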

Environment Setting

Set the working directory if needed, then confirm it with getwd().

# Set your working directory if necessary
setwd("your/working/directory")
getwd() # Print the current working directory to confirm it

Step 1: Importing Data

Loading the global temperature dataset from a CSV file.

file_name <- "./data/GlobalTemperatures.csv"
data_init <- read.csv(file_name, header = TRUE)

Step 2: Overview of Data Structure

Creating a function explore_data to examine the dataset’s structure, and then applying it to data_init.

explore_data <- function(data_name, n = 5){
  cat(paste("THE FIRST", n, "ROWS OF THE DATASET ARE\n"))
  print(head(data_name, n))
  
  cat(paste("\n\nTHE LAST", n, "ROWS OF THE DATASET ARE\n"))
  print(tail(data_name, n))
  
  cat("\n\nTHE COLUMNS' NAME OF THE DATA ARE\n")
  print(colnames(data_name))
  
  cat("\n\nTHE STRUCTURE OF THE DATASET ARE\n")
  print(str(data_name)) # str() prints directly and returns NULL, so print() adds the trailing "NULL" seen in the output
}
explore_data(data_init)

## THE FIRST 5 ROWS OF THE DATASET ARE
##           dt LandAverageTemperature LandAverageTemperatureUncertainty
## 1 1750-01-01                  3.034                             3.574
## 2 1750-02-01                  3.083                             3.702
## 3 1750-03-01                  5.626                             3.076
## 4 1750-04-01                  8.490                             2.451
## 5 1750-05-01                 11.573                             2.072
##   LandMaxTemperature LandMaxTemperatureUncertainty LandMinTemperature
## 1                 NA                            NA                 NA
## 2                 NA                            NA                 NA
## 3                 NA                            NA                 NA
## 4                 NA                            NA                 NA
## 5                 NA                            NA                 NA
##   LandMinTemperatureUncertainty LandAndOceanAverageTemperature
## 1                            NA                             NA
## 2                            NA                             NA
## 3                            NA                             NA
## 4                            NA                             NA
## 5                            NA                             NA
##   LandAndOceanAverageTemperatureUncertainty
## 1                                        NA
## 2                                        NA
## 3                                        NA
## 4                                        NA
## 5                                        NA
## 
## 
## THE LAST 5 ROWS OF THE DATASET ARE
##              dt LandAverageTemperature LandAverageTemperatureUncertainty
## 3188 2015-08-01                 14.755                             0.072
## 3189 2015-09-01                 12.999                             0.079
## 3190 2015-10-01                 10.801                             0.102
## 3191 2015-11-01                  7.433                             0.119
## 3192 2015-12-01                  5.518                             0.100
##      LandMaxTemperature LandMaxTemperatureUncertainty LandMinTemperature
## 3188             20.699                         0.110              9.005
## 3189             18.845                         0.088              7.199
## 3190             16.450                         0.059              5.232
## 3191             12.892                         0.093              2.157
## 3192             10.725                         0.154              0.287
##      LandMinTemperatureUncertainty LandAndOceanAverageTemperature
## 3188                         0.170                         17.589
## 3189                         0.229                         17.049
## 3190                         0.115                         16.290
## 3191                         0.106                         15.252
## 3192                         0.099                         14.774
##      LandAndOceanAverageTemperatureUncertainty
## 3188                                     0.057
## 3189                                     0.058
## 3190                                     0.062
## 3191                                     0.063
## 3192                                     0.062
## 
## 
## THE COLUMNS' NAME OF THE DATA ARE
## [1] "dt"                                       
## [2] "LandAverageTemperature"                   
## [3] "LandAverageTemperatureUncertainty"        
## [4] "LandMaxTemperature"                       
## [5] "LandMaxTemperatureUncertainty"            
## [6] "LandMinTemperature"                       
## [7] "LandMinTemperatureUncertainty"            
## [8] "LandAndOceanAverageTemperature"           
## [9] "LandAndOceanAverageTemperatureUncertainty"
## 
## 
## THE STRUCTURE OF THE DATASET ARE
## 'data.frame':    3192 obs. of  9 variables:
##  $ dt                                       : chr  "1750-01-01" "1750-02-01" "1750-03-01" "1750-04-01" ...
##  $ LandAverageTemperature                   : num  3.03 3.08 5.63 8.49 11.57 ...
##  $ LandAverageTemperatureUncertainty        : num  3.57 3.7 3.08 2.45 2.07 ...
##  $ LandMaxTemperature                       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ LandMaxTemperatureUncertainty            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ LandMinTemperature                       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ LandMinTemperatureUncertainty            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ LandAndOceanAverageTemperature           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ LandAndOceanAverageTemperatureUncertainty: num  NA NA NA NA NA NA NA NA NA NA ...
## NULL

sample_n(data_init, 5) # Sample 5 random rows

##           dt LandAverageTemperature LandAverageTemperatureUncertainty
## 1 1907-11-01                  5.207                             0.207
## 2 1907-12-01                  2.956                             0.251
## 3 1887-10-01                  8.738                             0.263
## 4 1961-01-01                  2.926                             0.089
## 5 1872-07-01                 14.285                             0.478
##   LandMaxTemperature LandMaxTemperatureUncertainty LandMinTemperature
## 1             10.710                         0.310             -0.557
## 2              8.831                         0.270             -2.585
## 3             14.716                         0.690              2.531
## 4              8.380                         0.176             -2.383
## 5             19.705                         1.105              7.884
##   LandMinTemperatureUncertainty LandAndOceanAverageTemperature
## 1                         0.325                         13.872
## 2                         0.288                         13.225
## 3                         0.413                         14.983
## 4                         0.129                         13.664
## 5                         1.045                         16.843
##   LandAndOceanAverageTemperatureUncertainty
## 1                                     0.115
## 2                                     0.121
## 3                                     0.126
## 4                                     0.064
## 5                                     0.205

summary(data_init)

##       dt            LandAverageTemperature LandAverageTemperatureUncertainty
##  Length:3192        Min.   :-2.080         Min.   :0.0340                   
##  Class :character   1st Qu.: 4.312         1st Qu.:0.1867                   
##  Mode  :character   Median : 8.611         Median :0.3920                   
##                     Mean   : 8.375         Mean   :0.9385                   
##                     3rd Qu.:12.548         3rd Qu.:1.4192                   
##                     Max.   :19.021         Max.   :7.8800                   
##                     NA's   :12             NA's   :12                       
##  LandMaxTemperature LandMaxTemperatureUncertainty LandMinTemperature
##  Min.   : 5.90      Min.   :0.0440                Min.   :-5.407    
##  1st Qu.:10.21      1st Qu.:0.1420                1st Qu.:-1.335    
##  Median :14.76      Median :0.2520                Median : 2.950    
##  Mean   :14.35      Mean   :0.4798                Mean   : 2.744    
##  3rd Qu.:18.45      3rd Qu.:0.5390                3rd Qu.: 6.779    
##  Max.   :21.32      Max.   :4.3730                Max.   : 9.715    
##  NA's   :1200       NA's   :1200                  NA's   :1200      
##  LandMinTemperatureUncertainty LandAndOceanAverageTemperature
##  Min.   :0.0450                Min.   :12.47                 
##  1st Qu.:0.1550                1st Qu.:14.05                 
##  Median :0.2790                Median :15.25                 
##  Mean   :0.4318                Mean   :15.21                 
##  3rd Qu.:0.4582                3rd Qu.:16.40                 
##  Max.   :3.4980                Max.   :17.61                 
##  NA's   :1200                  NA's   :1200                  
##  LandAndOceanAverageTemperatureUncertainty
##  Min.   :0.0420                           
##  1st Qu.:0.0630                           
##  Median :0.1220                           
##  Mean   :0.1285                           
##  3rd Qu.:0.1510                           
##  Max.   :0.4570                           
##  NA's   :1200

Step 3: Data Cleaning and Preprocessing

Step3.1: Correct any misspelled or badly written column names

## Step3.1: Correct any misspelled or badly written column names ----
global_temp_data <- data_init %>%
  rename("date" = "dt",
         "LandAvgTemp"= "LandAverageTemperature",
         "LandAvgTempUncer" = "LandAverageTemperatureUncertainty",
         "LandMaxTemp" = "LandMaxTemperature",
         "LandMaxTempUncer" = "LandMaxTemperatureUncertainty",
         "LandMinTemp" = "LandMinTemperature",
         "LandMinTempUncer" = "LandMinTemperatureUncertainty",
         "LandOceanAvgTemp" = "LandAndOceanAverageTemperature",
         "LandOceanAvgTempUncer" = "LandAndOceanAverageTemperatureUncertainty") # LHS = new, RHS = old

Step3.2: Correct the data types of the columns

global_temp_data$date <- as.Date(global_temp_data$date, format = "%Y-%m-%d")
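
as.Date() with an explicit format returns NA for any string it cannot parse, without raising an error, so it is worth checking for failures after the conversion. The sketch below uses a small made-up vector (the last entry is deliberately invalid) rather than the real column.

```r
# Toy input: the third value has month 13 and cannot be parsed
raw_dates <- c("1750-01-01", "2015-12-01", "2015-13-01")

parsed <- as.Date(raw_dates, format = "%Y-%m-%d")

# Strings that failed to convert show up as NA in the parsed vector
failed <- raw_dates[is.na(parsed)]
print(failed)
```

Applied to the real data, an empty result from the same check on the original dt strings would confirm that every date parsed.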

Step3.3: Check missing values

Counting the missing values in a single column, reporting their percentage and row indices, and optionally dropping the affected rows

find_na_and_clean_column <- function(column, delete_na = FALSE) {
  
  # Extract the parent data frame from the column
  dataset_name <- deparse(substitute(column))
  dataset <- eval(parse(text = gsub("\\$.*", "", dataset_name)), envir = parent.frame())
  
  # Extract column name from the column
  column_name <- gsub(".*\\$", "", dataset_name)
  
  # Check if column exists in the dataset
  if (!column_name %in% names(dataset)) {
    stop("Column not found in the dataset.")
  }
  
  # Calculate total missing values in the specified column
  missing_values <- sum(is.na(column))
  total_values <- nrow(dataset)
  missing_percentage <- (missing_values / total_values) * 100
  
  # Print summary of missing data in the specified column
  print(paste("Total number of NA in", column_name, ":", missing_values))
  print(paste("Percentage of NA values in", column_name, ":", missing_percentage, "%"))
  
  # Find indices of missing data in the specified column
  missing_indices <- which(is.na(column))
  print(paste("Indices of missing values in", column_name, ":"))
  print(missing_indices)
  
  # Clean the specified column if delete_na is TRUE
  if (delete_na) {
    cleaned_dataset <- dataset[!is.na(column), ]
    cat(paste("Data cleaned in column", column_name, ". Removed rows with missing values.\n"))
    return(cleaned_dataset)
  } else {
    return(dataset)
  }
}
global_temp_data <- find_na_and_clean_column(global_temp_data$LandAvgTemp, delete_na = TRUE)

## [1] "Total number of NA in LandAvgTemp : 12"
## [1] "Percentage of NA values in LandAvgTemp : 0.37593984962406 %"
## [1] "Indices of missing values in LandAvgTemp :"
##  [1] 11 17 19 22 23 24 26 29 30 31 32 33
## Data cleaned in column LandAvgTemp . Removed rows with missing values.
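
The function above inspects one column at a time. For a whole-dataset overview, a per-column table of NA counts and percentages can be built in base R. This is a minimal sketch; the function name na_summary and the toy data frame are illustrative and not part of the original analysis.

```r
# Hypothetical helper: one row per column with the NA count and NA percentage
na_summary <- function(data) {
  na_count <- colSums(is.na(data))
  data.frame(
    column = names(na_count),
    na_count = as.integer(na_count),
    na_percent = round(100 * as.numeric(na_count) / nrow(data), 2),
    row.names = NULL
  )
}

# Toy data frame with known gaps
toy <- data.frame(a = c(1, NA, 3, 4), b = c(NA, NA, 3, 4), c = 1:4)
print(na_summary(toy))
```

Calling na_summary(global_temp_data) would give the same overview for the temperature data. If only the row removal is needed, tidyr::drop_na(global_temp_data, LandAvgTemp) is a shorter route to the same filtered data frame.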

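Step3.4: Redefine/Add categorical variables

Here we derive Year, Month, Season (months grouped into Northern Hemisphere meteorological seasons), and Century from the date, along with two difference columns labelled OceanAvgTemp and OceanAvgTempUncer.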

global_temp_data <- global_temp_data %>%
  mutate(
    Year = year(date), # Extract the year (date is already a Date after Step3.2)
    Month = month(date, label = TRUE, abbr = FALSE), # Full month names as an ordered factor
    Season = case_when( # Keyed on month number so the result does not depend on the locale
      month(date) %in% c(12, 1, 2) ~ "Winter",
      month(date) %in% 3:5 ~ "Spring",
      month(date) %in% 6:8 ~ "Summer",
      month(date) %in% 9:11 ~ "Autumn",
      TRUE ~ NA_character_ # Fallback; should never be reached
    ),
    Century = ceiling(Year / 100), # Century number, e.g. 1750 -> 18, 2015 -> 21
    # Rough proxy only: the gap between the combined land-and-ocean average and the land average.
    # The combined series averages over land and ocean, so this difference is NOT a true ocean-only temperature.
    OceanAvgTemp = LandOceanAvgTemp - LandAvgTemp,
    # Difference of the two uncertainties, kept for plotting; it can be negative and is not a propagated error
    OceanAvgTempUncer = LandOceanAvgTempUncer - LandAvgTempUncer
  ) %>%
  select(date, Century, Year, Month, Season, LandAvgTemp, OceanAvgTemp, everything()) # Reordering columns

global_temp_data$Century <- as.factor(global_temp_data$Century)
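
The OceanAvgTemp column above is the plain difference between the combined and the land series. If one assumes instead that the combined series is an area-weighted mean of land and ocean, an ocean-only value can be backed out by inverting that weighting. The sketch below does this under two stated assumptions that are not in the original analysis: the weighting model itself, and a land share of roughly 29% of Earth's surface. The function name estimate_ocean_temp is hypothetical.

```r
# Assumption: combined = f * land + (1 - f) * ocean, with f the land share of the surface
estimate_ocean_temp <- function(land_ocean, land, land_fraction = 0.29) {
  stopifnot(land_fraction > 0, land_fraction < 1)
  (land_ocean - land_fraction * land) / (1 - land_fraction)
}

# Toy check: build a combined value from known land (10) and ocean (20) temperatures,
# then confirm that the inversion recovers the ocean value up to floating-point rounding
combined <- 0.29 * 10 + 0.71 * 20
print(estimate_ocean_temp(combined, land = 10))
```

This is a sketch for comparison, not a replacement for the column used in the rest of the analysis; how Berkeley Earth actually combines land and ocean data should be checked against its documentation before relying on such an estimate.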

Step3.5: Save cleaned data (optional)

Save the cleaned data to CSV as a backup before starting the EDA.

file_name <- "./data/cleaned_global_temperature_data.csv"
write.csv(global_temp_data, file_name, row.names = FALSE)
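
A CSV file stores only text, so the Date and factor types set during cleaning are lost when the backup is read back with default settings. The sketch below shows a round trip on a small made-up data frame, using the colClasses argument of read.csv() to restore both types. The temporary file and toy values are illustrative only.

```r
# Toy data frame mirroring the column types of the cleaned data
toy <- data.frame(
  date = as.Date(c("2015-01-01", "2015-02-01")),
  Century = factor(c(21, 21)),
  LandAvgTemp = c(3.1, 3.9)
)

tmp <- tempfile(fileext = ".csv")
write.csv(toy, tmp, row.names = FALSE)

# Restore the column types explicitly; columns not named here are guessed as usual
reloaded <- read.csv(tmp, colClasses = c(date = "Date", Century = "factor"))
str(reloaded)
```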

Step 4: Exploratory Data Analysis (EDA)

Step4.1: Descriptive Statistics

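Select the columns to compare. Selecting by name rather than by position keeps the code correct if the column order changes.

global_temp_data %>%
  select(LandAvgTemp, LandAvgTempUncer, LandMaxTempUncer, LandMinTempUncer) %>%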
  summarise(across(everything(), list(
    Mean = ~mean(., na.rm = TRUE),
    Median = ~median(., na.rm = TRUE),
    Min = ~min(., na.rm = TRUE),
    Max = ~max(., na.rm = TRUE),
    StdDev = ~sd(., na.rm = TRUE)
  )))

##   LandAvgTemp_Mean LandAvgTemp_Median LandAvgTemp_Min LandAvgTemp_Max
## 1         8.374731             8.6105           -2.08          19.021
##   LandAvgTemp_StdDev LandAvgTempUncer_Mean LandAvgTempUncer_Median
## 1            4.38131             0.9384679                   0.392
##   LandAvgTempUncer_Min LandAvgTempUncer_Max LandAvgTempUncer_StdDev
## 1                0.034                 7.88                 1.09644
##   LandMaxTempUncer_Mean LandMaxTempUncer_Median LandMaxTempUncer_Min
## 1             0.4797816                   0.252                0.044
##   LandMaxTempUncer_Max LandMaxTempUncer_StdDev LandMinTempUncer_Mean
## 1                4.373                0.583203             0.4318489
##   LandMinTempUncer_Median LandMinTempUncer_Min LandMinTempUncer_Max
## 1                   0.279                0.045                3.498
##   LandMinTempUncer_StdDev
## 1               0.4458378
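
The summary above comes back as a single wide row, which is hard to read. One option is to reshape it so that each variable is a row and each statistic a column. The sketch below uses tidyr's pivot_longer() and pivot_wider() on a small made-up data frame with only two statistics; it relies on the statistic suffix being separated from the variable name by exactly one underscore, as in the output above.

```r
library(dplyr)
library(tidyr)

# Toy data standing in for the temperature columns
toy <- data.frame(LandAvgTemp = c(1, 2, 3), LandAvgTempUncer = c(0.1, 0.2, 0.3))

stats_long <- toy %>%
  summarise(across(everything(), list(
    Mean = ~mean(.x, na.rm = TRUE),
    StdDev = ~sd(.x, na.rm = TRUE)
  ))) %>%
  # Split names such as "LandAvgTemp_Mean" into variable and statistic
  pivot_longer(everything(), names_to = c("Variable", "Statistic"), names_sep = "_") %>%
  # One row per variable, one column per statistic
  pivot_wider(names_from = Statistic, values_from = value)

print(stats_long)
```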

Step4.2: Data Visualization

Boxplots

draw_boxplots <- function(data, variable_name) {
  # Ensure that the variable name is one of the columns in the dataset
  if (!variable_name %in% names(data)) {
    stop("Variable not found in the dataset.")
  }
  
  # Define categories for which boxplots will be made
  categories <- c("Century", "Month", "Season")
  
  # Create boxplots for each category
  for (category in categories) {
    # Create a formula for the boxplot function
    formula <- as.formula(paste(variable_name, "~", category))
    
    # Create the boxplot
    boxplot(formula, data = data,
            main = paste("Boxplot of", variable_name, "by", category),
            xlab = category,
            ylab = variable_name,
            col = rainbow(length(unique(data[[category]]))))
  }
}

# Adjust the categories as needed
draw_boxplots(global_temp_data, "LandAvgTemp")

draw_boxplots(global_temp_data, "OceanAvgTemp")

draw_boxplots(global_temp_data, "LandMaxTemp")

Histograms

split_data <- split(global_temp_data$LandAvgTemp, global_temp_data$Century)
for (century in unique(global_temp_data$Century)) {
  hist(split_data[[century]],
       main = paste("Histogram of Land Average Temperature for Century", century),
       xlab = "Land Average Temperature",
       ylab = "Frequency",
       col = "lightblue",  # Histogram bar color
       border = "black",   # Border color of bars
       breaks = 20)        # Number of bins (adjust as needed)
} # Change variables if needed (i.e. century to month)

Scatter plots

Time series plots

temperature_plot <- function(data, variable) {
  ggplot(data, aes(x = date)) +
    geom_line(aes(y = .data[[variable]], color = Season)) +
    geom_smooth(aes(y = .data[[variable]], color = Season), method = "auto") +
    theme_minimal() +
    labs(
      title = paste(variable, "Over Time with Uncertainty"),
      x = "Year",
      y = variable
    ) +
    scale_color_manual(
      values = c("Winter" = "blue", "Spring" = "green", "Summer" = "red", "Autumn" = "orange")
    ) +
    guides(color = guide_legend(title = "Season"))
}
temperature_plot(global_temp_data, "LandAvgTemp")

temperature_plot(global_temp_data, "OceanAvgTemp")

Alternate representation

alt_temp_plot <- function(data, variable) {
  plot1 <- ggplot(data, aes(x = date)) +
    geom_line(aes(y = .data[[variable]], color = Season)) +
    theme_minimal() +
    labs(
      title = paste("Temperature Over Time -", variable),
      x = "Year",
      y = variable # Label the axis with the plotted variable instead of a fixed string
    ) +
    scale_color_manual(
      values = c("Winter" = "blue", "Spring" = "green", "Summer" = "red", "Autumn" = "orange")
    ) +
    guides(color = guide_legend(title = "Season")) +
    theme(plot.title = element_text(hjust = 0.5))
  
  plot2 <- ggplot(data, aes(x = date)) +
    geom_line(aes(y = .data[[paste0(variable, "Uncer")]], color = Season)) +
    theme_minimal() +
    labs(
      title = paste("Temperature Uncertainty Over Time -", variable),
      x = "Year",
      y = paste0(variable, "Uncer") # Matches the uncertainty column actually plotted
    ) +
    scale_color_manual(
      values = c("Winter" = "blue", "Spring" = "green", "Summer" = "red", "Autumn" = "orange")
    ) +
    guides(color = guide_legend(title = "Season")) +
    theme(plot.title = element_text(hjust = 0.5))
  
  combined_plot <- plot1 / plot2
  return(combined_plot)
}
alt_temp_plot(global_temp_data, "LandAvgTemp")

alt_temp_plot(global_temp_data, "OceanAvgTemp")

Comparison between Land and Ocean Annual Avg Temperature

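# Keep only rows where both 'LandAvgTemp' and 'OceanAvgTemp' are present.
# Note: this overwrites global_temp_data, so every later step (including Step 5)
# sees only the rows with combined land-and-ocean values (1850 onward).
global_temp_data <- global_temp_data %>%
  filter(!is.na(LandAvgTemp) & !is.na(OceanAvgTemp))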

# Group by Year and summarize to get the average temperature of each
avg_temp_data <- global_temp_data %>%
  group_by(Year) %>%
  summarise(
    LandAvgTemp = mean(LandAvgTemp, na.rm = TRUE),
    LandAvgTempUncer = mean(LandAvgTempUncer, na.rm = TRUE),
    OceanAvgTemp = mean(OceanAvgTemp, na.rm = TRUE),
    OceanAvgTempUncer = mean(OceanAvgTempUncer, na.rm = TRUE)
  ) %>%
  ungroup()

# Create the plot (linewidth replaces the deprecated size argument for lines in ggplot2 >= 3.4.0)
ggplot(avg_temp_data, aes(x = Year)) +
  geom_ribbon(aes(ymin = LandAvgTemp - LandAvgTempUncer, ymax = LandAvgTemp + LandAvgTempUncer), fill = "orange", alpha = 0.2) +
  geom_ribbon(aes(ymin = OceanAvgTemp - OceanAvgTempUncer, ymax = OceanAvgTemp + OceanAvgTempUncer), fill = "blue", alpha = 0.2) +
  geom_line(aes(y = LandAvgTemp, color = "Land Average Temperature"), linewidth = 1) +
  geom_line(aes(y = OceanAvgTemp, color = "Ocean Average Temperature"), linewidth = 1) +
  geom_smooth(aes(y = LandAvgTemp, color = "Land Average Temperature"), method = "lm", se = FALSE, linewidth = 1) +
  geom_smooth(aes(y = OceanAvgTemp, color = "Ocean Average Temperature"), method = "lm", se = FALSE, linewidth = 1) +
  scale_color_manual(values = c("Land Average Temperature" = "orange", "Ocean Average Temperature" = "blue")) +
  labs(title = "Comparison between Ocean and Land Average Temperature (1850 - 2015)",
       x = "Year",
       y = "Temperature (°C)") +
  theme_minimal() +
  theme(legend.title = element_blank())

Step 5: Outlier Detection

Using statistical methods like Z-score and IQR to detect and handle outliers.

Step5.1: Z-score

global_temp_data <- global_temp_data %>%
  mutate(LandAvgTemp_ZScore = (LandAvgTemp - mean(LandAvgTemp, na.rm = TRUE)) / sd(LandAvgTemp, na.rm = TRUE))

outliers_zscore <- filter(global_temp_data, abs(LandAvgTemp_ZScore) > 2) # Identify outliers (using 2 as the threshold)
print(outliers_zscore)

##  [1] date                  Century               Year                 
##  [4] Month                 Season                LandAvgTemp          
##  [7] OceanAvgTemp          LandAvgTempUncer      LandMaxTemp          
## [10] LandMaxTempUncer      LandMinTemp           LandMinTempUncer     
## [13] LandOceanAvgTemp      LandOceanAvgTempUncer OceanAvgTempUncer    
## [16] LandAvgTemp_ZScore   
## <0 rows> (or 0-length row.names)
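
The Z-score above pools all months, so the standard deviation mostly reflects the seasonal cycle and an unusual month can hide inside it. One possible refinement, not used in the original analysis, is to standardise each value against other observations of the same calendar month. The sketch below does this with base R's ave(); the helper name zscore_by_group and the toy values are illustrative.

```r
# Hypothetical helper: Z-score of x computed separately within each group
zscore_by_group <- function(x, group) {
  ave(x, group, FUN = function(v) (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE))
}

# Toy data: two calendar months with very different typical temperatures
toy <- data.frame(
  Month = rep(c("January", "July"), each = 4),
  LandAvgTemp = c(2.0, 2.2, 1.8, 6.0, 14.0, 14.2, 13.8, 14.1)
)

toy$z_pooled <- as.numeric(scale(toy$LandAvgTemp))                   # one mean and sd for everything
toy$z_within_month <- zscore_by_group(toy$LandAvgTemp, toy$Month)    # mean and sd per month
print(toy)
```

On the real data the equivalent call would be zscore_by_group(global_temp_data$LandAvgTemp, global_temp_data$Month), followed by the same threshold filter as above.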

Step5.2: IQR Method

Calculate the IQR and flag values more than 1.5 × IQR outside the quartiles

Q1 <- quantile(global_temp_data$LandAvgTemp, 0.25, na.rm = TRUE)
Q3 <- quantile(global_temp_data$LandAvgTemp, 0.75, na.rm = TRUE)
iqr_value <- Q3 - Q1 # Named iqr_value so it does not mask stats::IQR()
outliers_iqr <- filter(global_temp_data, LandAvgTemp < (Q1 - 1.5 * iqr_value) | LandAvgTemp > (Q3 + 1.5 * iqr_value))
print(outliers_iqr)

##  [1] date                  Century               Year                 
##  [4] Month                 Season                LandAvgTemp          
##  [7] OceanAvgTemp          LandAvgTempUncer      LandMaxTemp          
## [10] LandMaxTempUncer      LandMinTemp           LandMinTempUncer     
## [13] LandOceanAvgTemp      LandOceanAvgTempUncer OceanAvgTempUncer    
## [16] LandAvgTemp_ZScore   
## <0 rows> (or 0-length row.names)
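
The IQR rule above can be wrapped in a small reusable function that also reports the fences, which makes it easier to see why no rows were flagged. This is a minimal sketch; the function name iqr_outliers and the toy vector are illustrative.

```r
# Hypothetical helper: fences at Q1 - k * IQR and Q3 + k * IQR, plus the values outside them
iqr_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]
  lower <- q[1] - k * spread
  upper <- q[2] + k * spread
  list(
    lower = lower,
    upper = upper,
    outliers = x[!is.na(x) & (x < lower | x > upper)]
  )
}

# Toy vector: quartiles are 2 and 4, so the fences are -1 and 7 and only 100 falls outside
res <- iqr_outliers(c(1, 2, 3, 4, 100))
print(res)
```

Calling iqr_outliers(global_temp_data$LandAvgTemp) would show the fences for the temperature data alongside any flagged values.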

Conclusion

The EDA conducted on the land average temperature data reveals several key insights into the temporal dynamics of climate patterns:

Seasonal pattern of temperature fluctuations

The time series plots with uncertainty (Figure 1) suggest a clear seasonal pattern in temperature fluctuations, with peaks and troughs corresponding to summer and winter seasons, respectively. This seasonal variation appears consistent over the years. Notably, there is a visible trend of increasing temperatures over time across all seasons, with the most pronounced rise observed in recent decades, indicating a potential long-term warming trend.

Increasing temperatures

The trend of increasing temperatures is further corroborated by the boxplot of average land temperature by century (Figure 2, top). Each successive century shows a median temperature higher than the previous, with the 21st century displaying the highest median temperature. Additionally, the spread of temperatures (interquartile range) in the 21st century is narrower than in the 19th century, suggesting less variability and a more consistent higher temperature regime. This comparison should be read with caution, because the 21st-century group covers only 2001 to 2015 and therefore rests on far fewer observations than the earlier centuries.

Seasonal temperature distribution

Seasonal distributions of temperatures are explored through the boxplot of average land temperature by month (Figure 2, bottom). This plot highlights the highest median temperatures occurring in the traditional summer months (June, July, August) and the lowest in winter months (December, January, February). The presence of outliers, particularly in the transitional months, suggests occasional temperature extremes outside the expected norms.

Overall

From these observations, we can conclude that there is a clear and consistent pattern of seasonal temperature variation, as well as a long-term trend of rising average land temperatures. The increasing median temperatures across centuries, especially the marked rise into the 21st century, align with broader concerns about global warming and climate change. These insights can inform further research into the causes of these trends and aid in the development of climate models and policy decisions.

Further statistical analysis for deeper insights

1. Trend Analysis:

Perform a more detailed statistical analysis to quantify the warming trend. This could involve fitting a linear regression model to the time series data to estimate the rate of temperature increase. Decompose the time series into trend, seasonal, and residual components to better understand the underlying patterns.
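
The two ideas above can be sketched in base R. To keep the example self-contained it runs on a simulated monthly series with a known trend of 0.01 °C per year; the simulated values, the start year, and all variable names are assumptions for illustration, not results from the temperature data.

```r
set.seed(42)

# Simulated monthly series: small linear trend + seasonal cycle + noise (hypothetical values)
n_years <- 50
month_index <- seq_len(n_years * 12)
true_trend_per_year <- 0.01
temp <- 8 + true_trend_per_year * (month_index / 12) +
  5 * sin(2 * pi * month_index / 12) + rnorm(length(month_index), sd = 0.3)
temp_ts <- ts(temp, start = c(1966, 1), frequency = 12)

# Linear trend: regress annual means on the year to remove the seasonal cycle first
annual_mean <- aggregate(temp_ts, nfrequency = 1, FUN = mean)
annual_df <- data.frame(year = as.numeric(time(annual_mean)),
                        temp = as.numeric(annual_mean))
trend_fit <- lm(temp ~ year, data = annual_df)
slope_per_decade <- unname(coef(trend_fit)["year"]) * 10
print(slope_per_decade)

# Decomposition into seasonal, trend, and remainder components with STL
decomp <- stl(temp_ts, s.window = "periodic")
print(head(decomp$time.series))
```

For the real data, temp_ts would be built from the LandAvgTemp column with ts(..., frequency = 12). stl() does not accept missing values, so any gaps would need to be handled first.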

2. Predictive Modeling:

Develop predictive models to forecast future temperature changes. Techniques could include ARIMA models, seasonal-trend decomposition using LOESS (STL), and machine learning methods like random forests or neural networks.
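
As a starting point for the ARIMA option, the sketch below fits a seasonal ARIMA model with base R's arima() and forecasts 12 months ahead. It uses nottem, a monthly temperature series that ships with R (Nottingham, 1920 to 1939, in degrees Fahrenheit), as a stand-in for the global data. The model orders are an illustrative choice, not a tuned specification.

```r
# nottem is a built-in monthly ts object from the datasets package
fit <- arima(nottem,
             order = c(1, 0, 0),                                # non-seasonal AR(1)
             seasonal = list(order = c(0, 1, 1), period = 12))  # seasonal differencing + seasonal MA(1)

# Point forecasts and standard errors for the next 12 months
forecast_12 <- predict(fit, n.ahead = 12)
print(forecast_12$pred)
print(forecast_12$se)
```

In practice the orders would be compared using AIC or residual diagnostics before any forecast is trusted.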

3. Correlation with External Factors:

Analyze the correlation between land temperatures and various potential drivers such as greenhouse gas concentrations, solar cycles, ocean temperatures, and land use changes. Use Granger causality tests to investigate whether changes in these factors precede changes in temperature, suggesting a potential causal relationship.

4. Climate Model Comparisons:

Compare the observed temperature data with projections from climate models. This can help validate the models and improve our understanding of their predictive capabilities.

5. Multivariate Time Series Analysis:

Include additional climate-related variables such as precipitation, humidity, and atmospheric pressure to perform a multivariate time series analysis. This can provide a more holistic understanding of climate dynamics.

Exploratory Data Analysis (EDA) of Global Temperature Data

T.K. Lam

2024-01-16