Final Report

Quarto

library(ggplot2)
library(ggthemes)
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ lubridate 1.9.2     ✔ tibble    3.2.1
✔ purrr     1.0.2     ✔ tidyr     1.3.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(dplyr)
library(pkgdown)
library(esquisse)
library(readxl)

Fossil_fuel_consumption_subsidies_2010_2022 <- read_excel("/Users/sherri/downloads/Fossil fuel consumption subsidies, 2010-2022.xlsx")

library(dplyr)

# Vector with the names of the countries of interest
countries_of_interest <- c("Cambodia", "China","Fiji", "Guam", "Hong Kong SAR, China", "Indonesia", "Kiribati", "Korea", "Laos", "Malaysia", "Marshall Islands","Mongolia", "Myanmar", "NEW CALEDONIA","Philippines", "Samoa", "Singapore", "Solomon Islands", "Thailand","Tonga", "Taipei","Tuvalu", "Vanuatu", "Vietnam")

# Filter out the APAC countries
filtered_data <- Fossil_fuel_consumption_subsidies_2010_2022 %>%
  filter(Country %in% countries_of_interest)

# Summarize the subsidies for comparison
#subsidies_summary <- filtered_data %>%
#  group_by(Country) %>%
#  summarise(
#    OilSubsidies = sum(Oil, na.rm = TRUE),
#    ElectricitySubsidies = sum(Electricity, na.rm = TRUE),
#    GasSubsidies = sum(Gas, na.rm = TRUE),
#    CoalSubsidies = sum(Coal, na.rm = TRUE),
#    TotalSubsidies = sum(Total, na.rm = TRUE))

#I'm currently stuck here. I need to add several data cleaning steps before I can analyze the data.

#Objective: To prepare a tidy version of the "Fossil fuel consumption subsidies, 2010-2022" dataset for further analysis in the context of the Asia-Pacific climate finance landscape.

#Step 1: Data Appraisal and Preliminary Cleaning

#Review the structure of the "Fossil fuel consumption subsidies, 2010-2022" dataset to understand the variables and time span.
#Remove any irrelevant data that does not correspond to the Asia-Pacific region or is outside the study period.
#Standardize the country names for consistency across datasets.

#Step 2: Data Transformation

#Pivot the dataset from a wide format (with separate columns for each year's data) to a long format, where each row represents a single observation for a country-year combination.

#Step 3: Data Integration

#Prepare to merge the subsidies data with other relevant datasets, such as GDP figures and carbon market data.
#Identify key joining variables, which will likely include 'Country' and 'Year'.

#Step 4: Data Type Standardization

#Convert all subsidy amounts to a consistent numeric format.
#Ensure that country names are in a text format.
#Format the 'Year' column to a date or integer format, as appropriate for temporal analysis.

#Step 5: Data Validation

#Perform spot checks on the merged data to ensure accuracy against the original datasets.
#Address any inconsistencies or missing data through imputation or removal, as dictated by the research methodology.

#Step 6: Analytical Variable Creation

#Calculate any new variables needed for analysis, such as the proportion of subsidies to GDP.
#Create indicators or flags that may be necessary for segmentation or stratification in the analysis (e.g., income level of countries, membership in regional agreements).

#Step 7: Documentation and Metadata

#Document each variable, its source, and any transformations or calculations performed.
#Create a comprehensive codebook that details the tidy dataset structure and variable definitions.

#Step 8: Finalization and Quality Assurance

#Review the tidy dataset to ensure it meets the requirements for the intended analysis.
#Save the dataset in a format conducive to the analysis tools that will be used (e.g., CSV, XLSX, RData).

#Step 9: Data Visualization and Exploratory Analysis

#Generate preliminary visualizations to understand the distribution and trends within the data.
#Conduct exploratory analysis to identify any potential outliers or patterns of interest.