Background & Summary of the Dataset

As the global population continues to grow, so does the demand for energy. The dataset “World Energy Utilization” aims to provide a comprehensive and up-to-date assessment of energy consumption patterns around the world. Understanding current consumption trends, energy sources, and regional disparities is crucial for policymakers, energy providers, and researchers as they navigate the complexities of energy management, environmental sustainability, and economic development. The dataset offers a detailed analysis of energy consumption across countries and regions on a global scale. It compiles data from reputable sources such as international energy agencies, national governments, and energy research institutions, covering the most recent data available.

Total Primary Energy Consumption: The dataset provides comprehensive data on total primary energy consumption for each country, for each region, and globally. It covers energy derived from various sources, including fossil fuels (coal, oil, natural gas), renewable energy (solar, wind, hydro, geothermal, biomass), nuclear, and others.

Energy Consumption by Sector: It categorizes energy consumption by sector, such as the residential, commercial, industrial, transportation, and agricultural sectors. This breakdown identifies the major energy-consuming sectors and their impact on overall energy demand.

Fossil Fuel Consumption: The dataset highlights the consumption of fossil fuels (coal, oil, natural gas) in different regions, showing the dominance of these non-renewable sources and their implications for greenhouse gas emissions.

Renewable Energy Consumption: This section provides insights into the consumption of renewable energy sources (solar, wind, hydro, geothermal, biomass) globally and regionally. It enables comparisons between countries in terms of their adoption of clean and sustainable energy options.

Energy Intensity: The dataset may include an analysis of energy intensity, which refers to the amount of energy consumed per unit of economic output (e.g., GDP). Understanding energy intensity is essential for assessing the energy efficiency of economies.

Regional and Country Comparisons: Researchers can use the dataset to compare energy consumption patterns between different countries and regions, identifying variations in energy use and efficiency.

Trends Over Time: Historical data allows for the analysis of energy consumption trends over the years, revealing how energy demand has evolved and identifying potential factors driving the changes.

Energy Challenges and Opportunities: Supplementary analysis could examine the challenges faced by specific regions in meeting their energy needs sustainably and explore opportunities for transitioning towards cleaner and more efficient energy systems.

By offering a comprehensive understanding of global energy consumption, this dataset equips stakeholders with valuable information for formulating energy policies, setting targets for renewable energy integration, and promoting energy conservation. It also contributes to the ongoing efforts to address climate change and achieve a more sustainable and secure energy future for the world.
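The energy intensity measure described above is a simple ratio, and a small sketch may make the units concrete. The data frame below is a toy example: the column names `primary_energy_twh` and `energy_intensity` and all of the figures are assumptions made for illustration, not values or names taken from the dataset.

```r
# Toy figures for illustration only; not taken from the dataset.
energy <- data.frame(
  country = c("A", "B"),
  primary_energy_twh = c(1200, 300),  # hypothetical primary energy use in TWh
  gdp = c(2.0e12, 1.5e11)             # hypothetical GDP in currency units
)

# Energy intensity in kWh per unit of GDP (1 TWh = 1e9 kWh)
energy$energy_intensity <- energy$primary_energy_twh * 1e9 / energy$gdp

print(energy)
```

A lower value means less energy is used per unit of economic output, which is why the ratio is commonly read as a rough indicator of energy efficiency.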

Pre-Processing & Exploration of Dataset

Load the Dataset files:

setwd("C:/Users/A S Computer/Downloads")
df <- read.csv("owid-energy-data.csv")

Required Libraries

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.2.3
library(reshape2)
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine

Selecting the required columns from the dataset

df <- df[, c('country', 'gdp', 'population', 'iso_code',
             'year',
             'biofuel_electricity',
             'hydro_electricity',
             'nuclear_electricity',
             'solar_electricity',
             'wind_electricity',
             'other_renewable_electricity',
             'coal_electricity',
             'gas_electricity',
             'oil_electricity')]

Removing rows with NA values from the dataset

df <- na.omit(df)
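Because `na.omit()` drops every row that has a missing value in any of the selected columns, it can be worth checking how much data is lost. The following is a minimal sketch on a toy data frame; the name `toy` and all values are hypothetical, standing in for the real `df`.

```r
# Toy data frame standing in for the selected columns (values are made up)
toy <- data.frame(
  country = c("A", "A", "B", "B"),
  year = c(2000, 2001, 2000, 2001),
  gdp = c(1e11, NA, 5e10, 6e10),
  solar_electricity = c(0.1, 0.2, NA, 0.4)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# na.omit() keeps only the rows with no NA in any column
complete <- na.omit(toy)
rows_dropped <- nrow(toy) - nrow(complete)
print(rows_dropped)
```

Running the same two checks on the real data frame before and after the call would show which columns drive the row loss.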

Calculate total renewable electricity production per country and year

# Sum the selected electricity columns for each row (one row per country and year).
# Note: nuclear_electricity is included in this total, although nuclear power is
# usually classed as low-carbon rather than renewable.
df$total_renewable_electricity <- rowSums(df[, c('biofuel_electricity',
                                                'hydro_electricity',
                                                'nuclear_electricity',
                                                'solar_electricity',
                                                'wind_electricity',
                                                'other_renewable_electricity')])

Research Questions:

How has the GDP evolved over time for the selected countries (Egypt, Saudi Arabia, United Kingdom, France, Germany, United States, Japan, India) since 1990?

Possible Solution: Create a line plot showing the GDP trends over time for each country.

What is the population growth trajectory for the chosen countries from 1990 onwards?

Possible Solution: Generate a line plot representing the population changes over the years for the selected countries.

How does the GDP per capita vary across the chosen countries over the years?

Possible Solution: Create a line plot illustrating the trends in GDP per capita for each country from 1990 to the present.

Which of the selected countries has the highest renewable electricity consumption per capita, and how has it changed over time?

Possible Solution: Plot a line chart showing the changes in renewable electricity consumption per capita for each country since 1990 and identify the country with the highest value.

Is there any correlation between a country’s GDP and its renewable energy consumption per capita?

Possible Solution: Perform a correlation analysis between GDP and renewable energy consumption per capita for the selected countries and visualize the relationship using a scatter plot.

How do the trends in renewable electricity consumption per capita differ among the chosen countries?

Possible Solution: Create individual line plots for each country, displaying the changes in renewable electricity consumption per capita over the years.

Has there been any significant change in renewable electricity consumption per capita across the selected countries in recent years?

Possible Solution: Calculate the average renewable electricity consumption per capita for each country in specific time intervals (e.g., 5-year periods) and compare the trends using bar plots.
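Two of the proposed solutions, the correlation analysis and the 5-year averages, can be sketched in base R. The example below uses a toy data frame with one hypothetical country; the column names `renewable_per_capita` and `period` and all values are assumptions made for illustration.

```r
# Toy data: one hypothetical country, values invented for illustration
toy <- data.frame(
  country = "A",
  year = 2000:2009,
  gdp = seq(1.0e11, 1.9e11, length.out = 10),
  renewable_per_capita = c(100, 110, 105, 120, 130, 128, 140, 150, 149, 160)
)

# Pearson correlation between GDP and renewable electricity per capita
r <- cor(toy$gdp, toy$renewable_per_capita)
print(r)

# Label each year with the first year of its 5-year period (2000, 2005, ...)
toy$period <- 5 * (toy$year %/% 5)

# Mean per-capita value for each country and period
period_means <- aggregate(renewable_per_capita ~ country + period,
                          data = toy, FUN = mean)
print(period_means)
```

The resulting `period_means` table has one row per country and period, which is the shape a grouped bar plot would need.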

Characteristics of the Variables of Interest

Gross Domestic Product (GDP): The total economic output of a country, representing the value of all goods and services produced within its borders.

Population: The total number of people living in a country, representing the size of the country’s population.

GDP per Capita: The GDP per person, obtained by dividing the country’s GDP by its population. It provides an average economic output per individual.

Renewable Electricity Consumption per Capita: The amount of electricity generated from renewable energy sources (such as hydro, solar, wind) per person, indicating the renewable energy usage at the individual level.

These variables help to understand the economic and energy-related aspects of the selected countries. They allow researchers to analyze each country’s economic growth, energy consumption patterns, and sustainability efforts over time. Examined together, they give insight into how economic development and energy transitions are interconnected and how different countries’ policies influence their energy usage and economic performance.
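The two per-capita variables defined above are both simple divisions by population. The sketch below works on a toy data frame and assumes that the electricity columns are expressed in terawatt-hours, so the result is converted to kilowatt-hours per person. The name `renewable_kwh_per_capita` and all values are hypothetical.

```r
# Toy rows; values are invented. Electricity columns are assumed to be in TWh.
toy <- data.frame(
  country = c("A", "B"),
  population = c(50e6, 10e6),
  gdp = c(1.0e12, 4.0e11),
  hydro_electricity = c(20, 5),
  solar_electricity = c(3, 1),
  wind_electricity = c(7, 4)
)

renewable_cols <- c("hydro_electricity", "solar_electricity", "wind_electricity")

# GDP per person
toy$gdp_per_capita <- toy$gdp / toy$population

# Renewable electricity per person in kWh (1 TWh = 1e9 kWh)
toy$renewable_kwh_per_capita <-
  rowSums(toy[, renewable_cols]) * 1e9 / toy$population

print(toy[, c("country", "gdp_per_capita", "renewable_kwh_per_capita")])
```

Converting to kWh keeps the per-person figures in a readable range; dividing terawatt-hours directly by population would give very small numbers.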

Visualization

Exploratory Data Analysis (EDA) is a crucial step in the data analysis process. It involves exploring and summarizing the main characteristics, patterns, and relationships within the data. The goal of EDA is to gain insights, detect patterns, identify anomalies, and inform the choice of appropriate statistical methods for further analysis. Here’s a brief description of some common EDA techniques:

Data Summary: Obtain a general overview of the dataset, including the number of observations, the number of variables, data types, and basic summary statistics (e.g., mean, median, standard deviation).

Univariate Analysis: Analyze individual variables one at a time to understand their distributions, central tendency, spread, and potential outliers. Techniques include histograms, box plots, density plots, and descriptive statistics.

Bivariate Analysis: Explore the relationships between pairs of variables to understand potential correlations or associations. Techniques include scatter plots, correlation matrices, and cross-tabulations.

Multivariate Analysis: Investigate interactions between multiple variables simultaneously to identify complex relationships and patterns. Techniques include heatmaps, parallel coordinate plots, and 3D scatter plots.

Time Series Analysis: Analyze data over time to identify trends, seasonal patterns, and cycles. Techniques include line plots, seasonal decomposition, and autocorrelation plots.

Data Visualization: Use various plots and charts to visually represent data patterns and insights, making complex information more accessible and understandable.

These techniques provide a solid foundation for understanding the data and are essential for making informed decisions regarding subsequent modeling or statistical analyses. Exploratory data analysis helps researchers generate hypotheses, uncover patterns, and guide further investigations, leading to more insightful and meaningful data-driven conclusions.
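Several of the techniques listed above map onto one-line base R calls. The following sketch runs a data summary, a univariate summary, a bivariate correlation, and a simple yearly aggregation on a toy data frame. All values are invented for illustration and do not come from the dataset.

```r
# Toy data frame; values are made up for illustration
toy <- data.frame(
  year = rep(2000:2004, times = 2),
  country = rep(c("A", "B"), each = 5),
  gdp = c(100, 104, 109, 115, 120, 50, 51, 53, 56, 60),
  hydro_electricity = c(10, 11, 11, 12, 13, 2, 2, 3, 3, 4)
)

# Data summary: structure, types, and basic statistics
str(toy)
print(summary(toy[, c("gdp", "hydro_electricity")]))

# Univariate analysis: quartiles of a single variable
gdp_quartiles <- quantile(toy$gdp)
print(gdp_quartiles)

# Bivariate analysis: correlation matrix of two numeric variables
cor_matrix <- cor(toy[, c("gdp", "hydro_electricity")])
print(cor_matrix)

# Time series view: mean value per year across countries
yearly_mean <- aggregate(hydro_electricity ~ year, data = toy, FUN = mean)
print(yearly_mean)
```

Each step returns an ordinary R object, so the results can be passed on to plotting functions for the visual side of the analysis.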

Story Telling

Once upon a time, in the world of data analysis, there was a dataset called “DF” that contained valuable information about various countries’ economic and energy-related aspects. Our journey began with the task of understanding the dataset better through exploratory data analysis (EDA).

We first embarked on visualizing the renewable electricity consumption per capita, as it is a critical metric for understanding a country’s renewable energy utilization. A beautiful histogram unfolded before our eyes, displaying the distribution of renewable electricity consumption per capita across the world. It revealed that most countries had low to moderate renewable electricity consumption per person, with only a few outliers reaching high levels of renewable energy utilization.

Our curiosity then led us to focus on the top 10 countries with the highest GDP and explore their renewable electricity consumption trends. Using line plots, we unveiled the trajectory of renewable electricity consumption per capita over the years for these economic powerhouses. It was fascinating to observe how some countries exhibited steady growth in renewable energy adoption, while others experienced fluctuations or even declined over time.

As our journey continued, we delved into the relationships between GDP, population, and GDP per capita for the selected countries. Through line plots, we witnessed the economic evolution of these nations since 1990. We noticed how the GDP and population had steadily increased over the years, illustrating economic growth and population dynamics.

To grasp the impact of these developments on the individual level, we explored the GDP per capita trends. The line plots revealed varying patterns across the chosen countries, highlighting their differing economic conditions and standards of living.

Feeling inspired by the progress of the chosen countries, we decided to investigate the connection between GDP and renewable electricity consumption per capita. A scatter plot demonstrated a potential positive correlation between economic prosperity and renewable energy adoption. Countries with higher GDP per capita tended to have more significant renewable electricity consumption per person, indicating a promising path towards sustainability.

In our quest for a more comprehensive understanding, we compared the trends in renewable electricity consumption per capita among the chosen countries. Separate line plots for each nation allowed us to witness their unique energy stories. While some nations consistently embraced renewable energy, others experienced fluctuations or substantial improvements over the years.

Our final exploration focused on recent changes in renewable electricity consumption per capita. Utilizing bar plots, we observed how these countries’ renewable energy journeys evolved over specific time intervals. The plots highlighted significant shifts, indicating growing awareness and commitment to sustainable energy practices.

Create different charts to tell the above story

# Create a bar chart of total renewable electricity production per year
# (geom_col stacks the bars of all countries within each year)
yearly_total_plot <- ggplot(df, aes(x = year, y = total_renewable_electricity)) +
  geom_col(fill = "green") +
  labs(title = "Total Renewable Electricity Production Per Year",
       x = "Year",
       y = "Total Renewable Electricity (TWh)") +
  theme_minimal()

# Display the bar chart
print(yearly_total_plot)

# Check the column names in the dataframe 'df'
names(df)
##  [1] "country"                     "gdp"                        
##  [3] "population"                  "iso_code"                   
##  [5] "year"                        "biofuel_electricity"        
##  [7] "hydro_electricity"           "nuclear_electricity"        
##  [9] "solar_electricity"           "wind_electricity"           
## [11] "other_renewable_electricity" "coal_electricity"           
## [13] "gas_electricity"             "oil_electricity"            
## [15] "total_renewable_electricity"
# Select columns for renewable energy consumption
renewable_consumption <- c("biofuel_electricity", "hydro_electricity", "nuclear_electricity",
                           "solar_electricity", "wind_electricity", "other_renewable_electricity")

energy_df <- df[, c("year", "iso_code", renewable_consumption)]

# Keep the years 1990 to 2019
energy_df <- energy_df[energy_df$year %in% 1990:2019, ]

# Calculate the total renewable energy consumption per year
bar <- aggregate(energy_df[, renewable_consumption], by = list(year = energy_df$year), sum)

# Reshape to long format (year, variable, value) for the stacked bar chart
barplot_data <- melt(bar, id.vars = "year")

# Plot stacked bar chart
ggplot(barplot_data, aes(x = year, y = value, fill = variable)) +
  geom_bar(stat = "identity") +
  labs(title = "Global Renewable Consumption 1990 - 2019",
       x = "Year",
       y = "Terawatt-hours (TWh)") +
  theme_minimal()

# Filter on year >= 1990
df <- df %>%
  filter(year >= 1990)

# Filter on countries
Countries <- c('Egypt', 'Saudi Arabia', 'United Kingdom', 'France', 'Germany', 'United States', 'Japan', 'India')
df <- df %>%
  filter(country %in% Countries)

# Calculate GDP, population, and GDP per capita per country and year
DF_gdp <- df %>%
  group_by(year, country) %>%
  summarise(gdp = sum(gdp), .groups = "drop")

DF_pop <- df %>%
  group_by(year, country) %>%
  summarise(population = sum(population), .groups = "drop")

# Express population in millions (replaces the deprecated funs() call)
DF_pop_mil <- DF_pop %>%
  mutate(population = round(population / 1000000))

df$gdp_per_capita <- round(df$gdp / df$population)

DF_gdp_per_capita <- df %>%
  group_by(year, country) %>%
  summarise(gdp_per_capita = sum(gdp_per_capita), .groups = "drop")

ggplot(DF_gdp, aes(x = year, y = gdp, color = country)) +
  geom_line() +
  labs(title = "GDP Per Country Over Time",
       x = "Year",
       y = "GDP",
       color = "Country") +
  theme_minimal()

ggplot(DF_pop_mil, aes(x = year, y = population, color = country)) +
  geom_line() +
  labs(title = "Population Per Country Over Time",
       x = "Year",
       y = "Population (Millions)",
       color = "Country") +
  theme_minimal()

ggplot(DF_gdp_per_capita, aes(x = year, y = gdp_per_capita, color = country)) +
  geom_line() +
  labs(title = "GDP Per Capita Per Country Over Time",
       x = "Year",
       y = "GDP Per Capita",
       color = "Country") +
  theme_minimal()