Introduction:
The concept of intelligence quotient (IQ) has long been a subject of
interest and study in the field of psychology and education. While IQ
tests are not without controversy, they remain a widely used tool for
measuring cognitive abilities and predicting academic and professional
success. The average IQ level of a nation is often used as an indicator
of the intellectual capacity and potential of its population.
This notebook investigates the potential relationships between national
IQ levels and a set of socio-economic and environmental factors: average
temperature, average income, and expenditure on education. These factors
have been suggested to influence cognitive development and intellectual
performance in populations around the world.
Climate and temperature have been hypothesized to impact cognitive
functioning, with some studies suggesting that extreme temperatures may
affect cognitive abilities. Additionally, economic factors such as
average income levels are known to play a crucial role in determining
access to resources and opportunities for education, which in turn can
influence cognitive development and intelligence levels.
Furthermore, investment in education is widely recognized as a key
factor in enhancing human capital and fostering intellectual growth. By
examining the relationship between national IQ levels and expenditure on
education, this notebook seeks to shed light on the importance of
educational resources and infrastructure in shaping the cognitive
abilities of a population.
Through a comprehensive analysis of existing data and statistical
methods, this notebook aims to provide valuable insights into the
complex interplay between national IQ levels, temperature, average
income, and education expenditure. By understanding these relationships,
policymakers and educators may be better equipped to implement targeted
interventions and policies aimed at promoting cognitive development and
enhancing intellectual capabilities on a national scale.

About Dataset
The IQ_level.csv file contains a comprehensive dataset that explores
the relationship between average IQ levels and various socioeconomic
factors across different countries.
Content
The column descriptions below come from the dataset's Data Card on Kaggle.
Rank: The rank of the country based on IQ level.
Country: The name of the country.
IQ: The average IQ score for the population.
Education expenditure: The amount of money spent on education in the
country, in US dollars.
Avg Income: The average income in the country, in US dollars.
Avg Temp: The average temperature in the country.
Loading libraries
First, I load the packages used in the analysis.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.0 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.1 ✔ tibble 3.1.8
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggthemes)
library(psych)
##
## Attaching package: 'psych'
##
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
library(ggsci)
library(factoextra)
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
Getting the data
iq<-read.csv('IQ_level.csv')
Exploration of the data
The structure of the data
str(iq)
## 'data.frame': 108 obs. of 6 variables:
## $ rank : int 1 2 3 4 5 6 7 8 9 10 ...
## $ country : chr "Hong Kong " "Japan" "Singapore" "Taiwan " ...
## $ IQ : int 106 106 106 106 104 103 101 101 100 100 ...
## $ education_expenditure: int 1283 1340 1428 NA 183 1024 2386 2725 2052 NA ...
## $ avg_income : int 35304 40964 41100 NA 4654 22805 45337 42706 40207 NA ...
## $ avg_temp : num 26.2 19.2 31.5 26.9 19.1 18.2 14.4 8.2 7.4 15.3 ...
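The str() output above shows that some country names carry trailing whitespace ("Hong Kong ", "Taiwan "). Any later join or lookup by name would miss those rows. The following is a minimal sketch on a toy vector that copies the four names printed above; in the notebook the same idea would presumably be applied as iq$country <- trimws(iq$country), which is an assumption about where the fix belongs rather than part of the original analysis.

```r
# Toy vector mirroring the padded names visible in the str() output.
country <- c("Hong Kong ", "Japan", "Singapore", "Taiwan ")

# trimws() removes leading and trailing whitespace from each string.
country_clean <- trimws(country)

# A padded name does not compare equal to its clean form until trimmed.
country == "Hong Kong"
country_clean == "Hong Kong"
```

Trimming before any join keeps name mismatches limited to genuine spelling differences.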
We drop the rank variable, which only restates the IQ ordering, and keep
education expenditure, average income, and average temperature as
numerical variables alongside the IQ and country variables.
iq<-select(iq,-rank)
The first fifteen rows of the data
head(iq,15)
## country IQ education_expenditure avg_income avg_temp
## 1 Hong Kong 106 1283 35304 26.2
## 2 Japan 106 1340 40964 19.2
## 3 Singapore 106 1428 41100 31.5
## 4 Taiwan 106 NA NA 26.9
## 5 China 104 183 4654 19.1
## 6 South Korea 103 1024 22805 18.2
## 7 Netherlands 101 2386 45337 14.4
## 8 Finland 101 2725 42706 8.2
## 9 Canada 100 2052 40207 7.4
## 10 North Korea 100 NA NA 15.3
## 11 Luxembourg 100 3665 71296 14.7
## 12 Macao 100 1448 44072 26.1
## 13 Germany 100 1883 39911 13.8
## 14 Switzerland 100 3550 70399 15.2
## 15 Estonia 100 749 13770 10.1
Exploratory data analysis
Let’s perform some exploratory data analysis (EDA) on the dataset.
We’ll cover the following aspects:
- Summary Statistics: General statistics for numerical
columns.
- Missing Values: Identify and quantify missing values in the
dataset.
- Distribution Analysis: Distribution of key numerical
columns (IQ, education expenditure, average income, average
temperature).
- Correlation Analysis: Correlation between different
numerical variables.
Let’s start with the missing values.
Checking for NAs
colSums(is.na(iq))
## country IQ education_expenditure
## 0 0 5
## avg_income avg_temp
## 2 0
Remove rows that contain any NA values. This drops 5 of the 108
countries, leaving 103.
iq<-na.omit(iq)
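Before dropping rows it can help to see which countries are affected. The sketch below uses four rows copied from the printed head of the data (only a subset of the columns) and lists the incomplete ones with complete.cases(); the data frame name iq_head is introduced here for illustration only.

```r
# Four rows copied from the head() output above, with a subset of columns.
iq_head <- data.frame(
  country = c("Japan", "Taiwan", "China", "North Korea"),
  IQ = c(106, 106, 104, 100),
  education_expenditure = c(1340, NA, 183, NA),
  avg_income = c(40964, NA, 4654, NA),
  stringsAsFactors = FALSE
)

# complete.cases() is TRUE for rows without any NA.
incomplete <- !complete.cases(iq_head)

# Countries that na.omit() would remove.
iq_head$country[incomplete]

# Number of rows that survive na.omit().
nrow(na.omit(iq_head))
```

Listing the dropped rows makes it clear that later plots and statistics no longer include those countries.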
Basic descriptive statistics for the data
describe(iq)
## vars n mean sd median trimmed mad min
## country* 1 103 52.00 29.88 52.0 52.00 38.55 1.0
## IQ 2 103 86.12 12.62 88.0 87.24 13.34 51.0
## education_expenditure 3 103 903.06 1166.63 336.0 680.82 461.09 1.0
## avg_income 4 103 17525.05 21067.80 7586.0 14112.47 9854.84 316.0
## avg_temp 5 103 23.79 8.47 25.6 24.38 9.64 0.4
## max range skew kurtosis se
## country* 103.0 102.0 0.00 -1.24 2.94
## IQ 106.0 55.0 -0.70 0.02 1.24
## education_expenditure 5436.0 5435.0 1.64 2.26 114.95
## avg_income 108349.0 108033.0 1.54 2.45 2075.87
## avg_temp 36.5 36.1 -0.51 -0.86 0.83
A skewness of -0.70 for IQ indicates that the distribution is moderately
skewed to the left (negatively skewed): the left tail (lower values) is
longer than the right tail, so extreme low values are more common than
extreme high ones. The kurtosis reported by describe() is excess
kurtosis, so the value of 0.02 means the tails are about as heavy as
those of a normal distribution.
Education expenditure and average income are strongly positively skewed
(1.64 and 1.54), with a long right tail and means (903 and 17525) far
above the medians (336 and 7586). Their positive kurtosis (2.26 and
2.45) indicates heavier tails than a normal distribution, that is, more
frequent extreme values.
Average temperature is moderately negatively skewed (-0.51), with a
somewhat longer tail on the left. Its negative kurtosis (-0.86)
indicates thinner tails than a normal distribution, that is, fewer
extreme values or outliers.
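To make the two statistics concrete, here is a minimal sketch of moment-based skewness and excess kurtosis in base R. The helper names moment_skewness and excess_kurtosis are hypothetical, and describe() may apply a slightly different small-sample variant, so its values need not match these exactly. The income vector holds the thirteen complete avg_income values from the printed head of the data, so the result describes those rows only, not the full dataset.

```r
# Skewness as the third central moment over the 1.5 power of the second.
moment_skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Excess kurtosis: fourth central moment over squared variance, minus 3,
# so that a normal distribution scores 0.
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# avg_income for the complete rows among the first fifteen printed above.
income <- c(35304, 40964, 41100, 4654, 22805, 45337, 42706,
            40207, 71296, 44072, 39911, 70399, 13770)

moment_skewness(income)
excess_kurtosis(income)
```

A positive skewness means the right tail dominates; a positive excess kurtosis means heavier tails than the normal distribution.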
Distributions of the variables
Let’s start with the IQ.
ggplot(iq, aes(x =IQ)) +
geom_density(fill="#3B9C9C",col="white")+
theme_solarized()+
scale_fill_brewer(palette="Set2")+
ggtitle("Distribution of IQ")+
theme(plot.title = element_text(hjust = 0.5))+
xlab("IQ")
The distribution is roughly bell-shaped but left-skewed, with a mean of
approximately 86 and a median of 88.
Education Expenditure
ggplot(iq, aes(x =education_expenditure)) +
geom_density(fill="#E6AB02",col="white")+
theme_solarized()+
scale_fill_brewer(palette="Set2")+
ggtitle("Distribution of Education Expenditure")+
theme(plot.title = element_text(hjust = 0.5))+
xlab("Education Expenditure")

Highly right-skewed, indicating a few countries spend significantly
more on education per capita compared to others.
Average Income
ggplot(iq, aes(x =avg_income)) +
geom_density(fill="#728FCE",col="white")+
theme_solarized()+
scale_fill_brewer(palette="Set2")+
ggtitle("Distribution of Average Income")+
theme(plot.title = element_text(hjust = 0.5))+
xlab("Average Income")

Also right-skewed, showing that a few countries have much higher
average incomes.
Average Temperature
ggplot(iq, aes(x =avg_temp)) +
geom_density(fill="#5E5A80",col="white")+
theme_solarized()+
scale_fill_brewer(palette="Set2")+
ggtitle("Distribution of Average Temperature")+
theme(plot.title = element_text(hjust = 0.5))+
xlab("Average Temperature")

Moderately left-skewed: most countries sit at the warm end (median
25.6°C), with a thinner tail of colder countries.
The density plots agree with the descriptive statistics above.
Correlation Analysis
enhanced scatter plot matrix
##plot the correlation matrix
pairs.panels(iq[,-1])

These correlations suggest that the IQ variable has a consistent
positive relationship with education expenditure and average income and
a consistent negative relationship with the average temperature
variable.
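The coefficients in the scatter plot matrix can also be computed directly with cor(). Because expenditure and income are strongly right-skewed, a rank-based (Spearman) coefficient is a useful cross-check on the Pearson one. The sketch below is an illustration on the thirteen complete rows from the printed head of the data, stored in a data frame called iq_head for this example. These are all high-IQ countries, so the printed numbers will not reproduce the full-data correlations.

```r
# Complete rows among the first fifteen printed earlier (illustration only).
iq_head <- data.frame(
  IQ = c(106, 106, 106, 104, 103, 101, 101, 100, 100, 100, 100, 100, 100),
  education_expenditure = c(1283, 1340, 1428, 183, 1024, 2386, 2725,
                            2052, 3665, 1448, 1883, 3550, 749),
  avg_income = c(35304, 40964, 41100, 4654, 22805, 45337, 42706,
                 40207, 71296, 44072, 39911, 70399, 13770),
  avg_temp = c(26.2, 19.2, 31.5, 19.1, 18.2, 14.4, 8.2,
               7.4, 14.7, 26.1, 13.8, 15.2, 10.1)
)

# Pearson measures linear association on the raw values.
pearson <- cor(iq_head, method = "pearson")

# Spearman works on ranks, so it is less sensitive to skew and outliers.
spearman <- cor(iq_head, method = "spearman")

round(pearson, 2)
round(spearman, 2)
```

Where the two matrices disagree noticeably, the relationship is likely monotone but not linear, or driven by a few extreme countries.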
Mapping plots
Understanding the geographical distribution of the variables is
crucial. To visualize this distribution, we employed a mapping plot that
highlights regional variations and patterns.
Load the ggplot2 world map data and left-join it to the IQ data, after
renaming the country column to region so that the two tables share a
key.
mapdata<-map_data("world")
colnames(iq)[1]<-"region"
maodata<-left_join(mapdata,iq,by="region")
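A left join by name silently leaves a country blank on the map whenever its spelling differs between the two tables. A quick check is to list the names that have no match. The sketch below uses two small made-up vectors: iq_names stands in for iq$region and map_regions stands in for unique(mapdata$region). Both are hypothetical examples, not the actual contents of either table.

```r
# Hypothetical stand-in for iq$region.
iq_names <- c("Japan", "Hong Kong", "United States", "Finland")

# Hypothetical stand-in for unique(mapdata$region).
map_regions <- c("Japan", "China", "USA", "Finland")

# Names present in the IQ data but absent from the map regions.
unmatched <- setdiff(iq_names, map_regions)
unmatched
```

Any name returned here would need to be recoded to the map's spelling before joining, otherwise it is drawn with the na.value colour.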
Geographic Distribution of IQ Levels
##IQ
ggplot(maodata, aes(x = long, y = lat, group = group)) +
geom_polygon( aes(fill=IQ))+
xlab("")+
ylab("")+
ggtitle("Countries by IQ - Average IQ by Country")+
theme_solarized()+
theme(legend.position = c(0.1,0.4))+
theme(axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank())+
scale_fill_gradient(low="#A3E355",high="#4893E5",na.value = "white")+
geom_curve(
aes(
x =144,
xend =140,
y =36,
yend =44),
curvature=0.6,
arrow = arrow(length=unit(.35, 'cm')),
col="#808080",size=0.8)+
geom_curve(
aes(
x =long[which.min(IQ)],
xend =long[which.min(IQ)],
y =lat[which.min(IQ)]-23,
yend =lat[which.min(IQ)]),
curvature=0.6,
arrow = arrow(length=unit(.35, 'cm')),
col="#808080",size=0.8)+
geom_text(
aes(x = 139,
y = 27),
label = "The nation of Japan\n has one of the highest IQs.",
size = 2.9, colour = "#df9100")+
geom_text(
aes(x = long[which.min(IQ)]-10,
y = lat[which.min(IQ)]-27),
label = "Nepal is among the lowest-IQ countries, although it is almost\n the only country in its region with an IQ that low.",
size = 2, colour = "#df9100")
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.

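High IQ Concentrations: In the raw data, Hong Kong, Japan, Singapore,
and Taiwan share the highest average IQ (106), and all are in East Asia.
Taiwan is absent from the plotted data because its education expenditure
and income are missing, so its row was removed by na.omit().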
Education Expenditure
##education expenditure
ggplot(maodata, aes(x = long, y = lat, group = group)) +
geom_polygon( aes(fill=education_expenditure))+
xlab("")+
ylab("")+
ggtitle("Countries by education expenditure - Education expenditure by Country")+
theme_solarized()+
theme(legend.position = c(0.1,0.4))+
theme(axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank())+
scale_fill_gradient(low="#A3E355",high="#4893E5",na.value = "white")+
geom_curve(
aes(
x =long[which.max(education_expenditure)]-17,
xend =long[which.max(education_expenditure)],
y =lat[which.max(education_expenditure)]-12,
yend =lat[which.max(education_expenditure)]),
curvature=-0.6,
arrow = arrow(length=unit(.35, 'cm')),
col="#808080",size=0.8)+
geom_curve(
aes(
x =long[which.min(education_expenditure)],
xend =long[which.min(education_expenditure)],
y =lat[which.min(education_expenditure)]-18,
yend =lat[which.min(education_expenditure)]),
curvature=0.6,
arrow = arrow(length=unit(.35, 'cm')),
col="#808080",size=0.8)+
geom_text(
aes(x = long[which.max(education_expenditure)]-22,
y = lat[which.max(education_expenditure)]-17),
label = "Country with the highest\n education expenditure.",
size = 2.9, colour = "#df9100")+
geom_text(
aes(x = long[which.min(education_expenditure)],
y = lat[which.min(education_expenditure)]-20),
label = "Country with the lowest\n education expenditure.",
size = 1.9, colour = "#df9100")

Japan and Singapore spend more on education than most countries in the
data (1340 and 1428 against a median of 336), in line with their high IQ
rankings. China, by contrast, combines a high average IQ (104) with low
education expenditure (183), which suggests that spending alone does not
account for high cognitive performance.
Average Income
##avg income
ggplot(maodata, aes(x = long, y = lat, group = group)) +
geom_polygon( aes(fill=avg_income))+
xlab("")+
ylab("")+
ggtitle("Countries by income - Average income by Country")+
theme_solarized()+
theme(legend.position = c(0.1,0.4))+
theme(axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank())+
scale_fill_gradient(low="#A3E355",high="#4893E5",na.value = "white")+
geom_curve(
aes(
x =long[which.max(avg_income)]-17,
xend =long[which.max(avg_income)],
y =lat[which.max(avg_income)]-12,
yend =lat[which.max(avg_income)]),
curvature=-0.6,
arrow = arrow(length=unit(.35, 'cm')),
col="#808080",size=0.8)+
geom_curve(
aes(
x =long[which.min(avg_income)],
xend =long[which.min(avg_income)],
y =lat[which.min(avg_income)]-18,
yend =lat[which.min(avg_income)]),
curvature=0.6,
arrow = arrow(length=unit(.35, 'cm')),
col="#808080",size=0.8)+
geom_text(
aes(x = long[which.max(avg_income)]-22,
y = lat[which.max(avg_income)]-17),
label = "Country with the highest\n average income.",
size = 2.9, colour = "#df9100")+
geom_text(
aes(x = long[which.min(avg_income)],
y = lat[which.min(avg_income)]-20),
label = "Country with the lowest\n average income.",
size = 1.9, colour = "#df9100")

Mapping average income shows that wealthier countries tend to have
higher IQ levels. Countries with a higher average income often have
better access to quality education and resources, which may contribute
to higher IQs.
Average Temperature
##avg_temp
ggplot(maodata, aes(x = long, y = lat, group = group)) +
geom_polygon( aes(fill=avg_temp))+
xlab("")+
ylab("")+
ggtitle("Countries by temperature - Average temp by Country")+
theme_solarized()+
theme(legend.position = c(0.1,0.4))+
theme(axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank())+
scale_fill_gradient(low="#A3E355",high="#4893E5",na.value = "white")+
geom_curve(
aes(
x =long[which.max(avg_temp)]-17,
xend =long[which.max(avg_temp)],
y =lat[which.max(avg_temp)]-32,
yend =lat[which.max(avg_temp)]),
curvature=-0.6,
arrow = arrow(length=unit(.35, 'cm')),
col="#808080",size=0.8)+
geom_curve(
aes(
x =long[which.min(avg_temp)],
xend =long[which.min(avg_temp)],
y =lat[which.min(avg_temp)]-10,
yend =lat[which.min(avg_temp)]),
curvature=0.6,
arrow = arrow(length=unit(.35, 'cm')),
col="#808080",size=0.8)+
geom_text(
aes(x = long[which.max(avg_temp)]-22,
y = lat[which.max(avg_temp)]-37),
label = "Country with the highest\n average temperature.",
size = 2.9, colour = "#df9100")+
geom_text(
aes(x = long[which.min(avg_temp)],
y = lat[which.min(avg_temp)]-12),
label = "Country with the lowest\n average temperature.",
size = 1.9, colour = "#df9100")

Countries with high IQs (106) such as Japan (19.2°C), Singapore
(31.5°C), and Hong Kong (26.2°C) show that high cognitive performance
can be achieved in both temperate and tropical climates. The lowest IQ
levels, however, all fall in the warmer countries, so it is worth
exploring whether temperature has a subtle effect on cognitive
performance, possibly mitigated by socio-economic factors.
Hong Kong, Japan, and Singapore combine high IQ levels with above-median
education expenditure (Taiwan's expenditure is missing from the data).
This region might serve as a model for understanding how educational
policies and economic development contribute to cognitive performance.
China presents an interesting case with a high IQ (104) but
significantly lower average income and education expenditure compared to
Hong Kong or Japan. This might indicate other factors at play, such as
cultural emphasis on education or innate cognitive skills.
Two-dimensional analysis
Let’s create the scatter plots to analyze the relationships between
IQ and other variables (education expenditure, average income, average
temperature). This will help us provide insights based on the actual
data.
ggplot(iq,aes(y=IQ,x=education_expenditure))+
geom_point(col="#488AC7")+
##geom_text_repel(aes(label = region), size = 1.5, colour = "gray30", max.overlaps = 50, fontface = "bold")+
theme_solarized()+
ggtitle("Relationship between IQ and education expenditure")+
xlab("education expenditure")+
ylab("IQ")+
annotate("rect", xmin = 0.00, xmax = 1250, ymin = 50, ymax = 70, alpha = 0.2, colour = "#005d89", fill = "#005d89")+
annotate("text", x = 580, y = 68, label = "Low IQ, Low education expenditure", size = 3, colour = "#005d89", fontface = "bold")+
annotate("rect", xmin = 3200, xmax = 5436, ymin = 90, ymax = 106, alpha = 0.2, colour = "#df9100", fill = "#df9100")+
annotate("text", x = 4400, y = 105, label = "High IQ, High education expenditure", size = 3.3, colour = "#df9100", fontface = "bold")+
annotate("rect", xmin = 3200, xmax = 5436, ymin = 50, ymax = 70, alpha = 0.2, colour = "#B83C08", fill = "#B83C08")+
annotate("text", x = 4400, y = 64, label = "No nation has a high average\n spending on education\n in a region with a low IQ.", size = 3.3, colour = "#B83C08", fontface = "bold")+
theme(plot.title = element_text(hjust = 0.5))

The scatter plot shows a positive association between education
expenditure and average IQ. This is consistent with education spending
supporting cognitive development, although the plot alone cannot
separate it from the effect of income. Moreover, countries that combine
high IQ with low education expenditure point to factors other than
direct financial investment, such as a cultural emphasis on education or
high-quality education.
ggplot(iq,aes(y=IQ,x=avg_income))+
geom_point(col="#488AC7")+
theme_solarized()+
ggtitle("Relationship between IQ and Average income")+
ylab("IQ")+
xlab("Average income")+
annotate("rect", xmin = 0.00, xmax = 30000, ymin = 50, ymax = 70, alpha = 0.2, colour = "#005d89", fill = "#005d89")+
annotate("text", x = 13900, y = 69, label = "Low IQ, Low income", size = 4.3, colour = "#005d89", fontface = "bold")+
annotate("rect", xmin = 50000, xmax = 108349, ymin = 90, ymax = 106, alpha = 0.2, colour = "#df9100", fill = "#df9100")+
annotate("text", x = 77000, y = 105, label = "High IQ, High income", size = 4.3, colour = "#df9100", fontface = "bold")+
annotate("rect", xmin = 50000, xmax = 108349, ymin = 50, ymax = 70, alpha = 0.2, colour = "#B83C08", fill = "#B83C08")+
annotate("text", x = 77000, y = 66, label = "No nation has a high average income\n in a region with a low IQ.", size = 4.3, colour = "#B83C08", fontface = "bold")+
theme(plot.title = element_text(hjust = 0.5))

There is a positive association between average income and IQ: wealthier
countries tend to have higher IQ levels. This might be due to better
access to educational resources, healthcare, and overall living
conditions. Also, countries with high IQ but relatively lower average
income might have other strong cultural or educational practices
contributing to high cognitive performance, suggesting that income is
not the sole determinant of IQ.
ggplot(iq,aes(y=IQ,x=avg_temp))+
geom_point(col="#488AC7")+
theme_solarized()+
ggtitle("Relationship between IQ and avg temp")+
ylab("IQ")+
xlab("avg temp")+
annotate("rect", xmin = 23, xmax = 36.5, ymin = 50, ymax = 70, alpha = 0.2, colour = "#005d89", fill = "#005d89")+
annotate("text", x = 30, y = 69, label = "Low IQ, High temp", size = 4.3, colour = "#005d89", fontface = "bold")+
annotate("rect", xmin = 0, xmax = 10, ymin = 90, ymax = 106, alpha = 0.2, colour = "#df9100", fill = "#df9100")+
annotate("text", x = 5, y = 105, label = "High IQ, Low temp", size = 4.3, colour = "#df9100", fontface = "bold")+
annotate("rect", xmin = 0, xmax = 15, ymin = 50, ymax = 70, alpha = 0.2, colour = "#B83C08", fill = "#B83C08")+
annotate("text", x = 7, y = 64, label = "No country with low IQ\n in a low-temperature area", size = 4.3, colour = "#B83C08", fontface = "bold")+
theme(plot.title = element_text(hjust = 0.5))

This scatter plot shows a negative association between average
temperature and IQ: the low-IQ countries are all in the warmer part of
the range, and no low-IQ country appears below about 15°C. The plot by
itself does not establish statistical significance. High temperatures
could plausibly be linked to lower IQ levels through environmental
stressors, while high IQ levels in hot countries could indicate
effective adaptation, such as advanced infrastructure and healthcare
systems, that mitigates the negative effects of climate.
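Whether an association of this kind is statistically distinguishable from zero can be checked with cor.test() from base R. The sketch below is illustrative only: it uses the thirteen complete rows from the printed head of the data, which are all high-IQ countries, so its estimate and p-value say nothing about the full set of 103 countries. In the notebook the call would presumably be cor.test(iq$IQ, iq$avg_temp).

```r
# IQ and average temperature for the complete rows among the first fifteen.
iq_score <- c(106, 106, 106, 104, 103, 101, 101, 100, 100, 100, 100, 100, 100)
temp <- c(26.2, 19.2, 31.5, 19.1, 18.2, 14.4, 8.2,
          7.4, 14.7, 26.1, 13.8, 15.2, 10.1)

# Pearson correlation with a test of the null hypothesis rho = 0.
res <- cor.test(iq_score, temp, method = "pearson")

res$estimate   # sample correlation coefficient
res$p.value    # p-value for the two-sided test
res$conf.int   # 95% confidence interval for the correlation
```

Even a significant correlation at the country level describes an association, not a causal effect of temperature on IQ.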
Cluster Analysis
To perform cluster analysis on this dataset, we’ll use the k-means
clustering algorithm. This analysis will help identify groups of
countries with similar characteristics in terms of IQ, education
expenditure, average income, and average temperature.
Steps for Cluster Analysis:
- Data Preparation:
- Normalize the data to ensure that each feature contributes equally
to the clustering process.
- Cluster Analysis:
- Use the k-means algorithm to identify clusters.
- Determine the optimal number of clusters using the elbow
method.
- Visualize Clusters:
- Visualize the clusters to understand the grouping of countries.
row_names_data <- iq$region
iq_data <- iq[,-1]
rownames(iq_data) <- row_names_data
## Scale data
scale_data <- scale(iq_data)
scale_data<-as.data.frame(scale_data)
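The effect of scale() can be verified directly: each column should end up with mean 0 and standard deviation 1, so that income (in the tens of thousands) does not dominate temperature (in the tens) when distances are computed. The sketch below checks this on the thirteen complete rows from the printed head of the data; iq_head is a name used only for this example.

```r
# Complete rows among the first fifteen printed earlier (illustration only).
iq_head <- data.frame(
  IQ = c(106, 106, 106, 104, 103, 101, 101, 100, 100, 100, 100, 100, 100),
  education_expenditure = c(1283, 1340, 1428, 183, 1024, 2386, 2725,
                            2052, 3665, 1448, 1883, 3550, 749),
  avg_income = c(35304, 40964, 41100, 4654, 22805, 45337, 42706,
                 40207, 71296, 44072, 39911, 70399, 13770),
  avg_temp = c(26.2, 19.2, 31.5, 19.1, 18.2, 14.4, 8.2,
               7.4, 14.7, 26.1, 13.8, 15.2, 10.1)
)

# scale() centres each column on its mean and divides by its sd.
scaled <- as.data.frame(scale(iq_head))

# After scaling, every column has mean 0 and sd 1 (up to rounding).
col_means <- colMeans(scaled)
col_sds <- sapply(scaled, sd)
round(col_means, 10)
round(col_sds, 10)
```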
Determine the optimal number of clusters using the elbow method
## Determine the optimal number of clusters using the elbow method
fviz_nbclust(scale_data,kmeans,method = "wss")+
labs(subtitle = "Elbow method")
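The elbow method plots the total within-cluster sum of squares (WSS) against the number of clusters k and looks for the point where adding clusters stops paying off. As a sketch of what such a plot summarises, the loop below computes WSS by hand with kmeans() for k from 1 to 5. It runs on the thirteen complete rows from the printed head of the data, so the values illustrate the mechanics only; the seed and nstart values are arbitrary choices for this example.

```r
# Complete rows among the first fifteen printed earlier (illustration only).
iq_head <- data.frame(
  IQ = c(106, 106, 106, 104, 103, 101, 101, 100, 100, 100, 100, 100, 100),
  education_expenditure = c(1283, 1340, 1428, 183, 1024, 2386, 2725,
                            2052, 3665, 1448, 1883, 3550, 749),
  avg_income = c(35304, 40964, 41100, 4654, 22805, 45337, 42706,
                 40207, 71296, 44072, 39911, 70399, 13770),
  avg_temp = c(26.2, 19.2, 31.5, 19.1, 18.2, 14.4, 8.2,
               7.4, 14.7, 26.1, 13.8, 15.2, 10.1)
)
scaled <- scale(iq_head)

set.seed(42)  # arbitrary seed so the random starts are repeatable
ks <- 1:5

# Total within-cluster sum of squares for each candidate k.
wss <- sapply(ks, function(k) {
  kmeans(scaled, centers = k, nstart = 25)$tot.withinss
})

data.frame(k = ks, wss = round(wss, 2))
```

With k = 1 the WSS equals the total sum of squares of the scaled data; the "elbow" is the k after which the drop in WSS flattens out.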

We will apply K-means clustering with 3 clusters and visualize the
cluster in a scatter plot.
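## K-MEANS CLUSTERING
## Fix the seed (value is arbitrary) and use several random starts
## so that the cluster assignment is reproducible between runs
set.seed(123)
fitK <- kmeans(scale_data, 3, nstart = 25)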
iq_cluster<-iq
iq_cluster$cluster<-as.factor(fitK$cluster)
ggplot(iq_cluster,aes(y=IQ,x=education_expenditure,col=cluster))+
geom_point()+
##geom_text_repel(aes(label = region), size = 1.5, colour = "gray30", max.overlaps = 50, fontface = "bold")+
theme_solarized()+
ggtitle("Relationship between IQ and education expenditure")+
xlab("education expenditure")+
ylab("IQ")+
theme(plot.title = element_text(hjust = 0.5))

ggplot(iq_cluster,aes(y=IQ,x=avg_income,col=cluster))+
geom_point()+
##geom_text_repel(aes(label = region), size = 1.5, colour = "gray30", max.overlaps = 50, fontface = "bold")+
theme_solarized()+
ggtitle("Relationship between IQ and average income")+
xlab("average income")+
ylab("IQ")+
theme(plot.title = element_text(hjust = 0.5))

ggplot(iq_cluster,aes(y=IQ,x=avg_temp,col=cluster))+
geom_point()+
##geom_text_repel(aes(label = region), size = 1.5, colour = "gray30", max.overlaps = 50, fontface = "bold")+
theme_solarized()+
ggtitle("Relationship between IQ and average temperature")+
xlab("average temperature")+
ylab("IQ")+
theme(plot.title = element_text(hjust = 0.5))
The clustering suggests three groups. One cluster contains countries
with high education expenditure, high average income, high IQ levels,
and low temperatures.
Another cluster contains countries with high IQ but lower education
expenditure, suggesting that other factors (like cultural or informal
education systems) contribute to cognitive performance.
The third cluster combines low income, low education expenditure, and
extreme temperatures. The data cannot say why, but possible explanations
include difficulty adapting to the environment or political instability.
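Cluster descriptions like these are easier to check against a table of per-cluster means on the original (unscaled) variables. The sketch below shows one way to build such a profile with aggregate(). It runs on the thirteen complete rows from the printed head of the data, so the clusters it finds are not the ones discussed above; the seed and nstart values are arbitrary. In the notebook the equivalent call would presumably be aggregate(. ~ cluster, data = iq_cluster[, -1], FUN = mean).

```r
# Complete rows among the first fifteen printed earlier (illustration only).
iq_head <- data.frame(
  IQ = c(106, 106, 106, 104, 103, 101, 101, 100, 100, 100, 100, 100, 100),
  education_expenditure = c(1283, 1340, 1428, 183, 1024, 2386, 2725,
                            2052, 3665, 1448, 1883, 3550, 749),
  avg_income = c(35304, 40964, 41100, 4654, 22805, 45337, 42706,
                 40207, 71296, 44072, 39911, 70399, 13770),
  avg_temp = c(26.2, 19.2, 31.5, 19.1, 18.2, 14.4, 8.2,
               7.4, 14.7, 26.1, 13.8, 15.2, 10.1)
)

set.seed(123)  # arbitrary seed for repeatable cluster labels
fit <- kmeans(scale(iq_head), centers = 3, nstart = 25)

# Attach the labels to the unscaled data so means are in original units.
iq_head$cluster <- factor(fit$cluster)

# Mean of every variable within each cluster.
profile <- aggregate(. ~ cluster, data = iq_head, FUN = mean)
profile

# Cluster sizes, to spot very small groups.
table(iq_head$cluster)
```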
cluster_map<-left_join(mapdata,iq_cluster,by="region")
ggplot(cluster_map, aes(x = long, y = lat, group = group)) +
geom_polygon( aes(fill=cluster))+
xlab("")+
ylab("")+
ggtitle("clustering countries")+
theme_solarized()+
theme(legend.position = c(0.1,0.4))+
theme(axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank())+
scale_fill_brewer(palette="Set2",na.value="white")

Hierarchical Clustering
Hierarchical clustering can provide a dendrogram to visualize how
clusters are nested within each other.
Steps for Hierarchical Clustering:
- Compute Distance Matrix:
- Compute the distance matrix using Euclidean distance.
- Perform Hierarchical Clustering:
- Use the hclust function to perform hierarchical clustering.
- Visualize the Dendrogram:
- Plot the dendrogram to visualize the clustering structure.
Compute the distance matrix
## Compute the distance matrix
d <- dist(scale_data,method = "euclidean")
Perform hierarchical clustering using the ward.D2 method
## Perform hierarchical clustering using the ward.D2 method
fitH <- hclust(d, "ward.D2")
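A dendrogram can be turned into explicit cluster labels with cutree(), which makes it possible to cross-tabulate the hierarchical solution against the k-means one. The sketch below does this for three clusters on the thirteen complete rows from the printed head of the data; it illustrates the comparison only and does not reproduce the clusters of the full dataset. The seed and nstart values are arbitrary.

```r
# Complete rows among the first fifteen printed earlier (illustration only).
iq_head <- data.frame(
  IQ = c(106, 106, 106, 104, 103, 101, 101, 100, 100, 100, 100, 100, 100),
  education_expenditure = c(1283, 1340, 1428, 183, 1024, 2386, 2725,
                            2052, 3665, 1448, 1883, 3550, 749),
  avg_income = c(35304, 40964, 41100, 4654, 22805, 45337, 42706,
                 40207, 71296, 44072, 39911, 70399, 13770),
  avg_temp = c(26.2, 19.2, 31.5, 19.1, 18.2, 14.4, 8.2,
               7.4, 14.7, 26.1, 13.8, 15.2, 10.1)
)
scaled <- scale(iq_head)

# Ward clustering on Euclidean distances, as in the notebook.
fit_h <- hclust(dist(scaled, method = "euclidean"), method = "ward.D2")

# Cut the tree into three groups to get one label per row.
h_cluster <- cutree(fit_h, k = 3)

set.seed(123)  # arbitrary seed for repeatable k-means labels
k_cluster <- kmeans(scaled, centers = 3, nstart = 25)$cluster

# Rows: hierarchical labels; columns: k-means labels.
table(hierarchical = h_cluster, kmeans = k_cluster)
```

Label numbers are arbitrary in both methods, so agreement shows up as each row of the table having most of its count in a single column.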
Visualize the Dendrogram
fviz_dend(x = fitH, cex = 0.8, lwd = 0.8, k = 3,
k_colors = c("jco"),
rect = TRUE,
rect_border = "jco",
rect_fill = TRUE,
ggtheme = theme_solarized())
## Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
## of ggplot2 3.3.4.
## ℹ The deprecated feature was likely used in the factoextra package.
## Please report the issue at <https://github.com/kassambara/factoextra/issues>.

# Phylogenic
fviz_dend(fitH, cex = 0.8, lwd = 0.8, k = 3,
rect = TRUE,
k_colors = "jco",
rect_border = "jco",
rect_fill = TRUE,
type = "phylogenic",
ggtheme = theme_solarized(),
repel = TRUE,
phylo_layout = "layout.gem")

## Phylogenic, default layout
fviz_dend(fitH, cex = 0.8, lwd = 0.8, k = 3,
rect = TRUE,
k_colors = "jco",
rect_border = "jco",
rect_fill = TRUE,
type = "phylogenic",
ggtheme = theme_solarized(),
repel = TRUE)

## circular
fviz_dend(fitH, cex = 0.8, lwd = 0.8, k = 3,
rect = TRUE,
k_colors = "jco",
rect_border = "jco",
rect_fill = TRUE,
type = "circular",
ggtheme = theme_solarized(),
repel = TRUE)

Final Conclusion
Dataset Overview:
The dataset includes information on average IQ, education
expenditure, average income, and average temperature for various
countries. The analysis aimed to understand the relationships between
these variables and how they influence or correlate with cognitive
performance (IQ).
Key Findings:
- IQ and Education Expenditure:
- Positive Correlation: Generally, countries with higher
education expenditure tend to have higher IQ levels. This is consistent
with investment in education supporting cognitive development, although
a correlation alone cannot establish the direction of the effect.
- Clusters: After applying k-means clustering, we identified
groups of countries with similar profiles. One cluster combines high
education spending with high IQ, while another shows high IQ with lower
education spending.
- IQ and Average Income:
- Economic Influence: A positive correlation between average
income and IQ suggests that wealthier countries typically have better
access to resources that enhance cognitive abilities.
- Disparities: Some countries achieve high IQ levels despite
lower average income, highlighting the potential influence of cultural
factors, efficient use of resources, or robust informal education
systems.
- IQ and Average Temperature:
- Environmental Impact: The relationship between average
temperature and IQ is less straightforward. While extreme temperatures
could negatively impact cognitive performance due to environmental
stressors, countries with high IQ in extreme climates might have
effective adaptation strategies.
- Clusters: Clustering revealed groups of countries that
manage to maintain high IQ levels across various temperature ranges,
emphasizing the importance of socio-economic resilience and
adaptation.
Insights from Clustering Analysis:
- Cluster Characteristics: The clusters identified through
k-means clustering reveal common characteristics among countries. For
instance, clusters with high education expenditure and income generally
correspond to high IQ levels.
- Strategic Grouping: Clustering helps in understanding how
different countries group together based on their socio-economic and
environmental profiles. This can guide tailored policy-making and
resource allocation.