Introduction:

The concept of intelligence quotient (IQ) has long been a subject of interest and study in the fields of psychology and education. While IQ tests are not without controversy, they remain a widely used tool for measuring cognitive abilities and predicting academic and professional success. The average IQ level of a nation is often used as an indicator of the intellectual capacity and potential of its population.

This notebook investigates potential relationships between national IQ levels and several socio-economic and environmental factors, focusing on average temperature, average income, and expenditure on education. These factors have been suggested to influence cognitive development and intellectual performance in populations around the world.

Climate and temperature have been hypothesized to impact cognitive functioning, with some studies suggesting that extreme temperatures may affect cognitive abilities. Additionally, economic factors such as average income levels are known to play a crucial role in determining access to resources and opportunities for education, which in turn can influence cognitive development and intelligence levels.

Furthermore, investment in education is widely recognized as a key factor in enhancing human capital and fostering intellectual growth. By examining the relationship between national IQ levels and expenditure on education, this notebook seeks to shed light on the importance of educational resources and infrastructure in shaping the cognitive abilities of a population.

Through exploratory, correlation, and cluster analysis of existing data, this notebook aims to provide insights into the complex interplay between national IQ levels, temperature, average income, and education expenditure. By understanding these relationships, policymakers and educators may be better equipped to implement targeted interventions and policies aimed at promoting cognitive development and enhancing intellectual capabilities on a national scale.

About the Dataset

The IQ_level.csv file contains a comprehensive dataset that explores the relationship between average IQ levels and various socioeconomic factors across different countries.

Content

The variable descriptions below are taken from the dataset's Data Card on Kaggle.

Rank: The rank of the country based on IQ level.

Country: The name of the country.

IQ: The average IQ score for the population.

Education expenditure: The amount of money spent on education in the country, in US dollars.

Avg Income: The average income in the country, in US dollars.

Avg Temp: The average temperature in the country.

Loading libraries

First, I load the packages used throughout the analysis.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.0     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.1     ✔ tibble    3.1.8
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggthemes)
library(psych)
## 
## Attaching package: 'psych'
## 
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
library(ggsci)
library(factoextra)
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa

Getting the data

iq<-read.csv('IQ_level.csv')

Exploration of the data

The structure of the data

str(iq)
## 'data.frame':    108 obs. of  6 variables:
##  $ rank                 : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ country              : chr  "Hong Kong " "Japan" "Singapore" "Taiwan " ...
##  $ IQ                   : int  106 106 106 106 104 103 101 101 100 100 ...
##  $ education_expenditure: int  1283 1340 1428 NA 183 1024 2386 2725 2052 NA ...
##  $ avg_income           : int  35304 40964 41100 NA 4654 22805 45337 42706 40207 NA ...
##  $ avg_temp             : num  26.2 19.2 31.5 26.9 19.1 18.2 14.4 8.2 7.4 15.3 ...

We drop the rank variable, which only restates the ordering by IQ. We keep the country and IQ variables along with three numeric variables: education expenditure, average income, and average temperature.

iq<-select(iq,-rank)
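The str() output shows trailing spaces in some country names ("Hong Kong ", "Taiwan "). Exact-match operations, such as joining with map data by name, treat "Taiwan " and "Taiwan" as different keys. Below is a minimal base-R check for padded names that uses only the names printed above. On the full data, the same test would be iq$country[iq$country != trimws(iq$country)].

```r
# Country names as printed by str(iq) above; two of them carry a trailing space
countries <- c("Hong Kong ", "Japan", "Singapore", "Taiwan ")

# A name is "padded" if trimming leading/trailing whitespace changes it
has_padding <- countries != trimws(countries)
print(countries[has_padding])

# trimws() strips whitespace on both sides by default (which = "both")
clean <- trimws(countries)
print(clean)
```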

The first fifteen rows of the data

head(iq,15)
##        country  IQ education_expenditure avg_income avg_temp
## 1   Hong Kong  106                  1283      35304     26.2
## 2        Japan 106                  1340      40964     19.2
## 3    Singapore 106                  1428      41100     31.5
## 4      Taiwan  106                    NA         NA     26.9
## 5        China 104                   183       4654     19.1
## 6  South Korea 103                  1024      22805     18.2
## 7  Netherlands 101                  2386      45337     14.4
## 8      Finland 101                  2725      42706      8.2
## 9       Canada 100                  2052      40207      7.4
## 10 North Korea 100                    NA         NA     15.3
## 11  Luxembourg 100                  3665      71296     14.7
## 12      Macao  100                  1448      44072     26.1
## 13     Germany 100                  1883      39911     13.8
## 14 Switzerland 100                  3550      70399     15.2
## 15     Estonia 100                   749      13770     10.1

Exploratory data analysis

Let’s perform some exploratory data analysis (EDA) on the dataset. We’ll cover the following aspects:

  1. Summary Statistics: General statistics for numerical columns.
  2. Missing Values: Identify and quantify missing values in the dataset.
  3. Distribution Analysis: Distribution of key numerical columns (IQ, education expenditure, average income, average temperature).
  4. Correlation Analysis: Correlation between different numerical variables.

Let’s start with the missing values.

Checking for NAs

colSums(is.na(iq))
##               country                    IQ education_expenditure 
##                     0                     0                     5 
##            avg_income              avg_temp 
##                     2                     0

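Remove the rows that contain any NA values. This drops 5 of the 108 countries (Taiwan and North Korea among them, as the head() output shows) and leaves 103.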

iq<-na.omit(iq)
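Before dropping rows, it can help to see which countries na.omit() will remove. The sketch below is a minimal base-R version run on the first five rows shown by head() above. This is a small illustrative subset, not the full file.

```r
# First five rows as printed by head(iq, 15) above; Taiwan has missing values
sample_iq <- data.frame(
  country = c("Hong Kong", "Japan", "Singapore", "Taiwan", "China"),
  IQ = c(106, 106, 106, 106, 104),
  education_expenditure = c(1283, 1340, 1428, NA, 183),
  avg_income = c(35304, 40964, 41100, NA, 4654),
  avg_temp = c(26.2, 19.2, 31.5, 26.9, 19.1),
  stringsAsFactors = FALSE
)

# Missing values per column, as colSums(is.na(iq)) reports above
na_counts <- colSums(is.na(sample_iq))
print(na_counts)

# Countries that na.omit() would drop: rows that are not complete cases
dropped <- sample_iq$country[!complete.cases(sample_iq)]
print(dropped)

# Keep only complete rows, mirroring iq <- na.omit(iq)
complete_iq <- na.omit(sample_iq)
print(nrow(complete_iq))
```

On the full data, iq$country[!complete.cases(iq)] would list all five removed countries.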

Basic descriptive statistics for the data

describe(iq)
##                       vars   n     mean       sd median  trimmed     mad   min
## country*                 1 103    52.00    29.88   52.0    52.00   38.55   1.0
## IQ                       2 103    86.12    12.62   88.0    87.24   13.34  51.0
## education_expenditure    3 103   903.06  1166.63  336.0   680.82  461.09   1.0
## avg_income               4 103 17525.05 21067.80 7586.0 14112.47 9854.84 316.0
## avg_temp                 5 103    23.79     8.47   25.6    24.38    9.64   0.4
##                            max    range  skew kurtosis      se
## country*                 103.0    102.0  0.00    -1.24    2.94
## IQ                       106.0     55.0 -0.70     0.02    1.24
## education_expenditure   5436.0   5435.0  1.64     2.26  114.95
## avg_income            108349.0 108033.0  1.54     2.45 2075.87
## avg_temp                  36.5     36.1 -0.51    -0.86    0.83

A skewness of -0.70 indicates that the IQ distribution is moderately skewed to the left (negatively skewed). The left tail (lower values) is longer than the right tail (higher values), so there are more extreme low values than extreme high ones. describe() reports excess kurtosis, for which a normal distribution scores 0. The value of 0.02 therefore indicates tails very similar to those of a normal distribution, neither particularly heavy nor light.

Education expenditure and average income are both strongly right-skewed (skewness 1.64 and 1.54), with long right tails. Their positive excess kurtosis (2.26 and 2.45) indicates a sharper peak and heavier tails than a normal distribution, which means extreme values occur more often.

Average temperature is slightly negatively skewed (-0.51), with a somewhat longer left tail. Its negative excess kurtosis (-0.86) indicates a flatter peak and thinner tails than a normal distribution, which means fewer extreme values or outliers.
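For reference, the skewness and kurtosis figures above are moment-based. The sketch below computes skewness as m3/s^3 and excess kurtosis as m4/s^4 - 3. Here m3 and m4 are the third and fourth central moments (sums divided by n), and s is the sample standard deviation. We assume, based on the psych documentation for its default type = 3, that these match what describe() reports. The function names are our own.

```r
# Moment-based skewness (the "b1" variant); assumed to match psych's default type = 3
skewness_b1 <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  sum((x - mean(x))^3) / (n * sd(x)^3)
}

# Moment-based excess kurtosis (the "b2" variant); 0 for a normal distribution
excess_kurtosis_b2 <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  sum((x - mean(x))^4) / (n * sd(x)^4) - 3
}

print(skewness_b1(c(1, 2, 3, 4, 5)))         # 0: symmetric data
print(excess_kurtosis_b2(c(1, 2, 3, 4, 5)))  # negative: flat, light-tailed data
print(skewness_b1(c(1, 1, 1, 1, 10)))        # positive: one long right tail
```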

Distributions of the variables

Let’s start with the IQ.

ggplot(iq, aes(x =IQ)) +
  geom_density(fill="#3B9C9C",col="white")+
  theme_solarized()+
  scale_fill_brewer(palette="Set2")+
  ggtitle("Distribution of IQ")+
  theme(plot.title = element_text(hjust = 0.5))+
  xlab("IQ")

The distribution is roughly normal and centered near the mean of about 86 (median 88). Its left tail is somewhat longer, as the skewness of -0.70 indicated.

Education Expenditure

ggplot(iq, aes(x =education_expenditure)) +
  geom_density(fill="#E6AB02",col="white")+
  theme_solarized()+
  scale_fill_brewer(palette="Set2")+
  ggtitle("Distribution of Education Expenditure")+
  theme(plot.title = element_text(hjust = 0.5))+
  xlab("Education Expenditure")

Highly right-skewed, indicating a few countries spend significantly more on education per capita compared to others.

Average Income

ggplot(iq, aes(x =avg_income)) +
  geom_density(fill="#728FCE",col="white")+
  theme_solarized()+
  scale_fill_brewer(palette="Set2")+
  ggtitle("Distribution of Average Income")+
  theme(plot.title = element_text(hjust = 0.5))+
  xlab("Average Income")

Also right-skewed, showing that a few countries have much higher average incomes.
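Education expenditure and average income are strongly right-skewed, and both are strictly positive (minima of 1 and 316). A logarithmic scale is therefore a common way to view them. In ggplot2, this can be done by adding scale_x_log10() to the density plots. The sketch below illustrates the effect on skewness using synthetic log-normal values as a stand-in for income. These values are not from IQ_level.csv.

```r
# Moment-based skewness, as in the sketch for the descriptive statistics
skewness_b1 <- function(x) {
  n <- length(x)
  sum((x - mean(x))^3) / (n * sd(x)^3)
}

# Synthetic, income-like values from a log-normal distribution
# (illustrative only; NOT the values in IQ_level.csv)
set.seed(42)
income_like <- rlnorm(1000, meanlog = 9, sdlog = 1)

# Skewness on the raw scale versus after a base-10 log transform
skew_raw <- skewness_b1(income_like)
skew_log <- skewness_b1(log10(income_like))
print(round(c(raw = skew_raw, log10 = skew_log), 2))
```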

Average Temperature

ggplot(iq, aes(x =avg_temp)) +
  geom_density(fill="#5E5A80",col="white")+
  theme_solarized()+
  scale_fill_brewer(palette="Set2")+
  ggtitle("Distribution of Average Temperature")+
  theme(plot.title = element_text(hjust = 0.5))+
  xlab("Average Temperature")

Roughly bell-shaped but mildly left-skewed (skewness -0.51), with a longer tail toward a few colder countries.

Overall, the density plots confirm the distribution shapes suggested by the descriptive statistics.

Correlation Analysis

Enhanced scatter plot matrix

##plot the correlation matrix
pairs.panels(iq[,-1])

The scatter plot matrix shows that IQ is positively correlated with education expenditure and average income, and negatively correlated with average temperature.
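pairs.panels() reports Pearson correlations by default. Education expenditure and average income are strongly right-skewed, so rank-based Spearman correlations are a useful robustness check. pairs.panels() also accepts method = "spearman" for this. The base-R sketch below uses only the 13 complete rows printed by head() above, so its numbers describe that small, high-IQ subset. For the real analysis, pass iq[,-1] instead.

```r
# The 13 complete rows printed by head(iq, 15) above (Taiwan and North Korea have NAs)
sample_iq <- data.frame(
  IQ = c(106, 106, 106, 104, 103, 101, 101, 100, 100, 100, 100, 100, 100),
  education_expenditure = c(1283, 1340, 1428, 183, 1024, 2386, 2725,
                            2052, 3665, 1448, 1883, 3550, 749),
  avg_income = c(35304, 40964, 41100, 4654, 22805, 45337, 42706,
                 40207, 71296, 44072, 39911, 70399, 13770),
  avg_temp = c(26.2, 19.2, 31.5, 19.1, 18.2, 14.4, 8.2,
               7.4, 14.7, 26.1, 13.8, 15.2, 10.1)
)

# Pearson (the pairs.panels() default) versus rank-based Spearman correlations
pearson  <- cor(sample_iq, method = "pearson")
spearman <- cor(sample_iq, method = "spearman")

print(round(pearson, 2))
print(round(spearman, 2))
```

Large gaps between the two matrices would suggest that a few extreme countries drive the Pearson values.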

Mapping plots

Understanding the geographical distribution of the variables is crucial. To visualize this distribution, we employed a mapping plot that highlights regional variations and patterns.

Load the world map data with ggplot2's map_data() (which relies on the maps package), rename the country variable to region, and left join the map data to the IQ data by region.

mapdata<-map_data("world")

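colnames(iq)[1]<-"region"
## remove leading/trailing spaces (e.g. "Hong Kong ") so names can match the map's region names
iq$region<-trimws(iq$region)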

maodata<-left_join(mapdata,iq,by="region")
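left_join() matches region names exactly. Any country whose name is spelled differently in map_data("world"), or still carries a trailing space, silently ends up with NA values and is drawn in white. Below is a hypothetical helper that lists such names. The toy vectors are illustrative only; in the notebook, the inputs would be iq$region and mapdata$region.

```r
# Hypothetical helper (not from any package): data names with no exact map match
unmatched_regions <- function(data_regions, map_regions) {
  sort(setdiff(unique(data_regions), unique(map_regions)))
}

# Toy inputs for illustration only
data_regions <- c("Japan", "Hong Kong ", "China", "Finland")
map_regions  <- c("Japan", "China", "Finland", "Taiwan")

print(unmatched_regions(data_regions, map_regions))          # padded name fails to match
print(unmatched_regions(trimws(data_regions), map_regions))  # still missing after trimming
```

The second call shows that trimming removes padding but not genuine spelling differences. Those names need to be recoded by hand before the join.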

Geographic Distribution of IQ Levels

##IQ
ggplot(maodata, aes(x = long, y = lat, group = group)) +
  geom_polygon( aes(fill=IQ))+
  xlab("")+
  ylab("")+
  ggtitle("Countries by IQ - Average IQ by Country")+
  theme_solarized()+
  theme(legend.position = c(0.1,0.4))+
  theme(axis.text.x=element_blank(), 
        axis.ticks.x=element_blank(), 
        axis.text.y=element_blank(), 
        axis.ticks.y=element_blank())+
  scale_fill_gradient(low="#A3E355",high="#4893E5",na.value = "white")+
  geom_curve(
    aes(
      x =144,
      xend =140,
      y =36,
      yend =44),
    curvature=0.6,
    arrow = arrow(length=unit(.35, 'cm')),
    col="#808080",linewidth=0.8)+
  geom_curve(
    aes(
      x =long[which.min(IQ)],
      xend =long[which.min(IQ)],
      y =lat[which.min(IQ)]-23,
      yend =lat[which.min(IQ)]),
    curvature=0.6,
    arrow = arrow(length=unit(.35, 'cm')),
    col="#808080",linewidth=0.8)+
  geom_text(
    aes(x = 139,
        y = 27),
    label = "The nation of Japan\n has one of the highest IQs.",
    size = 2.9, colour = "#df9100")+
  geom_text(
    aes(x = long[which.min(IQ)]-10,
        y = lat[which.min(IQ)]-27),
    label = "Nepal is among the lowest-IQ countries, although it is almost\n the only country in its region with an IQ that low.",
    size = 2, colour = "#df9100")

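High IQ concentrations: Hong Kong, Japan, Singapore, and Taiwan have the highest average IQs in the dataset (106), all in East and Southeast Asia. Taiwan, however, was removed by na.omit() because its education expenditure and average income are missing, so it appears without fill on the maps.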

Education Expenditure

##education expenditure
ggplot(maodata, aes(x = long, y = lat, group = group)) +
  geom_polygon( aes(fill=education_expenditure))+
  xlab("")+
  ylab("")+
  ggtitle("Countries by education expenditure - Education expenditure by Country")+
  theme_solarized()+
  theme(legend.position = c(0.1,0.4))+
  theme(axis.text.x=element_blank(), 
        axis.ticks.x=element_blank(), 
        axis.text.y=element_blank(), 
        axis.ticks.y=element_blank())+
  scale_fill_gradient(low="#A3E355",high="#4893E5",na.value = "white")+
  geom_curve(
    aes(
      x =long[which.max(education_expenditure)]-17,
      xend =long[which.max(education_expenditure)],
      y =lat[which.max(education_expenditure)]-12,
      yend =lat[which.max(education_expenditure)]),
    curvature=-0.6,
    arrow = arrow(length=unit(.35, 'cm')),
    col="#808080",linewidth=0.8)+
  geom_curve(
    aes(
      x =long[which.min(education_expenditure)],
      xend =long[which.min(education_expenditure)],
      y =lat[which.min(education_expenditure)]-18,
      yend =lat[which.min(education_expenditure)]),
    curvature=0.6,
    arrow = arrow(length=unit(.35, 'cm')),
    col="#808080",linewidth=0.8)+
  ## labels describe the extreme values of the mapped variable
  geom_text(
    aes(x = long[which.max(education_expenditure)]-22,
        y = lat[which.max(education_expenditure)]-17),
    label = "Highest education expenditure\n on the map.",
    size = 2.9, colour = "#df9100")+
  geom_text(
    aes(x = long[which.min(education_expenditure)],
        y = lat[which.min(education_expenditure)]-20),
    label = "Lowest education expenditure\n on the map.",
    size = 1.9, colour = "#df9100")

Japan and Singapore have high education expenditures, aligning with their high IQ rankings. On the other hand, China has a high average IQ and low education expenditure, which might indicate other underlying factors supporting high cognitive performance.

Average Income

##avg income
ggplot(maodata, aes(x = long, y = lat, group = group)) +
  geom_polygon( aes(fill=avg_income))+
  xlab("")+
  ylab("")+
  ggtitle("Countries by income - Average income by Country")+
  theme_solarized()+
  theme(legend.position = c(0.1,0.4))+
  theme(axis.text.x=element_blank(), 
        axis.ticks.x=element_blank(), 
        axis.text.y=element_blank(), 
        axis.ticks.y=element_blank())+
  scale_fill_gradient(low="#A3E355",high="#4893E5",na.value = "white")+
  geom_curve(
    aes(
      x =long[which.max(avg_income)]-17,
      xend =long[which.max(avg_income)],
      y =lat[which.max(avg_income)]-12,
      yend =lat[which.max(avg_income)]),
    curvature=-0.6,
    arrow = arrow(length=unit(.35, 'cm')),
    col="#808080",linewidth=0.8)+
  geom_curve(
    aes(
      x =long[which.min(avg_income)],
      xend =long[which.min(avg_income)],
      y =lat[which.min(avg_income)]-18,
      yend =lat[which.min(avg_income)]),
    curvature=0.6,
    arrow = arrow(length=unit(.35, 'cm')),
    col="#808080",linewidth=0.8)+
  ## labels describe the extreme values of the mapped variable
  geom_text(
    aes(x = long[which.max(avg_income)]-22,
        y = lat[which.max(avg_income)]-17),
    label = "Highest average income\n on the map.",
    size = 2.9, colour = "#df9100")+
  geom_text(
    aes(x = long[which.min(avg_income)],
        y = lat[which.min(avg_income)]-20),
    label = "Lowest average income\n on the map.",
    size = 1.9, colour = "#df9100")

Mapping average income suggests that wealthier countries tend to have higher IQ levels. Countries with higher average incomes often have better access to quality education and resources, potentially leading to higher IQs.

Average Temperature

##avg_temp
ggplot(maodata, aes(x = long, y = lat, group = group)) +
  geom_polygon( aes(fill=avg_temp))+
  xlab("")+
  ylab("")+
  ggtitle("Countries by temperature - Average temperature by Country")+
  theme_solarized()+
  theme(legend.position = c(0.1,0.4))+
  theme(axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank())+
  scale_fill_gradient(low="#A3E355",high="#4893E5",na.value = "white")+
  geom_curve(
    aes(
      x =long[which.max(avg_temp)]-17,
      xend =long[which.max(avg_temp)],
      y =lat[which.max(avg_temp)]-32,
      yend =lat[which.max(avg_temp)]),
    curvature=-0.6,
    arrow = arrow(length=unit(.35, 'cm')),
    col="#808080",linewidth=0.8)+
  geom_curve(
    aes(
      x =long[which.min(avg_temp)],
      xend =long[which.min(avg_temp)],
      y =lat[which.min(avg_temp)]-10,
      yend =lat[which.min(avg_temp)]),
    curvature=0.6,
    arrow = arrow(length=unit(.35, 'cm')),
    col="#808080",linewidth=0.8)+
  ## labels describe the extreme values of the mapped variable
  geom_text(
    aes(x = long[which.max(avg_temp)]-22,
        y = lat[which.max(avg_temp)]-37),
    label = "Highest average temperature\n on the map.",
    size = 2.9, colour = "#df9100")+
  geom_text(
    aes(x = long[which.min(avg_temp)],
        y = lat[which.min(avg_temp)]-12),
    label = "Lowest average temperature\n on the map.",
    size = 1.9, colour = "#df9100")

Countries with high IQs (106) such as Japan (19.2°C), Singapore (31.5°C), and Hong Kong (26.2°C) show that high cognitive performance can be achieved in both temperate and tropical climates. At the same time, the lowest IQ levels are all found in warmer regions. It is therefore worth exploring whether temperature has subtler effects on cognitive performance, possibly mitigated by socio-economic factors.

Hong Kong, Japan, and Singapore show not only high IQ levels but also substantial investment in education (1,283, 1,340, and 1,428 US dollars, well above the median of 336). Taiwan's expenditure is missing from the dataset. This region might serve as a model for understanding how educational policies and economic development contribute to cognitive performance.

China presents an interesting case. It has a high IQ (104) but a much lower average income (4,654 US dollars) and education expenditure (183 US dollars) than Hong Kong or Japan. This might indicate other factors at play, such as a cultural emphasis on education or innate cognitive skills.

Two-dimensional analysis

Let’s create the scatter plots to analyze the relationships between IQ and other variables (education expenditure, average income, average temperature). This will help us provide insights based on the actual data.

ggplot(iq,aes(y=IQ,x=education_expenditure))+
  geom_point(col="#488AC7")+
  ##geom_text_repel(aes(label = region), size = 1.5, colour = "gray30",         max.overlaps = 50, fontface = "bold")+ 
  theme_solarized()+
  ggtitle("Relationship between IQ and education expenditure")+
  xlab("education expenditure")+
  ylab("IQ")+
  annotate("rect", xmin = 0.00, xmax = 1250, ymin = 50, ymax = 70, alpha = 0.2, colour = "#005d89", fill = "#005d89")+
  annotate("text", x = 580, y = 68, label = "Low IQ, Low education expenditure", size = 3, colour = "#005d89", fontface = "bold")+
    annotate("rect", xmin = 3200, xmax = 5436, ymin = 90, ymax = 106, alpha = 0.2, colour = "#df9100", fill = "#df9100")+
  annotate("text", x = 4400, y = 105, label = "High IQ, High education expenditure", size = 3.3, colour = "#df9100", fontface = "bold")+
    annotate("rect", xmin = 3200, xmax = 5436, ymin = 50, ymax = 70, alpha = 0.2, colour = "#B83C08", fill = "#B83C08")+
  annotate("text", x = 4400, y = 64, label = "No nation has a high average\n  spending on education\n in a region with a low IQ.", size = 3.3, colour = "#B83C08", fontface = "bold")+
  theme(plot.title = element_text(hjust = 0.5))

The scatter plot shows a positive association between education expenditure and average IQ. This is consistent with, though not proof of, education spending supporting cognitive development and performance. Countries with high IQ but low education expenditure suggest that factors other than direct financial investment also play a role, such as a cultural emphasis on education or efficient, high-quality schooling.
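To put a number on this association, one option (our choice, not part of the original analysis) is a simple linear model of IQ on the base-10 logarithm of education expenditure. The log transform tempers the strong right skew, and the slope then reads as the change in average IQ per tenfold increase in spending. The sketch uses the 13 complete rows printed by head() above, so the estimate only describes that small, high-IQ subset. With the full data, the call would be lm(IQ ~ log10(education_expenditure), data = iq).

```r
# The 13 complete rows printed by head(iq, 15) above
sample_iq <- data.frame(
  IQ = c(106, 106, 106, 104, 103, 101, 101, 100, 100, 100, 100, 100, 100),
  education_expenditure = c(1283, 1340, 1428, 183, 1024, 2386, 2725,
                            2052, 3665, 1448, 1883, 3550, 749)
)

# Linear model on log10(spending): slope = change in average IQ per tenfold increase
fit <- lm(IQ ~ log10(education_expenditure), data = sample_iq)
print(coef(fit))

# Share of IQ variance in this subset explained by log spending
r2 <- summary(fit)$r.squared
print(round(r2, 3))
```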

ggplot(iq,aes(y=IQ,x=avg_income))+
  geom_point(col="#488AC7")+
  theme_solarized()+
  ggtitle("Relationship between IQ and Average income")+
  ylab("IQ")+
  xlab("Average income")+
  annotate("rect", xmin = 0.00, xmax = 30000, ymin = 50, ymax = 70, alpha = 0.2, colour = "#005d89", fill = "#005d89")+
  annotate("text", x = 13900, y = 69, label = "Low IQ, Low income", size = 4.3, colour = "#005d89", fontface = "bold")+
    annotate("rect", xmin = 50000, xmax = 108349, ymin = 90, ymax = 106, alpha = 0.2, colour = "#df9100", fill = "#df9100")+
  annotate("text", x = 77000, y = 105, label = "High IQ, High income", size = 4.3, colour = "#df9100", fontface = "bold")+
    annotate("rect", xmin = 50000, xmax = 108349, ymin = 50, ymax = 70, alpha = 0.2, colour = "#B83C08", fill = "#B83C08")+
  annotate("text", x = 77000, y = 66, label = "No nation has a high average income\n in a region with a low IQ.", size = 4.3, colour = "#B83C08", fontface = "bold")+
  theme(plot.title = element_text(hjust = 0.5))

There is a positive association between average income and IQ: wealthier countries tend to have higher IQ levels. This might be due to better access to educational resources, healthcare, and overall living conditions. Countries with high IQ but relatively low average income, on the other hand, might have strong cultural or educational practices that contribute to high cognitive performance, suggesting that income is not the sole determinant of IQ.

ggplot(iq,aes(y=IQ,x=avg_temp))+
  geom_point(col="#488AC7")+
  theme_solarized()+
  ggtitle("Relationship between IQ and average temperature")+
  ylab("IQ")+
  xlab("Average temperature")+
  annotate("rect", xmin = 23, xmax = 36.5, ymin = 50, ymax = 70, alpha = 0.2, colour = "#005d89", fill = "#005d89")+
  annotate("text", x = 30, y = 69, label = "Low IQ, High temp", size = 4.3, colour = "#005d89", fontface = "bold")+
  annotate("rect", xmin = 0, xmax = 10, ymin = 90, ymax = 106, alpha = 0.2, colour = "#df9100", fill = "#df9100")+
  annotate("text", x = 5, y = 105, label = "High IQ, Low temp", size = 4.3, colour = "#df9100", fontface = "bold")+
  annotate("rect", xmin = 0, xmax = 15, ymin = 50, ymax = 70, alpha = 0.2, colour = "#B83C08", fill = "#B83C08")+
  annotate("text", x = 7, y = 64, label = "No country with low IQ\n in a low-temperature area", size = 4.3, colour = "#B83C08", fontface = "bold")+
  theme(plot.title = element_text(hjust = 0.5))

The scatter plot shows a negative association between average temperature and IQ. The lowest IQ levels occur only in warm countries, and no low-IQ country lies in a cold region. High temperatures could be linked to lower IQ levels through environmental stressors. Conversely, high IQ levels in countries with extreme climates could indicate effective adaptation strategies, such as advanced infrastructure and healthcare systems, that mitigate the negative effects of climate.
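Whether the association with temperature is statistically significant can be checked with cor.test(). The sketch below uses Spearman's rank correlation, chosen here because it does not assume a linear relationship. It sets exact = FALSE because IQ contains ties. It runs on the 13 complete rows from head() above, so its result says nothing about the full dataset. There, the call would be cor.test(iq$IQ, iq$avg_temp, method = "spearman", exact = FALSE).

```r
# IQ and average temperature for the 13 complete rows printed by head(iq, 15) above
sample_iq <- data.frame(
  IQ = c(106, 106, 106, 104, 103, 101, 101, 100, 100, 100, 100, 100, 100),
  avg_temp = c(26.2, 19.2, 31.5, 19.1, 18.2, 14.4, 8.2,
               7.4, 14.7, 26.1, 13.8, 15.2, 10.1)
)

# Spearman rank correlation test; exact = FALSE uses the asymptotic p-value (ties present)
res <- cor.test(sample_iq$IQ, sample_iq$avg_temp,
                method = "spearman", exact = FALSE)

print(res$estimate)  # Spearman's rho
print(res$p.value)
```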

Cluster Analysis

To perform cluster analysis on this dataset, we’ll use the k-means clustering algorithm. This analysis will help identify groups of countries with similar characteristics in terms of IQ, education expenditure, average income, and average temperature.

Steps for Cluster Analysis:

  1. Data Preparation:
    • Standardize the variables (z-scores with scale()) so that each feature contributes equally to the clustering process.
  2. Cluster Analysis:
    • Use the k-means algorithm to identify clusters.
    • Determine the optimal number of clusters using the elbow method.
  3. Visualize Clusters:
    • Visualize the clusters to understand the grouping of countries.

## use country names as row names so they label the dendrograms later
row_names_data <- iq$region

iq_data <- iq[,-1]

rownames(iq_data) <- row_names_data


## Scale data
scale_data <- scale(iq_data)
scale_data<-as.data.frame(scale_data)

Determine the optimal number of clusters using the elbow method

## Determine the optimal number of clusters using the elbow method
fviz_nbclust(scale_data,kmeans,method = "wss")+
  labs(subtitle = "Elbow method")
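fviz_nbclust(method = "wss") plots the total within-cluster sum of squares for each candidate k. The same curve can be computed directly with base R's kmeans(). The sketch below does this on the 13 complete rows from head() above, trying k from 1 to 5, which is an arbitrary range chosen for this small sample. For standardized data, the value at k = 1 equals (n - 1) times the number of variables, here 12 times 4, or 48.

```r
# The 13 complete rows printed by head(iq, 15) above
sample_iq <- data.frame(
  IQ = c(106, 106, 106, 104, 103, 101, 101, 100, 100, 100, 100, 100, 100),
  education_expenditure = c(1283, 1340, 1428, 183, 1024, 2386, 2725,
                            2052, 3665, 1448, 1883, 3550, 749),
  avg_income = c(35304, 40964, 41100, 4654, 22805, 45337, 42706,
                 40207, 71296, 44072, 39911, 70399, 13770),
  avg_temp = c(26.2, 19.2, 31.5, 19.1, 18.2, 14.4, 8.2,
               7.4, 14.7, 26.1, 13.8, 15.2, 10.1)
)

# Standardize, as scale() does for scale_data above
scaled <- scale(sample_iq)

# Total within-cluster sum of squares for k = 1..5 (several random starts each)
set.seed(123)
wss <- sapply(1:5, function(k) {
  kmeans(scaled, centers = k, nstart = 25)$tot.withinss
})
print(round(wss, 2))
```

The "elbow" is the k after which adding clusters stops reducing this total by much.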

Based on the elbow plot, we apply k-means clustering with 3 clusters and visualize the clusters in scatter plots.

## K-MEANS CLUSTERING
## fix the seed and use 25 random starts so the clusters are reproducible
set.seed(123)
fitK <- kmeans(scale_data, centers = 3, nstart = 25)

iq_cluster<-iq

iq_cluster$cluster<-as.factor(fitK$cluster)
ggplot(iq_cluster,aes(y=IQ,x=education_expenditure,col=cluster))+
    geom_point()+
    ##geom_text_repel(aes(label = region), size = 1.5, colour = "gray30",         max.overlaps = 50, fontface = "bold")+ 
    theme_solarized()+
    ggtitle("Relationship between IQ and education expenditure")+
    xlab("education expenditure")+
    ylab("IQ")+
    theme(plot.title = element_text(hjust = 0.5))

ggplot(iq_cluster,aes(y=IQ,x=avg_income,col=cluster))+
    geom_point()+
    ##geom_text_repel(aes(label = region), size = 1.5, colour = "gray30",         max.overlaps = 50, fontface = "bold")+ 
    theme_solarized()+
    ggtitle("Relationship between IQ and average income")+
    xlab("average income")+
    ylab("IQ")+
    theme(plot.title = element_text(hjust = 0.5))

ggplot(iq_cluster,aes(y=IQ,x=avg_temp,col=cluster))+
    geom_point()+
    ##geom_text_repel(aes(label = region), size = 1.5, colour = "gray30",         max.overlaps = 50, fontface = "bold")+ 
    theme_solarized()+
    ggtitle("Relationship between IQ and average temperature")+
    xlab("average temperature")+
    ylab("IQ")+
    theme(plot.title = element_text(hjust = 0.5))

One cluster groups countries with high education expenditure, high average income, low average temperature, and high IQ. This profile is consistent with effective investment in education.

Another cluster shows countries with high IQ but lower education expenditure, suggesting that other factors (like cultural or informal education systems) contribute to cognitive performance.

The third cluster, with low income, low education expenditure, and high temperatures, may correspond to countries that struggle to adapt to their environment or suffer from political instability.
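The cluster descriptions above can be checked by averaging each variable within each cluster, in the original unscaled units. The sketch below does this on the 13 complete rows from head() above. It uses a fixed seed and several random starts so the result is reproducible. With the notebook's objects, something like aggregate(. ~ cluster, data = iq_cluster[, -1], FUN = mean) would give the same kind of table.

```r
# The 13 complete rows printed by head(iq, 15) above
sample_iq <- data.frame(
  IQ = c(106, 106, 106, 104, 103, 101, 101, 100, 100, 100, 100, 100, 100),
  education_expenditure = c(1283, 1340, 1428, 183, 1024, 2386, 2725,
                            2052, 3665, 1448, 1883, 3550, 749),
  avg_income = c(35304, 40964, 41100, 4654, 22805, 45337, 42706,
                 40207, 71296, 44072, 39911, 70399, 13770),
  avg_temp = c(26.2, 19.2, 31.5, 19.1, 18.2, 14.4, 8.2,
               7.4, 14.7, 26.1, 13.8, 15.2, 10.1)
)

# Cluster on standardized data, as in the notebook
scaled <- scale(sample_iq)
set.seed(123)
fit <- kmeans(scaled, centers = 3, nstart = 25)

# Mean of each variable per cluster, reported in the original (unscaled) units
profile <- aggregate(sample_iq, by = list(cluster = fit$cluster), FUN = mean)
print(profile)

# Number of countries in each cluster
print(fit$size)
```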

cluster_map<-left_join(mapdata,iq_cluster,by="region")


ggplot(cluster_map, aes(x = long, y = lat, group = group)) +
    geom_polygon( aes(fill=cluster))+
    xlab("")+
    ylab("")+
    ggtitle("Clustering countries")+
    theme_solarized()+
    theme(legend.position = c(0.1,0.4))+
    theme(axis.text.x=element_blank(), 
          axis.ticks.x=element_blank(), 
          axis.text.y=element_blank(), 
          axis.ticks.y=element_blank())+
    scale_fill_brewer(palette="Set2",na.value="white")

Hierarchical Clustering

Hierarchical clustering can provide a dendrogram to visualize how clusters are nested within each other.

Steps for Hierarchical Clustering:

  1. Compute Distance Matrix:
    • Compute the distance matrix using Euclidean distance.
  2. Perform Hierarchical Clustering:
    • Use the hclust function to perform hierarchical clustering.
  3. Visualize the Dendrogram:
    • Plot the dendrogram to visualize the clustering structure.

Compute the distance matrix

## Compute the distance matrix
d <- dist(scale_data,method = "euclidean")

Perform hierarchical clustering using the ward.D2 method

## Perform hierarchical clustering using the ward.D2 method
fitH <- hclust(d, "ward.D2")
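For reference, the two steps above can be written out as formulas. Let z_{iv} be the standardized value of variable v (IQ, education expenditure, average income, average temperature) for country i. The Euclidean distance between countries i and j is then given by the first formula. Ward's criterion, which hclust() implements with method = "ward.D2", merges at each step the pair of clusters A and B whose union gives the smallest increase in the total within-cluster sum of squares. That increase is given by the second formula, where \bar{z}_A is the centroid of cluster A.

```latex
d(i, j) = \sqrt{\sum_{v=1}^{4} \left( z_{iv} - z_{jv} \right)^2}

\Delta(A, B) = \frac{|A|\,|B|}{|A| + |B|} \, \lVert \bar{z}_A - \bar{z}_B \rVert^2
```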

Visualize the Dendrogram

fviz_dend(x = fitH, cex = 0.8, lwd = 0.8, k = 3,
          k_colors = c("jco"),
          rect = TRUE, 
          rect_border = "jco", 
          rect_fill = TRUE,
          ggtheme = theme_solarized())
## Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
## of ggplot2 3.3.4.
## ℹ The deprecated feature was likely used in the factoextra package.
##   Please report the issue at <https://github.com/kassambara/factoextra/issues>.

# Phylogenic
fviz_dend(fitH, cex = 0.8, lwd = 0.8, k = 3,
          rect = TRUE,
          k_colors = "jco",
          rect_border = "jco",
          rect_fill = TRUE,
          type = "phylogenic",
          ggtheme = theme_solarized(),
          repel = TRUE,
          phylo_layout = "layout.gem")

## phylogenic tree with the default layout
fviz_dend(fitH, cex = 0.8, lwd = 0.8, k = 3,
          rect = TRUE,
          k_colors = "jco",
          rect_border = "jco",
          rect_fill = TRUE,
          type = "phylogenic",
          ggtheme = theme_solarized(),
          repel = TRUE)

## circular
fviz_dend(fitH, cex = 0.8, lwd = 0.8, k = 3,
          rect = TRUE,
          k_colors = "jco",
          rect_border = "jco",
          rect_fill = TRUE,
          type = "circular",
          ggtheme = theme_solarized(),
          repel = TRUE)
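Both k-means and Ward's method were run with three clusters, so a cross-tabulation shows how far the two partitions agree. Cluster numbers are arbitrary labels. Agreement therefore shows up as each row having most of its countries in a single column. The sketch below runs on the 13 complete rows from head() above. With the notebook's objects, the equivalent would be table(fitK$cluster, cutree(fitH, k = 3)).

```r
# The 13 complete rows printed by head(iq, 15) above
sample_iq <- data.frame(
  IQ = c(106, 106, 106, 104, 103, 101, 101, 100, 100, 100, 100, 100, 100),
  education_expenditure = c(1283, 1340, 1428, 183, 1024, 2386, 2725,
                            2052, 3665, 1448, 1883, 3550, 749),
  avg_income = c(35304, 40964, 41100, 4654, 22805, 45337, 42706,
                 40207, 71296, 44072, 39911, 70399, 13770),
  avg_temp = c(26.2, 19.2, 31.5, 19.1, 18.2, 14.4, 8.2,
               7.4, 14.7, 26.1, 13.8, 15.2, 10.1)
)
scaled <- scale(sample_iq)

# k-means with 3 clusters (fixed seed, several random starts)
set.seed(123)
fit_k <- kmeans(scaled, centers = 3, nstart = 25)

# Ward's hierarchical clustering on Euclidean distances, cut into 3 groups
fit_h <- hclust(dist(scaled, method = "euclidean"), method = "ward.D2")
groups_h <- cutree(fit_h, k = 3)

# Rows: k-means labels; columns: hierarchical labels
agreement <- table(kmeans = fit_k$cluster, hclust = groups_h)
print(agreement)
```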

Final Conclusion

Dataset Overview:

The dataset includes information on average IQ, education expenditure, average income, and average temperature for various countries. The analysis aimed to understand the relationships between these variables and how they influence or correlate with cognitive performance (IQ).

Key Findings:

  1. IQ and Education Expenditure:
  • Positive Correlation: Generally, countries with higher education expenditure tend to have higher IQ levels, which is consistent with investment in education supporting cognitive development.
  • Clusters: After applying k-means clustering, we identified groups of countries with similar profiles. For example, one cluster groups countries with high education spending and high IQ, while another shows high IQ with lower education spending.
  2. IQ and Average Income:
  • Economic Influence: A positive correlation between average income and IQ suggests that wealthier countries typically have better access to resources that enhance cognitive abilities.
  • Disparities: Some countries achieve high IQ levels despite lower average income, highlighting the potential influence of cultural factors, efficient use of resources, or robust informal education systems.
  3. IQ and Average Temperature:
  • Environmental Impact: The relationship between average temperature and IQ is less straightforward. While extreme temperatures could negatively impact cognitive performance due to environmental stressors, countries with high IQ in extreme climates might have effective adaptation strategies.
  • Clusters: Clustering revealed groups of countries that manage to maintain high IQ levels across various temperature ranges, emphasizing the importance of socio-economic resilience and adaptation.

Insights from Clustering Analysis:

  • Cluster Characteristics: The clusters identified through k-means clustering reveal common characteristics among countries. For instance, clusters with high education expenditure and income generally correspond to high IQ levels.
  • Strategic Grouping: Clustering helps in understanding how different countries group together based on their socio-economic and environmental profiles. This can guide tailored policy-making and resource allocation.