In this project, we explore the association between European social values and Eurovision Song Contest outcomes. Using skills learned during the semester, we cluster and visualize European survey data alongside Eurovision contest results. This approach aims to uncover potential correlations and insights into how cultural and social attitudes across European regions might influence, or be reflected in, the performances and results of the Eurovision Song Contest. By integrating and analyzing these datasets, we seek to offer a perspective on the cultural dynamics of Europe as expressed through popular media and societal beliefs.
We stored the European survey data on MongoDB because it was too large. A MongoDB database was constructed and data was imported to it. Afterwards, a connection was established to a MongoDB server, specifying the database and collection to be accessed.
We defined a specific set of columns to be retrieved. This was done by creating a vector of column names, which are fields within the MongoDB collection. These fields represent different attributes or questions from the survey data.
After defining the columns, a MongoDB projection query is constructed. This projection is a way to specify which fields in the documents should be included in the returned data. In this case, the query is set up to retrieve only the specific columns.
connection_string <- "mongodb+srv://teamandroid:O9sWUq5h7JMDioFk@cluster0.kk0rkpj.mongodb.net/"
collection <- mongo(collection="european_values", db="database", url=connection_string)
columns <- c("study", "wave", "uniqid", "year", "cntry_AN", "A001", "A002", "A003", "A004", "A005", "A006", "A008", "A170", "A040", "A065", "A066", "A068", "A165", "B008", "D001_B", "G007_35_B", "Y002", "E015", "E035", "E036", "E039", "E069_01", "E225", "F025", "F028", "F028B_WVS7", "F066_EVS5", "F034", "F050", "F051", "F053", "F054", "F063", "X001", "X003R", "G027A", "X007", "X011", "X013", "X025A_01", "X028", "X047_WVS7", "X047E_EVS5", "C002", "G052")
#giturl <- "https://github.com/Mattr5541/DATA_607_Final_Project/raw/main/EVS_WVS_Joint_rData_v4_0.rdata"
#data <- import(giturl)
projection <- paste0('{', paste0('"', columns, '": 1', collapse = ','), '}')
data <- collection$find(query = '{}', fields = projection)
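As a small illustration of the projection step above, the sketch below builds the same kind of projection string for a short, illustrative subset of fields and prints it, without contacting any server. The environment variable name `MONGODB_URI` and the `demo_` names are assumptions made for this example and are not part of the project; reading the connection string from the environment is one possible way to keep credentials out of the script.

```r
# Hypothetical environment variable; it would be set outside the script.
# Sys.getenv() returns "" when the variable is not defined.
connection_string <- Sys.getenv("MONGODB_URI")

# A short, illustrative subset of the survey fields.
demo_columns <- c("year", "cntry_AN", "A001")

# Build a MongoDB projection document: each requested field maps to 1.
demo_projection <- paste0(
  "{", paste0('"', demo_columns, '": 1', collapse = ","), "}"
)
print(demo_projection)

# Variant that also excludes MongoDB's automatic "_id" field, so it would
# not need to be dropped from the result afterwards.
demo_projection_no_id <- paste0(
  '{"_id": 0,', paste0('"', demo_columns, '": 1', collapse = ","), "}"
)
print(demo_projection_no_id)
```

Either string could be passed as the `fields` argument of `collection$find()`, in the same way as `projection` above.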
We imported the Eurovision dataset in two ways. Originally, the data
was intended to be imported from a Google Sheet using the
read_sheet function from the googlesheets4 package, which allows a
direct import of a Google Sheet specified by its ID. The specific sheet
titled “All Songs Ever” from the Eurovision dataset was targeted.
However, because group members had difficulty accessing the Google Sheet, an alternative method was adopted: the data was placed on GitHub and imported from there.
##Google Drive Attempt
#eurovision_id="1jRFrSEQaLmYSFLujaaEMEE33rNUo0UMOlLGXngqaYLQ"
#eurovis_raw=read_sheet(eurovision_id, sheet="All Songs Ever")
eurogit <- "https://raw.githubusercontent.com/Mattr5541/DATA_607_Final_Project/main/Copy_of_Every_Eurovision_Result_Ever%20-%20All%20Songs%20Ever.csv"
eurovis_raw <- import(eurogit)
From the European survey data, we selected many variables. These variables cover a wide range of topics, such as country abbreviations, various life values (like family, friends, work, religion), happiness and life satisfaction, religious and political affiliations, trust levels in different entities, demographic information, and socio-economic indicators.
Afterwards we filtered the data and applied regional classification. The classification divides the countries into regions like Western Europe and Southern Europe. We also dropped NA values.
#Selecting our variables of interest (go over this in case you want to add more variables or remove any more); also, I may try to integrate these changes into MongoDb, but that remains to be seen
#Some notes: Only waves 5 & 7 are integrated into this dataset; wave 5 refers to the EVS data and wave 7 refers to the WVS data; we may need to keep this in mind when we filter data and plan our analyses
##Here are the variables I chose to include: cntry_AN = abbreviated names for countries; A001 = Important in Life: Family; A002 = Important in Life: Friends; A003 = Important in Life: Leisure Time; A004 = Important in Life: Politics; A005 = Important in Life: Work; A006 = Important in Life: Religion; A008 = Feeling of happiness; A170 = Satisfaction with your life; A040 = Important child qualities: religious faith; A065 = Member: Belong to religious organization; A066 = belong to education, arts, music, or cultural activities; A068 = Belong to political parties; A165 = Most people can be trusted; B008 = Protecting environment vs. economic growth; ; D001_B = Trust your family; G007_35_B = Trust: People of another religion; Y002 = Post-Materialist Index Score; E015 = Future changes: Less importance placed on work; E035 = Income inequality; E036 = Private vs State ownership of business; E039 = Competition good or harmful; E069_01 = Confidence: Churches; E225: Democracy: Religious authorities interpret the laws; F025 = Religious denomination; F028 = How often do you attend religious services; F028B_WVS7 = How often do you pray; F066_EVS5 = Pray to God outside of religious services; F034 = Religious person; F050 = Believe in God; F051 = Believe in: life after death; F053 = Believe in: hell; F054 - Believe in: heaven; F063 - How important is God in your life; X001 - Sex; X003R - Age recoded (6 intervals); G027A - Respondent immigrant / born in country; X007 - Marital status; X011 - How many children do you have; X013 - Number of people in household; X025A_01 - Highest educational level attained; X028 - Employment status Respondent; X047_WVS7 - Scale of incomes (WVS7); X047E_EVS5 - Scale of incomes (EVS5); C002 - Jobs scarce: Employers should give priority to (nation) people than immigrants; G052 - Evaluate the impact of immigrants on the development of [your country]
ews <- data[, !names(data) %in% "_id"]
#Belgium, Liechtenstein, Luxembourg, Monaco, Malta, San Marino were not included in the dataset; Note: CH denotes Switzerland; HR denotes Croatia
#Regions were defined in accordance with UN geoscheme classification (although I decided to add in Great Britain to Western Europe, following the CIA classification system, just as a matter of convenience. Let me know if there's a better way to go about this)
ews <- ews %>% dplyr::filter(cntry_AN %in% c("AT", "FR", "DE", "NL", "CH", "GB", "AL", "AD", "BA", "HR", "CY", "GR", "IT", "ME", "MK", "PT", "RS", "SI", "ES", "US"))
ews <- ews %>% dplyr::mutate(region = ifelse(cntry_AN %in% c("AT", "FR", "DE", "NL", "CH", "GB"), "Western Europe", "Southern Europe"))
ews <- ews %>% dplyr::mutate(region = dplyr::if_else(cntry_AN == "US", "America", region))
##And now I can clean the data
#First, I'll start by removing all values that denote a participant's uncertainty or refusal to answer question items. Items coded -1 to -5 are considered missing (instances in which the participant did not answer). -4 represents instances in which the question was not included in the survey (i.e., the question was included in the EVS survey but not the WVS survey). I will keep -4 in for now, since its removal would lead to too much systematic data loss; we can handle those situations by simply excluding wave 5 or 7 from certain analyses/graphs.
ews <- ews %>% dplyr::mutate_all(~ ifelse(. %in% c(-1, -2, -3, -5), NA, .))
ews <- drop_na(ews)
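The recoding step above can be sketched on a toy data frame. This is a minimal illustration, not the project's data: the `demo` values are made up, and `across()` is used here as the current dplyr equivalent of `mutate_all()`. It shows that -4 is kept while the other negative codes become NA, and that `drop_na()` then removes every row containing at least one NA.

```r
library(dplyr)
library(tidyr)

# Made-up values for illustration only.
demo <- data.frame(
  A001 = c(1, -1, 3, 2),
  A008 = c(2, 4, -5, -4)
)

# Codes treated as missing; -4 (question not asked) is deliberately kept.
missing_codes <- c(-1, -2, -3, -5)

demo_clean <- demo |>
  mutate(across(everything(), ~ replace(.x, .x %in% missing_codes, NA)))

# drop_na() with no column arguments drops rows with an NA in any column.
demo_complete <- drop_na(demo_clean)
print(demo_complete)
```

Because `drop_na()` looks at every column, the number of rows lost grows with the number of variables selected, which may be worth checking with `nrow()` before and after.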
From the Eurovision dataset, I created
semiclean_eurovis. This subset focuses on specific columns:
Country, Year, Language, Grand Final Points, and Grand Final Place. I
filtered the data to include only the years from 2017 to 2022, focusing
on Eurovision contests that correspond to the survey years.
I then split each song's Language field on commas to obtain the first_lang column, and assign each entry a numeric language category.
#names(eurovis_raw)
semiclean_eurovis = eurovis_raw %>%
select(Country, Year, Language, `Grand Final Points`,`Grand Final Place`) %>%
filter(Year >= 2017, Year <= 2022)
semiclean_eurovis$`Grand Final Points`=as.numeric(unlist(semiclean_eurovis$`Grand Final Points`))
## Warning: NAs introduced by coercion
semiclean_eurovis$`Grand Final Place`=as.numeric(unlist(semiclean_eurovis$`Grand Final Place`))
## Warning: NAs introduced by coercion
names(semiclean_eurovis)=c("country","year","lang","pts","place")
#unique(semiclean_eurovis$lang)
sce1=semiclean_eurovis %>%
mutate(first_lang=strsplit(lang, split=',')) %>%
unnest(first_lang) %>%
select(country, year, first_lang, pts, place)
#turning Lang into Factor
clean_eurovis=sce1 %>%
filter(complete.cases(.)) %>%
mutate(language_category = case_when(
first_lang %in% c("Portuguese", "Italian", "French", " French","Spanish"," Italian") ~1,
first_lang %in% c("English", "Icelandic"," Srnán Tongo"," English") ~2,
first_lang %in% c("Russian","Serbian", "Belarusian","Ukrainian","Slovene") ~3,
first_lang %in% c("Hungarian","Northern Sami"," Northern Sami") ~ 4,
first_lang == "Albanian" ~ 5))
clean_eurovis=clean_eurovis %>%
select(country, year, language_category, pts, place)
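As written, splitting on "," and then unnesting appears to produce one row per listed language and to keep the leading space of later entries, which would explain values such as " French" in the case_when above. The sketch below, on made-up strings, contrasts that with keeping only the first listed language and trimming whitespace. The vector `langs` and the result names are assumptions for illustration.

```r
# Made-up language strings in the same comma-separated style.
langs <- c("English, French", "Italian", "Northern Sami, English")

# Split each string into its individual languages (a list of vectors).
parts <- strsplit(langs, ",")

# Option 1: keep only the first listed language per song.
first_lang <- trimws(vapply(parts, function(p) p[1], character(1)))

# Option 2: keep every listed language, trimmed of stray spaces.
all_langs <- trimws(unlist(parts))

print(first_lang)
print(all_langs)
```

With trimmed values, each language would need to appear only once in the category lists.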
Germany is the only Western European country that appears in two survey years; every other country appears in only one. Serbia is the same case for Southern Europe. In both countries, the mean importance of God in people’s lives increased between the two years.
library(geomtextpath)
## Warning: package 'geomtextpath' was built under R version 4.3.2
ews |>
filter(cntry_AN %in% c("DE", "RS")) |>
group_by(year, cntry_AN) |>
mutate(mean = mean(F063), country = ifelse(cntry_AN == "DE", "Germany", "Serbia")) |>
ggplot(aes(year, mean, col=country, label = country)) +
geom_textpath() +
labs(title = "Belief in God", x = "Year", y = "Importance of God (Mean)")
America appears only for the year 2017; by comparison, its happiness levels are lower than those of both Southern and Western Europe.
ews |>
ggplot(aes(x=A008, col=region)) +
geom_histogram() +
labs(title="Happiness by Region", x="Feeling of Happiness") +
facet_grid(year~region)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#X025A_01
Interquartile range of views on income: all three regions have the same median and first quartile, but the third quartile seems to be higher in the European regions, suggesting more support there for the view that income differences should be larger.
ews |>
ggplot(aes(region, E035)) +
geom_boxplot() +
labs(title = "Views on Income Equality", x = "Region", y = "Thought on Income Difference", caption="On a scale of 1 to 10, 1 means more equal and 10 means income differences need to be larger")
Among immigrants, Americans seem mostly satisfied regardless of education level, while in Europe satisfaction is more varied.
ews |>
filter(G027A == 2) |>
group_by(year, cntry_AN) |>
mutate(satisfaction = mean(A170)) |>
ggplot(aes(X025A_01, satisfaction, color=cntry_AN)) +
geom_point() +
facet_grid(year~region) +
labs(x = "Education Level", y = "Satisfaction with Life", title = "Immigrants Satisfaction")
I created most_gsb, which targets countries that have
won the top three places (first, second, or third) in each year of the
contest. This is done by grouping the data by year, filtering to include
only the top three places, and then re-grouping by country to count the
total number of top-three finishes for each country.
After this grouping and filtering, the data is summarized to count the number of times each country has appeared in the top three places during the specified years. This summary is then arranged in descending order to highlight the countries with the most medals.
The next step involves creating a table visualization using the
flextable package. The flextable function
transforms the most_gsb data into a more visually appealing
and understandable table format. The column names are made more
descriptive: ‘country’ is relabeled as ‘Nation’ and ‘count’ as ‘Number
of Eurovision Medals’. This makes the table easier to interpret for
readers.
Furthermore, an additional header row is added to the table to provide a clear title: “Table of Eurovision Medals by Country 2017-2022”.
The resulting table, ft_gsb2, lists the countries with the most
top-three finishes in the Eurovision Song Contest over the specified
years, making it easy to see which countries have been most successful
in recent contests.
#Visualizing Countries with most Gold, Silver, and bronze medals
most_gsb=semiclean_eurovis %>%
group_by(year) %>%
filter(place==1| place ==2| place == 3) %>%
ungroup()%>%
group_by(country)%>%
summarise(count = n()) %>%
arrange(desc(count))
ft_gsb1= flextable(most_gsb)
ft_gsb2 = ft_gsb1 |>
set_header_labels(country = "Nation", count = "Number of Eurovision Medals") |>
add_header_row(values = "Table of Eurovision Medals by Country 2017-2022", colwidths = 2)
ft_gsb2
Table of Eurovision Medals by Country 2017-2022

| Nation | Number of Eurovision Medals |
|---|---|
| Italy | 2 |
| Austria | 1 |
| Bulgaria | 1 |
| Cyprus | 1 |
| France | 1 |
| Israel | 1 |
| Moldova | 1 |
| Netherlands | 1 |
| Portugal | 1 |
| Russia | 1 |
| Switzerland | 1 |
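The top-three count can also be written more compactly. The following is a minimal sketch on made-up data (the countries "A", "B", "C" and their placements are hypothetical), using `%in%` for the place filter and `count()` in place of the group, summarise, and arrange steps.

```r
library(dplyr)

# Hypothetical results, for illustration only.
demo_results <- data.frame(
  country = c("A", "A", "B", "C", "A"),
  year    = c(2019, 2021, 2021, 2021, 2022),
  place   = c(2, 1, 2, 3, 6)
)

# Keep podium finishes, then count them per country, largest first.
demo_medals <- demo_results |>
  filter(place %in% 1:3) |>
  count(country, sort = TRUE, name = "count")

print(demo_medals)
```

The filter does not depend on the year grouping, since a place between 1 and 3 is a podium finish in any year.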
After extracting the primary language, the data is regrouped by this first language and summarized to count the number of songs performed in each language. This summarization helps us see the frequency of each language’s use in the contest. The results are arranged in descending order to highlight the most common languages.
The flextable package is again used to create a visually
appealing table, langs, representing this language
distribution data. The table’s headers are renamed for clarity, and an
explanatory header row is added, providing context to the table.
In addition to the table, a bar chart, eurolang_hist, is created
with the ggplot2 package to visualize the same language
distribution data.
#Visualizing Language Distributions
eurovis_langs= semiclean_eurovis %>%
group_by(lang) %>%
mutate(first_lang=sapply(strsplit(lang, ", "), `[`, 1)) %>%
ungroup()%>%
group_by(first_lang) %>%
summarise(count=n()) %>%
arrange(desc(count))
total_songs=eurovis_langs %>%
summarise(result=sum(count))
langs=flextable(eurovis_langs)|>
set_header_labels(first_lang = "Song Performed First Language", count = "Number") |>
add_header_row(values = "Table of Eurovision Song First Language", colwidths = 2)
langs
Table of Eurovision Song First Language

| Song Performed First Language | Number |
|---|---|
| English | 155 |
| French | 7 |
| Italian | 6 |
| Spanish | 5 |
| Portuguese | 4 |
| Albanian | 3 |
| Hungarian | 3 |
| Serbian | 3 |
| Slovene | 3 |
| Belarusian | 2 |
| Georgian | 2 |
| Ukrainian | 2 |
| Armenian | 1 |
| Croatian | 1 |
| Danish | 1 |
| English[a] | 1 |
| English[c] | 1 |
| Greek | 1 |
| Icelandic | 1 |
| Montenegrin | 1 |
| Polish | 1 |
| Russian | 1 |
| Serbian[b] | 1 |
eurolang_hist=ggplot(eurovis_langs, aes(x = first_lang, y = count)) +
geom_bar(stat = "identity", fill = "blue") +
theme_minimal() +
labs(x = "Language", y = "Count", title = "Histogram of Eurovision Song First Language 2017-2022") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
eurolang_hist
This step calculates the total points each language has garnered in top positions across the specified years.
The results are then visualized in two ways. First, a table named
lang_pt_tbl is created using the flextable
package. Second, a histogram is generated using the ggplot2
package.
#Determining Winning Languages pts
lang_pts=semiclean_eurovis %>%
group_by(year) %>%
filter(place==1| place ==2| place == 3) %>%
ungroup()%>%
group_by(lang) %>%
mutate(first_lang=sapply(strsplit(lang, ", "), `[`, 1)) %>%
ungroup()%>%
group_by(first_lang) %>%
summarise(agg_pts=sum(pts, na.rm= TRUE))
lang_pt_tbl=flextable(lang_pts)|>
set_header_labels(first_lang = "Song Performed First Language", agg_pts = "Aggregated Points") |>
add_header_row(values = "Table of Eurovision 1st, 2nd, or 3rd Place Points by Language 2017-2022", colwidths = 2)
lang_pt_tbl
Table of Eurovision 1st, 2nd, or 3rd Place Points by Language 2017-2022

| Song Performed First Language | Aggregated Points |
|---|---|
| English | 3,164 |
| French | 931 |
| Italian | 996 |
| Portuguese | 758 |
lang_pts_hist=ggplot(lang_pts, aes(x = first_lang, y = agg_pts)) +
geom_bar(stat = "identity", fill = "blue") +
theme_minimal() +
labs(x = "Language", y = "Aggregated Points", title = "Histogram of Eurovision 1st, 2nd, or 3rd Place Points by Language 2017-2022") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
lang_pts_hist
The dataset is grouped by both year and the first language, to observe the distribution of languages across different years. A summary is created for each language and year, counting the number of times each language appeared in the top three positions.
#Visualizing time distribution of Languages
lang_time=semiclean_eurovis %>%
group_by(year) %>%
filter(place==1| place ==2| place == 3) %>%
ungroup()%>%
group_by(lang) %>%
mutate(first_lang=sapply(strsplit(lang, ", "), `[`, 1)) %>%
ungroup()%>%
group_by(year, first_lang) %>%
summarise(count = n(), .groups = 'drop') %>%
arrange(year, desc(count))
lang_time_tbl=flextable(lang_time)|>
set_header_labels(year="Year", first_lang = "Song Performed First Language", count = "Count") |>
add_header_row(values = "Table of Eurovision 1st, 2nd, or 3rd Place Finishes by Language and Year 2017-2022", colwidths = 3)
lang_time_tbl
Table of Eurovision 1st, 2nd, or 3rd Place Finishes by Language and Year 2017-2022

| Year | Song Performed First Language | Count |
|---|---|---|
| 2017 | English | 2 |
| 2017 | Portuguese | 1 |
| 2018 | English | 3 |
| 2019 | English | 2 |
| 2019 | Italian | 1 |
| 2021 | French | 2 |
| 2021 | Italian | 1 |
The data is grouped by country. For each country, the total points
(‘pts’) accumulated over the specified years are aggregated. This is done
using the sum function, with the na.rm = TRUE
parameter so that missing values are ignored. A country whose points are
all missing therefore shows a total of 0.
To visualize this data, two methods are used: a table and a histogram.
#Visualizing Point Distributions by Country
eurovis_pts= semiclean_eurovis %>%
group_by(country) %>%
summarise(agg_pts=sum(pts, na.rm= TRUE)) %>%
arrange(desc(agg_pts))
pts_tab=flextable(eurovis_pts)|>
set_header_labels(country = "Country", agg_pts = "Aggregated Eurovision Points") |>
add_header_row(values = "Table of Aggregated Eurovision Points by Country 2017-2022", colwidths = 2)
pts_tab
Table of Aggregated Eurovision Points by Country 2017-2022

| Country | Aggregated Eurovision Points |
|---|---|
| Italy | 1,638 |
| Sweden | 1,061 |
| Bulgaria | 951 |
| Portugal | 950 |
| France | 912 |
| Switzerland | 796 |
| Netherlands | 780 |
| Norway | 708 |
| Cyprus | 707 |
| Moldova | 698 |
| Israel | 696 |
| Iceland | 610 |
| Russia | 574 |
| Australia | 556 |
| Ukraine | 530 |
| Azerbaijan | 487 |
| Czech Republic | 438 |
| Belgium | 437 |
| Austria | 435 |
| Denmark | 423 |
| Lithuania | 401 |
| Germany | 373 |
| Malta | 362 |
| Finland | 347 |
| Albania | 331 |
| Estonia | 321 |
| Greece | 321 |
| North Macedonia | 305 |
| Serbia | 304 |
| Hungary | 293 |
| Romania | 282 |
| United Kingdom | 170 |
| Slovenia | 169 |
| Ireland | 136 |
| Croatia | 128 |
| San Marino | 127 |
| Spain | 126 |
| Belarus | 114 |
| Armenia | 79 |
| Poland | 64 |
| Georgia | 0 |
| Latvia | 0 |
| Montenegro | 0 |
europts_hist=ggplot(eurovis_pts, aes(x = country, y = agg_pts)) +
geom_bar(stat = "identity", fill = "blue") +
theme_minimal() +
labs(x = "Country", y = "Total Points", title = "Histogram of Aggregated Eurovision Points by Country 2017-2022") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
europts_hist
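By default ggplot2 orders a character x axis alphabetically, so the bars in the chart above do not follow the descending order of the table. The sketch below shows one possible way to order bars by value with `reorder()`. The data frame `demo_pts` and its values are made up for illustration, and `geom_col()` is used as the equivalent of `geom_bar(stat = "identity")`.

```r
library(ggplot2)

# Made-up totals, for illustration only.
demo_pts <- data.frame(
  country = c("A", "B", "C"),
  agg_pts = c(120, 450, 300)
)

# reorder() turns country into a factor whose levels are sorted by
# -agg_pts, so the tallest bar is drawn first.
demo_plot <- ggplot(demo_pts, aes(x = reorder(country, -agg_pts), y = agg_pts)) +
  geom_col(fill = "blue") +
  theme_minimal() +
  labs(x = "Country", y = "Total Points") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

demo_plot
```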
We use mutate to rename a series of survey question
codes (like A001, A002, etc.) to more descriptive variable names. This
renaming makes the dataset more interpretable and user-friendly. For
instance, ‘A001’ is renamed to ‘Family’, ‘A002’ to ‘Friends’, and so on,
covering a wide range of topics like leisure, politics, work, religion,
happiness, life satisfaction, and various other aspects related to
personal beliefs, social trust, and environmental attitudes.
After these transformations, the select function is
applied. This function is used to pick out only the relevant variables
from the dataset.
#ews_group <- ews %>% group_by(year,cntry_AN ) %>% summarize(Family = mean(A001#), Friends = mean(A002), Leisure = mean(A003), Politics = mean(A004), Work = #mean(A005), Religion = mean(A006), Happiness = mean(A008), Satisfaction = #mean#(A170), Religious_Children = mean(A040), Organized_Religion = mean(A065), #Cultural_Activities = mean(A066), Political_Party_Affiliation = mean(A068), #Trust_People = mean(A165), Protect_Environment_vs_Economy = mean(B008), #Trust_Family = mean(D001_B), Trust_Religion = mean(G007_35_B), Materialism = #mean(Y002), Future_Change_Less_Work = mean(E015), Income_Inequality = mean#(E035), Business_Ownership = mean(E036), Competition = mean(E039), #Confidence_Church = mean(E069_01), Democracy_Religion = mean(E225), #Deonomnation = mean(F025), Religious_Service_Attendance = mean(F025), Pray = #mean(F028B_WVS7), Pray_Outside_Service = mean(F066_EVS5), Religious = mean#(F034), Belivie_in_God = mean(F050), Belive_in_Afterlife = mean(F051), #Belive_in_Hell = mean(F053), Believe_in_Heaven = mean(F054), #Job_Priority_to_Immigrants = mean(C002), Impact_of_Immigration = mean(G052))
ews_group <- ews %>% mutate(Family =A001, Friends = A002, Leisure = A003, Politics = A004, Work = A005, Religion = A006, Happiness = A008, Satisfaction = A170, Religious_Children = A040, Organized_Religion = A065, Cultural_Activities =A066, Political_Party_Affiliation = A068, Trust_People = A165, Protect_Environment_vs_Economy = B008, Trust_Family = D001_B, Trust_Religion = G007_35_B, Materialism = Y002, Future_Change_Less_Work =E015, Income_Inequality = E035, Business_Ownership = E036, Competition = E039, Confidence_Church = E069_01, Democracy_Religion = E225, Deonomnation = F025, Religious_Service_Attendance = F025, Pray = F028B_WVS7, Pray_Outside_Service = F066_EVS5, Religious = F034, Belivie_in_God = F050, Belive_in_Afterlife = F051, Belive_in_Hell = F053, Believe_in_Heaven = F054, Job_Priority_to_Immigrants = C002, Impact_of_Immigration = G052) %>%
select(cntry_AN, year, Family , Friends , Leisure, Politics , Work, Religion , Happiness , Satisfaction , Religious_Children , Organized_Religion , Cultural_Activities , Political_Party_Affiliation , Trust_People , Protect_Environment_vs_Economy, Trust_Family , Trust_Religion , Materialism , Future_Change_Less_Work , Income_Inequality , Business_Ownership , Competition , Confidence_Church, Democracy_Religion , Deonomnation, Religious_Service_Attendance , Pray , Pray_Outside_Service, Religious , Belivie_in_God , Belive_in_Afterlife , Belive_in_Hell, Believe_in_Heaven, Job_Priority_to_Immigrants , Impact_of_Immigration )
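The copy-then-select pattern above can also be expressed in a single `select()` call, since `select()` accepts `new_name = old_name` pairs. This is a minimal sketch on a made-up data frame with only a few of the survey codes; the `demo` objects are assumptions for illustration.

```r
library(dplyr)

# Made-up rows with a few of the survey codes.
demo <- data.frame(
  cntry_AN = c("DE", "RS"),
  year     = c(2017, 2018),
  A001     = c(1, 2),
  A002     = c(2, 1),
  X001     = c(1, 2)
)

# Keep the identifying columns and rename the survey items in one step;
# columns that are not listed (here X001) are dropped.
demo_named <- demo |>
  select(cntry_AN, year, Family = A001, Friends = A002)

print(names(demo_named))
```

Writing each code next to its descriptive name once, rather than in both a mutate and a select list, may make it easier to spot a code that has been assigned to two names.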
Here, I merge two datasets, one containing Eurovision data and another with country codes.
The purpose of this operation is to combine the Eurovision data with a standardized set of country codes, which is necessary to link it with our other data.
country_key=read.csv(url("https://gist.githubusercontent.com/tadast/8827699/raw/f5cac3d42d16b78348610fc4ec301e9234f82821/countries_codes_and_coordinates.csv"))
country_key=country_key %>%
select(Country, Alpha.2.code)
names(country_key)=c("country","cntry_AN")
eurovis_w_cc=clean_eurovis %>%
right_join(country_key, by="country")
# Checking column names and types in ews_group
#print(names(ews_group))
#str(ews_group)
# Checking column names and types in eurovis_w_cc
#print(names(eurovis_w_cc))
#str(eurovis_w_cc)
ews_group$year = as.integer(ews_group$year)
eurovis_w_cc$year = as.integer(eurovis_w_cc$year)
# Trim leading and trailing spaces in cntry_AN of eurovis_w_cc
eurovis_w_cc$cntry_AN <- trimws(eurovis_w_cc$cntry_AN)
joint_data = inner_join(ews_group, eurovis_w_cc, by = c("cntry_AN", "year"))
## Warning in inner_join(ews_group, eurovis_w_cc, by = c("cntry_AN", "year")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 2696 of `x` matches multiple rows in `y`.
## ℹ Row 38 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
joint_data$language_category=as.integer(joint_data$language_category)
joint_data=joint_data%>%
ungroup()%>%
filter(complete.cases(.)) %>%
select(-cntry_AN)
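The many-to-many warning above suggests that the Eurovision table has more than one row for some country and year, possibly because songs with several languages were unnested into several rows. One possible way to handle this, sketched below on made-up data, is to reduce the Eurovision side to one row per country and year before joining, and to state the expected relationship explicitly. The `relationship` argument requires dplyr 1.1.0 or later; all data values and the `n_langs` column are hypothetical.

```r
library(dplyr)

# Made-up survey respondents (many rows per country and year).
demo_survey <- data.frame(
  cntry_AN = c("AA", "AA", "BB"),
  year     = c(2018L, 2018L, 2018L),
  Family   = c(1, 2, 1)
)

# Made-up contest rows: country "AA" appears twice in 2018 because its
# song was listed with two languages.
demo_songs <- data.frame(
  cntry_AN          = c("AA", "AA", "BB"),
  year              = c(2018L, 2018L, 2018L),
  language_category = c(1, 2, 2),
  pts               = c(100, 100, 50)
)

# Collapse to one row per country and year before joining.
demo_songs_one <- demo_songs |>
  group_by(cntry_AN, year) |>
  summarise(pts = first(pts), n_langs = n(), .groups = "drop")

# Each survey row should now match at most one contest row.
demo_joined <- inner_join(
  demo_survey, demo_songs_one,
  by = c("cntry_AN", "year"),
  relationship = "many-to-one"
)
print(demo_joined)
```

With `relationship = "many-to-one"`, dplyr raises an error instead of silently duplicating survey rows if the contest side still has duplicate keys.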
#Alright, so character variables do not work with k-means clustering, so I have two options: 1) exclude country from the analysis entirely; 2) find the means for each country so that way, each instance of country is unique and can be used as an identifier (I think year will need to be excluded entirely)
joint_data_mean <- joint_data %>% group_by(country) %>% summarize(Family = mean(Family), Friends = mean(Friends), Leisure = mean(Leisure), Politics = mean(Politics), Work = mean(Work), Religion = mean(Religion), Happiness = mean(Happiness), Satisfaction = mean(Satisfaction), Religious_Children = mean(Religious_Children), Organized_Religion = mean(Organized_Religion), Cultural_Activities = mean(Cultural_Activities), Political_Party_Affiliation = mean(Political_Party_Affiliation), Trust_People = mean(Trust_People), Protect_Environment_vs_Economy = mean(Protect_Environment_vs_Economy), Trust_Family = mean(Trust_Family), Trust_Religion = mean(Trust_Religion), Materialism = mean(Materialism), Future_Change_Less_Work = mean(Future_Change_Less_Work), Income_Inequality = mean(Income_Inequality), Business_Ownership = mean(Business_Ownership), Competition = mean(Competition), Confidence_Church = mean(Confidence_Church), Democracy_Religion = mean(Democracy_Religion), Deonomnation = mean(Deonomnation), Religious_Service_Attendance = mean(Religious_Service_Attendance), Pray = mean(Pray), Pray_Outside_Service = mean(Pray_Outside_Service), Religious = mean(Religious), Belivie_in_God = mean(Belivie_in_God), Belive_in_Afterlife = mean(Belive_in_Afterlife), Belive_in_Hell = mean(Belive_in_Hell), Believe_in_Heaven = mean(Believe_in_Heaven), Job_Priority_to_Immigrants = mean(Job_Priority_to_Immigrants), Impact_of_Immigration = mean(Impact_of_Immigration), language_category = mean(language_category), pts = mean(pts), place = mean(place))
joint_data_mean <- joint_data_mean %>% column_to_rownames(var = "country")
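The per-country means above list every variable by hand. A shorter form, sketched here on a made-up data frame, uses `across()` to apply `mean` to every column except `year`; grouping columns are skipped by `across()` automatically. The `demo` data and names are assumptions for illustration.

```r
library(dplyr)

# Made-up rows, for illustration only.
demo <- data.frame(
  country = c("A", "A", "B"),
  year    = c(2017, 2018, 2018),
  Family  = c(1, 2, 3),
  pts     = c(10, 30, 50)
)

# Average every non-grouping column except year, one row per country.
demo_mean <- demo |>
  group_by(country) |>
  summarise(across(-year, mean), .groups = "drop")

print(demo_mean)
```

This form assumes that every remaining column is numeric, as is the case for joint_data after the character country code is removed.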
First, we define a function summary_stats. This function
computes various summary statistics for a given numeric vector,
including sample size, mean, standard deviation, variance, minimum and
maximum values, and the interquartile range.
The function is then applied to several variables
(Friends, Leisure, Work, and
Religion) from the joint_data dataset. For
each variable, the data is grouped by country, and the
summary_stats function is applied to calculate the summary
statistics. The results for each variable are stored in separate data
frames (friends_res, leisure_res,
work_res, religion_res) and are printed out
for review.
Afterwards, we visualize dependent variables like points
(pts) and placement (place) in the Eurovision
contest by country. The first plot shows points by country, the second
shows placement in the Eurovision by country, and the third plot
compares points and placement, using a scatter plot to visualize the
relationship between these variables across different countries.
Afterwards, we conduct a correlation analysis, calculating a
correlation matrix (cor_matrix) for the numeric columns in the
joint_data dataset. Although the code for visualizing the
correlation matrix with corrplot is commented out, the
analysis proceeds to identify the 10 most strongly correlated pairs
of variables, excluding self-correlations. This reveals strong
correlations, such as the one between ‘Belive_in_Hell’ and
‘Believe_in_Heaven’. The perfect correlation between ‘Deonomnation’
and ‘Religious_Service_Attendance’ should be read as an artifact rather
than a finding: both variables are assigned from the same survey item,
F025, when ews_group is built.
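One way to list the most strongly correlated pairs while excluding self-correlations is to read only the upper triangle of the correlation matrix. The sketch below uses simulated data and is not the project's code; the column `c` is deliberately an exact copy of `a` to show that identical columns produce a correlation of exactly 1.

```r
set.seed(1)

# Simulated data; column c duplicates column a on purpose.
demo <- data.frame(a = rnorm(50), b = rnorm(50))
demo$c <- demo$a

demo_cor <- cor(demo)

# Row and column positions of the upper triangle (each pair once,
# diagonal excluded).
idx <- which(upper.tri(demo_cor), arr.ind = TRUE)

demo_pairs <- data.frame(
  var1 = rownames(demo_cor)[idx[, "row"]],
  var2 = colnames(demo_cor)[idx[, "col"]],
  r    = demo_cor[idx]
)

# Sort by absolute correlation and keep at most the top 10 pairs.
demo_pairs <- demo_pairs[order(-abs(demo_pairs$r)), ]
print(head(demo_pairs, 10))
```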
Lastly, the code conducts a principal component analysis (PCA) on the
correlation matrix. PCA is a technique used to reduce the dimensionality
of the data while retaining as much variability as possible. The PCA
results (pca_result) are summarized, and the transformed
scores for the first 35 components are printed.
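For comparison, the sketch below shows two common ways to run a correlation-based PCA in base R: on the raw variables with `prcomp()` and scaling, or from a correlation matrix through the `covmat` argument of `princomp()`. The first argument of `princomp()` is a data matrix, so a correlation matrix passed in that position would be treated as if its rows were observations. The data here are simulated and the object names are assumptions for illustration.

```r
set.seed(42)

# Simulated data with one correlated pair of variables.
demo <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
demo$x3 <- demo$x1 + rnorm(100, sd = 0.5)

# PCA on the raw data, standardizing each variable first.
pca_data <- prcomp(demo, center = TRUE, scale. = TRUE)

# PCA computed from the correlation matrix itself via covmat.
pca_cov <- princomp(covmat = cor(demo))

# Both approaches should give the same component standard deviations.
print(pca_data$sdev)
print(unname(pca_cov$sdev))
```

With standardized variables, the component variances sum to the number of variables, which gives a quick sanity check on the output.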
# Custom function to get summary statistics
summary_stats <- function(x) {
qs <- quantile(x, c(0.25, 0.5, 0.75))
data.frame(
sample_size = length(x),
mean = mean(x),
sd = sd(x),
variance = var(x),
minimum = min(x),
maximum = max(x),
interquartile_range = qs[3] - qs[1]
)
}
friends_res <- joint_data |>
group_by(country) |>
summarize(across(Friends, summary_stats)) |>
as_tibble()
leisure_res <- joint_data |>
group_by(country) |>
summarize(across(Leisure, summary_stats)) |>
as_tibble()
work_res <- joint_data |>
group_by(country) |>
summarize(across(Work, summary_stats)) |>
as_tibble()
religion_res <- joint_data |>
group_by(country) |>
summarize(across(Religion, summary_stats)) |>
as_tibble()
print.data.frame(religion_res)
## country Religion.sample_size Religion.mean Religion.sd
## 1 Albania 767 2.119948 0.9260974
## 2 Austria 771 2.700389 1.0076267
## 3 Croatia 1490 2.306040 0.9670341
## 4 Cyprus 300 1.566667 0.7664946
## 5 France 1001 2.900100 1.0440354
## 6 Germany 1822 2.760703 0.9996337
## 7 Greece 728 1.714286 0.9134268
## 8 Italy 843 2.124555 0.9310695
## 9 Netherlands 1336 2.917665 1.0059535
## 10 Serbia 562 1.955516 0.8250619
## 11 Spain 986 2.626775 1.1123000
## 12 United Kingdom 1325 2.752453 1.0070764
## Religion.variance Religion.minimum Religion.maximum
## 1 0.8576564 1 4
## 2 1.0153115 1 4
## 3 0.9351549 1 4
## 4 0.5875139 1 4
## 5 1.0900100 1 4
## 6 0.9992676 1 4
## 7 0.8343486 1 4
## 8 0.8668904 1 4
## 9 1.0119424 1 4
## 10 0.6807271 1 4
## 11 1.2372113 1 4
## 12 1.0142028 1 4
## Religion.interquartile_range
## 1 2
## 2 2
## 3 1
## 4 1
## 5 2
## 6 2
## 7 1
## 8 2
## 9 2
## 10 1
## 11 2
## 12 2
print.data.frame(work_res)
## country Work.sample_size Work.mean Work.sd Work.variance
## 1 Albania 767 1.190352 0.4486833 0.2013167
## 2 Austria 771 1.664073 0.8042055 0.6467465
## 3 Croatia 1490 1.536913 0.6297258 0.3965546
## 4 Cyprus 300 1.580000 0.8241343 0.6791973
## 5 France 1001 1.497502 0.6973118 0.4862438
## 6 Germany 1822 1.718441 0.8320068 0.6922352
## 7 Greece 728 1.359890 0.6290850 0.3957480
## 8 Italy 843 1.309609 0.5931038 0.3517722
## 9 Netherlands 1336 1.895958 0.8420317 0.7090174
## 10 Serbia 562 1.521352 0.7242228 0.5244987
## 11 Spain 986 1.320487 0.6001067 0.3601281
## 12 United Kingdom 1325 1.898113 1.0250802 1.0507895
## Work.minimum Work.maximum Work.interquartile_range
## 1 1 4 0
## 2 1 4 1
## 3 1 4 1
## 4 1 4 1
## 5 1 4 1
## 6 1 4 1
## 7 1 4 1
## 8 1 4 1
## 9 1 4 1
## 10 1 4 1
## 11 1 4 1
## 12 1 4 1
print.data.frame(friends_res)
## country Friends.sample_size Friends.mean Friends.sd Friends.variance
## 1 Albania 767 1.753585 0.6386462 0.4078690
## 2 Austria 771 1.409857 0.5703554 0.3253053
## 3 Croatia 1490 1.621477 0.5668908 0.3213652
## 4 Cyprus 300 1.480000 0.6252224 0.3909030
## 5 France 1001 1.570430 0.6806502 0.4632847
## 6 Germany 1822 1.439078 0.5685988 0.3233046
## 7 Greece 728 1.465659 0.6371591 0.4059718
## 8 Italy 843 1.634638 0.5740515 0.3295351
## 9 Netherlands 1336 1.502246 0.5648859 0.3190961
## 10 Serbia 562 1.483986 0.5510578 0.3036647
## 11 Spain 986 1.421907 0.5704098 0.3253673
## 12 United Kingdom 1325 1.499623 0.5953304 0.3544183
## Friends.minimum Friends.maximum Friends.interquartile_range
## 1 1 4 1
## 2 1 4 1
## 3 1 4 1
## 4 1 4 1
## 5 1 4 1
## 6 1 4 1
## 7 1 4 1
## 8 1 4 1
## 9 1 4 1
## 10 1 3 1
## 11 1 4 1
## 12 1 4 1
print.data.frame(leisure_res)
## country Leisure.sample_size Leisure.mean Leisure.sd Leisure.variance
## 1 Albania 767 2.029987 0.6385263 0.4077158
## 2 Austria 771 1.556420 0.6244953 0.3899944
## 3 Croatia 1490 1.789262 0.6109743 0.3732896
## 4 Cyprus 300 1.623333 0.6705378 0.4496210
## 5 France 1001 1.772228 0.6663842 0.4440679
## 6 Germany 1822 1.661910 0.6009952 0.3611953
## 7 Greece 728 1.722527 0.6797833 0.4621053
## 8 Italy 843 1.715302 0.6122974 0.3749081
## 9 Netherlands 1336 1.508234 0.5740385 0.3295202
## 10 Serbia 562 1.539146 0.6142090 0.3772527
## 11 Spain 986 1.529412 0.6573652 0.4321290
## 12 United Kingdom 1325 1.607547 0.6468168 0.4183720
## Leisure.minimum Leisure.maximum Leisure.interquartile_range
## 1 1 4 0
## 2 1 4 1
## 3 1 4 1
## 4 1 4 1
## 5 1 4 1
## 6 1 4 1
## 7 1 4 1
## 8 1 4 1
## 9 1 4 1
## 10 1 4 1
## 11 1 4 1
## 12 1 4 1
# Dependent Variables
# Points by Country
joint_data |>
distinct(country, pts, .keep_all = TRUE) |>
ggplot(aes(country, pts)) +
geom_col()
# Placement in Eurovision by Country
joint_data |>
distinct(country, place, .keep_all = TRUE) |>
ggplot(aes(country, place)) +
geom_col()
# Points vs. Placement in Eurovision, colored by country
joint_data |>
distinct(country, place, pts, .keep_all = TRUE) |>
ggplot(aes(x = place, y = pts, color = country)) +
geom_point() +
labs(x = "Place", y = "Points", title = "Points and Placement")
numeric_columns <- sapply(joint_data, is.numeric)
cor_matrix <- cor(joint_data[, numeric_columns])
# The correlation plots are too cluttered for the numbers to be readable, so the lines below are commented out
# corrplot(cor_matrix, method="number")
# corrplot(cor_matrix, method="color")
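# Note: princomp() receives the 38 x 38 correlation matrix as its input data, so each
# row of the scores printed below corresponds to a variable, not to a country
pca_result <- princomp(cor_matrix, cor=TRUE)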
summary(pca_result)
## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Standard deviation 3.4109506 2.1663852 1.79269204 1.58457448 1.29504516
## Proportion of Variance 0.3061733 0.1235059 0.08457223 0.06607569 0.04413532
## Cumulative Proportion 0.3061733 0.4296792 0.51425141 0.58032710 0.62446242
## Comp.6 Comp.7 Comp.8 Comp.9 Comp.10
## Standard deviation 1.20754460 1.11090338 1.04972998 1.01269937 0.94687639
## Proportion of Variance 0.03837274 0.03247648 0.02899824 0.02698842 0.02359408
## Cumulative Proportion 0.66283515 0.69531163 0.72430987 0.75129829 0.77489237
## Comp.11 Comp.12 Comp.13 Comp.14 Comp.15
## Standard deviation 0.93960584 0.8728406 0.85616604 0.79806630 0.77739047
## Proportion of Variance 0.02323314 0.0200487 0.01929001 0.01676078 0.01590358
## Cumulative Proportion 0.79812550 0.8181742 0.83746422 0.85422500 0.87012858
## Comp.16 Comp.17 Comp.18 Comp.19 Comp.20
## Standard deviation 0.74023009 0.73180542 0.66705226 0.65228834 0.63627212
## Proportion of Variance 0.01441949 0.01409314 0.01170944 0.01119684 0.01065374
## Cumulative Proportion 0.88454807 0.89864120 0.91035064 0.92154749 0.93220123
## Comp.21 Comp.22 Comp.23 Comp.24
## Standard deviation 0.60031454 0.572687147 0.559848032 0.542667298
## Proportion of Variance 0.00948362 0.008630804 0.008248153 0.007749679
## Cumulative Proportion 0.94168485 0.950315655 0.958563808 0.966313487
## Comp.25 Comp.26 Comp.27 Comp.28
## Standard deviation 0.519746691 0.485592947 0.442898406 0.42760823
## Proportion of Variance 0.007108858 0.006205277 0.005162079 0.00481181
## Cumulative Proportion 0.973422345 0.979627622 0.984789701 0.98960151
## Comp.29 Comp.30 Comp.31 Comp.32
## Standard deviation 0.355999442 0.298572379 0.258204595 0.240164809
## Proportion of Variance 0.003335147 0.002345933 0.001754463 0.001517872
## Cumulative Proportion 0.992936659 0.995282592 0.997037056 0.998554928
## Comp.33 Comp.34 Comp.35 Comp.36
## Standard deviation 0.1592253410 0.1256070427 0.1117999462 3.582856e-02
## Proportion of Variance 0.0006671766 0.0004151876 0.0003289271 3.378121e-05
## Cumulative Proportion 0.9992221041 0.9996372917 0.9999662188 1.000000e+00
## Comp.37 Comp.38
## Standard deviation 0 0
## Proportion of Variance 0 0
## Cumulative Proportion 1 1
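The rows of the summary above follow directly from the component standard deviations: the proportion of variance of a component is its squared standard deviation divided by the sum of all squared standard deviations. The following is a minimal sketch of that arithmetic, assuming R's built-in USArrests data as a stand-in (it is not part of this project's data); toy_pca, prop_var and cum_var are illustrative names.

```r
# Illustrative only: USArrests is a built-in R dataset, not the survey data
toy_pca <- princomp(USArrests, cor = TRUE)

# Proportion of variance = squared standard deviation / total variance
prop_var <- toy_pca$sdev^2 / sum(toy_pca$sdev^2)
cum_var <- cumsum(prop_var)

# With cor = TRUE the total variance equals the number of variables (4 here)
print(sum(toy_pca$sdev^2))
print(round(rbind(prop_var, cum_var), 4))
```

By the same logic, the squared standard deviations in the summary above add up to 38, the number of variables: for Comp.1, 3.4109506^2 / 38 is approximately 0.306, the reported proportion.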
transformed_scores <- pca_result$scores[, 1:35]
print(transformed_scores)
## Comp.1 Comp.2 Comp.3 Comp.4
## year 1.0608969 0.42412453 4.02515229 1.61097110
## Family -2.6947787 1.38226099 -0.11435288 -2.38719843
## Friends -0.1356264 3.22108601 -0.43917204 -0.53436271
## Leisure 0.8492334 2.55477748 -0.07043226 0.10831693
## Politics 1.5536267 3.32694123 -1.06910667 1.18561231
## Work -2.4874032 -0.46593785 0.29316030 -1.64001547
## Religion -7.2522207 1.48237035 0.44264037 0.74272880
## Happiness 0.6272919 3.86959919 -0.42197801 -1.40441082
## Satisfaction -1.7586534 -4.27336118 0.56996765 1.93529491
## Religious_Children 4.1560602 -1.11546355 -0.78207028 -0.84247161
## Organized_Religion 1.8060070 -3.43334956 1.00509219 -1.24072819
## Cultural_Activities -1.7935872 -3.55127225 0.59536576 -0.55645680
## Political_Party_Affiliation -1.0125809 -1.92039779 0.78144495 -0.72770535
## Trust_People 2.7036990 3.89699293 -0.23035420 0.33693481
## Protect_Environment_vs_Economy 0.5880318 1.63938017 -0.15007618 0.51176638
## Trust_Family -1.4417021 2.45521980 -0.04203925 -2.74522241
## Trust_Religion 2.0785161 3.85166730 0.65778368 -0.18354028
## Materialism -4.0438160 -2.52595023 0.69935836 -0.07182142
## Future_Change_Less_Work 2.3286882 1.26532605 0.09939557 1.67754502
## Income_Inequality -0.5629820 -0.75280870 0.58911230 1.52772742
## Business_Ownership -1.6033639 -0.91890396 -1.23765505 -2.05768774
## Competition -1.8696087 -0.49165574 -0.70803150 -3.17570640
## Confidence_Church -6.3404112 1.46640913 0.25129428 0.72986495
## Democracy_Religion 3.2781475 0.43264836 -1.24788164 0.13364627
## Deonomnation 4.4929774 -0.66051717 -0.99381699 2.19135157
## Religious_Service_Attendance 4.4929774 -0.66051717 -0.99381699 2.19135157
## Pray 0.9776760 0.01952371 3.91814671 -0.29067318
## Pray_Outside_Service -6.0823449 0.35121524 -2.80828518 0.65681921
## Religious -6.8500166 0.94480659 0.35651014 0.48822091
## Belivie_in_God 6.0008684 -1.03568215 -0.47652476 -0.10019224
## Belive_in_Afterlife 3.9327757 -2.29722209 -0.47766219 -1.94501360
## Belive_in_Hell 4.9777367 -1.16869818 -0.85177900 -1.66981008
## Believe_in_Heaven 5.1343965 -1.83147520 -0.70086932 -1.85088655
## Job_Priority_to_Immigrants -3.6414015 -2.94110717 -0.07708842 0.23238267
## Impact_of_Immigration -1.8769926 -3.05503105 -0.92964414 2.60113102
## language_category 1.7417031 1.06125612 0.19371163 4.18838587
## pts 0.2813345 0.15842261 6.01200438 -0.36207749
## place -1.6151542 -0.70467682 -5.66750360 0.73592905
## Comp.5 Comp.6 Comp.7
## year 0.06620205 0.236335064 1.7122292019
## Family 1.02570793 -1.407438647 -1.0008375869
## Friends 1.29628981 -3.377231298 0.0240046557
## Leisure 1.60073674 -3.311336497 -0.0002122818
## Politics -1.21572126 -1.250969754 1.3124327738
## Work 0.55069943 -1.331850205 0.1402494232
## Religion -0.64021990 0.245470472 0.2561572984
## Happiness 2.42993828 0.996286075 -0.7879166796
## Satisfaction -2.85910374 -1.041454989 0.7466655794
## Religious_Children -0.59759010 0.034790291 0.1686779653
## Organized_Religion 0.48150768 -0.513129882 -1.0760330695
## Cultural_Activities 0.75213513 0.334110456 -1.8993110298
## Political_Party_Affiliation 1.31993743 1.544716937 -2.1779884442
## Trust_People -1.13365041 1.399211082 -0.1414322346
## Protect_Environment_vs_Economy -2.20876070 0.613964883 -0.8699432276
## Trust_Family 0.33211836 -0.054793020 -1.1327292177
## Trust_Religion -0.92053790 1.853034490 -0.2466719417
## Materialism 0.56213904 0.006477605 0.0574224703
## Future_Change_Less_Work -1.67705023 -0.394525134 -0.9128862158
## Income_Inequality -1.82847230 -0.334382717 -3.1676930377
## Business_Ownership 0.61524457 1.429558963 2.8888768690
## Competition 0.03630128 1.337284013 2.0614096184
## Confidence_Church -0.55349864 0.687462894 -0.0902965037
## Democracy_Religion -0.79966513 0.848580762 1.0796095605
## Deonomnation 1.98039026 0.655645592 0.1831961309
## Religious_Service_Attendance 1.98039026 0.655645592 0.1831961309
## Pray 0.98787966 2.135707806 -0.2286923525
## Pray_Outside_Service -1.73897780 -0.897398908 0.3326994116
## Religious -0.40038918 1.059118915 0.3224958043
## Belivie_in_God -0.08770541 -0.508564628 -0.0627213156
## Belive_in_Afterlife -1.14908289 -0.587680939 0.0849576136
## Belive_in_Hell -1.09456364 -0.234825912 0.2677797963
## Believe_in_Heaven -1.21598820 -0.518062030 0.1664918010
## Job_Priority_to_Immigrants 1.11435812 -0.541824660 0.0722669848
## Impact_of_Immigration 1.99428646 -0.706671810 1.3600290860
## language_category 1.60487213 0.268541875 0.0053300526
## pts -0.72726178 -0.745513989 0.9940655100
## place 0.11710457 1.415711255 -0.6248785994
## Comp.8 Comp.9 Comp.10 Comp.11
## year 1.01105350 0.62189310 -0.005225484 0.99270230
## Family -0.23274593 -1.11048831 0.757090039 1.34737829
## Friends -0.23338746 0.59741306 -0.542371995 -1.47873276
## Leisure -0.14857204 1.06472789 -0.975665972 -1.92673840
## Politics -0.41563974 -1.23444661 -0.717007950 0.07858743
## Work -2.49362997 -2.06392417 -1.560264971 1.05159308
## Religion 0.13555711 -0.29127744 -0.430733779 -0.04209463
## Happiness 0.79560036 0.76268109 0.802505748 0.44826859
## Satisfaction -0.83433282 -0.51451721 -1.161419617 -0.39915301
## Religious_Children -0.02061824 0.21338191 -0.331096933 0.12132051
## Organized_Religion -0.90175552 -0.27804103 -0.193263441 0.22131319
## Cultural_Activities -0.68065087 0.26846145 -0.384449982 0.04236521
## Political_Party_Affiliation -0.61479845 3.40701431 -2.011225197 -0.14321219
## Trust_People 0.53441203 -0.08995883 -0.752262017 -0.08548165
## Protect_Environment_vs_Economy -3.27469225 0.97274805 1.523298290 1.36259694
## Trust_Family 1.48993510 -0.68138218 1.126272922 1.71708305
## Trust_Religion 0.11837769 -0.69297283 -1.318412481 -0.48239349
## Materialism 1.57410710 -0.51910934 -0.535463968 0.28918980
## Future_Change_Less_Work -0.36365611 1.91773805 1.234173679 -0.10002984
## Income_Inequality 1.17569064 -1.38345407 2.096667612 -2.83059844
## Business_Ownership -1.43074690 1.18830530 0.919075967 -1.90081742
## Competition -0.91093328 -0.21531114 1.923928258 -1.41494105
## Confidence_Church 0.10011999 -0.19684189 -0.278180701 0.08408680
## Democracy_Religion 1.98999842 0.30807274 -1.109214405 0.03137941
## Deonomnation -0.71055533 -1.09329406 0.493948510 0.24855191
## Religious_Service_Attendance -0.71055533 -1.09329406 0.493948510 0.24855191
## Pray -0.15629849 -1.64855168 -0.844569555 -1.24943568
## Pray_Outside_Service 0.45040793 1.17980338 0.280704015 1.09661243
## Religious 0.43070735 0.01592450 -0.427475275 -0.06100263
## Belivie_in_God 0.04967788 -0.03248554 0.171348098 0.20693737
## Belive_in_Afterlife 0.87576660 0.02654801 -0.053104838 0.44489726
## Belive_in_Hell 0.92905713 0.22358907 -0.188344915 0.28931677
## Believe_in_Heaven 0.64847721 0.06364000 -0.080050151 0.34303450
## Job_Priority_to_Immigrants 0.69771997 -0.42560529 0.916035871 -0.08054823
## Impact_of_Immigration 1.48584985 0.69561173 0.896114257 0.67799097
## language_category -0.94896775 0.16291718 0.537606985 0.68224155
## pts 0.29694596 0.54365236 0.677668143 0.42776617
## place 0.29307468 -0.66916753 -0.950583273 -0.25858602
## Comp.12 Comp.13 Comp.14
## year 0.171201183 1.7125086635 0.13514538
## Family -1.965602023 0.7540548346 -0.16334195
## Friends -0.455977856 0.2909620664 -0.97672510
## Leisure 0.902118723 -0.1375352213 -0.86218836
## Politics 0.062042369 -1.3768579442 0.47389109
## Work 1.096435657 1.7527751144 2.23566708
## Religion 0.196541151 -0.5524369837 0.09582394
## Happiness 1.773038841 -0.7691990488 1.42160474
## Satisfaction -1.503719360 0.1610899005 -0.82449905
## Religious_Children 0.219213470 0.1657810795 -0.15442533
## Organized_Religion 0.017860107 -0.7788609483 0.05994509
## Cultural_Activities -0.558286499 -1.8210139943 0.51007185
## Political_Party_Affiliation -0.713965967 1.3520761805 -0.22273861
## Trust_People -0.495843761 -0.0123351285 0.16891647
## Protect_Environment_vs_Economy 2.122708579 0.7392503236 -1.85654954
## Trust_Family -1.869008889 0.4597464822 -1.10028516
## Trust_Religion -0.617499026 -0.2601877117 -0.43096827
## Materialism 0.745777327 -0.8958857358 -0.25008606
## Future_Change_Less_Work -0.777691567 -1.4273133898 1.03406961
## Income_Inequality 0.250583145 1.7947415110 1.15361216
## Business_Ownership -1.373149402 0.0188063555 1.25810262
## Competition 0.112006233 0.3742674772 -0.75852103
## Confidence_Church -0.003372218 -0.9686279490 0.01618293
## Democracy_Religion 0.366155416 1.5745798788 -0.29932986
## Deonomnation -0.702533581 -0.1181016069 -0.55702232
## Religious_Service_Attendance -0.702533581 -0.1181016069 -0.55702232
## Pray 0.640035292 -0.4543403433 -0.70577706
## Pray_Outside_Service -0.131607666 -0.0016300988 0.53806031
## Religious 0.327025427 -0.1439658256 -0.09942516
## Belivie_in_God -0.039246402 -0.1911382702 0.20338546
## Belive_in_Afterlife 0.581516410 -0.5584724125 0.27936596
## Belive_in_Hell 0.418231795 -0.2932217422 0.22442019
## Believe_in_Heaven 0.404449340 -0.3267107976 0.27725654
## Job_Priority_to_Immigrants 1.275557368 -0.6819327532 -1.41675362
## Impact_of_Immigration 0.430816987 0.8033162239 0.22592090
## language_category -0.636034357 0.0002864778 0.85053170
## pts -0.032777745 -0.3704419255 0.31870231
## place 0.465535080 0.3040688684 -0.24501752
## Comp.15 Comp.16 Comp.17 Comp.18
## year 1.02694609 0.20832389 -0.002928109 0.18598291
## Family -1.36228169 0.15785953 0.409738397 -2.15750758
## Friends 0.12375053 0.11262910 -0.567368504 0.32205478
## Leisure 0.36925587 0.33729939 0.397672900 0.41075011
## Politics 0.58186478 -0.12028953 -0.231389316 -1.77016668
## Work -0.09035245 -0.49010453 0.869332435 1.07825088
## Religion -0.13719043 -0.10724660 -0.257777598 -0.30631540
## Happiness -0.21129745 -0.03710157 -0.417281415 -0.18682628
## Satisfaction -0.44132632 -0.10561451 0.037964381 0.21209198
## Religious_Children -0.10791710 0.06349360 0.532547801 -0.33181614
## Organized_Religion 0.41249026 0.14619292 -0.512131082 0.74323001
## Cultural_Activities 2.92598135 1.46448325 0.869545133 -0.59872570
## Political_Party_Affiliation -0.44445390 -1.30871815 -0.567302111 -0.72038300
## Trust_People 0.09151297 -0.04853936 0.058324661 0.74655381
## Protect_Environment_vs_Economy -0.15506586 1.35961411 -0.478239868 -0.25800349
## Trust_Family 0.50009285 0.71813382 0.306450546 1.57740086
## Trust_Religion 0.30520726 -0.25153606 0.008243622 0.46449980
## Materialism -0.77433448 0.24967877 -1.702070283 0.59097341
## Future_Change_Less_Work -1.24220968 -0.77138049 2.280252018 0.76433454
## Income_Inequality 0.22227965 0.15533029 -0.676645067 -0.25600025
## Business_Ownership -1.07698905 2.12971541 -0.579918602 0.38769135
## Competition 1.71287352 -2.41057153 0.414872012 -0.26259005
## Confidence_Church -0.27245490 -0.36279904 -0.512561060 0.03759238
## Democracy_Religion 0.64421979 1.05822042 1.089708106 -0.50189780
## Deonomnation 0.03652022 -0.54241490 -0.706071082 0.22157975
## Religious_Service_Attendance 0.03652022 -0.54241490 -0.706071082 0.22157975
## Pray -1.12718926 0.46187623 0.919201937 -0.40373400
## Pray_Outside_Service 0.70377390 -0.53591595 -0.794149331 0.37476232
## Religious -0.12165102 -0.06598738 0.223372914 -0.20217059
## Belivie_in_God -0.07023833 -0.01673891 -0.373174117 0.10307084
## Belive_in_Afterlife -0.40333907 -0.28888804 -0.568847489 0.00629040
## Belive_in_Hell -0.37140204 -0.26534857 -0.251444704 -0.38487122
## Believe_in_Heaven -0.36696143 -0.21089389 -0.387491992 -0.19750921
## Job_Priority_to_Immigrants -0.96187374 -0.42042627 1.397944093 0.42156343
## Impact_of_Immigration -0.02038564 0.39091776 0.788272954 -0.51419517
## language_category 0.14752222 -0.18454191 -0.255304528 0.03946574
## pts 0.08532925 -0.18420628 -0.444265233 -0.07544444
## place -0.16722685 0.25790988 0.388988660 0.21843797
## Comp.19 Comp.20 Comp.21
## year 0.2516526727 0.28431756 0.15144248
## Family 1.0638061433 0.32082744 -0.46962811
## Friends 0.8681012125 1.40457937 0.08409526
## Leisure -0.0984556349 -1.17805213 0.18311574
## Politics -1.7482218225 -1.06680336 0.10311335
## Work -0.0895358041 0.02007308 0.56807964
## Religion -0.1044422610 -0.06690287 0.09989933
## Happiness -0.0007837731 0.02739997 -0.44831275
## Satisfaction 0.0464931772 -0.06166470 -0.07413891
## Religious_Children 0.0562464842 -0.47247159 -1.03980349
## Organized_Religion -0.1101664904 -0.65211892 -2.68910006
## Cultural_Activities 0.4443030438 0.70531234 0.83507909
## Political_Party_Affiliation -1.0739224992 -0.35108360 0.41760435
## Trust_People 0.0682337816 0.96819076 -0.28689493
## Protect_Environment_vs_Economy 0.2421338073 -0.02064270 0.24164457
## Trust_Family -1.2339352062 -1.11042860 0.61416371
## Trust_Religion 0.0718103194 1.23442659 -0.75073432
## Materialism 1.6607381772 -1.08859907 0.54366652
## Future_Change_Less_Work 0.9610008869 -0.76075627 0.33892702
## Income_Inequality -0.4350438513 -0.07223882 0.01953620
## Business_Ownership -0.6124542090 0.12229762 0.09039504
## Competition 0.7469736343 -0.45566722 0.02266248
## Confidence_Church -0.1316811263 0.06660683 0.24379390
## Democracy_Religion 0.5639921358 -0.71760775 -0.22949099
## Deonomnation -0.0058610566 -0.17753405 0.44534105
## Religious_Service_Attendance -0.0058610566 -0.17753405 0.44534105
## Pray 0.1844540028 -0.36667467 0.29509836
## Pray_Outside_Service -0.0726882880 0.27054822 -0.44925049
## Religious 0.1694502831 -0.12806374 -0.23063074
## Belivie_in_God -0.1837940991 0.11828791 0.11256608
## Belive_in_Afterlife 0.0136957117 0.49464896 0.64462801
## Belive_in_Hell -0.0351825871 0.50967579 0.63116039
## Believe_in_Heaven -0.0396617957 0.45248835 0.50088702
## Job_Priority_to_Immigrants -1.6225808445 1.44069713 -0.12888835
## Impact_of_Immigration -0.1771314605 0.26674604 -0.40756564
## language_category 0.2567053302 0.21950859 -0.13691955
## pts -0.2564800166 0.07647784 -0.18504644
## place 0.3680930786 -0.07826628 -0.10583589
## Comp.22 Comp.23 Comp.24
## year 0.005845127 0.24802237 6.219493e-01
## Family -0.416437437 -0.46412786 7.971595e-01
## Friends 1.351347924 1.20952879 -8.651877e-01
## Leisure -1.655820476 -1.00953915 5.868632e-01
## Politics 1.307385841 -0.24014538 -5.942843e-05
## Work 0.255083557 -0.05362043 -2.034437e-02
## Religion -0.407857574 0.43830375 -1.062471e-01
## Happiness -0.390950256 0.66099994 -4.989565e-02
## Satisfaction -0.566042754 0.16129617 -2.688641e-01
## Religious_Children 0.107208937 0.36364810 -8.948284e-01
## Organized_Religion 0.242968259 0.26035625 2.040710e-01
## Cultural_Activities 0.033679815 -0.02235911 1.560827e-01
## Political_Party_Affiliation 0.531428093 -0.13757745 3.602027e-02
## Trust_People 0.468302135 -0.80836293 6.578867e-01
## Protect_Environment_vs_Economy 0.175656861 -0.24348567 -1.411751e-01
## Trust_Family 0.092707819 -0.04084287 -7.811029e-01
## Trust_Religion -0.307274994 -1.20041579 -3.458603e-01
## Materialism 1.347876682 -1.10491319 3.547020e-01
## Future_Change_Less_Work 0.545217952 0.27788567 1.439140e-01
## Income_Inequality 0.110332293 -0.04717490 4.802153e-02
## Business_Ownership 0.077852348 -0.10738735 3.040565e-01
## Competition 0.230372736 -0.45506770 -1.412653e-01
## Confidence_Church -0.496736667 0.47606084 -1.868973e-01
## Democracy_Religion 0.448985622 0.75985722 9.594209e-01
## Deonomnation -0.340259200 0.77190764 4.714175e-01
## Religious_Service_Attendance -0.340259200 0.77190764 4.714175e-01
## Pray -0.138148608 0.37869765 -6.713823e-01
## Pray_Outside_Service -0.466009950 0.32966843 4.906652e-01
## Religious -0.544888018 0.62769716 -3.825068e-01
## Belivie_in_God -0.007971578 -0.16143657 1.913157e-01
## Belive_in_Afterlife -0.475084175 -0.02640041 -1.547521e-01
## Belive_in_Hell -0.477843283 -0.01142470 -4.286249e-01
## Believe_in_Heaven -0.443907570 -0.09933958 -2.667951e-01
## Job_Priority_to_Immigrants 0.649629349 -0.19351457 1.194014e+00
## Impact_of_Immigration 0.015651682 -0.85136961 -1.566963e+00
## language_category -0.019221362 -0.97694240 -4.684320e-01
## pts -0.281309970 0.33623449 1.311753e-01
## place -0.221509963 0.18337553 -7.896863e-02
## Comp.25 Comp.26 Comp.27
## year 0.316950292 0.25902686 0.78543981
## Family -0.038558674 -0.02107459 0.02106832
## Friends -0.068768138 0.17348077 0.17459261
## Leisure -0.010121991 -0.19149053 0.03935693
## Politics 0.016729891 0.27815517 0.09702793
## Work 0.089880006 -0.02971067 -0.32148857
## Religion -0.198174901 0.21452270 0.08329326
## Happiness -0.051868834 -0.16073949 -0.35660613
## Satisfaction -0.319190622 -0.25856738 -0.22815443
## Religious_Children 1.342968761 -1.98309390 -0.17088690
## Organized_Religion -0.600747129 0.73903793 0.46787889
## Cultural_Activities 0.088535841 -0.24037805 -0.15524124
## Political_Party_Affiliation -0.050846425 0.08843578 -0.07430415
## Trust_People -1.807200263 -1.34938340 -0.02946637
## Protect_Environment_vs_Economy -0.133019467 0.13935153 -0.16733163
## Trust_Family 0.012718651 0.03227329 0.21504349
## Trust_Religion 1.387600825 1.04655590 -0.82835200
## Materialism 0.485031056 -0.23111863 -0.23518166
## Future_Change_Less_Work 0.152427338 0.48951854 -0.14086963
## Income_Inequality 0.190540278 -0.04571790 -0.06323719
## Business_Ownership 0.234894542 0.04197558 0.03373426
## Competition -0.285090848 0.01498925 0.12812691
## Confidence_Church -0.389717281 -0.15272168 -0.20159509
## Democracy_Religion 0.021819772 0.40501598 -0.24382566
## Deonomnation 0.142661414 -0.05373104 -0.48220577
## Religious_Service_Attendance 0.142661414 -0.05373104 -0.48220577
## Pray -0.533619911 0.14428058 0.28547406
## Pray_Outside_Service 0.265053018 -0.04576509 -0.24356837
## Religious 0.261338861 -0.15119637 0.51847062
## Belivie_in_God -0.505114887 0.21064064 -0.47415805
## Belive_in_Afterlife -0.216385002 0.36096797 0.39459972
## Belive_in_Hell 0.170997046 0.03725063 0.42629481
## Believe_in_Heaven -0.072058582 0.23549285 0.46402689
## Job_Priority_to_Immigrants 0.554425365 -0.22236876 0.21746152
## Impact_of_Immigration -0.859884717 0.33392706 -0.93757494
## language_category 0.409573321 -0.21239891 1.45776559
## pts 0.006760995 -0.18731046 -0.51862105
## place -0.153201012 0.34559889 0.54521894
## Comp.28 Comp.29 Comp.30
## year 0.766024916 1.02193287 0.1818045247
## Family 0.111539274 0.11495394 0.0586428466
## Friends -0.078412426 -0.02070433 0.0162738078
## Leisure 0.195032326 -0.02907846 0.0238970884
## Politics 0.227172821 0.27776317 -0.0905019315
## Work 0.040011548 -0.19926507 -0.0337925886
## Religion 0.049216878 -0.19964833 0.0998094054
## Happiness -0.900835776 0.69900328 -0.6068798585
## Satisfaction -1.000255734 0.64710723 -0.6599282972
## Religious_Children 0.247155645 0.11039550 0.4182817325
## Organized_Religion 0.399956562 -0.21974891 -0.0567769440
## Cultural_Activities 0.047596439 0.04192372 -0.0501290166
## Political_Party_Affiliation 0.024643294 -0.01775389 -0.0009029612
## Trust_People 0.430255376 -0.07151167 -0.1836861091
## Protect_Environment_vs_Economy 0.067253854 -0.06338621 -0.0644841050
## Trust_Family -0.202311175 0.02501834 -0.0685539079
## Trust_Religion 0.058630714 0.02328740 -0.0233636164
## Materialism 0.028302383 0.06220739 -0.1297776737
## Future_Change_Less_Work 0.380587481 0.10850248 -0.0003548520
## Income_Inequality 0.130708831 -0.03735655 -0.0185335961
## Business_Ownership 0.128631733 -0.01075075 0.0274904012
## Competition -0.270244546 0.01822507 -0.0065831641
## Confidence_Church 0.046394354 -0.52893156 0.7732029762
## Democracy_Religion -0.802715311 -0.91669428 -0.0365912507
## Deonomnation 0.435427819 -0.15877464 -0.1374487889
## Religious_Service_Attendance 0.435427819 -0.15877464 -0.1374487889
## Pray -0.147528142 0.18761775 0.1968561951
## Pray_Outside_Service -0.074996574 0.05823524 0.1282105546
## Religious 0.530053284 -0.21351353 -0.4050468472
## Belivie_in_God -0.843414107 0.44347893 0.7576158576
## Belive_in_Afterlife 0.003283352 0.04294207 0.5736007392
## Belive_in_Hell 0.412535572 -0.45390038 -0.5810848243
## Believe_in_Heaven 0.240003629 -0.23873899 -0.3049080048
## Job_Priority_to_Immigrants -0.243924799 -0.06276380 -0.0490172244
## Impact_of_Immigration 0.457564692 -0.08853263 -0.0393463613
## language_category -1.033074726 -0.57104513 0.0647399693
## pts -0.340103680 -0.29925460 0.0804434115
## place 0.044406401 0.67753396 0.2842712024
## Comp.31 Comp.32 Comp.33
## year 0.171943056 0.3037784356 0.053319825
## Family 0.004439022 0.0355232501 -0.022615411
## Friends -0.006359899 -0.0194085902 -0.019545664
## Leisure -0.009825593 0.0201229245 -0.011596788
## Politics -0.080087739 0.0350704146 -0.081404369
## Work -0.003955989 0.0046980868 -0.024078172
## Religion 0.023028507 -0.2674400147 0.764262141
## Happiness -0.105298694 0.3132949083 0.027694374
## Satisfaction -0.108064467 0.3356780970 0.002073140
## Religious_Children -0.011499753 0.0369815437 0.128958772
## Organized_Religion 0.039973082 0.1226735609 0.031973763
## Cultural_Activities -0.005992582 0.0002717217 -0.002260176
## Political_Party_Affiliation -0.037397400 -0.0208849078 0.007991197
## Trust_People -0.131027599 -0.0712979253 0.023776560
## Protect_Environment_vs_Economy -0.040476383 0.0342760451 -0.003005478
## Trust_Family -0.006389729 -0.0007404710 0.017317715
## Trust_Religion -0.072903805 0.0553337974 0.002935538
## Materialism 0.044996678 -0.0087697590 -0.013706748
## Future_Change_Less_Work -0.012199053 0.0356506716 0.021560315
## Income_Inequality -0.022422357 0.0378628890 -0.006864318
## Business_Ownership -0.015384838 0.0256822036 0.003841368
## Competition 0.013415782 0.0483705442 0.016412344
## Confidence_Church 0.397149148 0.9136847427 -0.199740901
## Democracy_Religion -0.115091244 0.1217619091 0.013537388
## Deonomnation -0.110606934 -0.0315774529 0.021904925
## Religious_Service_Attendance -0.110606934 -0.0315774529 0.021904925
## Pray 0.100215578 -0.3358847447 -0.006848585
## Pray_Outside_Service 0.089176359 -0.5072416173 -0.051755727
## Religious -0.304641684 -0.3920170958 -0.511099483
## Belivie_in_God 0.499976167 -0.4276655821 -0.160859643
## Belive_in_Afterlife -1.167042827 0.1269364116 0.025227913
## Belive_in_Hell 0.577951518 0.0541346694 0.046020109
## Believe_in_Heaven 0.382456788 -0.0029955957 0.014574032
## Job_Priority_to_Immigrants 0.030242800 0.0485926088 0.007019340
## Impact_of_Immigration -0.068792654 0.0290055083 0.008026989
## language_category -0.047632790 -0.0276534711 0.010314992
## pts 0.028035189 -0.4066965399 -0.107341216
## place 0.190701275 -0.1875337233 -0.047924989
## Comp.34 Comp.35
## year 0.0281091717 0.025197552
## Family -0.0100992471 0.002241341
## Friends -0.0083532190 -0.010629342
## Leisure -0.0102997345 -0.004324040
## Politics -0.0365663481 -0.018730530
## Work 0.0013371476 0.011279216
## Religion 0.2143601701 0.100320683
## Happiness 0.0147473200 -0.006015231
## Satisfaction 0.0099633022 0.017353030
## Religious_Children 0.0362574216 -0.011310107
## Organized_Religion -0.0050409323 0.039981493
## Cultural_Activities 0.0014700044 -0.002742558
## Political_Party_Affiliation -0.0003920068 -0.010693623
## Trust_People 0.0020134560 -0.002261627
## Protect_Environment_vs_Economy 0.0011114971 0.009152273
## Trust_Family 0.0001354260 0.001234551
## Trust_Religion 0.0179922047 -0.004517645
## Materialism 0.0071993781 0.005806982
## Future_Change_Less_Work 0.0098522275 0.003025221
## Income_Inequality 0.0007658728 0.003499097
## Business_Ownership -0.0055749331 0.003453457
## Competition 0.0047571597 0.006733967
## Confidence_Church -0.0133169757 -0.020629482
## Democracy_Religion 0.0157281698 -0.011473200
## Deonomnation 0.0020058151 -0.012560729
## Religious_Service_Attendance 0.0020058151 -0.012560729
## Pray -0.2344259011 -0.113514809
## Pray_Outside_Service -0.2884388139 -0.138540187
## Religious 0.3163031437 0.151700639
## Belivie_in_God 0.2989034927 0.233020851
## Belive_in_Afterlife -0.0585540891 0.058693207
## Belive_in_Hell -0.2682582653 0.354810161
## Believe_in_Heaven 0.3003262051 -0.454740246
## Job_Priority_to_Immigrants 0.0178507267 0.007594477
## Impact_of_Immigration -0.0084899764 -0.017776085
## language_category -0.0258532323 -0.014153838
## pts -0.1704557464 -0.093638776
## place -0.1590757067 -0.074285414
# Blank out the diagonal: every variable correlates perfectly with itself
diag(cor_matrix) <- NA
# Flatten the upper triangle of the correlation matrix (excluding diagonal)
upper_triangle <- cor_matrix[upper.tri(cor_matrix, diag = FALSE)]
# Find the indices of the 10 largest (most positive) correlations
top_indices <- order(upper_triangle, decreasing = TRUE)[1:10]
# Extract the 10 most positively correlated pairs
top_correlated_pairs <- data.frame(
variable1 = rownames(cor_matrix)[row(cor_matrix)[upper.tri(cor_matrix, diag = FALSE)][top_indices]],
variable2 = colnames(cor_matrix)[col(cor_matrix)[upper.tri(cor_matrix, diag = FALSE)][top_indices]],
correlation = upper_triangle[top_indices]
)
# Denomination & Religious_Service_Attendance have perfect correlation
# Belive_in_Hell & Believe_in_Heaven have the second strongest correlation at 0.7733818
print(top_correlated_pairs)
## variable1 variable2 correlation
## 1 Deonomnation Religious_Service_Attendance 1.0000000
## 2 Belive_in_Hell Believe_in_Heaven 0.7733818
## 3 Belive_in_Afterlife Believe_in_Heaven 0.6324739
## 4 Religion Religious 0.6251510
## 5 Religion Confidence_Church 0.6119951
## 6 Belive_in_Afterlife Belive_in_Hell 0.5509635
## 7 Belivie_in_God Believe_in_Heaven 0.5120930
## 8 Confidence_Church Religious 0.5007874
## 9 year pts 0.4747207
## 10 Religion Pray_Outside_Service 0.4641389
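The extraction above sorts the correlations in decreasing order, so it surfaces only the largest positive values. As a hedged alternative sketch, the code below ranks pairs by absolute correlation so that strong negative relationships would be listed as well. It runs on R's built-in mtcars data rather than joint_data, and cor_toy, idx and pairs_toy are hypothetical names introduced for illustration.

```r
# Illustrative only: a small correlation matrix from the built-in mtcars data
cor_toy <- cor(mtcars[, c("mpg", "wt", "hp", "qsec")])

# Row/column positions of every cell in the upper triangle (diagonal excluded)
idx <- which(upper.tri(cor_toy), arr.ind = TRUE)

pairs_toy <- data.frame(
  variable1 = rownames(cor_toy)[idx[, "row"]],
  variable2 = colnames(cor_toy)[idx[, "col"]],
  correlation = cor_toy[idx]
)

# Rank by absolute value so that negative correlations are not pushed to the bottom
pairs_toy <- pairs_toy[order(abs(pairs_toy$correlation), decreasing = TRUE), ]
print(head(pairs_toy, 3))
```

Using which(..., arr.ind = TRUE) also keeps the row and column lookups in one place instead of repeating the upper.tri() mask for each column of the data frame.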
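Standardization:
Each variable is centered on its mean \(\mu\) and divided by its standard deviation \(\sigma\), which is what scale() does column by column: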
\[ X_{\text{standardized}} = \frac{X - \mu}{\sigma} \]
#stand_jointdf = joint_data %>%
#mutate(across(-c(year, country), ~ (scale(.))))
stand_jointdf = joint_data_mean %>% scale()
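To make the standardization formula concrete, here is a minimal sketch that applies it by hand to a small made-up matrix and compares the result with scale(). The matrix toy and the names manual and scaled are assumptions for illustration only and do not come from the project data.

```r
# Made-up 4 x 2 matrix used purely for illustration
toy <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), ncol = 2,
              dimnames = list(NULL, c("a", "b")))

# (X - mu) / sigma, computed column by column
centered <- sweep(toy, 2, colMeans(toy), "-")
manual <- sweep(centered, 2, apply(toy, 2, sd), "/")

# scale() centers and scales each column by default
scaled <- scale(toy)

# The two results agree up to the extra attributes that scale() attaches
print(all.equal(manual, scaled, check.attributes = FALSE))
print(round(colMeans(scaled), 10))
print(apply(scaled, 2, sd))
```

After standardization every column has mean 0 and standard deviation 1, so variables measured on different scales contribute equally to the distances used by k-means.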
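The step that would remove the ‘country’ column from
stand_jointdf to create a modified version
(stand_jointdf_mod) is commented out in the code: the
countries are already stored as row names, so every column of
stand_jointdf is numeric.
K-means clustering is performed on stand_jointdf with
two centers (or clusters) specified. The algorithm is started from 25
random configurations (nstart = 25) and the run with the lowest total
within-cluster sum of squares is kept, which makes the result more stable.
The structure of k_means_results is displayed using the str
function, and key information like the centroids of the clusters
(centers) and the size of each cluster (size)
is printed.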
Next, we want to figure out the optimal number of clusters (k value)
for the dataset. We made a function, wss, to calculate the
total within-cluster sum of squares (WSS) for different values of k. The
WSS measures the compactness of the clusters: the lower it is, the
closer the points lie to their cluster centroids. Because WSS keeps
decreasing as k grows, the lowest value alone cannot be used to pick k.
Instead, the function is applied to a range of k values from 1
to 10, and the resulting WSS values are plotted against the number of
clusters. This plot helps in identifying the “elbow point,” the value of
k beyond which adding clusters yields only small reductions in WSS.
Based on the bend in the graph, 4 appears to be a better choice for the
number of clusters than 2.
Then, k-means clustering is performed again on the
stand_jointdf dataset, this time with 4 clusters. The final
clustering result (final) is printed and visualized using
fviz_cluster.
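The WSS described above can be recomputed by hand from a fitted model, which makes clear what kmeans() reports in tot.withinss. The sketch below is only an illustration under stated assumptions: it uses R's built-in iris measurements instead of stand_jointdf, and toy, fit and manual_wss are hypothetical names.

```r
# Illustrative only: standardized iris measurements stand in for stand_jointdf
set.seed(1)
toy <- scale(iris[, 1:4])
fit <- kmeans(toy, centers = 3, nstart = 25)

# WSS: squared distance of every point to the centroid of its own cluster
manual_wss <- sum(sapply(seq_len(3), function(j) {
  members <- toy[fit$cluster == j, , drop = FALSE]
  sum(sweep(members, 2, fit$centers[j, ], "-")^2)
}))

# The manual value matches what kmeans() stores
print(all.equal(manual_wss, fit$tot.withinss))

# Total sum of squares splits into within-cluster and between-cluster parts
print(all.equal(fit$totss, fit$tot.withinss + fit$betweenss))

# For standardized data, totss = (rows - 1) * columns
print(all.equal(fit$totss, (nrow(toy) - 1) * ncol(toy)))
```

The last identity also holds for the data used here: with 12 countries and 37 standardized variables, (12 - 1) x 37 = 407, which is the totss value reported by str(k_means_results).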
#stand_jointdf_mod <- stand_jointdf %>% select(-c(country))
#stand_jointdf_mod <- stand_jointdf
k_means_results <- kmeans(stand_jointdf, centers = 2, nstart = 25)
str(k_means_results)
## List of 9
## $ cluster : Named int [1:12] 1 2 1 1 2 2 1 1 2 1 ...
## ..- attr(*, "names")= chr [1:12] "Albania" "Austria" "Croatia" "Cyprus" ...
## $ centers : num [1:2, 1:37] -0.678 0.678 0.485 -0.485 0.441 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:2] "1" "2"
## .. ..$ : chr [1:37] "Family" "Friends" "Leisure" "Politics" ...
## $ totss : num 407
## $ withinss : num [1:2] 158.6 96.2
## $ tot.withinss: num 255
## $ betweenss : num 152
## $ size : int [1:2] 6 6
## $ iter : int 1
## $ ifault : int 0
## - attr(*, "class")= chr "kmeans"
print(k_means_results$centers) # Cluster centroids
## Family Friends Leisure Politics Work Religion Happiness
## 1 -0.6783419 0.4853335 0.4411384 0.7205388 -0.5517621 -0.8638409 0.7684754
## 2 0.6783419 -0.4853335 -0.4411384 -0.7205388 0.5517621 0.8638409 -0.7684754
## Satisfaction Religious_Children Organized_Religion Cultural_Activities
## 1 -0.6343732 0.7298506 -0.3474591 -0.5009127
## 2 0.6343732 -0.7298506 0.3474591 0.5009127
## Political_Party_Affiliation Trust_People Protect_Environment_vs_Economy
## 1 0.2226048 0.8102748 0.2578536
## 2 -0.2226048 -0.8102748 -0.2578536
## Trust_Family Trust_Religion Materialism Future_Change_Less_Work
## 1 -0.4024436 0.6810859 -0.7633451 0.7965517
## 2 0.4024436 -0.6810859 0.7633451 -0.7965517
## Income_Inequality Business_Ownership Competition Confidence_Church
## 1 -0.2935565 -0.3751764 -0.4524243 -0.785582
## 2 0.2935565 0.3751764 0.4524243 0.785582
## Democracy_Religion Deonomnation Religious_Service_Attendance Pray
## 1 0.4662486 0.5106538 0.5106538 0.2697225
## 2 -0.4662486 -0.5106538 -0.5106538 -0.2697225
## Pray_Outside_Service Religious Belivie_in_God Belive_in_Afterlife
## 1 -0.4990746 -0.8847768 0.8585003 0.3069426
## 2 0.4990746 0.8847768 -0.8585003 -0.3069426
## Belive_in_Hell Believe_in_Heaven Job_Priority_to_Immigrants
## 1 0.7776405 0.6800779 -0.8632198
## 2 -0.7776405 -0.6800779 0.8632198
## Impact_of_Immigration language_category pts place
## 1 -0.226156 0.3137419 0.02569552 -0.1414246
## 2 0.226156 -0.3137419 -0.02569552 0.1414246
print(k_means_results$size) # Size of each cluster
## [1] 6 6
#Now I will use a function to calculate the optimal k value for this analysis
set.seed(123)
wss <- function(k) {
kmeans(stand_jointdf, k, nstart = 25)$tot.withinss
}
k_values <- 1:10
wss_values <- map_dbl(k_values, wss)
plot(k_values, wss_values,
type="b", pch = 19, frame = FALSE,
xlab="Number of clusters K",
ylab="Total within-clusters sum of squares")
## Based on the "bend" in the graph, 4 may be a more optimal choice for our k
final <- kmeans(stand_jointdf, centers = 4, nstart = 25)
print(final)
## K-means clustering with 4 clusters of sizes 1, 6, 2, 3
##
## Cluster means:
## Family Friends Leisure Politics Work Religion Happiness
## 1 -1.4331419 2.2471822 2.4226562 1.9701819 -1.5517114 -0.5330377 1.4641209
## 2 0.6783419 -0.4853335 -0.4411384 -0.7205388 0.5517621 0.8638409 -0.7684754
## 3 -0.1005815 1.0209682 0.5470994 0.3301241 -0.5211973 -0.3301244 0.4463967
## 4 -0.8119154 -0.4590392 -0.2900081 0.5642675 -0.2388223 -1.3299197 0.7513127
## Satisfaction Religious_Children Organized_Religion Cultural_Activities
## 1 -0.4454065 -0.1261935 -1.1847809 -1.3705191
## 2 0.6343732 -0.7298506 0.3474591 0.5009127
## 3 0.1907027 0.3104482 -0.2150797 -0.4888034
## 4 -1.2474126 1.2948002 -0.1566049 -0.2191168
## Political_Party_Affiliation Trust_People Protect_Environment_vs_Economy
## 1 -0.9652418 1.3437286 0.7305360
## 2 -0.2226048 -0.8102748 -0.2578536
## 3 -0.4102023 0.2974441 -0.9122281
## 4 1.0404250 0.9743440 0.8803473
## Trust_Family Trust_Religion Materialism Future_Change_Less_Work
## 1 -1.4251807 0.4573337 -0.87312368 2.1725935
## 2 0.4024436 -0.6810859 0.76334510 -0.7965517
## 3 0.5241569 -0.2478594 0.05991546 0.4108790
## 4 -0.6792650 1.3749668 -1.27559262 0.5949862
## Income_Inequality Business_Ownership Competition Confidence_Church
## 1 1.29076059 -1.3221952 -2.08048262 -0.39677438
## 2 0.29355649 0.3751764 0.45242430 0.78558201
## 3 -0.04521454 0.3578836 -0.24766704 -0.03554635
## 4 -0.98722350 -0.5482101 -0.04624304 -1.41520833
## Democracy_Religion Deonomnation Religious_Service_Attendance Pray
## 1 0.2808224 2.1490414 2.1490414 -0.5416790
## 2 -0.4662486 -0.5106538 -0.5106538 -0.2697225
## 3 0.3546402 -0.7701735 -0.7701735 -0.5416790
## 4 0.6024629 0.8184094 0.8184094 1.0811241
## Pray_Outside_Service Religious Belivie_in_God Belive_in_Afterlife
## 1 -0.1164480 -0.8528834 1.2609504 -1.8470706
## 2 0.4990746 0.8847768 -0.8585003 -0.3069426
## 3 0.2685401 -0.7509570 0.5256994 1.0779286
## 4 -1.1383599 -0.9846212 0.9462175 0.5109563
## Belive_in_Hell Believe_in_Heaven Job_Priority_to_Immigrants
## 1 -0.4241039 -0.8946086 -1.1406432
## 2 -0.7776405 -0.6800779 0.8632198
## 3 0.9798777 1.1191024 -0.5269143
## 4 1.0433972 0.9122904 -0.9949491
## Impact_of_Immigration language_category pts place
## 1 2.1582902 2.7452420 0.34243312 -0.4820080
## 2 0.2261560 -0.3137419 -0.02569552 0.1414246
## 3 -0.5157146 -0.7843548 0.69170053 -0.7739366
## 4 -0.8279324 0.2353065 -0.52388702 0.3937778
##
## Clustering vector:
## Albania Austria Croatia Cyprus France
## 1 2 3 4 2
## Germany Greece Italy Netherlands Serbia
## 2 4 3 2 4
## Spain United Kingdom
## 2 2
##
## Within cluster sum of squares by cluster:
## [1] 0.00000 96.21666 11.99057 40.41851
## (between_SS / total_SS = 63.5 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
fviz_cluster(final, data = stand_jointdf)
With that, we are left with four clusters. According to the clustering vector printed above, cluster 1 contains only Albania; cluster 2 contains Austria, France, Germany, the Netherlands, Spain, and the United Kingdom; cluster 3 contains Croatia and Italy; and cluster 4 contains Cyprus, Greece, and Serbia.
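Because k-means starts from random centers, the cluster numbers are arbitrary labels and can change between runs unless the seed is set immediately before the call. The sketch below, again on hypothetical toy data with made-up row names, shows one way to keep the labels stable and to read cluster membership directly from the fitted object rather than transcribing it by hand.

```r
# Toy data with hypothetical row names standing in for the country labels
set.seed(42)
toy <- scale(matrix(rnorm(12 * 5), nrow = 12, ncol = 5))
rownames(toy) <- paste0("country_", 1:12)

# Fix the seed directly before each call so the runs are reproducible
set.seed(123)
run_a <- kmeans(toy, centers = 4, nstart = 25)
set.seed(123)
run_b <- kmeans(toy, centers = 4, nstart = 25)

# Same seed, same data, same call: the cluster labels match exactly
print(identical(run_a$cluster, run_b$cluster))

# List the members of each cluster straight from the result
members <- split(names(run_a$cluster), run_a$cluster)
print(members)
```

Reading membership from split() avoids mismatches between the written description and the cluster numbers in the printed output.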
joint_data_cluster <- joint_data_mean %>% mutate(Cluster = final$cluster) %>% group_by(Cluster) %>% summarize(pts = mean(pts), place = mean(place))
flextable(joint_data_cluster)
| Cluster | pts | place |
|---|---|---|
| 1 | 184.00000 | 11.00000 |
| 2 | 148.16392 | 15.27113 |
| 3 | 218.00000 | 9.00000 |
| 4 | 99.66667 | 17.00000 |
joint_data_cluster %>% ggplot(aes(Cluster, pts)) + geom_bar(stat = "identity", fill = "blue") + ggtitle("Mean Points Earned by Cluster") + ylab("Points")
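Based on this descriptive cluster analysis, the cluster containing Croatia and Italy (cluster 3 in the output above) earned the highest mean number of points (218) and the best mean placement (9). With only twelve countries, this is a descriptive pattern rather than evidence that membership in this cluster makes a higher placement more likely.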
We successfully merged and analyzed European survey data with Eurovision contest outcomes, offering an interesting perspective on how cultural and social values may influence, or be reflected in, a popular international event. K-means clustering segmented the countries into distinct groups based on their sociocultural attributes and Eurovision contest performances.
The analysis highlighted intriguing trends, such as the unique position of countries like Croatia and Italy in securing higher points in Eurovision, possibly linked to their distinct cultural values.
This project lays the groundwork for several interesting avenues of research. One potential area is the exploration of longitudinal changes in cultural values and their impact on Eurovision outcomes over a more extended period. Additionally, integrating individual-level survey responses or detailed Eurovision audience voting patterns could offer deeper insights.
Another interesting direction could be the application of more sophisticated machine learning techniques, like hierarchical clustering or neural networks, to better capture the nuances of cultural influence.
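As a rough idea of what the hierarchical option could look like, here is a minimal sketch using base R's dist, hclust, and cutree on hypothetical toy data. The Ward linkage and the choice of four groups are assumptions made for illustration, not results from our data.

```r
# Toy data with hypothetical row names standing in for the standardized country data
set.seed(42)
toy <- scale(matrix(rnorm(12 * 5), nrow = 12, ncol = 5))
rownames(toy) <- paste0("country_", 1:12)

# Pairwise Euclidean distances between observations
d <- dist(toy, method = "euclidean")

# Agglomerative clustering with Ward's criterion (an assumed choice)
hc <- hclust(d, method = "ward.D2")

# Cut the tree into four groups, mirroring the k used for k-means
groups <- cutree(hc, k = 4)
print(table(groups))

# plot(hc) would draw the dendrogram for visual inspection
```

Unlike k-means, this approach does not depend on random starting centers, and the dendrogram shows how groups merge at every level rather than for a single k.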
We could also have run k-means again after reducing the dimensionality with PCA and compared the results with the clustering on the full set of variables.
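A minimal sketch of that comparison is shown below. This step may be worthwhile because our data has far more variables (37) than observations (12). The toy matrix, the 80% variance threshold, and the variable names are all assumptions for illustration.

```r
# Toy data standing in for the country-level variables (hypothetical values)
set.seed(42)
toy <- matrix(rnorm(12 * 8), nrow = 12, ncol = 8)

# PCA on centered and scaled variables
pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Keep the fewest components that reach 80% cumulative variance (assumed threshold)
n_comp <- which(cumsum(var_explained) >= 0.80)[1]
scores <- pca$x[, seq_len(n_comp), drop = FALSE]

# k-means on the reduced scores and on the full standardized data
set.seed(123)
fit_pca <- kmeans(scores, centers = 4, nstart = 25)
set.seed(123)
fit_raw <- kmeans(scale(toy), centers = 4, nstart = 25)

# Cross-tabulate the two assignments; label numbers may differ between fits
comparison <- table(pca = fit_pca$cluster, raw = fit_raw$cluster)
print(comparison)
```

Since cluster numbers are arbitrary, agreement shows up in the cross-tabulation as each row having most of its count in a single column, not necessarily on the diagonal.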
https://uc-r.github.io/kmeans_clustering#:~:text=clustering%20algorithms%20%26%20visualization-,Data%20Preparation,scaled)%20to%20make%20variables%20comparable.