For this project, we run a cluster analysis of universities for students who want to enter a creative profession. The results let students see which sets of characteristics creative universities typically share, prioritize those characteristics, and choose a university from the cluster that suits them best.
Many characteristics are available, but we focus on those that matter for future artists. On the data side, we drop variables with unique values and those with very imbalanced categories. We also exclude all variables related to ranks and overall scores, because we do not need summary statistics.
The variables we keep describe features that matter for art universities: good teaching, a strong reputation both in industry and abroad, and enough staff to give each student individual attention (in creative fields, detailed feedback is very important). We also take into account whether the university's link is available and whether the share of women is higher than the share of men.
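The filtering rule above can be sketched as follows. This is only an illustration in plain Python: the column names (`name`, `closed`, `has_link`), the toy rows and the 95% imbalance threshold are assumptions, not values taken from our data, and "unique values" is read here as "every row has a different value".

```python
from collections import Counter

def columns_to_drop(rows, categorical, max_share=0.95):
    """Flag categorical columns that carry little information for clustering.

    A column is flagged when every row has a different value (identifier-like)
    or when a single category covers more than `max_share` of the rows.
    """
    flagged = []
    n = len(rows)
    for col in categorical:
        counts = Counter(row[col] for row in rows)
        top_share = counts.most_common(1)[0][1] / n
        if len(counts) == n or top_share > max_share:
            flagged.append(col)
    return flagged

# Hypothetical toy rows, not the real dataset.
rows = [
    {"name": "Uni A", "closed": False, "has_link": True},
    {"name": "Uni B", "closed": False, "has_link": False},
    {"name": "Uni C", "closed": False, "has_link": True},
    {"name": "Uni D", "closed": False, "has_link": False},
]
dropped = columns_to_drop(rows, ["name", "closed", "has_link"])
print(dropped)
```

Here `name` is flagged because every value is different, `closed` because a single category covers all rows, and `has_link` is kept because its two categories are balanced.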
First, we want to see the location of creative universities.
As the map shows, a large number of art universities are located in South America and Australia. Although Eurasia does not seem to contain many of them, it includes the country with the highest number of art universities. Let us take a closer look.
The graph shows that China has the greatest number of art universities.
Next, we focus on the major features that characterize art universities. The first one is industry income.
The distribution is skewed to the right: the median lies below the mean, so at least half of the universities have a below-average industry income, and we should be careful when choosing an art university. Still, let us look at the universities with the highest industry income (in case students want to choose the very best).
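To see why a right-skewed distribution puts at least half of the universities below the average, here is a small sketch with made-up scores (they are not taken from our data). One very high value pulls the mean up, while the median stays with the bulk of the observations.

```python
from statistics import mean, median

# Made-up industry income scores with one very high value (right skew).
scores = [34.0, 35.5, 36.0, 38.2, 40.1, 45.0, 99.9]

avg = mean(scores)
med = median(scores)
# Share of universities whose score is below the average.
share_below = sum(s < avg for s in scores) / len(scores)

print(round(avg, 2), med, round(share_below, 2))
```

In this toy example the median is lower than the mean, and most of the scores fall below the average.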
## [1] "Tsinghua University Qinghua University Tsing hua University Qing hua University"
## [2] "LMU Munich Ludwig-Maximilians-Universitat Munchen"
## [3] "Zhejiang University ZJU Zhejiang University China research"
## [4] "Korea Advanced Institute of Science and Technology (KAIST)"
## [5] "Shanghai Jiao Tong University SJTU"
## [6] "University of the Witwatersrand Wits"
## [7] "Makerere University"
## [8] "National Cheng Kung University (NCKU)"
## [9] "Asia University, Taiwan"
You can save this list of top universities as candidates. Interestingly, the top university here is located in China.
The second important feature is the teaching score, since everyone wants to be sure of the quality of education.
The distribution is again skewed to the right, so the teaching score of at least half of the art universities is below average. Again, we want to know the best universities in this field.
## [1] "University of Oxford"
## [2] "Stanford University"
## [3] "Harvard University"
## [4] "Massachusetts Institute of Technology"
## [5] "University of Cambridge"
## [6] "Yale University"
## [7] "Princeton University"
## [8] "The University of Chicago"
## [9] "Tsinghua University Qinghua University Tsing hua University Qing hua University"
## [10] "Peking University"
## [11] "The University of Tokyo"
The first one is the University of Oxford. As we can see, most of the universities with the best teaching scores are located in the United States.
K-means clustering works only with numeric variables, so here our clusters are based on the scores given to particular characteristics and on some statistics: teaching, industry income, international outlook, number of students, student-staff ratio and share of international students.
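For readers who want to see what K-means does with such numeric variables, below is a minimal sketch of Lloyd's algorithm in plain Python. It is not the code behind our results. The two columns (teaching score, number of students) and their values are made up, and the standardization step is an assumption on our side: it is commonly applied when variables are on very different scales, as scores and student counts are.

```python
import math
import random
from statistics import mean, pstdev

def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant columns map to 0
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [mean(c) for c in zip(*members)]
    return labels

# Made-up rows: (teaching score, number of students).
rows = [[30.0, 5000], [32.0, 5200], [31.0, 4800],
        [80.0, 30000], [82.0, 31000], [79.0, 29500]]
labels = kmeans(standardize(rows), k=2)
print(labels)
```

On this toy data the first three universities end up in one cluster and the last three in the other. Which cluster gets label 0 depends on the random start.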
The elbow plot suggests that we should focus on 4 clusters.
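The quantity usually plotted in an elbow plot is the total within-cluster sum of squares for each number of clusters. The sketch below computes it for hand-made labelings of four toy points. The points and labels are assumptions for illustration only. In a real analysis the labels for each k would come from K-means.

```python
def wcss(points, labels):
    """Total within-cluster sum of squared distances to the cluster means."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    total = 0.0
    for members in clusters.values():
        center = [sum(c) / len(members) for c in zip(*members)]
        total += sum((x - m) ** 2 for p in members for x, m in zip(p, center))
    return total

# Two tight pairs of toy points that lie far apart from each other.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
by_k = {
    1: wcss(points, [0, 0, 0, 0]),
    2: wcss(points, [0, 0, 1, 1]),
    4: wcss(points, [0, 1, 2, 3]),
}
print(by_k)
```

The value drops sharply from one cluster to two and only slightly afterwards. That bend is the "elbow" we look for in the plot.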
The silhouette width suggests that the optimal number of clusters is 4.
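The average silhouette width compares, for each observation, the mean distance to its own cluster (a) with the mean distance to the nearest other cluster (b), as (b - a) / max(a, b). The plain-Python sketch below follows that definition. The toy points and the two labelings are made up for illustration.

```python
import math

def mean_silhouette(points, labels):
    """Average silhouette width; needs at least two clusters."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # the silhouette of a singleton is defined as 0
        # a: mean distance to the other members of the own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        total += (b - a) / max(a, b)
    return total / len(points)

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = mean_silhouette(points, [0, 0, 1, 1])  # groups nearby points
bad = mean_silhouette(points, [0, 1, 0, 1])   # mixes distant points
print(round(good, 3), round(bad, 3))
```

A value close to 1 means compact, well-separated clusters, while a negative value means that observations sit closer to another cluster than to their own.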
The gap statistic suggests that we should use only 1 cluster.
To conclude, two of the three criteria point to 4 clusters, so we take 4 as the optimal number. We start from this maximum (4 clusters) and can reduce it if necessary.
Cluster 1: Just Bad. Includes universities with low scores on all characteristics. After graduation: sadness and no prospects.
Cluster 2: Cool International. Includes universities with high international scores and a large number of international students. After graduation: a degree recognized in other countries and friends from other countries.
Cluster 3: Student-friendly. Includes universities with many students and a high teacher-to-student ratio. Here students can make a lot of new contacts and receive detailed feedback on their work. After graduation: you know your strengths and weaknesses and have a lot of weak ties.
Cluster 4: High Quality. Includes universities with high teaching and industry ratings. These universities provide knowledge that is really in demand on the job market, and they do it well. After graduation: a high level of knowledge and relevance in the industry.
The Cool International and High Quality clusters almost coincide, which suggests that universities recognized in other countries also tend to offer an excellent quality of education.
In hierarchical clustering, we can include variables of types other than numeric. So we add the availability of the link and whether the share of women is greater than the share of men.
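Hierarchical clustering can handle mixed variable types because it only needs a distance between observations. One common choice for mixed data is the Gower distance. We show it here as an assumption about how such a distance can be built, not as a record of the exact settings we used. The column layout (teaching score, link available, more women than men) and the values are made up.

```python
def gower_distance(a, b, ranges):
    """Gower distance between two observations.

    ranges[i] is the value range (max - min) of numeric column i,
    or None when column i is categorical or binary.
    """
    parts = []
    for x, y, r in zip(a, b, ranges):
        if r is None:
            parts.append(0.0 if x == y else 1.0)  # simple mismatch
        else:
            parts.append(abs(x - y) / r if r else 0.0)  # range-scaled difference
    return sum(parts) / len(parts)

# Hypothetical rows: (teaching score, link available, more women than men).
unis = [(40.0, True, False), (60.0, True, True), (90.0, False, True)]
ranges = [90.0 - 40.0, None, None]

d01 = gower_distance(unis[0], unis[1], ranges)
d02 = gower_distance(unis[0], unis[2], ranges)
print(round(d01, 3), d02)
```

Each variable contributes a value between 0 and 1, so numeric and binary variables are weighted equally in the final distance.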
From the plot, we can see that the observations form 5 clusters.
Let us look at the dendrogram to settle on the number of clusters.
Based on the dendrogram, the best choice is to focus on 5 clusters. The last one (on the right) seems to be underrepresented, so we should remember to check it.
Overall, we have 4 overlapping clusters and a fifth one that lies far away from the others and contains only one observation.
Next, let us find out more about the clusters we get.
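A cluster with a single observation is typical when one observation lies far from all the others: agglomerative clustering merges it last, so it stays alone for almost any cut of the dendrogram. The sketch below illustrates this with a naive agglomerative procedure in plain Python. The complete linkage rule and the one-dimensional toy values are assumptions for illustration, not our actual settings or data.

```python
from collections import Counter

def agglomerative(dist, k):
    """Merge the two closest clusters (complete linkage) until k remain."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(c1, c2):
        # Complete linkage: distance between the farthest pair of members.
        return max(dist[i][j] for i in c1 for j in c2)

    while len(clusters) > k:
        pairs = [(linkage(clusters[a], clusters[b]), a, b)
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        _, a, b = min(pairs)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Toy values: two groups and one observation far away from both.
values = [1.0, 1.2, 1.1, 5.0, 5.3, 40.0]
dist = [[abs(x - y) for y in values] for x in values]

labels3 = agglomerative(dist, k=3)
labels2 = agglomerative(dist, k=2)
sizes = Counter(labels3)
singletons = [label for label, n in sizes.items() if n == 1]
print(labels3, labels2, singletons)
```

With both 3 and 2 clusters the distant observation forms a cluster of its own, which mirrors the underrepresented cluster we see in the dendrogram.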
Cluster 1: teaching + international. Contains universities with the highest teaching scores and fairly high international scores.
Cluster 2: international. Contains universities with higher international scores and the greatest number of foreign students.
Cluster 3: industry income. Contains universities with higher industry income scores.
Cluster 4: popular, but bad. Contains universities with more students and staff than the others, but lower teaching quality and reputation.
Cluster 5: extremely student-friendly university. Lots of students and staff, while all other indicators are the worst among the clusters.
Finally, let us visualize the final cluster solution!
As a result, art universities can be divided into 4 clusters:
The first two clusters are plotted almost inside each other, so the universities in them are probably the best ones for future artists.
As for the best clustering method for our case, it appears to be K-means. Although hierarchical clustering can use variables of different types (not only numeric ones) and provides stable results, its clusters are less meaningful here than those from K-means. Besides, it produces a cluster with only 1 observation no matter how many clusters we choose, from 3 to 6. Therefore, for the cluster analysis of art universities we prefer K-means.