In this piece of work I am going to do some web data scrapping. I will be working with the World University Rankings table presented in Wikipedia. The plan is to have a brief look on the changes happened with the positions of world’s top universities in the last decade.

First of all, I am getting an object with the all data presented on the page concerning the World University Ranking (it is the following page:https://en.wikipedia.org/wiki/Times_Higher_Education_World_University_Rankings).

Then, the table objects are extracted

require(dplyr)
library(httr)
library(XML)

urs <- GET("https://en.wikipedia.org/wiki/Times_Higher_Education_World_University_Rankings")

doc <- readHTMLTable(
  doc = content(urs, "text"))
str(doc)
## List of 7
##  $ Times Higher Education World University Rankings                               :'data.frame': 9 obs. of  2 variables:
##   ..$ V1: Factor w/ 9 levels "","Categories",..: 1 4 2 6 8 5 3 7 9
##   ..$ V2: Factor w/ 8 levels "2010","Annual",..: NA 5 4 2 6 1 7 3 8
##  $ NULL                                                                           :'data.frame': 6 obs. of  3 variables:
##   ..$ V1: Factor w/ 6 levels "Citations – research influence",..: 4 2 3 6 5 1
##   ..$ V2: Factor w/ 6 levels "Citation impact (normalised average citation per paper)",..: 2 6 3 5 4 1
##   ..$ V3: Factor w/ 6 levels "15%\n6%\n4.5%\n2.25%\n2.25%",..: 6 3 4 1 2 5
##  $ Times Higher Education World University Rankings—Top 50[Note 1]                :'data.frame': 51 obs. of  12 variables:
##   ..$ V1 : Factor w/ 51 levels "California Institute of Technology",..: 11 43 26 8 1 18 36 33 51 25 ...
##   ..$ V2 : Factor w/ 45 levels "-","1","10","101",..: 15 38 32 2 13 23 38 42 3 36 ...
##   ..$ V3 : Factor w/ 49 levels "-","1","10","11",..: 16 32 14 14 2 44 42 3 4 40 ...
##   ..$ V4 : Factor w/ 50 levels "-","1","10","105",..: 16 14 25 34 2 41 46 50 5 45 ...
##   ..$ V5 : Factor w/ 49 levels "-","1","10","11",..: 15 13 34 13 2 39 45 47 4 42 ...
##   ..$ V6 : Factor w/ 47 levels "-","1","11","12",..: 14 24 31 12 2 41 39 45 46 43 ...
##   ..$ V7 : Factor w/ 51 levels "-","1","10","11",..: 15 13 25 47 2 43 36 6 5 49 ...
##   ..$ V8 : Factor w/ 44 levels "-","1","10","12",..: 14 2 23 41 12 37 30 3 4 42 ...
##   ..$ V9 : Factor w/ 43 levels "1","10","12",..: 13 1 19 39 19 36 11 9 3 40 ...
##   ..$ V10: Factor w/ 48 levels "1","10","11",..: 13 1 23 45 41 33 11 6 47 46 ...
##   ..$ V11: Factor w/ 43 levels "1","10","11",..: 13 1 30 41 11 37 22 5 42 40 ...
##   ..$ V12: Factor w/ 49 levels "1","10","11",..: 14 1 12 23 33 44 46 47 48 49 ...
##  $ Times Higher Education World Reputation Rankings—Top 25[Note 1]                :'data.frame': 26 obs. of  10 variables:
##   ..$ V1 : Factor w/ 26 levels "California Institute of Technology",..: 7 5 10 14 20 23 18 13 26 19 ...
##   ..$ V2 : Factor w/ 26 levels "1","10","11",..: 12 1 11 22 17 23 20 24 26 4 ...
##   ..$ V3 : Factor w/ 25 levels "1","10","11",..: 12 1 11 20 17 22 21 23 2 25 ...
##   ..$ V4 : Factor w/ 24 levels "1","10","11",..: 13 1 11 21 16 18 20 22 2 23 ...
##   ..$ V5 : Factor w/ 26 levels "1","10","11",..: 13 1 11 18 20 22 23 24 25 2 ...
##   ..$ V6 : Factor w/ 26 levels "1","10","11",..: 14 1 21 22 12 19 23 24 25 5 ...
##   ..$ V7 : Factor w/ 26 levels "1","10","11",..: 14 1 12 20 21 22 23 24 25 5 ...
##   ..$ V8 : Factor w/ 25 levels "1","10","11",..: 14 1 12 20 21 21 22 23 24 5 ...
##   ..$ V9 : Factor w/ 23 levels "1","11","12",..: 12 1 10 17 18 19 20 21 22 23 ...
##   ..$ V10: Factor w/ 24 levels "1","10","11",..: 13 1 11 18 19 20 21 22 23 24 ...
##  $ Times Higher Education Asia University Rankings as shown below – Top 20[Note 1]:'data.frame': 21 obs. of  9 variables:
##   ..$ V1: Factor w/ 21 levels "Chinese University of Hong Kong",..: 5 17 12 11 18 4 10 20 1 14 ...
##   ..$ V2: Factor w/ 21 levels "1","10","11",..: 7 18 14 6 12 21 3 1 4 20 ...
##   ..$ V3: Factor w/ 21 levels "1","10","11",..: 6 18 17 5 12 21 3 1 4 14 ...
##   ..$ V4: Factor w/ 20 levels "1","10","11",..: 7 16 14 6 11 18 2 1 4 17 ...
##   ..$ V5: Factor w/ 20 levels "1","10","11",..: 11 16 10 1 15 17 10 18 5 20 ...
##   ..$ V6: Factor w/ 21 levels "1","10","11",..: 13 15 11 1 17 18 16 19 3 21 ...
##   ..$ V7: Factor w/ 20 levels "1","10","11",..: 13 11 15 1 16 17 17 19 18 20 ...
##   ..$ V8: Factor w/ 21 levels "1","10","11",..: 13 1 17 12 16 15 18 20 19 21 ...
##   ..$ V9: Factor w/ 19 levels "=10","=13","1",..: 12 3 10 13 14 15 16 17 18 19 ...
##  $ Times Higher Education BRICS & Emerging Economies Rankings – Top 20[Note 1]    :'data.frame': 21 obs. of  8 variables:
##   ..$ V1: Factor w/ 21 levels "Alfaisal University",..: 6 15 13 21 18 9 14 2 12 11 ...
##   ..$ V2: Factor w/ 16 levels "-","1","10","11",..: 8 7 2 9 14 3 10 16 12 6 ...
##   ..$ V3: Factor w/ 18 levels "-","1","10","11",..: 8 7 2 9 4 14 6 18 15 10 ...
##   ..$ V4: Factor w/ 17 levels "-","1","14","16",..: 7 6 2 15 14 9 14 5 12 3 ...
##   ..$ V5: Factor w/ 20 levels "-","1","10","11",..: 9 8 2 20 16 11 18 17 3 4 ...
##   ..$ V6: Factor w/ 20 levels "-","1","10","11",..: 11 10 2 17 16 12 18 14 3 19 ...
##   ..$ V7: Factor w/ 20 levels "-","1","10","11",..: 11 2 10 14 15 16 19 17 3 18 ...
##   ..$ V8: Factor w/ 21 levels "1","10","11",..: 14 1 12 15 16 17 18 19 20 21 ...
##  $ NULL                                                                           :'data.frame': 6 obs. of  2 variables:
##   ..$ V1: Factor w/ 6 levels "Category","Criticism",..: 6 3 5 4 2 1
##   ..$ V2: Factor w/ 4 levels "Academic Ranking of World Universities\nCWTS Leiden Ranking\nGlobal University Ranking\nMines ParisTech: Profes"| __truncated__,..: NA 1 3 2 4 NA

I do not need all of the table objects, only the one with top-50 universities which have been given a position in the ranking for the last decade.

UNIRANK <- doc[[3]]
names(UNIRANK) <- c('Institution', '2010–11',   '2011–12', '2012–13',   '2013–14', '2014–15',   '2015–16', '2016–17',   '2017-18',  '2018–19',  '2019–20',  '2020–21')
UNIRANK <- UNIRANK[-1,]

rownames(UNIRANK) <- NULL
head(UNIRANK)
##                             Institution 2010–11 2011–12 2012–13 2013–14 2014–15
## 1                  University of Oxford       6       4       2       2       3
## 2                   Stanford University       4       2       3       4       4
## 3                    Harvard University       1       2       4       2       2
## 4    California Institute of Technology       2       1       1       1       1
## 5 Massachusetts Institute of Technology       3       7       5       5       6
## 6               University of Cambridge       6       6       7       7       5
##   2015–16 2016–17 2017-18 2018–19 2019–20 2020–21
## 1       2       1       1       1       1       1
## 2       3       3       3       3       4       2
## 3       6       6       6       6       7       3
## 4       1       2       3       5       2       4
## 5       5       5       5       4       5       5
## 6       4       4       2       2       3       6

So, here is how the table with rankings looks. 50 rows, universities, in total.

I am not going to need the full table, as 50 records would be a bit harder to clearly visualize (in fact, almost impossible). That’s why I will cut the table to only 10 universities sorted by their position in 2020-21 year in ascending order.

UNIRANK_10 <- UNIRANK %>% head(10)

Still, I am going to save the full table in .csv format in case I need it later in my life.

readr::write_csv(UNIRANK, "UNIRANK.csv")

Now, the vizualization I mentioned is going to be a slopegraph, to represent how a position of top-10 universities has changed during the last 10 years.

To create this graph I am going to need some additional data transformations (i.e., turning the wide format of my data to long format).

URS_long <- data.table::melt(data.table::setDT(UNIRANK_10), measure.vars = 2:12, variable.name = "Year")
head(URS_long)
##                              Institution    Year value
## 1:                  University of Oxford 2010–11     6
## 2:                   Stanford University 2010–11     4
## 3:                    Harvard University 2010–11     1
## 4:    California Institute of Technology 2010–11     2
## 5: Massachusetts Institute of Technology 2010–11     3
## 6:               University of Cambridge 2010–11     6

You can see how the table looks now on the above. Now, the plot can be constructed.

require(ggplot2)

long11_21 <- URS_long %>% filter((Year == "2010–11") | (Year == "2020–21"))

long11_21$value <- as.numeric(long11_21$value)

library(viridis)
ggplot(long11_21, aes(x = Year, y = value, group = Institution)) +
  geom_line(aes(color = Institution), size = 1.2) +
  ggtitle("Change in the Rating Position, year 2010-11 to 2020-21") + 
  labs(subtitle = "World University Ranking") +
  ylab("Rank Position") +
  scale_y_reverse( lim=c(15,0)) +
  geom_point(aes(color = Institution), size = 4) +
  scale_color_viridis(discrete = T, option = "C") +
  theme_bw() +
  geom_text(data = long11_21 %>% filter(Year == "2010–11"), 
            aes(label = paste0(Institution, ',', ' ', "rank:", value)) , 
            hjust = 1.35, 
            fontface = "bold", 
            size = 3.5,
position=position_jitter(width=ifelse(long11_21$Institution=='University of Oxford',0.1,0),
  height=ifelse(long11_21$Institution=='University of Oxford',0.6,0))
 ) +
  geom_text(data = long11_21 %>% filter(Year == "2020–21"), 
            aes(label = paste0(Institution, ',', ' ', "rank:", value)) , 
            hjust = -.35, 
            fontface = "bold", 
            size = 3.5) +
  theme(legend.position = "none") +
   theme(axis.title.y     = element_blank()) +
  theme(axis.text.y      = element_blank()) +
  theme(panel.grid.major.y = element_blank()) +
  theme(panel.grid.minor.y = element_blank()) +
  theme(panel.border     = element_blank()) 

A better view:

From the plot we can see that exactly the half of universities experienced a takeoff in rating position. The steepest increase is observed for the Oxford University that takes the first place in the ranking by 2020-21 year.

Four universities out of this top-10 had a decrease in their ranking positions. The sharpest decrease is observed for Princeton University, it went from the 5th position in 2010-11 to the 9th position in 2020-21. At the same time, one of the universities present, the University of Cambridge, remained on the same position through the last decade.

That’s basically all for this little piece. Thanks for bearing with me.