The core purpose of our data visualisation is to compare how much importance library users place on the different factors covered by the library survey data set from Singapore Management University’s 2018 Survey Analysis. For this, we create a heatmap showing, for each factor, the proportion of respondents who give it a rating of 7 as opposed to those who do not. A higher proportion points to the areas of the library that users value most, and which therefore deserve attention.
This section describes the step-by-step process for creating the data visualisation: installing the R packages, preparing the data, creating the visualisation and adding interactivity.
This code chunk installs any of the ‘tidyverse’, ‘ggridges’, ‘aggregation’, ‘plotly’, ‘heatmaply’, ‘reshape2’, ‘dendextend’, ‘ggalt’, ‘knitr’ and ‘kableExtra’ packages that are not yet on the user’s machine, so they do not have to be installed one by one. It then loads each package into the R session for immediate use.
# Install any missing packages, then load them all
packages <- c('tidyverse', 'ggridges', 'aggregation', 'plotly', 'heatmaply',
              'reshape2', 'dendextend', 'ggalt', 'knitr', 'kableExtra')
for (p in packages) {
  # require() returns FALSE (with a warning) when the package is not installed
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
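An alternative sketch checks the whole list at once instead of calling require() inside a loop. It relies on a hypothetical helper, find_missing(), which is not part of the original code: it compares the requested names against installed.packages() and returns only those that are absent, which could then be passed to a single install.packages() call.

```r
# Hypothetical helper (not in the original code): names in `pkgs` that are not installed
find_missing <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# "notARealPackage123" is a made-up name used to show what a missing package looks like
missing_pkgs <- find_missing(c("stats", "utils", "notARealPackage123"))
print(missing_pkgs)

# With real package names, only the absent ones would then be installed in one call:
# install.packages(missing_pkgs)
```

Since setdiff() returns an empty vector when nothing is missing, the install step can be guarded with length(missing_pkgs) > 0.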
The data for this visualisation is loaded with the read_csv function. As mentioned earlier, the data set records the responses to 26 questions along with other attributes of each respondent; we focus mainly on the survey responses.
lib_data<- read_csv("Data/Raw Data LSR.csv")
## Parsed with column specification:
## cols(
## .default = col_double(),
## Comment1 = col_character()
## )
## See spec(...) for full column specifications.
For our analysis, we include the ratings of respondents from both libraries (the Li Ka Shing Library and the Kwa Geok Choo Law Library). However, we filter out postgraduates, faculty, staff and other respondents, keeping only undergraduate students for our visualisation.
The questions (factors) assessed in the survey are numbered 1 to 26 (I01 to I26 for Importance and P01 to P26 for Performance), with question P27 assessing users’ overall satisfaction with the library. Since we are focusing on the ‘Importance’ aspect, we keep only factors I01 to I26 and remove all other survey columns from our dataset.
# Keep both libraries (Campus 1 and 2), study areas 1 to 7 and positions 1 to 5
lib_data_stud <- filter(lib_data,
                        Campus %in% c(1, 2),
                        StudyArea %in% 1:7,
                        Position %in% 1:5)
# Drop the columns that are not needed for the heatmap
lib_data_stud <- select(lib_data_stud, -c(Comment1, HowOftenL, HowOftenC, HowOftenW,
                                          Campus, Position, ID, NPS1))
# Keep ResponseID, StudyArea and the 26 importance ratings (I01 to I26)
stu_data_hm <- lib_data_stud[1:28]
# Replace the numeric study-area codes (1 to 7) with their names
study_areas <- c("Accountancy", "Business", "Economics", "Information Systems",
                 "Law", "Social Sciences", "Others")
stu_data_hm$StudyArea <- study_areas[stu_data_hm$StudyArea]
kable(stu_data_hm[1:6,1:10]) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| ResponseID | StudyArea | I01 | I02 | I03 | I04 | I05 | I06 | I07 | I08 |
|---|---|---|---|---|---|---|---|---|---|
| 570 | Business | 6 | 5 | 4 | NA | NA | NA | NA | NA |
| 820 | Business | NA | NA | NA | NA | NA | NA | NA | NA |
| 1232 | Accountancy | NA | NA | NA | NA | NA | NA | NA | NA |
| 2571 | Information Systems | 7 | NA | 6 | NA | 7 | 7 | NA | NA |
| 541 | Accountancy | 6 | 6 | 4 | NA | NA | 7 | NA | NA |
| 1576 | Business | 5 | 6 | 5 | 6 | 5 | 6 | NA | NA |
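The filtering and column selection above can be sketched in base R on a small toy data frame whose values are hypothetical, not taken from the survey. The sketch shows that a chain of == tests joined by | selects the same rows as a single %in% test, and it picks the importance columns by a name pattern (^I[0-9]{2}$, an assumption about how the columns are named) rather than by position, so it would keep working if the column order changed.

```r
# Hypothetical survey rows (not the SMU data)
toy <- data.frame(
  ResponseID = 1:6,
  Campus     = c(1, 2, 3, 1, 2, 1),
  StudyArea  = c(1, 5, 2, 8, 7, 3),
  Position   = c(1, 2, 5, 6, 3, 4),
  I01        = c(7, 6, 7, 5, 7, 4),
  P01        = c(6, 6, 5, 4, 7, 5)
)

# Chained == / | tests and a single %in% test pick the same rows
# (they differ only for NA values, which filter() drops in both cases)
stopifnot(identical(toy$Campus == 1 | toy$Campus == 2, toy$Campus %in% c(1, 2)))

# Keep both campuses, study areas 1 to 7 and positions 1 to 5
keep <- toy$Campus %in% c(1, 2) & toy$StudyArea %in% 1:7 & toy$Position %in% 1:5
undergrads <- toy[keep, ]

# Keep the ID, the study area and every column named I followed by two digits
imp_cols <- grep("^I[0-9]{2}$", names(undergrads), value = TRUE)
stu_hm <- undergrads[, c("ResponseID", "StudyArea", imp_cols)]
print(stu_hm)
```

Here respondents 3 (campus 3) and 4 (study area 8, position 6) are dropped, and the performance column P01 is excluded by the name pattern.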
Since we focus on the respondents who give a rating of 7 as opposed to any other rating, we convert every rating of 7 to 1 and every other rating to 0. This gives a dataframe containing only 0s and 1s. Before recoding, we drop every respondent with a missing answer (na.omit), so the percentages that follow are based only on respondents who answered all 26 importance questions.
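# Drop respondents with any missing answer
d <- na.omit(stu_data_hm)
# Keep StudyArea and the 26 importance columns (drop ResponseID)
d2 <- d[2:28]
d3 <- d2
# Columns 2 to 27 are I01 to I26: any rating other than 7 becomes 0,
# then the remaining 7s become 1 (the order of these two steps matters)
d3[, 2:27][d3[, 2:27] != 7] <- 0
d3[, 2:27][d3[, 2:27] == 7] <- 1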
kable(d3[1:6,1:10]) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| StudyArea | I01 | I02 | I03 | I04 | I05 | I06 | I07 | I08 | I09 |
|---|---|---|---|---|---|---|---|---|---|
| Social Sciences | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Economics | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| Information Systems | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 |
| Social Sciences | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 |
| Business | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Business | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
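As a minimal sketch of this step, the base R snippet below uses a hypothetical four-respondent data frame. It shows that na.omit() removes any respondent with a missing answer, and that comparing each rating with 7 and converting the result to an integer gives the same 0/1 coding as the two-step replacement above, in a single pass.

```r
# Hypothetical ratings (not survey data)
toy <- data.frame(
  StudyArea = c("Law", "Law", "Business", "Business"),
  I01 = c(7, 5, 7, NA),
  I02 = c(6, 7, 7, 7)
)

# na.omit() drops every row with a missing value (the fourth respondent here)
complete <- na.omit(toy)

# Recode each importance column: 7 becomes 1, any other rating becomes 0
items <- c("I01", "I02")
binary <- complete
binary[items] <- lapply(complete[items], function(x) as.integer(x == 7))
print(binary)
```

Because the comparison is done once per value, this version does not depend on the order in which the != 7 and == 7 replacements are applied.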
We then use aggregate() to sum each question column within each study area. Since every 7 is now a 1, this gives the number of 7-ratings per question and study area, which we store in df3.
Similarly, we count the number of responses to each question in each study area and store the result in df4.
We divide df3 by df4 and multiply by 100 to obtain, for each study area, the percentage of respondents who rated each question 7.
df3<-aggregate(d3[, 2:27], list(d3$StudyArea), sum)
df4<-aggregate(d2[, 2:27], list(d2$StudyArea), length)
d1<-df3[,c(2:27)]/df4[, c(2:27)] * 100
kable(d1[1:6,1:10]) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| I01 | I02 | I03 | I04 | I05 | I06 | I07 | I08 | I09 | I10 |
|---|---|---|---|---|---|---|---|---|---|
| 19.66292 | 25.84270 | 32.02247 | 14.04494 | 20.22472 | 44.94382 | 24.15730 | 28.65169 | 19.66292 | 23.59551 |
| 15.70796 | 28.98230 | 32.30089 | 17.69911 | 19.24779 | 48.89381 | 21.46018 | 27.65487 | 23.23009 | 28.09735 |
| 19.35484 | 27.95699 | 25.80645 | 17.20430 | 16.12903 | 49.46237 | 26.88172 | 36.55914 | 21.50538 | 31.18280 |
| 19.60784 | 25.49020 | 28.75817 | 13.07190 | 18.30065 | 49.01961 | 21.56863 | 24.18301 | 20.26144 | 24.83660 |
| 26.85185 | 45.37037 | 32.40741 | 21.29630 | 30.55556 | 51.85185 | 44.44444 | 46.29630 | 36.11111 | 42.59259 |
| 14.65517 | 31.03448 | 25.86207 | 12.06897 | 24.13793 | 50.86207 | 26.72414 | 28.44828 | 22.41379 | 32.75862 |
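For a given study area and question, the percentage is 100 × (number of 7-ratings) / (number of responses). The sketch below reproduces the df3 / df4 × 100 step with aggregate() on hypothetical binarised data. Because each response is now 0 or 1, the same figure is also 100 times the group mean of the indicator, which the last lines compute as a cross-check.

```r
# Hypothetical binarised responses (1 = rated 7, 0 = any other rating)
binary <- data.frame(
  StudyArea = c("Law", "Law", "Law", "Business", "Business"),
  I01 = c(1, 0, 1, 0, 1),
  I02 = c(1, 1, 1, 0, 0)
)

# Numerator (like df3): number of 7-ratings per study area
counts <- aggregate(binary[, 2:3], list(binary$StudyArea), sum)
# Denominator (like df4): number of responses per study area
totals <- aggregate(binary[, 2:3], list(binary$StudyArea), length)

# Percentage of respondents in each study area who rated each question 7
pct <- counts[, 2:3] / totals[, 2:3] * 100
rownames(pct) <- counts$Group.1
print(round(pct, 1))

# Cross-check: 100 times the mean of the 0/1 indicator per study area
pct_via_mean <- aggregate(binary[, 2:3], list(binary$StudyArea), mean)
pct_via_mean[, 2:3] <- pct_via_mean[, 2:3] * 100
print(pct_via_mean)
```

aggregate() sorts the groups alphabetically, so counts and totals line up row by row, which is what makes the element-wise division valid.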
We then attach the study-area names to the dataframe, both as a column and as row names, so that each row of percentages can be identified by its study area.
d1$StudyArea <- df3$Group.1
row.names(d1) <- d1$StudyArea
kable(d1[1:6,1:10]) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| Study Area | I01 | I02 | I03 | I04 | I05 | I06 | I07 | I08 | I09 | I10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Accountancy | 19.66292 | 25.84270 | 32.02247 | 14.04494 | 20.22472 | 44.94382 | 24.15730 | 28.65169 | 19.66292 | 23.59551 |
| Business | 15.70796 | 28.98230 | 32.30089 | 17.69911 | 19.24779 | 48.89381 | 21.46018 | 27.65487 | 23.23009 | 28.09735 |
| Economics | 19.35484 | 27.95699 | 25.80645 | 17.20430 | 16.12903 | 49.46237 | 26.88172 | 36.55914 | 21.50538 | 31.18280 |
| Information Systems | 19.60784 | 25.49020 | 28.75817 | 13.07190 | 18.30065 | 49.01961 | 21.56863 | 24.18301 | 20.26144 | 24.83660 |
| Law | 26.85185 | 45.37037 | 32.40741 | 21.29630 | 30.55556 | 51.85185 | 44.44444 | 46.29630 | 36.11111 | 42.59259 |
| Social Sciences | 14.65517 | 31.03448 | 25.86207 | 12.06897 | 24.13793 | 50.86207 | 26.72414 | 28.44828 | 22.41379 | 32.75862 |
We replace the importance codes (I01 to I26) with short descriptions of the questions they represent, so that readers can see at a glance which question they are examining.
colnames(d1) <- c("Informed about Services",
"Provide Useful Info",
"Library signage is clear",
"Workshops Assist in Learning/Research",
"Anticipates Learning/Research needs",
"Opening hours meets needs",
"Requested Books/Articles delivered",
"Self Service meets needs",
"Online enquiry meets needs",
"F2F enquiry meets needs",
"Library Shelves have items I need",
"Library Staff give accurate answers",
"Library staff are helpful",
"Can find quiet place to study",
"Can find place to work as groups",
"Computer is available",
"Laptop Facilities provided",
"Wireless Access Provided",
"Print/Scan Facilities Provided",
"Info/Resources meet my needs",
"Online Resources are Useful",
"Course specific Resources present",
"Access Library away from Campus",
"Library Search Engine is useful",
"Access to Library made me successful",
"Mobile devices useful to access content","StudyArea")
kable(d1[1:6,1:10]) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| Study Area | Informed about Services | Provide Useful Info | Library signage is clear | Workshops Assist in Learning/Research | Anticipates Learning/Research needs | Opening hours meets needs | Requested Books/Articles delivered | Self Service meets needs | Online enquiry meets needs | F2F enquiry meets needs |
|---|---|---|---|---|---|---|---|---|---|---|
| Accountancy | 19.66292 | 25.84270 | 32.02247 | 14.04494 | 20.22472 | 44.94382 | 24.15730 | 28.65169 | 19.66292 | 23.59551 |
| Business | 15.70796 | 28.98230 | 32.30089 | 17.69911 | 19.24779 | 48.89381 | 21.46018 | 27.65487 | 23.23009 | 28.09735 |
| Economics | 19.35484 | 27.95699 | 25.80645 | 17.20430 | 16.12903 | 49.46237 | 26.88172 | 36.55914 | 21.50538 | 31.18280 |
| Information Systems | 19.60784 | 25.49020 | 28.75817 | 13.07190 | 18.30065 | 49.01961 | 21.56863 | 24.18301 | 20.26144 | 24.83660 |
| Law | 26.85185 | 45.37037 | 32.40741 | 21.29630 | 30.55556 | 51.85185 | 44.44444 | 46.29630 | 36.11111 | 42.59259 |
| Social Sciences | 14.65517 | 31.03448 | 25.86207 | 12.06897 | 24.13793 | 50.86207 | 26.72414 | 28.44828 | 22.41379 | 32.75862 |
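The colnames() assignment above relies on the 27 labels being listed in exactly the same order as the columns. A possibly more robust alternative, sketched here with a hypothetical three-question subset and illustrative values, keys each label by its question code so that every column is renamed by lookup rather than by position.

```r
# Hypothetical subset of a code-to-label mapping (the full mapping would have 26 entries)
question_labels <- c(
  I01 = "Informed about Services",
  I02 = "Provide Useful Info",
  I06 = "Opening hours meets needs"
)

# Illustrative percentages, with the columns deliberately out of order
pct <- data.frame(
  I02 = c(25.8, 45.4),
  I06 = c(44.9, 51.9),
  I01 = c(19.7, 26.9),
  row.names = c("Accountancy", "Law")
)

# Fail early if any column code has no label
stopifnot(all(names(pct) %in% names(question_labels)))

# Rename each column by looking up its code, so the result does not depend on order
names(pct) <- unname(question_labels[names(pct)])
print(names(pct))
```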
Following is the code chunk for plotting our interactive heatmap. Because the heatmap is interactive, we can hover over each cell to see the percentage of respondents from each study area who gave that question a rating of 7. This helps us identify which factors of the library users consider most important, as opposed to those they consider less important.
For our case analysis, since we have chosen only undergraduate students as our target group, we can see that across all study areas a relatively high proportion of undergraduates gave a rating of 7 to factors such as having a quiet place to study, wireless access, print/scan facilities and a place to work in groups when needed. This suggests that these factors are the most sought after in a library across all study areas.
# Code for plotting the interactive heatmap
# Remove the StudyArea column; the study areas remain available as row names
d1$StudyArea <- NULL
wh_matrix <- data.matrix(d1)
heatmaply(t(wh_matrix),
Rowv=NA, Colv=NA,
seriate = "none",
colors = Greens,
fontsize_row = 10,
fontsize_col = 10,
grid_color = "white",
grid_lw=0.3,
branches_lwd = 0.6,
label_names = c("Question", "Study Area","Percentage"),
grid_size = 1,
xlab = "Study Area",
ylab = "Questions",
main = "Ratings by Questions and Area of Study: Importance")
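To back up a reading of the heatmap with numbers, one option is to rank the questions by their average share of 7-ratings across study areas, and to check the lowest share per question so that a high average is not driven by a single study area. The sketch below uses a small illustrative matrix (three study areas and three questions, with values rounded from the percentage tables above), not the full results.

```r
# Illustrative percentages: rows = study areas, columns = questions
pct <- matrix(
  c(44.9, 19.7, 25.8,
    51.9, 26.9, 45.4,
    48.9, 15.7, 29.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Accountancy", "Law", "Business"),
                  c("Opening hours meets needs", "Informed about Services",
                    "Provide Useful Info"))
)

# Average share of 7-ratings per question across study areas, highest first
ranked <- sort(colMeans(pct), decreasing = TRUE)
print(round(ranked, 1))

# Lowest share per question: a question that ranks high here is rated 7
# often in every study area, not just on average
lowest <- apply(pct, 2, min)
print(lowest)
```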