1.0 Overview

The core purpose of our data visualisation is to compare how much importance library users give to the different factors covered in the library survey data set from Singapore Management University’s 2018 Survey Analysis. To do this, we create a heatmap showing, for each factor, the proportion of respondents who give it the highest importance rating (7) as opposed to those who do not. A higher proportion marks a factor that users consider particularly important, which helps us identify the areas of the library that deserve attention.

2.0 Step-By-Step Creation of Data Visualization

This section describes the step-by-step process for creating the data visualization: installing the R packages, preparing the data, creating the visualization and adding interactivity.

2.0.1 Installation of R Packages

This code chunk checks whether each of the packages ‘tidyverse’, ‘ggridges’, ‘aggregation’, ‘plotly’, ‘heatmaply’, ‘reshape2’, ‘dendextend’, ‘ggalt’ and ‘knitr’ is installed on the user’s machine. It installs any that are missing and then loads them into the R session for immediate use, so they do not have to be installed one by one. The kableExtra package, used to format the tables below, is loaded as well.

#Code for checking if the packages are installed, installing missing ones and loading them
packages <- c('tidyverse','ggridges','aggregation','plotly','heatmaply',
              'reshape2','dendextend','ggalt','knitr','kableExtra')
for (p in packages){
  # require() returns FALSE (with a warning) when the package is not installed
  if(!require(p, character.only = TRUE)){
    install.packages(p)
  }
  library(p, character.only = TRUE)
}

2.0.2 Loading of Data Set

The data source for this visualization is loaded using the read_csv function. The data set contains the responses to 26 survey questions together with other respondent attributes. We focus mainly on the survey results.

lib_data<- read_csv("Data/Raw Data LSR.csv")
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   Comment1 = col_character()
## )
## See spec(...) for full column specifications.

2.0.3 Data Preparation

For our analysis, we include the ratings of respondents from both libraries (Li Ka Shing Library and Kwa Geok Choo Law Library). However, we filter out postgraduates, faculty, staff and other respondents, keeping only undergraduate students for our visualization.

The questions/factors assessed in the survey are numbered 1 to 26 (I01-I26 for Importance and P01-P26 for Performance). Question P27 assesses users’ overall satisfaction with the library. Since we are focusing on the ‘Importance’ aspect, we keep only the importance factors (I01-I26), along with the response ID and study area, and remove all other columns from our dataset.

    # Keep respondents from both libraries (Campus codes 1 and 2)
    lib_data_stud <- filter(lib_data, Campus %in% c(1, 2))
    
    # Keep the seven study areas (codes 1-7, labelled further below)
    lib_data_stud <- filter(lib_data_stud, StudyArea %in% 1:7)
    
    # Keep undergraduate respondents (Position codes 1-5)
    lib_data_stud <- filter(lib_data_stud, Position %in% 1:5)

    # Drop columns that are not needed for the heatmap
    drop_cols <- c("Comment1", "HowOftenL", "HowOftenC", "HowOftenW",
                   "Campus", "Position", "ID", "NPS1")
    lib_data_stud[drop_cols] <- NULL
    # Keep ResponseID, StudyArea and the 26 importance ratings (I01-I26)
    stu_data_hm<-lib_data_stud[c(1:28)] 
    # Replace the numeric study-area codes with readable labels
    stu_data_hm$StudyArea[stu_data_hm$StudyArea== 1] <- "Accountancy"
    stu_data_hm$StudyArea[stu_data_hm$StudyArea== 2] <- "Business"
    stu_data_hm$StudyArea[stu_data_hm$StudyArea== 3] <- "Economics"
    stu_data_hm$StudyArea[stu_data_hm$StudyArea== 4] <- "Information Systems"
    stu_data_hm$StudyArea[stu_data_hm$StudyArea== 5] <- "Law"
    stu_data_hm$StudyArea[stu_data_hm$StudyArea== 6] <- "Social Sciences"
    stu_data_hm$StudyArea[stu_data_hm$StudyArea== 7] <- "Others"
    
    kable(stu_data_hm[1:6,1:10]) %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
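| ResponseID | StudyArea | I01 | I02 | I03 | I04 | I05 | I06 | I07 | I08 |
|---|---|---|---|---|---|---|---|---|---|
| 570 | Business | 6 | 5 | 4 | NA | NA | NA | NA | NA |
| 820 | Business | NA | NA | NA | NA | NA | NA | NA | NA |
| 1232 | Accountancy | NA | NA | NA | NA | NA | NA | NA | NA |
| 2571 | Information Systems | 7 | NA | 6 | NA | 7 | 7 | NA | NA |
| 541 | Accountancy | 6 | 6 | 4 | NA | NA | 7 | NA | NA |
| 1576 | Business | 5 | 6 | 5 | 6 | 5 | 6 | NA | NA |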
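The filtering, column selection and recoding steps above can also be sketched in base R. This is a minimal sketch on a small made-up data frame (`toy`, hypothetical). It assumes the same coding as the code above: Campus 1-2, StudyArea 1-7 and Position 1-5 mark the rows to keep, and importance columns are named I01 to I26. The sketch uses `%in%` to test membership in each allowed set of codes, a name pattern rather than column positions to pick out the importance columns, and a named lookup vector for the study-area labels.

```r
# Hypothetical toy data mimicking the survey layout (not real responses)
toy <- data.frame(
  ResponseID = c(101, 102, 103, 104),
  Campus     = c(1, 2, 3, 1),
  Position   = c(1, 4, 2, 8),
  StudyArea  = c(2, 5, 1, 3),
  I01        = c(7, 6, 7, 5),
  I02        = c(5, 7, 7, 6),
  P01        = c(6, 6, 5, 7)
)

# Keep rows whose codes fall in the allowed sets
kept <- subset(toy, Campus %in% 1:2 & StudyArea %in% 1:7 & Position %in% 1:5)

# Select the ID, the study area and the importance columns by name pattern
imp_cols <- grep("^I[0-9]{2}$", names(kept), value = TRUE)
kept <- kept[, c("ResponseID", "StudyArea", imp_cols)]

# Recode study-area codes to labels with a named lookup vector
area_labels <- c("1" = "Accountancy", "2" = "Business", "3" = "Economics",
                 "4" = "Information Systems", "5" = "Law",
                 "6" = "Social Sciences", "7" = "Others")
kept$StudyArea <- unname(area_labels[as.character(kept$StudyArea)])

print(kept)
```

Selecting by name pattern keeps working even if the CSV's column order changes. The positional `[c(1:28)]` selection depends on that order.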

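Since we are only interested in whether a respondent gave a rating of 7 rather than any other rating, we first drop every respondent with a missing rating (na.omit). We then convert ratings of 7 to 1 and all other ratings to 0. This gives a data frame containing only 0s and 1s. As a result, the percentages computed below are based only on respondents who answered all 26 importance questions.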

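    # Drop respondents with any missing value
    d<-na.omit(stu_data_hm)
    # Keep StudyArea and I01-I26 (drop ResponseID)
    d2<-d[c(2:28)]
    d3 <- d2
    
    # Binarise I01-I26: set non-7 ratings to 0 first, then 7s to 1
    # (the order matters: reversing it would turn the new 1s into 0s)
    d3[ ,c(2:27)][d3[ ,c(2:27)] != 7] <- 0
    d3[ ,c(2:27)][d3[ ,c(2:27)] == 7] <- 1 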
    
    kable(d3[1:6,1:10]) %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
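| StudyArea | I01 | I02 | I03 | I04 | I05 | I06 | I07 | I08 | I09 |
|---|---|---|---|---|---|---|---|---|---|
| Social Sciences | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Economics | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| Information Systems | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 |
| Social Sciences | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 |
| Business | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Business | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |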
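The 7-to-1 conversion can also be written as a single comparison per column. This avoids depending on the order of the two assignments in the code above. Below is a minimal sketch on made-up ratings (the `ratings` data frame is hypothetical), including the `na.omit()` step.

```r
# Hypothetical ratings on a 1-7 scale; NA marks an unanswered question
ratings <- data.frame(
  StudyArea = c("Law", "Law", "Business", "Business"),
  I01 = c(7, 5, 7, NA),
  I02 = c(6, 7, 7, 3)
)

# Drop any respondent with a missing rating, as na.omit() does above
complete <- na.omit(ratings)

# Binarise every importance column: 1 if rated 7, otherwise 0
imp_cols <- grep("^I[0-9]{2}$", names(complete), value = TRUE)
complete[imp_cols] <- lapply(complete[imp_cols], function(x) as.integer(x == 7))

print(complete)
```

The fourth respondent is dropped entirely because of a single missing answer. This is why `na.omit()` can noticeably shrink the sample when many questions are left blank.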

Next, we use aggregate() to sum each importance column within each study area. Because the values are 0s and 1s, each sum is the number of respondents in that study area who rated the factor 7. We store this result in df3.

Similarly, we use length() to count the number of responses to each question within each study area, and store the result in df4.

We divide df3 by df4 and multiply by 100. This gives, for each study area and question, the percentage of respondents who gave a rating of 7.

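    # Per study area: number of respondents who rated each factor 7
    df3<-aggregate(d3[, 2:27], list(d3$StudyArea), sum)
    # Per study area: number of responses to each factor
    df4<-aggregate(d2[, 2:27], list(d2$StudyArea), length)
    # Percentage of respondents in each study area who rated each factor 7
    d1<-df3[,c(2:27)]/df4[, c(2:27)] * 100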
    kable(d1[1:6,1:10]) %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| I01 | I02 | I03 | I04 | I05 | I06 | I07 | I08 | I09 | I10 |
|---|---|---|---|---|---|---|---|---|---|
| 19.66292 | 25.84270 | 32.02247 | 14.04494 | 20.22472 | 44.94382 | 24.15730 | 28.65169 | 19.66292 | 23.59551 |
| 15.70796 | 28.98230 | 32.30089 | 17.69911 | 19.24779 | 48.89381 | 21.46018 | 27.65487 | 23.23009 | 28.09735 |
| 19.35484 | 27.95699 | 25.80645 | 17.20430 | 16.12903 | 49.46237 | 26.88172 | 36.55914 | 21.50538 | 31.18280 |
| 19.60784 | 25.49020 | 28.75817 | 13.07190 | 18.30065 | 49.01961 | 21.56863 | 24.18301 | 20.26144 | 24.83660 |
| 26.85185 | 45.37037 | 32.40741 | 21.29630 | 30.55556 | 51.85185 | 44.44444 | 46.29630 | 36.11111 | 42.59259 |
| 14.65517 | 31.03448 | 25.86207 | 12.06897 | 24.13793 | 50.86207 | 26.72414 | 28.44828 | 22.41379 | 32.75862 |
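Because every converted value is 0 or 1, the sum divided by the count for each study area is simply the mean of that column, times 100. Below is a minimal sketch on made-up 0/1 data (the `bin` data frame is hypothetical) that computes the percentages the way the text does and checks them against the `mean` shortcut.

```r
# Hypothetical binarised responses (1 = rated 7, 0 = any other rating)
bin <- data.frame(
  StudyArea = c("Law", "Law", "Law", "Business", "Business"),
  I01 = c(1, 0, 1, 0, 0),
  I02 = c(1, 1, 1, 0, 1)
)

# Route used in the text: per-group sums divided by per-group counts
sums   <- aggregate(bin[, 2:3], list(bin$StudyArea), sum)
counts <- aggregate(bin[, 2:3], list(bin$StudyArea), length)
pct    <- sums[, 2:3] / counts[, 2:3] * 100

# Equivalent shortcut: the mean of a 0/1 column is the share of 1s
pct_mean <- aggregate(bin[, 2:3], list(bin$StudyArea), mean)
pct_mean[, 2:3] <- pct_mean[, 2:3] * 100

print(cbind(StudyArea = sums$Group.1, pct))
```

aggregate() returns the groups in sorted order (Business before Law here). This is why the sum and count tables line up row by row when divided.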

We then add the study-area names (the Group.1 column returned by aggregate()) back to the data frame as a StudyArea column and use them as row names. This labels each row of percentages with its study area.

    d1$StudyArea <- df3$Group.1
    row.names(d1) <- d1$StudyArea
    kable(d1[1:6,1:10]) %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| | I01 | I02 | I03 | I04 | I05 | I06 | I07 | I08 | I09 | I10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Accountancy | 19.66292 | 25.84270 | 32.02247 | 14.04494 | 20.22472 | 44.94382 | 24.15730 | 28.65169 | 19.66292 | 23.59551 |
| Business | 15.70796 | 28.98230 | 32.30089 | 17.69911 | 19.24779 | 48.89381 | 21.46018 | 27.65487 | 23.23009 | 28.09735 |
| Economics | 19.35484 | 27.95699 | 25.80645 | 17.20430 | 16.12903 | 49.46237 | 26.88172 | 36.55914 | 21.50538 | 31.18280 |
| Information Systems | 19.60784 | 25.49020 | 28.75817 | 13.07190 | 18.30065 | 49.01961 | 21.56863 | 24.18301 | 20.26144 | 24.83660 |
| Law | 26.85185 | 45.37037 | 32.40741 | 21.29630 | 30.55556 | 51.85185 | 44.44444 | 46.29630 | 36.11111 | 42.59259 |
| Social Sciences | 14.65517 | 31.03448 | 25.86207 | 12.06897 | 24.13793 | 50.86207 | 26.72414 | 28.44828 | 22.41379 | 32.75862 |
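These row names matter later. `data.matrix()` carries them into the matrix, and `t()` turns them into column labels, which is how the study areas end up on the heatmap's x-axis. The text column has to be removed before conversion, as the plotting code does with `d1$StudyArea <- NULL`. Otherwise `data.matrix()` would turn it into integer codes. Below is a minimal sketch on a made-up two-row table (`pct` and `groups` are hypothetical).

```r
# Hypothetical percentages per study area, in aggregate() group order
pct <- data.frame(I01 = c(20, 30), I02 = c(45, 50))
groups <- c("Business", "Law")   # stands in for df3$Group.1

# Store the labels as a column and as row names, as in the step above
pct$StudyArea <- groups
row.names(pct) <- pct$StudyArea

# Drop the text column before building the numeric matrix
pct$StudyArea <- NULL
m  <- data.matrix(pct)  # explicit row names are kept as matrix row names
tm <- t(m)              # questions become rows, study areas become columns

print(tm)
```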

We replace the importance factor codes (I01-I26) with short descriptions of what they represent, so that users can tell at a glance which question they are examining. The last name, StudyArea, keeps the column added in the previous step.

    colnames(d1) <- c("Informed about Services",
                      "Provide Useful Info",
                      "Library signage is clear",
                      "Workshops Assist in Learning/Research",
                      "Anticipates Learning/Research needs",
                      "Opening hours meets needs",
                      "Requested Books/Articles delivered",
                      "Self Service meets needs",
                      "Online enquiry meets needs",
                      "F2F enquiry meets needs",
                      "Library Shelves have items I need",
                      "Library Staff give accurate answers",
                      "Library staff are helpful",
                      "Can find quiet place to study",
                      "Can find place to work as groups",
                      "Computer is available",
                      "Laptop Facilities provided",
                      "Wireless Access Provided",
                      "Print/Scan Facilities Provided",
                      "Info/Resources meet my needs",
                      "Online Resources are Useful",
                      "Course specific Resources present",
                      "Access Library away from Campus",
                      "Library Search Engine is useful",
                      "Access to Library made me successful",
                      "Mobile devices useful to access content","StudyArea")

    kable(d1[1:6,1:10]) %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| | Informed about Services | Provide Useful Info | Library signage is clear | Workshops Assist in Learning/Research | Anticipates Learning/Research needs | Opening hours meets needs | Requested Books/Articles delivered | Self Service meets needs | Online enquiry meets needs | F2F enquiry meets needs |
|---|---|---|---|---|---|---|---|---|---|---|
| Accountancy | 19.66292 | 25.84270 | 32.02247 | 14.04494 | 20.22472 | 44.94382 | 24.15730 | 28.65169 | 19.66292 | 23.59551 |
| Business | 15.70796 | 28.98230 | 32.30089 | 17.69911 | 19.24779 | 48.89381 | 21.46018 | 27.65487 | 23.23009 | 28.09735 |
| Economics | 19.35484 | 27.95699 | 25.80645 | 17.20430 | 16.12903 | 49.46237 | 26.88172 | 36.55914 | 21.50538 | 31.18280 |
| Information Systems | 19.60784 | 25.49020 | 28.75817 | 13.07190 | 18.30065 | 49.01961 | 21.56863 | 24.18301 | 20.26144 | 24.83660 |
| Law | 26.85185 | 45.37037 | 32.40741 | 21.29630 | 30.55556 | 51.85185 | 44.44444 | 46.29630 | 36.11111 | 42.59259 |
| Social Sciences | 14.65517 | 31.03448 | 25.86207 | 12.06897 | 24.13793 | 50.86207 | 26.72414 | 28.44828 | 22.41379 | 32.75862 |
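The `colnames(d1) <- c(...)` assignment above relies on the columns being in exactly the order I01 to I26 followed by StudyArea. A named lookup vector keyed by question code is a more defensive alternative, because a reordered column then cannot silently receive the wrong label. Below is a minimal sketch using the first three labels only; the full vector would list all 26. The `question_labels` vector and the shuffled `pct` data frame are hypothetical.

```r
# Lookup from question code to readable label (first three codes only)
question_labels <- c(
  I01 = "Informed about Services",
  I02 = "Provide Useful Info",
  I03 = "Library signage is clear"
)

# Hypothetical percentage table with columns in a shuffled order
pct <- data.frame(I03 = 32.0, StudyArea = "Law", I01 = 26.9, I02 = 45.4)

# Rename only the columns that have a label; leave the others unchanged
has_label <- names(pct) %in% names(question_labels)
names(pct)[has_label] <- unname(question_labels[names(pct)[has_label]])

print(names(pct))
```

StudyArea has no entry in the lookup, so it keeps its name. Each question column gets its label regardless of position.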

2.0.4 Plotting the Interactive Heatmap

The following code chunk plots our interactive heatmap. Because the heatmap is interactive, we can hover over each block to see the percentage of respondents from each study area who gave that question a rating of 7. This helps us identify which aspects of the library matter most to undergraduates, as opposed to those they consider less important.

For our case analysis, we have chosen only undergraduate students as our target. We can see that, across all study areas, a larger share of undergraduates give a rating of 7 to factors such as having a quiet place to study, wireless access, print/scan facilities and a place to work in groups. This suggests that these factors are the most sought after in a library across all study areas.

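#Code for plotting the interactive heatmap
    # Remove the text column so that d1 converts to a purely numeric matrix
    d1$StudyArea<- NULL
    wh_matrix <- data.matrix(d1)
    # Transpose so that questions are rows and study areas are columns
    heatmaply(t(wh_matrix), 
              # No clustering or seriation: keep questions and study areas in their order
              Rowv=NA, Colv=NA,
              seriate = "none",
              colors = Greens,
              fontsize_row = 10,
              fontsize_col = 10,
              grid_color = "white",
              grid_lw=0.3,
              branches_lwd = 0.6,
              # Hover labels for row, column and cell value
              label_names = c("Question", "Study Area","Percentage"),
              grid_size = 1,
              xlab = "Study Area",
              ylab = "Questions", 
              main = "Ratings by Questions and Area of Study: Importance")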
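As a complement to hovering over the heatmap, the factors can also be ranked by their average share of top ratings across study areas. The sketch below types in only the ten factors (I01 to I10) and six study areas printed in the tables earlier in this section, rounded to two decimals. It therefore says nothing about I11 to I26. The same `colMeans()` call applied to the numeric columns of the full `d1` would rank all 26 factors.

```r
# Percentages of undergraduates rating each factor 7, copied from the
# printed table (rounded to 2 decimals); only columns I01-I10 are included
areas <- c("Accountancy", "Business", "Economics",
           "Information Systems", "Law", "Social Sciences")
pct <- matrix(c(
  19.66, 15.71, 19.35, 19.61, 26.85, 14.66,  # I01
  25.84, 28.98, 27.96, 25.49, 45.37, 31.03,  # I02
  32.02, 32.30, 25.81, 28.76, 32.41, 25.86,  # I03
  14.04, 17.70, 17.20, 13.07, 21.30, 12.07,  # I04
  20.22, 19.25, 16.13, 18.30, 30.56, 24.14,  # I05
  44.94, 48.89, 49.46, 49.02, 51.85, 50.86,  # I06
  24.16, 21.46, 26.88, 21.57, 44.44, 26.72,  # I07
  28.65, 27.65, 36.56, 24.18, 46.30, 28.45,  # I08
  19.66, 23.23, 21.51, 20.26, 36.11, 22.41,  # I09
  23.60, 28.10, 31.18, 24.84, 42.59, 32.76   # I10
), nrow = 6, dimnames = list(areas, sprintf("I%02d", 1:10)))

# Average across study areas, then sort from most to least often rated 7
ranked <- sort(colMeans(pct), decreasing = TRUE)
print(round(ranked, 2))

# Study area whose respondents give a rating of 7 most often on average
top_area <- names(which.max(rowMeans(pct)))
print(top_area)
```

On these ten factors, I06 (Opening hours meets needs) ranks first and I04 (Workshops Assist in Learning/Research) ranks last. Law has the highest average share of 7 ratings.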