Heat Map - Ratings Split on Proportions of Respondents

1.0 Overview

The core purpose of this data visualisation is to compare how much importance library users place on the different factors covered in the library survey data set obtained from the Singapore Management University’s 2018 Survey Analysis. To do this, we create a heatmap showing, for each factor, the proportion of respondents who gave a specific rating as opposed to those who did not. This enables us to understand which areas of the library require attention, based on where the proportion of respondents is higher.

2.0 Step-By-Step Creation of Data Visualization

This section describes the step-by-step process for creating the data visualization. We cover the installation of R packages, the data preparation, the creation of the visualization, and the incorporation of interactivity.

2.0.1 Installation of R Packages

This code chunk installs the ‘tidyverse’, ‘ggridges’, ‘aggregation’, ‘plotly’, ‘heatmaply’, ‘reshape2’, ‘dendextend’, ‘ggalt’ and ‘knitr’ packages on the user's machine if they are missing, so that we do not have to install each one individually. It then loads the packages into the RStudio environment for immediate use. The ‘kableExtra’ package is loaded separately and is assumed to be installed already.

# Install any packages that are missing, then load them all
packages <- c('tidyverse', 'ggridges', 'aggregation', 'plotly', 'heatmaply', 'reshape2', 'dendextend', 'ggalt', 'knitr')
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}

library(kableExtra)

2.0.2 Loading of Data Set

The data source for this dataviz is loaded using the read_csv function. The data set records the survey results for 26 questions, along with other respondent attributes. We mainly focus on the survey results.

lib_data <- read_csv("Data/Raw Data LSR.csv")

## Parsed with column specification:
## cols(
##   .default = col_double(),
##   Comment1 = col_character()
## )

## See spec(...) for full column specifications.

2.0.3 Data Preparation

For our analysis, we include the ratings of respondents from both libraries (Li Ka Shing Library as well as Kwa Geok Choo Law Library). However, we filter out Postgraduate, Faculty, Staff and Other respondents and keep only Undergraduate students for our visualization.

The questions/factors assessed in the survey are labelled from 1 to 26 (I01-I26 for Importance and P01-P26 for Performance), with question P27 assessing users’ overall satisfaction with the library. Since we are focusing on the ‘Importance’ aspect, we keep only the factors I01-I26 and remove all the other factors from our dataset.

    # Keep Campus codes 1 and 2
    lib_data_stud <- filter(lib_data, Campus %in% 1:2)

    # Keep StudyArea codes 1 to 7
    lib_data_stud <- filter(lib_data_stud, StudyArea %in% 1:7)

    # Keep Position codes 1 to 5
    lib_data_stud <- filter(lib_data_stud, Position %in% 1:5)

    lib_data_stud$Comment1 <- NULL
    lib_data_stud$HowOftenL <- NULL
    lib_data_stud$HowOftenC <- NULL
    lib_data_stud$HowOftenW <- NULL
    lib_data_stud$Campus <- NULL
    lib_data_stud$Position <- NULL
    lib_data_stud$ID <- NULL
    lib_data_stud$NPS1 <- NULL
    # Keep ResponseID, StudyArea and the 26 importance factors (I01-I26)
    stu_data_hm <- lib_data_stud[c(1:28)]
    stu_data_hm$StudyArea[stu_data_hm$StudyArea== 1] <- "Accountancy"
    stu_data_hm$StudyArea[stu_data_hm$StudyArea== 2] <- "Business"
    stu_data_hm$StudyArea[stu_data_hm$StudyArea== 3] <- "Economics"
    stu_data_hm$StudyArea[stu_data_hm$StudyArea== 4] <- "Information Systems"
    stu_data_hm$StudyArea[stu_data_hm$StudyArea== 5] <- "Law"
    stu_data_hm$StudyArea[stu_data_hm$StudyArea== 6] <- "Social Sciences"
    stu_data_hm$StudyArea[stu_data_hm$StudyArea== 7] <- "Others"
    
    kable(stu_data_hm[1:6,1:10]) %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

ResponseID	StudyArea	I01	I02	I03	I04	I05	I06	I07	I08
570	Business	6	5	4	NA	NA	NA	NA	NA
820	Business	NA	NA	NA	NA	NA	NA	NA	NA
1232	Accountancy	NA	NA	NA	NA	NA	NA	NA	NA
2571	Information Systems	7	NA	6	NA	7	7	NA	NA
541	Accountancy	6	6	4	NA	NA	7	NA	NA
1576	Business	5	6	5	6	5	6	NA	NA
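
The seven recoding assignments above can also be expressed as a single lookup. The following is a minimal sketch on a toy vector of codes, assuming the codes are the whole numbers 1 to 7 as in the mapping above. The names `area_labels`, `codes` and `recoded` are illustrative and not part of the original analysis.

```r
# Study area names in code order (1 = Accountancy, ..., 7 = Others),
# copied from the recoding step above.
area_labels <- c("Accountancy", "Business", "Economics",
                 "Information Systems", "Law", "Social Sciences", "Others")

# Hypothetical toy codes standing in for stu_data_hm$StudyArea
codes <- c(2, 2, 1, 4, 7)

# Indexing the label vector by code recodes every value in one step
recoded <- area_labels[codes]
print(recoded)
```

On the real data, the same idea would be applied to the StudyArea column before it is overwritten with text.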

Since we are only focusing on the respondents who have given a rating of 7 as compared to all the other ratings, we first drop the respondents with missing ratings (na.omit) and then convert every rating of 7 to 1 and every other rating to 0. This way, we get a dataframe containing only 0s and 1s.

    d<-na.omit(stu_data_hm)
    d2<-d[c(2:28)]
    d3 <- d2
    
    d3[ ,c(2:27)][d3[ ,c(2:27)] != 7] <- 0
    d3[ ,c(2:27)][d3[ ,c(2:27)] == 7] <- 1 
    
    kable(d3[1:6,1:10]) %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

StudyArea	I01	I02	I03	I04	I06	I07	I08	I09
Social Sciences	0	0	0	0	0	0	0	0
Economics	1	1	0	1	0	0	0	0
Information Systems	0	0	0	0	1	0	1	1
Social Sciences	1	1	1	0	1	1	1	1
Business	1	1	1	0	0	0	0	0
Business	0	0	0	0	0	0	1	0
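
The conversion to 0s and 1s can also be seen as a logical comparison. The sketch below illustrates this on a small made-up set of ratings; the `ratings` data frame and the `top_rating` name are hypothetical and only serve to show the idea.

```r
# Hypothetical toy ratings on the 1-7 scale for three factors
ratings <- data.frame(I01 = c(7, 5, 6),
                      I02 = c(7, 7, 3),
                      I03 = c(4, 7, 7))

# Comparing with 7 gives TRUE/FALSE; adding 0 turns that into 1/0.
# Note that the result is a numeric matrix, not a data frame.
top_rating <- (ratings == 7) + 0
print(top_rating)
```

This gives the same 0/1 coding as the two assignment steps above, provided the missing values have already been removed.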

Now, we aggregate by summing the 0/1 values in each column within each study area. We store the result in df3.

Similarly, we count the number of responses for each question within each study area, and store the result in df4.

We divide the two dataframes and multiply each value by 100 to obtain the percentage.

    df3<-aggregate(d3[, 2:27], list(d3$StudyArea), sum)
    df4<-aggregate(d2[, 2:27], list(d2$StudyArea), length)
    d1<-df3[,c(2:27)]/df4[, c(2:27)] * 100
    kable(d1[1:6,1:10]) %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

I01	I02	I03	I04	I05	I06	I07	I08	I09	I10
19.66292	25.84270	32.02247	14.04494	20.22472	44.94382	24.15730	28.65169	19.66292	23.59551
15.70796	28.98230	32.30089	17.69911	19.24779	48.89381	21.46018	27.65487	23.23009	28.09735
19.35484	27.95699	25.80645	17.20430	16.12903	49.46237	26.88172	36.55914	21.50538	31.18280
19.60784	25.49020	28.75817	13.07190	18.30065	49.01961	21.56863	24.18301	20.26144	24.83660
26.85185	45.37037	32.40741	21.29630	30.55556	51.85185	44.44444	46.29630	36.11111	42.59259
14.65517	31.03448	25.86207	12.06897	24.13793	50.86207	26.72414	28.44828	22.41379	32.75862
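
Because the indicator columns contain only 0s and 1s, dividing the group sum by the group count is the same as taking the group mean. The following sketch checks this on a small made-up data frame; `toy`, `pct_ratio` and `pct_mean` are illustrative names, and the values are not taken from the survey.

```r
# Hypothetical toy indicators (1 = rating of 7, 0 = any other rating)
toy <- data.frame(
  StudyArea = c("Law", "Law", "Law", "Law", "Business", "Business"),
  I01 = c(1, 0, 0, 1, 0, 1),
  I02 = c(1, 1, 1, 0, 0, 0)
)

# Sum divided by count, mirroring the steps above
sums   <- aggregate(toy[, 2:3], list(StudyArea = toy$StudyArea), sum)
counts <- aggregate(toy[, 2:3], list(StudyArea = toy$StudyArea), length)
pct_ratio <- sums[, 2:3] / counts[, 2:3] * 100

# Shortcut: the mean of a 0/1 column is the proportion of 1s
pct_mean <- aggregate(toy[, 2:3], list(StudyArea = toy$StudyArea),
                      function(x) mean(x) * 100)
print(pct_mean)
```

Note that length counts every row, including missing values, so the two approaches agree here only because rows with missing ratings were removed beforehand.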

We add the study areas as row names of our dataframe in order to understand how the proportion of ratings is split across the various study areas.

    d1$StudyArea <- df3$Group.1
    row.names(d1) <- d1$StudyArea
    kable(d1[1:6,1:10]) %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

	I01	I02	I03	I04	I05	I06	I07	I08	I09	I10
Accountancy	19.66292	25.84270	32.02247	14.04494	20.22472	44.94382	24.15730	28.65169	19.66292	23.59551
Business	15.70796	28.98230	32.30089	17.69911	19.24779	48.89381	21.46018	27.65487	23.23009	28.09735
Economics	19.35484	27.95699	25.80645	17.20430	16.12903	49.46237	26.88172	36.55914	21.50538	31.18280
Information Systems	19.60784	25.49020	28.75817	13.07190	18.30065	49.01961	21.56863	24.18301	20.26144	24.83660
Law	26.85185	45.37037	32.40741	21.29630	30.55556	51.85185	44.44444	46.29630	36.11111	42.59259
Social Sciences	14.65517	31.03448	25.86207	12.06897	24.13793	50.86207	26.72414	28.44828	22.41379	32.75862

We replace the importance factor codes (I01 - I26) with short descriptions of what they represent, so that the reader can understand at first glance which question is being examined.

    colnames(d1) <- c("Informed about Services",
                      "Provide Useful Info",
                      "Library signage is clear",
                      "Workshops Assist in Learning/Research",
                      "Anticipates Learning/Research needs",
                      "Opening hours meets needs",
                      "Requested Books/Articles delivered",
                      "Self Service meets needs",
                      "Online enquiry meets needs",
                      "F2F enquiry meets needs",
                      "Library Shelves have items I need",
                      "Library Staff give accurate answers",
                      "Library staff are helpful",
                      "Can find quiet place to study",
                      "Can find place to work as groups",
                      "Computer is available",
                      "Laptop Facilities provided",
                      "Wireless Access Provided",
                      "Print/Scan Facilities Provided",
                      "Info/Resources meet my needs",
                      "Online Resources are Useful",
                      "Course specific Resources present",
                      "Access Library away from Campus",
                      "Library Search Engine is useful",
                      "Access to Library made me successful",
                      "Mobile devices useful to access content",
                      "StudyArea")

    kable(d1[1:6,1:10]) %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

	Informed about Services	Provide Useful Info	Library signage is clear	Workshops Assist in Learning/Research	Anticipates Learning/Research needs	Opening hours meets needs	Requested Books/Articles delivered	Self Service meets needs	Online enquiry meets needs	F2F enquiry meets needs
Accountancy	19.66292	25.84270	32.02247	14.04494	20.22472	44.94382	24.15730	28.65169	19.66292	23.59551
Business	15.70796	28.98230	32.30089	17.69911	19.24779	48.89381	21.46018	27.65487	23.23009	28.09735
Economics	19.35484	27.95699	25.80645	17.20430	16.12903	49.46237	26.88172	36.55914	21.50538	31.18280
Information Systems	19.60784	25.49020	28.75817	13.07190	18.30065	49.01961	21.56863	24.18301	20.26144	24.83660
Law	26.85185	45.37037	32.40741	21.29630	30.55556	51.85185	44.44444	46.29630	36.11111	42.59259
Social Sciences	14.65517	31.03448	25.86207	12.06897	24.13793	50.86207	26.72414	28.44828	22.41379	32.75862

2.0.4 Plotting the Interactive HeatMap

The following code chunk plots our interactive heatmap. Because the plot is interactive, we can hover over each block to identify the percentage of respondents from each study area who gave a rating of 7 for that particular question. This helps us identify which aspects of the library respondents rate as most important, as opposed to those they rate as less important.

For our case analysis, since we have chosen only Undergraduate students as our target, we can see that most undergraduates from all study areas have given the top rating of 7 to factors such as having a quiet place to study, wireless access, print/scan facilities and a place to work in groups when needed. This suggests that these factors are the most sought after in a library across all study areas.

# Code for plotting the interactive heatmap
    d1$StudyArea<- NULL
    wh_matrix <- data.matrix((d1))
    heatmaply(t(wh_matrix), 
              Rowv=NA, Colv=NA,
              seriate = "none",
              colors = Greens,
              fontsize_row = 10,
              fontsize_col = 10,
              grid_color = "white",
              grid_lw=0.3,
              branches_lwd = 0.6,
              label_names = c("Question", "Study Area","Percentage"),
              grid_size = 1,
              xlab = "Study Area",
              ylab = "Questions", 
              main = "Ratings by Questions and Area of Study: Importance")
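
To back up a reading of the heatmap with numbers, one could rank the factors by their average percentage across study areas. The sketch below does this on a small excerpt of the percentages shown earlier (two study areas, three factors). The names `pct` and `avg_pct` are illustrative, and the simple average used here is an assumption that ignores differences in the number of respondents per study area.

```r
# Excerpt of the percentage table shown earlier (rows = study areas)
pct <- rbind(
  Accountancy = c(19.66292, 25.84270, 44.94382),
  Law         = c(26.85185, 45.37037, 51.85185)
)
colnames(pct) <- c("Informed about Services",
                   "Provide Useful Info",
                   "Opening hours meets needs")

# Unweighted average across study areas, highest first
avg_pct <- sort(colMeans(pct), decreasing = TRUE)
print(round(avg_pct, 1))
```

On the full data, the same two lines could be applied to wh_matrix to list the factors with the highest share of top ratings.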