1.0 Core Purpose and Components

Libraries are often described as keys to the past and gateways to our future. The data from Singapore Management University’s 2018 Library Survey is our key to unlocking insights that can pave the way for a better library that meets everyone’s needs. The core purpose of our data visualisation is to compare the importance scores that library users gave to the different factors covered in the survey, and so to show which factors matter most to them.

1.0.1 Proposed Design Sketches

For our analysis, we have the survey results for both ‘Performance’ and ‘Importance’. Since we are focusing on only one aspect for our data viz, we choose to work with ‘Importance’. I believe that survey data of this kind is best represented on a Likert scale, and for my Likert scale I am going to use a diverging stacked bar chart. Importance has been captured for each question/factor on a Likert scale from 1 (lowest) to 7 (highest). Because every question uses the same scale, each one can be plotted as a diverging bar against a common axis. This lets us clearly see which factors are more important to library users than others.

I also choose to use an interactive donut chart to help users understand the mix of respondents (Faculty, Staff, Students, etc.) who took the library survey.

There were other options to consider for this visualisation, namely the bar-in-bar chart, the heatmap (to show which factors are more important based on area of study) and the scatter plot with line. But my other project team members are focusing on these, and hence I have decided to go ahead with this form of visualization.

2.0 Ways to Incorporate Interactivity

Since the dataset is a survey dataset, visualizing the Likert scale data in a stacked bar chart allows us to make use of ggplotly tooltips for interactivity. We use ggplotly because we are expected to produce our charts using ggplot2.

The interactivity helps because the Likert scale bar chart can be filtered by double-clicking the legend (filtering by rating), so the user can focus on any one rating of interest.

We can also hover over each diverging bar to see the question for which the rating was provided, the categorical value of the rating and the percentage of people who gave that specific rating.

3.0 Step-By-Step Creation of Data Visualization

This section describes the step-by-step process for creating the data visualizations. We explain the installation of R packages and the data preparation, followed by creating the visualizations and incorporating interactivity.

3.0.1 Installation of R Packages

This code chunk installs the ‘tidyverse’, ‘aggregation’, ‘plotly’, ‘plyr’, ‘reshape2’, ‘HH’, ‘dplyr’, ‘cowplot’, ‘gridExtra’, ‘ggthemes’, ‘stringr’, ‘ggplot2’ and ‘RColorBrewer’ packages on the user’s machine if they are missing, without having to install each one individually. It then loads each package into the RStudio session for immediate use.

my_locale <- Sys.getlocale("LC_ALL")
Sys.setlocale("LC_ALL", my_locale)
## [1] ""
#Code for checking if the packages are installed or not
packages <-c ('tidyverse','aggregation','plotly','plyr','reshape2','HH','dplyr','cowplot','gridExtra','ggthemes','stringr','ggplot2','RColorBrewer')
for (p in packages){
  if(!require(p, character.only=T)){
    install.packages(p)
  }
  library(p,character.only = T)
}
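
Before running the loop, it can help to see which packages are actually missing. The sketch below is one way to check, assuming the same package names as above; `installed` and `missing_pkgs` are names introduced here for illustration, and the chunk installs nothing.

```r
# Same package vector as in the loop above
packages <- c('tidyverse','aggregation','plotly','plyr','reshape2','HH','dplyr',
              'cowplot','gridExtra','ggthemes','stringr','ggplot2','RColorBrewer')

# installed.packages() returns a matrix whose row names are the installed packages
installed <- rownames(installed.packages())

# Packages that the loop would have to install on this machine
missing_pkgs <- setdiff(packages, installed)
missing_pkgs
```

An empty result means the loop will only load packages and will not trigger any installation.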

3.1 Data Preparation

The data source for this dataviz is loaded using the read_csv function. The data consists of the survey results for 26 questions, documented along with other fields about each respondent. We are mainly focusing on the survey results.

lib_data<- read_csv("data/Raw Data LSR.csv")
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   Comment1 = col_character()
## )
## See spec(...) for full column specifications.

3.1.1 Data Preparation for Donut Chart

First, we make the donut chart to show the profile of the people who took the SMU library survey. For this, we split our data based on the position of the respondent and count the number of people in each group.

lib_data_student_under <- filter(lib_data, Position == '1' | Position == '2'| Position == '3' | Position == '4' | Position == '5')

stud_under <- nrow(lib_data_student_under)
stud_under
## [1] 2124
lib_data_student_mast <- filter(lib_data, Position == '6'| Position == '7')

stud_mast <- nrow(lib_data_student_mast)
stud_mast
## [1] 279
lib_data_student_fact <- filter(lib_data, Position == '8'| Position == '9'| Position == '10'| Position == '11')

stud_fact <- nrow(lib_data_student_fact)
stud_fact
## [1] 65
lib_data_student_staff <- filter(lib_data, Position == '12'| Position == '13'| Position == '14')

stud_staff <- nrow(lib_data_student_staff)
stud_staff
## [1] 170
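
The four filters above can also be expressed in one pass. As a minimal sketch, assuming the Position codes are the integers 1 to 14 grouped exactly as in the filters above, cut() can bin them into the four groups; the `position` vector below is toy data, not the survey.

```r
# Toy stand-in for lib_data$Position (hypothetical values)
position <- c(1, 3, 5, 6, 7, 8, 11, 12, 14, 2)

# Right-closed bins: (0,5] undergrads, (5,7] postgrads, (7,11] faculty, (11,14] staff
group <- cut(position,
             breaks = c(0, 5, 7, 11, 14),
             labels = c("Undergrads", "Postgrads", "Faculty", "Staff & Others"))

# Number of respondents per group
counts <- table(group)
counts
```

With the real data, the same call on lib_data$Position should reproduce the four counts shown above, provided the coding assumption holds.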

Once we obtain our count for each position, we build a dataframe with the position group as ‘category’ and the number of respondents as ‘count’.

# Build the data frame of respondent counts per position group
data <- data.frame(
  category=c("Undergrads", "Faculty", "Postgrads","Staff & Others"),
  count=c(stud_under, stud_fact, stud_mast, stud_staff)
)
 
# Compute each group's share of the total (as a fraction)
data$fraction <- data$count / sum(data$count)

# Compute the cumulative fractions (top of each rectangle)
data$ymax <- cumsum(data$fraction)

# Compute the bottom of each rectangle
data$ymin <- c(0, head(data$ymax, n=-1))

# Compute label position
data$labelPosition <- (data$ymax + data$ymin) / 2

# Compute a good label
data$label <- paste0(data$category, "\n Count: ", data$count)


data
##         category count   fraction      ymax      ymin labelPosition
## 1     Undergrads  2124 0.80515542 0.8051554 0.0000000     0.4025777
## 2        Faculty    65 0.02463988 0.8297953 0.8051554     0.8174754
## 3      Postgrads   279 0.10576194 0.9355572 0.8297953     0.8826763
## 4 Staff & Others   170 0.06444276 1.0000000 0.9355572     0.9677786
##                         label
## 1    Undergrads\n Count: 2124
## 2         Faculty\n Count: 65
## 3      Postgrads\n Count: 279
## 4 Staff & Others\n Count: 170
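
As a quick sanity check on the arithmetic above, the sketch below recomputes the rectangle bounds from the four counts printed in the table and confirms that each rectangle's height equals that group's share. It uses plain vectors rather than the `data` frame so that it runs on its own.

```r
# Counts as printed above: Undergrads, Faculty, Postgrads, Staff & Others
count <- c(2124, 65, 279, 170)

fraction <- count / sum(count)     # share of each group
ymax <- cumsum(fraction)           # top of each rectangle
ymin <- c(0, head(ymax, n = -1))   # bottom of each rectangle

# Each rectangle's height should equal the group's share
stopifnot(isTRUE(all.equal(ymax - ymin, fraction)))

# Shares as percentages, rounded to one decimal place
round(100 * fraction, 1)
```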

Now that we have our dataframe, we make the static donut chart as shown below.

# Make Static Donut Chart
gg_donut_static <- ggplot(data, aes(ymax=ymax, ymin=ymin, xmax=4, xmin=3, fill=category)) +
  geom_rect() + ggtitle("Library Survey Data Distribution Based on Position") + 
  geom_label( x=4, aes(y=labelPosition, label=label), size=4) +
  scale_fill_brewer(palette=4) +
  coord_polar(theta="y") +
  xlim(c(2, 4)) +
  theme_void() +
  theme(legend.position = "none")

gg_donut_static

To add interactivity to our donut chart, we make use of the plot_ly function as shown below. If you hover over each section, you can see the position, the count of people in that position and the percentage it covers.

# Interactive Donut Chart
gg_donut_Int <- data %>% plot_ly(labels = ~category, values = ~count)
gg_donut_Int <- gg_donut_Int %>% add_pie(hole = 0.6)
gg_donut_Int <- gg_donut_Int %>% layout(title = "Library Survey Data Distribution Based on Position", showlegend = T,
                      xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
                      yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))

gg_donut_Int

3.1.2 Data Preparation for Diverging Stacked Bar Chart

For our analysis, we include the ratings of respondents from both libraries (Li Ka Shing as well as Kwa Geok Choo Law). However, we filter out Faculty, Staff & Other respondents and keep only Students for visualization.

The questions/factors assessed in the survey are numbered from 1 to 26 (I01-I26 for importance and P01-P26 for performance), with question P27 assessing users’ overall satisfaction with the library. Since we are focusing on the ‘Importance’ aspect, we keep only these factors and remove all the other columns from our dataset.

lib_data_student <- filter(lib_data, Position == '1' | Position == '2'| Position == '3'| Position == '4'| Position == '5'| Position == '6'| Position == '7')

lib_data_student$Comment1 <- NULL
lib_data_student$HowOftenL <- NULL
lib_data_student$HowOftenC <- NULL
lib_data_student$HowOftenW <- NULL
lib_data_student$Campus <- NULL
lib_data_student$Position <- NULL
lib_data_student$StudyArea <- NULL
lib_data_student$ID <- NULL
lib_data_student$NPS1 <- NULL

lib_data_student
## # A tibble: 2,403 x 80
##    ResponseID   I01   I02   I03   I04   I05   I06   I07   I08   I09   I10
##         <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1        599    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
##  2        570     6     5     4    NA    NA    NA    NA    NA    NA    NA
##  3        264    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
##  4        686    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
##  5        820    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
##  6       1232    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
##  7       1514    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
##  8       3179    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
##  9       2571     7    NA     6    NA     7     7    NA    NA    NA    NA
## 10        541     6     6     4    NA    NA     7    NA    NA    NA    NA
## # ... with 2,393 more rows, and 69 more variables: I11 <dbl>, I12 <dbl>,
## #   I13 <dbl>, I14 <dbl>, I15 <dbl>, I16 <dbl>, I17 <dbl>, I18 <dbl>,
## #   I19 <dbl>, I20 <dbl>, I21 <dbl>, I22 <dbl>, I23 <dbl>, I24 <dbl>,
## #   I25 <dbl>, I26 <dbl>, P01 <dbl>, P02 <dbl>, P03 <dbl>, P04 <dbl>,
## #   P05 <dbl>, P06 <dbl>, P07 <dbl>, P08 <dbl>, P09 <dbl>, P10 <dbl>,
## #   P11 <dbl>, P12 <dbl>, P13 <dbl>, P14 <dbl>, P15 <dbl>, P16 <dbl>,
## #   P17 <dbl>, P18 <dbl>, P19 <dbl>, P20 <dbl>, P21 <dbl>, P22 <dbl>,
## #   P23 <dbl>, P24 <dbl>, P25 <dbl>, P26 <dbl>, P27 <dbl>, NA1 <dbl>,
## #   NA2 <dbl>, NA3 <dbl>, NA4 <dbl>, NA5 <dbl>, NA6 <dbl>, NA7 <dbl>,
## #   NA8 <dbl>, NA9 <dbl>, NA10 <dbl>, NA11 <dbl>, NA12 <dbl>, NA13 <dbl>,
## #   NA14 <dbl>, NA15 <dbl>, NA16 <dbl>, NA17 <dbl>, NA18 <dbl>,
## #   NA19 <dbl>, NA20 <dbl>, NA21 <dbl>, NA22 <dbl>, NA23 <dbl>,
## #   NA24 <dbl>, NA25 <dbl>, NA26 <dbl>
lib_data_student1 <- dplyr::select(lib_data_student,-starts_with("NA"))
lib_data_student2 <- dplyr::select(lib_data_student1,-starts_with("P"))

Since the survey responses for each factor are in separate columns, analysis and visualization are difficult. We use the pivot_longer function to reshape the data so that each row records the response from one ‘ResponseID’ to one question, with all 26 questions stacked under a single column.

survey <- lib_data_student2 %>%
pivot_longer(-ResponseID, names_to = "measure", values_to = "response")
survey
## # A tibble: 62,478 x 3
##    ResponseID measure response
##         <dbl> <chr>      <dbl>
##  1        599 I01           NA
##  2        599 I02           NA
##  3        599 I03           NA
##  4        599 I04           NA
##  5        599 I05           NA
##  6        599 I06           NA
##  7        599 I07           NA
##  8        599 I08           NA
##  9        599 I09           NA
## 10        599 I10           NA
## # ... with 62,468 more rows
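
To see what this reshaping does on a small scale, here is a minimal sketch on a two-respondent table. The `toy` data frame is made up for illustration and is not survey data; only the call pattern mirrors the chunk above.

```r
library(tidyr)

# Toy wide table: one row per respondent, one column per question
toy <- data.frame(ResponseID = c(1, 2),
                  I01 = c(6, NA),
                  I02 = c(5, 7))

# One row per (respondent, question) pair; NA responses are kept
toy_long <- pivot_longer(toy, -ResponseID,
                         names_to = "measure", values_to = "response")
toy_long
```

Two respondents and two questions give four rows, in the same way that 2,403 students and 26 questions give the 62,478 rows above.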

We convert the measure and response columns to factors, since the ratings are discrete categories and we do not need any values in between during visualization.

survey$measure <- as.factor(survey$measure)
survey$response <- as.factor(survey$response)

The data is tabulated into a contingency table by measure and response (the rating from 1 to 7).

survey_df <- table(survey$measure,survey$response) %>% as.data.frame.matrix()
survey_df <- tibble::rownames_to_column(survey_df, var="Measure")
survey_df
##    Measure   1   2   3   4   5   6    7
## 1      I01  26  58 149 344 748 591  450
## 2      I02  11  30  72 223 483 763  764
## 3      I03  20  39  91 251 494 742  731
## 4      I04  43  91 177 373 526 528  368
## 5      I05  23  40 115 325 559 686  496
## 6      I06  10  21  39 125 265 616 1299
## 7      I07  11  36  64 212 346 456  440
## 8      I08  11  21  57 183 379 638  698
## 9      I09  17  40  81 218 398 551  456
## 10     I10   6  34  54 212 405 619  689
## 11     I11   3  29  55 171 360 631  821
## 12     I12   2  21  37 150 350 739  819
## 13     I13   1  12  31 147 360 713  954
## 14     I14   7  14  24  58 168 422 1618
## 15     I15  19  34  46 135 292 514 1257
## 16     I16 111 104 128 267 314 314  488
## 17     I17  16  15  32  91 243 510 1320
## 18     I18   4   4  12  47 136 366 1737
## 19     I19   8   4  24  67 179 489 1495
## 20     I20  20  31  65 191 414 560  583
## 21     I21   5   8  27 127 313 604 1133
## 22     I22   6  12  34 149 355 676  839
## 23     I23   1  10  35 136 315 623 1057
## 24     I24   2  10  36 123 298 698 1052
## 25     I25   8  13  47 199 390 649  853
## 26     I26  14  27  69 202 404 603  718
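
Note that the rows of this table do not add up to the 2,403 students, because table() leaves out NA values by default. A minimal sketch on toy vectors (made up for illustration) shows the effect:

```r
# Toy long-format data: three answers per question, some missing
measure  <- factor(c("I01", "I01", "I01", "I02", "I02", "I02"))
response <- factor(c(6, 7, NA, 5, NA, NA), levels = 1:7)

# Cross-tabulate; NA responses are dropped by default (useNA = "no")
counts <- as.data.frame.matrix(table(measure, response))
counts

# Row totals count only the respondents who actually gave a rating
rowSums(counts)
```

This is why the percentages later in the walkthrough are shares of the students who rated a question, not of all students.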

Now that our data is prepared, we use the likert function (from the ‘HH’ package) to prepare a basic diverging stacked bar chart, which gives us a quick view of how we want to proceed.

likert(Measure ~ ., data=survey_df, ylab=NULL,
       ReferenceZero=4, as.percent=TRUE,
       positive.order=TRUE, 
       main = list("Student Survey Result - LSR (Improvement)",x=unit(.55, "npc")), 
       sub= list("Satisfaction Rating",x=unit(.57, "npc")), 
       xlim=c(-100,-80,-60,-40,-20,0,20,40,60,80,100),
       strip=FALSE, 
       par.strip.text=list(cex=.7))

From this initial data viz, we note that we need to replace the factor codes with the actual questions so that the user can understand the chart without looking up the factor number. We also need to incorporate interactivity, as it is hard to read detailed information off the static likert plot.

3.1.3 Bar Plot to Visualize Factors with NA Values

We make another visualization to show, for each factor in the survey, the number of respondents who left the rating blank or marked it as NA.

lib_data_student_NA <- dplyr::select(lib_data_student,-starts_with("I"))
lib_data_student_NA1 <- dplyr::select(lib_data_student_NA,-starts_with("P"))


survey_NA <- lib_data_student_NA1 %>%
pivot_longer(-ResponseID, names_to = "measure", values_to = "response")
survey_NA
## # A tibble: 62,478 x 3
##    ResponseID measure response
##         <dbl> <chr>      <dbl>
##  1        599 NA1           NA
##  2        599 NA2           NA
##  3        599 NA3           NA
##  4        599 NA4           NA
##  5        599 NA5           NA
##  6        599 NA6           NA
##  7        599 NA7           NA
##  8        599 NA8           NA
##  9        599 NA9           NA
## 10        599 NA10          NA
## # ... with 62,468 more rows
survey_NA$measure <- as.factor(survey_NA$measure)
survey_NA$response <- as.factor(survey_NA$response)
survey_NA_df <- table(survey_NA$measure,survey_NA$response) %>% as.data.frame.matrix()

We rename the count column and list the factors in ascending order of the ‘Count’ of respondents who did not give a rating.

survey_NA_df <- tibble::rownames_to_column(survey_NA_df, var="Measure")
colnames(survey_NA_df)[2] <- "Count"
survey_NA_df[order(survey_NA_df$Count),]
##    Measure Count
## 6     NA14    17
## 23     NA6    21
## 10    NA18    22
## 20     NA3    29
## 1      NA1    31
## 7     NA15    31
## 12     NA2    51
## 11    NA19    62
## 9     NA17   101
## 17    NA24   109
## 14    NA21   111
## 16    NA23   151
## 22     NA5   152
## 18    NA25   169
## 5     NA13   178
## 15    NA22   257
## 4     NA12   278
## 21     NA4   290
## 19    NA26   291
## 3     NA11   326
## 2     NA10   377
## 25     NA8   409
## 13    NA20   464
## 8     NA16   602
## 26     NA9   635
## 24     NA7   831

Our bar plot helps us understand which questions were most often left without a response. We notice that factors ‘NA7’, ‘NA9’ and ‘NA16’ had the most ‘NA’ values.

NA7 -> Books and articles I have requested from other Libraries are delivered promptly

NA9 -> Online enquiry services (e.g. Email, Library Chat) meet my needs

NA16 -> A computer is available when I need one

This could potentially mean that these are the facilities/services that students rarely make use of, so they find it hard to comment on them or provide a rating.

plot_Null <- ggplot(survey_NA_df, aes(x = reorder(Measure, Count), y = Count)) + geom_bar(stat = "identity") + ggtitle("Number of Respondents who did not Answer certain Questions") + coord_flip()

ggplotly(plot_Null)
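
To put the largest counts in proportion, the sketch below expresses them as a share of the student respondents. It assumes each count is out of the 2,403 student rows shown earlier; `n_students`, `na_counts` and `na_share` are names introduced here for illustration.

```r
# Number of student rows in lib_data_student (from the tibble header above)
n_students <- 2403

# Three largest non-response counts from the ordered table above
na_counts <- c(NA7 = 831, NA9 = 635, NA16 = 602)

# Share of students who gave no rating, in percent
na_share <- round(100 * na_counts / n_students, 1)
na_share
```

Under that assumption, roughly a third of students gave no rating for NA7 and about a quarter for each of NA9 and NA16.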

3.1.4 Alternate Take on the Diverging Stacked Bar Chart

Now that we know how to proceed with our data viz, we build an alternate version in which we incorporate interactivity, convert the factor codes to actual questions and show a text label instead of a number for each rating.

3.1.4.1 Data Preparation

Since we make use of longer axis labels for our factors, we make sure the ‘stringr’ package is loaded so that we can wrap them. We define the title for our visual and then convert the ratings from 1 to 7 into text labels for easier reading.

tab <- survey_df
mytitle<-"Library Survey Data - Student Survey Result (Improvement)\n"
mylevels<-c("Strongly Disagree","Disagree","Somewhat Disagree","Not Sure","Somewhat Agree","Agree","Strongly Agree")
tab <- cbind(tab[1], prop.table(as.matrix(tab[-1]), margin = 1))
tab[,-1] <-round(tab[,-1],2) #the "-1" excludes column 1
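
To make the row-wise proportions concrete, here is a minimal sketch using the I18 counts from the contingency table above. The one-row matrix `i18` is built by hand for illustration; with margin = 1, prop.table divides each cell by its row total.

```r
# Counts for I18 (ratings 1 to 7), copied from the table above
i18 <- matrix(c(4, 4, 12, 47, 136, 366, 1737), nrow = 1,
              dimnames = list("I18", as.character(1:7)))

# Share of each rating within the row, rounded to two decimals
i18_prop <- round(prop.table(i18, margin = 1), 2)
i18_prop

# Rounded shares are not guaranteed to add up to exactly 1
sum(i18_prop)
```

The share for rating 7 is the figure that appears later as the percentage of students giving the top rating for this factor.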

We have a center category of ‘Not Sure’. We divide this estimate by two and include it twice. That way, we can plot half of this category below the center line and the other half above it.

numlevels<-length(tab[1,])-1
numcenter<-ceiling(numlevels/2)+1
tab$midvalues<-tab[,numcenter]/2
tab2<-cbind(tab[,1],tab[,2:ceiling(numlevels/2)],
  tab$midvalues,tab$midvalues,tab[,numcenter:numlevels+1])
colnames(tab2)<-c("Factor",mylevels[1:floor(numlevels/2)],"notsure-",
  "notsure+",mylevels[numcenter:numlevels])

We then compute the axis limits, leaving some blank space beyond the longest stacked bar on each side for better visual representation.

Now that we have our data in a dataframe of proportions, we multiply these values by 100 to get the required percentage estimate on both sides of the axis.

numlevels<-length(mylevels)+1
point1<-2
point2<-((numlevels)/2)+1
point3<-point2+1
point4<-numlevels+1
mymin<-(ceiling(max(rowSums(tab2[,point1:point2]))*4)/4)*-100
mymax<-(ceiling(max(rowSums(tab2[,point3:point4]))*4)/4)*100
tab2 <- tab2[,c(1,2,3,4,5,9,8,7,6)]
numlevels<-length(tab[1,])-1
numlevels
## [1] 8
temp.rows<-length(tab2[,1])
temp.rows
## [1] 26
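
The expressions for mymin and mymax round the largest row total up to the next multiple of 25 percentage points. A minimal sketch of that step follows; `round_up_to_25` is a helper named here for illustration, and the inputs are made-up proportions.

```r
# Round a proportion (0 to 1) up to the next multiple of 25 percentage points
round_up_to_25 <- function(p) ceiling(p * 4) / 4 * 100

round_up_to_25(0.13)  # a total of 13% rounds up to 25
round_up_to_25(0.97)  # a total of 97% rounds up to 100
```

Multiplying the result by -1 for the negative side gives the lower limit, so the axis always ends on a grid line.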

Next, we split our table into two separate dataframes called ‘highs’ and ‘lows’: the positive ratings (half of Not Sure, Somewhat Agree, Agree and Strongly Agree) go under ‘highs’ and all the negative ratings under ‘lows’, as shown.

tab3<-melt(tab2,id="Factor")
#tab3$col<-rep(pal,each=temp.rows)
tab3$value<-tab3$value*100
tab3$Factor<-str_wrap(tab3$Factor, width = 40)
tab3$Factor<-factor(tab3$Factor, levels = tab2$Factor[order(-(tab2[,5]+tab2[,6]+tab2[,7]))])
# Second half of the melted rows; the range is parenthesised because ':' binds tighter than '+'
highs<-na.omit(tab3[((length(tab3[,1])/2)+1):length(tab3[,1]),])
lows<-na.omit(tab3[1:(length(tab3[,1])/2),])
lows <- lows[rev(rownames(lows)),]
#lows$col <- factor(lows$col, levels = c("#B2182B","#EF8A62","#FDDBC7", "#DFDFDF"))

lows<-lows[dim(lows)[1]:1,]
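
Range expressions such as `numcenter:numlevels+1` (used when building tab2) are sensitive to operator precedence: in R, `:` binds more tightly than `+`. A minimal sketch with the values from this walkthrough (numcenter = 5, numlevels = 7) shows the difference; `shifted` and `extended` are names introduced here for illustration.

```r
numcenter <- 5
numlevels <- 7

# Parsed as (5:7) + 1, i.e. the whole range shifted by one
shifted <- numcenter:numlevels + 1
shifted

# Parenthesising the upper bound gives a longer range instead
extended <- numcenter:(numlevels + 1)
extended
```

In tab, column 1 holds the factor code, so rating k sits in column k + 1. The shifted form therefore picks exactly the columns for ratings 5 to 7. Writing the parentheses explicitly makes that intent clear.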

We need to give each factor a meaningful text label so that the user can understand, at first glance, the question for which the rating was provided, rather than having to refer to the codebook and decipher its meaning. At the same time, we must not make the label so long that it becomes tedious to read and clutters our visualization.

For this, we use the transform function to create an additional column with the text label for each factor.

highs <- transform(highs, Question= ifelse(Factor == "I01","C) Informed about Services",ifelse(Factor == "I02","M) Provide Useful Info"
,ifelse(Factor == "I03","H) Library signage is clear"
,ifelse(Factor == "I04","B) Workshops Assist in Learning/Research"
,ifelse(Factor == "I05","D) Anticipates Learning/Research needs"
,ifelse(Factor == "I06","S) Opening hours meets needs"
,ifelse(Factor == "I07","F) Requested Books/Articles delivered"
,ifelse(Factor == "I08","K) Self Service meets needs"
,ifelse(Factor == "I09","E) Online enquiry meets needs"
,ifelse(Factor == "I10","J) F2F enquiry meets needs"
,ifelse(Factor == "I11","L) Library Shelves have items I need"
,ifelse(Factor == "I12","R) Library Staff give accurate answers"
,ifelse(Factor == "I13","T) Library staff are helpful"
,ifelse(Factor == "I14","X) Can find quiet place to study"
,ifelse(Factor == "I15","O) Can find place to work as groups"
,ifelse(Factor == "I16","A) Computer is available"
,ifelse(Factor == "I17","V) Laptop Facilities provided"
,ifelse(Factor == "I18","Z) Wireless Access Provided"
,ifelse(Factor == "I19","Y) Print/Scan Facilities Provided"
,ifelse(Factor == "I20","G) Info/Resources meet my needs"
,ifelse(Factor == "I21","W) Online Resources are Useful"
,ifelse(Factor == "I22","Q) Course specific Resources present"
,ifelse(Factor == "I23","U) Access Library away from Campus"
,ifelse(Factor == "I24","P) Library Search Engine is useful"
,ifelse(Factor == "I25","N) Access to Library made me succesful"
,"I) Mobile devices useful to access content"))))))))))))))))))))))))))
as_tibble(highs)
## # A tibble: 104 x 4
##    Factor variable       value Question                                
##    <fct>  <fct>          <dbl> <fct>                                   
##  1 I01    Strongly Agree   19  C) Informed about Services              
##  2 I02    Strongly Agree   33  M) Provide Useful Info                  
##  3 I03    Strongly Agree   31  H) Library signage is clear             
##  4 I04    Strongly Agree   17  B) Workshops Assist in Learning/Research
##  5 I05    Strongly Agree   22  D) Anticipates Learning/Research needs  
##  6 I06    Strongly Agree   55. S) Opening hours meets needs            
##  7 I07    Strongly Agree   28. F) Requested Books/Articles delivered   
##  8 I08    Strongly Agree   35  K) Self Service meets needs             
##  9 I09    Strongly Agree   26  E) Online enquiry meets needs           
## 10 I10    Strongly Agree   34  J) F2F enquiry meets needs              
## # ... with 94 more rows
lows <- transform(lows, Question= ifelse(Factor == "I01","C) Informed about Services",ifelse(Factor == "I02","M) Provide Useful Info"
,ifelse(Factor == "I03","H) Library signage is clear"
,ifelse(Factor == "I04","B) Workshops Assist in Learning/Research"
,ifelse(Factor == "I05","D) Anticipates Learning/Research needs"
,ifelse(Factor == "I06","S) Opening hours meets needs"
,ifelse(Factor == "I07","F) Requested Books/Articles delivered"
,ifelse(Factor == "I08","K) Self Service meets needs"
,ifelse(Factor == "I09","E) Online enquiry meets needs"
,ifelse(Factor == "I10","J) F2F enquiry meets needs"
,ifelse(Factor == "I11","L) Library Shelves have items I need"
,ifelse(Factor == "I12","R) Library Staff give accurate answers"
,ifelse(Factor == "I13","T) Library staff are helpful"
,ifelse(Factor == "I14","X) Can find quiet place to study"
,ifelse(Factor == "I15","O) Can find place to work as groups"
,ifelse(Factor == "I16","A) Computer is available"
,ifelse(Factor == "I17","V) Laptop Facilities provided"
,ifelse(Factor == "I18","Z) Wireless Access Provided"
,ifelse(Factor == "I19","Y) Print/Scan Facilities Provided"
,ifelse(Factor == "I20","G) Info/Resources meet my needs"
,ifelse(Factor == "I21","W) Online Resources are Useful"
,ifelse(Factor == "I22","Q) Course specific Resources present"
,ifelse(Factor == "I23","U) Access Library away from Campus"
,ifelse(Factor == "I24","P) Library Search Engine is useful"
,ifelse(Factor == "I25","N) Access to Library made me succesful"
,"I) Mobile devices useful to access content"))))))))))))))))))))))))))
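
The nested ifelse() calls work, but the same mapping can be written once as a named lookup vector and applied to both dataframes. The sketch below is an alternative under that assumption, not the method used above; it reproduces only three of the 26 labels, and `question_labels` and `toy` are names introduced for illustration.

```r
# Named character vector: names are factor codes, values are display labels
# (only three of the 26 labels are reproduced here)
question_labels <- c(
  I01 = "C) Informed about Services",
  I02 = "M) Provide Useful Info",
  I03 = "H) Library signage is clear"
)

# Toy stand-in for the Factor column of highs or lows
toy <- data.frame(Factor = factor(c("I02", "I01", "I03", "I01")))

# Index by the character code, not by the factor's integer level
toy$Question <- unname(question_labels[as.character(toy$Factor)])
toy
```

Because the labels live in one place, a change to a label only has to be made once and both dataframes stay consistent.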

We call ggplot() to make our static diverging stacked bar chart. First, we plot the highs values to obtain the stacked bars on the positive side of the axis. Next, we plot the lows values, but in the aes() mapping we specify the negated value (-Percentage in the code below) so that these are plotted on the negative side.

A line is drawn at the midpoint of the scale. This is the midpoint of the Likert-type scale, not necessarily the midpoint of any distribution; it simply shows the split between the positive and negative ratings. To obtain the color split, we create a color palette ranging from shades of blue (representing positive values) to shades of red (representing negative values). We make use of the scale_fill_manual function to color our chart according to Response.

We flip the axes so that we have a horizontal bar chart. We also add the title and labels, adjust some font sizes, move the legend to the right so that it fits without the static graph being cut off, and add grid lines at 25% intervals.

colnames(highs)[3] <- "Percentage"
colnames(lows)[3] <- "Percentage"

colnames(highs)[2] <- "Response"
colnames(lows)[2] <- "Response"

mycolors <- c("#67A9CF","#EF8A62","#DFDFDF","#DFDFDF","#D1E5F0","#FDDBC7","#2166AC","#B2182B")
#("#67A9CF","#FDDBC7","#DFDFDF","#DFDFDF","#EF8A62","#D1E5F0","#2166AC","#B2182B")
LSR_Static <- ggplot() + geom_bar(data=highs, aes(x = Question, y=Percentage, fill=Response), position="stack", stat="identity") +
  geom_bar(data=lows, aes(x = Question, y=-Percentage, fill=Response), position="stack", stat="identity") +
  geom_hline(yintercept = 0, color =c("white")) +
scale_fill_manual(values = mycolors) +
  coord_flip() +
  labs(title=mytitle, caption = "Data Source: © Singapore Management University - Library Survey Data 2018", y="Percentage of Respondents", x="Factors") +
  theme(plot.title = element_text(size=14, hjust=0.5)) +
  theme(axis.text.y = element_text(hjust=1)) +
  theme(legend.position = "right") +
  scale_y_continuous(breaks=seq(mymin,mymax,25), limits=c(mymin,mymax)) + theme(
  plot.background = element_rect(
    fill = "white",
    colour = "black",
    size = 1
  )
)

  LSR_Static
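
The core of the technique, stripped of styling, is two bar layers that share an axis, with the second layer negated. The sketch below shows this on toy data; `toy_highs`, `toy_lows` and the percentages are made up for illustration, and geom_col() is used as the shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)

# Toy percentages for two questions (not survey data)
toy_highs <- data.frame(Question = c("A", "B"), Response = "Agree",
                        Percentage = c(60, 35))
toy_lows  <- data.frame(Question = c("A", "B"), Response = "Disagree",
                        Percentage = c(10, 25))

p <- ggplot() +
  # Positive ratings extend to the right of zero after the flip
  geom_col(data = toy_highs, aes(x = Question, y = Percentage, fill = Response)) +
  # Negating the value sends the negative ratings to the left of zero
  geom_col(data = toy_lows, aes(x = Question, y = -Percentage, fill = Response)) +
  geom_hline(yintercept = 0, colour = "white") +
  coord_flip()
p
```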

Now, in order to add interactivity to our static chart, we make use of the ggplotly() function as shown below.

Notice that on hovering the mouse pointer over each bar segment, we can now see the question, its rating/response and the percentage of responding students who gave that rating, which makes it easier for the user to quickly understand the chart.

ggplotly(LSR_Static, tooltip = c("x","y","Response"))
LSR_Static_border <- LSR_Static + theme(
  plot.background = element_rect(
    fill = "grey90",
    colour = "black",
    size = 1
  )
)

4.0 Final Data Visualization and Insights

LSR_Static

Our final visualization is a diverging stacked bar chart with the bars representing the ratings ranging from ‘Strongly Agree’ to ‘Strongly Disagree’ in different colors. The Y-axis represents the various factors based on which the students had given ratings and the x-axis represents the percentage of the students who have given each rating/response.

4.0.1 Insights/Data Reveal from Diverging Stacked Bar Chart

ggplotly(LSR_Static_border, tooltip = c("x","y","Response"))
  1. Most students feel that wireless access in the library is crucial for researching and browsing content on their devices as and when needed: more than 75% of the students strongly agree with the factor - ‘I can get wireless access in the Library when I need to’.

  2. Having a quiet place to study in the library and having print/scan facilities easily accessible are also highly valued by students, as 70% and 66% of the students respectively Strongly Agree with these factors.

  3. On the other hand, most students do not feel that workshops/classes/tutorials conducted in the library are very important in assisting their learning and research needs. Only 17% of the students strongly agree with the factor - ‘Library workshops, classes and tutorials help me with my learning and research needs’.

  4. Students also feel that being informed about the library services is not really essential, possibly because most students are able to figure out these facilities on their own without much help; only 19% strongly agree with it.

4.0.2 Insights/Data Reveal from Barplot representing ‘NA’ Values

From our bar plot, we can note that the top 3 factors which were most often left without a response by students are:

NA7 -> Books and articles I have requested from other Libraries are delivered promptly

NA9 -> Online enquiry services (e.g. Email, Library Chat) meet my needs

NA16 -> A computer is available when I need one

This could potentially mean that these are the facilities/services that students rarely make use of, so they find it hard to comment on them or provide a rating.

5.0 Major Advantages of Incorporating Interactivity in our Visual

  1. Filter Data/Simplicity - Users can choose which part of the visual they want to focus on: they can double-click an entry in the legend to add/remove that entire portion from the graph, making it much simpler to focus only on the filtered data. This could enable the user to find specific patterns or insights in that selection which were previously hidden because all the other ratings were displayed together.

  2. Quicker Understanding - Interactive data visualizations help users understand charts much faster and thus make quicker and more informed decisions. The tooltip hover feature makes this possible, as any user can hover over the visual and immediately get a summary of what a particular portion of it portrays.

  3. Re-arranging Data - Interactivity lets users rearrange and re-map the layout of the displayed information to bring out different insights. With only static plots this is tedious, as the user has to manipulate the data in the data frame and regenerate the static plot each time. Interactivity features also allow users to inspect their data further by including summary statistics.