The core purpose of this data visualisation is to compare the importance scores that library users gave to each item in the library survey with their perception of how well the library performed on that item.
The items which are assessed are labelled from 1 to 26 (P1-P26 and I1-I26), with question P27 being one to assess users’ overall satisfaction with the library.
Both importance and performance have been captured for each item in Likert scales, from 1 (lowest) to 7 (highest). This allows for each individual question to be plotted on a scatterplot.
The premise is that, with limited resources, the library survey should guide where to direct future resources and where to hold back because the library is already performing well. Doing very well on items that users do not consider important is therefore not a resource-saving strategy.
There were other options to consider for this visualisation, namely the bar-in-bar chart and the dumbbell plot.
I will try two different methods, the scatter plot and the dumbbell plot, after considering the following:
The Likert scale chart created by a fellow team member already separates the items, each displayed as a diverging stacked bar chart. Both the bar-in-bar chart and the dot plot with line would follow the same display format, whereas the scatter plot could break the monotony of the full project.
I feel the scatterplot is also better at showing the big picture: it sets the users’ overall rating of the importance of the items against the overall performance of the library. Depending on which quadrant the majority of the data falls in, we can infer how users feel the library is performing, and also see whether the library needs to change the focus of the survey or of its service improvement (e.g. if users feel that all the items raised are of low importance, the library probably needs to change its vision or strategy and deliver on other indicators).
The dumbbell plot may be a neater presentation, and it allows library management going through the results to see the gap between the performance and importance scores for each individual item.
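The quadrant reading described above can be sketched in base R. This is only an illustration under stated assumptions: the four items and their mean scores are copied from the score tables later in this write-up, while the cut-offs (the mean of the plotted items on each axis) are an assumption made here and are not specified by the survey or the vendor report. The names scores, imp_cut and perf_cut are hypothetical.

```r
# Four items with mean scores copied from the score tables in this write-up
scores <- data.frame(
  item        = c("04", "08", "13", "14"),
  importance  = c(5.043685, 5.820332, 6.069432, 6.506707),
  performance = c(5.311701, 5.806727, 5.960864, 4.961489)
)

# ASSUMPTION: each axis is split at the mean of the plotted items
imp_cut  <- mean(scores$importance)
perf_cut <- mean(scores$performance)

# Label every item with the quadrant it falls into
scores$quadrant <- ifelse(
  scores$importance >= imp_cut,
  ifelse(scores$performance >= perf_cut,
         "high importance, high performance",
         "high importance, low performance"),
  ifelse(scores$performance >= perf_cut,
         "low importance, high performance",
         "low importance, low performance")
)

print(scores[, c("item", "quadrant")])
```

A different cut-off, such as the midpoint of the 7-point scale, would move items between quadrants, so the choice should be stated alongside any plot that draws quadrant lines.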
For the project, we have to do the following:
A constraint I face with this data set is that the item descriptions are quite lengthy. To reduce clutter in the data visualisation, I use a tooltip to show the item description in full, e.g. Item 04 states: “Library workshops, classes and tutorials help me with my learning and research needs.”
Since there is only one year’s data and the 27 items are separate from each other, I do not think animation is necessary to show any change over time, nor is there any benefit in using it for transformation of the data. I will therefore pursue interactivity for my assignment.
Plotly and the crosstalk package support direct and indirect manipulation via highlighting, which makes it easier to get information. Various types of highlighting, such as box selection and lasso selection, can be used.
The functionality must be broad enough to invite the user to interact with the visualisation using the methods above. This supports human-computer interaction and gives the analyst more insight during data analysis. The analyst/user can also zoom and adjust the axes.
Using Taxonomy of Interactive Dynamics for Visual Analysis (the core reading article from lesson 4):
For data and view specification:
* Filter the data by survey respondent type and by the library location the survey was done for.
* Sort the items to expose patterns in the visualisation (to be shown in section 3.12 below).
View manipulation: selection is allowed, with co-ordination of views, and the data views are organised side by side. This is discussed further in the later section on adding interactivity.
Tidyverse, Plotly and Crosstalk packages were used.
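packages <- c('tidyverse', 'plotly', 'crosstalk')
for (p in packages) {
  # Install the package only if it cannot be loaded
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  # Attach the package (after a fresh install if needed)
  library(p, character.only = TRUE)
}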
The original data file was read with the read_csv function, which retains the variable labels and returns the data as a tibble data frame.
lib<- read_csv("data/Raw data 2018-03-07 SMU LCS data file - KLG.csv")
Position values 1 to 7 (denoting students) were chosen as the subset for this DataViz assignment, similar to Karthik’s selection of student groups after our initial discussion. Eventually we can code this into the R Shiny application so that the type of user (student/faculty/others) can be selected with a dropdown box.
The Importance questions (I01, I02, …) and the matching Performance questions (P01, P02, …) for Q1 to Q26 were selected. P27 was excluded because it has no equivalent in the ‘Importance’ section, and other variables such as ‘ID’ and ‘Position’ were removed by the selection criteria written in the R code chunk below.
lib_data <- filter(lib, Position %in% 1:7)
lib_data
## # A tibble: 2,403 x 89
## ResponseID Campus Position StudyArea ID I01 I02 I03 I04 I05
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 599 1 6 3 2 NA NA NA NA NA
## 2 570 1 1 2 2 6 5 4 NA NA
## 3 264 1 6 4 2 NA NA NA NA NA
## 4 686 1 6 2 2 NA NA NA NA NA
## 5 820 2 5 2 2 NA NA NA NA NA
## 6 1232 1 4 1 2 NA NA NA NA NA
## 7 1514 1 6 2 1 NA NA NA NA NA
## 8 3179 1 6 2 1 NA NA NA NA NA
## 9 2571 1 3 4 1 7 NA 6 NA 7
## 10 541 1 1 1 2 6 6 4 NA NA
## # ... with 2,393 more rows, and 79 more variables: I06 <dbl>, I07 <dbl>,
## # I08 <dbl>, I09 <dbl>, I10 <dbl>, I11 <dbl>, I12 <dbl>, I13 <dbl>,
## # I14 <dbl>, I15 <dbl>, I16 <dbl>, I17 <dbl>, I18 <dbl>, I19 <dbl>,
## # I20 <dbl>, I21 <dbl>, I22 <dbl>, I23 <dbl>, I24 <dbl>, I25 <dbl>,
## # I26 <dbl>, P01 <dbl>, P02 <dbl>, P03 <dbl>, P04 <dbl>, P05 <dbl>,
## # P06 <dbl>, P07 <dbl>, P08 <dbl>, P09 <dbl>, P10 <dbl>, P11 <dbl>,
## # P12 <dbl>, P13 <dbl>, P14 <dbl>, P15 <dbl>, P16 <dbl>, P17 <dbl>,
## # P18 <dbl>, P19 <dbl>, P20 <dbl>, P21 <dbl>, P22 <dbl>, P23 <dbl>,
## # P24 <dbl>, P25 <dbl>, P26 <dbl>, P27 <dbl>, Comment1 <chr>,
## # HowOftenL <dbl>, HowOftenC <dbl>, HowOftenW <dbl>, NA1 <dbl>, NA2 <dbl>,
## # NA3 <dbl>, NA4 <dbl>, NA5 <dbl>, NA6 <dbl>, NA7 <dbl>, NA8 <dbl>,
## # NA9 <dbl>, NA10 <dbl>, NA11 <dbl>, NA12 <dbl>, NA13 <dbl>, NA14 <dbl>,
## # NA15 <dbl>, NA16 <dbl>, NA17 <dbl>, NA18 <dbl>, NA19 <dbl>, NA20 <dbl>,
## # NA21 <dbl>, NA22 <dbl>, NA23 <dbl>, NA24 <dbl>, NA25 <dbl>, NA26 <dbl>,
## # NPS1 <dbl>
lib_data_IMP <- dplyr::select(lib_data,starts_with("I"),-matches("ID"))
lib_data_PERF <- dplyr::select(lib_data,starts_with("P"), -matches("Position"), -ends_with("27"))
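The selection above works by prefix and then subtracts the unwanted columns. A stricter alternative is to match the exact pattern of a letter followed by two digits. The following is a minimal base R sketch on a subset of the column names from the tibble header above, not the code used for this assignment; the names cols, imp_cols and perf_cols are hypothetical.

```r
# A subset of the column names shown in the tibble header above
cols <- c("ResponseID", "Campus", "Position", "StudyArea", "ID",
          "I01", "I02", "I26", "P01", "P02", "P26", "P27",
          "Comment1", "NPS1")

# Keep only names of the form I + two digits (ID is excluded by the pattern)
imp_cols <- grep("^I[0-9]{2}$", cols, value = TRUE)

# Keep names of the form P + two digits, then drop the overall-satisfaction item P27
perf_cols <- setdiff(grep("^P[0-9]{2}$", cols, value = TRUE), "P27")

print(imp_cols)
print(perf_cols)
```

The same pattern could be passed to the matches() helper inside dplyr::select. The advantage is that a new column whose name merely begins with I or P would not be picked up by accident.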
colMeans was used to calculate the mean performance and importance scores, and the score columns were named accordingly.
Score_IMP <- data.frame(colMeans(lib_data_IMP, na.rm=TRUE))
Score_PERF <- data.frame(colMeans(lib_data_PERF, na.rm=TRUE))
names(Score_IMP) <- c("Mean Importance Score")
names(Score_PERF) <- c("Mean Performance Score")
Score_PERF
## Mean Performance Score
## P01 5.248314
## P02 5.514456
## P03 5.579612
## P04 5.311701
## P05 5.356158
## P06 5.721429
## P07 5.524204
## P08 5.806727
## P09 5.584938
## P10 5.827075
## P11 5.530602
## P12 5.877532
## P13 5.960864
## P14 4.961489
## P15 4.788420
## P16 5.276362
## P17 5.845532
## P18 6.366002
## P19 5.639011
## P20 5.670601
## P21 5.862878
## P22 5.703525
## P23 5.837391
## P24 5.769265
## P25 5.679481
## P26 5.128130
Score_IMP
## Mean Importance Score
## I01 5.241336
## I02 5.763001
## I03 5.664696
## I04 5.043685
## I05 5.405971
## I06 6.224421
## I07 5.539297
## I08 5.820332
## I09 5.508234
## I10 5.768202
## I11 5.914493
## I12 5.983003
## I13 6.069432
## I14 6.506707
## I15 6.141924
## I16 5.006373
## I17 6.295914
## I18 6.622290
## I19 6.465137
## I20 5.660944
## I21 6.193054
## I22 6.002897
## I23 6.146991
## I24 6.157729
## I25 5.922186
## I26 5.766814
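Because many respondents left items blank (the NA values in the tibble above), each mean is computed over a different number of respondents. A small base R sketch of how na.rm = TRUE behaves, using made-up toy values rather than the survey data (toy, means and n_valid are hypothetical names):

```r
# Toy responses (NOT survey data): NA marks an unanswered item
toy <- data.frame(
  I01 = c(6, NA, 7, 6),
  I02 = c(5, NA, NA, 6)
)

# Column means over the answered responses only
means <- colMeans(toy, na.rm = TRUE)

# Number of valid responses behind each mean
n_valid <- colSums(!is.na(toy))

print(means)
print(n_valid)
```

Reporting the count next to each mean helps the reader judge how much weight a score can bear, since an item answered by few respondents gives a less stable mean.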
The original survey report by the vendor that administered and analysed the 2018 survey classified the 26 questions into 4 categories, but this classification was missing from the original survey data.
The 4 categories are:
* Communication
* Service delivery
* Facilities and equipment
* Information resources
A CSV file was created and loaded with the question number, the item description (the full text of the question) and the service category.
Screenshot of created item categories data
ItemCategories <- read_csv("data/ItemCategories.csv")
bind_cols was used to combine the data created above, giving the mean importance and performance scores for each item. A new variable, RatioScore, was then created as the ratio of performance (numerator) over importance (denominator).
Another variable, ‘Classify’, was formed: a ratio of 1 or more denotes that the performance score is at least as high as the importance score, and a ratio of less than 1 denotes that the performance score is lower.
This was done with the ifelse function in R. A Gap variable (performance minus importance) was also computed.
I then converted the Item variable to a categorical variable using as.factor, otherwise plotly would interpret it as a quantitative continuous variable and the visualisation would be different.
Results<- bind_cols(ItemCategories,Score_IMP,Score_PERF)
Results$RatioScore <- Results$`Mean Performance Score`/Results$`Mean Importance Score`
Results$Classify <- ifelse(Results$RatioScore >=1, "Higher Performance", "Lower Performance")
Results$Item <- as.factor(Results$Item)
Results$Gap <- Results$`Mean Performance Score` - Results$`Mean Importance Score`
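As a quick check of the derived variables, the same three calculations can be applied by hand to two items whose mean scores appear in the tables above (items 04 and 14). This is a base R sketch on a hypothetical data frame called check, not part of the assignment pipeline.

```r
# Mean scores for items 04 and 14, copied from the tables above
check <- data.frame(
  Item        = c("04", "14"),
  importance  = c(5.043685, 6.506707),
  performance = c(5.311701, 4.961489)
)

# Ratio of performance over importance, as in the code above
check$RatioScore <- check$performance / check$importance

# A ratio of 1 or more means performance is at least as high as importance
check$Classify <- ifelse(check$RatioScore >= 1,
                         "Higher Performance", "Lower Performance")

# Gap is performance minus importance, so its sign agrees with Classify
check$Gap <- check$performance - check$importance

print(check)
```

Item 04 is classified as "Higher Performance" with a small positive gap, while item 14 is "Lower Performance" with a gap of about -1.55, the kind of shortfall the dumbbell plot is meant to expose.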
This has been a fun exercise that involved research on the axis labels, title, hovertext, the hoverinfo argument, etc.
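The plotting code in the rest of this section passes an object called share_results to plot_ly and filter_checkbox, but the chunk that creates it does not appear in this write-up. A minimal sketch, assuming it is a crosstalk SharedData wrapper around the Results data frame; the toy data frame and the names toy_results and share_toy are hypothetical and used only so the sketch runs on its own.

```r
library(crosstalk)

# Toy stand-in for Results (values for items 04 and 14 from the tables above;
# the Description text here is a placeholder, not the survey wording)
toy_results <- data.frame(
  Item        = factor(c("04", "14")),
  Description = c("placeholder description A", "placeholder description B"),
  Gap         = c(0.27, -1.55)
)

# ASSUMPTION: in this write-up the equivalent line would presumably be
#   share_results <- SharedData$new(Results)
share_toy <- SharedData$new(toy_results)

# The wrapped data can be retrieved unchanged
print(share_toy$origData())
```

Wrapping the data once and passing the same object to every linked widget is what lets crosstalk co-ordinate selections and filters between them.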
For the first iteration, the scatterplot shows the mean importance score (y-axis) against the mean performance score (x-axis), with both colour and symbol (a triangle pointing upwards or downwards) denoting whether the performance score for that item was higher or lower than the importance score. I chose this because performance outperforming importance is a good thing, whereas if performance is perceived as lower than importance, the library should improve its service. I added the item description as text in the hover label, which overrides the standard hovertext. This is the result.
fig <- plot_ly(share_results, y=~Results$'Mean Importance Score', x=~Results$'Mean Performance Score', type="scatter", symbol=~Results$Classify, symbols=c('triangle-up','triangle-down'), color=~Results$Classify, colors= setNames(c("blue","red"), c("Higher Performance","Lower Performance")), text=~paste(Results$Description,"<br>Item Number:", Results$Item, "<br>", Results$`Service Category`), size=I(30), hovertemplate = ~paste(
"<b>%{text}</b><br><br>",
"%{yaxis.title.text}: %{y}<br>",
"%{xaxis.title.text}: %{x}<br>",
"<extra></extra>"))
fig <- fig %>% layout(yaxis = list(title="Mean Importance Score", titlefont=list(family = "Arial, monospace",
size = 16)), xaxis = list(title="Mean Performance Score", titlefont=list(family = "Arial, monospace",
size = 16)))
fig
However, I realised that using both the colour and the symbol aesthetic to denote the same meaning is redundant. I therefore mapped the colour aesthetic to service category, so that the end user (SMU Library Management) can see how the library performed in each category, but to me it looks messy. I also tweaked the hover information slightly compared with the previous figure, and added a trace.
fig1 <- plot_ly(share_results, y=~Results$'Mean Importance Score', x=~Results$'Mean Performance Score', type="scatter", symbol=~Results$Classify, symbols=c('triangle-up','triangle-down'), color=~Results$`Service Category`, text=~paste(Results$Description,"<br>Item Number:", Results$Item, "<br>", Results$`Service Category`), size=I(30), hovertemplate = ~paste(
"<b>%{text}</b><br><br>",
"%{yaxis.title.text}: %{y}<br>",
"%{xaxis.title.text}: %{x}<br>",
"<extra></extra>"))
fig1 <- fig1 %>% layout(yaxis = list(title="Mean Importance Score", titlefont=list(family = "Arial, monospace",
size = 16)), xaxis = list(title="Mean Performance Score", titlefont=list(family = "Arial, monospace",
size = 16)))
fig1
Now on to the dumbbell plots. For the creation of this visualisation:
Results$Item <- factor(Results$Item, levels = Results$Item[order(Results$`Mean Importance Score`)])
fig2 <-share_results %>%
plot_ly() %>%
add_segments(
x = ~Results$`Mean Importance Score`, y = ~Item,
xend = ~Results$`Mean Performance Score`, yend = ~Item,
color = I("gray"), showlegend = FALSE, size=1, text=~Results$Description, hovertemplate= paste(
"<b>%{text}</b><br><br>",
"%{yaxis.title.text}: %{y}<br>",
"%{xaxis.title.text}: %{x}<br>",
"<extra></extra>")
) %>%
add_markers(
x = ~Results$`Mean Importance Score`, y = ~Item,
color = I("blue"),
name = "Importance", size = 3, text=~Results$Description, hovertemplate= paste(
"<b>%{text}</b><br><br>",
"%{yaxis.title.text}: %{y}<br>",
"Mean Importance Score: %{x}<br>",
"<extra></extra>")
) %>%
add_markers(
x = ~Results$`Mean Performance Score`, y = ~Item,
color = I("red"),
name = "Performance", size = 3, text=~Results$Description, hovertemplate= paste(
"<b>%{text}</b><br><br>",
"%{yaxis.title.text}: %{y}<br>",
"Mean Performance Score: %{x}<br>",
"<extra></extra>")
) %>%
layout(title = list( titlefont=list(family="Arial, monospace"), text="<b>Comparison of Importance vs Performance across Items in Library Survey 2018</b>"), legend=list(orientation="h",title=list(text="<b> Scoring Type </b>")), yaxis = list(title="Item Number", titlefont=list(family = "Arial, monospace",
size = 16)), xaxis = list(title="Mean Scores", titlefont=list(family = "Arial, monospace",
size = 16)))
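The first line of the dumbbell code re-levels Item so that the items are drawn in order of mean importance rather than in item-number order. A base R sketch of that step on three items, with importance scores copied from the table above (item, importance and item_sorted are hypothetical names):

```r
# Three items and their mean importance scores from the table above
item       <- c("14", "16", "18")
importance <- c(6.506707, 5.006373, 6.622290)

# Default factor levels follow sorted label order: 14, 16, 18
levels(factor(item))

# Re-level so that the levels run from lowest to highest importance
item_sorted <- factor(item, levels = item[order(importance)])

print(levels(item_sorted))
```

A categorical axis is drawn in level order, so after this step the item with the lowest importance sits at one end of the y-axis and the highest at the other, which makes the gaps easier to scan.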
Results$Item <- factor(Results$Item, levels = Results$Item[order(Results$`Mean Importance Score`)])
share_results %>%
plot_ly() %>%
add_segments(
x = ~Results$`Mean Importance Score`, y = ~Item,
xend = ~Results$`Mean Performance Score`, yend = ~Item,
color = I("gray"), showlegend = FALSE, text=~Results$Description, hovertemplate= paste(
"<b>%{text}</b><br><br>",
"%{yaxis.title.text}: %{y}<br>",
"%{xaxis.title.text}: %{x}<br>",
"Performance-Importance Gap: <br>",
"<extra></extra>")
) %>%
add_markers(
x = ~Results$`Mean Importance Score`, y = ~Item, size= 3, color=I("blue"), marker = list(symbol = 'diamond', alpha=0.7),
name = "Importance", text=~Results$Description, hovertemplate= paste(
"<b>%{text}</b><br><br>",
"%{yaxis.title.text}: %{y}<br>",
"Mean Importance Score: %{x}<br>",
"Performance-Importance Gap: <br>",
"<extra></extra>")
) %>%
add_markers(
x = ~Results$`Mean Performance Score`, y = ~Item, size=3, color=I("red") , marker = list(symbol='circle', alpha=0.7),
name = "Performance", text=~Results$Description, hovertemplate= paste(
"<b>%{text}</b><br><br>",
"%{yaxis.title.text}: %{y}<br>",
"Mean Performance Score: %{x}<br>",
"Performance-Importance Gap: <br>",
"<extra></extra>")
) %>%
layout(title = list( titlefont=list(family="Arial, monospace"), text="<b>Comparison of Importance vs Performance across Items in Library Survey 2018</b>"), legend=list(orientation="h",title=list(text="<b> Scoring Type </b>")), yaxis = list(title="Item Number", titlefont=list(family = "Arial, monospace",
size = 16)), xaxis = list(title="Mean Scores", titlefont=list(family = "Arial, monospace",
size = 16)))
The following amendments were made for the next plot:
Results$Item <- factor(Results$Item, levels = Results$Item[order(Results$`Mean Importance Score`)])
Results %>%
plot_ly() %>%
add_segments(
x = ~Results$`Mean Importance Score`, y = ~Item,
xend = ~Results$`Mean Performance Score`, yend = ~Item,
color = I("gray"), showlegend = FALSE, text=~Results$Description, hovertemplate= paste(
"<b>%{text}</b><br><br>",
"%{yaxis.title.text}: %{y}<br>",
"%{xaxis.title.text}: %{x}<br>",
"Performance-Importance Gap:", round(Results$Gap,2),
"<extra></extra>")
) %>%
add_markers(
x = ~Results$`Mean Importance Score`, y = ~Item, size=3, color=I("blue"), marker = list(symbol = 'diamond', color='blue', alpha=0.7),
name = "Importance", text=~Results$Description, hovertemplate= paste(
"<b>%{text}</b><br><br>",
"%{yaxis.title.text}: %{y}<br>",
"Mean Importance Score: %{x}<br>",
"Performance-Importance Gap:", round(Results$Gap,2),
"<extra></extra>")) %>%
add_markers(
x = ~Results$`Mean Performance Score`, y = ~Item, size=3, symbol=~Results$Classify, text=~Results$Description, hovertemplate= ~paste(
"<b>%{text}</b><br><br>",
"%{yaxis.title.text}: %{y}<br>",
"Mean Performance Score: %{x}<br>",
"Performance-Importance Gap:", round(Results$Gap,2),
"<extra></extra>")
) %>%
layout(title = list( titlefont=list(family="Arial, monospace"), text="<b>Comparison of Importance vs Performance across Items in Library Survey 2018</b>"), legend=list(orientation="h",title=list(text="<b> Scoring Type </b>")), yaxis = list(title="Item Number", titlefont=list(family = "Arial, monospace",
size = 16)), xaxis = list(title="Mean Scores", titlefont=list(family = "Arial, monospace",
size = 16)))
For the next visualisation, I made the following changes:
Results$Item <- factor(Results$Item, levels = Results$Item[order(Results$`Mean Importance Score`)])
fig3 <- share_results %>%
plot_ly() %>%
add_segments( x = ~Results$`Mean Importance Score`, y = ~Item,
xend = ~Results$`Mean Performance Score`, yend = ~Item,
color = I("gray"), showlegend = FALSE, hoverinfo = "none") %>%
add_markers(
x = ~Results$`Mean Importance Score`, y = ~Item, size=3, color=I("blue4"), marker = list(symbol = 'diamond', alpha=0.7), text=~paste(Results$Description,"<br>", Results$`Service Category`,"<br>Mean Performance Score:", round(Results$`Mean Performance Score`,2), "<br>Mean Importance Score:", round(Results$`Mean Importance Score`,2), "<br>Performance - Importance Gap:", round(Results$Gap,2)), hoverinfo="text",
name = "Importance") %>%
add_markers(
x = ~Results$`Mean Performance Score`, y = ~Item, size=3, symbol=~Results$Classify, text=~paste(Results$Description, "<br>", Results$`Service Category`, "<br>Mean Performance Score:", round(Results$`Mean Performance Score`,2), "<br>Mean Importance Score:", round(Results$`Mean Importance Score`,2), "<br>Performance - Importance Gap:", round(Results$Gap,2)), hoverinfo="text") %>%
layout(title = list( titlefont=list(family="Arial, monospace"), text="<b>Comparison of Importance vs Performance across Items in Library Survey 2018</b>"), legend=list(orientation="h",title=list(text="<b> Scoring Type </b>")), yaxis = list(title="Item Number", titlefont=list(family = "Arial, monospace",
size = 16)), xaxis = list(title="Mean Scores", titlefont=list(family = "Arial, monospace",
size = 16)))
fig3
The crosstalk package was added to the list of packages loaded in the R code above. I then added the following:
qq<- subplot(fig,fig3, titleX=TRUE,titleY=TRUE, margin=0.1) %>%
hide_legend() %>%
highlight(on="plotly_selected", off="plotly_deselect")
## No scatter mode specifed:
## Setting the mode to markers
## Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
qq
***
I had to troubleshoot for a long time because the visualisation above lagged. Over multiple re-runs of the code I noticed that the ‘&’ symbol was not displayed properly in the HTML output, and that clicking on the service category ‘Facilities & Equipment’ did not highlight the matching points in the visualisation. I suspected that the symbol was not being read properly because of its character encoding; after I changed ‘Facilities & Equipment’ to ‘Facilities and Equipment’, the problem was solved and the code below worked seamlessly. Finally, the final product is done!
For the plot below, I put the qq plot above and the filter checkbox together in a list so that they fit in one column and stack on top of one another, which makes the visualisation easier to view. This uses the crosstalk package.
bscols(list(qq, filter_checkbox(id='Service Category',label="Category", sharedData=share_results, group=~Results$`Service Category`)))
Using the visualisations above, we can see the following:
The library does well in service delivery: users are happy with the workshops and research courses, and face-to-face and online enquiry answering also perform well.
Filtering for the lowest mean importance scores shows the items related to the availability of computers, library workshops/classes, and being informed about library services. The result for computer availability is understandable, because almost every student has their own IT resources, and there is no need to improve the Bloomberg computer. Borrowing stations are also not blocked, because resources for reading and borrowing are available through online databases as well.
Similarly, filtering for the highest importance shows the items on the availability of a quiet place, of a place for group work, and of WiFi.
As an insight, I personally think this presents issues for the strategic direction of the library: on one hand, students seem to lack physical locations for group work, but at the same time they are requesting more use of online resources. Work seems to be needed on both the physical locations and the online services. The library is already doing well on service delivery and can therefore continue with its current level of servicing of enquiries, workshops and lessons.
The benefits of interactivity visualisation are:
The user experience has been great: it allows me, the user, to discover information for myself and see associations, instead of having all the information handed to me, and this heightens user interest. The user stays engaged through the ability to highlight and select with both the lasso and box selection functions provided by plotly, and through indirect manipulation with the filter checkbox, which allows the user to select one or more categories. Double-clicking anywhere within the visualisation resets the data so that all data points are selected. The plotly visualisation also lets the user reset the axes and zoom in to see more granularity, and I like that it is neatly arranged and packaged.
The other benefit is multiple co-ordinated views across different data visualisation panels. This supports human-computer interaction and navigation, as the modifications made by the user let the user think and see things differently and inspire new ideas (the think-see-modify triangle). This is not possible with a static visualisation, because there you see either granular data or aggregate data and cannot make smaller comparisons through your own interaction, unless you have more short-term memory for processing minute calculations in your mind than the average human being. Plotly also offers ‘Toggle spike lines’ to compare a data point with the axes, and ‘Compare data on hover’, which are tools used with a mouse to get more information.
Lastly, interactivity through tooltips helps us to contain more information within the same visual space, with the information disappearing when it does not need to be seen. The item descriptions are long, but they are hidden and only seen when the user hovers near or over a point. With the filter function, a user who wants to see only the items of a specific service category, such as facilities and equipment, does not need to see the rest.