In this week’s DataViz Makeover, we are required to demonstrate interactivity in our data visualisation design. For this assignment we have therefore chosen a library survey dataset. As academics and students, we are intimately connected to the library, as it is the place where we consume knowledge and share ideas with each other. Wouldn’t we then hope that the library is as good as it can be, giving us creature comforts while keeping information accessible at our fingertips? Libraries are quipped to be the keys to the past and the gateway to our future, but in a twist, data from Singapore Management University’s 2018 Library Survey will be our keys to unlock insights, so that we can pave the way for a better library to enjoy in the future.
The core purpose of the visualisation is to do a preliminary analysis of the survey results and highlight the areas of improvement and concern. SMU has two libraries, and the survey covers both. Through this data viz assignment we would like to explore how the users feel about the services offered by the library. There are 26 questions, and users rate each service on two criteria: importance and performance.
As the aim of the visualisation is to leverage interactivity features and highlight the findings of the survey, we have used the following visuals to accomplish the task-
Dumbbell Chart- Dumbbell plots are a nice way to visualise the relative positions of two points and compare the distance between two categories. To get the correct ordering of the dumbbells, the Y variable should be a factor. We have tried to compare the mean ratings of the Performance and Improvement factors for the 26 questions and understand the areas of improvement.
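The point about the Y variable being a factor can be illustrated with a small sketch. The data frame below is made up for illustration (the names `ratings`, `mean_a`, `mean_b` and `gap` are not from the survey); it only shows how setting the factor levels controls the order in which the dumbbells would be drawn.

```r
# Toy data: three hypothetical questions with two mean ratings each
ratings <- data.frame(
  question = c("Q1", "Q2", "Q3"),
  mean_a   = c(5.0, 6.2, 4.8),
  mean_b   = c(6.0, 6.4, 6.5)
)
ratings$gap <- ratings$mean_b - ratings$mean_a

# Turn the y variable into a factor whose levels follow the gap,
# so the dumbbells are drawn from the smallest to the largest gap
ratings$question <- factor(
  ratings$question,
  levels = ratings$question[order(ratings$gap)]
)

levels(ratings$question)
```

Without the explicit levels, a plotting library would typically fall back to alphabetical order, which hides the ranking the chart is meant to show.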
Heat Map- A heat map provides an efficient way to quickly identify high points and low points across the organisation or various study groups. Heatmaps visualise data through variations in colouring. They are good for showing variance across multiple variables, revealing patterns, displaying whether any variables are similar to each other, and detecting whether any correlations exist between them. They are ideal for visualising comparisons, showing how the many levels of a field compare on items across the organisation. Here, the heatmap shows how each department (study area) scored on the survey questions compared to the others.
The following are rough sketches of the proposed visuals-
Draft Sketch of the Dumbbell Chart-
Sketch of Dumbbell Plot
Draft Sketch of the Heatmap-
Sketch of Heatmap
Finally, to show the data using appropriate visuals, we make use of the “ggplot2” package, and in order to make it interactive we have used “plotly”. The dumbbell plot is made more intuitive by using the annotation and tooltip features of plotly. Similarly, the “heatmaply” package builds an interactive heatmap which the user can explore to understand the ratings given by each study group and get an overall idea of their satisfaction.
In this section, we describe the steps performed in order to generate the proposed visuals. We start a new R project and create a new R Markdown document.
This code chunk installs the necessary R packages and loads them into the RStudio environment, so that we do not have to load them explicitly every time.
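The chunk itself is not reproduced in this write-up. A minimal sketch of what such a chunk might look like is given below. The package list is an assumption inferred from the functions used later (`read_csv()`, `ggplot()`, `geom_dumbbell()` from ggalt, `ggplotly()` and `heatmaply()`), and the CRAN mirror is just one common choice.

```r
# Assumed package list, inferred from the functions used later in this post
packages <- c("tidyverse", "plotly", "heatmaply", "ggalt")

for (pkg in packages) {
  # Install only when the package is not already available
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
  # Attach the package to the current session
  library(pkg, character.only = TRUE)
}
```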
Importing the Data: The SMU Library Survey 2018 data was obtained courtesy of the SMU library management and staff. Survey responses were obtained from a total of 2639 participants, including staff, faculty and students, capturing their demographics, their ratings of the importance of pre-defined factors and indicators, and their assessment of the library’s performance on them. Free-text comments were also collected to supplement the assessment made through the pre-determined matrices and to suggest recommendations for improving the library.
In the code chunk below, read_csv() of readr is used to import the CSV file into R and parse it into a tibble data frame.
#Code for importing the survey data
lib_surv_data<- read_csv("Data/Raw data 2018-03-07 SMU LCS data file - KLG.csv")
Analyse the data: First, we would like to see the basic distribution of the respondents who took part in the survey. We therefore analyse the dataset with a preliminary EDA.
Preparing the data: In order to see the distribution, we identify and tag the respondents into 4 categories, namely “Undergrads”, “Faculty”, “Postgrads” and “Staff & Others”.
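The same tagging can also be expressed compactly with `case_when()` and `count()`. The sketch below runs on a made-up `toy_survey` tibble rather than the real file, and assumes the `Position` codes are numeric and grouped exactly as in the filters used in this post (1-5, 6-7, 8-11, 12-14).

```r
library(dplyr)

# Toy stand-in for the survey data: one row per respondent
toy_survey <- tibble(Position = c(1, 3, 6, 9, 12, 5, 7))

respondent_counts <- toy_survey %>%
  # Map each Position code to one of the four respondent categories
  mutate(category = case_when(
    Position %in% 1:5   ~ "Undergrads",
    Position %in% 6:7   ~ "Postgrads",
    Position %in% 8:11  ~ "Faculty",
    Position %in% 12:14 ~ "Staff & Others"
  )) %>%
  # One row per category with the number of respondents
  count(category, name = "count")

respondent_counts
```

This avoids creating one intermediate data frame per category, at the cost of hiding the filtered subsets, which the step-by-step version keeps available.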
#Code for categorising the respondents of the survey
lib_data<-lib_surv_data
lib_data_student_under <- filter(lib_data, Position == '1' | Position == '2'| Position == '3' | Position == '4' | Position == '5')
stud_under <- nrow(lib_data_student_under)
lib_data_student_mast <- filter(lib_data, Position == '6'| Position == '7')
stud_mast <- nrow(lib_data_student_mast)
lib_data_student_fact <- filter(lib_data, Position == '8'| Position == '9'| Position == '10'| Position == '11')
stud_fact <- nrow(lib_data_student_fact)
lib_data_student_staff <- filter(lib_data, Position == '12'| Position == '13'| Position == '14')
stud_staff <- nrow(lib_data_student_staff)
data <- data.frame(
category=c("Undergrads", "Faculty", "Postgrads","Staff & Others"),
count=c(stud_under, stud_fact, stud_mast, stud_staff)
)
data
Plotting the bar plot for the respondents: With the data prepared, we now plot the distribution of the survey respondents using ggplot2.
#Code for constructing bar plots
p1<-ggplot(data=data, aes(x=reorder(category, -count), y=count)) +
geom_bar(stat="identity",fill="steelblue",width=0.6,alpha=0.6,colour="skyblue",size=0.3)+ theme_minimal()+
labs(caption = "Data Source: Singapore Management University - Library Survey Data 2018")+
# Style the caption through the theme; labs() only sets the text
theme(plot.caption = element_text(hjust=0,colour = "green", face = "italic"))
p1+
xlab("Category of Respondents in the Survey") + ylab("No. of Responses in the survey") + # Set axis labels
ggtitle("SMU Library Survey 2018: Distribution of Survey Participants") + # Set title
theme(plot.background = element_rect(fill = "grey100",colour = "black",size = 1))
Adding Interactivity: We use ggplotly() from the plotly package to add interactivity to the above visual and make it more intuitive.
#Code for interactivity
p2<-ggplotly(p1)
p2<-p2%>%layout(
title = list(text="SMU Library Survey 2018:Distribution of Survey Participants",y = .98),
xaxis = list(title = "Category of Respondents in the Survey"),
yaxis = list(title = "No. of Responses in the survey"),
margin = list(l = 90)
)
p2
From the above analysis it is evident that most participants are students; hence, for building this data viz we shall consider the students only.
Data Wrangling for building the Dumbbell Chart: In order to compare the students’ responses to the questions on the “Performance” and “Improvement” factors of the library, we construct a dumbbell chart. To do so, we start by preparing the data accordingly.
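Before walking through the individual steps, the overall shape of this wrangling can be sketched on toy data. The column names `I01`, `I02`, `P01` and `P02` are hypothetical stand-ins for the survey's rating columns, and the tidyverse pipeline shown here is an alternative formulation, not the code used for the charts in this post.

```r
library(dplyr)
library(tidyr)

# Toy responses with hypothetical column names I01/I02 and P01/P02
toy <- tibble(
  ResponseID = 1:3,
  I01 = c(6, 7, 5), I02 = c(4, NA, 6),
  P01 = c(5, 5, 5), P02 = c(6, 6, 3)
)

gap_table <- toy %>%
  # One row per respondent and rating column
  pivot_longer(-ResponseID, names_to = "measure", values_to = "response") %>%
  # Split e.g. "I01" into the rating type ("I") and the question number ("01")
  separate(measure, into = c("kind", "question"), sep = 1) %>%
  group_by(kind, question) %>%
  # Mean rating per type and question, ignoring missing answers
  summarise(mean_rating = mean(response, na.rm = TRUE), .groups = "drop") %>%
  # One row per question with Mean_I and Mean_P side by side
  pivot_wider(names_from = kind, values_from = mean_rating,
              names_prefix = "Mean_") %>%
  mutate(diff = round(Mean_I - Mean_P, 2))

gap_table
```

Joining the two sets of means on the question number, rather than on a row index, keeps each pair aligned even if the columns are not in the same order.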
First Step is to filter the data for students only
#Filtering the students from the survey dataset
stu_data<- filter(lib_surv_data, Position == '1' | Position == '2'| Position == '3'| Position == '4'| Position == '5'| Position == '6'| Position == '7')
Now we keep only the columns with the Improvement responses.
#Preparing the dataframe with improvement responses
stu_data_I<- stu_data%>%
select(ResponseID,starts_with("I"),(-ID))
Next we keep only the columns with the Performance responses.
#Preparing the dataframe with Performances responses
stu_data_P<- stu_data%>%
select(ResponseID,starts_with("P"),(-Position))
Using the code chunks below, we calculate the mean responses for the Performance and Improvement related questions.
#Calculating the mean responses for Performance Related Question
survey_p<- stu_data_P %>%
pivot_longer(-ResponseID, names_to = "measure", values_to = "response")
survey_pn<-na.omit(survey_p)
dfp<- aggregate(survey_pn$response, by=list(survey_pn$measure), FUN=mean)
names(dfp)<-c("Ques","Mean_P")
dfp$ID <- seq.int(nrow(dfp))
# Round the mean to two decimals (kept numeric, without shadowing c())
dfp$Mean_P<- round(dfp$Mean_P, 2)
dfp
#Calculating the mean responses for Improvement Related Question
survey_i<- stu_data_I %>%
pivot_longer(-ResponseID, names_to = "measure", values_to = "response")
survey_in<-na.omit(survey_i)
dfi<- aggregate(survey_in$response, by=list(survey_in$measure), FUN=mean)
names(dfi)<-c("Ques","Mean_I")
dfi$ID <- seq.int(nrow(dfi))
# Round the mean to two decimals (kept numeric)
dfi$Mean_I<-round(dfi$Mean_I, 2)
dfi
#Code for merging the Performance and Improvement mean ratings
total <- merge(dfi,dfp,by="ID")
stu_data1<- total%>%
select(ID,Mean_I,Mean_P)
# Gap between the two mean ratings, rounded to two decimals
stu_data1<-mutate(stu_data1, diff = round(Mean_I-Mean_P, 2))
#Plotting the Dumbbell Plot
g1<-ggplot(stu_data1) +
aes(x=Mean_P, xend=Mean_I, y=reorder(ID, diff),
group=ID) +
geom_dumbbell(color="grey72", size = 0.9,
size_x=3.5,
size_xend = 3.5,
colour_x = "Violet",
colour_xend = "blue")+ theme_minimal()
gg11<-g1+
geom_text(color="blue", size=2, hjust=-1.5,
aes(x=Mean_I, label=Mean_I))+
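geom_text(aes(x=Mean_P, label=Mean_P),
color="Violet", size=2, hjust=1.5)+
labs(caption = "Data Source: Singapore Management University - Library Survey Data 2018")+
# Style the caption through the theme; labs() only sets the text
theme(plot.caption = element_text(hjust=0,colour = "green", face = "italic"))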
gg2<-gg11 +
# Add white rectangle to set the area where the values of the differences will
# be
geom_rect(
mapping = aes(xmin = 7, xmax = 7.13 , ymin = -Inf, ymax = Inf),
fill = "white",
color = "white"
) +
# Add rectangle with correct background color for the differences
geom_rect(
mapping = aes(xmin = 7, xmax = 7.13 , ymin = -Inf, ymax = Inf),
fill = "#eef0e2",
color = "#eef0e2"
) +
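# Use the same reordered y as the dumbbells so each difference lines up with its row
geom_text(aes(y = reorder(ID, diff), label = diff),
x = 7.125, hjust = 1) +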
annotate(x = 7.125, y = "14", label = "",
geom = "text", vjust = -2,color="blue",
fontface = "bold",
hjust = 1) +
geom_text(
# Bold face
fontface = "bold",
# Font size
size = 3,
# Colour
colour = "Blue",
# Set text a little above the dots
nudge_y = 0.6,
# Position
mapping =
aes(
x = 7.09,
y = "14",
label = "",
)
)
p<- gg2 +
# Plot Title and Axis Labels
labs(
title = "Library Survey Data",
subtitle = paste0(
"Comparison of Mean Performance vs Mean Improvement Score \n",
"(Students)"
),
x = "Mean Rating",
y = "Questions"
) +
# Change background, General font size, and other things
theme(
# Change font color and text for all text outside geom_text
text = element_text(color = "#4e4d47", size = 9),
# Question labels in bold face
axis.text.y = element_text(face = "bold"),
# Add space between x axis text and plot
axis.text.x = element_text(vjust = -0.9),
# Do not show tick marks
axis.ticks = element_blank(),
# Delete original legend (keep only the one we created)
#legend.position = "none",
# White background
panel.background = element_blank(),
# Question (y axis) lines
panel.grid.major.y = element_line(colour = "grey96", size = 0.6),
# Change Title Font
plot.title = element_text(face = "bold", size = 16),
# Change Subtitle Font and add some margin
plot.subtitle = element_text(face = "italic", size = 12,
margin = margin(b = 0.5, unit = "cm"))
)
dumbell<-p+theme(plot.background = element_rect(fill = "grey100",colour = "black",size = 1))
dumbell
Legend for the above plot: Please note that there are 26 questions in the survey, which have been coded as 1-26 in the above graph. The following is the legend for the questions. The scale for the ratings is 1-7.
Legend for the Questions
Adding Interactivity: We use the plotly package to add interactivity to the above visual and make it more intuitive.
#Code for making the dumbbell chart interactive
stu_data1$ID <- factor(stu_data1$ID, levels = stu_data1$ID[order(stu_data1$diff)])
fig <- plot_ly(stu_data1, color = I("grey72"))
fig <- fig %>% add_segments(x = ~Mean_P, xend = ~Mean_I, y = ~ID, yend = ~ID,showlegend = FALSE)
fig <- fig %>% add_markers(x = ~Mean_P, y = ~ID, name = "Mean Performance", color = I("orchid2"))
fig <- fig %>% add_markers(x = ~Mean_I, y = ~ID, name = "Mean Improvement", color = I("blue"))
fig <- fig %>% layout(
title = "Library Survey \n (Performance vs Improvement)",
xaxis = list(title = "Ratings"),
yaxis = list(title = "Questions"),
margin = list(l = 60)
)
fig
As described earlier, the heat map provides an efficient way to quickly identify high points and low points across the organisation for various study groups. In this survey we have 7 study areas, which are listed as Accountancy, Business, Economics, Information Studies, Law, Social Sciences and Others. With the help of the heatmap we visualise the mean responses from each of the study areas for the 26 questions.
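The aggregation behind such a heatmap can be sketched on toy data. The study-area codes and the two question columns below are made up. The sketch computes one mean per study area and question with base R, arranges the result as a questions-by-areas matrix, and only calls `heatmaply()` if that package is installed.

```r
# Toy data: hypothetical study-area codes and two question columns
toy <- data.frame(
  StudyArea = c(1, 1, 2, 2, 3),
  I01 = c(6, 4, 7, 5, 3),
  I02 = c(5, 5, 6, 6, 2)
)

# Mean rating per study area for every question column
area_means <- aggregate(toy[, c("I01", "I02")],
                        by = list(StudyArea = toy$StudyArea),
                        FUN = mean)

# Questions as rows, study areas as columns
mean_matrix <- t(as.matrix(area_means[, -1]))
colnames(mean_matrix) <- area_means$StudyArea

mean_matrix

# Build the interactive heatmap only if heatmaply is installed;
# print `hm` to display it
if (requireNamespace("heatmaply", quietly = TRUE)) {
  hm <- heatmaply::heatmaply(mean_matrix, dendrogram = "none",
                             xlab = "Study Area", ylab = "Questions")
}
```

Naming the matrix columns after the study areas means the axis labels in the heatmap identify each group directly.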
Data Wrangling for building the Heatmap:
#Filtering the Student responses for building the heatmap
stu_data_h<- filter(lib_surv_data, Position == '1' | Position == '2'| Position == '3'| Position == '4'| Position == '5'| Position == '6'| Position == '7')
stu_data_hm<- stu_data_h%>%
select(StudyArea,starts_with("I"),(-ID))
stu_data_hm
#Code for plotting the interactive heatmap
d<-na.omit(stu_data_hm)
df3<-aggregate(d[, 2:27], list(d$StudyArea), mean)
d1<-df3
# Drop the grouping column by name; -d1$Group.1 would drop columns by position instead
df4<-select(d1,-Group.1)
wh_matrix <- data.matrix((df4))
hm<-heatmaply(t(wh_matrix),
Rowv=NA, Colv=NA,
grid_color = "black",
grid_lw=0.3,
branches_lwd = 0.6,
grid_size = 1,
xlab = "Student Study Area",
ylab = "Questions",
main = "Library Survey Data:Improvement Ratings by Student")
hm
## Warning: 'heatmap' objects don't have these attributes: 'showlegend'
## Valid attributes include:
## 'type', 'visible', 'opacity', 'name', 'uid', 'ids', 'customdata', 'meta', 'hoverinfo', 'hoverlabel', 'stream', 'transforms', 'uirevision', 'z', 'x', 'x0', 'dx', 'y', 'y0', 'dy', 'text', 'hovertext', 'transpose', 'xtype', 'ytype', 'zsmooth', 'connectgaps', 'xgap', 'ygap', 'zhoverformat', 'hovertemplate', 'zauto', 'zmin', 'zmax', 'zmid', 'colorscale', 'autocolorscale', 'reversescale', 'showscale', 'colorbar', 'coloraxis', 'xcalendar', 'ycalendar', 'xaxis', 'yaxis', 'idssrc', 'customdatasrc', 'metasrc', 'hoverinfosrc', 'zsrc', 'xsrc', 'ysrc', 'textsrc', 'hovertextsrc', 'hovertemplatesrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule', '_bbox'
Legend of the Survey: Please note that 7 study areas are considered in the survey. The legend is as follows-
Legend for the Study Area
The following information can be derived from the visuals-
Inferences from the Dumbbell Plot
From the above dumbbell plot we can see that Questions 14, 15 and 19 have the largest differences between the mean ratings of the performance and improvement factors. This highlights that these questions need more focus in order to increase the satisfaction of the users.
Another inference which can be derived from the above plot is that Questions 16 and 4 are rated comparatively low on both the Performance and Improvement criteria and hence need special focus.
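The questions with the largest gaps can also be pulled out programmatically instead of being read off the chart. The sketch below uses a made-up `gaps` table with the same `ID` and `diff` columns as the merged data; the values are illustrative only and are not the survey results.

```r
# Toy gap table; the values are made up for illustration
gaps <- data.frame(
  ID   = 1:5,
  diff = c(0.40, 1.10, 0.25, 0.90, 1.30)
)

# Sort by descending gap and keep the three largest
top_gaps <- head(gaps[order(-gaps$diff), ], 3)

top_gaps$ID
```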
Interactive Dumbbell Plot
The interactivity of the dumbbell plot helps us analyse it quickly by hovering over the data points, and keeps the visual neat and tidy.
Inferences from the Heatmap
From the above heat map we can see that Study Area Group 4 is fairly dissatisfied compared to the other groups, as its mean ratings are lower than those of the other study groups.
We can also see that Group 7 has given better ratings for most of the questions and is more satisfied overall.
For Groups 5 and 6, the ratings are noticeably lower for Question 16, which can therefore be considered a focus area for improvement.
Flexibility to User: Interactivity provides flexibility to the user and makes the visuals more dynamic and easier to understand. With options such as pan and zoom, users can get a clear picture of the visual and can uncover some interesting findings. For example, in the heatmap we can hover over any cell to see its value, which gives a clear reading of the visual.
Makes visuals more intuitive: With the help of interactivity and animations, we can make visuals more intuitive and tell effective stories. The visuals become easy to understand and help to provide a bigger picture that may not be revealed through static visuals.
Focus on details with ease: Interactivity allows users to zoom into a visualisation, physically selecting an area of interest and blowing up that area of the chart. Hover information appears as we move over the visual, which makes it very easy for users to understand. Animations, especially for time series, help to visualise changes easily and provide clear information. Thus interactivity and animations play a pivotal role in analysing the visuals.
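As a small illustration of the hover behaviour described above, the sketch below builds a plotly scatter with custom tooltip text. The data frame and its values are hypothetical. `text` and `hoverinfo = "text"` are standard plotly trace attributes that replace the default tooltip with the supplied string.

```r
library(plotly)

# Toy data: hypothetical mean ratings for three questions
toy <- data.frame(
  question = c("Q1", "Q2", "Q3"),
  mean_rating = c(5.8, 6.1, 4.9)
)

fig <- plot_ly(
  toy,
  x = ~mean_rating,
  y = ~question,
  type = "scatter",
  mode = "markers",
  # Custom tooltip text shown when hovering over a marker
  text = ~paste0(question, ": mean rating ", mean_rating),
  hoverinfo = "text"
)

fig
```

Pan, zoom and box-select come with the default plotly mode bar, so no extra code is needed for them.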