Original graph: https://eduhseru.sharepoint.com/sites/AdvancedDataAnalysis/Shared%20Documents/General/Viz%20Quiz%203/viz_2waves.png
The first problem with these two graphs (this point actually combines two issues in one) is that, if they were meant to be compared, it is nearly impossible to compare them correctly. First, the scales differ. Both x-axes show the same variable, the relative autonomy index, but one covers the section between -20 and 5 and the other the section between -20 and 60, which makes the two panels very hard to compare.
So the first thing I would do is put the two graphs on the same section of the scale. It is also better to put 0 in the center and leave equal tails on both the positive and negative sides of zero, for example from -20 to 20.
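A minimal sketch of this suggestion, assuming ggplot2 (it is not used in the original code). The data are the same values entered in the chunk further down. Because they are already aggregated shares rather than raw observations, `stud_share` is passed as a `weight`, so `geom_histogram()` sums the shares into common bins. The bin width of 2 and `boundary = 0` are illustrative choices, not taken from the original graph. `coord_cartesian()` fixes both panels to the symmetric range from -20 to 20 without dropping any data.

```r
library(ggplot2)

# Same aggregated values as in the data chunk of this document:
# 15 index values for the first wave, 11 for the second
wave <- factor(c(rep("first wave", 15), rep("second wave", 11)))
stud_share <- c(0, 1, 2, 2, 4, 7, 10, 12, 16, 18, 14, 7, 4, 2, 1,
                1, 0.5, 0.5, 1, 5, 30, 4, 0.5, 0.25, 0.25, 0.25)
index <- c(-6.8, -5.95, -5.1, -4.25, -3.4, -2.55, -1.7, -0.85,
           0.85, 1.7, 2.55, 3.4, 4.25, 5.1, 5.95,
           -13, -10, -7, -5, -3, 0, 3, 5, 7, 10, 17)
data <- data.frame(wave, stud_share, index)

# One panel per wave, stacked vertically; facet_wrap() keeps a fixed,
# shared x-axis by default
p_stacked <- ggplot(data, aes(x = index, weight = stud_share)) +
  # stud_share acts as a weight: each bar is the summed share in a 2-unit bin
  # (binwidth = 2 and boundary = 0 are assumed, illustrative values)
  geom_histogram(binwidth = 2, boundary = 0, fill = "grey60", colour = "white") +
  facet_wrap(~ wave, ncol = 1) +
  # zoom to a range symmetric around zero without removing observations
  coord_cartesian(xlim = c(-20, 20)) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  labs(x = "Relative autonomy index", y = "Share of students") +
  theme_bw()

p_stacked
```

Changing `ncol = 1` to `nrow = 1` gives the side-by-side arrangement; the shared axes stay fixed either way.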
Then it is worth asking: what difference do you want to show with these graphs? The answer tells us how to arrange them: one under the other (as we have now) to show how the relative autonomy index changes, or side by side to show how the share of students with the same index increases. Alternatively, we can put both histograms on one plot, give them two different fill colours and make them more transparent. This would not only show the results on the same scale, making horizontal and vertical differences easier to see, but would also show the dynamics well: how the whole picture changed in the second wave.
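The overlay option could look roughly like this. It is again a sketch assuming ggplot2 and the same aggregated data re-binned with `stud_share` as a weight; the bin width of 2 is an arbitrary choice. `position = "identity"` draws the two waves over each other instead of stacking them, and `alpha = 0.5` keeps the overlapping areas visible. The two colours are borrowed from the `ggscatterhist()` call further down.

```r
library(ggplot2)

# Same aggregated values as in the data chunk of this document
wave <- factor(c(rep("first wave", 15), rep("second wave", 11)))
stud_share <- c(0, 1, 2, 2, 4, 7, 10, 12, 16, 18, 14, 7, 4, 2, 1,
                1, 0.5, 0.5, 1, 5, 30, 4, 0.5, 0.25, 0.25, 0.25)
index <- c(-6.8, -5.95, -5.1, -4.25, -3.4, -2.55, -1.7, -0.85,
           0.85, 1.7, 2.55, 3.4, 4.25, 5.1, 5.95,
           -13, -10, -7, -5, -3, 0, 3, 5, 7, 10, 17)
data <- data.frame(wave, stud_share, index)

p_overlay <- ggplot(data, aes(x = index, weight = stud_share, fill = wave)) +
  # identity position overlays the waves; transparency shows where they overlap
  geom_histogram(binwidth = 2, boundary = 0, position = "identity", alpha = 0.5) +
  scale_fill_manual(values = c("first wave" = "#00AFBB",
                               "second wave" = "#E7B800")) +
  # same symmetric range around zero as in the stacked version
  coord_cartesian(xlim = c(-20, 20)) +
  labs(x = "Relative autonomy index", y = "Share of students",
       fill = "Rounds:") +
  theme_bw()

p_overlay
```

Because both waves are binned on one grid, differences in location (horizontal) and in share (vertical) can be read directly from the same axes.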
So, in general, I agree with the choice of graph type (a histogram), but I would change the way it is presented to make the comparison clearer.
library(dplyr)
# 15 values for the first wave, 11 for the second
wave <- factor(c(rep("first wave", 15), rep("second wave", 11)))
stud_share <- c(0, 1, 2, 2, 4, 7, 10, 12, 16, 18, 14, 7, 4, 2, 1,
                1, 0.5, 0.5, 1, 5, 30, 4, 0.5, 0.25, 0.25, 0.25)
index <- c(-6.8, -5.95, -5.1, -4.25, -3.4, -2.55, -1.7, -0.85,
           0.85, 1.7, 2.55, 3.4, 4.25, 5.1, 5.95,
           -13, -10, -7, -5, -3, 0, 3, 5, 7, 10, 17)
data <- data.frame(wave, stud_share, index)
library(knitr)
#kable(data)
str(data)
## 'data.frame': 26 obs. of 3 variables:
## $ wave : Factor w/ 2 levels "first wave","second wave": 1 1 1 1 1 1 1 1 1 1 ...
## $ stud_share: num 0 1 2 2 4 7 10 12 16 18 ...
## $ index : num -6.8 -5.95 -5.1 -4.25 -3.4 -2.55 -1.7 -0.85 0.85 1.7 ...
summary(data)
## wave stud_share index
## first wave :15 Min. : 0.000 Min. :-13.0000
## second wave:11 1st Qu.: 0.625 1st Qu.: -4.8125
## Median : 2.000 Median : -0.4250
## Mean : 5.510 Mean : -0.1077
## 3rd Qu.: 7.000 3rd Qu.: 4.0375
## Max. :30.000 Max. : 17.0000
Well, while exploring various ways of representing this data, I came up with the following design. It is not a histogram, and the overall picture of change may not be as clear as it could be; however, the boxplots at the margins of the plot show the differences between the distributions. Looking at this graph, we can say that in the second wave of the research the students' autonomy index increased and became more widely spread (boxplots at the top), while in the first wave it was more concentrated and centered, and the shares of students with the same index were larger (boxplots at the right).
library(ggpubr)
# Scatter plot of share vs. index, coloured by wave,
# with the marginal distribution of each variable drawn as boxplots
ggscatterhist(
  data, x = "index", y = "stud_share",
  color = "wave", size = 3, alpha = 0.6,
  palette = c("#00AFBB", "#E7B800"),
  margin.plot = "boxplot",
  ggtheme = theme_bw(),
  title = "Distribution of the Index of Academic Motivation\nreflecting the Degree of Relative Autonomy in\ntwo rounds of the longitudinal study",
  xlab = "Relative autonomy index",
  ylab = "Share of students",
  legend.title = "Rounds:"
)