In this document, I’m going to introduce our team, list the features we would like to contribute to JASP, and report on the progress of those features. I think we share a similar vision: we want to improve the quality of data interpretation among researchers. Specifically, my team hopes to bring a “statistical cognition” element to JASP. Based on the research we’re conducting (and the research we hope to conduct), we want to develop scientifically based best practices for the presentation of statistical results.
I received my PhD in Quantitative Psychology from the University of Oklahoma in 2013, then worked for three years as a biostatistician (where I saw my fair share of bad statistical practices). My research expertise has historically been in missing data; I have recently added “statistical cognition” as a second focus, as a potential remedy for the replication crisis. Specifically, I want to know how presenting results to people in different ways improves their interpretation of the data (and mitigates their overestimates of replicability). I have been an R programmer for over a decade and have written five R packages. In addition to overseeing my end of the project, I will be primarily responsible for developing the R code for any new features and for maintaining communication with the JASP team.
Polly is a human factors expert who has worked for various institutes as a human factors scientist. Her work has included over two decades leading industry and Department of Defense-sponsored human-machine interaction research projects. She has degrees in both engineering and cognitive psychology, and over 25 years of applied human factors research experience. Her role is to conduct use-case studies of the various ways we anticipate presenting results to people. With her assistance, we hope to develop best practices for the display of statistical results.
Bo is a computer scientist with expertise in immersive visualization, including serious gaming and virtual reality. She is primarily responsible for coordinating efforts among her programming students, as well as developing the AI component of the project (see below).
As I’ve mentioned previously, I am a strong proponent of graphical data analysis. Humans have an innate pattern-recognition system that allows the visual cortex to quickly encode large amounts of data. Naturally, graphically based statistics should improve encoding of data and make problematic data features more apparent. Because of this, I want to propose adding a strong visual component to JASP.
One task is to integrate my R function “flexplot” into JASP (as well as its companion function, visualize). Essentially, flexplot frees the user from the burden of deciding which type of graphic to use. It first decides whether to draw a histogram, bar chart, median dot plot (see below for an example related to a “t-test”-like visualization), scatterplot, or coplot. Then it decides how variables are represented graphically, using some basic rules. Below are some examples of flexplot’s functionality:
require(fifer)
require(ggplot2)
data(exercise_data)
### histogram of weight loss
flexplot(weight.loss~1, data=exercise_data)
### barplot of gender
flexplot(gender~1, data=exercise_data)
## show "t-test" like data
flexplot(weight.loss~gender, data=exercise_data)
## show "regression" like data
flexplot(weight.loss~motivation, data=exercise_data)
## show "multiple regression" data
flexplot(weight.loss~motivation + gender | income, data=exercise_data)
## same plot, but with regression lines and no standard errors
flexplot(weight.loss~motivation + gender | income, data=exercise_data, se=F, method="lm")
## now some more sophisticated plotting: plot a gamma GLIM, bin the empathy variable, label the bins,
## and add a "Ghost line," which repeats a regression line across panels to make it easier to compare across panels.
## Also, since it's a ggplot object, we can save it and add ggplot features to it
data(criminal_data)
a = flexplot(aggression~ses | empathy, data=criminal_data, method="Gamma", se=F,
labels = list(ses=c("low", "mid-low", "mid-high", "high")),
ghost.line="gray",
ghost.reference=list(empathy=0))
a + labs(x="Socioeconomic Status", y="Aggression Score")
I have integrated flexplot’s capabilities within my R package (called fifer). One companion function is visualize, which takes a fitted object (e.g., an lm object) and automatically creates visuals for the analysis. Some examples:
## visualize a t-test, but just the model
ttest.model = lm(weight.loss~gender, data=exercise_data)
visualize(ttest.model, plot="bivariate")
### visualize a regression AND the residuals
reg.model = lm(weight.loss~motivation, data=exercise_data)
visualize(reg.model)
## visualize a multiple regression
mult.reg.mod = lm(weight.loss~motivation + gender + income, data=exercise_data)
visualize(mult.reg.mod)
These functions take an R formula (e.g., y~x1 + x2 + x3) and convert it into an intuitive graphic. Sometimes that means a median dot plot; other times a scatterplot or a logistic curve. In the background, they determine which variables are numeric and which are categorical, then plot accordingly.
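To give a sense of the rules involved, here is a simplified sketch (my paraphrase for this report, not flexplot’s actual source code) of how the plot type could be chosen from the measurement level of each variable in the formula:
## simplified illustration of flexplot-style plot selection (not the actual implementation)
choose_plot = function(outcome, predictor=NULL, panel=NULL){
  ## any paneling variable calls for a coplot
  if (!is.null(panel)) return("coplot")
  ## univariate formulas (y~1): histogram for numeric outcomes, barchart for categorical ones
  if (is.null(predictor)) return(if (is.numeric(outcome)) "histogram" else "barchart")
  ## bivariate formulas: scatterplot for numeric predictors, median dot plot for categorical ones
  if (is.numeric(predictor)) "scatterplot" else "median dot plot"
}
## e.g., choose_plot(exercise_data$weight.loss, exercise_data$gender) returns "median dot plot"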
Currently, flexplot (and visualize) work only with certain models (lm models that meet certain conditions). Eventually I will make them work with lme4 models, zeroinfl models, rlm models, etc.
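One natural way to add those model classes is S3 dispatch; the sketch below is my own illustration of that approach (not the current fifer code), with the lme4 method left as a stub:
## hypothetical S3 structure for extending visualize to new model classes
visualize = function(object, ...) UseMethod("visualize")
visualize.lm = function(object, plot="all", ...){
  ## current behavior: plot the model (e.g., plot="bivariate") and/or residual diagnostics
}
visualize.merMod = function(object, ...){
  ## planned lme4 support: plot fixed effects, paneled by the grouping factor(s)
  stop("lme4 models are not yet supported")
}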
I designed a mockup of how it might look in JASP:
Proposed visual for a JASP analysis.
A few things to note about the above graphic:
And here are the results after the user clicks “Shift Variable View”:
Proposed visual for a JASP analysis.
We spoke on the phone about how you envision having editable graphics. I spoke with Don over Skype about this; he says it’s a difficult problem to solve and anticipates it will be a very long-term project. We decided that my team would:
In either case, we are waiting for the most recent version of JASP before getting started.
I mentioned briefly the idea of “skins.” The idea originated from our email conversation, in which I said I would prefer that software not make the t-test/ANOVA/regression distinction (since they are all the general linear model). Your point (which I agree with) is that you want to make it easy for users to migrate from SPSS to JASP without being confused about where to find an analysis. I proposed instead that my team develop “skins,” or different views of the JASP interface.
Under this proposal, upon first opening JASP, the user would be greeted with the following:
Opening dialog box of JASP, inviting the user to choose a skin.
Notice the “i” icons, which allow the user to find more information about the available skins. This is a provisional list of skin ideas (and their purposes):
I like these “skins” because they make the distinction between EDA, CDA, and study planning more explicit and hopefully reinforce that critical distinction in people’s minds (so users don’t blend exploratory and confirmatory analysis and p-hack).
Here’s an example view of the “wizard” skin (forgive the cheesy thematic dialogue). The choices (clustering variables, finding something interesting, etc.) are, of course, provisional; I welcome discussion on the various purposes for which researchers might use JASP.
Here’s an example of the exploration mode:
Example wireframe of the exploration skin mode.
A couple of things to note:
Here’s one example of the confirmation mode:
Example wireframe of the confirmation skin mode.
A couple of things to note:
Example wireframe of the confirmation skin mode, with an illustration of the drag and drop feature.
Like the exploration tab, this doesn’t show the analysis console, but that would, of course, be an option. The analysis console was shown in the first graphic, but I’ve repeated it below:
Proposed visual for a JASP analysis, including the existing analysis console under the “classical” skin.
One of the problems with any commercial statistical software package is that, as new advances in data analysis emerge, software developers must have enough time and interest to implement them. As a result, commercial software lags behind advances in statistics. R overcomes this problem because it places software development in the hands of the user: if statisticians want people to use their methods, the onus of developing the software falls on them, and many (if not most) of them develop R packages. Unfortunately, R has a steep learning curve. I suspect this is a major impetus for the development of JASP: it makes it easy for users to take advantage of R’s functionality. But that comes at a cost, because now statisticians must develop packages in both R and JASP.
What we would like to do is develop a feature in JASP that imports existing R packages and automatically creates clickable menus based on the package’s functions. In other words, a user might want to use the randomForest package, but a module for randomForest hasn’t yet been developed. We envision the user specifying the name of the package (randomForest in this case), and JASP would automatically read the R code, create the QML, and design a module, all without anyone having to understand the code.
We have spent some time thinking this over. Our impression, at this point, is that this may not be possible without human supervision. Instead, we anticipate that the user importing the package will have to give JASP some inputs. Fortunately, once that’s done, the module becomes part of the JASP library, so nobody else has to import it.
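As a rough proof of concept (my own sketch, not anything already in JASP), a first step could be simply to inspect a package’s exported functions and their argument names, which is the information a menu generator would need before asking the user for the remaining details:
## sketch: list the exported functions of a package and their argument names
## (using randomForest purely as an example; the package must be installed)
describe_package = function(pkg){
  exports = getNamespaceExports(pkg)
  funs = Filter(function(nm) is.function(get(nm, envir=asNamespace(pkg))), exports)
  lapply(setNames(funs, funs), function(nm) names(formals(get(nm, envir=asNamespace(pkg)))))
}
## e.g., str(describe_package("randomForest"))
Mapping those argument lists onto sensible QML controls (formulas, checkboxes, numeric fields) is exactly the part we expect to require some user input.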
As I mentioned, aside from the new features I envision (skins, graphics, AI), we are also working on the “human factors” side of things (a.k.a. “statistical cognition”). We have two projects in the works: (1) a paper about flexplot in which we test eight heuristics we developed for plotting; specifically, we show participants images that violate each heuristic and images that follow it, and by the end of this project we hope to have solid evidence that certain displays minimize bias in interpretation; and (2) a paper that seeks to understand whether the mode of presentation (Bayesian, NHST, estimation, or graphical) affects people’s confidence, judgments about replicability, and accuracy of interpretation. I think it would be very powerful to be able to advertise that not only is JASP supported by a team of skilled statisticians, but the way in which the results are presented is also scientifically validated.
One potential avenue for human factors input is how to elicit priors from people. Suppose, for example, you have a model with multiple predictors. Should JASP elicit priors for every single parameter? That’s a lot of priors. Instead, maybe JASP elicits priors for a model comparison (e.g., a full versus a reduced model). From what I can see, that’s what JASP currently does. But now you have another problem: which models do you compare? I remember when I first used JASP, I struggled to understand exactly what the Bayes factor was comparing (I eventually figured it out). Perhaps that’s where human factors can come in: maybe we can run use-case studies and identify the best strategy for eliciting priors.
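To make the full-versus-reduced idea concrete, here is a small illustration using the BayesFactor package in R (my sketch of the comparison being discussed, not JASP’s internal code), reusing the exercise_data example from above:
## illustration: a single prior scale governs the full-vs-reduced comparison,
## rather than the user specifying a prior for every individual parameter
require(BayesFactor)
full = lmBF(weight.loss ~ motivation + gender, data=exercise_data)   ## assumes gender is coded as a factor
reduced = lmBF(weight.loss ~ gender, data=exercise_data)
full / reduced   ## Bayes factor for including motivation
The human factors question is then which comparisons to present and how to describe them so users immediately understand what the Bayes factor refers to.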
I have two final comments (after an overly verbose summary). First, both you and Bruno mentioned that a large update is forthcoming. Could I get a status update on that? We’re excited to start working on the QML portion.
Second, I have weekly meetings with my team here, and in those meetings their input is constantly shifting my vision of how things will work. Because of that, it would be very easy for me to diverge from a shared game plan I develop with you if I don’t communicate with you for several weeks. Granted, I could fork off of your main branch, develop what I want how I want, and hope for the best, but as I mentioned, I’d rather work in tandem with JASP. I’d hate to spend a year working on our projects only to have them not match your vision (and thus not be integrated into the core of JASP). With that said, I’m considering either having regular Skype calls with you or sending weekly reports (like this one, but probably shorter, since only a week will have passed). Do you have a preference? It doesn’t matter to me either way.