Features

In this document, I’m going to introduce our team, list the features we would love to contribute to JASP, and give an update on how each of these elements is coming along. I think we all share a similar vision: we want to improve the quality of data interpretation among researchers. Specifically, my team hopes to bring a “statistical cognition” element to JASP. Based on the research we’re conducting (and the research we hope to conduct), we want to develop scientifically based best practices for the presentation of statistical results.

Team

Dustin Fife.

I received my PhD in Quantitative Psychology from the University of Oklahoma in 2013. I then worked for three years as a biostatistician (where I saw my fair share of bad statistical practices). My research expertise has historically been in missing data; I have recently expanded into “statistical cognition” as a potential remedy for the replication crisis. Specifically, I want to know how presenting results to people in different ways improves their interpretation of the data (and mitigates their overestimates of replicability). I have been an R programmer for over a decade and have written five R packages. In addition to overseeing my end of the project, I will be primarily responsible for developing the R code for any new features and for maintaining communication with the JASP team.

Polly Tremoulet.

Polly is a human factors scientist who has worked for various institutes, with degrees in both engineering and cognitive psychology and over 25 years of applied human factors research experience, including more than two decades leading industry and Department of Defense-sponsored human-machine interaction research projects. Her role is to conduct use-case analyses of the various ways we anticipate presenting results to people. With her assistance, we hope to develop best practices for the display of statistical results.

Bo Sun.

Bo is a computer scientist with expertise in immersive visualization, including serious gaming and virtual reality. She is primarily responsible for coordinating efforts among her programming students, as well as developing the AI component of the project (see below).

Graphics

As I’ve mentioned previously, I am a strong proponent of graphical data analysis. Humans have an innate pattern-recognition system that quickly encodes large amounts of data in the visual cortex. Naturally, visual-based statistics should improve encoding of data and make problematic data features more apparent. Because of this, I want to propose adding a strong visual component to JASP.

Flexplot Module

One task is to integrate my R function “flexplot” into JASP (along with its sidekick function, visualize). Essentially, flexplot frees the user from the burden of deciding which type of graphic to use. It first decides whether to produce a histogram, bar chart, median dot plot (see below for an example of a “t-test”-like visualization), scatterplot, or coplot. Then it decides how the variables are represented graphically, using some basic rules. Below are some examples of flexplot’s functionality:

require(fifer)
require(ggplot2)
data(exercise_data)

### histogram of weight loss
flexplot(weight.loss~1, data=exercise_data)

### barplot of gender 
flexplot(gender~1, data=exercise_data)

## show "t-test" like data
flexplot(weight.loss~gender, data=exercise_data)

## show "regression" like data
flexplot(weight.loss~motivation, data=exercise_data)

## show "multiple regression" data
flexplot(weight.loss~motivation + gender | income, data=exercise_data)

## same plot, but with regression lines and no standard errors
flexplot(weight.loss~motivation + gender | income, data=exercise_data, se=F, method="lm")

## now some more sophisticated plotting: plot a gamma GLIM, bin the empathy variable, label the bins, 
## and add a "Ghost line," which repeats a regression line across panels to make it easier to compare across panels.
## Also, since it's a ggplot object, we can save it and add ggplot features to it
data(criminal_data)
a = flexplot(aggression~ses | empathy, data=criminal_data, method="Gamma", se=F, 
        labels = list(ses=c("low", "mid-low", "mid-high", "high")), 
        ghost.line="gray",
        ghost.reference=list(empathy=0))
a + labs(x="Socioeconomic Status", y="Aggression Score")        

Flexplot Graphics in Core Jasp Functionality

I have integrated flexplot’s capabilities within my R package (called fifer). One companion function is visualize, which takes a fitted object (e.g., an lm object) and automatically creates visuals for the analysis. Some examples:

## visualize a t-test, but just the model
ttest.model = lm(weight.loss~gender, data=exercise_data)
visualize(ttest.model, plot="bivariate")

### visualize a regression AND the residuals
reg.model = lm(weight.loss~motivation, data=exercise_data)
visualize(reg.model)

## visualize a multiple regression
mult.reg.mod = lm(weight.loss~motivation + gender + income, data=exercise_data)
visualize(mult.reg.mod)

These functions are able to take an R-based formula (e.g., y~x1 + x2 + x3) and convert it into an intuitive graphic. Sometimes that will mean a median dot plot; other times it will mean a scatterplot or a logistic curve. In the background, they determine which variables are numeric and which are categorical, then plot accordingly.
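To make that concrete, here is a rough sketch (hypothetical, and not flexplot’s actual code) of the kind of dispatch logic this implies: inspect the formula, classify each variable as numeric or categorical, and choose a default display accordingly.

## rough sketch (not flexplot's actual code) of formula-driven plot selection
choose_plot = function(formula, data) {
  vars = all.vars(formula)          # the "1" in y~1 is a constant, so it is ignored
  outcome = vars[1]
  predictors = vars[-1]
  is.cat = function(v) is.factor(data[[v]]) || is.character(data[[v]])
  if (length(predictors) == 0) {
    if (is.cat(outcome)) "bar chart" else "histogram"
  } else if (is.cat(predictors[1])) {
    "median dot plot"               # categorical predictor: t-test/ANOVA-like display
  } else {
    "scatterplot"                   # numeric predictor: regression-like display
  }
}

## e.g., with the exercise data from above:
## choose_plot(weight.loss~gender, exercise_data)      # "median dot plot"
## choose_plot(weight.loss~motivation, exercise_data)  # "scatterplot"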

Currently, flexplot (and visualize) work only with certain models (lm models that meet certain conditions). Eventually I will make them work with lme4 models, zero-inflated models, rlm models, etc.
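One natural way to organize those extensions (a hypothetical sketch, not the current fifer implementation) is an S3 generic with one visualize method per fitted-model class:

## hypothetical sketch: an S3 generic with one method per model class
visualize = function(object, ...) UseMethod("visualize")

visualize.lm = function(object, ...) {
  ## current behavior: plot the fitted model and its residuals
}

visualize.lmerMod = function(object, ...) {
  ## lme4 models: e.g., panel the plot by the grouping factor
}

visualize.zeroinfl = function(object, ...) {
  ## zero-inflated models: e.g., separate displays for the count and zero components
}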

I designed a mockup of how it might look in Jasp:

Proposed visual for a JASP analysis.

A few things to note about the above graphic:

  • The “Edit Plot” button will allow users to modify each element of the graphic, as you envisioned.
  • The “Shift Variable View” button will modify the graphic so that, for example, income (which was in the panels) now shows up on the x axis (see the graphic below, as well as the flexplot sketch after this list). One of the papers I’m working on argues that if there’s more than one way to look at a graphic, one should look at it in each of those ways, because each view may reveal new insights.
  • Notice the new icons for the modules; these will be explained in the next section, so I’ll save comment until then.
  • I have displayed the graphics before the model comparison estimates. That’s my personal preference (because I think people need to verify visually that the model is appropriate before they evaluate statistical estimates).
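In flexplot terms, the “Shift Variable View” button amounts to re-arranging the same formula, something like the following (assuming flexplot accepts the variables in either arrangement):

## same variables, two views: income in the panels vs. income on the x axis
flexplot(weight.loss~motivation + gender | income, data=exercise_data)
flexplot(weight.loss~income + gender | motivation, data=exercise_data)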

And here’s the results after the user clicks “Shift Variable View”:

Proposed visual for a JASP analysis.

Editable Graphics

We spoke on the phone about how you envision having editable graphics. I spoke with Don over Skype about this; he says it’s a difficult problem to solve and anticipates it will be a very long-term project. We decided that my team would:

  1. Create a flexplot module, then
  2. Modify core jasp functionality to use visualize (if you like that feature)
  3. Make the graphics editable

In any case, we are waiting on the most recent version of Jasp before getting started.

Skins

I mentioned briefly the idea of “skins.” The idea originated from our email conversation, where I said I would prefer that the software not make the t-test/ANOVA/regression distinction (since they’re all the general linear model). Your point (which I agree with) is that you want to make it easy for users to migrate from SPSS to Jasp without being confused about where to find an analysis. I proposed that my team could instead develop “skins,” or different views of the Jasp interface.

Under this proposal, upon first opening Jasp, the user is greeted with the following:

Opening dialogue box of Jasp, inviting the user to choose a skin.

Notice the “i” icons, which allow the user to find more information about the skins available. This is a provisional list of skin ideas (and their purpose):

  • wizard mode guides the user through the analysis by asking a series of questions. This would be for users who are not too familiar with stats.
  • classical mode is the current view (with t-tests, anova, regression, factor analysis, etc.)
  • confirmation view is for those who have (essentially) preregistered their studies and is designed for confirmatory tests
  • exploration view is for those who don’t have a precise hypothesis but are open to seeing what’s in their data
  • study-planning view would have modules for sample size planning, pre-registration, quantitative literature reviews (a paper I’m working on at the moment discusses these)

I like these “skins” because they make the distinction between EDA, CDA, and study planning more explicit and, hopefully, reinforce that critical distinction in people’s minds (so users don’t blend exploration and confirmation and p-hack).
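To make the idea concrete, a skin can be thought of as nothing more than a named set of modules. Below is a purely illustrative sketch; the module names are placeholders, not actual Jasp identifiers.

## purely illustrative: a "skin" as a named set of modules (placeholder names)
skins = list(
  classical    = c("T-Tests", "ANOVA", "Regression", "Factor Analysis"),
  exploration  = c("Visualization", "Data Mining", "Clustering", "Exploratory Factor Analysis"),
  confirmation = c("General Linear Model", "Generalized Linear Models", "Mixed Models", "SEM/CFA")
)
## the interface would then display only the modules belonging to the chosen skin
skins[["exploration"]]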

Here’s an example view of the “wizard” skin (forgive the cheesy thematic dialogue):

Example wireframe of the wizard skin mode.

The choices (clustering variables, finding something interesting, etc.) are, of course, provisional. I welcome discussion on the various purposes for which researchers might use Jasp.

Here’s an example of the exploration mode:

Example wireframe of the exploration skin mode.

A couple of things to note:

  • I have created buttons that I associate with exploration: visualization (which I envision as the module for flexplot), data mining (which might include multiple regression with stepwise selection, random forests, support vector machines, etc.), clustering (e.g., k-means, MDS, t-SNE), and exploratory factor analysis.
  • I envision having multiple consoles. Here, I have shown a data console, an R console (where the user could either program in R or have the point-and-click interface report the R code behind its results), and a results console. I do not show the analysis console here, for no reason other than that I forgot it until now. In the confirmation skin (shown shortly), I will also show a console for crafting papers.
  • Note that the results tab on the right is just a placeholder (users wouldn’t, for example, be doing a Bayesian linear regression in exploration mode, I assume).

Here’s one example of the confirmation mode:

Example wireframe of the confirmation skin mode.

A couple of things to note:

  • The tabs I envision are confirmation-centered: general linear model (both Bayesian and classical), generalized linear models, mixed models, and SEM/confirmatory factor analysis. Within each of these, the visualize function does the legwork of producing graphics that represent the analysis (though for some analyses, such as SEM, that is more challenging than for others).
  • This shows a writing console, where users can write their paper as they do the analysis. Better yet, much like LaTeX, I envision the two being highly integrated, such that users can drag and drop elements from the analysis (e.g., graphics, tables, raw statistics) directly into the text, rather than copying and pasting or retyping. The next figure illustrates the drag-and-drop functionality for a graphic:
Example wireframe of the confirmation skin mode, with an illustration of the drag and drop feature.

Like the exploration tab, this doesn’t show the analysis console, but that would, of course, be an option. The analysis console was shown in the first graphic, but I’ve repeated it below:

Proposed visual for a JASP analysis, including the existing analysis console under the “classical” skin.

AI Component

One of the problems with any commercial statistical software package is that, as new advances in data analysis emerge, the developers must have enough time and interest to implement them. As a result, commercial software lags behind advances in statistics. R overcomes this problem because it places software development in the hands of the user: if statisticians want people to use their methods, the onus of developing the software falls on them, and many (if not most) develop their packages in R. Unfortunately, R has a steep learning curve. I suspect this is a major impetus for the development of Jasp: it makes it easy for users to take advantage of R’s functionality. But that comes at a cost, because now statisticians must develop packages in both R and Jasp.

What we would like to do is develop a feature in Jasp that automatically imports existing R packages and creates clickable menus based on each package’s functions. For example, a user might want to use the randomForest package, but a module for randomForest hasn’t yet been developed. We envision the user being able to specify the name of the package (randomForest in this case), and Jasp would automatically read the R code, create the QML, and design a module, all without any human having to understand the code.

We have spent some time thinking this over. Our impression, at this point, is that this may not be possible without human supervision. Instead, we anticipate that the user importing the package will have to give Jasp some input along the way. Fortunately, once a package has been imported, it becomes part of the Jasp library, so nobody else has to import it.
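To give a sense of the automated part, here is a minimal sketch (illustrative only, and assuming randomForest is installed) of the introspection step: enumerate a package’s exported functions and their formal arguments, which are the raw material from which menus could be generated.

## illustrative sketch of the introspection step (not Jasp code)
pkg = "randomForest"
if (requireNamespace(pkg, quietly = TRUE)) {
  exports = getNamespaceExports(pkg)
  ## record each exported function's argument names and default values
  arg.list = lapply(exports, function(f) {
    obj = get(f, envir = asNamespace(pkg))
    if (is.function(obj)) formals(obj) else NULL
  })
  names(arg.list) = exports
  ## e.g., the arguments of randomForest(); S3 generics like this one would also need
  ## their methods resolved, which is one reason full automation is hard without human input
  print(arg.list[["randomForest"]])
}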

Human Factors component

As I mentioned, aside from the cool new features I envision (skins, graphics, AI), we are also working on the “human factors” side of things (a.k.a. “statistical cognition”). We have two projects in the works: (1) a paper about flexplot, in which we are testing eight heuristics we developed for plotting. Specifically, we are showing participants images that violate each heuristic and images that follow it. At the end of this project, we hope to have solid evidence that certain displays minimize bias in interpretation. (2) A paper that seeks to understand whether the mode of presentation (Bayesian, NHST, estimation, or graphical) affects people’s confidence, their judgments about replicability, and the accuracy of their interpretations. I think it would be very powerful to be able to advertise that not only is Jasp supported by a team of skilled statisticians, but the way in which results are presented is also scientifically validated.

One potential avenue for human factors input is how to elicit priors from people. Suppose, for example, you have a model with multiple predictors. Should Jasp elicit priors for every single parameter? That’s a lot of priors. Instead, maybe Jasp elicits priors for a model comparison (e.g., a full versus a reduced model). From what I can see, that’s what Jasp currently does. But then you have another problem: which models do you compare? I remember that when I first used Jasp, I struggled to understand exactly what the Bayes factor was comparing (I eventually figured it out). Perhaps that’s where human factors can come in: maybe we can run use cases and identify the best strategy for eliciting priors.
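For reference, here is a minimal sketch of the full-versus-reduced comparison I have in mind, using the BayesFactor R package and the exercise data from earlier (illustrative only, not Jasp’s internal code):

## Bayes factor for a full model relative to a reduced model (illustrative)
require(BayesFactor)
d = exercise_data
d$gender = factor(d$gender)              # categorical predictors must be factors
full.model    = lmBF(weight.loss~motivation + gender, data=d)
reduced.model = lmBF(weight.loss~motivation, data=d)
## how much more the data support the full model than the reduced model
full.model / reduced.model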

Misc.

I have two final comments (after an overly verbose summary). First, both you and Bruno mentioned that a large update is coming. Do you mind giving me a status update on that? We’re excited to start working on the QML portion.

Second, I have weekly meetings with my team here, and in those meetings their input is constantly shifting my vision of how things will work. Because of that, it would be very easy for me to diverge from any shared game plan I develop with you if I don’t communicate with you for several weeks. Granted, I could fork off of your main branch, develop what I want how I want, and hope for the best, but as I mentioned, I’d rather work in tandem with Jasp. I’d hate to spend a year working on our projects only to have them not match your vision (and thus not be integrated into the core of Jasp). With that being said, I’m considering either having regular Skype calls with you or sending weekly reports (like this one, but probably shorter, since only a week will have passed). Do you have a preference? It doesn’t matter to me either way.