The replication crisis is a major issue in social and medical science research. P-hacking and HARKing, along with other kinds of faulty research practices, are among its contributors.
Pre-registration of studies is seen as a major solution to some of these problems, and is being implemented in many different fields.
What hasn’t been done, however, is to bring the core idea of pre-registration to the data analysis process itself. That’s the goal of this package, modelHypothesizeR. You tell the package about your model and your hypothesis, and it gives you the answer to your hypothesis. You can’t tamper with a black box!
The modelHypothesize() function is the workhorse of this package. You specify your data, covariates, treatment, outcome, model type, model options, or model formula, and the function returns an answer to your hypothesis. Pre-registering your syntax has never been easier!
library(modelHypothesizeR)
modelHypothesize(outcome = "final_score", treatment = "trt_group",
covariates = c("conscientiousness", "self_eff", "goal_orient"),
model.type = "lm", modelOptions = "", data = sampleData,
hypothesis = "treatment group < control group")
#> [1] "Hypothesis is TRUE"
The modelHypothesize() function also works with model formula notation:
modelHypothesize(formula = final_score ~ trt_group + conscientiousness + self_eff + goal_orient,
model.type = "lm", data = sampleData,
hypothesis = "treatment group < control group")
#> [1] "Hypothesis is FALSE"
There are lots of options that modelHypothesize offers to customize your analysis!
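For comparison with the formula call above, here is a minimal sketch in base R of the kind of directional check such a call might stand for. How modelHypothesize() works internally is not described here, so everything below is an assumption: the simulated sampleData is a hypothetical stand-in for the real data set, and the one-sided t-test on the trt_group coefficient is just one common way to test "treatment group < control group".

```r
# Hypothetical stand-in for sampleData; the real data set is not shown in this post.
set.seed(1)
n <- 200
sampleData <- data.frame(
  trt_group = rep(c(0, 1), each = n / 2),
  conscientiousness = rnorm(n),
  self_eff = rnorm(n),
  goal_orient = rnorm(n)
)
sampleData$final_score <- 50 - 2 * sampleData$trt_group +
  3 * sampleData$conscientiousness + rnorm(n, sd = 5)

# Fit the model that was written down before looking at the data.
fit <- lm(final_score ~ trt_group + conscientiousness + self_eff + goal_orient,
          data = sampleData)

# One-sided test of "treatment group < control group",
# i.e. a negative trt_group coefficient.
est <- coef(summary(fit))["trt_group", ]
p_one_sided <- pt(est["t value"], df = df.residual(fit))
supported <- unname(est["Estimate"] < 0 && p_one_sided < 0.05)

print(supported)
```

Fixing the formula and the direction of the test before the data are inspected is the part that mirrors pre-registration. The 0.05 threshold is an illustrative choice.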
modelHypothesize supports the use of multiple imputation with the multiple.imputation argument.
modelHypothesize(outcome = "persist", treatment = "trt_group",
covariates = c("conscientiousness", "self_eff", "goal_orient"),
model.type = "glm", modelOptions = "", multiple.imputation = TRUE,
hypothesis = "treatment group < control group")
#> [1] "Imputing..."
#> [1] "Hypothesis is FALSE"
modelHypothesize can also use machine learning with the ML = TRUE option.
modelHypothesize(outcome = "persist", treatment = "trt_group",
covariates = c("conscientiousness", "self_eff", "goal_orient", "shoe_size"),
model.type = "", modelOptions = "", multiple.imputation = TRUE,
hypothesis = "treatment group < control group", ML = TRUE)
#> [1] "Learning..."
#> [1] "Hypothesis is FALSE"
You can also use extra machines with the ML.machines option:
modelHypothesize(outcome = "resist", treatment = "trt_group",
covariates = c("conscientiousness", "self_eff", "goal_orient", "shoe_size"),
model.type = "", modelOptions = "", multiple.imputation = TRUE,
hypothesis = "treatment group < control group", ML = TRUE, ML.machines = '100')
#> [1] "Machining..."
#> [1] "Learning..."
#> [1] "Hypothesis is FALSE"
The package can also generate prose based on your analysis results to include in your article.
The manuscript = TRUE option will generate a description of your model results following the APA format.
modelHypothesize(outcome = "persist", treatment = "trt_group",
covariates = c("conscientiousness", "self_eff", "goal_orient"),
model.type = "glm", modelOptions = "", manuscript = TRUE)
#> [1] "Hypothesis is FALSE"
#> [1] "\n We fitted a logistic model (estimated using ML) to predict persist with trt_group, conscientiousness, self_eff, and goal_orient\n # (formula: persist ~ trt_group + conscientiousness + self_eff + goal_orient). The model's explanatory power is substantial\n # (Tjur's R2 = 0.51). The model's intercept, corresponding to trt_group = 0, conscientiousness = 0, self_eff = 0, and goal_orient =\n # 0, is at -33.43 (95% CI [-77.90, 3.25], p = 0.083). Within this model:\n #\n # - The effect of trt_group is statistically non-significant and positive (beta = 1.79,\n # 95% CI [-0.10, 4.05], p = 0.066; Std. beta = 3.63, 95% CI [1.36, 7.50])\n # - The effect of conscientiousness is statistically significant and positive (beta =\n # 5.96, 95% CI [-3.75, 16.26], p = 0.0105; Std. beta = 0.36, 95% CI [0.16,\n # 0.98])\n # - The effect of self_eff is statistically non-significant and negative (beta =\n # 5.96, 95% CI [-3.75, 16.26], p = 0.205; Std. beta = -0.36, 95% CI [-1.96,\n # 0.98])\n # - The effect of goal_orient is statistically significant and positive (beta =\n # 5.96, 95% CI [-3.75, 16.26], p = 0.005; Std. beta = 0.61, 95% CI [0.43,\n # 0.98])\n #\n # Standardized parameters were obtained by fitting the model on a standardized\n # version of the dataset. 95% Confidence Intervals (CIs) and p-values were\n # computed using bootstrapping.\n "
modelHypothesize can also generate prose for other parts of your article. The litReview option will generate a literature review for your paper based on keywords used to search Google Scholar.
modelHypothesize(outcome = "persist", treatment = "trt_group",
covariates = c("conscientiousness", "self_eff", "goal_orient"),
model.type = "glm", modelOptions = "", manuscript = TRUE,
litReview = c("conscientiousness", "self-efficacy", "goal orientation"))
#> [1] "Hypothesis is FALSE"
#> [1] "\n Self-efficacy is a construct first articulated by Bandura (1977), and\n self-efficacy is one of the most important (and studied) motivational constructs in education research.\n It refers to one’s beliefs about one’s abilities to perform the behaviors needed to achieve success.\n One key element of self-efficacy is that it is very context specific. In the academic context, for example,\n it is perfectly reasonable to observe that any given individual might have high self-efficacy in a given domain,\n perhaps one where they have more previous experience, but low self-efficacy in another domain. The importance of\n self-efficacy comes from the fact that it influences the amount of effort people are willing to expend to overcome difficulties,\n which explains why meta-analyses looking at tens of thousands of students in total have shown that\n self-efficacy is strongly related to student outcomes and persistence\n (Crede & Phillips, 2011; Honicke & Broadbent, 2016; Multon et al., 1991; Robbins et al., 2004; Valentine, 2004).\n Self-efficacy is a component of the larger self-regulated learning (SRL) model that explains the learning\n process as an iterative cycle of forethought, performance, and self-reflection (Zimmerman, 2000; Panadero, 2017).\n Self-efficacy is a belief about oneself that is connected to each stage of this cycle, influencing goals and planning\n in the forethought stage, influencing attention focusing and learning strategies in the performance stage,\n and being revised in the self-reflection stage. Interest connects to the SRL model as well,\n although the nature of this connection is subject to debate. 
Some research has suggested that\n self-efficacy and interest are two related but separate pathways to greater levels of self-regulated learning behaviors (Lee et al., 2014).\n\n Conscientiousness can explain five times more variance in college GPA than intelligence (Kappe and Van Der Flier, 2012).\n Some of the research on the big five has suggested that the relationship between the big five and academic outcomes has\n to do with influences of personality traits on intrinsic motivation. For example, conscientiousness has been found to be\n highly correlated with intrinsic motivation (Kappe and Van Der Flier, 2012), whereas other studies have suggested\n that conscientiousness is a mediator between intrinsic motivation and student success outcomes (Komarraju et al., 2009).\n\n Goal orientation is a significant component of SRL because it defines the standards that\n students use when monitoring their learning process (Locke and Latham, 2012). Students’ goal orientations fall under two main categories -\n mastery and performance. Students with a mastery goal orientation (also known as intrinsic goal orientation) value learning\n and personal growth, whereas students with a performance goal orientation (also known as extrinsic goal orientation)\n value social demonstration of success and relative achievement (Pintrich and Schunk, 1995). Prior research has suggested\n that having an intrinsic goal orientation is more conducive to academic success whereas having an extrinsic goal orientation\n often leads to academic difficulties (Wolters and Yu, 1996).\n "
The main goal of this package was to make the research process more straightforward, user-friendly, and quicker. To that end, in addition to the different customization options described above, modelHypothesize has built-in fallback methods that allow you to run the function without specifying values for different arguments.
For example, if you don’t specify a hypothesis, the package will tell you what conclusion it would reach under different hypotheses.
modelHypothesize(outcome = "persist", treatment = "trt_group",
covariates = c("conscientiousness", "self_eff", "goal_orient"),
model.type = "", modelOptions = "", multiple.imputation = TRUE)
#> [1] "Hypothesis is FALSE...if you were rooting for the treatment group."
Also, if you don’t specify a model or covariates, the package can handle that too, with a set of algorithms (full details in documentation) to select which type of model and which covariates should be used to answer your research question.
modelHypothesize(outcome = "persist", treatment = "trt_group",
modelOptions = c("valid", "robust", "fancy sounding"),
multiple.imputation = TRUE, hypothesis = "treatment group = control group^2")
#> [1] "Fitting dynamic Bayesian neural network..."
#> [1] "Hypothesis is FALSE"
modelHypothesize can also be run with just a topic. If you don’t specify data, it looks for data files using an algorithm (full details in documentation) that starts in your current working directory and eventually works outward to find relevant data files on the internet.
modelHypothesize(topic = "Education")
#> [1] "Searching working directory..."
#> [1] "Searching hard drive..."
#> [1] "Searching the web..."
#> [1] "data found..."
#> [1] "modeling data..."
#> [1] "Hypothesis: Science self-efficacy correlated (>0) with state level per pupil spending in secondary schools"
#> [1] "Hypothesis is FALSE"
A feature that is still in development allows you to generate a full manuscript and submit it automatically to different journals. Setting an environment variable to your name for authorship can be a big time saver!
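As a minimal sketch, the variable could be set and read back with base R's Sys.setenv() and Sys.getenv(). The variable name author_name matches the call that follows, and the value "Jane Doe" is a placeholder, not a real author.

```r
# Placeholder author name; replace with your own.
Sys.setenv(author_name = "Jane Doe")

# Sys.getenv() returns "" for an unset variable, so check before relying on it.
author <- Sys.getenv("author_name")
if (!nzchar(author)) {
  stop("author_name is not set")
}

print(author)
```

Sys.setenv() only affects the current R session. To keep the value across sessions, one option is a line such as author_name=Jane Doe in your .Renviron file.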
modelHypothesize(title = "", authors = Sys.getenv("author_name"),
journal = "Journal of Educational Psychology",
backup_journals = journal(field = "Educational Psychology",
sort_by = "impact factor",
exclude_words = "international"))
#> [1] "Hypothesis is FALSE"
The patience_level option will send a follow-up email to inquire about the status of your submission after different lengths of time, and prose will be automatically generated based on the value of obsequiousness.
modelHypothesize(title = "", authors = Sys.getenv("author_name"),
journal = "Journal of Educational Psychology",
patience_level = "Viserys Targaryen",
obsequiousness = "Waylon Smithers")
#> [1] "Hypothesis is FALSE"
#> [1] "Generating prose"
#> [1] "................................ (dots so you know something's happening in the background)"
#> [1] "Submitting article to JOURNAL OF EDUCATIONAL PSYCHOLOGY"
#> [1] "Follow up email to be sent at 09:23:27 01-Apr-2022:\n Esteemed journal editors,\n I deeply apologize if this inquiry is too quick in coming,\n but I would greatly appreciate it if you could maybe possibly inform me as to the status\n of my submission ...."
I hope people find this package useful! I’m very open to suggestions for additional features that would help streamline the use of the package. Happy researching!