Introduction

Corpus Building

WWC Corpus

  • We extracted information on both math studies and reading studies from the What Works Clearinghouse (WWC). To download the studies for both of our WWC corpuses, we left every filter open: all ratings, all topics, all protocols, all interventions, all ESSA ratings, all standard versions, and all outcome domains. Once we had downloaded the information for every paper, we removed any paper that did not meet WWC standards without reservations, so that our WWC corpuses consist only of papers that meet WWC standards without reservations. We then identified math studies by selecting all rows where the binary column Topic_Mathematics was equal to 1, and reading studies by selecting all rows where the binary column Topic_Literacy was equal to 1. Finally, we kept only the papers that included an ERIC ID, and we used ERIC's API to collect the title, abstract, and other necessary information on the papers in both corpuses.
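The filtering steps above can be sketched roughly as follows. This is a minimal illustration on toy data, not the code used for the report: only Topic_Mathematics and Topic_Literacy are column names taken from the text, while the Rating and ERIC_ID column names, the rating label, and the example rows are all assumptions.

```python
import csv
import io

# Toy stand-in for the WWC export. Only Topic_Mathematics and Topic_Literacy
# are column names from the text; Rating, ERIC_ID, and every row are hypothetical.
RAW = """Citation,Rating,Topic_Mathematics,Topic_Literacy,ERIC_ID
Study A,Meets WWC standards without reservations,1,0,ED000001
Study B,Meets WWC standards with reservations,1,0,ED000002
Study C,Meets WWC standards without reservations,0,1,
Study D,Meets WWC standards without reservations,0,1,EJ000004
Study E,Does not meet WWC standards,1,1,ED000005
"""

KEEP_RATING = "Meets WWC standards without reservations"  # assumed label


def build_corpus(rows, topic_column):
    """Keep top-rated studies on one topic that also carry an ERIC ID."""
    return [
        row for row in rows
        if row["Rating"] == KEEP_RATING
        and row[topic_column] == "1"
        and row["ERIC_ID"].strip()
    ]


rows = list(csv.DictReader(io.StringIO(RAW)))
math_corpus = build_corpus(rows, "Topic_Mathematics")
reading_corpus = build_corpus(rows, "Topic_Literacy")

print([row["ERIC_ID"] for row in math_corpus])
print([row["ERIC_ID"] for row in reading_corpus])
```

The three conditions are applied together here, so a study drops out if it fails any one of them, such as Study C, which has the right rating and topic but no ERIC ID.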

Non-WWC Corpus

  • We then determined the distribution of "intervention names" in our WWC corpuses, and we manually replicated this distribution in the non-WWC corpuses. To replicate this distribution for math papers, we used the search tag: 'title:"intervention_name" AND subject:"mathematics" AND ieswwcreviewed:null', which means that the intervention name must be in the title, the subject of the paper must be mathematics, and the paper must not have been reviewed by WWC. To replicate this distribution for reading papers, we used the search tag: 'title:"intervention_name" AND subject:"reading" AND ieswwcreviewed:null'. If multiple results were returned, we chose the ones with the most recent publication dates. To obtain the same number of papers in both corpuses, we collected information on the remaining number of math papers with the search tag: 'description:"mathematics intervention" AND ieswwcreviewed:null'. Similarly, we collected information on the remaining number of reading papers with the search tag: 'description:"reading intervention" AND ieswwcreviewed:null'. We then kept the unique papers that were published most recently and removed any papers that had already been selected while replicating the distributions. The titles, abstracts, and other information collected on these papers make up our final corpuses.
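As a rough sketch, the search tags quoted above could be generated from a template, and the "most recent, unique" selection could be done with a small helper. The function names, the record fields "id" and "year", and the example intervention name are all hypothetical; no request is sent to ERIC here.

```python
def intervention_query(intervention_name, subject):
    """Search tag for unreviewed papers whose title names an intervention."""
    return (
        f'title:"{intervention_name}" AND subject:"{subject}" '
        "AND ieswwcreviewed:null"
    )


def filler_query(subject):
    """Search tag used to top up a corpus to the target number of papers."""
    return f'description:"{subject} intervention" AND ieswwcreviewed:null'


def most_recent(records, n):
    """Keep the n most recently published records, one per ERIC ID."""
    unique = {rec["id"]: rec for rec in records}  # drops repeated IDs
    ranked = sorted(unique.values(), key=lambda rec: rec["year"], reverse=True)
    return ranked[:n]


# Hypothetical search hits; the field names "id" and "year" are assumptions.
hits = [
    {"id": "ED1", "year": 2015},
    {"id": "ED2", "year": 2019},
    {"id": "ED1", "year": 2015},
    {"id": "ED3", "year": 2017},
]

print(intervention_query("Example Program", "mathematics"))
print(filler_query("reading"))
print([rec["id"] for rec in most_recent(hits, 2)])
```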

Breakdown of Papers in Both Corpuses


Model Building

Latent Dirichlet Allocation

  • Latent Dirichlet Allocation (LDA) is a type of model that takes in a collection of papers and helps us understand what underlying topics exist within that collection.
  • In other words, we are trying to create fuzzy clusters of papers based on the words that appear in those papers.
  • The “topics” are sets of words, each paired with the likelihood that the word appears within that particular topic.
  • It is up to us to look at the list of words LDA uses to describe a topic and decide the meaning of the topic from there.
  • It’s also important to point out that we are creating “fuzzy clusters”: a specific word doesn’t have to be in one cluster or another. Instead, it is the likelihood of each word showing up in a cluster that defines the cluster.
  • Ex: Later in our discussion the word “school” appears in both topics, but its probability of appearing in topic one is different from its probability of appearing in topic two.
  • We are trying to find out if there are latent, or hidden, structural differences between the papers in ERIC and the papers approved by the WWC. If there are, we would expect papers from each group to fall naturally into two different topics.
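To make the idea of fuzzy clusters concrete, here is a minimal sketch of a two-topic LDA fit on a toy corpus. The report does not say which inference algorithm was used, so collapsed Gibbs sampling is a swapped-in choice here, and the function name, the hyperparameters (alpha, eta, n_iter), and the four toy documents are all assumptions.

```python
import random


def fit_lda(docs, n_topics=2, alpha=0.1, eta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents.

    Returns (vocab, beta, gamma): beta[k][w] is the probability of word w
    in topic k, gamma[d][k] is the share of document d allocated to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # counts per document
    topic_word = [[0] * n_words for _ in range(n_topics)]  # counts per topic
    topic_total = [0] * n_topics
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], assignments[d][i]
                # Remove the token, then resample its topic given all others.
                doc_topic[d][k] -= 1
                topic_word[k][wi] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wi] + eta)
                    / (topic_total[t] + eta * n_words)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wi] += 1
                topic_total[k] += 1

    # Smoothed estimates: each row below sums to one.
    beta = [
        [(topic_word[k][w] + eta) / (topic_total[k] + eta * n_words)
         for w in range(n_words)]
        for k in range(n_topics)
    ]
    gamma = [
        [(doc_topic[d][k] + alpha) / (len(doc) + alpha * n_topics)
         for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    return vocab, beta, gamma


# Hypothetical toy corpus: already tokenized and cleaned.
docs = [
    "teacher program school student teacher program".split(),
    "school teacher student program school".split(),
    "treatment control effect student treatment control".split(),
    "effect control treatment student effect".split(),
]
vocab, beta, gamma = fit_lda(docs)

student = vocab.index("student")
print("student in topic one vs two:", beta[0][student], beta[1][student])
for d, shares in enumerate(gamma):
    print("document", d, "topic shares:", [round(s, 2) for s in shares])
```

Each word gets a probability under every topic, and each document gets a share of every topic, which is what makes the clusters fuzzy rather than hard assignments.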

Structural Changes and Data Cleaning

  • Tokenizing: breaking the sentences in the titles and abstracts down into individual words
  • Removing stopwords: taking out frequently used English words like “the” or “and”
  • Getting frequency counts during exploratory data analysis
  • Lemmatization: reducing different forms of the same word to a single base form, so that, for example, “ran” and “run” are both condensed down to just “run”
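The cleaning steps listed above can be illustrated with a small standard-library sketch. The stopword set and the lemma lookup table below are tiny hypothetical stand-ins for a full stopword list and a real lemmatizer, and lowercasing the text is an added assumption.

```python
import re
from collections import Counter

# Tiny illustrative tables; a real pipeline would use a full stopword list
# and a proper lemmatizer. Both are hypothetical.
STOPWORDS = {"the", "and", "of", "a", "on", "in", "with"}
LEMMAS = {
    "ran": "run",
    "randomized": "randomize",
    "games": "game",
    "effects": "effect",
    "learning": "learn",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def clean(text):
    """Tokenize, drop stopwords, and map each remaining word to its lemma."""
    tokens = [t for t in tokenize(text) if t not in STOPWORDS]
    return [LEMMAS.get(t, t) for t in tokens]


title = "The Effects of Math Video Games on Learning: A Randomized Evaluation Study"
tokens = clean(title)
counts = Counter(tokens)  # frequency counts for exploratory analysis

print(tokens)
print(counts.most_common(3))
```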

  • The three boxes below show the results of data cleaning. The first box (the title) and the second box (the abstract) show the original form of the data for one paper in our corpuses. After lemmatizing, combining the title and abstract, and removing punctuation and quotation marks, we are left with the third box, which is what we eventually feed into our model. By doing this, words like “randomize” and “randomized” are combined into just "randomize", so that words that are essentially the same are correctly treated as the same.

## [1] "The Effects of Math Video Games on Learning: A Randomized Evaluation Study with Innovative Impact Estimation Techniques. CRESST Report 841"
## [1] "A large-scale randomized controlled trial tested the effects of researcher-developed learning games on a transfer measure of fractions knowledge. The measure contained items similar to standardized assessments. Thirty treatment and 29 control classrooms (~1500 students, 9 districts, 26 schools) participated in the study. Students in treatment classrooms played fractions games and students in the control classrooms played solving equations games. Multilevel multidimensional item response theory modeling of the outcome measure produced scaled scores that were more sensitive to the instructional treatment than standard measurement approaches. Hierarchical linear modeling of the scaled scores showed that the treatment condition performed significantly higher on the outcome measure than the control condition. The effect (d = 0.58) was medium to large (Cohen, 1992). Two appendices are included: (1) Descriptive Statistics of Pretest and Posttest Scores by Schools and Conditions; and (2) Summary of Efficacy Trial Procedures."
## [1] "The effect of Math Video game on learn A randomize Evaluation Study with Innovative Impact Estimation technique CRESST Report 841 A large scale randomize control trial test the effect of researcher develop learn game on a transfer measure of fraction knowledge The measure contain item similar to standardize assessment Thirty treatment and 29 control classroom ~1500 student 9 district 26 school participate in the study student in treatment classroom play fraction game and student in the control classroom play solve equation game Multilevel multidimensional item response theory model of the outcome measure produce scale score that be much sensitive to the instructional treatment than standard measurement approach Hierarchical linear model of the scale score show that the treatment condition perform significantly high on the outcome measure than the control condition The effect have = 0 58 be medium to large Cohen 1992 Two appendix be include 1 Descriptive statistic of Pretest and Posttest score by school and condition and 2 Summary of Efficacy Trial procedure ED555700"

Model with Math Papers

Finding 1

  • Table 1 shows the probability of a given word appearing in topic one or in topic two. A word’s probabilities do not need to add up to one across topics; a higher "beta" value signifies that the word appears more often in that topic. For example, "intervention" shows up more in topic two than in topic one, but only slightly.

Table 1

Finding 2

  • We now view the top 15 words for each topic based on their beta values, with topic one in red and topic two in blue (table 3).

  • The beta value represents the “probability that a word is described by a topic”. As a result, you can think of it as a measure of each word's importance within a topic.

  • As you can see, many of these words appear in both lists: student, mathematics, and intervention are all commonly found in both topics.

  • On the other hand, topic one includes unique words like teacher, program, and school. Important words unique to topic two include treatment, control, and effect.
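A short sketch of how the top words per topic, and the words the two lists share, could be pulled out of a table of beta values. All beta values below are hypothetical and chosen only to mirror the pattern described above; they are not the values from our model.

```python
def top_words(beta, n):
    """Return the n highest-beta (word, probability) pairs for one topic."""
    return sorted(beta.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical beta values. A topic's full distribution sums to one over the
# whole vocabulary, so these partial lists do not.
topic_one = {"student": 0.040, "mathematics": 0.030, "teacher": 0.020,
             "program": 0.015, "intervention": 0.012}
topic_two = {"student": 0.045, "mathematics": 0.025, "treatment": 0.022,
             "control": 0.018, "intervention": 0.013}

top_one = [word for word, _ in top_words(topic_one, 3)]
top_two = [word for word, _ in top_words(topic_two, 3)]

shared = sorted(set(topic_one) & set(topic_two))    # words in both lists
only_one = sorted(set(topic_one) - set(topic_two))  # unique to topic one
only_two = sorted(set(topic_two) - set(topic_one))  # unique to topic two

print(top_one, top_two)
print(shared, only_one, only_two)
```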

Table 2

Table 3

Finding 3

  • One additional way we can try to understand these papers is by looking at distinctive words within a topic. Unlike the last example, which allowed the same word to show up in both topics, this example highlights high-impact words that correspond to one topic or the other.

  • We calculate these values using something called the log-ratio score. The log-ratio score starts from the ratio of the probability of a word being in topic two to the probability of the word being in topic one. We then take the logarithm (represented as log(x)) of the ratio so the graphical depiction (table 5) is centered at 0: a word that is equally likely in both topics scores 0, words that favor topic one score below 0, and words that favor topic two score above 0. In this way, topic one (permission, revision, citation) is put on the left-hand side and topic two (cohort, recovery, outperform) is put on the right-hand side of the graph.

  • As shown in tables 4 and 5, examples of distinctive words include parent, instructor, and mentor in topic one and schema, numerical, and conceptual in topic two. Because these words are distinctive to one topic, none of them appears on both sides.
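As a worked illustration of the log-ratio score, the sketch below computes it for three words. The beta values are hypothetical, and base 2 is an assumption, since the text writes the logarithm as log(x) without naming a base; any base keeps the score centered at 0.

```python
import math


def log_ratio(beta_one, beta_two):
    """log2(beta_two / beta_one): below 0 favors topic one, above 0 topic two."""
    return math.log2(beta_two / beta_one)


# Hypothetical (topic one, topic two) beta values for three words.
betas = {
    "parent": (0.004, 0.001),
    "schema": (0.001, 0.004),
    "student": (0.040, 0.040),
}
scores = {word: log_ratio(b1, b2) for word, (b1, b2) in betas.items()}

# Most distinctive words sit farthest from zero, on either side.
for word, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{word:8s} {score:+.2f}")
```

A word four times as likely in topic one scores about -2 and a word four times as likely in topic two scores about +2, so the two sides of the plot are directly comparable.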

Table 4

Table 5

Table 6

Finding 4

  • Table 7 shows that for papers in our non-WWC corpus, the mean proportion of allocation to topic one is 0.493, while the mean proportion of allocation to topic two is 0.507. For papers in our WWC corpus, the mean proportion of allocation to topic one is 0.269, and the mean proportion of allocation to topic two is 0.731. This means that, on average, a paper in our non-WWC corpus has approximately a 50% probability of being allocated to topic one and a 50% probability of being allocated to topic two, while a paper in our WWC corpus has approximately a 27% probability of being allocated to topic one and a 73% probability of being allocated to topic two. With this model, we find a relationship between topic and corpus, which suggests there may be structural differences between the papers in our WWC corpus and the papers in our non-WWC corpus.
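The mean proportions of allocation can be computed by grouping each paper's topic allocation by corpus, as in this minimal sketch. The four per-paper values and the field names are hypothetical; with two topics, the allocation to topic two is one minus the allocation to topic one.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-paper allocations to topic one.
papers = [
    {"corpus": "WWC", "topic_one": 0.10},
    {"corpus": "WWC", "topic_one": 0.30},
    {"corpus": "non-WWC", "topic_one": 0.40},
    {"corpus": "non-WWC", "topic_one": 0.60},
]


def mean_allocation(papers):
    """Mean allocation to (topic one, topic two) for each corpus."""
    grouped = defaultdict(list)
    for paper in papers:
        grouped[paper["corpus"]].append(paper["topic_one"])
    return {
        corpus: (mean(values), 1 - mean(values))
        for corpus, values in grouped.items()
    }


means = mean_allocation(papers)
for corpus, (one, two) in means.items():
    print(f"{corpus}: topic one {one:.3f}, topic two {two:.3f}")
```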

Table 7

Table 8

Finding 5

  • Histogram 1 shows the distribution of proportions of allocation to topic one (the red bars) and to topic two (the blue bars) for the papers in our WWC corpus. The upper end of the x-axis shows many high probabilities of topic two allocation and few high probabilities of topic one allocation. The lower end of the x-axis shows many low probabilities of topic one allocation and few low probabilities of topic two allocation. Since topic one and topic two probabilities do not have approximately the same counts in the bins at the upper and lower ends, the papers in our WWC corpus have a higher probability of topic two allocation and a lower probability of topic one allocation.

  • Histogram 2 shows the distribution of proportions of allocation to topic one (the red bars) and to topic two (the blue bars) for the papers in our non-WWC corpus. The upper end of the x-axis shows about as many high probabilities of topic one allocation as high probabilities of topic two allocation. The lower end of the x-axis shows about as many low probabilities of topic two allocation as low probabilities of topic one allocation. Since topic one and topic two probabilities have approximately the same count in all bins, the papers in our non-WWC corpus have about the same probability of topic one allocation as of topic two allocation.

  • These histograms lead to the conclusion that there seem to be structural differences between the papers in our WWC corpus and the papers in our non-WWC corpus. This distribution of probabilities makes sense because the non-WWC papers were not rejected by WWC; they simply were not reviewed by WWC. It is therefore possible that some of these non-WWC papers share structural similarities with the WWC papers that were reviewed and approved, and this subset of non-WWC papers would be the ones allocated to topic two along with most of the WWC papers. Next, we would like to see if this relationship between corpus and topic is also present in our two corpuses of reading papers.
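The bin counts behind histograms like these can be sketched as follows. The five allocation values are hypothetical, and the choice of ten equal-width bins on the interval from 0 to 1 is an assumption rather than the binning used in the figures.

```python
def bin_counts(values, n_bins=10):
    """Count values in n_bins equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for value in values:
        idx = min(int(value * n_bins), n_bins - 1)  # 1.0 goes in the last bin
        counts[idx] += 1
    return counts


# Hypothetical topic two allocations for five papers in one corpus.
topic_two = [0.95, 0.91, 0.85, 0.55, 0.12]
topic_one = [1 - share for share in topic_two]  # two topics sum to one

counts_two = bin_counts(topic_two)
counts_one = bin_counts(topic_one)

# Compare how many papers land in the top two bins for each topic.
print("topic two, upper end:", sum(counts_two[-2:]))
print("topic one, upper end:", sum(counts_one[-2:]))
```

Comparing the counts in the highest and lowest bins for the two topics is the numeric version of reading the upper and lower ends of the x-axis.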

Histogram 1

Histogram 2

Model with Reading Papers

Finding 1

  • The beta value for a word (below) corresponds to the connection of that word to either topic one or topic two. In this case, the word “child” corresponds to both topics, slightly favoring topic two. It is important to note that this value is not just a frequency estimate; the words are also weighted based on their perceived importance within each topic.

Table 1

Finding 2

  • Now we will view the 15 most important words within topic one and topic two for the reading corpus.

  • As we can see here, the "important" words in topic one (based on the beta probabilities) include read, student, intervention, program, strategy, write, and learn. In topic two, we again see read, student, and intervention, but words such as treatment, effect, and study also appear, setting it apart from topic one.

  • This is similar to the math corpus, which also had words such as treatment, effect, and study in the top 15. WWC papers align more strongly with topic two in both the math and reading corpuses.

Table 2

Table 3

Finding 3

  • We again view the most distinctive words within each topic (tables 4 and 5) for the reading corpus.

  • As a reminder, topic one (permission, revision, citation) is on the left hand side and topic two (cohort, recovery, outperform) is on the right hand side of the graph.

  • It can be a little hard to tease out an explanation for why these specific words are showing up; this is where qualitative expertise is important for developing a deeper understanding of why these words appear over others within each topic.

Table 4

Table 5

Table 6

Finding 4

  • Table 7 shows that for papers in our non-WWC corpus, the mean proportion of allocation to topic one is 0.555, while the mean proportion of allocation to topic two is 0.445. For papers in our WWC corpus, the mean proportion of allocation to topic one is 0.416, and the mean proportion of allocation to topic two is 0.584. This means that, on average, a paper in our non-WWC corpus has approximately a 55% probability of being allocated to topic one and a 45% probability of being allocated to topic two, while a paper in our WWC corpus has approximately a 42% probability of being allocated to topic one and a 58% probability of being allocated to topic two. There seems to be a weak relationship between corpus and topic: non-WWC papers have a small difference in mean probabilities and tend to be allocated to topic one somewhat more often, while WWC papers have a slightly greater difference in mean probabilities and tend to be allocated to topic two somewhat more often.

Table 7

Table 8

Finding 5

  • Histogram 1 shows the distribution of proportions of allocation to topic one (the red bars) and to topic two (the blue bars) for the papers in our WWC corpus. The upper end of the x-axis shows somewhat more high probabilities of topic two allocation than high probabilities of topic one allocation. The lower end of the x-axis shows somewhat more low probabilities of topic one allocation than low probabilities of topic two allocation. The difference between the counts of reading papers with high probabilities of topic one allocation and those with high probabilities of topic two allocation is not that great in comparison to the same difference for math papers.

  • Histogram 2 shows the distribution of proportions of allocation to topic one (the red bars) and to topic two (the blue bars) for the papers in our non-WWC corpus. The upper end of the x-axis shows somewhat more high probabilities of topic one allocation than high probabilities of topic two allocation. The lower end of the x-axis shows somewhat more low probabilities of topic two allocation than low probabilities of topic one allocation. The difference between the counts of reading papers with high probabilities of topic one allocation and those with high probabilities of topic two allocation is about the same as the corresponding difference for math papers.

  • These histograms lead to the conclusion that there do not seem to be great structural differences between reading papers in our WWC corpus and reading papers in our non-WWC corpus, though there are some small differences. We think that math papers could be intrinsically different from reading papers: math papers that are approved by WWC are structurally different from math papers that are not reviewed by WWC, whereas we did not see such strong structural differences in reading papers. We do not know what could cause these differing relationships in math papers and reading papers, but we hope to start a conversation in the working group about the possible causes and effects of these relationships.

Histogram 1

Histogram 2