Analysis

This analysis will be conducted on trimester reports for XIS’s Middle Years Program (MYP) students. The MYP runs from grade six through grade ten, and each student takes eight classes.

Set Up and EDA

First, I’ll load the extra packages I used for my analysis.

#load required libraries, data, and created functions
library(tm)
library(dplyr)
library(tidyr)
library(RWeka)
library(DT)
library(knitr)


#source custom-built functions
source("Functions.R")

Now I’ll load the csv file of term grades I exported from ManageBac.

t1.report <- GetReportsDFfromMBcsv("data/t1 comments.csv")

#find dimensions (rows & columns) of the table
dim(t1.report)

## [1] 1066   14

#get column names
colnames(t1.report)

##  [1] "Student.ID"      "Last.Name"       "First.Name"     
##  [4] "Class.ID"        "Grade.Level"     "Subject"        
##  [7] "Teacher"         "Cri.A"           "Cri.B"          
## [10] "Cri.C"           "Cri.D"           "Sum"            
## [13] "CriMean"         "Student.Comment"

So this table has 14 columns and 1066 rows, where each row is one student’s report for one class. To give a better sense of what the data looks like, I’ve randomly selected 100 class reports with identifying information removed (i.e. name, student ID, and comment).

#get an example table
t1.report.ex <- t1.report %>%
        #mask student IDs, round CriMean
        mutate(Student.ID = "1000****", CriMean = round(CriMean, digits = 3)) %>%
        #select only what I want
        select(Student.ID, Grade.Level:Cri.D, CriMean, -First.Name, -Last.Name) %>%
        #take 100 rows at random
        sample_n(100)

At XIS it isn’t required that all four criteria be assessed each trimester, so the mean criteria score was calculated from whichever criteria were assessed.
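As a rough illustration of averaging only the criteria that were assessed, here is a minimal base R sketch. The scores are made up, and I am assuming that an unassessed criterion shows up as `NA`; how `GetReportsDFfromMBcsv()` actually computes `CriMean` is not shown in this write-up.

```r
# Hypothetical criteria levels; NA marks a criterion not assessed this trimester
scores <- data.frame(
  Cri.A = c(5, 6, NA),
  Cri.B = c(4, NA, NA),
  Cri.C = c(6, 7, 3),
  Cri.D = c(NA, 5, 4)
)

# Average only the criteria that were assessed (NAs are dropped row by row)
criteria <- c("Cri.A", "Cri.B", "Cri.C", "Cri.D")
scores$CriMean <- rowMeans(scores[, criteria], na.rm = TRUE)

scores$CriMean
```

A student assessed on three criteria is averaged over three, and a student assessed on two is averaged over two, so the means stay on the same scale.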

Measuring Improvement

Before we get to measuring how much a student improves from trimester one to two, let’s start with a definition and then an example.

Definition

Improvement – the class-centered increase of mean criteria levels from trimester one to trimester two.

“Centered” in this case signifies that the mean improvement for each class has been subtracted from the improvement each student achieved in that class.

Example Methodology

Let’s use an example school with two classes of four students each taught by Ms. Blue and Ms. Green.

Generally, Improvement in this instance is calculated as: \[ {\text{Improvement}_{\text{T1-T2}}} = {\text{CriMean}}_{\text{T2}}-{\text{CriMean}}_{\text{T1}} \]

We assume that if a student’s levels went up from trimester one to trimester two then they improved in that class. However in the above example, who improved more: Denise or Emily? Due to large variation in criteria levels between teachers and classes, this is something that needs addressing¹.

To account for this difference, I decided to center improvement to control for the effect of the teacher². The improvement metric is modified by subtracting the mean improvement of each class. That is to say, the mean improvement of each individual class is computed, and then each individual’s improvement is shifted by that class average. Accordingly, the mean improvement of Ms. Green’s and Ms. Blue’s classes is:

by_teacher <- norm.ex %>%
        group_by(Teacher) %>%
        summarize(Improve.m = mean(Improvement))

Teacher	Improve.m
Blue	1.625
Green	-0.625

The centering adjustment is: \[ {\text{CenteredImprovement}}_{\text{student}} = {\text{Improvement}}_{\text{student}} - {\text{Improvement}}_{\text{class mean}} \] Applying this formula gives the following column with our new improvement metric. It’s worth noting that a negative centered improvement score does not necessarily mean that the student’s performance decreased, only that they improved less than the class average.

norm.ex.by_teacher <- left_join(norm.ex, by_teacher, by = "Teacher") %>%
        #center each student's improvement on the mean improvement of their class
        mutate(Improve.centered = (Improvement - Improve.m)) %>%
        #round every numeric column to three decimals
        mutate(across(-c(Student, Teacher), ~ round(.x, 3))) %>%
        select(Student, Teacher, Improvement:Improve.centered)
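Since the example table itself is not reproduced here, the following base R sketch shows the same centering step on invented numbers. The students `S1` to `S8` and their improvement scores are hypothetical and are not the values behind `norm.ex`; only the teacher labels are borrowed from the example.

```r
# Made-up improvement scores for two hypothetical classes (not the author's data)
ex <- data.frame(
  Student     = paste0("S", 1:8),
  Teacher     = rep(c("Blue", "Green"), each = 4),
  Improvement = c(2, 1, 3, 2, 0, -1, 1, -2)
)

# ave() returns each row's class mean, so subtracting it centers within class
ex$Improve.m        <- ave(ex$Improvement, ex$Teacher, FUN = mean)
ex$Improve.centered <- ex$Improvement - ex$Improve.m

ex
```

In this toy data, S1 (raw improvement 2 in a class that averaged 2) ends up with a centered score of 0, while S7 (raw improvement 1 in a class that averaged -0.5) ends up at 1.5. That reordering is exactly what centering is meant to capture.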

Improvement at XIS

Now to the XIS data. Below is a table showing the average criteria levels for each subject in the MYP for both trimester one and two, as well as the difference³ between them.

#combine all three trimesters of data together
year.report <- GetYearReport()

#wrappers for mean and sd with na.rm = TRUE
av <- function(x) {mean(x, na.rm = TRUE)}
s <- function(x) {sd(x, na.rm = TRUE)}

by_subject <- year.report %>%
        group_by(Subject) %>%
        summarize(CriMean.t1 = av(CriMean.t1), CriMean.t2 = av(CriMean.t2),
                  Imp.m = av(t12.growth), Imp.sd = s(t12.growth)) %>%
        mutate(Improvement.centered = Imp.m - mean(Imp.m)) %>%
        mutate(across(-Subject, ~ round(.x, 2))) %>%
        ungroup()

datatable(by_subject, caption = "T1-T2 Average Criteria Levels by Subject*",
          class = 'compact', options = list(pageLength = 12, dom = 't'),
          rownames = FALSE)

We can see from the table that the average criteria improvement varies among the subjects. This holds true for grade levels, teachers and individual classes.

I think the be

#combine all three trimesters of data together
year.report <- GetYearReport() %>%
        select(Student.ID, Class.ID:Teacher, CriMean.t1, CriMean.t2, t12.growth) %>%
        group_by(Class.ID) %>%
        #add centered mean metric
        mutate(t12.growth.center = round(t12.growth - av(t12.growth), 22)) %>%
        ungroup()

#anonymize and randomly sample the data for display
year.report.anon <- year.report %>%
        select( -Class.ID) %>%
        sample_n(100) %>%
        mutate(Student.ID = "1000****")

The Comments

The general format for XIS trimester comments for students is two paragraphs:

A paragraph about what happened in class that trimester in general
A paragraph of three sentences, each of which says something the student:
- has done well,
- struggles with, and
- can do to improve in the area where they struggle.

A typical comment reads like this:

The MYP sixth grade science program at XIS is an intellectually challenging program that results in creative, critical, reflective thinkers. It is designed to help students make connections between science and the real world. Students are developing approaches to learning skills for thinking before writing responses and communicating using tables and graphs. The first trimester focused on cells and disease. The key concept was form. Three criteria A-C were covered with the following summative assessments: Criterion A- Unit test, Criterion B & C- design lab on yeast and Criterion C- vertical leap investigation.

****⁴ is willing to work in class. He has shown an improvement in submission and achievement in assessment tasks over the first trimester. Further attention to the detail required for different assessment criteria should allow **** continued improvement. He is encouraged to write independently first so that advice can be provided on written work rather than risk forgetting how the verbal advice affects his achievement level.

As teachers, when we write these comments, we hope that the student and his/her parents read the comment, assimilate the feedback, and improve. We have no way to know for sure that this happens. But I set out to learn more…

A breakdown of the comments from trimester one. In total, there were:
- Students: 133
- Reports: 1066
- Words: 72,921

Let’s start by getting the top 10 words that were used in trimester one comments.

t1.corpus <- GetReportsDFfromMBcsv("data/t1 comments.csv") %>%
        AnonymizeReport() %>%
        GetCorpusFromReportDF()

t1.top10 <- t1.corpus %>%
        DocumentTermMatrix() %>%
        CollapseAndSortDTM() %>%
        head(10) %>%
        select(Words, freq) %>%
        mutate( per.1000 = round(1000 * freq / 72921,2))

T1 Comments: Top 10 Most Used Words
Words	Frequency	Per 1000 words
and	3646	50.00
the	3358	46.05
xxx	2076	28.47
his	1554	21.31
her	1502	20.60
she	1317	18.06
has	911	12.49
this	890	12.20
work	859	11.78
for	822	11.27

Wow, how insightful! (not). Let’s look instead at n-grams (i.e. phrases).

#Set tokenizer function to phrases 4 to 8 words in length
DersTokenizer <- function(x) {NGramTokenizer(x, Weka_control(min = 4, max = 8))}
options(mc.cores=1) #strange RJava workaround


t1.top1000 <- t1.corpus %>% 
        DocumentTermMatrix(control=list(tokenize = DersTokenizer)) %>%
        CollapseAndSortDTM() %>%
        mutate(length = CountWords(Words)) %>%
        mutate(LenNorm = length * freq) %>%
        arrange(desc(LenNorm)) %>%
        head(1000)

t1.pruned <- GetPrunedList(t1.top1000, 100)

T1 Comments: Top 10 Most Used Phrases – weighted by length
Phrases	Frequency X Length
encouraged that parents review task specific comments and	616
needs to be able to	315
and understanding to solve problems set in familiar	304
to improve xxx can work on	276
i would like to see	260
member of the group who	260
proficiency with the mathematical concepts covered in this	248
information to make scientifically supported judgments	246
to improve in this	236
good understanding of the	232
xxx is able to	228
a specific problem or issue	225

Now we are getting somewhere! These are the phrases that were most used to describe students in the first trimester.
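For readers who want to see what n-gram counting does without the RWeka and Java setup, here is a minimal base R sketch. The helper `word_ngrams()` and the two comments are made up for illustration; this is not the tokenizer used above, only the same idea on a small scale.

```r
# Build all word n-grams of length n from one string (base R only)
word_ngrams <- function(text, n) {
  words <- strsplit(tolower(text), "[^a-z']+")[[1]]
  words <- words[words != ""]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Made-up comments standing in for real report text
comments <- c("xxx is able to work well in groups",
              "xxx is able to explain her reasoning")

# Count every 4-word phrase across all comments, most frequent first
grams  <- unlist(lapply(comments, word_ngrams, n = 4))
counts <- sort(table(grams), decreasing = TRUE)

head(counts, 3)
```

Here the only phrase that appears in both comments is "xxx is able to", so it rises to the top of the count, just as the boilerplate phrases do in the real table.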

What I want to do now is compare two groups and the language used to describe each: students who were in the
- top 25% in terms of improvement, and
- bottom 25% in terms of improvement.

To do this I will use term frequency-inverse document frequency (tf-idf), a metric I discovered through an article on Nate Silver’s FiveThirtyEight blog titled “These Are The Phrases Each GOP Candidate Repeats Most” (Milo Beckman 2016). In it, Beckman analyzes 2016 GOP debate transcripts to find the phrases unique to each candidate.

In this analysis, I employ tf-idf to find phrases that are more likely to have been used to describe students who improved than those who didn’t.
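To make the metric concrete, here is a minimal base R sketch of tf-idf on two tiny made-up "documents", each standing in for one group's pooled comments. It uses the textbook definition with a natural log; the exact weighting and normalization inside my helper functions are not shown here and may differ.

```r
# Two made-up "documents", each standing for one group's pooled phrases
docs <- list(
  top    = c("class discussions", "good understanding", "class discussions"),
  bottom = c("good understanding", "homework on time")
)

terms <- sort(unique(unlist(docs)))

# Term frequency: share of each document's phrases taken up by each term
tf <- sapply(docs, function(d) {
  as.vector(table(factor(d, levels = terms))) / length(d)
})
rownames(tf) <- terms

# Inverse document frequency: log(N / number of documents containing the term)
doc.freq <- rowSums(tf > 0)
idf      <- log(length(docs) / doc.freq)

# tf-idf: phrases found in every document score 0, distinctive ones score > 0
tfidf <- tf * idf
round(tfidf, 3)
```

"good understanding" appears in both documents, so its idf is log(2/2) = 0 and it drops out, while "class discussions" and "homework on time" keep a positive score only in the document that uses them. With just two documents, tf-idf therefore acts mostly as a filter for phrases that one group has and the other lacks.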

Putting it All Together

year.report <- year.report %>%
    #take only needed columns
    select(Student.ID, Class.ID:Teacher,
               CriMean.t1, CriMean.t2, t12.growth, t12.growth.center) %>%
    #add quartile column based on centered growth
    within(t12.growth.center.quartile <- as.integer(cut(t12.growth.center,
                                      quantile(t12.growth.center, probs=0:4/4,
                                      na.rm = TRUE), include.lowest=TRUE))) %>%
    #add index to crossref w/ corpus
    mutate(ID.SUB = paste(Student.ID, Subject))

#get 4 quartiles of ID.SUB's 
quarts <- c(1,2,3,4)
quartiles <- lapply(quarts, function(x) {
        year.report %>% filter(t12.growth.center.quartile == x) %>%
                .$ID.SUB})

#paste each quartile's comments into one comment
quartile.comments <- lapply(quartiles, function(x) {
        idx <- t1.corpus %>% meta(tag = "ID.SUB") %in% x
        do.call(paste,content(t1.corpus[idx])) })

#take only Q1 and Q4
topbot.comments <- quartile.comments[-c(2,3)]

#make corpus (1 quartile = 1 document)
topbot.corpus <- VectorSource(topbot.comments) %>% Corpus

#make tf-idf matrices from corpus with phrases 2 to 4 words long
all.tfidf <- GetAllTfIdfMatricesFromCorpus(topbot.corpus, 2,4, norm = TRUE)

#remove repetitive words
all.pruned <- lapply(all.tfidf, GetPrunedList, prune_thru = 300)

top <- all.pruned[[1]] %>%
  transmute(ngrams, Score = tfidfXlength * 100000) %>%
  filter(Score >= 20)
 
bottom <- all.pruned[[2]]  %>%
  transmute(ngrams, Score = tfidfXlength * 100000) %>%
  filter(Score >= 20)
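The quartile step in the code above is the least obvious part, so here is a small base R sketch of the same `cut()` plus `quantile()` pattern on made-up growth values. The numbers are hypothetical; only the technique matches the pipeline.

```r
# Made-up centered growth values; NA stands for a student missing a trimester
growth <- c(-1.5, -0.5, 0, 0.25, 0.5, 1, 1.5, 2, NA)

# Quartile break points: minimum, 25th, 50th, 75th percentiles, maximum
breaks <- quantile(growth, probs = 0:4 / 4, na.rm = TRUE)

# cut() bins each value; include.lowest keeps the minimum in quartile 1
quartile <- as.integer(cut(growth, breaks, include.lowest = TRUE))

quartile
```

Quartile 1 holds the least-improved reports and quartile 4 the most-improved ones, while a missing growth value stays `NA` and so falls out of every quartile when filtering.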

Top 25% Improved: Most Common Phrases from T1
ngrams	Score
class discussions as	30.84788
a focus for the	29.37893
he needs to improve	29.37893
xxx should try to	29.37893
at the start	26.44104
a friendly student who	23.50314
a very polite cooperative	23.50314
achieve to an even	23.50314
an excellent understanding of	23.50314
and should be written	23.50314
be measured and evaluated	23.50314
became somewhat noticeable while	23.50314
cooperative and hardworking student	23.50314
goals should be based	23.50314
had some difficulty when	23.50314
has shown that he	23.50314
her contributions to the	23.50314
her strong performance xxx	23.50314
invest more time at	23.50314
is to more clearly	23.50314
next unit for xxx	23.50314
on display when she	23.50314
positive influence on the	23.50314
tasks on time and	23.50314
the key concepts i	23.50314
the technical skills needed	23.50314
the year well she	23.50314
will find success by	23.50314
group work xxx	22.03420
in the classroom	22.03420
of work	20.56525

Bottom 25% Improved: Most Common Phrases from T1
ngrams	Score
and more comfortable sharing	33.85527
as the year has	33.85527
at first but as	33.85527
for next	28.21272
and i look forward	28.21272
it is important that	28.21272
he did not	25.39145
her knowledge and	25.39145
achievement in all tasks	22.57018
an area of focus	22.57018
an investigation but performed	22.57018
difficult for him to	22.57018
evident while assessing criterion	22.57018
forward to seeing this	22.57018
from the text to	22.57018
homework on time and	22.57018
lines of reasoning xxx	22.57018
plan and create videos	22.57018
still needs to improve	22.57018
the requirements of designing	22.57018
to ensure that she	22.57018
to make sure he	22.57018
understanding of the concept	22.57018
well in the swimming	22.57018
xxx is an energetic	22.57018
xxx started to develop	22.57018
it has been	21.15954
receive valuable feedback	21.15954
result on the	21.15954

Findings

My first conclusion is that there are a lot of similarities between the most distinctive phrases describing the top 25% and bottom 25% of improvers. There are no real standout phrases that would key anyone into the idea that this student might improve in the future. That makes sense: there are certainly a lot of other factors that go into whether a student improves or not.

If you were to read 100 comments, you would start to see a lot of similarity among them. The similarity, however, is primarily semantic (ideas) and not syntactic (grammar and word choice). The methodology I employed cannot detect semantic similarity, only literal similarity. Thus, it is more likely that we see phrases repeated by one teacher about different students. This is the case with “more and more comfortable sharing”, a phrase one teacher used sixteen times. The same goes for “as the year has progressed” (17 times). However, “and I look forward to” was used by three different teachers in nine different reports.

More than anything, these tables don’t conclusively answer my research question. Instead, I have more questions than when I started.

One phrase worthy of note from the bottom 25% is “achievement in all tasks”. A search of the comments reveals more context: “xxx needs more consistent achievement in all tasks”. This suggests that inconsistent performance may be less likely to lead to improvement in the coming grading period.

That being said, a few phrases stood out to me as teacher euphemisms. For example, I get the feeling that “more and more comfortable sharing” really means “your son/daughter is shy but working on it.”

In the top 25% table, “a focus for the next” falls under the category of something just one teacher said frequently. However, the context for “class discussions as” comes from two teachers who say that, despite the student’s polite nature and high achievement, “I would like to see her participate more in class discussions as she has a lot to offer”. The frequent use of this phrase to describe high-achieving students almost suggests an inverse relationship between achievement and class participation.

References & Footnotes

Milo Beckman. 2016. “These Are The Phrases Each GOP Candidate Repeats Most | FiveThirtyEight.” https://fivethirtyeight.com/features/these-are-the-phrases-each-gop-candidate-repeats-most/.

1. Assuming that criteria level distribution should be level in the first place.
2. The average growth should be the difference between the Criteria Means of trimester one and two but the average growth excludes mid-year students who: 1) typically don’t do well in their 1st trimester of MYP and 2) are not represented in the T1-T2 growth statistic.
3. A lot of assumptions going on here….
4. Name removed for privacy

Comment Mining Write Up - II

Anders Swanson