Analysis



This analysis uses trimester reports for XIS’s Middle Years Program (MYP) students. The MYP runs from grade six through grade ten, and each student takes eight classes.

Set Up and EDA



First, I’ll load the additional packages used in this analysis.

#load required libraries, data, and created functions
library(tm)
library(dplyr)
library(tidyr)
library(RWeka)
library(DT)
library(knitr)


#source custom-built functions
source("Functions.R")

Next, I’ll load the CSV file of term grades that I exported from ManageBac.

t1.report <- GetReportsDFfromMBcsv("data/t1 comments.csv")

#find dimensions (rows & columns) of the table
dim(t1.report)
## [1] 1066   14
#get column names
colnames(t1.report)
##  [1] "Student.ID"      "Last.Name"       "First.Name"     
##  [4] "Class.ID"        "Grade.Level"     "Subject"        
##  [7] "Teacher"         "Cri.A"           "Cri.B"          
## [10] "Cri.C"           "Cri.D"           "Sum"            
## [13] "CriMean"         "Student.Comment"

So this table has 14 columns and 1,066 rows, where each row is one student’s report for one class. To give a better sense of what the data look like, I’ve randomly selected 100 class reports with identifying information removed (i.e., name, student ID, and comment).

#get an example table
t1.report.ex <- t1.report %>%
        #mask student IDs and round CriMean
        mutate(Student.ID = "1000****", CriMean = round(CriMean, digits = 3)) %>%
        #keep the masked ID, class details, criteria levels and CriMean (names are not selected)
        select(Student.ID, Grade.Level:Cri.D, CriMean) %>%
        #take 100 rows at random
        sample_n(100)



At XIS, not all four criteria have to be assessed each trimester, so each mean criteria score (CriMean) was calculated from whichever criteria were assessed.
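The rule above can be sketched in base R as a row mean that skips missing criteria. This is a minimal illustration, not the code in the author’s Functions.R: it assumes unassessed criteria are stored as NA, and the data frame reports and its values are invented.

```r
# Hypothetical report rows; NA marks a criterion not assessed this trimester
# (assumption: this is how missing criteria are stored).
reports <- data.frame(
  Student = c("S1", "S2", "S3"),
  Cri.A = c(6, 5, NA),
  Cri.B = c(7, NA, 4),
  Cri.C = c(5, 6, NA),
  Cri.D = c(NA, NA, 5)
)
criteria <- c("Cri.A", "Cri.B", "Cri.C", "Cri.D")

# Mean over the criteria that were assessed; na.rm = TRUE skips the NAs
reports$CriMean <- rowMeans(reports[, criteria], na.rm = TRUE)

# Number of criteria behind each mean, useful context when comparing students
reports$n.assessed <- rowSums(!is.na(reports[, criteria]))

reports
```

Here S2’s CriMean is (5 + 6) / 2 = 5.5 rather than 11 / 4. A student with no assessed criteria at all would get NaN, which would need separate handling.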


Measuring Improvement



Before we measure how much a student improves from trimester one to trimester two, let’s start with a definition and then an example.

Definition

Improvement – the class-centered increase of mean criteria levels from trimester one to trimester two.

“Centered” in this case means that each class’s mean improvement has been subtracted from the improvement of every student in that class.

Example Methodology

Let’s use an example school with two classes of four students each, taught by Ms. Blue and Ms. Green.

In general, improvement in this instance is calculated as: \[ \text{Improvement}_{\text{T1-T2}} = \text{CriMean}_{\text{T2}} - \text{CriMean}_{\text{T1}} \]



We assume that if a student’s levels went up from trimester one to trimester two, then they improved in that class. However, in the above example, who improved more: Denise or Emily? Because criteria levels vary widely between teachers and classes, this question needs addressing.[1]

To account for this difference, I decided to center improvement to control for the effect of the teacher.[2] The improvement metric is modified by subtracting the mean improvement of each class. That is, the mean improvement for each class is computed, and then that mean is subtracted from the improvement of each student in the class. Accordingly, the mean improvements for Ms. Blue’s and Ms. Green’s classes are:

#mean raw improvement for each teacher's class
by_teacher <- norm.ex %>%
        group_by(Teacher) %>%
        summarize(Improve.m = mean(Improvement))

| Teacher | Improve.m |
|---------|-----------|
| Blue    | 1.625     |
| Green   | -0.625    |



The formula for our improvement adjustment is: \[ \text{CenteredImprovement}_{\text{student}} = \text{Improvement}_{\text{student}} - \text{Improvement}_{\text{class mean}} \] Applying this formula gives the following column with our new improvement metric. It’s worth noting that a negative centered improvement score does not necessarily mean that the student’s performance decreased, only that the student improved less than the class average.
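The two steps, raw improvement and then centering on the class mean, can be sketched in base R with ave(). The eight students and their levels below are hypothetical and are not the Ms. Blue / Ms. Green data from the example; the post itself does the same centering with dplyr.

```r
# Hypothetical example: two classes of four students (invented values)
ex <- data.frame(
  Student    = paste0("S", 1:8),
  Teacher    = rep(c("Blue", "Green"), each = 4),
  CriMean.t1 = c(4, 5, 5, 6, 6, 6, 7, 5),
  CriMean.t2 = c(6, 6, 7, 6, 6, 5, 7, 5)
)

# Raw improvement: T2 criteria mean minus T1 criteria mean
ex$Improvement <- ex$CriMean.t2 - ex$CriMean.t1

# Class mean improvement, repeated on every row of that class
ex$Improve.m <- ave(ex$Improvement, ex$Teacher, FUN = mean)

# Centered improvement: each student's improvement minus their class mean
ex$Improve.centered <- ex$Improvement - ex$Improve.m

ex
```

S4 and S5 both have a raw Improvement of 0, yet S4’s centered improvement is -1.25 (below Ms. Blue’s class mean of 1.25) while S5’s is 0.25 (above Ms. Green’s class mean of -0.25). Within each class the centered values sum to zero.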

norm.ex.by_teacher <- left_join(norm.ex, by_teacher, by = "Teacher") %>%
        #center each student's Improvement on their class's mean improvement
        mutate(Improve.centered = (Improvement - Improve.m)) %>%
        mutate(across(-c(Student, Teacher), ~ round(.x, 3))) %>%
        select(Student, Teacher, Improvement:Improve.centered)



Improvement at XIS

Now to the XIS data. The table below shows the average criteria levels for each MYP subject in trimesters one and two, as well as the difference[3] between them.

#combine all three trimesters of data together
year.report <- GetYearReport()

#wrappers for mean and sd with na.rm = TRUE
av <- function(x) {mean(x, na.rm = TRUE)}
s <- function(x) {sd(x, na.rm = TRUE)}

by_subject <- year.report %>%
        group_by(Subject) %>%
        summarize(CriMean.t1 = av(CriMean.t1), CriMean.t2 = av(CriMean.t2),
                  Imp.m = av(t12.growth), Imp.sd = s(t12.growth)) %>%
        #center each subject's mean improvement on the mean across subjects
        mutate(Improvement.centered = Imp.m - mean(Imp.m)) %>%
        mutate(across(-Subject, ~ round(.x, 2))) %>%
        ungroup()

datatable(by_subject, caption = "T1-T2 Average Criteria Levels by Subject*",
          class = 'compact', options = list(pageLength = 12, dom = 't'),
          rownames = FALSE)



The table shows that average criteria improvement varies among subjects. The same holds for grade levels, teachers, and individual classes.

I think the be

#combine all three trimesters of data together
year.report <- GetYearReport() %>%
        select(Student.ID, Class.ID:Teacher, CriMean.t1, CriMean.t2, t12.growth) %>%
        group_by(Class.ID) %>%
        #center each student's growth on their class's mean growth
        mutate(t12.growth.center = round(t12.growth - av(t12.growth), 2)) %>%
        ungroup()

#anonymize and randomly sample the data for display
year.report.anon <- year.report %>%
        select( -Class.ID) %>%
        sample_n(100) %>%
        mutate(Student.ID = "1000****")

The Comments



The general format for XIS trimester comments for students is two paragraphs:

  • A general paragraph about what happened in class that trimester
  • A paragraph of three sentences, each describing something the student:

    • has done well,
    • struggles with, and
    • can do to improve in the area they struggle with.

A typical comment reads like this:

The MYP sixth grade science program at XIS is an intellectually challenging program that results in creative, critical, reflective thinkers. It is designed to help students make connections between science and the real world. Students are developing approaches to learning skills for thinking before writing responses and communicating using tables and graphs. The first trimester focused on cells and disease. The key concept was form. Three criteria A-C were covered with the following summative assessments: Criterion A- Unit test, Criterion B & C- design lab on yeast and Criterion C- vertical leap investigation.

****[4] is willing to work in class. He has shown an improvement in submission and achievement in assessment tasks over the first trimester. Further attention to the detail required for different assessment criteria should allow **** continued improvement. He is encouraged to write independently first so that advice can be provided on written work rather than risk forgetting how the verbal advice affects his achievement level.



As teachers, when we write these comments, we hope that the student and his/her parents read the comment, assimilate the feedback, and improve. We have no way to know for sure that this happens, but I set out to learn more…

Here is a breakdown of the trimester one comments. In total, there were:

  • Students: 133
  • Reports: 1,066
  • Words: 72,921

Let’s start by getting the top 10 words that were used in trimester one comments.

t1.corpus <- GetReportsDFfromMBcsv("data/t1 comments.csv") %>%
        AnonymizeReport() %>%
        GetCorpusFromReportDF()

t1.top10 <- t1.corpus %>%
        DocumentTermMatrix() %>%
        CollapseAndSortDTM() %>%
        head(10) %>%
        select(Words, freq) %>%
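        #72,921 = total number of words in the T1 comments
        mutate(per.1000 = round(1000 * freq / 72921, 2))

T1 Comments: Top 10 Most Used Words

| Word | Frequency | Per 1,000 words |
|------|-----------|-----------------|
| and  | 3646      | 50.00           |
| the  | 3358      | 46.05           |
| xxx  | 2076      | 28.47           |
| his  | 1554      | 21.31           |
| her  | 1502      | 20.60           |
| she  | 1317      | 18.06           |
| has  | 911       | 12.49           |
| this | 890       | 12.20           |
| work | 859       | 11.78           |
| for  | 822       | 11.27           |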



Wow, how insightful! (not). Let’s look instead at n-grams (i.e. phrases).
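To make the n-gram idea concrete, here is a small base-R sketch that splits text into words, builds every phrase of length min_n to max_n, counts them, and ranks them by frequency × length, mirroring the LenNorm column in the code below. It is a simplified stand-in for RWeka’s NGramTokenizer and the author’s CollapseAndSortDTM(): the function word_ngrams() and the two sample comments are hypothetical, and the word-splitting rule (runs of lowercase letters and apostrophes) is an assumption, so real token boundaries may differ.

```r
# Hypothetical helper: every word n-gram of length min_n..max_n in one string
word_ngrams <- function(text, min_n = 2, max_n = 3) {
  words <- strsplit(tolower(text), "[^a-z']+")[[1]]
  words <- words[words != ""]
  out <- character(0)
  for (n in min_n:max_n) {
    if (length(words) < n) break
    starts <- seq_len(length(words) - n + 1)
    grams <- vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
                    character(1))
    out <- c(out, grams)
  }
  out
}

# Two invented comments
comments <- c("xxx needs to be able to focus.", "xxx needs to be able to plan.")

# Count 3- and 4-word phrases across all comments
grams <- unlist(lapply(comments, word_ngrams, min_n = 3, max_n = 4))
freq <- table(grams)
ranked <- data.frame(Words = names(freq), freq = as.integer(freq))

# Weight each phrase by its length in words, then sort descending
ranked$length <- lengths(strsplit(ranked$Words, " "))
ranked$LenNorm <- ranked$length * ranked$freq
ranked <- ranked[order(-ranked$LenNorm), ]
head(ranked, 5)
```

Phrases shared by both comments rise to the top, and among those the 4-word phrases (2 × 4 = 8) outrank the 3-word ones (2 × 3 = 6). One reason to weight by length is that longer phrases are naturally rarer than shorter ones, so raw counts would favor short fragments.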

#set the tokenizer to return phrases 4 to 8 words long
DersTokenizer <- function(x) {NGramTokenizer(x, Weka_control(min = 4, max = 8))}
options(mc.cores=1) #workaround for an rJava issue: run on a single core


t1.top1000 <- t1.corpus %>% 
        DocumentTermMatrix(control=list(tokenize = DersTokenizer)) %>%
        CollapseAndSortDTM() %>%
        mutate(length = CountWords(Words)) %>%
        mutate(LenNorm = length * freq) %>%
        arrange(desc(LenNorm)) %>%
        head(1000)

t1.pruned <- GetPrunedList(t1.top1000, 100)
T1 Comments: Most Used Phrases, Weighted by Length

| Phrase | Frequency × Length |
|--------|--------------------|
| encouraged that parents review task specific comments and | 616 |
| needs to be able to | 315 |
| and understanding to solve problems set in familiar | 304 |
| to improve xxx can work on | 276 |
| i would like to see | 260 |
| member of the group who | 260 |
| proficiency with the mathematical concepts covered in this | 248 |
| information to make scientifically supported judgments | 246 |
| to improve in this | 236 |
| good understanding of the | 232 |
| xxx is able to | 228 |
| a specific problem or issue | 225 |



Now we are getting somewhere! These are the phrases that were most used to describe students in the first trimester.

What I want to do now is compare the language used to describe two groups of students, those who were in the:

  • top 25% in terms of improvement, and
  • bottom 25% in terms of improvement.
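One way to form those groups, following the cut()/quantile() approach used in the code later in this section, is sketched below on a hypothetical vector of centered growth scores. It shows that cut() numbers bins from the lowest values upward, so quartile 1 holds the bottom 25% of improvers and quartile 4 the top 25%.

```r
# Hypothetical centered growth scores for eight class reports
growth.center <- c(-1.25, -0.75, -0.25, 0.25, 0.75, -0.5, 1.0, 0.1)

# Breaks at the 0th, 25th, 50th, 75th and 100th percentiles
breaks <- quantile(growth.center, probs = 0:4 / 4, na.rm = TRUE)

# cut() labels bins from lowest to highest: 1 = bottom 25%, 4 = top 25%
quartile <- as.integer(cut(growth.center, breaks, include.lowest = TRUE))

bottom.25 <- growth.center[quartile == 1]
top.25    <- growth.center[quartile == 4]
quartile
```

One design caveat: cut() stops with an error if two quantile breaks coincide, which can happen when many students share the same centered growth. dplyr’s ntile() always returns four equal-sized groups, at the cost of splitting tied values arbitrarily.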

To do this, I will use the term frequency-inverse document frequency (tf-idf) metric, which I came across in an article published on Nate Silver’s FiveThirtyEight blog titled “These Are The Phrases Each GOP Candidate Repeats Most” (Milo Beckman 2016). In it, Beckman analyzes 2016 GOP debate transcripts to find the phrases most distinctive to each candidate.

In this analysis, I employ tf-idf to find phrases that were more likely to be used to describe students who improved than students who didn’t.
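As a rough illustration of how tf-idf rewards distinctive phrases, the sketch below scores a tiny hypothetical term-count matrix with one row per document (bottom and top quartile). It uses one common formulation, term frequency normalized by document length times log2(N / df); the author’s GetAllTfIdfMatricesFromCorpus() may weight terms differently, and the phrases and counts are invented.

```r
# Invented phrase counts: one row per "document" (all comments in a quartile)
counts <- rbind(
  bottom = c("needs to improve" = 3, "i look forward" = 2, "good understanding of" = 1),
  top    = c("needs to improve" = 1, "i look forward" = 0, "good understanding of" = 4)
)

tf_idf <- function(m) {
  tf  <- m / rowSums(m)       # term frequency, normalized by document length
  df  <- colSums(m > 0)       # number of documents containing each term
  idf <- log2(nrow(m) / df)   # inverse document frequency
  sweep(tf, 2, idf, `*`)      # multiply each term's column by its idf
}

round(tf_idf(counts), 3)
```

With only two documents, any phrase that appears in both gets idf = log2(2 / 2) = 0, so only phrases unique to one group score above zero. Here only “i look forward”, used solely in the bottom group, survives, with a score of (2 / 6) × 1 ≈ 0.333.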

Putting it All Together

year.report <- year.report %>%
    #take only needed columns
    select(Student.ID, Class.ID:Teacher,
           CriMean.t1, CriMean.t2, t12.growth, t12.growth.center) %>%
    #add quartile column based on centered growth (1 = lowest, 4 = highest)
    mutate(t12.growth.center.quartile = as.integer(cut(t12.growth.center,
                                      quantile(t12.growth.center, probs = 0:4/4,
                                               na.rm = TRUE), include.lowest = TRUE))) %>%
    #add index to cross-reference with the corpus
    mutate(ID.SUB = paste(Student.ID, Subject))

#get 4 quartiles of ID.SUB's 
quarts <- c(1,2,3,4)
quartiles <- lapply(quarts, function(x) {
        year.report %>% filter(t12.growth.center.quartile == x) %>%
                .$ID.SUB})

#paste each quartile's comments into one comment
quartile.comments <- lapply(quartiles, function(x) {
        idx <- t1.corpus %>% meta(tag = "ID.SUB") %in% x
        do.call(paste,content(t1.corpus[idx])) })

#keep only Q1 (lowest centered growth) and Q4 (highest centered growth)
topbot.comments <- quartile.comments[-c(2,3)]

#make corpus (1 quartile = 1 document)
topbot.corpus <- VectorSource(topbot.comments) %>% Corpus
#make document-term matrices from the corpus with phrases 2 to 4 words long
all.tfidf <- GetAllTfIdfMatricesFromCorpus(topbot.corpus, 2,4, norm = TRUE)

#remove repetitive words
all.pruned <- lapply(all.tfidf, GetPrunedList, prune_thru = 300)

top <- all.pruned[[1]] %>%
  transmute(ngrams, Score = tfidfXlength * 100000) %>%
  filter(Score >= 20)
 
bottom <- all.pruned[[2]]  %>%
  transmute(ngrams, Score = tfidfXlength * 100000) %>%
  filter(Score >= 20)
Top 25% Improved: Most Common Phrases from T1

| N-gram | Score |
|--------|-------|
| class discussions as | 30.84788 |
| a focus for the | 29.37893 |
| he needs to improve | 29.37893 |
| xxx should try to | 29.37893 |
| at the start | 26.44104 |
| a friendly student who | 23.50314 |
| a very polite cooperative | 23.50314 |
| achieve to an even | 23.50314 |
| an excellent understanding of | 23.50314 |
| and should be written | 23.50314 |
| be measured and evaluated | 23.50314 |
| became somewhat noticeable while | 23.50314 |
| cooperative and hardworking student | 23.50314 |
| goals should be based | 23.50314 |
| had some difficulty when | 23.50314 |
| has shown that he | 23.50314 |
| her contributions to the | 23.50314 |
| her strong performance xxx | 23.50314 |
| invest more time at | 23.50314 |
| is to more clearly | 23.50314 |
| next unit for xxx | 23.50314 |
| on display when she | 23.50314 |
| positive influence on the | 23.50314 |
| tasks on time and | 23.50314 |
| the key concepts i | 23.50314 |
| the technical skills needed | 23.50314 |
| the year well she | 23.50314 |
| will find success by | 23.50314 |
| group work xxx | 22.03420 |
| in the classroom | 22.03420 |
| of work | 20.56525 |
Bottom 25% Improved: Most Common Phrases from T1

| N-gram | Score |
|--------|-------|
| and more comfortable sharing | 33.85527 |
| as the year has | 33.85527 |
| at first but as | 33.85527 |
| for next | 28.21272 |
| and i look forward | 28.21272 |
| it is important that | 28.21272 |
| he did not | 25.39145 |
| her knowledge and | 25.39145 |
| achievement in all tasks | 22.57018 |
| an area of focus | 22.57018 |
| an investigation but performed | 22.57018 |
| difficult for him to | 22.57018 |
| evident while assessing criterion | 22.57018 |
| forward to seeing this | 22.57018 |
| from the text to | 22.57018 |
| homework on time and | 22.57018 |
| lines of reasoning xxx | 22.57018 |
| plan and create videos | 22.57018 |
| still needs to improve | 22.57018 |
| the requirements of designing | 22.57018 |
| to ensure that she | 22.57018 |
| to make sure he | 22.57018 |
| understanding of the concept | 22.57018 |
| well in the swimming | 22.57018 |
| xxx is an energetic | 22.57018 |
| xxx started to develop | 22.57018 |
| it has been | 21.15954 |
| receive valuable feedback | 21.15954 |
| result on the | 21.15954 |

Findings



My first conclusion is that there are many similarities between the most distinctive phrases describing the top 25% and the bottom 25% of improvers. No phrase really stands out as a signal that a student might improve in the future. That makes sense: many other factors go into whether a student improves or not.

If you were to read 100 comments, you would start to see a lot of similarity among them. However, that similarity is primarily semantic (shared ideas) rather than literal (shared wording and phrasing). The methodology I used cannot detect semantic similarity, only literal matches of the same words. As a result, the phrases we see are more likely to be ones repeated by a single teacher about different students. This is the case with “more and more comfortable sharing”, a phrase one teacher used sixteen times. The same goes for “as the year has progressed” (17 times). In contrast, “and I look forward to” was used by three different teachers in nine different reports.
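Counts like these (how many reports use a phrase, and how many distinct teachers wrote them) can be reproduced with a literal substring search. The sketch below assumes a data frame with Teacher and Student.Comment columns, matching the column names in t1.report; the helper phrase_usage() and the five comments are hypothetical. Because it matches exact wording after lowercasing, it shares the limitation described above: paraphrases are not counted.

```r
# Hypothetical comments; the real ones are in t1.report$Student.Comment
comments <- data.frame(
  Teacher = c("A", "A", "B", "C", "C"),
  Student.Comment = c(
    "She is becoming more and more comfortable sharing ideas.",
    "He is more and more comfortable sharing in class.",
    "I look forward to his progress.",
    "And I look forward to next term.",
    "He has worked hard this term."
  ),
  stringsAsFactors = FALSE
)

# Count reports containing the phrase and the distinct teachers who wrote them
phrase_usage <- function(df, phrase) {
  hits <- grepl(tolower(phrase), tolower(df$Student.Comment), fixed = TRUE)
  c(reports = sum(hits), teachers = length(unique(df$Teacher[hits])))
}

phrase_usage(comments, "more and more comfortable sharing")  # one teacher, two reports
phrase_usage(comments, "I look forward")                     # two teachers, two reports
```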

Above all, these tables don’t conclusively answer my research question. Instead, I have more questions than when I started.

One phrase worth noting from the bottom 25% is “achievement in all tasks”. A search of the comments reveals more context: “xxx needs more consistent achievement in all tasks”. This suggests that inconsistent performance might not often lead to improvement in the following grading period.

That being said, a few phrases stood out to me as teacher euphemisms. For example, I get the feeling that “more and more comfortable sharing” really means “your son/daughter is shy but working on it.”

In the top 25% table, “a focus for the next” falls into the category of something just one teacher said frequently. However, “class discussions as” comes from two teachers who say that, despite a student’s polite nature and high achievement, “I would like to see her participate more in class discussions as she has a lot to offer”. The frequent use of this phrase to describe high-achieving students almost suggests an inverse relationship between achievement and class participation.


References & Footnotes

Milo Beckman. 2016. “These Are The Phrases Each GOP Candidate Repeats Most | FiveThirtyEight.” https://fivethirtyeight.com/features/these-are-the-phrases-each-gop-candidate-repeats-most/.


  1. This assumes that criteria levels should have the same distribution across teachers and classes in the first place.

  2. The average growth should be the difference between the criteria means of trimesters one and two, but the average growth excludes mid-year students, who: 1) typically don’t do well in their first trimester of MYP and 2) are not represented in the T1-T2 growth statistic.

  3. There are a lot of assumptions going on here…

  4. Name removed for privacy