This analysis is conducted on trimester reports for students in XIS’s Middle Years Program (MYP). The MYP runs from grade six through grade ten, and each student takes eight classes.
I’ll start by loading the extra packages I used for my analysis.
#load required libraries, data, and created functions
library(tm)
library(dplyr)
library(tidyr)
library(RWeka)
library(DT)
library(knitr)
#source custom-built functions
source("Functions.R")
Now I’ll load the csv file of term grades I exported from ManageBac.
t1.report <- GetReportsDFfromMBcsv("data/t1 comments.csv")
#find dimensions (rows & columns) of the table
dim(t1.report)
## [1] 1066 14
#get column names
colnames(t1.report)
## [1] "Student.ID" "Last.Name" "First.Name"
## [4] "Class.ID" "Grade.Level" "Subject"
## [7] "Teacher" "Cri.A" "Cri.B"
## [10] "Cri.C" "Cri.D" "Sum"
## [13] "CriMean" "Student.Comment"
So this table has 14 columns and 1066 rows, where each row is one student’s report for one class. To give a better sense of what the data looks like, I’ve randomly selected 100 class reports with identifying information removed (i.e. name, student ID, and comment).
#get an example table
t1.report.ex <- t1.report %>%
#mask student IDs, round CriMean,
mutate(Student.ID = "1000****", CriMean = round(CriMean, digits = 3)) %>%
#select only what I want
select(Student.ID, Grade.Level:Cri.D, CriMean, -First.Name, -Last.Name) %>%
#take 100 rows at random
sample_n(100)
At XIS, not all four criteria have to be assessed each trimester, so the mean criteria score was calculated from whichever criteria were available.
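A minimal sketch of that calculation in base R, assuming unassessed criteria are stored as `NA`. The criterion levels below are invented for illustration, and the actual logic inside `Functions.R` is not shown in this report, so it may differ.

```r
# Hypothetical criterion levels; NA marks a criterion not assessed this trimester
reports <- data.frame(
  Cri.A = c(5, 6, NA),
  Cri.B = c(4, NA, NA),
  Cri.C = c(6, 7, 3),
  Cri.D = c(NA, 5, 4)
)

cri_cols <- c("Cri.A", "Cri.B", "Cri.C", "Cri.D")

# Average only the criteria that were actually assessed in each report
reports$CriMean <- rowMeans(reports[, cri_cols], na.rm = TRUE)

print(reports)
```

One caveat with this approach: a report with no assessed criteria at all would get `NaN` rather than a number, so such rows would need to be handled separately.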
Before we get to measuring how much a student improves from trimester one to two, let’s start with a definition and then an example.
Improvement – the class-centered increase of mean criteria levels from trimester one to trimester two.
“Centered” in this case signifies that the mean improvement for each class has been subtracted from the improvement each student received in that class.
Let’s use an example school with two classes of four students each taught by Ms. Blue and Ms. Green.
Generally, Improvement in this instance is calculated as: \[ {\text{Improvement}_{\text{T1-T2}}} = {\text{CriMean}}_{\text{T2}}-{\text{CriMean}}_{\text{T1}} \]
We assume that if a student’s levels went up from trimester one to trimester two then they improved in that class. However, in the above example, who improved more: Denise or Emily? Because criteria levels vary widely between teachers and classes, this needs addressing.[1]
To account for this difference, I decided to center improvement to control for the effects of the teacher.[2] The improvement metric is modified by subtracting the mean improvement of each class. That is, the mean improvement for each class is computed, and then each individual’s improvement is shifted by that class average. Accordingly, the mean improvement for Ms. Blue’s and Ms. Green’s classes is:
by_teacher <- norm.ex %>%
group_by(Teacher) %>%
summarize(Improve.m = mean(Improvement))
| Teacher | Improve.m |
|---|---|
| Blue | 1.625 |
| Green | -0.625 |
The formula for our improvement adjustment is: \[ \text{CenteredImprovement}_{\text{student}} = \text{Improvement}_{\text{student}} - \overline{\text{Improvement}}_{\text{class}} \] Using this formula, we get the following column displaying our new improvement metric. It’s worth noting that a negative centered improvement score does not necessarily mean that the student’s performance decreased, only that they improved less than the class average.
norm.ex.by_teacher <- left_join(norm.ex, by_teacher, by = "Teacher") %>%
#center each student's improvement on the mean improvement of their teacher's class
mutate(Improve.centered = (Improvement - Improve.m)) %>%
mutate(across(-c(Student, Teacher), ~ round(.x, 3))) %>%
select(Student, Teacher, Improvement:Improve.centered)
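To make the centering step concrete, here is a small self-contained sketch in base R. The student labels and improvement scores are made up (they are not the example data used above); they were picked only so that the class means come out to the 1.625 for Blue and -0.625 for Green reported earlier.

```r
# Hypothetical improvement scores, NOT the example data from this report;
# chosen so the class means are 1.625 (Blue) and -0.625 (Green)
norm_sketch <- data.frame(
  Student = paste0("S", 1:8),
  Teacher = rep(c("Blue", "Green"), each = 4),
  Improvement = c(1.0, 1.5, 2.0, 2.0, -1.0, -0.5, -0.5, -0.5)
)

# ave() returns each row's class mean, aligned with the original rows
class_mean <- ave(norm_sketch$Improvement, norm_sketch$Teacher, FUN = mean)

# Centered improvement = raw improvement minus the class mean
norm_sketch$Improve.centered <- norm_sketch$Improvement - class_mean

print(norm_sketch)
```

In this made-up data, S1 gained a full level in Ms. Blue's class yet ends up with a negative centered score, while S6 dropped half a level in Ms. Green's class yet ends up slightly positive. The centered score only says how a student did relative to classmates.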
Now to the XIS data. Below is a table showing the average criteria levels for each subject in the MYP for both trimester one and trimester two, as well as the difference[3] between them.
#combine all three trimesters of data together
year.report <- GetYearReport()
#wrappers for mean and sd with na.rm = TRUE
av <- function(x) {mean(x, na.rm = TRUE)}
s <- function(x) {sd(x, na.rm = TRUE)}
by_subject <- year.report %>%
group_by(Subject) %>%
summarize(CriMean.t1 = av(CriMean.t1), CriMean.t2 = av(CriMean.t2),
Imp.m = av(t12.growth), Imp.sd = s(t12.growth)) %>%
mutate(Improvement.centered = Imp.m - mean(Imp.m)) %>%
mutate(across(-Subject, ~ round(.x, 2))) %>%
ungroup()
datatable(by_subject, caption = "T1-T2 Average Criteria Levels by Subject*",
class = 'compact', options = list(pageLength = 12, dom = 't'),
rownames = FALSE)
We can see from the table that the average criteria improvement varies among the subjects. This holds true for grade levels, teachers and individual classes.
I think the be
#combine all three trimesters of data together
year.report <- GetYearReport() %>%
select(Student.ID, Class.ID:Teacher, CriMean.t1, CriMean.t2, t12.growth) %>%
group_by(Class.ID) %>%
#add centered mean metric
mutate(t12.growth.center = round(t12.growth - av(t12.growth), 2)) %>%
ungroup()
#anonymize and randomly sample the data for display
year.report.anon <- year.report %>%
select( -Class.ID) %>%
sample_n(100) %>%
mutate(Student.ID = "1000****")
The general format for XIS trimester comments for students is two paragraphs:
A paragraph of three sentences, each of which says something about the student:
A typical comment reads like this:
The MYP sixth grade science program at XIS is an intellectually challenging program that results in creative, critical, reflective thinkers. It is designed to help students make connections between science and the real world. Students are developing approaches to learning skills for thinking before writing responses and communicating using tables and graphs. The first trimester focused on cells and disease. The key concept was form. Three criteria A-C were covered with the following summative assessments: Criterion A- Unit test, Criterion B & C- design lab on yeast and Criterion C - vertical leap investigation.
****[4] is willing to work in class. He has shown an improvement in submission and achievement in assessment tasks over the first trimester. Further attention to the detail required for different assessment criteria should allow **** continued improvement. He is encouraged to write independently first so that advice can be provided on written work rather than risk forgetting how the verbal advice affects his achievement level.
As teachers, when we write these comments, we hope that the student and his/her parents read the comment, assimilate the feedback, and improve. We have no way to know for sure that this happens. But I set out to learn more…
A breakdown of the comments from trimester one. In total for trimester one there were:
* Students: 133
* Reports: 1066
* Words: 72,921
Let’s start by getting the top 10 words that were used in trimester one comments.
t1.corpus <- GetReportsDFfromMBcsv("data/t1 comments.csv") %>%
AnonymizeReport() %>%
GetCorpusFromReportDF()
t1.top10 <- t1.corpus %>%
DocumentTermMatrix() %>%
CollapseAndSortDTM() %>%
head(10) %>%
select(Words, freq) %>%
mutate( per.1000 = round(1000 * freq / 72921,2))
| Words | Frequency | Frequency per 1000 words |
|---|---|---|
| and | 3646 | 50.00 |
| the | 3358 | 46.05 |
| xxx | 2076 | 28.47 |
| his | 1554 | 21.31 |
| her | 1502 | 20.60 |
| she | 1317 | 18.06 |
| has | 911 | 12.49 |
| this | 890 | 12.20 |
| work | 859 | 11.78 |
| for | 822 | 11.27 |
Wow, how insightful! (not). Let’s look instead at n-grams (i.e. phrases).
#Set tokenizer function to phrases 4 to 8 words in length
DersTokenizer <- function(x) {NGramTokenizer(x, Weka_control(min = 4, max = 8))}
options(mc.cores=1) #strange RJava workaround
t1.top1000 <- t1.corpus %>%
DocumentTermMatrix(control=list(tokenize = DersTokenizer)) %>%
CollapseAndSortDTM() %>%
mutate(length = CountWords(Words)) %>%
mutate(LenNorm = length * freq) %>%
arrange(desc(LenNorm)) %>%
head(1000)
t1.pruned <- GetPrunedList(t1.top1000, 100)
| Phrases | Frequency X Length |
|---|---|
| encouraged that parents review task specific comments and | 616 |
| needs to be able to | 315 |
| and understanding to solve problems set in familiar | 304 |
| to improve xxx can work on | 276 |
| i would like to see | 260 |
| member of the group who | 260 |
| proficiency with the mathematical concepts covered in this | 248 |
| information to make scientifically supported judgments | 246 |
| to improve in this | 236 |
| good understanding of the | 232 |
| xxx is able to | 228 |
| a specific problem or issue | 225 |
Now we are getting somewhere! These are the phrases that were most used to describe students in the first trimester.
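For readers without RWeka, the idea behind the "Frequency X Length" ranking can be sketched in base R. The three comments below are invented stand-ins for the anonymized corpus, the `ngrams` helper is hypothetical, and the sketch uses 4- to 5-word phrases instead of 4 to 8 to keep the output short. It is an illustration of the technique, not a reimplementation of the custom functions used above.

```r
# Hypothetical mini-corpus standing in for the anonymized comments
comments <- c(
  "xxx is able to work well in class",
  "xxx is able to explain the key concepts",
  "xxx needs to be able to work well in class"
)

# Return all n-word phrases from one comment (empty if the comment is too short)
ngrams <- function(text, n) {
  words <- strsplit(tolower(text), "\\s+")[[1]]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Collect 4- and 5-word phrases across all comments and count them
phrases <- unlist(lapply(comments, function(cm) {
  unlist(lapply(4:5, function(n) ngrams(cm, n)))
}))
counts <- as.data.frame(table(Words = phrases), stringsAsFactors = FALSE)
names(counts)[2] <- "freq"

# Weight frequency by phrase length, mirroring the LenNorm column
counts$length <- lengths(strsplit(counts$Words, " "))
counts$LenNorm <- counts$length * counts$freq
counts <- counts[order(-counts$LenNorm), ]

head(counts, 5)
```

Multiplying by length favors longer repeated phrases over their shorter fragments. Overlapping phrases still crowd the top of the list, which is presumably why a pruning step is applied before the table is shown.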
What I want to do now is compare two groups and the language used to describe each. These are the students who were in the:
* Top 25% in terms of improvement, and
* Bottom 25% in terms of improvement.
To do this I will use the term-frequency/inverse-document-frequency metric (tf-idf), which I discovered through an article on Nate Silver’s FiveThirtyEight blog titled “These Are The Phrases Each GOP Candidate Repeats Most” (Milo Beckman 2016). In it, Beckman analyzes 2016 GOP debate transcripts to find unique phrases for each candidate.
In this analysis, I employ tf-idf to find phrases that are more likely to have been used to describe students who improved than those who didn’t.
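As a rough illustration of the metric, here is a minimal base R sketch using one common definition: term frequency times the natural log of (number of documents / number of documents containing the term). The phrase lists are invented, and the custom tf-idf helper used in this analysis may use a different logarithm base, normalization, or length weighting.

```r
# Hypothetical phrase lists for two "documents" (one per group of students)
docs <- list(
  top    = c("class discussions", "needs to improve",
             "class discussions", "in the classroom"),
  bottom = c("needs to improve", "he did not",
             "in the classroom", "he did not")
)

terms <- sort(unique(unlist(docs)))

# Term frequency: share of each document's phrases taken up by each term
tf <- sapply(docs, function(d) {
  as.numeric(table(factor(d, levels = terms))) / length(d)
})
rownames(tf) <- terms

# Inverse document frequency: log(N / number of documents containing the term)
doc_freq <- rowSums(tf > 0)
idf <- log(length(docs) / doc_freq)

# Each row of tf is scaled by that term's idf
tfidf <- tf * idf

round(tfidf, 3)
```

Under this definition, with only two documents any phrase that appears in both groups scores exactly zero, so only phrases exclusive to one group can rank at all.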
year.report <- year.report %>%
#take only needed columns
select(Student.ID, Class.ID:Teacher,
CriMean.t1, CriMean.t2, t12.growth, t12.growth.center) %>%
#add quartile column based on centered growth
within(t12.growth.center.quartile <- as.integer(cut(t12.growth.center,
quantile(t12.growth.center, probs=0:4/4,
na.rm = TRUE), include.lowest=TRUE))) %>%
#add index to crossref w/ corpus
mutate(ID.SUB = paste(Student.ID, Subject))
#get 4 quartiles of ID.SUB's
quarts <- c(1,2,3,4)
quartiles <- lapply(quarts, function(x) {
year.report %>% filter(t12.growth.center.quartile == x) %>%
.$ID.SUB})
#paste each quartile's comments into one comment
quartile.comments <- lapply(quartiles, function(x) {
idx <- t1.corpus %>% meta(tag = "ID.SUB") %in% x
do.call(paste,content(t1.corpus[idx])) })
#take only Q1 and Q4
topbot.comments <- quartile.comments[-c(2,3)]
#make corpus (1 quartile = 1 document)
topbot.corpus <- VectorSource(topbot.comments) %>% Corpus
#make tf-idf matrices from corpus with phrases 2 to 4 words long
all.tfidf <- GetAllTfIdfMatricesFromCorpus(topbot.corpus, 2,4, norm = TRUE)
#remove repetitive words
all.pruned <- lapply(all.tfidf, GetPrunedList, prune_thru = 300)
top <- all.pruned[[1]] %>%
transmute(ngrams, Score = tfidfXlength * 100000) %>%
filter(Score >= 20)
bottom <- all.pruned[[2]] %>%
transmute(ngrams, Score = tfidfXlength * 100000) %>%
filter(Score >= 20)
| ngrams (top 25% of improvers) | Score |
|---|---|
| class discussions as | 30.84788 |
| a focus for the | 29.37893 |
| he needs to improve | 29.37893 |
| xxx should try to | 29.37893 |
| at the start | 26.44104 |
| a friendly student who | 23.50314 |
| a very polite cooperative | 23.50314 |
| achieve to an even | 23.50314 |
| an excellent understanding of | 23.50314 |
| and should be written | 23.50314 |
| be measured and evaluated | 23.50314 |
| became somewhat noticeable while | 23.50314 |
| cooperative and hardworking student | 23.50314 |
| goals should be based | 23.50314 |
| had some difficulty when | 23.50314 |
| has shown that he | 23.50314 |
| her contributions to the | 23.50314 |
| her strong performance xxx | 23.50314 |
| invest more time at | 23.50314 |
| is to more clearly | 23.50314 |
| next unit for xxx | 23.50314 |
| on display when she | 23.50314 |
| positive influence on the | 23.50314 |
| tasks on time and | 23.50314 |
| the key concepts i | 23.50314 |
| the technical skills needed | 23.50314 |
| the year well she | 23.50314 |
| will find success by | 23.50314 |
| group work xxx | 22.03420 |
| in the classroom | 22.03420 |
| of work | 20.56525 |
| ngrams (bottom 25% of improvers) | Score |
|---|---|
| and more comfortable sharing | 33.85527 |
| as the year has | 33.85527 |
| at first but as | 33.85527 |
| for next | 28.21272 |
| and i look forward | 28.21272 |
| it is important that | 28.21272 |
| he did not | 25.39145 |
| her knowledge and | 25.39145 |
| achievement in all tasks | 22.57018 |
| an area of focus | 22.57018 |
| an investigation but performed | 22.57018 |
| difficult for him to | 22.57018 |
| evident while assessing criterion | 22.57018 |
| forward to seeing this | 22.57018 |
| from the text to | 22.57018 |
| homework on time and | 22.57018 |
| lines of reasoning xxx | 22.57018 |
| plan and create videos | 22.57018 |
| still needs to improve | 22.57018 |
| the requirements of designing | 22.57018 |
| to ensure that she | 22.57018 |
| to make sure he | 22.57018 |
| understanding of the concept | 22.57018 |
| well in the swimming | 22.57018 |
| xxx is an energetic | 22.57018 |
| xxx started to develop | 22.57018 |
| it has been | 21.15954 |
| receive valuable feedback | 21.15954 |
| result on the | 21.15954 |
My first conclusion is that there are a lot of similarities between the most unique phrases describing the top 25% and bottom 25% of improvers. There are no real standout phrases that would clue anyone in to the idea that a student might improve in the future. That makes sense: there are certainly a lot of other factors that go into whether a student improves or not.
If you were to read 100 comments, you would start to see a lot of similarity among them. The similarity, however, is primarily semantic (ideas) and not syntactic (grammar and word choice). The methodology I employed cannot detect semantic similarity, only literal similarity. Thus, it is more likely that we see phrases repeated by one teacher about different students. This is the case with “more and more comfortable sharing”, a phrase one teacher used sixteen times. The same goes for “as the year has progressed” (17 times). However, “and I look forward to” was used by three different teachers in nine different reports.
More than anything, these tables don’t conclusively answer my research question. Instead, I have more questions than when I started.
One phrase worthy of note from the bottom 25% is “achievement in all tasks”. A search of the comments reveals more context: “xxx needs more consistent achievement in all tasks”. This suggests that inconsistent performance might not often lead to improvement in the coming grading period.
That being said, a few phrases stood out to me as teacher euphemisms. For example, I get the feeling that “more and more comfortable sharing” really means “your son/daughter is shy but working on it.”
In the top 25% table, “a focus for the next” falls under the category of something just one teacher said frequently. However, the context for “class discussions as” comes from two teachers who say that, despite the student’s polite nature and high achievement, “I would like to see her participate more in class discussions as she has a lot to offer”. The frequent use of this phrase to describe high-achieving students almost suggests an inverse relationship between achievement and class participation.
Milo Beckman. 2016. “These Are The Phrases Each GOP Candidate Repeats Most | FiveThirtyEight.” https://fivethirtyeight.com/features/these-are-the-phrases-each-gop-candidate-repeats-most/.
[1] Assuming that criteria level distribution should be level in the first place.
[2] The average growth should be the difference between the Criteria Means of trimester one and two, but the average growth excludes mid-year students who: 1) typically don’t do well in their 1st trimester of MYP and 2) are not represented in the T1-T2 growth statistic.
[3] A lot of assumptions going on here….
[4] Name removed for privacy.