Supporting Multilingual Learners: Conversational Sequence Types

Author

Katie Jiang

Introduction

One of the most frequent concerns that teachers, instructional assistants, and tutors bring to Multilingual Learner (ML) teachers is whether they are helping their Multilingual Learners the “right” way. When asked to specify what they mean, the vast majority of the time the root of their concern is the types of interactions they have with their students. In this paper I use the Teacher-Student Chatroom Corpus to examine the types of interactions experienced English Learner teachers have with their students, to give other teachers information on which to center their self-reflections about interacting with English Learners.

The Teacher-Student Chatroom Corpus (TSCC) contains 260 chatroom dialogues of online English language lesson tutoring between 2 teachers and 13 English learners. There are 42.4K conversational turns and 363K word tokens. Students’ responses during the lessons ranged in CEFR proficiency level from B1 to C2 (intermediate to mastery). The conversations between teachers and students were analyzed for conversational sequence types, pedagogical focus, and correction of grammatical errors. “Conversational sequence types” refers to sections of conversations that the teachers in the study started with a specific purpose (Caines et al., 34). The authors of the Chatroom Corpus gathered the dataset to study one-to-one interaction and language teaching at different levels of English proficiency, among other aims related to machine learning.

Research Focus

In this study, I focus on the conversational sequence types annotated in the TSCC that classroom teachers, instructional assistants, and tutors commonly use with students in one-on-one and small-group lessons. These sequence types are intentional “teacher moves”: strategic actions educators use to facilitate student learning. They also correspond to the interactions that teachers are most concerned about. The main sequence types from the TSCC examined in this paper are:

Presentation: presenting or explaining a linguistic skill or knowledge component
Eliciting: continuing to seek out a particular response or realization by the student
Repair: correction of a previous linguistic sequence
Scaffolding: giving helpful support to the student
- lexical resource: appropriate and varied use of vocabulary
- grammatical resource: appropriate use of grammar

In this paper scaffolding is measured in two foci, lexical resource and grammatical resource. The data showed that splitting scaffolding this way gives a clearer picture of what experienced teachers do with their MLs. The scaffolding sequence type is explored further in the Analysis section of this paper.

To address teachers’ specific concerns, the following are real questions ML teachers have been asked recently, each paired with the conversational sequence types it relates to:

Am I correcting them too much? (repair, scaffolding: grammatical resource)
I want them to feel comfortable talking, but I don’t want to put pressure on them. Am I asking them too many questions? Am I not giving them enough chances to speak? (eliciting, scaffolding: lexical resource)
Am I explaining too much for my MLs? I feel like I’m just talking at them. (presentation, scaffolding: lexical resource)

As reflected in these questions, many teachers want to know “the right amount”. While the goal of this analysis is not to provide specific numbers, I aim to look for patterns or trends.

I will look at two focal research questions:

Which conversational sequence types do English Learner teachers use most often?
Does the frequency of conversational sequence types used by English Learner teachers vary between student proficiency levels?

In analyzing the data from the TSCC, I hope teachers can see patterns that can begin a framework for discussion about the best methods for their Multilingual Learners.

Wrangle

Multiple steps were taken to prepare data for analysis.

The tidyverse and associated libraries were loaded.
Several files from the TSCC were found to have a timestamp column stored as a character value, while others had a timestamp stored as a UTC date-time value in month/day/year format. This mismatch prevented bind_rows() from combining the files, so the incompatible files had to be identified. Some files had multiple other inconsistencies and were left out of this analysis; others were easier to modify and were included in the merge after their timestamp column was updated to a UTC value. Because of the number of chats (260) and the number of incompatible files found, only a portion of the TSCC data files was used.
A chat.num column was added to each conversation so that sequence types would be labeled with the chat number. The chat files were then joined with the metadata file, which contains the proficiency level determined for each chat. Note: proficiency levels were assigned by conversation, not by student, meaning there were 260 proficiency assignments (one for each chat) rather than 13 (one for each student).
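The timestamp mismatch described in the second step can be reproduced on toy data. The sketch below is an illustration under stated assumptions, not the code used for this analysis: the two small data frames, the helper name fix_timestamp, and the timestamp format string are all hypothetical stand-ins for real TSCC files.

```r
library(dplyr)

# Hypothetical stand-ins for two chat files (not real TSCC rows)
chat_a <- data.frame(chat.num = 1, timestamp = "03/15/2020 10:00:00",
                     stringsAsFactors = FALSE)
chat_b <- data.frame(chat.num = 2,
                     timestamp = as.POSIXct("2020-03-16 11:30:00", tz = "UTC"))
chats <- list(chat_a, chat_b)

# Step 1: find incompatible files by listing each timestamp column's class
timestamp_classes <- sapply(chats, function(df) class(df$timestamp)[1])

# Step 2: confirm that bind_rows() refuses to mix character and date-time columns
bind_fails <- tryCatch({ bind_rows(chats); FALSE }, error = function(e) TRUE)

# Step 3: parse character timestamps into UTC date-times
# (the month/day/year format string is an assumption about the file contents)
fix_timestamp <- function(df) {
  if (is.character(df$timestamp)) {
    df$timestamp <- as.POSIXct(df$timestamp,
                               format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
  }
  df
}

# After harmonizing the column type, the files stack without error
combined <- bind_rows(lapply(chats, fix_timestamp))
```

Listing the column classes first makes it possible to see how many files need repair before deciding which ones to fix and which to leave out.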

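# Core tidyverse (attaches dplyr, tidyr, readr, purrr, ggplot2, and stringr)
library(tidyverse)
# readxl is installed with the tidyverse but must be loaded separately
library(readxl)

# List every chat workbook in the local TSCC data folder
files <- list.files(path = "C:/Users/katie/OneDrive/Documents/TSCCData/", pattern = "\\.xlsx$", full.names = TRUE)

# Read each workbook and stack the rows into one data frame
TSCC_data <- files |>
  map_dfr(read_excel)

# Chat-level metadata, including the CEFR level assigned to each chat
TSCC_metadata <- read_csv("C:/Users/katie/OneDrive/Documents/TSCCData/teacherstudentchatroommetadata.csv")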

Rows: 260 Columns: 16
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): filename, start.time, teacher, student, student.cefr.level, student...
dbl (8): chat.num, n.turns, n.teacher.turns, n.student.turns, n.words, n.tea...
lgl (1): is_sett_annotated

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Attach the chat-level metadata to every conversational turn
joined_TSCC_data <- left_join(TSCC_data, TSCC_metadata, by = "chat.num")
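Because proficiency levels live in the metadata, the join only behaves as intended if each chat number appears once there and every chat has a metadata row. A minimal sketch of those two checks follows; the toy data frames turns and meta are hypothetical and stand in for TSCC_data and TSCC_metadata.

```r
library(dplyr)

# Hypothetical toy tables (not real TSCC rows)
turns <- data.frame(
  chat.num = c(1, 1, 2, 3),
  seq.type = c("eliciting", "repair", "scaffolding", "repair"),
  stringsAsFactors = FALSE
)
meta <- data.frame(
  chat.num = c(1, 2),
  student.cefr.level = c("B1", "C1"),
  stringsAsFactors = FALSE
)

# Check 1: a chat.num repeated in the metadata would duplicate turns in the join
duplicate_keys <- meta |>
  count(chat.num) |>
  filter(n > 1)

# Check 2: turns whose chat has no metadata row would get a missing level
unmatched_turns <- anti_join(turns, meta, by = "chat.num")

# The left join keeps every turn, matched or not
joined <- left_join(turns, meta, by = "chat.num")
```

In this toy example chat 3 has no metadata, so its turn survives the join with a missing proficiency level.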

The data were then narrowed to the columns for chat number, sequence type, focus, and proficiency level. The timestamp and edited columns were also kept in case further troubleshooting or analysis was needed. Rows with NA (entered as a character value) for sequence type or student proficiency level were removed.

All_seq_types <- joined_TSCC_data |>
  select(chat.num, timestamp, edited, seq.type, focus, student.cefr.level) |>
  # "NA" is stored as text in these columns, so compare against the string
  filter(seq.type != "NA") |>
  filter(student.cefr.level != "NA")
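The difference between the text "NA" and a real missing value is easy to overlook, so here is a small sketch on hypothetical toy data. It compares the string comparison used above with an alternative that first converts the text to a real missing value using dplyr's na_if(); the data frame toy is an assumption made for illustration.

```r
library(dplyr)

# Hypothetical rows mixing the text "NA" with a real missing value
toy <- data.frame(
  seq.type = c("eliciting", "NA", "repair", NA),
  student.cefr.level = c("B1", "B2", "NA", "C1"),
  stringsAsFactors = FALSE
)

# Approach used above: compare against the text "NA".
# Real missing values are dropped too, because NA != "NA" evaluates to NA
# and filter() keeps only rows where the condition is TRUE.
by_string <- toy |>
  filter(seq.type != "NA") |>
  filter(student.cefr.level != "NA")

# Alternative: turn the text into a real missing value, then drop it
by_na_if <- toy |>
  mutate(across(c(seq.type, student.cefr.level), ~ na_if(.x, "NA"))) |>
  filter(!is.na(seq.type), !is.na(student.cefr.level))
```

Both approaches keep the same single complete row here; the second makes the missing values visible to functions such as is.na() in later steps.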

Research Question: Which conversational sequence types do English Learner teachers use most often?

Analysis

To answer the first research question, I ran counts of the sequence types, starting with a count of all sequence types. Several other sequence types ranked above presentation (enquiry, topic development, exercise, and topic opening), which may be valuable to analyze at a later date. For the purposes of this study, I chose to stay with the four sequence types listed earlier, because the types that ranked above presentation can all also be initiated by students, even if some only to a minor degree.

#count values in all sequence types
All_seq_types |>
  count(seq.type) |>
  arrange(desc(n))

# A tibble: 64 × 2
   seq.type                  n
   <chr>                 <int>
 1 scaffolding             904
 2 eliciting               277
 3 enquiry                 242
 4 repair                  232
 5 topic development       165
 6 exercise                138
 7 topic opening           130
 8 presentation             69
 9 scaffolding,eliciting    65
10 opening                  62
# ℹ 54 more rows
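The table above has 64 distinct values partly because some turns carry combined labels such as "scaffolding,eliciting", which count() treats as their own category. The sketch below shows one possible way to count each label separately with tidyr's separate_rows(). It is an alternative for later exploration, not the method used in this paper, and toy_turns is hypothetical data.

```r
library(dplyr)
library(tidyr)

# Hypothetical turns, one of which carries a combined label
toy_turns <- data.frame(
  seq.type = c("scaffolding", "scaffolding,eliciting", "eliciting", "repair"),
  stringsAsFactors = FALSE
)

# Split combined labels into one row per label, then count each label
split_counts <- toy_turns |>
  separate_rows(seq.type, sep = ",") |>
  count(seq.type, sort = TRUE)
```

With this approach a turn labeled with two sequence types contributes to both counts, so the totals no longer sum to the number of turns.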

I then narrowed the count to the four primary conversational sequence types. Scaffolding was by far the most used strategy, appearing more than three times as often as the next most common type, eliciting (904 vs. 277).

# Filter for the four major sequence types
Major_seq_types_counts <- All_seq_types |>
  filter(seq.type %in% c("eliciting", "repair", "presentation", "scaffolding")) |>
  count(seq.type) |>
  arrange(desc(n))
print(Major_seq_types_counts)

# A tibble: 4 × 2
  seq.type         n
  <chr>        <int>
1 scaffolding    904
2 eliciting      277
3 repair         232
4 presentation    69
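To put the four counts on a common scale, the short sketch below converts them to percentage shares and computes how many times more frequent scaffolding is than eliciting. The counts are copied from the table above; the variable names are illustrative.

```r
# Counts copied from the four-row table printed above
counts <- c(scaffolding = 904, eliciting = 277, repair = 232, presentation = 69)

# Percentage share of each move among the four major sequence types
share <- round(100 * counts / sum(counts), 1)

# How many times more frequent scaffolding is than the runner-up, eliciting
ratio_to_eliciting <- unname(counts["scaffolding"] / counts["eliciting"])

print(share)
print(round(ratio_to_eliciting, 2))
```

Scaffolding accounts for roughly 61% of these 1,482 annotated moves and occurs a little over three times as often as eliciting.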

ggplot(Major_seq_types_counts, aes(x = reorder(seq.type, -n), y = n, fill = seq.type)) +
  geom_col(width = 0.9) +
  coord_flip() +
  labs(title = "Supporting Multilingual Learners", x = "Teacher Move (sequence type)", y = "Counts", fill = "Sequence Type") +
  theme(
    plot.title = element_text(hjust = 0.5),
    axis.text.y = element_text(size = 6),
    axis.title.y = element_text(size = 8),
    axis.text.x = element_text(size = 6),
    axis.title.x = element_text(size = 8),
    legend.text = element_text(size = 6),
    legend.title = element_text(size = 7)
  )

However, the term “scaffolding” was initially simply defined as “giving helpful support to a student.” What are teachers actually doing when they are scaffolding for students?

Types of Scaffolding

The authors of the study annotated some sequence types with the specific instructional focus the teacher had during that conversational turn of the lesson. To provide more useful information from our data, I broke down the scaffolding category into these foci to find the primary areas in which teachers supported students.

Major_seq_types_scaf_breakdown_counts <- All_seq_types |> 
  filter(seq.type %in% c("presentation", "eliciting", "repair", "scaffolding")) |>
  mutate(display_group = if_else(seq.type == "scaffolding", paste(seq.type, focus, sep = ": "), seq.type)) |> 
  count(display_group, sort = TRUE)
print(Major_seq_types_scaf_breakdown_counts)

# A tibble: 52 × 2
   display_group                                          n
   <chr>                                              <int>
 1 scaffolding: lexical resource                        308
 2 eliciting                                            277
 3 repair                                               232
 4 scaffolding: grammatical resource                    173
 5 scaffolding: lexical resource,meaning                115
 6 presentation                                          69
 7 scaffolding: task achievement                         29
 8 scaffolding: lexical resource,grammatical resource    26
 9 scaffolding: world knowledge                          23
10 scaffolding: NA                                       20
# ℹ 42 more rows

After examining the data, it appears that teachers had two primary areas of scaffolding: lexical resource and grammatical resource. The authors defined these as follows:

Scaffolding lexical resource: helping the student with appropriate and varied use of vocabulary
Scaffolding grammatical resource: helping the student with appropriate use of grammar

After initially planning on only looking at conversational sequence types without instructional focus, I included these types of scaffolding foci in the analysis to provide more clarity on actions teachers took in this study to support multilingual learners.
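A quick calculation shows how much of the scaffolding category these two foci account for. The numbers are copied from the counts printed earlier in this section; the variable names are illustrative.

```r
# Total scaffolding turns, from the count of major sequence types
scaffolding_total <- 904

# Turns labeled with exactly one of the two primary foci
focus_counts <- c("lexical resource" = 308, "grammatical resource" = 173)

# Share of all scaffolding turns covered by the two foci
covered_share <- sum(focus_counts) / scaffolding_total

print(round(100 * covered_share, 1))
```

The two foci cover a little over half of the scaffolding turns. The remainder carries other or combined focus labels, such as "lexical resource,meaning", which are not part of the revised sequence types.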

Continuing the Analysis with Revised Sequence Types

After counting and visualizing the revised sequence types, which account for the major forms of scaffolding, there is a clearer picture of the types of interactions teachers are having with their students. Vocabulary support (lexical resource scaffolding, 308) and seeking out responses (eliciting, 277) are the most frequent moves, followed by a gradual decline through repair (232) and grammatical resource scaffolding (173). Presentation of material (69) is the least used strategy during these one-on-one lessons.

Top5_seq_types_scaf_breakdown_counts <- All_seq_types |>
  filter(
    seq.type %in% c("presentation", "eliciting", "repair") |
      (seq.type == "scaffolding" & focus %in% c("lexical resource", "grammatical resource"))
  ) |>
  mutate(seq.type = if_else(seq.type == "scaffolding", paste(seq.type, focus, sep = ": "), seq.type)) |>
  count(seq.type, sort = TRUE)
print(Top5_seq_types_scaf_breakdown_counts)

# A tibble: 5 × 2
  seq.type                              n
  <chr>                             <int>
1 scaffolding: lexical resource       308
2 eliciting                           277
3 repair                              232
4 scaffolding: grammatical resource   173
5 presentation                         69

ggplot(Top5_seq_types_scaf_breakdown_counts, aes(x = reorder(seq.type, -n), y = n, fill = seq.type)) +
  geom_col(width = 0.9) +
  coord_flip() +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 12)) +
  scale_fill_discrete(labels = function(x) str_wrap(x, 12)) +
  labs(title = "Supporting Multilingual Learners", x = "Teacher Move (sequence type)", y = "Counts", fill = "Sequence Type") +
  theme(
    plot.title = element_text(hjust = 0.5),
    axis.text.y = element_text(size = 6),
    axis.title.y = element_text(size = 8),
    axis.text.x = element_text(size = 6),
    axis.title.x = element_text(size = 8),
    legend.text = element_text(size = 6),
    legend.title = element_text(size = 7)
  )

Research Question: Does the frequency of conversational sequence types used by English Learner teachers vary between student proficiency levels?

Counting Sequence Types by Proficiency

Students’ performance during each lesson was coded using CEFR levels. In this study, students ranged from B1 to C2:

B1: Intermediate
B2: Upper Intermediate
C1: Advanced
C2: Proficient

To answer my next question, I found the count of each major sequence type within each proficiency level and then displayed the data on a bar graph. This immediately showed a distinct difference in the usage of strategies between proficiency levels, particularly with the strategies of eliciting and scaffolding lexical resource.

# Counts by proficiency level and sequence type
# (assigned to its own object so the earlier top-five counts are not overwritten)
proficiency_seq_type_counts <- All_seq_types |>
  filter(
    seq.type %in% c("presentation", "eliciting", "repair") |
      (seq.type == "scaffolding" & focus %in% c("lexical resource", "grammatical resource"))
  ) |>
  mutate(seq.type = if_else(seq.type == "scaffolding", paste(seq.type, focus, sep = ": "), seq.type)) |>
  count(student.cefr.level, seq.type) |>
  group_by(student.cefr.level) |>
  arrange(desc(n), .by_group = TRUE)

print(proficiency_seq_type_counts)

# A tibble: 20 × 3
# Groups:   student.cefr.level [4]
   student.cefr.level seq.type                              n
   <chr>              <chr>                             <int>
 1 B1                 eliciting                           112
 2 B1                 repair                               41
 3 B1                 scaffolding: grammatical resource    37
 4 B1                 scaffolding: lexical resource        35
 5 B1                 presentation                         14
 6 B2                 eliciting                           111
 7 B2                 repair                               79
 8 B2                 scaffolding: lexical resource        73
 9 B2                 presentation                         32
10 B2                 scaffolding: grammatical resource    25
11 C1                 scaffolding: lexical resource       120
12 C1                 repair                               61
13 C1                 scaffolding: grammatical resource    44
14 C1                 eliciting                            22
15 C1                 presentation                         14
16 C2                 scaffolding: lexical resource        80
17 C2                 scaffolding: grammatical resource    67
18 C2                 repair                               51
19 C2                 eliciting                            32
20 C2                 presentation                          9
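Because the proficiency levels do not contain the same number of annotated moves, raw counts can be hard to compare across levels. The sketch below re-enters the counts from the table above and converts them to shares within each level. The object names proficiency_counts, proficiency_shares, and top_move are illustrative and not part of the original analysis.

```r
library(dplyr)

# Counts copied from the proficiency table printed above
seq_levels <- c("eliciting", "repair", "scaffolding: grammatical resource",
                "scaffolding: lexical resource", "presentation")
proficiency_counts <- data.frame(
  student.cefr.level = rep(c("B1", "B2", "C1", "C2"), each = 5),
  seq.type = rep(seq_levels, times = 4),
  n = c(112, 41, 37, 35, 14,    # B1
        111, 79, 25, 73, 32,    # B2
        22, 61, 44, 120, 14,    # C1
        32, 51, 67, 80, 9),     # C2
  stringsAsFactors = FALSE
)

# Share of each move within its own proficiency level, in percent
proficiency_shares <- proficiency_counts |>
  group_by(student.cefr.level) |>
  mutate(level_total = sum(n), share = round(100 * n / level_total, 1)) |>
  ungroup()

# Most frequent move at each level
top_move <- proficiency_shares |>
  group_by(student.cefr.level) |>
  slice_max(order_by = share, n = 1) |>
  ungroup()

print(top_move)
```

On these counts, eliciting makes up about 47% of the B1 moves, while lexical resource scaffolding makes up about 46% of the C1 moves, which matches the shift visible in the raw counts.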

ggplot(proficiency_seq_type_counts, aes(x = reorder(seq.type, -n), y = n, fill = seq.type)) +
  geom_col(width = 0.9) +
  facet_wrap(~student.cefr.level, ncol = 2) +
  coord_flip() +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 12)) +
  scale_fill_discrete(labels = function(x) str_wrap(x, 12)) +
  labs(title = "Supporting Multilingual Learners", x = "Teacher Move (sequence type)", y = "Counts", fill = "Sequence Type") +
  theme(
    plot.title = element_text(hjust = 0.5),
    axis.text.y = element_text(size = 6),
    axis.title.y = element_text(size = 8),
    axis.text.x = element_text(size = 6),
    axis.title.x = element_text(size = 8),
    legend.text = element_text(size = 6),
    legend.title = element_text(size = 7)
  )

Key Findings

Does the frequency of conversational sequence types used by English Learner teachers vary between student proficiency levels?

Yes, teachers’ use of strategies appears to differ noticeably by proficiency level:
- With intermediate (B1) students, eliciting is used more than twice as often as any other strategy (112 vs. 41 for repair).
- With upper intermediate (B2) students, teachers use more of a balance of strategies, but the focus is still on eliciting responses (111).
- With advanced (C1) students, the focus shifts to a strong emphasis on word choice (lexical resource scaffolding, 120).
- With proficient (C2) students, the primary focus is appropriate word and grammar usage (80 and 67), with very little presentation of material (9).

Which conversational sequence types do English Learner teachers use most often?

Scaffolding is by far the most frequent sequence type, but it can be broken into many smaller foci.
Scaffolding lexical resource, eliciting, and repair are the top three conversational sequence types once scaffolding is split by focus.
Presentation is the least common of the sequence types examined.

Limitations & Action

This study covers only two teachers, working in real time and online. In addition, only a portion of the 260 chats was analyzed because of the file inconsistencies noted in the Wrangle section. Not all students have the same learning styles, even at the same proficiency level, and teachers vary greatly in teaching styles. While these teachers were selected for this study because of their experience as English Learner teachers, other experienced teachers may use a different frequency of strategies.

However, because the TSCC is a large, systematically annotated dataset, these data can be used as a basis for reflection and discourse. This information can be a tool for staff when reflecting on interactions with MLs. With guidance from coaches and ML teachers, other staff can consider the teachers’ use of strategies in this study and reflect on their own strategies, for example in coaching sessions or professional development. ML teachers can either pre-label students or guide teachers in identifying which students are at each proficiency level. Then coaches, ML teachers, and administrators can use a series of self-reflection questions to guide teachers, such as:
- Think of MLs you know at these proficiency levels. Why do you think these teachers used these strategies more with this proficiency level?
- Does it match the needs of your students at this level? Why or why not?
- Are there other strategies you would use more?
- What strategies do you think you might try to use more than you do now?
- What strategies do you feel you need more practice or training with?

Rather than providing specific proportions of strategies to use, I hope to provide a tool for teachers to reflect on which teacher moves English Learner teachers use the most, and why, so they can begin transforming their own practice.

References

Caines, Andrew, et al. “The Teacher-Student Chatroom Corpus Version 2: More Lessons, New Annotation, Automatic Detection of Sequence Shifts.” Proceedings of the 11th Workshop on NLP for Computer Assisted Language Learning, 2022.