In 2006, Wing proposed that computational thinking (CT) is a fundamental analytical skill for 21st-century citizens, a position that spurred efforts to infuse CT into K-12 education (Angeli et al., 2016; Wing, 2006). In 2010, the National Research Council (NRC) echoed Wing's advocacy for infusing CT into education for all by defining CT as "a cognitive skill that an average person is expected to possess" (NRC, 2010; Yadav et al., 2016). In addition, the Computer Science Teachers Association (CSTA) highlighted the importance of CT in K-12 classrooms as "a problem-solving methodology that can be automated and transferred and applied across subjects" (as cited in Barr & Stephenson, 2011; Yadav et al., 2014). The current International Society for Technology in Education (ISTE) Standards for Students likewise emphasize the need for younger generations to develop CT skills in order to navigate a digital world (ISTE, 2016). Successful classroom implementation of CT relies in large part on teachers; therefore, we must equip teachers with the knowledge and skill sets to integrate CT (Yadav et al., 2014). Accordingly, I conducted this case study to explore 1) how pre-service teachers understand CT and 2) what their attitudes and preliminary ideas are on how to integrate CT into classrooms.
What is CT? The CSTA and ISTE provide an operational definition of CT.
The dataset I used for this case study comes from a larger research project on promoting pre-service teachers' CT development, in which I have been actively involved. For this case study, I looked at a sample of pre-service teachers who participated in a college-level course that introduces computational thinking and provides learning activities for experiencing computational tools. Participants were asked to complete a course survey at the beginning and at the end of the course.
For this case study, I focused on analyzing the survey data and selected the following variables, which are highly relevant to my study purposes. First, I focused on two constructs: "CT Definition" and "Knowledge and Belief". In addition, I looked at two open-ended questions: the first asked pre-service teachers to explain CT, and the second asked how they would integrate CT into classrooms.
I personally think that the data wrangling phase is very important but also challenging. For this project, I tried different ways to clean and format this sample dataset. While exploring and analyzing the data, I realized that I had missed some steps during the wrangling phase, so I had to go back and see whether anything could have been done differently. The steps below present the finalized wrangling process: I first introduce the needed packages and then describe each step with the specific R code.
I used the following R packages to wrangle and analyze data.
devtools::install_github("gaospecial/wordcloud2")
library(tidyverse)
## -- Attaching packages -------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.1 v dplyr 1.0.0
## v tidyr 1.1.0 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## -- Conflicts ----------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(readxl)
library(skimr)
library(tidytext)
library(vader)
library(wordcloud2)
library(textdata)
Prior to the wrangling process, I de-identified the dataset. The raw dataset contains 1065 rows and 56 columns.
#load original dataset
dat <- read.csv("Data/all4LA.csv")
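Since skimr is loaded above, a quick overview can confirm the raw data's structure before any columns are selected. This is an optional sanity check, not part of the original pipeline:
#optional sanity check on the raw data
dim(dat)  #expect 1065 rows and 56 columns
skim(dat) #overview of variable types, missingness, and distributions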
As mentioned in the introduction, I selected two constructs and two open-ended questions from the dataset. The constructs are 1) "CT Definition", which contains 4 survey items, and 2) "Knowledge and Belief", which contains 5 survey items. In addition to selecting the needed items, I also kept the "StartDate", "Progress", and "ID" variables; these are essential for cleaning and structuring the data for later analysis.
dat_select <- dat %>% select(
StartDate,
Progress,
id=D1,
ct_meaning=Q2,
ct_int_text=Q32,
def_1=Q5_1,
def_2=Q5_2,
def_3=Q5_3,
def_4=Q5_4,
kb_1=Q5_22,
kb_2=Q5_23,
kb_3=Q5_24,
kb_4=Q5_25,
kb_5=Q5_26
)
dim(dat_select)
## [1] 1065 14
I kept the "Progress" variable because it helped me identify completed data entries and remove incomplete ones. After running the code chunk below, the dataset has 967 completed data entries.
dat_clean <- dat_select %>% filter(Progress == 100)
dim(dat_clean)
## [1] 967 14
The survey items are Likert-scale questions with four choices from "Strongly Agree" to "Strongly Disagree". I first converted the text of all the Likert-scale items to numeric values, then reversed the scales for the reverse-coded items. Finally, I combined the survey items to generate their respective constructs.
#convert text Likert scales to numeric values
#(across() replaces the deprecated funs()/mutate_at() idiom)
dat_likert <- dat_clean %>%
  mutate(across(def_1:kb_5, ~ recode(.x,
                                     "Strongly Agree" = 4,
                                     "Agree" = 3,
                                     "Disagree" = 2,
                                     "Strongly Disagree" = 1)))
#reverse the scale for reverse-coded items (applied to def_1 and def_3 below)
reverse_scale <- function(question){
x <- case_when(
question == 1 ~ 4,
question == 2 ~ 3,
question == 3 ~ 2,
question == 4 ~ 1,
TRUE ~ NA_real_
)
x
}
dat_likert_reverse <- dat_likert %>%
mutate(def_1 = reverse_scale(def_1),
def_3 = reverse_scale(def_3))
#combine items into constructs
#(ungroup() drops the rowwise grouping so later verbs such as
# slice_sample() and top_n() operate on the whole data frame)
dat_likert_combine <- dat_likert_reverse %>%
  rowwise() %>%
  mutate(
    def = mean(c_across(def_1:def_4)),
    kb = mean(c_across(kb_1:kb_5))
  ) %>%
  ungroup()
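Before moving on, it is worth confirming that the new construct scores fall within the expected range of 1 to 4. A minimal check using skimr might look like this:
#sanity check: construct scores should range between 1 and 4
dat_likert_combine %>% skim(def, kb)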
As I mentioned earlier, the dataset contained the pre and the post responses. In order to specify the time/order/condition for each data entry, I kept the “StartDate” variable to help me define the time for each data entry. For doing this, I first converted the “StartDate” to extract the month, I then assigned the pre or the post condition using the month for each data entry.
#convert the date to extract the month
dat_likert_combine$StartDate <- as.Date(dat_likert_combine$StartDate,format = "%m/%d/%Y")
dat_likert_combine$mon <- format(dat_likert_combine$StartDate,"%m")
#define pre and post condition by month
dat_final <- dat_likert_combine %>% mutate(
con1 = case_when(
mon %in% c("02", "08", "09") ~ "pre",
mon %in% c("05","12") ~ "post"
)
)
dim(dat_final)
## [1] 967 18
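A quick cross-tabulation can verify that every month maps onto the intended condition; an NA in con1 would flag a month that slipped through the case_when() above. This is a sketch of an optional check:
#check how months map onto the pre/post conditions
dat_final %>% count(con1, mon)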
Building on the steps above, I used the R code below to create new datasets for later analysis. Three datasets were created from the raw data: 1) all completed pre-survey entries (505 rows and 18 cols); 2) all completed post-survey entries (342 rows and 18 cols); and 3) the final analysis dataset, which combines the pre and post datasets (847 rows and 6 cols).
#create the pre dataset without duplicate cases
pre <- dat_final %>% filter(con1 == "pre")
pre <- pre[!duplicated(pre$id),]
dim(pre) #505 rows and 18 cols
## [1] 505 18
#create the post dataset without duplicate cases
post <- dat_final %>% filter(con1 == "post")
post <- post[!duplicated(post$id),]
dim(post) #342 rows and 18 cols
## [1] 342 18
#the final dataset for analysis
dat_analysis <- rbind(pre,post) %>% select(c(3:5,15,16,18))
dim(dat_analysis) #847 rows and 6 cols
## [1] 847 6
For the two constructs, I am interested in their distributions in the pre and post conditions. For this purpose, I first reshaped the data to long format with pivot_longer() and then used ggplot2 for visualization.
#data visualization for CT definition, and knowledge and belief
dat_analysis_visual <- dat_analysis %>% pivot_longer(cols = 4:5,
names_to = "construct",
values_to = "response")
dat_analysis_visual$con1 <- as.factor(dat_analysis_visual$con1)
p <- ggplot(dat_analysis_visual, aes(x = con1, y = response, fill = con1)) +
  geom_boxplot(alpha = 0.7) +
  facet_wrap(~construct, labeller = label_both) +
  scale_x_discrete(limits = c("pre", "post")) +
  #`fun` replaces the deprecated `fun.y` argument
  stat_summary(fun = mean, geom = "point", shape = 20, size = 7, color = "yellow", fill = "yellow") +
  theme(legend.position = "none") +
  scale_fill_brewer(palette = "Set1")
p
The boxplots above show that the post-survey responses have higher means and medians on the "CT Definition" and "Knowledge and Belief" constructs than the pre-survey responses. This suggests that pre-service teachers' CT understandings, on average, increased after they participated in the course. I was then interested in whether the open-ended responses aligned with this observation.
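To complement the boxplots, a quick unpaired comparison of the construct scores across conditions could be run. This is only a sketch: the pre and post samples are not matched here, so a paired test must wait for the matched-case analysis mentioned in the conclusion.
#unpaired comparison of construct scores across the pre/post conditions
t.test(def ~ con1, data = dat_analysis)
t.test(kb ~ con1, data = dat_analysis)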
I first explored the open-ended responses to the question asking pre-service teachers to define CT. As the first step of the sentiment analysis, I extracted two datasets: the text responses from the pre survey and those from the post survey.
pre_def <- pre %>% select(c(con1,ct_meaning))
post_def <- post %>% select(c(con1,ct_meaning))
Based on these two datasets, I looked into a small sample of the text responses.
slice_sample(pre_def, n = 10)
## # A tibble: 505 x 2
## # Rowwise:
## con1 ct_meaning
## <chr> <chr>
## 1 pre "Testing."
## 2 pre "...."
## 3 pre "Using computers and or technology to answer the unknown. "
## 4 pre "I think it means that thinking about something in a formulaic way. "
## 5 pre "I think computational thinking would be something related to computer~
## 6 pre "Computational thinking means thinking in terms of using a computer. "
## 7 pre "I think it means being able to integrate your thinking with technolog~
## 8 pre "Thinking of concepts in terms of technology and completing things wit~
## 9 pre "Having students use technology in the classroom?"
## 10 pre "Ways to solve problems"
## # ... with 495 more rows
slice_sample(post_def, n = 10)
## # A tibble: 342 x 2
## # Rowwise:
## con1 ct_meaning
## <chr> <chr>
## 1 post "Computational thinking means thinking outside of the box using relati~
## 2 post "Thinking like a computer scientist."
## 3 post "Computational thinking involves problem solving and thinking about di~
## 4 post "Thinking critically and logically to solve a problem "
## 5 post "A way to think about how computer scientists think. It involves prob~
## 6 post "Computational Thinking means thinking in a way that promotes problem ~
## 7 post "Processes of thinking done by a computer scientist, such an analyzing~
## 8 post "Computational thinking involves kids learning how to do activities wi~
## 9 post "Computational thinking involves problem solving and the organization ~
## 10 post "Computational thinking requires you to break down a task in a way tha~
## # ... with 332 more rows
Next, I combined the pre and post datasets (note that union() also drops exact duplicate rows; bind_rows() would keep them).
ct_def <- union(pre_def, post_def)
Next, I used the R code chunks below to tokenize the text and remove stop words such as "to". I also removed the words "computational" and "thinking" themselves.
#tokenize text
ct_text <- ct_def %>%
unnest_tokens(output = word,
input = ct_meaning)
ct_text %>% count(word, sort = TRUE)
## # A tibble: 1,067 x 2
## # Rowwise:
## word n
## <chr> <int>
## 1 thinking 788
## 2 to 721
## 3 a 637
## 4 and 469
## 5 computational 387
## 6 the 380
## 7 in 341
## 8 of 333
## 9 i 306
## 10 think 299
## # ... with 1,057 more rows
#remove stop words
ct_text_2 <- anti_join(ct_text,
stop_words,
by = "word")
head(ct_text_2)
## # A tibble: 6 x 2
## # Rowwise:
## con1 word
## <chr> <chr>
## 1 pre testing
## 2 pre computers
## 3 pre technology
## 4 pre answer
## 5 pre unknown
## 6 pre means
ct_text_2 %>% count(word, sort = TRUE)
## # A tibble: 775 x 2
## # Rowwise:
## word n
## <chr> <int>
## 1 thinking 788
## 2 computational 387
## 3 technology 273
## 4 means 233
## 5 solving 186
## 6 computer 181
## 7 solve 158
## 8 process 105
## 9 computers 91
## 10 term 75
## # ... with 765 more rows
my_stopwords_1 <- c("computational","thinking","=","+")
ct_text_3 <- ct_text_2 %>%
filter(!word %in% my_stopwords_1)
ct_text_3 %>% count(word, sort = TRUE)
## # A tibble: 773 x 2
## # Rowwise:
## word n
## <chr> <int>
## 1 technology 273
## 2 means 233
## 3 solving 186
## 4 computer 181
## 5 solve 158
## 6 process 105
## 7 computers 91
## 8 term 75
## 9 skills 69
## 10 students 67
## # ... with 763 more rows
Finally, I created two word clouds to show the top 20 words for the pre and the post open-ended question responses.
top_pre_1 <- ct_text_3 %>%
filter(con1 == "pre") %>%
count(word, sort = TRUE) %>%
top_n(20)
## Selecting by n
top_pre_1
## # A tibble: 544 x 2
## # Rowwise:
## word n
## <chr> <int>
## 1 technology 206
## 2 means 175
## 3 computer 92
## 4 computers 64
## 5 term 64
## 6 solve 60
## 7 solving 56
## 8 process 48
## 9 answer 38
## 10 information 33
## # ... with 534 more rows
p1 <- wordcloud2(top_pre_1)
p1
top_post_1 <- ct_text_3 %>%
filter(con1 == "post") %>%
count(word, sort = TRUE) %>%
top_n(20)
## Selecting by n
top_post_1
## # A tibble: 464 x 2
## # Rowwise:
## word n
## <chr> <int>
## 1 solving 130
## 2 solve 98
## 3 computer 89
## 4 technology 67
## 5 means 58
## 6 process 57
## 7 skills 54
## 8 students 42
## 9 involves 33
## 10 steps 31
## # ... with 454 more rows
p2 <- wordcloud2(top_post_1)
p2
The two word clouds above reveal striking differences between the CT definitions given by the pre-service teachers in the pre and post surveys. In the pre responses, the words read more like initial guesses, such as associating CT with "computer(s)" or "information". In the post responses, participants used "solving", "steps", and other descriptions that align more closely with established definitions of CT.
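Because word clouds are hard to compare precisely, a relative-frequency check on a few key words offers a more direct contrast. The words below are just illustrative choices:
#relative frequency of selected words in the pre vs. post responses
ct_text_3 %>%
  count(con1, word) %>%
  group_by(con1) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup() %>%
  filter(word %in% c("technology", "solving", "process"))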
After seeing a positive trend for both the constructs and the CT-definition open-ended question, I turned to the last open-ended question, which asked pre-service teachers about the possibility of integrating CT into classrooms and the ideas they had for doing so. I followed the first several steps described above, then conducted a sentiment analysis to gauge how positive, neutral, or negative the responses were.
#text analysis for CT integration
pre_int <- pre %>% select(c(con1,ct_int_text))
post_int <- post %>% select(c(con1,ct_int_text))
#model
vader_pre <- vader_df(pre_int$ct_int_text)
mean(vader_pre$compound)
## [1] 0.242703
vader_pre_summary <- vader_pre %>%
  #standard VADER cutoffs: >= 0.05 is positive, <= -0.05 is negative, else neutral
  mutate(sentiment = ifelse(compound >= 0.05, "positive",
                            ifelse(compound <= -0.05, "negative", "neutral"))) %>%
  count(sentiment, sort = TRUE) %>%
  spread(sentiment, n) %>%
  relocate(positive) %>%
  mutate(ratio = positive / negative)
vader_pre_summary
## positive negative ratio
## 1 285 220 1.295455
vader_post <- vader_df(post_int$ct_int_text)
mean(vader_post$compound)
## [1] 0.1777778
vader_post_summary <- vader_post %>%
  #same cutoffs as above
  mutate(sentiment = ifelse(compound >= 0.05, "positive",
                            ifelse(compound <= -0.05, "negative", "neutral"))) %>%
  count(sentiment, sort = TRUE) %>%
  spread(sentiment, n) %>%
  relocate(positive) %>%
  mutate(ratio = positive / negative)
vader_post_summary
## positive negative ratio
## 1 166 176 0.9431818
As shown in the console output, the responses are on average positive (0.24 > 0.05 for pre, and 0.18 > 0.05 for post). Although the mean compound scores for both surveys are above the 0.05 threshold, the post-survey score is lower than the pre-survey score (0.18 versus 0.24). Moreover, the ratio of positive to negative sentiment is 1.30 for the pre-survey responses but 0.94 for the post-survey responses. This means that, on average, although pre-service teachers held positive attitudes toward CT integration in classrooms, their post responses were not as positive as their pre responses to the same question. This is an interesting preliminary finding, given that the response distribution for the "Knowledge and Belief" construct shifted toward higher scores in the post survey.
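Mean compound scores can mask variability, so comparing the full distributions is a useful follow-up. A minimal sketch reusing the vader_df() outputs:
#compare the distributions of compound sentiment scores
vader_all <- bind_rows(
  mutate(vader_pre, con1 = "pre"),
  mutate(vader_post, con1 = "post")
)
ggplot(vader_all, aes(x = con1, y = compound, fill = con1)) +
  geom_boxplot(alpha = 0.7) +
  scale_x_discrete(limits = c("pre", "post")) +
  theme(legend.position = "none")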
In this case study, I showcased the process of analyzing a raw dataset through the wrangling, exploration, analysis, and modeling phases to collect evidence for my research questions: 1) how pre-service teachers understand CT and 2) what their attitudes and preliminary ideas are on how to integrate CT into classrooms. The construct analysis provided a clear visualization of the data distributions for "CT Definition" and "Knowledge and Belief", while the sentiment analyses of the two open-ended questions provided more detail on the differences between pre-service teachers' pre- and post-survey responses. These preliminary findings suggest that participating in the course helped the pre-service teachers develop an understanding of CT.
However, the sentiment analysis showed less positive sentiment in the post-survey responses to the second open-ended question, even though the boxplots showed higher means and medians on the "Knowledge and Belief" construct in the post survey. Therefore, more investigation is needed to draw a holistic picture of how pre-service teachers develop CT. As a next step, I plan to look at matched cases among participants and conduct further analyses of pre-service teachers' CT development.
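As a concrete starting point for that matched-case analysis, pre and post responses could be joined on the participant ID; only participants who completed both surveys would be retained, which then allows a paired comparison. This is a sketch, not a result reported in this study:
#match pre and post responses by participant id
matched <- inner_join(
  pre %>% select(id, def_pre = def, kb_pre = kb),
  post %>% select(id, def_post = def, kb_post = kb),
  by = "id"
)
#paired comparison of the CT Definition construct on the matched sample
t.test(matched$def_post, matched$def_pre, paired = TRUE)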
Angeli, C., Voogt, J., Fluck, A., Webb, M., Cox, M., Malyn-Smith, J., & Zagami, J. (2016). A K-6 computational thinking curriculum framework: Implications for teacher knowledge. Educational Technology & Society, 19(3), 47–57.
Barr, V., & Stephenson, C. (2011). Bringing computational thinking to K-12: What is involved and what is the role of the computer science education community? ACM Inroads, 2(1), 48–54.
Falloon, G. (2016). An analysis of young students' thinking when completing basic coding tasks using Scratch Jnr. on the iPad. Journal of Computer Assisted Learning, 32(6), 576–593.
ISTE (2016). National Educational Technology Standards for Students. Retrieved from http://www.iste.org
Jaipal-Jamani, K., & Angeli, C. (2017). Effect of robotics on elementary preservice teachers' self-efficacy, science learning, and computational thinking. Journal of Science Education and Technology, 26(2), 175–192.
National Research Council. (2010). Committee for the Workshops on Computational Thinking: Report of a workshop on the scope and nature of computational thinking. Washington, DC: National Academies Press. http://www.nap.edu/catalog.php?record_id=12840.
Wing, J. M. (2006). Computational thinking. Communications of the ACM, 49(3), 33–35.
Yadav, A., Hong, H., & Stephenson, C. (2016). Computational thinking for all: Pedagogical approaches to embedding 21st century problem solving in K-12 classrooms. TechTrends. https://doi.org/10.1007/s11528-016-0087-7
Yadav, A., Mayfield, C., Zhou, N., Hambrusch, S., & Korb, J. T. (2014). Computational thinking in elementary and secondary teacher education. ACM Transactions on Computing Education, 14(1), Article 5. https://doi.org/10.1145/2576872