In 2006, Wing proposed that computational thinking (CT) is a fundamental analytical skill for 21st-century citizens, a position that spurred efforts to infuse CT into K-12 education (Angeli et al., 2016; Wing, 2006). In 2010, the National Research Council (NRC) echoed Wing's advocacy for infusing CT into education for all by defining CT as "a cognitive skill that an average person is expected to possess" (NRC, 2010; Yadav et al., 2016). In addition, the Computer Science Teachers Association (CSTA) highlighted the importance of CT in K-12 classrooms as "a problem-solving methodology that can be automated and transferred and applied across subjects" (as cited in Barr & Stephenson, 2011; Yadav et al., 2014). The current International Society for Technology in Education (ISTE) Standards for Students likewise emphasize the need for younger generations to develop CT skills in order to navigate a digital world (ISTE, 2016). Successful classroom implementation of CT relies in large part on teachers; therefore, we must equip teachers with the knowledge and skill sets to integrate CT (Yadav et al., 2014). Accordingly, I conducted this case study to explore 1) how pre-service teachers understand CT and 2) what their attitudes and preliminary ideas are on how to integrate CT into classrooms.
What is CT? The CSTA and ISTE provide an operational definition of CT.
The dataset I used for this case study comes from a larger research project on promoting pre-service teachers' CT development, in which I have been actively involved. For this case study, I looked at a sample of pre-service teachers who participated in a college-level course that introduces computational thinking and provides learning activities for experiencing computational tools. Participants were asked to complete a course survey at the beginning and at the end of the course.
For this case study, I focused on analyzing the survey data and selected the following variables, which are highly relevant to my study purposes. First, I focused on two constructs: "CT Definition" and "Knowledge and Belief". In addition, I looked at two open-ended questions: the first asked pre-service teachers to explain CT, and the second asked how they would integrate CT into classrooms.
I personally think that the data wrangling phase is very important but also challenging. For this project, I tried different ways to clean and format this sample dataset. While exploring and analyzing the data, I realized that I had missed some steps during the wrangling phase, so I had to go back and see whether anything could have been done differently. The steps below present the finalized wrangling process: I first introduce the needed packages and then describe each step with the specific R code.
I used the following R packages to wrangle and analyze data.
devtools::install_github("gaospecial/wordcloud2")
library(tidyverse)
## -- Attaching packages -------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.1 v dplyr 1.0.0
## v tidyr 1.1.0 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## -- Conflicts ----------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(readxl)
library(skimr)
library(tidytext)
library(vader)
library(wordcloud2)
library(textdata)
Prior to the wrangling process, I de-identified the dataset. The raw dataset contains 1065 rows and 56 columns.
#load original dataset
dat <- read.csv("Data/all4LA.csv")
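Since skimr is loaded above, a quick overview can confirm the raw data's structure before any columns are selected. This is an optional sanity check, not part of the original pipeline:
#optional sanity check on the raw data
dim(dat)  #expect 1065 rows and 56 columns
skim(dat) #overview of variable types, missingness, and distributions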
As mentioned in the introduction, I selected two constructs and two open-ended questions from the dataset. The constructs are 1) "CT Definition", which contains 4 survey items, and 2) "Knowledge and Belief", which contains 5 survey items. In addition to selecting the needed items, I also kept the "StartDate", "Progress", and "ID" variables; these are essential for cleaning and structuring the data for later analysis.
dat_select <- dat %>% select(
StartDate,
Progress,
id=D1,
ct_meaning=Q2,
ct_int_text=Q32,
def_1=Q5_1,
def_2=Q5_2,
def_3=Q5_3,
def_4=Q5_4,
kb_1=Q5_22,
kb_2=Q5_23,
kb_3=Q5_24,
kb_4=Q5_25,
kb_5=Q5_26
)
dim(dat_select)
## [1] 1065 14
I kept the "Progress" variable because it helped me identify completed data entries and remove incomplete ones. After running the code chunk below, the dataset has 967 completed data entries.
dat_clean <- dat_select %>% filter(Progress == 100)
dim(dat_clean)
## [1] 967 14
The survey items are Likert-scale questions with four choices from "Strongly Agree" to "Strongly Disagree". I first converted the text of all the Likert-scale items to numeric values, then reversed the scales for the reverse-coded items. Finally, I combined the survey items to generate their respective constructs.
#convert text Likert scales to numeric values
#(across() replaces the deprecated funs()/mutate_at() idiom)
dat_likert <- dat_clean %>%
  mutate(across(def_1:kb_5, ~ recode(.x,
                                     "Strongly Agree" = 4,
                                     "Agree" = 3,
                                     "Disagree" = 2,
                                     "Strongly Disagree" = 1)))
#reverse the scale for reverse-coded items (applied to def_1 and def_3 below)
reverse_scale <- function(question){
x <- case_when(
question == 1 ~ 4,
question == 2 ~ 3,
question == 3 ~ 2,
question == 4 ~ 1,
TRUE ~ NA_real_
)
x
}
dat_likert_reverse <- dat_likert %>%
mutate(def_1 = reverse_scale(def_1),
def_3 = reverse_scale(def_3))
#combine items into constructs
#(ungroup() drops the rowwise grouping so later verbs such as
# slice_sample() and top_n() operate on the whole data frame)
dat_likert_combine <- dat_likert_reverse %>%
  rowwise() %>%
  mutate(
    def = mean(c_across(def_1:def_4)),
    kb = mean(c_across(kb_1:kb_5))
  ) %>%
  ungroup()
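Before moving on, it is worth confirming that the new construct scores fall within the expected range of 1 to 4. A minimal check using skimr might look like this:
#sanity check: construct scores should range between 1 and 4
dat_likert_combine %>% skim(def, kb)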
As I mentioned earlier, the dataset contained the pre and the post responses. In order to specify the time/order/condition for each data entry, I kept the “StartDate” variable to help me define the time for each data entry. For doing this, I first converted the “StartDate” to extract the month, I then assigned the pre or the post condition using the month for each data entry.
#convert the date to extract the month
dat_likert_combine$StartDate <- as.Date(dat_likert_combine$StartDate,format = "%m/%d/%Y")
dat_likert_combine$mon <- format(dat_likert_combine$StartDate,"%m")
#define pre and post condition by month
dat_final <- dat_likert_combine %>% mutate(
con1 = case_when(
mon %in% c("02", "08", "09") ~ "pre",
mon %in% c("05","12") ~ "post"
)
)
dim(dat_final)
## [1] 967 18
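A quick cross-tabulation can verify that every month maps onto the intended condition; an NA in con1 would flag a month that slipped through the case_when() above. This is a sketch of an optional check:
#check how months map onto the pre/post conditions
dat_final %>% count(con1, mon)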
Building on the steps above, I used the R code below to create new datasets for later analysis. Three datasets were created from the raw data: 1) all completed pre-survey entries (505 rows and 18 cols); 2) all completed post-survey entries (342 rows and 18 cols); and 3) the final analysis dataset, which combines the pre and post datasets (847 rows and 6 cols).
#create the pre dataset without duplicate cases
pre <- dat_final %>% filter(con1 == "pre")
pre <- pre[!duplicated(pre$id),]
dim(pre) #505 rows and 18 cols
## [1] 505 18
#create the post dataset without duplicate cases
post <- dat_final %>% filter(con1 == "post")
post <- post[!duplicated(post$id),]
dim(post) #342 rows and 18 cols
## [1] 342 18
#the final dataset for analysis
dat_analysis <- rbind(pre,post) %>% select(c(3:5,15,16,18))
dim(dat_analysis) #847 rows and 6 cols
## [1] 847 6
For the two constructs, I am interested in their distributions in the pre and post conditions. For this purpose, I first reshaped the data to long format with pivot_longer() and then used ggplot2 for visualization.
#data visualization for CT definition, and knowledge and belief
dat_analysis_visual <- dat_analysis %>% pivot_longer(cols = 4:5,
names_to = "construct",
values_to = "response")
dat_analysis_visual$con1 <- as.factor(dat_analysis_visual$con1)
p <- ggplot(dat_analysis_visual, aes(x = con1, y = response, fill = con1)) +
  geom_boxplot(alpha = 0.7) +
  facet_wrap(~construct, labeller = label_both) +
  scale_x_discrete(limits = c("pre", "post")) +
  #`fun` replaces the deprecated `fun.y` argument
  stat_summary(fun = mean, geom = "point", shape = 20, size = 7, color = "yellow", fill = "yellow") +
  theme(legend.position = "none") +
  scale_fill_brewer(palette = "Set1")
p
The boxplots above show that the post-survey responses have higher means and medians on the "CT Definition" and "Knowledge and Belief" constructs than the pre-survey responses. This suggests that pre-service teachers' CT understandings, on average, increased after they participated in the course. I was then interested in whether the open-ended responses aligned with this observation.
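To complement the boxplots, a quick unpaired comparison of the construct scores across conditions could be run. This is only a sketch: the pre and post samples are not matched here, so a paired test must wait for the matched-case analysis mentioned in the conclusion.
#unpaired comparison of construct scores across the pre/post conditions
t.test(def ~ con1, data = dat_analysis)
t.test(kb ~ con1, data = dat_analysis)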
I first explored the open-ended responses to the question asking pre-service teachers to define CT. As the first step of the sentiment analysis, I extracted two datasets: the text responses from the pre survey and those from the post survey.
pre_def <- pre %>% select(c(con1,ct_meaning))
post_def <- post %>% select(c(con1,ct_meaning))
Based on these two datasets, I looked into a small sample of the text responses.
slice_sample(pre_def, n = 10)
## # A tibble: 505 x 2
## # Rowwise:
## con1 ct_meaning
## <chr> <chr>
## 1 pre "Testing."
## 2 pre "...."
## 3 pre "Using computers and or technology to answer the unknown. "
## 4 pre "I think it means that thinking about something in a formulaic way. "
## 5 pre "I think computational thinking would be something related to computer~
## 6 pre "Computational thinking means thinking in terms of using a computer. "
## 7 pre "I think it means being able to integrate your thinking with technolog~
## 8 pre "Thinking of concepts in terms of technology and completing things wit~
## 9 pre "Having students use technology in the classroom?"
## 10 pre "Ways to solve problems"
## # ... with 495 more rows
slice_sample(post_def, n = 10)
## # A tibble: 342 x 2
## # Rowwise:
## con1 ct_meaning
## <chr> <chr>
## 1 post "Computational thinking means thinking outside of the box using relati~
## 2 post "Thinking like a computer scientist."
## 3 post "Computational thinking involves problem solving and thinking about di~
## 4 post "Thinking critically and logically to solve a problem "
## 5 post "A way to think about how computer scientists think. It involves prob~
## 6 post "Computational Thinking means thinking in a way that promotes problem ~
## 7 post "Processes of thinking done by a computer scientist, such an analyzing~
## 8 post "Computational thinking involves kids learning how to do activities wi~
## 9 post "Computational thinking involves problem solving and the organization ~
## 10 post "Computational thinking requires you to break down a task in a way tha~
## # ... with 332 more rows
Next, I combined the pre and post datasets (note that union() also drops exact duplicate rows; bind_rows() would keep them).
ct_def <- union(pre_def, post_def)
Next, I used the R code chunks below to tokenize the text and remove stop words such as "to". I also removed the words "computational" and "thinking" themselves.
#tokenize text
ct_text <- ct_def %>%
unnest_tokens(output = word,
input = ct_meaning)
ct_text %>% count(word, sort = TRUE)
## # A tibble: 1,067 x 2
## # Rowwise:
## word n
## <chr> <int>
## 1 thinking 788
## 2 to 721
## 3 a 637
## 4 and 469
## 5 computational 387
## 6 the 380
## 7 in 341
## 8 of 333
## 9 i 306
## 10 think 299
## # ... with 1,057 more rows
#remove stop words
ct_text_2 <- anti_join(ct_text,
stop_words,
by = "word")
head(ct_text_2)
## # A tibble: 6 x 2
## # Rowwise:
## con1 word
## <chr> <chr>
## 1 pre testing
## 2 pre computers
## 3 pre technology
## 4 pre answer
## 5 pre unknown
## 6 pre means
ct_text_2 %>% count(word, sort = TRUE)
## # A tibble: 775 x 2
## # Rowwise:
## word n
## <chr> <int>
## 1 thinking 788
## 2 computational 387
## 3 technology 273
## 4 means 233
## 5 solving 186
## 6 computer 181
## 7 solve 158
## 8 process 105
## 9 computers 91
## 10 term 75
## # ... with 765 more rows
my_stopwords_1 <- c("computational","thinking","=","+")
ct_text_3 <- ct_text_2 %>%
filter(!word %in% my_stopwords_1)
ct_text_3 %>% count(word, sort = TRUE)
## # A tibble: 773 x 2
## # Rowwise:
## word n
## <chr> <int>
## 1 technology 273
## 2 means 233
## 3 solving 186
## 4 computer 181
## 5 solve 158
## 6 process 105
## 7 computers 91
## 8 term 75
## 9 skills 69
## 10 students 67
## # ... with 763 more rows
Finally, I created two word clouds to show the top 20 words for the pre and the post open-ended question responses.
top_pre_1 <- ct_text_3 %>%
filter(con1 == "pre") %>%
count(word, sort = TRUE) %>%
top_n(20)
## Selecting by n
top_pre_1
## # A tibble: 544 x 2
## # Rowwise:
## word n
## <chr> <int>
## 1 technology 206
## 2 means 175
## 3 computer 92
## 4 computers 64
## 5 term 64
## 6 solve 60
## 7 solving 56
## 8 process 48
## 9 answer 38
## 10 information 33
## # ... with 534 more rows
p1 <- wordcloud2(top_pre_1)
p1
top_post_1 <- ct_text_3 %>%
filter(con1 == "post") %>%
count(word, sort = TRUE) %>%
top_n(20)
## Selecting by n
top_post_1
## # A tibble: 464 x 2
## # Rowwise:
## word n
## <chr> <int>
## 1 solving 130
## 2 solve 98
## 3 computer 89
## 4 technology 67
## 5 means 58
## 6 process 57
## 7 skills 54
## 8 students 42
## 9 involves 33
## 10 steps 31
## # ... with 454 more rows
p2 <- wordcloud2(top_post_1)
p2
The two word clouds above reveal striking differences between the CT definitions given by the pre-service teachers in the pre and post surveys. In the pre responses, the words read more like initial guesses, such as associating CT with "computer(s)" or "information". In the post responses, participants used "solving", "steps", and other descriptions that align more closely with established definitions of CT.
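Because word clouds are hard to compare precisely, a relative-frequency check on a few key words offers a more direct contrast. The words below are just illustrative choices:
#relative frequency of selected words in the pre vs. post responses
ct_text_3 %>%
  count(con1, word) %>%
  group_by(con1) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup() %>%
  filter(word %in% c("technology", "solving", "process"))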
After seeing a positive trend for both the constructs and the CT-definition open-ended question, I turned to the last open-ended question, which asked pre-service teachers about the possibility of integrating CT into classrooms and the ideas they had for doing so. I followed the first several steps described above, then conducted a sentiment analysis to gauge how positive, neutral, or negative the responses were.
#text analysis for CT integration
pre_int <- pre %>% select(c(con1,ct_int_text))
post_int <- post %>% select(c(con1,ct_int_text))
#model
vader_pre <- vader_df(pre_int$ct_int_text)
mean(vader_pre$compound)
## [1] 0.242703
vader_pre_summary <- vader_pre %>%
  #standard VADER cutoffs: >= 0.05 is positive, <= -0.05 is negative, else neutral
  mutate(sentiment = ifelse(compound >= 0.05, "positive",
                            ifelse(compound <= -0.05, "negative", "neutral"))) %>%
  count(sentiment, sort = TRUE) %>%
  spread(sentiment, n) %>%
  relocate(positive) %>%
  mutate(ratio = positive / negative)
vader_pre_summary
## positive negative ratio
## 1 285 220 1.295455
vader_post <- vader_df(post_int$ct_int_text)
mean(vader_post$compound)
## [1] 0.1777778
vader_post_summary <- vader_post %>%
  #same cutoffs as above
  mutate(sentiment = ifelse(compound >= 0.05, "positive",
                            ifelse(compound <= -0.05, "negative", "neutral"))) %>%
  count(sentiment, sort = TRUE) %>%
  spread(sentiment, n) %>%
  relocate(positive) %>%
  mutate(ratio = positive / negative)
vader_post_summary
## positive negative ratio
## 1 166 176 0.9431818
As shown in the console output, the responses are on average positive (0.24 > 0.05 for pre, and 0.18 > 0.05 for post). Although the mean compound scores for both surveys are above the 0.05 threshold, the post-survey score is lower than the pre-survey score (0.18 versus 0.24). Moreover, the ratio of positive to negative sentiment is 1.30 for the pre-survey responses but 0.94 for the post-survey responses. This means that, on average, although pre-service teachers held positive attitudes toward CT integration in classrooms, their post responses were not as positive as their pre responses to the same question. This is an interesting preliminary finding, given that the response distribution for the "Knowledge and Belief" construct shifted toward higher scores in the post survey.
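Mean compound scores can mask variability, so comparing the full distributions is a useful follow-up. A minimal sketch reusing the vader_df() outputs:
#compare the distributions of compound sentiment scores
vader_all <- bind_rows(
  mutate(vader_pre, con1 = "pre"),
  mutate(vader_post, con1 = "post")
)
ggplot(vader_all, aes(x = con1, y = compound, fill = con1)) +
  geom_boxplot(alpha = 0.7) +
  scale_x_discrete(limits = c("pre", "post")) +
  theme(legend.position = "none")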
In this case study, I showcased the process of analyzing a raw dataset through the wrangling, exploration, analysis, and modeling phases to collect evidence for my research questions: 1) how pre-service teachers understand CT and 2) what their attitudes and preliminary ideas are on how to integrate CT into classrooms. The construct analysis provided a clear visualization of the data distributions for "CT Definition" and "Knowledge and Belief", while the sentiment analyses of the two open-ended questions provided more detail on the differences between pre-service teachers' pre- and post-survey responses. These preliminary findings suggest that participating in the course helped the pre-service teachers develop an understanding of CT.
However, the sentiment analysis showed less positive sentiment in the post-survey responses to the second open-ended question, even though the boxplots showed higher means and medians on the "Knowledge and Belief" construct in the post survey. Therefore, more investigation is needed to draw a holistic picture of how pre-service teachers develop CT. As a next step, I plan to look at matched cases among participants and conduct further analyses of pre-service teachers' CT development.
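As a concrete starting point for that matched-case analysis, pre and post responses could be joined on the participant ID; only participants who completed both surveys would be retained, which then allows a paired comparison. This is a sketch, not a result reported in this study:
#match pre and post responses by participant id
matched <- inner_join(
  pre %>% select(id, def_pre = def, kb_pre = kb),
  post %>% select(id, def_post = def, kb_post = kb),
  by = "id"
)
#paired comparison of the CT Definition construct on the matched sample
t.test(matched$def_post, matched$def_pre, paired = TRUE)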
Angeli, C., Voogt, J., Fluck, A., Webb, M., Cox, M., Malyn-Smith, J., & Zagami, J. (2016). A K-6 computational thinking curriculum framework: Implications for teacher knowledge. Educational Technology & Society, 19(3), 47–57.
Barr, V., & Stephenson, C. (2011). Bringing computational thinking to K-12: What is involved and what is the role of the computer science education community? ACM Inroads, 2(1), 48–54.
Falloon, G. (2016). An analysis of young students' thinking when completing basic coding tasks using Scratch Jnr. on the iPad. Journal of Computer Assisted Learning, 32(6), 576–593.
ISTE (2016). National Educational Technology Standards for Students. Retrieved from http://www.iste.org
Jaipal-Jamani, K., & Angeli, C. (2017). Effect of robotics on elementary preservice teachers' self-efficacy, science learning, and computational thinking. Journal of Science Education and Technology, 26(2), 175–192.
National Research Council. (2010). Committee for the Workshops on Computational Thinking: Report of a workshop on the scope and nature of computational thinking. Washington, DC: National Academies Press. http://www.nap.edu/catalog.php?record_id=12840.
Wing, J. M. (2006). Computational thinking. Communications of the ACM, 49(3), 33–35.
Yadav, A., Hong, H., & Stephenson, C. (2016). Computational thinking for all: Pedagogical approaches to embedding 21st century problem solving in K-12 classrooms. TechTrends. https://doi.org/10.1007/s11528-016-0087-7
Yadav, A., Mayfield, C., Zhou, N., Hambrusch, S., & Korb, J. T. (2014). Computational thinking in elementary and secondary teacher education. ACM Transactions on Computing Education, 14(1), Article 5. https://doi.org/10.1145/2576872