Programming has become an essential part of scientific work over the last few decades. Given its significance, many methods have been proposed to make learning to program easier and more effective. One established and popular approach is to use block-based programming (BBP) languages such as Snap! and Scratch, which offer graphical rather than conventional text-based programming [1]. The growing use of block-based programming environments in K-12 curricula and introductory programming courses, together with increasing attention from researchers, made me curious to survey the papers in the field and find out which topics are studied most.
To start this study, I had mainly two questions in mind:
What topics are mainly discussed in BBP language papers?
What are the challenges currently under consideration for BBP?
I was also very interested in comparing the research conducted on Snap! and Scratch as top block-based languages, but I realized that getting insights into more general papers would be more helpful as the first step.
I collected 30 papers related to block-based programming from different sources, including Google Scholar and IEEE Xplore. Scratch, one of the earliest widely used block-based languages, dates back to 2003 [2], and since then many topics have been investigated in the area. Many of the ideas and issues studied in earlier years have already been addressed; focusing on more recent papers is therefore more likely to uncover issues that are still open or at least leave room for further study. Accordingly, I restricted the publication date to the period from 2018 to 2022.
Similar analyses usually limit their data to the titles and abstracts of papers, but I added separate features for the Discussion and Conclusion sections. My reason was that many ideas and results are covered in these two sections rather than in the introduction, especially when researchers have suggestions to make or directions to mention for future research.
The features of my dataset are:
Date (year)
Title
Abstract
Discussion
Conclusion
The primary target of this analysis is researchers and developers who are interested in exploring and improving block-based programming languages and environments. In a more general view, programming instructors and anyone who is enthusiastic to know more trends in programming may find this analysis engaging.
I will start by importing the libraries needed for the whole analysis.
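The library chunk itself is not shown in this report. As a rough sketch, the list below is inferred from the functions called later (for example `unnest_tokens()`, `wordStem()`, `FindTopicsNumber()`, `stm()`); it is an assumption, not the author's original chunk, and the exact set of packages may differ.

```r
# Packages inferred from the functions used later in this analysis (assumed list)
required_packages <- c(
  "tidyverse",   # readr, dplyr, tidyr, purrr, ggplot2
  "tidytext",    # unnest_tokens(), bind_tf_idf(), cast_dtm(), reorder_within()
  "SnowballC",   # wordStem()
  "wordcloud2",  # wordcloud2()
  "igraph",      # graph_from_data_frame()
  "ggraph",      # ggraph(), geom_edge_link(), geom_node_point()
  "topicmodels", # LDA(), terms()
  "ldatuning",   # FindTopicsNumber(), FindTopicsNumber_plot()
  "stm",         # textProcessor(), searchK(), stm()
  "knitr"        # kable()
)

# Report anything that is not installed instead of failing halfway through
missing_packages <- setdiff(required_packages, rownames(installed.packages()))

if (length(missing_packages) > 0) {
  message("Install before running: ", paste(missing_packages, collapse = ", "))
} else {
  invisible(lapply(required_packages, library, character.only = TRUE))
}
```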
Then, I will import the dataset I created.
## Rows: 30 Columns: 5
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (4): title, abstract, conclusion, discussion
## dbl (1): date
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
To be able to analyze all the main variables of the data together, i.e. Title, Abstract, Discussion and Conclusion, I created a new feature, "combined", which contains all of them. Looking at the data, I also realized that some papers lack either a conclusion or a discussion section and that the narration in these two sections is usually along the same lines. So, to simplify the analysis, I merged both into the conclusion variable.
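The code for this step is not shown in the report. A minimal sketch of how such a merge could look is given below; the toy data frame `toy_papers` and its contents are hypothetical, and replacing missing sections with empty strings is an assumption made here so that the literal string "NA" does not end up among the tokens.

```r
library(dplyr)

# Hypothetical two-row stand-in for the real dataset (same five columns)
toy_papers <- tibble(
  date       = c(2019, 2020),
  title      = c("Paper A", "Paper B"),
  abstract   = c("Parsons problems in Snap", "Feedback for novices"),
  discussion = c("Tinkering helps", NA),
  conclusion = c(NA, "Feedback improves progress")
)

toy_papers <- toy_papers %>%
  mutate(
    # Merge discussion into conclusion, treating a missing section as empty text
    conclusion = trimws(paste(coalesce(discussion, ""), coalesce(conclusion, ""))),
    # One field holding every textual feature of the paper
    combined = paste(title, abstract, conclusion)
  )

toy_papers %>% select(title, conclusion, combined)
```

If missing values are pasted without this replacement, `paste()` writes them as the text "NA", which then has to be filtered out as a token later.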
After tokenizing the data, I identified a list of very frequent but uninformative words and added it to this step so that they are removed from the start of the analysis.
# Domain words that occur in almost every paper and carry little information
extra_words <- c("based", "block", "blocks", "computer", "science", "environments",
                 "tools", "programming", "students", "language", "mit", "study",
                 "student", "results", "al", "na", "future", "app", "data")
# Tokens from all text fields together
bbp_tokens_all <- bbp_papers %>%
  unnest_tokens(output = word, input = combined) %>%
  anti_join(stop_words, by = "word") %>%
  select(date, title, word) %>%
  filter(!word %in% extra_words)
# Tokens from the merged discussion-conclusion field
bbp_tokens_conclusion <- bbp_papers %>%
  unnest_tokens(output = word, input = conclusion) %>%
  anti_join(stop_words, by = "word") %>%
  select(date, title, word) %>%
  filter(!word %in% extra_words)
# Tokens from the abstracts only
bbp_tokens_abstract <- bbp_papers %>%
  unnest_tokens(output = word, input = abstract) %>%
  anti_join(stop_words, by = "word") %>%
  select(date, title, word) %>%
  filter(!word %in% extra_words)
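The three tokenization pipelines above differ only in the input column. As an optional sketch, they could be folded into one helper; `tokenize_column()` and the toy data are hypothetical names introduced here for illustration and are not part of the original analysis.

```r
library(dplyr)
library(tidytext)

# Hypothetical helper: tokenize one text column, drop stop words and custom words
tokenize_column <- function(data, column, drop_words = character()) {
  data %>%
    select(date, title, text = all_of(column)) %>%  # pick the column by name
    unnest_tokens(output = word, input = text) %>%  # one lowercase word per row
    anti_join(stop_words, by = "word") %>%          # remove general stop words
    filter(!word %in% drop_words)                   # remove domain-specific words
}

# Toy input with made-up text
toy_papers <- tibble(
  date = 2020,
  title = "Paper B",
  abstract = "Feedback helps the students in programming"
)

toy_tokens <- tokenize_column(toy_papers, "abstract",
                              drop_words = c("students", "programming"))
toy_tokens
```

With such a helper, each of the three token tables would be a single call with a different column name and `extra_words` passed as `drop_words`.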
Although I added some features such as tf-idf and did some transformations like stemming throughout the process, which belongs to the Wrangle section, I prefer to mention them in Analyze to keep the analysis integrated.
I generated the following three plots to get an insight into the top words of abstract, discussion-conclusion, and the whole dataset.
bbp_tokens_abstract %>%
count(word, sort = TRUE) %>%
top_n(30) %>%
arrange(desc(n)) %>%
mutate(word = reorder(word, n)) %>%
ungroup %>%
ggplot(aes(word, n, fill = word)) +
geom_col(show.legend = FALSE) +
coord_flip() +
scale_x_reordered() +
scale_y_continuous(expand = c(0,0))
## Selecting by n
bbp_tokens_conclusion %>%
count(word, sort = TRUE) %>%
top_n(30) %>%
arrange(desc(n)) %>%
mutate(word = reorder(word, n)) %>%
ungroup %>%
ggplot(aes(word, n, fill = word)) +
geom_col(show.legend = FALSE) +
coord_flip() +
scale_x_reordered() +
scale_y_continuous(expand = c(0,0))
## Selecting by n
bbp_tokens_all %>%
count(word, sort = TRUE) %>%
top_n(30) %>%
arrange(desc(n)) %>%
mutate(word = reorder(word, n)) %>%
ungroup %>%
ggplot(aes(word, n, fill = word)) +
geom_col(show.legend = FALSE) +
coord_flip() +
scale_x_reordered() +
scale_y_continuous(expand = c(0,0))
## Selecting by n
Looking at the three plots, I do not see a meaningful difference, but since the variable “combined” contains all the text features of our dataset, I will use it for the majority of the analysis.
Before moving on, I created a word cloud of highly-repeated words to see them from another perspective.
cloud_tokens <- bbp_tokens_all %>%
count(word, sort = TRUE)%>%
top_n(50)
## Selecting by n
wordcloud2(cloud_tokens,size = 0.3,color = "random-light", shape = 'star')
Dividing the data by year of publication will help find the top keywords used in the papers of each year. Before that, let’s find out how many papers belong to each year.
ggplot(bbp_papers, aes(x = date, fill = "")) +
geom_bar(width = .8, show.legend = FALSE) +
xlab(label = "Year") +
ylab(label = "Number of Papers") +
scale_y_continuous(breaks = seq(0, 10, by = 1)) +
# theme_bw() +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
As can be seen, very few papers in the dataset were published in 2018 and 2022. Per-year results for these two years would rest on only a handful of papers and could be misleading, so I will remove them from the year-based analyses.
As mentioned in [3], a good metric for evaluating the importance of a word to a document in a collection of documents is TF-IDF. So, to capture these significant words, I applied TF-IDF on the data. Also, stemming is performed to improve the quality of TF-IDF results.
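To make the weighting concrete, here is a small base-R sketch that computes tf-idf by hand on a toy count table, treating each year as one document as the analysis below does. The words and counts are made up for illustration. It follows the definition used by `bind_tf_idf()`: term frequency is the word's share of all words in the document, and inverse document frequency is the natural log of the number of documents divided by the number of documents containing the word.

```r
# Toy per-year word counts (hypothetical numbers)
counts <- data.frame(
  date = c(2019, 2019, 2020, 2020, 2021, 2021),
  word = c("parson", "feedback", "feedback", "mathemat", "feedback", "monitor"),
  n    = c(4, 2, 3, 5, 1, 6),
  stringsAsFactors = FALSE
)

# Term frequency: share of the word among all words of that year
total_per_year <- tapply(counts$n, counts$date, sum)
counts$tf <- counts$n / as.numeric(total_per_year[as.character(counts$date)])

# Inverse document frequency: ln(number of years / years containing the word)
n_docs <- length(unique(counts$date))
docs_with_word <- tapply(counts$date, counts$word, function(d) length(unique(d)))
counts$idf <- log(n_docs / as.numeric(docs_with_word[counts$word]))

counts$tf_idf <- counts$tf * counts$idf
counts[order(-counts$tf_idf), ]
```

In this toy table "feedback" occurs in all three years, so its idf and tf-idf are zero. With only three yearly documents, idf can take just three values (ln 3, ln 1.5, and 0), so a word can rank highly in a given year only if it is absent from at least one other year.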
# Drop the sparsely represented years, then reduce each word to its stem
tf_idf_stemmed <- bbp_tokens_all %>%
  filter(!date %in% c(2018, 2022)) %>%
  mutate(word = wordStem(word))
words_by_year <- tf_idf_stemmed %>%
count(date, word, sort = TRUE) %>%
ungroup()
tf_idf_year <- words_by_year %>%
bind_tf_idf(word, date, n) %>%
arrange(desc(tf_idf))
tf_idf_year %>%
group_by(date) %>%
slice_max(tf_idf, n = 10) %>%
ungroup() %>%
mutate(word = reorder(word, tf_idf)) %>%
ggplot(aes(tf_idf, word, fill = date)) +
geom_col(show.legend = FALSE) +
facet_wrap(~ date, scales = "free") +
labs(x = "tf-idf", y = NULL)
TF-IDF extracted the most distinctive words from the papers of each year. Below are some important terms, which I will elaborate on in the discussion.
2019: Parson, Tinker, Exam, Rubric, AP, CSP
2020: Service, Mathematics, Feedback, CT
2021: Monitor, Abstract, Feedback, GradeSnap, COVID
In this section, I will tokenize the data into bigrams and trigrams to find out whether any interesting insight will be gained.
bbp_bigrams <- bbp_papers %>%
unnest_tokens(bigram, combined, token = "ngrams", n = 2) %>%
select(date, title, bigram)
bbp_bigrams %>%
count(bigram, sort = TRUE)
## # A tibble: 20,974 x 2
## bigram n
## <chr> <int>
## 1 block based 266
## 2 of the 217
## 3 in the 179
## 4 based programming 153
## 5 to the 97
## 6 on the 82
## 7 text based 82
## 8 this study 80
## 9 that the 75
## 10 et al 69
## # ... with 20,964 more rows
As can be seen, stop words and unnecessary words should be removed to see better results.
bigrams_separated <- bbp_bigrams %>%
separate(bigram, c("word1", "word2"), sep = " ")
bigrams_filtered <- bigrams_separated %>%
filter(!grepl("[^A-Za-z]", word1),!word1 %in% stop_words$word) %>%
filter(!grepl("[^A-Za-z]", word2),!word2 %in% stop_words$word) %>%
filter(!word1 %in% extra_words) %>%
filter(!word2 %in% extra_words)
# mutate(word1 = wordStem(word1),word2 = wordStem(word2))
bigram_counts <- bigrams_filtered %>%
count(word1, word2, sort = TRUE)
bigram_counts
## # A tibble: 3,528 x 3
## word1 word2 n
## <chr> <chr> <int>
## 1 computational thinking 46
## 2 solving skills 27
## 3 pre service 24
## 4 service teachers 24
## 5 middle school 20
## 6 abstraction skill 19
## 7 abstraction skills 17
## 8 academic achievement 17
## 9 struggling moments 16
## 10 random effects 15
## # ... with 3,518 more rows
After cleaning, more meaningful bigrams can be seen. I will create a word network to see the relationship between words.
bigram_graph <- bigram_counts %>%
filter(n > 6) %>%
graph_from_data_frame()
set.seed(1234)
a <- grid::arrow(type = "open", length = unit(.1, "inches"))
ggraph(bigram_graph, layout = "fr") +
geom_edge_link(aes(edge_alpha = n), show.legend = FALSE,
arrow = a, end_cap = circle(.07, 'inches')) +
geom_node_point(color = "red", size = 3) +
geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
theme_void()
Word combinations like “computational thinking”, “solving skills”, “pre service teachers”, “middle school”, “abstraction skills”, “academic achievement”, “struggling moments”, “intervention reasons”, “cognitive load”, “visual impairment”, “online learning”, and “tinkering behaviors” give a good idea of the keywords in the papers of recent years.
The following bar chart shows these keywords with their counts.
bigrams_unite <- bigrams_filtered %>%
unite(bigram, word1, word2, sep = " ")
bigrams_unite %>%
count(bigram, sort = TRUE) %>%
top_n(30) %>%
arrange(desc(n)) %>%
mutate(bigram = reorder(bigram, n)) %>%
ungroup %>%
ggplot(aes(bigram, n, fill = bigram)) +
geom_col(show.legend = FALSE) +
coord_flip() +
scale_x_reordered() +
scale_y_continuous(expand = c(0,0))
## Selecting by n
A word cloud of bigrams can also be helpful to show popular bigrams.
top_bigrams<- bigrams_unite %>%
count(bigram, sort = TRUE)%>%
top_n(50)
## Selecting by n
wordcloud2(top_bigrams,size = 0.3, shape = 'star')
Given the good insights from the bigram analysis, I would like to try trigrams as well.
bbp_trigrams <- bbp_papers %>%
unnest_tokens(trigram, combined, token = "ngrams", n = 3) %>%
select(date, title, trigram)
bbp_trigrams %>%
count(trigram, sort = TRUE)
## # A tibble: 31,882 x 2
## trigram n
## <chr> <int>
## 1 block based programming 109
## 2 based visual programming 44
## 3 block based visual 44
## 4 text based programming 43
## 5 of block based 41
## 6 in block based 33
## 7 problem solving skills 27
## 8 the block based 26
## 9 the use of 26
## 10 a block based 24
## # ... with 31,872 more rows
trigrams_separated <- bbp_trigrams %>%
separate(trigram, c("word1", "word2", "word3"), sep = " ")
# extra_words <- c("")
trigrams_filtered <- trigrams_separated %>%
filter(!grepl("[^A-Za-z]", word1),!word1 %in% stop_words$word) %>%
filter(!grepl("[^A-Za-z]", word2),!word2 %in% stop_words$word) %>%
filter(!grepl("[^A-Za-z]", word3),!word3 %in% stop_words$word) %>%
filter(!word1 %in% extra_words, !word2 %in% extra_words, !word3 %in% extra_words)
trigram_counts <- trigrams_filtered %>%
count(word1, word2, word3, sort = TRUE)
trigram_counts
## # A tibble: 1,243 x 4
## word1 word2 word3 n
## <chr> <chr> <chr> <int>
## 1 pre service teachers 23
## 2 childhood preservice teachers 6
## 3 computational thinking skills 6
## 4 level random effects 6
## 5 conceptual mathematics teaching 5
## 6 missing key components 5
## 7 address research question 4
## 8 ap csp pseudocode 4
## 9 educational computing research 4
## 10 online learning activities 4
## # ... with 1,233 more rows
trigrams_unite <- trigrams_filtered %>%
unite(trigram, word1, word2, word3, sep = " ")
trigrams_unite %>%
count(trigram, sort = TRUE) %>%
top_n(20) %>%
arrange(desc(n)) %>%
mutate(trigram = reorder(trigram, n)) %>%
ungroup %>%
ggplot(aes(trigram, n, fill = trigram)) +
geom_col(show.legend = FALSE) +
coord_flip() +
scale_x_reordered() +
scale_y_continuous(expand = c(0,0))
## Selecting by n
Trigrams do not add anything interesting to the findings from the bigrams, so I will not analyze them further.
To use LDA, the number of topics must be specified in advance, and one way to choose it is the FindTopicsNumber function. I computed all four available metrics to see which number of topics yields the most meaningful result.
bbp_dtm_all <- bbp_tokens_all %>%
count(title, word) %>%
cast_dtm(title, word, n)
k_metrics <- FindTopicsNumber(
bbp_dtm_all,
topics = seq(1, 20, by = 1),
metrics = c("Griffiths2004", "CaoJuan2009", "Arun2010", "Deveaud2014"),
method = "Gibbs",
control = list(),
mc.cores = NA,
return_models = FALSE,
verbose = FALSE,
libpath = NULL
)
FindTopicsNumber_plot(k_metrics)
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
Considering the results of all metrics together, the optimal value of K falls between 6 and 10.
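When reading the plot, note that Griffiths2004 and Deveaud2014 should be maximized, while CaoJuan2009 and Arun2010 should be minimized. The sketch below shows one possible way to combine the four metrics numerically. The values in `toy_metrics` are hypothetical (only the column layout mirrors the data frame returned by `FindTopicsNumber()`), and the equal-weight scoring rule is an assumption made here for illustration; it is not something ldatuning provides, and it is not necessarily how the range above was chosen.

```r
# Hypothetical metric values, shaped like the output of FindTopicsNumber()
toy_metrics <- data.frame(
  topics        = c(2, 4, 6, 8, 10),
  Griffiths2004 = c(-5200, -5000, -4900, -4850, -4870),  # higher is better
  CaoJuan2009   = c(0.30, 0.22, 0.15, 0.12, 0.14),       # lower is better
  Arun2010      = c(90, 70, 55, 50, 52),                 # lower is better
  Deveaud2014   = c(1.10, 1.40, 1.70, 1.80, 1.75)        # higher is better
)

# Rescale a metric to [0, 1] so the four metrics are comparable
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Equal-weight score: reward high values of "maximize" metrics
# and low values of "minimize" metrics
score <- rescale01(toy_metrics$Griffiths2004) +
  rescale01(toy_metrics$Deveaud2014) +
  (1 - rescale01(toy_metrics$CaoJuan2009)) +
  (1 - rescale01(toy_metrics$Arun2010))

best_k <- toy_metrics$topics[which.max(score)]
best_k
```

In practice the metrics rarely agree this cleanly, so a score like this is better used to narrow down a range of candidate values, which are then judged by how interpretable the resulting topics are.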
words_per_topic <- 10
number_of_topics <- 10
bbp_lda_all <- LDA(bbp_dtm_all,
k = number_of_topics,
control = list(seed = 588))
terms(bbp_lda_all, words_per_topic)
## Topic 1 Topic 2 Topic 3 Topic 4 Topic 5
## [1,] "tinkering" "feedback" "teachers" "visual" "intervention"
## [2,] "models" "progress" "parsons" "solving" "struggling"
## [3,] "random" "modality" "grading" "scratch" "solution"
## [4,] "behavior" "interface" "school" "skills" "feedback"
## [5,] "level" "learning" "experience" "effect" "expert"
## [6,] "activities" "practices" "rubric" "learning" "correct"
## [7,] "model" "design" "time" "school" "interventions"
## [8,] "effects" "text" "learning" "achievement" "system"
## [9,] "predictive" "monitoring" "solving" "academic" "code"
## [10,] "behaviors" "environment" "teacher" "experimental" "reasons"
## Topic 6 Topic 7 Topic 8 Topic 9 Topic 10
## [1,] "abstraction" "text" "abstraction" "mathematics" "debugging"
## [2,] "bbpes" "learners" "skills" "teachers" "programs"
## [3,] "skill" "introductory" "ct" "thinking" "teachers"
## [4,] "teacher" "learning" "learning" "teaching" "struggling"
## [5,] "features" "modality" "blocks4ds" "computational" "code"
## [6,] "ct" "transition" "activities" "ct" "time"
## [7,] "skills" "potential" "programs" "pre" "activities"
## [8,] "teachers" "environment" "abstractions" "connections" "errors"
## [9,] "instruction" "research" "environment" "service" "identify"
## [10,] "planit" "findings" "tasks" "concepts" "progress"
After testing each value in this range, I found that 10 topics gave the most interpretable result.
tidy_lda <- tidy(bbp_lda_all)
top_terms <- tidy_lda %>%
group_by(topic) %>%
slice_max(beta, n = words_per_topic, with_ties = FALSE) %>%
ungroup() %>%
arrange(topic, -beta)
top_terms %>%
mutate(term = reorder_within(term, beta, topic)) %>%
group_by(topic, term) %>%
arrange(desc(beta)) %>%
ungroup() %>%
ggplot(aes(beta, term, fill = as.factor(topic))) +
geom_col(show.legend = FALSE) +
scale_y_reordered() +
labs(title = "Top terms in each LDA topic",
x = expression(beta), y = NULL) +
facet_wrap(~ topic, ncol = 2, scales = "free")
We can combine our beta and gamma values to sort topics based on their prevalence in our dataset.
td_beta <- tidy(bbp_lda_all)
td_gamma <- tidy(bbp_lda_all, matrix = "gamma")
top_terms <- td_beta %>%
arrange(beta) %>%
group_by(topic) %>%
top_n(words_per_topic, beta) %>%
arrange(-beta) %>%
select(topic, term) %>%
summarise(terms = list(term)) %>%
mutate(terms = map(terms, paste, collapse = ", ")) %>%
  unnest(cols = c(terms))
gamma_terms <- td_gamma %>%
group_by(topic) %>%
summarise(gamma = mean(gamma)) %>%
arrange(desc(gamma)) %>%
left_join(top_terms, by = "topic") %>%
mutate(topic = paste0("Topic ", topic),
topic = reorder(topic, gamma))
gamma_terms %>%
select(topic, gamma, terms) %>%
kable(digits = 3,
col.names = c("Topic", "Expected topic proportion", "Top 10 terms"))
| Topic | Expected topic proportion | Top 10 terms |
|---|---|---|
| Topic 3 | 0.167 | teachers, parsons, grading, school, experience, rubric, time, learning, solving, teacher, gradesnap |
| Topic 8 | 0.159 | abstraction, skills, ct, learning, blocks4ds, activities, programs, abstractions, environment, tasks |
| Topic 2 | 0.102 | feedback, progress, modality, interface, learning, practices, design, text, monitoring, environment |
| Topic 7 | 0.100 | text, learners, introductory, learning, modality, transition, potential, environment, research, findings, languages |
| Topic 6 | 0.098 | abstraction, bbpes, skill, teacher, features, ct, skills, teachers, instruction, planit |
| Topic 10 | 0.097 | debugging, programs, teachers, struggling, code, time, activities, errors, identify, progress |
| Topic 9 | 0.075 | mathematics, teachers, thinking, teaching, computational, ct, pre, connections, service, concepts, b2c3math |
| Topic 5 | 0.070 | intervention, struggling, solution, feedback, expert, correct, interventions, system, code, reasons |
| Topic 4 | 0.067 | visual, solving, scratch, skills, effect, learning, school, achievement, academic, experimental |
| Topic 1 | 0.067 | tinkering, models, random, behavior, level, activities, model, effects, predictive, behaviors |
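The "expected topic proportion" in the table above is the mean of gamma, the per-document topic probability, taken over all documents. The base-R sketch below reproduces that calculation on a toy gamma table; the documents and probabilities are made up for illustration.

```r
# Toy gamma table: probability of each topic within each document (hypothetical)
gamma_toy <- data.frame(
  document = rep(c("A", "B", "C"), each = 2),
  topic    = rep(1:2, times = 3),
  gamma    = c(0.9, 0.1,   # document A
               0.6, 0.4,   # document B
               0.2, 0.8)   # document C
)

# Average gamma per topic across documents gives the topic's prevalence
prevalence <- aggregate(gamma ~ topic, data = gamma_toy, FUN = mean)

# Most prevalent topic first
prevalence <- prevalence[order(-prevalence$gamma), ]
prevalence
```

Because each document's gamma values sum to one, the prevalence values also sum to one, which is why the proportions in the table add up to approximately 1 after rounding.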
Finally, I will try STM to process the dataset.
temp <- textProcessor(bbp_papers$title,
metadata = bbp_papers,
lowercase=TRUE,
removestopwords=TRUE,
removenumbers=TRUE,
removepunctuation=TRUE,
wordLengths=c(3,Inf),
stem=TRUE,
onlycharacter= FALSE,
striphtml=TRUE,
customstopwords=extra_words)
## Building corpus...
## Converting to Lower Case...
## Removing punctuation...
## Removing stopwords...
## Remove Custom Stopwords...
## Removing numbers...
## Stemming...
## Creating Output...
As with LDA, we first estimate a suitable number of topics for our data, this time using the searchK() function.
meta <- temp$meta
vocab <- temp$vocab
docs <- temp$documents
findingk <- searchK(docs,
vocab,
K = c(2:20),
data = meta,
verbose=FALSE)
## Warning in stm(documents = heldout$documents, vocab = heldout$vocab, K = k, :
## K=2 is equivalent to a unidimensional scaling model which you may prefer.
plot(findingk)
Based on the results of Held-Out Likelihood and Semantic Coherence, 8 is a reasonable value.
bbp_stm_all <- stm(documents=docs,
data=meta,
vocab=vocab,
K=8,
max.em.its=words_per_topic,
verbose = FALSE)
plot.STM(bbp_stm_all, n = words_per_topic)
After obtaining the results of the three topic models (LDA, CTM, and STM), I compared the word clusters of each model and found that LDA produced the most coherent group of words for each topic and STM the least.
Following the order of the analysis, I will start the discussion with tf-idf. The tf-idf results were very useful for getting an idea of the hot keywords in the field.
2019
Parsons problems: students are given fragments of code that solve a problem and must figure out how to arrange them so that the code works. As with block-based languages, students do not need to learn the coding syntax to work with them [5]. This made Parsons problems an exciting topic to study and to compare with block-based programming to see which is more efficient for novice programmers.
Tinkering: it is an approach that students adopt to “test, explore, and struggle with their code” with some levels of uncertainty [6]. Studying this behavior in students learning block-based programming was a topic of interest.
Exam: An interesting topic for researchers is to measure the effectiveness of block-based programming in preparing students to learn text-based programming [7]. They do that by assessing students’ exams, and this is one of the reasons this term has been considered important.
Rubric: How to define a rubric for a block-based open-ended project? This is a question that is worth exploring in the field since, usually, the final project of novice programmers is open-ended [8].
AP CSP: Advanced Placement Computer Science Principles (AP CSP) is an important exam for those interested in computer science. Determining the most suitable form of programming questions (text-based vs. block-based) in the written AP CSP exam is an interesting question [9].
2020
Pre-Service Teachers: pre-service teachers can use block-based programming and computational thinking to teach mathematics more conceptually to elementary students [10]. The best approach to do so can be a study topic for researchers.
Mathematics: as mentioned in the previous point, block-based programming can facilitate the teaching of mathematical concepts, and this is an area that can be studied further.
Feedback: generating relevant automatic positive feedback on block-based programming environments can be challenging. The approach and the effectiveness of such feedback systems can be good topics to study [11].
CT: Computational thinking is a concept that is highly present in many block-based programming papers. One line of research is assessing ways to improve the CT skills of students using block-based programming [12].
2021
Monitor: a goal of researchers working with novice programmers is to cultivate and promote a sense of progress monitoring. Block-based programming, as the first step for many new programmers, can play a significant role in their approach to coding, and this made progress monitoring interesting for researchers[13].
Abstract: Improving students’ abstraction skills with block-based programming languages is another topic that has been studied [14].
Feedback: the use of feedback in block-based programming is explained above, but it is interesting to see this keyword in two consecutive years.
GradeSnap: as mentioned above, rubrics and grading are exciting topics in this area. So a subject of study is to assess the grading processes of teachers while grading block-based programming submissions in GradeSnap[15].
COVID: there is no surprise to see this term in a publication of any area in 2021! So as a topic of research, some studied the effect of using block-based programming in the COVID era on the improvement of CT skills [16]. There should be more room for studies related to block-based programming during the pandemic.
The magic of tf-idf for this dataset was that without any complicated algorithm, it has been able to uncover many important topics in this area.
Following tf-idf, we can see that in bigram analysis and word network, bigrams like “intervention reasons”, “cognitive load”, “visual impairment” and “online learning” have been obtained which suggest significant concepts in CS education. For instance, finding the best time, reason, and approach for quality interventions can be challenging. Giving the best programming hint at an inappropriate time can have a negative effect on students. The use of block-based languages to teach programming to people with “visual impairment” is another interesting area that has been researched.
Finally, I will categorize the most important topics found through topic modeling.
Teachers’ tools and experience improvement: Based on LDA, a large proportion of data are about tools for helping teachers and improving their work quality. For instance, topic 3 and 9 of LDA and topic 7 of CTM fall into this category. Creating systems for grading more efficiently, improving rubrics, and training pre-service teachers are all a part of this category.
Students’ Skills and Behaviors: Topic 2 and 1 of LDA, Topic 4 of CTM and Topic 6 of STM are in this category. Papers of this theme will mainly discuss the abstraction skills, computational thinking and learning aspects of students.
Feedback, Intervention and Progress Monitoring: Topic 3 and 5 of LDA and Topic 5 of CTM are instances of this theme. The idea of this theme is to improve students’ experience of programming by providing them positive feedback, quality interventions and teaching them to monitor their progress.
It may be possible to find more categories, but the main themes are those already mentioned. As can be seen, all topics are related in some way to computing education, which reflects the nature of block-based programming languages.
I started my study with two research questions. Regarding the first one, I found many topics that have been discussed in block-based programming papers; for example, computational thinking, grading, and providing interventions have been widely discussed. As for the second question, the general themes of the challenges under consideration have become clear. For instance, providing feedback for students has been present in the papers of the last two years, and assessing the effectiveness of block-based languages in improving CT skills has also been under study in recent years.
This study had some limitations, such as the size of the dataset and the time available. In future research, I will increase the number of papers and expand the analyses. In general, however, I believe I have gained sound insights into the papers of the area, which should be helpful for me and for other researchers interested in it.
[1] Lin, Y. and Weintrop, D., 2021. The landscape of Block-based programming: Characteristics of block-based environments and how they support the transition to text-based programming. Journal of Computer Languages, 67, p.101075.
[2] Dodge, D., 2022. Block Coding | Everything You Need to Know - CodaKid. [online] CodaKid. Available at: https://codakid.com/block-coding/ [Accessed 3 May 2022].
[3] Silge, J. and Robinson, D., 2022. 3 Analyzing word and document frequency: tf-idf | Text Mining with R. [online] Tidytextmining.com. Available at: https://www.tidytextmining.com/tfidf.html [Accessed 3 May 2022].
[4] Blei, D. and Lafferty, J., 2006. Correlated topic models. Advances in neural information processing systems, 18, p.147.
[5] Zhi, R., Chi, M., Barnes, T. and Price, T., 2019. Evaluating the Effectiveness of Parsons Problems for Block-based Programming. Proceedings of the 2019 ACM Conference on International Computing Education Research.
[6] Dong, Y., Marwan, S., Catete, V., Price, T. and Barnes, T., 2019. Defining Tinkering Behavior in Open-ended Block-based Programming Assignments. Proceedings of the 50th ACM Technical Symposium on Computer Science Education.
[7] Gomez, M., Moresi, M. and Benotti, L., 2019. Text-based Programming in Elementary School. Proceedings of the 2019 ACM Conference on Innovation and Technology in Computer Science Education.
[8] Basu, S., 2019. Using Rubrics Integrating Design and Coding to Assess Middle School Students’ Open-ended Block-based Programming Projects. Proceedings of the 50th ACM Technical Symposium on Computer Science Education.
[9] Weintrop, D., Killen, H., Munzar, T. and Franke, B., 2019. Block-based Comprehension. Proceedings of the 50th ACM Technical Symposium on Computer Science Education.
[10] Gleasman, C. and Kim, C., 2020. Pre-Service Teacher’s Use of Block-Based Programming and Computational Thinking to Teach Elementary Mathematics. Digital Experiences in Mathematics Education, 6(1), pp.52-90.
[11] Shabrina, P., Marwan, S., Chi, M., Price, T.W. and Barnes, T., 2020. The Impact of Data-driven Positive Programming Feedback: When it Helps, What Happens when it Goes Wrong, and How Students Respond. In CSEDM@EDM.
[12] Pérez-Marín, D., Hijón-Neira, R., Bacelo, A. and Pizarro, C., 2020. Can computational thinking be improved by using a methodology based on metaphors and scratch to teach computer programming to children?. Computers in Human Behavior, 105, p.105849.
[13] Marwan, S., Shabrina, P., Milliken, A., Menezes, I., Catete, V., Price, T. and Barnes, T., 2021. Promoting Students’ Progress-Monitoring Behavior during Block-Based Programming. 21st Koli Calling International Conference on Computing Education Research.
[14] Çakıroğlu, Ü., Çevik, İ., Köşeli, E. and Aydın, M., 2021. Understanding students’ abstractions in block-based programming environments: A performance based evaluation. Thinking Skills and Creativity, 41, p.100888.
[15] Milliken, A., Cateté, V., Limke, A., Gransbury, I., Chipman, H., Dong, Y. and Barnes, T., 2021. Exploring and Influencing Teacher Grading for Block-based Programs through Rubrics and the GradeSnap Tool. Proceedings of the 17th ACM Conference on International Computing Education Research.
[16] Amnouychokanant, V., Boonlue, S., Chuathong, S. and Thamwipat, K., 2021. Online Learning Using Block-based Programming to Foster Computational Thinking Abilities during the COVID-19 Pandemic. International Journal of Emerging Technologies in Learning (iJET), 16(13), p.227.