In the Unit 4 walkthrough, we will replicate a simpler version of the following paper: Not the same: a text network analysis on computational thinking definitions to study its relationship with computer programming.
This paper reviews definitions of computational thinking in the literature. You might have noticed that in our field the same term can mean different things, especially emerging terms such as computational thinking. This paper contributes to understanding how different research teams conceptualize computational thinking.
Abstract
Even though countries from all over the world are modifying their national educational curriculum in order to include computational thinking skills, there is no agreement on the definition of this ability. This is partly caused by the myriad of definitions that have been proposed by the scholarly community. In fact, there are multiple examples in educational scenarios in which coding and even robotics are considered synonymous with computational thinking. This paper presents a text network analysis of the main definitions of this skill that have been found in the literature, aiming to offer insights on the common characteristics they share and on their relationship with computer programming. As a result, a new definition of computational thinking is proposed, which emerges from the analysed data.
Data Source
Based on your reading, what’s the data source in this paper?
The data source is a literature review published in a dissertation by one of the article's authors (Moreno-León, 2018). He included publications that featured definitions of computational thinking, starting with that of Seymour Papert in 1980. Sources included published papers, magazine articles, book chapters, and conference proceedings, among others.
What’s the research question in this paper?
I would say that the overarching research question, though it is not explicitly stated as such, is: What is the generally accepted definition of computational thinking? Subquestions posed by the authors include:
- What is the relationship between programming and CT, based on the definitions of the latter?
- Does programming arise as a fundamental core of CT?
- What about the relationship between CT and robotics?
- How different are the definitions of CT proposed during the last years?
- Do they share some common characteristics?
- Or are they focused on distinct dimensions of this competence?
Why is text network analysis an appropriate methodology to address the research question?
According to the authors, they used InfraNodus, a text network analysis tool, "to make sense of pieces of disjointed textual data (Paranyushkin, 2019). The solution automates the visualization of a text as a network; shows the most relevant topics, their relations, and the structural gaps between them; and enables the analysis of the discourse structure and the assessment of its diversity based on the community structure of the graph (Paranyushkin, 2019)."
Text network analysis shows not only which terms are used frequently in a corpus but also which pairs or triplets of words appear together. For this study's aim, identifying a commonly accepted definition, the technique can surface shared definitions by finding the words that are most frequently used together. Conversely, a lack of connections between words or word pairs can signal that alternative definitions are in use, for example in different fields. In the mini-analysis that follows, "computer science" stands apart from the "solving/process/methodology" cluster. This suggests that some works (for example, from the field of computer science) equate computational thinking with computer science, while other fields (design, science, math, etc.) may use the term more generally.
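A toy sketch of the "absence of connections" idea: if two groups of words never co-occur, they fall into separate connected components of the word network. The edge list below is hypothetical, and connected components are a simpler stand-in for the community structure that InfraNodus analyzes, not the paper's method.

```r
library(igraph)

# Hypothetical bigram edges: one cluster about computer science,
# another about problem solving, with no bigram linking the two
edges <- data.frame(from = c("computer", "solving", "solving"),
                    to   = c("science", "process", "methodology"))
g <- graph_from_data_frame(edges, directed = FALSE)

# components() reports how many disconnected clusters exist
# and which cluster each word belongs to
comp <- components(g)
comp$no
comp$membership
```

Here `comp$no` is 2: "computer" and "science" share one component, while "solving", "process", and "methodology" share the other.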
library(dplyr)
library(tidytext)
library(tidyverse)
library(tidyr)
library(ggplot2)
library(igraph)
library(ggraph)
Now let’s read our data into our Environment using the read_csv() function and assign it to a variable name so we can work with it like any other object in R.
CTdefinition <- read_csv("data/definition.csv")
## Rows: 8 Columns: 1
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): definition
The output shows that we have 8 definitions of CT (computational thinking), all taken from the paper. The paper includes more than 8 definitions; here, we use a subset of 8 as an example dataset.
Next, we will use some familiar tidytext functions to tokenize the text. This time, we will tokenize bigrams (pairs of adjacent words) instead of the unigrams (single words) we used in the sentiment analysis unit. The same approach works for trigrams and longer n-grams.
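To make the idea concrete, here is a base-R sketch of how bigrams are formed: each word is paired with the word that follows it. The sentence is made up, and the whitespace split is a simplification; unnest_tokens() also handles punctuation.

```r
# Hypothetical sentence, not one of the 8 definitions
sentence <- "CT is a problem solving process"

# Lowercase and split on whitespace (a simplified tokenizer)
words <- strsplit(tolower(sentence), "\\s+")[[1]]

# Pair every word except the last with its successor
bigrams <- paste(head(words, -1), tail(words, -1))
bigrams
```

The six words yield five overlapping bigrams: "ct is", "is a", "a problem", "problem solving", and "solving process". In general, a text of k words yields k - 1 bigrams and k - 2 trigrams.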
Let’s tokenize the definition text with the familiar unnest_tokens() function, setting token = "ngrams" and n = 2 to get bigrams:
ct_bigrams <- CTdefinition %>%
unnest_tokens(bigram, definition, token = "ngrams", n = 2)
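The same call produces longer n-grams when you change n. A minimal sketch with n = 3 on a hypothetical two-definition tibble (toy sentences, not the paper's data):

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for CTdefinition
toy <- tibble(definition = c("CT is a problem solving process",
                             "CT draws on computer science"))

# token = "ngrams" with n = 3 yields overlapping three-word sequences;
# unnest_tokens() lowercases the text by default
toy_trigrams <- toy %>%
  unnest_tokens(trigram, definition, token = "ngrams", n = 3)

toy_trigrams$trigram
```

The first definition gives four trigrams (for example, "problem solving process") and the second gives three, for seven rows in total.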
Now let’s do a quick count to see common bigrams:
ct_bigrams %>%
count(bigram, sort = TRUE)
## # A tibble: 452 x 2
## bigram n
## <chr> <int>
## 1 computer science 5
## 2 ct is 5
## 3 is a 5
## 4 a computer 4
## 5 can be 4
## 6 problem solving 4
## 7 that can 4
## 8 a problem 3
## 9 in a 3
## 10 of computer 3
## # ... with 442 more rows
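"computer science" is among the most frequent bigrams, appearing 5 times. It is tied with "ct is" and "is a", but those consist mostly of stop words, as do several bigrams further down the list ("a computer", "can be", "that can"). The next bigram that carries meaning on its own is "problem solving", with a frequency of 4.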
The frequencies are low because this walkthrough uses a small sample of 8 definitions. Larger datasets yield higher counts and are better suited to text mining analysis.
Next, we will remove stop words. Unlike unigrams (single words), bigrams cannot be matched against the stop word list directly, because each token contains two words. We first separate each bigram into two columns with separate() and then use filter() to drop any bigram in which either word is a stop word:
bigrams_separated <- ct_bigrams %>%
separate(bigram, c("word1", "word2"), sep = " ")
bigrams_filtered <- bigrams_separated %>%
filter(!word1 %in% stop_words$word) %>%
filter(!word2 %in% stop_words$word)
bigram_counts <- bigrams_filtered %>%
count(word1, word2, sort = TRUE)
bigram_counts
## # A tibble: 86 x 3
## word1 word2 n
## <chr> <chr> <int>
## 1 computer science 5
## 2 solving methodology 2
## 3 solving process 2
## 4 abstraction automation 1
## 5 abstraction decomposition 1
## 6 abstraction recursion 1
## 7 algorithmic notions 1
## 8 algorithmic thinking 1
## 9 analysis ct 1
## 10 analyze data 1
## # ... with 76 more rows
Now bigram_counts contains only bigrams in which neither word is a stop word.
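Two points about this step can be checked on a small hypothetical tibble. First, an anti_join() against stop_words on the whole bigram should remove nothing, because stop_words holds single words. This is why the bigrams are separated first. Second, a stop list can also drop domain words. "problem solving" appeared 4 times before filtering but is missing from bigram_counts, while "solving methodology" and "solving process" remain. This suggests that "problem" is on tidytext's stop word list. The sketch checks this and shows one option: removing the word from a copy of the stop list before filtering. Whether to keep "problem" is our analytic choice, not something the paper prescribes.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical bigrams, not taken from the dataset
toy_bigrams <- tibble(bigram = c("ct is", "problem solving", "computer science"))

# 1. Matching whole bigrams against a single-word list removes nothing
direct <- toy_bigrams %>%
  anti_join(stop_words, by = c("bigram" = "word"))
nrow(direct)

# 2. Check whether "problem" is treated as a stop word
"problem" %in% stop_words$word

# 3. Keep a domain term by removing it from a copy of the stop list
custom_stop_words <- stop_words %>%
  filter(word != "problem")

kept <- toy_bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% custom_stop_words$word,
         !word2 %in% custom_stop_words$word)
kept
```

With the custom list, "ct is" is still dropped because "is" is a stop word, while "problem solving" and "computer science" survive.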
We can also combine the separated words with unite:
bigrams_united <- bigrams_filtered %>%
unite(bigram, word1, word2, sep = " ")
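# unite() selects existing columns, so word1 and word2 can be written unquoted (as in select());
# separate() takes the names of the new columns as a character vector, so they are quoted
# sep = " " puts a single space between word1 and word2 when they are pasted back together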
bigrams_united
## # A tibble: 92 x 1
## bigram
## <chr>
## 1 people engaged
## 2 computer hobbyist
## 3 hobbyist clubs
## 4 running computer
## 5 computer drop
## 6 computers simply
## 7 shareable kinds
## 8 integrate computational
## 9 computational thinking
## 10 everyday life
## # ... with 82 more rows
This is a word network from the paper:
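In the paper's network, nodes are connected by lines that have no direction. In our network, we will instead draw directed edges that point from the first word of a bigram to the second. To visualize a word network with edges, we need three variables: from (the word an edge starts at), to (the word it points to), and weight (how strong the connection is).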
We need to transform our dataset (bigram_counts) into these variables in the following way: from is the “word1”, to is the “word2”, and weight is “n”.
Let’s use graph_from_data_frame to make the transformation:
bigram_graph <- bigram_counts %>%
graph_from_data_frame()
bigram_graph
## IGRAPH 77338a7 DN-- 117 86 --
## + attr: name (v/c), n (e/n)
## + edges from 77338a7 (vertex names):
## [1] computer ->science solving ->methodology
## [3] solving ->process abstraction ->automation
## [5] abstraction ->decomposition abstraction ->recursion
## [7] algorithmic ->notions algorithmic ->thinking
## [9] analysis ->ct analyze ->data
## [11] analyzing ->data applicable ->attitude
## [13] artifacts ->ct automating ->solutions
## [15] broad ->story broader ->discipline
## + ... omitted several edges
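In the header above, "DN--" marks a directed graph with named vertices, followed by the number of vertices (117) and edges (86). A small sketch with hypothetical counts (the values mirror the top rows of bigram_counts) shows how graph_from_data_frame() maps columns. The first two columns become from and to, and any remaining column, here n, becomes an edge attribute. One detail worth knowing: igraph treats an edge attribute as a weight only when it is named "weight".

```r
library(igraph)

# Hypothetical counts in the same shape as bigram_counts
toy_counts <- data.frame(word1 = c("computer", "solving", "solving"),
                         word2 = c("science", "methodology", "process"),
                         n     = c(5, 2, 2))

g <- graph_from_data_frame(toy_counts)  # directed = TRUE by default

vcount(g)       # distinct words become vertices
ecount(g)       # each row (bigram) becomes an edge
is_directed(g)
E(g)$n          # the extra column is stored as an edge attribute

# Renaming n to "weight" makes igraph recognise it as an edge weight
toy_weighted <- toy_counts
names(toy_weighted)[3] <- "weight"
is_weighted(graph_from_data_frame(toy_weighted))
```

This graph has 5 vertices and 3 edges. E(g)$n returns 5, 2, 2. is_weighted() is FALSE for g but TRUE once the column is renamed.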
Since most bigrams appear only once, let’s keep only those that appear more than once:
bigram_graph_filtered <- bigram_counts %>%
filter(n > 1) %>%
graph_from_data_frame()
bigram_graph_filtered
## IGRAPH 776ccf5 DN-- 5 3 --
## + attr: name (v/c), n (e/n)
## + edges from 776ccf5 (vertex names):
## [1] computer->science solving ->methodology solving ->process
Only three bigrams are left. Before looking at them, let’s visualize the word network before filtering:
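set.seed(100)  # the "fr" (Fruchterman-Reingold) layout is random; a fixed seed makes it reproducible
a <- grid::arrow(type = "open", length = unit(.2, "inches"))  # arrowhead style for the edges
ggraph(bigram_graph, layout = "fr") +
  geom_edge_link(aes(edge_alpha = n), show.legend = FALSE,     # more frequent bigrams are drawn darker
                 arrow = a, end_cap = circle(.07, 'inches')) + # stop each arrow just short of its node
  geom_node_point(color = "red", size = 3) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
  theme_void()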
Revise the code in the previous code chunk so that there are no arrows: the graph should show plain lines instead.
#write your code here
set.seed(100)
ggraph(bigram_graph, layout = "fr") +
  # leaving out the arrow (and end_cap) arguments draws plain lines
  geom_edge_link(aes(edge_alpha = n), show.legend = FALSE) +
  geom_node_point(color = "red", size = 3) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
  theme_void()
Revise the code in the previous code chunk so that there are no arrows and the width of each line represents the frequency of its bigram. The ggraph documentation can help you find the answer.
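One possible solution sketch, shown on a hypothetical toy graph rather than bigram_graph: map n to the edge_width aesthetic of geom_edge_link(). This assumes the ggraph versions we are familiar with, where edge width is set through edge_width. Recent ggraph releases may prefer the name edge_linewidth, so check the documentation for your installed version.

```r
library(igraph)
library(ggraph)

# Hypothetical counts in the same shape as bigram_counts
toy_counts <- data.frame(word1 = c("computer", "solving", "solving", "abstraction"),
                         word2 = c("science", "methodology", "process", "recursion"),
                         n     = c(5, 2, 2, 1))
g <- graph_from_data_frame(toy_counts)

set.seed(100)
p <- ggraph(g, layout = "fr") +
  # no arrow argument, so edges are plain lines; line width follows bigram frequency
  geom_edge_link(aes(edge_width = n), show.legend = FALSE) +
  geom_node_point(color = "red", size = 3) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
  theme_void()
p
```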
There are many other encoding options you can play with, such as changing the color of specific nodes. Remember that your encoding should serve the purpose of drawing useful insights from the word net.
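As an illustration of coloring specific nodes, the sketch below highlights a chosen set of words. Both the toy graph and the key_terms vector are hypothetical. The logical expression inside aes() splits nodes into two groups, and scale_color_manual() assigns each group a color.

```r
library(igraph)
library(ggraph)
library(ggplot2)

# Hypothetical counts in the same shape as bigram_counts
toy_counts <- data.frame(word1 = c("computer", "solving", "solving"),
                         word2 = c("science", "methodology", "process"),
                         n     = c(5, 2, 2))
g <- graph_from_data_frame(toy_counts)

# Hypothetical set of terms to emphasise
key_terms <- c("computer", "science")

set.seed(100)
p <- ggraph(g, layout = "fr") +
  geom_edge_link(aes(edge_alpha = n), show.legend = FALSE) +
  # TRUE for highlighted words, FALSE for the rest
  geom_node_point(aes(color = name %in% key_terms), size = 3) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
  scale_color_manual(values = c(`TRUE` = "red", `FALSE` = "grey50"), guide = "none") +
  theme_void()
p
```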
In comparison, this is the net after filtering:
set.seed(100)
a <- grid::arrow(type = "closed", length = unit(.2, "inches"))
ggraph(bigram_graph_filtered, layout = "fr") +
geom_edge_link(aes(edge_alpha = n), show.legend = FALSE,
arrow = a, end_cap = circle(.07, 'inches')) +
geom_node_point(color = "red", size = 3) +
geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
theme_void()
We have only three connections left. "computer science" and "solving" stand out as important concepts in CT. The "solving" node is what remains of "problem solving", because "problem" was dropped along with the stop words. This echoes the core idea of Jeannette M. Wing. It is tempting to conclude that, no matter what changes over time or how we conceptualize CT, these two are the core concepts. Is that true, though? This is an open question for the field to explore.