0. INTRODUCTION

In the Unit 4 walkthrough, we will replicate a simplified version of the following paper: “Not the same: a text network analysis on computational thinking definitions to study its relationship with computer programming.” The paper reviews definitions of computational thinking in the literature. You might have noticed that in our field the same term can mean different things, especially an emerging term such as computational thinking. This paper contributes to our understanding of how different research teams conceptualize computational thinking.

Walkthrough Focus

Similar to previous walkthroughs, we will first understand the context and dataset by reading the paper (prepare), then wrangle the data (wrangle), and finally explore and visualize our data (explore & communicate).


1. PREPARE

To help us better understand the context, questions, and data sources, this section will focus on the following topics:

  1. Context. This paper examines definitions of computational thinking from the literature.
  2. Questions. [Question for you]
  3. Project Setup. This should be very familiar by now, but we’ll set up a new R project and install and load the required packages for the walkthrough.

1a. Context

Not the same: a text network analysis on computational thinking definitions to study its relationship with computer programming

Abstract

Even though countries from all over the world are modifying their national educational curriculum in order to include computational thinking skills, there is not an agreement in the definition of this ability. This is partly caused by the myriad of definitions that has been proposed by the scholar community. In fact, there are multiple examples in educational scenarios in which coding and even robotics are considered as synonymous of computational thinking. This paper presents a text network analysis of the main definitions of this skill that have been found in the literature, aiming to offer insights on the common characteristics they share and on their relationship with computer programming. As a result, a new definition of computational thinking is proposed, which emerge from the analysed data.

Data Source

✅ Comprehension Check

Based on your reading, what’s the data source in this paper?

1b. Guiding Questions

✅ Comprehension Check

What’s the research question in this paper? Why is text network analysis an appropriate methodology to address the research question?

1c. Set Up

As highlighted in Chapter 6 of Data Science in Education Using R (DSIEUR), one of the first steps of every workflow should be to set up a “Project” within RStudio. This will be your “home” for any files and code used or created in previous units.

You are welcome to continue using the same project created for previous units, or create an entirely new project for Unit 4. Either way, once your project is set up, open a new R script and load the following packages that we’ll need for this walkthrough:

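library(dplyr)      # data manipulation: count(), filter()
library(tidytext)   # unnest_tokens() and the stop_words lexicon
library(tidyverse)  # also attaches readr (read_csv()), dplyr, tidyr and ggplot2
library(tidyr)      # separate() and unite()
library(ggplot2)    # the plotting system that ggraph builds on
library(igraph)     # graph_from_data_frame()
library(ggraph)     # ggraph(), geom_edge_link(), geom_node_point(), geom_node_text()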

At the end of this week, I encourage you to share your R script with me as evidence that you have completed the walkthrough. Although I highly recommend that you type the code shared throughout this walkthrough manually, for large blocks of text it may be easier to copy and paste.


2. WRANGLE

2a. Import Data

Now let’s read our data into our Environment using the read_csv() function and assign it to a variable name so we can work with it like any other object in R.

CTdefinition <- read_csv("data/definition.csv")
## Rows: 8 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): definition
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

The output shows 8 rows, one for each definition of CT (computational thinking). The definitions were taken from the paper, which includes more than 8; here we use these 8 as an example dataset.

2b. Tokenizing text into bigrams

Next, we will use some familiar tidytext functions for tokenizing text. This time, however, we will tokenize the text into bigrams (pairs of adjacent words) instead of the unigrams (single words) we used in the sentiment analysis unit. The same approach works for trigrams and longer n-grams.

Let’s tokenize the definition text with the familiar unnest_tokens() function, setting token = "ngrams" and n = 2 to get bigrams:

ct_bigrams <- CTdefinition %>%
  unnest_tokens(bigram, definition, token = "ngrams", n = 2)
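To see what n = 2 does, here is a minimal base R sketch of bigram tokenization on one made-up sentence (not one of the 8 definitions). It assumes that lowercasing and splitting on non-letter characters roughly approximates unnest_tokens(); the real tokenizer handles punctuation and other edge cases more carefully. The helper name make_bigrams() is hypothetical.

```r
# Hypothetical helper: a rough base-R approximation of
# unnest_tokens(token = "ngrams", n = 2) applied to a single text.
make_bigrams <- function(text) {
  # Lowercase and split on anything that is not a letter or apostrophe
  words <- tolower(unlist(strsplit(text, "[^A-Za-z']+")))
  words <- words[words != ""]
  if (length(words) < 2) return(character(0))
  # Slide a two-word window: pair each word with the word that follows it
  paste(head(words, -1), tail(words, -1))
}

make_bigrams("CT is a problem solving process")
# returns "ct is" "is a" "a problem" "problem solving" "solving process"
```

A text with k words yields k - 1 overlapping bigrams, which is why the bigram table has many more rows than there are definitions.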

Now let’s do a quick count to see common bigrams:

ct_bigrams %>%
  count(bigram, sort = TRUE)
## # A tibble: 452 × 2
##    bigram               n
##    <chr>            <int>
##  1 computer science     5
##  2 ct is                5
##  3 is a                 5
##  4 a computer           4
##  5 can be               4
##  6 problem solving      4
##  7 that can             4
##  8 a problem            3
##  9 in a                 3
## 10 of computer          3
## # … with 442 more rows

You might notice that “computer science” is one of the most frequent bigrams (tied at n = 5 with “ct is” and “is a”). This makes sense, as the researcher who popularized the concept, Jeannette M. Wing, comes from the field of computer science. You might also notice that the largest frequency is only 5, which is expected with only 8 definitions. For your independent analysis, you should use a larger dataset; this small one is only for the walkthrough demo.
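Note that n counts total occurrences across all definitions, not the number of definitions that use a bigram. The base R sketch below illustrates the difference with three hypothetical, hand-written lists of bigrams (not taken from the paper’s data); table() plays the role of count(bigram, sort = TRUE). Because each definition is tokenized on its own, no bigram spans the end of one definition and the start of the next.

```r
# Hypothetical bigrams from three short definitions (illustrative only)
bigrams_by_def <- list(
  c("ct is", "is a", "a problem", "problem solving", "solving process"),
  c("computer science", "science is", "is problem", "problem solving"),
  c("computer science", "science ideas", "ideas applied",
    "applied beyond", "beyond computer", "computer science")
)

# Total occurrences across all definitions (what count() reports as n)
total_n <- sort(table(unlist(bigrams_by_def)), decreasing = TRUE)

# Number of definitions that contain each bigram at least once
def_n <- table(unlist(lapply(bigrams_by_def, unique)))

total_n[["computer science"]]  # 3: it occurs twice in the third definition
def_n[["computer science"]]    # 2
```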

2c. Remove stop words

Stop words such as “is” and “a” are unlikely to help us explore the definitions, so next we will remove them. Unlike unigrams, bigrams cannot be filtered against a stop word list directly, because each token contains two words. Instead, we first split each bigram into two columns with separate() and then use filter() to drop any bigram in which either word is a stop word:

bigrams_separated <- ct_bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ")

bigrams_filtered <- bigrams_separated %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(!word2 %in% stop_words$word)

bigram_counts <- bigrams_filtered %>% 
  count(word1, word2, sort = TRUE)

bigram_counts
## # A tibble: 86 × 3
##    word1       word2             n
##    <chr>       <chr>         <int>
##  1 computer    science           5
##  2 solving     methodology       2
##  3 solving     process           2
##  4 abstraction automation        1
##  5 abstraction decomposition     1
##  6 abstraction recursion         1
##  7 algorithmic notions           1
##  8 algorithmic thinking          1
##  9 analysis    ct                1
## 10 analyze     data              1
## # … with 76 more rows

Now bigram_counts contains only bigrams in which neither word is a stop word.
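The separate-then-filter logic can be sketched in base R as follows. The short stop list is a hypothetical stand-in for tidytext’s much larger stop_words lexicon; the point is only that a bigram survives when neither of its words is a stop word.

```r
# Example bigrams (made up for illustration)
bigrams <- c("ct is", "is a", "a problem", "problem solving",
             "solving process", "computer science")

# Hypothetical mini stop list; tidytext's stop_words is far larger
stop_list <- c("is", "a", "the", "of")

# Split each bigram into its two words (like separate())
parts <- strsplit(bigrams, " ", fixed = TRUE)
word1 <- vapply(parts, `[`, character(1), 1)
word2 <- vapply(parts, `[`, character(1), 2)

# Keep a bigram only if neither word is a stop word (like the two filter() calls)
keep <- !(word1 %in% stop_list) & !(word2 %in% stop_list)
data.frame(word1 = word1[keep], word2 = word2[keep])
# keeps: problem solving, solving process, computer science
```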

We can also combine the separated words with unite:

bigrams_united <- bigrams_filtered %>%
  unite(bigram, word1, word2, sep = " ")

bigrams_united
## # A tibble: 92 × 1
##    bigram                 
##    <chr>                  
##  1 people engaged         
##  2 computer hobbyist      
##  3 hobbyist clubs         
##  4 running computer       
##  5 computer drop          
##  6 computers simply       
##  7 shareable kinds        
##  8 integrate computational
##  9 computational thinking 
## 10 everyday life          
## # … with 82 more rows
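Notice that bigrams_united has 92 rows while bigram_counts has 86: unite() keeps one row per occurrence, whereas count() collapses repeated pairs into a single row with n. The difference of 6 matches the extra occurrences of “computer science” (n = 5), “solving methodology” (n = 2) and “solving process” (n = 2). A tiny base R sketch of the same idea, with paste() standing in for unite() and made-up rows:

```r
# Made-up separated bigrams, one row per occurrence
word1 <- c("computer", "computer", "solving", "solving")
word2 <- c("science", "science", "methodology", "process")

# Like unite(bigram, word1, word2, sep = " ")
united <- paste(word1, word2, sep = " ")

length(united)          # 4: one entry per occurrence
length(unique(united))  # 3: one entry per distinct bigram, as in a count table
```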

3. VISUALIZE WORD NETWORK

This is a word network from the paper:

In that visualization, nodes are connected by lines (edges) that have no direction. We can also give each edge a direction, pointing from the first word of a bigram to the second. To build such a word net, we need three variables:

  1. from: the node an edge is coming from
  2. to: the node an edge is going towards
  3. weight: a numeric value associated with each edge

We need to map our dataset (bigram_counts) onto these variables: from is “word1”, to is “word2”, and weight is “n”.
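As a rough illustration of this mapping, the sketch below renames the three columns of a small data frame shaped like bigram_counts (using the three bigrams with n > 1 from the output above) and counts the distinct words that would become nodes. It is only for intuition: graph_from_data_frame() does not need renamed columns, because it treats the first two columns as the edge list and keeps any remaining columns (here n) as edge attributes.

```r
# Three rows shaped like bigram_counts (values from the output above)
bigram_counts_small <- data.frame(
  word1 = c("computer", "solving", "solving"),
  word2 = c("science", "methodology", "process"),
  n     = c(5L, 2L, 2L)
)

# Map columns to the edge-list roles: from, to, weight
edges <- data.frame(
  from   = bigram_counts_small$word1,
  to     = bigram_counts_small$word2,
  weight = bigram_counts_small$n
)

# Every distinct word becomes a node
nodes <- unique(c(edges$from, edges$to))

nrow(edges)    # 3 edges
length(nodes)  # 5 nodes
```

These 5 nodes and 3 edges match what igraph reports for the filtered graph later in this section.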

Let’s use graph_from_data_frame to make the transformation:

bigram_graph <- bigram_counts %>%
  graph_from_data_frame()

bigram_graph
## IGRAPH 489f7e9 DN-- 117 86 -- 
## + attr: name (v/c), n (e/n)
## + edges from 489f7e9 (vertex names):
##  [1] computer       ->science       solving        ->methodology  
##  [3] solving        ->process       abstraction    ->automation   
##  [5] abstraction    ->decomposition abstraction    ->recursion    
##  [7] algorithmic    ->notions       algorithmic    ->thinking     
##  [9] analysis       ->ct            analyze        ->data         
## [11] analyzing      ->data          applicable     ->attitude     
## [13] artifacts      ->ct            automating     ->solutions    
## [15] broad          ->story         broader        ->discipline   
## + ... omitted several edges

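The header line DN-- 117 86 tells us that the graph is directed (D) and has named vertices (N), with 117 nodes (distinct words) and 86 edges (one per row of bigram_counts). Since many bigrams appear only once, let’s keep only those that appear more than once: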

bigram_graph_filtered <- bigram_counts %>%
  filter(n > 1) %>%
  graph_from_data_frame()

bigram_graph_filtered
## IGRAPH 5d322a8 DN-- 5 3 -- 
## + attr: name (v/c), n (e/n)
## + edges from 5d322a8 (vertex names):
## [1] computer->science     solving ->methodology solving ->process

Only three bigrams remain (5 nodes and 3 edges). Please use a larger dataset for your own analysis so that you do not end up visualizing only three bigrams.

Next, let’s go ahead and visualize the word net before filtering:

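set.seed(100)  # the "fr" (Fruchterman-Reingold) layout is random; fixing the seed makes it reproducible

# Arrowhead style for the directed edges
a <- grid::arrow(type = "open", length = unit(.2, "inches"))

ggraph(bigram_graph, layout = "fr") +
  geom_edge_link(aes(edge_alpha = n), show.legend = FALSE,       # more frequent bigrams are drawn more opaque
                 arrow = a, end_cap = circle(.07, 'inches')) +  # end each edge just short of its node
  geom_node_point(color = "red", size = 3) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
  theme_void()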

✅ Comprehension Check

Revise the code in the previous code chunk so that the graph shows plain lines instead of arrows.

set.seed(100)

# Same plot as the previous chunk, but without the arrow and end_cap
# arguments, so edges are drawn as plain lines
ggraph(bigram_graph, layout = "fr") +
  geom_edge_link(aes(edge_alpha = n), show.legend = FALSE) +
  geom_node_point(color = "red", size = 3) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
  theme_void()

Now revise the code so that there are no arrows and the width of each line represents the frequency of the bigram. This documentation can help you find the answer.

There are many other encoding options you can experiment with, such as changing the color of specific nodes. Remember that your encoding choices should help you draw useful insights from the word net.

In comparison, this is the net after filtering:

set.seed(100)

a <- grid::arrow(type = "closed", length = unit(.2, "inches"))

ggraph(bigram_graph_filtered, layout = "fr") +
  geom_edge_link(aes(edge_alpha = n), show.legend = FALSE,
                 arrow = a, end_cap = circle(.07, 'inches')) +
  geom_node_point(color = "red", size = 3) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
  theme_void()

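Only three connections remain. We can see that computer science and problem solving are important concepts in CT, reflecting the core idea of Jeannette M. Wing. (Because “problem” is in tidytext’s stop word list, problem solving appears here only through “solving”, which links to “methodology” and “process”; the unfiltered count earlier shows “problem solving” occurring 4 times.) It is tempting to conclude that, no matter what changes over time or how differently we conceptualize CT, these two remain its core concepts. Is that true, though? This is an open question for the field to explore.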

Once again, please use a larger dataset for your independent analysis.

Unit Takeaway

One main lesson I hope you take away from this walkthrough is that a word net is only useful in an appropriate scenario. Visualizing a word net is easy, but how can we make meaning out of it? We should first read the research question closely and then choose a technique that can actually answer it.

