unit4_walkthrough

0. INTRODUCTION

In the Unit 4 walkthrough, we will replicate a simpler version of the following paper: Not the same: a text network analysis on computational thinking definitions to study its relationship with computer programming.

This paper reviews definitions of computational thinking in the literature. You might have noticed that in our field the same term can mean different things, especially emerging terms such as computational thinking. This paper contributes to understanding how different research teams conceptualize computational thinking.

1. PREPARE

1a. Context

Not the same: a text network analysis on computational thinking definitions to study its relationship with computer programming

Abstract

Even though countries from all over the world are modifying their national educational curriculum in order to include computational thinking skills, there is not an agreement in the definition of this ability. This is partly caused by the myriad of definitions that has been proposed by the scholar community. In fact, there are multiple examples in educational scenarios in which coding and even robotics are considered as synonymous of computational thinking. This paper presents a text network analysis of the main definitions of this skill that have been found in the literature, aiming to offer insights on the common characteristics they share and on their relationship with computer programming. As a result, a new definition of computational thinking is proposed, which emerge from the analysed data.

Data Source

Comprehension Check

Based on your reading, what’s the data source in this paper?

The data source is a literature review published in a dissertation by one of the article’s authors (Moreno-León, 2018). He included publications that featured definitions of computational thinking, starting with that of Seymour Papert in 1980. Sources included published papers, magazine articles, book chapters, conference proceedings, etc.

1b. Guiding Questions

Comprehension Check

What’s the research question in this paper?

I would say that the overarching research question, though it is not explicitly stated as such, is: What is the generally accepted definition of computational thinking? Subquestions posed by the authors include:

- What is the relationship between programming and CT, based on the definitions of the latter?
- Does programming arise as a fundamental core of CT?
- What about the relationship between CT and robotics?
- How different are the definitions of CT proposed during the last years?
- Do they share some common characteristics?
- Or are they focused on distinct dimensions of this competence?

Why is text network analysis an appropriate methodology to address the research question?

According to the authors, InfraNodus, the tool used to perform the text network analysis, was chosen "to make sense of pieces of disjointed textual data (Paranyushkin, 2019). The solution automates the visualization of a text as a network; shows the most relevant topics, their relations, and the structural gaps between them; and enables the analysis of the discourse structure and the assessment of its diversity based on the community structure of the graph (Paranyushkin, 2019)."

Text network analysis has the advantage of showing not only frequently used terms in a corpus but also which pairs or triplets of words appear together. For this study’s aim, illuminating a commonly accepted definition, the text network technique can reveal common definitions by identifying the words most frequently used together. An absence of connections between words or pairs of words can also show when alternate definitions are used, for example in different fields. In the mini-analysis that follows, for example, computer science stands separate from the “solving/process/methodology” unit. This suggests that computational thinking is equated with computer science in some works (for example, those from the field of computer science) but that the term may be used more generally in other fields (design, science, math, etc.).

1c. Set Up

Set up a new project in RStudio
Load the necessary packages with library()

library(dplyr)

## Warning: package 'dplyr' was built under R version 4.0.5

library(tidytext)

## Warning: package 'tidytext' was built under R version 4.0.5

library(tidyverse)

## Warning: package 'tidyverse' was built under R version 4.0.5

## Warning: package 'ggplot2' was built under R version 4.0.5

## Warning: package 'tibble' was built under R version 4.0.5

## Warning: package 'tidyr' was built under R version 4.0.5

## Warning: package 'readr' was built under R version 4.0.5

## Warning: package 'purrr' was built under R version 4.0.5

## Warning: package 'stringr' was built under R version 4.0.5

## Warning: package 'forcats' was built under R version 4.0.5

library(tidyr)
library(ggplot2)
library(igraph)

## Warning: package 'igraph' was built under R version 4.0.5

library(ggraph)

## Warning: package 'ggraph' was built under R version 4.0.5
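Before loading the packages, it can help to confirm they are all installed. The following is a minimal sketch, not part of the original walkthrough; the package names are copied from the library() calls above, and the variable names pkgs, installed, and missing_pkgs are made up for illustration.

```r
# Packages used in this walkthrough (taken from the library() calls above).
pkgs <- c("dplyr", "tidytext", "tidyverse", "tidyr", "ggplot2", "igraph", "ggraph")

# requireNamespace() returns TRUE when a package is installed, without attaching it.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Install these first: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All packages are installed.")
}
```

The "built under R version 4.0.5" warnings shown above are separate from this check: they only report that a package was built with a newer R than the one running.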

2. WRANGLE

2a. Import Data

Now let’s read our data into our Environment using the read_csv() function and assign it to a variable name so we can work with it like any other object in R.

CTdefinition <- read_csv("data/definition.csv")

## Rows: 8 Columns: 1
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): definition
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.

We can see that we have 8 different definitions of CT (computational thinking). The definitions were taken from the paper, which includes more than 8; here, we use 8 of them as an example dataset.
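If you do not have data/definition.csv at hand, the import step can be tried on a stand-in file. This is a sketch under assumptions: the two definitions below are invented placeholders, not rows from the real dataset, and toy_path and toy_definitions are hypothetical names. The only thing taken from the output above is the file shape, a single character column named definition.

```r
library(readr)

# Hypothetical stand-in for data/definition.csv: one column, one definition per row.
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "definition",
  "\"CT is a problem solving process.\"",
  "\"CT draws on concepts from computer science.\""
), toy_path)

# show_col_types = FALSE quiets the column specification message.
toy_definitions <- read_csv(toy_path, show_col_types = FALSE)

nrow(toy_definitions)   # number of definitions
names(toy_definitions)  # column names
```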

2b. Tokenizing text into bigrams

Next, we will use some familiar tidytext functions for tokenizing text. This time, however, we will tokenize the text into bigrams instead of the unigrams (single words) we used in the sentiment analysis unit. We can also tokenize into trigrams and longer n-grams.

Let’s tokenize our definition text using the familiar unnest_tokens(), specifying token = "ngrams" and n = 2 to get bigrams:

ct_bigrams <- CTdefinition %>%
  unnest_tokens(bigram, definition, token = "ngrams", n = 2)
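To see what the tokenizer does, it can be run on a single short sentence. This is an illustrative sketch: the sentence and the names toy, toy_bigrams, and toy_trigrams are made up, while the unnest_tokens() call mirrors the one above.

```r
library(dplyr)
library(tidytext)

toy <- tibble(definition = "CT is a problem solving process")

# Bigrams: every pair of adjacent words, lowercased by default.
toy_bigrams <- toy %>%
  unnest_tokens(bigram, definition, token = "ngrams", n = 2)

# Trigrams: the same call with n = 3.
toy_trigrams <- toy %>%
  unnest_tokens(trigram, definition, token = "ngrams", n = 3)

toy_bigrams
toy_trigrams
```

A six-word sentence yields five overlapping bigrams and four trigrams, which is why the bigram table for the full dataset is much longer than the number of definitions.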

Now let’s do a quick count to see common bigrams:

ct_bigrams %>%
  count(bigram, sort = TRUE)

## # A tibble: 452 x 2
##    bigram               n
##    <chr>            <int>
##  1 computer science     5
##  2 ct is                5
##  3 is a                 5
##  4 a computer           4
##  5 can be               4
##  6 problem solving      4
##  7 that can             4
##  8 a problem            3
##  9 in a                 3
## 10 of computer          3
## # ... with 442 more rows

The most frequent bigram is “computer science”, with a frequency of 5. The next few bigrams (“ct is”, “is a”) contain stop words. The next bigram that seems to make sense is “problem solving,” with a frequency of 4.

The frequencies are low because we are performing the walkthrough with a small sample of 8 definitions. Bigger datasets should yield more ngrams and are better suited to text mining analysis.

2c. Remove stop words

Next, I will remove stop words. Unlike with unigrams (single words), we cannot remove stop words from bigrams directly. We should first separate each bigram into two columns with separate(), and then use filter() to drop the rows where either column contains a stop word:

bigrams_separated <- ct_bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ")

bigrams_filtered <- bigrams_separated %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(!word2 %in% stop_words$word)

bigram_counts <- bigrams_filtered %>% 
  count(word1, word2, sort = TRUE)

bigram_counts

## # A tibble: 86 x 3
##    word1       word2             n
##    <chr>       <chr>         <int>
##  1 computer    science           5
##  2 solving     methodology       2
##  3 solving     process           2
##  4 abstraction automation        1
##  5 abstraction decomposition     1
##  6 abstraction recursion         1
##  7 algorithmic notions           1
##  8 algorithmic thinking          1
##  9 analysis    ct                1
## 10 analyze     data              1
## # ... with 76 more rows

Now, in bigram_counts, all stop words are removed.
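Note that “problem solving”, which appeared 4 times before filtering, is absent from the counts above. This suggests that one of its words is on the stop word list. The sketch below shows one way to check which words stop_words contains and how to filter with a narrower list; the toy data frame and the names snowball_words, toy, and toy_kept are hypothetical, and choosing the "snowball" lexicon is only an example, not what the walkthrough does.

```r
library(dplyr)
library(tidytext)

# Which of these words does stop_words contain, and from which lexicon?
stop_words %>%
  filter(word %in% c("problem", "solving", "computer", "science"))

# A narrower list: keep only one lexicon (here "snowball") before filtering.
snowball_words <- stop_words %>%
  filter(lexicon == "snowball") %>%
  pull(word)

# Toy bigrams, already separated into two columns.
toy <- tibble(word1 = c("problem", "is", "computer"),
              word2 = c("solving", "a", "science"))

# Drop a row when either word is on the narrower stop word list.
toy_kept <- toy %>%
  filter(!word1 %in% snowball_words,
         !word2 %in% snowball_words)

toy_kept
```

Which stop word list to use is a judgment call: a broad list removes more noise but can also remove terms that matter for the topic being studied.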

We can also combine the separated words with unite:

bigrams_united <- bigrams_filtered %>%
  unite(bigram, word1, word2, sep = " ")

# why are there no quotes around word1 and word2, as in separate() above
# also what does sep=" " signify?

bigrams_united

## # A tibble: 92 x 1
##    bigram                 
##    <chr>                  
##  1 people engaged         
##  2 computer hobbyist      
##  3 hobbyist clubs         
##  4 running computer       
##  5 computer drop          
##  6 computers simply       
##  7 shareable kinds        
##  8 integrate computational
##  9 computational thinking 
## 10 everyday life          
## # ... with 82 more rows
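The two questions in the code comments above can be explored on a toy example. In this sketch (the data and the names toy, toy_separated, and toy_united are made up), separate() receives the new column names as strings because those columns do not exist yet, while unite() refers to existing columns by their bare names. In both functions, sep = " " is the single space between the two words: separate() splits on it, and unite() inserts it.

```r
library(dplyr)
library(tidyr)

toy <- tibble(bigram = c("computer science", "solving process"))

# separate(): word1 and word2 are new columns, so their names are given as strings.
# sep = " " is the delimiter to split on.
toy_separated <- toy %>%
  separate(bigram, c("word1", "word2"), sep = " ")

# unite(): word1 and word2 already exist, so they are referenced unquoted.
# sep = " " is the string placed between the joined values.
toy_united <- toy_separated %>%
  unite(bigram, word1, word2, sep = " ")

toy_separated
toy_united
```

Splitting and then uniting with the same separator returns the original bigram strings.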

3. VISUALIZE WORD NETWORK

This is a word network from the paper:

All nodes are connected with lines, but the lines have no direction. We can, however, turn the lines into directed edges. With edges, we need three variables to visualize a word net:

from: the node an edge is coming from
to: the node an edge is going towards
weight: A numeric value associated with each edge

We need to transform our dataset (bigram_counts) into these variables in the following way: from is the “word1”, to is the “word2”, and weight is “n”.

Let’s use graph_from_data_frame to make the transformation:

bigram_graph <- bigram_counts %>%
  graph_from_data_frame()

bigram_graph

## IGRAPH 77338a7 DN-- 117 86 -- 
## + attr: name (v/c), n (e/n)
## + edges from 77338a7 (vertex names):
##  [1] computer       ->science       solving        ->methodology  
##  [3] solving        ->process       abstraction    ->automation   
##  [5] abstraction    ->decomposition abstraction    ->recursion    
##  [7] algorithmic    ->notions       algorithmic    ->thinking     
##  [9] analysis       ->ct            analyze        ->data         
## [11] analyzing      ->data          applicable     ->attitude     
## [13] artifacts      ->ct            automating     ->solutions    
## [15] broad          ->story         broader        ->discipline   
## + ... omitted several edges
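The mapping from a data frame to a graph is easier to inspect on a few rows. The sketch below uses a toy data frame built from the top three rows of bigram_counts shown earlier; the names toy_counts and toy_graph are hypothetical. graph_from_data_frame() treats the first two columns as the from and to nodes and stores any remaining columns as edge attributes, so the frequency is kept under the name n rather than weight.

```r
library(igraph)

toy_counts <- data.frame(
  word1 = c("computer", "solving", "solving"),
  word2 = c("science", "methodology", "process"),
  n     = c(5, 2, 2)
)

# The first two columns become from/to; the remaining columns become edge attributes.
toy_graph <- graph_from_data_frame(toy_counts)

vcount(toy_graph)        # number of nodes (distinct words)
ecount(toy_graph)        # number of edges (distinct bigrams)
E(toy_graph)$n           # edge attribute carried over from the n column
is_directed(toy_graph)   # edges point from word1 to word2
```

In the printed summary above, "DN--" indicates a directed graph with named vertices, and "n (e/n)" lists n as a numeric edge attribute.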

Since many bigrams appear only once, let’s keep only those that appear more than once:

bigram_graph_filtered <- bigram_counts %>%
  filter(n > 1) %>%
  graph_from_data_frame()

bigram_graph_filtered

## IGRAPH 776ccf5 DN-- 5 3 -- 
## + attr: name (v/c), n (e/n)
## + edges from 776ccf5 (vertex names):
## [1] computer->science     solving ->methodology solving ->process

Now, we have only three bigrams left. Next, let’s go ahead and visualize the word net before filtering:

set.seed(100)

a <- grid::arrow(type = "open", length = unit(.2, "inches"))

ggraph(bigram_graph, layout = "fr") +
  geom_edge_link(aes(edge_alpha = n), show.legend = FALSE,
                 arrow = a, end_cap = circle(.07, 'inches')) +
  geom_node_point(color = "red", size = 3) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
  theme_void()

Comprehension Check

Revise the code in the previous code chunk so that there are no arrows. The graph shows lines instead of arrows.

#write your code here
set.seed(100)

ggraph(bigram_graph, layout = "fr") +
  # Leaving out the arrow argument draws plain lines instead of arrows.
  geom_edge_link(aes(edge_alpha = n), show.legend = FALSE,
                 end_cap = circle(.07, 'inches')) +
  geom_node_point(color = "red", size = 3) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
  theme_void()

Revise the code in the previous code chunk so that there are no arrows and the width of the lines represents the frequency of bigrams. You can use the package documentation to find the answer.
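One possible answer is sketched here on a toy graph, so it can be run without the dataset. The data frame and the names toy_counts, toy_graph, and p_width are hypothetical, and the range passed to scale_edge_width() is an arbitrary choice. To apply it to the walkthrough data, replace toy_graph with bigram_graph.

```r
library(igraph)
library(ggplot2)
library(ggraph)

# Toy stand-in for bigram_counts.
toy_counts <- data.frame(
  word1 = c("computer", "solving", "solving", "algorithmic"),
  word2 = c("science", "methodology", "process", "thinking"),
  n     = c(5, 2, 2, 1)
)
toy_graph <- graph_from_data_frame(toy_counts)

set.seed(100)

p_width <- ggraph(toy_graph, layout = "fr") +
  # No arrow argument, so edges are plain lines; edge_width maps frequency to thickness.
  geom_edge_link(aes(edge_width = n), show.legend = FALSE,
                 end_cap = circle(.07, "inches")) +
  # Limit how thin and thick the lines can get.
  scale_edge_width(range = c(0.5, 3)) +
  geom_node_point(color = "red", size = 3) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
  theme_void()

p_width
```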

There are many other encoding options you can play with, such as changing the color of specific nodes. Remember that your encoding should serve the purpose of drawing useful insights from the word net.
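As one illustration of highlighting specific nodes, the sketch below colors a chosen set of words differently from the rest. It is an example under assumptions: the toy graph, the highlight vector, the colors, and the names toy_counts, toy_graph, and p_color are all made up for demonstration.

```r
library(igraph)
library(ggplot2)
library(ggraph)

# Toy stand-in for bigram_counts.
toy_counts <- data.frame(
  word1 = c("computer", "solving", "solving", "algorithmic"),
  word2 = c("science", "methodology", "process", "thinking"),
  n     = c(5, 2, 2, 1)
)
toy_graph <- graph_from_data_frame(toy_counts)

# Words to emphasize in the plot.
highlight <- c("computer", "science")

set.seed(100)

p_color <- ggraph(toy_graph, layout = "fr") +
  geom_edge_link(aes(edge_alpha = n), show.legend = FALSE) +
  # name is the vertex name; the logical test splits nodes into two color groups.
  geom_node_point(aes(color = name %in% highlight), size = 3, show.legend = FALSE) +
  scale_color_manual(values = c("TRUE" = "red", "FALSE" = "grey50")) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
  theme_void()

p_color
```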

In comparison, this is the net after filtering:

set.seed(100)

a <- grid::arrow(type = "closed", length = unit(.2, "inches"))

ggraph(bigram_graph_filtered, layout = "fr") +
  geom_edge_link(aes(edge_alpha = n), show.legend = FALSE,
                 arrow = a, end_cap = circle(.07, 'inches')) +
  geom_node_point(color = "red", size = 3) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
  theme_void()

We have only three connections left. We can see that computer science and problem solving are important concepts in CT, which is the core idea from Jeannette M. Wing. No matter what changes over time or how differently we conceptualize CT, these two remain the core concepts. Is that true, though? This is an open question for the field to explore.

unit4_walkthrough_noonan

Catherine Noonan

3/10/2022

