Load dplyr and read the data into R.

library(dplyr)  # provides %>%, filter(), select(), and arrange()

x <- read.csv("cluster_reading_anon.csv", stringsAsFactors = FALSE)

Subset the data frame to one specific class (Campus "A") and keep the student nickname, book title, author, publisher, genre, and the stars the student awarded each book.

y <- x %>% filter(Campus == "A") %>% 
  select(2:6,9)

Convert character variables to factor variables.

y[,1:5] <- lapply(y[,1:5], as.factor)

Check the structure of the new dataframe.

str(y)
## 'data.frame':    108 obs. of  6 variables:
##  $ Nickname  : Factor w/ 20 levels "atsuhito","fumiya",..: 12 12 8 8 8 8 8 8 8 20 ...
##  $ Book_title: Factor w/ 87 levels "a chrismas carol",..: 57 40 2 4 3 79 43 86 84 28 ...
##  $ Author    : Factor w/ 69 levels "alex raynham",..: 37 63 13 19 57 8 58 36 36 28 ...
##  $ Publisher : Factor w/ 6 levels "Cambridge","Macmillan",..: 6 5 5 4 1 5 5 4 4 3 ...
##  $ Genre     : Factor w/ 13 levels "action adventure",..: 3 3 5 9 7 1 5 11 9 1 ...
##  $ Stars     : int  1 1 2 2 1 2 3 3 3 4 ...

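Create a document-feature matrix for the book titles read by students: one row per student, one column per title, and each cell counting how often that student read that title.

Title_dfm <- table(y$Nickname, y$Book_title)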

Likewise, create tables for author, publisher, genre, and stars.

Author_dfm <- table(y$Nickname, y$Author)
Publisher_dfm <- table(y$Nickname, y$Publisher)
Genre_dfm <- table(y$Nickname, y$Genre)
Stars_dfm <- table(y$Nickname, y$Stars)
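
To see what these tables contain, here is a minimal sketch on made-up data. The `toy` data frame and its values are hypothetical and are not taken from the class data. Each row of the result is a student, each column is a level of the second variable, and each cell counts how often that student is paired with that level.

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(
  Nickname = c("ann", "ann", "bob", "bob", "bob"),
  Genre    = c("fantasy", "mystery", "fantasy", "fantasy", "romance")
)

# Cross-tabulate students (rows) against genres (columns)
toy_genre <- table(toy$Nickname, toy$Genre)
print(toy_genre)
```

A student who never read a given genre gets a zero in that column, so every student ends up with a row of the same length.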

Bind the five tables column-wise. Note that cbind() returns a matrix rather than a data frame, despite the name Cluster_df.

Cluster_df <- cbind(Author_dfm, Genre_dfm, Publisher_dfm, Title_dfm, Stars_dfm)
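
The binding step can be sketched on made-up data as follows. The `toy` data frame, `genre_tab`, `stars_tab`, and `combined` are hypothetical names used only for illustration. The sketch shows that the result is a plain numeric matrix with one row per student and the columns of each table placed side by side.

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(
  Nickname = c("ann", "ann", "bob", "bob", "bob"),
  Genre    = c("fantasy", "mystery", "fantasy", "fantasy", "romance"),
  Stars    = c(3, 4, 1, 2, 2)
)

genre_tab <- table(toy$Nickname, toy$Genre)  # 2 students x 3 genres
stars_tab <- table(toy$Nickname, toy$Stars)  # 2 students x 4 star levels

# Column-bind the two tables; the "table" class is dropped
combined <- cbind(genre_tab, stars_tab)
print(combined)
print(is.matrix(combined))
```

Because every column counts equally in the distance calculation that follows, a variable with many levels (such as book titles) contributes more columns than one with few levels (such as publisher).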

Plot a dendrogram based on students’ book reading interests.

Cluster_df %>% dist %>% hclust %>% plot
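
The pipeline above relies on the defaults of `dist()` (Euclidean distance) and `hclust()` (complete linkage). The sketch below spells those defaults out on a small made-up matrix. The matrix `m` and its row names are hypothetical, not part of the class data.

```r
# Hypothetical feature matrix: one row per student, one column per feature
m <- rbind(
  ann = c(2, 0, 1),
  bob = c(2, 0, 0),
  cy  = c(0, 3, 0)
)

d  <- dist(m, method = "euclidean")   # pairwise distances between rows
hc <- hclust(d, method = "complete")  # agglomerative clustering

# Each row of hc$merge records one merge step; negative numbers are
# original rows, so the first row shows which two students join first
print(hc$merge)
print(hc$height)

plot(hc, main = "Toy dendrogram", xlab = "", sub = "")
```

Students who are joined low in the tree have similar rows in the matrix. A student who joins only at the top, as Kakuto does in the class data, has a row unlike everyone else's.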

Why is Kakuto different from the others?
x %>% filter(Nickname == "kakuto") %>% select(Book_title, Genre, Stars)
##          Book_title              Genre Stars
## 1      big hair day            fantasy     1
## 2 a death in oxford            mystery     1
## 3       let me out!            fantasy     1
## 4 next door to love            romance     2
## 5          book boy          biography     1
## 6              why? historical fiction     1
## 7             help!            fantasy     1
Comment: Kakuto awarded one star to six of his seven books and two stars to the other, so he apparently didn’t enjoy reading his selections.
How about Shimpei and Shintaro?
x %>% filter(Nickname == "shimpei" | Nickname == "shintaro") %>%
  select(Nickname, Book_title, Genre, Stars) %>%
  arrange(Book_title, Genre, Stars)
##    Nickname               Book_title              Genre Stars
## 1  shintaro     a tale of two cities historical fiction     2
## 2   shimpei     anna and the fighter              other     2
## 3   shimpei        dangerous journey   action adventure     2
## 4  shintaro        dangerous journey   action adventure     4
## 5   shimpei                 l.a.raid            mystery     2
## 6  shintaro                    marco        non-fiction     3
## 7   shimpei                    marco        young adult     2
## 8   shimpei           picture puzzle            mystery     2
## 9   shimpei            project omega            mystery     2
## 10  shimpei    the house on the hill            romance     2
## 11 shintaro    the house on the hill            romance     2
## 12 shintaro the man in the iron mask   action adventure     2
## 13 shintaro     the three masketeers   action adventure     2
Comment: Both students read “Dangerous Journey”, “Marco”, and “The House on the Hill”, favor similar genres, and award similar star ratings.
Jun and Hide?
x %>% filter(Nickname == "jun" | Nickname == "hide") %>%
  select(Nickname, Book_title, Genre, Stars) %>%
  arrange(Book_title, Genre, Stars)
##    Nickname                    Book_title                 Genre Stars
## 1       jun            ali and his camera           young adult     1
## 2      hide           alice in wonderland               fantasy     2
## 3      hide                 american life                 other     5
## 4      hide                extreme sports                 sport     2
## 5       jun                jennifer lopez           non-fiction     3
## 6      hide                  jojo's story           non-fiction     4
## 7      hide                michael jordan             biography     5
## 8       jun                michael jordan                 sport     1
## 9       jun                      new york                 other     3
## 10     hide                      new york                 other     4
## 11      jun sadie's big day at the office                 other     1
## 12      jun                   the fireboy children's literature     1
## 13     hide             the mummy returns               fantasy     2
## 14     hide     the swiss family robinson      action adventure     3
Comment: Both Jun and Hide read “Michael Jordan” and “New York”, and like to read non-fiction including sport and biography.
Ryo and Tomo.
x %>% filter(Nickname == "ryo" | Nickname == "tomo") %>%
  select(Nickname, Book_title, Genre, Stars) %>%
  arrange(Book_title, Genre, Stars)
##   Nickname                         Book_title                 Genre Stars
## 1     tomo                      american life                 other     5
## 2      ryo marcel and the shakespeare letters children's literature     1
## 3      ryo                       six sketches children's literature     1
Comment: Ryo and Tomo didn’t read much: two books and one book, respectively.
One last pair, Mai and Rio.
x %>% filter(Nickname == "mai" | Nickname == "rio") %>%
  select(Nickname, Book_title, Genre, Stars) %>%
  arrange(Book_title, Genre, Stars)
##   Nickname                 Book_title                Genre Stars
## 1      rio           a chrismas carol              fantasy     3
## 2      rio  a midsummer night's dream              fantasy     3
## 3      mai a midsummmer night’s dream              fantasy     3
## 4      mai                     hamlet classical literature     4
## 5      mai            strong medicine              mystery     3
## 6      rio            strong medicine              mystery     3
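Comment: Mai and Rio both read “A Midsummer Night’s Dream” and “Strong Medicine”, and like fantasy and mystery. Note that the two entries for “A Midsummer Night’s Dream” are spelled differently in the data, so the title table treats them as separate books.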
Conclusion: When students discuss their graded readers in class, a dendrogram based on reading interests may be used to create groups. Students may have more in common within these clusters than otherwise, so this arrangement may serve as an occasional alternative to self-selected or randomly assigned groups.
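
One possible way to turn a dendrogram into discussion groups is to cut the tree into a fixed number of clusters with `cutree()`. The sketch below is an illustration under assumptions: the matrix `m` is made up, and `k = 2` is an arbitrary choice. With the class data, the same calls would be applied to the `hclust()` result for Cluster_df.

```r
# Hypothetical feature matrix: one row per student
m <- rbind(
  ann = c(2, 0, 1),
  bob = c(2, 0, 0),
  cy  = c(0, 3, 0)
)

hc <- hclust(dist(m))

# Cut the tree into k groups; returns a named vector of group numbers
groups <- cutree(hc, k = 2)

# List the students in each group
print(split(names(groups), groups))
```

Clusters produced this way can be very uneven in size, so a teacher may still need to adjust the groups by hand.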