Thorndike Pages 23 - 28

Steve Hoffman

Quarto

From the Quarto documentation on producing documents and slide decks with R and RStudio: This is a Quarto presentation. (Quarto is the successor to, and an extension of, RMarkdown.)

Quarto enables you to weave together content and executable code into a finished presentation. To learn more about Quarto presentations see https://quarto.org/docs/presentations/.

Bullets

When you click the Render button a document will be generated that includes:

  • Content authored with markdown
  • Output from executable code

Code

When you click the Render button a presentation will be generated that includes both content and the output of embedded code. You can embed code like this:

1 + 1
[1] 2

Prerequisites

Since we will be following the instructions in R for Data Science (RDS), we will install a package called tidyverse. To install it for the first time on your computer, paste or type the following into your console:

install.packages("tidyverse")

Inputting, Tidying and Wrangling Thorndike pp. 23 - 28

CLEAR WORKSPACE

rm(list=ls())

LOAD PACKAGES

library(tidyverse)

READ IN DATA

Table.2.1 <- read_csv(file = "Table.2.1.csv")

CLEAN DATA

# Display the data as a tibble (a tidyverse data frame); read_csv() already returns one, so this call only prints it

as_tibble(Table.2.1)
# A tibble: 52 × 7
   First    Last     Gender Class Reading Spelling  Math
   <chr>    <chr>     <dbl> <dbl>   <dbl>    <dbl> <dbl>
 1 Aaron    Andrews       1     1      32       64    43
 2 Byron    Biggs         1     1      40       64    37
 3 Charles  Cowen         1     1      36       40    38
 4 Donna    Davis         2     1      41       74    40
 5 Erin     Edwards       2     1      36       69    28
 6 Fernando Franco        1     1      41       67    42
 7 Gail     Galaraga      2     1      40       71    37
 8 Harpo    Henry         1     1      30       51    34
 9 Irrida   Ignacio       2     1      37       68    35
10 Jack     Johanson      1     1      26       56    26
# … with 42 more rows

Make column “Gender” into a factor and label it

Table.2.1 <- Table.2.1 %>% 
  mutate(Gender = factor(Gender, levels = c(1, 2),
                         labels = c("male", "female")))
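To see what the levels and labels arguments do, here is a minimal base R sketch on a made-up vector of codes. The vector `codes` is hypothetical and is not taken from Table 2.1. Any code that is not listed in levels becomes NA, so counting the NAs is a quick way to check that the recoding matched every value.

```r
# Hypothetical codes for illustration only (not rows from Table 2.1)
codes <- c(1, 2, 2, 1, 3)

# levels lists the codes to recognise; labels gives the names to show instead
gender <- factor(codes, levels = c(1, 2), labels = c("male", "female"))

print(gender)                   # the 3 has no matching level, so it becomes NA
levels(gender)                  # the two labels, in the order given
sum(is.na(gender))              # how many codes did not match any level
table(gender, useNA = "ifany")  # counts per label, plus any NAs
```

Running the same NA count on the real Gender column after the mutate() step should give zero if every student was coded 1 or 2.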

Nominal Scale

From page 27 of Thorndike: “One way to define measurement is the assignment of numbers to objects according to a set of rules. The set of rules is called a scale. Knowing the scale that has been used to assign the numbers is critical to proper interpretation of the measurement. For example, Ms. Johnson and Mr. Cordero assigned the number 1 to their male students and the number 2 to their female students.”

“When numbers are used in this way, the scale is called a nominal scale.”
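One practical consequence of a nominal scale is that arithmetic on the codes has no meaning, while counts do. The following base R sketch illustrates this with made-up codes; the vector `codes` is hypothetical and not taken from Table 2.1.

```r
# Hypothetical codes for illustration only
codes <- c(1, 1, 2, 2, 2)

# As plain numbers, R will average the codes, but the result describes nothing
mean(codes)

# As a factor, the labels carry the meaning and counts are the natural summary
gender <- factor(codes, levels = c(1, 2), labels = c("male", "female"))
table(gender)

# R declines to average a factor: it returns NA (with a warning)
is.na(suppressWarnings(mean(gender)))
```

This is one reason for converting nominal codes to factors early: R then treats them as categories and will not quietly compute a meaningless average.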

Make column “Class” into a factor and label it

In the same way, the classroom each student belongs to is recorded on a nominal scale, so we will convert it into a factor as well.

Table.2.1 <- Table.2.1 %>% 
  mutate(Class = factor(Class, levels = c(1, 2),
                        labels = c("Johnson", "Cordero")))

Coerce scores to be integers

This step is optional, but I wanted to make it explicit. Cutting and pasting numbers from Excel can sometimes leave the data in an unexpected format.

Table.2.1 <- Table.2.1 %>% 
  mutate(Reading = as.integer(Reading), 
         Spelling = as.integer(Spelling),
         Math = as.integer(Math))
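One thing to keep in mind is that as.integer() drops any decimal part without a warning. The sketch below, in base R with made-up scores (the vector `scores` is hypothetical), checks that every value is a whole number before coercing.

```r
# Hypothetical scores for illustration only; one has a stray decimal part
scores <- c(32, 40, 36.9)

# as.integer() truncates toward zero without warning: 36.9 becomes 36
as.integer(scores)

# Check for whole numbers first, so nothing is changed silently
is_whole <- scores == round(scores)
is_whole

if (all(is_whole)) {
  scores <- as.integer(scores)
} else {
  message("Non-whole values found at position(s): ",
          paste(which(!is_whole), collapse = ", "))
}
```

For the Thorndike scores the check should pass, since the printed tibble shows only whole numbers.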

Look at the structure of the data

str(Table.2.1)
tibble [52 × 7] (S3: tbl_df/tbl/data.frame)
 $ First   : chr [1:52] "Aaron" "Byron" "Charles" "Donna" ...
 $ Last    : chr [1:52] "Andrews" "Biggs" "Cowen" "Davis" ...
 $ Gender  : Factor w/ 2 levels "male","female": 1 1 1 2 2 1 2 1 2 1 ...
 $ Class   : Factor w/ 2 levels "Johnson","Cordero": 1 1 1 1 1 1 1 1 1 1 ...
 $ Reading : int [1:52] 32 40 36 41 36 41 40 30 37 26 ...
 $ Spelling: int [1:52] 64 64 40 74 69 67 71 51 68 56 ...
 $ Math    : int [1:52] 43 37 38 40 28 42 37 34 35 26 ...

Display a summary on your console

summary(Table.2.1)
    First               Last              Gender       Class       Reading     
 Length:52          Length:52          male  :26   Johnson:26   Min.   :21.00  
 Class :character   Class :character   female:26   Cordero:26   1st Qu.:30.75  
 Mode  :character   Mode  :character                            Median :35.00  
                                                                Mean   :34.44  
                                                                3rd Qu.:39.00  
                                                                Max.   :44.00  
    Spelling          Math      
 Min.   :38.00   Min.   :19.00  
 1st Qu.:51.00   1st Qu.:33.00  
 Median :57.00   Median :38.00  
 Mean   :57.15   Mean   :38.17  
 3rd Qu.:64.00   3rd Qu.:44.00  
 Max.   :76.00   Max.   :60.00  

Write a CSV file of the cleaned-up Thorndike table

write_csv(Table.2.1, "Table.2.1_clean.csv")
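A CSV file stores only text, so the factor labels are saved but the fact that a column is a factor is not. The sketch below uses base R functions (write.csv, read.csv, saveRDS, readRDS) and a made-up two-row data frame, so it runs without the tidyverse. The names `df`, `csv_path`, and `rds_path` are hypothetical. It compares a CSV round trip with an RDS round trip.

```r
# Hypothetical two-row data frame for illustration only
df <- data.frame(
  Class = factor(c(1, 2), levels = c(1, 2), labels = c("Johnson", "Cordero")),
  Math  = c(43L, 37L)
)

csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

# Round trip through CSV: the labels survive, but as plain text
write.csv(df, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path)
class(from_csv$Class)    # "character" in R 4.0 or later

# Round trip through RDS: column types, including factors, are preserved
saveRDS(df, rds_path)
from_rds <- readRDS(rds_path)
class(from_rds$Class)    # "factor"
```

If the cleaned table will only be reopened in R, saving an RDS copy alongside the CSV avoids repeating the factor steps. The CSV remains the better choice for sharing with other software.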