Learning how to create a data codebook

Load Packages

The haven package imports and exports SPSS, Stata, and SAS files.

if(!require(haven)){
  install.packages("haven", dependencies = TRUE)
  library(haven) 
}
Loading required package: haven
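
The install-if-missing pattern above is repeated for each package in this section. As a minimal sketch, it could be wrapped in a small helper; `load_or_install` is a hypothetical name and not part of any package. It uses `requireNamespace()` to check for the package without attaching it, then attaches it with `library()`.

```r
# Hypothetical helper: attach each package, installing it first only if missing.
load_or_install <- function(pkgs) {
  for (pkg in pkgs) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg, dependencies = TRUE)
    }
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# Demonstrated with packages that ship with R, so nothing is installed here.
loaded <- load_or_install(c("stats", "utils"))

# For this tutorial the call would be:
# load_or_install(c("haven", "tidyverse", "summarytools"))
```

Unlike the `if(!require(...))` form, this version attaches the package in both branches, so the behavior is the same whether or not an install was needed.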

The tidyverse is a collection of packages that make it easier to tidy, clean, and work with data.

if(!require(tidyverse)){
  install.packages("tidyverse", dependencies = TRUE)
  library(tidyverse) 
}
Loading required package: tidyverse
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

The summarytools package provides tools to quickly and neatly summarize data.

if(!require(summarytools)){
  install.packages("summarytools", dependencies = TRUE)
  library(summarytools) 
}
Loading required package: summarytools
Warning in fun(libname, pkgname): couldn't connect to display ":0"
system might not have X11 capabilities; in case of errors when using dfSummary(), set st_options(use.x11 = FALSE)

Attaching package: 'summarytools'
The following object is masked from 'package:tibble':

    view
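
The startup warning above suggests setting `st_options(use.x11 = FALSE)` when no X11 display is available, for example on a headless server. A minimal sketch of applying that suggestion follows; whether it removes every later warning depends on the system, so treat it as a starting point.

```r
library(summarytools)

# Suggested by the startup warning: do not rely on X11 when drawing graphs.
st_options(use.x11 = FALSE)

# Query the option to confirm the setting took effect.
x11_setting <- st_options("use.x11")
x11_setting
```

The setting lasts for the current R session, so it is usually placed right after the packages are loaded.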

Import Data

dataset <- read_sav("Harry Potter Data.sav")

Selecting key variables

(dataset %>%
  select(CoinFlip, FFM_5, Potter3) -> dataset.threevariables)
# A tibble: 122 × 3
   CoinFlip  FFM_5                          Potter3  
   <dbl+lbl> <dbl+lbl>                      <dbl+lbl>
 1 1 [Heads] 5 [Agree Strongly]             1 [Moon] 
 2 2 [Tails] 3 [Neither agree nor diasgree] 2 [Stars]
 3 1 [Heads] 4 [Agree a little]             1 [Moon] 
 4 2 [Tails] 5 [Agree Strongly]             1 [Moon] 
 5 1 [Heads] 5 [Agree Strongly]             1 [Moon] 
 6 2 [Tails] 3 [Neither agree nor diasgree] 2 [Stars]
 7 2 [Tails] 4 [Agree a little]             2 [Stars]
 8 1 [Heads] 3 [Neither agree nor diasgree] 2 [Stars]
 9 1 [Heads] 3 [Neither agree nor diasgree] 1 [Moon] 
10 1 [Heads] 3 [Neither agree nor diasgree] 2 [Stars]
# ℹ 112 more rows
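
The `<dbl+lbl>` column type in the output above marks haven's labelled vectors: numeric codes that carry the value labels from SPSS. The sketch below builds a toy vector that mirrors `CoinFlip` (the values are made up for illustration, not taken from the data file) and shows a few ways to inspect or convert it.

```r
library(haven)

# Toy vector mirroring the CoinFlip column: codes 1/2 with Heads/Tails labels.
coin <- labelled(
  c(1, 2, 1, 2, 1),
  labels = c(Heads = 1, Tails = 2),
  label  = "Flip a coin. Is it heads or tails?"
)

question   <- attr(coin, "label")    # the variable (question) label
codes      <- attr(coin, "labels")   # the value labels: a named vector of codes
as_text    <- as_factor(coin)        # replace codes with their labels
as_numbers <- zap_labels(coin)       # strip value labels, keep the numeric codes
```

`dfSummary()` reports the numeric codes for labelled columns, which is why the codebook shows 1 and 2 rather than Heads and Tails. Converting with `as_factor()` first is one option if labels are preferred.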

Create Codebook

print(dfSummary(dataset.threevariables, graph.magnif = .75), method = 'render')
Warning in png(png_loc <- tempfile(fileext = ".png"), width = 150 *
graph.magnif, : unable to open connection to X11 display ''
Warning in png(png_loc <- tempfile(fileext = ".png"), width = 150 *
graph.magnif, : unable to open connection to X11 display ''
Warning in png(png_loc <- tempfile(fileext = ".png"), width = 150 *
graph.magnif, : unable to open connection to X11 display ''

Data Frame Summary

dataset.threevariables

Dimensions: 122 x 3
Duplicates: 99
No  Variable             Label                                 Stats / Values         Freqs (% of Valid)  Graph  Valid        Missing
1   CoinFlip             Flip a coin. Is it heads or tails?    Min  : 1               1 : 83 (70.3%)             118 (96.7%)  4 (3.3%)
    [haven_labelled,                                           Mean : 1.3             2 : 35 (29.7%)
    vctrs_vctr, double]                                        Max  : 2
2   FFM_5                I see Myself as Someone Who..... -    Mean (sd) : 3.8 (0.8)  2 :  6 ( 6.2%)             97 (79.5%)   25 (20.5%)
    [haven_labelled,     Is original, comes up with new ideas  min ≤ med ≤ max:       3 : 26 (26.8%)
    vctrs_vctr, double]                                        2 ≤ 4 ≤ 5              4 : 50 (51.5%)
                                                               IQR (CV) : 1 (0.2)     5 : 15 (15.5%)
3   Potter3              Moon or Stars?                        Min  : 1               1 : 39 (38.6%)             101 (82.8%)  21 (17.2%)
    [haven_labelled,                                           Mean : 1.6             2 : 62 (61.4%)
    vctrs_vctr, double]                                        Max  : 2

Generated by summarytools 1.0.1 (R version 4.4.1)
2024-07-05
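
The Valid and Missing columns count non-missing and missing responses per variable, and the frequencies are percentages of the valid responses only. For `CoinFlip`, 118 valid plus 4 missing gives the 122 rows, and 83 / 118 is 70.3%. The sketch below reproduces these quantities in base R on a made-up toy data frame. It assumes that the Duplicates figure counts rows identical to an earlier row, as `duplicated()` does.

```r
# Toy data frame with a few missing values (NA), standing in for the survey data.
toy <- data.frame(
  CoinFlip = c(1, 2, 1, NA, 1, 2),
  Potter3  = c(1, 2, 1, 2, 1, NA)
)

n_valid   <- colSums(!is.na(toy))                    # non-missing responses per variable
n_missing <- colSums(is.na(toy))                     # missing responses per variable
pct_valid <- round(100 * n_valid / nrow(toy), 1)     # Valid as a percentage of all rows

# Frequencies as a percentage of valid responses (NA is dropped by table()).
coin_pct <- round(100 * prop.table(table(toy$CoinFlip)), 1)

# Rows that repeat an earlier row (assumed to match the Duplicates line).
n_duplicates <- sum(duplicated(toy))
```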

Create a Second Codebook

(dataset %>%
  select(CoinFlip, FFM_3, Potter5) -> dataset.newVariable)
# A tibble: 122 × 3
   CoinFlip  FFM_3                          Potter5  
   <dbl+lbl> <dbl+lbl>                      <dbl+lbl>
 1 1 [Heads] 3 [Neither agree nor diasgree] 1 [Heads]
 2 2 [Tails] 4 [Agree a little]             2 [Tails]
 3 1 [Heads] 3 [Neither agree nor diasgree] 1 [Heads]
 4 2 [Tails] 5 [Agree Strongly]             2 [Tails]
 5 1 [Heads] 5 [Agree Strongly]             2 [Tails]
 6 2 [Tails] 4 [Agree a little]             1 [Heads]
 7 2 [Tails] 4 [Agree a little]             2 [Tails]
 8 1 [Heads] 3 [Neither agree nor diasgree] 1 [Heads]
 9 1 [Heads] 4 [Agree a little]             2 [Tails]
10 1 [Heads] 5 [Agree Strongly]             2 [Tails]
# ℹ 112 more rows

print(dfSummary(dataset.newVariable, graph.magnif = .75), method = 'render')
Warning in png(png_loc <- tempfile(fileext = ".png"), width = 150 *
graph.magnif, : unable to open connection to X11 display ''
Warning in png(png_loc <- tempfile(fileext = ".png"), width = 150 *
graph.magnif, : unable to open connection to X11 display ''
Warning in png(png_loc <- tempfile(fileext = ".png"), width = 150 *
graph.magnif, : unable to open connection to X11 display ''

Data Frame Summary

dataset.newVariable

Dimensions: 122 x 3
Duplicates: 102
No  Variable             Label                                 Stats / Values         Freqs (% of Valid)  Graph  Valid        Missing
1   CoinFlip             Flip a coin. Is it heads or tails?    Min  : 1               1 : 83 (70.3%)             118 (96.7%)  4 (3.3%)
    [haven_labelled,                                           Mean : 1.3             2 : 35 (29.7%)
    vctrs_vctr, double]                                        Max  : 2
2   FFM_3                I see Myself as Someone Who..... -    Mean (sd) : 4.3 (0.8)  2 :  4 ( 4.1%)             98 (80.3%)   24 (19.7%)
    [haven_labelled,     Does a thorough job                   min ≤ med ≤ max:       3 : 10 (10.2%)
    vctrs_vctr, double]                                        2 ≤ 4 ≤ 5              4 : 40 (40.8%)
                                                               IQR (CV) : 1 (0.2)     5 : 44 (44.9%)
3   Potter5              Heads or Tails?                       Min  : 1               1 : 63 (62.4%)             101 (82.8%)  21 (17.2%)
    [haven_labelled,                                           Mean : 1.4             2 : 38 (37.6%)
    vctrs_vctr, double]                                        Max  : 2

Generated by summarytools 1.0.1 (R version 4.4.1)
2024-07-05
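
A codebook is often shared as a standalone file rather than rendered inline. The sketch below assumes the `file` argument of summarytools' print method writes the summary to an HTML file; the toy data and the output path are made up for illustration, so check the package documentation before relying on it.

```r
library(haven)
library(summarytools)

# Avoid X11 when drawing the small graphs (see the startup warning).
st_options(use.x11 = FALSE)

# Toy labelled column standing in for the survey data.
toy <- tibble::tibble(
  CoinFlip = labelled(
    c(1, 2, 1, NA),
    labels = c(Heads = 1, Tails = 2),
    label  = "Flip a coin. Is it heads or tails?"
  )
)

# Write the codebook to an HTML file in the session's temporary directory.
out_file <- file.path(tempdir(), "codebook.html")
print(dfSummary(toy, graph.magnif = 0.75), file = out_file)

written <- file.exists(out_file)
```

For the real data, `toy` would be replaced by `dataset.threevariables` or `dataset.newVariable`, and `out_file` by a path in the project folder.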