Tutorial 4

Codebook

Hello, my name is Kevin.

I am going to demonstrate some of the basic features of the codebook() function from the memisc package.

First, use data() to list the datasets that are available. Second, load the memisc package.

library(memisc)

## Loading required package: lattice

## Loading required package: MASS

## 
## Attaching package: 'memisc'

## The following objects are masked from 'package:stats':
## 
##     contr.sum, contr.treatment, contrasts

## The following object is masked from 'package:base':
## 
##     as.array
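The first step mentioned above, listing datasets with data(), could be sketched roughly as follows. Called interactively with no arguments, data() opens a listing in a viewer; this sketch instead reads the results matrix that data() returns, so the names can be inspected in code. Restricting the listing to the built-in datasets package is an assumption made only to keep the output short.

```r
# List the datasets shipped with the built-in 'datasets' package.
# data() returns an object whose 'results' component is a character matrix
# with columns that include "Item" (dataset name) and "Title" (description).
available <- data(package = "datasets")$results

# Look at the first few entries.
head(available[, c("Item", "Title")])

# Check whether the dataset used in this tutorial is listed.
has_trees <- "trees" %in% available[, "Item"]
has_trees
```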

Then, load the desired dataset. In this case, we load the trees data, which records the girth (diameter), height and volume of 31 black cherry trees.

x <- data.set(trees)
x

## 
## Data set with 31 observations and 3 variables
## 
##    trees.Girth trees.Height trees.Volume
##  1         8.3           70         10.3
##  2         8.6           65         10.3
##  3         8.8           63         10.2
##  4        10.5           72         16.4
##  5        10.7           81         18.8
##  6        10.8           83         19.7
##  7        11.0           66         15.6
##  8        11.0           75         18.2
##  9        11.1           80         22.6
## 10        11.2           75         19.9
## 11        11.3           79         24.2
## 12        11.4           76         21.0
## 13        11.4           76         21.4
## 14        11.7           69         21.3
## 15        12.0           75         19.1
## 16        12.9           74         22.2
## 17        12.9           85         33.8
## 18        13.3           86         27.4
## 19        13.7           71         25.7
## 20        13.8           64         24.9
## 21        14.0           78         34.5
## 22        14.2           80         31.7
## 23        14.5           74         36.3
## 24        16.0           72         38.3
## 25        16.3           77         42.6
## .. ........... ............ ............
## (25 of 31 observations shown)

We can check the underlying storage type of x with typeof() to confirm whether it is a list or some other type.

typeof(x)

## [1] "list"

Next, we use the codebook() function to retrieve descriptive statistics for each variable. For instance, it reports the minimum, maximum, mean and standard deviation of the girth, height and volume, along with the storage mode and measurement level of each variable.

codebook(x)

## ================================================================================
## 
##    trees.Girth
## 
## --------------------------------------------------------------------------------
## 
##    Storage mode: double
##    Measurement: interval
## 
##         Min:  8.300
##         Max: 20.600
##        Mean: 13.248
##    Std.Dev.:  3.087
## 
## ================================================================================
## 
##    trees.Height
## 
## --------------------------------------------------------------------------------
## 
##    Storage mode: double
##    Measurement: interval
## 
##         Min: 63.000
##         Max: 87.000
##        Mean: 76.000
##    Std.Dev.:  6.268
## 
## ================================================================================
## 
##    trees.Volume
## 
## --------------------------------------------------------------------------------
## 
##    Storage mode: double
##    Measurement: interval
## 
##         Min: 10.200
##         Max: 77.000
##        Mean: 30.171
##    Std.Dev.: 16.171
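As a cross-check, the figures above can be recomputed with base R alone. The sketch below works on the built-in trees data frame rather than on x. The helper pop_sd() is hypothetical and not part of memisc; it uses a divisor of n, which appears to match the Std.Dev. values printed above. That is an inference from the output shown here, not a statement about how memisc is documented. Base R's sd() uses n - 1 and so gives slightly larger values.

```r
# Recompute the codebook figures for the built-in 'trees' data with base R.
# pop_sd() is a hypothetical helper: standard deviation with divisor n.
pop_sd <- function(v) sqrt(mean((v - mean(v))^2))

by_hand <- sapply(trees, function(v) {
  c(Min   = min(v),
    Max   = max(v),
    Mean  = mean(v),
    SD_n  = pop_sd(v),  # divisor n
    SD_n1 = sd(v))      # divisor n - 1 (base R default)
})

# One column per variable, one row per statistic.
round(by_hand, 3)
```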

We then use summary() to get a summary of the data set. It reports the minimum, the maximum and the quartiles of each variable, as well as measures of central tendency such as the mean and the median.

summary(x)

##   trees.Girth     trees.Height  trees.Volume  
##  Min.   : 8.30   Min.   :63    Min.   :10.20  
##  1st Qu.:11.05   1st Qu.:72    1st Qu.:19.40  
##  Median :12.90   Median :76    Median :24.20  
##  Mean   :13.25   Mean   :76    Mean   :30.17  
##  3rd Qu.:15.25   3rd Qu.:80    3rd Qu.:37.30  
##  Max.   :20.60   Max.   :87    Max.   :77.00
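The quartiles and central tendency figures in this summary can also be obtained one at a time. A minimal sketch for the Girth column, using the built-in trees data frame and base R functions only (the variable names girth and quartiles are chosen here for illustration):

```r
girth <- trees$Girth

# Quartiles as computed by quantile() with its default settings.
quartiles <- quantile(girth, probs = c(0.25, 0.5, 0.75))
quartiles

# Mean and median side by side; a mean above the median hints at right skew.
c(mean = mean(girth), median = median(girth))
```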

After that, we can check the class of the original trees object. In this case, it is a data frame.

x <- data.set(trees)
class(trees)

## [1] "data.frame"
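The difference between class() and typeof() can be seen by calling both on the same object. A small sketch using the built-in trees data frame: class() reports the high-level class, while typeof() reports how the object is stored internally.

```r
# class() reports the high-level class used for method dispatch.
class(trees)

# typeof() reports the low-level storage type of the same object.
typeof(trees)

# A data frame is stored as a list of equal-length columns.
is.list(trees)

# Each column has its own storage type as well.
column_types <- sapply(trees, typeof)
column_types
```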

Then, we can try the sapply() function. It takes a list, vector or data frame as input and, where possible, returns a vector or matrix. Applying class shows that Girth, Height and Volume are all numeric. As their names suggest, min and max return the minimum and maximum value of each column. range returns both the minimum and the maximum of each column. Lastly, summary() gives a brief summary of the data, including measures of central tendency such as the mean and the median.

sapply(trees,class)

##     Girth    Height    Volume 
## "numeric" "numeric" "numeric"

sapply(trees,min)

##  Girth Height Volume 
##    8.3   63.0   10.2

sapply(trees,max)

##  Girth Height Volume 
##   20.6   87.0   77.0

sapply(trees,range)

##      Girth Height Volume
## [1,]   8.3     63   10.2
## [2,]  20.6     87   77.0

summary(trees)

##      Girth           Height       Volume     
##  Min.   : 8.30   Min.   :63   Min.   :10.20  
##  1st Qu.:11.05   1st Qu.:72   1st Qu.:19.40  
##  Median :12.90   Median :76   Median :24.20  
##  Mean   :13.25   Mean   :76   Mean   :30.17  
##  3rd Qu.:15.25   3rd Qu.:80   3rd Qu.:37.30  
##  Max.   :20.60   Max.   :87   Max.   :77.00
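To see how sapply() simplifies its result, it helps to compare it with its relatives lapply() and vapply(). This is a sketch in base R using the built-in trees data frame; the variable names are chosen here for illustration.

```r
# lapply() always returns a list, one element per column.
as_list <- lapply(trees, range)

# sapply() simplifies that list: results of equal length become a matrix.
as_matrix <- sapply(trees, range)

# vapply() works like sapply() but checks each result against a template,
# here a single numeric value per column.
means <- vapply(trees, mean, FUN.VALUE = numeric(1))

class(as_list)
dim(as_matrix)
means
```

vapply() is often preferred in scripts because it fails early if a column returns something unexpected, whereas sapply() would silently return a different shape.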

Lim Kevin 17140821

11/9/2020
