To see a list of the datasets available in R:

data()
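The `data()` call lists the datasets from every attached package. As a small sketch, the listing can be restricted to the base `datasets` package and queried in code. This assumes the `results` matrix and its `Item` column that `data(package = ...)` returns in recent R versions; the variable names `ds` and `items` are only illustrative.

```r
# List only the datasets shipped with the base "datasets" package
ds <- data(package = "datasets")

# The listing is stored in a matrix; the "Item" column holds the dataset names
items <- ds$results[, "Item"]

# Check that the two datasets used in this document are available
c("Orange", "ChickWeight") %in% items

# Preview the first six rows of one of them
head(Orange)
```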
Install the memisc package, which provides the data.set() and codebook() functions, then load it:

install.packages("memisc")
## Installing package into '/home/fatimah/R/x86_64-pc-linux-gnu-library/4.0'
## (as 'lib' is unspecified)
library(memisc)
## Loading required package: lattice
## Loading required package: MASS
## 
## Attaching package: 'memisc'
## The following objects are masked from 'package:stats':
## 
##     contr.sum, contr.treatment, contrasts
## The following object is masked from 'package:base':
## 
##     as.array
# Convert the Orange data frame into a memisc data.set object
x <- data.set(Orange)
typeof(x)
## [1] "list"
x
## 
## Data set with 35 observations and 3 variables
## 
##    Orange.Tree Orange.age Orange.circumference
##  1           1        118                   30
##  2           1        484                   58
##  3           1        664                   87
##  4           1       1004                  115
##  5           1       1231                  120
##  6           1       1372                  142
##  7           1       1582                  145
##  8           2        118                   33
##  9           2        484                   69
## 10           2        664                  111
## 11           2       1004                  156
## 12           2       1231                  172
## 13           2       1372                  203
## 14           2       1582                  203
## 15           3        118                   30
## 16           3        484                   51
## 17           3        664                   75
## 18           3       1004                  108
## 19           3       1231                  115
## 20           3       1372                  139
## 21           3       1582                  140
## 22           4        118                   32
## 23           4        484                   62
## 24           4        664                  112
## 25           4       1004                  167
## .. ........... .......... ....................
## (25 of 35 observations shown)
# Generate a codebook that describes each variable in the data set
codebook(x)
## ================================================================================
## 
##    Orange.Tree
## 
## --------------------------------------------------------------------------------
## 
##    Storage mode: integer
##    Measurement: ordinal
## 
##    Values and labels       N Percent
##                                     
##    1 '3'                   7    20.0
##    2 '1'                   7    20.0
##    3 '5'                   7    20.0
##    4 '2'                   7    20.0
##    5 '4'                   7    20.0
## 
## ================================================================================
## 
##    Orange.age
## 
## --------------------------------------------------------------------------------
## 
##    Storage mode: double
##    Measurement: interval
## 
##         Min:  118.000
##         Max: 1582.000
##        Mean:  922.143
##    Std.Dev.:  484.787
## 
## ================================================================================
## 
##    Orange.circumference
## 
## --------------------------------------------------------------------------------
## 
##    Storage mode: double
##    Measurement: interval
## 
##         Min:  30.000
##         Max: 214.000
##        Mean: 115.857
##    Std.Dev.:  56.661
summary(x)
##  Orange.Tree   Orange.age     Orange.circumference
##  3:7         Min.   : 118.0   Min.   : 30.0       
##  1:7         1st Qu.: 484.0   1st Qu.: 65.5       
##  5:7         Median :1004.0   Median :115.0       
##  2:7         Mean   : 922.1   Mean   :115.9       
##  4:7         3rd Qu.:1372.0   3rd Qu.:161.5       
##              Max.   :1582.0   Max.   :214.0

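A custom codebook can also be built manually with base R functions:

  1. class() determines the class (data type) of an object or of each column.
  2. sapply() applies a function such as class(), min() or max() to every column to extract more details.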

class(Orange)
## [1] "nfnGroupedData" "nfGroupedData"  "groupedData"    "data.frame"
sapply(Orange, class)
## $Tree
## [1] "ordered" "factor" 
## 
## $age
## [1] "numeric"
## 
## $circumference
## [1] "numeric"
sapply(Orange, min)
##          Tree           age circumference 
##             1           118            30
sapply(Orange, max)
##          Tree           age circumference 
##             5          1582           214
sapply(Orange, range)
##      Tree  age circumference
## [1,]    1  118            30
## [2,]    5 1582           214
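The `Tree` values returned by `min()`, `max()` and `range()` above are not tree labels. `Tree` is an ordered factor whose levels run 3 < 1 < 5 < 2 < 4 (the order in which `summary()` lists them), so its minimum is tree "3" and its maximum is tree "4". When `sapply()` simplifies the results into a vector, the factor values are replaced by their internal level codes, 1 and 5. A short check:

```r
# Tree is an ordered factor; its level order is not numeric order
levels(Orange$Tree)

# min() and max() on an ordered factor return factor values (labels)
min(Orange$Tree)   # tree "3"
max(Orange$Tree)   # tree "4"

# as.integer() exposes the internal level codes that sapply() reported
as.integer(min(Orange$Tree))
as.integer(max(Orange$Tree))
```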
summary(Orange)
##  Tree       age         circumference  
##  3:7   Min.   : 118.0   Min.   : 30.0  
##  1:7   1st Qu.: 484.0   1st Qu.: 65.5  
##  5:7   Median :1004.0   Median :115.0  
##  2:7   Mean   : 922.1   Mean   :115.9  
##  4:7   3rd Qu.:1372.0   3rd Qu.:161.5  
##        Max.   :1582.0   Max.   :214.0
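The separate `class()`, `sapply()` and `summary()` calls above can be folded into one small function that prints a codebook-like report. This is a minimal sketch, not the memisc implementation: `simple_codebook()` is a hypothetical helper name, and it only handles numeric and factor columns.

```r
# simple_codebook() is a hypothetical helper, not part of base R or memisc.
# For each column it prints the class, then either numeric statistics
# or the count and percentage of every factor level.
simple_codebook <- function(df) {
  for (name in names(df)) {
    col <- df[[name]]
    cat("====", name, "====\n")
    cat("Class:", paste(class(col), collapse = ", "), "\n")
    if (is.numeric(col)) {
      stats <- c(Min = min(col), Max = max(col),
                 Mean = mean(col), Std.Dev = sd(col))
      print(round(stats, 3))
    } else if (is.factor(col)) {
      counts <- table(col)
      print(data.frame(N = as.vector(counts),
                       Percent = round(100 * as.vector(prop.table(counts)), 1),
                       row.names = names(counts)))
    }
    cat("\n")
  }
  invisible(df)  # return the input unchanged so the call can be piped
}

simple_codebook(Orange)
```

Because this sketch uses R's `sd()`, which divides by n - 1, its `Std.Dev` can differ slightly from the value printed by `codebook()`.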

Create Your Own Codebook

The ChickWeight dataset is used here. Its four variables are:

  1. ChickWeight.weight is the body weight of the chick, in grams.
  2. ChickWeight.Time is the number of days since birth when the measurement was made.
  3. ChickWeight.Chick identifies the individual chick; rows with the same value are repeated measurements of the same chick.
  4. ChickWeight.Diet indicates which of the four experimental diets the chick received.
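Because `Chick` identifies individual chicks, counting the distinct `Chick` values within each `Diet` shows how many chicks received each diet. A minimal sketch using `tapply()`; the name `chicks_per_diet` is only illustrative.

```r
# Number of distinct chicks measured under each diet
chicks_per_diet <- tapply(ChickWeight$Chick, ChickWeight$Diet,
                          function(ch) length(unique(ch)))
chicks_per_diet
```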
y <- data.set(ChickWeight)
y
## 
## Data set with 578 observations and 4 variables
## 
##    ChickWeight.weight ChickWeight.Time ChickWeight.Chick ChickWeight.Diet
##  1                 42                0                 1                1
##  2                 51                2                 1                1
##  3                 59                4                 1                1
##  4                 64                6                 1                1
##  5                 76                8                 1                1
##  6                 93               10                 1                1
##  7                106               12                 1                1
##  8                125               14                 1                1
##  9                149               16                 1                1
## 10                171               18                 1                1
## 11                199               20                 1                1
## 12                205               21                 1                1
## 13                 40                0                 2                1
## 14                 49                2                 2                1
## 15                 58                4                 2                1
## 16                 72                6                 2                1
## 17                 84                8                 2                1
## 18                103               10                 2                1
## 19                122               12                 2                1
## 20                138               14                 2                1
## 21                162               16                 2                1
## 22                187               18                 2                1
## 23                209               20                 2                1
## 24                215               21                 2                1
## 25                 43                0                 3                1
## .. .................. ................ ................. ................
## (25 of 578 observations shown)

Create a codebook for the dataset.

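  1. The codebook summarizes each variable, starting with its storage mode and measurement level. For numeric (interval) variables it reports the minimum, maximum, mean and standard deviation.
  2. For categorical (ordinal or nominal) variables it instead lists each value with its label, the number of observations (N) and their percentage of the total.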
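The numbers in the codebook can be reproduced with base R, which makes them easier to interpret. The sketch below recomputes the statistics for `weight` and the frequency table for `Diet`. One assumption to flag: the `Std.Dev.` values printed by `codebook()` in this document agree with the standard deviation computed with divisor n, not with R's `sd()`, which divides by n - 1. This is inferred from the printed numbers rather than from the memisc documentation.

```r
w <- ChickWeight$weight

# The numeric summary reported by codebook() for ChickWeight.weight
c(Min = min(w), Max = max(w), Mean = mean(w))

# R's sd() divides by n - 1 ...
sd(w)
# ... while the codebook's Std.Dev. matches the version that divides by n
sqrt(mean((w - mean(w))^2))

# The N and Percent columns reported for ChickWeight.Diet
diet_n <- table(ChickWeight$Diet)
diet_n
round(100 * prop.table(diet_n), 1)
```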
codebook(y)
## ================================================================================
## 
##    ChickWeight.weight
## 
## --------------------------------------------------------------------------------
## 
##    Storage mode: double
##    Measurement: interval
## 
##         Min:  35.000
##         Max: 373.000
##        Mean: 121.818
##    Std.Dev.:  71.010
## 
## ================================================================================
## 
##    ChickWeight.Time
## 
## --------------------------------------------------------------------------------
## 
##    Storage mode: double
##    Measurement: interval
## 
##         Min:  0.000
##         Max: 21.000
##        Mean: 10.718
##    Std.Dev.:  6.753
## 
## ================================================================================
## 
##    ChickWeight.Chick
## 
## --------------------------------------------------------------------------------
## 
##    Storage mode: integer
##    Measurement: ordinal
## 
##    Values and labels       N Percent
##                                     
##     1 '18'                 2     0.3
##     2 '16'                 7     1.2
##     3 '15'                 8     1.4
##     4 '13'                12     2.1
##     5 '9'                 12     2.1
##     6 '20'                12     2.1
##     7 '10'                12     2.1
##     8 '8'                 11     1.9
##     9 '17'                12     2.1
##    10 '19'                12     2.1
##    11 '4'                 12     2.1
##    12 '6'                 12     2.1
##    13 '11'                12     2.1
##    14 '3'                 12     2.1
##    15 '1'                 12     2.1
##    16 '12'                12     2.1
##    17 '2'                 12     2.1
##    18 '5'                 12     2.1
##    19 '14'                12     2.1
##    20 '7'                 12     2.1
##    21 '24'                12     2.1
##    22 '30'                12     2.1
##    23 '22'                12     2.1
##    24 '23'                12     2.1
##    25 '27'                12     2.1
##    26 '28'                12     2.1
##    27 '26'                12     2.1
##    28 '25'                12     2.1
##    29 '29'                12     2.1
##    30 '21'                12     2.1
##    31 '33'                12     2.1
##    32 '37'                12     2.1
##    33 '36'                12     2.1
##    34 '31'                12     2.1
##    35 '39'                12     2.1
##    36 '38'                12     2.1
##    37 '32'                12     2.1
##    38 '40'                12     2.1
##    39 '34'                12     2.1
##    40 '35'                12     2.1
##    41 '44'                10     1.7
##    42 '45'                12     2.1
##    43 '43'                12     2.1
##    44 '41'                12     2.1
##    45 '47'                12     2.1
##    46 '49'                12     2.1
##    47 '46'                12     2.1
##    48 '50'                12     2.1
##    49 '42'                12     2.1
##    50 '48'                12     2.1
## 
## ================================================================================
## 
##    ChickWeight.Diet
## 
## --------------------------------------------------------------------------------
## 
##    Storage mode: integer
##    Measurement: nominal
## 
##    Values and labels       N Percent
##                                     
##    1 '1'                 220    38.1
##    2 '2'                 120    20.8
##    3 '3'                 120    20.8
##    4 '4'                 118    20.4
summary(y)
##  ChickWeight.weight ChickWeight.Time ChickWeight.Chick ChickWeight.Diet
##  Min.   : 35.0      Min.   : 0.00    18     :  2       1:220           
##  1st Qu.: 63.0      1st Qu.: 4.00    16     :  7       2:120           
##  Median :103.0      Median :10.00    15     :  8       3:120           
##  Mean   :121.8      Mean   :10.72    13     : 12       4:118           
##  3rd Qu.:163.8      3rd Qu.:16.00    9      : 12                       
##  Max.   :373.0      Max.   :21.00    20     : 12                       
##                                      (Other):525

The class() function returns the names of the classes an object inherits from. Applied to the whole ChickWeight object, it reports the grouped data frame classes; each column (feature) also has its own class.

class(ChickWeight)
## [1] "nfnGroupedData" "nfGroupedData"  "groupedData"    "data.frame"
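Because an object can inherit from several classes at once, comparing the result of `class()` with a single name using `==` is fragile; `inherits()` tests whether any of the listed classes matches. A short illustration:

```r
# class() can return several names; inherits() tests for any one of them
inherits(ChickWeight, "data.frame")
inherits(ChickWeight, "groupedData")

# Because "data.frame" is among its classes, ChickWeight can be used
# wherever an ordinary data frame is expected
is.data.frame(ChickWeight)
nrow(ChickWeight)
```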

The sapply() function applies class() to each column of ChickWeight to identify the class of every variable.

sapply(ChickWeight, class)
## $weight
## [1] "numeric"
## 
## $Time
## [1] "numeric"
## 
## $Chick
## [1] "ordered" "factor" 
## 
## $Diet
## [1] "factor"
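`sapply()` returned a list here rather than a character vector because `class()` gives two names for `Chick` and one for every other column, so the results cannot be simplified. If only the most specific class of each column is needed, `vapply()` with a fixed output type keeps the result a named character vector; `first_class` is an illustrative name.

```r
# Take only the first (most specific) class name of each column
first_class <- vapply(ChickWeight, function(col) class(col)[1], character(1))
first_class
```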

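The summary() function gives a compact overview of every column: the minimum, quartiles, mean and maximum for numeric columns, and counts per level for factors. It does condense the data: for a factor with many levels, such as Chick, only the six most frequent levels are shown and the rest are pooled into (Other).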

summary(ChickWeight)
##      weight           Time           Chick     Diet   
##  Min.   : 35.0   Min.   : 0.00   13     : 12   1:220  
##  1st Qu.: 63.0   1st Qu.: 4.00   9      : 12   2:120  
##  Median :103.0   Median :10.00   20     : 12   3:120  
##  Mean   :121.8   Mean   :10.72   10     : 12   4:118  
##  3rd Qu.:163.8   3rd Qu.:16.00   17     : 12          
##  Max.   :373.0   Max.   :21.00   19     : 12          
##                                  (Other):506
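Because `summary()` pools most `Chick` levels into `(Other)`, a full frequency table is needed to spot incomplete series. Each chick was measured at most 12 times (days 0 to 21), so chicks with fewer than 12 rows have missing measurements. A minimal sketch; `chick_n` and `incomplete` are illustrative names.

```r
# Count the measurements recorded for every chick
chick_n <- table(ChickWeight$Chick)

# Chicks with fewer than 12 measurements have an incomplete series
incomplete <- chick_n[chick_n < 12]
incomplete
```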