To list the datasets that ship with R and its loaded packages:
data()
install.packages("memisc")
## Installing package into '/home/fatimah/R/x86_64-pc-linux-gnu-library/4.0'
## (as 'lib' is unspecified)
library(memisc)
## Loading required package: lattice
## Loading required package: MASS
##
## Attaching package: 'memisc'
## The following objects are masked from 'package:stats':
##
## contr.sum, contr.treatment, contrasts
## The following object is masked from 'package:base':
##
## as.array
# Convert the Orange data frame to a memisc data.set object
x <- data.set(Orange)
typeof(x)
## [1] "list"
x
##
## Data set with 35 observations and 3 variables
##
## Orange.Tree Orange.age Orange.circumference
## 1 1 118 30
## 2 1 484 58
## 3 1 664 87
## 4 1 1004 115
## 5 1 1231 120
## 6 1 1372 142
## 7 1 1582 145
## 8 2 118 33
## 9 2 484 69
## 10 2 664 111
## 11 2 1004 156
## 12 2 1231 172
## 13 2 1372 203
## 14 2 1582 203
## 15 3 118 30
## 16 3 484 51
## 17 3 664 75
## 18 3 1004 108
## 19 3 1231 115
## 20 3 1372 139
## 21 3 1582 140
## 22 4 118 32
## 23 4 484 62
## 24 4 664 112
## 25 4 1004 167
## .. ........... .......... ....................
## (25 of 35 observations shown)
# Generate the codebook for the data set
codebook(x)
## ================================================================================
##
## Orange.Tree
##
## --------------------------------------------------------------------------------
##
## Storage mode: integer
## Measurement: ordinal
##
## Values and labels N Percent
##
## 1 '3' 7 20.0
## 2 '1' 7 20.0
## 3 '5' 7 20.0
## 4 '2' 7 20.0
## 5 '4' 7 20.0
##
## ================================================================================
##
## Orange.age
##
## --------------------------------------------------------------------------------
##
## Storage mode: double
## Measurement: interval
##
## Min: 118.000
## Max: 1582.000
## Mean: 922.143
## Std.Dev.: 484.787
##
## ================================================================================
##
## Orange.circumference
##
## --------------------------------------------------------------------------------
##
## Storage mode: double
## Measurement: interval
##
## Min: 30.000
## Max: 214.000
## Mean: 115.857
## Std.Dev.: 56.661
summary(x)
## Orange.Tree Orange.age Orange.circumference
## 3:7 Min. : 118.0 Min. : 30.0
## 1:7 1st Qu.: 484.0 1st Qu.: 65.5
## 5:7 Median :1004.0 Median :115.0
## 2:7 Mean : 922.1 Mean :115.9
## 4:7 3rd Qu.:1372.0 3rd Qu.:161.5
## Max. :1582.0 Max. :214.0
A custom codebook can also be created manually. The class() function determines the data type of an object, and sapply() applies a function to every column to extract more details.
class(Orange)
## [1] "nfnGroupedData" "nfGroupedData" "groupedData" "data.frame"
sapply(Orange, class)
## $Tree
## [1] "ordered" "factor"
##
## $age
## [1] "numeric"
##
## $circumference
## [1] "numeric"
sapply(Orange, min)
## Tree age circumference
## 1 118 30
sapply(Orange, max)
## Tree age circumference
## 5 1582 214
sapply(Orange, range)
## Tree age circumference
## [1,] 1 118 30
## [2,] 5 1582 214
summary(Orange)
## Tree age circumference
## 3:7 Min. : 118.0 Min. : 30.0
## 1:7 1st Qu.: 484.0 1st Qu.: 65.5
## 5:7 Median :1004.0 Median :115.0
## 2:7 Mean : 922.1 Mean :115.9
## 4:7 3rd Qu.:1372.0 3rd Qu.:161.5
## Max. :1582.0 Max. :214.0
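The manual steps above can be collected into one table. The following is a minimal sketch, not part of memisc: `make_codebook` is a hypothetical helper name, and the choice of columns (class, number of distinct values, number of missing values) is an assumption about what a simple codebook might record.

```r
# Hypothetical helper (not a memisc function): builds one row per column.
make_codebook <- function(df) {
  data.frame(
    variable  = names(df),
    # Collapse multi-class columns such as ordered factors into one string
    class     = vapply(df, function(col) paste(class(col), collapse = ", "),
                       character(1)),
    n_unique  = vapply(df, function(col) length(unique(col)), integer(1)),
    n_missing = vapply(df, function(col) sum(is.na(col)), integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

orange_codebook <- make_codebook(Orange)
print(orange_codebook)
```

Using vapply() instead of sapply() fixes the type of each result, so the helper fails loudly rather than silently returning a list when a column behaves unexpectedly.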
Next, the ChickWeight dataset is selected and converted in the same way.
y <- data.set(ChickWeight)
y
##
## Data set with 578 observations and 4 variables
##
## ChickWeight.weight ChickWeight.Time ChickWeight.Chick ChickWeight.Diet
## 1 42 0 1 1
## 2 51 2 1 1
## 3 59 4 1 1
## 4 64 6 1 1
## 5 76 8 1 1
## 6 93 10 1 1
## 7 106 12 1 1
## 8 125 14 1 1
## 9 149 16 1 1
## 10 171 18 1 1
## 11 199 20 1 1
## 12 205 21 1 1
## 13 40 0 2 1
## 14 49 2 2 1
## 15 58 4 2 1
## 16 72 6 2 1
## 17 84 8 2 1
## 18 103 10 2 1
## 19 122 12 2 1
## 20 138 14 2 1
## 21 162 16 2 1
## 22 187 18 2 1
## 23 209 20 2 1
## 24 215 21 2 1
## 25 43 0 3 1
## .. .................. ................ ................. ................
## (25 of 578 observations shown)
Create a codebook for the dataset.
codebook(y)
## ================================================================================
##
## ChickWeight.weight
##
## --------------------------------------------------------------------------------
##
## Storage mode: double
## Measurement: interval
##
## Min: 35.000
## Max: 373.000
## Mean: 121.818
## Std.Dev.: 71.010
##
## ================================================================================
##
## ChickWeight.Time
##
## --------------------------------------------------------------------------------
##
## Storage mode: double
## Measurement: interval
##
## Min: 0.000
## Max: 21.000
## Mean: 10.718
## Std.Dev.: 6.753
##
## ================================================================================
##
## ChickWeight.Chick
##
## --------------------------------------------------------------------------------
##
## Storage mode: integer
## Measurement: ordinal
##
## Values and labels N Percent
##
## 1 '18' 2 0.3
## 2 '16' 7 1.2
## 3 '15' 8 1.4
## 4 '13' 12 2.1
## 5 '9' 12 2.1
## 6 '20' 12 2.1
## 7 '10' 12 2.1
## 8 '8' 11 1.9
## 9 '17' 12 2.1
## 10 '19' 12 2.1
## 11 '4' 12 2.1
## 12 '6' 12 2.1
## 13 '11' 12 2.1
## 14 '3' 12 2.1
## 15 '1' 12 2.1
## 16 '12' 12 2.1
## 17 '2' 12 2.1
## 18 '5' 12 2.1
## 19 '14' 12 2.1
## 20 '7' 12 2.1
## 21 '24' 12 2.1
## 22 '30' 12 2.1
## 23 '22' 12 2.1
## 24 '23' 12 2.1
## 25 '27' 12 2.1
## 26 '28' 12 2.1
## 27 '26' 12 2.1
## 28 '25' 12 2.1
## 29 '29' 12 2.1
## 30 '21' 12 2.1
## 31 '33' 12 2.1
## 32 '37' 12 2.1
## 33 '36' 12 2.1
## 34 '31' 12 2.1
## 35 '39' 12 2.1
## 36 '38' 12 2.1
## 37 '32' 12 2.1
## 38 '40' 12 2.1
## 39 '34' 12 2.1
## 40 '35' 12 2.1
## 41 '44' 10 1.7
## 42 '45' 12 2.1
## 43 '43' 12 2.1
## 44 '41' 12 2.1
## 45 '47' 12 2.1
## 46 '49' 12 2.1
## 47 '46' 12 2.1
## 48 '50' 12 2.1
## 49 '42' 12 2.1
## 50 '48' 12 2.1
##
## ================================================================================
##
## ChickWeight.Diet
##
## --------------------------------------------------------------------------------
##
## Storage mode: integer
## Measurement: nominal
##
## Values and labels N Percent
##
## 1 '1' 220 38.1
## 2 '2' 120 20.8
## 3 '3' 120 20.8
## 4 '4' 118 20.4
summary(y)
## ChickWeight.weight ChickWeight.Time ChickWeight.Chick ChickWeight.Diet
## Min. : 35.0 Min. : 0.00 18 : 2 1:220
## 1st Qu.: 63.0 1st Qu.: 4.00 16 : 7 2:120
## Median :103.0 Median :10.00 15 : 8 3:120
## Mean :121.8 Mean :10.72 13 : 12 4:118
## 3rd Qu.:163.8 3rd Qu.:16.00 9 : 12
## Max. :373.0 Max. :21.00 20 : 12
## (Other):525
The class() function returns the vector of class names that an object inherits from. Each column (feature) of the dataset also has its own class.
class(ChickWeight)
## [1] "nfnGroupedData" "nfGroupedData" "groupedData" "data.frame"
The sapply() function is used here to identify the class of each column in the ChickWeight dataset.
sapply(ChickWeight, class)
## $weight
## [1] "numeric"
##
## $Time
## [1] "numeric"
##
## $Chick
## [1] "ordered" "factor"
##
## $Diet
## [1] "factor"
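The result above is a list rather than a vector because Chick carries two classes, so sapply() cannot simplify it. As a sketch of one way to get a flat, one-entry-per-column result, the classes can be collapsed into a single string; the names `col_classes` and `factor_cols` and the "/" separator are arbitrary choices for illustration.

```r
# One string per column, e.g. "ordered/factor" for Chick
col_classes <- vapply(ChickWeight,
                      function(col) paste(class(col), collapse = "/"),
                      character(1))
print(col_classes)

# Names of the columns that are factors (ordered or not)
factor_cols <- names(ChickWeight)[vapply(ChickWeight, is.factor, logical(1))]
print(factor_cols)
```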
The summary() function gives a compact overview of the whole dataset: the minimum, quartiles, mean, and maximum for numeric columns, and the count per level for factors.
summary(ChickWeight)
## weight Time Chick Diet
## Min. : 35.0 Min. : 0.00 13 : 12 1:220
## 1st Qu.: 63.0 1st Qu.: 4.00 9 : 12 2:120
## Median :103.0 Median :10.00 20 : 12 3:120
## Mean :121.8 Mean :10.72 10 : 12 4:118
## 3rd Qu.:163.8 3rd Qu.:16.00 17 : 12
## Max. :373.0 Max. :21.00 19 : 12
## (Other):506
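In the output above, the Chick column shows only six levels and groups the rest under (Other). A short sketch of how the full counts could be recovered with table(); the variable names `diet_counts` and `chick_counts` are chosen here for illustration only.

```r
# Full frequency table for each factor column
diet_counts  <- table(ChickWeight$Diet)
chick_counts <- table(ChickWeight$Chick)

print(diet_counts)
print(length(chick_counts))  # number of distinct chicks
print(sum(chick_counts))     # should equal the number of rows
```

summary() on a data frame also accepts a maxsum argument that controls how many factor levels are shown before the remainder is lumped into (Other).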