This document studies various data types in R using the built-in data set ‘Titanic’.
summary(Titanic)
## Number of cases in table: 2201
## Number of factors: 4
## Test for independence of all factors:
## Chisq = 1637.4, df = 25, p-value = 0
## Chi-squared approximation may be incorrect
str(Titanic)
## 'table' num [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
## - attr(*, "dimnames")=List of 4
## ..$ Class : chr [1:4] "1st" "2nd" "3rd" "Crew"
## ..$ Sex : chr [1:2] "Male" "Female"
## ..$ Age : chr [1:2] "Child" "Adult"
## ..$ Survived: chr [1:2] "No" "Yes"
class(Titanic)
## [1] "table"
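A ‘table’ object is an array with named dimensions, so it can be indexed by level name and summed over a dimension directly. The lines below are a minimal sketch using only base R; the variable name classTotals is introduced here purely for illustration.

```r
# Titanic ships with base R (package 'datasets'), so nothing needs installing.
dim(Titanic)                                # 4 2 2 2: Class, Sex, Age, Survived
names(dimnames(Titanic))                    # names of the four dimensions

# Index a single cell by level name, in dimension order.
Titanic["1st", "Female", "Adult", "Yes"]    # 140

# Sum over every dimension except the first (Class).
classTotals <- apply(Titanic, 1, sum)       # illustrative name
classTotals                                 # 1st 325, 2nd 285, 3rd 706, Crew 885
```

Indexing by name works, but every query has to respect the dimension order, which is part of what makes the structure awkward for analysis.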
The Titanic data set is of class ‘table’, as can be seen above. This table structure is a little difficult to use for further analysis, so let's convert it into a more user-friendly data frame.
titanicDataFrame <- as.data.frame(Titanic)
str(titanicDataFrame)
## 'data.frame': 32 obs. of 5 variables:
## $ Class : Factor w/ 4 levels "1st","2nd","3rd",..: 1 2 3 4 1 2 3 4 1 2 ...
## $ Sex : Factor w/ 2 levels "Male","Female": 1 1 1 1 2 2 2 2 1 1 ...
## $ Age : Factor w/ 2 levels "Child","Adult": 1 1 1 1 1 1 1 1 2 2 ...
## $ Survived: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ Freq : num 0 0 35 0 0 0 17 0 118 154 ...
This data frame is much easier to read and understand, and to use for further analysis.
titanicDataFrame
## Class Sex Age Survived Freq
## 1 1st Male Child No 0
## 2 2nd Male Child No 0
## 3 3rd Male Child No 35
## 4 Crew Male Child No 0
## 5 1st Female Child No 0
## 6 2nd Female Child No 0
## 7 3rd Female Child No 17
## 8 Crew Female Child No 0
## 9 1st Male Adult No 118
## 10 2nd Male Adult No 154
## 11 3rd Male Adult No 387
## 12 Crew Male Adult No 670
## 13 1st Female Adult No 4
## 14 2nd Female Adult No 13
## 15 3rd Female Adult No 89
## 16 Crew Female Adult No 3
## 17 1st Male Child Yes 5
## 18 2nd Male Child Yes 11
## 19 3rd Male Child Yes 13
## 20 Crew Male Child Yes 0
## 21 1st Female Child Yes 1
## 22 2nd Female Child Yes 13
## 23 3rd Female Child Yes 14
## 24 Crew Female Child Yes 0
## 25 1st Male Adult Yes 57
## 26 2nd Male Adult Yes 14
## 27 3rd Male Adult Yes 75
## 28 Crew Male Adult Yes 192
## 29 1st Female Adult Yes 140
## 30 2nd Female Adult Yes 80
## 31 3rd Female Adult Yes 76
## 32 Crew Female Adult Yes 20
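With one row per combination of levels, ordinary data frame tools apply. The following is a small sketch, assuming only base R, that filters rows and sums the Freq column; adultFemaleSurvivors and survivedByClass are illustrative names, not part of the data set.

```r
titanicDataFrame <- as.data.frame(Titanic)

# Keep only the rows for adult women who survived, one row per class.
adultFemaleSurvivors <- subset(
  titanicDataFrame,
  Sex == "Female" & Age == "Adult" & Survived == "Yes"
)
sum(adultFemaleSurvivors$Freq)   # 140 + 80 + 76 + 20 = 316

# Total passengers for each Class/Survived pair (8 rows).
survivedByClass <- aggregate(Freq ~ Class + Survived,
                             data = titanicDataFrame, FUN = sum)
survivedByClass
```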
class(titanicDataFrame)
## [1] "data.frame"
class(titanicDataFrame$Class)
## [1] "factor"
class(titanicDataFrame$Sex)
## [1] "factor"
class(titanicDataFrame$Freq)
## [1] "numeric"
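A factor stores its values as integer codes plus a set of labels (levels). The short sketch below, using base R only, shows both parts for the columns checked above.

```r
titanicDataFrame <- as.data.frame(Titanic)

levels(titanicDataFrame$Class)            # "1st" "2nd" "3rd" "Crew"
nlevels(titanicDataFrame$Sex)             # 2

# The underlying storage is integer codes that point into the levels.
typeof(titanicDataFrame$Class)            # "integer"
head(as.integer(titanicDataFrame$Class))  # 1 2 3 4 1 2

# Converting to character recovers the labels themselves.
head(as.character(titanicDataFrame$Class), 4)
```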
Let us draw some plots of the Titanic data frame. I am using the funModeling library to do so.
library(funModeling)
The freq function plots a frequency graph for every categorical variable in the data frame. Similarly, the plot_num function plots a graph for every numerical variable in the data frame.
freq(titanicDataFrame)
## Class frequency percentage cumulative_perc
## 1 1st 8 25 25
## 2 2nd 8 25 50
## 3 3rd 8 25 75
## 4 Crew 8 25 100
## Sex frequency percentage cumulative_perc
## 1 Male 16 50 50
## 2 Female 16 50 100
## Age frequency percentage cumulative_perc
## 1 Child 16 50 50
## 2 Adult 16 50 100
## Survived frequency percentage cumulative_perc
## 1 No 16 50 50
## 2 Yes 16 50 100
## [1] "Variables processed: Class, Sex, Age, Survived"
plot_num(titanicDataFrame)
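The frequencies printed above count rows of the data frame (each combination of levels is one row), which is why every level appears equally often. To count passengers instead, the rows can be weighted by the Freq column. The sketch below uses base R's xtabs rather than funModeling; passengersBySurvival is an illustrative name.

```r
titanicDataFrame <- as.data.frame(Titanic)

# Sum Freq within each level of Survived instead of counting rows.
passengersBySurvival <- xtabs(Freq ~ Survived, data = titanicDataFrame)
passengersBySurvival                               # No 1490, Yes 711

# The same counts as percentages of all 2201 passengers.
round(100 * prop.table(passengersBySurvival), 1)   # No 67.7, Yes 32.3
```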
We can also convert any of the “factor” or “numeric” variables into other structures, such as a matrix.
matrixFreq <- as.matrix(titanicDataFrame$Freq)
matrixFreq
## [,1]
## [1,] 0
## [2,] 0
## [3,] 35
## [4,] 0
## [5,] 0
## [6,] 0
## [7,] 17
## [8,] 0
## [9,] 118
## [10,] 154
## [11,] 387
## [12,] 670
## [13,] 4
## [14,] 13
## [15,] 89
## [16,] 3
## [17,] 5
## [18,] 11
## [19,] 13
## [20,] 0
## [21,] 1
## [22,] 13
## [23,] 14
## [24,] 0
## [25,] 57
## [26,] 14
## [27,] 75
## [28,] 192
## [29,] 140
## [30,] 80
## [31,] 76
## [32,] 20
class(matrixFreq)
## [1] "matrix"
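The exact result of class() on a matrix depends on the R version: from R 4.0.0 onward it returns both "matrix" and "array". A version-independent check, sketched below with base R only, is to use is.matrix or inherits.

```r
matrixFreq <- as.matrix(as.data.frame(Titanic)$Freq)

dim(matrixFreq)                  # 32 1: one column, 32 rows
is.matrix(matrixFreq)            # TRUE
inherits(matrixFreq, "matrix")   # TRUE on old and new R versions alike
```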
We can further perform calculations such as mean, median, mode, SD, variance, distribution, skewness, and kurtosis on this single variable. Let us take the Freq variable in its vector form.
freqVector <- as.vector(titanicDataFrame$Freq)
freqVector
## [1] 0 0 35 0 0 0 17 0 118 154 387 670 4 13 89 3 5
## [18] 11 13 0 1 13 14 0 57 14 75 192 140 80 76 20
class(freqVector)
## [1] "numeric"
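The summary statistics mentioned above can be sketched with base R as follows. Base R has no built-in function for the statistical mode, skewness, or kurtosis, so the versions here are simple hand-written, moment-based formulas and should be read as one possible definition, not the only one; modeValue, skewness, and kurtosis are illustrative names.

```r
freqVector <- as.data.frame(Titanic)$Freq

mean(freqVector)     # 68.78125 (2201 passengers / 32 rows)
median(freqVector)   # 13.5
sd(freqVector)       # standard deviation
var(freqVector)      # variance, equal to sd(freqVector)^2

# Mode: the most frequent value (the first one, if there is a tie).
freqCounts <- table(freqVector)
modeValue <- as.numeric(names(freqCounts)[which.max(freqCounts)])
modeValue            # 0

# Moment-based skewness and kurtosis (population moments, no bias correction).
centered <- freqVector - mean(freqVector)
m2 <- mean(centered^2)
skewness <- mean(centered^3) / m2^1.5
kurtosis <- mean(centered^4) / m2^2
```

The mean is far above the median because a few very large counts pull it upward, which also shows up as positive skewness.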
Let us plot a line chart of freqVector to see how its values vary from row to row.
plot(freqVector, type = "l")
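A line chart shows the values in row order. To look at how the values are distributed, a histogram is one common alternative. The sketch below uses base R's hist; the number of breaks and the labels are arbitrary choices made for illustration.

```r
freqVector <- as.data.frame(Titanic)$Freq

# hist() draws the plot and also returns the bin counts invisibly.
freqHist <- hist(freqVector, breaks = 10,
                 main = "Distribution of Freq", xlab = "Freq")
sum(freqHist$counts)   # 32: every row falls into some bin
```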
Some more examples:
class('2')
## [1] "character"
class(2.2)
## [1] "numeric"
class(2.2L)
## [1] "numeric"
class(2.0L)
## [1] "integer"
class(2L)
## [1] "integer"
class("Vikas Test")
## [1] "character"
str(2.2)
## num 2.2
str('Test')
## chr "Test"
str(2L)
## int 2
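The L suffix asks R for an integer literal. When the literal has a fractional part, as in 2.2L, R issues a warning and keeps it as numeric, which explains the results above. The sketch below, using base R only, shows a few related checks and conversions.

```r
typeof(2)                # "double": plain numbers are stored as doubles
typeof(2L)               # "integer"
class(2L + 0.5)          # "numeric": mixing integer and double gives double

as.integer("2")          # 2, converted from character
as.integer(2.9)          # 2, the fractional part is dropped, not rounded
is.character('2')        # TRUE

# A vector holds one type only, so mixed values are coerced to the most general.
class(c(1L, 2.5, "a"))   # "character"
```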
You can get plenty of practice with data types by using your own data sets or built-in data sets, or by generating vectors, matrices, lists, factors, and so on. Try it!
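As a starting point for that practice, here is a small sketch that builds one object of each kind with base R and checks its class. All object names and values are made up for illustration.

```r
scores   <- c(10, 20, 30)                        # numeric vector
grid     <- matrix(1:6, nrow = 2)                # 2 x 3 integer matrix
record   <- list(name = "Test", values = scores) # list mixing types
severity <- factor(c("low", "high", "low"),
                   levels = c("low", "high"))    # factor with explicit levels

# class() can return more than one entry (for example for a matrix on
# R >= 4.0.0), so keep only the first one.
objectClasses <- sapply(list(scores, grid, record, severity),
                        function(x) class(x)[1])
objectClasses   # "numeric" "matrix" "list" "factor"
```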