birthweight <- read.csv("birthweight.csv")

# Basic data types

We have already said that logical values can be used to subset a data frame, and that all the values in a given column of a data frame must be of the same type or class. But what does this mean?

## Understanding class

R has the following basic data classes:

- numeric (includes integer and double)
- character
- logical
- complex
- raw

Generally, in bioinformatics, values belong to one of the first three classes. Read more about the complex and raw data types here.

class(birthweight$birthweight)
## [1] "numeric"
class(birthweight$smoker)
## [1] "character"
class(birthweight$geriatric.pregnancy)
## [1] "logical"
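
To make the "numeric includes integer and double" point concrete, here is a minimal sketch using toy values rather than the birthweight data. The variable names are made up for this illustration. Note that class() reports "integer" for values stored as integers, while is.numeric() is TRUE for both storage types.

```r
# Toy values for illustration; these are not columns of the birthweight table
whole <- 5L       # the L suffix asks R to store the value as an integer
decimal <- 5.5    # stored as a double

class(whole)      # "integer"
class(decimal)    # "numeric"
typeof(decimal)   # "double"

# is.numeric() is TRUE for integers and doubles alike
is.numeric(whole)
is.numeric(decimal)
```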

The numeric category is fairly self-explanatory. What are character and logical?

Character values are exactly what they sound like: stored characters (letters and/or numbers). In the birthweight table, the “birth.date” and “location” columns contain character values, as does “smoker”.

head(birthweight$location)
## [1] "General"     "Silver Hill" "Silver Hill" "Silver Hill" "Memorial"   
## [6] "Memorial"

Characters are recognizable by the quotation marks that appear around them in the output. R cannot perform mathematical operations on numbers stored as characters.

# 1 + "1"   # left commented out: R stops with a "non-numeric argument to binary operator" error
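
One way to see this behavior without halting a script is to catch the error, then convert the character to a number before doing the arithmetic. This is only a sketch; the names msg and total are chosen for this example.

```r
# Adding a number to a character string raises an error.
# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch(1 + "1", error = function(e) conditionMessage(e))
print(msg)

# Converting the character to numeric first makes the arithmetic work.
total <- 1 + as.numeric("1")
print(total)   # 2
```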

Logical values are TRUE, FALSE, or NA (missing). Logical values are the result of comparing one item to another with relational operators.

The relational operators in R are:

- > greater than
- >= greater than or equal to
- < less than
- <= less than or equal to
- == equal to
- != not equal to
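
As a quick sketch of what each operator returns, the example below compares a small vector of made-up ages against 20. Each comparison yields one logical value per element.

```r
# Hypothetical values, not taken from the birthweight data
ages <- c(18, 20, 35)

gt  <- ages > 20    # FALSE FALSE  TRUE
gte <- ages >= 20   # FALSE  TRUE  TRUE
lt  <- ages < 20    #  TRUE FALSE FALSE
lte <- ages <= 20   #  TRUE  TRUE FALSE
eq  <- ages == 20   # FALSE  TRUE FALSE
neq <- ages != 20   #  TRUE FALSE  TRUE

print(gt)
```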

birthweight[birthweight$head.circumference > 35, c("length", "weeks.gestation", "maternal.height", "paternal.height")]
##    length weeks.gestation maternal.height paternal.height
## 1      52              38             164              NA
## 4      53              41             161             175
## 7      52              40             170             181
## 15     53              40             171             183
## 16     53              40             170             185
## 18     49              40             152             170
## 20     58              41             173             180
## 21     54              38             172             172
## 23     52              39             170             178
## 25     51              38             165              NA
## 31     58              41             172             185
## 33     51              40             168             181
## 34     51              39             157              NA
## 35     54              42             175             184
## 42     53              44             174             189
birthweight[birthweight$maternal.age <= 20, c("location", "maternal.age", "paternal.age")]
##       location maternal.age paternal.age
## 11    Memorial           20           22
## 14    Memorial           19           20
## 15 Silver Hill           19           19
## 16    Memorial           20           24
## 21 Silver Hill           18           20
## 22 Silver Hill           20           23
## 26     General           20           23
## 28     General           20           20
## 37 Silver Hill           20           20
## 39     General           19           NA
## 42 Silver Hill           20           26

Notice that when R is asked to perform a comparison between a number and a missing value, the result is a missing value (NA). When such a comparison is used to subset a data frame, each NA produces a row filled with NA.

birthweight[birthweight$paternal.education == 10, c(1,13:16)]
##        ID paternal.age paternal.education paternal.cigarettes paternal.height
## NA     NA           NA                 NA                  NA              NA
## NA.1   NA           NA                 NA                  NA              NA
## 7     365           30                 10                  25             181
## 24    321           39                 10                   0             171
## NA.2   NA           NA                 NA                  NA              NA
## 26   1360           23                 10                  35             179
## 28   1363           20                 10                  35             185
## NA.3   NA           NA                 NA                  NA              NA
## 36   1191           21                 10                  25             185
## 37    431           20                 10                  35             180
## NA.4   NA           NA                 NA                  NA              NA
birthweight[birthweight$weeks.gestation != 40, "weeks.gestation"]
##  [1] 38 39 41 41 39 39 34 38 38 38 41 37 39 41 38 35 39 37 38 44 41 37 41 41 35
## [26] 39 42 42 33 33 39 45 44
birthweight[birthweight$location == "General",]
##      ID birth.date location length birthweight head.circumference
## 1  1107  1/25/1967  General     52        3.23                 36
## 17  820  10/7/1967  General     52        3.77                 34
## 18  752 10/19/1967  General     49        3.32                 36
## 26 1360  2/16/1968  General     56        4.55                 34
## 28 1363   4/2/1968  General     48        2.37                 30
## 33 1088  7/24/1968  General     51        3.27                 36
## 36 1191   9/7/1968  General     53        3.65                 33
## 39 1600  10/9/1968  General     53        2.90                 34
## 40  532 10/25/1968  General     53        3.59                 34
## 41  223 12/11/1968  General     50        3.87                 33
##    weeks.gestation smoker maternal.age maternal.cigarettes maternal.height
## 1               38     no           31                   0             164
## 17              40     no           24                   0             157
## 18              40    yes           27                  12             152
## 26              44     no           20                   0             162
## 28              37    yes           20                   7             163
## 33              40     no           24                   0             168
## 36              42     no           21                   0             165
## 39              39     no           19                   0             165
## 40              40    yes           31                  12             163
## 41              45    yes           28                  25             163
##    maternal.prepregnant.weight paternal.age paternal.education
## 1                           57           NA                 NA
## 17                          50           31                 16
## 18                          48           37                 12
## 26                          57           23                 10
## 28                          47           20                 10
## 33                          53           29                 16
## 36                          61           21                 10
## 39                          57           NA                 NA
## 40                          49           41                 12
## 41                          54           30                 16
##    paternal.cigarettes paternal.height low.birthweight geriatric.pregnancy
## 1                   NA              NA               0               FALSE
## 17                   0             173               0               FALSE
## 18                  25             170               0               FALSE
## 26                  35             179               0               FALSE
## 28                  35             185               1               FALSE
## 33                   0             181               0               FALSE
## 36                  25             185               0               FALSE
## 39                  NA              NA               0               FALSE
## 40                  50             191               0               FALSE
## 41                   0             183               0               FALSE
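
The all-NA rows in the “paternal.education” example above can be avoided if they are not wanted. The sketch below shows two common approaches on a small, hypothetical data frame (toy) that stands in for the birthweight table: wrapping the comparison in which(), or explicitly excluding missing values with is.na().

```r
# Hypothetical toy data frame standing in for the birthweight table
toy <- data.frame(
  ID = c(1, 2, 3, 4),
  paternal.education = c(10, NA, 12, 10)
)

# A plain comparison keeps an all-NA row for the missing value
with_na <- toy[toy$paternal.education == 10, ]

# which() returns only the positions where the comparison is TRUE
no_na <- toy[which(toy$paternal.education == 10), ]

# Equivalent: require a non-missing value as part of the condition
no_na2 <- toy[!is.na(toy$paternal.education) & toy$paternal.education == 10, ]

nrow(with_na)   # 3
nrow(no_na)     # 2
```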

Many of R’s functions also return logical values.

is.numeric(birthweight$ID)
## [1] TRUE
is.numeric(birthweight$smoker)
## [1] FALSE
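
A few more base R functions that return logical values are sketched below on a made-up character vector. The vector is hypothetical and only mimics the “smoker” column.

```r
# Hypothetical values resembling the smoker column, with one missing entry
smoker <- c("yes", "no", NA, "yes")

is_chr  <- is.character(smoker)     # TRUE
missing <- is.na(smoker)            # FALSE FALSE  TRUE FALSE
is_yes  <- smoker %in% "yes"        #  TRUE FALSE FALSE  TRUE (never NA)
any_na  <- any(is.na(smoker))       # TRUE

print(missing)
```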

## Coercion: converting between classes

The birthweight data frame has three columns that should probably be logical values: “smoker”, “low.birthweight”, and “geriatric.pregnancy”. All of these are questions that can be answered with TRUE/FALSE. However, only “geriatric.pregnancy” is stored as a logical value. Storing “smoker” and “low.birthweight” as logical values would be more useful, since it allows us to subset the data frame more easily.

Changing the class of data is known as coercion.

as.logical(birthweight$low.birthweight)
##  [1] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
## [25] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
## [37]  TRUE  TRUE FALSE FALSE FALSE FALSE
as.logical(birthweight$smoker)
##  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [26] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

The as.logical() function converted “low.birthweight” to a logical vector, but could not convert “smoker,” and returned a vector of missing data denoted by NA. Why is this?

The coercion rule in R follows this hierarchy, from the least general type on the left to the most general on the right:

logical -> integer -> numeric -> complex -> character

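R can convert logical values to integers, store integers as the more general numeric type, or represent numeric data as characters, but these coercion operations cannot always be reversed without losing information. Going from character to logical, as.logical() only recognizes strings such as "TRUE", "true", "T", "FALSE", "false", and "F"; any other string, including "yes" and "no", becomes NA.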
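
The hierarchy can be observed directly by combining values of different types with c(), which silently coerces everything to the most general type present. The sketch below uses toy values, and the variable names are made up for this example.

```r
# c() coerces mixed inputs to the most general type among them
class(c(TRUE, 2L))     # "integer"
class(c(2L, 3.5))      # "numeric"
class(c(3.5, "a"))     # "character"

# Logical values become 1 and 0 when promoted to numbers
promoted <- c(TRUE, FALSE, 2L)   # 1 0 2

# Going back down the hierarchy can lose information:
# "a" has no numeric meaning, so it becomes NA (R also warns about this)
back <- suppressWarnings(as.numeric(c("3.5", "a")))   # 3.5  NA

print(promoted)
print(back)
```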

as.numeric(birthweight$geriatric.pregnancy)
##  [1] 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
## [39] 0 0 0 0

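The as.logical() function only operates on “low.birthweight” the way we want because the data was encoded as 0s and 1s: 0 becomes FALSE and 1 becomes TRUE. Any other non-missing number is also converted to TRUE, so the results can be unexpected for columns that are not coded this way.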

as.logical(birthweight$maternal.age)
##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
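
When a logical value is needed from a numeric column that is not coded as 0s and 1s, an explicit comparison usually states the intent more clearly than as.logical(). The sketch below uses made-up values, and the cutoff of 35 is an assumption chosen for illustration, not a definition taken from the data set.

```r
# as.logical() treats 0 as FALSE and every other non-missing number as TRUE
values <- c(0, 1, 2, -1, NA)
converted <- as.logical(values)   # FALSE  TRUE  TRUE  TRUE    NA

# Hypothetical ages; the threshold of 35 is assumed for this example
ages <- c(19, 37, 41)
older <- ages >= 35               # FALSE  TRUE  TRUE

print(converted)
print(older)
```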

Let’s convert the “low.birthweight” column to logical.

birthweight$low.birthweight <- as.logical(birthweight$low.birthweight)
birthweight
##      ID birth.date    location length birthweight head.circumference
## 1  1107  1/25/1967     General     52        3.23                 36
## 2   697   2/6/1967 Silver Hill     48        3.03                 35
## 3  1683  2/14/1967 Silver Hill     53        3.35                 33
## 4    27   3/9/1967 Silver Hill     53        3.55                 37
## 5  1522  3/13/1967    Memorial     50        2.74                 33
## 6   569  3/23/1967    Memorial     50        2.51                 35
## 7   365  4/23/1967    Memorial     52        3.53                 37
## 8   808   5/5/1967 Silver Hill     48        2.92                 33
## 9  1369   6/4/1967 Silver Hill     49        3.18                 34
## 10 1023   6/7/1967    Memorial     52        3.00                 35
## 11  822  6/14/1967    Memorial     50        3.42                 35
## 12 1272  6/20/1967    Memorial     53        2.75                 32
## 13 1262  6/25/1967 Silver Hill     53        3.19                 34
## 14  575  7/12/1967    Memorial     50        2.78                 30
## 15 1016  7/13/1967 Silver Hill     53        4.32                 36
## 16  792   9/7/1967    Memorial     53        3.64                 38
## 17  820  10/7/1967     General     52        3.77                 34
## 18  752 10/19/1967     General     49        3.32                 36
## 19  619  11/1/1967    Memorial     52        3.41                 33
## 20 1764  12/7/1967 Silver Hill     58        4.57                 39
## 21 1081 12/14/1967 Silver Hill     54        3.63                 38
## 22  516   1/8/1968 Silver Hill     47        2.66                 33
## 23  272  1/10/1968    Memorial     52        3.86                 36
## 24  321  1/21/1968 Silver Hill     48        3.11                 33
## 25 1636   2/2/1968 Silver Hill     51        3.93                 38
## 26 1360  2/16/1968     General     56        4.55                 34
## 27 1388  2/22/1968    Memorial     51        3.14                 33
## 28 1363   4/2/1968     General     48        2.37                 30
## 29 1058  4/24/1968 Silver Hill     53        3.15                 34
## 30  755  4/25/1968    Memorial     53        3.20                 33
## 31  462  6/19/1968 Silver Hill     58        4.10                 39
## 32  300  7/18/1968 Silver Hill     46        2.05                 32
## 33 1088  7/24/1968     General     51        3.27                 36
## 34   57  8/12/1968    Memorial     51        3.32                 38
## 35  553  8/17/1968 Silver Hill     54        3.94                 37
## 36 1191   9/7/1968     General     53        3.65                 33
## 37  431  9/16/1968 Silver Hill     48        1.92                 30
## 38 1313  9/27/1968 Silver Hill     43        2.65                 32
## 39 1600  10/9/1968     General     53        2.90                 34
## 40  532 10/25/1968     General     53        3.59                 34
## 41  223 12/11/1968     General     50        3.87                 33
## 42 1187 12/19/1968 Silver Hill     53        4.07                 38
##    weeks.gestation smoker maternal.age maternal.cigarettes maternal.height
## 1               38     no           31                   0             164
## 2               39     no           27                   0             162
## 3               41     no           27                   0             164
## 4               41    yes           37                  25             161
## 5               39    yes           21                  17             156
## 6               39    yes           22                   7             159
## 7               40    yes           26                  25             170
## 8               34     no           26                   0             167
## 9               38    yes           31                  25             162
## 10              38    yes           30                  12             165
## 11              38     no           20                   0             157
## 12              40    yes           37                  50             168
## 13              41    yes           27                  35             163
## 14              37    yes           19                   7             165
## 15              40     no           19                   0             171
## 16              40    yes           20                   2             170
## 17              40     no           24                   0             157
## 18              40    yes           27                  12             152
## 19              39    yes           23                  25             181
## 20              41    yes           32                  12             173
## 21              38     no           18                   0             172
## 22              35    yes           20                  35             170
## 23              39    yes           30                  25             170
## 24              37     no           28                   0             158
## 25              38     no           29                   0             165
## 26              44     no           20                   0             162
## 27              41    yes           22                   7             160
## 28              37    yes           20                   7             163
## 29              40     no           29                   0             167
## 30              41     no           21                   0             155
## 31              41     no           35                   0             172
## 32              35    yes           41                   7             166
## 33              40     no           24                   0             168
## 34              39    yes           23                  17             157
## 35              42     no           24                   0             175
## 36              42     no           21                   0             165
## 37              33    yes           20                   7             161
## 38              33     no           24                   0             149
## 39              39     no           19                   0             165
## 40              40    yes           31                  12             163
## 41              45    yes           28                  25             163
## 42              44     no           20                   0             174
##    maternal.prepregnant.weight paternal.age paternal.education
## 1                           57           NA                 NA
## 2                           62           27                 14
## 3                           62           37                 14
## 4                           66           46                 NA
## 5                           53           24                 12
## 6                           52           23                 14
## 7                           62           30                 10
## 8                           64           25                 12
## 9                           57           32                 16
## 10                          64           38                 14
## 11                          48           22                 14
## 12                          61           31                 16
## 13                          51           31                 16
## 14                          60           20                 14
## 15                          62           19                 12
## 16                          59           24                 12
## 17                          50           31                 16
## 18                          48           37                 12
## 19                          69           23                 16
## 20                          70           38                 14
## 21                          50           20                 12
## 22                          57           23                 12
## 23                          78           40                 16
## 24                          54           39                 10
## 25                          61           NA                 NA
## 26                          57           23                 10
## 27                          53           24                 16
## 28                          47           20                 10
## 29                          60           30                 16
## 30                          55           25                 14
## 31                          58           31                 16
## 32                          57           37                 14
## 33                          53           29                 16
## 34                          48           NA                 NA
## 35                          66           30                 12
## 36                          61           21                 10
## 37                          50           20                 10
## 38                          45           26                 16
## 39                          57           NA                 NA
## 40                          49           41                 12
## 41                          54           30                 16
## 42                          68           26                 14
##    paternal.cigarettes paternal.height low.birthweight geriatric.pregnancy
## 1                   NA              NA           FALSE               FALSE
## 2                    0             178           FALSE               FALSE
## 3                    0             170           FALSE               FALSE
## 4                    0             175           FALSE                TRUE
## 5                    7             179           FALSE               FALSE
## 6                   25              NA            TRUE               FALSE
## 7                   25             181           FALSE               FALSE
## 8                   25             175           FALSE               FALSE
## 9                   50             194           FALSE               FALSE
## 10                  50             180           FALSE               FALSE
## 11                   0             179           FALSE               FALSE
## 12                   0             173           FALSE                TRUE
## 13                  25             185           FALSE               FALSE
## 14                   0             183           FALSE               FALSE
## 15                   0             183           FALSE               FALSE
## 16                  12             185           FALSE               FALSE
## 17                   0             173           FALSE               FALSE
## 18                  25             170           FALSE               FALSE
## 19                   2             181           FALSE               FALSE
## 20                  25             180           FALSE               FALSE
## 21                   7             172           FALSE               FALSE
## 22                  50             186            TRUE               FALSE
## 23                  50             178           FALSE               FALSE
## 24                   0             171           FALSE               FALSE
## 25                  NA              NA           FALSE               FALSE
## 26                  35             179           FALSE               FALSE
## 27                  12             176           FALSE               FALSE
## 28                  35             185            TRUE               FALSE
## 29                  NA             182           FALSE               FALSE
## 30                  25             183           FALSE               FALSE
## 31                  25             185           FALSE                TRUE
## 32                  25             173            TRUE                TRUE
## 33                   0             181           FALSE               FALSE
## 34                  NA              NA           FALSE               FALSE
## 35                   0             184           FALSE               FALSE
## 36                  25             185           FALSE               FALSE
## 37                  35             180            TRUE               FALSE
## 38                   0             169            TRUE               FALSE
## 39                  NA              NA           FALSE               FALSE
## 40                  50             191           FALSE               FALSE
## 41                   0             183           FALSE               FALSE
## 42                  25             189           FALSE               FALSE

Note that the output of as.logical(birthweight$low.birthweight) must be assigned to the “low.birthweight” column in order for the values in the column to change.
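
Since as.logical() cannot interpret "yes" and "no", one possible way to turn a column like “smoker” into logical values is to compare it with "yes" and assign the result back to the column. The sketch below does this on a small, hypothetical data frame (toy) rather than the real birthweight table, and assumes the column contains only "yes" and "no".

```r
# Hypothetical toy data frame standing in for the birthweight table
toy <- data.frame(
  ID = c(1, 2, 3),
  smoker = c("no", "yes", "yes"),
  stringsAsFactors = FALSE
)

# The comparison returns TRUE for "yes" and FALSE for anything else;
# assigning it back replaces the character column with a logical one
toy$smoker <- toy$smoker == "yes"

class(toy$smoker)   # "logical"

# A logical column can be used directly to subset the rows
smokers <- toy[toy$smoker, ]
print(smokers)
```

With the column stored as logical, the subsetting step no longer needs a comparison such as == "yes".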