Ch1.2.1 Data Types

Examples of Data Types

  • Boolean (True or False)
  • Integer (\( \pm \) whole numbers)
  • Floating-point (numbers with a decimal component)
  • Character (letters, digits, punctuation marks, whitespace, etc.)
  • String (a sequence of characters)
  • Factor (a categorical label)

What is a Data Type?

From Wikipedia:

  • A data type is an attribute of data which tells the compiler or interpreter how the programmer intends to use the data.
  • A data type constrains the values that an expression, such as a variable or a function, might take.
  • A data type defines the operations that can be done on the data, the meaning of the data, and the way values of that type can be stored.
  • A data type provides a set of values from which an expression (i.e. variable, function, etc.) may take its values.
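
As a rough sketch of how these ideas look in R, typeof() reports how a value is stored and class() reports how R treats it. The variable names below are illustrative only and do not come from the source.

```r
# Illustrative values, one per data type listed earlier (names are hypothetical)
flag  <- TRUE                              # Boolean (R calls this "logical")
count <- 7L                                # integer (the L suffix requests integer storage)
ratio <- 2.5                               # floating point (R calls this "double")
word  <- "data"                            # character string
group <- factor(c("low", "high", "low"))   # factor (categorical labels)

# typeof() reports the internal storage type of each value
types <- c(typeof(flag), typeof(count), typeof(ratio), typeof(word), typeof(group))
print(types)          # "logical" "integer" "double" "character" "integer"

# class() shows that the factor is treated as labels, not as numbers
print(class(group))   # "factor"

# The type constrains the operations: arithmetic on a string is an error
result <- tryCatch(word + 1, error = function(e) "error: non-numeric argument")
print(result)
```

Note that R has no separate single-character type: a single character is simply a string of length one.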

Boolean

  • This data type takes on two values, either TRUE or FALSE.
  • These values, returned from some other function (program), are often used to set a flag that affects the operation of a subsequent function.
1 == sqrt(2)
[1] FALSE
3 == sqrt(9)
[1] TRUE
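
The "flag" idea can be sketched as follows, assuming a small vector with one missing value (the names values, has_missing, and avg are hypothetical). One function returns a Boolean, and that Boolean changes how the next function behaves.

```r
values <- c(4, NA, 10)

# anyNA() returns a single Boolean: TRUE if any element is missing
has_missing <- anyNA(values)

# The Boolean is passed on as the na.rm flag of mean(),
# which tells mean() whether to drop missing values first
avg <- mean(values, na.rm = has_missing)

print(has_missing)   # TRUE
print(avg)           # 7
```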

Numerical Boolean

  • R can convert TRUE and FALSE to the numerical values of 1 and 0, respectively.
c(TRUE == -1,TRUE == 0,TRUE == 1,TRUE == 3.14)
[1] FALSE FALSE  TRUE FALSE
c(FALSE == -1,FALSE == 0,FALSE == 1,FALSE == pi)
[1] FALSE  TRUE FALSE FALSE
c(TRUE < FALSE, TRUE > FALSE, TRUE == FALSE)
[1] FALSE  TRUE FALSE

Boolean and Functions

  • Treating Boolean values as 1 and 0 lets numerical functions operate on them.
  • That is, we can add up the values, compute averages, etc.
TRUE == 1; FALSE == 0
[1] TRUE
[1] TRUE
x <- c(TRUE, FALSE, TRUE, FALSE, TRUE)
c(sum(x), mean(x), length(x))
[1] 3.0 0.6 5.0

Boolean and Logic Checks

  • A comparison applied to a vector returns one TRUE or FALSE per element, so summing the result counts the elements that pass the check.
x <- c(1,2,3,4,3,1,2,3,3,4,1,3,4,3)
x
 [1] 1 2 3 4 3 1 2 3 3 4 1 3 4 3
c(sum(x == 3), sum(x < 3))
[1] 6 5
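
The same vector supports a few related checks. This is a minimal sketch using the base R functions mean(), which(), any(), and all(); the result names are hypothetical.

```r
x <- c(1,2,3,4,3,1,2,3,3,4,1,3,4,3)

# The mean of a Boolean vector is the proportion of TRUE values
prop_three <- mean(x == 3)
print(prop_three)           # 6 of the 14 values, about 0.43

# which() gives the positions where the check is TRUE
where_four <- which(x == 4)
print(where_four)           # 4 10 13

# any() and all() collapse the checks into a single Boolean
checks <- c(any(x > 4), all(x >= 1))
print(checks)               # FALSE TRUE
```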

Integers

  • Integers are given by the set of whole numbers of the form

\[ \left\{ \ldots, -3, -2, -1, 0, 1, 2, 3, \ldots \right\} \]

  • The integer data type does not provide for any decimal component of a number.
  • This simplicity saves storage when circumstances allow: R stores an integer in 4 bytes, compared with 8 bytes for a floating point number.
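
A small sketch of the integer type in R, assuming the L suffix is used to request integer storage (the variable names are hypothetical). It compares the storage of equal-length integer and floating point vectors without relying on exact byte counts.

```r
n_int <- 5L   # stored as an integer
n_dbl <- 5    # stored as a floating point number by default

print(typeof(n_int))   # "integer"
print(typeof(n_dbl))   # "double"

# Vectors of the same length: the integer version takes less memory
ints <- integer(1000)
dbls <- numeric(1000)
smaller <- object.size(ints) < object.size(dbls)
print(smaller)         # TRUE

# The trade-off is a limited range: the largest representable integer
print(.Machine$integer.max)   # 2147483647
```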

Floating Point

  • The floating point data type represents numerical values that may have a fractional (decimal) part.
  • The term “floating point” is reminiscent of scientific notation: the decimal point "floats" to sit after the leading digit, and an exponent records the scale.

\[ 123.45 = 1.2345 \times 10^2 \]

  • This data type will be covered in more detail in Ch2.2.2.
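
The scientific-notation view can be reproduced in R. This is a minimal sketch using format() with its scientific argument; the names x, sci, and y are hypothetical.

```r
x <- 123.45

# Ask R to display the value in scientific notation
sci <- format(x, scientific = TRUE)
print(sci)          # "1.2345e+02"

# R stores the value as a floating point number ("double")
print(typeof(x))    # "double"

# The same number can be typed directly in e-notation
y <- 1.2345e2
same <- isTRUE(all.equal(x, y))
print(same)         # TRUE
```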

Floating Point Example

  • The number \( e \approx 2.718282 \) is stored as a floating point value.
exp(1)
[1] 2.718282

Testing Data Types

  • R has functions to test the internal storage type of a value.
is.numeric(exp(1))
[1] TRUE
is.integer(exp(1))
[1] FALSE

Default Data Types

  • Mathematically the number 2 is an integer, but by default R stores every number as numeric (floating point) unless told otherwise.
is.integer(2)
[1] FALSE
as.integer(2)
[1] 2
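
To see the default more directly, the sketch below uses typeof() and the L suffix, which is one way to ask R for integer storage. The name default_types is hypothetical.

```r
# typeof() reports the storage type: plain 2 is a double, 2L is an integer
default_types <- c(typeof(2), typeof(2L))
print(default_types)               # "double" "integer"

# as.integer() changes the storage type, which is.integer() confirms
print(is.integer(as.integer(2)))   # TRUE

# Sequences built with the colon operator are integers
print(typeof(1:3))                 # "integer"

# is.numeric() is TRUE for both integer and floating point values
print(c(is.numeric(2), is.numeric(2L)))   # TRUE TRUE
```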

Changing Data Types

  • R can convert from one data type to another, a process sometimes referred to as coercion.
as.numeric(exp(1)); as.integer(exp(1))
[1] 2.718282
[1] 2
as.character(exp(1))
[1] "2.71828182845905"
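
A few more conversions, as a hedged sketch with hypothetical variable names. It also shows what happens when a conversion is not possible, and that R coerces implicitly when types are mixed in one vector.

```r
# Explicit coercion between types
num_from_text <- as.numeric("3.14")
int_from_bool <- as.integer(TRUE)
bool_from_num <- as.logical(0)
print(num_from_text)   # 3.14
print(int_from_bool)   # 1
print(bool_from_num)   # FALSE

# A conversion that cannot succeed yields NA (R also issues a warning,
# which is suppressed here to keep the output clean)
bad <- suppressWarnings(as.integer("abc"))
print(bad)             # NA

# Implicit coercion: mixing types in c() converts everything to character
mixed <- c(1, "a", TRUE)
print(mixed)           # "1" "a" "TRUE"
```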

Round Example: e

  • The as.integer function discards the decimal part (it truncates toward zero), so the round function is a better choice if rounding is the goal.
exp(1); as.integer(exp(1)); round(exp(1))
[1] 2.718282
[1] 2
[1] 3

Round Example: pi

  • Similarly for \( \pi \):
pi; as.integer(pi); round(pi); as.character(pi)
[1] 3.141593
[1] 3
[1] 3
[1] "3.14159265358979"
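
The difference between truncating and rounding is easiest to see with a negative value. The sketch below compares as.integer() with the base R functions trunc(), round(), floor(), and ceiling(); the variable v is hypothetical.

```r
v <- -2.7

# as.integer() and trunc() both drop the decimal part (toward zero)
print(as.integer(v))   # -2
print(trunc(v))        # -2

# round() goes to the nearest whole number
print(round(v))        # -3

# floor() always goes down, ceiling() always goes up
print(floor(v))        # -3
print(ceiling(v))      # -2

# round() also accepts a number of decimal places
e_two_places <- round(exp(1), 2)
print(e_two_places)    # 2.72
```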

Numeric and Integer Defaults

  • Numeric and integer data can usually be freely mixed and matched.
  • When the two types meet in one expression, R automatically converts the internal representation to the form that can hold the result.
  • This is typically the numeric data type, as it is versatile and capable enough for most data storage needs.
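
The automatic conversion can be checked with typeof(). This is a minimal sketch; the name result_types is hypothetical.

```r
result_types <- c(
  typeof(2L + 3L),     # integer + integer stays integer
  typeof(2L + 1.5),    # integer + numeric becomes double
  typeof(5L / 2L),     # ordinary division always returns a double
  typeof(5L %/% 2L),   # integer division of integers stays integer
  typeof(c(1L, 2.5))   # combining the two in a vector gives double
)
print(result_types)    # "integer" "double" "double" "integer" "double"
```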