Missing Values in R

There are several important special values in R, including NA, NaN, Inf, and NULL. Here I would like to summarise the basic usage of these values.

NA & NaN

NA means a missing value (Not Available). There are several typed versions of NA: NA_integer_, NA_real_, NA_character_ and NA_complex_. NOTE that NA_integer_ is of the integer class, and so on; plain NA is a logical constant.

is.na(NA_character_)

## [1] TRUE

is.na(NA_integer_)

## [1] TRUE

is.na(NA_real_)

## [1] TRUE

is.na(NA_complex_)

## [1] TRUE

class(NA_integer_)

## [1] "integer"

typeof(NA_integer_)

## [1] "integer"

Why do we need NA? In some cases the components of a vector may not be completely known. When an element or value is “not available” or a “missing value” in the statistical sense, a place within a vector may be reserved for it by assigning it the special value NA.

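In general, any operation on an NA yields NA. The motivation for this rule is simply that if the specification of an operation is incomplete, the result cannot be known and hence is not available. The few exceptions are operations whose result does not depend on the missing value, such as NA & FALSE, which is FALSE.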
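A minimal sketch of how NA propagates, using a made-up vector x (not from the text above). The last two lines show cases where the answer is the same whatever the missing value would have been, so R can return it.

```r
x <- c(1, NA, 3)

x + 1                  # 2 NA 4: the missing element stays missing
sum(x)                 # NA, because one term is unknown
sum(x, na.rm = TRUE)   # 4, after dropping the missing element

# The result is known regardless of the missing value
NA & FALSE             # FALSE
NA | TRUE              # TRUE
```

Many summary functions such as sum() and mean() accept na.rm = TRUE to drop missing elements before computing.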

To test if a value is NA, use is.na(). The function is.na(x) returns a logical vector of the same size as x with value TRUE if and only if the corresponding element in x is NA.
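As an illustration, here is is.na() applied to a whole vector (x is a made-up example). It also shows why x == NA is not a usable test: comparing anything with NA is itself unknown.

```r
x <- c(10, NA, 30)

is.na(x)          # FALSE TRUE FALSE
x == NA           # NA NA NA: never TRUE, so not a test for missingness
which(is.na(x))   # 2, the position of the missing element
anyNA(x)          # TRUE
x[!is.na(x)]      # 10 30, the non-missing elements
```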

NaN means Not A Number, and is for (IEEE) arithmetic purposes. Usually NaN comes from 0/0.

Hence NaN is a numeric (double) value; there is no integer, logical or character NaN.

To test NaN, use is.nan(x).

is.nan(NaN)

## [1] TRUE

is.nan(0/0)

## [1] TRUE

class(NaN)

## [1] "numeric"

typeof(NaN)

## [1] "double"

NOTE that is.na(x) is TRUE for both NA and NaN, but is.nan(x) is TRUE for NaN and FALSE for NA. In other words, NaN counts as NA, but NA is not NaN. Because NaN counts as NA, NaN == 1 returns NA.

is.na(NaN)

## [1] TRUE

is.nan(NA)

## [1] FALSE

NaN == 1

## [1] NA

0/0 == 1

## [1] NA

Inf

Like NaN, Inf is also produced by numerical computation, such as 1/0. Inf is of the numeric class.

Inf can be tested with is.infinite(x) or x == Inf. Note that is.infinite(x) is also TRUE for -Inf, whereas x == Inf matches only positive infinity.

But unlike NaN, Inf is not an NA. Inf is larger than any other numeric value.

class(Inf)

## [1] "numeric"

is.infinite(1/0)

## [1] TRUE

is.infinite(1)

## [1] FALSE

1/0 == Inf

## [1] TRUE

is.na(Inf)

## [1] FALSE

is.nan(Inf)

## [1] FALSE

is.infinite(NA)

## [1] FALSE
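The two tests for Inf are not interchangeable. A small sketch on a made-up vector x shows the difference, together with is.finite(), which is FALSE for infinite, NA and NaN elements alike.

```r
x <- c(1, Inf, -Inf, NA)

is.infinite(x)   # FALSE TRUE TRUE FALSE: catches both Inf and -Inf
x == Inf         # FALSE TRUE FALSE NA: only +Inf, and NA stays NA
is.finite(x)     # TRUE FALSE FALSE FALSE
```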

Arithmetic on Inf usually returns Inf, -Inf or NaN, so watch out for Inf propagating through a calculation.

Inf + 1

## [1] Inf

Inf - 1

## [1] Inf

Inf * 1

## [1] Inf

Inf / Inf

## [1] NaN

Inf - Inf

## [1] NaN
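A few more cases are worth keeping in mind. The lines below are a short sketch showing that the sign matters and that dividing by Inf gives an ordinary finite number.

```r
-1 / 0      # -Inf
Inf * -1    # -Inf
Inf * 0     # NaN
1 / Inf     # 0
exp(-Inf)   # 0
```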

NULL

NULL is used whenever there is a need to indicate or specify that an object is absent. It is not a vector or list of zero length - these objects exist!

But NULL is similar to a vector of zero length: NULL also has length 0.

NULL is usually produced by using $ indexing on a nonexistent element of a list.

There is only one NULL object in R, to which all instances refer. It has no modifiable properties or attributes.

To test for NULL, use is.null.

is.null(NULL)

## [1] TRUE

ab <- list(a=1, b=2)
is.null(ab$c)

## [1] TRUE

ab$c == NULL

## logical(0)

Using x == NULL returns logical(0). This often leads to the error "argument is of length zero" in if (condition).

if(ab$c == NULL) {print("is null")}

## Error in if (ab$c == NULL) {: argument is of length zero

if(logical()) {print("is null")}

## Error in if (logical()) {: argument is of length zero

length(NULL)

## [1] 0

Is NULL an NA? No. NULL has length 0, but NA has length 1.

is.null(NA)

## [1] FALSE

is.na(NULL)

## Warning in is.na(NULL): is.na() applied to non-(list or vector) of type
## 'NULL'

## logical(0)
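The difference in length shows up as soon as the two are combined into a vector. In this small sketch, NA keeps its slot while NULL simply disappears.

```r
length(NA)               # 1
length(NULL)             # 0

c(1, NA, 3)              # 1 NA 3: NA holds a place in the vector
c(1, NULL, 3)            # 1 3: NULL contributes nothing
length(c(1, NULL, 3))    # 2
```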

More on safe if: to make your calls to if safe, a good code pattern is:

condition <- ab$a == 1
if(!is.null(condition) && 
   length(condition) == 1 && 
   !is.na(condition) && 
   condition) {
  print("ab$a is 1")
}

## [1] "ab$a is 1"
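Base R's isTRUE() can serve as a shorter form of this pattern: it returns TRUE only for a single, non-missing logical TRUE. The sketch below rebuilds the ab list from earlier so that it runs on its own.

```r
ab <- list(a = 1, b = 2)

isTRUE(ab$a == 1)   # TRUE
isTRUE(ab$c == 1)   # FALSE: ab$c is NULL, so the comparison is logical(0)
isTRUE(NA == 1)     # FALSE: the comparison is NA

if (isTRUE(ab$a == 1)) {
  print("ab$a is 1")
}
```

Note that isTRUE() silently treats a zero-length or NA condition as FALSE, so the explicit checks remain useful when those cases should be reported rather than skipped.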

Summary

NA is Not Available (value). NaN is Not a Number (value).
is.na() is TRUE for NaN, but is.nan() is FALSE for NA. NA and NaN are markers, not values.
Inf is a value, the result of 1/0.
NA, NaN, and Inf have length 1.
NULL is a special object with length 0. It is used to indicate that an object is absent.