1.4 Missing Values

1.4.1 If X <- c(22,3,7,NA,NA,67), what will be the output of the R statement length(X)?

## Although 2 of the values are missing, the length of the vector is still 6, because length() counts NA entries too.

X <- c (22,3,7,NA,NA,67)
length(X)

## [1] 6
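
## A small sketch that is not part of the original exercise: since length() counts NA entries, the number of observed values has to be computed separately. The name n_observed is only illustrative.

```r
X <- c(22, 3, 7, NA, NA, 67)

# length() counts every element, including the NA entries
length(X)

# Count only the non-missing values (n_observed is a hypothetical name)
n_observed <- sum(!is.na(X))
n_observed
```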

1.4.2 Let X = c(NA,3,14,NA,33,17,NA,41). Write some R code that will remove all occurrences of NA in X.

## The first option, X[!is.na(X)], keeps only the values of X that are not NA (missing), so it is the one that works.

X <- c(NA,3,14,NA,33,17,NA,41)
X[!is.na(X)]

## [1]  3 14 33 17 41

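## b. X[is.na(X)] returns only the missing values. c. X[X==NA] = 0 does not replace anything: X == NA evaluates to NA for every element, so no element is selected. To replace missing values with 0, use X[is.na(X)] <- 0. Try it!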
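
## A quick check of how comparison with NA behaves, as a sketch rather than part of the original solution. The names X_eq and X_fixed are illustrative copies, so X itself stays unchanged.

```r
X <- c(NA, 3, 14, NA, 33, 17, NA, 41)

# Comparing with NA yields NA for every element, never TRUE
X == NA

# An NA index selects nothing in an assignment, so this leaves the copy unchanged
X_eq <- X
X_eq[X_eq == NA] <- 0
sum(is.na(X_eq))

# is.na() is the reliable test for missingness
X_fixed <- X
X_fixed[is.na(X_fixed)] <- 0
X_fixed
```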

1.4.3 If Y = c(1,3,12,NA,33,7,NA,21) what R statement will replace all occurrences of NA with 11?

Y = c(1,3,12,NA,33,7,NA,21)

Y[is.na(Y)] <- 11

Y

## [1]  1  3 12 11 33  7 11 21
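
## The assignment above modifies Y in place. As a hedged alternative, replace() and ifelse() return a filled copy instead. The names Y_filled and Y_filled2 are illustrative.

```r
Y <- c(1, 3, 12, NA, 33, 7, NA, 21)

# replace() returns a modified copy; Y itself is untouched
Y_filled <- replace(Y, is.na(Y), 11)
Y_filled

# ifelse() builds the same result element by element
Y_filled2 <- ifelse(is.na(Y), 11, Y)
identical(Y_filled, Y_filled2)
```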

1.4.4 If X = c(34,33,65,37,89,NA,43,NA,11,NA,23,NA) then what will count the number of occurrences of NA in X?

X = c(34,33,65,37,89,NA,43,NA,11,NA,23,NA)

## is.na(X) returns a logical vector that is TRUE where a value is missing and FALSE otherwise. sum() counts each TRUE as 1 and each FALSE as 0, so summing gives the total number of NAs.

sum(is.na(X))

## [1] 4
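
## Because TRUE counts as 1, the same logical vector also gives the share of missing values. This is a sketch, and the name na_share is illustrative.

```r
X <- c(34, 33, 65, 37, 89, NA, 43, NA, 11, NA, 23, NA)

# Tabulate missing (TRUE) versus observed (FALSE) values
table(is.na(X))

# The mean of a logical vector is the proportion of TRUE values
na_share <- mean(is.na(X))
na_share
```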

1.4.5 Consider the following vector W <- c(11, 3, 5, NA, 6). Write some R code that will return TRUE for each value of W that is missing.

W <- c (11, 3, 5, NA, 6)

is.na(W)

## [1] FALSE FALSE FALSE  TRUE FALSE
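
## Two related base R helpers, shown as a sketch: anyNA() answers whether anything is missing at all, and which() gives the positions of the missing values.

```r
W <- c(11, 3, 5, NA, 6)

# Is there at least one missing value?
anyNA(W)

# Positions of the missing values
which(is.na(W))
```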

1.4.6 Load the ‘Orange’ dataset from R using the command data(Orange). Replace all age values equal to 118 with NA.

data("Orange")

Orange[Orange$age == 118,] # These are the rows where age = 118

##    Tree age circumference
## 1     1 118            30
## 8     2 118            33
## 15    3 118            30
## 22    4 118            32
## 29    5 118            30

Orange$age[Orange$age == 118] <- NA

Orange$age

##  [1]   NA  484  664 1004 1231 1372 1582   NA  484  664 1004 1231 1372 1582
## [15]   NA  484  664 1004 1231 1372 1582   NA  484  664 1004 1231 1372 1582
## [29]   NA  484  664 1004 1231 1372 1582
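
## The same replacement can be sketched with the replacement form of is.na(). The copy orange_copy is a hypothetical name, used so that the reloaded dataset itself is not modified.

```r
data("Orange")

# Work on a copy so the Orange data frame stays as loaded
orange_copy <- Orange

# Mark every age equal to 118 as missing
is.na(orange_copy$age) <- orange_copy$age == 118

# Number of values that were set to NA
sum(is.na(orange_copy$age))
```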

1.4.7 Consider the following vector A <- c (33, 21, 12, NA, 7, 8). Write some R code that will calculate the mean of A without the missing value.

A <- c (33, 21, 12, NA, 7, 8)

mean(A[!is.na(A)]) # this is the first option

## [1] 16.2

mean(A, na.rm = TRUE) # this is the other option, which reads more clearly to me.

## [1] 16.2
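
## For context, a sketch of what happens without na.rm: a single NA makes the result NA. Other summary functions such as sum() and median() accept the same na.rm argument.

```r
A <- c(33, 21, 12, NA, 7, 8)

# Without na.rm, the missing value propagates to the result
mean(A)

# na.rm = TRUE drops the NA before computing
sum(A, na.rm = TRUE)
median(A, na.rm = TRUE)
```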

1.4.8 Let c1 <- c(1,2,3,NA); c2 <- c(2,4,6,89); c3 <- c(45,NA,66,101). If X <- rbind(c1,c2,c3, deparse.level=1), write some code that will display all rows with missing values.

c1 <- c(1,2,3,NA)
c2 <- c(2,4,6,89)
c3 <- c(45,NA,66,101)
X <- rbind (c1,c2,c3, deparse.level=1) # rbind combines 3 vectors. Each vector will be a row.

X[!complete.cases(X),] # complete.cases() returns TRUE for rows without NA values; negating it selects the rows that have at least one NA.

##    [,1] [,2] [,3] [,4]
## c1    1    2    3   NA
## c3   45   NA   66  101
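
## An alternative sketch that counts the NAs in each row, which also shows how many values are missing per row. The name na_per_row is illustrative.

```r
c1 <- c(1, 2, 3, NA)
c2 <- c(2, 4, 6, 89)
c3 <- c(45, NA, 66, 101)
X <- rbind(c1, c2, c3)

# Number of missing values in each row
na_per_row <- rowSums(is.na(X))
na_per_row

# Keep rows with at least one NA; drop = FALSE keeps the matrix shape
X[na_per_row > 0, , drop = FALSE]
```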

1.4.9 Consider the following data obtained from df <- data.frame(Name = c(NA, "Joseph", "Martin", NA, "Andrea"), Sales = c(15, 18, 21, 56, 60), Price = c(34, 52, 21, 44, 20), stringsAsFactors = FALSE). Write some R code that will return a data frame which removes all rows with NA values in the Name column.

df <- data.frame (Name = c(NA, "Joseph", "Martin", NA, "Andrea"), Sales = c(15, 18, 21, 56, 60), Price = c(34, 52, 21, 44, 20), stringsAsFactors = FALSE)

df # 1st and 4th rows have NA in Name column.

##     Name Sales Price
## 1   <NA>    15    34
## 2 Joseph    18    52
## 3 Martin    21    21
## 4   <NA>    56    44
## 5 Andrea    60    20

df[!is.na(df$Name),]

##     Name Sales Price
## 2 Joseph    18    52
## 3 Martin    21    21
## 5 Andrea    60    20
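
## The same filter can be sketched with subset(), which lets the column be referred to by name. The name df_named is illustrative.

```r
df <- data.frame(Name = c(NA, "Joseph", "Martin", NA, "Andrea"),
                 Sales = c(15, 18, 21, 56, 60),
                 Price = c(34, 52, 21, 44, 20),
                 stringsAsFactors = FALSE)

# Keep only the rows whose Name is not missing
df_named <- subset(df, !is.na(Name))
df_named
```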

1.4.10 Consider the following data obtained from df <- data.frame(Name = c(NA, "Joseph", "Martin", NA, "Andrea"), Sales = c(15, 18, 21, NA, 60), Price = c(34, 52, 33, 44, NA), stringsAsFactors = FALSE). Write some R code that will remove all rows with NA values and give the following output.

  Name Sales Price
Joseph    18    52
Martin    21    33

df <- data.frame(Name = c(NA, "Joseph", "Martin", NA, "Andrea"), Sales = c(15, 18, 21, NA, 60), Price = c(34, 52, 33, 44, NA), stringsAsFactors = FALSE)

# We have to check each column for NA values.

df[!(is.na(df$Name) | is.na(df$Sales)| is.na(df$Price)),]

##     Name Sales Price
## 2 Joseph    18    52
## 3 Martin    21    33
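
## Checking every column by hand gets long as the data frame grows. As a sketch, complete.cases() and na.omit() from base R do the same check across all columns at once. The names df_clean and df_clean2 are illustrative.

```r
df <- data.frame(Name = c(NA, "Joseph", "Martin", NA, "Andrea"),
                 Sales = c(15, 18, 21, NA, 60),
                 Price = c(34, 52, 33, 44, NA),
                 stringsAsFactors = FALSE)

# Keep the rows that have no NA in any column
df_clean <- df[complete.cases(df), ]
df_clean

# na.omit() drops the incomplete rows in a single call
df_clean2 <- na.omit(df)
df_clean2
```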


source: http://www.r-exercises.com/2015/12/14/missing-values/