1.4.1 If X <- c(22,3,7,NA,NA,67), what will be the output of the R statement length(X)?
## Although 2 of the values are missing, the length of the vector is still 6, because length() counts NA elements too.
X <- c (22,3,7,NA,NA,67)
length(X)
## [1] 6
1.4.2 X = c(NA,3,14,NA,33,17,NA,41). Write some R code that will remove all occurrences of NA in X.
## The first option, X[!is.na(X)], works: is.na(X) is TRUE where a value is missing, so negating it keeps only the non-missing values.
X = c(NA,3,14,NA,33,17,NA,41)
X[!is.na(X)]
## [1] 3 14 33 17 41
## b. X[is.na(X)] returns only the missing values. c. X[X==NA] = 0 does not replace missing values with 0, because any comparison with NA yields NA rather than TRUE; use X[is.na(X)] = 0 instead. Try it!
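## A minimal sketch of why comparing with NA fails while is.na() works. The names cmp, X1 and X2 are only illustrative and are not part of the exercise.

```r
X <- c(NA, 3, 14, NA, 33, 17, NA, 41)

# Comparing with NA never yields TRUE or FALSE; every element of cmp is NA.
cmp <- X == NA

# With a single replacement value, NA subscripts select nothing,
# so this assignment silently leaves the vector unchanged.
X1 <- X
X1[X1 == NA] <- 0

# is.na() returns a proper TRUE/FALSE mask, so this replacement works.
X2 <- X
X2[is.na(X2)] <- 0
X2
```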
1.4.3 If Y = c(1,3,12,NA,33,7,NA,21) what R statement will replace all occurrences of NA with 11?
Y = c(1,3,12,NA,33,7,NA,21)
Y[is.na(Y)] <- 11
Y
## [1] 1 3 12 11 33 7 11 21
1.4.4 If X = c(34,33,65,37,89,NA,43,NA,11,NA,23,NA) then what will count the number of occurrences of NA in X?
X = c(34,33,65,37,89,NA,43,NA,11,NA,23,NA)
## is.na(X) gives a logical vector: FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE ... sum() counts each TRUE as 1, so summing it gives the total number of NAs.
sum(is.na(X))
## [1] 4
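## The same logical mask can answer related questions. A small sketch, with na_mask, n_missing, na_positions and prop_missing as illustrative names:

```r
X <- c(34, 33, 65, 37, 89, NA, 43, NA, 11, NA, 23, NA)

na_mask <- is.na(X)            # TRUE where X is missing
n_missing <- sum(na_mask)      # number of NAs (each TRUE counts as 1)
na_positions <- which(na_mask) # indices of the NAs
prop_missing <- mean(na_mask)  # share of elements that are NA

n_missing
na_positions
prop_missing
```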
1.4.5 Consider the following vector W <- c(11, 3, 5, NA, 6). Write some R code that will return TRUE for each missing value in W.
W <- c (11, 3, 5, NA, 6)
is.na(W)
## [1] FALSE FALSE FALSE TRUE FALSE
1.4.6 Load the ‘Orange’ dataset from R using the command data(Orange). Replace all values of age equal to 118 with NA.
data("Orange")
Orange[Orange$age == 118,] # These are the rows where age = 118
## Tree age circumference
## 1 1 118 30
## 8 2 118 33
## 15 3 118 30
## 22 4 118 32
## 29 5 118 30
Orange$age[Orange$age == 118] <- NA
Orange$age
## [1] NA 484 664 1004 1231 1372 1582 NA 484 664 1004 1231 1372 1582
## [15] NA 484 664 1004 1231 1372 1582 NA 484 664 1004 1231 1372 1582
## [29] NA 484 664 1004 1231 1372 1582
1.4.7 Consider the following vector A <- c (33, 21, 12, NA, 7, 8). Write some R code that will calculate the mean of A without the missing value.
A <- c (33, 21, 12, NA, 7, 8)
mean(A[!is.na(A)]) # this is the first option
## [1] 16.2
mean(A, na.rm = TRUE) # this is the other option, which looks better to me.
## [1] 16.2
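## For contrast, a short sketch of what happens when the missing value is not removed. The names m_default and m_clean are illustrative only.

```r
A <- c(33, 21, 12, NA, 7, 8)

# Without na.rm, a single NA makes the whole result NA.
m_default <- mean(A)

# na.rm = TRUE drops the NA before averaging the remaining five values.
m_clean <- mean(A, na.rm = TRUE)

m_default
m_clean
```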
1.4.8 Let c1 <- c(1,2,3,NA); c2 <- c(2,4,6,89); c3 <- c(45,NA,66,101). If X <- rbind(c1,c2,c3, deparse.level=1), write some code that will display all rows with missing values.
c1 <- c(1,2,3,NA)
c2 <- c(2,4,6,89)
c3 <- c(45,NA,66,101)
X <- rbind (c1,c2,c3, deparse.level=1) # rbind combines 3 vectors. Each vector will be a row.
X[!complete.cases(X),] # complete.cases() returns TRUE for rows without NA values, so negating it selects the rows that contain an NA.
## [,1] [,2] [,3] [,4]
## c1 1 2 3 NA
## c3 45 NA 66 101
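## To see what the row filter is doing, the mask can be inspected directly. This sketch also shows an equivalent mask built with rowSums(); the names ok and has_na are illustrative.

```r
c1 <- c(1, 2, 3, NA)
c2 <- c(2, 4, 6, 89)
c3 <- c(45, NA, 66, 101)
X <- rbind(c1, c2, c3)

# One logical value per row: TRUE when the row has no NA.
ok <- complete.cases(X)

# Alternative: count the NAs in each row and flag rows with at least one.
has_na <- rowSums(is.na(X)) > 0

# drop = FALSE keeps the result a matrix even if only one row matches.
X[has_na, , drop = FALSE]
```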
1.4.9 Consider the following data obtained from df <- data.frame(Name = c(NA, "Joseph", "Martin", NA, "Andrea"), Sales = c(15, 18, 21, 56, 60), Price = c(34, 52, 21, 44, 20), stringsAsFactors = FALSE). Write some R code that will return a data frame which removes all rows with NA values in the Name column.
df <- data.frame (Name = c(NA, "Joseph", "Martin", NA, "Andrea"), Sales = c(15, 18, 21, 56, 60), Price = c(34, 52, 21, 44, 20), stringsAsFactors = FALSE)
df # 1st and 4th rows have NA in Name column.
## Name Sales Price
## 1 <NA> 15 34
## 2 Joseph 18 52
## 3 Martin 21 21
## 4 <NA> 56 44
## 5 Andrea 60 20
df[!is.na(df$Name),]
## Name Sales Price
## 2 Joseph 18 52
## 3 Martin 21 21
## 5 Andrea 60 20
1.4.10 Consider the following data obtained from df <- data.frame(Name = c(NA, "Joseph", "Martin", NA, "Andrea"), Sales = c(15, 18, 21, NA, 60), Price = c(34, 52, 33, 44, NA), stringsAsFactors = FALSE). Write some R code that will remove all rows with NA values and give the following output.
Name Sales Price
Joseph 18 52
Martin 21 33
df <- data.frame(Name = c(NA, "Joseph", "Martin", NA, "Andrea"), Sales = c(15, 18, 21, NA, 60), Price = c(34, 52, 33, 44, NA), stringsAsFactors = FALSE)
# Check each column for NA values; a row is dropped if any of its columns is NA.
df[!(is.na(df$Name) | is.na(df$Sales)| is.na(df$Price)),]
## Name Sales Price
## 2 Joseph 18 52
## 3 Martin 21 33
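## Checking every column by hand gets tedious as the number of columns grows. As a sketch of two base R alternatives that select the same rows (by_cases and by_omit are illustrative names):

```r
df <- data.frame(Name = c(NA, "Joseph", "Martin", NA, "Andrea"),
                 Sales = c(15, 18, 21, NA, 60),
                 Price = c(34, 52, 33, 44, NA),
                 stringsAsFactors = FALSE)

# Keep only the rows where every column is non-missing.
by_cases <- df[complete.cases(df), ]

# na.omit() drops the same rows and records which ones were removed
# in the "na.action" attribute of the result.
by_omit <- na.omit(df)

by_cases
```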