Section 2.7

Q1 Consider the vectors v1 <- seq(from = 1, to = 100, by = 3) and v2 <- sqrt(v1). Find the subvector of v1 with values greater than or equal to 30 and less than 60, and assign it to the name v1s.

v1 <- seq(from = 1, to = 100, by = 3)
v1s <- v1[v1>=30 & v1<60]
v1s
##  [1] 31 34 37 40 43 46 49 52 55 58

Q2 Find the subvector of v2 such that the corresponding value of v1 is less than 20 or larger than 50.

v2 <- sqrt(v1)
v120 <- v1 < 20
v150 <- v1 > 50
v2s <- v2[v120 | v150]
v2s
##  [1]  1.000000  2.000000  2.645751  3.162278  3.605551  4.000000  4.358899
##  [8]  7.211103  7.416198  7.615773  7.810250  8.000000  8.185353  8.366600
## [15]  8.544004  8.717798  8.888194  9.055385  9.219544  9.380832  9.539392
## [22]  9.695360  9.848858 10.000000

Q3 Use an example to verify xor(a, b) = (!a & b) | (a & !b)

a <- c(TRUE, TRUE, FALSE, TRUE)
b <- c(FALSE, FALSE, TRUE, TRUE)
xor(a, b)
## [1]  TRUE  TRUE  TRUE FALSE
(!a & b) | (a & !b)
## [1]  TRUE  TRUE  TRUE FALSE
# Both expressions give the same result, so xor(a, b) = (!a & b) | (a & !b) holds for this example.
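The example above does not include the case where both values are FALSE. A minimal sketch that checks all four combinations of two logical values (the names lhs and rhs are only illustrative):

```r
# All four combinations of two logical values
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

lhs <- xor(a, b)
rhs <- (!a & b) | (a & !b)

# identical() is TRUE only if the two vectors match element by element
identical(lhs, rhs)
```

Since two logical values can only be combined in these four ways, agreement here covers every possible input.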

Section 2.8

Q1 Consider the vector s1 <- seq(from = 1, to = 100, length.out = 7). Compare s1 to 50 to see whether the values of s1 are bigger than 50, then assign the result to name s2. Compare s1 to 80 to see whether the values of s1 are less than or equal to 80, then assign the result to name s3.

s1 <- seq(from = 1, to = 100, length.out = 7)
s1
## [1]   1.0  17.5  34.0  50.5  67.0  83.5 100.0
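# The result of a comparison is a logical vector with one TRUE/FALSE per element of s1
s2 <- s1 > 50
s2
## [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
s3 <- s1 <= 80
s3
## [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE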

Q2 Use two methods (logical operators and set operations) to find the subvector of s1 with values bigger than 50 and less than or equal to 80.

#1) logical operators
subv1 <- s1[s1 > 50 & s1 <= 80]
subv1
## [1] 50.5 67.0
#2) set operations: intersect the subvectors that satisfy each condition separately
subv2 <- intersect(s1[s1 > 50], s1[s1 <= 80])
subv2
## [1] 50.5 67.0

Q3 For x <- 1:200, use two methods (logical operators and set operations) to find the subvector of x that is divisible by 7, but not divisible by 2.

#1) logical operators
x <- 1:200
subv3 <- x[x %% 7 == 0 & x %% 2 != 0]
subv3
##  [1]   7  21  35  49  63  77  91 105 119 133 147 161 175 189
#2) set operations: remove the multiples of 2 from the multiples of 7
subv4 <- setdiff(x[x %% 7 == 0], x[x %% 2 == 0])
subv4
##  [1]   7  21  35  49  63  77  91 105 119 133 147 161 175 189
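The two methods agree here because x has no repeated values. Set functions such as intersect() and setdiff() return unique values, while logical indexing keeps duplicates. A small sketch with a made-up vector z (not part of the exercise) shows the difference:

```r
# z is a hypothetical vector with repeated values
z <- c(7, 7, 14, 21, 21)

# Logical indexing keeps every matching element, including duplicates
by_logical <- z[z %% 7 == 0 & z %% 2 != 0]

# setdiff() drops duplicates from its result
by_sets <- setdiff(z[z %% 7 == 0], z[z %% 2 == 0])

by_logical  # 7 7 21 21
by_sets     # 7 21
```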

Section 2.10

Q1 For the vector x <- rep(c(1, 2, NA), 3:5),

#a. Verify each value of summary(x, na.rm = TRUE) by using other functions.

x <- rep(c(1, 2, NA), 3:5)
min(x, na.rm = TRUE)
## [1] 1
quantile(x, na.rm = TRUE)
##   0%  25%  50%  75% 100% 
##    1    1    2    2    2
median(x, na.rm = TRUE)
## [1] 2
mean(x, na.rm = TRUE)
## [1] 1.571429
max(x, na.rm = TRUE)
## [1] 2
sum(is.na(x))
## [1] 5
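The values above can also be compared with summary() programmatically instead of by eye. A minimal sketch, assuming the usual element names of a numeric summary ("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.", "Max.", "NA's"); the helper close_to() and the vector checks are illustrative names, and a small tolerance is used in case summary() rounds its values:

```r
x <- rep(c(1, 2, NA), 3:5)
s <- summary(x)

# Hypothetical helper: TRUE if two numbers agree up to a small relative tolerance
close_to <- function(a, b) isTRUE(all.equal(a, b, tolerance = 1e-3))

checks <- c(
  min    = close_to(s[["Min."]],    min(x, na.rm = TRUE)),
  q1     = close_to(s[["1st Qu."]], quantile(x, 0.25, na.rm = TRUE)[[1]]),
  median = close_to(s[["Median"]],  median(x, na.rm = TRUE)),
  mean   = close_to(s[["Mean"]],    mean(x, na.rm = TRUE)),
  q3     = close_to(s[["3rd Qu."]], quantile(x, 0.75, na.rm = TRUE)[[1]]),
  max    = close_to(s[["Max."]],    max(x, na.rm = TRUE)),
  nas    = s[["NA's"]] == sum(is.na(x))
)
all(checks)
```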

#b. Find the indices with missing values.

is.na(x)
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
which(is.na(x)) 
## [1]  8  9 10 11 12

#c. Create a vector x_no_na containing the non-missing values in x.

x_na <- is.na(x)
x_no_na <- x[!x_na]
x_no_na
## [1] 1 1 1 2 2 2 2

#d. Replace the missing values by the median of the non-missing values in x.

x_na <- is.na(x)
x[x_na] <- median(x, na.rm = TRUE)
x
##  [1] 1 1 1 2 2 2 2 2 2 2 2 2

Q2 For the vector y <- rep(c("N", 2, "A"), 5:3), the values "N" and "A" both indicate missingness. Convert the non-standard missing values to NA, then find the indices of y that correspond to missing values.

y <- rep(c("N", 2, "A"), 5:3)
y[y=="N"] <- NA
y[y=="A"] <- NA
y
##  [1] NA  NA  NA  NA  NA  "2" "2" "2" "2" NA  NA  NA
is.na(y)
##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE
which(is.na(y)) 
## [1]  1  2  3  4  5 10 11 12
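When there are several missing-value codes, they can also be handled in one step with %in%. Unlike ==, %in% never returns NA, so the order of the replacements does not matter. A minimal sketch (missing_codes is an illustrative name):

```r
y <- rep(c("N", 2, "A"), 5:3)

# Hypothetical name for the set of non-standard missing-value codes
missing_codes <- c("N", "A")

# Replace every element that matches one of the codes with NA
y[y %in% missing_codes] <- NA

which(is.na(y))  # 1 2 3 4 5 10 11 12
```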

Section 2.12

Q1 What are the advantages of factors over vectors?

#1) A factor has a fixed set of levels, so a value outside those levels is flagged (it becomes NA with a warning), which helps prevent input errors;
#2) Factors can be created from character, numeric and logical vectors, and their levels can be given a meaningful order.
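The first point can be illustrated with a short sketch. The factor below is a made-up example; suppressWarnings() is used only to keep the output quiet, since R warns about the invalid level:

```r
# Hypothetical factor with three declared levels
size <- factor(c("small", "big", "small"), levels = c("small", "medium", "big"))

# Allowed: "medium" is one of the declared levels
size[2] <- "medium"

# Not allowed: "huge" is not a level, so R warns and stores NA instead
suppressWarnings(size[3] <- "huge")

is.na(size)  # FALSE FALSE TRUE
```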

Q2 Suppose we define x <- factor(1:5), what is the result of x[1] < x[2]? Please try to answer this question without R.

(a): TRUE (b): FALSE (c): NA

#The answer is (c) NA. After a numeric vector is converted into a factor, its values become labels, so comparison operators such as < are no longer meaningful. R returns NA with a warning message.
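This can be checked in R after answering by hand. The sketch below also shows one common way to get the numeric comparison back, by converting the labels to numbers first (res and num are illustrative names):

```r
x <- factor(1:5)

# Comparing unordered factor values gives NA (R also issues a warning)
res <- suppressWarnings(x[1] < x[2])
is.na(res)

# Convert the labels back to numbers to compare the underlying values
num <- as.numeric(as.character(x))
num[1] < num[2]
```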

Q3 Suppose we define x <- factor(1:5, ordered = TRUE), what is the result of x[1] < x[2]? Please try to answer this question without R.

(a): TRUE (b): FALSE (c): NA

#The answer is (a) TRUE. Setting ordered = TRUE makes x an ordered factor whose levels are 1 < 2 < 3 < 4 < 5, so x[1] < x[2] is TRUE.

Q4 Suppose we define x <- factor(1:5, ordered = TRUE, levels = 5:1), what is the result of x[1] < x[2]? Please try to answer this question without R.

(a): TRUE (b): FALSE (c): NA

x <- factor(1:5, ordered = TRUE, levels = 5:1)
x
## [1] 1 2 3 4 5
## Levels: 5 < 4 < 3 < 2 < 1
#The answer is (b) FALSE. levels = 5:1 defines the order 5 < 4 < 3 < 2 < 1, so 1 is the highest level and 5 is the lowest. The values themselves are unchanged: x[1] is 1 and x[2] is 2. Since 2 < 1 in this order, x[1] < x[2] is FALSE.
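Ordered factors are compared by the position of each value in the level order, not by the label itself. A short sketch that makes those positions visible with as.integer():

```r
x <- factor(1:5, ordered = TRUE, levels = 5:1)

# The labels of the first two elements are still "1" and "2"
as.character(x[1:2])

# Position of each element in the level order 5 < 4 < 3 < 2 < 1
as.integer(x)  # 5 4 3 2 1

# x[1] has position 5 and x[2] has position 4, so the comparison is FALSE
x[1] < x[2]
```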

Q5 Suppose size <- rep(c("big", "small", "medium"), 3:1), convert it to an ordered factor with levels small < medium < big.

size <- rep(c("big", "small", "medium"), 3:1)
size_fac <- factor(size) 
size_fac
## [1] big    big    big    small  small  medium
## Levels: big medium small
size_ord <- factor(size_fac, levels = c("small", "medium", "big"), ordered = TRUE)
size_ord
## [1] big    big    big    small  small  medium
## Levels: small < medium < big

Section 2.13

Q2 For the matrix x <- matrix(1:16, 4, 4), answer the following questions using R. Compute the column means of x.

x <- matrix(1:16, 4, 4)
x
##      [,1] [,2] [,3] [,4]
## [1,]    1    5    9   13
## [2,]    2    6   10   14
## [3,]    3    7   11   15
## [4,]    4    8   12   16
colMeans(x)
## [1]  2.5  6.5 10.5 14.5

Q3 Create a matrix that contains the 0.4 and 0.7 quantiles for each row of x.

x <- matrix(1:16, 4, 4)
# Apply quantile() to each row of x (MARGIN = 1); each column of the result corresponds to one row of x
apply(x, 1, quantile, c(0.4, 0.7))
##     [,1] [,2] [,3] [,4]
## 40%  5.8  6.8  7.8  8.8
## 70%  9.4 10.4 11.4 12.4
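To see where such quantile values come from, the sketch below reproduces the 0.4 and 0.7 quantiles of the first row of x, (1, 5, 9, 13), by linear interpolation. It assumes the default method of quantile() (type 7); the names r1, h, lo and manual are illustrative.

```r
r1 <- c(1, 5, 9, 13)   # first row of x, already sorted
p <- c(0.4, 0.7)

# Type 7 places the p-quantile at position h = (n - 1) * p + 1 in the sorted data
h <- (length(r1) - 1) * p + 1
lo <- floor(h)

# Interpolate linearly between the two neighbouring sorted values
manual <- r1[lo] + (h - lo) * (r1[lo + 1] - r1[lo])

manual
all.equal(manual, unname(quantile(r1, p)))
```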

Q4 Compute the cumulative sum of each row of x. What type of object is the result? And explain the result of the first column.

x <- matrix(1:16, 4, 4)
x
##      [,1] [,2] [,3] [,4]
## [1,]    1    5    9   13
## [2,]    2    6   10   14
## [3,]    3    7   11   15
## [4,]    4    8   12   16
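cs <- apply(x, 1, cumsum)
cs
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    6    8   10   12
## [3,]   15   18   21   24
## [4,]   28   32   36   40
is.matrix(cs)  #the result is a matrix
## [1] TRUE
#apply() stores the result for each row of x as a column, so the first column of cs is the cumulative sum of the first row of x (1, 5, 9, 13): 1, 1+5=6, 1+5+9=15, 1+5+9+13=28.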
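Note that rowSums() gives only the row totals, while the cumulative sum of each row comes from applying cumsum() row by row. apply() returns each row's result as a column, so a transpose is needed if the output should have the same layout as x. A minimal sketch (cs and cs_by_row are illustrative names):

```r
x <- matrix(1:16, 4, 4)

# Each column of cs holds the cumulative sum of one row of x
cs <- apply(x, 1, cumsum)

# Transpose so that row i of the result matches row i of x
cs_by_row <- t(cs)

cs[, 1]         # cumulative sum of the first row of x: 1 6 15 28
cs_by_row[4, ]  # cumulative sum of the fourth row of x: 4 12 24 40
```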

Section 2.11

From year 1900 to year 2021 (inclusive), calculate the number of leap years. (Hint: for a leap year, February has 29 days instead of 28. The value of as.Date("2010-02-29", format = "%Y-%m-%d") is NA)

x <- c(1900:2021)
x
##   [1] 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914
##  [16] 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929
##  [31] 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944
##  [46] 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959
##  [61] 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974
##  [76] 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989
##  [91] 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004
## [106] 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019
## [121] 2020 2021
# A year is a leap year if it is divisible by 4, except for century years, which must also be divisible by 400. So 1900 is not a leap year, while 2000 is.
leapyear <- x[(x %% 4 == 0 & x %% 100 != 0) | x %% 400 == 0]
leapyear
##  [1] 1904 1908 1912 1916 1920 1924 1928 1932 1936 1940 1944 1948 1952 1956 1960
## [16] 1964 1968 1972 1976 1980 1984 1988 1992 1996 2000 2004 2008 2012 2016 2020
length(leapyear)
## [1] 30
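The hint in the question suggests another way to count: build the date February 29 for every year and keep the years where as.Date() does not return NA. This also handles century years such as 1900, which is divisible by 4 but is not a leap year. A minimal sketch (years, feb29 and leap_by_date are illustrative names):

```r
years <- 1900:2021

# February 29 only exists in leap years; for other years as.Date() returns NA
feb29 <- as.Date(paste0(years, "-02-29"), format = "%Y-%m-%d")

leap_by_date <- years[!is.na(feb29)]
length(leap_by_date)  # 30
```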