Assignment 1

IO
Vectors,
Matrices,
Functions,
Loops,

This is your first exposure to R, and it may feel overwhelming. Some of these questions will be hard for you to solve for now. But the key to learning a language is practice and more practice.

Please submit both rmd and html files by using Brightspace / Dropbox.

Please answer any 10 questions of your choice. Any additional question you answer will be added to your mark as a bonus. If a code chunk gives an error, comment out the offending line and submit it anyway.

I’ll post the answers on Brightspace later.

1. IO

Suppose we have:

df <- data.frame("y" = rnorm(10, 0, 1), "x" = runif(10, 1, 4))
df

##              y        x
## 1  -0.68635149 2.845104
## 2  -1.77734465 1.453307
## 3   0.33502688 3.925788
## 4   2.49580148 1.564498
## 5  -0.76941757 1.611780
## 6  -0.38661271 3.371900
## 7  -0.03020392 3.206229
## 8  -0.43205768 2.603444
## 9   0.47938576 3.421321
## 10 -0.32411511 1.161114

(a) Save df to disk using write.csv() and write.table(). Experiment with the different options.

write.table(df,file="df1.csv")
write.csv(df,file="df2.csv")

(b) Use read.csv() and read.table() to read the file(s) you wrote in (a).

read.table("df1.csv")

##              y        x
## 1  -0.68635149 2.845104
## 2  -1.77734465 1.453307
## 3   0.33502688 3.925788
## 4   2.49580148 1.564498
## 5  -0.76941757 1.611780
## 6  -0.38661271 3.371900
## 7  -0.03020392 3.206229
## 8  -0.43205768 2.603444
## 9   0.47938576 3.421321
## 10 -0.32411511 1.161114

read.csv("df2.csv")

##     X           y        x
## 1   1 -0.68635149 2.845104
## 2   2 -1.77734465 1.453307
## 3   3  0.33502688 3.925788
## 4   4  2.49580148 1.564498
## 5   5 -0.76941757 1.611780
## 6   6 -0.38661271 3.371900
## 7   7 -0.03020392 3.206229
## 8   8 -0.43205768 2.603444
## 9   9  0.47938576 3.421321
## 10 10 -0.32411511 1.161114
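The extra X column in the read.csv() output comes from the row names that write.csv() stores by default. A minimal sketch of one way to avoid it, assuming a small illustrative data frame (df_demo) and a temporary file path, neither of which is part of the assignment:

```r
# Small example data frame (illustrative values, not the df shown above)
df_demo <- data.frame(y = c(-0.5, 1.2, 0.3), x = c(2.1, 3.4, 1.7))

# Write without row names so that no extra "X" column appears on re-reading
tmp <- tempfile(fileext = ".csv")
write.csv(df_demo, file = tmp, row.names = FALSE)

df_back <- read.csv(tmp)
names(df_back)   # only "y" and "x"
```

The same row.names = FALSE option is available in write.table().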

2. Vectors

Create the following vectors:

\[ \begin{array}{l}{\text { (a) }(1,2,3, \ldots, 19,20)} \\ {\text { (b) }(20,19, \ldots, 2,1)} \\ {\text { (c) }(1,2,3, \ldots, 19,20,19,18, \ldots, 2,1)}\end{array} \]

a<-c(1:20)
b<-c(20:1)
#c(a,b) repeats 20 in the middle of the sequence
c<-c(a,b)

# a simple workaround 
d<-(19:1)
c<-c(a,d)

#or using seq (note that seq(19:1) would count up from 1 to 19)
c<-c(seq(1,20),seq(19,1))

Create the following vectors: \[ \begin{array}{l}{\text { (a) }(4,6,3,4,6,3, \ldots, 4,6,3) \text { where there are } 10 \text { occurrences of } 4 .} \\ {\text { (b) }(4,6,3,4,6,3, \ldots, 4,6,3,4) \text { where there are } 11 \text { occurrences of } 4,\ 10 \text { occurrences of } 6 \text { and } 10 \text { occurrences of } 3 .} \\ {\text { (c) }(4,4, \ldots, 4,6,6, \ldots, 6,3,3, \ldots, 3) \text { where there are } 10 \text { occurrences of } 4,\ 20 \text { occurrences of } 6 \text { and } 30 \text { occurrences of } 3 .}\end{array} \]

x<-c(4,6,3)

#(a)
a<-rep(x,10)

#(b)
#rep(x, each = 10, len = 31) gives the right counts (11 fours, 10 sixes, 10 threes)
#but in blocks, not in the repeating 4,6,3 pattern; rep_len() keeps the pattern
b<-rep_len(x,31)

#(c)
c<-rep(x,c(10,20,30))

Create a vector of the values of \(e^{x} \cos (x) \text { at } x=3,3.1,3.2, \ldots, 6\)

x<-seq(3,6, .1)

x<-exp(x) * cos(x)

Create a vector of the values of \(\left(0.1^{3} 0.2^{1}, 0.1^{6} 0.2^{4}, \ldots, 0.1^{36} 0.2^{34}\right)\)

#the exponents of 0.1 run 3,6,...,36 and those of 0.2 run 1,4,...,34 (12 terms each),
#so both sequences must have the same length
a<-seq(3,36,3)

b<-seq(1,34,3)
c<-(.1^a*.2^b)

Create a vector of the values of \(\left(2, \frac{2^{2}}{2}, \frac{2^{3}}{3}, \dots, \frac{2^{25}}{25}\right)\)

a<-seq(1,25,1)

b<-seq(1,25,1)

c<-(2^a/b)

Calculate the following: \(\sum_{i=10}^{100}\left(i^{3}+4 i^{2}\right)\)

i<- c(10:100)

sumi <-sum(i^3+4*i^2)
sumi

## [1] 26852735

Calculate the following: \(\sum_{i=1}^{25}\left(\frac{2^{i}}{i}+\frac{3^{i}}{i^{2}}\right)\).

i<-c(1:25)

#terms of the series, one per value of i
terms <- ((2^i/i)+(3^i/i^2))
terms

##  [1] 5.000000e+00 4.250000e+00 5.666667e+00 9.062500e+00 1.612000e+01
##  [6] 3.091667e+01 6.291837e+01 1.345156e+02 2.998889e+02 6.928900e+02
## [11] 1.650207e+03 4.031896e+03 1.006402e+04 2.557319e+04 6.595745e+04
## [16] 1.722473e+05 4.545619e+05 1.210306e+06 3.247155e+06 8.769390e+06
## [21] 2.381949e+07 6.502755e+07 1.783291e+08 4.910281e+08 1.357004e+09

#the question asks for the sum of these terms
sumi <- sum(terms)
sumi

Use paste() to create the following character vector of length 30: (“label 1”, “label 2”, ..., “label 30”). Note that there is a single space between label and the number following.

a<-paste("label",1:30,sep=" ")

Execute the following lines which create two vectors of random integers which are chosen with replacement from the integers 0, 1, … , 999. Both vectors have length 250.

set.seed(50)
xVec <- sample(0:999, 250, replace=T)
yVec <- sample(0:999, 250, replace=T)

Suppose \(\mathbf{x}=\left(x_{1}, x_{2}, \ldots, x_{n}\right)\) denotes the vector xVec and \(\mathbf{y}=\left(y_{1}, y_{2}, \ldots, y_{n}\right)\) denotes the vector yVec.

(a) Create the vector \(\left(y_{2}-x_{1}, \ldots, y_{n}-x_{n-1}\right)\).
(b) Create the vector \(\left(\frac{\sin \left(y_{1}\right)}{\cos \left(x_{2}\right)}, \frac{\sin \left(y_{2}\right)}{\cos \left(x_{3}\right)}, \ldots, \frac{\sin \left(y_{n-1}\right)}{\cos \left(x_{n}\right)}\right)\).
(c) Create the vector \(\left(x_{1}+2 x_{2}-x_{3}, x_{2}+2 x_{3}-x_{4}, \ldots, x_{n-2}+2 x_{n-1}-x_{n}\right)\).

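#(a)
#note: xVec and yVec are drawn again here without resetting the seed,
#so they differ from the vectors created right after set.seed(50)
xVec <- sample(0:999, 250, replace=T)
yVec <- sample(0:999, 250, replace=T)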
a<- yVec[c(2:250)]-xVec[c(1:249)]

#(b)
b<-sin(yVec[1:249])/cos(xVec[2:250])

#(c)
c<-xVec[1:248]+2*xVec[2:249]-xVec[3:250]

This question uses the vectors xVec and yVec created in the previous question and the functions sort, order, mean, sqrt, sum and abs.

(a) Create the vector \(\left(\left|x_{1}-\bar{x}\right|^{1 / 2},\left|x_{2}-\bar{x}\right|^{1 / 2}, \ldots,\left|x_{n}-\bar{x}\right|^{1 / 2}\right)\), where \(\bar{x}\) denotes the mean of the vector \(\mathbf{x}\).
(b) How many values in yVec are within 200 of the maximum value of the terms in yVec?
(c) How many numbers in xVec are divisible by 2? (Note that the modulo operator is denoted %%.)
(d) Sort the numbers in the vector xVec in the order of increasing values in yVec.

#(a)
meanx<-mean(xVec)
meanx

## [1] 504.852

xVec[1:200]

##   [1] 677 725 890 369 596 717 767 885 299 149 937 220 324 279 466 885 696 840
##  [19] 771 122 544 460 536  41 315  29 843 316  92 184 509 966 366 626 927 419
##  [37] 933 971 138 480 393 684 511 209 860  44  51 508 246  83 919 981 455 492
##  [55] 787 121 129  82 690 120  47 890 512 273  30 451 224  83 539 808 285   9
##  [73] 131 638 746 561 936 599 896 531 471 140 720 279 593 430 310 427 274 452
##  [91] 229 273 791 156 283 280 615 532 774 984 212 906 606  36 385 402 730 710
## [109] 637 369 985 241 199 140 625 149 821 812 614 811 476 709 359 582 951 749
## [127] 902  20 907 717 470 229 495 874 761 940 491 437 235 496  61 175 370 960
## [145]  42  60 360 143 704 556 823 738 310 190 432  32 138  92 431 832 419 113
## [163] 398 453 630  49 477 347 755 667 243 564 842 633 409 690 154 403 327 466
## [181] 777  90 846 243 648 615 209 289 845 960 807 624 554 745 316 608 639 918
## [199] 803 807

diffx<-(xVec-meanx)
diffx2<-abs(diffx)
diffx3<-sqrt(diffx2)
diffx3

##   [1] 13.120518 14.837385 19.625188 11.655557  9.547146 14.565301 16.190985
##   [8] 19.497384 14.347543 18.864040 20.788170 16.877559 13.448123 15.028373
##  [15]  6.233137 19.497384 13.825628 18.307048 16.314043 19.566604  6.256836
##  [22]  6.697164  5.581039 21.537224 13.778679 21.814032 18.388801 13.742343
##  [29] 20.318760 17.912342  2.036664 21.474357 11.783548 11.006725 20.546241
##  [36]  9.265635 20.691737 21.590461 19.153381  4.985178 10.576011 13.384618
##  [43]  2.479516 17.200349 18.845371 21.467464 21.303802  1.774260 16.088878
##  [50] 20.539036 20.350627 21.820816  7.060595  3.584969 16.797262 19.592141
##  [57] 19.386903 20.563365 13.606910 19.617645 21.397476 19.625188  2.673574
##  [64] 15.226687 21.791099  7.338392 16.758640 20.539036  5.843629 17.411146
##  [71] 14.827407 22.267735 19.335253 11.538977 15.528941  7.493197 20.764104
##  [78]  9.702989 19.777462  5.113512  5.818247 19.101099 14.667924 15.028373
##  [85]  9.388717  8.651705 13.958940  8.823378 15.193815  7.269938 16.608793
##  [92] 15.226687 16.915910 18.677580 14.894697 14.995066 10.495142  5.210374
##  [99] 16.405731 21.889450 17.112919 20.028679 10.057236 21.652991 10.947694
## [106] 10.141598 15.004933 14.322989 11.495564 11.655557 21.912280 16.243522
## [113] 17.488625 19.101099 10.961204 18.864040 17.780551 17.525638 10.447392
## [120] 17.497085  5.371406 14.288037 12.076920  8.783393 21.122216 15.625236
## [127] 19.928572 22.019355 20.053628 14.565301  5.903558 16.608793  3.138790
## [134] 19.213225 16.004624 20.860201  3.721828  8.237233 16.427173  2.975231
## [141] 21.067795 18.161828 11.612579 21.334198 21.513995 21.091515 12.035448
## [148] 19.022408 14.111981  7.151783 17.836704 15.269185 13.958940 17.744069
## [155]  8.535338 21.745160 19.153381 20.318760  8.593719 18.087233  9.265635
## [162] 19.795252 10.336924  7.200833 11.186957 21.350691  5.277499 12.563917
## [169] 15.816068 12.733735 16.181842  7.690774 18.361590 11.320247  9.790403
## [176] 13.606910 18.731044 10.092175 13.336116  6.233137 16.496909 20.367916
## [183] 18.470192 16.181842 11.964447 10.495142 17.200349 14.691903 18.443102
## [190] 21.334198 17.382405 10.915494  7.010563 15.496709 13.742343 10.156180
## [197] 11.582228 20.326042 17.266963 17.382405 21.722155 17.092337 11.275992
## [204] 18.470192 15.583709 10.947694 14.667924  5.371406  8.071431 21.067795
## [211] 21.682896  6.938876 20.021289 15.625236 20.424201 22.116691 20.490290
## [218] 12.282182  4.674612 17.611019 21.122216 14.827407 20.740010  7.358532
## [225] 20.465874  6.989421 21.774940 20.497512 16.405731 13.861746 18.792232
## [232]  5.643758 18.712242 20.812208 11.364330 15.528941  9.841341 19.361095
## [239] 18.134277 19.592141 13.448123 12.455842 19.082662 22.468912  6.094916
## [246] 15.160871 10.288440 19.845705  1.360882 18.059568
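The question about values within 200 of the maximum of yVec has no answer in the code here. A minimal sketch of one possible approach, using a small hypothetical vector (yDemo) in place of yVec and assuming "within 200" includes a difference of exactly 200:

```r
# Hypothetical small vector standing in for yVec
yDemo <- c(100, 850, 990, 795, 300, 800)

# A value is within 200 of the maximum when max(yDemo) - value is at most 200
nearMax <- sum(max(yDemo) - yDemo <= 200)
nearMax
```

Replacing yDemo with yVec applies the same count to the assignment's vector.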

#(c)
#xVec %% 2 is 0 for even numbers, so count the zeros
c<-xVec%%2
sum(c==0)

## [1] 121

#(d)
#order(yVec) gives the positions of yVec from smallest to largest;
#indexing xVec with it sorts xVec by increasing yVec
#(yVec[order(xVec)] would do the reverse)
xVec[order(yVec)]

3. Matrices

Create a 6 × 10 matrix of random integers chosen from 1, 2, ..., 10 by executing the following two lines of code:

set.seed(75)
aMat <- matrix( sample(10, size=60, replace=T), nr=6)
aMat

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    8    8    8    7    7    5    2    2    6     5
## [2,]    9    5    2    6    6    1    6    6    3     7
## [3,]    5    1   10    2    5    6    8    9   10     8
## [4,]    9    3    1    1    6   10   10    7    9    10
## [5,]    7    3    3    3    6    4    4    6   10     2
## [6,]    9    3    3    4    1    2    1   10    6     1

(a) Find the number of entries in each row which are greater than 4. You can use the apply() function.
(b) Which rows contain exactly two occurrences of the number seven?

#(a)
#?apply()

#(b)
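Both parts are left unanswered above. A minimal sketch of how apply() and rowSums() could be used, shown on a small illustrative matrix (demoMat) rather than on aMat; the names demoMat, countsAbove4 and rowsTwoSevens are assumptions made for this example:

```r
# Small illustrative matrix (not aMat itself)
demoMat <- matrix(c(7, 7, 1,
                    5, 2, 9,
                    7, 3, 7,
                    7, 7, 7), nrow = 4, byrow = TRUE)

# (a) number of entries greater than 4 in each row
countsAbove4 <- apply(demoMat, 1, function(row) sum(row > 4))

# (b) indices of the rows containing exactly two sevens
rowsTwoSevens <- which(rowSums(demoMat == 7) == 2)
```

The same two lines should work on aMat once demoMat is replaced.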

Calculate

\[ \sum_{i=1}^{20} \sum_{j=1}^{5} \frac{i^{4}}{(3+j)} \]
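One way to evaluate this double sum without loops is sketched below. It relies on outer() to build the full table of terms; the variable names are chosen for this example only.

```r
i <- 1:20
j <- 1:5

# outer() builds the 20 x 5 table of i^4 / (3 + j); sum() adds all entries
doubleSum <- sum(outer(i^4, 3 + j, "/"))

# Because the summand factorises, the same value is sum(i^4) * sum(1 / (3 + j))
check <- sum(i^4) * sum(1 / (3 + j))
all.equal(doubleSum, check)
```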

Russian Roulette is a game played with a revolver and a single bullet. Write an R script which simulates the game of Russian Roulette, returning “CLICK” 5 out of 6 times, and “BANG” 1 out of 6 times (Hint: see help("sample")).
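A minimal sketch of such a simulation, following the hint about sample(). The function name russianRoulette is an assumption for this example, and the prob argument is just one way to get the 5-to-1 odds:

```r
# One pull of the trigger: "BANG" with probability 1/6, "CLICK" otherwise
russianRoulette <- function() {
  sample(c("CLICK", "BANG"), size = 1, prob = c(5/6, 1/6))
}

russianRoulette()
```

An equivalent alternative would be to sample one element from a vector holding five "CLICK" entries and one "BANG".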
You are given this matrix

m = matrix(sample(c(1:10), 16, replace = TRUE), nrow = 4, byrow = TRUE)
m

##      [,1] [,2] [,3] [,4]
## [1,]    2    8    3    1
## [2,]    9    2    8    8
## [3,]    1    1    8    8
## [4,]    5    5    5    1

Write a script to find row and column index of each maximum and minimum value in m.
Write a script to find the row index of maximum and minimum value in each column of m by using apply().
Convert m to a list of column-vectors
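The three tasks could be approached as sketched below. Because m is drawn at random, the sketch types in the matrix printed above by hand under the hypothetical name mDemo; all other names are likewise chosen for illustration.

```r
# The matrix printed above, typed in by hand
mDemo <- matrix(c(2, 8, 3, 1,
                  9, 2, 8, 8,
                  1, 1, 8, 8,
                  5, 5, 5, 1), nrow = 4, byrow = TRUE)

# Row and column index of every maximum and every minimum
maxPos <- which(mDemo == max(mDemo), arr.ind = TRUE)
minPos <- which(mDemo == min(mDemo), arr.ind = TRUE)

# Row index of the (first) maximum and minimum in each column
rowOfMax <- apply(mDemo, 2, which.max)
rowOfMin <- apply(mDemo, 2, which.min)

# List of column vectors
colList <- lapply(seq_len(ncol(mDemo)), function(j) mDemo[, j])
```

Note that which.max() and which.min() report only the first position when a column contains ties.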

4. Functions

Write functions tmpFn1 and tmpFn2 such that if xVec is the vector \(\left(x_{1}, x_{2}, \ldots, x_{n}\right)\) then tmpFn1(xVec) returns the vector \(\left(x_{1}, x_{2}^{2}, \ldots, x_{n}^{n}\right)\) and tmpFn2(xVec) returns the vector \(\left(x_{1}, \frac{x_{2}^{2}}{2}, \ldots, \frac{x_{n}^{n}}{n}\right)\)
Now write a function tmpFn3() which takes 2 arguments \(x\) and \(n\) where \(x\) is a single number and \(n\) is a strictly positive integer. The function should return the value of

\[ 1+\frac{x}{1}+\frac{x^{2}}{2}+\frac{x^{3}}{3}+\cdots+\frac{x^{n}}{n} \]
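A possible set of definitions is sketched here. It uses vectorised arithmetic with seq_along() and seq_len() instead of loops, which is one choice among several:

```r
# (x_1, x_2^2, ..., x_n^n): raise each element to the power of its position
tmpFn1 <- function(xVec) {
  xVec^seq_along(xVec)
}

# (x_1, x_2^2 / 2, ..., x_n^n / n)
tmpFn2 <- function(xVec) {
  n <- seq_along(xVec)
  xVec^n / n
}

# 1 + x/1 + x^2/2 + ... + x^n/n for a single number x and positive integer n
tmpFn3 <- function(x, n) {
  k <- seq_len(n)
  1 + sum(x^k / k)
}

tmpFn1(c(2, 3, 4))
tmpFn3(2, 3)
```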

Write a function tmpFn(xVec) such that if xVec is the vector \(\left(x_{1}, x_{2}, \ldots, x_{n}\right)\) then tmpFn(xVec) returns the vector of moving averages:

\[ \frac{x_{1}+x_{2}+x_{3}}{3}, \frac{x_{2}+x_{3}+x_{4}}{3}, \ldots, \frac{x_{n-2}+x_{n-1}+x_{n}}{3} \]

Try out your function; for example, try tmpFn( c(1:5,6:1) ).
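A minimal sketch of the moving-average function, assuming the input has at least three elements (the length check is an addition for this example):

```r
tmpFn <- function(xVec) {
  n <- length(xVec)
  stopifnot(n >= 3)   # a window of three needs at least three values
  # add three shifted copies of the vector and divide by the window size
  (xVec[1:(n - 2)] + xVec[2:(n - 1)] + xVec[3:n]) / 3
}

tmpFn(c(1:5, 6:1))
```

The result has n - 2 elements, one for each window of three consecutive values.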

Consider the continuous function:

\[ f(x)=\left\{\begin{array}{ll} {x^{2}+2 x+3} & {\text { if } x<0} \\ {x+3} & {\text { if } 0 \leq x<2} \\ {x^{2}+4 x-7} & {\text { if } 2 \leq x} \end{array}\right. \]

Write a function tmpFn which takes a single argument xVec. The function should return the vector of values of the function \(f(x)\) evaluated at the values in xVec. Hence plot the function \(f(x)\) for \(-3 < x < 3\).
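One way to write this piecewise function is with nested ifelse() calls, as sketched below. The grid xGrid and its 200 points are arbitrary choices for the plot.

```r
tmpFn <- function(xVec) {
  ifelse(xVec < 0,
         xVec^2 + 2 * xVec + 3,          # branch for x < 0
         ifelse(xVec < 2,
                xVec + 3,                # branch for 0 <= x < 2
                xVec^2 + 4 * xVec - 7))  # branch for x >= 2
}

# Plot f on a grid covering the interval from -3 to 3
xGrid <- seq(-3, 3, length.out = 200)
plot(xGrid, tmpFn(xGrid), type = "l", xlab = "x", ylab = "f(x)")
```

The pieces meet at x = 0 (both give 3) and at x = 2 (both give 5), which is why the curve has no jumps.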

Write a function which takes a single argument which is a matrix. The function should return a matrix which is the same as the function argument but every odd number is doubled. Hence the result of using the function on the matrix:

\[ \left[\begin{array}{ccc} {1} & {1} & {3} \\ {5} & {2} & {6} \\ {-2} & {-1} & {-3} \end{array}\right] \] should be

\[ \left[\begin{array}{ccc} {2} & {2} & {6} \\ {10} & {2} & {6} \\ {-2} & {-2} & {-6} \end{array}\right] \]

Hint: First try this for a specific matrix on the Command Line. Also check the modulo operator %%.
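A minimal sketch using logical indexing and %%. The function name doubleOdd is an assumption for this example:

```r
doubleOdd <- function(mat) {
  # in R, -1 %% 2 is 1, so negative odd numbers are detected as well
  odd <- mat %% 2 == 1
  mat[odd] <- 2 * mat[odd]
  mat
}

testMat <- matrix(c( 1,  1,  3,
                     5,  2,  6,
                    -2, -1, -3), nrow = 3, byrow = TRUE)
doubleOdd(testMat)
```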

The mode is the value that has the highest number of occurrences in a set of data. Unlike the mean and median, the mode can be found for both numeric and character data. R does not have a standard built-in function to calculate the mode, so we create a user function to calculate the mode of a data set in R. This function takes a vector as input and gives the mode value as output. You have the following matrix.

set.seed(3)
m = matrix(sample(c(1:8), 16, replace = TRUE), nrow = 4, byrow = TRUE)
m

##      [,1] [,2] [,3] [,4]
## [1,]    5    2    4    7
## [2,]    4    2    8    3
## [3,]    7    8    4    2
## [4,]    7    8    8    8

Here is how I find the modal value for the matrix.

getmode <- function(v) {
   a <- table(v, v)
   b <- diag(a)
   return(as.numeric(names(which.max(b))))
}

getmode(m)

## [1] 8

Find the modal value in each column by writing your own function.
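A possible sketch for the column-wise version. It defines its own helper (colMode, a hypothetical name) based on table() and applies it to each column with apply(); the matrix printed above is typed in by hand as mDemo.

```r
# Mode of a vector: the most frequent value; among ties the smallest value wins,
# because table() sorts the values and which.max() returns the first maximum
colMode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
}

# The matrix printed above, typed in by hand
mDemo <- matrix(c(5, 2, 4, 7,
                  4, 2, 8, 3,
                  7, 8, 4, 2,
                  7, 8, 8, 8), nrow = 4, byrow = TRUE)

colModes <- apply(mDemo, 2, colMode)
colModes
```

How ties should be broken is not specified in the question, so the tie rule here is only one reasonable choice.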

5. Conditions

Take the sqrt of the following vector. Hint: you need to skip the negative numbers.

set.seed(2)
d <- sample(-10:10, 10, replace = TRUE)
d

##  [1] 10  4 -5 -5 -3  6  6  1 -2  7
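A minimal sketch of skipping the negative values with logical indexing. The vector d is typed in from the output above so that the example is self-contained:

```r
# d as printed above
d <- c(10, 4, -5, -5, -3, 6, 6, 1, -2, 7)

# keep only the non-negative entries, then take the square root
sqrtNonNeg <- sqrt(d[d >= 0])
sqrtNonNeg
```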

Now use the same vector, \(d\), and apply sqrt() if the number is larger than or equal to 0, if not take the square of the number. Hint: create functions for each operation then use ifelse().
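Following the hint, one possible sketch defines two small helper functions (rootFn and squareFn, names assumed for this example) and combines them with ifelse():

```r
# d as printed above
d <- c(10, 4, -5, -5, -3, 6, 6, 1, -2, 7)

rootFn <- function(x) sqrt(x)
squareFn <- function(x) x^2

# ifelse() evaluates both branches on the whole vector, so sqrt() warns about
# the negative entries even though those results are discarded
result <- suppressWarnings(ifelse(d >= 0, rootFn(d), squareFn(d)))
result
```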
You have the following matrix. Write code that prints how many numbers less than 6 there are in A[1,]. If there is no number less than 6, print “No number less than 6 in Row 1”. Make sure that you print only once! Hint: use any() in the condition. Use if and else separately instead of ifelse().

set.seed(1)
A <- matrix(sample(1:10, 25, replace = TRUE), 5, 5)
A

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    9    7    5    9    5
## [2,]    4    2   10    5    5
## [3,]    7    3    6    5    2
## [4,]    1    1   10    9   10
## [5,]    2    5    7    9    9
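A minimal sketch of the if / else version. The first row of A is typed in from the output above, and the wording of the first message is an assumption, since the question only fixes the second one:

```r
row1 <- c(9, 7, 5, 9, 5)   # A[1, ] as printed above

# any() yields a single TRUE/FALSE, so exactly one branch runs and prints once
if (any(row1 < 6)) {
  msg <- paste("Row 1 has", sum(row1 < 6), "number(s) less than 6")
} else {
  msg <- "No number less than 6 in Row 1"
}
print(msg)
```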

6. Loops

We use the same matrix A for the following examples. The first example is a simple operation. It multiplies Row 1 by Row 2 in A, element by element, and assigns the numbers to a vector b:

b <- as.numeric() # You need to initiate the container
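The loop itself is not shown above. A sketch of what it might look like, with A typed in from the printed output so that the example runs on its own:

```r
# A as printed above
A <- matrix(c(9, 7,  5, 9,  5,
              4, 2, 10, 5,  5,
              7, 3,  6, 5,  2,
              1, 1, 10, 9, 10,
              2, 5,  7, 9,  9), nrow = 5, byrow = TRUE)

b <- numeric(0)              # empty container, filled inside the loop
for (j in 1:ncol(A)) {
  b[j] <- A[1, j] * A[2, j]  # product of the j-th entries of rows 1 and 2
}
b
```

The vectorised expression A[1, ] * A[2, ] gives the same vector without a loop.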

We can also use a double loop. Do it in a way that it multiplies subsequent rows in A and assign them to a new matrix B:
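A sketch of a double loop under one reading of the task: row i of B holds the element-wise product of rows i and i + 1 of A, so B has one row fewer than A. That reading is an assumption. A is typed in from the printed output.

```r
# A as printed above
A <- matrix(c(9, 7,  5, 9,  5,
              4, 2, 10, 5,  5,
              7, 3,  6, 5,  2,
              1, 1, 10, 9, 10,
              2, 5,  7, 9,  9), nrow = 5, byrow = TRUE)

# Container with one row fewer than A
B <- matrix(0, nrow = nrow(A) - 1, ncol = ncol(A))
for (i in 1:(nrow(A) - 1)) {
  for (j in 1:ncol(A)) {
    B[i, j] <- A[i, j] * A[i + 1, j]   # row i times the row below it
  }
}
B
```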

Some loops can be replaced by the apply() family. We will have more loop examples this week as we move into simulations. But, here is the question:

A numerical or character variable could be an indicator variable. For example, if a numerical variable has fewer unique values than a certain threshold (like 10 unique numbers), it might be considered an indicator variable. Write code that looks at each variable in mtcars (from the datasets package), identifies each indicator variable, and converts it to a factor variable in a loop.

str(mtcars)

## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
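A minimal sketch of such a loop. It works on a copy (cars, a name chosen for this example) and assumes the threshold of 10 unique values suggested in the question:

```r
cars <- mtcars     # work on a copy so the original data set stays untouched
threshold <- 10    # assumed cut-off for the number of unique values

for (v in names(cars)) {
  # fewer unique values than the threshold: treat the column as an indicator
  if (length(unique(cars[[v]])) < threshold) {
    cars[[v]] <- as.factor(cars[[v]])
  }
}

str(cars)
```

With this threshold, the columns converted to factors are cyl, vs, am, gear and carb.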