R LESSONS 1

Complete the following using R. Copy and paste your results from the console into a document to submit:

  1. Create an object named a with the value 5.
a <- 5
  1. Create an object named b with the value 7.
b <- 7
  1. Add a to b.
a+b
## [1] 12
  1. Divide b by a.
b/a
## [1] 1.4
  1. Divide 7 by 5 without first assigning 7 and 5 to objects.
7/5
## [1] 1.4
  1. Divide 12 by 5 using integer division.
12%/%5
## [1] 2
  1. Use modulo division to find the remainder when dividing 18 by 5.
18%%5
## [1] 3
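The two operators are linked: the quotient from %/% times the divisor, plus the remainder from %%, gives back the original number. A minimal sketch of that check follows; the variable names are illustrative and not part of the exercise.

```r
dividend <- 18
divisor  <- 5

quotient  <- dividend %/% divisor   # whole-number part of the division
remainder <- dividend %% divisor    # what is left over

# Rebuild the dividend from the two parts and compare
quotient * divisor + remainder == dividend
```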
  1. Use the comparison operators to check whether a and b are less than 5, equal to 5, and greater than 5.
a==5
## [1] TRUE
a<5
## [1] FALSE
a>5
## [1] FALSE
b==5
## [1] FALSE
b<5
## [1] FALSE
b>5
## [1] TRUE
  1. Use help to find information on the curve() function, and use that function to graph 3x^2 - 2x + 15 between -1 and 2, evaluating a total of 10 values. Copy the plot using Export, Copy to Clipboard, Copy Plot, then paste it into your assignment file.
# ?curve
curve(3*x^2 - 2*x + 15, from = -1, to = 2, n = 10)

  1. Creating functions:
  1. Create a function called none() which has one parameter x, and simply returns the value of x sent to it. Run none(), sending the value of 3 to it.
none <- function(x) {
  y <- x
  return(y)
}

none(3)
## [1] 3
  1. Create a function called mulp() which multiplies two numbers. Send 4 and 8 to mulp().
mulp <- function(x, y) {
  z <- x * y
  return(z)
}

mulp(4,8)
## [1] 32
  1. Define two variables, a=5 and b=7. Send a and b to mulp().
a<-5; b<-7
mulp(a,b)
## [1] 35
  1. Put the output of none(), which itself has an input of 6, as one of the inputs of mulp(). Use the value of a from (c) as the other input.
a <- 5
mulp(none(6), a)
## [1] 30

4. Access information on the rep function using help(rep) or ?rep, then use the rep function in R to create the following sequences:

  1. 2,2,2,6,6,6,10,10,10,14,14,14,18,18,18
# help(rep)
x <- c(2,6,10,14,18)
rep(x ,each=3 )
##  [1]  2  2  2  6  6  6 10 10 10 14 14 14 18 18 18
#OR

y <- seq(2,18,by=4)
rep(y ,each=3 )
##  [1]  2  2  2  6  6  6 10 10 10 14 14 14 18 18 18
  1. 2,6,10,14,18,2,6,10,14,18,2,6,10,14,18,2,6,10,14,18
rep(y ,times=4 )
##  [1]  2  6 10 14 18  2  6 10 14 18  2  6 10 14 18  2  6 10 14 18
  1. 2,2,2,6,6,6,10,10,10,14,14,14,18,18,18,2,2
rep(y, each = 3, length.out = 17)
##  [1]  2  2  2  6  6  6 10 10 10 14 14 14 18 18 18  2  2
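The three sequences above differ only in how each, times and length.out are combined. The following sketch, which uses shortened vectors chosen purely for illustration, shows how the arguments interact.

```r
y <- seq(2, 18, by = 4)

# each is applied first, then times repeats the expanded vector
each_then_times <- rep(y[1:2], each = 2, times = 2)

# a vector passed to times repeats every element a different number of times
per_element <- rep(y[1:3], times = c(1, 2, 3))

# length.out recycles or truncates the result to a fixed length
fixed_length <- rep(y, length.out = 7)

each_then_times
per_element
fixed_length
```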

5. Set v equal to the vector in 4(a), and complete the following:
  a. Use the sum function to find the sum of the values.
  b. Use help to find the mean function, then find the mean of the values in the vector.
  c. Calculate the square root of each value in v.

v <- rep(y, each = 3)
sum(v)
## [1] 150
mean(v)
## [1] 10
sqrt(v)
##  [1] 1.414214 1.414214 1.414214 2.449490 2.449490 2.449490 3.162278 3.162278
##  [9] 3.162278 3.741657 3.741657 3.741657 4.242641 4.242641 4.242641

6. Create a vector x which contains the values between 10 and 20 in increments of 0.1.
  a. Create a vector y which contains the logarithm of the values in the vector x.
  b. Plot the values of x versus y. Label the axes in the plot.

x <- seq(10, 20, 0.1)
y <- log(x, base = 10)  # base-10 logarithm; log(x) alone would give the natural log

plot(x, y, xlab = "x", ylab = "log10(x)")

  1. Extract the 30th value of the vector y.
  2. Extract the 30th through 35th values of the vector y.
  3. Extract the 50th and 60th values of the vector y.
y[30]
## [1] 1.11059
y[30:35]
## [1] 1.110590 1.113943 1.117271 1.120574 1.123852 1.127105
y[c(50,60)]
## [1] 1.173186 1.201397
  1. The following information gives the percentage of registered voters in 4 counties who favor a measure that bans smoking in public places: County A, 72; County B, 81; County C, 52; County D, 63.
  1. Create a vector to store these percentages.
  2. Assign names to the elements of the vector in (a).
  3. Generate a barplot of the percentages.
percentages <- c(72, 81, 52, 63)

# Name each percentage after its county
names(percentages) <- c("County A", "County B", "County C", "County D")
percentages
## County A County B County C County D
##       72       81       52       63

barplot(percentages,
        col = "steelblue",
        xlab = "County",
        ylab = "Voters in favor (%)",
        main = "Support for a ban on smoking in public places")

  1. Compute the mean and standard deviation of the percentages.
mean(percentages)
## [1] 67
sd(percentages)
## [1] 12.40967
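As a cross-check, the standard deviation can be rebuilt from its definition. This is a sketch only: the names n, deviations and manual_sd are illustrative, and it relies on sd() using the sample formula with n - 1 in the denominator.

```r
percentages <- c(72, 81, 52, 63)

n <- length(percentages)
deviations <- percentages - mean(percentages)

# Sample standard deviation: divide the sum of squares by n - 1, not n
manual_sd <- sqrt(sum(deviations^2) / (n - 1))

# Compare the hand-computed value with the built-in function
all.equal(manual_sd, sd(percentages))
```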

8. Consider the following data set with responses to the questions (1) What is your favorite hot drink? (2) What is your favorite cookie?

{coffee, tea, cocoa, tea, tea, chai, coffee, cocoa, coffee, coffee, coffee, chai, tea, tea}

{chocolate chip, peanut butter, chocolate chip, oatmeal, oatmeal, shortbread, chocolate chip, sandwich cookie, oatmeal, oatmeal, chocolate chip, sandwich cookie, peanut butter, shortbread}

  1. Do not type this data in full. Instead, give each drink and each cookie a number and make vectors of the numbers corresponding to the lists above (just type the numbers without quotes).

  2. Turn your vectors into factors with the right names and tabulate the levels.

drinks <- c(1,2,3,2,2,4,1,3,1,1,1,4,2,2)
cookies<- c(1,2,1,3,3,4,1,5,3,3,1,5,2,4)

f.drinks <- factor(drinks) # A factor stores categorical data

levels(f.drinks)[1] <- "coffee"
levels(f.drinks)[2] <- "tea"
levels(f.drinks)[3] <- "cocoa"
levels(f.drinks)[4] <- "chai"


f.cookies <- factor(cookies)

levels(f.cookies)[1] <- "chocolate chip"
levels(f.cookies)[2] <- "peanut butter"
levels(f.cookies)[3] <- "oatmeal"
levels(f.cookies)[4] <- "shortbread"
levels(f.cookies)[5] <- "sandwich cookie"
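The exercise also asks for the levels to be tabulated. One possible way to do both steps, sketched here with a hypothetical object name f.drinks2, is to pass the names through the labels argument of factor() and then count the responses with table().

```r
drinks <- c(1, 2, 3, 2, 2, 4, 1, 3, 1, 1, 1, 4, 2, 2)

# levels lists the numeric codes; labels gives the matching names in the same order
f.drinks2 <- factor(drinks,
                    levels = 1:4,
                    labels = c("coffee", "tea", "cocoa", "chai"))

# Count how many responses fall into each level
drink_counts <- table(f.drinks2)
drink_counts
```

The same pattern applies to the cookie vector with its five labels.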
  1. Use the command table() to cross tabulate the drink and cookie preferences.
table(f.drinks,f.cookies)
##         f.cookies
## f.drinks chocolate chip peanut butter oatmeal shortbread sandwich cookie
##   coffee              3             0       2          0               0
##   tea                 0             2       2          1               0
##   cocoa               1             0       0          0               1
##   chai                0             0       0          1               1
  1. Define a matrix M with three rows and three columns, the elements being the integers from 1 to 9 inclusive, such that the first row contains the integers 1 to 3 inclusive, the second 4 to 6 inclusive, and the third 7 to 9.
M <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
  1. Assign names to the rows and columns of the matrix.
rownames(M)<- c("C", "O", "L") 
colnames(M)<- c("R", "O", "W")

M
##   R O W
## C 1 2 3
## O 4 5 6
## L 7 8 9
  1. Dereference the value on the second row, second column.
M[2,2]
## [1] 5
  1. Extract the whole second row.
M[2,]
## R O W 
## 4 5 6
  1. Extract the third column.
M[,3]
## C O L 
## 3 6 9
  1. Multiply the matrix by its transpose.
M %*% t(M)
##    C   O   L
## C 14  32  50
## O 32  77 122
## L 50 122 194
  1. Compute the determinant of the matrix.
det(M)
## [1] 6.661338e-16
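The value printed above is floating-point rounding error rather than a true non-zero determinant: the rows of this matrix are linearly dependent, so the exact determinant is 0. A short sketch of how that could be checked (the tolerance is the default used by all.equal()):

```r
M <- matrix(1:9, nrow = 3, byrow = TRUE)

# Compare with zero using a tolerance instead of ==
det_is_zero <- isTRUE(all.equal(det(M), 0))
det_is_zero

# The rows are linearly dependent: row 1 + row 3 equals 2 * row 2
rows_dependent <- all(M[1, ] + M[3, ] == 2 * M[2, ])
rows_dependent
```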
  1. Give the diagonal elements of the matrix.
diag(M)
## [1] 1 5 9
  1. Define a new matrix, N, by multiplying the matrix in #3 by 1/3.
N <- M * 1/3
  1. Add M to N.
N+M
##          R         O  W
## C 1.333333  2.666667  4
## O 5.333333  6.666667  8
## L 9.333333 10.666667 12
  1. Round the values in the matrix from (a) to the nearest 100th.
round(N + M, 2)
##      R     O  W
## C 1.33  2.67  4
## O 5.33  6.67  8
## L 9.33 10.67 12
  1. Round all values in the matrix from (a) down to the nearest integer at or below the value.
floor(N+M)
##   R  O  W
## C 1  2  4
## O 5  6  8
## L 9 10 12
  1. Create a matrix of ones of the same dimensions as N and multiply N by the new matrix.
NN <- matrix(1, nrow = 3, ncol = 3)
N %*% NN
##   [,1] [,2] [,3]
## C    2    2    2
## O    5    5    5
## L    8    8    8
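The answer above uses %*%, the matrix product, so every entry in a row of the result is that row's sum. If element-wise multiplication was intended instead, the * operator applies, and multiplying by ones leaves N unchanged. A sketch contrasting the two, with N rebuilt here without row and column names for brevity:

```r
N  <- matrix(1:9, nrow = 3, byrow = TRUE) / 3
NN <- matrix(1, nrow = 3, ncol = 3)

matrix_product <- N %*% NN   # rows of N combined with columns of NN
elementwise    <- N * NN     # each entry multiplied by the matching entry

matrix_product
elementwise
```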
  1. Define a matrix M with three columns, consisting of the following: Column 1 consists of the integers between 1 and 20, inclusive Column 2 consists of these integers: 1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4 Column 3 is determined by generating 20 values from the binomial distribution with 30 trials and probability of success equal to .4
col1 <- seq(1,20,1)
col2 <- rep(seq(1,4), each=5)
col3 <- rbinom(20,30,0.4)

M <- cbind(col1,col2,col3)
class(M)
## [1] "matrix" "array"
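Because col3 comes from rbinom(), its values change on every run, so any printed results that involve it will differ from one session to the next. Fixing the random seed first is one way to make the draw repeatable. In this sketch the seed value 123 is arbitrary and the object names are illustrative.

```r
set.seed(123)                       # arbitrary seed, chosen for illustration
first_draw <- rbinom(20, 30, 0.4)

set.seed(123)                       # resetting the seed replays the same draws
second_draw <- rbinom(20, 30, 0.4)

# Both vectors contain exactly the same 20 values
identical(first_draw, second_draw)
```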
  1. Define an object which is a data frame having the same values as the matrix defined above.
x <- data.frame(col1,col2,col3)
class(x)
## [1] "data.frame"
  1. Extract the third variable of the second item.
x[2, 3]  # row 2 (second item), column 3 (third variable); the value depends on the random draw in col3
  1. Extract the data for the second item.
x[2, ]   # every variable for the second item
  1. Extract the third variable.
x[[3]]  # third column; x[3, ] would return the third row instead. The values depend on the random draw in col3.
  1. Name the variables “trial”, “treatment” and “result”.
names(x) <- c("trial", "treatment", "result")
  1. The data set “airquality”, in R, contains a sequence of daily readings of New York’s air quality between May and September 1973 (Chambers et al. 1983). The data contains numeric readings for ozone, temperature, wind and solar radiation. There are also two ordinal variables indicating the day of the month, and the month.
  1. Produce scatterplots of each and every variable against each other using the plot() function.
plot(airquality)

  1. Produce scatterplots of the ozone measurement against temperature, and wind against temperature. Provide a main title for your plots and label the axes appropriately. Change the plotting character and size of the plotting character.
colnames(airquality)
## [1] "Ozone"   "Solar.R" "Wind"    "Temp"    "Month"   "Day"
plot(airquality$Ozone, airquality$Temp, xlab = "Ozone",
     ylab = "Temperature", main = "Ozone vs. Temperature", pch = 19, cex = 1.2)

plot(airquality$Wind, airquality$Temp, xlab = "Wind",
     ylab = "Temperature", main = "Wind vs. Temperature", pch = 6, cex = 1.2)

  1. Use the par() function to plot the scatter plots in (b) side by side. To do this, simply enter par(mfrow=c(1,2)) as a command in R before entering the two plot commands needed.
par(mfrow=c(1,2))
plot(airquality$Ozone, airquality$Temp, xlab = "Ozone",
     ylab = "Temperature", main = "Ozone vs. Temperature", pch = 19, cex = 1.2)

plot(airquality$Wind, airquality$Temp, xlab = "Wind",
     ylab = "Temperature", main = "Wind vs. Temperature", pch = 6, cex = 1.2)

  1. Construct boxplots for ozone and temperature against each month. Include a main title for each of these plots. Note that the par() function continues to work until you remove the settings.
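par(mfrow=c(1,2))
boxplot(airquality$Ozone ~ airquality$Month,
        main = "Ozone by Month",
        xlab = "Month",
        ylab = "Ozone")
boxplot(airquality$Temp ~ airquality$Month,
        main = "Temperature by Month",
        xlab = "Month",
        ylab = "Temperature")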
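Since the par() settings stay in force until they are removed, it can help to save the old settings and restore them afterwards. A minimal sketch, assuming the default single-panel layout was active beforehand; old_par is an illustrative name.

```r
# par() returns the previous values of the settings it changes
old_par <- par(mfrow = c(1, 2))

plot(airquality$Wind, airquality$Temp, main = "Wind vs. Temperature")
plot(airquality$Solar.R, airquality$Temp, main = "Solar.R vs. Temperature")

# Restore the saved settings so later plots use a single panel again
par(old_par)
par("mfrow")
```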