I pledge on my honor that I have not given or received any unauthorized assistance on this assignment/examination.

1 Part 1: Basic R manipulation.

  1. In the code segment below, assign the value 2 to variable x, and the value 6 to variable y. Then compute the sum ‘x+y’:
x<-2
y<-6
print(x+y)
## [1] 8
  1. Construct a list of all even numbers between 1 and 100, assign this to the variable ‘even’. Construct a list of 100 numbers, starting with 1, such that the difference between consecutive numbers is three, assign this to the variable ‘threes’. Construct a list of 100 equally spaced numbers between 0 and 2. Assign this to the variable x.
  even=seq(2,100,2)
  print(even)
##  [1]   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36  38
## [20]  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68  70  72  74  76
## [39]  78  80  82  84  86  88  90  92  94  96  98 100
  threes=seq(1,300,3)
  print(threes)
##   [1]   1   4   7  10  13  16  19  22  25  28  31  34  37  40  43  46  49  52
##  [19]  55  58  61  64  67  70  73  76  79  82  85  88  91  94  97 100 103 106
##  [37] 109 112 115 118 121 124 127 130 133 136 139 142 145 148 151 154 157 160
##  [55] 163 166 169 172 175 178 181 184 187 190 193 196 199 202 205 208 211 214
##  [73] 217 220 223 226 229 232 235 238 241 244 247 250 253 256 259 262 265 268
##  [91] 271 274 277 280 283 286 289 292 295 298
  x=seq(0,2,length.out=100)
  print(x)
##   [1] 0.00000000 0.02020202 0.04040404 0.06060606 0.08080808 0.10101010
##   [7] 0.12121212 0.14141414 0.16161616 0.18181818 0.20202020 0.22222222
##  [13] 0.24242424 0.26262626 0.28282828 0.30303030 0.32323232 0.34343434
##  [19] 0.36363636 0.38383838 0.40404040 0.42424242 0.44444444 0.46464646
##  [25] 0.48484848 0.50505051 0.52525253 0.54545455 0.56565657 0.58585859
##  [31] 0.60606061 0.62626263 0.64646465 0.66666667 0.68686869 0.70707071
##  [37] 0.72727273 0.74747475 0.76767677 0.78787879 0.80808081 0.82828283
##  [43] 0.84848485 0.86868687 0.88888889 0.90909091 0.92929293 0.94949495
##  [49] 0.96969697 0.98989899 1.01010101 1.03030303 1.05050505 1.07070707
##  [55] 1.09090909 1.11111111 1.13131313 1.15151515 1.17171717 1.19191919
##  [61] 1.21212121 1.23232323 1.25252525 1.27272727 1.29292929 1.31313131
##  [67] 1.33333333 1.35353535 1.37373737 1.39393939 1.41414141 1.43434343
##  [73] 1.45454545 1.47474747 1.49494949 1.51515152 1.53535354 1.55555556
##  [79] 1.57575758 1.59595960 1.61616162 1.63636364 1.65656566 1.67676768
##  [85] 1.69696970 1.71717172 1.73737374 1.75757576 1.77777778 1.79797980
##  [91] 1.81818182 1.83838384 1.85858586 1.87878788 1.89898990 1.91919192
##  [97] 1.93939394 1.95959596 1.97979798 2.00000000
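
As a quick sanity check on the three sequences, the sketch below rebuilds them with explicit `by` and `length.out` arguments and verifies their lengths and spacing. The names `threes_alt` and `grid` are introduced here only for illustration and are not part of the assignment.

```r
# Even numbers from 2 to 100, stepping by 2
even <- seq(2, 100, by = 2)

# 100 numbers starting at 1 with a common difference of 3;
# length.out avoids working out the end point (298) by hand
threes_alt <- seq(1, by = 3, length.out = 100)

# 100 equally spaced points on [0, 2]
grid <- seq(0, 2, length.out = 100)

length(even)                 # 50
length(threes_alt)           # 100
all(diff(threes_alt) == 3)   # TRUE
range(grid)                  # 0 2
```

Using `length.out` states the required count directly, which is harder to get wrong than computing the upper limit manually.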
  1. Define a function that computes the square of the input value, call this function square:
x=5
square<-function(x){
  squared<-x*x
  return(squared)
}
print(square(x))
## [1] 25
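
Because `x * x` works element by element, the same function also squares a whole vector at once. The following is a minimal sketch of that behavior; the input values are arbitrary examples.

```r
square <- function(x) {
  # x * x is element-wise, so vectors are squared entry by entry
  x * x
}

square(5)                     # 25
square(c(1, 2, 3))            # 1 4 9
square(seq(0, 2, by = 0.5))   # 0.00 0.25 1.00 2.25 4.00
```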
  1. Plot the trigonometric functions sine and cosine for values of x between 0 and 10. Also plot the ‘square’ function for input values between 0 and 2.
plot(sin,0,10)

plot(cos,0,10)

# square() was defined above; because x*x is vectorized,
# plot() can draw it as a curve over the interval [0, 2]
plot(square, 0, 2)
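
An alternative way to produce these plots is sketched below, assuming base graphics only. It draws sine and cosine in a single panel with `curve()`, and plots the square function by evaluating it on an explicit grid. The names `xs` and `ys` are illustrative.

```r
square <- function(x) x * x

# Sine and cosine on [0, 10], drawn in one panel
curve(sin, from = 0, to = 10, ylab = "f(x)")
curve(cos, from = 0, to = 10, add = TRUE, lty = 2)
legend("bottomleft", legend = c("sin", "cos"), lty = c(1, 2))

# The square function on [0, 2]: evaluate on a grid, then draw a line
xs <- seq(0, 2, length.out = 100)
ys <- square(xs)
plot(xs, ys, type = "l", xlab = "x", ylab = "square(x)")
```

Passing a single number to `plot()` draws only one point, so a range of inputs is needed to see the shape of the function.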

2 Part 2: Working with data in R.

The dataset ‘iris’ is part of R's built-in data repository. To load it, we use the ‘data’ command.

data("iris")

The standard way of storing data in R is a two-dimensional dataframe. The columns correspond to the variables and the rows correspond to observations. To get a sense of what the dataframe looks like, we use the ‘head’ command. Apply the head command to the dataframe iris.

head(iris, n= 10)
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1           5.1         3.5          1.4         0.2  setosa
## 2           4.9         3.0          1.4         0.2  setosa
## 3           4.7         3.2          1.3         0.2  setosa
## 4           4.6         3.1          1.5         0.2  setosa
## 5           5.0         3.6          1.4         0.2  setosa
## 6           5.4         3.9          1.7         0.4  setosa
## 7           4.6         3.4          1.4         0.3  setosa
## 8           5.0         3.4          1.5         0.2  setosa
## 9           4.4         2.9          1.4         0.2  setosa
## 10          4.9         3.1          1.5         0.1  setosa
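
Besides `head`, a few other base R functions give a quick overview of a dataframe. The sketch below shows some of them applied to iris.

```r
data("iris")

dim(iris)           # 150 5 (rows, columns)
nrow(iris)          # 150
ncol(iris)          # 5
str(iris)           # column types: four numeric columns and one factor
tail(iris, n = 3)   # last three observations
```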

What are the variable/column names in this dataframe?

There is an easy way to find all the column names of a dataframe, using the ‘colnames’ command. Now, apply that command to the iris dataset.

colnames(iris)
## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"
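
For a dataframe, `names` returns the same result as `colnames`, and the column names can be used to look up the type of each variable. A short sketch:

```r
data("iris")

identical(names(iris), colnames(iris))   # TRUE
"Species" %in% colnames(iris)            # TRUE
sapply(iris, class)                      # class of each column, labelled by column name
```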

We will now work with the data. The first step is to isolate a column/variable. We do that using the ‘$’ operator. Assign to x the values of the column “Petal.Length”.

x<-iris$Petal.Length

Note that x is now a vector of numbers, which holds the petal lengths of the three different species of the iris flower.
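
The ‘$’ operator is one of several equivalent ways to extract a column. The sketch below compares it with `[[ ]]` and matrix-style indexing, and confirms that the result is an atomic numeric vector with one entry per observation. The names `x2` and `x3` are illustrative.

```r
data("iris")

x  <- iris$Petal.Length        # extraction by name with $
x2 <- iris[["Petal.Length"]]   # same column, using [[ ]]
x3 <- iris[, "Petal.Length"]   # same column, using row/column indexing

identical(x, x2)   # TRUE
identical(x, x3)   # TRUE
is.numeric(x)      # TRUE
length(x)          # 150
```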

Let us compute the summary statistics for x: the min, the max, the first, second, and third quartiles, and the mean.

min(x)
## [1] 1
max(x)
## [1] 6.9
quartile_1<- quantile(x, probs = 0.25)
quartile_2<- quantile(x, probs = 0.50)
quartile_3<- quantile(x, probs = 0.75)
mean(x)
## [1] 3.758
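
The three quartiles are stored above but not displayed. As a minimal sketch, they can also be computed in a single `quantile` call by passing a vector of probabilities and then printed. The name `q` is illustrative.

```r
data("iris")
x <- iris$Petal.Length

# All three quartiles at once; the result is a named numeric vector
q <- quantile(x, probs = c(0.25, 0.50, 0.75))
print(q)     # 25% = 1.6, 50% = 4.35, 75% = 5.1

q[["50%"]]   # the second quartile is the median
median(x)    # same value
```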

There is an easy way to summarise these for the entire dataframe using the summary command. Use this command and compare the values that you manually computed above to the output of the ‘summary’ command.

summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 
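
One way to carry out the comparison for Petal.Length is sketched below: the manually computed statistics are collected into a vector and checked against `summary` applied to the same column. The names `manual` and `auto` are illustrative.

```r
data("iris")
x <- iris$Petal.Length

# Manually computed statistics, in the order used by summary()
manual <- c(min(x),
            quantile(x, probs = 0.25),
            median(x),
            mean(x),
            quantile(x, probs = 0.75),
            max(x))

# summary() on a single numeric column returns the same six numbers
auto <- summary(x)
names(auto)   # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."

# all.equal() compares up to a small numerical tolerance
all.equal(unname(manual), as.numeric(auto))   # TRUE
```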

We want to see if there is any relationship between the variables. To do so, it is important that we separate the data based on the species. Use the filter command (from the library ‘dplyr’) to create three new dataframes called “setosa”, “versicolor”, and “virginica”. Use the “head” command to view the first five rows of these dataframes.

#install.packages("dplyr")
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
setosa<-filter(iris, Species == "setosa")
versicolor<-filter(iris, Species == "versicolor")
virginica<- filter(iris, Species == "virginica")
head(setosa, n = 5)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
head(versicolor, n = 5)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1          7.0         3.2          4.7         1.4 versicolor
## 2          6.4         3.2          4.5         1.5 versicolor
## 3          6.9         3.1          4.9         1.5 versicolor
## 4          5.5         2.3          4.0         1.3 versicolor
## 5          6.5         2.8          4.6         1.5 versicolor
head(virginica, n = 5)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 1          6.3         3.3          6.0         2.5 virginica
## 2          5.8         2.7          5.1         1.9 virginica
## 3          7.1         3.0          5.9         2.1 virginica
## 4          6.3         2.9          5.6         1.8 virginica
## 5          6.5         3.0          5.8         2.2 virginica
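
For comparison, the same separation can be done without dplyr. The sketch below is a base R alternative, not the method the exercise asks for: `split` returns a list with one dataframe per species, and `subset` extracts a single species. The names `by_species` and `setosa_base` are illustrative.

```r
data("iris")

# One dataframe per level of Species, collected in a named list
by_species <- split(iris, iris$Species)
names(by_species)          # "setosa" "versicolor" "virginica"
sapply(by_species, nrow)   # 50 observations in each group

# A single species with subset(); unlike dplyr's filter(),
# this keeps the original row names from iris
setosa_base <- subset(iris, Species == "setosa")
head(setosa_base, n = 5)
```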