1 Part 1: Basic R manipulation.

  1. In the code segment below, assign the value 2 to variable x, and the value 6 to variable y. Then compute the sum ‘x+y’:
x <- 2
y <- 6
total <- x + y   # named "total" rather than "sum" so it is not confused with the built-in sum() function
print(total)
## [1] 8
  2. Construct a vector of all even numbers between 1 and 100 and assign it to the variable ‘even’. Construct a vector of 100 numbers, starting with 1, in which consecutive numbers differ by three, and assign it to the variable ‘threes’. Construct a vector of 100 equally spaced numbers between 0 and 2 and assign it to the variable x.
even <- seq(2, 100, by = 2)
print(even)
##  [1]   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36  38
## [20]  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68  70  72  74  76
## [39]  78  80  82  84  86  88  90  92  94  96  98 100
threes <- seq(1, 300, by = 3)   # 1, 4, ..., 298: exactly 100 values
print(threes)
##   [1]   1   4   7  10  13  16  19  22  25  28  31  34  37  40  43  46  49  52
##  [19]  55  58  61  64  67  70  73  76  79  82  85  88  91  94  97 100 103 106
##  [37] 109 112 115 118 121 124 127 130 133 136 139 142 145 148 151 154 157 160
##  [55] 163 166 169 172 175 178 181 184 187 190 193 196 199 202 205 208 211 214
##  [73] 217 220 223 226 229 232 235 238 241 244 247 250 253 256 259 262 265 268
##  [91] 271 274 277 280 283 286 289 292 295 298
x <- seq(0, 2, length.out = 100)
print(x)
##   [1] 0.00000000 0.02020202 0.04040404 0.06060606 0.08080808 0.10101010
##   [7] 0.12121212 0.14141414 0.16161616 0.18181818 0.20202020 0.22222222
##  [13] 0.24242424 0.26262626 0.28282828 0.30303030 0.32323232 0.34343434
##  [19] 0.36363636 0.38383838 0.40404040 0.42424242 0.44444444 0.46464646
##  [25] 0.48484848 0.50505051 0.52525253 0.54545455 0.56565657 0.58585859
##  [31] 0.60606061 0.62626263 0.64646465 0.66666667 0.68686869 0.70707071
##  [37] 0.72727273 0.74747475 0.76767677 0.78787879 0.80808081 0.82828283
##  [43] 0.84848485 0.86868687 0.88888889 0.90909091 0.92929293 0.94949495
##  [49] 0.96969697 0.98989899 1.01010101 1.03030303 1.05050505 1.07070707
##  [55] 1.09090909 1.11111111 1.13131313 1.15151515 1.17171717 1.19191919
##  [61] 1.21212121 1.23232323 1.25252525 1.27272727 1.29292929 1.31313131
##  [67] 1.33333333 1.35353535 1.37373737 1.39393939 1.41414141 1.43434343
##  [73] 1.45454545 1.47474747 1.49494949 1.51515152 1.53535354 1.55555556
##  [79] 1.57575758 1.59595960 1.61616162 1.63636364 1.65656566 1.67676768
##  [85] 1.69696970 1.71717172 1.73737374 1.75757576 1.77777778 1.79797980
##  [91] 1.81818182 1.83838384 1.85858586 1.87878788 1.89898990 1.91919192
##  [97] 1.93939394 1.95959596 1.97979798 2.00000000
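The same three sequences can also be built, or checked, with other arguments of `seq()`. The sketch below is one assumed alternative, not the only correct answer: `by` and `length.out` make the intent explicit, and `diff()` confirms the spacing between consecutive values.

```r
# Even numbers between 1 and 100: start at 2, step by 2
even <- seq(2, 100, by = 2)

# 100 numbers starting at 1 with a constant step of 3;
# length.out fixes the count, so the end point (298) need not be worked out by hand
threes <- seq(1, by = 3, length.out = 100)

# 100 equally spaced numbers from 0 to 2; the step is (2 - 0) / 99
x <- seq(0, 2, length.out = 100)

# Sanity checks on length and spacing
length(even)                         # 50
length(threes)                       # 100
tail(threes, 1)                      # 298
all(diff(threes) == 3)               # TRUE
all.equal(diff(x), rep(2 / 99, 99))  # TRUE
```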
  3. Define a function that computes the square of its input value, and name this function square:
x <- 2
square <- function(x) {
  x * x   # the value of the last expression is returned, so return() is optional
}
print(square(x))
## [1] 4
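Because `x * x` is an element-wise operation, `square()` also works on whole vectors, which is what plotting a function over a range relies on. A short check, using the same definition as above:

```r
# Same square() as above, with a one-line body
square <- function(x) {
  x * x
}

square(3)      # 9
square(1:4)    # 1 4 9 16 (computed element by element)
square(-2.5)   # 6.25
```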
  4. Plot the trigonometric functions sine and cosine for values of x between 0 and 10. Also plot the ‘square’ function for input values between 0 and 2.
plot(sin, 0, 10)

plot(cos, 0, 10)

# square() was defined in the previous step; plot() accepts a function plus a range
plot(square, 0, 2)
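For a function argument, `plot(f, from, to)` is a wrapper around `curve()`. As an optional variation (not required by the exercise), `curve()` with `add = TRUE` draws sine and cosine on one set of axes, and its invisible return value holds the points it evaluated:

```r
square <- function(x) x * x

# Sine and cosine on the same axes; add = TRUE draws onto the existing plot
curve(sin, from = 0, to = 10, ylab = "f(x)")
curve(cos, from = 0, to = 10, add = TRUE, lty = 2)
legend("bottomleft", legend = c("sin", "cos"), lty = c(1, 2))

# curve() invisibly returns the grid it evaluated (101 points by default)
pts <- curve(square, from = 0, to = 2)
range(pts$x)   # 0 2
```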

2 Part 2: Working with data in R.

The dataset ‘iris’ is one of R's built-in datasets. Use the ‘data’ command to load it.

data("iris")

The standard way of storing data in R is a two-dimensional data frame. The columns correspond to the variables and the rows correspond to observations. To get a sense of what the data frame looks like, we use the ‘head’ command. Apply the head command to the data frame iris.

head(iris, n = 5)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa

What are the variable/column names in this dataframe?

There is an easy way to find all the column names of a dataframe, using the ‘colnames’ command. Now, apply that command to the iris dataset.

colnames(iris)
## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"
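For a data frame, `names()` returns the same column names as `colnames()`, and `dim()` and `str()` give a quick overview of its size and column types:

```r
data("iris")

# For data frames, names() and colnames() return the same character vector
identical(names(iris), colnames(iris))   # TRUE

# 150 rows (observations) and 5 columns (variables)
dim(iris)

# One line per column: its type and first few values
str(iris)
```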

We will now work with the data. The first step is to isolate a single column (variable), which we do with the ‘$’ operator. Assign the values of the column “Petal.Length” to x.

x <- iris$Petal.Length
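`$` is one of several equivalent ways to pull a single column out of a data frame as a vector. `[[ ]]` is handy when the column name is stored in a variable, where `$` would look for a column literally named after the variable:

```r
data("iris")

x1 <- iris$Petal.Length          # $ with a literal column name
x2 <- iris[["Petal.Length"]]     # [[ ]] with a string
col <- "Petal.Length"
x3 <- iris[[col]]                # [[ ]] with a name held in a variable
x4 <- iris[, "Petal.Length"]     # matrix-style indexing also returns a vector here

identical(x1, x2) && identical(x1, x3) && identical(x1, x4)   # TRUE
is.null(iris$col)   # TRUE: $ looks for a column called "col", which does not exist
length(x1)          # 150
```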

Note that x is now a numeric vector containing the petal lengths of all 150 flowers, across the three species of iris.

Let us compute summary statistics for x: the minimum, the maximum, the first, second (median), and third quartiles, and the mean.

min(x)
## [1] 1
max(x)
## [1] 6.9
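# the argument is "probs"; a misspelled "probes" is not matched to it, so all five default quantiles would be returned
firstQ <- quantile(x, probs = 0.25)
secondQ <- quantile(x, probs = 0.50)
thirdQ <- quantile(x, probs = 0.75)
print(c(firstQ, secondQ, thirdQ))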

mean(x)
## [1] 3.758
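With the argument spelled `probs`, `quantile()` can also return all three quartiles in one call. The check below compares the hand-computed statistics with `summary()` for the same column; it assumes R's default quantile method (type 7), which `summary()` also uses:

```r
data("iris")
x <- iris$Petal.Length

# All three quartiles at once; names = FALSE drops the "25%", "50%", "75%" labels
q <- quantile(x, probs = c(0.25, 0.50, 0.75), names = FALSE)

# Arrange the statistics in the order summary() prints them
manual <- c(min(x), q[1], q[2], mean(x), q[3], max(x))

# summary() of a numeric vector gives Min., 1st Qu., Median, Mean, 3rd Qu., Max.
all.equal(manual, as.numeric(summary(x)))   # TRUE
```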

There is an easy way to summarise all of these for the entire data frame: the ‘summary’ command. Use this command and compare the values you computed manually above with its output.

summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 

We want to see whether there is any relationship between the variables. To do so, it is important to separate the data by species. Use the filter command (from the package ‘dplyr’) to create three new data frames called “setosa”, “versicolor”, and “virginica”. Use the “head” command to view the first rows of these data frames (by default, head shows six rows).

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
setosa <- filter(iris, Species == "setosa")
versicolor <- filter(iris, Species == "versicolor")
virginica <- filter(iris, Species == "virginica")
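If dplyr is not available, base R can do the same separation. One option, shown here as an alternative to the `filter()` calls rather than what the exercise asks for, is `split()`, which returns a named list with one data frame per species. Unlike `filter()`, whose output is renumbered from 1 (as the `head()` output below shows), `split()` and `subset()` keep the original row names.

```r
data("iris")

# One data frame per level of Species, stored in a named list
by_species <- split(iris, iris$Species)
names(by_species)          # "setosa" "versicolor" "virginica"
sapply(by_species, nrow)   # 50 rows each

# A single-species subset with base R's subset()
setosa_base <- subset(iris, Species == "setosa")
```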

head(setosa)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
head(versicolor)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1          7.0         3.2          4.7         1.4 versicolor
## 2          6.4         3.2          4.5         1.5 versicolor
## 3          6.9         3.1          4.9         1.5 versicolor
## 4          5.5         2.3          4.0         1.3 versicolor
## 5          6.5         2.8          4.6         1.5 versicolor
## 6          5.7         2.8          4.5         1.3 versicolor
head(virginica)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 1          6.3         3.3          6.0         2.5 virginica
## 2          5.8         2.7          5.1         1.9 virginica
## 3          7.1         3.0          5.9         2.1 virginica
## 4          6.3         2.9          5.6         1.8 virginica
## 5          6.5         3.0          5.8         2.2 virginica
## 6          7.6         3.0          6.6         2.1 virginica
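As a possible next step toward looking for relationships within each species (an illustration, not part of the exercise as written), the correlation between petal length and petal width can be computed separately per species with base R's `split()` and `sapply()`. Comparing the per-species values with the pooled value gives a rough sense of how much of the overall relationship comes from differences between species.

```r
data("iris")

# Pearson correlation of Petal.Length and Petal.Width within each species
per_species <- sapply(split(iris, iris$Species), function(d) {
  cor(d$Petal.Length, d$Petal.Width)
})
per_species

# For comparison: the correlation with all three species pooled together
pooled <- cor(iris$Petal.Length, iris$Petal.Width)
pooled
```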

3 Part 3: Data visualization.

Consider the following sample data:

samp_data <- c(1.54,0.56,1.67,1.54,1.30,1.15,1.59,-0.22,1.01,1.03,1.02,1.77,1.13,0.15,1.02,
               0.27,1.40,0.98,1.12,1.12,0.83,0.79,1.29,0.35,1.24,1.35,0.82,1.63,1.34,0.96,
               1.37,0.14,1.43,2.00,1.13,1.39,1.18,0.99,0.65,0.88,1.11,1.63,1.53,0.91,0.02,
               0.84,1.54,1.21,1.56,1.02,0.87,0.63,1.68,0.71,1.43,0.42,0.80,1.40,1.45,1.34,
               1.57,0.98,0.99,1.04,1.15,0.47,0.82,0.89,0.78,1.02,1.56,1.67,1.32,1.23,0.46,
               -0.06,0.90,0.94,1.60,0.52,1.04,1.79,0.66,0.76,1.47,0.68,1.38,0.35,1.30,1.30,
               0.89,1.62,0.66,0.97,1.27,1.03,1.12,0.48,0.97,0.77,3.00,3.30,2.73,3.21,3.92,
               3.33,2.25,2.58,3.45,2.86,3.01,3.27,2.65,3.48,3.18,2.01,3.42,2.13,3.15,2.77,
               2.98,3.18,3.20,2.96,2.79,3.15,3.42,2.16,3.41,3.31,3.52,3.28,3.90,3.25,2.91,
               2.98,3.25,3.02,2.52,2.97,2.31,3.30,2.50,3.19,2.46,2.82,2.83,2.48,2.55,3.18,
               3.56,3.69,3.04,2.91,2.94,2.81,2.93,3.11,2.28,2.63,3.37,3.49,3.08,3.49,2.49,
               2.89,3.15,3.21,2.79,3.46,3.56,3.33,3.15,2.18,3.08,2.24,4.31,3.23,3.35,3.18,
               3.08,3.79,2.44,3.04,2.36,2.23,3.55,2.70,3.01,3.48,3.68,3.37,3.01,3.18,2.13,
               3.39,3.37,2.86,3.34,3.16,2.54,3.58,3.63,2.43,3.94,2.80,2.41,2.25,2.87,3.17,
               2.07,2.63,2.99,3.21,2.57,2.93,3.32,2.49,1.89,2.98,2.83,3.80,2.92,2.90,3.00,
               3.10,2.41,2.66,2.77,2.69,2.99,2.50,3.21,2.98,3.45,3.83,3.00,2.54,3.86,2.80,
               2.83,2.97,2.36,2.88,3.05,2.89,1.71,3.64,2.49,3.45,3.41,3.01,2.57,3.87,3.23,
               2.57,2.56,2.85,2.83,3.19,3.15,3.17,2.90,3.29,3.46,2.43,3.04,2.81,3.28,2.82,
               3.00,3.78,3.16,3.96,4.03,3.57,3.95,3.20,2.41,2.58,2.79,3.23,2.77,2.97,3.34,
               3.13,3.78,2.70,3.14,3.25,2.26,3.99,3.10,3.76,2.92,2.69,3.58,3.11,3.59,2.74,
               3.82,2.07,2.81,1.74,3.45,3.38,3.74,2.79,2.71,2.83,2.74,1.58,2.94,3.53,3.01,
               2.21,2.61,2.95,2.58,3.92,2.68,2.85,3.06,2.86,3.43,3.16,3.02,2.29,3.74,3.49,
               4.04,1.66,2.98,1.70,3.80,3.47,3.26,2.55,2.92,3.46,3.28,3.31,2.32,2.78,4.08,
               3.78,2.49,3.26,2.88,3.23,3.15,2.62,3.73,3.03,2.78,3.00,2.61,2.48,2.03,3.04,
               3.64,3.43,3.27,2.41,3.60,2.75,2.94,3.23,3.22,2.81,2.72,3.62,2.94,3.17,3.11,
               2.60,3.11,3.45,2.37,3.07,3.09,3.90,2.86,3.23,3.50,2.84,2.17,2.58,3.57,2.95,
               2.56,2.42,3.38,3.28,3.19,2.77,3.27,4.01,3.65,3.39)

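Use the ‘hist’ function to visualize samp_data. Furthermore, try different values for the “breaks” parameter, which specifies the number of bins, to explore the shape of the data. Note that a single number is treated only as a suggestion, so the number of bins actually drawn may differ.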

hist(samp_data, breaks = 10)
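Two points about `breaks` are easiest to see directly. A single number is passed to `pretty()` as a target, so the histogram may not have exactly that many bins, while a vector of breakpoints fixes the bins exactly. Because `samp_data` is long, the sketch below uses a simulated stand-in, `sim`, that is only assumed to resemble it (about 100 values near 1 and 300 near 3); it is not the real data.

```r
set.seed(42)
# Hypothetical stand-in for samp_data: two clusters, 100 values near 1 and 300 near 3
sim <- c(rnorm(100, mean = 1, sd = 0.45), rnorm(300, mean = 3, sd = 0.5))

# plot = FALSE returns the bin information without drawing anything
h10 <- hist(sim, breaks = 10, plot = FALSE)
length(h10$counts)   # number of bins actually used; need not be 10

# Explicit breakpoints: bins of width 0.25 covering the whole range of the data
brks <- seq(floor(min(sim)), ceiling(max(sim)), by = 0.25)
h_fixed <- hist(sim, breaks = brks, plot = FALSE)
length(h_fixed$counts) == length(brks) - 1   # TRUE

# Draw a few choices side by side to compare the shapes
op <- par(mfrow = c(1, 3))
hist(sim, breaks = 5, main = "breaks = 5")
hist(sim, breaks = 20, main = "breaks = 20")
hist(sim, breaks = brks, main = "bin width 0.25")
par(op)
```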

What are some good values for the “breaks” parameter? Why?

Is the sample mean a good measure of location for this dataset? Why?
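One way to think about this question is to compare the mean with the median on data that form two separate clusters. The toy vector below is a hypothetical example chosen only to mimic the two clusters visible in `samp_data` (roughly 100 values near 1 and 300 near 3), with every value placed exactly at 1 or 3:

```r
# Toy bimodal data (hypothetical): 100 values at 1 and 300 values at 3
v <- c(rep(1, 100), rep(3, 300))

mean(v)            # 2.5: lies between the two clusters
median(v)          # 3: lies inside the larger cluster
sum(v == mean(v))  # 0: no observation sits at the mean
```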

Now use the “stem” function to construct a stem-leaf plot for this data set.

# Some good values for "breaks" are 5 or 10, because larger values of breaks seem to give a better picture of the shape of the graph

#I would say yes because 
stem(samp_data)
## 
##   The decimal point is 1 digit(s) to the left of the |
## 
##   -2 | 2
##   -0 | 6
##    0 | 245
##    2 | 755
##    4 | 267826
##    6 | 3566816789
##    8 | 0223478990146778899
##   10 | 122223344122233558
##   12 | 134790002445789
##   14 | 003357344466789
##   16 | 0233677801479
##   18 | 9
##   20 | 0137733678
##   22 | 1345568912667
##   24 | 1111233468899990024455667778888
##   26 | 01123356899001234457777889999
##   28 | 001111223333345566667889900112223344445567778888899
##   30 | 0000011111223444456788890011113455555566677788888999
##   32 | 0011112333333555667778888900112334457778899
##   34 | 1122335555566678899902356677889
##   36 | 0234458934468889
##   38 | 00236700224569
##   40 | 1348
##   42 | 1

Do you notice anything odd? (It might help to compute the stem plot by hand for the first few values of this dataset.) Your answer: no
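A hint for the question above: in the default plot the stems go 0, 2, 4, ... with the decimal point one digit to the left of the |, so each line covers a width of 0.2 and can hold leaves from two different tenths digits. The hand computation below is a simplified imitation of how those leaves arise, not `stem()`'s exact algorithm; it takes a few values that would land on the `10 |` line and shows that the leaves start over at 1.1:

```r
# A few values that would all land on the "10 |" line, i.e. in [1.0, 1.2)
vals <- c(1.01, 1.02, 1.03, 1.11, 1.12, 1.15)

# Leaf = hundredths digit; the tenths digit (0 or 1) is not shown on this line
leaves <- round(vals * 100) %% 10
paste(leaves, collapse = "")   # "123125": the leaves restart after 1.03

# With stem(samp_data, scale = 2), each tenths digit gets its own line instead
```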

The default “scale” parameter is 1. Draw two stem plots with the scale parameter set to 2 and 0.5, respectively.

stem(samp_data, scale = 2)
## 
##   The decimal point is 1 digit(s) to the left of the |
## 
##   -2 | 2
##   -1 | 
##   -0 | 6
##    0 | 2
##    1 | 45
##    2 | 7
##    3 | 55
##    4 | 2678
##    5 | 26
##    6 | 35668
##    7 | 16789
##    8 | 022347899
##    9 | 0146778899
##   10 | 122223344
##   11 | 122233558
##   12 | 13479
##   13 | 0002445789
##   14 | 003357
##   15 | 344466789
##   16 | 02336778
##   17 | 01479
##   18 | 9
##   19 | 
##   20 | 01377
##   21 | 33678
##   22 | 13455689
##   23 | 12667
##   24 | 111123346889999
##   25 | 0024455667778888
##   26 | 01123356899
##   27 | 001234457777889999
##   28 | 0011112233333455666678899
##   29 | 00112223344445567778888899
##   30 | 000001111122344445678889
##   31 | 0011113455555566677788888999
##   32 | 00111123333335556677788889
##   33 | 00112334457778899
##   34 | 11223355555666788999
##   35 | 02356677889
##   36 | 02344589
##   37 | 34468889
##   38 | 002367
##   39 | 00224569
##   40 | 1348
##   41 | 
##   42 | 
##   43 | 1
stem(samp_data, scale = 0.5)
## 
##   The decimal point is at the |
## 
##   -0 | 21
##   0 | 0123444
##   0 | 555566777778888888889999999
##   1 | 00000000000000001111112222223333333344444444
##   1 | 5555556666666667777777889
##   2 | 0001111222222333333344444444444
##   2 | 55555555555566666666666666666777777777777888888888888888888888888899+10
##   3 | 00000000000000000000000000000011111111111111122222222222222222222222+34
##   3 | 5555555555555555566666666666667777778888888889999999
##   4 | 00000013