x <- 2
y <- 6
total <- x + y  # named "total" so it does not reuse the name of the base function sum()
print(total)
## [1] 8
even <- seq(2, 100, by = 2)
print(even)
## [1] 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
## [20] 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76
## [39] 78 80 82 84 86 88 90 92 94 96 98 100
threes <- seq(1, 300, by = 3)
print(threes)
## [1] 1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52
## [19] 55 58 61 64 67 70 73 76 79 82 85 88 91 94 97 100 103 106
## [37] 109 112 115 118 121 124 127 130 133 136 139 142 145 148 151 154 157 160
## [55] 163 166 169 172 175 178 181 184 187 190 193 196 199 202 205 208 211 214
## [73] 217 220 223 226 229 232 235 238 241 244 247 250 253 256 259 262 265 268
## [91] 271 274 277 280 283 286 289 292 295 298
x <- seq(0, 2, length.out = 100)
print(x)
## [1] 0.00000000 0.02020202 0.04040404 0.06060606 0.08080808 0.10101010
## [7] 0.12121212 0.14141414 0.16161616 0.18181818 0.20202020 0.22222222
## [13] 0.24242424 0.26262626 0.28282828 0.30303030 0.32323232 0.34343434
## [19] 0.36363636 0.38383838 0.40404040 0.42424242 0.44444444 0.46464646
## [25] 0.48484848 0.50505051 0.52525253 0.54545455 0.56565657 0.58585859
## [31] 0.60606061 0.62626263 0.64646465 0.66666667 0.68686869 0.70707071
## [37] 0.72727273 0.74747475 0.76767677 0.78787879 0.80808081 0.82828283
## [43] 0.84848485 0.86868687 0.88888889 0.90909091 0.92929293 0.94949495
## [49] 0.96969697 0.98989899 1.01010101 1.03030303 1.05050505 1.07070707
## [55] 1.09090909 1.11111111 1.13131313 1.15151515 1.17171717 1.19191919
## [61] 1.21212121 1.23232323 1.25252525 1.27272727 1.29292929 1.31313131
## [67] 1.33333333 1.35353535 1.37373737 1.39393939 1.41414141 1.43434343
## [73] 1.45454545 1.47474747 1.49494949 1.51515152 1.53535354 1.55555556
## [79] 1.57575758 1.59595960 1.61616162 1.63636364 1.65656566 1.67676768
## [85] 1.69696970 1.71717172 1.73737374 1.75757576 1.77777778 1.79797980
## [91] 1.81818182 1.83838384 1.85858586 1.87878788 1.89898990 1.91919192
## [97] 1.93939394 1.95959596 1.97979798 2.00000000
x <- 2
square <- function(x) {
  # Multiply x by itself and return the result
  square2 <- x * x
  return(square2)
}
print(square(x))
## [1] 4
plot(sin, 0, 10)
plot(cos, 0, 10)
x <- 0.2
square <- function(x) {
  # Multiply x by itself and return the result
  square2 <- x * x
  return(square2)
}
print(square(x))
## [1] 0.04
The dataset ‘iris’ is one of the built-in datasets that ship with R. Use the ‘data’ command to load the iris dataset.
data("iris")
The standard way of storing data in R is a two-dimensional data frame. The columns correspond to the variables and the rows correspond to observations. To get a sense of what the data frame looks like, we use the ‘head’ command. Apply the head command to the data frame iris.
head(iris, n = 5)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
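The row-and-column layout described above can be checked directly. The following is a minimal sketch using only base R; the variable names `n_obs`, `n_vars`, and `col_types` are chosen here for illustration and are not part of the exercise.

```r
# Inspect the shape and column types of the built-in iris data frame.
data("iris")

n_obs     <- nrow(iris)          # number of observations (rows)
n_vars    <- ncol(iris)          # number of variables (columns)
col_types <- sapply(iris, class) # class of each column

print(c(rows = n_obs, columns = n_vars))
print(col_types)

# str() gives a compact overview: dimensions, column types, first values.
str(iris)
```

Four of the columns are numeric measurements and the fifth, Species, is a factor, which is why ‘summary’ later reports counts rather than quartiles for it.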
What are the variable/column names in this dataframe?
There is an easy way to find all the column names of a dataframe, using the ‘colnames’ command. Now, apply that command to the iris dataset.
colnames(iris)
## [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
We will now work with the data. The first step is to isolate a column/variable, which we do using the ‘$’ operator. Assign the values of the column “Petal.Length” to x.
x <- iris$Petal.Length
Note that x is now a numeric vector (not an R list) containing the petal lengths of all 150 flowers across the three species of iris.
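To see what kind of object the ‘$’ operator returns, the short sketch below inspects x and compares it with the equivalent double-bracket extraction. The name `x_alt` is introduced here only for the comparison.

```r
data("iris")

x     <- iris$Petal.Length       # extraction with the $ operator
x_alt <- iris[["Petal.Length"]]  # equivalent extraction by column name

print(class(x))             # type of the extracted column
print(is.list(x))           # whether x is an R list
print(length(x))            # one value per observation
print(identical(x, x_alt))  # both forms return the same vector
```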
Let us compute the summary statistics for x: the minimum, the maximum, the first, second, and third quartiles, and the mean.
min(x)
## [1] 1
max(x)
## [1] 6.9
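# The argument is spelled "probs"; a misspelled name is silently ignored
# and quantile() then returns all five default quantiles.
firstQ <- quantile(x, probs = 0.25)
secondQ <- quantile(x, probs = 0.50)
thirdQ <- quantile(x, probs = 0.75)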
mean(x)
## [1] 3.758
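The three quartiles can also be requested in a single call by passing a vector to the ‘probs’ argument. This is a small sketch; the name `quartiles` is illustrative.

```r
data("iris")
x <- iris$Petal.Length

# Request the first, second, and third quartiles in one call.
quartiles <- quantile(x, probs = c(0.25, 0.50, 0.75))
print(quartiles)

# The second quartile is the median, so these two values should agree.
print(median(x))
```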
There is an easy way to summarise these for the entire data frame using the ‘summary’ command. Use this command and compare the values that you manually computed above to the output of the ‘summary’ command.
summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
##        Species
##  setosa    :50
##  versicolor:50
##  virginica :50
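One way to make the comparison concrete is to place the manually computed values next to the output of ‘summary’ for the same column. The sketch below does this for Petal.Length; the names `manual` and `from_summary` are illustrative.

```r
data("iris")
x <- iris$Petal.Length

# Manually computed statistics, in the order used by summary().
manual <- c(
  "Min."    = min(x),
  "1st Qu." = unname(quantile(x, probs = 0.25)),
  "Median"  = median(x),
  "Mean"    = mean(x),
  "3rd Qu." = unname(quantile(x, probs = 0.75)),
  "Max."    = max(x)
)

# The same statistics as reported by summary() for a numeric vector.
from_summary <- summary(x)

# Show both side by side; the two rows should agree.
print(rbind(manual = manual, summary = as.numeric(from_summary)))
```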
We want to see if there is any relationship between the variables. To do so, it is important that we separate the data by species. Use the ‘filter’ command (from the library ‘dplyr’) to create three new data frames called “setosa”, “versicolor”, and “virginica”. Use the ‘head’ command to view the first few rows of these data frames.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
setosa <- filter(iris, Species == "setosa")
versicolor <- filter(iris, Species == "versicolor")
virginica <- filter(iris, Species == "virginica")
head(setosa)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
head(versicolor)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1          7.0         3.2          4.7         1.4 versicolor
## 2          6.4         3.2          4.5         1.5 versicolor
## 3          6.9         3.1          4.9         1.5 versicolor
## 4          5.5         2.3          4.0         1.3 versicolor
## 5          6.5         2.8          4.6         1.5 versicolor
## 6          5.7         2.8          4.5         1.3 versicolor
head(virginica)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 1          6.3         3.3          6.0         2.5 virginica
## 2          5.8         2.7          5.1         1.9 virginica
## 3          7.1         3.0          5.9         2.1 virginica
## 4          6.3         2.9          5.6         1.8 virginica
## 5          6.5         3.0          5.8         2.2 virginica
## 6          7.6         3.0          6.6         2.1 virginica
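As a possible next step toward examining relationships within each species, the sketch below uses base R's ‘split’ as an alternative to the three ‘filter’ calls, then computes one correlation per species. The choice of Petal.Length and Petal.Width, and the names `by_species` and `petal_cor`, are assumptions made for illustration; the exercise does not prescribe them.

```r
data("iris")

# Base-R alternative to three filter() calls: a named list with one
# data frame per species.
by_species <- split(iris, iris$Species)
print(sapply(by_species, nrow))  # number of rows in each subset

# Correlation between petal length and petal width within each species.
petal_cor <- sapply(by_species, function(df) {
  cor(df$Petal.Length, df$Petal.Width)
})
print(petal_cor)
```

Unlike the ‘filter’ results shown above, the data frames returned by ‘split’ keep the original row names, so the versicolor subset starts at row 51 rather than 1.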
Consider the following sample data:
samp_data <- c(1.54,0.56,1.67,1.54,1.30,1.15,1.59,-0.22,1.01,1.03,1.02,1.77,1.13,0.15,1.02,
0.27,1.40,0.98,1.12,1.12,0.83,0.79,1.29,0.35,1.24,1.35,0.82,1.63,1.34,0.96,
1.37,0.14,1.43,2.00,1.13,1.39,1.18,0.99,0.65,0.88,1.11,1.63,1.53,0.91,0.02,
0.84,1.54,1.21,1.56,1.02,0.87,0.63,1.68,0.71,1.43,0.42,0.80,1.40,1.45,1.34,
1.57,0.98,0.99,1.04,1.15,0.47,0.82,0.89,0.78,1.02,1.56,1.67,1.32,1.23,0.46,
-0.06,0.90,0.94,1.60,0.52,1.04,1.79,0.66,0.76,1.47,0.68,1.38,0.35,1.30,1.30,
0.89,1.62,0.66,0.97,1.27,1.03,1.12,0.48,0.97,0.77,3.00,3.30,2.73,3.21,3.92,
3.33,2.25,2.58,3.45,2.86,3.01,3.27,2.65,3.48,3.18,2.01,3.42,2.13,3.15,2.77,
2.98,3.18,3.20,2.96,2.79,3.15,3.42,2.16,3.41,3.31,3.52,3.28,3.90,3.25,2.91,
2.98,3.25,3.02,2.52,2.97,2.31,3.30,2.50,3.19,2.46,2.82,2.83,2.48,2.55,3.18,
3.56,3.69,3.04,2.91,2.94,2.81,2.93,3.11,2.28,2.63,3.37,3.49,3.08,3.49,2.49,
2.89,3.15,3.21,2.79,3.46,3.56,3.33,3.15,2.18,3.08,2.24,4.31,3.23,3.35,3.18,
3.08,3.79,2.44,3.04,2.36,2.23,3.55,2.70,3.01,3.48,3.68,3.37,3.01,3.18,2.13,
3.39,3.37,2.86,3.34,3.16,2.54,3.58,3.63,2.43,3.94,2.80,2.41,2.25,2.87,3.17,
2.07,2.63,2.99,3.21,2.57,2.93,3.32,2.49,1.89,2.98,2.83,3.80,2.92,2.90,3.00,
3.10,2.41,2.66,2.77,2.69,2.99,2.50,3.21,2.98,3.45,3.83,3.00,2.54,3.86,2.80,
2.83,2.97,2.36,2.88,3.05,2.89,1.71,3.64,2.49,3.45,3.41,3.01,2.57,3.87,3.23,
2.57,2.56,2.85,2.83,3.19,3.15,3.17,2.90,3.29,3.46,2.43,3.04,2.81,3.28,2.82,
3.00,3.78,3.16,3.96,4.03,3.57,3.95,3.20,2.41,2.58,2.79,3.23,2.77,2.97,3.34,
3.13,3.78,2.70,3.14,3.25,2.26,3.99,3.10,3.76,2.92,2.69,3.58,3.11,3.59,2.74,
3.82,2.07,2.81,1.74,3.45,3.38,3.74,2.79,2.71,2.83,2.74,1.58,2.94,3.53,3.01,
2.21,2.61,2.95,2.58,3.92,2.68,2.85,3.06,2.86,3.43,3.16,3.02,2.29,3.74,3.49,
4.04,1.66,2.98,1.70,3.80,3.47,3.26,2.55,2.92,3.46,3.28,3.31,2.32,2.78,4.08,
3.78,2.49,3.26,2.88,3.23,3.15,2.62,3.73,3.03,2.78,3.00,2.61,2.48,2.03,3.04,
3.64,3.43,3.27,2.41,3.60,2.75,2.94,3.23,3.22,2.81,2.72,3.62,2.94,3.17,3.11,
2.60,3.11,3.45,2.37,3.07,3.09,3.90,2.86,3.23,3.50,2.84,2.17,2.58,3.57,2.95,
2.56,2.42,3.38,3.28,3.19,2.77,3.27,4.01,3.65,3.39)
Use the ‘hist’ function to visualize samp_data. Furthermore, try different values for the “breaks” parameter, which sets the approximate number of bins, to explore the shape of the data.
hist(samp_data, breaks = 10)
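Several values of “breaks” can be compared side by side. The sketch below is self-contained, so it uses a simulated two-cluster sample called `demo_data` as a hypothetical stand-in for samp_data; the simulation parameters and the candidate values in `breaks_to_try` are assumptions. Replace `demo_data` with `samp_data` to apply it to the real data.

```r
# Hypothetical stand-in for samp_data: a simulated two-cluster sample.
set.seed(1)
demo_data <- c(rnorm(100, mean = 1, sd = 0.45),
               rnorm(300, mean = 3, sd = 0.5))

breaks_to_try <- c(5, 10, 30, 60)

# Draw the four histograms in a 2 x 2 grid, then restore the old layout.
old_par <- par(mfrow = c(2, 2))
for (b in breaks_to_try) {
  hist(demo_data, breaks = b, main = paste("breaks =", b), xlab = "value")
}
par(old_par)

# A single number passed to "breaks" is only a suggestion, so check how
# many bins hist() actually used for each requested value.
actual_bins <- sapply(breaks_to_try, function(b) {
  length(hist(demo_data, breaks = b, plot = FALSE)$counts)
})
print(data.frame(requested = breaks_to_try, actual = actual_bins))
```

Because ‘hist’ rounds the breakpoints to convenient values, the number of bins drawn may differ from the number requested.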
What are some good values for the “breaks” parameter? Why?
Is the sample mean a good measure of location for this dataset? Why?
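One way to explore this question is to compare the overall mean and median with the centre of each cluster. The sketch below again uses a simulated two-cluster sample (`demo_data`, with a known `cluster` label) as a hypothetical stand-in, because the real samp_data carries no cluster labels.

```r
# Hypothetical stand-in: 100 values near 1 and 300 values near 3.
set.seed(1)
demo_data <- c(rnorm(100, mean = 1, sd = 0.45),
               rnorm(300, mean = 3, sd = 0.5))
cluster <- rep(c("low", "high"), times = c(100, 300))

overall_mean   <- mean(demo_data)
overall_median <- median(demo_data)
cluster_means  <- tapply(demo_data, cluster, mean)  # one mean per cluster

print(c(mean = overall_mean, median = overall_median))
print(cluster_means)
```

In a sample like this the overall mean lies between the two cluster means, so it is worth asking whether a single number describes either group well.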
Now use the ‘stem’ function to construct a stem-and-leaf plot for this dataset.
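#Values of about 10 or larger tend to work better than 5 for "breaks": wide bins can merge the two clusters in the data, while narrower bins show them as separate peaks
#I would say no, because the data form two clusters (near 1 and near 3), so the sample mean lies between them and is typical of neither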
stem(samp_data)
##
## The decimal point is 1 digit(s) to the left of the |
##
## -2 | 2
## -0 | 6
## 0 | 245
## 2 | 755
## 4 | 267826
## 6 | 3566816789
## 8 | 0223478990146778899
## 10 | 122223344122233558
## 12 | 134790002445789
## 14 | 003357344466789
## 16 | 0233677801479
## 18 | 9
## 20 | 0137733678
## 22 | 1345568912667
## 24 | 1111233468899990024455667778888
## 26 | 01123356899001234457777889999
## 28 | 001111223333345566667889900112223344445567778888899
## 30 | 0000011111223444456788890011113455555566677788888999
## 32 | 0011112333333555667778888900112334457778899
## 34 | 1122335555566678899902356677889
## 36 | 0234458934468889
## 38 | 00236700224569
## 40 | 1348
## 42 | 1
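Do you notice anything odd? (It might help to construct the stem plot by hand for the first few values of this dataset.) Your answer: Yes. The stems increase in steps of two (-2, -0, 0, 2, 4, ...), so each row combines the leaves of two adjacent stems. For example, 0.02, 0.14, and 0.15 all appear on the row "0 | 245".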
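The by-hand construction suggested above can be sketched in a few lines. This is an illustration only, not how ‘stem’ is implemented: it assumes the convention stated in the plot header (the decimal point is one digit to the left of the bar), uses only the first few positive values of samp_data, and the names `by_hand`, `paired_stems`, and `merged` are chosen here.

```r
# First few positive values of samp_data, copied from the listing above.
# Negative values are left out of this sketch because their stems need
# extra care.
first_values <- c(1.54, 0.56, 1.67, 1.54, 1.30, 1.15, 1.59)

# Work in hundredths so the last digit is the leaf and the rest is the stem.
hundredths <- sort(round(first_values * 100))
stems  <- hundredths %/% 10  # e.g. 154 -> stem 15
leaves <- hundredths %% 10   # e.g. 154 -> leaf 4

# One row per stem, as in a hand-drawn plot.
by_hand <- sapply(split(leaves, stems), paste, collapse = "")
print(by_hand)

# One row per pair of stems (14 and 15 share a row, 16 and 17 share a row).
paired_stems <- 2 * (stems %/% 2)
merged <- sapply(split(leaves, paired_stems), paste, collapse = "")
print(merged)
```

In the hand-built version 1.54 sits on stem 15, whereas the default output above has no row labelled 15; pairing the stems places it on the row labelled 14, which appears to match how the default plot groups its rows.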
The default value of the “scale” parameter is 1. Produce the stem plots with the scale parameter set to 2 and to 0.5.
stem(samp_data, scale = 2)
##
## The decimal point is 1 digit(s) to the left of the |
##
## -2 | 2
## -1 |
## -0 | 6
## 0 | 2
## 1 | 45
## 2 | 7
## 3 | 55
## 4 | 2678
## 5 | 26
## 6 | 35668
## 7 | 16789
## 8 | 022347899
## 9 | 0146778899
## 10 | 122223344
## 11 | 122233558
## 12 | 13479
## 13 | 0002445789
## 14 | 003357
## 15 | 344466789
## 16 | 02336778
## 17 | 01479
## 18 | 9
## 19 |
## 20 | 01377
## 21 | 33678
## 22 | 13455689
## 23 | 12667
## 24 | 111123346889999
## 25 | 0024455667778888
## 26 | 01123356899
## 27 | 001234457777889999
## 28 | 0011112233333455666678899
## 29 | 00112223344445567778888899
## 30 | 000001111122344445678889
## 31 | 0011113455555566677788888999
## 32 | 00111123333335556677788889
## 33 | 00112334457778899
## 34 | 11223355555666788999
## 35 | 02356677889
## 36 | 02344589
## 37 | 34468889
## 38 | 002367
## 39 | 00224569
## 40 | 1348
## 41 |
## 42 |
## 43 | 1
stem(samp_data, scale = 0.5)
##
## The decimal point is at the |
##
## -0 | 21
## 0 | 0123444
## 0 | 555566777778888888889999999
## 1 | 00000000000000001111112222223333333344444444
## 1 | 5555556666666667777777889
## 2 | 0001111222222333333344444444444
## 2 | 55555555555566666666666666666777777777777888888888888888888888888899+10
## 3 | 00000000000000000000000000000011111111111111122222222222222222222222+34
## 3 | 5555555555555555566666666666667777778888888889999999
## 4 | 00000013