In today’s lab, the objectives are to
The R language is structured around a set of recognized objects. Objects include a wide range of things, such as functions, vectors, matrices, arrays, and lists.
A function can be thought of as a specialized tool; each function is a bundle of code designed to accomplish a specific task.
Objects such as vectors, data frames, matrices, arrays, and lists can be thought of as containers for storing different types of data.
Today, we will learn about functions, vectors, and data frames, the primary objects that we will work with throughout the semester.
A vector is a one-dimensional object that stores a collection of data points. Data points in a vector can be numbers, letters, words, etc., but they must all be of the same type.
For example, we can concatenate (i.e., combine) several different numbers into a vector that we will call x using the assignment operator <-. The assignment operator is like an arrow indicating what we’d like to store where.
In R coding, it is conventional to take the thing on the right hand side of the expression and store it in the thing on the left hand side, like this:
x <- c(3,5,12,38,456)
x
## [1] 3 5 12 38 456
However, the arrow can also point in the other direction.
c('hello','world') -> y
y
## [1] "hello" "world"
Either way, the arrow must point toward the object that you are creating.
In R, a function is a named set of code that performs a specific task. You can perform that task by typing the function and (where applicable) specifying criteria (arguments) necessary for completing the desired task. Those arguments are typed within the parentheses.
The function str() tells you the structure of a specified object. Here, structure refers to data type. The required argument here is the name of the object you want to know the structure of. Let’s try that on objects x and y, which we created above:
str(x)
## num [1:5] 3 5 12 38 456
str(y)
## chr [1:2] "hello" "world"
This tells us that the structure of x is num (numerical), and the structure of y is chr (character). Numerical objects are objects that contain only numbers. Character objects are interpreted by R as text, and you cannot perform mathematical operations on them.
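If you only want the data type, without the rest of the str() summary, one option is the base R function class(). The short sketch below recreates x and y so that it runs on its own, and it also uses is.numeric(), is.character(), and length(), which are base R functions not otherwise covered in this lab:

```r
# Recreate the two vectors from above
x <- c(3, 5, 12, 38, 456)
y <- c('hello', 'world')

class(x)          # "numeric"
class(y)          # "character"

is.numeric(x)     # TRUE
is.character(x)   # FALSE
length(x)         # 5 items in the vector
```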
In R, subsetting a data object means selecting only a subset of data stored in that object. Vectors (and all other container objects) can be subset using indexing. In R, indexing refers to the position of a data point within the object.
In the case of vectors, you can use indexing to look up a specific item based on its position in the vector (1st item, 2nd item, 3rd item, etc.). For example, we can look at the 1st item in vector x in this way:
x[1]
## [1] 3
Here, you’re telling R that you want to subset x by typing square brackets after x, and you’re telling R that you want to look at the first item by placing a 1 within those brackets.
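The same square brackets can hold more than one position. As a brief sketch (x is recreated here so the example runs on its own), you can pass several positions with c(), a range with a colon, or a negative number to leave an item out:

```r
x <- c(3, 5, 12, 38, 456)

x[c(1, 3)]    # items 1 and 3: 3 12
x[2:4]        # items 2 through 4: 5 12 38
x[-1]         # everything except item 1: 5 12 38 456
x[length(x)]  # the last item: 456
```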
For numeric data, R can be used as a calculator. It’s simple enough to just write the expression as you would in a calculator.
4 * 5 + 72
## [1] 92
But we can also work with numeric data that are stored in a vector. For example, we can add the first two values in x a couple different ways:
x[1] + x[2]
## [1] 8
sum(x[1:2])
## [1] 8
The first way (x[1] + x[2]) asks R to add items 1 and 2 together, while the second option (sum(x[1:2])) asks R to compute the sum of items 1 through 2. In this case, they produce the same result.
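Arithmetic also works on a whole numeric vector at once. The sketch below (with x recreated so it is self-contained) shows that an operation such as multiplying by 2 is applied to every item, and that sum() and mean() can take the full vector without any indexing:

```r
x <- c(3, 5, 12, 38, 456)

x * 2      # doubles every item: 6 10 24 76 912
x + 1      # adds 1 to every item: 4 6 13 39 457
sum(x)     # 514
mean(x)    # 102.8
```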
Notice that this doesn’t work for character (text) data:
y[1] + y[2]
## Error in y[1] + y[2]: non-numeric argument to binary operator
Objects like the vectors x and y can be appended (i.e., combined) and overwritten.
To combine vectors, use the combine function, c(). To overwrite any object, simply use the same object name but tell R to fill it with something else.
Let’s try this out. We will first look at x, then combine x and y, overwriting the old x, then look at x again to see how it has changed.
x
## [1] 3 5 12 38 456
x <- c(x,y)
x
## [1] "3" "5" "12" "38" "456" "hello" "world"
Notice that numbers contained in x are now enclosed in quotes. That is always a giveaway that R is treating those items as characters, even if they appear numerical to you. Use str() to see how R now recognizes the elements in this vector.
str(x)
## chr [1:7] "3" "5" "12" "38" "456" "hello" "world"
sum(x[1:2])
## Error in sum(x[1:2]): invalid 'type' (character) of argument
Vector x is now interpreted by R as a character-type object, and we can no longer add the first two items together, even though they look like numbers. Why?
As mentioned above, vectors can have numerical or character structure but not both. When you combine vectors with these two different structures, R converts the numeric data to characters because numeric data can be expressed as text, but text can’t be expressed as numbers. When this happens, R can no longer do any computation on those numbers. Hence the error message above.
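If you do need the numbers back, one possible approach is the base R function as.numeric(), which converts text that looks like a number into a real number. This is only a sketch: the names nums and bad are made up for illustration, and suppressWarnings() is used just to keep the output quiet.

```r
x <- c("3", "5", "12", "38", "456", "hello", "world")

# Convert only the items that look like numbers (positions 1 to 5)
nums <- as.numeric(x[1:5])
str(nums)        # num [1:5] 3 5 12 38 456
sum(nums[1:2])   # 8

# Text that is not a number becomes NA (R would normally print a warning)
bad <- suppressWarnings(as.numeric(x[6]))
is.na(bad)       # TRUE
```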
Remember: any time that R produces an error message, if you don’t understand what went wrong, you can find out by copying and pasting the error message into a web search engine. Try pasting “invalid ‘type’ (character) of argument” into a search engine and see what comes up.
The list of objects in your global environment is the set of objects that you have created. When you exit RStudio, it will ask if you want to save your current workspace image. If you say “no,” the objects currently in your global environment will not be there when you next open RStudio, and you will have to create them again. If you say “yes,” all of those objects will still be in your environment next time that you open RStudio.
Note: You can create a project (File > New Project…), which stores a specific global environment, and each time that you switch between projects your global environment will be populated with the objects previously generated for that project. We do not require that you create RStudio projects for this course, but projects are one way to efficiently manage what’s in your global environment if you’re working on different assignments/tasks.
You will find that objects pile up quickly in your global environment, particularly when you are in the exploratory phase of coding. You can clear out all of the objects in your environment by clicking on the broom icon above the list of objects. To remove a specific object from the working environment, use the remove function, rm():
rm(x)
x
## Error in eval(expr, envir, enclos): object 'x' not found
Here, we have removed x from our global environment. Notice that it is no longer in the list of objects, and when we try to call it up R lets us know that it doesn’t exist anymore (“object ‘x’ not found”).
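You can also check what is in your environment from the console. The sketch below uses the base R functions ls(), which lists object names, and exists(), which tests for a single name; the inherits = FALSE argument is added here so that only the current environment is searched.

```r
x <- c(3, 5, 12, 38, 456)
y <- c('hello', 'world')

exists("x", inherits = FALSE)   # TRUE: x is in the environment
rm(x)
exists("x", inherits = FALSE)   # FALSE: x has been removed
"y" %in% ls()                   # TRUE: y is still listed

# rm(list = ls()) would remove every listed object at once,
# which is roughly what the broom icon does
```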
In R, a package is like a tool kit: it is a set of functions with a common theme. To install a package, we use the install.packages() function. Let’s try this with a package called mosaic. The line below is not in a code chunk in this R Markdown document, so copy and paste it into your console and hit “enter”:
install.packages("mosaic")
Notice that the name of the package must be in quotes. Every time that you start a new session in R, you must tell R which packages you want to use. There are a couple of ways you can do that, but let’s use the require() function:
require(mosaic)
## Warning in register(): Can't find generic `scale_type` in package ggplot2 to
## register S3 method.
Notice that when loading the package we no longer need quotes. That is because require() (like library()) accepts the package name either with or without quotes, whereas install.packages() needs the name as a quoted character string.
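The other common way to load a package is library(). One practical difference, sketched below, is what happens when the package is missing: require() returns FALSE and carries on, while library() stops with an error. The package name notARealPackage123 is made up purely to trigger the failure case, and try() and suppressWarnings() are used only so that the example runs to the end.

```r
# "notARealPackage123" is a made-up name used only to show the failure case
ok <- suppressWarnings(
  require("notARealPackage123", character.only = TRUE, quietly = TRUE)
)
ok   # FALSE: require() reports the problem but keeps going

# library() stops with an error instead, which try() captures here
result <- try(library("notARealPackage123", character.only = TRUE), silent = TRUE)
inherits(result, "try-error")   # TRUE

# A package that ships with R, such as stats, loads without trouble
require(stats)   # returns TRUE (invisibly)
```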
But what does the mosaic package actually do? Here’s one way to find out:
?mosaic
This calls up a description in the help section of RStudio. You can use this method (question mark followed by the name) to look up information on any package or function that is in your environment, and in the case of functions the description will tell you which arguments are required, which are optional, and what they mean, and it will often include examples.
Many packages come with example data, usually so that users can test-run functions or do a vignette/walk through using example data. For example, the function data(name-of-data-set) will load a data set into the environment.
Each dataset usually includes some kind of narrative in the help associated with the dataset. You can use the same help query to find out more about datasets. Let’s try this with the dataset Births78, a sample dataset that comes with the mosaic package.
?Births78
Now bring the dataset into the global environment:
data(Births78)
We can visualize these data using the plot() function. The basic formula for this is plot(y~x,data), where y refers to the variable you want on the y-axis, x refers to the variable you want on the x-axis, and data is the dataset you are pulling these from. Here, x, y, and data are required, but we can add other arguments to make our plot look pretty:
plot(births~day_of_year,data=Births78,pch=16,col='blue')
Note: the argument “pch” simply defines what type of symbol you want to use in your plot, and “col” defines the color. For more on this, see: https://www.statmethods.net/advgraphs/parameters.html .
What do you think accounts for the “two” different patterns in the data?
Let’s plot the data again, but this time color code the dots by day of the week (col=wday). Here, we’re also adding a legend:
plot(births~day_of_year,data=Births78,pch=16,col=wday)
legend('top',horiz=TRUE,inset=c(-0.1,-0.2),legend=levels(Births78$wday),
       pch=16,col=unique(Births78$wday),xpd=TRUE)
Don’t worry about the specifications here, as you won’t be required to learn this level of technical detail for the class.
Seeing the data color-coded in this way, what do you think the two “waves” of data points represent?
Often, datasets are stored as a special type of object in R called a data frame. A data frame is a set of vectors, each of which can be of a different type of data (num, chr, etc.), but all vectors must be of the same length. Each column represents a variable/attribute, and each row represents an observation/individual.
We will use this convention for managing data throughout the course. Let’s ask for the structure of the Births78 data set.
str(Births78)
## 'data.frame': 365 obs. of 8 variables:
## $ date : Date, format: "1978-01-01" "1978-01-02" ...
## $ births : int 7701 7527 8825 8859 9043 9208 8084 7611 9172 9089 ...
## $ wday : Ord.factor w/ 7 levels "Sun"<"Mon"<"Tue"<..: 1 2 3 4 5 6 7 1 2 3 ...
## $ year : num 1978 1978 1978 1978 1978 ...
## $ month : num 1 1 1 1 1 1 1 1 1 1 ...
## $ day_of_year : int 1 2 3 4 5 6 7 8 9 10 ...
## $ day_of_month: int 1 2 3 4 5 6 7 8 9 10 ...
## $ day_of_week : num 1 2 3 4 5 6 7 1 2 3 ...
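This time, str() told us not only the structure of the object (“data.frame”), but also the structure of each of the variables contained in that data frame. For example, births is stored as integers (int), year as numbers (num), and wday as an ordered factor (Ord.factor) with 7 levels.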
We have also been provided with information on how many rows/observations Births78 contains (365), as well as how many columns/variables (8).
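To see how a data frame is assembled from equal-length vectors, here is a minimal sketch that builds one by hand with the base R function data.frame(). The object pets and its contents are made up for illustration; nrow(), ncol(), and dim() are base R functions that report the same counts shown in the first line of the str() output.

```r
# A small made-up data frame: three vectors, each of length 4
pets <- data.frame(
  name   = c("Rex", "Tom", "Ivy", "Sam"),
  weight = c(30.5, 4.2, 3.8, 12.0),
  age    = c(5L, 3L, 7L, 2L)
)

str(pets)    # 4 obs. of 3 variables
nrow(pets)   # 4 rows (observations)
ncol(pets)   # 3 columns (variables)
dim(pets)    # 4 3
```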
We can preview the dataset by clicking on its name in the list of objects, which will pull it up in a separate tab within RStudio. Alternatively, we can simply type the name of the data set at the console prompt.
This isn’t always the easiest way to see the data, however. Often there are too many samples or too many variables to fit nicely on the console output.
We can also just peek at the first few samples or the last few samples using the functions head() and tail(), respectively.
head(Births78)
## date births wday year month day_of_year day_of_month day_of_week
## 1 1978-01-01 7701 Sun 1978 1 1 1 1
## 2 1978-01-02 7527 Mon 1978 1 2 2 2
## 3 1978-01-03 8825 Tue 1978 1 3 3 3
## 4 1978-01-04 8859 Wed 1978 1 4 4 4
## 5 1978-01-05 9043 Thu 1978 1 5 5 5
## 6 1978-01-06 9208 Fri 1978 1 6 6 6
tail(Births78)
## date births wday year month day_of_year day_of_month day_of_week
## 360 1978-12-26 8902 Tue 1978 12 360 26 3
## 361 1978-12-27 9907 Wed 1978 12 361 27 4
## 362 1978-12-28 10177 Thu 1978 12 362 28 5
## 363 1978-12-29 10401 Fri 1978 12 363 29 6
## 364 1978-12-30 8474 Sat 1978 12 364 30 7
## 365 1978-12-31 8028 Sun 1978 12 365 31 1
We can also use indexing to subset data frames. This is similar to what we did with vectors, but now we’re working in two dimensions. To subset a data frame, you indicate both row(s) and column(s), in that order (dataframe[row,column]).
For example, maybe you just want to find out what the first date on record was. Date is the first variable, so you would look it up this way:
Births78[1,1]
## [1] "1978-01-01"
Let’s say you want to look at all data for the 50th observation (i.e., row). To do so, specify row but leave column blank:
Births78[50,]
## date births wday year month day_of_year day_of_month day_of_week
## 50 1978-02-19 7695 Sun 1978 2 50 19 1
To look at all observations for a given variable, specify column but leave row blank:
Births78[,2]
## [1] 7701 7527 8825 8859 9043 9208 8084 7611 9172 9089 9210 9259
## [13] 9138 8299 7771 9458 9339 9120 9226 9305 7954 7560 9252 9416
## [25] 9090 9387 8983 7946 7527 9184 9152 9159 9218 9167 8065 7804
## [37] 9225 9328 9139 9247 9527 8144 7950 8966 9859 9285 9103 9238
## [49] 8167 7695 9021 9252 9335 9268 9552 8313 7881 9262 9705 9132
## [61] 9304 9431 8008 7791 9294 9573 9212 9218 9583 8144 7870 9022
## [73] 9525 9284 9327 9480 7965 7729 9135 9663 9307 9159 9157 7874
## [85] 7589 9100 9293 9195 8902 9318 8069 7691 9114 9439 8852 8969
## [97] 9077 7890 7445 8870 9023 8606 8724 9012 7527 7193 8702 9205
## [109] 8720 8582 8892 7787 7304 9017 9077 9019 8839 9047 7750 7135
## [121] 8900 9422 9051 8672 9101 7718 7388 8987 9307 9273 8903 8975
## [133] 7762 7382 9195 9200 8913 9044 9000 8064 7570 9089 9210 9196
## [145] 9180 9514 8005 7781 7780 9630 9600 9435 9303 7971 7399 9127
## [157] 9606 9328 9075 9362 8040 7581 9201 9264 9216 9175 9350 8233
## [169] 7777 9543 9672 9266 9405 9598 8122 8091 9348 9857 9701 9630
## [181] 10080 8209 7976 9284 8433 9675 10184 10241 8773 8102 9877 9852
## [193] 9705 9984 10438 8859 8416 10026 10357 10015 10386 10332 9062 8563
## [205] 9960 10349 10091 10192 10307 8677 8486 9890 10145 9824 10128 10051
## [217] 8738 8442 10206 10442 10142 10284 10162 8951 8532 10127 10502 10053
## [229] 10377 10355 8904 8477 9967 10229 9900 10152 10173 8782 8453 9998
## [241] 10387 10063 9849 10114 8580 8355 8481 10023 10703 10292 10371 9023
## [253] 8630 10154 10425 10149 10265 10265 9170 8711 10304 10711 10488 10499
## [265] 10349 8735 8647 10414 10498 10344 10175 10368 8648 8686 9927 10378
## [277] 9928 9949 10052 8605 8377 9765 10351 9873 9824 9755 8554 7873
## [289] 9531 9938 9388 9502 9625 8411 7936 9425 9576 9328 9501 9537
## [301] 8415 8155 9457 9333 9321 9245 9774 8246 8011 9507 9769 9501
## [313] 9609 9652 8352 7967 9606 10014 9536 9568 9835 8432 7868 9592
## [325] 9950 9548 7915 9037 8275 8068 9825 9814 9438 9396 9592 8528
## [337] 8196 9767 9881 9402 9480 9398 8335 8093 9686 10063 9509 9524
## [349] 9951 8507 8172 10196 10605 9998 9398 9008 7939 7964 7846 8902
## [361] 9907 10177 10401 8474 8028
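The row and column positions can also be ranges or sets of positions, just as with vectors. The following is a small sketch using a made-up data frame (df, with columns id, score, and group) rather than Births78. The last line goes one step beyond this lab: it selects rows with a condition instead of a position.

```r
# Made-up data frame for illustration
df <- data.frame(
  id    = 1:5,
  score = c(90, 85, 70, 95, 60),
  group = c("a", "b", "a", "b", "a")
)

df[2, 2]             # row 2, column 2: 85
df[1:3, ]            # rows 1 to 3, all columns
df[, c(1, 3)]        # all rows, columns 1 and 3
df[df$score > 80, ]  # only the rows where score is above 80
```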
Another way to look up specific values is to use variable names. A quick way to find out what those are is with the names() function:
names(Births78)
## [1] "date" "births" "wday" "year" "month"
## [6] "day_of_year" "day_of_month" "day_of_week"
You can then look up all values for a particular variable….
Births78$wday
## [1] Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed
## [19] Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun
## [37] Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu
## [55] Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon
## [73] Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri
## [91] Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue
## [109] Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat
## [127] Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed
## [145] Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun
## [163] Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu
## [181] Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon
## [199] Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri
## [217] Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue
## [235] Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat
## [253] Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed
## [271] Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun
## [289] Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu
## [307] Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon
## [325] Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri
## [343] Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue
## [361] Wed Thu Fri Sat Sun
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
… or use indexing to look up a specific value.
Births78$wday[10]
## [1] Tue
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
Here, we’re using indexing to subset a specific column (wday). Because each column in a data frame is considered a vector, we only need to specify position along that vector.
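There are several equivalent ways to reach the same value by name. In this sketch, the small data frame df holds the first three day and births values copied from the head(Births78) output above; the object name df and the column name day are chosen here for illustration.

```r
df <- data.frame(
  day    = c("Sun", "Mon", "Tue"),
  births = c(7701, 7527, 8825)
)

df$births          # the whole column as a vector
df$births[2]       # 7527
df[2, "births"]    # 7527: row 2 of the column named births
df[["births"]][2]  # 7527: double brackets also pull out the column
```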
R is adept at data exchange, meaning that there are a lot of utilities for importing and exporting data in all sorts of formats. For example, see this list for importing from other statistical and statistical-like software formats:
https://www.statmethods.net/input/importingdata.html
and this one for exporting to those same formats:
https://www.statmethods.net/input/exportingdata.html
Today, we’ll get practice loading a .csv (comma separated values) file, and saving to a .csv file.
First, create a folder on your computer for this class.
On Canvas, download the classdata.2020.csv file and move it into your newly created folder. Make sure that your csv is in the same folder as your .Rmd file.
Now use the read.csv() function to import the file as a dataframe into R. R interprets letters literally. Any misspelling or case mismatches will result in an error. Make sure you type in exactly what that file name is inside the parentheses.
classdata.2020 <- read.csv("classdata.2020.csv")
To see if you successfully imported the file you can simply type the object name:
classdata.2020
## gender height wingspan shoe.size hair.color eye.color random.number bed.time
## 1 F 63 64.0 41.0 black brown 4 2330
## 2 M 67 70.0 39.0 brown brown 17 2400
## 3 F 66 69.0 41.0 brown brown 5 2300
## 4 M 75 78.0 45.0 brown green 21 2200
## 5 M 71 73.0 45.0 black black 24 2200
## 6 F 62 70.0 37.0 black black 1 2300
## 7 M 69 65.0 41.0 brown blue 22 2230
## 8 M 70 69.0 44.0 black brown 8 2230
## 9 F 65 67.0 40.0 blond blue 8 2300
## 10 F 65 63.0 39.0 blond blond 7 2000
## 11 M 69 70.0 42.0 black blue 21 600
## 12 F 70 70.0 41.0 brown brown 582 2200
## 13 F 72 72.0 40.0 brown hazel 9 2330
## 14 F 68 68.0 40.0 brown brown 4 2100
## 15 F 64 64.0 37.0 brown blue 24 2230
## 16 F 66 68.0 40.0 brown brown 7 2330
## 17 F 69 69.2 41.0 brown brown 477 2300
## 18 M 69 66.0 42.0 blond blue 23 2200
## 19 M 70 73.0 44.0 brown blue 10 200
## 20 M 71 73.0 47.5 brown green 49 2300
## 21 F 67 61.0 38.5 blond hazel 2 2200
## 22 M 75 74.5 46.0 brown hazel 59 2300
## 23 M 74 74.0 46.0 black brown 13 2200
## 24 F 68 54.0 40.0 brown brown 5 2300
## 25 F 63 53.5 39.0 black brown 9 100
## 26 M 70 64.0 44.5 brown brown 19 2230
## 27 F 61 61.5 37.0 brown brown 6 2030
## 28 M 69 69.0 42.0 black black 88 2300
## 29 F 60 59.0 36.0 brown brown 4 2300
## 30 F 60 59.0 36.0 brown brown 4 2300
## 31 M 67 69.0 42.0 black brown 15 2300
## 32 F 67 66.0 40.0 brown green 13 900
## 33 M 71 71.0 41.0 brown blue 17 2300
## 34 F 70 70.0 40.0 brown green 5 2330
## 35 F 67 67.5 41.0 black brown 7 2330
## wake.time hair.cut.cost dinner.drink recitation.number
## 1 500 50 water R1
## 2 830 28 water R1
## 3 900 60 water R1
## 4 600 35 water R1
## 5 600 0 water R1
## 6 600 15 water R1
## 7 645 29 water R1
## 8 800 15 water R1
## 9 700 50 water R1
## 10 600 200 water R1
## 11 620 30 Ice Tea R1
## 12 730 30 seltzer R1
## 13 700 25 water R1
## 14 600 25 water R1
## 15 620 30 water R1
## 16 700 50 water R1
## 17 600 70 water R2
## 18 700 30 water R2
## 19 800 0 prosecco R2
## 20 830 22 water R2
## 21 600 0 milk R2
## 22 900 30 water R2
## 23 700 20 water R2
## 24 500 40 milk R2
## 25 800 0 water R2
## 26 700 17 water R2
## 27 700 12 water R2
## 28 600 21 water R2
## 29 700 30 water R2
## 30 700 30 water R2
## 31 730 15 water R2
## 32 630 100 stella R2
## 33 630 16 water R2
## 34 530 40 milk R2
## 35 730 16 water R2
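Saving works in the opposite direction, with the base R function write.csv(). The following is a minimal sketch: the data frame scores is made up, and tempfile() supplies a throwaway file path so that the example can run anywhere. In the lab you would give a file name in your class folder instead.

```r
# Made-up data standing in for a real dataset
scores <- data.frame(
  student = c("A", "B", "C"),
  score   = c(88, 92, 75)
)

# tempfile() gives a throwaway path; in the lab you would use a file name
# in your class folder instead
path <- tempfile(fileext = ".csv")

# row.names = FALSE keeps R from adding an extra column of row numbers
write.csv(scores, path, row.names = FALSE)

# Read the file back in and check that it matches what we saved
scores2 <- read.csv(path)
scores2$score        # 88 92 75
file.exists(path)    # TRUE
```

Leaving out row.names = FALSE is a common source of confusion, because the saved file then gains an unnamed first column that shows up as a column named X when it is read back in.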
If you just want to take a peek at the file, you can use the function head() or tail(), which shows you the first or last six rows of the object, respectively.
head(classdata.2020)
## gender height wingspan shoe.size hair.color eye.color random.number bed.time
## 1 F 63 64 41 black brown 4 2330
## 2 M 67 70 39 brown brown 17 2400
## 3 F 66 69 41 brown brown 5 2300
## 4 M 75 78 45 brown green 21 2200
## 5 M 71 73 45 black black 24 2200
## 6 F 62 70 37 black black 1 2300
## wake.time hair.cut.cost dinner.drink recitation.number
## 1 500 50 water R1
## 2 830 28 water R1
## 3 900 60 water R1
## 4 600 35 water R1
## 5 600 0 water R1
## 6 600 15 water R1
tail(classdata.2020)
## gender height wingspan shoe.size hair.color eye.color random.number bed.time
## 30 F 60 59.0 36 brown brown 4 2300
## 31 M 67 69.0 42 black brown 15 2300
## 32 F 67 66.0 40 brown green 13 900
## 33 M 71 71.0 41 brown blue 17 2300
## 34 F 70 70.0 40 brown green 5 2330
## 35 F 67 67.5 41 black brown 7 2330
## wake.time hair.cut.cost dinner.drink recitation.number
## 30 700 30 water R2
## 31 730 15 water R2
## 32 630 100 stella R2
## 33 630 16 water R2
## 34 530 40 milk R2
## 35 730 16 water R2
The function str() gives you several details about the data frame. It tells you how many rows (observations, abbreviated obs.) and how many columns (variables) it contains. It also tells you the type of each variable: for example, character (chr, typically categorical data), integer (int, numerical), or sometimes factor (another way of storing categorical data).
str(classdata.2020)
## 'data.frame': 35 obs. of 12 variables:
## $ gender : chr "F" "M" "F" "M" ...
## $ height : int 63 67 66 75 71 62 69 70 65 65 ...
## $ wingspan : num 64 70 69 78 73 70 65 69 67 63 ...
## $ shoe.size : num 41 39 41 45 45 37 41 44 40 39 ...
## $ hair.color : chr "black" "brown" "brown" "brown" ...
## $ eye.color : chr "brown" "brown" "brown" "green" ...
## $ random.number : int 4 17 5 21 24 1 22 8 8 7 ...
## $ bed.time : int 2330 2400 2300 2200 2200 2300 2230 2230 2300 2000 ...
## $ wake.time : int 500 830 900 600 600 600 645 800 700 600 ...
## $ hair.cut.cost : int 50 28 60 35 0 15 29 15 50 200 ...
## $ dinner.drink : chr "water" "water" "water" "water" ...
## $ recitation.number: chr "R1" "R1" "R1" "R1" ...
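For categorical variables stored as character data, it is often useful to count how many times each category appears, or to convert the variable to a factor. Here is a brief sketch using the base R functions table() and factor(); the vector hair holds the first nine hair.color values copied from the dataset above, and the names hair and hair_f are chosen here for illustration.

```r
# The first nine hair.color values from the dataset above
hair <- c("black", "brown", "brown", "brown", "black",
          "black", "brown", "black", "blond")

table(hair)              # counts per category: black 4, blond 1, brown 4

hair_f <- factor(hair)   # convert character data to a factor
levels(hair_f)           # "black" "blond" "brown"
nlevels(hair_f)          # 3
```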
The function names() tells you the names of the columns. These are the variables in your dataset.
names(classdata.2020)
## [1] "gender" "height" "wingspan"
## [4] "shoe.size" "hair.color" "eye.color"
## [7] "random.number" "bed.time" "wake.time"
## [10] "hair.cut.cost" "dinner.drink" "recitation.number"
To view or specify a column in your dataframe, you can use the $ symbol.
classdata.2020$hair.color
## [1] "black" "brown" "brown" "brown" "black" "black" "brown" "black" "blond"
## [10] "blond" "black" "brown" "brown" "brown" "brown" "brown" "brown" "blond"
## [19] "brown" "brown" "blond" "brown" "black" "brown" "black" "brown" "brown"
## [28] "black" "brown" "brown" "black" "brown" "brown" "brown" "black"
If I know that hair.color is column 5, I can also subset the data frame using that column number.
classdata.2020[5]
## hair.color
## 1 black
## 2 brown
## 3 brown
## 4 brown
## 5 black
## 6 black
## 7 brown
## 8 black
## 9 blond
## 10 blond
## 11 black
## 12 brown
## 13 brown
## 14 brown
## 15 brown
## 16 brown
## 17 brown
## 18 blond
## 19 brown
## 20 brown
## 21 blond
## 22 brown
## 23 black
## 24 brown
## 25 black
## 26 brown
## 27 brown
## 28 black
## 29 brown
## 30 brown
## 31 black
## 32 brown
## 33 brown
## 34 brown
## 35 black
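Notice that the result above is printed as a one-column table rather than as a plain vector. The sketch below, which uses a small made-up data frame df, illustrates the difference: single brackets with one number return a data frame, while double brackets (or a blank row position followed by a comma) return the column itself.

```r
# Made-up data frame for illustration
df <- data.frame(
  height = c(63, 67, 66),
  hair   = c("black", "brown", "brown")
)

class(df[2])      # "data.frame": still a table with one column
class(df[[2]])    # the class of the column itself, as a vector
identical(df[, 2], df[[2]])   # TRUE: both give the same vector
identical(df[[2]], df$hair)   # TRUE: same as using the $ symbol
```

This matters because functions such as hist() or mean() expect a vector, so df[[2]] or df$hair is usually what you want to pass to them.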
If I am interested in viewing or specifying a row in my data frame instead, I can simply add a comma after the number, so that the number falls in the row position:
classdata.2020[5,]
## gender height wingspan shoe.size hair.color eye.color random.number bed.time
## 5 M 71 73 45 black black 24 2200
## wake.time hair.cut.cost dinner.drink recitation.number
## 5 600 0 water R1
What if I want to see the distribution of hair cut costs for the Biostats class of 2020? I can use the hist() function to view that distribution.
hist(classdata.2020$hair.cut.cost)
Notice how the graph labels are not that intuitive. We are able to change the x and y axis labels and our plot title with something called arguments. Arguments are specifications that are used within functions and are often optional. In this example, we can use the xlab and ylab arguments within our hist() function to change our respective axis labels for our hair cut cost histogram and we can use the argument main to change our histogram title.
hist(classdata.2020$hair.cut.cost,
     xlab = "Haircut cost in US Dollars ($)",
     ylab = "Frequency of haircut cost",
     main = "Biostats haircut cost distribution in 2020")
Notice the changes? Pretty neat, huh? When changing the names of your plot title and axes, make sure that your text descriptor is in quotation marks, or else it will result in an error! If you are interested in viewing all the other available arguments in the hist() function, simply type ?hist into your console.
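As one example of another optional argument, breaks controls where the bins of the histogram begin and end. The sketch below is self-contained: cost holds only the first ten hair.cut.cost values shown in the str() output above, not the full column, and the 50-dollar bin width is an arbitrary choice for illustration.

```r
# First ten hair.cut.cost values, copied from the str() output above
cost <- c(50, 28, 60, 35, 0, 15, 29, 15, 50, 200)

# breaks sets the bin edges; here each bin is 50 dollars wide
h <- hist(cost, breaks = seq(0, 200, by = 50),
          xlab = "Haircut cost in US Dollars ($)",
          main = "Haircut cost, 50-dollar bins")

h$counts   # number of values in each bin: 8 1 0 1
```

Storing the result in h is optional; it simply lets you inspect the counts that the plot is drawn from.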
vector1 <- c(4,5,6,2,3,1,8)
vector1
## [1] 4 5 6 2 3 1 8
sum(vector1[1:7])
## [1] 29
The sum of vector1 is 29, which was found by using the sum() function.
vector1[3]
## [1] 6
The command that brings up the third value of vector1 is vector1[3], which gives the number 6.
mean(vector1[1:7])
## [1] 4.142857
The mean of vector1 is approximately 4.14.
max(vector1[1:7])
## [1] 8
The maximum value for vector1 is 8.
min(vector1[1:7])
## [1] 1
The minimum value for vector1 is 1.
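Because the index [1:7] covers every item in vector1, it can be left out. The following sketch shows the same calculations on the whole vector, along with range(), median(), and summary(), which are base R functions not used elsewhere in this lab:

```r
vector1 <- c(4, 5, 6, 2, 3, 1, 8)

sum(vector1)      # 29
mean(vector1)     # 4.142857
range(vector1)    # 1 8 (minimum and maximum together)
median(vector1)   # 4
summary(vector1)  # minimum, quartiles, median, mean, and maximum in one call
```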
str(classdata.2020)
## 'data.frame': 35 obs. of 12 variables:
## $ gender : chr "F" "M" "F" "M" ...
## $ height : int 63 67 66 75 71 62 69 70 65 65 ...
## $ wingspan : num 64 70 69 78 73 70 65 69 67 63 ...
## $ shoe.size : num 41 39 41 45 45 37 41 44 40 39 ...
## $ hair.color : chr "black" "brown" "brown" "brown" ...
## $ eye.color : chr "brown" "brown" "brown" "green" ...
## $ random.number : int 4 17 5 21 24 1 22 8 8 7 ...
## $ bed.time : int 2330 2400 2300 2200 2200 2300 2230 2230 2300 2000 ...
## $ wake.time : int 500 830 900 600 600 600 645 800 700 600 ...
## $ hair.cut.cost : int 50 28 60 35 0 15 29 15 50 200 ...
## $ dinner.drink : chr "water" "water" "water" "water" ...
## $ recitation.number: chr "R1" "R1" "R1" "R1" ...
hist(classdata.2020$wake.time)
Based on the histogram, it seems that a greater number of students tend to wake between 500 and 700 hours. This can be seen in the peaks in the data around 530, 600, and 630. The histogram also shows that fewer people seem to wake in the 700 to 900 range.
hist(classdata.2020$wake.time, xlab="Wake Time in Hours (Hr)", ylab="Frequency of Wake Times")
classdata.2020[9]
## wake.time
## 1 500
## 2 830
## 3 900
## 4 600
## 5 600
## 6 600
## 7 645
## 8 800
## 9 700
## 10 600
## 11 620
## 12 730
## 13 700
## 14 600
## 15 620
## 16 700
## 17 600
## 18 700
## 19 800
## 20 830
## 21 600
## 22 900
## 23 700
## 24 500
## 25 800
## 26 700
## 27 700
## 28 600
## 29 700
## 30 700
## 31 730
## 32 630
## 33 630
## 34 530
## 35 730
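The wake.time values appear to be clock times written as hours followed by minutes (so 830 would mean 8:30), which is an assumption based on how the numbers look. Under that assumption, the spacing on a histogram axis is uneven, because 659 is followed by 700. One possible way to convert such values to decimal hours is sketched below; wake holds a few values copied from the column above, and the other object names are made up.

```r
# A few wake.time values copied from the column above
wake <- c(500, 830, 900, 645)

# %/% is integer division and %% is the remainder
hours   <- wake %/% 100           # 5 8 9 6
minutes <- wake %% 100            # 0 30 0 45
decimal_hours <- hours + minutes / 60
decimal_hours                     # 5.00 8.50 9.00 6.75
```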
HRdata <- read.csv("HRdata.csv")
boxplot(HR ~ Sex, data = HRdata)
The variables are the sex of each person in the class and their corresponding heart rate. Sex is categorical and discrete, whereas heart rate is numerical and continuous.
It seems as though males generally have a lower resting heart rate than females. Male heart rates seem to center around 60-70 BPM, whereas female heart rates seem to center around 70-80 BPM. There also seems to be more variability in the female heart rates than in the male heart rates, although the overall range of the male heart rates is much larger than that of the females.
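Impressions read off a boxplot can be checked with numbers. The sketch below uses the base R function tapply() to compute a summary for each group. The data frame HRdemo is made up for illustration only and is not the class data; it simply reuses the column names Sex and HR from the boxplot() call above.

```r
# Made-up heart rates for illustration only; not the class data
HRdemo <- data.frame(
  Sex = c("F", "F", "F", "M", "M", "M"),
  HR  = c(72, 78, 80, 60, 66, 70)
)

# Median heart rate for each sex (the thick line in each box)
tapply(HRdemo$HR, HRdemo$Sex, median)   # F 78, M 66

# Interquartile range for each sex (roughly the height of each box)
tapply(HRdemo$HR, HRdemo$Sex, IQR)      # F 4, M 5
```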
Upload your knitted R Markdown file. PDF and HTML formats are both acceptable; however, knitting to PDF takes extra steps, because you will need to install additional software that helps to format documents. Follow the instructions below for your computer to be able to knit to PDF. Otherwise, knitting directly to HTML will be the easiest option.
For Windows: You can install MiKTeX before knitting to PDF: https://miktex.org/howto/install-miktex
For Mac: You can install MacTeX: https://tug.org/mactex/mactex-download.html