This “mock” homework assignment verifies that you can run R Markdown scripts and knit them properly. All homework assignments in this class will be R Markdown scripts that you knit to HTML format to submit for credit.
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. We will use Markdown exclusively for computer labs and homework assignments. For more details on using R Markdown see http://rmarkdown.rstudio.com. A handy cheatsheet can also be found here https://www.rstudio.com/wp-content/uploads/2016/03/rmarkdown-cheatsheet-2.0.pdf.
When you click the Knit button at the top of the RStudio interface, a document will be generated that includes text (i.e., white sections of the R script), R code (gray-green sections of the script), and output from running your code. This combination lets you explain what you’re doing using text chunks, show how you did it using R code chunks, share the results of those analyses by printing the R output, and interpret those results using additional text.
The gray section that follows is a code chunk. You can insert one anywhere you need by clicking the Insert button at the top of the editor, or by using the keyboard shortcut Ctrl+Alt+I. The first line of a code chunk is three backticks followed by {r}, and the last line is three backticks on their own. You can name a code chunk by adding a short description after the r (for example, I called this one {r setup}), but each name must be unique. Copying and modifying code chunks is a clever thing to do, but be sure to give each copy a new name before you try to knit.
The first line of code in this chunk sets the default options for all later code chunks. “echo = TRUE” means that your code will always be included in your knitted document, along with any output. For homework assignments, you should almost always use “echo = TRUE” so that we can evaluate your code, but if you don’t want to show the code (e.g., lots of ggplot code to generate a plot), you can set “echo = FALSE” for that chunk to keep the code from being printed.
The second line of code sets the output width to 80, which will fit on most monitors (including laptops). You can change this to a smaller value if you have problems viewing all output, or larger values to prevent line-wrap.
You can run each code chunk by clicking the green arrow in the upper right hand corner of the gray box.
knitr::opts_chunk$set(echo = TRUE)
options(width = 80) # custom-fit this for your own monitor
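To see what the width setting does, here is a minimal sketch that prints the same vector at two widths. The vector “x” and the narrower width of 60 are illustrative choices, not part of the assignment; everything used is base R.

```r
# Illustrative only: compare how a long vector wraps at two output widths.
old <- options(width = 80)   # options() returns the previous settings
x <- seq(0.5, 30, by = 0.5)  # 60 values, enough to wrap across lines

print(x)                     # wraps at 80 characters

options(width = 60)          # narrower output: more, shorter lines
print(x)

options(old)                 # restore whatever width was set before
```

Saving the old settings and restoring them at the end keeps the experiment from changing the width used by the rest of the document.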
As you can see above in the Rmd file, I included three # symbols in front of the text. This increases the size of the text when we knit. The fewer # symbols you have, the larger the text. See examples below (text size will only be altered when you knit your final document):
Placing one asterisk directly before and after text will italicize it. Placing two asterisks before and after will make the text bold.
Before you knit, it is important that YOUR CODE IS ERROR FREE, COMPLETE, AND IN PROPER ORDER. Knitting runs your code from top to bottom in a fresh R session, so objects that exist only in your current global environment will not be found. If something is missing or out of order, you will get error messages when you try to knit that are not easy to interpret. So before you knit a homework assignment, it’s good to start with a clean global environment and run every code chunk starting from the beginning. If there are no errors, you should be able to knit.
Here is a code chunk where I load any R packages that I want to use. “dplyr” is a great package for data manipulation. “ggplot2” is a great package for plotting data. Cheatsheets for dplyr and ggplot2 can be found here:
dplyr: https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
ggplot2: https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf
You’ll notice in the Rmd file that I included additional options at the top of the code chunk about warnings and messages. When you load a package, you’ll often have a bunch of messages and warnings pop up (usually relating to the version of R that you’re running). The “message = FALSE” and “warning = FALSE” chunk options suppress this information in your knitted document, which helps it look cleaner and more professional.
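# library() stops with an error if a package is missing;
# require() would only warn, and that warning is suppressed in this chunk
library(dplyr)
library(ggplot2)
library(cowsay)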
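If a package is not installed, loading it will fail when you knit. As a minimal sketch (the object names “pkgs”, “installed”, and “missing_pkgs” are only illustrative), you can check for the three packages used in this document before loading them. “requireNamespace()” is a base R function that reports whether a package is available without attaching it.

```r
# Packages this document loads
pkgs <- c("dplyr", "ggplot2", "cowsay")

# TRUE for each package that is installed, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Install these first: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # run once in the console, not in a knitted chunk
} else {
  message("All packages are installed.")
}
```

Installing from the console rather than from inside a chunk avoids reinstalling packages every time you knit.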
This next code chunk loads data and does some basic summaries. “iris” is a data set that ships with R in the built-in datasets package. The “data()” function loads a data set that comes with R or with an installed package; for iris the call is optional, because the data set is already available by name. If you’re using your own data, you will need to read them in using other methods (we will go over this at some future time). The “head()” function shows your first six observations and “tail()” displays your last six. The “str()” function tells you the type of data in each column (i.e., numeric for the first 4 variables and factor (categorical) for Species); “glimpse()” is the tidyverse counterpart to “str()”.
These are handy functions to use when you first load data to make sure it was properly imported into R.
data("iris")
head(iris)
tail(iris)
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
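A few more base R checks can be useful right after loading data. This is a sketch of the kind of checks you might run, not a required step; every function below is part of base R.

```r
data("iris")

dim(iris)               # number of rows and columns
names(iris)             # column names
levels(iris$Species)    # categories of the factor column
anyNA(iris)             # TRUE if any value is missing
summary(iris)           # quick summary of every column
```

For your own data, an unexpected number of rows, a numeric column that shows up as character, or a surprising factor level is usually the first sign of an import problem.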
Now we’ll do some simple data manipulation to showcase the dplyr package, creating a new summary dataset called “slBYspecies” to examine differences in sepal length among the three species of iris. In the code chunk below, “%>%” is the pipe operator. It passes the result of one step on to the next, so you can read it as “and then”. There are no NA observations in the iris data, so the “na.rm = TRUE” statements aren’t needed here, but most biological data have missing values, and without “na.rm = TRUE” these summary functions return NA whenever a value is missing.
# summarize sepal length by species
# round sd to 3 decimal places
slBYspecies <- iris %>%
  group_by(Species) %>%
  summarise(meanSL = mean(Sepal.Length, na.rm = TRUE),
            sdSL = round(sd(Sepal.Length, na.rm = TRUE), 3),
            maxSL = max(Sepal.Length, na.rm = TRUE),
            minSL = min(Sepal.Length, na.rm = TRUE),
            cnt = n()) %>%  # number of observations per species
  # calculate the standard error as standard deviation divided by square root of sample size
  mutate(seSL = sdSL / sqrt(cnt))
# print the manipulated data set
slBYspecies
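To see why “na.rm = TRUE” matters, here is a small sketch using a made-up vector with one missing value (“sl” is hypothetical, not taken from iris). It also computes the standard error by hand with the same formula used above, with one difference that is my own choice: it counts only the non-missing observations.

```r
# Hypothetical sepal lengths with one missing observation
sl <- c(5.0, 4.6, NA, 5.4)

mean(sl)                 # NA: one missing value makes the whole mean missing
mean(sl, na.rm = TRUE)   # 5: the NA is dropped before averaging

# Standard error = standard deviation / sqrt(sample size),
# counting only the non-missing observations
n_obs <- sum(!is.na(sl))
se <- sd(sl, na.rm = TRUE) / sqrt(n_obs)
se
```

When your data do contain missing values, it is worth deciding deliberately whether the sample size should count every row or only the rows that were actually measured.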
Here are some basic plots with ggplot2. This is mainly to demonstrate how you can have figures embedded within your knitted document. However, it will also give you an intro to using ggplot2.
# boxplot of sepal length by species
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  xlab("Species") + ylab("Sepal Length (cm)") # provide custom axis labels
# scatterplot of sepal length vs. sepal width by species
ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length,
                 group = Species, color = Species, fill = Species)) +
  geom_point(stat = "identity") +
  geom_smooth(method = "lm") + # use linear model (lm) to provide line of best fit
  xlab("Sepal width (cm)") + ylab("Sepal length (cm)") +
  theme_classic()
There are many ways to get help! Google is your friend. If you’re having a coding issue, odds are someone else has had that same problem, and before you can fully type in your question you’ll find that Google autofills it for you. You’ll also find that there are many ways to do the same thing. If you were in Biometry, you’re probably familiar with the R package called “swirl”. This is a nice, user-friendly package that can teach you some basic R commands and statistical analyses while you use R.
# you may edit this message, as needed, to be more or less wholesome
# you may also choose a different animal if you don't like cats
# try typing "sort(names(animals))" in the console to find other options
say("I can fucking do this!", "cat")
##
## --------------
## I can fucking do this!
## --------------
## \
## \
## \
## |\___/|
## ==) ^Y^ (==
## \ ^ /
## )=*=(
## / \
## | |
## /| | | |\
## \| | |_|/\
## jgs //_// ___/
## \_)
##
It can be helpful to end your Markdown file with a record of what versions of R and R packages you are using. This is useful if you return to your code after a day, a week, a month…or even a year or more later and find that your code doesn’t run properly anymore (perhaps because one or more packages have been modified). To keep a record of this information, you can use the “sessionInfo()” function.
Once you’ve determined that you can run each code chunk in this file, try knitting the entire document by clicking the Knit icon near the top of the page. A drop-down menu will give you the options of knitting to HTML, PDF, and Word. You can try all 3, but HTML is the easiest to work with in terms of formatting and is the preferred format for homework assignments.
sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 22000)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] cowsay_0.8.0 ggplot2_3.3.6 dplyr_1.0.10
##
## loaded via a namespace (and not attached):
## [1] highr_0.9 pillar_1.8.1 bslib_0.4.0 compiler_4.1.2
## [5] jquerylib_0.1.4 tools_4.1.2 rmsfact_0.0.3 digest_0.6.29
## [9] lattice_0.20-45 nlme_3.1-153 jsonlite_1.8.2 evaluate_0.17
## [13] lifecycle_1.0.3 tibble_3.1.8 gtable_0.3.1 mgcv_1.8-38
## [17] pkgconfig_2.0.3 rlang_1.0.6 Matrix_1.5-1 cli_3.4.1
## [21] DBI_1.1.3 rstudioapi_0.14 yaml_2.3.5 xfun_0.33
## [25] fastmap_1.1.0 withr_2.5.0 stringr_1.4.1 knitr_1.40
## [29] generics_0.1.3 vctrs_0.4.2 sass_0.4.2 grid_4.1.2
## [33] tidyselect_1.2.0 glue_1.6.2 R6_2.5.1 fansi_1.0.3
## [37] rmarkdown_2.17 farver_2.1.1 magrittr_2.0.3 splines_4.1.2
## [41] fortunes_1.5-4 scales_1.2.1 htmltools_0.5.3 assertthat_0.2.1
## [45] colorspace_2.0-3 labeling_0.4.2 utf8_1.2.2 stringi_1.7.6
## [49] munsell_0.5.0 cachem_1.0.6 crayon_1.5.2
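If you only need the version of R or of a single package rather than the full session record, base R offers lighter-weight queries. A minimal sketch, using the built-in “stats” package as the example so that it runs on any installation (the minimum version “3.0.0” is an arbitrary illustration):

```r
R.version.string                # one-line description of the running R version
packageVersion("stats")         # version of a single installed package

# Versions can be compared directly, e.g. to check for a minimum version
packageVersion("stats") >= "3.0.0"
```

Swap in any package name you have installed, such as “dplyr”, to check the version you are actually using.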