Run these first in the console:
library(tidyverse)
library(dplyr)
library(magrittr)
I created this html document using Quarto. Quarto is the successor to R Markdown, and allows you to “knit” together content and executable R code into a finished document (html, pdf, Word). You can see your syntax, the results, and your writeup all together. It’s great for creating educational material, but it also supports the idea of reproducible research. Gone are the days when you can’t remember how you recoded your data, or which syntax file you used. Gone too are the days when you have to copy over all your results into a new table in Word for your academic paper. Everything you need is in the document, and you can simply hide stuff (usually syntax) that you don’t want people to see. But it’s still there for a year from now when you can’t recall what you did. Students can use Quarto to take notes! https://quarto.org
R is the language. R Studio is the Integrated Development Environment (IDE) where you write and run the code. Other examples of IDEs are “Visual Studio,” “Brackets,” and “Atom.” IDEs usually have a file viewer, autocomplete, error messages, and “visual” editing for formatted output (that means instead of typing <bold>Hello World</bold>, you simply type Hello World and click on the bold button). Visual editing is what you’ve probably always done. The name is for old people like me, who used to have to do things the hard way. Speaking of, in theory you could write your code in a simple text editor, but even I wouldn’t do that anymore. In addition to R, you can also write code in Python, C++, and SQL in R Studio. That makes the name “R Studio” confusing, which is part of why the company that created it, originally also called RStudio, renamed itself Posit. The cloud version of R Studio is now called “Posit Cloud.” Again, this is because you can program in lots of other languages. I’m using Posit Cloud to create this document in Quarto demonstrating R. I know. It’s a lot.
Here’s what the IDE looks like (again, it’s called “Posit Cloud” if you use it online and “R Studio” if you’re using the desktop version).
There are three areas. You generally write your code in the script window. Your variables, output, packages (code written by other people to make your life easier), and your file browser can all be found in the viewer area, and people who want to kick it old school and write one line of code at a time can use the console window. The console window will also echo everything that you run in the script window, and any warnings or errors will show up there too. Most of the time, you can ignore that stuff. But it’s also where the output shows up, so you can’t completely ignore the console window. Note that the console output starts with a number in brackets, e.g., [1]. That number is the position of the first value printed on that line. So, let’s say we typed 7 * 7 in the script window and then clicked on RUN. The gray box shows your code, and below that is what you would see in the console window.
7 * 7
[1] 49
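Sometimes your output will have many values that wrap onto multiple lines, and then each line starts with the position of its first value, so you may see something like [1], [16], [31], etc. at the beginning of each line.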
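The bracketed numbers are easiest to understand with a longer result. Here is a minimal sketch; long_vector is just a name made up for this illustration, and exactly where the lines wrap depends on how wide your console is.

```r
# A single value prints with [1] in front of it
7 * 7

# A longer vector wraps across several console lines; each line starts with
# the position of its first value, not with a line number
long_vector <- seq(from = 10, to = 400, by = 10)
print(long_vector)

# Any value can be looked up by that same position
long_vector[16]
```

If the second line of your output starts with, say, [16], that means the first value on that line is the 16th value in the vector.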
R comes with a large number of built in data sets to get you started. Here is a list of some of the most popular:
airquality - New York Air Quality Measurements
AirPassengers - Monthly Airline Passenger Numbers 1949-1960
mtcars - Motor Trend Car Road Tests
iris - Edgar Anderson’s Iris (flowers) Data
If you want to learn about other built-in datasets, go here: The R Datasets Package.
To view one of these data sets, simply type print(name) into the script window and click Run (or into the console and hit return), replacing “name” with the name of the dataset, like mtcars:
print(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
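Printing all 32 rows works for a small data set like this one, but with bigger data you will usually want a quick summary instead. A minimal sketch using only functions that come with base R:

```r
# mtcars ships with R, so no library() call is needed
dim(mtcars)          # number of rows, then number of columns
names(mtcars)        # the column (variable) names
head(mtcars, n = 6)  # only the first six rows
str(mtcars)          # one line per column, with its type and first values
```

head() is the one you will probably use most; change n to see more or fewer rows.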
For most analyses, your data in R should be in the form of a data frame. This is like a (hidden) Excel spreadsheet with columns of data, one per variable. Most of the built-in data sets are data frames (mtcars, iris, and airquality are; AirPassengers is stored as a time series instead), and graphing packages like ggplot2 expect a data frame. To create one by hand we must enter the data for each column in the form of vectors. Like this:
NameOfVector <- c(1, 2, 3, 4)
The little “c” is the function that combines your values into a vector (it officially stands for “combine”), and you can think of it as standing for “column.” Use quotes around your data if the vector is supposed to hold text.
Name <- c("Jon", "Bill", "Maria", "Ben", "Tina")
Age <- c(23, 41, 32, 58, 26)
df <- data.frame(Name, Age)
Often people name their data frames “df”, but you could call it “George”:
Name <- c("Jon", "Bill", "Maria", "Ben", "Tina")
Age <- c(23, 41, 32, 58, 26)
George <- data.frame(Name, Age)
print(George)
Name Age
1 Jon 23
2 Bill 41
3 Maria 32
4 Ben 58
5 Tina 26
You can also do it this way:
TheWeekend <- data.frame(Name = c("Jon", "Bill", "Maria", "Ben", "Tina"),
Age = c(23, 41, 32, 58, 26)
)
print(TheWeekend)
Name Age
1 Jon 23
2 Bill 41
3 Maria 32
4 Ben 58
5 Tina 26
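Once a data frame exists, it is worth checking that R understood it the way you intended. This is a small sketch using the same names and ages as above; the data frame name people is made up for the example.

```r
# Build the data frame in one step, one vector per column
people <- data.frame(
  Name = c("Jon", "Bill", "Maria", "Ben", "Tina"),
  Age  = c(23, 41, 32, 58, 26)
)

nrow(people)     # how many rows (cases)
ncol(people)     # how many columns (variables)
str(people)      # each column with its type
summary(people)  # quick descriptive statistics for each column
```

If a column that should hold numbers shows up as text in str(), that usually means a stray quote or typo crept into the vector.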
Let’s look at mtcars again. . . Do you see the column for mpg?
print(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
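If we try to print mpg by itself, it won’t work the way we want, because R doesn’t know that mpg lives inside mtcars. You’ll get an “object 'mpg' not found” error (or, if ggplot2 or tidyverse is loaded, an unrelated data set called mpg that comes with ggplot2).
print(mpg)
Instead, name the data set first, then a $, then the column: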
print(mtcars$mpg)
[1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
[31] 15.0 21.4
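The $ is the most common way to reach a single column, but it is not the only one. A short sketch showing two equivalent ways to pull out a column, and what you can do with it afterwards (mpg_dollar and mpg_bracket are names made up for the example):

```r
# Two equivalent ways to pull one column out of a data frame
mpg_dollar  <- mtcars$mpg
mpg_bracket <- mtcars[["mpg"]]
identical(mpg_dollar, mpg_bracket)

# Once you have the column, ordinary functions work on it
length(mtcars$mpg)  # one value per car
mean(mtcars$mpg)    # average miles per gallon
range(mtcars$mpg)   # lowest and highest value
```

The double-bracket form is handy later on, when the column name is stored in a variable instead of typed out.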
Okay, let’s plot some data, then run some correlations, and then a t-test with this car data. First let’s plot data. R has a pretty simple graphics package built in. A package is code that someone else wrote. It’s like running ANOVA in SPSS. There is a ton of code required to run an ANOVA, but all you have to do is use the keyword and enter some variables and a few other options. The beauty of R is that it’s open source, and anybody can write a package. A handful of core packages come with R and the rest you install yourself; either way, you usually still have to load a package before you can use it. To see if you already have a package, go to the Packages tab in the viewer area. You have any package that’s listed, but if the little box to the left isn’t checked, then it’s not yet in your active library. Either check the box, or type this code at the top of your script:
library(name_of_package)
Let’s use this:
plot(mtcars$wt, mtcars$mpg)
Not bad, but the base graphics package is pretty “basic.” Let’s use the famous ggplot2 graphics package. I checked my package window and it’s not even installed. As it turns out, ggplot2 and a few other packages are bundled together in a very popular package called tidyverse. I’ll type this in my script to install tidyverse and all the other packages that come with it (including ggplot2):
install.packages("tidyverse")
This may take a few minutes, and you’ll see a LOT of scary messages in red in the Console. That’s okay. Eventually it’ll tell you that the “downloaded source packages are in” . . . well, wherever it puts those things. Now, when you look at your packages window you should see ggplot2.
Now, you still have to load it into the library. You can check the box, or type this into your script (I recommend not checking the box, and doing everything via script):
library(ggplot2)
As a reminder, you can type this in the script window or in the console window. It’ll work either way, but it’s better to keep ALL of your code in the script. It also makes it easier to share your script with someone who might not have all of your packages already installed on their computer.
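If you plan to share a script, one common pattern is to install a package only when it is missing and then load it. The sketch below is one way to do that; ensure_package() is a hypothetical helper written for this example, not a function that comes with R or with any package.

```r
# ensure_package() is a made-up helper name, not part of R
ensure_package <- function(pkg) {
  # requireNamespace() returns TRUE when the package is already installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE lets library() use the name stored in pkg
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

ensure_package("stats")      # always installed, so this is a safe demonstration
# ensure_package("ggplot2")  # uncomment to install (if needed) and load ggplot2
```

The upside is that the script runs on a computer that has never seen the package before; the downside is that it may trigger a long install the first time.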
Okay, now that we have ggplot2, let’s plot a simple scatterplot. Scroll back up to the printout of the data for mtcars, and you’ll see a column for mpg and hp (horsepower). I’ll bet those are inversely related:
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
That’s much nicer! And if we had two series, we could color code those.
Note that even though the package is called ggplot2, the function we type is just ggplot(). The 2 refers to the version: ggplot2 is the rewrite of an older package called ggplot, and the function kept the original name.
Oops! I wanted mpg x horsepower, didn’t I? Okay, let’s do that one:
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point()
Notice that this time, I didn’t need to include the library(ggplot2) because ggplot2 is already loaded into memory.
Regardless, the relationship between mpg and horsepower runs in the same direction as the one between mpg and wt: as either one goes up, mpg goes down. So, hp and wt are probably (positively) related to each other. Let’s check:
ggplot(data=mtcars,mapping=aes(x=wt,y=hp)) + geom_point()
It’s a little messy, but that’s definitely a positive relationship!
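The scatterplots suggest the direction of each relationship, and a correlation puts a number on it. This is a minimal sketch using cor() from base R, which returns Pearson’s r by default; the r_ names are made up for the example.

```r
# cor() returns a value between -1 and +1
r_wt_hp  <- cor(mtcars$wt, mtcars$hp)   # weight vs. horsepower
r_wt_mpg <- cor(mtcars$wt, mtcars$mpg)  # weight vs. miles per gallon
r_hp_mpg <- cor(mtcars$hp, mtcars$mpg)  # horsepower vs. miles per gallon

# Print all three together, rounded to two decimals
print(round(c(wt_hp = r_wt_hp, wt_mpg = r_wt_mpg, hp_mpg = r_hp_mpg), 2))
```

A positive value matches a cloud of points that rises from left to right, and a negative value matches one that falls.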
Let’s see where different cylinder engines show up in that last scatterplot. This is the same as creating a series in Excel. We simply add an extra “aesthetic” called col and map it to the variable for the number of cylinders, cyl.
ggplot(data=mtcars,mapping=aes(x=wt,y=hp, col=cyl)) + geom_point()
Hmm… that’s not what I expected. It’s using one color and a gradient, making it hard to tell the difference between 4, 5, 6, 7, and 8 cylinders (there’s a 7 cylinder car??). The problem is that R automatically color codes variables it thinks are numerical using a gradient. We want it to think of the different cylinder counts as categorical, not numerical (like coding in an ANOVA, where Group = 1, 2, or 3 doesn’t mean that Group 2 has twice as much . . . stuff . . . as Group 1). To do this, we need to turn cylinder into a factor. Just like in a factorial ANOVA, the values aren’t treated as numbers, but simply as categories. We use this code to do it:
mtcars$cyl <- as.factor(mtcars$cyl)
What does the $ mean? Whenever you refer to a variable, R needs to know from what dataset (because you might have multiple datasets open). In the function to create a scatterplot we did that by saying data = mtcars. If we aren’t using ggplot then we have to refer to mtcars by using the $.
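What does the <- mean? That’s the assignment operator, and it works like “equals”: whatever is on the right gets stored under the name on the left. We’re saying turn the variable cyl from the data frame mtcars into a factor, and save the result back into mtcars$cyl.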
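To see what the conversion actually changes, here is a small sketch that works on a copy of the data, so the original mtcars stays untouched. The name cars_copy is made up for the example.

```r
# Work on a copy so the built-in mtcars is not modified
cars_copy <- mtcars

is.numeric(cars_copy$cyl)  # before: cyl is stored as a number

# Right side is evaluated first, then stored back into the same column
cars_copy$cyl <- as.factor(cars_copy$cyl)

is.factor(cars_copy$cyl)   # after: cyl is a factor
levels(cars_copy$cyl)      # the distinct categories R found
table(cars_copy$cyl)       # how many cars fall in each category
```

Notice that the levels only include cylinder counts that actually occur in the data, which is why the plot legend stops showing in-between values once cyl is a factor.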
mtcars$cyl <- as.factor(mtcars$cyl)
ggplot(data=mtcars,mapping=aes(x=wt,y=hp, col=cyl)) + geom_point()
mpg <- mtcars$mpg
print(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
print(mpg)
[1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
[31] 15.0 21.4
Make sure you have the following R packages:
tidyverse for data manipulation and visualization
ggpubr for easily creating publication-ready plots
rstatix provides pipe-friendly R functions for easy statistical analyses
datarium contains the required data sets for this chapter
Get them this way:
install.packages("tidyverse")
install.packages("ggpubr")
install.packages("rstatix")
install.packages("datarium")
Once they’re installed, you don’t have to do this again. You can find a list of all installed packages on the right side of R Studio:

But even if they’re installed, you do still have to load them into memory every time so that the rest of your code works. Load them using the “library” command, like this:
library(tidyverse)
library(ggpubr)
library(rstatix)
Alternatively, you can skip the code and simply check the box next to each package that you want to load into memory. That is more likely to cause errors, and will cause problems if you share your code with someone else.

Key R functions: anova_test() [rstatix package], wrapper around the function car::Anova().
We’ll use the jobsatisfaction dataset [datarium package], which contains the job satisfaction score organized by gender and education levels.
In this study, a researcher wants to evaluate if there is a significant two-way interaction between gender and education_level in explaining the job satisfaction score. An interaction effect occurs when the effect of one independent variable on an outcome variable depends on the level of the other independent variables. If an interaction effect does not exist, main effects could be reported.
Load the data and inspect one random row by groups:
library(datarium)
library(dplyr)
set.seed(123)
data("jobsatisfaction", package = "datarium")
In this example, the effect of “education_level” is our focal variable, that is, our primary concern. It is thought that the effect of “education_level” will depend on one other factor, “gender”, which is called a moderator variable.
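To make the idea of a two-way interaction concrete without installing anything, here is a rough sketch that swaps in a different data set and a different function: ToothGrowth and aov(), both of which come with R, stand in for jobsatisfaction and anova_test(). In this stand-in, len is the outcome and supp and dose play the roles of the two factors. The names tooth, model, and anova_table are made up for the example.

```r
# Stand-in data: ToothGrowth ships with R
tooth <- ToothGrowth

# dose is stored as a number, so turn it into a factor first
tooth$dose <- as.factor(tooth$dose)

# len ~ supp * dose asks for both main effects plus the supp:dose interaction
model <- aov(len ~ supp * dose, data = tooth)

# Pull out the ANOVA table: one row per effect, plus the residuals
anova_table <- summary(model)[[1]]
print(anova_table)
```

The row labeled supp:dose is the interaction. The same formula style, outcome ~ factor1 * factor2, is what you would adapt for score, gender, and education_level.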