Lab 1

John Yi

R and RStudio

R

  • Free, open source software and programming language

  • Commonly used in psychology research for data analysis and visualization

  • We will be using R in this class for most of our labs and homework assignments.

  • If you have not done so already, please download R from https://cran.rstudio.com/

RStudio

  • Download link: https://posit.co/download/rstudio-desktop/

  • Integrated Development Environment (IDE) for R

    • Although R is powerful on its own, it may be a bit clunky to use without RStudio.
  • Also download the lab1_template.R and lab1.csv files from the course Canvas page.

RStudio

  • When you open RStudio, you should see a screen divided into several panels. For now, let's focus only on the console on the left side of the screen.

Variables, Expressions, Statements

Console + Expressions

  • You can type one line of code into the console, and the console will evaluate it for you. For example:
1 + 1
[1] 2

(You can ignore the [1] for now)

  • You can do almost any mathematical operation you want using symbols such as +, -, *, and /. Remember that the order of operations applies, and you can use parentheses () as well.
(1 + 5) / 2 * 3 - 4
[1] 5
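To see why parentheses matter, compare the expression above with the same numbers written without them. R evaluates ^ first, then * and /, then + and -, working left to right among operators of equal precedence. This is a small illustration only; the numbers are arbitrary.

```r
# Parentheses force 1 + 5 to be computed first: 6 / 2 * 3 - 4
(1 + 5) / 2 * 3 - 4
# [1] 5

# Without parentheses, 5 / 2 * 3 is computed first (7.5), then 1 + 7.5 - 4
1 + 5 / 2 * 3 - 4
# [1] 4.5

# ^ raises a number to a power and is evaluated before * and /
2 * 3^2
# [1] 18
```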

Variables

  • Oftentimes you want to store data somewhere so that you can use it later. In R, we do this using variables and the assignment operator <-.
x <- 1
  • You’ll notice that this time there is no output, but something else on your screen changed (what is it?).

  • Now, whenever we write x, R uses the value that was assigned to it.

x
[1] 1
x + 1
[1] 2
  • You can also update x by assigning it a new value calculated from its current value. R evaluates the right side of <- first, then stores the result in x.
x <- x + 100
x
[1] 101
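Variables can also be combined with each other, just like plain numbers. In the sketch below, score1, score2, total, and average are made-up names for illustration. Note that assignment stores a value, not a formula: changing score1 afterwards does not change total.

```r
# Store two hypothetical test scores in variables
score1 <- 4
score2 <- 6

# Variables can be used in expressions like plain numbers
total <- score1 + score2
average <- total / 2
average
# [1] 5

# Updating score1 later does NOT change total; total kept the value computed earlier
score1 <- 100
total
# [1] 10
```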

Vectors and Data Frames

Vectors

  • Oftentimes you want to store more than a single number in a variable. You can do so by wrapping numbers separated by commas in c().
v <- c(1, 2, 3)
  • You can apply an operation to every number in a vector at once, much like you can with a single variable.
v * 2
[1] 2 4 6
v + 1
[1] 2 3 4
  • You can also obtain a specific number from a vector by putting its position inside square brackets []. This position is known as its index. In R, the first element has index 1.
v[1]
[1] 1
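Square brackets can also take several indices at once, and vectors of the same length can be combined element by element. The sketch below uses the same vector v; the extra vector c(10, 20, 30) is just an example.

```r
v <- c(1, 2, 3)

# Get several elements at once by putting a vector of indices inside []
v[c(1, 3)]
# [1] 1 3

# 2:3 is shorthand for c(2, 3)
v[2:3]
# [1] 2 3

# length() tells you how many elements the vector has
length(v)
# [1] 3

# Two vectors of the same length are combined element by element
v + c(10, 20, 30)
# [1] 11 22 33
```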

Data Frames

  • Most of the data we will work with in this class come in a tabular format. Excel spreadsheets are a good example of this.
1 2 3
2 3 4
3 4 5
  • We can represent the above data in R using a data frame.
x <- c(1, 2, 3)
y <- c(2, 3, 4)
z <- c(3, 4, 5)
df <- data.frame(x, y, z)
df
  x y z
1 1 2 3
2 2 3 4
3 3 4 5
  • As you can see, we first created three vectors x, y, and z. Then, we made a data frame out of those three vectors.

  • Data frames can also have an operation applied to every number at once:

df * 3
  x  y  z
1 3  6  9
2 6  9 12
3 9 12 15
  • You can obtain a specific column by using the $ symbol followed by the name of the column:
df$x
[1] 1 2 3
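Because a column pulled out with $ is an ordinary vector, everything from the Vectors section works on it. The sketch below rebuilds the same data frame (naming the columns directly inside data.frame()) and shows a few common operations; the new column name total is only an example.

```r
df <- data.frame(x = c(1, 2, 3), y = c(2, 3, 4), z = c(3, 4, 5))

# A column is a vector, so vector operations apply to it
df$y * 2
# [1] 4 6 8

# [row, column] picks out a single value: row 2 of column "z"
df[2, "z"]
# [1] 4

# nrow() and ncol() report the size of the data frame
nrow(df)
# [1] 3
ncol(df)
# [1] 3

# Assigning to a new name with $ adds a column
df$total <- df$x + df$y + df$z
df$total
# [1]  6  9 12
```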

R scripts

Opening a .R file

  • From the menu bar at the top of RStudio, click File –> Open File, and select lab1_template.R.

  • The file should open in the upper-left panel. This panel holds your script.

Running a file

  • You can run the file one line at a time, just like in the console, by placing your cursor on a line and clicking the Run button (or pressing Ctrl+Enter, or Cmd+Enter on a Mac).

  • You can also run the entire file at once by clicking on the Source button.

    • Unlike with the console, this will not display the output automatically. However, variables are still updated as usual.

    • To display the output of a variable, use the print() function.

  • # marks a comment: R ignores everything after # on that line when it runs the file.

    • Comments are used to write notes or to temporarily switch off code while you test things.

    • Try deleting the # on the last line of code and clicking Source to see what happens.
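Here is a small example of what a script might look like. It is an illustration only, not the contents of lab1_template.R. When the file is sourced, only the print() line displays anything, and the commented-out last line is skipped.

```r
# This whole line is a comment and is ignored when the script runs
x <- 10
y <- x * 2   # a comment can also follow code on the same line

# When the file is sourced, only print() calls display output
print(y)
# [1] 20

# Delete the # at the start of the next line to run it as well
# print(x + y)
```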

Files and Directories

Files

  • Oftentimes you need to use data from files.

  • One common file type you will use is the .csv file. CSV stands for comma-separated values.

  • Try opening the lab1.csv file to see what it looks like.

Set Working Directory

  • To make the next part of this presentation work, you must change your current working directory.

    • It’s ok if you don’t know what this means just yet! Basically, the working directory is the folder R looks in when you give it just a file name, such as "lab1.csv".
  • Go to Session –> Set Working Directory –> To Source File Location

    • You will have to repeat this step every time you open a new session in RStudio.
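The Session menu is a shortcut; base R also has functions for checking the working directory from code. In the sketch below, the folder passed to setwd() is a hypothetical placeholder, which is why that line is commented out.

```r
# getwd() shows the current working directory (the folder R reads files from)
getwd()

# list.files() lists the files in that folder. Once the working directory
# is set to your script's folder, lab1.csv should appear in this list
list.files()

# setwd() changes the working directory from code. The path below is a
# hypothetical placeholder: replace it with your own folder before using it
# setwd("C:/Users/yourname/Documents/lab1")
```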

read.csv

  • You can load the data in the lab1.csv file by using read.csv():
df <- read.csv("lab1.csv")
df
  x  y
1 1  1
2 2  3
3 3  5
4 4  7
5 5  9
6 6 11
7 7 13
8 8 15
9 9 17
  • This reads your .csv data into a data frame and, by default, uses the first row of the file as the column names.
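To see how read.csv() treats the first row, the sketch below writes a small .csv file to a temporary folder and reads it back. The file name example.csv and the use of tempdir() are only for this demonstration; they are not part of the lab files.

```r
# Build a small data frame and save it as a .csv in a temporary folder
path <- file.path(tempdir(), "example.csv")
write.csv(data.frame(x = 1:3, y = c(1, 3, 5)), path, row.names = FALSE)

# Read it back: the first row of the file (x, y) becomes the column names
df <- read.csv(path)
names(df)
# [1] "x" "y"

# head() shows the first rows; str() shows each column's type
head(df)
str(df)

# With header = FALSE, the first row is treated as data instead,
# and R names the columns V1, V2, ...
raw <- read.csv(path, header = FALSE)
names(raw)
# [1] "V1" "V2"
```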

Filenames and Directories

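  • Because the working directory is set to your script's folder, your .csv file must be in the same folder as your .R file for read.csv("lab1.csv") to find it.

    • There are exceptions to this: for example, you can give read.csv() the full path to a file stored in another folder.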
  • Try moving your lab1.csv file to a different location and run read.csv() again to see what happens.

  • You can browse your files in the bottom-right panel, which we have ignored until now.
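If read.csv() cannot find a file, it stops with an error. The sketch below checks for the file first with file.exists() and shows that a full path works no matter what the working directory is. A temporary folder stands in for "a different location", and moved_lab1.csv is a hypothetical file name; its values copy the lab1.csv data shown above (y = 2x - 1).

```r
# Check whether a file is visible from the current working directory
file.exists("lab1.csv")

# Save a copy of the lab1.csv data in another folder
other_folder <- tempdir()
full_path <- file.path(other_folder, "moved_lab1.csv")
write.csv(data.frame(x = 1:9, y = 2 * (1:9) - 1), full_path, row.names = FALSE)

# Reading by full path works regardless of the working directory
df <- read.csv(full_path)
nrow(df)
# [1] 9
```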

Libraries

Installing Libraries

  • Sometimes base R doesn’t have all of the tools that we need. Libraries (also called packages) add extra functions.

  • Let’s install psych, which provides functions for psychological research.

# install.packages("psych")
  • Remove the # and run this line once. After that, the library stays installed on your machine, so you will not need to run it again.

Importing Libraries

  • After you install the library, make all of its functions available by calling library(). You need to do this in every new R session.
library(psych)
  • Now you can use functions such as describe(), which summarizes each column of a data frame:
describe(df)
  vars n mean   sd median trimmed  mad min max range skew kurtosis   se
x    1 9    5 2.74      5       5 2.97   1   9     8    0     -1.6 0.91
y    2 9    9 5.48      9       9 5.93   1  17    16    0     -1.6 1.83
  • If you try to use describe() without first installing and loading the library, you will get an error.
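describe() reports many statistics at once. If you only need one or two, base R's mean() and sd() compute them directly from a column, with no library required. The sketch below rebuilds the lab1.csv values shown above so it runs without the file; with the real file you would use df <- read.csv("lab1.csv") instead.

```r
# Recreate the lab1.csv data shown above so this example is self-contained
df <- data.frame(x = 1:9, y = c(1, 3, 5, 7, 9, 11, 13, 15, 17))

# mean() and sd() take a vector, so pass a single column with $
mean(df$x)
# [1] 5
sd(df$x)
# [1] 2.738613

# round() trims the result to the 2 decimals that describe() shows
round(sd(df$y), 2)
# [1] 5.48
```

These match the mean and sd columns in the describe() output above.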

Review

Download body_image_data.csv from the course Canvas page. Then create an R script that:

  1. Reads it into a data frame
  2. Finds the mean and standard deviation of the independent variable