Base R

Getting started with RStudio

Introduction

RStudio is a powerful tool for working with data. You can create and document your workflow; clean, tidy, wrangle and visualize your data; and run analyses ranging from simple to highly complex.

At this point you should have already:

  1. Either downloaded R and RStudio, or accessed them through the app store on a university PC.

  2. Set up a folder for your workflow.

  3. Created an RStudio project saved in your folder.

The aim of this workbook is to get you comfortable with some of the basics for using R so you can create or read in data sets, produce some summary statistics and plot data.

PART 1: Getting started: exploring “base R”

Essentially, R is a programming language: you ask it to do something and (provided you give a correct instruction) R does it. For example, if we ask R to calculate 1 + 1 it will give us the answer 2, and for 2 * 2 we will get 4. But if we type 2 x 2 we will get an error, because x is not a multiplication operator in R.

1 + 1
[1] 2

2 * 2
[1] 4

You can also give values a name using the equals sign =, the leftward assignment arrow <- (the most common choice), or the less common rightward assignment arrow ->

a = 2 * 2

b <- 1 + 1

a * b -> c

c
[1] 8
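As a small sketch of the same idea, the lines below store values under names and then reuse them in a later calculation. The object names (sets, reps, total_reps) are hypothetical and chosen only for this illustration.

```r
# Hypothetical object names, used only for this example
sets <- 3                  # leftward assignment (the most common style)
reps = 10                  # equals sign does the same job here
sets * reps -> total_reps  # rightward assignment (rarely used)

total_reps                 # typing a name prints the stored value
total_reps / 2             # stored values can be reused in new calculations
```

Whichever form you choose, it is usually easiest to read code that sticks to one style throughout; <- is the convention most R users follow.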

Creating a data frame

Using the same concept, we can create vectors of data and bind them together into a “data frame”: a table of data that we can explore. For example, imagine we have 10 participants who have completed a strength training intervention, and we ask each of them for a rating of perceived exertion (RPE) afterwards.

rpe <- c(7, 7.5, 3, 4, 6, 9, 8, 6.5, 5.5, 8)
participant <- c("P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8","P9", "P10" )

rpe
 [1] 7.0 7.5 3.0 4.0 6.0 9.0 8.0 6.5 5.5 8.0

Note how we can now type “rpe” and the values appear. But we probably want to bind these together with our participant data. To do this we use the function data.frame(). Here I have called the data frame “rpe_data”.

rpe_data <- data.frame(participant, rpe)
rpe_data
   participant rpe
1           P1 7.0
2           P2 7.5
3           P3 3.0
4           P4 4.0
5           P5 6.0
6           P6 9.0
7           P7 8.0
8           P8 6.5
9           P9 5.5
10         P10 8.0
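Once a data frame exists, a few base R functions help to check that it looks as expected. The sketch below rebuilds rpe_data so it runs on its own; the cut-off of 8 in the last line is an arbitrary value chosen only to show how rows can be filtered.

```r
# Rebuild the workbook's data frame so this block runs on its own
rpe <- c(7, 7.5, 3, 4, 6, 9, 8, 6.5, 5.5, 8)
participant <- c("P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8", "P9", "P10")
rpe_data <- data.frame(participant, rpe)

str(rpe_data)       # column names, column types and the first few values
nrow(rpe_data)      # number of rows (one per participant)
ncol(rpe_data)      # number of columns
head(rpe_data, 3)   # first three rows only

# Keep only the rows where RPE is 8 or higher (8 is an arbitrary cut-off);
# leaving the part after the comma empty keeps all columns
high_rpe <- rpe_data[rpe_data$rpe >= 8, ]
high_rpe
```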

Quick summary statistics

If we want, we can quickly derive some summary statistics using functions such as mean(), sd() or median(). Inside the brackets we write the name of the data frame, followed by the $ symbol and then the column name: rpe_data$rpe

mean(rpe_data$rpe)
[1] 6.45
sd(rpe_data$rpe)
[1] 1.877498
median(rpe_data$rpe)
[1] 6.75
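Several of these statistics can also be requested together. The sketch below uses summary(), range() and round() on the same column; rounding to two decimal places is just one possible reporting choice, not a requirement.

```r
# Rebuild the workbook's data frame so this block runs on its own
rpe <- c(7, 7.5, 3, 4, 6, 9, 8, 6.5, 5.5, 8)
participant <- c("P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8", "P9", "P10")
rpe_data <- data.frame(participant, rpe)

summary(rpe_data$rpe)   # minimum, quartiles, median, mean and maximum in one call
range(rpe_data$rpe)     # smallest and largest value

# Round the standard deviation to 2 decimal places for reporting
sd_rounded <- round(sd(rpe_data$rpe), 2)
sd_rounded
```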

Basic visualisation

We can also create a simple plot of the data using the function barplot(). The height argument sets the height of the bars (y axis), here the rpe values, and the names.arg argument sets the labels under the bars (x axis), here the participant IDs.

barplot(height = rpe_data$rpe, names.arg = rpe_data$participant)

We can also add arguments to our bar plot to change its look. For example, below I have changed the color and added a main title and x and y axis labels. I also set limits on the y axis so the full range of the RPE scale is included.

barplot(height = rpe_data$rpe, names.arg = rpe_data$participant, 
        col = "blue", 
        main = "Participant's RPE", 
        xlab = "Participants", 
        ylab = "RPE Scores (AU)",  
        ylim = c(0, 10))
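Further elements can be layered onto a bar plot after it has been drawn. The sketch below is one possible extension, not part of the original workbook: it adds a dashed line at the group mean with abline() and prints each value above its bar with text(). The object name bar_mids is hypothetical.

```r
# Rebuild the workbook's data frame so this block runs on its own
rpe <- c(7, 7.5, 3, 4, 6, 9, 8, 6.5, 5.5, 8)
participant <- c("P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8", "P9", "P10")
rpe_data <- data.frame(participant, rpe)

# barplot() returns the x position of the middle of each bar
bar_mids <- barplot(height = rpe_data$rpe, names.arg = rpe_data$participant,
                    col = "blue",
                    ylab = "RPE Scores (AU)",
                    ylim = c(0, 10))

# Dashed horizontal line at the group mean (lty = 2 means "dashed")
abline(h = mean(rpe_data$rpe), lty = 2)

# Print each value just above its bar (pos = 3 means "above")
text(x = bar_mids, y = rpe_data$rpe, labels = rpe_data$rpe, pos = 3)
```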

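What if I want the plot ordered from the lowest to the highest RPE? I can reorder the data frame using the order() function and then re-run the graph. The code is slightly trickier here, with both [ ] and ( ): order() returns the row positions sorted by RPE, and the square brackets use those positions to rearrange the rows (leaving the part after the comma empty keeps all columns).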

rpe_data <- rpe_data[order(rpe_data$rpe), ]

barplot(height = rpe_data$rpe, names.arg = rpe_data$participant, 
        col = "blue", 
        main = "Participant's RPE", 
        xlab = "Participants", 
        ylab = "RPE Scores (AU)",  
        ylim = c(0, 10))
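To see what order() is doing, it can help to look at its output directly. The sketch below prints the row positions it returns and then uses the decreasing = TRUE argument to sort from highest to lowest instead. The name rpe_desc is hypothetical; it is used so the original rpe_data is left untouched.

```r
# Rebuild the workbook's data frame so this block runs on its own
rpe <- c(7, 7.5, 3, 4, 6, 9, 8, 6.5, 5.5, 8)
participant <- c("P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8", "P9", "P10")
rpe_data <- data.frame(participant, rpe)

# order() returns row positions, not the values themselves
order(rpe_data$rpe)

# Sort from highest to lowest RPE and store the result under a new name
rpe_desc <- rpe_data[order(rpe_data$rpe, decreasing = TRUE), ]
rpe_desc

barplot(height = rpe_desc$rpe, names.arg = rpe_desc$participant,
        col = "blue",
        ylim = c(0, 10))
```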

Finally, we might be interested in the distribution of these data, so we could plot a histogram and/or run a test for normality (e.g. the Shapiro-Wilk test). I have done a basic histogram (which looks a bit odd as we only have 10 participants!), but you can individualize it yourself now, maybe adding labels and changing the colors of the histogram in the same way you did above.

hist(rpe_data$rpe)

shapiro.test(rpe_data$rpe)

    Shapiro-Wilk normality test

data:  rpe_data$rpe
W = 0.95291, p-value = 0.703
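The p-value of 0.703 is well above the conventional 0.05 threshold, so the test gives no evidence that these data depart from a normal distribution (with only 10 values, though, the test has limited power). The sketch below shows one way the histogram might be customized and how the test result can be stored and inspected. The bin width of 1, the plot titles and the object name sw are illustrative choices, and the Q-Q plot is an optional extra visual check.

```r
# Same RPE values as in the workbook
rpe <- c(7, 7.5, 3, 4, 6, 9, 8, 6.5, 5.5, 8)

# Histogram with one bin per RPE unit across the 0 to 10 range
hist(rpe,
     breaks = seq(0, 10, by = 1),
     col = "blue",
     main = "Distribution of RPE",
     xlab = "RPE Scores (AU)")

# Q-Q plot: points lying close to the line suggest approximate normality
qqnorm(rpe)
qqline(rpe)

# Store the test result so individual parts can be pulled out with $
sw <- shapiro.test(rpe)
sw$statistic       # the W statistic
sw$p.value         # the p-value
sw$p.value > 0.05  # TRUE means no evidence against normality at the 5% level
```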

Additional Challenges

  1. Can you add to this data set?

    • Perhaps add an additional column with a relevant variable (e.g., volume load of the strength session, or strength training experience)?

    • You could choose these values so that they might correlate with RPE.

  2. Can you then work out how to run a Pearson’s correlation in RStudio?

  3. Maybe you could add RPE values for a different type of resistance training session?

  4. You could then see if you could work out how to run a t-test to see if there is a difference in RPE values between the sessions.
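If you get stuck, the sketch below shows one possible starting point using the base R functions cor.test() and t.test(). The volume_load and rpe_session2 values are HYPOTHETICAL numbers invented only so that the code runs, and a paired t-test is assumed because the same 10 participants would complete both sessions. Try the challenges yourself first and swap in your own values.

```r
# RPE values from the workbook
rpe <- c(7, 7.5, 3, 4, 6, 9, 8, 6.5, 5.5, 8)

# HYPOTHETICAL data, invented purely for illustration
volume_load  <- c(5200, 5600, 2400, 3100, 4500, 6800, 6000, 4900, 4100, 6100)
rpe_session2 <- c(6, 7, 4, 4.5, 5, 8, 7.5, 6, 5, 7)

# Add the new columns to a data frame alongside the original values
participant <- c("P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8", "P9", "P10")
rpe_data <- data.frame(participant, rpe, volume_load, rpe_session2)

# Pearson's correlation between RPE and volume load
cor_result <- cor.test(rpe_data$rpe, rpe_data$volume_load, method = "pearson")
cor_result$estimate   # the correlation coefficient r
cor_result$p.value

# Paired t-test: same participants measured in two sessions (an assumption)
t_result <- t.test(rpe_data$rpe, rpe_data$rpe_session2, paired = TRUE)
t_result
```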