Workshop 1 - Introduction to R

Downloading & Preparing R and RStudio

This first workshop introduces you to R and RStudio. R is the programming language; RStudio is the IDE (Integrated Development Environment) that makes R easier to use. You must first download R and then download RStudio (RStudio needs R in order to work).

To get started, click on the tab below entitled “Installing R and RStudio”

Installing R and RStudio

1. Installing R and RStudio on a University computer:
Go to AppsAnywhere and load RStudio (R will install automatically).

You will need to do this each time you use a University computer.

2. To download R and RStudio to your own laptop:
Go to RStudio (click on this link) and download and install R and RStudio

3. Once installed:
Go to Tools > Global Options…

In Appearance, select font etc.
In General, set the default working directory to C:/Users/your KU ID number/OneDrive - Kingston University
Select Code and then select ‘Use native pipe operator’ and also select ‘soft-wrap R source files’
After you have made changes, you need to close and re-open RStudio

NB
When using RStudio on your own computer, these settings will be saved for the next time. When using a University computer, you may need to do this each time you open RStudio.

RStudio looks like this

Click on the green ‘+’ symbol at the top left of the screen and select ‘R Script’. You will now see 4 areas of the RStudio page:

1. The Code Editor at the top left area where you will do most of your work and store your code. Here you can write R code and notes to yourself.
2. The area below it (bottom left) is the R Console, and is essentially like R without RStudio. It is where the R code that you type in the Editor above and its output appears when you run the code.
3. Top right is the Workspace environment, where you can see opened objects and data.
4. At the bottom right is the File Directory, where you can also see installed packages. Use the Files tab to select the folder in which you wish to save your work.
- Click on the More cog and select ‘Set as Working Directory’. This is now your default working folder.

Setting up RStudio

1. Change global settings to your preferences: font, wrapping of text, default working directory etc.

Go to Tools > Global Options…
In Appearance, select font etc.
In General, set the default working directory to C:/Users/your KU ID number/OneDrive - Kingston University
Select Code and then select ‘Use native pipe operator’ and also select ‘soft-wrap R source files’
After you have made changes, you need to close and re-open RStudio
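
The ‘Use native pipe operator’ setting makes RStudio insert |> when you use the pipe shortcut. As a minimal sketch of what the pipe does, assuming R 4.1 or later and using made-up numbers (the object names x, nested and piped are only for illustration):

```r
# Hypothetical example values, not workshop data
x <- c(2, 4, 6)

# Ordinary nested call: read from the inside out
nested <- round(sqrt(mean(x)), 2)

# The same calculation with the native pipe: the value on the left
# becomes the first argument of the function on the right
piped <- x |> mean() |> sqrt() |> round(2)

nested
piped
```

Both versions give the same answer; the piped version simply reads from left to right, which is why it is used in the later code in this workshop.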

Check your working directory:

getwd()

## [1] "/Users/richardtcook/Library/CloudStorage/OneDrive-KingstonUniversity/LS5022 Research skills (new)/Biostatistics lectures & workshops/Session 1"

In the pane at the bottom right, click on Files, and you should see all of the folders and files in your OneDrive folder. If not, click on the More cog in the Files pane and select ‘Go to working directory’. You can now go to the folder where you have saved files.
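
The working directory can also be changed from code rather than from the Files pane. The following is only a sketch: tempdir() stands in for your own folder, and old_wd is a name chosen for illustration.

```r
# Remember where we started so we can switch back afterwards
old_wd <- getwd()

# tempdir() is used here only as a stand-in for your own folder
setwd(tempdir())
getwd()

# List the files in the current working directory
list.files()

# Return to the original working directory
setwd(old_wd)
```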

2. To start an R session, open an R Script using the green ‘+’ sign at the top left of RStudio

You are now ready to do some statistics.

In the next tab, you will import an Excel file to R and carry out an independent samples t-test on the data. The code will also generate a bar chart.

This is just an example of how it works, and we will go through this slowly in later sessions so that you can do this yourself.

Some simple things!

The first thing I want to do is to load some packages that enhance R for what we are about to do:

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(psych)

## 
## Attaching package: 'psych'
## 
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha

library(rio)
library(rstatix)

## 
## Attaching package: 'rstatix'
## 
## The following object is masked from 'package:stats':
## 
##     filter

library(ggplot2)
library(ggthemes)
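
library() only works for packages that are already installed. As a hedged sketch, the code below checks which of the packages used in this workshop are missing on your computer; the object names pkgs and not_installed are illustrative, and the install line is left commented out so that nothing is installed by accident.

```r
# Packages loaded in this workshop
pkgs <- c("tidyverse", "psych", "rio", "rstatix", "ggthemes", "pwr")

# Compare against the packages installed on this computer
not_installed <- pkgs[!pkgs %in% rownames(installed.packages())]
not_installed

# Uncomment the next line to install anything that is missing
# install.packages(not_installed)
```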

Create an object called ‘greeting’ that holds some text:

greeting <- "Hello World"
greeting

## [1] "Hello World"

Some basic arithmetic! The calculation is stored in an object called ‘add’

add <- 2+4
add

## [1] 6

Division!

div <- 6/2
div

## [1] 3

Create a vector and store it in an object called vector1

vector1 <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
vector1

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
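
Typing every number is not necessary for a regular sequence. As a small sketch, the same values can be produced with the colon operator or with seq() (vector2 and vector3 are names chosen here for illustration):

```r
vector1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)

# Shorthand for the whole numbers from 1 to 15
vector2 <- 1:15

# seq() gives more control, for example over the step size
vector3 <- seq(from = 1, to = 15, by = 1)

# Check that all three hold the same values
all(vector1 == vector2)
all(vector1 == vector3)
length(vector1)
```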

Calculate mean and standard deviation

mean1 <- mean(vector1)
mean1

## [1] 8

sd1 <- sd(vector1)
sd1

## [1] 4.472136
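
To see what mean() and sd() are doing, the same values can be worked out step by step. This is only an illustrative sketch; note that sd() divides by n - 1 (the sample standard deviation), not by n.

```r
vector1 <- 1:15
n <- length(vector1)

# Mean: the sum of the values divided by how many there are
mean_by_hand <- sum(vector1) / n

# Sample standard deviation: square root of the sum of squared
# deviations from the mean, divided by n - 1
sd_by_hand <- sqrt(sum((vector1 - mean_by_hand)^2) / (n - 1))

mean_by_hand
sd_by_hand
```

The results should agree with the output of mean(vector1) and sd(vector1) shown above.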

Get descriptive statistics of the vector data. NB: describe() comes from the ‘psych’ package, which was loaded (together with ‘tidyverse’) at the start of this section.

vector1 |>
  describe()

##    vars  n mean   sd median trimmed  mad min max range skew kurtosis   se
## X1    1 15    8 4.47      8       8 5.93   1  15    14    0    -1.44 1.15
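
Several of the columns in the describe() output can be reproduced with base R functions, which may help in reading the table. A minimal sketch (se1 is a name chosen for illustration):

```r
vector1 <- 1:15

# Standard error of the mean: standard deviation divided by the square root of n
se1 <- sd(vector1) / sqrt(length(vector1))
round(se1, 2)

median(vector1)            # median
mean(vector1, trim = 0.1)  # trimmed mean (10% removed from each end)
round(mad(vector1), 2)     # median absolute deviation
diff(range(vector1))       # range: max minus min
```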

Import Excel file

1. Import an Excel file (in ‘tidy’ format). This code uses the import() function from the rio package. It loads the Excel file and stores the data in a data frame object called ‘cholesterol’

cholesterol <- import("~/Library/CloudStorage/OneDrive-KingstonUniversity/LS5022 Research skills (new)/Biostatistics lectures & workshops/Session 1/2-sample_cholesterol.xlsx")

class(cholesterol)

## [1] "data.frame"

NOTE: this is saved as a data frame. This is the type of object that we will be working with.
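
A data frame is a table in which each column is a variable and each row is an observation. The sketch below builds a tiny one by hand to show the structure; the values are invented for illustration and are not the workshop data, and chol_example is a hypothetical name.

```r
# Hypothetical values for illustration only; not the workshop data
chol_example <- data.frame(
  Group = c(1, 1, 2, 2),
  Cholesterol = c(5.1, 4.8, 6.2, 6.0)
)

class(chol_example)       # type of object
str(chol_example)         # column names and types
nrow(chol_example)        # number of rows (observations)
names(chol_example)       # column names
chol_example$Cholesterol  # pull out one column with $
```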

2. Save the cholesterol object as an RData file in your desired folder
(NB the working directory is C:/Users/k******/OneDrive - Kingston University, so that is the starting point for the file path).

This means the data frame is saved and you don’t have to import it every time you open RStudio! Just type in R:

save(cholesterol, file = "LS5022 Research skills (new)/Biostatistics lectures & workshops/Session 1/cholesterol.RData")

To open the RData file in future, simply type

load(file = "LS5022 Research skills (new)/Biostatistics lectures & workshops/Session 1/cholesterol.RData")
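
To see that saving and loading really restores the same object, here is a small round-trip sketch. It writes to a temporary folder rather than to OneDrive, and the data frame chol_example and the file name are hypothetical.

```r
# Hypothetical data frame, used only to demonstrate save() and load()
chol_example <- data.frame(Group = c(1, 2), Cholesterol = c(5.1, 6.2))

# Save to a temporary folder (a stand-in for your own folder)
rdata_file <- file.path(tempdir(), "chol_example.RData")
save(chol_example, file = rdata_file)

# Keep a copy for comparison, then remove the object from the environment
original <- chol_example
rm(chol_example)

# load() recreates the object under the name it was saved with
load(file = rdata_file)
identical(chol_example, original)
```

Note that load() restores the object under its original name, so there is no need to assign the result to anything.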

Save this R file by clicking on the Save icon in RStudio. It will save to the folder from where it was opened.

Statistics!

1. This code can be used to do an independent samples t-test on the cholesterol data

# First, make the Group data a factor (or category) instead of numeric:
cholesterol <- cholesterol |>
  mutate(Group=as.factor(Group))
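
The effect of as.factor() can be seen on a small made-up vector. This is an illustrative sketch only; group_numeric and group_factor are hypothetical names.

```r
# Group labels stored as numbers (hypothetical values)
group_numeric <- c(1, 1, 2, 2)

# Convert to a factor so R treats the values as categories
group_factor <- as.factor(group_numeric)

class(group_numeric)
class(group_factor)
levels(group_factor)

# With numbers, R will happily do arithmetic on the labels,
# which is meaningless for group codes
mean(group_numeric)
```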

Test for normality with the Shapiro-Wilk test:

cholesterol |>
  group_by(Group) |>
  shapiro_test(Cholesterol)

## # A tibble: 2 × 4
##   Group variable    statistic     p
##   <fct> <chr>           <dbl> <dbl>
## 1 1     Cholesterol     0.955 0.441
## 2 2     Cholesterol     0.954 0.437
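
In the output above both p values are greater than 0.05, so there is no evidence that either group departs from a normal distribution. As a hedged sketch, the same kind of check can be run with the base R function shapiro.test(); the data here are simulated (chol_sim is a hypothetical stand-in for the workshop file), so the p values will differ from those above.

```r
# Simulated data standing in for the workshop file (an assumption)
set.seed(123)
chol_sim <- data.frame(
  Group = factor(rep(c(1, 2), each = 20)),
  Cholesterol = c(rnorm(20, mean = 5, sd = 1), rnorm(20, mean = 6, sd = 1))
)

# Run the Shapiro-Wilk test separately for each group and keep the p values
normality_p <- tapply(
  chol_sim$Cholesterol,
  chol_sim$Group,
  function(x) shapiro.test(x)$p.value
)
normality_p
```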

Homogeneity of variance test:

cholesterol |>
  levene_test(Cholesterol ~ Group)

## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     1    38    0.0137 0.907
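
Here p = 0.907, so there is no evidence that the two groups have different variances. To show the idea behind the test, the sketch below computes a median-centred Levene test by hand: it is a one-way ANOVA on the absolute deviations of each value from its group median. It assumes simulated data (chol_sim) and assumes that levene_test() also centres on the median by default, so treat it as an illustration rather than a replacement.

```r
# Simulated data standing in for the workshop file (an assumption)
set.seed(123)
chol_sim <- data.frame(
  Group = factor(rep(c(1, 2), each = 20)),
  Cholesterol = c(rnorm(20, mean = 5, sd = 1), rnorm(20, mean = 6, sd = 1))
)

# Absolute deviation of each value from its own group's median
group_medians <- ave(chol_sim$Cholesterol, chol_sim$Group, FUN = median)
chol_sim$abs_dev <- abs(chol_sim$Cholesterol - group_medians)

# One-way ANOVA on the absolute deviations
levene_by_hand <- anova(lm(abs_dev ~ Group, data = chol_sim))
levene_by_hand
```

With two groups of 20, the degrees of freedom are 1 and 38, as in the workshop output.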

Independent samples t-test

t_test_chol <- cholesterol |>
  t_test(Cholesterol ~ Group, paired = FALSE, var.equal = TRUE) 
t_test_chol

## # A tibble: 1 × 8
##   .y.         group1 group2    n1    n2 statistic    df       p
## * <chr>       <chr>  <chr>  <int> <int>     <dbl> <dbl>   <dbl>
## 1 Cholesterol 1      2         20    20     -3.31    38 0.00207
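
The same test is available in base R as t.test(). The sketch below runs it on simulated data (chol_sim is a hypothetical stand-in, so the statistic will differ) and then shows how the p value reported above follows from the t statistic and the degrees of freedom.

```r
# Simulated data standing in for the workshop file (an assumption)
set.seed(123)
chol_sim <- data.frame(
  Group = factor(rep(c(1, 2), each = 20)),
  Cholesterol = c(rnorm(20, mean = 5, sd = 1), rnorm(20, mean = 6, sd = 1))
)

# Independent samples t-test assuming equal variances
t_base <- t.test(Cholesterol ~ Group, data = chol_sim, var.equal = TRUE)
t_base$parameter  # degrees of freedom: n1 + n2 - 2

# Two-sided p value implied by the workshop output (t = -3.31, df = 38)
p_from_output <- 2 * pt(-abs(-3.31), df = 38)
p_from_output
```

Because the reported statistic is rounded to two decimal places, p_from_output is close to, but not necessarily identical to, the p value in the table.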

Effect size

cholesterol |>
  cohens_d(Cholesterol ~ Group, paired = FALSE, var.equal = TRUE, hedges.correction=FALSE)

## # A tibble: 1 × 7
##   .y.         group1 group2 effsize    n1    n2 magnitude
## * <chr>       <chr>  <chr>    <dbl> <int> <int> <ord>    
## 1 Cholesterol 1      2        -1.05    20    20 large

If hedges.correction = TRUE, it gives Hedges’ g instead of Cohen’s d as the effect size.
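
For two independent groups with a pooled standard deviation, Cohen’s d can be recovered from the t statistic and the group sizes. The sketch below is a worked check using the rounded values from the output above; the Hedges correction factor shown is a commonly used approximation and is an assumption about what the function applies.

```r
# Values taken from the t-test output above
t_stat <- -3.31
n1 <- 20
n2 <- 20

# Cohen's d from t (valid when the pooled standard deviation is used)
d_from_t <- t_stat * sqrt(1 / n1 + 1 / n2)
round(d_from_t, 2)

# Approximate small-sample correction giving Hedges' g (an assumption)
correction <- 1 - 3 / (4 * (n1 + n2) - 9)
g_approx <- d_from_t * correction
round(g_approx, 2)
```

The correction shrinks the effect size slightly, and matters most when the groups are small.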

Post-hoc power analysis

library(pwr)
pwr.t.test(n=20, d = 1.05, sig.level = 0.05, power = NULL)

## 
##      Two-sample t test power calculation 
## 
##               n = 20
##               d = 1.05
##       sig.level = 0.05
##           power = 0.8989188
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
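
Base R has a similar function, power.t.test(), which can also be turned around to ask how many participants per group would be needed for a chosen power. A hedged sketch, setting sd = 1 so that delta is on the same scale as Cohen’s d:

```r
# Power for n = 20 per group and an effect size of 1.05
power_base <- power.t.test(n = 20, delta = 1.05, sd = 1, sig.level = 0.05)
power_base$power

# Leave n out and supply power instead to get the required sample size
n_needed <- power.t.test(delta = 1.05, sd = 1, sig.level = 0.05, power = 0.80)
ceiling(n_needed$n)  # round up to whole participants per group
```

The first result should be very close to the power reported by pwr.t.test() above.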

2. You can also create plots from the data; for example this bar chart summarises the cholesterol data:

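
The plotting code is not shown here, so the following is only a sketch of how a bar chart of group means with standard deviation error bars could be drawn with ggplot2. It uses simulated data (chol_sim), and the object names, colours and labels are all illustrative choices rather than the code used to produce the workshop figure.

```r
library(ggplot2)

# Simulated data standing in for the workshop file (an assumption)
set.seed(123)
chol_sim <- data.frame(
  Group = factor(rep(c(1, 2), each = 20)),
  Cholesterol = c(rnorm(20, mean = 5, sd = 1), rnorm(20, mean = 6, sd = 1))
)

# Mean and standard deviation for each group
chol_summary <- data.frame(
  Group = factor(levels(chol_sim$Group)),
  mean = as.numeric(tapply(chol_sim$Cholesterol, chol_sim$Group, mean)),
  sd = as.numeric(tapply(chol_sim$Cholesterol, chol_sim$Group, sd))
)

# Bars show the group means; error bars show mean +/- 1 SD
bar_plot <- ggplot(chol_summary, aes(x = Group, y = mean)) +
  geom_col(fill = "grey70", colour = "black") +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
                width = 0.2, linewidth = 0.5) +
  labs(x = "Group", y = "Mean cholesterol (simulated)")
bar_plot
```

The error bars use linewidth rather than size to set the line thickness, which is the argument recommended from ggplot2 3.4.0 onwards.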