This first workshop introduces you to R and RStudio. R is the programming language; RStudio is the IDE (Integrated Development Environment) that makes R easier to use. You must first download R and then download RStudio (RStudio needs R in order to work).
To get started, click on the tab below entitled “Installing R and RStudio”.
1. Installing R and RStudio on a University computer:
Go to AppsAnywhere and load RStudio (R will install automatically).
2. To download R and RStudio to your own laptop:
Go to RStudio (click on this link) and download and install R and RStudio.
3. Once installed:
Go to Tools > Global Options…
In Appearance, select font etc.
In General, set the default working directory to C:/Users/your KU ID number/OneDrive - Kingston University
Select Code and then select ‘Use native pipe operator’ and also select ‘soft-wrap R source files’
After you have made changes, you need to close and re-open RStudio.
NB: When using RStudio on your own computer, these settings will be saved for the next time. When using a University computer, you may need to do this each time you open RStudio.
Click on the green ‘+’ symbol at the top left of the screen and select ‘R Script’. You will now see 4 areas of the RStudio page:
1. The Code Editor at the top left is where you will do most of your work and store your code. Here you can write R code and notes to yourself.
2. The area below it (bottom left) is the R Console, which is essentially R without RStudio. It is where the R code that you type in the Editor above, and its output, appear when you run the code.
3. Top right is the Workspace environment, where you can see opened objects and data.
4. At the bottom right is the File Directory, where you can also see installed packages. Use the Files tab to select the folder in which you wish to save your work.
- Click on the More cog and select ‘Set as Working Directory’. This is now your working folder for the current session.
1. Change global settings to your preferences: font, wrapping of text, default working directory, etc.
Go to Tools > Global Options…
In Appearance, select font etc. In General, set the default working directory to C:/Users/your KU ID number/OneDrive - Kingston University
Select Code and then select ‘Use native pipe operator’ and also select ‘soft-wrap R source files’
After you have made changes, you need to close and re-open RStudio
Check your working directory:
getwd()
## [1] "/Users/richardtcook/Library/CloudStorage/OneDrive-KingstonUniversity/LS5022 Research skills (new)/Biostatistics lectures & workshops/Session 1"
In the pane at the bottom right, click on Files, and you should see all of your folders and files in your OneDrive folder. If not, click on Settings in the Files pane and select ‘Go to working directory’. You can now go to the folder where you have saved files.
2. To start an R session, open an R Script using the green ‘+’ sign at the top left of RStudio.
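The working-directory check shown above can also be done entirely from code. This is a minimal sketch rather than part of the workshop script: tempdir() is used only as a stand-in folder that exists on every machine, and in practice you would give the path to your own workshop folder instead.

```r
# Remember the current working directory so it can be restored afterwards
old_wd <- getwd()

# tempdir() is a stand-in folder (an assumption for this demo);
# replace it with the path to your own workshop folder
setwd(tempdir())
getwd()        # now reports the temporary folder
list.files()   # lists the files in the current working directory

# Return to the original working directory
setwd(old_wd)
getwd()
```

Setting the working directory with setwd() has the same effect as ‘Set as Working Directory’ in the Files pane: it lasts for the current session only.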
You are now ready to do some statistics.
In the next tab, you will import an Excel file to R and carry out an independent samples t-test on the data. The code will also generate a bar chart.
This is just an example of how it works, and we will go through this slowly in later sessions so that you can do this yourself.
The first thing I want to do is to load some packages that enhance R for what we are about to do:
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(psych)
##
## Attaching package: 'psych'
##
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
library(rio)
library(rstatix)
##
## Attaching package: 'rstatix'
##
## The following object is masked from 'package:stats':
##
## filter
library(ggplot2)
library(ggthemes)
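library() will stop with an error if a package has not been installed yet. The sketch below is one possible way to check first; it is not part of the original workshop code. The package names are the ones loaded in this workshop, and the install.packages() line is left commented out because it needs an internet connection.

```r
# Packages used in this workshop
pkgs <- c("tidyverse", "psych", "rio", "rstatix", "ggthemes", "pwr")

# requireNamespace() returns TRUE if a package is installed, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
installed

# Report anything that still needs installing
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All workshop packages are installed.")
}
```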
Create an object called ‘greeting’ that holds some text:
greeting <- "Hello World"
greeting
## [1] "Hello World"
Some basic arithmetic! The calculation is stored in an object called ‘add’
add <- 2+4
add
## [1] 6
Division!
div <- 6/2
div
## [1] 3
Create a vector and store it in an object called vector1
vector1 <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
vector1
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Calculate mean and standard deviation
mean1 <- mean(vector1)
mean1
## [1] 8
sd1 <- sd(vector1)
sd1
## [1] 4.472136
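To see what mean() and sd() are doing, the same numbers can be worked out step by step. This is an illustrative sketch; the object names n, mean_by_hand, squared_dev and sd_by_hand are made up for the example. Note that sd() gives the sample standard deviation, which divides by n - 1 rather than n.

```r
vector1 <- 1:15
n <- length(vector1)

# Mean: the sum of the values divided by how many there are
mean_by_hand <- sum(vector1) / n

# Sample standard deviation: square root of the summed squared
# deviations from the mean, divided by n - 1
squared_dev <- (vector1 - mean_by_hand)^2
sd_by_hand <- sqrt(sum(squared_dev) / (n - 1))

mean_by_hand   # 8
sd_by_hand     # 4.472136

# Confirm that the hand calculation agrees with the built-in function
all.equal(sd_by_hand, sd(vector1))
```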
Get descriptive statistics of the vector data. NB: describe() comes from the ‘psych’ package, which was loaded at the start along with ‘tidyverse’.
vector1 |>
  describe()
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 15 8 4.47 8 8 5.93 1 15 14 0 -1.44 1.15
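Several columns of the describe() output can be reproduced with base R functions, which may help when reading the table. This sketch assumes that ‘trimmed’ is a 10% trimmed mean and that ‘se’ is the standard error of the mean; both agree with the values printed above.

```r
vector1 <- 1:15

median(vector1)             # median: 8
mean(vector1, trim = 0.1)   # trimmed: mean after dropping 10% from each end
mad(vector1)                # mad: median absolute deviation (scaled)
diff(range(vector1))        # range: max minus min, 14

# se: standard error of the mean = sd / sqrt(n)
se <- sd(vector1) / sqrt(length(vector1))
round(se, 2)                # 1.15
```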
1. Import an Excel file (in ‘tidy’ format). This code uses the import() function from the rio package. It loads the Excel file and stores the data in a data frame object called ‘cholesterol’.
cholesterol <- import("~/Library/CloudStorage/OneDrive-KingstonUniversity/LS5022 Research skills (new)/Biostatistics lectures & workshops/Session 1/2-sample_cholesterol.xlsx")
class(cholesterol)
## [1] "data.frame"
NOTE: this is saved as a data frame. This is the type of object that we will be working with.
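Once data are in a data frame, a few base R functions give a quick overview of it. The sketch below uses a small stand-in data frame called cholesterol_demo, because the Excel file is not available here: the column names follow the workshop data, but the values are invented purely for illustration.

```r
# Stand-in data frame: column names as in the workshop file,
# values invented for this example only
cholesterol_demo <- data.frame(
  Group = c(1, 1, 1, 2, 2, 2),
  Cholesterol = c(4.8, 5.1, 5.4, 5.9, 6.3, 5.7)
)

class(cholesterol_demo)    # type of object
dim(cholesterol_demo)      # number of rows and columns
names(cholesterol_demo)    # column names
str(cholesterol_demo)      # type of each column
head(cholesterol_demo, 3)  # first three rows
```

The same calls should work on the imported ‘cholesterol’ object.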
2. Save the cholesterol object as an RData file in your desired folder (NB the working directory is C:/Users/k******/OneDrive - Kingston University, so that is the starting point).
This means the data frame is saved and you don’t have to import it every time you open RStudio! Just type in R:
save(cholesterol, file = "LS5022 Research skills (new)/Biostatistics lectures & workshops/Session 1/cholesterol.RData")
To open the RData file in future, simply type:
load(file = "LS5022 Research skills (new)/Biostatistics lectures & workshops/Session 1/cholesterol.RData")
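If you want to convince yourself that save() and load() really do restore an object, you can try a round trip with a throwaway file. This is a minimal sketch: demo_df and its values are invented, and tempfile() is used so that nothing is written to your OneDrive folder.

```r
# Small invented data frame, for demonstration only
demo_df <- data.frame(Group = c(1, 2), Cholesterol = c(5.1, 6.0))

# Throwaway file path; in practice use a path inside your working directory
rdata_path <- tempfile(fileext = ".RData")

# Save the object, then remove it from the workspace
save(demo_df, file = rdata_path)
rm(demo_df)

# load() recreates the object under its original name and
# returns the names of everything it restored
restored <- load(rdata_path)
restored
demo_df
```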
Save this R file by clicking on the Save icon in RStudio. It will save to the folder from which it was opened.
1. This code can be used to do an independent samples t-test on the cholesterol data
# First, make the Group data a factor (or category) instead of numeric:
cholesterol <- cholesterol |>
  mutate(Group = as.factor(Group))
# Check normality within each group (Shapiro-Wilk test):
cholesterol |>
  group_by(Group) |>
  shapiro_test(Cholesterol)
## # A tibble: 2 × 4
## Group variable statistic p
## <fct> <chr> <dbl> <dbl>
## 1 1 Cholesterol 0.955 0.441
## 2 2 Cholesterol 0.954 0.437
# Check that the two groups have similar variances (Levene's test):
cholesterol |>
  levene_test(Cholesterol ~ Group)
## # A tibble: 1 × 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 1 38 0.0137 0.907
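The two assumption checks can also be sketched in base R, which shows what they are doing. The data below are invented for illustration (6 values per group rather than the 20 in the workshop file). The Levene step is the median-centred version, computed as a one-way ANOVA on absolute deviations from each group's median; this is assumed to correspond to the default used by levene_test().

```r
# Invented example data: two groups of six cholesterol values
demo <- data.frame(
  Group = factor(rep(c(1, 2), each = 6)),
  Cholesterol = c(4.8, 5.1, 5.4, 4.9, 5.6, 5.2,
                  5.9, 6.3, 5.7, 6.1, 6.6, 6.0)
)

# Normality: Shapiro-Wilk p-value for each group separately
normality <- tapply(demo$Cholesterol, demo$Group,
                    function(x) shapiro.test(x)$p.value)
normality

# Equal variances: absolute deviation of each value from its group median,
# then a one-way ANOVA on those deviations
group_median <- ave(demo$Cholesterol, demo$Group, FUN = median)
demo$abs_dev <- abs(demo$Cholesterol - group_median)
levene <- anova(lm(abs_dev ~ Group, data = demo))
levene
```

In both tests a p-value above 0.05 means there is no evidence that the assumption is violated.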
# Independent samples t-test, assuming equal variances:
t_test_chol <- cholesterol |>
  t_test(Cholesterol ~ Group, paired = FALSE, var.equal = TRUE)
t_test_chol
## # A tibble: 1 × 8
## .y. group1 group2 n1 n2 statistic df p
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl>
## 1 Cholesterol 1 2 20 20 -3.31 38 0.00207
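For reference, the equal-variance t statistic can be calculated by hand and compared with base R's t.test(). This is a sketch on invented data (the vectors g1 and g2 are made up), not a re-analysis of the workshop file.

```r
# Invented example data for two independent groups
g1 <- c(4.8, 5.1, 5.4, 4.9, 5.6, 5.2)
g2 <- c(5.9, 6.3, 5.7, 6.1, 6.6, 6.0)
n1 <- length(g1)
n2 <- length(g2)

# Pooled variance: weighted average of the two group variances
pooled_var <- ((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2)

# t = difference in means / standard error of that difference
t_by_hand <- (mean(g1) - mean(g2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))

# Two-sided p-value with n1 + n2 - 2 degrees of freedom
df <- n1 + n2 - 2
p_by_hand <- 2 * pt(-abs(t_by_hand), df)

# The built-in test should give the same answers
base_result <- t.test(g1, g2, paired = FALSE, var.equal = TRUE)
c(by_hand = t_by_hand, base_r = unname(base_result$statistic))
c(by_hand = p_by_hand, base_r = base_result$p.value)
```

The degrees of freedom are n1 + n2 - 2, which is why the workshop output above reports df = 38 for two groups of 20.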
# Effect size (Cohen's d):
cholesterol |>
  cohens_d(Cholesterol ~ Group, paired = FALSE, var.equal = TRUE, hedges.correction = FALSE)
## # A tibble: 1 × 7
## .y. group1 group2 effsize n1 n2 magnitude
## * <chr> <chr> <chr> <dbl> <int> <int> <ord>
## 1 Cholesterol 1 2 -1.05 20 20 large
If hedges.correction = TRUE, the function reports Hedges’ g instead of Cohen’s d as the effect size.
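The sketch below shows how the two effect sizes relate, using invented data. Cohen's d is the difference in means divided by the pooled standard deviation. For Hedges' g, the factor (N - 3) / (N - 2.25) is the usual small-sample approximation, where N is the total sample size; it is assumed here to match what hedges.correction = TRUE applies for two independent groups, so check the rstatix documentation if the exact value matters.

```r
# Invented example data for two independent groups
g1 <- c(4.8, 5.1, 5.4, 4.9, 5.6, 5.2)
g2 <- c(5.9, 6.3, 5.7, 6.1, 6.6, 6.0)
n1 <- length(g1)
n2 <- length(g2)

# Cohen's d: mean difference in units of the pooled standard deviation
pooled_sd <- sqrt(((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2))
cohens_d_by_hand <- (mean(g1) - mean(g2)) / pooled_sd

# Hedges' g: Cohen's d shrunk slightly to correct small-sample bias
N <- n1 + n2
hedges_g_by_hand <- cohens_d_by_hand * (N - 3) / (N - 2.25)

c(cohens_d = cohens_d_by_hand, hedges_g = hedges_g_by_hand)

# Link to the t-test: with equal variances, t = d * sqrt(n1 * n2 / (n1 + n2)).
# Using the workshop values (d = -1.05, 20 per group) gives roughly -3.32,
# close to the reported t of -3.31 (d is rounded in the output)
-1.05 * sqrt(20 * 20 / (20 + 20))
```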
# Power of the test, given 20 per group and the observed effect size (d = 1.05):
library(pwr)
pwr.t.test(n = 20, d = 1.05, sig.level = 0.05, power = NULL)
##
## Two-sample t test power calculation
##
## n = 20
## d = 1.05
## sig.level = 0.05
## power = 0.8989188
## alternative = two.sided
##
## NOTE: n is number in *each* group
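A similar calculation is available in base R through power.t.test(), which does not need the pwr package. This is a sketch under one assumption: setting sd = 1 makes delta equivalent to the standardised effect size d. The result should be very close to the pwr output above, although the two functions may differ slightly in how they handle the far tail.

```r
# Power achieved with 20 per group and a standardised effect of 1.05
achieved <- power.t.test(n = 20, delta = 1.05, sd = 1, sig.level = 0.05,
                         type = "two.sample", alternative = "two.sided")
achieved$power

# The same function can be turned around: leave n out and supply the
# desired power to find the sample size needed per group
needed <- power.t.test(delta = 1.05, sd = 1, sig.level = 0.05, power = 0.80)
ceiling(needed$n)   # round up to whole participants
```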
2. You can also create plots from the data; for example, this bar chart summarises the cholesterol data:
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
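The plotting code is not shown in this handout, so the following is only a sketch of how a bar chart of this kind could be built with ggplot2; it is not the code that produced the chart. The data are invented, and the choice of group means with standard deviation error bars is an assumption. The warning above simply means that, from ggplot2 3.4.0, line thickness is set with linewidth rather than size.

```r
library(ggplot2)

# Invented example data: two groups of six cholesterol values
demo <- data.frame(
  Group = factor(rep(c(1, 2), each = 6)),
  Cholesterol = c(4.8, 5.1, 5.4, 4.9, 5.6, 5.2,
                  5.9, 6.3, 5.7, 6.1, 6.6, 6.0)
)

# Summarise each group: mean and standard deviation
means <- tapply(demo$Cholesterol, demo$Group, mean)
sds <- tapply(demo$Cholesterol, demo$Group, sd)
summary_df <- data.frame(
  Group = factor(names(means)),
  mean = as.numeric(means),
  sd = as.numeric(sds)
)

# Bars show the group means; error bars show mean +/- 1 SD
p <- ggplot(summary_df, aes(x = Group, y = mean)) +
  geom_col(fill = "grey70", colour = "black") +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2) +
  labs(x = "Group", y = "Mean cholesterol (invented data)") +
  theme_classic()
p
```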