# BST02: Using R for Statistics in Medical Research
## What is this Course About?
- Statistics has flourished in recent years, mainly because computers make complex analyses possible
- Many statistical software packages exist for simple and specialized analyses
- The programming language R is popular among data scientists
- Analysts must learn not only how to use the software but also the ideas behind it
- Learning statistical modelling and algorithms is more important than learning a programming language
- The most valuable tool of a modern quantitative researcher is their personal computer
## Planning
- Part A: General Introduction
  - how the programming language R works
- Part B: Basic use of R
  - getting started with a data set, data visualizations
- Part C: Programming
  - using and writing functions, including popular functions that you will need later for more advanced courses such as Repeated Measurements (CE08), Bayesian Statistics (CE09), Missing Values in Clinical Research (EP16), etc.
- Part D: Statistics with R
  - basic statistical tests, regression analysis
- Part E: Tools
  - some interesting tools for reporting data analyses in a reproducible manner
## General introduction to R
## What does R look like?
R is the program underlying RStudio. Its own interface is quite basic.
## What is R?
- R is a software environment for statistical computing and graphics
  - extensive catalog of statistical and graphical methods
- R is mainly used in academia. However, many large companies also use the R programming language, both in the healthcare industry and elsewhere (Uber, Google, Airbnb, Facebook and so on)
- Unlike SPSS, R is purely command driven
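
To make "command driven" concrete, here is a minimal sketch of a short typed session. The vector `sbp` and its values are invented purely for illustration; they are not course data.

```r
# Hypothetical systolic blood pressure readings (made-up values, illustration only)
sbp <- c(118, 125, 132, 121, 140, 128)

# Every analysis step is a typed command rather than a menu click
n_obs    <- length(sbp)   # number of observations
mean_sbp <- mean(sbp)     # arithmetic mean
sd_sbp   <- sd(sbp)       # sample standard deviation

summary(sbp)              # minimum, quartiles, mean and maximum
```

Because each step is a command, the whole analysis can be saved in a script and rerun later.
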
## A brief history of R
- 1993: created at the University of Auckland, New Zealand, by Ross Ihaka and Robert Gentleman
- 1997: R Core Team formed (20 members)
- 2000: R 1.0.0 released
- 2004: First international user conference in Vienna
- 2013: 5026 packages available
- 2017: 10875 packages available
- Now: nrow(available.packages())
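
The "Now" entry can be run as is to get the current package count. The sketch below does the same but names a repository explicitly; the mirror URL is an assumption (any CRAN mirror should work) and an internet connection is required.

```r
# Assumption: this CRAN mirror is reachable; replace it with your preferred mirror
cran <- "https://cloud.r-project.org"

# One row per package currently offered by the repository
pkgs <- available.packages(repos = cran)

# Number of packages available right now
nrow(pkgs)
```

The count changes from day to day, so no fixed value is shown here.
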
## Why learn R?
- R is a free software environment for statistical computing and graphics
- It compiles and runs on Linux, Windows and macOS
- Open source language
  - Users are allowed to modify and redistribute the code
- Advanced statistical language
- Supports extensions
- Related to other languages
- Flexible and fun!
## Where do I get R?
http://cran.r-project.org
- choose your platform, e.g., Windows, Linux
  - e.g., for Windows: Windows → base → Download R 3.6.2 for Windows
- Install ...
## How does R work?
- Packages are built for specific tasks
- Download R packages from the CRAN web site from within R, via the menu:
  - Packages
  - Install package(s) ...
  - Make your choice(s)
- Load the package using library() (note: installing a package does not load it)
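
The difference between installing and loading can also be sketched in code instead of the menu. The helper `use_package()` below is hypothetical (it is not part of R), and the mirror URL is an assumption; `install.packages()`, `requireNamespace()` and `library()` are the standard R functions.

```r
# Hypothetical helper (not part of R): install a package only if it is missing,
# then load it for the current session
use_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)   # needed once per R installation
  }
  library(pkg, character.only = TRUE)      # needed in every new R session
  invisible(pkg %in% loadedNamespaces())
}

# "splines" ships with R, so this call skips the install step and only loads it
use_package("splines")
```

For a package that is not yet installed, the same call would download it from the repository first, which requires an internet connection.
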
## How to get help in R?
- Within R
  - help.search("topic") or ??"topic" (results depend on the installed packages)
  - RSiteSearch("topic") (requires an internet connection)
  - help() or ? opens the on-line help file for the specified function
  - checking the FAQ
- Online
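
As a small sketch of the in-R help commands listed above: the topic names `"mean"` and `"linear models"` are arbitrary examples chosen for illustration.

```r
# Open the help page of a function you know by name
help("mean")                            # same as ?mean

# Search the help pages of all installed packages for a keyword
hits <- help.search("linear models")    # same as ??"linear models"

# The help object can also be stored, e.g. to check that a topic exists
mean_help <- help("mean")
length(mean_help) > 0                   # at least one help page was found?
```

Printing `hits` lists the matching help pages; what appears depends on which packages are installed.
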
## Disadvantages of R
- Appears intimidating to the first-time user
- Output is not very polished (but there are some alternatives)
- Exporting output is more difficult
- Cannot easily handle very big data sets (depends on the installed RAM)
- A lot is available, but it is sometimes hard to find your way
- The quality of the available packages varies greatly
- Has been criticized for using only one CPU core at a time (but packages such as parallel help you run tasks on several cores)
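
The last point can be illustrated with the parallel package, which ships with R. This is a minimal sketch under stated assumptions: the data vector and the function `boot_mean()` are invented for illustration, and two worker processes are assumed to be a sensible choice on your machine.

```r
library(parallel)   # ships with R, no installation needed

detectCores()       # number of CPU cores R can see on this machine

# Hypothetical task: mean of one bootstrap resample of a made-up vector
x <- c(2.1, 3.4, 1.9, 5.6, 4.2, 3.8)
boot_mean <- function(seed, data) {
  set.seed(seed)                        # makes each resample reproducible
  mean(sample(data, replace = TRUE))
}

n_workers <- 2                          # assumption: two workers are available
cl <- makeCluster(n_workers)            # start the worker R processes
res_par <- parLapply(cl, 1:100, boot_mean, data = x)
stopCluster(cl)                         # always release the workers

# Same computation on a single core, for comparison
res_seq <- lapply(1:100, boot_mean, data = x)
identical(res_par, res_seq)
```

For a task this small the overhead of starting workers outweighs any gain; the pattern pays off only when each call is slow.
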
## Summary
- R is a great tool to explore and investigate data
- Several statistical methods can be performed with R
- It is important to understand the methods before applying them in R
- How to use: R uses packages that perform specific tasks
  - Install a package only once
  - Load the package every time you open R