A modern demonstration of R for epidemiologists

Epidemiology: Swedish Case Studies - November 15th, 2023

Alessio Crippa

Introduction

Alessio Crippa

  • R enthusiast
  • More than 10 years of R experience
  • 2 packages on CRAN (and GitHub)
  • Web applications with Shiny
  • Reproducible research with R Markdown
  • Co-organizer of the Stockholm R useR group (SRUG)

alecri.github.io
github.com/alecri

Today’s plan


  • A modern: using RStudio, tidyverse, and reproducible research.

  • Demonstration of R: interactive and adaptive workshop.

  • For epidemiologists: real-world data.


Expected learning outcome:

  • Basic understanding of R in RStudio
  • Familiarity with R fundamentals: using R as a calculator, editing R scripts
  • Insight into data import and data manipulation
  • Basic data visualization and statistics
  • Hands-on practice with real data
  • Q&A and problem solving
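
As a small illustration of "R as a calculator", the sketch below shows the kind of expressions that can be typed at the console or saved in an R script. All values are invented for illustration and are not taken from the workshop data.

```r
# Arithmetic typed at the console is evaluated immediately
1 + 2 * 3          # multiplication is evaluated before addition
sqrt(16)           # built-in mathematical functions
10 %/% 3           # integer division
10 %% 3            # remainder

# Store a result in an object with the assignment operator
weight_kg <- 70    # hypothetical values, for illustration only
height_m  <- 1.75
bmi <- weight_kg / height_m^2
round(bmi, 1)

# Vectors let one expression work on many values at once
sodium <- c(138, 141, 129, 135)   # made-up serum sodium values (mmol/L)
mean(sodium)
sodium < 135                      # element-wise comparison returns TRUE/FALSE
```

Saving these lines in a `.R` file and running them line by line from the editor gives the same results as typing them at the console, which is the usual first step towards a reproducible script.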

Outline

  1. Introduction to R and RStudio
  2. Data Manipulation
  3. Exploratory Data Analysis
    • Summary statistics
    • Informative graphs
    • Common models
  4. Introduction to More Advanced Concepts
    • Workflow: projects
    • More advanced analyses
    • Reproducible research (rmarkdown)
  5. Learn more about R

R for Data Science



https://r4ds.had.co.nz/introduction.html

Motivating example


Descriptive abstract
Hyponatremia has emerged as an important cause of race-related death and life-threatening illness among marathon runners. We studied a cohort of marathon runners to estimate the incidence of hyponatremia and to identify the principal risk factors.


Reference
Hyponatremia among Runners in the Boston Marathon, New England Journal of Medicine, 2005, Volume 352:1550-1556

Workshop material


The material for today is available here.
Click the big green Code button and select “Download ZIP”, then open introR-epi-dis.Rproj

Folder structure


project
│   README.md
│   introR-epi-dis.Rproj
│
├───articles
│   │   hyponatremia.pdf
│
├───data
│   ├───raw
│   │       marathon_raw.csv
│   └───derived
│           marathon.RData
├───scripts
│   │   01_data_management.R
│   │   02_descriptive_statistics.R
│   │   03_analyses.R
│   │   ...
│
├───output
│   ├───figures
│   │       ...
│   └───tables
│           ...
├───report
│   │   intro-R-marathon.qmd
│   │   ...
│
└───slides
    │   ...
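
One way to read this layout is as a flow from data/raw to data/derived, driven by the scripts. The sketch below mimics that flow in a temporary directory so it runs without the workshop material. The toy data frame, its column names and the derived variable are assumptions made for illustration; they are not the contents of marathon_raw.csv or of 01_data_management.R.

```r
# Build a throwaway copy of the data folders in a temporary directory
project <- file.path(tempdir(), "project")
dir.create(file.path(project, "data", "raw"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project, "data", "derived"), recursive = TRUE, showWarnings = FALSE)

# Hypothetical stand-in for data/raw/marathon_raw.csv (invented columns and values)
toy_raw <- data.frame(id = 1:3, sodium = c(138, 132, 141))
raw_file <- file.path(project, "data", "raw", "marathon_raw.csv")
write.csv(toy_raw, raw_file, row.names = FALSE)

# Step 1: read the raw file; the raw data are left untouched
marathon <- read.csv(raw_file)

# Step 2: derive new variables in code (illustrative variable only)
marathon$low_sodium <- marathon$sodium < 135

# Step 3: save the derived data set where later scripts can load it
derived_file <- file.path(project, "data", "derived", "marathon.RData")
save(marathon, file = derived_file)

# A later script would start by loading the derived data
rm(marathon)
load(derived_file)
str(marathon)
```

Inside an RStudio project the working directory is the project root, so the same paths can be written relative to it, for example `file.path("data", "raw", "marathon_raw.csv")`.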

Interactive workshop

Follow along or reproduce the analyses in RStudio Cloud

1. Introduction to R and RStudio

2. Basic data manipulation

scripts/01_data_management.R

3. Exploratory Data Analysis

scripts/02_descriptive_statistics.R

4. Introduction to More Advanced Concepts

scripts/03_analyses.R
report/intro-R-marathon.qmd
slides/introR-epi-dis.qmd

Introduction to R and RStudio

Why R


  • Free and open source language for statistical computing and graphics
  • One of the most popular programming languages in both academia and industry
  • Support for reproducible research and interactive analyses
  • Interconnection with other programming languages
  • Vast community support (RStudio, Stack Overflow)

Using R via tidyverse

The tidyverse is a set of packages that work in harmony because they share common data representations and API design.

It includes:

  • ggplot2, for data visualisation
  • dplyr, for data manipulation
  • tidyr, for data tidying
  • readr, for data import
  • purrr, for functional programming
  • tibble, for modern data frames
  • and many more …
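
To give a feel for how these packages fit together, here is a minimal sketch that builds a tibble and summarises it with dplyr. The data are invented and are not the marathon data set; the sketch assumes the tibble and dplyr packages are installed.

```r
library(tibble)
library(dplyr)

# Invented toy data; not the marathon data set used in the workshop
runners <- tibble(
  id     = 1:6,
  sex    = c("female", "male", "female", "male", "female", "male"),
  sodium = c(138, 141, 129, 135, 133, 142)   # mmol/L, made up
)

# Each verb takes a data frame and returns a data frame,
# so the steps chain naturally with the pipe
sodium_by_sex <- runners %>%
  mutate(low_sodium = sodium < 135) %>%      # add a derived variable
  group_by(sex) %>%                          # define the groups
  summarise(
    n           = n(),                       # runners per group
    mean_sodium = mean(sodium),              # group mean
    n_low       = sum(low_sodium)            # count below the cut-off
  )

sodium_by_sex
```

The shared data representation mentioned above is what makes this work: the output of one function is a valid input to the next, including plotting with ggplot2.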

Data Manipulation

Exploratory Data Analysis
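
As a rough sketch of the "summary statistics" and "common models" steps listed in the outline, the example below tabulates a binary outcome by group and fits a logistic regression using base R only. The data are simulated, and the assumed effect size is invented; nothing here reproduces the workshop scripts or the published marathon results.

```r
set.seed(2023)  # make the simulated data reproducible

# Simulated data: every value and the assumed effect below are invented
n <- 500
female <- rbinom(n, 1, 0.4)
p <- plogis(-2 + 0.8 * female)   # assumed log-odds, purely for illustration
outcome <- rbinom(n, 1, p)
sim <- data.frame(female, outcome)

# Summary statistics: outcome counts and proportions by group
table(female = sim$female, outcome = sim$outcome)
tapply(sim$outcome, sim$female, mean)

# A common model: logistic regression of the outcome on the exposure
fit <- glm(outcome ~ female, data = sim, family = binomial)

# Odds ratio with a Wald 95% confidence interval
est <- coef(summary(fit))["female", ]
b   <- unname(est["Estimate"])
se  <- unname(est["Std. Error"])
odds_ratio <- exp(b)
ci <- exp(b + c(-1, 1) * qnorm(0.975) * se)
round(c(OR = odds_ratio, lower = ci[1], upper = ci[2]), 2)
```

Exponentiating the coefficient converts the log-odds scale of the model into an odds ratio, which is the form usually reported in epidemiological papers.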

Introduction to More Advanced Concepts

Learn more about R

Resources




https://alecri.github.io/courses/