For this presentation, I use:
- R
- RStudio
- Slidify
- markdown
Kamarul Imran Musa
Associate Professor in Epidemiology and Statistics (MD, M Community Med, PhD)
For this presentation, I use:
The latest R version
## [1] "R version 3.4.0 (2017-04-21)"
## [1] "You Stupid Darkness"
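The two output lines above look like what base R prints for its version string and release nickname. A minimal sketch, assuming that is how they were produced (the slide does not show the code):

```r
# Version string of the running R session, e.g. "R version 3.4.0 (2017-04-21)"
R.version.string

# Release nickname of the running R session
R.version$nickname
```

Both objects are part of base R, so no package needs to be loaded.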
RStudio version 1.0.136
The objectives of the presentation are:
Common software at medical schools includes:
Stata: Data Analysis and Statistical Software http://www.stata.com/
1. Masters of Science (Medical Statistics)
2. Doctor of Public Health (DrPH)
R is a freely available language and environment for statistical computing and graphics.
It has over 10,200 packages that cater to the needs of statisticians, data scientists, epidemiologists, ecologists, econometricians and many more people.
For more info about R, check here https://cran.r-project.org/
We introduced R at USM Medical School in late 2013
We formally integrated R into our academic syllabus in 2015
R and RStudio
This is R
RStudio
base and user-contributed packages
.xlsx
.csv
.sav
.dta
library(foreign)
dat1 <- read.dta('abc.dta', convert.factors = TRUE )
dat2 <- read.spss('abc.sav', to.data.frame = TRUE, use.value.labels = TRUE)
library(haven)
dat1 <- read_dta('abc.dta')
dat2 <- read_spss('abc.sav')
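The slides list .csv and .xlsx but only show code for .dta and .sav. A minimal sketch of the .csv case using base R follows; it writes the built-in mtcars data to a temporary file so the example runs without any external data. The object names dat3 and dat4, the file name abc.xlsx and the use of readxl for .xlsx are assumptions for illustration, not something the slides specify.

```r
# Write a built-in data set to a temporary .csv file, then read it back
csv_file <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_file, row.names = FALSE)

dat3 <- read.csv(csv_file, stringsAsFactors = FALSE)
str(dat3)

# For .xlsx, one option (an assumption, not named in the slides) is readxl:
# dat4 <- readxl::read_excel("abc.xlsx")
```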
- psych
- dplyr
The psych package provides many useful data summaries
dplyr aims to provide a function for each basic verb of data manipulation, as proposed by Hadley Wickham
dplyr provides verbs for data management: filter, select, then mutate
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
mtcars2 <- mtcars %>% filter(disp < 200) %>% select(mpg, cyl, disp) %>% mutate(cyl_mpg = cyl + mpg)
slice(), arrange(), select(), rename(), distinct(), mutate(), transmute(), summarise(), sample_n(), sample_frac()
dplyr makes use of the pipe operator %>%
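To show what the pipe buys you, here is a small sketch that chains summarise() and arrange() on mtcars, then repeats the same steps as nested calls. group_by() is a further dplyr function not listed above, and the object names are hypothetical.

```r
library(dplyr)

# Piped version: read top to bottom (group, summarise, then sort)
cyl_summary <- mtcars %>%
  group_by(cyl) %>%
  summarise(n = n(), mean_mpg = mean(mpg)) %>%
  arrange(desc(mean_mpg))

# Same steps without the pipe: nested calls read inside-out
cyl_summary_nested <- arrange(
  summarise(group_by(mtcars, cyl), n = n(), mean_mpg = mean(mpg)),
  desc(mean_mpg)
)

print(cyl_summary)
```

Both objects hold the same three rows, one per cylinder count. The piped form simply keeps the order of operations visible.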
- graphics::plot to perform plotting
- ggplot2 package
graphics::plot
plot(mtcars$mpg, mtcars$hp, main = 'my mpg vs hp')
ggplot2 uses a proper grammar of graphics
library(ggplot2)
ggplot(mtcars, aes(x= mpg, y= hp)) +
geom_point(shape=1) +
geom_smooth()
## `geom_smooth()` using method = 'loess'
Depends on the syllabus:
library(survival)
mod_sur <- coxph(Surv(time, status) ~ age , data = cancer)
summary(mod_sur)
mod_sur_para <- survreg(Surv(time, status) ~ age , data = cancer, dist = 'weibull')
summary(mod_sur_para)
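After fitting a Cox model, students usually want the hazard ratio with its confidence interval and a check of the proportional hazards assumption. The sketch below assumes the lung data shipped with survival, which as far as I know is the same NCCTG lung cancer data as cancer. The name hr_table is hypothetical.

```r
library(survival)

# Cox model of survival time on age, using the lung data from survival
mod_sur <- coxph(Surv(time, status) ~ age, data = lung)

# Hazard ratio and 95% confidence interval on the exponentiated scale
hr_table <- cbind(
  HR = exp(coef(mod_sur)),
  exp(confint(mod_sur))
)
print(round(hr_table, 3))

# Test of the proportional hazards assumption (scaled Schoenfeld residuals)
ph_test <- cox.zph(mod_sur)
print(ph_test)
```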
General and generalized linear models
mod1 <- lm(mpg ~ disp, data = mtcars)
mod2 <- glm(vs ~ wt, data = mtcars,
family = binomial(link = 'logit'))
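A fitted logistic model is easier to interpret as odds ratios and predicted probabilities. The sketch below refits the model above, then uses Wald intervals from confint.default() and predict() with type = "response". The weights in new_cars are hypothetical values chosen for illustration.

```r
# Logistic regression of engine shape (vs) on weight (wt, in 1000 lb)
mod2 <- glm(vs ~ wt, data = mtcars, family = binomial(link = "logit"))

# Odds ratios with Wald 95% confidence intervals
or_table <- cbind(OR = exp(coef(mod2)), exp(confint.default(mod2)))
print(round(or_table, 3))

# Predicted probability that vs == 1 at three hypothetical weights
new_cars <- data.frame(wt = c(2, 3, 4))
new_cars$prob_vs <- predict(mod2, newdata = new_cars, type = "response")
print(new_cars)
```

An odds ratio below 1 for wt means that heavier cars have lower odds of vs == 1, which is why the predicted probabilities fall as wt rises.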
More on generalized linear model:
With SPSS and Stata, most users need to copy and paste their results
Risk of doing that:
We advocate Reproducibility
ability to regenerate
The term reproducible research refers to the idea that the ultimate product of academic research is the paper along with the laboratory notebooks and full computational environment used to produce the results in the paper such as the code, data, etc. that can be used to reproduce the results and create new work based ...
Reasons:
https://en.wikipedia.org/wiki/Reproducibility#/media/File:Spectrum_of_reproducible_research.png
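One way to avoid copying and pasting results is a document that mixes text and R code, so the numbers are regenerated each time it is built. The sketch below writes a tiny R Markdown file from R. The file name report.Rmd and its content are hypothetical, and the final rendering step is commented out because it assumes the rmarkdown package and pandoc are installed.

```r
# Build the three-backtick chunk fence programmatically
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",
  "title: \"mtcars summary\"",
  "output: html_document",
  "---",
  "",
  "The mean mpg is `r round(mean(mtcars$mpg), 1)`.",
  "",
  paste0(fence, "{r}"),
  "summary(lm(mpg ~ disp, data = mtcars))",
  fence
)

# Write the document to a temporary folder
rmd_file <- file.path(tempdir(), "report.Rmd")
writeLines(rmd_lines, rmd_file)

# Rendering step (assumes rmarkdown and pandoc are available):
# rmarkdown::render(rmd_file)
```

Because the inline value and the model summary are computed when the file is rendered, changing the data changes the report without any manual editing.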
Markdown was created by John Gruber http://daringfireball.net/
"Markdown is a text-to-HTML conversion tool for web writers. Markdown allows you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML)."
Some lessons for us:
I wish:
Some of our activities:
We have RStudio Server Pro running at our medical school https://healthdata.usm.my/rstudio/auth-sign-in
The opportunity at USM, Health Campus:
You can contact me at drkamarul@usm.my or drki.musa@gmail.com