R enthusiast
More than 4 years R experience
2 packages on CRAN (and GitHub)
Web applications with Shiny R
Reproducible research with R markdown
Co-organizer of the Stockholm R useR group (SRUG)
October 13, 2016
Free and open source
Leading software for statistics, data analysis, and machine learning (https://www.r-bloggers.com/r-passes-sas-in-scholarly-use-finally/)
Many packages available on CRAN
Support for reproducible research (rmarkdown), interactive analyses (shiny)
Integrated Development Environment (IDE) for R
Syntax highlighting, code completion, and smart indentation
Easily manage multiple working directories using projects
Workspace browser and data viewer
Plot history, zooming, and flexible image and PDF export
Integrated R help and documentation
Get familiar with R (in RStudio)
Read different data formats into R
Manipulate and manage data
Explore data and obtain summary statistics
Produce common, useful graphs
http://r4ds.had.co.nz/index.html
The tidyverse is a set of packages that work in harmony because they share common data representations and API design.
It includes:
load(url("http://alecri.github.io/downloads/data/marathon.Rdata"))
"Hyponatremia among Runners in the Boston Marathon", New England Journal of Medicine, 2005, Volume 352:1550-1556.
Hyponatremia has emerged as an important cause of race-related death and life-threatening illness among marathon runners. We studied a cohort of marathon runners to estimate the incidence of hyponatremia and to identify the principal risk factors.
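A minimal sketch of what load() does with an .Rdata file, written so that it runs offline. The data frame toy and its values are made up for illustration and are not the marathon data; only the column names na and runtime echo the variables used later.

```r
# Toy stand-in for the marathon data (invented values, not the real study data)
toy <- data.frame(id = 1:3, na = c(138, 141, 132), runtime = c(210, 245, 301))

# Save to a temporary .Rdata file, remove the object, then restore it with load()
path <- tempfile(fileext = ".Rdata")
save(toy, file = path)
rm(toy)
restored <- load(path)   # load() returns the names of the restored objects

restored                 # character vector with the object names
str(toy)                 # variable names and types
summary(toy$na)          # quick numeric summary of one column
```

The same str() and summary() calls can be applied to marathon after loading it from the URL above.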
https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html
5 functions (verbs) to manipulate data
filter()
arrange()
select()
mutate()
summarise()
# create a variable timeh (time in hours)
marathon = mutate(marathon, timeh = runtime/60)
# select only a few variables
marathon_sub = select(marathon, id, female, age, na, bmi, timeh)
# keep only females with bmi > 30
female_30 = filter(marathon_sub, female == "female", bmi > 30)
# sort by descending na levels, then by timeh
arrange(female_30, desc(na), timeh)
Equivalent to
marathon %>%
  mutate(timeh = runtime/60) %>%
  select(id, female, age, na, bmi, timeh) %>%
  filter(female == "female", bmi > 30) %>%
  arrange(desc(na), timeh)
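The fifth verb, summarise(), does not appear in the example above. A minimal sketch of how it might be used, assuming it is paired with dplyr's group_by() (which is not on the list of verbs). The data frame toy is made up for illustration; only its column names mirror the marathon data.

```r
library(dplyr)

# Made-up toy data; column names mirror those used above, values are invented
toy <- data.frame(
  female  = c("female", "male", "female", "male"),
  runtime = c(240, 210, 300, 180)   # minutes
)

by_sex <- toy %>%
  mutate(timeh = runtime / 60) %>%               # time in hours
  group_by(female) %>%                           # one group per level of female
  summarise(n = n(), mean_timeh = mean(timeh))   # one row per group

by_sex
```

summarise() collapses each group to a single row, so by_sex has one row per level of female.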
http://www.cookbook-r.com/Graphs/
ggplot(data = <DATA>) + <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
The first argument is the dataset to use in the plot
Each geom function adds a layer to the plot
Each geom function takes a mapping argument (paired with aes())
# an example
ggplot(data = marathon, aes(x = wtdiff, y = na, color = female)) +
  geom_point(aes(size = runtime), shape = 18) +
  geom_smooth(method = "lm", se = FALSE)
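To make the layering explicit, the sketch below builds a plot one geom at a time and counts the layers after each step. The data frame toy is made up for illustration and is not the marathon data. Inspecting the layers element of the plot object is an assumption about ggplot2 internals, used here only to show the idea.

```r
library(ggplot2)

# Made-up toy data, used only to show the layering
toy <- data.frame(x = c(1, 2, 3, 4, 5), y = c(2.1, 3.9, 6.2, 7.8, 10.1))

p  <- ggplot(data = toy, aes(x = x, y = y))          # data and mappings, no layers yet
p1 <- p + geom_point()                               # first layer: points
p2 <- p1 + geom_smooth(method = "lm", se = FALSE)    # second layer: fitted line

# number of layers after each step
c(length(p$layers), length(p1$layers), length(p2$layers))
```

Printing p2 draws the plot; each geom added with + contributes one more layer on top of the previous ones.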