class: center, middle, inverse, title-slide # Journey through R ## A motivating workshop ### Julius Fenn ### January 18, 2023 --- <!-- *********** NEW SLIDE ************** --> ## Workshop This workshop consists of four parts: 1 Pep talk (*a speech which is intended to encourage someone to make more effort or feel more confident*) + first some basics 2 Knowledge Management: How to learn these things?! 3 Introduction to R + Overview + Objects + Data Structures + ... 4 Amazing Applications of R + typical analyses sequences in action + ... <!-- *********** NEW SLIDE ************** --> <!-- *********** HEADING ************** --> --- class: heading,middle Part 1: Pep talk <!-- *********** HEADING ************** --> --- class: heading,middle Part 1: Pep talk - Some Basics <!-- *********** NEW SLIDE ************** --> --- ## Definition: What is statistics? - Statistics is the science of responsible data analysis. - Statistics is a cross-sectional discipline that is characterized by a combination of knowledge in - Mathematics (abstraction, modeling, stochastics, numerics), - Computer science (programming, scientific computing), - Fields of application (life, natural or economic sciences, etc.). - Statistical modeling allows the description of stochastic phenomena and thus supports the finding of rational decisions under uncertainty. <br> Encyclopedia Britannica: Statistics is the art and science of gathering, analyzing and making inferences from data. Originally associated with numbers gathered for governments, the subject now includes large bodies of method and theory. 
<!-- *********** NEW SLIDE ************** --> --- ## The theoretical master-mind: The Statistician **Statisticians:** theory-driven, discussing terms like point estimates, margins of error and confidence intervals, and divided into “Frequentists” and “Bayesians” - The Frequentist approach to statistics (and testing) is a method which makes predictions on the underlying truths of the experiment, using only data from the current experiment. - The Bayesian approach to statistics is a method that encodes past knowledge of similar experiments into a statistical device, known as a prior. This prior is combined with the data of the current experiment to draw a conclusion on the test (knowledge accumulation). <center> <img src="images/evolutionOfStatistics.jpg", height="250px"> </center> <br> <a href="https://cxl.com/blog/bayesian-frequentist-ab-testing/" target="_blank">https://cxl.com/blog/bayesian-frequentist-ab-testing/</a> <!-- *********** NEW SLIDE ************** --> --- ## The modern (applied) statistician: The Data Scientist **Data scientists:** *data analysis is an art*; a process of data ingest, data transformation, exploratory data analysis, model selection, model evaluation, and data storytelling <center> <img src="images/peng_Epicycles.jpg", height="400px"> </center> <br> see book: Peng, R. D., & Matsui, E. (2016). The Art of Data Science: A Guide for Anyone who Works with Data. Lulu.com. https://bookdown.org/rdpeng/artofdatascience/ <!-- *********** HEADING ************** --> --- class: heading,middle Part 1: Pep talk - The Real Talk <!-- *********** NEW SLIDE ************** --> --- ## Why learn programming?! (will I look like this cat?) <center> <img src="https://media4.giphy.com/media/VbnUQpnihPSIgIXuZv/giphy.gif?cid=ecf05e47hmf8l7kqh12kn3rbuh885rmbgzgiyjan1n0v137g&rid=giphy.gif&ct=g", width="30%"> </center> <br>
: https://osf.io/ytb8q <!-- *********** NEW SLIDE ************** --> --- ## arguable reason: to impress someone <center> <img src="images/impressivePieChart.jpg", width="50%"> </center> <br> *Does anyone want to see impressive R code in action?!* <br> <br> <a href="https://stats.stackexchange.com/questions/423/what-is-your-favorite-data-analysis-cartoon" target="_blank">https://stats.stackexchange.com/questions/423/what-is-your-favorite-data-analysis-cartoon</a> <!-- *********** NEW SLIDE ************** --> --- ## better reason: to embrace the complexity of our world - prediction: to appreciate the uncertainty of our predictions (e.g. prediction paradox) <center> <img src="images/appreciateUncertainty.jpg", height="250px"> </center> > random variables, distributions, expectations, confidence interval, variance,... <br> see book: Silver, N. (2015). The Signal and the Noise: Why So Many Predictions Fail--but Some Don’t. Penguin Publishing Group. <!-- *********** NEW SLIDE ************** --> --- ## better reason: to embrace the complexity of our world - system theory <center> <img src="images/Meadows_Fig31.jpg", height="200px"> </center> <center> <img src="images/Meadows_Fig32.jpg", height="200px"> </center> > computational modelling, simulation... <br> see book: Meadows, D. H. (2008). Thinking in Systems: A Primer. Chelsea Green Pub. <!-- *********** NEW SLIDE ************** --> --- ## down-to-earth reason: to get a job! - where to study? 
- Göttingen: Master of Science (MSc) in Applied Statistics; https://www.uni-goettingen.de/en/421501.html - Bamberg: Masterstudiengang Survey-Statistik; https://www.uni-bamberg.de/miss/ - Trier: Master of Science (MSc) in Applied Statistics; https://www.uni-trier.de/universitaet/fachbereiche-faecher/fachbereich-iv/faecher/volkswirtschaftslehre/professuren/wirtschafts-und-sozialstatistik/studieren/applied-statistics-msc - Tübingen: Kognitionswissenschaft (MSc); https://uni-tuebingen.de/fakultaeten/mathematisch-naturwissenschaftliche-fakultaet/fachbereiche/informatik/studium/studiengaenge/kognitionswissenschaft/ - Economics Master, ... <!-- *********** HEADING ************** --> --- class: heading,middle Part 2: Knowledge Management <!-- *********** NEW SLIDE ************** --> --- ## Check out the collection of materials! see: https://docs.google.com/document/d/1Z40Rkux_Ysq15VziCJJH21ca07ipwN52dA_LFYIsZ2g/edit?usp=sharing <br> start reading: - Possible Learning Process - Mixed -> Knowledge management and learning <!-- *********** NEW SLIDE ************** --> --- ## Getting Things Done - workflow **workflow:** <center> <img src="images/GTD_workflow.jpg", height="450px"> </center> <br> see book: Allen, D. (2015). Getting Things Done: The Art of Stress-Free Productivity. Penguin. <!-- *********** NEW SLIDE ************** --> --- ## Getting Things Done - projects **projects:** *a project is anything we want to do that requires more than one action step. It’s therefore a mechanism to remember that, when we finish that first action step, there will still be something more to do* 1. Set up a project list, which is an index, in no particular order, of all your open loops (to dos). 2. For every project define at least the first next action step (OR waiting for, or calendar action). 3. The Projects list and project plans are typically reviewed in your GTD Weekly Review, ensuring each project has at least one current next action, waiting for, or calendar item. 4. 
It’s fine to have multiple next actions on any given project, as long as they are parallel and not sequential actions. 5. Projects are listed by the outcome you will achieve when you can mark it as done. + Effective project names motivate you toward the outcome you wish to achieve, and give you clear direction about what you are trying to accomplish. <br> <br> => that's how you set up a **learning plan** <!-- *********** NEW SLIDE ************** --> --- ## How to learn - One Approach **Organize your knowledge as a "Zettelkasten":** <center> <img src="images/Zettelkasten_Luhmann.jpg", height="150px"> </center> - use Anki: https://apps.ankiweb.net/ - organize the "Zettel" by using Obsidian: https://obsidian.md/ <center> <img src="images/Obsidian_Example.jpg", height="150px"> </center> Check out the YouTube video SpiegelMining – Reverse Engineering von Spiegel-Online: https://www.youtube.com/watch?v=-YpwsdRKt8Q <!-- *********** HEADING ************** --> --- class: heading,middle Part 3: Introduction to R <!-- *********** HEADING ************** --> --- class: heading,middle Part 3: Introduction to R - Overview *Setting up your first project* <!-- *********** NEW SLIDE ************** --> --- ## Main Literature * easy-to-read introduction to R and statistics: Field, Andy, Jeremy Miles, and Zoë Field. Discovering Statistics Using R. SAGE, 2012. * Resources for improving coding skills and deepening your (technical) understanding of R: + Wickham, Hadley. Advanced R, Second Edition. CRC Press, 2019. https://adv-r.hadley.nz/. + Jones, Owen, Robert Maillardet, and Andrew Robinson. Introduction to Scientific Programming and Simulation Using R, Second Edition. CRC Press, 2014. https://nyu-cdsc.github.io/learningr/assets/simulation.pdf. <br> plus collected materials / workshops... 
<!-- *********** NEW SLIDE ************** --> --- ## Introduction to R * R is a programming language and tool for statistical computing and data analysis * it consists of basic functionalities (i.e., objects and functions) as well as packages that allow for robust and efficient coding * R offers several object-oriented programming (OOP) systems such as S3, S4, R6, ..., which enable *polymorphism* (use the same function form for different types of input) ```r summary(c(TRUE, TRUE, FALSE, TRUE)) ``` ``` ## Mode FALSE TRUE ## logical 1 3 ``` ```r summary(rnorm(n = 100, mean = 0, sd = 20)) ``` ``` ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## -65.9396 -14.6114 -0.7923 -0.2415 14.8804 44.3164 ``` Important: * Everything that exists is an object. * Everything that happens is a function call. <!-- *********** NEW SLIDE ************** --> --- ## Side-Note: Polymorphism? ```r methods(generic.function = "summary")[1:20] ``` ``` ## [1] "summary,ANY-method" "summary,DBIObject-method" ## [3] "summary.aov" "summary.aovlist" ## [5] "summary.aspell" "summary.check_packages_in_dir" ## [7] "summary.connection" "summary.data.frame" ## [9] "summary.Date" "summary.default" ## [11] "summary.Duration" "summary.ecdf" ## [13] "summary.factor" "summary.ggplot" ## [15] "summary.glm" "summary.haven_labelled" ## [17] "summary.hcl_palettes" "summary.infl" ## [19] "summary.Interval" "summary.lm" ``` other important functions for inspecting objects are: ```r ?typeof ?class ?mode ``` <!-- *********** NEW SLIDE ************** --> --- ## set your working directory * every time you start R, set your working directory (or better, create an R Project*) <br> <center> <img src="images/workingdir_pic.png", width="50%"> </center> *you could also use the RStudio API (package rstudioapi): ```r # sets the directory containing this script as the working directory setwd(dirname(rstudioapi::getSourceEditorContext()$path)) ``` <!-- *********** NEW SLIDE ************** --> --- ## your workflow ! 
**modular programming** <br> if using R projects: * Create a project folder * In this folder, organize your files in subfolders (e.g., data) * All file paths in your script(s) are specified relative to the folder’s top level * Thus, your working directory is always this top level <!-- *********** NEW SLIDE ************** --> --- ## Side-Note: modular programming? you split your programs into separate functional units (modules). Each module does a well-defined task and the modules call one another as needed. Typically, the state of each module is encapsulated and only supposed to be altered by the functions from that module. <center> <img src="images/modular_programming.jpg", height="350px"> </center> > Modular programming decomposes a large program into modules! <!-- *********** NEW SLIDE ************** --> --- ## Side-Note: modular programming - Importance without following the principle of modular programming (*or sometimes structured programming*) you can hardly set up a typical data science project: <center> <img src="images/dataAnalysisPipeline.jpg", height="350px"> </center> see in detail: https://r4ds.had.co.nz/introduction.html <!-- *********** NEW SLIDE ************** --> --- ## RStudio Projects When using a project, RStudio will automatically set your working directory to the location of the project file. Additionally, RStudio will * load .RData files * load .Rhistory files * open source files (scripts) * restore RStudio settings This will particularly benefit your workflow when collaborating with others/sharing your script, for example by using <center> <img src="images/githublogo.jpg", width="30%", height="150px"> </center> [https://codehorizons.com/making-your-first-github-r-project/](https://codehorizons.com/making-your-first-github-r-project/) <!-- *********** NEW SLIDE ************** --> --- ## hands-on: set up a project! 
and remember the philosophy of R: **Everything that exists is an object.** **Everything that happens is a function call.** <center> <img src="images/memeletsdothis.jpg", width="50%"> </center> <!-- *********** HEADING ************** --> --- class: heading,middle Part 3: Introduction to R - Objects *Using R as a sophisticated calculator* <!-- *********** NEW SLIDE ************** --> --- ## ??????! I am lost Is anyone lost? <br> <br> <center> <img src="https://media1.giphy.com/media/1kYx8tpKGbfee2drT7/giphy.gif?cid=790b7611d38e7d9633b5e62c5857c72dfa16e5b014d7e358&rid=giphy.gif&ct=g", width="70%"> </center> <!-- *********** NEW SLIDE ************** --> --- ## the very basics .pull-left[ **Explanation:** 1 get help 2 assignments 3 operators 4 comparisons 5 comments: everything that follows # <br> > case sensitive: usage of CAPITAL and small letters matters! ] .pull-right[ **R commands:** 1 ```r ?topic, help(topic) ``` 2 ```r x <- 5 (recommended), x = 5, 5 -> x ``` 3 ```r +, -, *, /, ^, &, &&; see help("+") ``` 4 ```r == , != , >, >= , <, <=; see help("==") ``` 5 ```r # I am a comment ``` ] <!-- *********** NEW SLIDE ************** --> --- ## basic data structures .pull-left[ **Explanation:** 1 integer (note the L suffix; a plain 1 is a double) 2 double 3 logical 4 character 5 missing values ] .pull-right[ **R commands:** 1 ```r 1L; 2L; 301L ``` 2 ```r 1.0; .141; 1.23e-3; NaN; Inf; -Inf ``` 3 ```r TRUE; FALSE #(not T, F!) ``` 4 ```r "hello"; "I'm a string" ``` 5 ```r NA ``` ] <!-- *********** NEW SLIDE ************** --> --- ## basic data structures - construction & coercion I .pull-left[ **Explanation:** Coercion: When you call a function with an argument of the wrong type, R will try to coerce values to a different type so that the function will work. R will convert from more specific types to more general types. 
**R commands:** you define a vector x as follows ```r x <- c(1, 2, 3, 4, 5) x ``` ``` ## [1] 1 2 3 4 5 ``` ```r typeof(x); class(x) ``` ``` ## [1] "double" ``` ``` ## [1] "numeric" ``` ] .pull-right[ **R commands:** change the second element of the vector to the word “hat.” ```r x[2] <- "hat" x ``` ``` ## [1] "1" "hat" "3" "4" "5" ``` ```r typeof(x); class(x) ``` ``` ## [1] "character" ``` ``` ## [1] "character" ``` ] <!-- *********** NEW SLIDE ************** --> --- ## basic data structures - construction & coercion II .pull-left[ **Explanation:** Coercion: * coerce to a type xxx by as.xxx() * check if xxx is a specific type by is.xxx() * when combining different data types, they will be coerced to the most flexible type * coercion often happens automatically ] .pull-right[ **R commands:** ```r x <- 1 is.numeric(x) ``` ``` ## [1] TRUE ``` ```r as.logical(x) ``` ``` ## [1] TRUE ``` ```r x <- c(FALSE, FALSE, TRUE) as.numeric(x) ``` ``` ## [1] 0 0 1 ``` ] <!-- *********** NEW SLIDE ************** --> --- ## basic data structures - represent nothing .pull-left[ **Explanation:** In R, there are three ways to represent 'nothing', but the reason for the missingness of the information can be distinguished: * NA: missing sample values, impossible coercion, . . . * NaN: undefined results in mathematical operations (e.g. log(-1), 0/0; note that 1/0 is Inf, not NaN) * NULL: the null object, i.e.
an empty object of length 0 that vanishes when combined with other values ] .pull-right[ **R commands:** ```r c(3, NA) ``` ``` ## [1] 3 NA ``` ```r c(3, 0/0) ``` ``` ## [1] 3 NaN ``` ```r c(3, NULL) ``` ``` ## [1] 3 ``` ```r max(3, NA) ``` ``` ## [1] NA ``` ] <!-- *********** NEW SLIDE ************** --> --- ## basic data structures - Infinity .pull-left[ **Explanation:** Some mathematical operations can be performed with Inf and -Inf: ] .pull-right[ **R commands:** ```r max(3, Inf) ``` ``` ## [1] Inf ``` ```r min(3, Inf) ``` ``` ## [1] 3 ``` ```r c(Inf + Inf, (-Inf) * Inf, Inf - Inf) ``` ``` ## [1] Inf -Inf NaN ``` ] <!-- *********** NEW SLIDE ************** --> --- ## hands-on: try out the basic data structures If you are bored, open the R Reference Card 2.0 and try out more fancy stuff: https://cran.r-project.org/doc/contrib/Baggott-refcard-v2.pdf <center> <img src="images/memeletsdothis.jpg", width="50%"> </center> <br> also check out the folder "additional scripts -> Basic Vocabulary" <!-- *********** HEADING ************** --> --- class: heading,middle Part 3: Introduction to R - Data Structures *We are going to face the 10th dimension (wuuuuhu)* <!-- *********** NEW SLIDE ************** --> --- ## ??????! I am not lost but thirsty Does anyone need a coffee / break? 
<br> <br> <center> <img src="https://i.giphy.com/media/lXCQNy6VTwmk0/giphy.webp", width="500" height="300"> </center> <!-- *********** NEW SLIDE ************** --> --- ## complex data types complex data structures in R can be organized by their dimensionality and by whether all their contents are of the same type (homogeneous) or not (heterogeneous) .pull-left[ **homogeneous:** * 1d atomic vector ```r 1:10; c(1,2,3,4,5,6,7,8,9,10) ``` * 2d matrix ```r matrix(data = NA, nrow = 2, ncol = 3) ``` * nd array ```r array(1:60, dim=c(3,4,5)) ``` ] .pull-right[ **heterogeneous:** * 1d list ```r list(1:10, letters) ``` * 2d data frame ```r data.frame(id = 1:26, letters = letters, constant = "Hello World") ``` ] > depending on the issue we are facing, we need different database paradigms! check out the Fireship YouTube video: https://youtu.be/W2Z7fbCLSTw <!-- *********** NEW SLIDE ************** --> --- ## complex data types - very rare in psychology (social science) we often have "simple" rectangular data, in which every value is associated with a variable and an observation: * **column**, which represents a variable (like ID) * **row**, which represents an instance of data in the data set (like a participant) ```r DT::datatable(dat_twins, options = list(pageLength = 5)) ```
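The column/row idea can be tried out on a tiny data frame. This is only a sketch: `dat_toy` and its columns `id`, `group` and `score` are hypothetical names with made-up values, not the workshop's `dat_twins` data.

```r
# Hypothetical toy data (made-up values, NOT dat_twins):
# every row is one participant, every column is one variable
dat_toy <- data.frame(
  id    = 1:4,                   # identifier of the participant
  group = c("a", "a", "b", "b"), # some grouping variable
  score = c(101, 98, 110, 107)   # some measured value
)

dim(dat_toy)    # number of rows (observations) and columns (variables)
str(dat_toy)    # type of every column
dat_toy[2, ]    # one row = all values of one observation
dat_toy$score   # one column = all values of one variable
```

Each column keeps its own type, which is why a data frame counts as heterogeneous while a matrix does not.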
<!-- *********** NEW SLIDE ************** --> --- ## every data set should be accompanied by a codebook Sample size N = 20 Variables (columns): * ID: Identifier of the individual person * IDZP: Identifier of the twin pair * GR: Birth order * GES: Sex (1 = male, 2 = female) * OG: Surface area of the cerebral cortex in cm2 * VG: Volume of the forebrain in cm3 * CC: Area of the corpus callosum in cm2 * KU: Circumference of head in cm * IQ: Intelligence quotient * KG: Body weight in kg * additionally IQ.cat: grouped intelligence quotient in 3 groups Study: https://n.neurology.org/content/50/5/1246 <!-- *********** NEW SLIDE ************** --> --- ## Side-Note: the most scientific study ever Hypothesis: the size and shape of the human forebrain predict intelligence! <br> the mighty correlation matrix: ```r round(x = cor(x = dat_twins[, c("VG", "CC", "KU", "KG", "IQ")], method = "spearman"), digits = 2) ``` ``` ## VG CC KU KG IQ ## VG 1.00 0.80 0.63 0.36 0.11 ## CC 0.80 1.00 0.59 0.13 0.34 ## KU 0.63 0.59 1.00 0.22 0.21 ## KG 0.36 0.13 0.22 1.00 0.07 ## IQ 0.11 0.34 0.21 0.07 1.00 ``` > What is your opinion? * the Spearman correlation coefficient only captures monotonic relationships; r_SP > 0 if large values of x tend to go with large values of y (and vice versa), which indicates a monotonic relationship in the same direction (not necessarily a linear relationship!) 
+ to be applied especially when the data are not normally distributed, the scales have unequal answering options, or the sample size is very small (or better use concordance measures) <!-- *********** NEW SLIDE ************** --> --- ## complex data types - the SQL side of the story for huge data sets you could use, for example, SQL (Structured Query Language), which is a domain-specific language used in programming and designed for managing data held in a relational database management system; check out: https://www.dbis.informatik.uni-goettingen.de/Mondial/ > the tables of a relational database are linked via primary (and foreign) keys, and complex databases can be depicted, for example, as entity-relationship models: <center> <img src="images/ERMstudent.jpg", height="400px"> </center> <!-- *********** NEW SLIDE ************** --> --- ## complex data types - add attributes All objects can have arbitrary additional attributes, used to store metadata about the object * can be thought of as a named list (with unique names); frequently encountered attributes: "dimnames", "names", "class"(!) * can be accessed individually with attr() or all at once (as a list) with attributes() * arrays are simply vectors with a "dim"-attribute. 
* a factor is an integer vector with the attributes "levels" and "class" .pull-left[ ```r dat <- data.frame(id = 1:26, letters = letters, constant = "Hello World") head(dat) ``` ``` ## id letters constant ## 1 1 a Hello World ## 2 2 b Hello World ## 3 3 c Hello World ## 4 4 d Hello World ## 5 5 e Hello World ## 6 6 f Hello World ``` ] .pull-right[ ```r attr(x = dat, which = "names") ``` ``` ## [1] "id" "letters" "constant" ``` ```r attributes(x = dat) ``` ``` ## $names ## [1] "id" "letters" "constant" ## ## $class ## [1] "data.frame" ## ## $row.names ## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 ## [26] 26 ``` ] <!-- *********** NEW SLIDE ************** --> --- ## hands-on: try out the different data structures Hint: we have the following data structures: .pull-left[ **homogeneous:** * 1d atomic vector ```r 1:10; c(1,2,3,4,5,6,7,8,9,10) ``` * 2d matrix ```r matrix(data = NA, nrow = 2, ncol = 3) ``` * nd array ```r array(1:60, dim=c(3,4,5)) ``` ] .pull-right[ **heterogeneous:** * 1d list ```r list(1:10, letters) ``` * 2d data frame ```r data.frame(id = 1:26, letters = letters, constant = "Hello World") ``` ] If you want to save a data structure as an R object, write e.g.: ```r myFirstVector <- 1:10 ``` <!-- *********** HEADING ************** --> --- class: heading,middle Part 3: Introduction to R - Subsetting *From the 10th dimension down to spaceship earth again.* <!-- *********** NEW SLIDE ************** --> --- ## ??????! When you had to debug your first error messages I'm a Celebrity … Get Me Out of Here! (also valuable TV show: Get the F*ck out of my House) <br> <br> <center> <img src="https://i.giphy.com/media/JIX9t2j0ZTN9S/giphy.webp", width="400" height="300"> </center> <!-- *********** HEADING ************** --> --- class: heading,middle Part 3: Introduction to R - Flow control *That's how adults play Looping Louie!* <!-- *********** NEW SLIDE ************** --> --- ## ??????! Before running circles we need a break right?! 
Time to learn p-hacking for experts (in German we say "T-Test rechnen bis die Tasten glühen", i.e. "running t-tests until the keys glow") <br> <br> .pull-left[ <center> <img src="images/pHackingComic.jpg", height="400px"> </center> ] .pull-right[ Instead of p-hacking, why not just start a research program to finally solve this paradox: <center> <img src="https://i.giphy.com/media/xCBE0RPfYsyWI/giphy.webp", width="400" height="300"> </center> ] <!-- *********** HEADING ************** --> --- class: heading,middle Part 3: Introduction to R - Writing Functions *Statisticians are lazy as hell, that's why we write code only once!* <!-- *********** NEW SLIDE ************** --> --- ## ?!!! Never get completely lost anymore Otherwise it could get dark (at least your mood when writing code)! <br> <br> <center> <img src="https://i.giphy.com/media/bPCwGUF2sKjyE/giphy.webp", width="400" height="300"> </center> <!-- *********** HEADING ************** --> --- class: heading,middle Part 3: Introduction to R - Adding Packages *Statisticians are lazy as hell, that's why we add packages to avoid writing any code!* <!-- *********** NEW SLIDE ************** --> --- ## !! Smart kids let the computer do the job But please do not pet my cat!!! 
<br> <center> <img src="https://media4.giphy.com/media/d7nd6bdypnYjGT1jP3/giphy.gif?cid=ecf05e47scl4l1loifczg8vwcos4wex8tttk74cypq1tejk2&rid=giphy.gif&ct=g", width="400" height="300"> </center> <!-- *********** NEW SLIDE ************** --> --- ## add packages * the R community’s package development means that an enormous amount of prewritten functionality is available New packages are installed via ````markdown install.packages('package_name') # mind the quotes ```` and need to be loaded at the beginning of each session (when using them): ````markdown library(package_name) # no quotes necessary ```` cool kidz use functions: ````markdown usePackage <- function(p) { if (!is.element(p, installed.packages()[,1])) install.packages(p, dependencies = TRUE) require(p, character.only = TRUE) } usePackage("tidyverse") ```` <!-- *********** NEW SLIDE ************** --> --- ## The (art) gallery of the most impressionistic packages I - **tidyverse** **tidyverse** is a collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures: https://www.tidyverse.org/ <center> <img src="images/rmarkdownPic.jpg", width="50%"> </center> just write: ```r install.packages("tidyverse") ``` <!-- *********** NEW SLIDE ************** --> --- ## The (art) gallery of the most impressionistic packages II - **tidyverse** Which packages are included in tidyverse? see: https://www.tidyverse.org/packages/ Using this collection of R packages you have everything you need to cover the complete data analysis pipeline: <center> <img src="images/dataAnalysisPipeline.jpg", height="250px"> </center> <br> <br> great introductory book: Wickham, H., & Grolemund, G. (2017). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc. https://r4ds.had.co.nz/ > if you are confronted with huge datasets (>> 100 MB) in the future, it could be reasonable to teach yourself the R package **data.table**; e.g. 
watch: https://www.youtube.com/watch?v=SfrjF5YSj0Y <!-- *********** NEW SLIDE ************** --> --- ## The (art) gallery of the most impressionistic packages III - **psych** Psychometric theory in one central package in R called **psych**: [https://personality-project.org/r/book/](https://personality-project.org/r/book/) <br> <br> *Psychometrics is that area of psychology that specializes in how to measure what we talk and think about (focused on problems in measurement)* <br> <br> What is possible in R?! check out: Mair, P. (2018). Modern Psychometrics with R. Springer International Publishing. https://doi.org/10.1007/978-3-319-93177-7 <!-- *********** NEW SLIDE ************** --> --- ## The (art) gallery of the most impressionistic packages IV - entering the world of hypothesis tests **Statistical hypothesis test** is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis; allows us to make probabilistic statements about population parameters. <br> Set of great R packages: * **afex**: convenience functions for analyzing factorial experiments using ANOVA or mixed models (multiple wrapper functions) * **ggstatsplot**: extension of **ggplot2** package for creating graphics with details from statistical tests included in the information-rich plots themselves * **BayesFactor**: enables the computation of Bayes factors in standard designs, such as one- and two- sample designs, ANOVA designs, and regression (any evidence for the H0 out there?!) <!-- *********** NEW SLIDE ************** --> --- ## The (art) gallery of the most impressionistic packages V - entering the world of regression models **Regression models** provide a function that describes the relationship between one or more independent variables and a response, dependent, or target variable of any type (binary, numeric, ...) 
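To make the definition above concrete, here is a minimal sketch of a linear regression with base R's `lm()`. The built-in `mtcars` data and the choice of `mpg` as response with `wt` and `hp` as independent variables are assumptions made purely for illustration; they are not part of the workshop material.

```r
# Illustration only: mtcars ships with base R (package datasets)
# response: mpg (miles per gallon); predictors: wt (weight), hp (horsepower)
fit <- lm(mpg ~ wt + hp, data = mtcars)

coef(fit)      # estimated intercept and slopes
summary(fit)   # standard errors, t-tests, R-squared, ...
```

Note that `summary()` is the same generic function used earlier for vectors; here it dispatches to the method for `lm` objects.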
<br> Set of great R packages: * **lme4**: analyze grouped data and complex hierarchical structures using mixed-effects models + **lmerTest**: provides p-values in type I, II or III anova and summary tables for lmer model fits * **nlme**: fits linear and nonlinear mixed-effects models <br> <br> great starting book: Fox, J. (2016). Applied Regression Analysis and Generalized Linear Models. SAGE Publications. <!-- *********** NEW SLIDE ************** --> --- ## The (art) gallery of the most impressionistic packages VI - entering the world of dynamic documents **Dynamic analysis documents** combine code, rendered output (such as figures), and text using the **rmarkdown** package * R Markdown allows you to create documents that serve as a neat record of your analysis * enables reproducible research (appendix to a paper, upload it to an online repository, keep as a personal record, ...) * R Markdown file (.Rmd); when you knit the R Markdown file, the Markdown formatting and the R code are evaluated, and an output file (HTML, PDF, etc.) is produced * R Markdown makes use of *Markdown* syntax <br> <br> great starting book: Xie, Y. (2017). Dynamic Documents with R and knitr. Chapman and Hall/CRC, https://duhi23.github.io/Analisis-de-datos/Yihue.pdf also check out the file "additional scripts -> rmarkdown package" <!-- *********** HEADING ************** --> --- class: heading,middle Part 4: Amazing Applications of R <!-- *********** NEW SLIDE ************** --> --- ## !!? It's a new dawn! It's a new day! yeah (quote: Nina Simone - Feeling Good) If you replace pain by pleasure (or better joy) you get what I mean... 
<br> <center> <img src="https://i.giphy.com/media/saAZVlxwMPOW4/giphy.webp", width="400" height="300"> </center> <!-- *********** HEADING ************** --> --- class: heading,middle Part 4: Amazing Applications of R - analyses sequences *<font size="5">The most important skill for the applied statistician (some people call such a person "data scientist") is to learn sequences of analyses and the assumptions and relationships of different statistical models (quote by J.F.).</font>* <!-- *********** NEW SLIDE ************** --> --- ## Motivation to learn the dependencies between different statistical models / frameworks I **A story of the six blind men and the elephant.** Six blind men were discussing exactly what they believed an elephant to be, since each had heard how strange the creature was, yet none had ever seen one before. So the blind men agreed to find an elephant and discover what the animal was really like... <center> <img src="images/storyElephant.jpg", height="300px"> </center> from http://www.1000ventures.com/business_guide/crosscuttings/knowing_people_perceptions_elephant.html <!-- *********** NEW SLIDE ************** --> --- ## Motivation to learn the dependencies between different statistical models / frameworks II **concept of Generalizability Theory:** <center> <img src="images/GeneralizabilityTheory.jpg", width="500px"> </center> Sources of variability of results: * persons * items (there is a potential universe of possible items for the query of individual knowledge areas) * statistical models * time point of measurement <br> <br> recommended book: Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Houghton, Mifflin and Company. and article: https://www.researchgate.net/publication/227580118_Generalizability_Theory_Overview <!-- *********** NEW SLIDE ************** --> --- ## descriptive summary statistics I blub... 
<!-- *********** NEW SLIDE ************** --> --- ## hypothesis tests I - assumptions blub... <!-- *********** NEW SLIDE ************** --> --- ## multiple linear regression I - assumptions Assumptions of a linear regression (A. 8 is normally never taught!): <center> <img src="images/assumptionsLinearModel.jpg", height="500px"> </center> > "All models are wrong, but some are useful" - George Box. The aphorism acknowledges that statistical models always fall short of the complexities of reality but can still be useful nonetheless. <!-- *********** HEADING ************** --> --- class: heading,middle Part 4: Amazing Applications of R - Bibliometrix *Why read literature if you can use R?!* <!-- *********** NEW SLIDE ************** --> --- ## Bibliometrix I blub...