Journey through R

class: center, middle, inverse, title-slide

# Journey through R
## A motivating workshop
### Julius Fenn
### January 18, 2023

---

## Workshop

This workshop consists of four parts:

1 Pep talk (*a speech which is intended to encourage someone to make more effort or feel more confident*)
  + first some basics
  
2 Knowledge Management: How to learn these things?!
  
3 Introduction to R
 + Overview
 + Objects
 + Data Structures
 + ...
 
4 Amazing Applications of R
  + typical analyses sequences in action
  + ...

---
class: heading,middle

Part 1: Pep talk

---
class: heading,middle

Part 1: Pep talk - Some Basics

---
## Definition: What is statistics?

- Statistics is the science of responsible data analysis.
- Statistics is a cross-sectional discipline that is characterized by a combination of knowledge in
  - Mathematics (abstraction, modeling, stochastics, numerics),
  - Computer science (programming, scientific computing),
  - Fields of application (life, natural or economic sciences, etc.).
- Statistical modeling allows the description of stochastic phenomena and thus supports the finding of rational decisions under uncertainty.

<br>

Encyclopedia Britannica: Statistics is the art and science of gathering, analyzing and making inferences from data. Originally associated with numbers gathered for governments, the subject now includes large bodies of method and theory.

---
## The theoretical master-mind: The Statistician

**Statisticians:** theory-driven, discussing terms like point estimates, margins of error, and confidence intervals; they are divided into “Frequentists” and “Bayesians”
  - The Frequentist approach to statistics (and testing) is a method which makes predictions on the underlying truths of the experiment, using only data from the current experiment.
  - The Bayesian approach to statistics is a method that encodes past knowledge of similar experiments into a statistical device, known as prior. This prior is combined with current experiment data to make a conclusion on the test (knowledge accumulation).

<br>
<a href="https://cxl.com/blog/bayesian-frequentist-ab-testing/" target="_blank">https://cxl.com/blog/bayesian-frequentist-ab-testing/</a>

---
## The modern (applied) statistician: The Data Scientist

**Data scientists:** *data analysis is an art*; a process of data ingest, data transformation, exploratory data analysis, model selection, model evaluation, and data storytelling

<br>
see book: Peng, R. D., & Matsui, E. (2016). The Art of Data Science: A Guide for Anyone who Works with Data. Lulu.com. https://bookdown.org/rdpeng/artofdatascience/

---
class: heading,middle

Part 1: Pep talk - The Real Talk

---
## Why learn programming?!

(will I look like this cat?)

<br>
<svg aria-hidden="true" role="img" viewBox="0 0 384 512" style="height:1em;width:0.75em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><path d="M192 496C86 496 0 410 0 304C0 192 96 16 192 16s192 176 192 288c0 106-86 192-192 192zM156.5 138l0 0 0 0 0 0c5.5-6.9 4.4-17-2.5-22.5s-17-4.4-22.5 2.5L144 128c-12.5-10-12.5-10-12.5-10l0 0 0 0-.1 .1-.2 .2-.6 .8c-.5 .7-1.3 1.7-2.2 3c-1.9 2.6-4.5 6.3-7.7 11c-6.3 9.4-14.6 23-23 39.7C81.1 206.1 64 252.6 64 304c0 8.8 7.2 16 16 16s16-7.2 16-16c0-44.6 14.9-86.1 30.3-116.8c7.6-15.3 15.3-27.7 21-36.3c2.8-4.3 5.2-7.6 6.8-9.8c.8-1.1 1.4-1.9 1.8-2.4l.4-.6 .1-.1 0 0z"/></svg> <svg aria-hidden="true" role="img" viewBox="0 0 576 512" style="height:1em;width:1.12em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><path d="M253.3 35.1c6.1-11.8 1.5-26.3-10.2-32.4s-26.3-1.5-32.4 10.2L117.6 192H32c-17.7 0-32 14.3-32 32s14.3 32 32 32L83.9 463.5C91 492 116.6 512 146 512H430c29.4 0 55-20 62.1-48.5L544 256c17.7 0 32-14.3 32-32s-14.3-32-32-32H458.4L365.3 12.9C359.2 1.2 344.7-3.4 332.9 2.7s-16.3 20.6-10.2 32.4L404.3 192H171.7L253.3 35.1zM192 304v96c0 8.8-7.2 16-16 16s-16-7.2-16-16V304c0-8.8 7.2-16 16-16s16 7.2 16 16zm96-16c8.8 0 16 7.2 16 16v96c0 8.8-7.2 16-16 16s-16-7.2-16-16V304c0-8.8 7.2-16 16-16zm128 16v96c0 8.8-7.2 16-16 16s-16-7.2-16-16V304c0-8.8 7.2-16 16-16s16 7.2 16 16z"/></svg>: https://osf.io/ytb8q

---
## arguable reason: to impress someone

<br>
*Does anyone want to see impressive R code in action?!*

<br>
<br>
<a href="https://stats.stackexchange.com/questions/423/what-is-your-favorite-data-analysis-cartoon" target="_blank">https://stats.stackexchange.com/questions/423/what-is-your-favorite-data-analysis-cartoon</a>

---
## better reason: to embrace the complexity of our world - prediction

to appreciate the uncertainty of our predictions (e.g. prediction paradox)
<center>
<img src="images/appreciateUncertainty.jpg" height="250px">
</center>

> random variables, distributions, expectations, confidence interval, variance,...

<br>
see book: Silver, N. (2015). The Signal and the Noise: Why So Many Predictions Fail--but Some Don’t. Penguin Publishing Group.

---
## better reason: to embrace the complexity of our world - system theory

> computational modelling, simulation...

<br>
see book: Meadows, D. H. (2008). Thinking in Systems: A Primer. Chelsea Green Pub.

---
## down-to-earth reason: to get a job! - where to study?

- Göttingen: Master of Science (MSc) in Applied Statistics; https://www.uni-goettingen.de/en/421501.html
- Bamberg:  Masterstudiengang Survey-Statistik; https://www.uni-bamberg.de/miss/
- Trier: Master of Science (MSc) in Applied Statistics; https://www.uni-trier.de/universitaet/fachbereiche-faecher/fachbereich-iv/faecher/volkswirtschaftslehre/professuren/wirtschafts-und-sozialstatistik/studieren/applied-statistics-msc
- Tübingen: Kognitionswissenschaft (MSc); https://uni-tuebingen.de/fakultaeten/mathematisch-naturwissenschaftliche-fakultaet/fachbereiche/informatik/studium/studiengaenge/kognitionswissenschaft/
- Economics Master, ...

---
class: heading,middle

Part 2: Knowledge Management

---
## Check out the collection of materials!

see: https://docs.google.com/document/d/1Z40Rkux_Ysq15VziCJJH21ca07ipwN52dA_LFYIsZ2g/edit?usp=sharing

<br>
start reading: 
- Possible Learning Process
- Mixed -> Knowledge management and learning

---
## Getting Things Done - workflow

**workflow:**
<center>
<img src="images/GTD_workflow.jpg" height="450px">
</center>

<br>
see book: Allen, D. (2015). Getting Things Done: The Art of Stress-Free Productivity. Penguin.

---
## Getting Things Done - projects

**projects:**
*a project is anything we want to do that requires more than one action step. It’s therefore a mechanism to remember that, when we finish that first action step, there will still be something more to do*

1. Set up a project list, which is an index, in no particular order, of all your open loops (to dos).

2. For every project define at least the first next action step (OR waiting for, or calendar action).

3. The Projects list and project plans are typically reviewed in your GTD Weekly Review, ensuring each project has at least one current next action, waiting for, or calendar item.

4. It’s fine to have multiple next actions on any given project, as long as they are parallel and not sequential actions.

5. Projects are listed by the outcome you will achieve when you can mark it as done.
  + Effective project names motivate you toward the outcome you wish to achieve, and give you clear direction about what you are trying to accomplish.
  
<br>
<br>
=> that's how you set up a **learning plan**

---
## How to learn - One Approach

**Organize your knowledge as a "Zettelkasten":**
<center>
<img src="images/Zettelkasten_Luhmann.jpg" height="150px">
</center>

- use Anki: https://apps.ankiweb.net/
- organize the "Zettel" by using Obsidian: https://obsidian.md/

Check out YouTube Video SpiegelMining – Reverse Engineering von Spiegel-Online: https://www.youtube.com/watch?v=-YpwsdRKt8Q

---
class: heading,middle

Part 3: Introduction to R

---
class: heading,middle

Part 3: Introduction to R - Overview

*Setting up your first project*

---
## Main Literature

* easily readable introduction to R and statistics: Field, Andy, Jeremy Miles, and Zoë Field. Discovering Statistics Using R. SAGE, 2012.
* Resource for improving coding skills and deepening (technical) understanding of R:
  + Wickham, Hadley. Advanced R, Second Edition. CRC Press, 2019. https://adv-r.hadley.nz/.
  + Jones, Owen, Robert Maillardet, and Andrew Robinson. Introduction to Scientific Programming and Simulation Using R, Second Edition. CRC Press, 2014. https://nyu-cdsc.github.io/learningr/assets/simulation.pdf.
  
  
<br>
plus collected materials / workshops...

---
## Introduction to R

* R is a programming language and tool for statistical computing and data analysis
* consists of basic functionalities (i.e., objects and functions) as well as packages that allow for robust and efficient coding
* R offers several object-oriented programming (OOP) systems such as S3, S4, R6 (the latter via a package), ...; these enable *polymorphism* (use the same function form for different types of input)

```r
summary(c(TRUE, TRUE, FALSE, TRUE))
```

```
##    Mode   FALSE    TRUE 
## logical       1       3
```

```r
summary(rnorm(n = 100, mean = 0, sd = 20))
```

```
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -65.9396 -14.6114  -0.7923  -0.2415  14.8804  44.3164
```

Important:
* Everything that exists is an object.
* Everything that happens is a function call.

---
## Side-Note: Polymorphism?

```r
methods(generic.function = "summary")[1:20]
```

```
##  [1] "summary,ANY-method"            "summary,DBIObject-method"     
##  [3] "summary.aov"                   "summary.aovlist"              
##  [5] "summary.aspell"                "summary.check_packages_in_dir"
##  [7] "summary.connection"            "summary.data.frame"           
##  [9] "summary.Date"                  "summary.default"              
## [11] "summary.Duration"              "summary.ecdf"                 
## [13] "summary.factor"                "summary.ggplot"               
## [15] "summary.glm"                   "summary.haven_labelled"       
## [17] "summary.hcl_palettes"          "summary.infl"                 
## [19] "summary.Interval"              "summary.lm"
```

other important functions are:

```r
?typeof
?class
?mode
```
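How such a method comes about can be sketched with a toy S3 class. The class name `workshop_scores` and its constructor `new_scores()` are made up for illustration; only `structure()`, `summary()` and `cat()` are base R.

```r
# hypothetical S3 class, for illustration only
new_scores <- function(x) {
  structure(list(values = x), class = "workshop_scores")
}

# a method for the generic summary(): the naming scheme is generic.class
summary.workshop_scores <- function(object, ...) {
  cat("workshop_scores with", length(object$values), "values\n")
  invisible(mean(object$values))
}

s <- new_scores(c(2, 4, 6))
summary(s)             # dispatches to summary.workshop_scores()
summary(c(2, 4, 6))    # dispatches to summary.default()
```

The same call `summary()` does different things depending on the class of its input, which is exactly what the long list of methods above reflects.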

---
## set your working directory

* every time you start R, set your working directory (or, better, create an R Project*)

<br>

*you could also use the RStudio API:

```r
# set the working directory to the folder that contains this script (requires RStudio)
setwd(dirname(rstudioapi::getSourceEditorContext()$path))
```

---
## your workflow

! **modular programming**

<br>

if using R projects: 
* Create a project folder
* In this folder, organize your files in subfolders (e.g., data)
* All filepaths in your script(s) are specified relative to the folder’s top level
* Thus, your working directory is always this top level

---
## Side-Note: modular programming?

You separate your programs into separate functional units (modules). Each module does a well-defined task, and the modules call one another as needed. Typically, the state of each module is encapsulated and is only supposed to be altered by the functions from that module.

> Modular programming decomposes a large program into modules!
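A minimal sketch of this idea in R: every step is a small function with one task, and a short main function calls the modules in order. All function names and the simulated data are hypothetical; in a real project each function could live in its own file in the project folder and be loaded with `source()`.

```r
# module 1: get the data (simulated here instead of read from a file)
load_data <- function(n = 50) {
  set.seed(1)
  data.frame(id = seq_len(n), score = rnorm(n, mean = 100, sd = 15))
}

# module 2: clean the data (drop rows with missing scores)
clean_data <- function(dat) {
  dat[!is.na(dat$score), ]
}

# module 3: describe the data
describe_scores <- function(dat) {
  c(n = nrow(dat), mean = mean(dat$score), sd = sd(dat$score))
}

# main function: call the modules one after another
run_analysis <- function() {
  describe_scores(clean_data(load_data()))
}

res <- run_analysis()
res
```

Because each module only talks to the others through its arguments and return value, a single step can be replaced or tested without touching the rest.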

---
## Side-Note: modular programming - Importance

without following the principle of modular programming (*or sometimes structured programming*) you cannot set up a typical data science project:

see in detail: https://r4ds.had.co.nz/introduction.html

---
## RStudio Projects

When using a project, RStudio will automatically set your working directory to the location of the project file.

Additionally, RStudio will

* load .RData files
* load .Rhistory files
* open source files (scripts)
* restore RStudio settings

This will particularly benefit your workflow when collaborating with others/sharing your script, for example by using

[https://codehorizons.com/making-your-first-github-r-project/](https://codehorizons.com/making-your-first-github-r-project/)

---
## hands-on: set up a project!

and remember the philosophy of R:

**Everything that exists is an object.**

**Everything that happens is a function call.**

---
class: heading,middle

Part 3: Introduction to R - Objects

*Using R as a sophisticated calculator*

---
## ??????! I am lost

Is anyone lost?

---
## the very basics

.pull-left[
**Explanation:**

1 get help

2 assignments

3 operators

4 comparisons

5 comments: everything that follows #

<br>
> case sensitive: usage of CAPITAL and small letters matters!

]

.pull-right[
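**R commands:**

1

```r
?mean          # same as help("mean")
```
2

```r
x <- 5         # recommended
x = 5; 5 -> x  # these work as well
```
3

```r
5 + 2; 5 - 2; 5 * 2; 5 / 2; 5^2   # arithmetic, see help("+")
TRUE & FALSE; TRUE && FALSE       # logical, see help("&")
```
4

```r
x == 5; x != 5; x > 5; x >= 5; x < 5; x <= 5   # see help("==")
```
5

```r
# I am a comment
```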
]

---
## basic data structures

.pull-left[
**Explanation:**

1 integer

2 double

3 logical

4 character

5 missings
]

.pull-right[
**R commands:**

1

```r
1L; 2L; 301L   # the L suffix is needed, a plain 1 is a double
```
2

```r
1.0; .141; 1.23e-3; NaN; Inf; -Inf
```
3

```r
TRUE; FALSE #(not T, F!)
```
4

```r
"hello"; "I'm a string"
```
5

```r
NA
```
]

---
## basic data structures - construction & coercion I

.pull-left[
**Explanation:**

Coercion: When you call a function with an argument of the wrong type, R will try to coerce values to a different type so that the function will work. R will convert from more specific types to more general types.

**R commands:**

you define a vector x as follows

```r
x <- c(1, 2, 3, 4, 5)
x
```

```
## [1] 1 2 3 4 5
```

```r
typeof(x); class(x)
```

```
## [1] "double"
```

```
## [1] "numeric"
```
]

.pull-right[
**R commands:**

change the second element of the vector to the word “hat.”

```r
x[2] <- "hat"
x
```

```
## [1] "1"   "hat" "3"   "4"   "5"
```

```r
typeof(x); class(x)
```

```
## [1] "character"
```

```
## [1] "character"
```
]

---
## basic data structures - construction & coercion II

.pull-left[
**Explanation:**

Coercion: 
* coerce to a type xxx by as.xxx()
* check whether an object is of type xxx with is.xxx()
* when combining different data types, they will be coerced to the most flexible type
* coercion often happens automatically

]

.pull-right[
**R commands:**

```r
x <- 1
is.numeric(x)
```

```
## [1] TRUE
```

```r
as.logical(x)
```

```
## [1] TRUE
```

```r
x <- c(FALSE, FALSE, TRUE)
as.numeric(x)
```

```
## [1] 0 0 1
```
]
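The rule "the most flexible type wins" can be tried out directly with `c()`. A small sketch; the order logical < integer < double < character is the one R applies to atomic vectors.

```r
typeof(c(TRUE, 1L))      # logical + integer   -> "integer"
typeof(c(1L, 2.5))       # integer + double    -> "double"
typeof(c(2.5, "a"))      # double + character  -> "character"

# automatic coercion in action: TRUE counts as 1, FALSE as 0
n_true <- sum(c(TRUE, TRUE, FALSE))
n_true                   # 2

# impossible coercions give NA; R also warns, which is silenced here
res <- suppressWarnings(as.numeric(c("1", "two")))
res                      # 1 NA
```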

---
## basic data structures - represent nothing

.pull-left[
**Explanation:**

In R, there are three ways to represent 'nothing'; they differ in why the information is missing:
* NA: missing sample values, impossible coercion, . . .
* NaN: undefined results of mathematical operations (e.g. 0/0, log(-1)); note that 1/0 is Inf, not NaN
* NULL: the null object, i.e. an empty object of length 0
]

.pull-right[
**R commands:**

```r
c(3, NA)
```

```
## [1]  3 NA
```

```r
c(3, 0/0)
```

```
## [1]   3 NaN
```

```r
c(3, NULL)
```

```
## [1] 3
```

```r
max(3, NA)
```

```
## [1] NA
```
]
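In practice you mostly need to detect these values and decide what to do with them. A short sketch using only base R functions (`is.na()`, `is.nan()`, `is.null()` and the `na.rm` argument):

```r
x <- c(3, NA, 0/0)

is.na(x)                 # FALSE TRUE TRUE   (NaN also counts as missing)
is.nan(x)                # FALSE FALSE TRUE
is.null(NULL)            # TRUE
length(NULL)             # 0

max(c(3, NA))            # NA: one missing value makes the result unknown
max(x, na.rm = TRUE)     # 3:  missing values are removed first
mean(x, na.rm = TRUE)    # 3
```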

---
## basic data structures - Infinity

.pull-left[
**Explanation:**

Some mathematical operations can be performed with Inf and -Inf:
]

.pull-right[
**R commands:**

```r
max(3, Inf)
```

```
## [1] Inf
```

```r
min(3, Inf)
```

```
## [1] 3
```

```r
c(Inf + Inf, (-Inf) * Inf, Inf - Inf)
```

```
## [1]  Inf -Inf  NaN
```
]

---
## hands-on: try out the basic data structures

If you are bored, open the R Reference Card 2.0 and try out fancier stuff: https://cran.r-project.org/doc/contrib/Baggott-refcard-v2.pdf

<br>
check out also the folder "additional scripts -> Basic Vocabulary"

---
class: heading,middle

Part 3: Introduction to R - Data Structures

*We are going to face the 10th dimension (wuuuuhu)*

---
## ??????! I am not lost but thirsty

Does anyone need a coffee / break?

---
## complex data types

complex data structures in R can be organized by their dimensionality and by whether all their contents are of the same type (homogeneous) or not (heterogeneous)

.pull-left[
**homogeneous:**

* 1d atomic vector

```r
1:10; c(1,2,3,4,5,6,7,8,9,10)
```

* 2d matrix

```r
matrix(data = NA, nrow = 2, ncol = 3)
```

* nd array

```r
array(1:60, dim=c(3,4,5))
```
]

.pull-right[
**heterogeneous:**

* 1d list

```r
list(1:10, letters)
```

* 2d data frame

```r
data.frame(id = 1:26, 
           letters = letters, 
           constant = "Hello World")
```
]

> depending on the problem we are facing, we need different database paradigms!

check out Fireship YouTube Video: https://youtu.be/W2Z7fbCLSTw

---
## complex data types - very rare in psychology (social science)

often we have "simple" rectangular data (every value is associated with a variable and an observation), which are made of:
* **columns**, each of which represents a variable (like ID)
* **rows**, each of which represents an instance of data in the data set (like a participant)

```r
DT::datatable(dat_twins, options = list(pageLength = 5))
```

<div id="htmlwidget-c44addb729d0cbdf480f" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-c44addb729d0cbdf480f">{"x":{"filter":"none","vertical":false,"data":[["1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20"],["1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20"],["1","1","2","2","3","3","4","4","5","5","6","6","7","7","8","8","9","9","10","10"],["1","2","1","2","1","2","1","2","1","2","2","1","2","1","2","1","1","2","2","1"],["2","2","2","2","2","2","2","2","2","2","1","1","1","1","1","1","1","1","1","1"],[1913.88,1684.89,1902.36,1860.24,2264.25,2216.4,1866.99,1850.64,1743.04,1709.3,1689.6,1806.31,2136.37,2018.92,1966.81,2154.67,1767.56,1827.92,1773.83,1971.63],[1005,963,1035,1027,1281,1272,1051,1079,1034,1070,1173,1079,1067,1104,1347,1439,1029,1100,1204,1160],[6.08,5.73,6.22,5.8,7.99,8.42,7.44,6.84,6.48,6.43,7.99,8.76,6.32,6.32,7.6,7.62,6.03,6.59,7.52,7.67],[54.7,54.2,53,52.9,57.8,56.9,56.6,55.3,53.1,54.8,57.2,57.2,57.2,57.2,55.8,57.2,57.2,56.5,59.2,58.5],[96,89,87,87,101,103,103,96,127,126,101,96,93,88,94,85,97,114,113,124],[57.607,58.968,64.184,58.514,63.958,61.69,133.358,107.503,62.143,83.009,61.236,61.236,83.916,79.38,97.524,99.792,81.648,88.452,79.38,72.576],["below average","below average","below average","below average","medium","medium","medium","below average","above average","above average","medium","below average","below average","below average","below average","below average","below average","above average","medium","above average"]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th> <\/th>\n      <th>ID<\/th>\n      <th>IDZP<\/th>\n      <th>GR<\/th>\n      <th>GES<\/th>\n      <th>OG<\/th>\n      <th>VG<\/th>\n      <th>CC<\/th>\n      <th>KU<\/th>\n      <th>IQ<\/th>\n      <th>KG<\/th>\n      <th>IQ.cat<\/th>\n    <\/tr>\n  
<\/thead>\n<\/table>","options":{"pageLength":5,"columnDefs":[{"className":"dt-right","targets":[5,6,7,8,9,10]},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false,"lengthMenu":[5,10,25,50,100]}},"evals":[],"jsHooks":[]}</script>

---
## every data set should be accompanied by a codebook

Sample size N = 20
Variables (columns):
* ID: Identifier of the individual person
* IDZP: Identifier of the twin pair
* GR: Birth order
* GES: Sex (1 = male, 2 = female)
* OG: Surface area of the cerebral cortex in cm²
* VG: Volume of the forebrain in cm³
* CC: Area of the corpus callosum in cm²
* KU: Circumference of head in cm
* IQ: Intelligence quotient
* KG: Body weight in kg
* additionally IQ.cat: grouped intelligence quotient in 3 groups

Study: https://n.neurology.org/content/50/5/1246

---
## Side-Note: the most scientific study ever

Hypothesis: the size and shape of the human forebrain predict intelligence!

<br>
the mighty correlation matrix:

```r
round(x = cor(x = dat_twins[, c("VG", "CC", "KU", "KG", "IQ")], 
              method = "spearman"), digits = 2)
```

```
##      VG   CC   KU   KG   IQ
## VG 1.00 0.80 0.63 0.36 0.11
## CC 0.80 1.00 0.59 0.13 0.34
## KU 0.63 0.59 1.00 0.22 0.21
## KG 0.36 0.13 0.22 1.00 0.07
## IQ 0.11 0.34 0.21 0.07 1.00
```

> What is your opinion?

* The Spearman correlation coefficient only captures monotonic relationships: r_SP > 0 if large values of x tend to go with large values of y (and vice versa); it indicates a monotonic relationship in the same direction, not necessarily a linear one!
  + apply it especially when the data are not normally distributed, when the scales have unequal response options, or when the sample size is very small (or, better, use concordance measures)
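That the Spearman coefficient only looks at the ordering of the values can be checked in a small sketch: it equals the Pearson correlation of the ranks, and it stays at 1 for a strictly increasing transformation. The data below are simulated for illustration and are not the twin data.

```r
set.seed(42)
x <- rnorm(20)
y <- exp(x)     # strictly increasing in x, but clearly non-linear

r_pearson  <- cor(x, y, method = "pearson")     # below 1: not a linear relationship
r_spearman <- cor(x, y, method = "spearman")    # 1: perfectly monotone relationship

# Spearman is the Pearson correlation computed on the ranks
all.equal(r_spearman, cor(rank(x), rank(y), method = "pearson"))
```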

---
## complex data types - the SQL side of the story

for huge data sets you could use, for example, SQL (Structured Query Language), a domain-specific language designed for managing data held in a relational database management system; check out: https://www.dbis.informatik.uni-goettingen.de/Mondial/

> multiple tables of a relational database are matched by a primary key / multiple primary keys
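The matching by key can also be tried out in R without a database: `merge()` joins two data frames on a key column. A small sketch with made-up tables; the table names, column names and values are invented for illustration.

```r
# two hypothetical tables that share the key column pair_id
persons <- data.frame(id      = 1:4,
                      pair_id = c(1, 1, 2, 2))
pairs   <- data.frame(pair_id = c(1, 2),
                      group   = c("A", "B"))

# pair_id identifies a row in `pairs` and links each person to it
merged <- merge(x = persons, y = pairs, by = "pair_id")
merged
```

Each person row picks up the `group` value of its pair, similar to what a SQL join on the same key would return.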

and complex databases can be depicted, for example, as entity relationship models:

---
## complex data types - add attributes

All objects can have arbitrary additional attributes, used to store metadata about the object

* can be thought of as a named list (with unique names); other frequently encountered attributes: "dimnames", "names", "class"(!)
* can be accessed individually with attr() or all at once (as a list) with attributes()
* arrays are simply vectors with a "dim"-attribute.
* a factor is an integer vector with the attributes "levels" and "class"

.pull-left[

```r
dat <- data.frame(id = 1:26, 
           letters = letters, 
           constant = "Hello World")
head(dat)
```

```
##   id letters    constant
## 1  1       a Hello World
## 2  2       b Hello World
## 3  3       c Hello World
## 4  4       d Hello World
## 5  5       e Hello World
## 6  6       f Hello World
```

]

.pull-right[

```r
attr(x = dat, which = "names")
```

```
## [1] "id"       "letters"  "constant"
```

```r
attributes(x = dat)
```

```
## $names
## [1] "id"       "letters"  "constant"
## 
## $class
## [1] "data.frame"
## 
## $row.names
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26
```

]
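The statements about arrays and factors can be verified with a few lines of base R. In this sketch the `"unit"` attribute is an arbitrary example of user-defined metadata, not something R interprets.

```r
x <- 1:6
attr(x, "dim") <- c(2, 3)    # adding "dim" turns the vector into a matrix
is.matrix(x)                 # TRUE

f <- factor(c("low", "high", "low"))
attributes(f)                # $levels and $class
typeof(f)                    # "integer": the codes behind the labels

# arbitrary metadata, here a hypothetical "unit" attribute
weight <- c(57.6, 59.0)
attr(weight, "unit") <- "kg"
attr(weight, "unit")         # "kg"
```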

---
## hands-on: try out the different data structures

Hint: we have the following data structures:

.pull-left[
**homogeneous:**

* 1d atomic vector

```r
1:10; c(1,2,3,4,5,6,7,8,9,10)
```
* 2d matrix

```r
matrix(data = NA, nrow = 2, ncol = 3)
```
* nd array

```r
array(1:60, dim=c(3,4,5))
```
]

.pull-right[
**heterogeneous:**

* 1d list

```r
list(1:10, letters)
```

* 2d data frame

```r
data.frame(id = 1:26, 
           letters = letters, 
           constant = "Hello World")
```
]

If you want to save a data structure to an R object, write for example:

```r
myFirstVector <- 1:10
```

---
class: heading,middle

Part 3: Introduction to R - Subsetting

*From the 10th dimension down to spaceship earth again.*

---
## ??????! When you had to debug your first error messages

I'm a Celebrity … Get Me Out of Here! (also valuable TV show: Get the F*ck out of my House)

---
class: heading,middle

Part 3: Introduction to R - Flow control

*That's how adults play Looping Louie!*

---
## ??????! Before running circles we need a break right?!

Time to learn p-hacking for experts (in German we say "T-Test rechnen bis die Tasten glühen", i.e. "running t-tests until the keys glow")
<br>
<br>

.pull-left[

]

.pull-right[

Instead of p-hacking, why not just start a research program to finally solve this paradox:

<center>
<img src="https://i.giphy.com/media/xCBE0RPfYsyWI/giphy.webp" width="400" height="300">
</center>
]

---
class: heading,middle

Part 3: Introduction to R - Writing Functions

*Statisticians are lazy as hell, that's why we write code only once!*

---
## ?!!! Never get completely lost anymore

Otherwise it could get dark (at least your mood when writing code)!

---
class: heading,middle

Part 3: Introduction to R - Adding Packages

*Statisticians are lazy as hell, that's why we add packages to avoid writing any code!*

---
## !! Smart kids let the computer do the job

But please do not pet my cat!!!
<br>

---
## add packages

* the R community’s package development means it has the most prewritten functionality of any data analysis software

New packages are installed via
````markdown
install.packages('package_name') # mind the quotes
````
and need to be loaded at the beginning of each session (when using them):
````markdown
library(package_name) # no quotes necessary
````
cool kidz use functions: 
````markdown
# install a package if it is missing, then attach it
usePackage <- function(p) {
  if (!requireNamespace(p, quietly = TRUE)) {
    install.packages(p, dependencies = TRUE)
  }
  library(p, character.only = TRUE)
}
usePackage("tidyverse")
````

---
## The (art) gallery of the most impressionistic packages I - **tidyverse**

**tidyverse** is a collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures: https://www.tidyverse.org/

just write:

```r
install.packages("tidyverse")
```

---
## The (art) gallery of the most impressionistic packages II - **tidyverse**

Which packages are included in tidyverse?
see: https://www.tidyverse.org/packages/

Using this collection of R packages, you have everything you need to cover the complete data analysis pipeline:
<center>
<img src="images/dataAnalysisPipeline.jpg" height="250px">
</center>

<br>
<br>
great introductory book: Wickham, H., & Grolemund, G. (2017). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc. https://r4ds.had.co.nz/

> if you are confronted with huge datasets (>> 100 MB) in the future, it could be reasonable to teach yourself the R package **data.table**; e.g. watch: https://www.youtube.com/watch?v=SfrjF5YSj0Y

---
## The (art) gallery of the most impressionistic packages III - **psych**

Psychometric theory in one central package in R called **psych**: [https://personality-project.org/r/book/](https://personality-project.org/r/book/)

<br>
<br>
*Psychometrics is that area of psychology that specializes in how to measure what we talk and think about (focused on problems in measurement)*

<br>
<br>
What is possible in R?!

check out: Mair, P. (2018). Modern Psychometrics with R. Springer International Publishing. https://doi.org/10.1007/978-3-319-93177-7

---
## The (art) gallery of the most impressionistic packages IV - entering the world of hypothesis tests

A **statistical hypothesis test** is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis; it allows us to make probabilistic statements about population parameters.

<br>
Set of great R packages: 
* **afex**: convenience functions for analyzing factorial experiments using ANOVA or mixed models (multiple wrapper functions)
* **ggstatsplot**: extension of **ggplot2** package for creating graphics with details from statistical tests included in the information-rich plots themselves
* **BayesFactor**: enables the computation of Bayes factors in standard designs, such as one- and two- sample designs, ANOVA designs, and regression (any evidence for the H0 out there?!)
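
Before turning to these packages, what a hypothesis test looks like in base R can be sketched with `t.test()` from the stats package. The two groups are simulated, so the group names and all numbers are assumptions for illustration only.

```r
set.seed(123)
control   <- rnorm(30, mean = 100, sd = 15)
treatment <- rnorm(30, mean = 110, sd = 15)

# Welch two-sample t-test (the default, var.equal = FALSE)
res <- t.test(x = treatment, y = control)
res$statistic    # t value
res$p.value      # p-value
res$conf.int     # 95% confidence interval for the difference in means
```

The result is an object of class `htest`, so `print(res)` gives the familiar test summary.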

---
## The (art) gallery of the most impressionistic packages V - entering the world of regression models

**Regression models** provide a function that describes the relationship between one or more independent variables and a response, dependent, or target variable of any type (binary, numeric, ...)

<br>
Set of great R packages: 
* **lme4**: analyze grouped data and complex hierarchical structures using mixed-effects models
  + **lmerTest**: provides p-values in type I, II or III anova and summary tables for lmer model fits
* **nlme**: fits linear and nonlinear mixed-effects models
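
As a base case before mixed-effects models, here is a sketch of an ordinary linear regression with `lm()` from base R. The data are simulated with a known intercept (2) and slope (0.5), so the estimates can be compared with the truth; the variable names are arbitrary.

```r
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 2 + 0.5 * x + rnorm(n, sd = 1)
dat <- data.frame(x = x, y = y)

fit <- lm(y ~ x, data = dat)
coef(fit)       # estimates should be close to 2 and 0.5
summary(fit)    # dispatches to summary.lm(), see the side note on polymorphism
```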

<br>
<br>
great starting book: Fox, J. (2016). Applied Regression Analysis and Generalized Linear Models. SAGE Publications.

---
## The (art) gallery of the most impressionistic packages VI - entering the world of dynamic documents

**Dynamic analysis documents** combine code, rendered output (such as figures), and text using the **rmarkdown** package

* R Markdown allows you to create documents that serve as a neat record of your analysis
*  enables reproducible research (appendix to a paper, upload it to an online repository, keep as a personal record, ...)
* R Markdown file (.Rmd); when you knit the RMarkdown file, the Markdown formatting and the R code are evaluated, and an output file (HTML, PDF, etc) is produced
* R Markdown makes use of *Markdown* syntax

<br>
<br>
great starting book: Xie, Y. (2017). Dynamic Documents with R and knitr. Chapman and Hall/CRC, https://duhi23.github.io/Analisis-de-datos/Yihue.pdf

check out also the file "additional scripts -> rmarkdown package"

---
class: heading,middle

Part 4: Amazing Applications of R

---
## !!? It's a new dawn! It's a new day! yeah (quote: Nina Simone - Feeling Good)

If you replace pain by pleasure (or better joy) you get what I mean...

<br>

---
class: heading,middle

Part 4: Amazing Applications of R - analyses sequences

*<font size="5">The most important skill for the applied statistician (some people call such a person "data scientist") is to learn sequences of analyses and the assumptions and relationships of different statistical models (quote by J.F.).</font>*

---
## Motivation to learn the dependencies between different statistical models / frameworks I

**A story of the six blind men and the elephant.**

Six blind men were discussing exactly what they believed an elephant to be, since each had heard how strange the creature was, yet none had ever seen one before. So the blind men agreed to find an elephant and discover what the animal was really like...

from http://www.1000ventures.com/business_guide/crosscuttings/knowing_people_perceptions_elephant.html

---
## Motivation to learn the dependencies between different statistical models / frameworks II

**concept of Generalizability Theory:**

Sources of variability of results:

* persons
* items (there is a potential universe of possible items for the query of individual knowledge areas)
* statistical models
* time point of measurement

<br>
<br>
recommended book: Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Houghton, Mifflin and Company.

and article: https://www.researchgate.net/publication/227580118_Generalizability_Theory_Overview

---
## descriptive summary statistics I

blub...

---
## hypothesis tests I - assumptions

blub...

---
## multiple linear regression I - assumptions

Assumptions of a linear regression (A. 8 is normally never taught!):

> "All models are wrong, but some are useful" - George Box. The aphorism acknowledges that statistical models always fall short of the complexities of reality but can still be useful.

---
class: heading,middle

Part 4: Amazing Applications of R - Bibliometrix

*Why read literature if you can use R?!*

---
## Bibliometrix I

blub...