1.Libraries

library(tidyverse)
library(readxl)
library(ggstatsplot)
library(DataExplorer)
library(dlookr)
library(flextable)
library(DT)
library(jmv)
library(pwr)
library(eurostat)
library(rnaturalearth)
library(rnaturalearthdata)
library(leaflet)
library(factoextra)
library(FactoMineR)
library(ggpubr)

2.Datasets

diss <- read_csv("~/Desktop/R/data/diss.csv") %>% mutate_if(is.character, as.factor)
t_test <- read_excel("~/Desktop/R/various/jam_data.xlsx", sheet = 14)
one_way_anova <- read_excel("~/Desktop/R/various/jam_data.xlsx", sheet = 20)
sheet_15 <- read_excel("~/Desktop/R/various/jam_data.xlsx", sheet = 15)
sheet_25 <- read_excel("~/Desktop/R/various/jam_data.xlsx", sheet = 25)
sheet_1 <- read_excel("~/Desktop/R/various/jam_data.xlsx", sheet = 1)
sheet_18 <- read_excel("~/Desktop/R/various/jam_data.xlsx", sheet = 18)
sheet_24 <- read_excel("~/Desktop/R/various/jam_data.xlsx", sheet = 24)
sheet_16 <- read_excel("~/Desktop/R/various/jam_data.xlsx", sheet = 16)
sheet_12 <- read_excel("~/Desktop/R/various/jam_data.xlsx", sheet = 12)
stress <- read_csv("~/Desktop/R/data/stress.csv")

3.Exploratory data analysis (EDA)

3.1.Introductory view of the dataset
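
A minimal sketch of a first look at the data, using functions from the packages loaded above:

glimpse(diss)    # column types and the first few values of each variable
head(diss)       # first rows of the dataset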

3.2.Complete overview of selected variables (do not apply to extensive datasets!)
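
One way to do this is to render every row of a few chosen columns as a flextable (hence the warning about extensive datasets); the columns picked below are only examples taken from the descriptives table further down:

diss %>%
  select(elev, slope, s_ha, invasive) %>%   # example columns
  flextable()                               # renders every row, so keep the selection small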

3.3.Complete overview of selected variables with an interactive table
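
A sketch with DT::datatable(); the displayed columns and the options are assumptions:

diss %>%
  select(elev, slope, s_ha, invasive, rare) %>%
  datatable(filter = "top", options = list(pageLength = 10))   # searchable, sortable HTML table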

3.4.Overview of some statistics of numerical variables only
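
For example, dlookr::describe() summarises only the numeric columns; the call below is a sketch:

diss %>%
  describe() %>%    # n, missing, mean, sd, skewness, kurtosis, percentiles per numeric variable
  flextable()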

3.5.Descriptive statistics of selected numerical variables
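
The table below is jamovi-style output; a jmv::descriptives() call along these lines could have produced it (the options switched on are assumptions inferred from the rows of the table):

descriptives(
  data = diss,
  vars = c("elev", "slope", "s_ha", "s_m2", "invasive", "rare"),
  se = TRUE,          # standard error of the mean
  ci = TRUE,          # 95% confidence interval of the mean
  variance = TRUE,
  range = TRUE,
  sw = TRUE           # Shapiro-Wilk normality test
)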

## 
##  DESCRIPTIVES
## 
##  Descriptives                                                                                                 
##  ──────────────────────────────────────────────────────────────────────────────────────────────────────────── 
##                               elev         slope        s_ha         s_m2          invasive      rare         
##  ──────────────────────────────────────────────────────────────────────────────────────────────────────────── 
##    N                                157          157          157           157           157           157   
##    Missing                            0            0            0             0             0             0   
##    Mean                        1448.159     18.80255     35.59236      6.496815      2.152866     0.1592357   
##    Std. error mean             22.64264    0.5420004     1.345808     0.4151531     0.1671556    0.03781294   
##    95% CI mean lower bound     1403.780     17.74025     32.95462      5.683130      1.825247    0.08512367   
##    95% CI mean upper bound     1492.538     19.86485     38.23009      7.310500      2.480485     0.2333477   
##    Median                      1385.000     18.00000     37.00000      5.000000      2.000000      0.000000   
##    Standard deviation          283.7115     6.791246     16.86293      5.201854      2.094454     0.4737947   
##    Variance                    80492.20     46.12102     284.3584      27.05928      4.386739     0.2244815   
##    Range                       1195.000     33.00000     71.00000      23.00000      12.00000      2.000000   
##    Minimum                     926.0000     3.000000     5.000000      0.000000      0.000000      0.000000   
##    Maximum                     2121.000     36.00000     76.00000      23.00000      12.00000      2.000000   
##    Shapiro-Wilk W             0.9569441    0.9891714    0.9790337     0.8985776     0.8362938     0.3714480   
##    Shapiro-Wilk p             0.0000881    0.2695829    0.0171917    < .0000001    < .0000001    < .0000001   
##  ────────────────────────────────────────────────────────────────────────────────────────────────────────────

3.6.Checking for missing values in selected variables or the whole dataset
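
Two quick checks, both with packages loaded above:

plot_missing(diss)                                           # DataExplorer: share of missing values per variable
diss %>% summarise(across(everything(), ~ sum(is.na(.))))    # exact NA counts per column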

3.7.Checking the distribution of numerical variables
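
For instance with DataExplorer (a sketch):

plot_histogram(diss)   # histogram of every numeric variable
plot_density(diss)     # kernel density estimate of every numeric variable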

3.8.Checking for correlations between some of the numerical variables
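
A sketch with ggstatsplot::ggcorrmat(); the chosen variables come from the descriptives table above:

ggcorrmat(
  data = diss,
  cor.vars = c(elev, slope, s_ha, s_m2)   # correlation matrix with significance marks
)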

3.9.Taking a look at the categorical variables
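
For example, with DataExplorer:

plot_bar(diss)   # frequency bar chart for every discrete variable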

3.10.Taking a look at the categorical variables in the context of a selected categorical variable
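
A tidyverse sketch; habitat and exposure are hypothetical factor names standing in for categorical columns of the dataset:

diss %>%
  count(habitat, exposure) %>%                    # hypothetical factor columns
  ggplot(aes(x = habitat, y = n, fill = exposure)) +
  geom_col(position = "dodge")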

3.11.Plotting numerical variables in the context of a grouping (categorical) variable
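
For example with ggpubr; elev comes from the descriptives table, while habitat is again a hypothetical grouping factor:

ggboxplot(diss, x = "habitat", y = "elev",
          color = "habitat", add = "jitter")   # boxplots of elev per group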

4.Inferential statistics

4.1.Comparing independent samples

4.1.1.Parametric (2 groups) - Student’s and Welch’s t-test

Assumptions of the test:

  • Normality;
  • Independence;
  • Homogeneity of variance (homoscedasticity); assumed by Student's t-test only, as Welch's t-test does not require equal variances.
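
A sketch with jmv::ttestIS(); value and group are hypothetical column names in the t_test sheet, and the exact options are assumptions:

ttestIS(
  data = t_test,
  vars = "value",        # hypothetical outcome column
  group = "group",       # hypothetical two-level grouping column
  students = TRUE,       # Student's t-test
  welchs = TRUE,         # Welch's t-test (unequal variances)
  norm = TRUE,           # Shapiro-Wilk normality check
  eqv = TRUE,            # test for equality of variances
  effectSize = TRUE      # Cohen's d
)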

4.1.2.Parametric (> 2 groups) - Fisher’s and Welch’s one-way ANOVA

Assumptions of the test:

  • Normality;
  • Independence;
  • Homogeneity of variance (homoscedasticity); assumed by Fisher's ANOVA only, as Welch's ANOVA does not require equal variances.
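
A sketch with jmv::anovaOneW(), which can report both variants; dep and group are hypothetical column names in the one_way_anova sheet:

anovaOneW(
  data = one_way_anova,
  deps = "dep",          # hypothetical outcome column
  group = "group",       # hypothetical grouping column (> 2 levels)
  fishers = TRUE,        # Fisher's (classic) one-way ANOVA
  welchs = TRUE,         # Welch's heteroscedastic one-way ANOVA
  norm = TRUE,           # Shapiro-Wilk normality check
  eqv = TRUE             # homogeneity of variances check
)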

4.1.3.Nonparametric (2 groups) - Wilcoxon W (Mann-Whitney U) test

4.1.4.Nonparametric (> 2 groups) - Kruskal-Wallis rank sum test

4.1.5.Robust (2 groups) - Yuen’s test for trimmed means

4.1.6.Robust (> 2 groups) - Heteroscedastic one-way ANOVA for trimmed means
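
For sections 4.1.3 to 4.1.6, one option is ggstatsplot::ggbetweenstats(), where the type argument selects the test family (Mann-Whitney or Kruskal-Wallis for "nonparametric", Yuen's trimmed-means tests for "robust", depending on the number of groups); x and y are hypothetical column names:

ggbetweenstats(data = one_way_anova, x = group, y = dep,   # hypothetical columns
               type = "nonparametric")                     # or type = "robust" for the trimmed-means tests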

4.2.Comparing dependent (paired) samples

4.2.1.Parametric (2 groups) - Student’s t-test

Assumptions of the test:

  • Normality (of the within-pair differences);
  • Independence (of the pairs).
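
A sketch with ggstatsplot::ggwithinstats() on long-format data; sheet_15, condition and score are assumptions standing in for whichever sheet and columns hold the paired measurements (for a quick base-R check, t.test(..., paired = TRUE) does the same job):

ggwithinstats(data = sheet_15,        # assumed to hold the paired measurements in long format
              x = condition,          # hypothetical within-subject factor (2 levels)
              y = score,              # hypothetical outcome column
              type = "parametric")    # paired Student's t-test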

4.2.2.Parametric (> 2 groups) - Fisher’s one-way repeated measures ANOVA

Assumptions of the test:

  • Normality;
  • Independence;
  • Sphericity (equal variances of the differences between each pair of conditions).

4.2.3.Nonparametric (2 groups) - Wilcoxon signed-rank test

4.2.4.Nonparametric (> 2 groups) - Friedman rank sum test

4.2.5.Robust (2 groups) - Yuen’s test on trimmed means for dependent samples

4.2.6.Robust (> 2 groups) - Heteroscedastic one-way repeated measures ANOVA for trimmed means
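
The same ggwithinstats() call covers the remaining designs in 4.2: with more than two conditions type = "parametric" gives the repeated-measures ANOVA, type = "nonparametric" gives the Wilcoxon signed-rank or Friedman tests, and type = "robust" the trimmed-means procedures. The sheet and column names below are again assumptions:

ggwithinstats(data = sheet_25, x = condition, y = score,   # hypothetical columns
              type = "nonparametric")                      # or "robust"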

4.3.Correlation

4.3.1.Parametric - Pearson’s r

4.3.2.Nonparametric - Spearman’s ρ

4.3.3.Robust - Percentage bend correlation
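
The first two flavours are available directly through base cor.test(); the robust percentage bend correlation can be obtained, for example, through ggstatsplot with type = "robust" (see the scatterplot sketch in 4.4). The variable pair below is taken from the descriptives table:

cor.test(diss$elev, diss$s_ha, method = "pearson")    # Pearson's r
cor.test(diss$elev, diss$s_ha, method = "spearman")   # Spearman's rho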

4.4.Scatterplots with statistical details

4.4.1.Parametric

4.4.2.Nonparametric

4.4.3.Robust
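
A sketch with ggstatsplot::ggscatterstats(); the type argument selects parametric, nonparametric or robust statistics, and the variables are taken from the descriptives table:

ggscatterstats(data = diss, x = elev, y = s_ha,
               type = "parametric")   # or "nonparametric" / "robust"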

4.5.Bar charts for categorical data with statistical details

4.5.1.Parametric (unpaired) - Pearson’s χ²

Assumptions of the test:

  • Expected frequencies are sufficiently large;
  • Data are independent of one another.

4.5.2.Parametric (paired) - McNemar’s test
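
A sketch with ggstatsplot::ggbarstats(); the two factor names are hypothetical, and paired = TRUE switches from Pearson's χ² to McNemar's test for matched data:

ggbarstats(data = diss, x = habitat, y = exposure,   # hypothetical factor columns
           paired = FALSE)                           # paired = TRUE -> McNemar's test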

4.6.Pie charts for categorical data with statistical details

4.6.1.Parametric - One-sample proportion test
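
A sketch with ggstatsplot::ggpiestats(), which adds the one-sample proportion (goodness-of-fit) test to the chart; habitat is the same hypothetical factor as above:

ggpiestats(data = diss, x = habitat)   # hypothetical factor column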

4.7.Dot-and-whisker plots for regression analyses
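
A sketch with ggstatsplot::ggcoefstats(), which draws a dot-and-whisker plot of the coefficients of a fitted model; the example formula simply reuses variables from the descriptives table:

m <- lm(s_ha ~ elev + slope, data = diss)   # example model, not a substantive analysis
ggcoefstats(m)                              # dot-and-whisker plot of the estimated coefficients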

4.8.Histograms with statistical details from one-sample tests
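
A sketch with ggstatsplot::gghistostats(); test.value is the hypothesised mean the one-sample test is run against, and 1500 is an arbitrary example value:

gghistostats(data = diss, x = elev,
             test.value = 1500,        # arbitrary example value for the one-sample test
             type = "parametric")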

4.9.Power analysis
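
A sketch with the pwr package: the sample size needed per group to detect a medium effect (d = 0.5 is an assumed target) with 80% power in a two-sample t-test:

pwr.t.test(d = 0.5, power = 0.80, sig.level = 0.05,
           type = "two.sample", alternative = "two.sided")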

5.Modelling

5.1.Linear regression

Assumptions of linear regression:

  • Normality;
  • Linearity;
  • Homogeneity of variance;
  • Uncorrelated predictors;
  • Residuals are independent of each other;
  • No “bad” outliers.
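
The jamovi-style tables below fit the model dan.grump ~ dan.sleep + baby.sleep (they look like jmv::linReg output). A base-R sketch of the same model follows; parenthood is only a placeholder name for whichever sheet actually holds these three columns:

m <- lm(dan.grump ~ dan.sleep + baby.sleep, data = parenthood)   # 'parenthood' is a placeholder
summary(m)                    # coefficients, R², F test
AIC(m)                        # model fit measure reported in the table
shapiro.test(residuals(m))    # normality of the residuals, as in the assumption check below
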
## 
##  LINEAR REGRESSION
## 
##  Model Fit Measures                                                                                                 
##  ────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 
##    Model    R            R²           Adjusted R²    AIC         RMSE        F           df1    df2    p            
##  ────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 
##        1    0.9033856    0.8161056      0.8123139    582.9513    4.287989    215.2383      2     97    < .0000001   
##  ────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 
## 
## 
##  MODEL SPECIFIC RESULTS
## 
##  MODEL 1
## 
##  Model Coefficients - dan.grump                                            
##  ───────────────────────────────────────────────────────────────────────── 
##    Predictor     Estimate        SE           t               p            
##  ───────────────────────────────────────────────────────────────────────── 
##    Intercept     125.96556586    3.0409482     41.42312073    < .0000001   
##    dan.sleep      -8.95024973    0.5534577    -16.17151638    < .0000001   
##    baby.sleep      0.01052447    0.2710637      0.03882656     0.9691085   
##  ───────────────────────────────────────────────────────────────────────── 
## 
## 
##  ASSUMPTION CHECKS
## 
##  Collinearity Statistics                 
##  ─────────────────────────────────────── 
##                  VIF         Tolerance   
##  ─────────────────────────────────────── 
##    dan.sleep     1.651038    0.6056796   
##    baby.sleep    1.651038    0.6056796   
##  ─────────────────────────────────────── 
## 
## 
##  Normality Test (Shapiro-Wilk) 
##  ───────────────────────────── 
##    Statistic    p           
##  ───────────────────────────── 
##    0.9922840    0.8413983   
##  ─────────────────────────────

5.2.Binomial logistic regression
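
The output below is again jamovi-style (it resembles jmv::logRegBin). A base-R sketch of the same model with the stress data loaded above; the outcome is converted to a factor and releveled so that "Stressed" is the modelled event, matching the note under the coefficients table:

stress_glm <- stress %>%
  mutate(stress = factor(stress, levels = c("Unstressed", "Stressed")))   # model P(Stressed)
fit <- glm(stress ~ stability + flexibility + tasks + lack_train + lack_car_dev,
           data = stress_glm, family = binomial)
summary(fit)   # log-odds estimates, as in the coefficients table below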

## 
##  BINOMIAL LOGISTIC REGRESSION
## 
##  Model Fit Measures                                                             
##  ────────────────────────────────────────────────────────────────────────────── 
##    Model    Deviance    AIC         R²-McF       χ²          df    p            
##  ────────────────────────────────────────────────────────────────────────────── 
##        1    321.1986    333.1986    0.3940874    208.9086     5    < .0000001   
##  ────────────────────────────────────────────────────────────────────────────── 
## 
## 
##  MODEL SPECIFIC RESULTS
## 
##  MODEL 1
## 
##  Model Coefficients - stress                                              
##  ──────────────────────────────────────────────────────────────────────── 
##    Predictor       Estimate       SE            Z            p            
##  ──────────────────────────────────────────────────────────────────────── 
##    Intercept       -4.43992668    1.08564769    -4.089657     0.0000432   
##    stability        0.11078612    0.01494365     7.413590    < .0000001   
##    flexibility      0.14233614    0.01639125     8.683664    < .0000001   
##    tasks           -0.11216244    0.01977059    -5.673196    < .0000001   
##    lack_train       0.01931151    0.01036419     1.863293     0.0624211   
##    lack_car_dev     0.04517356    0.01309809     3.448866     0.0005629   
##  ──────────────────────────────────────────────────────────────────────── 
##    Note. Estimates represent the log odds of "stress = Stressed" vs.
##    "stress = Unstressed"
## 
## 
##  ASSUMPTION CHECKS
## 
##  Collinearity Statistics                   
##  ───────────────────────────────────────── 
##                    VIF         Tolerance   
##  ───────────────────────────────────────── 
##    stability       1.831916    0.5458765   
##    flexibility     2.850794    0.3507794   
##    tasks           3.653803    0.2736875   
##    lack_train      1.172410    0.8529441   
##    lack_car_dev    1.417788    0.7053242   
##  ───────────────────────────────────────── 
## 
## 
##  PREDICTION
## 
##  Classification Table – …                              
##  ───────────────────────────────────────────────────── 
##    Observed      Unstressed    Stressed    % Correct   
##  ───────────────────────────────────────────────────── 
##    Unstressed           324          24     93.10345   
##      Stressed            55          64     53.78151   
##  ───────────────────────────────────────────────────── 
##    Note. The cut-off value is set to 0.5
## 
## 
##  Predictive Measures                                      
##  ──────────────────────────────────────────────────────── 
##    Accuracy     Specificity    Sensitivity    AUC         
##  ──────────────────────────────────────────────────────── 
##    0.8308351      0.9310345      0.5378151    0.8967691   
##  ──────────────────────────────────────────────────────── 
##    Note. The cut-off value is set to 0.5

6.Visualizations

6.1.Maps
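
A minimal static-map sketch with rnaturalearth and ggplot2 (both loaded above):

world <- ne_countries(scale = "medium", returnclass = "sf")   # country polygons as an sf object
ggplot(world) +
  geom_sf() +
  theme_minimal()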

6.2.Interactive maps
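
A minimal leaflet sketch; the marker coordinates are just an example point (Sofia):

leaflet() %>%
  addTiles() %>%                                          # OpenStreetMap base layer
  addMarkers(lng = 23.32, lat = 42.70, popup = "Sofia")   # example point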

6.3.Cluster analyses

6.3.1.K-means clustering
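
A sketch with base kmeans() and factoextra; the number of clusters (3) is an assumption that fviz_nbclust() helps to choose:

num <- diss %>% select(where(is.numeric)) %>% scale()   # numeric columns, standardised
fviz_nbclust(num, kmeans, method = "wss")               # elbow plot to pick k
set.seed(123)
km <- kmeans(num, centers = 3, nstart = 25)             # k = 3 is only an example
fviz_cluster(km, data = num)                            # clusters plotted on the first two principal components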

6.3.2.Hierarchical clustering
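
A sketch with hclust() and factoextra, again on the standardised numeric columns; Ward linkage and k = 3 are assumptions:

num <- diss %>% select(where(is.numeric)) %>% scale()
hc <- hclust(dist(num), method = "ward.D2")   # Euclidean distances, Ward linkage
fviz_dend(hc, k = 3, rect = TRUE)             # dendrogram cut into 3 example clusters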

6.4.Ordination

6.4.1.Principal Component Analysis (PCA)
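
A sketch with FactoMineR and factoextra on the numeric columns of the dataset:

pca <- PCA(diss %>% select(where(is.numeric)), scale.unit = TRUE, graph = FALSE)
fviz_eig(pca)          # scree plot of the explained variance
fviz_pca_biplot(pca)   # biplot of observations and variable loadings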