Advanced Quantitative Data Analysis Tools

Norberto E. Milla, Jr.

Department of Statistics, Visayas State University

Outline of presentation

  • Introduction to software: R/RStudio, JASP, and jamovi

  • Descriptive statistics: frequency tables and plots for categorical data; summary statistics and plots for continuous data

  • Inferential statistics: t tests, ANOVA

  • Correlation and linear regression analyses

  • Factor analysis: exploratory, confirmatory

  • Structural equation modeling: path analysis, mediation analysis

Introduction to software

  • R: a free programming language for statistical analysis

  • RStudio: an Integrated Development Environment for R programming

  • JASP: Graphical User Interface for data analysis using R

  • jamovi: Graphical User Interface for data analysis using R

Introduction to software: R/RStudio

Introduction to software: JASP

Introduction to software: jamovi

Descriptive statistics

Frequency table and bar plot of categorical data
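As a rough illustration of what a frequency table contains, the sketch below counts categories and prints a crude text bar plot. It uses plain Python rather than R, JASP, or jamovi; the `responses` data and the `frequency_table` helper are made up for illustration and are not from the presentation.

```python
from collections import Counter

# Made-up job categories, for illustration only (not data from the slides).
responses = ["IT", "Industrial", "IT", "Service", "IT", "Industrial"]

def frequency_table(values):
    """Return (category, count, percent) rows sorted by descending count."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(cat, n, 100 * n / total) for cat, n in counts.most_common()]

table = frequency_table(responses)
for category, count, percent in table:
    # Crude text bar plot: one '#' per observation.
    print(f"{category:<12}{count:>3}{percent:>7.1f}%  {'#' * count}")
```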

Descriptive statistics

Bar plot of categorical data (from R/RStudio)

Descriptive statistics

Summary statistics for continuous data
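A minimal sketch of the usual summary statistics for a continuous variable, using only Python's standard `statistics` module. The `wages` values and the `summarize` helper are hypothetical; quartile conventions differ between programs, so the quartiles here may not match JASP or jamovi output exactly.

```python
import statistics

# Made-up hourly wages, for illustration only.
wages = [12.5, 15.0, 9.75, 22.0, 18.25, 15.0, 11.0]

def summarize(values):
    """Return common summary statistics for a list of numbers."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }

summary = summarize(wages)
for name, value in summary.items():
    print(f"{name:>6}: {value}")
```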

Comparing two groups

Who earns more? IT workers or industrial workers?

Student’s t test

  • compares two groups on a numeric variable (interval/ratio scale)

  • requires: independent samples from normal distributions with equal variances
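The test statistic behind Student's t test can be sketched as below, assuming two independent samples and a pooled variance. The earnings values and the `student_t` helper are hypothetical. The sketch stops at the statistic and its degrees of freedom, because Python's standard library has no t distribution for the p-value; statistical software reports that directly.

```python
import math
import statistics

def student_t(sample_a, sample_b):
    """Return (t, df) for Student's two-sample t test with pooled variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * statistics.variance(sample_a)
                  + (n_b - 1) * statistics.variance(sample_b)) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_error
    return t, df

# Made-up monthly earnings (in thousands), for illustration only.
it_workers = [30.0, 32.0, 34.0, 36.0, 38.0]
industrial_workers = [26.0, 28.0, 30.0, 32.0, 34.0]

t_stat, df = student_t(it_workers, industrial_workers)
print(f"t = {t_stat:.3f}, df = {df}")
```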

Comparing two groups

Welch’s t test

  • requires: independent samples from normal distributions; the variances need not be equal
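Welch's version differs in two places: the standard error uses each group's own variance, and the degrees of freedom come from the Welch-Satterthwaite approximation. A minimal sketch, with a hypothetical `welch_t` helper and made-up data:

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's t test (no equal-variance assumption)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Made-up earnings with clearly different spreads, for illustration only.
it_workers = [30.0, 32.0, 34.0, 36.0, 38.0]
industrial_workers = [20.0, 30.0, 40.0, 25.0, 35.0]

t_stat, df = welch_t(it_workers, industrial_workers)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

When the two groups have equal sizes and equal variances, the Welch degrees of freedom reduce to the Student value, n1 + n2 - 2; otherwise they are smaller.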

Wilcoxon rank-sum test a.k.a. Mann-Whitney U test

  • non-normal data

  • ordinal data
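The rank-based idea can be sketched as follows: pool both samples, rank them (ties share the average rank), and convert one group's rank sum into the U statistic. The helper names and data are hypothetical, and the p-value step is left to statistical software.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values receive the average of their ranks."""
    positions = {}
    for pos, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b); the two always sum to len(sample_a) * len(sample_b)."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return u_a, n_a * n_b - u_a

# Made-up ordinal ratings (1 = low, 5 = high), for illustration only.
group_a = [1, 2, 2, 3, 4]
group_b = [2, 3, 4, 5, 5]
u_a, u_b = mann_whitney_u(group_a, group_b)
print(f"U_a = {u_a}, U_b = {u_b}")
```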

Comparing two groups

JASP output

Comparing two groups

jamovi output

Comparing 3 or more groups

Who earns more? Whites, Blacks, or Asians?

ANOVA

  • compares three or more groups on a numeric variable (interval/ratio scale)

  • requires: independent samples from normal distributions with equal variances
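One-way ANOVA compares the variation between group means with the variation within groups. The sketch below computes the F statistic and its degrees of freedom; the `one_way_anova` helper and the data are hypothetical, and the p-value (from the F distribution) is left to statistical software.

```python
import statistics

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.mean(all_values)
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up earnings for three groups, for illustration only.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat, df_between, df_within = one_way_anova(groups)
print(f"F({df_between}, {df_within}) = {f_stat:.3f}")
```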

Kruskal-Wallis test

  • non-normal data

  • ordinal data
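The Kruskal-Wallis statistic replaces the observations with their pooled ranks and compares the rank sums across groups. A minimal sketch with hypothetical helper names and made-up data; it omits the correction for ties that statistical software usually applies.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values receive the average of their ranks."""
    positions = {}
    for pos, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def kruskal_wallis_h(groups):
    """Return (H, df) without tie correction."""
    pooled = [x for group in groups for x in group]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for group in groups:
        # Rank sum of this group, taken from the pooled ranking.
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
    return h, len(groups) - 1

# Made-up scores for three groups, for illustration only.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_stat, df = kruskal_wallis_h(groups)
print(f"H = {h_stat:.2f}, df = {df}")
```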

Comparing 3 or more groups

JASP output

Comparing 3 or more groups

jamovi output

Correlation analysis

Is college GPA correlated with average high-school mathematics grade and with sex?

  • Pearson r: both variables continuous, with a bivariate normal distribution

  • Spearman rank correlation: non-normal or ordinal data

  • Point-biserial: binary variable vs continuous variable

  • Rank-biserial: binary variable vs ordinal variable
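Three of these coefficients share one formula: Spearman's coefficient is Pearson's r computed on ranks, and the point-biserial coefficient is Pearson's r with the binary variable coded 0/1. A minimal sketch, with hypothetical helper names and made-up data (the 0/1 coding of sex is arbitrary and only affects the sign):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 to n; tied values receive the average of their ranks."""
    positions = {}
    for pos, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Made-up data, for illustration only.
hs_math = [80.0, 85.0, 90.0, 95.0]
college_gpa = [2.0, 3.0, 3.0, 4.0]
sex = [0, 0, 1, 1]  # arbitrary 0/1 coding

print("Pearson:", round(pearson_r(hs_math, college_gpa), 3))
print("Spearman:", round(spearman_rho(hs_math, college_gpa), 3))
print("Point-biserial:", round(pearson_r(sex, college_gpa), 3))
```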

Linear regression analysis

Regression analysis is a technique for studying the dependence of one variable (the dependent variable) on one or more other variables (the independent or explanatory variables).

  • Estimates the relationship between the dependent variable and the explanatory variable(s)

  • Measures the effect of each explanatory variable on the dependent variable, controlling for the effects of all other explanatory variables

  • Predicts the value of the dependent variable for a given value of the explanatory variable(s)
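The estimation and prediction steps above can be sketched with ordinary least squares, here solved through the normal equations in plain Python. The `ols` and `predict` helpers and the data are hypothetical; the data are constructed so that y = 1 + 2*x1 + 3*x2 exactly, which makes the recovered coefficients easy to check. Statistical software additionally reports standard errors and p-values, which this sketch omits.

```python
def ols(X, y):
    """Least-squares coefficients [intercept, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(coefs, x_row):
    """Predicted value of the dependent variable for one set of predictor values."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], x_row))

# Made-up data satisfying y = 1 + 2*x1 + 3*x2 exactly, for illustration only.
X = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
y = [3, 4, 6, 8, 9]

coefs = ols(X, y)
print("coefficients:", [round(c, 6) for c in coefs])
print("prediction at x1=3, x2=2:", round(predict(coefs, [3, 2]), 6))
```

Each slope is the expected change in the dependent variable per unit change in that predictor with the other predictor held fixed, which is the "controlling for" interpretation in the bullet above.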

Linear regression analysis

What are the effects of education, experience, and tenure on wage? Is there a sex difference in wage after adjusting for the effects of education, experience, and tenure?

Factor analysis

  • investigates the underlying (unobserved/latent) factor structure that can be used to explain the correlations in a set of observed indicators

  • used to conceptualize new constructs, to develop instruments, to select items as a short form scale, or to organize observed variables into meaningful subgroups

  • steps: computing the correlation coefficients, determining the number of factors, extracting the factors, rotating the factors, and naming or labeling the factors

  • Exploratory or confirmatory

Factor analysis: exploratory

Suitability of EFA:

  • Bartlett’s test of sphericity: tests whether the correlation matrix is an identity matrix, i.e., whether all correlations between different variables are 0

    • a significant result is desired

  • Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy: indicates the degree to which each variable in a set is predicted without error by the other variables

    • KMO > 0.90 is ideal; KMO ≥ 0.80 is acceptable
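For reference, the two suitability checks are commonly written as below. The notation is an assumption and does not come from the slides: n is the sample size, p the number of variables, R the sample correlation matrix, r_ij the correlation between variables i and j, and q_ij their partial correlation given all other variables.

```latex
% Bartlett's test of sphericity (approximate chi-square statistic)
\[
\chi^2 = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln\lvert R\rvert,
\qquad df = \frac{p(p-1)}{2}
\]

% Overall Kaiser-Meyer-Olkin measure of sampling adequacy
\[
\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^{2}}
                    {\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} q_{ij}^{2}}
\]
```

KMO approaches 1 when the partial correlations are small relative to the ordinary correlations, which is the situation in which common factors can plausibly account for the observed correlations.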

Factor analysis: exploratory

Rules for determining number of factors to retain

  • “Elbow” in the scree plot

  • Kaiser’s criterion: retain factors with eigenvalue greater than 1

  • Parallel analysis (simulation-based)

  • Cumulative proportion of variance explained
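Two of these rules are simple arithmetic on the eigenvalues of the correlation matrix, as the sketch below shows. The eigenvalues are made up for illustration (five variables, so they sum to 5), and both helper names are hypothetical. The scree plot and parallel analysis need plotting and simulation and are not shown.

```python
def kaiser_count(eigenvalues):
    """Number of factors with eigenvalue greater than 1 (Kaiser's criterion)."""
    return sum(1 for ev in eigenvalues if ev > 1)

def cumulative_proportion(eigenvalues):
    """Cumulative share of total variance, taking the largest eigenvalues first."""
    total = sum(eigenvalues)
    running, shares = 0.0, []
    for ev in sorted(eigenvalues, reverse=True):
        running += ev
        shares.append(running / total)
    return shares

# Made-up eigenvalues of a 5-variable correlation matrix, for illustration only.
eigenvalues = [2.5, 1.5, 0.5, 0.3, 0.2]

n_factors = kaiser_count(eigenvalues)
shares = cumulative_proportion(eigenvalues)
print("factors retained by Kaiser's criterion:", n_factors)
print("cumulative proportion of variance:", [round(s, 2) for s in shares])
```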

Factor analysis: confirmatory

Structural equation models: Path analysis

  • exogenous variables: not predicted by any other variable in the model

  • endogenous variables: predicted by other variables in the model

Structural equation models: Path analysis

  • manifest variable: measured/observed; represented by squares or rectangles

  • latent variable: unobserved; represented by ovals or circles

Structural equation models: Path analysis

  • measurement model: defines the relationships between the latent variables and the observed variables

  • structural model: defines the relationships between the latent variables

Structural equation models: Mediation analysis

  • Mediation analysis investigates whether and to what extent the effect of a variable X on variable Y is explained by the variable M.
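One common way to quantify this is the product-of-coefficients approach, sketched below under the assumption of linear relationships; the slides do not say which method the software uses. Path a is the slope of M on X, path b is the slope of Y on M adjusting for X, the indirect effect is a*b, and the direct effect c' is the slope of Y on X adjusting for M. The data and the `mediation_effects` helper are hypothetical, and significance testing of the indirect effect (for example by bootstrapping) is not shown.

```python
def mediation_effects(x, m, y):
    """Return paths a, b, the direct, indirect, and total effects of X on Y."""
    n = len(x)
    mean_x, mean_m, mean_y = sum(x) / n, sum(m) / n, sum(y) / n
    # Centered sums of squares and cross-products.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    smm = sum((mi - mean_m) ** 2 for mi in m)
    sxm = sum((xi - mean_x) * (mi - mean_m) for xi, mi in zip(x, m))
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    smy = sum((mi - mean_m) * (yi - mean_y) for mi, yi in zip(m, y))

    a = sxm / sxx                  # M regressed on X
    total = sxy / sxx              # Y regressed on X alone (path c)
    det = sxx * smm - sxm ** 2     # Y regressed on X and M together
    direct = (sxy * smm - smy * sxm) / det   # path c'
    b = (smy * sxx - sxy * sxm) / det
    return {"a": a, "b": b, "direct": direct, "indirect": a * b, "total": total}

# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
m = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1.0, 3.0, 2.0, 6.0, 5.0]

effects = mediation_effects(x, m, y)
for name, value in effects.items():
    print(f"{name:>8}: {value:.3f}")
```

With least-squares estimates, the total effect equals the direct effect plus the indirect effect, so the share of the total effect that runs through M can be read as indirect / total.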

Try this out!

Structural equation model
