Comp1_R Markdown

Preliminary tasks
Descriptive statistics
Correlations
T-test
ANOVA
Multiple Regression

Preliminary tasks

Set working directory

This command tells R which folder to look in when retrieving the dataset or other files.

setwd("~/Desktop/Comp 1 - R Program")
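The sketch below shows what setwd() changes. It switches into a temporary scratch folder, writes a file using a relative name, and then restores the original directory. The folder name wd_demo and the file example.txt are hypothetical and used only for illustration.

```r
# Remember the current working directory so it can be restored afterwards
old_wd <- getwd()

# Create a hypothetical scratch folder inside R's session temp directory
scratch <- file.path(tempdir(), "wd_demo")
dir.create(scratch, showWarnings = FALSE)

# Make the scratch folder the working directory
setwd(scratch)

# A relative file name is now resolved against the scratch folder
writeLines("hello", "example.txt")
found <- file.exists(file.path(scratch, "example.txt"))

# Restore the original working directory
setwd(old_wd)
```

setwd() also returns the previous directory invisibly, so `old_wd <- setwd(scratch)` is an equivalent shortcut.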

Load packages

These commands load the packages used later to organize, analyze, and visualize the data. library() only loads a package; each one must already be installed (for example, with install.packages("psych")).

library(tidyverse)
# the tidyverse is a package bundle including ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats, and more optional packages (https://tidyverse.org)
library(psych)
library(RColorBrewer)
library(car)
library(apaTables)
library(dplyr)
library(skimr)


# Set theme
theme_set(theme_classic())
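library() stops with an error when a package is missing, so it can help to check first. The helper below, missing_pkgs(), is a hypothetical name and not part of any package. It uses requireNamespace(), which returns FALSE instead of raising an error, to report which packages used in this document are not installed. The install.packages() call is commented out because it needs an internet connection.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
# requireNamespace() returns FALSE (rather than an error) for a missing package.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("tidyverse", "psych", "RColorBrewer", "car", "apaTables",
            "skimr", "readxl", "corrplot", "Hmisc", "PerformanceAnalytics")
to_install <- missing_pkgs(needed)
print(to_install)

# Uncomment to install anything that is missing (requires internet access):
# if (length(to_install) > 0) install.packages(to_install)
```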

Load dataset

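Now, I will load the dataset from an Excel file in the working directory. The na = "NA" argument tells read_excel() to treat cells containing the text NA as missing values.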

library(readxl)
dataset <- read_excel("Comp1_SPSS_subset_data_NEW_4.28.20.xlsx", 
    na = "NA")

Modify dataset, if needed

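Some variables may need to be converted to a different type, depending on the analysis and packages used. Here, the numeric parent variable is copied into a new factor column, parent_text.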

dataset$parent_text <- as.factor(dataset$parent)
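A factor stores the distinct values of a variable as labelled levels. The toy vectors below use made-up codes, not the study data. They show what as.factor() produces and one common pitfall: as.numeric() on a factor returns the internal level index, not the original value. To convert back safely, go through as.character() first.

```r
# Made-up parent codes (1 or 2), not the study data
parent <- c(2, 1, 2, 2, 1)
parent_text <- as.factor(parent)

levels(parent_text)   # "1" "2": the distinct values, stored as character labels
table(parent_text)    # counts per level, like skim()'s top_counts

# Pitfall: as.numeric() on a factor returns level indices, not the values
ages <- as.factor(c(12, 15, 12, 17))
as.numeric(ages)                 # 1 2 1 3
as.numeric(as.character(ages))   # 12 15 12 17
```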

Descriptive statistics

See below an example of summary descriptive statistics along with visualizations. There are many descriptive statistics reports to choose from in R; these are a couple of my preferred functions.

describe(dataset)

	vars	n	mean	sd	median	trimmed	mad	min	max	range	skew	kurtosis	se
gender	1	62	1.225807	0.4215255	1.0000	1.160000	0.0000000	1.00	2.00	1.00	1.2799750	-0.3668039	0.0535338
gender_text*	2	62	NaN	NA	NA	NaN	NA	Inf	-Inf	-Inf	NA	NA	NA
coach	3	62	3.725807	1.1186842	4.0000	3.820000	1.4826000	1.00	5.00	4.00	-0.7710601	-0.4376070	0.1420730
coach_text*	4	62	NaN	NA	NA	NaN	NA	Inf	-Inf	-Inf	NA	NA	NA
parent	5	62	1.661290	0.4771345	2.0000	1.700000	0.0000000	1.00	2.00	1.00	-0.6651746	-1.5822873	0.0605961
parent_text*	6	62	1.661290	0.4771345	2.0000	1.700000	0.0000000	1.00	2.00	1.00	-0.6651746	-1.5822873	0.0605961
age	7	62	13.935484	2.0071264	14.0000	13.920000	1.4826000	10.00	18.00	8.00	0.0510244	-0.5487702	0.2549053
auto	8	60	4.227778	0.4760919	4.2500	4.244792	0.3706500	3.00	5.00	2.00	-0.2765614	-0.4811253	0.0614632
comp	9	60	4.287500	0.4458495	4.2500	4.281250	0.3706500	3.25	5.00	1.75	0.1127005	-0.8900254	0.0575589
relate	10	60	4.397222	0.4896208	4.5000	4.449653	0.3706500	3.00	5.00	2.00	-0.8231632	0.3519376	0.0632098
auton	11	62	4.213710	0.4861752	4.1875	4.237500	0.4633125	3.00	5.00	2.00	-0.2924376	-0.6096758	0.0617443
control	12	62	2.288307	0.7532677	2.0625	2.270000	0.8339625	1.00	3.75	2.75	0.2376715	-1.0415801	0.0956651
amotiv	13	62	1.750000	0.7520464	1.5000	1.640000	0.7413000	1.00	4.00	3.00	1.1305059	0.8053705	0.0955100
success	14	62	4.190323	0.4214377	4.2000	4.184000	0.5930400	3.40	5.00	1.60	0.1453626	-0.8832100	0.0535226

Comments:

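Different functions handle missing values differently, and each has its own syntax for it (for example, na.rm = TRUE in mean() but use = "pairwise.complete.obs" in cor()). This flexibility is useful, but it can be challenging to keep track of each function's requirements. Note also that describe() marks non-numeric variables with an asterisk (*). Here, the character columns gender_text and coach_text return NaN and Inf values, while the factor parent_text is summarized by its level codes (1 and 2).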

or

skim(dataset)

Data summary
Name	dataset
Number of rows	62
Number of columns	14
_______________________
Column type frequency:
character	2
factor	1
numeric	11
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
gender_text	0	1	4	6	0	2	0
coach_text	0	1	5	21	0	5	0

Variable type: factor

skim_variable	n_missing	complete_rate	ordered	n_unique	top_counts
parent_text	0	1	FALSE	2	2: 41, 1: 21

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
gender	0	1.00	1.23	0.42	1.00	1.00	1.00	1.00	2.00	▇▁▁▁▂
coach	0	1.00	3.73	1.12	1.00	3.00	4.00	4.00	5.00	▁▃▁▇▃
parent	0	1.00	1.66	0.48	1.00	1.00	2.00	2.00	2.00	▅▁▁▁▇
age	0	1.00	13.94	2.01	10.00	13.00	14.00	15.00	18.00	▂▇▂▇▂
auto	2	0.97	4.23	0.48	3.00	4.00	4.25	4.56	5.00	▁▅▃▇▆
comp	2	0.97	4.29	0.45	3.25	4.00	4.25	4.56	5.00	▁▃▇▃▅
relate	2	0.97	4.40	0.49	3.00	4.00	4.50	4.75	5.00	▁▁▃▆▇
auton	0	1.00	4.21	0.49	3.00	3.88	4.19	4.59	5.00	▂▃▇▆▆
control	0	1.00	2.29	0.75	1.00	1.75	2.06	2.88	3.75	▃▇▃▅▃
amotiv	0	1.00	1.75	0.75	1.00	1.25	1.50	2.19	4.00	▇▃▂▁▁
success	0	1.00	4.19	0.42	3.40	3.80	4.20	4.40	5.00	▃▇▅▆▅
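Both summaries report missing values: n in describe() and n_missing in skim(). Other functions, however, must be told what to do with NAs. The toy vectors below contain made-up values, not the study data. They show how mean() and cor() return NA unless an NA-handling argument is supplied.

```r
# Made-up responses with one missing value (not the study data)
x <- c(4.25, NA, 3.75, 5.00)
y <- c(4.20, 4.00, 3.80, 4.60)

sum(is.na(x))                     # 1 missing value
mean(x)                           # NA: missing values propagate by default
mean(x, na.rm = TRUE)             # 4.333333, computed from the 3 observed values
cor(x, y)                         # NA
cor(x, y, use = "complete.obs")   # drops the incomplete row first
```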

Descriptive, histogram plots

Histograms are a quick way to inspect each variable's distribution. For example, here is the autonomy average:

# BPNS autonomy average histogram
ggplot(dataset, aes(auto)) +
  geom_histogram(bins = 7, fill = "steelblue", color = "black") +
  labs(x = "Response on Scale from 1 to 5", 
       y = "Frequency", 
       title = "Autonomy") +
  theme(plot.title = element_text(hjust = 0.5))

Correlations

Similarly, there are numerous options for producing correlation matrices and their visualizations.

options(width=120)

library(corrplot)

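# Store gender as numeric so it is kept in the correlation matrix
dataset$gender <- as.numeric(dataset$gender)

# Keep only numeric columns; cor() stops with an error on character or factor columns
dataset_num <- Filter(is.numeric, dataset)

# Pairwise deletion: each r uses every row that is complete for that pair
dataset.cor <- cor(dataset_num, use = "pairwise.complete.obs", method = "pearson")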

round(dataset.cor, 2)

##         gender coach parent   age  auto  comp relate auton control amotiv success
## gender    1.00 -0.01   0.14  0.10 -0.15 -0.20  -0.14 -0.06    0.03   0.13   -0.28
## coach    -0.01  1.00  -0.15 -0.31  0.10 -0.04   0.05  0.11   -0.15   0.02   -0.04
## parent    0.14 -0.15   1.00  0.11  0.03  0.10  -0.17  0.03    0.04  -0.02    0.02
## age       0.10 -0.31   0.11  1.00  0.00 -0.19   0.00 -0.10    0.20   0.34   -0.17
## auto     -0.15  0.10   0.03  0.00  1.00  0.53   0.37  0.44   -0.35  -0.41    0.49
## comp     -0.20 -0.04   0.10 -0.19  0.53  1.00   0.15  0.42   -0.27  -0.51    0.61
## relate   -0.14  0.05  -0.17  0.00  0.37  0.15   1.00  0.44   -0.14  -0.07    0.29
## auton    -0.06  0.11   0.03 -0.10  0.44  0.42   0.44  1.00   -0.16  -0.49    0.40
## control   0.03 -0.15   0.04  0.20 -0.35 -0.27  -0.14 -0.16    1.00   0.54   -0.16
## amotiv    0.13  0.02  -0.02  0.34 -0.41 -0.51  -0.07 -0.49    0.54   1.00   -0.35
## success  -0.28 -0.04   0.02 -0.17  0.49  0.61   0.29  0.40   -0.16  -0.35    1.00
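With use = "pairwise.complete.obs", each coefficient is computed from the rows that are complete for that particular pair of variables. As a result, correlations involving auto, comp, or relate rest on 60 cases, while the others use 62. The toy data frame below uses made-up values, not the study data. It contrasts pairwise deletion with listwise deletion (use = "complete.obs"), which drops any row with a missing value anywhere.

```r
# Toy data frame with a character column and missing values in different rows
toy <- data.frame(v1 = 1:6,
                  v2 = c(2, NA, 6, 8, 11, 12),
                  v3 = c(5, 3, 4, NA, 1, 2),
                  label = c("x", "y", "z", "x", "y", "z"))

# Keep only numeric columns; the character column `label` is dropped
toy_num <- Filter(is.numeric, toy)

# Pairwise deletion: each pair uses every row complete for those two columns
r_pair <- cor(toy_num, use = "pairwise.complete.obs")

# Listwise deletion: every pair uses only rows complete on all columns
r_comp <- cor(toy_num, use = "complete.obs")

round(r_pair["v1", "v3"], 2)   # -0.84, from rows 1, 2, 3, 5, 6
round(r_comp["v1", "v3"], 2)   # -0.91, from rows 1, 3, 5, 6 only
```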

Correlation visualizations

pairs.panels(dataset_num)

corrplot(dataset.cor)

Including p values

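library(Hmisc)
# Note: Hmisc also has a describe() function, which masks psych::describe() from here on

# rcorr() requires a matrix rather than a data frame
dataset.cor.p <- rcorr(as.matrix(dataset_num), type = "pearson")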

dataset.cor.p

##         gender coach parent   age  auto  comp relate auton control amotiv success
## gender    1.00 -0.01   0.14  0.10 -0.15 -0.20  -0.14 -0.06    0.03   0.13   -0.28
## coach    -0.01  1.00  -0.15 -0.31  0.10 -0.04   0.05  0.11   -0.15   0.02   -0.04
## parent    0.14 -0.15   1.00  0.11  0.03  0.10  -0.17  0.03    0.04  -0.02    0.02
## age       0.10 -0.31   0.11  1.00  0.00 -0.19   0.00 -0.10    0.20   0.34   -0.17
## auto     -0.15  0.10   0.03  0.00  1.00  0.53   0.37  0.44   -0.35  -0.41    0.49
## comp     -0.20 -0.04   0.10 -0.19  0.53  1.00   0.15  0.42   -0.27  -0.51    0.61
## relate   -0.14  0.05  -0.17  0.00  0.37  0.15   1.00  0.44   -0.14  -0.07    0.29
## auton    -0.06  0.11   0.03 -0.10  0.44  0.42   0.44  1.00   -0.16  -0.49    0.40
## control   0.03 -0.15   0.04  0.20 -0.35 -0.27  -0.14 -0.16    1.00   0.54   -0.16
## amotiv    0.13  0.02  -0.02  0.34 -0.41 -0.51  -0.07 -0.49    0.54   1.00   -0.35
## success  -0.28 -0.04   0.02 -0.17  0.49  0.61   0.29  0.40   -0.16  -0.35    1.00
## 
## n
##         gender coach parent age auto comp relate auton control amotiv success
## gender      62    62     62  62   60   60     60    62      62     62      62
## coach       62    62     62  62   60   60     60    62      62     62      62
## parent      62    62     62  62   60   60     60    62      62     62      62
## age         62    62     62  62   60   60     60    62      62     62      62
## auto        60    60     60  60   60   60     60    60      60     60      60
## comp        60    60     60  60   60   60     60    60      60     60      60
## relate      60    60     60  60   60   60     60    60      60     60      60
## auton       62    62     62  62   60   60     60    62      62     62      62
## control     62    62     62  62   60   60     60    62      62     62      62
## amotiv      62    62     62  62   60   60     60    62      62     62      62
## success     62    62     62  62   60   60     60    62      62     62      62
## 
## P
##         gender coach  parent age    auto   comp   relate auton  control amotiv success
## gender         0.9655 0.2710 0.4626 0.2636 0.1166 0.2908 0.6468 0.8146  0.3166 0.0259 
## coach   0.9655        0.2571 0.0151 0.4316 0.7340 0.7290 0.3807 0.2515  0.8805 0.7547 
## parent  0.2710 0.2571        0.3787 0.8252 0.4473 0.1838 0.8432 0.7435  0.8601 0.9015 
## age     0.4626 0.0151 0.3787        1.0000 0.1375 0.9739 0.4246 0.1149  0.0070 0.1934 
## auto    0.2636 0.4316 0.8252 1.0000        0.0000 0.0032 0.0005 0.0056  0.0011 0.0000 
## comp    0.1166 0.7340 0.4473 0.1375 0.0000        0.2612 0.0008 0.0362  0.0000 0.0000 
## relate  0.2908 0.7290 0.1838 0.9739 0.0032 0.2612        0.0004 0.2820  0.5725 0.0246 
## auton   0.6468 0.3807 0.8432 0.4246 0.0005 0.0008 0.0004        0.2167  0.0000 0.0013 
## control 0.8146 0.2515 0.7435 0.1149 0.0056 0.0362 0.2820 0.2167         0.0000 0.2289 
## amotiv  0.3166 0.8805 0.8601 0.0070 0.0011 0.0000 0.5725 0.0000 0.0000         0.0051 
## success 0.0259 0.7547 0.9015 0.1934 0.0000 0.0000 0.0246 0.0013 0.2289  0.0051
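Each p-value follows from r and the number of complete pairs n for that pair (the n matrix above): t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom. The sketch below uses a hypothetical helper, r_to_p(), on simulated data and checks the result against cor.test(). rcorr() is expected to use the same t-based test, although recomputing from the rounded r values printed above will not match exactly.

```r
# Two-sided p-value for a Pearson r based on n complete pairs:
# t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom
r_to_p <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * pt(-abs(t_stat), df = n - 2)
}

# Simulated data (not the study data)
set.seed(1)
x <- rnorm(30)
y <- 0.5 * x + rnorm(30)

p_manual  <- r_to_p(cor(x, y), n = length(x))
p_builtin <- cor.test(x, y)$p.value
all.equal(p_manual, p_builtin)   # TRUE
```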

or

A more comprehensive, concise, and visual approach:

library("PerformanceAnalytics")

chart.Correlation(dataset_num, histogram = TRUE, pch = 19)

T-test

Now I will compare success means across gender.

Visualization

The bar plot below is a prime example of R making “easy things hard” (Muenchen, 2014, para. 2): the group means have to be computed before they can be plotted.

male <- dataset[which(dataset$gender == 1),]
female <- dataset[which(dataset$gender == 2),]

gender_success <- data.frame(gender = c("male", "female"),
                             success_ave = c(mean(male$success), mean(female$success)))

ggplot(gender_success, aes(x=gender, y= success_ave, fill = gender)) +
  geom_bar(stat = "identity", position = 'dodge') +
  labs(x = "Gender", 
       y = "Response on Scale of 1-5", 
       title = "Average Perceived Success Across Gender") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_brewer(palette = "BuPu")
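With stat = "identity", geom_bar() draws the values exactly as supplied, so the means had to be computed first. The sketch below shows a shorter base R route on made-up scores, not the study data. aggregate() computes the mean per group in one call, and barplot() draws the result. Within ggplot2, another option is stat_summary(fun = mean, geom = "bar"), which computes the means inside the plot call (the fun argument requires ggplot2 3.3.0 or later).

```r
# Made-up scores for illustration (1 = male, 2 = female, as coded above)
toy <- data.frame(gender  = c(1, 2, 1, 1, 2, 1),
                  success = c(4.4, 3.8, 4.2, 4.0, 4.1, 4.6))

# One call computes the mean of success within each gender group
means <- aggregate(success ~ gender, data = toy, FUN = mean)
means   # gender 1: 4.30, gender 2: 3.95

# Base R bar plot of the group means
barplot(means$success,
        names.arg = c("male", "female")[means$gender],
        ylab = "Response on Scale of 1-5",
        main = "Average Perceived Success Across Gender")
```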

T-test for mean difference across two levels of gender.

### success across gender
dataset$gender <- as.factor(dataset$gender)

leveneTest(dataset$success, dataset$gender, center=mean)

	Df	F value	Pr(>F)
group	1	0.4299003	0.5145442
	60	NA	NA

# p = .5145, so Levene's test is not significant (equal variances not rejected)
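The car documentation describes center = mean as giving the original Levene test; the default, center = median, is a more robust variant often called Brown-Forsythe. Under that reading, the test is a one-way ANOVA on each score's absolute deviation from its group mean. The hypothetical helper levene_mean() below sketches this on simulated data; it is not the car implementation itself.

```r
# Levene's test with center = mean, written out by hand:
# a one-way ANOVA on the absolute deviations from each group's mean.
levene_mean <- function(y, g) {
  g <- as.factor(g)
  abs_dev <- abs(y - ave(y, g))   # ave() returns each row's group mean
  tab <- anova(lm(abs_dev ~ g))
  c(F = tab[["F value"]][1], p = tab[["Pr(>F)"]][1])
}

# Simulated scores for two groups (not the study data)
set.seed(5)
g <- rep(c("1", "2"), times = c(20, 12))
y <- c(rnorm(20, 4.25, 0.40), rnorm(12, 3.95, 0.45))
levene_mean(y, g)
```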

male <- dataset[which(dataset$gender == 1),]
female <- dataset[which(dataset$gender == 2),]

# t.test() runs Welch's test by default (var.equal = FALSE), so equal variances are not assumed
t.test(male$success, female$success)

## 
##  Welch Two Sample t-test
## 
## data:  male$success and female$success
## t = 2.3846, df = 22.639, p-value = 0.02588
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.03723953 0.52823666
## sample estimates:
## mean of x mean of y 
##  4.254167  3.971429

# p = .02588, significant at the .05 level (males reported higher mean success)
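The non-integer df (22.639) in the output comes from the Welch-Satterthwaite approximation. The sketch below applies a hypothetical helper, welch_t(), to simulated scores rather than the study data. It computes the Welch statistic, df, and two-sided p-value by hand, and the results should match t.test().

```r
# Welch's t-test by hand (hypothetical helper, for illustration)
welch_t <- function(x, y) {
  vx <- var(x) / length(x)   # squared standard error of mean(x)
  vy <- var(y) / length(y)   # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite degrees of freedom
  dof <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  p <- 2 * pt(-abs(t_stat), dof)
  c(t = t_stat, df = dof, p = p)
}

# Simulated scores for two groups (not the study data)
set.seed(42)
a <- rnorm(20, mean = 4.25, sd = 0.40)
b <- rnorm(12, mean = 3.95, sd = 0.45)

welch_t(a, b)
t.test(a, b)   # same t, df, and p-value
```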

ANOVA

Let’s analyze success across the five levels of goalie coaching received.

Visualization

# reorder() orders the coaching categories by their mean success
ggplot(dataset, aes(x = reorder(coach_text, success), y = success, fill = coach_text)) +
  geom_boxplot() +
  labs(x = "Amount of Goalie Coaching Received", 
       y = "Perceived Success", 
       title = "Perceived Success Across Amount of Goalie Coaching Received") +
  theme(plot.title = element_text(hjust = 0.5), legend.position = "none") +
  scale_fill_brewer(palette = "Blues")

Run ANOVA

### success across coaching received

dataset$coach_text <- as.factor(dataset$coach_text)

leveneTest(dataset$success, dataset$coach_text, center=mean)

	Df	F value	Pr(>F)
group	4	0.6590701	0.6229689
	57	NA	NA

# p = .623, so Levene's test is not significant (equal variances not rejected)

level.success.anova <- aov(success ~ coach_text, data = dataset)

summary(level.success.anova)

##             Df Sum Sq Mean Sq F value Pr(>F)
## coach_text   4  0.925  0.2311   1.329   0.27
## Residuals   57  9.910  0.1739

# p = .27, so NOT SIGNIFICANT

# If the F test were significant, follow up with Tukey's HSD (shown here for illustration):
TukeyHSD(level.success.anova)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = success ~ coach_text, data = dataset)
## 
## $coach_text
##                                                   diff        lwr       upr     p adj
## More than once a week-A few times a year -1.636364e-01 -0.6298929 0.3026202 0.8593439
## Never-A few times a year                 -1.636364e-01 -1.3904385 1.0631658 0.9956416
## Once a month-A few times a year           1.863636e-01 -0.4994396 0.8721669 0.9394057
## Once a week-A few times a year            1.137830e-01 -0.2984357 0.5260017 0.9360595
## Never-More than once a week               8.881784e-16 -1.2130944 1.2130944 1.0000000
## Once a month-More than once a week        3.500000e-01 -0.3109695 1.0109695 0.5720330
## Once a week-More than once a week         2.774194e-01 -0.0920111 0.6468498 0.2278314
## Once a month-Never                        3.500000e-01 -0.9632133 1.6632133 0.9433381
## Once a week-Never                         2.774194e-01 -0.9159487 1.4707874 0.9650319
## Once a week-Once a month                 -7.258065e-02 -0.6966077 0.5514464 0.9974377
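Although the omnibus test was not significant, the F value in the ANOVA table is simply the ratio of the two mean squares (0.2311 / 0.1739 ≈ 1.33). The sketch below uses simulated data, not the study data. It computes the between- and within-group sums of squares by hand and compares the resulting F with aov().

```r
# Simulated scores for three groups (not the study data)
set.seed(7)
g <- factor(rep(c("A", "B", "C"), times = c(8, 10, 6)))
y <- rnorm(length(g), mean = 4.2, sd = 0.4)

grand_mean  <- mean(y)
group_means <- tapply(y, g, mean)
n_k         <- tapply(y, g, length)

# Between-group and within-group sums of squares
ss_between <- sum(n_k * (group_means - grand_mean)^2)
ss_within  <- sum((y - group_means[as.character(g)])^2)

# Degrees of freedom: k - 1 and N - k
df_between <- nlevels(g) - 1
df_within  <- length(y) - nlevels(g)

F_manual <- (ss_between / df_between) / (ss_within / df_within)
p_manual <- pf(F_manual, df_between, df_within, lower.tail = FALSE)

# Compare with aov()
F_aov <- summary(aov(y ~ g))[[1]][["F value"]][1]
all.equal(F_manual, F_aov)   # TRUE
```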

Multiple Regression

Does autonomy predict success?

Scatterplot


ggplot(dataset, aes(x = auton, y = success)) +
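  geom_point(position = "jitter", aes(color = gender_text)) +
  # straight least-squares line without a confidence band
  geom_smooth(method = lm, se = FALSE) +
  # default smoother (loess for fewer than 1,000 points), for comparison
  geom_smooth() +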
  labs(x = "Level of Autonomous Motivation", 
       y = "Perceived Success", 
       title = "Perceived Success Across Autonomous Motivation") +
  annotate(x=4.25, y=3.25, 
      label=paste("r = ", round(cor(dataset$auton, dataset$success, use = "complete.obs"),2)), 
           geom="text", size=5) +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_color_brewer(palette = "Blues")

Basic Regression

autonomous.success.reg <- lm(success ~ auton, data = dataset)

summary(autonomous.success.reg)

## 
## Call:
## lm(formula = success ~ auton, data = dataset)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.81865 -0.20391 -0.01653  0.26873  0.76978 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.7355     0.4354   6.283 4.11e-08 ***
## auton         0.3453     0.1026   3.364  0.00134 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3898 on 60 degrees of freedom
## Multiple R-squared:  0.1586, Adjusted R-squared:  0.1446 
## F-statistic: 11.31 on 1 and 60 DF,  p-value: 0.001345

# auton is significant at p = .00134
# R squared is .1586 (adjusted is .1446) and p value for model is .001345
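With a single predictor, the least-squares slope equals cov(x, y) / var(x), and R-squared equals the squared Pearson correlation. For example, the auton-success correlation of 0.40 in the matrix above squares to about .16, in line with the R-squared of .1586. The sketch below checks both identities on simulated data, not the study data.

```r
# Simulated predictor and outcome (not the study data)
set.seed(3)
x <- runif(40, min = 3, max = 5)
y <- 2.7 + 0.35 * x + rnorm(40, sd = 0.4)

# Least-squares estimates from summary statistics
b1 <- cov(x, y) / var(x)          # slope
b0 <- mean(y) - b1 * mean(x)      # intercept

fit <- lm(y ~ x)
all.equal(unname(coef(fit)), c(b0, b1))          # TRUE

# With a single predictor, R-squared equals the squared correlation
all.equal(summary(fit)$r.squared, cor(x, y)^2)   # TRUE
```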

Multiple Regression

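Now, enter age and gender into the model as covariates. Because gender was converted to a factor before the t-test, its coefficient appears as gender2: the difference for females (gender = 2) relative to males, holding the other predictors constant.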
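To see where the gender2 term in the output below comes from, the sketch applies model.matrix() to a made-up factor. model.matrix() builds the design matrix that lm() uses. With R's default treatment contrasts, the first level ("1") is the reference category.

```r
# Made-up gender codes stored as a factor (1 = male, 2 = female)
toy <- data.frame(gender = factor(c(1, 2, 1, 2, 1)))

# Treatment (dummy) coding: level "1" is the reference category
mm <- model.matrix(~ gender, data = toy)
colnames(mm)        # "(Intercept)" "gender2"
mm[, "gender2"]     # 1 for gender 2, 0 otherwise
```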

## with covariates
autonomous.success.co.reg <- lm(success ~ auton + age + gender, data = dataset)
summary(autonomous.success.co.reg)

## 
## Call:
## lm(formula = success ~ auton + age + gender, data = dataset)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.82626 -0.23799 -0.02441  0.23981  0.73662 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.19369    0.56852   5.618 5.77e-07 ***
## auton        0.32296    0.10009   3.227  0.00206 ** 
## age         -0.02208    0.02431  -0.908  0.36760    
## gender2     -0.25064    0.11535  -2.173  0.03388 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3776 on 58 degrees of freedom
## Multiple R-squared:  0.2369, Adjusted R-squared:  0.1974 
## F-statistic: 6.002 on 3 and 58 DF,  p-value: 0.001239

# auton is still significant (p = .00206), and gender is also significant (p = .03388)
# R squared is .2369 (adjusted .1974), and the p value for the model is .001239
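Adjusted R-squared penalizes R-squared for the number of predictors p: 1 - (1 - R²)(n - 1)/(n - p - 1). Here n = 62, which is the residual df of 60 and 58 plus the 2 and 4 estimated coefficients. The hypothetical helper adj_r2() below reproduces both adjusted values reported above, then cross-checks against summary.lm() on simulated data.

```r
# Adjusted R-squared for n observations and p predictors (excluding the intercept)
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Values reported above, with n = 62
round(adj_r2(0.1586, n = 62, p = 1), 4)   # 0.1446 (auton only)
round(adj_r2(0.2369, n = 62, p = 3), 4)   # 0.1974 (auton, age, gender2)

# Cross-check against summary.lm() on simulated data
set.seed(11)
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))
d$y <- 0.5 * d$x1 + rnorm(50)
s <- summary(lm(y ~ x1 + x2 + x3, data = d))
all.equal(adj_r2(s$r.squared, n = 50, p = 3), s$adj.r.squared)   # TRUE
```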

Comments:

Notice how elegant many of these outputs are compared with SPSS.

Comp1_R Markdown

Nate Speidel

5/1/2020
