Preliminary tasks

 

Set working directory

This command tells R which folder to use as the working directory, i.e., where to look for the dataset and other files.

setwd("~/Desktop/Comp 1 - R Program")

Load packages

These commands load the packages used later to organize, analyze, and visualize the data. Each package must already be installed (e.g., with install.packages()) before library() can load it.

library(tidyverse)
# the tidyverse is a package bundle including ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats, and more optional packages (https://tidyverse.org)
library(psych)
library(RColorBrewer)
library(car)
library(apaTables)
library(dplyr)
library(skimr)


# Set theme
theme_set(theme_classic())
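To check which of these packages still need installing, here is a minimal sketch of a helper. The function name missing_packages() is my own and hypothetical, not part of any package; it relies on base R's requireNamespace(), which returns FALSE instead of raising an error when a package is unavailable.

```r
# Hypothetical helper: returns the names of packages that cannot be loaded
missing_packages <- function(pkgs) {
  # requireNamespace() returns FALSE (rather than an error) for absent packages
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

pkgs <- c("tidyverse", "psych", "RColorBrewer", "car", "apaTables",
          "skimr", "readxl", "corrplot", "Hmisc", "PerformanceAnalytics")
to_install <- missing_packages(pkgs)
print(to_install)

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

The install line is left commented out so that running the check never triggers a download by itself.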

Load dataset

Now, I will load the dataset from an Excel file in the working directory.

library(readxl)
dataset <- read_excel("Comp1_SPSS_subset_data_NEW_4.28.20.xlsx", 
    na = "NA")

Modify dataset, if needed

Some variables may need to be transformed, depending on the analysis and packages used.

dataset$parent_text <- as.factor(dataset$parent)
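as.factor() keeps the original numeric codes as the level labels. A small sketch on hypothetical codes (not the real parent values) shows the resulting levels and counts, which is the same information skim() reports as top_counts further down:

```r
# Hypothetical 1/2 codes standing in for a variable such as parent
codes <- c(2, 1, 2, 2, 1, 2)
codes_f <- as.factor(codes)

levels(codes_f)   # "1" "2": the codes become character level labels
table(codes_f)    # counts per level: 1 -> 2, 2 -> 4
```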


Descriptive statistics

Below are examples of summary descriptive statistics along with visualizations. R offers many functions for descriptive statistics; these are two of my preferred ones.

describe(dataset)
            vars  n      mean        sd  median   trimmed       mad   min   max range       skew   kurtosis        se
gender         1 62  1.225807 0.4215255  1.0000  1.160000 0.0000000  1.00  2.00  1.00  1.2799750 -0.3668039 0.0535338
gender_text*   2 62       NaN        NA      NA       NaN        NA   Inf  -Inf  -Inf         NA         NA        NA
coach          3 62  3.725807 1.1186842  4.0000  3.820000 1.4826000  1.00  5.00  4.00 -0.7710601 -0.4376070 0.1420730
coach_text*    4 62       NaN        NA      NA       NaN        NA   Inf  -Inf  -Inf         NA         NA        NA
parent         5 62  1.661290 0.4771345  2.0000  1.700000 0.0000000  1.00  2.00  1.00 -0.6651746 -1.5822873 0.0605961
parent_text*   6 62  1.661290 0.4771345  2.0000  1.700000 0.0000000  1.00  2.00  1.00 -0.6651746 -1.5822873 0.0605961
age            7 62 13.935484 2.0071264 14.0000 13.920000 1.4826000 10.00 18.00  8.00  0.0510244 -0.5487702 0.2549053
auto           8 60  4.227778 0.4760919  4.2500  4.244792 0.3706500  3.00  5.00  2.00 -0.2765614 -0.4811253 0.0614632
comp           9 60  4.287500 0.4458495  4.2500  4.281250 0.3706500  3.25  5.00  1.75  0.1127005 -0.8900254 0.0575589
relate        10 60  4.397222 0.4896208  4.5000  4.449653 0.3706500  3.00  5.00  2.00 -0.8231632  0.3519376 0.0632098
auton         11 62  4.213710 0.4861752  4.1875  4.237500 0.4633125  3.00  5.00  2.00 -0.2924376 -0.6096758 0.0617443
control       12 62  2.288307 0.7532677  2.0625  2.270000 0.8339625  1.00  3.75  2.75  0.2376715 -1.0415801 0.0956651
amotiv        13 62  1.750000 0.7520464  1.5000  1.640000 0.7413000  1.00  4.00  3.00  1.1305059  0.8053705 0.0955100
success       14 62  4.190323 0.4214377  4.2000  4.184000 0.5930400  3.40  5.00  1.60  0.1453626 -0.8832100 0.0535226
Comments:

Different functions handle missing values differently, each with its own argument (e.g., na.rm or use). While this provides flexibility across functions, it can be challenging to keep track of each function's syntax requirements.
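For example, psych's describe() drops missing values by default (na.rm = TRUE), whereas base R's mean() and cor() return NA as soon as one is present. A sketch on hypothetical vectors:

```r
# Hypothetical scores with one missing value in each vector
x <- c(4.0, 4.5, NA, 3.5, 5.0)
y <- c(4.2, 4.4, 4.0, NA, 4.8)

mean(x)                          # NA: missing values are kept by default
mean(x, na.rm = TRUE)            # 4.25: NA dropped before averaging
cor(x, y)                        # NA: default use = "everything"
cor(x, y, use = "complete.obs")  # computed from rows 1, 2 and 5 only
```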


or

skim(dataset)
Data summary
Name                     dataset
Number of rows           62
Number of columns        14
_______________________
Column type frequency:
  character              2
  factor                 1
  numeric                11
________________________
Group variables          None

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
gender_text            0              1    4    6      0         2           0
coach_text             0              1    5   21      0         5           0

Variable type: factor

skim_variable  n_missing  complete_rate  ordered  n_unique  top_counts
parent_text            0              1    FALSE         2  2: 41, 1: 21

Variable type: numeric

skim_variable  n_missing  complete_rate   mean    sd     p0    p25    p50    p75   p100  hist
gender                 0           1.00   1.23  0.42   1.00   1.00   1.00   1.00   2.00  ▇▁▁▁▂
coach                  0           1.00   3.73  1.12   1.00   3.00   4.00   4.00   5.00  ▁▃▁▇▃
parent                 0           1.00   1.66  0.48   1.00   1.00   2.00   2.00   2.00  ▅▁▁▁▇
age                    0           1.00  13.94  2.01  10.00  13.00  14.00  15.00  18.00  ▂▇▂▇▂
auto                   2           0.97   4.23  0.48   3.00   4.00   4.25   4.56   5.00  ▁▅▃▇▆
comp                   2           0.97   4.29  0.45   3.25   4.00   4.25   4.56   5.00  ▁▃▇▃▅
relate                 2           0.97   4.40  0.49   3.00   4.00   4.50   4.75   5.00  ▁▁▃▆▇
auton                  0           1.00   4.21  0.49   3.00   3.88   4.19   4.59   5.00  ▂▃▇▆▆
control                0           1.00   2.29  0.75   1.00   1.75   2.06   2.88   3.75  ▃▇▃▅▃
amotiv                 0           1.00   1.75  0.75   1.00   1.25   1.50   2.19   4.00  ▇▃▂▁▁
success                0           1.00   4.19  0.42   3.40   3.80   4.20   4.40   5.00  ▃▇▅▆▅

Descriptive histogram plots

You can also embed plots, for example:

or

# BPNS autonomy average histogram
ggplot(dataset, aes(auto)) +
  geom_histogram(bins = 7, fill = "steelblue", color = "black") +
  labs(x = "Response on Scale from 1 to 5", 
       y = "Frequency", 
       title = "Autonomy") +
  theme(plot.title = element_text(hjust = 0.5))
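The bins argument controls how the 1-to-5 range is cut. To see the counts behind a histogram without drawing it, base R's hist() accepts plot = FALSE. A sketch on hypothetical scores with half-point breaks; hist() uses right-closed bins by default, with the lowest break included:

```r
# Hypothetical averages on a 1-5 scale
scores <- c(3.0, 3.5, 4.0, 4.0, 4.25, 4.5, 5.0)

h <- hist(scores, breaks = seq(3, 5, by = 0.5), plot = FALSE)
h$breaks   # 3.0 3.5 4.0 4.5 5.0
h$counts   # 2 2 2 1
```

These bins will not match geom_histogram(bins = 7) exactly, since ggplot2 chooses its own bin edges; the sketch only illustrates how binning turns raw scores into bar heights.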


Correlations

Similarly, there are numerous options for producing correlation matrices and visualizations.

options(width=120)

library(corrplot)

dataset$gender <- as.numeric(dataset$gender)

dataset_num <- Filter(is.numeric,dataset)

dataset.cor = cor(dataset_num, use = "pairwise.complete.obs", method = "pearson")

round(dataset.cor, 2)
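Pearson's r is the covariance of two variables divided by the product of their standard deviations. A quick check on hypothetical vectors that this hand calculation matches cor():

```r
# Hypothetical paired scores
x <- c(3.0, 3.5, 4.0, 4.5, 5.0)
y <- c(3.8, 4.0, 4.1, 4.6, 4.5)

# Pearson r by hand: covariance scaled by both standard deviations
r_manual  <- cov(x, y) / (sd(x) * sd(y))
r_builtin <- cor(x, y, method = "pearson")

all.equal(r_manual, r_builtin)   # TRUE
```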
##         gender coach parent   age  auto  comp relate auton control amotiv success
## gender    1.00 -0.01   0.14  0.10 -0.15 -0.20  -0.14 -0.06    0.03   0.13   -0.28
## coach    -0.01  1.00  -0.15 -0.31  0.10 -0.04   0.05  0.11   -0.15   0.02   -0.04
## parent    0.14 -0.15   1.00  0.11  0.03  0.10  -0.17  0.03    0.04  -0.02    0.02
## age       0.10 -0.31   0.11  1.00  0.00 -0.19   0.00 -0.10    0.20   0.34   -0.17
## auto     -0.15  0.10   0.03  0.00  1.00  0.53   0.37  0.44   -0.35  -0.41    0.49
## comp     -0.20 -0.04   0.10 -0.19  0.53  1.00   0.15  0.42   -0.27  -0.51    0.61
## relate   -0.14  0.05  -0.17  0.00  0.37  0.15   1.00  0.44   -0.14  -0.07    0.29
## auton    -0.06  0.11   0.03 -0.10  0.44  0.42   0.44  1.00   -0.16  -0.49    0.40
## control   0.03 -0.15   0.04  0.20 -0.35 -0.27  -0.14 -0.16    1.00   0.54   -0.16
## amotiv    0.13  0.02  -0.02  0.34 -0.41 -0.51  -0.07 -0.49    0.54   1.00   -0.35
## success  -0.28 -0.04   0.02 -0.17  0.49  0.61   0.29  0.40   -0.16  -0.35    1.00

Correlation visualizations

pairs.panels(dataset_num)

corrplot(dataset.cor)

Including p values

library(Hmisc)

# rcorr() requires a matrix, so convert the data frame with as.matrix()
dataset.cor.p <-rcorr(as.matrix(dataset_num), type=c("pearson"))

dataset.cor.p
##         gender coach parent   age  auto  comp relate auton control amotiv success
## gender    1.00 -0.01   0.14  0.10 -0.15 -0.20  -0.14 -0.06    0.03   0.13   -0.28
## coach    -0.01  1.00  -0.15 -0.31  0.10 -0.04   0.05  0.11   -0.15   0.02   -0.04
## parent    0.14 -0.15   1.00  0.11  0.03  0.10  -0.17  0.03    0.04  -0.02    0.02
## age       0.10 -0.31   0.11  1.00  0.00 -0.19   0.00 -0.10    0.20   0.34   -0.17
## auto     -0.15  0.10   0.03  0.00  1.00  0.53   0.37  0.44   -0.35  -0.41    0.49
## comp     -0.20 -0.04   0.10 -0.19  0.53  1.00   0.15  0.42   -0.27  -0.51    0.61
## relate   -0.14  0.05  -0.17  0.00  0.37  0.15   1.00  0.44   -0.14  -0.07    0.29
## auton    -0.06  0.11   0.03 -0.10  0.44  0.42   0.44  1.00   -0.16  -0.49    0.40
## control   0.03 -0.15   0.04  0.20 -0.35 -0.27  -0.14 -0.16    1.00   0.54   -0.16
## amotiv    0.13  0.02  -0.02  0.34 -0.41 -0.51  -0.07 -0.49    0.54   1.00   -0.35
## success  -0.28 -0.04   0.02 -0.17  0.49  0.61   0.29  0.40   -0.16  -0.35    1.00
## 
## n
##         gender coach parent age auto comp relate auton control amotiv success
## gender      62    62     62  62   60   60     60    62      62     62      62
## coach       62    62     62  62   60   60     60    62      62     62      62
## parent      62    62     62  62   60   60     60    62      62     62      62
## age         62    62     62  62   60   60     60    62      62     62      62
## auto        60    60     60  60   60   60     60    60      60     60      60
## comp        60    60     60  60   60   60     60    60      60     60      60
## relate      60    60     60  60   60   60     60    60      60     60      60
## auton       62    62     62  62   60   60     60    62      62     62      62
## control     62    62     62  62   60   60     60    62      62     62      62
## amotiv      62    62     62  62   60   60     60    62      62     62      62
## success     62    62     62  62   60   60     60    62      62     62      62
## 
## P
##         gender coach  parent age    auto   comp   relate auton  control amotiv success
## gender         0.9655 0.2710 0.4626 0.2636 0.1166 0.2908 0.6468 0.8146  0.3166 0.0259 
## coach   0.9655        0.2571 0.0151 0.4316 0.7340 0.7290 0.3807 0.2515  0.8805 0.7547 
## parent  0.2710 0.2571        0.3787 0.8252 0.4473 0.1838 0.8432 0.7435  0.8601 0.9015 
## age     0.4626 0.0151 0.3787        1.0000 0.1375 0.9739 0.4246 0.1149  0.0070 0.1934 
## auto    0.2636 0.4316 0.8252 1.0000        0.0000 0.0032 0.0005 0.0056  0.0011 0.0000 
## comp    0.1166 0.7340 0.4473 0.1375 0.0000        0.2612 0.0008 0.0362  0.0000 0.0000 
## relate  0.2908 0.7290 0.1838 0.9739 0.0032 0.2612        0.0004 0.2820  0.5725 0.0246 
## auton   0.6468 0.3807 0.8432 0.4246 0.0005 0.0008 0.0004        0.2167  0.0000 0.0013 
## control 0.8146 0.2515 0.7435 0.1149 0.0056 0.0362 0.2820 0.2167         0.0000 0.2289 
## amotiv  0.3166 0.8805 0.8601 0.0070 0.0011 0.0000 0.5725 0.0000 0.0000         0.0051 
## success 0.0259 0.7547 0.9015 0.1934 0.0000 0.0000 0.0246 0.0013 0.2289  0.0051
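The p-values correspond to the standard t-test of whether Pearson's r differs from zero: t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom, with n taken pairwise from the n matrix. Base R's cor.test() performs the same test for a single pair; a sketch on hypothetical data checking the formula against it:

```r
# Hypothetical paired scores
x <- c(3.0, 3.5, 4.0, 4.5, 5.0, 3.2, 4.8, 4.1)
y <- c(3.8, 4.0, 4.1, 4.6, 4.5, 3.9, 4.2, 4.4)
n <- length(x)
r <- cor(x, y)

# t statistic and two-sided p-value for H0: rho = 0
t_stat   <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_manual <- 2 * pt(-abs(t_stat), df = n - 2)

p_builtin <- cor.test(x, y)$p.value
all.equal(p_manual, p_builtin)   # TRUE
```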

or

A more comprehensive, concise, and visual approach:

library("PerformanceAnalytics")

chart.Correlation(dataset_num, histogram = TRUE, pch = 19)


T-test

Now I will compare success means across gender.

Visualization

The bar plot below is a prime example of R making “easy things hard” (Muenchen, 2014, para. 2).

male <- dataset[which(dataset$gender == 1),]
female <- dataset[which(dataset$gender == 2),]

gender_success <- data.frame(gender = c("male", "female"),
                             success_ave = c(mean(male$success), mean(female$success)))

ggplot(gender_success, aes(x=gender, y= success_ave, fill = gender)) +
  geom_bar(stat = "identity", position = 'dodge') +
  labs(x = "Gender", 
       y = "Response on Scale of 1-5", 
       title = "Average Perceived Success Across Gender") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_brewer(palette = "BuPu")
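Subsetting into male and female data frames works, but base R's aggregate() can compute the mean for every group in one call, which also scales to variables with more than two levels. A sketch on a small hypothetical data frame that reuses the column names gender and success:

```r
# Hypothetical data frame with the same column names as the example
df <- data.frame(gender  = c(1, 1, 2, 2, 1, 2),
                 success = c(4.4, 4.2, 3.8, 4.0, 4.6, 4.1))

# One row per gender level with its mean success
gender_means <- aggregate(success ~ gender, data = df, FUN = mean)
gender_means   # success is 4.4 for gender 1 and about 3.97 for gender 2
```

The resulting data frame can be passed straight to ggplot() in place of the hand-built gender_success.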

T-test for mean difference across two levels of gender.

### success across gender
dataset$gender <- as.factor(dataset$gender)

leveneTest(dataset$success, dataset$gender, center=mean)
      Df   F value    Pr(>F)
group  1 0.4299003 0.5145442
      60        NA        NA
# p = .5145, so no evidence of unequal variances
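With center = mean, Levene's test amounts to a one-way ANOVA on the absolute deviations of each score from its own group mean. A base-R sketch on hypothetical data (car's leveneTest() remains the function actually used here):

```r
# Hypothetical scores for two groups
df <- data.frame(group   = factor(c(1, 1, 1, 1, 2, 2, 2, 2)),
                 success = c(4.0, 4.4, 4.2, 4.6, 3.6, 4.2, 3.8, 4.4))

# Absolute deviation of each score from its own group mean
group_mean <- ave(df$success, df$group, FUN = mean)
df$abs_dev <- abs(df$success - group_mean)

# One-way ANOVA on those deviations gives Levene's F and p (center = mean)
levene <- anova(lm(abs_dev ~ group, data = df))
levene
```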

male <- dataset[which(dataset$gender == 1),]
female <- dataset[which(dataset$gender == 2),]

# t.test() runs Welch's test by default (var.equal = FALSE)
t.test(male$success, female$success)
## 
##  Welch Two Sample t-test
## 
## data:  male$success and female$success
## t = 2.3846, df = 22.639, p-value = 0.02588
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.03723953 0.52823666
## sample estimates:
## mean of x mean of y 
##  4.254167  3.971429
# p = .02588, SIGNIFICANT **
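Welch's t statistic and its Welch-Satterthwaite degrees of freedom (the fractional df = 22.639 above) can be reproduced by hand. A sketch on hypothetical scores, checked against t.test():

```r
# Hypothetical scores for two independent groups
a <- c(4.4, 4.2, 4.6, 4.0, 4.8, 4.2)
b <- c(3.8, 4.0, 4.1, 3.6)

va <- var(a) / length(a)   # squared standard error of mean(a)
vb <- var(b) / length(b)   # squared standard error of mean(b)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_welch  <- (mean(a) - mean(b)) / sqrt(va + vb)
df_welch <- (va + vb)^2 / (va^2 / (length(a) - 1) + vb^2 / (length(b) - 1))

res <- t.test(a, b)   # Welch by default
c(t_welch, df_welch)
c(res$statistic, res$parameter)
```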


ANOVA

Let’s analyze success across the five levels of goalie coaching received.

Visualization

ggplot(dataset, aes(x=reorder(coach_text, success), y = success, fill = coach_text)) +
  geom_boxplot() +
  labs(x = "Amount of Goalie Coaching Received", 
       y = "Perceived Success", 
       title = "Perceived Success Across Amount of Goalie Coaching Received") +
  theme(plot.title = element_text(hjust = 0.5), legend.position = "none") +
  scale_fill_brewer(palette = "Blues")

Run ANOVA

### success across coaching received

dataset$coach_text <- as.factor(dataset$coach_text)

leveneTest(dataset$success, dataset$coach_text, center=mean)
      Df   F value    Pr(>F)
group  4 0.6590701 0.6229689
      57        NA        NA
# p = .623, so no evidence of unequal variances

level.success.anova <- aov(success ~ coach_text, data = dataset)

summary(level.success.anova)
##             Df Sum Sq Mean Sq F value Pr(>F)
## coach_text   4  0.925  0.2311   1.329   0.27
## Residuals   57  9.910  0.1739
# p = .27, so NOT SIGNIFICANT **

# If significant, follow up with Tukey's HSD post hoc test (run here for illustration):
TukeyHSD(level.success.anova)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = success ~ coach_text, data = dataset)
## 
## $coach_text
##                                                   diff        lwr       upr     p adj
## More than once a week-A few times a year -1.636364e-01 -0.6298929 0.3026202 0.8593439
## Never-A few times a year                 -1.636364e-01 -1.3904385 1.0631658 0.9956416
## Once a month-A few times a year           1.863636e-01 -0.4994396 0.8721669 0.9394057
## Once a week-A few times a year            1.137830e-01 -0.2984357 0.5260017 0.9360595
## Never-More than once a week               8.881784e-16 -1.2130944 1.2130944 1.0000000
## Once a month-More than once a week        3.500000e-01 -0.3109695 1.0109695 0.5720330
## Once a week-More than once a week         2.774194e-01 -0.0920111 0.6468498 0.2278314
## Once a month-Never                        3.500000e-01 -0.9632133 1.6632133 0.9433381
## Once a week-Never                         2.774194e-01 -0.9159487 1.4707874 0.9650319
## Once a week-Once a month                 -7.258065e-02 -0.6966077 0.5514464 0.9974377
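The F value in the ANOVA table is the between-group mean square divided by the residual (within-group) mean square; above, 0.2311 / 0.1739 ≈ 1.33. A sketch on hypothetical data that computes the sums of squares by hand and checks the result against aov():

```r
# Hypothetical scores for three coaching groups
df <- data.frame(coach   = factor(rep(c("A", "B", "C"), each = 4)),
                 success = c(4.0, 4.2, 4.4, 4.2,
                             3.8, 4.0, 3.9, 4.1,
                             4.5, 4.3, 4.6, 4.4))

grand_mean  <- mean(df$success)
group_means <- tapply(df$success, df$coach, mean)
n_per_group <- tapply(df$success, df$coach, length)

# Between-group and within-group sums of squares
ss_between <- sum(n_per_group * (group_means - grand_mean)^2)
ss_within  <- sum((df$success - group_means[df$coach])^2)
df_between <- nlevels(df$coach) - 1
df_within  <- nrow(df) - nlevels(df$coach)

# F = mean square between / mean square within
f_manual <- (ss_between / df_between) / (ss_within / df_within)

fit   <- aov(success ~ coach, data = df)
f_aov <- summary(fit)[[1]][["F value"]][1]
all.equal(f_manual, f_aov)   # TRUE
```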


Multiple Regression

Does autonomous motivation predict success?

Scatterplot

ggplot(dataset, aes(x = auton, y = success)) +
  geom_point(position = "jitter", aes(color = gender_text)) +
  geom_smooth(method = lm, se = FALSE) +  # straight-line (OLS) fit
  geom_smooth() +                         # default smoother with confidence band
  labs(x = "Level of Autonomous Motivation", 
       y = "Perceived Success", 
       title = "Perceived Success Across Autonomous Motivation") +
  annotate(x = 4.25, y = 3.25, 
           label = paste0("r = ", round(cor(dataset$auton, dataset$success, use = "complete.obs"), 2)), 
           geom = "text", size = 5) +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_color_brewer(palette = "Blues")

Basic Regression

autonomous.success.reg <- lm(success ~ auton, data = dataset)

summary(autonomous.success.reg)
## 
## Call:
## lm(formula = success ~ auton, data = dataset)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.81865 -0.20391 -0.01653  0.26873  0.76978 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.7355     0.4354   6.283 4.11e-08 ***
## auton         0.3453     0.1026   3.364  0.00134 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3898 on 60 degrees of freedom
## Multiple R-squared:  0.1586, Adjusted R-squared:  0.1446 
## F-statistic: 11.31 on 1 and 60 DF,  p-value: 0.001345
# auton is significant at p = .00134
# R squared is .1586 (adjusted is .1446) and p value for model is .001345
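With a single predictor, the slope equals r * sd(y) / sd(x), the intercept equals mean(y) minus the slope times mean(x), and R-squared equals r squared. Here, sqrt(0.1586) ≈ 0.40, which matches the auton/success correlation in the matrix above. A sketch on hypothetical data confirming these identities against lm():

```r
# Hypothetical predictor and outcome
x <- c(3.2, 3.6, 4.0, 4.2, 4.6, 5.0)
y <- c(3.9, 4.0, 4.3, 4.1, 4.5, 4.6)

r         <- cor(x, y)
slope     <- r * sd(y) / sd(x)
intercept <- mean(y) - slope * mean(x)

fit <- lm(y ~ x)
coef(fit)                # matches c(intercept, slope)
summary(fit)$r.squared   # equals r^2
```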

Multiple Regression

Now, enter covariates age and gender into the model.

## with covariates
autonomous.success.co.reg <- lm(success ~ auton + age + gender, data = dataset)
summary(autonomous.success.co.reg)
## 
## Call:
## lm(formula = success ~ auton + age + gender, data = dataset)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.82626 -0.23799 -0.02441  0.23981  0.73662 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.19369    0.56852   5.618 5.77e-07 ***
## auton        0.32296    0.10009   3.227  0.00206 ** 
## age         -0.02208    0.02431  -0.908  0.36760    
## gender2     -0.25064    0.11535  -2.173  0.03388 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3776 on 58 degrees of freedom
## Multiple R-squared:  0.2369, Adjusted R-squared:  0.1974 
## F-statistic: 6.002 on 3 and 58 DF,  p-value: 0.001239
# auton still significant at .00206 and model significant p = .001239
# R squared is .2369 (adjusted .1974) and p value for model is .001239
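Because gender was converted to a factor before the t-test, lm() codes it with treatment (dummy) contrasts: level 1 is the reference category, and the gender2 coefficient (-0.25) is the estimated difference in success for level 2, holding auton and age constant. model.matrix() shows the coding lm() builds; a sketch on hypothetical data:

```r
# Hypothetical data with a two-level factor coded 1/2
df <- data.frame(success = c(4.4, 4.0, 4.2, 3.8),
                 auton   = c(4.5, 4.0, 4.2, 3.9),
                 gender  = factor(c(1, 2, 1, 2)))

X <- model.matrix(success ~ auton + gender, data = df)
X
# The "gender2" column is 1 for level 2 and 0 for the reference level 1
```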
Comments:

Notice how elegant many of the outputs are compared to SPSS.