This command sets the working directory, telling R which folder to look in to retrieve the dataset and other files.
setwd("~/Desktop/Comp 1 - R Program")
These commands load the packages that will be used later to organize, analyze, and visualize the data. library() only loads packages that are already installed; install any missing package once with install.packages() first.
library(tidyverse)
# the tidyverse is a package bundle including ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats, and more optional packages (https://tidyverse.org)
library(psych)
library(RColorBrewer)
library(car)
library(apaTables)
library(dplyr)
library(skimr)
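Because `library()` only loads packages that are already installed, one option is a small helper that reports which packages are missing, so `install.packages()` runs only for those. This is a sketch. The helper name `missing_packages()` is hypothetical and does not come from any package. The package list assumes it matches the `library()` calls in this document.

```r
# Hypothetical helper (not from any package): return the packages in `pkgs`
# that are not installed. requireNamespace() loads a package's namespace
# without attaching it and quietly returns FALSE if it is unavailable.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# Packages loaded with library() in this document
analysis_pkgs <- c("tidyverse", "psych", "RColorBrewer", "car", "apaTables",
                   "dplyr", "skimr", "readxl", "corrplot", "Hmisc",
                   "PerformanceAnalytics")

to_install <- missing_packages(analysis_pkgs)

# Only install from an interactive session, so knitting never triggers downloads
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

The `interactive()` guard is a design choice. It lets the same chunk be knitted safely while still installing anything missing when run by hand.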
# Set theme
theme_set(theme_classic())
Now, I will load the data set from an Excel file in the working directory.
library(readxl)
dataset <- read_excel("Comp1_SPSS_subset_data_NEW_4.28.20.xlsx",
na = "NA")
Some variables may need to be transformed, depending on the analysis and packages used.
dataset$parent_text <- as.factor(dataset$parent)
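Converting a 1/2 code to a factor tells R to treat it as categorical rather than as a number. The sketch below uses a short hypothetical vector in place of `dataset$parent` to show what `as.factor()` produces. It also shows a common pitfall: `as.numeric()` on a factor returns the internal level positions, not the original values.

```r
# Hypothetical 1/2-coded responses standing in for dataset$parent
parent <- c(2, 1, 2, 2, 1, 2)
parent_text <- as.factor(parent)

levels(parent_text)  # the distinct codes, stored as character labels
table(parent_text)   # counts per level, like skim()'s top_counts column

# Pitfall: as.numeric() on a factor gives level positions, not values
age_f <- as.factor(c(14, 10, 18))
as.numeric(age_f)                # level positions: 2 1 3
as.numeric(as.character(age_f))  # original values: 14 10 18
```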
Below are examples of summary descriptive statistics along with visualizations. R offers many functions for descriptive statistics reports; these are two of my preferred ones.
describe(dataset)
|  | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gender | 1 | 62 | 1.225807 | 0.4215255 | 1.0000 | 1.160000 | 0.0000000 | 1.00 | 2.00 | 1.00 | 1.2799750 | -0.3668039 | 0.0535338 |
| gender_text* | 2 | 62 | NaN | NA | NA | NaN | NA | Inf | -Inf | -Inf | NA | NA | NA |
| coach | 3 | 62 | 3.725807 | 1.1186842 | 4.0000 | 3.820000 | 1.4826000 | 1.00 | 5.00 | 4.00 | -0.7710601 | -0.4376070 | 0.1420730 |
| coach_text* | 4 | 62 | NaN | NA | NA | NaN | NA | Inf | -Inf | -Inf | NA | NA | NA |
| parent | 5 | 62 | 1.661290 | 0.4771345 | 2.0000 | 1.700000 | 0.0000000 | 1.00 | 2.00 | 1.00 | -0.6651746 | -1.5822873 | 0.0605961 |
| parent_text* | 6 | 62 | 1.661290 | 0.4771345 | 2.0000 | 1.700000 | 0.0000000 | 1.00 | 2.00 | 1.00 | -0.6651746 | -1.5822873 | 0.0605961 |
| age | 7 | 62 | 13.935484 | 2.0071264 | 14.0000 | 13.920000 | 1.4826000 | 10.00 | 18.00 | 8.00 | 0.0510244 | -0.5487702 | 0.2549053 |
| auto | 8 | 60 | 4.227778 | 0.4760919 | 4.2500 | 4.244792 | 0.3706500 | 3.00 | 5.00 | 2.00 | -0.2765614 | -0.4811253 | 0.0614632 |
| comp | 9 | 60 | 4.287500 | 0.4458495 | 4.2500 | 4.281250 | 0.3706500 | 3.25 | 5.00 | 1.75 | 0.1127005 | -0.8900254 | 0.0575589 |
| relate | 10 | 60 | 4.397222 | 0.4896208 | 4.5000 | 4.449653 | 0.3706500 | 3.00 | 5.00 | 2.00 | -0.8231632 | 0.3519376 | 0.0632098 |
| auton | 11 | 62 | 4.213710 | 0.4861752 | 4.1875 | 4.237500 | 0.4633125 | 3.00 | 5.00 | 2.00 | -0.2924376 | -0.6096758 | 0.0617443 |
| control | 12 | 62 | 2.288307 | 0.7532677 | 2.0625 | 2.270000 | 0.8339625 | 1.00 | 3.75 | 2.75 | 0.2376715 | -1.0415801 | 0.0956651 |
| amotiv | 13 | 62 | 1.750000 | 0.7520464 | 1.5000 | 1.640000 | 0.7413000 | 1.00 | 4.00 | 3.00 | 1.1305059 | 0.8053705 | 0.0955100 |
| success | 14 | 62 | 4.190323 | 0.4214377 | 4.2000 | 4.184000 | 0.5930400 | 3.40 | 5.00 | 1.60 | 0.1453626 | -0.8832100 | 0.0535226 |
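Several `describe()` columns are simple functions of the others. The sketch below assumes `psych::describe()`'s defaults: `trim = 0.1` for the trimmed mean, and `mad()` with its default 1.4826 scaling. It reproduces those columns in base R on a hypothetical vector, then checks the `se` column for `gender` (sd / sqrt(n)) against the table above.

```r
# Hypothetical scores (not from the dataset), with one missing value
x <- c(3.4, 3.8, 4.2, 4.2, 4.4, 5.0, NA)

n       <- sum(!is.na(x))                    # n counts non-missing values
se      <- sd(x, na.rm = TRUE) / sqrt(n)     # standard error of the mean
trimmed <- mean(x, trim = 0.1, na.rm = TRUE) # 10% trimmed mean
mad_x   <- mad(x, na.rm = TRUE)              # 1.4826 * median absolute deviation
range_x <- diff(range(x, na.rm = TRUE))      # max - min

# Check against the gender row above: sd = 0.4215255, n = 62
gender_se <- 0.4215255 / sqrt(62)            # close to the reported se of 0.0535338
```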
skim(dataset)
| Name | dataset |
|---|---|
| Number of rows | 62 |
| Number of columns | 14 |
| _______________________ | |
| Column type frequency: | |
| character | 2 |
| factor | 1 |
| numeric | 11 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| gender_text | 0 | 1 | 4 | 6 | 0 | 2 | 0 |
| coach_text | 0 | 1 | 5 | 21 | 0 | 5 | 0 |
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| parent_text | 0 | 1 | FALSE | 2 | 2: 41, 1: 21 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| gender | 0 | 1.00 | 1.23 | 0.42 | 1.00 | 1.00 | 1.00 | 1.00 | 2.00 | ▇▁▁▁▂ |
| coach | 0 | 1.00 | 3.73 | 1.12 | 1.00 | 3.00 | 4.00 | 4.00 | 5.00 | ▁▃▁▇▃ |
| parent | 0 | 1.00 | 1.66 | 0.48 | 1.00 | 1.00 | 2.00 | 2.00 | 2.00 | ▅▁▁▁▇ |
| age | 0 | 1.00 | 13.94 | 2.01 | 10.00 | 13.00 | 14.00 | 15.00 | 18.00 | ▂▇▂▇▂ |
| auto | 2 | 0.97 | 4.23 | 0.48 | 3.00 | 4.00 | 4.25 | 4.56 | 5.00 | ▁▅▃▇▆ |
| comp | 2 | 0.97 | 4.29 | 0.45 | 3.25 | 4.00 | 4.25 | 4.56 | 5.00 | ▁▃▇▃▅ |
| relate | 2 | 0.97 | 4.40 | 0.49 | 3.00 | 4.00 | 4.50 | 4.75 | 5.00 | ▁▁▃▆▇ |
| auton | 0 | 1.00 | 4.21 | 0.49 | 3.00 | 3.88 | 4.19 | 4.59 | 5.00 | ▂▃▇▆▆ |
| control | 0 | 1.00 | 2.29 | 0.75 | 1.00 | 1.75 | 2.06 | 2.88 | 3.75 | ▃▇▃▅▃ |
| amotiv | 0 | 1.00 | 1.75 | 0.75 | 1.00 | 1.25 | 1.50 | 2.19 | 4.00 | ▇▃▂▁▁ |
| success | 0 | 1.00 | 4.19 | 0.42 | 3.40 | 3.80 | 4.20 | 4.40 | 5.00 | ▃▇▅▆▅ |
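The `n_missing` and `complete_rate` columns from `skim()` can be checked with base R. `complete_rate` is 1 minus the share of missing values, so for `auto` it is 1 - 2/62, or about 0.97. A minimal sketch on a hypothetical data frame:

```r
# Hypothetical data frame with one missing value in `auto`
df <- data.frame(
  auto = c(4.25, NA, 3.75, 4.50),
  age  = c(14, 12, 15, 13)
)

n_missing     <- colSums(is.na(df))          # missing values per column
complete_rate <- 1 - n_missing / nrow(df)    # share of non-missing values

# The same formula reproduces skim()'s value for `auto` above (2 of 62 missing)
auto_rate <- round(1 - 2 / 62, 2)
```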
Plots can also be embedded; for example, a histogram of the BPNS autonomy averages:
# BPNS autonomy average histogram
ggplot(dataset, aes(auto)) +
geom_histogram(bins = 7, fill = "steelblue", color = "black") +
labs(x = "Response on Scale from 1 to 5",
y = "Frequency",
title = "Autonomy") +
theme(plot.title = element_text(hjust = 0.5))
Similarly, there are numerous options for producing correlation matrices and their visualizations.
options(width=120)
library(corrplot)
dataset$gender <- as.numeric(dataset$gender)
dataset_num <- Filter(is.numeric,dataset)
dataset.cor = cor(dataset_num, use = "pairwise.complete.obs", method = "pearson")
round(dataset.cor, 2)
## gender coach parent age auto comp relate auton control amotiv success
## gender 1.00 -0.01 0.14 0.10 -0.15 -0.20 -0.14 -0.06 0.03 0.13 -0.28
## coach -0.01 1.00 -0.15 -0.31 0.10 -0.04 0.05 0.11 -0.15 0.02 -0.04
## parent 0.14 -0.15 1.00 0.11 0.03 0.10 -0.17 0.03 0.04 -0.02 0.02
## age 0.10 -0.31 0.11 1.00 0.00 -0.19 0.00 -0.10 0.20 0.34 -0.17
## auto -0.15 0.10 0.03 0.00 1.00 0.53 0.37 0.44 -0.35 -0.41 0.49
## comp -0.20 -0.04 0.10 -0.19 0.53 1.00 0.15 0.42 -0.27 -0.51 0.61
## relate -0.14 0.05 -0.17 0.00 0.37 0.15 1.00 0.44 -0.14 -0.07 0.29
## auton -0.06 0.11 0.03 -0.10 0.44 0.42 0.44 1.00 -0.16 -0.49 0.40
## control 0.03 -0.15 0.04 0.20 -0.35 -0.27 -0.14 -0.16 1.00 0.54 -0.16
## amotiv 0.13 0.02 -0.02 0.34 -0.41 -0.51 -0.07 -0.49 0.54 1.00 -0.35
## success -0.28 -0.04 0.02 -0.17 0.49 0.61 0.29 0.40 -0.16 -0.35 1.00
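With `use = "pairwise.complete.obs"`, each correlation is computed from the rows where that particular pair is complete. Different cells of the matrix can therefore rest on different sample sizes; here that is 60 or 62, as the `rcorr()` n matrix shows. `use = "complete.obs"` instead drops any row with a missing value before computing every correlation. A sketch on hypothetical data:

```r
# Hypothetical data with missing values in two different rows
df <- data.frame(
  auto    = c(4.00, 4.50, NA,   3.50, 5.00, 4.25),
  success = c(4.2,  4.6,  3.8,  3.4,  4.8,  4.0),
  amotiv  = c(1.50, 1.00, 2.50, NA,   1.25, 2.00)
)

# Pairwise deletion: each cell uses the rows complete for that pair
pairwise <- cor(df, use = "pairwise.complete.obs")
# Listwise deletion: rows 3 and 4 are dropped for every cell
listwise <- cor(df, use = "complete.obs")

# Rows available for each pair (mirrors the n matrix from Hmisc::rcorr())
present <- 1 * !is.na(df)
pair_n  <- t(present) %*% present
```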
pairs.panels(dataset_num)
corrplot(dataset.cor)
library(Hmisc)
# must be as.matrix
dataset.cor.p <-rcorr(as.matrix(dataset_num), type=c("pearson"))
dataset.cor.p
## gender coach parent age auto comp relate auton control amotiv success
## gender 1.00 -0.01 0.14 0.10 -0.15 -0.20 -0.14 -0.06 0.03 0.13 -0.28
## coach -0.01 1.00 -0.15 -0.31 0.10 -0.04 0.05 0.11 -0.15 0.02 -0.04
## parent 0.14 -0.15 1.00 0.11 0.03 0.10 -0.17 0.03 0.04 -0.02 0.02
## age 0.10 -0.31 0.11 1.00 0.00 -0.19 0.00 -0.10 0.20 0.34 -0.17
## auto -0.15 0.10 0.03 0.00 1.00 0.53 0.37 0.44 -0.35 -0.41 0.49
## comp -0.20 -0.04 0.10 -0.19 0.53 1.00 0.15 0.42 -0.27 -0.51 0.61
## relate -0.14 0.05 -0.17 0.00 0.37 0.15 1.00 0.44 -0.14 -0.07 0.29
## auton -0.06 0.11 0.03 -0.10 0.44 0.42 0.44 1.00 -0.16 -0.49 0.40
## control 0.03 -0.15 0.04 0.20 -0.35 -0.27 -0.14 -0.16 1.00 0.54 -0.16
## amotiv 0.13 0.02 -0.02 0.34 -0.41 -0.51 -0.07 -0.49 0.54 1.00 -0.35
## success -0.28 -0.04 0.02 -0.17 0.49 0.61 0.29 0.40 -0.16 -0.35 1.00
##
## n
## gender coach parent age auto comp relate auton control amotiv success
## gender 62 62 62 62 60 60 60 62 62 62 62
## coach 62 62 62 62 60 60 60 62 62 62 62
## parent 62 62 62 62 60 60 60 62 62 62 62
## age 62 62 62 62 60 60 60 62 62 62 62
## auto 60 60 60 60 60 60 60 60 60 60 60
## comp 60 60 60 60 60 60 60 60 60 60 60
## relate 60 60 60 60 60 60 60 60 60 60 60
## auton 62 62 62 62 60 60 60 62 62 62 62
## control 62 62 62 62 60 60 60 62 62 62 62
## amotiv 62 62 62 62 60 60 60 62 62 62 62
## success 62 62 62 62 60 60 60 62 62 62 62
##
## P
## gender coach parent age auto comp relate auton control amotiv success
## gender 0.9655 0.2710 0.4626 0.2636 0.1166 0.2908 0.6468 0.8146 0.3166 0.0259
## coach 0.9655 0.2571 0.0151 0.4316 0.7340 0.7290 0.3807 0.2515 0.8805 0.7547
## parent 0.2710 0.2571 0.3787 0.8252 0.4473 0.1838 0.8432 0.7435 0.8601 0.9015
## age 0.4626 0.0151 0.3787 1.0000 0.1375 0.9739 0.4246 0.1149 0.0070 0.1934
## auto 0.2636 0.4316 0.8252 1.0000 0.0000 0.0032 0.0005 0.0056 0.0011 0.0000
## comp 0.1166 0.7340 0.4473 0.1375 0.0000 0.2612 0.0008 0.0362 0.0000 0.0000
## relate 0.2908 0.7290 0.1838 0.9739 0.0032 0.2612 0.0004 0.2820 0.5725 0.0246
## auton 0.6468 0.3807 0.8432 0.4246 0.0005 0.0008 0.0004 0.2167 0.0000 0.0013
## control 0.8146 0.2515 0.7435 0.1149 0.0056 0.0362 0.2820 0.2167 0.0000 0.2289
## amotiv 0.3166 0.8805 0.8601 0.0070 0.0011 0.0000 0.5725 0.0000 0.0000 0.0051
## success 0.0259 0.7547 0.9015 0.1934 0.0000 0.0000 0.0246 0.0013 0.2289 0.0051
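Each p-value in the `rcorr()` output tests whether a Pearson r differs from zero. The test uses t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom. The sketch below computes this by hand and compares it with base R's `cor.test()`. It uses R's built-in `mtcars` data as a stand-in for the study data.

```r
# Stand-in data: R's built-in mtcars (fuel economy vs. car weight)
x <- mtcars$wt
y <- mtcars$mpg

n      <- length(x)
r      <- cor(x, y)                           # Pearson r
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)     # t statistic for H0: rho = 0
p_val  <- 2 * pt(-abs(t_stat), df = n - 2)    # two-sided p-value

ct <- cor.test(x, y, method = "pearson")      # base R equivalent for one pair
```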
A more comprehensive, concise, and visual approach:
library("PerformanceAnalytics")
chart.Correlation(dataset_num, histogram = TRUE, pch = 19)
Now I will compare success means across gender.
The bar plot below is a prime example of R making “easy things hard” (Muenchen, 2014, para. 2).
male <- dataset[which(dataset$gender == 1),]
female <- dataset[which(dataset$gender == 2),]
gender_success <- data.frame(gender = c("male", "female"),
success_ave = c(mean(male$success), mean(female$success)))
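The two subsets and the hand-built data frame above can be collapsed into one grouped summary. The sketch below uses base R's `aggregate()` on a hypothetical stand-in for `dataset`, with gender coded 1 = male and 2 = female as in the subsets above.

```r
# Hypothetical stand-in for dataset: gender coded 1 = male, 2 = female
df <- data.frame(
  gender  = c(1, 1, 2, 1, 2, 1),
  success = c(4.2, 4.6, 3.8, 4.4, 4.0, 4.0)
)

# One row per gender with the mean success score
gender_success <- aggregate(success ~ gender, data = df, FUN = mean)
names(gender_success)[names(gender_success) == "success"] <- "success_ave"

# Attach readable labels, matching the original male/female naming
gender_success$gender <- factor(gender_success$gender, levels = c(1, 2),
                                labels = c("male", "female"))
```

With dplyr loaded, `dataset %>% group_by(gender_text) %>% summarise(success_ave = mean(success))` is an equivalent route. Either result can be passed to the same `ggplot()` call.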
ggplot(gender_success, aes(x=gender, y= success_ave, fill = gender)) +
geom_bar(stat = "identity", position = 'dodge') +
labs(x = "Gender",
y = "Response on Scale of 1-5",
title = "Average Perceived Success Across Gender") +
theme(plot.title = element_text(hjust = 0.5)) +
scale_fill_brewer(palette = "BuPu")
### success across gender
dataset$gender <- as.factor(dataset$gender)
leveneTest(dataset$success, dataset$gender, center=mean)
|  | Df | F value | Pr(>F) |
|---|---|---|---|
| group | 1 | 0.4299003 | 0.5145442 |
|  | 60 | NA | NA |
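# p = .5145, so the homogeneity-of-variance assumption is not rejected
# (t.test() below still defaults to Welch's test, which does not assume equal variances)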
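`leveneTest(..., center = mean)` runs the original Levene test: a one-way ANOVA on each observation's absolute deviation from its group mean. car's default, `center = median`, gives the Brown-Forsythe variant instead. The sketch below performs that computation in base R. It uses the built-in `mtcars` data (mpg by transmission type) as a stand-in for the study data.

```r
# Stand-in data: mpg grouped by transmission (am: 0 = automatic, 1 = manual)
y <- mtcars$mpg
g <- factor(mtcars$am)

# Absolute deviation of each value from its own group's mean
abs_dev <- abs(y - ave(y, g))

# One-way ANOVA on those deviations gives Levene's F and p-value
levene_tab <- anova(lm(abs_dev ~ g))
levene_F   <- levene_tab["g", "F value"]
levene_p   <- levene_tab["g", "Pr(>F)"]
```

With car loaded, `leveneTest(mpg ~ factor(am), data = mtcars, center = mean)` should report the same F.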
male <- dataset[which(dataset$gender == 1),]
female <- dataset[which(dataset$gender == 2),]
t.test(male$success, female$success)
##
## Welch Two Sample t-test
##
## data: male$success and female$success
## t = 2.3846, df = 22.639, p-value = 0.02588
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.03723953 0.52823666
## sample estimates:
## mean of x mean of y
## 4.254167 3.971429
# p = .02588, SIGNIFICANT at alpha = .05 (males report higher perceived success)
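`t.test()` defaults to Welch's test (`var.equal = FALSE`), which is why the output above reports a fractional df (22.639). Welch's df comes from the Welch-Satterthwaite formula. Because Levene's test did not reject equal variances, Student's pooled t-test (`var.equal = TRUE`, with n1 + n2 - 2 df) could also be reported. The sketch below compares the two tests and checks the df formula by hand. It uses the built-in `mtcars` data as a stand-in for the study data.

```r
# Stand-in data: mpg for automatic (am = 0) vs. manual (am = 1) cars
x <- mtcars$mpg[mtcars$am == 0]
y <- mtcars$mpg[mtcars$am == 1]

welch   <- t.test(x, y)                    # default: var.equal = FALSE
student <- t.test(x, y, var.equal = TRUE)  # pooled-variance t-test

# Welch-Satterthwaite degrees of freedom, computed by hand
vx <- var(x) / length(x)
vy <- var(y) / length(y)
df_welch <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
```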
Let’s analyze success across the five levels of goalie coaching received.
ggplot(dataset, aes(x=reorder(coach_text, success), y = success, fill = coach_text)) +
geom_boxplot() +
labs(x = "Amount of Goalie Coaching Received",
y = "Perceived Success",
title = "Perceived Success Across Amount of Goalie Coaching Received") +
theme(plot.title = element_text(hjust = 0.5), legend.position = "none") +
scale_fill_brewer(palette = "Blues")
### success across coaching received
dataset$coach_text <- as.factor(dataset$coach_text)
leveneTest(dataset$success, dataset$coach_text, center=mean)
|  | Df | F value | Pr(>F) |
|---|---|---|---|
| group | 4 | 0.6590701 | 0.6229689 |
|  | 57 | NA | NA |
# p = .623 so passes test!
level.success.anova <- aov(success ~ coach_text, data = dataset)
summary(level.success.anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## coach_text 4 0.925 0.2311 1.329 0.27
## Residuals 57 9.910 0.1739
# p = .27, so NOT SIGNIFICANT
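The ANOVA table can be reproduced from its own sums of squares. Each mean square is SS / df, F is the ratio of the two mean squares, and p is the upper tail of the F distribution. The worked check below uses the rounded values printed above, so F comes out slightly different in the third decimal. The eta-squared line is an added effect-size measure (SS_between / SS_total), not part of the original output.

```r
# Values copied from the summary(level.success.anova) output above
ss_between <- 0.925; df_between <- 4
ss_within  <- 9.910; df_within  <- 57

ms_between <- ss_between / df_between     # about 0.231
ms_within  <- ss_within / df_within       # about 0.174
F_value    <- ms_between / ms_within      # about 1.33
p_value    <- pf(F_value, df_between, df_within, lower.tail = FALSE)

# Added effect size: share of total variance explained by coaching level
eta_sq <- ss_between / (ss_between + ss_within)
```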
# If significant, use a post hoc follow-up test:
TukeyHSD(level.success.anova)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = success ~ coach_text, data = dataset)
##
## $coach_text
## diff lwr upr p adj
## More than once a week-A few times a year -1.636364e-01 -0.6298929 0.3026202 0.8593439
## Never-A few times a year -1.636364e-01 -1.3904385 1.0631658 0.9956416
## Once a month-A few times a year 1.863636e-01 -0.4994396 0.8721669 0.9394057
## Once a week-A few times a year 1.137830e-01 -0.2984357 0.5260017 0.9360595
## Never-More than once a week 8.881784e-16 -1.2130944 1.2130944 1.0000000
## Once a month-More than once a week 3.500000e-01 -0.3109695 1.0109695 0.5720330
## Once a week-More than once a week 2.774194e-01 -0.0920111 0.6468498 0.2278314
## Once a month-Never 3.500000e-01 -0.9632133 1.6632133 0.9433381
## Once a week-Never 2.774194e-01 -0.9159487 1.4707874 0.9650319
## Once a week-Once a month -7.258065e-02 -0.6966077 0.5514464 0.9974377
Does autonomy predict success?
# attach(dataset) is unnecessary: ggplot() and lm() below take data = dataset
ggplot(dataset, aes(x = auton, y = success)) +
geom_point(position = "jitter", aes(color = gender_text)) +
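geom_smooth(method = "lm", se = FALSE) + # straight OLS regression line
geom_smooth() + # default smoother (loess for n < 1000) with a confidence band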
labs(x = "Level of Autonomous Motivation",
y = "Perceived Success",
title = "Perceived Success Across Autonomous Motivation") +
annotate(x=4.25, y=3.25,
label=paste("r = ", round(cor(dataset$auton, dataset$success, use = "complete.obs"),2)),
geom="text", size=5) +
theme(plot.title = element_text(hjust = 0.5)) +
scale_color_brewer(palette = "Blues")
autonomous.success.reg <- lm(success ~ auton, data = dataset)
summary(autonomous.success.reg)
##
## Call:
## lm(formula = success ~ auton, data = dataset)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.81865 -0.20391 -0.01653 0.26873 0.76978
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.7355 0.4354 6.283 4.11e-08 ***
## auton 0.3453 0.1026 3.364 0.00134 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3898 on 60 degrees of freedom
## Multiple R-squared: 0.1586, Adjusted R-squared: 0.1446
## F-statistic: 11.31 on 1 and 60 DF, p-value: 0.001345
# auton is significant at p = .00134
# R squared is .1586 (adjusted is .1446) and p value for model is .001345
Now, enter covariates age and gender into the model.
## with covariates
autonomous.success.co.reg <- lm(success ~ auton + age + gender, data = dataset)
summary(autonomous.success.co.reg)
##
## Call:
## lm(formula = success ~ auton + age + gender, data = dataset)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.82626 -0.23799 -0.02441 0.23981 0.73662
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.19369 0.56852 5.618 5.77e-07 ***
## auton 0.32296 0.10009 3.227 0.00206 **
## age -0.02208 0.02431 -0.908 0.36760
## gender2 -0.25064 0.11535 -2.173 0.03388 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3776 on 58 degrees of freedom
## Multiple R-squared: 0.2369, Adjusted R-squared: 0.1974
## F-statistic: 6.002 on 3 and 58 DF, p-value: 0.001239
# auton still significant at .00206 and model significant p = .001239
# R squared is .2369 (adjusted .1974) and p value for model is .001239
Notice how elegant many of these outputs are compared to SPSS.
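Two identities can make these regression outputs easier to read. In a one-predictor model, the predictor's t value squared equals the model F: 3.364 squared is about 11.32, matching F = 11.31 up to rounding. R² also equals the squared Pearson r: 0.40 squared is 0.16, close to R² = .1586. For the covariate model, `anova()` on the two nested models tests whether adding age and gender improves fit, which here means the change in R² from .1586 to .2369. The sketch below demonstrates both on the built-in `mtcars` data as a stand-in, with `wt` in the role of `auton` and `hp` and `am` as covariates.

```r
# Stand-in data: R's built-in mtcars
simple <- lm(mpg ~ wt, data = mtcars)
full   <- lm(mpg ~ wt + hp + factor(am), data = mtcars)

s      <- summary(simple)
t_wt   <- s$coefficients["wt", "t value"]
F_stat <- unname(s$fstatistic["value"])

# Identity 1: t^2 for the lone predictor equals the model F
all.equal(t_wt^2, F_stat)
# Identity 2: R^2 equals the squared correlation of predictor and outcome
all.equal(s$r.squared, cor(mtcars$wt, mtcars$mpg)^2)

# Hierarchical step: does adding the covariates improve fit?
r2_change  <- summary(full)$r.squared - s$r.squared
comparison <- anova(simple, full)   # F test on the R^2 change
```

The nested comparison is only valid when both models are fit to the same rows. That holds in this document because `auton`, `age`, and `gender` have no missing values.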
Comments:
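Different functions handle missing values differently. For example, mean() takes na.rm = TRUE, cor() takes a use argument such as "pairwise.complete.obs", and lm() relies on na.action. While this provides flexibility across functions, it can be challenging to keep track of each function's syntax requirements.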
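A minimal sketch of those differences, on a small hypothetical data frame with one missing value (the values are illustrative, not from the study):

```r
# Hypothetical data with one missing value in `auto`
df <- data.frame(
  auto    = c(4.0, 4.5, NA, 3.5, 5.0),
  success = c(4.2, 4.6, 3.8, 3.4, 4.8)
)

mean(df$auto)                                    # NA: missing values propagate
mean(df$auto, na.rm = TRUE)                      # summary functions: na.rm = TRUE
cor(df$auto, df$success, use = "complete.obs")   # cor(): the `use` argument

# lm(): na.action, taken from options("na.action"), which is na.omit by default
fit <- lm(success ~ auto, data = df)
nobs(fit)                                        # rows actually used in the model
```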