Kaitlin Kavlie PSYC 541
Lab #5: Basic Stats Analysis
The Sample Thesis data set was used for the analyses in this lab.
I used the code below to run the correlation test on age and GPA1, two continuous variables.
cor.test(thesis$Age, thesis$GPA1)
Pearson's product-moment correlation
data: thesis$Age and thesis$GPA1
t = 0.25668, df = 39, p-value = 0.7988
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.2699939 0.3443673
sample estimates:
cor
0.04106779
The correlation between Age and GPA1 was not statistically significant, r(39) = .04, p = .80. The correlation is close to zero, and the 95% confidence interval (-.27 to .34) includes zero.
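As a quick consistency check, the t statistic that cor.test() reports can be recovered from r and the degrees of freedom with t = r * sqrt(df / (1 - r^2)). The sketch below only plugs in the values printed above. The variable names are illustrative and were not part of the original analysis.

```r
# Values copied from the cor.test() output above
r  <- 0.04106779
df <- 39

# t statistic implied by r and df
t_stat <- r * sqrt(df / (1 - r^2))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

round(t_stat, 5)   # should agree with the reported t
round(p_value, 4)  # should agree with the reported p-value
```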
Then I used this code below to create a scatter plot, with age along the x-axis and GPA1 on the y-axis.
library(tidyverse) # provides %>%, drop_na(), ggplot(), and the factor helpers used in this lab
thesis %>%
drop_na(Age, GPA1) %>%
ggplot(aes(Age, GPA1)) +
geom_point() +
theme_minimal() +
geom_smooth(formula = y~x, method = lm, se = FALSE) +
labs(title = "Relationship between Age and GPA1",
x = "Age",
y = "GPA1")
The code below was used to run a t-test comparing GPA1 between the two colleges students study under, Business and Arts & Sciences. Because College has only two categories, the test runs without any filtering or recoding.
t.test(thesis$GPA1 ~ thesis$College)
Welch Two Sample t-test
data: thesis$GPA1 by thesis$College
t = -1.2753, df = 38.772, p-value = 0.2098
alternative hypothesis: true difference in means between group AS and group BU is not equal to 0
95 percent confidence interval:
-0.7143396 0.1619586
sample estimates:
mean in group AS mean in group BU
3.02381 3.30000
Students in the College of Business (M = 3.30) had a higher mean GPA1 than students in the College of Arts & Sciences (M = 3.02), but the difference was not statistically significant, t(38.77) = -1.28, p = .21.
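The pieces of the Welch output fit together in a simple way: t is the mean difference divided by its standard error, and the confidence interval is the mean difference plus or minus a critical t value times that standard error. The sketch below rebuilds the interval and p-value from the numbers printed above. It is a check on the reported values, not a replacement for t.test(), and the variable names are illustrative.

```r
# Values copied from the t.test() output above
mean_as <- 3.02381
mean_bu <- 3.30000
t_stat  <- -1.2753
df      <- 38.772

diff <- mean_as - mean_bu   # AS minus BU, the direction R reports
se   <- diff / t_stat       # standard error implied by t = diff / se
crit <- qt(0.975, df)       # two-sided 95% critical value

ci      <- diff + c(-1, 1) * crit * se   # should be close to the reported interval
p_value <- 2 * pt(-abs(t_stat), df)      # should be close to the reported p-value

round(ci, 3)
round(p_value, 3)
```

Small discrepancies in the last decimal places are expected because the inputs are the rounded values from the printout.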
Then the following code was used to create a box plot of GPA1 and the type of college individuals are enrolled in.
thesis %>%
ggplot(aes(x = GPA1, y = College)) +
geom_boxplot() +
geom_jitter(width = 0, height = .1) + # jitter only along the College axis so GPA1 values are not shifted
theme_minimal() +
labs(title = "GPA1 by College", x = "GPA1", y = "College")
Warning: Removed 1 rows containing non-finite values (stat_boxplot).
Warning: Removed 1 rows containing missing values (geom_point).
The code below filters the data to Communications and Accounting majors only and saves the result as a new data set called CommAccountMajor. This made it possible to run a t-test comparing GPA1 between those two majors.
thesis %>%
filter(Major == "Comm" | Major == "Account") -> CommAccountMajor
t.test(CommAccountMajor$GPA1 ~ CommAccountMajor$Major)
Welch Two Sample t-test
data: CommAccountMajor$GPA1 by CommAccountMajor$Major
t = 0.95153, df = 5.297, p-value = 0.3827
alternative hypothesis: true difference in means between group Account and group Comm is not equal to 0
95 percent confidence interval:
-0.7868789 1.7368789
sample estimates:
mean in group Account mean in group Comm
3.675 3.200
Communications majors (M = 3.20) had a lower mean GPA1 than Accounting majors (M = 3.68), but the difference was not statistically significant, t(5.30) = 0.95, p = .38.
Then I used the code below to create a box plot comparing the data points of the two subcategories of Major.
thesis %>%
filter(Major == "Comm" | Major == "Account") %>%
ggplot(aes(x = Major, y = GPA1)) +
geom_boxplot() +
geom_jitter(width = .2) +
theme_minimal() +
labs(title = "GPA1 by Major", x = "Major", y = "GPA1")
The code below was used to run the paired t-test on Mood1 and Mood2.
t.test(thesis$Mood1, thesis$Mood2, paired = TRUE)
Paired t-test
data: thesis$Mood1 and thesis$Mood2
t = -2.1686, df = 40, p-value = 0.03611
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.80105415 -0.02821414
sample estimates:
mean of the differences
-0.4146341
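Mood2 (M = 0.24) was significantly higher than Mood1 (M = -0.24), t(40) = -2.17, p = .04. The paired test uses only the 41 complete pairs, so its mean difference (-0.41) is not exactly the difference between the two overall means.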
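A paired t-test is equivalent to a one-sample t-test on the within-person differences, which is why the output reports a single "mean of the differences". The sketch below illustrates this with a small set of hypothetical scores. These vectors are made up for illustration and are not the thesis data.

```r
# Hypothetical mood scores for six people (not the thesis data)
mood_before <- c(-1, 0, 2, -2, 1, 0)
mood_after  <- c( 0, 1, 2, -1, 3, 0)

# Paired test on the two columns
paired_result <- t.test(mood_before, mood_after, paired = TRUE)

# One-sample test on the differences against a mean of 0
diff_result <- t.test(mood_before - mood_after, mu = 0)

# Both calls give the same t statistic and p-value
c(paired = unname(paired_result$statistic), diff = unname(diff_result$statistic))
c(paired = paired_result$p.value, diff = diff_result$p.value)
```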
The summary() call below was used to find the means of Mood1 and Mood2.
summary(thesis)
Age Sex SelfEsteem Mood1 Mood2 Major College GPA1 GPA2
Min. :19.00 Length:43 Min. :14.00 Min. :-4.0000 Min. :-4.0000 Length:43 Length:43 Min. :1.400 Min. :2.20
1st Qu.:20.25 Class :character 1st Qu.:23.50 1st Qu.:-1.0000 1st Qu.: 0.0000 Class :character Class :character 1st Qu.:2.725 1st Qu.:3.00
Median :23.50 Mode :character Median :25.00 Median : 0.0000 Median : 0.0000 Mode :character Mode :character Median :3.200 Median :3.45
Mean :24.07 Mean :24.44 Mean :-0.2381 Mean : 0.2381 Mean :3.117 Mean :3.40
3rd Qu.:26.00 3rd Qu.:26.00 3rd Qu.: 1.0000 3rd Qu.: 1.0000 3rd Qu.:3.675 3rd Qu.:4.00
Max. :41.00 Max. :29.00 Max. : 3.0000 Max. : 3.0000 Max. :4.000 Max. :4.00
NA's :1 NA's :1 NA's :1 NA's :1 NA's :1
Home
Length:43
Class :character
Mode :character
The code below was used to reshape Mood1 and Mood2 into a long-format table, with one row per measurement, in order to make the data easier to plot.
thesis %>%
pivot_longer(cols = c(Mood1, Mood2), names_to = "Time", values_to = "Mood") %>%
select(Time, Mood)
Using the same reshaping step, the code below creates a box plot with jittered points overlaid so that Mood1 and Mood2 can be compared side by side.
thesis %>%
pivot_longer(cols = c(Mood1, Mood2), names_to = "Time", values_to = "Mood") %>%
ggplot(aes(x = Time, y = Mood)) +
geom_boxplot() +
geom_jitter(width = .2) +
theme_minimal() +
labs(title = "Mood and Time", x = "Time", y = "Mood")
Warning: Removed 2 rows containing non-finite values (stat_boxplot).
Warning: Removed 2 rows containing missing values (geom_point).
The code below first tabulates Home by College and then runs a chi-square test of independence to examine the relationship between the two variables.
table(thesis$Home, thesis$College)
AS BU
Billings 5 6
OtherMT 11 7
OutofState 6 6
chisq.test(thesis$Home, thesis$College)
Pearson's Chi-squared test
data: thesis$Home and thesis$College
X-squared = 0.76438, df = 2, p-value = 0.6824
The relationship between Home and College was not statistically significant, χ²(2, N = 41) = 0.76, p = .68.
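The chi-square statistic compares the observed counts with the counts expected if Home and College were independent (row total times column total, divided by N). The sketch below re-enters the counts from the table above as a matrix, so it can be run without the thesis data set. The object names are illustrative.

```r
# Counts copied from the table() output above
home_by_college <- matrix(
  c(5, 11, 6,   # AS column
    6,  7, 6),  # BU column
  nrow = 3,
  dimnames = list(Home = c("Billings", "OtherMT", "OutofState"),
                  College = c("AS", "BU"))
)

result <- chisq.test(home_by_college)

result$expected   # expected counts under independence
result$statistic  # should match the X-squared reported above
result$p.value
```

Inspecting result$expected is also a convenient way to check the usual rule of thumb that expected counts should not be too small.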
The code below was used to create a stacked proportional bar graph with one bar each for Business and Arts & Sciences, showing the proportion of students from each home location.
thesis %>%
drop_na(Home, College) %>%
mutate(College = as_factor(College)) %>%
mutate(College = fct_recode(College,
"Arts & Sciences" = "AS",
"Business" = "BU")) %>%
mutate(Home = as_factor(Home)) %>%
mutate(Home = fct_recode(Home,
"Billings, MT" = "Billings",
"Another city in MT" = "OtherMT",
"Out of State" = "OutofState")) %>%
ggplot(aes(x = College, fill = Home)) +
geom_bar(position = "fill") +
scale_fill_viridis_d() + # use scale_fill_grey() here if you don't want color
theme_minimal() +
coord_flip() +
labs(title = "College by Home",
y = "Proportion of College")
This code was used to find the mean and standard deviation of self esteem for the three subcategories of the variable home.
thesis %>%
drop_na(Home, SelfEsteem) %>%
group_by(Home) %>%
summarize(Mean = mean(SelfEsteem),
"Std Dev" = sd(SelfEsteem),
N = n())
This code was used to run the ANOVA for self esteem and home.
Home_ANOVA <- aov(thesis$SelfEsteem ~ thesis$Home)
summary(Home_ANOVA)
Df Sum Sq Mean Sq F value Pr(>F)
thesis$Home 2 15.76 7.879 1.043 0.362
Residuals 39 294.53 7.552
1 observation deleted due to missingness
There were no statistically significant differences in self-esteem by home, F(2, 39) = 1.04, p = .36.
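The F value in the ANOVA table is the ratio of the two mean squares, and each mean square is a sum of squares divided by its degrees of freedom. The sketch below recomputes F and its p-value from the printed table. It also computes eta squared as an effect size, which is an addition not reported in the original analysis.

```r
# Values copied from the summary(Home_ANOVA) output above
ss_between <- 15.76
ss_within  <- 294.53
df_between <- 2
df_within  <- 39

ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within
f_stat     <- ms_between / ms_within
p_value    <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

# Eta squared: share of total variability associated with Home (added for illustration)
eta_sq <- ss_between / (ss_between + ss_within)

round(c(F = f_stat, p = p_value, eta_sq = eta_sq), 3)
```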
This code was used to run post hoc tests comparing each pair of groups:
TukeyHSD(Home_ANOVA)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = thesis$SelfEsteem ~ thesis$Home)
$`thesis$Home`
diff lwr upr p adj
OtherMT-Billings -1.27777778 -3.772928 1.217373 0.4329248
OutofState-Billings -0.08333333 -2.816634 2.649967 0.9969630
OutofState-OtherMT 1.19444444 -1.300706 3.689595 0.4800845
Consistent with the ANOVA, none of the pairwise post hoc comparisons between home categories was statistically significant; every adjusted p-value was greater than .05.
This code was used to create an advanced box plot of the variables self-esteem and home.
thesis %>%
drop_na(Home, SelfEsteem) %>%
mutate(Home = as_factor(Home)) %>%
mutate(Home = fct_recode(Home,
"Billings, MT" = "Billings",
"Another city in MT" = "OtherMT",
"Out of State" = "OutofState")) %>%
ggplot(aes(x = Home, y = SelfEsteem)) +
geom_boxplot() +
geom_jitter(width = .2) +
theme_minimal() +
labs(title = "Self Esteem by Home",
y = "Self Esteem")