library(tidyverse)
library(readxl)
thesis <- read_excel("SampleThesisData.xlsx", na = "-")
thesis
- A Pearson correlation test is used to assess the relationship between age and GPA1, because both variables are continuous.
cor.test(thesis$Age, thesis$GPA1)
Pearson's product-moment correlation
data: thesis$Age and thesis$GPA1
t = 0.25668, df = 39, p-value = 0.7988
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.2699939 0.3443673
sample estimates:
cor
0.04106779
The correlation between age and GPA1 is not statistically significant, r(39) = .04, p = .80.
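As a quick check on the reported values, the correlation coefficient can be recovered from the t statistic and degrees of freedom printed by `cor.test()`. The sketch below uses only numbers copied from the output above; the variable names are illustrative.

```r
# Values copied from the cor.test() output above
t_stat <- 0.25668
df <- 39

# For a Pearson correlation, t = r * sqrt(df) / sqrt(1 - r^2),
# which rearranges to r = t / sqrt(t^2 + df)
r <- t_stat / sqrt(t_stat^2 + df)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

round(c(r = r, p = p_value), 4)
```

The recovered r is the value to report in r(39) = ..., while .80 is the p-value.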
The following scatterplot demonstrates the relationship between age and GPA1.
thesis %>%
drop_na(Age, GPA1) %>%
ggplot(aes(Age, GPA1)) +
geom_point() +
theme_minimal() +
geom_smooth(formula = y~x, method = lm, se = FALSE) +
labs(title = "Relationship Between Age and GPA1",
x = "Age",
y = "GPA1")

- To determine whether GPA1 differs between students in the Business college and students in the Arts and Sciences college, an independent-samples t-test is used. A t-test is appropriate because the dependent variable (GPA1) is continuous and the independent variable (college) is categorical with two groups.
t.test(thesis$GPA1 ~ thesis$College)
Welch Two Sample t-test
data: thesis$GPA1 by thesis$College
t = -1.2753, df = 38.772, p-value = 0.2098
alternative hypothesis: true difference in means between group AS and group BU is not equal to 0
95 percent confidence interval:
-0.7143396 0.1619586
sample estimates:
mean in group AS mean in group BU
         3.02381          3.30000
The students in the Arts and Sciences college (M = 3.02) had a lower average GPA than the students in the Business college (M = 3.30), but the difference is not statistically significant, t(38.77) = -1.28, p = .21. The following boxplot shows GPA1 for the Arts and Sciences college and the Business college.
thesis %>%
drop_na(College, GPA1) %>%
ggplot(aes(x = College, y = GPA1)) +
geom_boxplot() +
geom_jitter(width = .1) +
theme_minimal() +
labs(title = "GPA1 by College", x = "College", y = "GPA1")
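The fractional degrees of freedom in the t-test output above (df = 38.772) arise because `t.test()` defaults to the Welch test, which does not assume equal variances in the two groups. For reference, the standard Welch statistic and the Welch-Satterthwaite approximation to the degrees of freedom are given below, where the symbols (introduced here, not in the original analysis) are the group means $\bar{x}_i$, sample variances $s_i^2$, and group sizes $n_i$:

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
{\dfrac{\left(s_1^2 / n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2 / n_2\right)^2}{n_2 - 1}}
```

When the two groups have similar variances and sizes, $\nu$ is close to the pooled value $n_1 + n_2 - 2$.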

- The following compares the GPA1 of accounting majors with that of communications majors. To do this, the data are filtered to those two majors and a t-test is run.
AccCommMajor <- thesis %>%
filter(Major %in% c("Account", "Comm"))
t.test(AccCommMajor$GPA1 ~ AccCommMajor$Major)
Welch Two Sample t-test
data: AccCommMajor$GPA1 by AccCommMajor$Major
t = 0.95153, df = 5.297, p-value = 0.3827
alternative hypothesis: true difference in means between group Account and group Comm is not equal to 0
95 percent confidence interval:
-0.7868789 1.7368789
sample estimates:
mean in group Account    mean in group Comm
                3.675                 3.200
The accounting majors (M = 3.68) have a higher average GPA than the communications majors (M = 3.20); however, the difference is not statistically significant, t(5.30) = 0.95, p = .38. The following boxplot shows GPA1 for the two majors.
AccCommMajor %>%
ggplot(aes(x = Major, y = GPA1)) +
geom_boxplot() +
geom_jitter(width = .2) +
theme_minimal() +
labs(title = "GPA of Accounting and Communications Majors", x = "Major", y = "GPA1")

- To determine whether Mood1 and Mood2 differ, a paired-samples t-test is used. This test is appropriate because Mood1 and Mood2 are the same continuous measure taken at two different times.
t.test(thesis$Mood1, thesis$Mood2, paired = TRUE)
Paired t-test
data: thesis$Mood1 and thesis$Mood2
t = -2.1686, df = 40, p-value = 0.03611
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.80105415 -0.02821414
sample estimates:
mean of the differences
-0.4146341
Mood was statistically significantly lower at time 1 (M = -0.24) than at time 2 (M = 0.24), with a mean difference of -0.41, t(40) = -2.17, p = .036. The following boxplot shows the distribution of Mood1 and Mood2.
thesis %>%
pivot_longer(cols = c(Mood1, Mood2), names_to = "Time", values_to = "Mood") %>%
select(Time, Mood)
thesis %>%
pivot_longer(cols = c(Mood1, Mood2), names_to = "Time", values_to = "Mood") %>%
ggplot(aes(x = Time, y = Mood)) +
geom_boxplot() +
geom_jitter(width = .2) +
theme_minimal() +
labs(title = "Relationship Between Mood and Time", x = "Time", y = "Mood Calculation")
Warning: Removed 2 rows containing non-finite values (stat_boxplot).
Warning: Removed 2 rows containing missing values (geom_point).
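A paired-samples t-test is equivalent to a one-sample t-test on the within-person differences, which is why the output above reports a "mean of the differences". The sketch below illustrates this with a small set of hypothetical scores (not the thesis data); the vector names are made up for the example.

```r
# Hypothetical scores for six people measured twice (NOT the thesis data)
time1 <- c(-1.0, 0.5, -0.5, 0.0, 1.0, -1.5)
time2 <- c(-0.5, 1.0, 0.5, 0.0, 1.5, -0.5)

# Paired-samples t-test on the two measurements
paired_result <- t.test(time1, time2, paired = TRUE)

# One-sample t-test on the differences, tested against zero
diff_result <- t.test(time1 - time2, mu = 0)

# Both approaches give the same t statistic, df, and p-value
c(paired = unname(paired_result$statistic),
  differences = unname(diff_result$statistic))
```

Because the test works on differences, only cases with both measurements contribute to it.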

- To determine whether there is a relationship between where students are from and which college they attend, a chi-square test of independence is used. This test is used because both variables are categorical.
table(thesis$Home, thesis$College)
             AS BU
  Billings    5  6
  OtherMT    11  7
  OutofState  6  6
chisq.test(thesis$Home, thesis$College)
Pearson's Chi-squared test
data: thesis$Home and thesis$College
X-squared = 0.76438, df = 2, p-value = 0.6824
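The chi-square statistic can be reproduced by hand from the contingency table above, which also makes the expected counts visible. This is a minimal sketch using the counts copied from the `table()` output; the object names are illustrative.

```r
# Observed counts copied from the table() output above
counts <- matrix(c(5, 6,
                   11, 7,
                   6, 6),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(Home = c("Billings", "OtherMT", "OutofState"),
                                 College = c("AS", "BU")))

# Expected count for each cell = row total * column total / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson chi-square statistic and its degrees of freedom
x_squared <- sum((counts - expected)^2 / expected)
df <- (nrow(counts) - 1) * (ncol(counts) - 1)
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

round(expected, 2)
round(c(X_squared = x_squared, df = df, p = p_value), 4)
```

Every expected count here is above 5, which is the usual rule of thumb for relying on the chi-square approximation.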
There is no statistically significant relationship between where students are from and which college they attend, χ²(2, N = 41) = 0.76, p = .68. The following bar graph shows the proportion of students from each home location within each college.
thesis %>%
drop_na(College, Home) %>%
mutate(Home = as_factor(Home)) %>%
mutate(Home = fct_recode(Home,
"Billings" = "Billings",
"City/town in Montana" = "OtherMT",
"City/town out of Montana" = "OutofState")) %>%
mutate(College = as_factor(College)) %>%
mutate(College = fct_recode(College,
"Business College" = "BU",
"Arts and Sciences College" = "AS")) %>%
ggplot(aes(x = College, fill = Home)) +
geom_bar(position = "fill") +
scale_fill_viridis_d() + # use scale_fill_grey() here if you don't want color
theme_minimal() +
coord_flip() +
labs(title = "College by Home",
y = "Proportion of Different Homes")

- To determine whether self-esteem differs by where a student comes from, a one-way analysis of variance (ANOVA) is used. An ANOVA is used instead of a t-test because the categorical independent variable has more than two levels.
thesis %>%
drop_na(Home, SelfEsteem) %>%
group_by(Home) %>%
summarize(Mean = mean(SelfEsteem),
"Std Dev" = sd(SelfEsteem),
N = n())
Home_ANOVA <- aov(thesis$SelfEsteem ~ thesis$Home)
summary(Home_ANOVA)
            Df Sum Sq Mean Sq F value Pr(>F)
thesis$Home  2  15.76   7.879   1.043  0.362
Residuals   39 294.53   7.552
1 observation deleted due to missingness
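The F value in the ANOVA table is the ratio of the between-group mean square to the residual mean square. The sketch below recomputes it from the sums of squares copied from the table above. It also computes eta squared, an effect size that is not part of the original analysis and is included here only as an illustration.

```r
# Values copied from the summary(Home_ANOVA) table above
ss_between <- 15.76
df_between <- 2
ss_within <- 294.53
df_within <- 39

# Mean squares are sums of squares divided by their degrees of freedom
ms_between <- ss_between / df_between
ms_within <- ss_within / df_within

# F statistic and its upper-tail p-value
f_value <- ms_between / ms_within
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

# Eta squared: share of total variance accounted for by Home (illustrative addition)
eta_squared <- ss_between / (ss_between + ss_within)

round(c(F = f_value, p = p_value, eta_squared = eta_squared), 3)
```

Home location accounts for roughly 5% of the variance in self-esteem in this sample, consistent with the non-significant F test.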
There were no statistically significant differences in self-esteem by home location, F(2, 39) = 1.04, p = .36. A Tukey post hoc test is used to compare individual pairs of groups.
TukeyHSD(Home_ANOVA)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = thesis$SelfEsteem ~ thesis$Home)
$`thesis$Home`
                           diff       lwr      upr     p adj
OtherMT-Billings    -1.27777778 -3.772928 1.217373 0.4329248
OutofState-Billings -0.08333333 -2.816634 2.649967 0.9969630
OutofState-OtherMT   1.19444444 -1.300706 3.689595 0.4800845
None of the pairwise comparisons is statistically significant (all adjusted p > .43). The following boxplot shows self-esteem by home location.
thesis %>%
drop_na(SelfEsteem, Home) %>%
mutate(Home = as_factor(Home)) %>%
mutate(Home = fct_recode(Home,
"Billings" = "Billings",
"City/town in Montana" = "OtherMT",
"City/town out of Montana" = "OutofState")) %>%
ggplot(aes(x = Home, y = SelfEsteem)) +
geom_boxplot() +
geom_jitter(width = .2) +
theme_minimal() +
labs(title = "Self-Esteem by Home",
y = "Self-Esteem")

