t-Test HW
Loading Libraries
library(psych) # for the describe() command
library(car) # for the leveneTest() command
library(effsize) # for the cohen.d() command
Importing Data
d <- read.csv(file="Data/mydata.csv", header=TRUE)
State Your Hypothesis - PART OF YOUR WRITEUP
Rural and isolated dwellers will report higher levels of isolation than urban dwellers.
Check Your Assumptions
T-test Assumptions
- Data values must be independent (independent t-test only) (confirmed by data report)
- Data obtained via a random sample (confirmed by data report)
- IV must have two levels (will check below)
- Dependent variable must be normally distributed (will check below. if issues, note and proceed)
- Variances of the two groups must be approximately equal, aka ‘homogeneity of variance’. Lacking this makes our results inaccurate (will check below - this really only applies to Student’s t-test, but we’ll check it anyway)
Checking IV levels
# preview the levels and counts for your IV
table(d$urban_rural, useNA = "always")
city isolated dwelling town village
282 30 559 382
<NA>
0
# # note that the table() output shows you exactly how the levels of your variable are written. when recoding, make sure you are spelling them exactly as they appear
#
# # # to drop levels from your variable
# # # this subsets the data and says that any participant who is coded as 'LEVEL BAD' should be removed
# # # if you don't need this for the homework, comment it out (add a # at the beginning of the line)
# only combining, not dropping any IV levels
# # to combine levels
# # this says that where any participant is coded as 'LEVEL BAD' it should be replaced by 'LEVEL GOOD'
# # you can repeat this as needed, changing 'LEVEL BAD' if you have multiple levels that you want to combine into a single level
# # if you don't need this for the homework, comment it out (add a # at the beginning of the line)
d$urban_rural_rc[d$urban_rural == "city"] <- "urban"
d$urban_rural_rc[d$urban_rural == "town"] <- "urban"
d$urban_rural_rc[d$urban_rural == "village"] <- "rural"
d$urban_rural_rc[d$urban_rural == "isolated dwelling"] <- "rural"
table(d$urban_rural_rc, useNA = "always")
rural urban <NA>
412 841 0
table(d$urban_rural, d$urban_rural_rc, useNA = "always")
rural urban <NA>
city 0 282 0
isolated dwelling 30 0 0
town 0 559 0
village 382 0 0
<NA> 0 0 0
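As an alternative to one assignment per level, the same combine-and-check step can be sketched with a named lookup vector. This is a minimal illustration on made-up toy data, not the homework dataframe; the names `toy` and `lookup` are hypothetical. One property of this approach is that a missing value stays `NA` instead of being silently assigned to a group.

```r
# Toy data standing in for d$urban_rural (hypothetical values, not the homework data)
toy <- data.frame(
  urban_rural = c("city", "town", "village", "isolated dwelling", "city", NA),
  stringsAsFactors = FALSE  # keep the column as character so name-based lookup works
)

# Named lookup vector: names are the original levels, values are the combined levels
lookup <- c(
  "city" = "urban",
  "town" = "urban",
  "village" = "rural",
  "isolated dwelling" = "rural"
)

# Index the lookup by the original values; unname() drops the carried-over names
toy$urban_rural_rc <- unname(lookup[toy$urban_rural])

# Cross-tabulate old against new to confirm every level landed where intended
table(toy$urban_rural, toy$urban_rural_rc, useNA = "always")
```

Any level that is misspelled in the lookup would show up as `NA` in the cross-tab, which makes typos easy to spot.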
table(d$isolation, useNA = "always")
1 1.25 1.5 1.75 2 2.25 2.5 2.75 3 3.25 3.5 <NA>
178 152 91 108 108 114 103 85 83 103 128 0
# combining city and town into "urban" and village and isolated dwelling into "rural"
# # preview your changes and make sure everything is correct
table(d$urban_rural_rc, useNA = "always")
rural urban <NA>
412 841 0
table(d$isolation, useNA = "always")
1 1.25 1.5 1.75 2 2.25 2.5 2.75 3 3.25 3.5 <NA>
178 152 91 108 108 114 103 85 83 103 128 0
# # check your variable types
str(d)
'data.frame': 1253 obs. of 7 variables:
$ urban_rural : chr "city" "city" "city" "town" ...
$ sexual_orientation: chr "Heterosexual/Straight" "Heterosexual/Straight" "Heterosexual/Straight" "Heterosexual/Straight" ...
$ gad : num 1.86 3.86 1.14 2 1.43 ...
$ support : num 2.5 2.17 5 2.5 3.67 ...
$ pss : num 3.25 3.75 1 3.25 2 2 4 1.25 3.75 1.25 ...
$ isolation : num 2.25 3.5 1 2.5 1.75 2 1.25 1 3 1.25 ...
$ urban_rural_rc : chr "urban" "urban" "urban" "urban" ...
#
# # make sure that your IV is recognized as a factor by R
d$urban_rural_rc <- as.factor(d$urban_rural_rc)
Testing Homogeneity of Variance with Levene’s Test
We can test whether the variances of our two groups are equal using Levene’s test. The null hypothesis is that the variances of the two groups are equal, which is the result we want. So when running Levene’s test we’re hoping for a non-significant result!
# # use the leveneTest() command from the car package to test homogeneity of variance
# # uses the same 'formula' setup that we'll use for our t-test: formula is y~x, where y is our DV and x is our IV
leveneTest(isolation~urban_rural_rc, data = d)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 3.0437 0.0813 .
1251
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Levene's test is non-significant (p = .0813, which is above .05), so there is no evidence that the group variances differ. this is the result we want
This is more of a formality in our case, because we are using Welch’s t-test, which, unlike Student’s t-test (the classic pooled-variance version), does not assume equal variances. R’s t.test() defaults to Welch’s t-test, so this doesn’t require any extra effort on our part!
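To make the Welch versus Student distinction concrete, here is a small sketch on simulated data (not the homework data; `sim`, `welch`, and `student` are hypothetical names). It relies only on the `var.equal` argument of base R's `t.test()`, which is `FALSE` by default.

```r
set.seed(1)  # simulated data for illustration only, not the homework data
sim <- data.frame(
  y = c(rnorm(40, mean = 2.0, sd = 0.8), rnorm(80, mean = 2.2, sd = 1.2)),
  g = factor(rep(c("rural", "urban"), times = c(40, 80)))
)

welch   <- t.test(y ~ g, data = sim)                    # default: var.equal = FALSE
student <- t.test(y ~ g, data = sim, var.equal = TRUE)  # pooled-variance version

welch$method       # names the test that was actually run
student$method
welch$parameter    # Welch-Satterthwaite df, usually not a whole number
student$parameter  # n1 + n2 - 2 = 118
```

The non-integer degrees of freedom in the output are the quickest way to tell that a Welch test was run.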
Check Normality
# you only need to check the variables you're using in the current analysis
# although you checked them previously, it's always a good idea to look them over again and be sure that everything is correct
# you can use the describe() command on an entire dataframe (d) or just on a single variable (d$pss)
# use it to check the skew and kurtosis of your DV
describe(d$isolation)
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 1253 2.15 0.84 2 2.12 1.11 1 3.5 2.5 0.17 -1.28 0.02
# can use the describeBy() command to view the means and standard deviations by group
# it's very similar to the describe() command but splits the dataframe according to the 'group' variable
describeBy(d$isolation, group=d$urban_rural_rc)
Descriptive statistics by group
group: rural
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 412 2.02 0.81 2 1.98 1.11 1 3.5 2.5 0.33 -1.12 0.04
------------------------------------------------------------
group: urban
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 841 2.21 0.85 2.25 2.19 1.11 1 3.5 2.5 0.08 -1.33 0.03
# also use a histogram to examine your continuous variable
hist(as.numeric(d$isolation))
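# I was having issues with isolation being treated as a factor instead of numeric, so on a friend's suggestion I wrapped it in as.numeric(), and it worked. str(d) above shows isolation is already numeric, so the wrapper leaves the values unchanged here. (on a true factor, as.numeric() would return the level codes rather than the original values.)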
# last, use a boxplot to examine your continuous and categorical variables together
# categorical/IV goes on the right, continuous/DV goes on the left
boxplot(d$isolation~d$urban_rural_rc)
# looks fine
Issues with My Data - PART OF YOUR WRITEUP
Before running the test, the four levels of dwelling (city, town, village, and isolated dwelling) were combined into two levels, urban and rural. Levene’s test was not significant (p = .081), so there was no evidence of heterogeneity of variance. Welch’s t-test will be used regardless, as it is R’s default and does not assume equal variances.
Run a T-test
# # very simple! we specify the dataframe alongside the variables instead of having a separate argument for the dataframe like we did for leveneTest()
t_output <- t.test(as.numeric(d$isolation)~d$urban_rural_rc)
View Test Output
t_output
Welch Two Sample t-test
data: as.numeric(d$isolation) by d$urban_rural_rc
t = -3.6585, df = 851.71, p-value = 0.0002692
alternative hypothesis: true difference in means between group rural and group urban is not equal to 0
95 percent confidence interval:
-0.2773840 -0.0836792
sample estimates:
mean in group rural mean in group urban
2.024879 2.205410
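As a rough cross-check, the Welch statistic and its degrees of freedom can be rebuilt from the group summaries printed above. This is a sketch under one stated assumption: the standard deviations are taken from the describeBy() output, where they are rounded to two decimals, so the results only approximate t = -3.6585 and df = 851.71. All variable names are hypothetical.

```r
# Group summaries copied from the describeBy() and t.test() output above
# (SDs are rounded to two decimals, so the match is approximate)
m_rural <- 2.024879; sd_rural <- 0.81; n_rural <- 412
m_urban <- 2.205410; sd_urban <- 0.85; n_urban <- 841

# Squared standard error contributed by each group
v_rural <- sd_rural^2 / n_rural
v_urban <- sd_urban^2 / n_urban

# Welch t statistic: difference in means over the unpooled standard error
t_welch <- (m_rural - m_urban) / sqrt(v_rural + v_urban)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_rural + v_urban)^2 /
  (v_rural^2 / (n_rural - 1) + v_urban^2 / (n_urban - 1))

# Two-sided p-value from the t distribution
p_two_sided <- 2 * pt(-abs(t_welch), df = df_welch)

c(t = t_welch, df = df_welch, p = p_two_sided)
```

The negative sign follows from the subtraction order (rural minus urban), which matches the alphabetical order of the factor levels.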
Calculate Cohen’s d
# # once again, we use our formula to calculate cohen's d
d_output <- cohen.d(as.numeric(d$isolation)~d$urban_rural_rc)
View Effect Size
- Trivial: < .2
- Small: between .2 and .5
- Medium: between .5 and .8
- Large: > .8
d_output
Cohen's d
d estimate: -0.2164965 (small)
95 percent confidence interval:
lower upper
-0.33477819 -0.09821476
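For intuition about where this number comes from, Cohen's d can be approximated by hand as the difference in means divided by the pooled standard deviation. The sketch below assumes that cohen.d() was run with its pooled-SD setting and uses the rounded SDs from the describeBy() output, so it should land close to, but not exactly on, the estimate of -0.2165. The variable names are hypothetical.

```r
# Group summaries copied from the output above (SDs rounded to two decimals)
m_rural <- 2.024879; sd_rural <- 0.81; n_rural <- 412
m_urban <- 2.205410; sd_urban <- 0.85; n_urban <- 841

# Pooled SD: group variances weighted by their degrees of freedom
sd_pooled <- sqrt(((n_rural - 1) * sd_rural^2 + (n_urban - 1) * sd_urban^2) /
                    (n_rural + n_urban - 2))

# Standardized mean difference (rural minus urban, matching the factor order)
d_hand <- (m_rural - m_urban) / sd_pooled
d_hand
```

The magnitude, not the sign, is what gets compared against the trivial/small/medium/large cutoffs; the sign only records which group had the higher mean.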
Write Up Results
My hypothesis was that rural dwellers would report more isolation than urban dwellers. To test this with a t-test, city and town dwellers were combined into an urban level, and village and isolated dwellers were combined into a rural level. The data met the assumptions of a t-test, and Welch’s t-test found a significant difference between the groups, although in the opposite direction from the hypothesis: urban dwellers (M = 2.21, SD = 0.85) reported more isolation than rural dwellers (M = 2.02, SD = 0.81), t(851.71) = -3.66, p < .001, d = -0.22, 95% CI [-0.33, -0.10] (refer to Figure 1).
Our effect size was small according to Cohen (1988).
References
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. New York, NY: Routledge Academic.