{r}{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) options(width=70)
data()
data(package = .packages(all.available = TRUE))
#install.packages("carData")
library(carData)
mydata <- force(Davis)
head(mydata)
## sex weight height repwt repht
## 1 M 77 182 77 180
## 2 F 58 161 51 159
## 3 F 53 161 54 158
## 4 M 68 177 70 175
## 5 F 59 157 59 155
## 6 M 76 170 76 165
Now, I will add variable ID.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
mydata <- mydata %>%
mutate(ID = row_number())
head(mydata)
## sex weight height repwt repht ID
## 1 M 77 182 77 180 1
## 2 F 58 161 51 159 2
## 3 F 53 161 54 158 3
## 4 M 68 177 70 175 4
## 5 F 59 157 59 155 5
## 6 M 76 170 76 165 6
Unit of observation: Person
Sample size: 200
Definition of all variables:
Source: Personal communication from C. Davis, Departments of Physical Education and Psychology, York University.
library(tidyr)
mydata <- drop_na(mydata)
nrow(mydata)
## [1] 181
Now my data has 181 rows in comparison with 200 in the beginning.
mydata$sexF <- factor(mydata$sex,
levels = c ("M", "F"),
labels = c ("M", "F"))
library(psych)
describeBy(x=mydata$weight, group=mydata$sexF)
##
## Descriptive statistics by group
## group: M
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 82 75.98 12.16 75 75.12 11.86 54 119 65 0.77 0.77 1.34
## ------------------------------------------------------------
## group: F
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 99 58.29 12.92 56 57.02 5.93 39 166 127 5.84 46.01 1.3
For this type of research question the appropriate parametric test would be independent t-test. Data belong to two different groups of units (male and female) and I want to asses whether there is a difference in weight between male and female. (hypothesis about the difference between two population arithmetic means (independent samples)
However, certain assumptions need to be met:
To visually observe the data, ggplot will be shown:
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
ggplot(mydata, aes (x = weight)) +
geom_histogram(binwidth = 10, colour="gray" , fill= "blue") +
facet_wrap(~sexF, ncol = 1) +
ylab("'Frequency")
From the visual representation of the variable “weight” outliers can be observed, for the both groups “M” and “F”. In order to continue, the data should be cleaned.
Now I will check the second assumption:
library(dplyr)
library(rstatix)
## Warning: package 'rstatix' was built under R version 4.3.2
##
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
##
## filter
mydata %>%
group_by(sexF) %>%
shapiro_test(weight)
## # A tibble: 2 × 4
## sexF variable statistic p
## <fct> <chr> <dbl> <dbl>
## 1 M weight 0.960 1.25e- 2
## 2 F weight 0.543 4.87e-16
H0: Variable weight is normally distributed within M group.
H1: Variable weight is not normally distributed within M group.
H0: Variable weight is normally distributed within F group.
H1: Variable weight is not normally distributed within F group.
t.test(mydata$weight ~ mydata$sexF,
paired= FALSE,
var.equal = FALSE,
alternative = "two.sided")
##
## Welch Two Sample t-test
##
## data: mydata$weight by mydata$sexF
## t = 9.4663, df = 176.09, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group M and group F is not equal to 0
## 95 percent confidence interval:
## 13.99620 21.36916
## sample estimates:
## mean in group M mean in group F
## 75.97561 58.29293
library(effectsize)
## Warning: package 'effectsize' was built under R version 4.3.2
##
## Attaching package: 'effectsize'
## The following objects are masked from 'package:rstatix':
##
## cohens_d, eta_squared
## The following object is masked from 'package:psych':
##
## phi
effectsize ::cohens_d(mydata$weight ~ mydata$sexF,
pooled_sd - FALSE)
## Cohen's d | 95% CI
## ------------------------
## 1.41 | [1.08, 1.73]
##
## - Estimated using pooled SD.
interpret_cohens_d (1.41, rules = "sawilowsky2009")
## [1] "very large"
## (Rules: sawilowsky2009)
Using the sample data, it was able to demonstrate that the average weight differed between male and female (the effect size is very large, 𝑑 = 1.41).
H0: Distribution location of a weight is same for male and female.
H1: Distribution location of a weight is not a same for male and female.
library(effectsize)
effectsize(wilcox.test(mydata$weight ~ mydata$sexF,
paired = FALSE,
correct = FALSE,
exact = FALSE,
alternative = "two.sided" ))
## r (rank biserial) | 95% CI
## --------------------------------
## 0.84 | [0.79, 0.88]
interpret_rank_biserial(0.84)
## [1] "very large"
## (Rules: funder2019)
Based on the sample data, we find that men and women differ in weight (𝑝 < 0.001) - male are heavier, the difference in distribution is very large (𝑟 =0.84).