library(readr)
library(psych)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
library(readxl)
library(tidyr)
library(rstatix)
##
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
##
## filter
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
## The following object is masked from 'package:psych':
##
## logit
library(ggpubr)
Alzheimer <- read_csv("~/Library/CloudStorage/OneDrive-Personal/IMB/Multivariate analysis/ALzheimer features - R/alzheimer.csv")
## Rows: 373 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Group, M/F
## dbl (8): Age, EDUC, SES, MMSE, CDR, eTIV, nWBV, ASF
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(Alzheimer)
## # A tibble: 6 × 10
## Group `M/F` Age EDUC SES MMSE CDR eTIV nWBV ASF
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Nondemented M 87 14 2 27 0 1987 0.696 0.883
## 2 Nondemented M 88 14 2 30 0 2004 0.681 0.876
## 3 Demented M 75 12 NA 23 0.5 1678 0.736 1.05
## 4 Demented M 76 12 NA 28 0.5 1738 0.713 1.01
## 5 Demented M 80 12 NA 22 0.5 1698 0.701 1.03
## 6 Nondemented F 88 18 3 28 0 1215 0.71 1.44
I downloaded this data set from Kaggle (https://www.kaggle.com/datasets/brsdincer/alzheimer-features), where it is published under the title "Alzheimer Features For Analysis".
I decided to exclude the socioeconomic status (SES) variable from my further analysis and to check for missing (NA) or duplicated values, which I would then exclude from further statistics. For the further analysis I use an adapted data set from which the rows containing the 21 NA values of the original data set have been dropped.
AlzheimerData <- drop_na(Alzheimer)
colnames(AlzheimerData)[2] <- "Gender"
head(AlzheimerData)
## # A tibble: 6 × 10
## Group Gender Age EDUC SES MMSE CDR eTIV nWBV ASF
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Nondemented M 87 14 2 27 0 1987 0.696 0.883
## 2 Nondemented M 88 14 2 30 0 2004 0.681 0.876
## 3 Nondemented F 88 18 3 28 0 1215 0.71 1.44
## 4 Nondemented F 90 18 3 27 0 1200 0.718 1.46
## 5 Nondemented M 80 12 4 28 0 1689 0.712 1.04
## 6 Nondemented M 83 12 4 29 0.5 1701 0.711 1.03
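The check for missing and duplicated values mentioned above can be sketched as follows. This is only a minimal illustration on a small made-up data frame (`toy` and all of its values are hypothetical, not the Alzheimer data), written in base R so that it runs without extra packages.

```r
# Hypothetical toy data: rows 3 and 4 are identical, rows 3-5 contain NA
toy <- data.frame(
  Age  = c(87, 88, 75, 75, 80),
  SES  = c(2, 2, NA, NA, 3),
  MMSE = c(27, 30, 23, 23, NA)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))

# Number of rows with at least one missing value
n_incomplete <- sum(!complete.cases(toy))

# Number of rows that repeat an earlier row exactly
n_duplicated <- sum(duplicated(toy))

# Drop incomplete rows first, then drop exact duplicates
toy_clean <- unique(na.omit(toy))

print(na_per_column)
print(c(incomplete = n_incomplete, duplicated = n_duplicated))
print(toy_clean)
```

Counting incomplete rows separately from missing values is useful because one row can hold several NA values, so the number of dropped rows can be smaller than the number of NA values.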
Description of the data - data set AlzheimerData:
In this data set we have 354 observations of 10 variables.
Group –> Class (Demented, Nondemented, Converted)
Gender –> M -> Male, F -> Female
Age –> Age
EDUC –> Years of Education
SES –> Socio Economic Status (1 - low, 2 - medium, 3 - high, 4 - prestige)
MMSE –> Mini Mental State Examination: The Mini‐Mental State Examination (Folstein 1975), or MMSE, is a simple pen‐and‐paper test of cognitive function based on a total possible score of 30 points; it includes tests of orientation, concentration, attention, verbal memory, naming and visuospatial skills. The maximum score on the test is 30, and a score lower than 23 points is used as a cutpoint for dementia.
CDR –> Clinical Dementia Rating: The Clinical Dementia Rating (CDR®) Dementia Staging Instrument is a 5-point scale used to characterize six domains of cognitive and functional performance applicable to Alzheimer disease and related dementias: Memory, Orientation, Judgment & Problem Solving, Community Affairs, Home & Hobbies, and Personal Care. The information needed to make each rating is obtained through a semi-structured interview of the patient and a reliable informant or collateral source (e.g., a family member), referred to as the CDR® Assessment Protocol. Ratings are assigned on a 0–5 point scale (0 = absent; 0.5 = questionable; 1 = present, but mild; 2 = moderate; 3 = severe; 4 = profound; 5 = terminal).
eTIV –> Estimated Total Intracranial Volume: Intracranial volume (ICV) is an important normalization measure used in morphometric analyses to correct for head size in studies of Alzheimer Disease (AD). It is usually measured with an MRI. Measured in milliliters.
nWBV –> Normalized Whole Brain Volume: Measuring total intracranial volume (TIV) allows whole-brain and regional volumetric measures to be normalized for head size. The TIV can be defined as the volume within the cranium, including the brain, meninges, and CSF (cerebrospinal fluid).
ASF –> Atlas Scaling Factor: The Atlas Scaling Factor (ASF) was computed as the determinant of the affine transform connecting each individual to the atlas-representative template. The ASF represents the whole-brain volume expansion (or contraction) required to register each individual to the template.
The research question for this data set, which I also plan to use in further analysis, is how these variables are related to each other and whether they are associated with the group an individual belongs to (Demented, Nondemented or Converted).
# With this code I changed all non-numeric variables in the data set into factors which were in this case Group and Gender.
AlzheimerData1 <- as.data.frame(unclass(AlzheimerData),
stringsAsFactors = TRUE)
str(AlzheimerData1)
## 'data.frame': 354 obs. of 10 variables:
## $ Group : Factor w/ 3 levels "Converted","Demented",..: 3 3 3 3 3 3 3 3 3 2 ...
## $ Gender: Factor w/ 2 levels "F","M": 2 2 1 1 2 2 2 1 1 2 ...
## $ Age : num 87 88 88 90 80 83 85 93 95 68 ...
## $ EDUC : num 14 14 18 18 12 12 12 14 14 12 ...
## $ SES : num 2 2 3 3 4 4 4 2 2 2 ...
## $ MMSE : num 27 30 28 27 28 29 30 30 29 27 ...
## $ CDR : num 0 0 0 0 0 0.5 0 0 0 0.5 ...
## $ eTIV : num 1987 2004 1215 1200 1689 ...
## $ nWBV : num 0.696 0.681 0.71 0.718 0.712 0.711 0.705 0.698 0.703 0.806 ...
## $ ASF : num 0.883 0.876 1.444 1.462 1.039 ...
hist(AlzheimerData1$SES,
main = "Distribution of the variable SES",
ylab = "Frequency",
xlab = "SES",
right = FALSE,
col = "lightblue")
The histogram shows the distribution of the variable SES (Socio Economic
Status). Based on the histogram and on the result of the Shapiro-Wilk
normality test (p < 0.05), we can conclude that the values of this
variable are not normally distributed in the sample.
First I will check the assumptions to decide whether to use a parametric or a non-parametric test to test the hypothesis.
H0: The variable eTIV is normally distributed within each group.
H1: The variable eTIV is not normally distributed within each group.
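The Shapiro-Wilk test referred to above is available in base R as `shapiro.test()`. The following is a minimal sketch on made-up values, not the call or the result from this analysis: `ses_toy` is a hypothetical vector that only mimics a variable with a few discrete levels.

```r
# Hypothetical SES-like scores: only four distinct values, unevenly distributed
ses_toy <- rep(1:4, times = c(30, 40, 20, 10))

# Shapiro-Wilk test; H0: the values come from a normal distribution
sw <- shapiro.test(ses_toy)

print(sw$statistic)  # W statistic
print(sw$p.value)    # a p-value below 0.05 leads to rejecting normality
```

A variable with so few distinct values can hardly look normal, which is one reason to treat SES as categorical rather than continuous.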
ggplot(AlzheimerData1, aes(x = eTIV))+
geom_histogram(binwidth = 50, fill = "lightblue", colour = "purple")+
xlab("eTIV")+
ylab("Frequency") +
facet_wrap(~Group, ncol = 1)
I start with the assumption of homogeneity of variance, which I check with Levene's test.
H0: The variance of the estimated Total Intracranial Volume is equal across all groups.
H1: The variance of the estimated Total Intracranial Volume differs between groups.
leveneTest(AlzheimerData1$eTIV, group = AlzheimerData1$Group)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 2.2476 0.1072
## 351
As the p-value is 0.1072 (p > 0.05), we cannot reject H0, so the first assumption, homogeneity of variance, is not violated. Next I have to check the assumption of normal distribution.
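To make clear what Levene's test with `center = median` measures, here is a minimal base R sketch of the same idea: a one-way ANOVA on the absolute deviations of each value from its group median. The data frame `toy` is simulated and purely hypothetical; it is not the Alzheimer data, and the numbers it produces are not results of this analysis.

```r
set.seed(42)

# Hypothetical data: three groups of 20 values drawn with the same spread
toy <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 20)),
  value = rnorm(60, mean = 1500, sd = 150)
)

# Absolute deviation of every observation from the median of its own group
group_median <- ave(toy$value, toy$group, FUN = median)
abs_dev <- abs(toy$value - group_median)

# One-way ANOVA on the absolute deviations gives the Levene F statistic
levene_table <- anova(lm(abs_dev ~ group, data = toy))
levene_F <- levene_table[["F value"]][1]
levene_p <- levene_table[["Pr(>F)"]][1]

print(levene_table)
```

If the `car` package is loaded, `leveneTest(toy$value, group = toy$group)` should give the same F value and p-value, with degrees of freedom 2 and 57 for this toy example.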
AlzheimerData1 %>%
group_by(Group) %>%
shapiro_test(eTIV)
## # A tibble: 3 × 4
## Group variable statistic p
## <fct> <chr> <dbl> <dbl>
## 1 Converted eTIV 0.927 0.0179
## 2 Demented eTIV 0.962 0.00137
## 3 Nondemented eTIV 0.970 0.000436
As we can see, the p-value is significant (p < 0.05) in all three groups, so we reject the hypothesis that eTIV is normally distributed. I therefore have to use a non-parametric test for independent samples (the Kruskal-Wallis test) to test the hypothesis.
H0: The distribution of the estimated Total Intracranial Volume is the same across Groups.
H1: The distribution of the estimated Total Intracranial Volume is not the same across Groups.
kruskal.test(eTIV ~ Group,
data = AlzheimerData1)
##
## Kruskal-Wallis rank sum test
##
## data: eTIV by Group
## Kruskal-Wallis chi-squared = 1.3522, df = 2, p-value = 0.5086
kruskal_effsize(eTIV ~ Group,
data = AlzheimerData1)
## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 eTIV 354 -0.00185 eta2[H] small
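The Kruskal-Wallis test is not significant (p = 0.5086), so we cannot reject H0: there is no evidence that the distribution of eTIV differs between the groups. The effect size eta2[H] amounts to -0.0018, which is effectively zero; the label "small" in the output is simply the lowest magnitude category.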
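The effect size reported by `kruskal_effsize()` can be reproduced by hand from the Kruskal-Wallis output. The sketch below assumes the eta-squared formula based on the H statistic, eta2[H] = (H - k + 1) / (n - k), where H is the Kruskal-Wallis chi-squared value, k the number of groups and n the total number of observations; the variable names are chosen here for illustration.

```r
H <- 1.3522  # Kruskal-Wallis chi-squared from the test output
k <- 3       # number of groups (Converted, Demented, Nondemented)
n <- 354     # total number of observations

# Eta-squared based on the H statistic
eta2_H <- (H - k + 1) / (n - k)

print(round(eta2_H, 5))
```

The numerator is negative whenever H is smaller than k - 1, which is its expected value under the null hypothesis. A slightly negative estimate therefore just means that the groups differ even less than chance alone would suggest.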
In this case, for the purpose of hypothesis testing, I will use the SES (Socio Economic Status) variable as a categorical variable.
AlzheimerData2 <- AlzheimerData1
# Convert every numeric column to a factor
num_cols <- sapply(AlzheimerData1, is.numeric)
AlzheimerData2[num_cols] <- lapply(AlzheimerData1[num_cols], as.factor)
# Recode SES into a labelled factor; SES values not listed in levels (such as 4) become NA
AlzheimerData2$FactorSES <- factor(AlzheimerData1$SES,
levels = c("1", "2", "3"),
labels = c("low", "middle", "high"))
head(AlzheimerData2)
## Group Gender Age EDUC SES MMSE CDR eTIV nWBV ASF FactorSES
## 1 Nondemented M 87 14 2 27 0 1987 0.696 0.883 middle
## 2 Nondemented M 88 14 2 30 0 2004 0.681 0.876 middle
## 3 Nondemented F 88 18 3 28 0 1215 0.71 1.444 high
## 4 Nondemented F 90 18 3 27 0 1200 0.718 1.462 high
## 5 Nondemented M 80 12 4 28 0 1689 0.712 1.039 <NA>
## 6 Nondemented M 83 12 4 29 0.5 1701 0.711 1.032 <NA>
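The `<NA>` entries in `FactorSES` follow from how `factor()` treats values that are not listed in `levels`. A minimal sketch on a hypothetical vector (`ses_toy` is made up for illustration):

```r
# Hypothetical SES codes, including a value (4) that is not among the levels
ses_toy <- c(1, 2, 3, 4, 2)

ses_factor <- factor(ses_toy,
                     levels = c("1", "2", "3"),
                     labels = c("low", "middle", "high"))

print(ses_factor)

# useNA = "ifany" makes the dropped category visible in the counts
print(table(ses_factor, useNA = "ifany"))
```

Any observation whose SES value is outside the listed levels is therefore silently excluded from later analyses that drop missing values.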
H0: There is no association between Socio Economic Status and Group (dementia status).
H1: There is an association between Socio Economic Status and Group (dementia status).
AlzheimerData3 <- subset(AlzheimerData2, select = c(FactorSES, Group))
# Assign the result so that the rows with a missing FactorSES are actually removed (273 rows remain)
AlzheimerData3 <- drop_na(AlzheimerData3)
head(AlzheimerData3)
## FactorSES Group
## 1 middle Nondemented
## 2 middle Nondemented
## 3 high Nondemented
## 4 high Nondemented
## 5 middle Nondemented
## 6 middle Nondemented
results <- chisq.test(AlzheimerData3$FactorSES, AlzheimerData3$Group,
correct = TRUE)
results
##
## Pearson's Chi-squared test
##
## data: AlzheimerData3$FactorSES and AlzheimerData3$Group
## X-squared = 21.067, df = 4, p-value = 0.0003071
addmargins(results$observed)
## AlzheimerData3$Group
## AlzheimerData3$FactorSES Converted Demented Nondemented Sum
## low 21 26 41 88
## middle 7 25 71 103
## high 7 33 42 82
## Sum 35 84 154 273
round(results$expected, 2)
## AlzheimerData3$Group
## AlzheimerData3$FactorSES Converted Demented Nondemented
## low 11.28 27.08 49.64
## middle 13.21 31.69 58.10
## high 10.51 25.23 46.26
round(results$residuals, 2)
## AlzheimerData3$Group
## AlzheimerData3$FactorSES Converted Demented Nondemented
## low 2.89 -0.21 -1.23
## middle -1.71 -1.19 1.69
## high -1.08 1.55 -0.63
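The assumption of the chi-squared test is not violated: all expected frequencies are above 5 (the smallest is 10.51). With p = 0.0003 (p < 0.05) we can reject the null hypothesis and conclude that there is an association between Socio Economic Status and Group. The largest Pearson residual (2.89) belongs to the low SES category in the Converted group, where 21 individuals were observed against 11.28 expected.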
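The expected frequencies, Pearson residuals and test statistic above can be recomputed by hand from the observed table. The sketch below is a base R illustration that takes the counts from the `addmargins()` output; the object names are chosen here for illustration.

```r
# Observed counts copied from the contingency table above
observed <- matrix(
  c(21, 26, 41,
     7, 25, 71,
     7, 33, 42),
  nrow = 3, byrow = TRUE,
  dimnames = list(SES   = c("low", "middle", "high"),
                  Group = c("Converted", "Demented", "Nondemented"))
)

# Expected count in each cell = row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson residuals = (observed - expected) / sqrt(expected)
pearson_resid <- (observed - expected) / sqrt(expected)

# The chi-squared statistic is the sum of the squared residuals
chi_sq <- sum(pearson_resid^2)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

print(round(expected, 2))
print(round(pearson_resid, 2))
print(c(chi_sq = chi_sq, df = df, p_value = p_value))
```

Checking `min(expected)` against 5 is the usual rule of thumb for whether the chi-squared approximation can be trusted.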