library(readr)
library(psych)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
library(readxl)
library(tidyr)
Alzheimer <- read_csv("~/Library/CloudStorage/OneDrive-Personal/IMB/Multivariate analysis/ALzheimer features - R/alzheimer.csv")
## Rows: 373 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Group, M/F
## dbl (8): Age, EDUC, SES, MMSE, CDR, eTIV, nWBV, ASF
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(Alzheimer)
## # A tibble: 6 × 10
## Group `M/F` Age EDUC SES MMSE CDR eTIV nWBV ASF
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Nondemented M 87 14 2 27 0 1987 0.696 0.883
## 2 Nondemented M 88 14 2 30 0 2004 0.681 0.876
## 3 Demented M 75 12 NA 23 0.5 1678 0.736 1.05
## 4 Demented M 76 12 NA 28 0.5 1738 0.713 1.01
## 5 Demented M 80 12 NA 22 0.5 1698 0.701 1.03
## 6 Nondemented F 88 18 3 28 0 1215 0.71 1.44
I downloaded this data set from Kaggle (https://www.kaggle.com/datasets/brsdincer/alzheimer-features), where it is published under the title "Alzheimer Features For Analysis".
I decided to exclude the socioeconomic status (SES) variable from my further analysis and to check whether the data contain any duplicated rows or missing (NA, "not available") values, which I would then exclude from further statistics.
sum(duplicated(Alzheimer))
## [1] 0
sum(is.na(Alzheimer))
## [1] 21
We can see that there are zero duplicated rows and 21 missing values. In the next step I will remove the SES column, drop the rows that still contain missing values, and rename the column M/F to Sex.
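# Drop the SES column (column 5), then remove rows that still contain an NA
Alzheimer1 <- Alzheimer[, -5]
AlzheimerData <- drop_na(Alzheimer1)
# Rename the second column from "M/F" to "Sex"
colnames(AlzheimerData)[2] <- "Sex"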
head(AlzheimerData)
## # A tibble: 6 × 9
## Group Sex Age EDUC MMSE CDR eTIV nWBV ASF
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Nondemented M 87 14 27 0 1987 0.696 0.883
## 2 Nondemented M 88 14 30 0 2004 0.681 0.876
## 3 Demented M 75 12 23 0.5 1678 0.736 1.05
## 4 Demented M 76 12 28 0.5 1738 0.713 1.01
## 5 Demented M 80 12 22 0.5 1698 0.701 1.03
## 6 Nondemented F 88 18 28 0 1215 0.71 1.44
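Before dropping anything, it can help to see which columns the missing values sit in. The following is a minimal sketch on a small made-up data frame (`toy` and its values are hypothetical, not the Kaggle file), using base R's `na.omit()` as a stand-in for `drop_na()`:

```r
# Hypothetical toy data standing in for the raw Alzheimer tibble
toy <- data.frame(
  Group = c("Nondemented", "Demented", "Demented", "Converted"),
  SES   = c(2, NA, NA, 3),
  MMSE  = c(27, 23, NA, 29)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Drop SES first, then remove the rows that still contain an NA
toy_clean <- na.omit(toy[, names(toy) != "SES"])
print(nrow(toy_clean))
```

Removing the SES column before dropping rows keeps the observations whose only missing value was in SES.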
Description of the data - data set AlzheimerData:
In this dataset we have 371 observations of 9 variables.
Group –> Class (Demented, Nondemented, Converted)
Sex –> M -> Male, F -> Female
Age –> Age
EDUC –> Years of Education
MMSE –> Mini Mental State Examination: The Mini-Mental State Examination (Folstein 1975), or MMSE, is a simple pen-and-paper test of cognitive function based on a total possible score of 30 points; it includes tests of orientation, concentration, attention, verbal memory, naming and visuospatial skills. The maximum score on the test is 30, and a score of 23 or lower (that is, below 24) is commonly used as the cutpoint for dementia.
CDR –> Clinical Dementia Rating: The Clinical Dementia Rating (CDR®) Dementia Staging Instrument is a 5-point scale used to characterize six domains of cognitive and functional performance applicable to Alzheimer disease and related dementias: Memory, Orientation, Judgment & Problem Solving, Community Affairs, Home & Hobbies, and Personal Care. The information needed to make each rating is obtained through a semi-structured interview of the patient and a reliable informant or collateral source (e.g., a family member), referred to as the CDR® Assessment Protocol. Ratings are assigned on a scale from 0 to 5 (0 = absent; 0.5 = questionable; 1 = present, but mild; 2 = moderate; 3 = severe; 4 = profound; 5 = terminal).
eTIV –> Estimated Total Intracranial Volume: Intracranial volume (ICV) is an important normalization measure used in morphometric analyses to correct for head size in studies of Alzheimer's disease (AD). It is usually measured with MRI. Measured in milliliters.
nWBV –> Normalized Whole Brain Volume: Measuring total intracranial volume (TIV) allows whole-brain and regional volumetric measures to be normalized for head size. The TIV can be defined as the volume within the cranium, including the brain, meninges, and cerebrospinal fluid (CSF).
ASF –> Atlas Scaling Factor: The Atlas Scaling Factor (ASF) was computed as the determinant of the affine transform connecting each individual to the atlas-representative template. The ASF represents the whole-brain volume expansion (or contraction) required to register each individual to the template.
The research question for this data set, which I also plan to use in further analysis, is how these variables are connected and whether they are associated with the group an individual belongs to (Demented, Nondemented or Converted).
summary(AlzheimerData[, 3:9])
## Age EDUC MMSE CDR
## Min. :60.00 Min. : 6.00 Min. : 4.00 Min. :0.0000
## 1st Qu.:71.00 1st Qu.:12.00 1st Qu.:27.00 1st Qu.:0.0000
## Median :77.00 Median :15.00 Median :29.00 Median :0.0000
## Mean :77.02 Mean :14.61 Mean :27.34 Mean :0.2871
## 3rd Qu.:82.00 3rd Qu.:16.00 3rd Qu.:30.00 3rd Qu.:0.5000
## Max. :98.00 Max. :23.00 Max. :30.00 Max. :2.0000
## eTIV nWBV ASF
## Min. :1106 Min. :0.6440 Min. :0.876
## 1st Qu.:1358 1st Qu.:0.7000 1st Qu.:1.099
## Median :1471 Median :0.7290 Median :1.193
## Mean :1490 Mean :0.7295 Mean :1.194
## 3rd Qu.:1598 3rd Qu.:0.7560 3rd Qu.:1.292
## Max. :2004 Max. :0.8370 Max. :1.587
sum(AlzheimerData$Sex=="F")
## [1] 211
sum(AlzheimerData$Sex=="M")
## [1] 160
The age of the participants ranges from a minimum of 60 years to a maximum of 98 years. The mean age is 77 years (rounded from 77.02). The median age is also 77 years, which means that half of the participants were younger and half were older than 77 years.
Of 371 observed participants, 211 of them were female and 160 of them were male.
If we look at the MMSE (Mini Mental State Examination), the highest score reached was 30 points, which is also the maximum of the scale, and the lowest was 4. The first quartile (Q1) is 27, the median (Q2) is 29, and the third quartile (Q3) is 30. This means that about 25% of the sample scored 27 or lower and about 75% scored 27 or higher, and that half of the sample scored 29 or lower and half scored 29 or higher. Because Q3 equals the maximum of 30, at least 25% of the participants reached the top score, and nobody can score above it. Considering that a score of 24 or higher is classed as normal on the MMSE and a score below 24 indicates signs of impairment, the share of our sample scoring in the impaired range should be less than 25%, but we will take a look at that later.
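The quartiles and the share of scores below the cutoff can be computed directly. This is a small sketch on invented scores (the vector `mmse` is hypothetical, and 24 is the cutoff quoted above):

```r
# Hypothetical MMSE scores, not the real data
mmse <- c(30, 29, 29, 28, 30, 27, 23, 17, 30, 26)

# Quartiles, as also reported by summary()
q <- quantile(mmse, probs = c(0.25, 0.5, 0.75))
print(q)

# Share of participants scoring below the cutoff of 24:
# the mean of a logical vector is the proportion of TRUE values
share_below_24 <- mean(mmse < 24)
print(share_below_24)
```

On the real data, the same call would be `mean(AlzheimerData$MMSE < 24)`.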
The highest score on the CDR (Clinical Dementia Rating) was 2, which is classified as moderate, and the lowest was 0. The median is also 0, which indicates that at least half of the participants were rated 0 on the CDR Assessment Protocol, meaning intact abilities and an absence of impairment. The remaining participants were rated above 0, up to at most a moderate level of impairment in the everyday activities that are assessed in connection with the consequences of Alzheimer's disease and other dementias.
Normalized whole brain volume (nWBV) is a whole-brain measure normalized for head size, so that it can be compared between participants. Among the participants the maximum nWBV was 0.837 and the minimum was 0.644. Half of the participants had an nWBV lower than 0.729 and half had a higher one.
sum(AlzheimerData$Group=="Demented")
## [1] 144
sum(AlzheimerData$Group=="Nondemented")
## [1] 190
sum(AlzheimerData$Group=="Converted")
## [1] 37
The Demented class consisted of 144 participants, the Nondemented class of 190, and the Converted class (those who are at high risk of converting to dementia) of 37.
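The three separate `sum()` calls can be replaced by a single frequency table, which avoids copying the counts by hand. A minimal sketch on an invented vector (`group` is hypothetical):

```r
# Hypothetical group labels, not the real data
group <- c("Demented", "Nondemented", "Nondemented",
           "Converted", "Demented", "Nondemented")

# Absolute counts per class
counts <- table(group)
print(counts)

# Relative shares per class, rounded to two decimals
shares <- round(prop.table(counts), 2)
print(shares)
```

On the real data this would be `table(AlzheimerData$Group)`.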
str(AlzheimerData)
## tibble [371 × 9] (S3: tbl_df/tbl/data.frame)
## $ Group: chr [1:371] "Nondemented" "Nondemented" "Demented" "Demented" ...
## $ Sex : chr [1:371] "M" "M" "M" "M" ...
## $ Age : num [1:371] 87 88 75 76 80 88 90 80 83 85 ...
## $ EDUC : num [1:371] 14 14 12 12 12 18 18 12 12 12 ...
## $ MMSE : num [1:371] 27 30 23 28 22 28 27 28 29 30 ...
## $ CDR : num [1:371] 0 0 0.5 0.5 0.5 0 0 0 0.5 0 ...
## $ eTIV : num [1:371] 1987 2004 1678 1738 1698 ...
## $ nWBV : num [1:371] 0.696 0.681 0.736 0.713 0.701 0.71 0.718 0.712 0.711 0.705 ...
## $ ASF : num [1:371] 0.883 0.876 1.046 1.01 1.034 ...
AlzheimerData1 <- as.data.frame(unclass(AlzheimerData),
stringsAsFactors = TRUE)
str(AlzheimerData1)
## 'data.frame': 371 obs. of 9 variables:
## $ Group: Factor w/ 3 levels "Converted","Demented",..: 3 3 2 2 2 3 3 3 3 3 ...
## $ Sex : Factor w/ 2 levels "F","M": 2 2 2 2 2 1 1 2 2 2 ...
## $ Age : num 87 88 75 76 80 88 90 80 83 85 ...
## $ EDUC : num 14 14 12 12 12 18 18 12 12 12 ...
## $ MMSE : num 27 30 23 28 22 28 27 28 29 30 ...
## $ CDR : num 0 0 0.5 0.5 0.5 0 0 0 0.5 0 ...
## $ eTIV : num 1987 2004 1678 1738 1698 ...
## $ nWBV : num 0.696 0.681 0.736 0.713 0.701 0.71 0.718 0.712 0.711 0.705 ...
## $ ASF : num 0.883 0.876 1.046 1.01 1.034 ...
hist(AlzheimerData$eTIV,
main = "Estimated Total Intracranial Volume distribution",
xlab = "eTIV",
ylab = "Frequency",
col = "lightblue")
median(AlzheimerData$eTIV)
## [1] 1471
The median estimated total intracranial volume is 1471 ml, which tells us that half of the participants have a higher eTIV and half have a lower one. Looking at the histogram, we can see that the distribution is skewed to the right: most values are concentrated at the lower end of the scale, on the left side of the graph, with a longer tail towards higher volumes. This agrees with the mean (1490 ml) being larger than the median.
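The visual impression of right skew can be checked numerically. The sketch below uses invented volumes (`etiv` is hypothetical) and a moment-based skewness coefficient written out in base R; this is one common definition, not necessarily the one a given package uses:

```r
# Hypothetical volumes in ml, not the real data
etiv <- c(1200, 1300, 1350, 1400, 1450, 1500, 1600, 1800, 2000)

m   <- mean(etiv)
med <- median(etiv)

# Moment-based skewness: third central moment divided by
# the second central moment raised to the power 3/2
skewness <- mean((etiv - m)^3) / mean((etiv - m)^2)^(3 / 2)

print(c(mean = m, median = med, skewness = skewness))
```

A mean above the median together with a positive skewness coefficient points to a right-skewed distribution.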
ggplot(AlzheimerData1, aes(x=Group, y=MMSE)) +
geom_boxplot(fill="lightblue") +
xlab("Group")
From the boxplot we can see that participants classified as “Nondemented” had scores closest to 30 on the MMSE scale. This is expected, since according to the scale interpretation the impairments connected to Alzheimer’s disease and dementia are associated with scores below 24. There are some outliers in the Nondemented group, but none of them falls as low as 25 points. In the Converted group, on the other hand, the distribution of scores is broader than in the Nondemented group, and some outliers fall below 25. In the Demented group the distribution of scores is the broadest: some participants reached scores as high as those in the other two groups, while others scored as low as 4, which indicates significantly impaired cognitive functions.
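What the boxplot shows can also be read off as numbers per group. This is a rough sketch on invented scores (`toy` is hypothetical); `boxplot.stats()` uses the same 1.5 × IQR rule as the default boxplot whiskers:

```r
# Hypothetical scores for two groups, not the real data
toy <- data.frame(
  Group = c(rep("Nondemented", 5), rep("Demented", 5)),
  MMSE  = c(30, 29, 30, 28, 29, 26, 21, 17, 28, 4)
)

# Median and spread (max minus min) of MMSE per group
group_median <- tapply(toy$MMSE, toy$Group, median)
group_range  <- tapply(toy$MMSE, toy$Group, function(x) diff(range(x)))
print(group_median)
print(group_range)

# Points that a boxplot would draw as outliers, per group
outliers <- tapply(toy$MMSE, toy$Group, function(x) boxplot.stats(x)$out)
print(outliers)
```

On the real data the grouping variable would be `AlzheimerData1$Group`.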