library(DT)
library(tidyverse)
library(multcomp)
library(ggmosaic)
library(psych)
library(MuMIn)
library(ggplot2)
library(pander)
The schizophrenia dataset has 220 observations on 4 variables. The observations were collected in a follow-up study of women with schizophrenia, who were assessed at 0, 2, 6, 8 and 10 months after hospitalization. The response variable is disorder, which records whether the disorder (its typical symptoms) was present or absent at each assessment. The single covariate is onset, which divides the women into two groups: early onset of the disorder (before age 20) and late onset (after age 20). The question is whether the course of the illness differs between patients with early and late onset. In medicine, the course of an illness, or illness trajectory, describes how a disease develops over time from the point of diagnosis onwards. My approach to answering the question is therefore to compare the disorder status (present or absent) between the two groups (< 20 and > 20) at each month in which data were collected.
setwd('D:\\R-STUDIO\\')
schizophrenia <- read.csv(file = 'schizophrenia.csv')[-1]
datatable(schizophrenia,filter = "top")
The categorical variables were recoded as numeric variables: Onset (0 = early onset, 1 = late onset) and Disorder_num (0 = present, 1 = absent, 2 = missing), together with a text version, Disorder, in which missing values are labelled 'empty'. The single covariate, onset, splits the women into two groups (onset before age 20 and after age 20), and these were separated using the code in the chunk below. Summary statistics were then computed for each group to get a sense of its structure. From the output below, the early-onset group (< 20) contributes 160 observations, whereas the late-onset group (> 20) contributes only 60; since each woman was assessed at the five follow-up times, this corresponds to 32 and 12 patients respectively. In the early-onset group, the disorder was present in 55 (34.375%) of the 160 observations, absent in 91 (56.875%), whether through recovery or other factors, and missing in 14 (8.75%). In the late-onset group, the disorder was present in 17 (28.33%) of the 60 observations, absent in 41 (68.33%) and missing in 2 (3.33%). Because the two groups differ in size, percentages are used for comparison. The summary statistics therefore suggest that the course of illness differs between patients with early (< 20) and late (> 20) onset: the disorder is present in a larger share of the early-onset observations and absent in a larger share of the late-onset observations.
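# Onset as 0/1: 0 = early onset (< 20 yrs), 1 = late onset (> 20 yrs)
schizophrenia$Onset <- ifelse(schizophrenia$onset == '< 20 yrs', 0, 1)
# Disorder as text, with missing values labelled 'empty' so they appear in the plots
schizophrenia$Disorder <- ifelse(is.na(schizophrenia$disorder), 'empty', ifelse(schizophrenia$disorder == 'present', 'present', ifelse(schizophrenia$disorder == 'absent', 'absent', '')))
# Disorder as a number: 0 = present, 1 = absent, 2 = missing (built as numbers directly, no string conversion)
schizophrenia$Disorder_num <- ifelse(is.na(schizophrenia$disorder), 2, ifelse(schizophrenia$disorder == 'present', 0, ifelse(schizophrenia$disorder == 'absent', 1, NA)))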
lessthan20 <- subset(schizophrenia, onset == "< 20 yrs")
summary(lessthan20[,-c(1,3,5:6)])
## onset month Disorder_num
## Length:160 Min. : 0.0 Min. :0.0000
## Class :character 1st Qu.: 2.0 1st Qu.:0.0000
## Mode :character Median : 6.0 Median :1.0000
## Mean : 5.2 Mean :0.7438
## 3rd Qu.: 8.0 3rd Qu.:1.0000
## Max. :10.0 Max. :2.0000
morethan20 <- subset(schizophrenia, onset == "> 20 yrs")
summary(morethan20[,-c(1,3,5:6)])
## onset month Disorder_num
## Length:60 Min. : 0.0 Min. :0.00
## Class :character 1st Qu.: 2.0 1st Qu.:0.00
## Mode :character Median : 6.0 Median :1.00
## Mean : 5.2 Mean :0.75
## 3rd Qu.: 8.0 3rd Qu.:1.00
## Max. :10.0 Max. :2.00
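Because Disorder_num is a 0/1/2 code, its mean is not a clinical quantity, but it does pin down the group counts: with n observations, mean = (n_absent + 2 × n_missing) / n. The sketch below is plain arithmetic on the printed means, the group sizes (160 and 60) and the numbers of missing observations (14 and 2); the helper name `implied_absent` is hypothetical.

```r
# Hypothetical helper: recover the number of "absent" observations (code 1)
# from the mean of a 0/1/2 code, given the group size and the number missing (code 2).
# mean = (n_absent * 1 + n_missing * 2) / n  =>  n_absent = mean * n - 2 * n_missing
implied_absent <- function(mean_code, n, n_missing) {
  round(mean_code * n - 2 * n_missing)  # round() absorbs rounding in the printed mean
}

early_absent <- implied_absent(0.7438, n = 160, n_missing = 14)
late_absent  <- implied_absent(0.75,   n = 60,  n_missing = 2)

# Present = total - absent - missing
c(early_absent = early_absent, early_present = 160 - early_absent - 14,
  late_absent  = late_absent,  late_present  = 60 - late_absent - 2)
```

For the early-onset group this gives 91 absent (and so 55 present) observations, and for the late-onset group 41 absent (and so 17 present), which is a quick way to confirm which count belongs to which status.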
Boxplots were used to inspect the data visually. Figure 2.1 shows, separately for each onset group, the distribution of follow-up months at which the disorder was recorded as present, absent or missing (empty). A boxplot summarizes the spread of months (median, quartiles and range) rather than the number of patients, so group sizes are better compared using the counts above or the mosaic plots; by those counts, the early-onset group has more observations with the disorder present, absent and missing than the late-onset group. A few outliers are visible in the late-onset group.
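# Error-bar layer drawn first so the whisker caps sit behind the boxes; one facet_grid() call; guide = 'none' replaces the deprecated guide = FALSE
ggplot(schizophrenia, aes(x = Disorder, y = month, fill = Disorder)) + stat_boxplot(geom = 'errorbar', linetype = 1, width = 0.5) + geom_boxplot() + facet_grid(. ~ onset) + theme_bw() + labs(x = 'Disorder', y = 'Months') + scale_fill_discrete(guide = 'none')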
Figure 2.1: Boxplot of Disorder by Month & Onset
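To make explicit what each box in Figure 2.1 summarizes, the sketch below computes, on a few hypothetical toy rows (not the study data), the quantiles of `month` within each `Disorder` status, which give the median and box edges a boxplot draws, and separately counts the rows per status with `table()`. Only the former is visible in a boxplot.

```r
# Toy rows (hypothetical): follow-up month and recorded status for a few visits
toy <- data.frame(
  month    = c(0, 2, 6, 0, 8, 10, 2),
  Disorder = c("present", "present", "absent", "present", "absent", "absent", "empty")
)

# Quantiles of month within each status (rows 0%, 25%, 50%, 75%, 100%; one column per status)
month_quantiles <- sapply(split(toy$month, toy$Disorder), quantile)
month_quantiles

# Number of rows per status: a different quantity, which the boxplot does not show
status_counts <- table(toy$Disorder)
status_counts
```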
mosaicplot(onset~month + Disorder, data=schizophrenia,color=2:4,las=1, main='Mosaic Plot of Disorder(Schizophrenia) by Month & Onset', xlab='Disorder',ylab='Month')
Figure 2.2: Mosaic Plot of Disorder(Schizophrenia) by Month & Onset
The mosaic plot is used to inspect how disorder status is distributed within the two patient groups as the months progress. At month 0, the baseline of the study, Figure 2.2 shows that the early-onset group (< 20) has more patients with the disorder present and more with it absent than the late-onset group (> 20), and neither group has missing observations. At month 2, the number of early-onset patients with the disorder absent stays roughly the same, while the number with the disorder present falls and the number of missing observations rises; in the late-onset group, the number with the disorder present falls, the number with it absent rises, and there are still no missing observations. At month 6, the number of early-onset patients with the disorder present drops sharply, with a corresponding rise in both absent and missing observations, and the same pattern holds at months 8 and 10. In the late-onset group at month 6, the number with the disorder present again falls as the number with it absent rises, still with no missing observations. At month 8, however, no late-onset patient has the disorder present, and missing observations appear instead. Surprisingly, patients with the disorder present reappear in the late-onset group at month 10, while the number with the disorder absent is lower than at month 8.
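A mosaic plot is a picture of a contingency table: each tile's area is proportional to a count. The sketch below builds the underlying three-way table of onset × month × status with `xtabs()` and prints it flat with `ftable()`. The toy rows are hypothetical; with the real data one would pass `schizophrenia` as `data`.

```r
# Toy rows (hypothetical): one row per patient-visit
toy <- data.frame(
  onset    = c("< 20 yrs", "< 20 yrs", "< 20 yrs", "> 20 yrs", "> 20 yrs", "> 20 yrs"),
  month    = c(0, 2, 2, 0, 2, 2),
  Disorder = c("present", "absent", "empty", "present", "absent", "absent")
)

# Three-way table of counts; mosaic tile areas are proportional to these cells
counts <- xtabs(~ onset + month + Disorder, data = toy)

# Flattened for printing: onset x month as rows, Disorder as columns
ftable(counts)
```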
New_schizo_dat<- data.frame (
onset = schizophrenia$onset,
month =schizophrenia$month,
Disorder =schizophrenia$Disorder
)
ggplot(data=New_schizo_dat) + geom_mosaic(aes(x=product(month, onset), fill=Disorder, offset=0.5)) + labs(title='Mosaic Plot of Onset Schizophrenia Explained by Month & Disorder',x='Onset Category',y='Months') + scale_y_continuous(position='left',labels=c(0,2,6,8,10))
Figure 2.3: Mosaic Plot of Onset Schizophrenia Explained by Month & Disorder
Figure 2.3 likewise suggests that the course of the illness differs between patients with early and late onset of schizophrenia, with early-onset patients (< 20) faring worse than late-onset patients.
Summary statistics of Disorder_num at month 0, each divided by the number of patients observed at that month, for onset < 20 yrs and > 20 yrs
summary(lessthan20[lessthan20$month == 0, 7]) / sum(lessthan20$month == 0)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.00000 0.00000 0.01172 0.03125 0.03125
summary(morethan20[morethan20$month == 0, 7]) / sum(morethan20$month == 0)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.00000 0.00000 0.03472 0.08333 0.08333
Summary statistics of Disorder_num at month 2, divided by the number of patients observed, for onset < 20 yrs and > 20 yrs
summary(lessthan20[lessthan20$month == 2, 7]) / sum(lessthan20$month == 2)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.00000 0.00000 0.01367 0.03125 0.06250
summary(morethan20[morethan20$month == 2, 7]) / sum(morethan20$month == 2)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.00000 0.04167 0.04167 0.08333 0.08333
Summary statistics of Disorder_num at month 6, divided by the number of patients observed, for onset < 20 yrs and > 20 yrs
summary(lessthan20[lessthan20$month == 6, 7]) / sum(lessthan20$month == 6)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.02344 0.03125 0.02734 0.03125 0.06250
summary(morethan20[morethan20$month == 6, 7]) / sum(morethan20$month == 6)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.06250 0.08333 0.06250 0.08333 0.08333
Summary statistics of Disorder_num at month 8, divided by the number of patients observed, for onset < 20 yrs and > 20 yrs
summary(lessthan20[lessthan20$month == 8, 7]) / sum(lessthan20$month == 8)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.03125 0.03125 0.03027 0.03125 0.06250
summary(morethan20[morethan20$month == 8, 7]) / sum(morethan20$month == 8)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.08333 0.08333 0.08333 0.09028 0.08333 0.16667
Summary statistics of Disorder_num at month 10, divided by the number of patients observed, for onset < 20 yrs and > 20 yrs
summary(lessthan20[lessthan20$month == 10, 7]) / sum(lessthan20$month == 10)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.03125 0.03125 0.03320 0.03125 0.06250
summary(morethan20[morethan20$month == 10, 7]) / sum(morethan20$month == 10)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.08333 0.08333 0.08333 0.08333 0.16667
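Because Disorder_num codes present as 0, absent as 1 and missing as 2, each figure above is a summary statistic of these codes divided by the number of patients observed at that month (32 in the early-onset group and 12 in the late-onset group), not the proportion of patients in a given status. The figures are nonetheless consistent with the mosaic plots. For example, in the late-onset group at month 8 the minimum is 0.08333 = 1/12, so no patient had the disorder present, and the maximum is 0.16667 = 2/12, so at least one observation is missing. Taken together, the plots and these summaries suggest that the illness trajectory differs between the early-onset (< 20) and late-onset (> 20) groups, with early-onset patients faring worse than late-onset patients.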
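A more direct way to describe each month is the proportion of patients in each status. The sketch below, assuming a data frame with the `month` and `Disorder` columns created earlier, cross-tabulates month against status and converts each row to proportions with `prop.table()`. The helper name `status_share_by_month` and the toy rows are hypothetical.

```r
# Hypothetical helper: share of each disorder status at every follow-up month,
# computed within a single onset group
status_share_by_month <- function(df) {
  tab <- table(month = df$month, status = df$Disorder)
  prop.table(tab, margin = 1)   # margin = 1: each month's row sums to 1
}

# Toy rows (illustrative only): one onset group, two months, four patients
toy <- data.frame(
  month    = rep(c(0, 8), each = 4),
  Disorder = c("present", "present", "absent", "absent",
               "absent", "absent", "absent", "empty")
)
status_share_by_month(toy)
```

Applied to `lessthan20` and `morethan20`, it would give one table per onset group, with a row for each of months 0, 2, 6, 8 and 10.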
The schizophrenia dataset has 220 observations on 4 variables, with unequal group sizes for the onset variable (160 observations for onset before age 20 and 60 for onset after age 20). This imbalance means that raw counts cannot be compared directly between the groups and that estimates for the smaller late-onset group are less precise. The mosaic plots and the monthly summary statistics show that at month 8 no late-onset patient had the disorder present; instead, missing observations appear. At month 10, however, patients with the disorder present reappear, and the number with the disorder absent is lower than at month 8. My interpretation is that by month 8 most late-onset patients had recovered (showing no symptoms of schizophrenia) and the few who had not recovered left the study; the data do not record why observations are missing, so whether this was through death or another reason cannot be determined. At month 10, symptoms began to reappear in some of the recovered patients. Overall, the analysis suggests that the course of the illness differs between the early- and late-onset groups, with the late-onset group showing a higher recovery rate than the early-onset group.
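The comparison of recovery between the two groups can be tabulated directly as the percentage of observations in each status (present, absent or missing) within each onset group. A minimal sketch, assuming a data frame with the original `onset` and `disorder` columns and missing statuses stored as `NA`; the helper `status_percentages` and the toy rows are hypothetical.

```r
# Hypothetical helper: percentage of observations in each disorder status
# (absent, missing, present) within each onset group
status_percentages <- function(df) {
  disorder <- as.character(df$disorder)              # works for character or factor input
  status <- ifelse(is.na(disorder), "missing", disorder)  # keep NA as its own category
  counts <- table(onset = df$onset, status = status)
  round(100 * prop.table(counts, margin = 1), 2)     # each onset row sums to 100
}

# Toy rows (illustrative only, not the study data)
toy <- data.frame(
  onset    = c("< 20 yrs", "< 20 yrs", "< 20 yrs", "< 20 yrs", "> 20 yrs", "> 20 yrs"),
  disorder = c("present", "absent", "absent", NA, "absent", "present")
)
status_percentages(toy)
```

Calling `status_percentages(schizophrenia)` would give the within-group percentages for the study data, a quick check on figures computed by hand.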