The outpatient health survey was conducted to assess satisfaction levels of patients attending our Jhajjar outreach clinic. The questionnaire was adapted from a pre-existing 23-question instrument with sub-domains evaluating Interpersonal Skills, Physical Environment, Availability, Quality and Accessibility on a Likert scale; one question evaluated overall global satisfaction with the physician. Our goals in this study are to describe patient satisfaction across these domains and to evaluate the construct validity and reliability of the adapted questionnaire.
So let's get started by loading the packages and data.
# Packages used throughout (Hmisc, corrplot, arm, psych and GPArotation are loaded where first needed below)
library(tidyverse)   # dplyr, tidyr (gather), purrr (map), ggplot2
library(ggjoy)       # geom_joy() ridgeline ("joy") plots
library(sjPlot)      # sjp.likert(), sjp.lm()
library(e1071)       # assumed source of the skewness() used below
jhar = read.csv("jhar.csv")
jhar6 = read.csv("jhar6.csv")
jharx = read.csv("jharx.csv")
First, a count of responses for each question and score level:
jhar %>% dplyr::select(one:Twenty.three) %>% gather(key = "Question", value = "Score") %>% group_by(Question, Score) %>% count()
Now let us see a summary of the scores.
jhar %>% dplyr::select(one:Twenty.three) %>% summary()
## one Two Three Four
## Min. :1.000 Min. :1.000 Min. :1.00 Min. :1.000
## 1st Qu.:4.000 1st Qu.:4.000 1st Qu.:4.00 1st Qu.:4.000
## Median :5.000 Median :5.000 Median :5.00 Median :5.000
## Mean :4.687 Mean :4.565 Mean :4.56 Mean :4.644
## 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:5.00 3rd Qu.:5.000
## Max. :5.000 Max. :5.000 Max. :5.00 Max. :5.000
## Five Six Seven Eight
## Min. :1.00 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:4.00 1st Qu.:4.000 1st Qu.:4.000 1st Qu.:4.000
## Median :5.00 Median :5.000 Median :4.000 Median :4.000
## Mean :4.46 Mean :4.393 Mean :4.119 Mean :4.266
## 3rd Qu.:5.00 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:5.000
## Max. :5.00 Max. :5.000 Max. :5.000 Max. :5.000
## Nine Ten Eleven Twelve
## Min. :1.000 Min. :1.00 Min. :1.000 Min. :1.000
## 1st Qu.:4.000 1st Qu.:4.00 1st Qu.:4.000 1st Qu.:4.000
## Median :5.000 Median :5.00 Median :5.000 Median :4.000
## Mean :4.251 Mean :4.54 Mean :4.485 Mean :4.119
## 3rd Qu.:5.000 3rd Qu.:5.00 3rd Qu.:5.000 3rd Qu.:5.000
## Max. :5.000 Max. :5.00 Max. :5.000 Max. :5.000
## Thirteen Fourteen Fifteen Sixteen
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.00
## 1st Qu.:4.000 1st Qu.:4.000 1st Qu.:4.000 1st Qu.:4.00
## Median :5.000 Median :5.000 Median :4.000 Median :5.00
## Mean :4.465 Mean :4.453 Mean :4.284 Mean :4.54
## 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:5.00
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :6.00
## Seventeen Eighteen Nineteen Twenty
## Min. :1.000 Min. :1.00 Min. :1.00 Min. :1.00
## 1st Qu.:4.000 1st Qu.:4.00 1st Qu.:4.00 1st Qu.:4.00
## Median :5.000 Median :5.00 Median :5.00 Median :5.00
## Mean :4.537 Mean :4.53 Mean :4.58 Mean :4.49
## 3rd Qu.:5.000 3rd Qu.:5.00 3rd Qu.:5.00 3rd Qu.:5.00
## Max. :5.000 Max. :5.00 Max. :5.00 Max. :5.00
## Twenty.one twenty.two Twenty.three
## Min. :1.00 Min. :1.000 Min. :1.000
## 1st Qu.:4.00 1st Qu.:4.000 1st Qu.:4.000
## Median :5.00 Median :5.000 Median :5.000
## Mean :4.51 Mean :4.577 Mean :4.532
## 3rd Qu.:5.00 3rd Qu.:5.000 3rd Qu.:5.000
## Max. :5.00 Max. :5.000 Max. :5.000
Most of the questions are negatively skewed, with scores concentrated at 4 and 5, indicating high overall satisfaction. The median score (5 for most questions) is higher than the mean, and the skewness statistics calculated below are negative for every question.
jhar %>% dplyr::select(one:Twenty.three) %>% map(~ skewness(.x))
## $one
## [1] -2.595851
##
## $Two
## [1] -1.799874
##
## $Three
## [1] -1.662334
##
## $Four
## [1] -1.985509
##
## $Five
## [1] -1.70078
##
## $Six
## [1] -1.405989
##
## $Seven
## [1] -0.8816675
##
## $Eight
## [1] -1.387662
##
## $Nine
## [1] -1.117693
##
## $Ten
## [1] -1.294795
##
## $Eleven
## [1] -1.582273
##
## $Twelve
## [1] -1.196617
##
## $Thirteen
## [1] -1.403634
##
## $Fourteen
## [1] -1.514912
##
## $Fifteen
## [1] -1.311577
##
## $Sixteen
## [1] -1.485145
##
## $Seventeen
## [1] -1.485481
##
## $Eighteen
## [1] -1.759675
##
## $Nineteen
## [1] -1.686639
##
## $Twenty
## [1] -1.780721
##
## $Twenty.one
## [1] -1.612736
##
## $twenty.two
## [1] -1.858309
##
## $Twenty.three
## [1] -1.54983
Let us look at a joy (ridgeline) plot, which shows the peaks at 4 and 5, corresponding to very satisfied (5) and satisfied (4) patients.
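The plotting code is not shown in the rendered output; a minimal sketch, assuming the same gather() reshaping as above and ggjoy's geom_joy(), would be:
jhar %>% dplyr::select(one:Twenty.three) %>% gather(key = "Question", value = "Score") %>% ggplot(aes(x = Score, y = Question, fill = Question)) + geom_joy(show.legend = FALSE) + labs(x = "Score", y = "Question")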
## Picking joint bandwidth of 0.182
Let us look at the percentage of neutral/dissatisfied responses (score < 4) across the questions.
jhar %>% dplyr::select(one:Twenty.three) %>% gather(key = "Question", value = "Score") %>% group_by(Question) %>% summarise(Neutral_dissatisfied_percent = mean(Score < 4)) %>% arrange(desc(Neutral_dissatisfied_percent))
## # A tibble: 23 x 2
## Question Neutral_dissatisfied_percent
## <chr> <dbl>
## 1 Nine 0.21393035
## 2 Twelve 0.20646766
## 3 Seven 0.17661692
## 4 Fifteen 0.13930348
## 5 Six 0.13930348
## 6 Eight 0.11691542
## 7 Thirteen 0.08208955
## 8 Twenty.one 0.06965174
## 9 Twenty 0.06467662
## 10 Five 0.05721393
## # ... with 13 more rows
So questions 6, 7, 8, 9, 12 and 15 received neutral/dissatisfied responses from more than 10% of patients, and we need to improve on these parameters.
Let us see which questions have the highest percentage of very satisfied (5) scores.
jhar %>% dplyr::select(one:Twenty.three) %>% gather(key = "Question", value = "Score") %>% group_by(Question) %>% summarise(very_satisfied_percent = mean(Score == 5)) %>% arrange(desc(very_satisfied_percent))
## # A tibble: 23 x 2
## Question very_satisfied_percent
## <chr> <dbl>
## 1 one 0.7263682
## 2 Four 0.6741294
## 3 Nineteen 0.6368159
## 4 twenty.two 0.6318408
## 5 Three 0.6218905
## 6 Two 0.6218905
## 7 Twenty.three 0.5995025
## 8 Twenty.one 0.5945274
## 9 Ten 0.5920398
## 10 Eighteen 0.5895522
## # ... with 13 more rows
So we are doing well on questions 1-4, 10, 18, 19 and 21-23, and we need to maintain our performance in these areas.
The same items can also be summarised in a Likert plot:
jhar6 %>% dplyr::select(one:Twenty.three) %>% sjp.likert(values = "hide")
Let us look at the distribution of scores across the domains. First, the visualisation.
jhar %>% dplyr::select(Accessibility, Quality, Physical_Environment, Interpersonal, Availability, MeanScore) %>% rename(Interpersonal_Skill = Interpersonal, Total_AverageScore = MeanScore) %>% gather("Parameter", "Score", 1:6) %>% ggplot(aes(x = Score, y = as.factor(Parameter), fill = Parameter)) + geom_joy() +
  labs(x = "Score", y = "Domains")
## Picking joint bandwidth of 0.147
Now for the statistics.
jhar %>% dplyr::select(Accessibility,Quality,Physical_Environment,Interpersonal,Availability,MeanScore) %>% summary()
## Accessibility Quality Physical_Environment Interpersonal
## Min. :1.200 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:4.000 1st Qu.:4.250 1st Qu.:4.000 1st Qu.:4.250
## Median :4.400 Median :4.625 Median :4.500 Median :4.750
## Mean :4.298 Mean :4.537 Mean :4.402 Mean :4.614
## 3rd Qu.:4.800 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:5.000
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.000
## Availability MeanScore
## Min. :1.000 Min. :1.043
## 1st Qu.:4.000 1st Qu.:4.130
## Median :4.500 Median :4.543
## Mean :4.368 Mean :4.464
## 3rd Qu.:5.000 3rd Qu.:4.870
## Max. :5.000 Max. :5.043
We need an ANOVA to test whether the domain scores differ, followed by Tukey's post-hoc correction to examine the intergroup differences.
df = jhar %>% dplyr::select(Accessibility, Quality, Physical_Environment, Interpersonal, Availability) %>% gather(key = "Domains", value = "Score")
fitaov = aov(Score ~ Domains, data = df)
summary(fitaov)
## Df Sum Sq Mean Sq F value Pr(>F)
## Domains 4 26.6 6.661 21.05 <2e-16 ***
## Residuals 2005 634.4 0.316
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
So the ANOVA shows a significant difference across domain scores (p < 0.00001). Now we determine the pairwise differences post hoc, adjusting for multiple comparisons with Tukey's method.
TukeyHSD(fitaov)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Score ~ Domains, data = df)
##
## $Domains
## diff lwr upr
## Availability-Accessibility 0.07014925 -0.038178796 0.17847730
## Interpersonal-Accessibility 0.31579602 0.207467971 0.42412407
## Physical_Environment-Accessibility 0.10435323 -0.003974815 0.21268128
## Quality-Accessibility 0.23899254 0.130664488 0.34732059
## Interpersonal-Availability 0.24564677 0.137318717 0.35397482
## Physical_Environment-Availability 0.03420398 -0.074124069 0.14253203
## Quality-Availability 0.16884328 0.060515234 0.27717133
## Physical_Environment-Interpersonal -0.21144279 -0.319770835 -0.10311474
## Quality-Interpersonal -0.07680348 -0.185131532 0.03152457
## Quality-Physical_Environment 0.13463930 0.026311254 0.24296735
## p adj
## Availability-Accessibility 0.3926757
## Interpersonal-Accessibility 0.0000000
## Physical_Environment-Accessibility 0.0653889
## Quality-Accessibility 0.0000000
## Interpersonal-Availability 0.0000000
## Physical_Environment-Availability 0.9106676
## Quality-Availability 0.0002115
## Physical_Environment-Interpersonal 0.0000011
## Quality-Interpersonal 0.2986533
## Quality-Physical_Environment 0.0063212
We see that the Quality and Interpersonal Skills domains have significantly higher scores than the Availability, Accessibility and Physical Environment domains, though there is no statistically significant difference between Quality and Interpersonal Skills.
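A quick way to visualise these pairwise comparisons (not shown in the original output) is base R's plot method for TukeyHSD objects:
# 95% family-wise confidence intervals for each pairwise domain difference
plot(TukeyHSD(fitaov), las = 1)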
library(Hmisc)
# Convert the square correlation matrix (cormat) and p-value matrix (pmat) returned by
# rcorr() into a long data frame with one row per variable pair
flattenCorrMatrix <- function(cormat, pmat) {
  ut <- upper.tri(cormat)
  data.frame(
    row = rownames(cormat)[row(cormat)[ut]],
    column = rownames(cormat)[col(cormat)[ut]],
    cor = cormat[ut],
    p = pmat[ut]
  )
}
df2 = jhar %>% dplyr::select(Age, Income, Occupation, Education, Accessibility, Quality, Physical_Environment, Interpersonal, Availability)
res2 <- rcorr(as.matrix(df2[, 1:9]))
flattenCorrMatrix(res2$r, round(res2$P, 3)) %>% arrange(desc(p))
## row column cor p
## 1 Age Income -0.01599744 0.749
## 2 Income Education 0.02827840 0.572
## 3 Age Accessibility -0.03505922 0.483
## 4 Income Occupation -0.05451126 0.276
## 5 Occupation Accessibility 0.06554248 0.190
## 6 Occupation Interpersonal 0.06619538 0.185
## 7 Age Physical_Environment -0.07169959 0.151
## 8 Age Occupation -0.08336982 0.095
## 9 Occupation Quality 0.09402740 0.060
## 10 Occupation Availability 0.09441951 0.059
## 11 Age Interpersonal -0.09851158 0.048
## 12 Age Quality -0.11185898 0.025
## 13 Education Interpersonal 0.11821669 0.018
## 14 Occupation Physical_Environment 0.12015326 0.016
## 15 Income Physical_Environment 0.13437633 0.007
## 16 Education Accessibility 0.13577312 0.006
## 17 Income Quality 0.14421451 0.004
## 18 Income Interpersonal 0.14993410 0.003
## 19 Age Availability -0.14545915 0.003
## 20 Age Education -0.43327329 0.000
## 21 Occupation Education 0.38212746 0.000
## 22 Income Accessibility 0.19889995 0.000
## 23 Education Quality 0.18045050 0.000
## 24 Accessibility Quality 0.67797613 0.000
## 25 Education Physical_Environment 0.19200496 0.000
## 26 Accessibility Physical_Environment 0.70637292 0.000
## 27 Quality Physical_Environment 0.80188841 0.000
## 28 Accessibility Interpersonal 0.65991789 0.000
## 29 Quality Interpersonal 0.75494307 0.000
## 30 Physical_Environment Interpersonal 0.70554692 0.000
## 31 Income Availability 0.20687886 0.000
## 32 Education Availability 0.18588264 0.000
## 33 Accessibility Availability 0.65701002 0.000
## 34 Quality Availability 0.73149651 0.000
## 35 Physical_Environment Availability 0.72396451 0.000
## 36 Interpersonal Availability 0.64695626 0.000
We see that, on average, higher education, income and occupation category are associated with higher domain scores, while higher age is associated with slightly lower domain scores.
Let us plot a correlogram.
library(corrplot)
## corrplot 0.84 loaded
df3 = jhar %>% dplyr::select(Age,Income,Occupation,Education,one:Twenty.three)
M<-cor(df3)
head(round(M,2))
## Age Income Occupation Education one Two Three Four Five
## Age 1.00 -0.02 -0.08 -0.43 -0.08 -0.06 -0.08 -0.09 -0.16
## Income -0.02 1.00 -0.05 0.03 0.21 0.09 0.11 0.07 0.14
## Occupation -0.08 -0.05 1.00 0.38 0.03 0.03 0.06 0.09 0.05
## Education -0.43 0.03 0.38 1.00 0.07 0.08 0.05 0.18 0.15
## one -0.08 0.21 0.03 0.07 1.00 0.43 0.55 0.37 0.41
## Two -0.06 0.09 0.03 0.08 0.43 1.00 0.50 0.56 0.44
## Six Seven Eight Nine Ten Eleven Twelve Thirteen Fourteen
## Age 0.00 0.04 -0.02 0.00 0.02 -0.02 -0.10 -0.09 -0.12
## Income 0.11 0.24 0.19 0.01 0.09 0.12 0.07 0.16 0.14
## Occupation 0.04 0.02 0.09 0.02 0.09 0.07 0.09 0.14 0.11
## Education 0.13 0.06 0.05 0.07 0.15 0.14 0.17 0.15 0.19
## one 0.39 0.41 0.36 0.14 0.31 0.39 0.39 0.47 0.47
## Two 0.47 0.40 0.32 0.16 0.37 0.48 0.49 0.48 0.47
## Fifteen Sixteen Seventeen Eighteen Nineteen Twenty Twenty.one
## Age -0.13 -0.07 -0.10 -0.12 -0.06 -0.10 -0.12
## Income 0.21 0.13 0.12 0.11 0.08 0.14 0.13
## Occupation 0.06 0.07 0.10 0.05 0.04 0.09 0.12
## Education 0.13 0.12 0.11 0.17 0.09 0.15 0.16
## one 0.43 0.47 0.43 0.52 0.40 0.43 0.42
## Two 0.42 0.43 0.40 0.43 0.44 0.44 0.52
## twenty.two Twenty.three
## Age -0.06 -0.07
## Income 0.07 0.12
## Occupation 0.08 0.04
## Education 0.18 0.14
## one 0.39 0.41
## Two 0.48 0.52
corrplot(M, method="number")
# Compute a matrix of p-values for all pairwise correlations in `mat`;
# extra arguments are passed on to cor.test()
cor.mtest <- function(mat, ...) {
  mat <- as.matrix(mat)
  n <- ncol(mat)
  p.mat <- matrix(NA, n, n)
  diag(p.mat) <- 0
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      tmp <- cor.test(mat[, i], mat[, j], ...)
      p.mat[i, j] <- p.mat[j, i] <- tmp$p.value
    }
  }
  colnames(p.mat) <- rownames(p.mat) <- colnames(mat)
  p.mat
}
# matrix of the p-value of the correlation
p.mat <- cor.mtest(df3)
corrplot(M, type="upper", order="hclust",
p.mat = p.mat, sig.level = 0.01, insig = "blank")
Having visualised the correlation plots and the predictors behind them, let us turn to the factors affecting the mean score.
library(arm)
fit = lm(MeanScore ~ Interpersonal + Accessibility + Physical_Environment + Availability + Quality + Sex + Age + Income + Education, data = jhar6)
summary(fit)
##
## Call:
## lm(formula = MeanScore ~ Interpersonal + Accessibility + Physical_Environment +
## Availability + Quality + Sex + Age + Income + Education,
## data = jhar6)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.28504 -0.02039 0.00438 0.01799 1.23743
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.350e-01 3.953e-02 8.474 4.86e-16 ***
## Interpersonal 1.230e-01 1.211e-02 10.163 < 2e-16 ***
## Accessibility 1.997e-01 1.011e-02 19.745 < 2e-16 ***
## Physical_Environment 2.232e-01 1.074e-02 20.779 < 2e-16 ***
## Availability 8.339e-02 8.786e-03 9.492 < 2e-16 ***
## Quality 2.946e-01 1.372e-02 21.479 < 2e-16 ***
## SexM 3.087e-03 7.491e-03 0.412 0.680
## Age -1.172e-06 2.517e-04 -0.005 0.996
## Income 3.803e-03 3.543e-03 1.073 0.284
## Education 3.261e-03 2.442e-03 1.336 0.182
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.07045 on 392 degrees of freedom
## Multiple R-squared: 0.9768, Adjusted R-squared: 0.9762
## F-statistic: 1830 on 9 and 392 DF, p-value: < 2.2e-16
We see that, after controlling for the domain scores, sex, age, income and education do not significantly affect the mean score.
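One way to check this jointly rather than coefficient by coefficient is a nested-model F-test; a quick sketch, not part of the original analysis:
# Refit without the demographic terms and compare against the full model
fit_domains = update(fit, . ~ . - Sex - Age - Income - Education)
anova(fit_domains, fit)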
A cleaner output using arm's display():
display(fit)
## lm(formula = MeanScore ~ Interpersonal + Accessibility + Physical_Environment +
## Availability + Quality + Sex + Age + Income + Education,
## data = jhar6)
## coef.est coef.se
## (Intercept) 0.33 0.04
## Interpersonal 0.12 0.01
## Accessibility 0.20 0.01
## Physical_Environment 0.22 0.01
## Availability 0.08 0.01
## Quality 0.29 0.01
## SexM 0.00 0.01
## Age 0.00 0.00
## Income 0.00 0.00
## Education 0.00 0.00
## ---
## n = 402, k = 10
## residual sd = 0.07, R-Squared = 0.98
This implies that a one-point improvement in Quality corresponds to a 0.29-point improvement in the mean score, holding the other variables constant. The corresponding values for the other domains are 0.22 for Physical Environment, 0.20 for Accessibility, 0.12 for Interpersonal Skills and 0.08 for Availability; the remaining variables are not significant. In other words, Quality and Physical Environment play the largest role in driving the average score in our study.
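To make this concrete, here is a hypothetical illustration (the newdat profiles and the Sex = "F" baseline are assumptions for the sketch): predict the mean score for two respondents who differ only by one point in Quality and take the difference.
# Two hypothetical profiles, identical except for a one-point difference in Quality
newdat = data.frame(Interpersonal = mean(jhar6$Interpersonal), Accessibility = mean(jhar6$Accessibility),
                    Physical_Environment = mean(jhar6$Physical_Environment), Availability = mean(jhar6$Availability),
                    Quality = c(4, 5), Sex = "F", Age = mean(jhar6$Age),
                    Income = mean(jhar6$Income), Education = mean(jhar6$Education))
diff(predict(fit, newdata = newdat))   # equals the Quality coefficient, ~0.29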
Let us visualise the linear regression as a forest plot to emphasise this impression.
sjp.lm(fit)
## Warning: 'sjstats::get_model_pval' is deprecated.
## Use 'p_value' instead.
## See help("Deprecated")
Since this questionnaire was adapted from a score used in a Thai outpatient setting, one of our goals is to evaluate its construct validity and whether the items separate into the five proposed domains.
library(psych)
##
## Attaching package: 'psych'
## The following objects are masked from 'package:arm':
##
## logit, rescale, sim
## The following object is masked from 'package:sjstats':
##
## phi
## The following object is masked from 'package:Hmisc':
##
## describe
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
library(GPArotation)
parallel <- fa.parallel(jharx, fm = 'minres', fa = 'fa')
## Parallel analysis suggests that the number of factors = 5 and the number of components = NA
The blue line shows the eigenvalues of the actual data, and the two red lines (placed on top of each other) show the simulated and resampled data. We look for the large drops in the actual data and spot the point where it levels off to the right, and we locate the point of inflection, where the gap between the simulated and actual data is smallest.
Looking at this plot, anywhere between one and three factors would be a reasonable choice rather than the five proposed in the original survey, although the formal parallel-analysis criterion itself suggests five.
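The suggested number of factors and the eigenvalues behind the plot can also be read off the object returned by fa.parallel():
parallel$fa.values   # eigenvalues of the observed data (factor solution)
parallel$nfact       # number of factors suggested by the parallel analysis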
In this case, we will select oblique rotation (rotate = "oblimin"), as we believe the factors are correlated; varimax rotation assumes the factors are completely uncorrelated. We will use ordinary least squares/minres factoring (fm = "minres"), as it is known to provide results similar to maximum likelihood without assuming a multivariate normal distribution, and it derives solutions through iterative eigendecomposition like principal axis factoring.
fivefactor <- fa(jharx,nfactors = 5,rotate = "oblimin",fm="minres")
print(fivefactor)
sixfactor <- fa(jharx,nfactors = 6,rotate = "oblimin",fm="minres")
print(sixfactor)
print(sixfactor$loadings,cutoff = 0.3)
jharm = as.matrix(jharx)
cortest.bartlett(jharx)
## R was not square, finding R from data
## $chisq
## [1] 5295.678
##
## $p.value
## [1] 0
##
## $df
## [1] 276
For these data, Bartlett's test is highly significant, χ2(276) = 5295.68, p < .00001, and therefore factor analysis is appropriate.
km = kmo(jharx)   # kmo() is a helper function (not defined in this document) returning the overall KMO, a verbal report and per-item values
list(km$overall, km$report, km$individual)
## [[1]]
## [1] 0.954155
##
## [[2]]
## [1] "The KMO test yields a degree of common variance marvelous."
##
## [[3]]
## X one Two Three Four
## 0.8271621 0.9600921 0.9688859 0.9619139 0.9619071
## Five Six Seven Eight Nine
## 0.9590314 0.9383805 0.9439488 0.9319054 0.8323736
## Ten Eleven Twelve Thirteen Fourteen
## 0.9399816 0.9629930 0.9597399 0.9632573 0.9713668
## Fifteen Sixteen Seventeen Eighteen Nineteen
## 0.9661195 0.9553893 0.9499095 0.9450933 0.9510554
## Twenty Twenty.one twenty.two Twenty.three
## 0.9552838 0.9545005 0.9706909 0.9572043
So both the KMO measure (0.95, "marvelous") and Bartlett's test indicate that the data are suitable for factor analysis.
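If the kmo() helper is not available, psych's built-in KMO() reports the same kind of statistics; a minimal equivalent would be:
KMO(jharx)$MSA    # overall measure of sampling adequacy (the 'overall' value above)
KMO(jharx)$MSAi   # per-item measures (the 'individual' values above)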
pc2 <- principal(jharx, nfactors=length(jharx), rotate="none")
pc2
pc3 <- principal(jharx, nfactors=5, rotate="oblimin")
pc3
## Principal Components Analysis
## Call: principal(r = jharx, nfactors = 5, rotate = "oblimin")
## Standardized loadings (pattern matrix) based upon correlation matrix
## TC1 TC5 TC3 TC4 TC2 h2 u2 com
## X -0.04 0.22 0.28 0.12 0.68 0.66 0.34 1.6
## one 0.61 0.03 -0.12 0.37 0.23 0.63 0.37 2.1
## Two 0.09 0.65 0.03 0.08 -0.03 0.55 0.45 1.1
## Three 0.40 0.21 0.06 0.35 -0.01 0.57 0.43 2.6
## Four 0.32 0.40 0.00 0.04 -0.21 0.50 0.50 2.5
## Five 0.36 0.42 -0.02 0.01 0.09 0.48 0.52 2.1
## Six -0.02 0.82 -0.22 0.14 0.09 0.67 0.33 1.2
## Seven -0.04 0.46 0.05 0.56 0.12 0.67 0.33 2.1
## Eight 0.28 -0.10 0.27 0.58 -0.16 0.63 0.37 2.2
## Nine -0.09 -0.16 0.88 0.09 0.08 0.72 0.28 1.1
## Ten 0.07 0.36 0.11 0.28 -0.55 0.66 0.34 2.4
## Eleven 0.04 0.60 0.19 0.12 -0.36 0.69 0.31 2.0
## Twelve 0.13 0.43 0.50 -0.01 -0.02 0.69 0.31 2.1
## Thirteen 0.39 0.30 0.37 -0.04 0.04 0.66 0.34 2.9
## Fourteen 0.36 0.39 0.10 0.11 0.01 0.57 0.43 2.3
## Fifteen 0.35 0.29 0.23 0.06 0.28 0.56 0.44 3.8
## Sixteen 0.69 -0.02 0.13 0.19 -0.03 0.64 0.36 1.2
## Seventeen 0.78 -0.06 0.11 0.04 -0.13 0.68 0.32 1.1
## Eighteen 0.90 0.01 -0.11 -0.05 0.07 0.74 0.26 1.1
## Nineteen 0.76 0.09 -0.07 -0.01 -0.18 0.69 0.31 1.2
## Twenty 0.49 0.27 0.30 -0.14 -0.06 0.67 0.33 2.6
## Twenty.one 0.40 0.49 0.15 -0.18 -0.04 0.69 0.31 2.5
## twenty.two 0.34 0.37 0.33 -0.22 -0.01 0.60 0.40 3.6
## Twenty.three 0.18 0.69 0.07 -0.08 0.10 0.67 0.33 1.2
##
## TC1 TC5 TC3 TC4 TC2
## SS loadings 5.51 4.87 2.21 1.47 1.22
## Proportion Var 0.23 0.20 0.09 0.06 0.05
## Cumulative Var 0.23 0.43 0.52 0.59 0.64
## Proportion Explained 0.36 0.32 0.14 0.10 0.08
## Cumulative Proportion 0.36 0.68 0.82 0.92 1.00
##
## With component correlations of
## TC1 TC5 TC3 TC4 TC2
## TC1 1.00 0.60 0.35 0.25 -0.12
## TC5 0.60 1.00 0.31 0.27 -0.01
## TC3 0.35 0.31 1.00 0.18 0.00
## TC4 0.25 0.27 0.18 1.00 0.02
## TC2 -0.12 -0.01 0.00 0.02 1.00
##
## Mean item complexity = 2
## Test of the hypothesis that 5 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0.05
## with the empirical chi square 526.17 with prob < 4.7e-39
##
## Fit based upon off diagonal values = 0.99
print.psych(pc3, cut = 0.3, sort = TRUE)
## Principal Components Analysis
## Call: principal(r = jharx, nfactors = 5, rotate = "oblimin")
## Standardized loadings (pattern matrix) based upon correlation matrix
## item TC1 TC5 TC3 TC4 TC2 h2 u2 com
## Eighteen 19 0.90 0.74 0.26 1.1
## Seventeen 18 0.78 0.68 0.32 1.1
## Nineteen 20 0.76 0.69 0.31 1.2
## Sixteen 17 0.69 0.64 0.36 1.2
## one 2 0.61 0.37 0.63 0.37 2.1
## Twenty 21 0.49 0.67 0.33 2.6
## Three 4 0.40 0.35 0.57 0.43 2.6
## Thirteen 14 0.39 0.30 0.37 0.66 0.34 2.9
## Fifteen 16 0.35 0.56 0.44 3.8
## Six 7 0.82 0.67 0.33 1.2
## Twenty.three 24 0.69 0.67 0.33 1.2
## Two 3 0.65 0.55 0.45 1.1
## Eleven 12 0.60 -0.36 0.69 0.31 2.0
## Twenty.one 22 0.40 0.49 0.69 0.31 2.5
## Five 6 0.36 0.42 0.48 0.52 2.1
## Four 5 0.32 0.40 0.50 0.50 2.5
## Fourteen 15 0.36 0.39 0.57 0.43 2.3
## twenty.two 23 0.34 0.37 0.33 0.60 0.40 3.6
## Nine 10 0.88 0.72 0.28 1.1
## Twelve 13 0.43 0.50 0.69 0.31 2.1
## Eight 9 0.58 0.63 0.37 2.2
## Seven 8 0.46 0.56 0.67 0.33 2.1
## X 1 0.68 0.66 0.34 1.6
## Ten 11 0.36 -0.55 0.66 0.34 2.4
##
## TC1 TC5 TC3 TC4 TC2
## SS loadings 5.51 4.87 2.21 1.47 1.22
## Proportion Var 0.23 0.20 0.09 0.06 0.05
## Cumulative Var 0.23 0.43 0.52 0.59 0.64
## Proportion Explained 0.36 0.32 0.14 0.10 0.08
## Cumulative Proportion 0.36 0.68 0.82 0.92 1.00
##
## With component correlations of
## TC1 TC5 TC3 TC4 TC2
## TC1 1.00 0.60 0.35 0.25 -0.12
## TC5 0.60 1.00 0.31 0.27 -0.01
## TC3 0.35 0.31 1.00 0.18 0.00
## TC4 0.25 0.27 0.18 1.00 0.02
## TC2 -0.12 -0.01 0.00 0.02 1.00
##
## Mean item complexity = 2
## Test of the hypothesis that 5 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0.05
## with the empirical chi square 526.17 with prob < 4.7e-39
##
## Fit based upon off diagonal values = 0.99
A principal components analysis (PCA) was conducted on the items with oblique rotation (oblimin). The Kaiser-Meyer-Olkin measure verified the sampling adequacy for the analysis, KMO = .95 ("marvelous" according to Kaiser, 1974), and all KMO values for individual items were > .82, well above the acceptable limit of .5. Bartlett's test of sphericity, χ2(276) = 5295.68, p < .001, indicated that correlations between items were sufficiently large for PCA. An initial analysis was run to obtain eigenvalues for each component in the data. The scree plot was slightly ambiguous and on its own would justify retaining fewer components, but given the parallel analysis and the five-domain structure of the original questionnaire, five components were retained in the final analysis; together they explained 64% of the variance. The output above shows the factor loadings after rotation. The items that cluster on the same components suggest that component 1 represents Quality of care, component 2 accessibility and component 3 the physical environment; the other domains are less clearly marked, and there is correlation and cross-talk between questions across domains.
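To see these clusters at a glance, psych's fa.diagram() can be applied to the pc3 solution (not shown in the original output):
# Path diagram of items on components, hiding loadings below 0.3
fa.diagram(pc3, cut = 0.3, simple = FALSE)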
Let us calculate Cronbach's alpha for each subscale.
Interpersonal = jhar %>% dplyr::select(one:Four)
Accessibilty = jhar %>% dplyr::select(Five:Nine)
physical_Environment = jhar %>% dplyr::select(Ten:Thirteen)
Availability = jhar %>% dplyr::select(Fourteen:Fifteen)
Quality = jhar %>% dplyr::select(Sixteen:Twenty.three)
Now let us run the Cronbach's alpha analysis for each subscale.
keys = c(1, 1, 1, 1, 1, 1, 1)   # all items are positively keyed; this vector is not actually used below
summary(alpha(Interpersonal))$raw_alpha
##
## Reliability analysis
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd
## 0.79 0.79 0.75 0.48 3.8 0.017 4.6 0.48
## [1] 0.7902073
summary(alpha(Accessibilty))$raw_alpha
##
## Reliability analysis
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd
## 0.65 0.68 0.66 0.3 2.1 0.028 4.3 0.54
## [1] 0.6545357
summary(alpha(physical_Environment))$raw_alpha
##
## Reliability analysis
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd
## 0.79 0.81 0.77 0.51 4.2 0.016 4.4 0.62
## [1] 0.7851515
summary(alpha(Availability))$raw_alpha
##
## Reliability analysis
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd
## 0.65 0.66 0.49 0.49 1.9 0.034 4.4 0.65
## [1] 0.6454973
summary(alpha(Quality))$raw_alpha
##
## Reliability analysis
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd
## 0.91 0.91 0.9 0.55 9.8 0.0069 4.5 0.5
## [1] 0.9077085
The (standardised) Cronbach's alpha values for the Interpersonal, Accessibility, Physical Environment, Availability and Quality subscales are 0.79, 0.68, 0.81, 0.66 and 0.91 respectively. Except for the Accessibility and Availability subscales, which fall below the recommended limit of 0.7, the subscales show good internal consistency; the Accessibility and Availability items need to be worded more precisely to improve their reliability.
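The code behind the summary table below is not shown; it was presumably assembled along these lines (the test-retest reliabilities come from a repeat administration that is not part of this output):
data.frame(Domains = c("Interpersonal", "Accessibilty", "physical_Environment", "Availability", "Quality"),
           test_retest_reliability = c(0.74, 0.60, 0.78, 0.58, 0.82),
           cronbach_alpha = c(0.79, 0.68, 0.81, 0.66, 0.91))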
## Domains test_retest_reliability cronbach_alpha
## 1 Interpersonal 0.74 0.79
## 2 Accessibilty 0.60 0.68
## 3 physical_Environment 0.78 0.81
## 4 Availability 0.58 0.66
## 5 Quality 0.82 0.91