## Warning: Missing column names filled in: 'X15' [15], 'X16' [16],
## 'X17' [17], 'X19' [19], 'X20' [20]
## Parsed with column specification:
## cols(
## .default = col_double(),
## aluno = col_character(),
## X15 = col_character(),
## X17 = col_logical(),
## X19 = col_logical(),
## X20 = col_character()
## )
## See spec(...) for full column specifications.
The dataset contains a total of 11 variables, all of them binary.
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 0.7434 0.6456 0.5568 0.4982 0.47040 0.45063 0.4129
## Proportion of Variance 0.2236 0.1687 0.1255 0.1005 0.08955 0.08218 0.0690
## Cumulative Proportion 0.2236 0.3923 0.5178 0.6183 0.70783 0.79001 0.8590
## PC8 PC9 PC10 PC11 PC12 PC13
## Standard deviation 0.37029 0.27778 0.2501 0.22610 0.14307 8.828e-17
## Proportion of Variance 0.05549 0.03123 0.0253 0.02069 0.00828 0.000e+00
## Cumulative Proportion 0.91450 0.94572 0.9710 0.99172 1.00000 1.000e+00
The samples are separated into 2 well-defined groups. PC1 explains about 22% of the variance and PC2 about 17%, so the first two components together account for roughly 39%. Both values are quite low.
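The proportions quoted for PC1 and PC2 can be recomputed from the standard deviations in the "Importance of components" table: each component's variance is its squared standard deviation, divided by the total. This is a minimal sketch; the variable names (`sdev`, `prop_var`, `cum_var`) are illustrative and the values are copied from the table, so small rounding differences are expected.

```r
# Standard deviations copied from the "Importance of components" table.
sdev <- c(0.7434, 0.6456, 0.5568, 0.4982, 0.47040, 0.45063, 0.4129,
          0.37029, 0.27778, 0.2501, 0.22610, 0.14307, 8.828e-17)

# The variance of each component is its squared standard deviation.
variance <- sdev^2

# Share of the total variance per component, and the running total.
prop_var <- variance / sum(variance)
cum_var  <- cumsum(prop_var)

# Proportions for PC1 and PC2, and their combined share.
print(round(prop_var[1:2], 4))
print(round(cum_var[2], 4))
```

With a fitted `prcomp` object, the same quantities come from its `sdev` element, which is what `summary()` reports.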
First, let's try to identify a good number of clusters.
k = 4 looks like a good choice.
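One common way to pick k is the elbow heuristic: run k-means for several values of k and look for the point where the total within-cluster sum of squares stops dropping quickly. The sketch below assumes that approach; the report does not show how k = 4 was actually chosen. The `toy` matrix is simulated stand-in data (112 rows and 13 binary columns, mirroring the sizes seen in the outputs), not the real dataset.

```r
set.seed(42)  # make the simulated data and the k-means starts reproducible

# Hypothetical stand-in for the real data: 112 rows, 13 binary (0/1) columns.
toy <- matrix(rbinom(112 * 13, size = 1, prob = 0.5), nrow = 112)

# Total within-cluster sum of squares for k = 1..8.
ks  <- 1:8
wss <- sapply(ks, function(k) {
  kmeans(toy, centers = k, nstart = 20)$tot.withinss
})

# Inspect the curve: the "elbow" is where extra clusters stop helping much.
print(data.frame(k = ks, wss = wss))
```

Plotting `wss` against `ks` (for example with `plot(ks, wss, type = "b")`) usually makes the elbow easier to see than the raw numbers.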
## Joining, by = c("X.1", "aluno")
Now we want to check whether the clusters produced by k-means form a coherent pattern when plotted on the PCA projection.
The points did not end up close to the other members of their k-means cluster…
[I don't know how to explain this one]
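One possible reason, offered as an assumption rather than something the report establishes, is that k-means groups points using all variables, while the PCA plot shows only two components that retain about 39% of the variance. Clusters that are distinct in the full space can overlap in that projection. The sketch below uses simulated stand-in data (`toy` is hypothetical) to show how the cluster labels and the share of variance visible in 2D can be compared.

```r
set.seed(42)

# Hypothetical stand-in data: 112 rows, 13 binary (0/1) columns.
toy <- matrix(rbinom(112 * 13, size = 1, prob = 0.5), nrow = 112)

# Cluster in the full 13-dimensional space.
clusters <- kmeans(toy, centers = 4, nstart = 20)$cluster

# Project the same rows onto the first two principal components.
pca    <- prcomp(toy)
scores <- as.data.frame(pca$x[, 1:2])
scores$cluster <- factor(clusters)

# Share of the total variance that the 2D plot can actually show.
var_2d <- sum(pca$sdev[1:2]^2) / sum(pca$sdev^2)

# Centroid of each cluster in the PC1/PC2 plane.
centroids <- aggregate(cbind(PC1, PC2) ~ cluster, data = scores, FUN = mean)

print(var_2d)
print(centroids)
```

If `var_2d` is small and the centroids sit close together, overlapping clusters in the plot do not necessarily mean the clustering is poor in the original space.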
##
## Call: glm(formula = X.1 ~ hasTests + hasDoc + hasController + hasFacade +
## useInterface + useInheritance + equals + hashCode + useException +
## useAbstractClass + usedHashMap + usedHashSet + usedArrayList,
## data = provas)
##
## Coefficients:
## (Intercept) hasTests hasDoc hasController
## 4.015307 3.009464 0.207925 0.005925
## hasFacade useInterface useInheritance equals
## -0.586249 1.501690 0.031041 0.141836
## hashCode useException useAbstractClass usedHashMap
## NA 0.319071 1.007408 0.305133
## usedHashSet usedArrayList
## -0.828431 0.204504
##
## Degrees of Freedom: 111 Total (i.e. Null); 99 Residual
## Null Deviance: 557.7
## Residual Deviance: 214.6 AIC: 418.7
##
## Call:
## glm(formula = X.1 ~ hasTests + hasDoc + hasController + hasFacade +
## useInterface + useInheritance + equals + hashCode + useException +
## useAbstractClass + usedHashMap + usedHashSet + usedArrayList,
## data = provas)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.6880 -0.8456 0.2750 0.9234 3.3355
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.015307 0.676163 5.938 4.28e-08 ***
## hasTests 3.009464 0.325491 9.246 4.83e-15 ***
## hasDoc 0.207925 0.616683 0.337 0.7367
## hasController 0.005925 0.299420 0.020 0.9843
## hasFacade -0.586249 0.306974 -1.910 0.0591 .
## useInterface 1.501690 0.327877 4.580 1.36e-05 ***
## useInheritance 0.031041 0.381990 0.081 0.9354
## equals 0.141836 0.297476 0.477 0.6346
## hashCode NA NA NA NA
## useException 0.319071 0.314363 1.015 0.3126
## useAbstractClass 1.007408 0.452454 2.227 0.0282 *
## usedHashMap 0.305133 0.570824 0.535 0.5942
## usedHashSet -0.828431 0.844261 -0.981 0.3289
## usedArrayList 0.204504 0.506349 0.404 0.6872
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 2.167412)
##
## Null deviance: 557.68 on 111 degrees of freedom
## Residual deviance: 214.57 on 99 degrees of freedom
## AIC: 418.66
##
## Number of Fisher Scoring iterations: 2
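The `NA` row for `hashCode` and the note "1 not defined because of singularities" mean that this predictor is an exact linear combination of other predictors, so its coefficient cannot be estimated separately. The near-zero standard deviation of PC13 is consistent with such a dependency. The sketch below reproduces the effect on simulated data, assuming for illustration only that `hashCode` is an exact copy of `equals`; the real cause in the dataset is not shown in the report.

```r
set.seed(1)
n <- 112

# Hypothetical data: hashCode duplicates equals exactly (an assumption
# chosen only to produce a singular design matrix).
toy <- data.frame(hasTests = rbinom(n, 1, 0.5),
                  equals   = rbinom(n, 1, 0.5))
toy$hashCode <- toy$equals
toy$score    <- 4 + 3 * toy$hasTests + rnorm(n)

# Gaussian GLM (the default family), as in the call shown in the output.
fit <- glm(score ~ hasTests + equals + hashCode, data = toy)

# The redundant predictor is dropped and reported as NA.
print(coef(fit))
aliased <- names(which(is.na(coef(fit))))
print(aliased)
```

Dropping the redundant column from the formula gives the same fit without the `NA` row.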
## Waiting for profiling to be done...
## 2.5 % 97.5 %
## (Intercept) 2.6900518 5.34056247
## hasTests 2.3715136 3.64741445
## hasDoc -1.0007517 1.41660206
## hasController -0.5809281 0.59277816
## hasFacade -1.1879056 0.01540852
## useInterface 0.8590636 2.14431596
## useInheritance -0.7176455 0.77972734
## equals -0.4412062 0.72487861
## hashCode NA NA
## useException -0.2970691 0.93521078
## useAbstractClass 0.1206139 1.89420281
## usedHashMap -0.8136607 1.42392635
## usedHashSet -2.4831530 0.82629041
## usedArrayList -0.7879211 1.19692877
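The limits in this table are consistent with a normal approximation: estimate plus or minus about 1.96 standard errors. As a worked check, the sketch below recomputes the `hasTests` interval from the estimate and standard error reported in the coefficient table. The variable names are illustrative.

```r
# Values copied from the hasTests row of the coefficient table.
estimate  <- 3.009464
std_error <- 0.325491

# 97.5% quantile of the standard normal distribution (about 1.96).
z <- qnorm(0.975)

# Two-sided 95% interval: estimate -/+ z * standard error.
ci <- estimate + c(-1, 1) * z * std_error
print(round(ci, 4))
```

An interval that excludes zero, as for `hasTests`, `useInterface`, and `useAbstractClass`, corresponds to a coefficient with p < 0.05 in the summary.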
## eigenvalue variance.percent cumulative.variance.percent
## Min. :0.00000 Min. : 0.000 Min. : 18.51
## 1st Qu.:0.04736 1st Qu.: 4.736 1st Qu.: 55.63
## Median :0.06833 Median : 6.833 Median : 79.23
## Mean :0.07692 Mean : 7.692 Mean : 72.26
## 3rd Qu.:0.09904 3rd Qu.: 9.904 3rd Qu.: 95.75
## Max. :0.18515 Max. :18.515 Max. :100.00
## Dim 1 Dim 2 Dim 3 Dim 4
## hasFacade_FALSE 0.2067123 -0.3990625 0.005748542 -0.38302131
## hasFacade_TRUE -0.2142291 0.4135738 -0.005957580 0.39694936
## hasController_FALSE 0.1842612 -0.4709442 -0.329187897 -0.04362486
## hasController_TRUE -0.1433143 0.3662899 0.256035031 0.03393045
## useInheritance_FALSE 0.3009261 -0.6126740 0.081698399 -0.33400793
## useInheritance_TRUE -0.3731484 0.7597157 -0.101306015 0.41416984
## Dim 5
## hasFacade_FALSE 0.2274755
## hasFacade_TRUE -0.2357474
## hasController_FALSE 0.5147276
## hasController_TRUE -0.4003437
## useInheritance_FALSE -0.2254024
## useInheritance_TRUE 0.2794989
## Dim 1 Dim 2 Dim 3 Dim 4
## hasFacade_FALSE 0.04428378 0.1650418 0.0000342474 0.152040062
## hasFacade_TRUE 0.04428378 0.1650418 0.0000342474 0.152040062
## hasController_FALSE 0.02640726 0.1725021 0.0842836333 0.001480211
## hasController_TRUE 0.02640726 0.1725021 0.0842836333 0.001480211
## useInheritance_FALSE 0.11229011 0.4654580 0.0082765393 0.138336012
## useInheritance_TRUE 0.11229011 0.4654580 0.0082765393 0.138336012
## Dim 5
## hasFacade_FALSE 0.05362677
## hasFacade_TRUE 0.05362677
## hasController_FALSE 0.20606791
## hasController_TRUE 0.20606791
## useInheritance_FALSE 0.06299972
## useInheritance_TRUE 0.06299972
## Dim 1 Dim 2 Dim 3 Dim 4 Dim 5
## hasFacade_FALSE 0.9035007 4.308584 0.001015214 5.79897956 2.235946
## hasFacade_TRUE 0.9363552 4.465260 0.001052131 6.00985154 2.317253
## hasController_FALSE 0.6171418 5.158375 2.861877480 0.06466884 9.841650
## hasController_TRUE 0.4799992 4.012069 2.225904707 0.05029799 7.654617
## useInheritance_FALSE 2.0827284 11.046594 0.223041679 4.79662876 2.387952
## useInheritance_TRUE 2.5825832 13.697777 0.276571682 5.94781966 2.961061