data_605_final
Final Problem 1
You’ll verify for yourself that PageRank works by performing calculations on a small universe of web pages. Let’s use the 6 page universe that we had in the previous discussion For this directed graph, perform the following calculations in R.
The solution of this problem and my entire understanding of how the Page Rank algorithm works, comes from this excellent tutorial, The algorithm that started google.
(a)
Form the A matrix. Then, introduce decay and form the B matrix as we did in the course notes. (5 Points)
# get transition matrix A
url_link = c(0, 1, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0,
1, 1, 0, 0, 1, 0,
0, 0, 0, 0, 1, 1,
0, 0, 0, 1, 0, 1,
0, 0, 0, 1, 0, 0)
tm <- matrix(data = url_link,
byrow = TRUE,
nrow = 6)
rownames(tm) <- 1:6
colnames(tm) <- 1:6
score = tm %>%
as.data.frame %>%
rowSums() %>%
as.vector()
# so that we can get 0 instead of NaN in the final transition matrix
score = ifelse(score == 0, 1, score)
A = round(tm / score, 3)
print("Matrix A:")## [1] "Matrix A:"
A ## 1 2 3 4 5 6
## 1 0.000 0.500 0.5 0.0 0.000 0.0
## 2 0.000 0.000 0.0 0.0 0.000 0.0
## 3 0.333 0.333 0.0 0.0 0.333 0.0
## 4 0.000 0.000 0.0 0.0 0.500 0.5
## 5 0.000 0.000 0.0 0.5 0.000 0.5
## 6 0.000 0.000 0.0 1.0 0.000 0.0
# introduce decay and form matrix B
for(i in 1:nrow(A)){
x <- sum(A[i, ])
for(j in 1:ncol(A)){
if(x >0){
A[i, j] <- A[i, j] / x
} else {
A[i, j] <- 1/nrow(A)
}
}
}
decay = 0.85
B <- decay * A + (1 - decay)/nrow(A)
print("Matrix B:")## [1] "Matrix B:"
B## 1 2 3 4 5 6
## 1 0.0250000 0.4500000 0.4500000 0.0250000 0.0250000 0.0250000
## 2 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
## 3 0.3083333 0.3083333 0.0250000 0.0250000 0.3083333 0.0250000
## 4 0.0250000 0.0250000 0.0250000 0.0250000 0.4500000 0.4500000
## 5 0.0250000 0.0250000 0.0250000 0.4500000 0.0250000 0.4500000
## 6 0.0250000 0.0250000 0.0250000 0.8750000 0.0250000 0.0250000
(b)
Start with a uniform rank vector r and perform power iterations on B till convergence. That is, compute the solution r = Bn × r. Attempt this for a sufficiently large n so that r actually converges. (5 Points)
# convergence - method 1
n = 100
purrr::reduce(rep(list(B), n), `%*%`)## 1 2 3 4 5 6
## 1 0.05170475 0.07367926 0.05741241 0.3487037 0.1999038 0.2685961
## 2 0.05170475 0.07367926 0.05741241 0.3487037 0.1999038 0.2685961
## 3 0.05170475 0.07367926 0.05741241 0.3487037 0.1999038 0.2685961
## 4 0.05170475 0.07367926 0.05741241 0.3487037 0.1999038 0.2685961
## 5 0.05170475 0.07367926 0.05741241 0.3487037 0.1999038 0.2685961
## 6 0.05170475 0.07367926 0.05741241 0.3487037 0.1999038 0.2685961
# convergence - method 2
i = 1
initial_vector <- matrix(data = 1/nrow(B), nrow = nrow(B))
init = t(B) %*% initial_vector
while(i <=n){
convergence_matrix = t(B) %*% init
init = convergence_matrix
i = i+1
}
convergence_matrix## [,1]
## 1 0.05170475
## 2 0.07367926
## 3 0.05741241
## 4 0.34870369
## 5 0.19990381
## 6 0.26859608
I tried two above methods to calculate the convergence based on 100 iterations on B, both reached the same result. At the end, people are most likely going to end up in site 4 (35%) followed by site 6 (27%) after many numbers of random clicks.
(c)
Compute the eigen-decomposition of B and verify that you indeed get an eigenvalue of 1 as the largest eigenvalue and that its corresponding eigenvector is the same vector that you obtained in the previous power iteration method. Further, this eigenvector has all positive entries and it sums to 1. (10 points)
# convergence - method 3 normalized left eigenvector
eig = eigen(t(B))
eig## eigen() decomposition
## $values
## [1] 1.00000000+0i 0.57619235+0i -0.42500000+0i -0.42500000-0i -0.34991524+0i
## [6] -0.08461044+0i
##
## $vectors
## [,1] [,2] [,3]
## [1,] 0.1044385+0i 0.2931457+0i 2.486934e-15+0.0000e+00i
## [2,] 0.1488249+0i 0.5093703+0i -8.528385e-16-6.9832e-23i
## [3,] 0.1159674+0i 0.3414619+0i -1.930646e-15-0.0000e+00i
## [4,] 0.7043472+0i -0.5890805+0i -7.071068e-01+0.0000e+00i
## [5,] 0.4037861+0i -0.1413606+0i 7.071068e-01+0.0000e+00i
## [6,] 0.5425377+0i -0.4135367+0i 0.000000e+00-1.7058e-08i
## [,4] [,5] [,6]
## [1,] 2.486934e-15-0.0000e+00i -0.06471710+0i -0.212296003+0i
## [2,] -8.528385e-16+6.9832e-23i 0.01388698+0i 0.854071294+0i
## [3,] -1.930646e-15+0.0000e+00i 0.07298180+0i -0.363638739+0i
## [4,] -7.071068e-01+0.0000e+00i -0.66058664+0i 0.018399984+0i
## [5,] 7.071068e-01-0.0000e+00i 0.73761812+0i -0.304719509+0i
## [6,] 0.000000e+00+1.7058e-08i -0.09918316+0i 0.008182973+0i
# solution
solution = as.numeric(eigen(t(B))$vectors[,1]/sum(eigen(t(B))$vectors[,1]))
solution## [1] 0.05170475 0.07367926 0.05741241 0.34870369 0.19990381 0.26859608
print(paste0("The sum of this solution is equal to ", sum(solution)))## [1] "The sum of this solution is equal to 1"
I presented the third method of calculating the convergence by computing the eigen-decomposition of B. Indeed, we get the eigenvalue of 1 as the largest eigenvalue and that its corresponding eigenvector is the same vector that I have obtained in above methods. Furthermore, this eigenvector has all positive entries and it sums to 1.
(d)
Use the graph package in R and its page.rank method to compute the Page Rank of the graph as given in A. Note that you don’t need to apply decay. The package starts with a connected graph and applies decay internally. Verify that you do get the same PageRank vector as the two approaches above. (10 points)
g <- graph.adjacency(tm)
plot(g)page.rank(g)## $vector
## 1 2 3 4 5 6
## 0.05170475 0.07367926 0.05741241 0.34870369 0.19990381 0.26859608
##
## $value
## [1] 1
##
## $options
## NULL
data.frame(url = 1:6) %>%
dplyr::mutate(method_1 = purrr::reduce(rep(list(B), n), `%*%`) %>% as.data.frame() %>% .[1, ] %>% unlist(),
method_2 = convergence_matrix %>% as.vector(),
method_3 = solution,
`method_4 (page.rank)` = page.rank(g)$vector)## url method_1 method_2 method_3 method_4 (page.rank)
## 1 1 0.05170475 0.05170475 0.05170475 0.05170475
## 2 2 0.07367926 0.07367926 0.07367926 0.07367926
## 3 3 0.05741241 0.05741241 0.05741241 0.05741241
## 4 4 0.34870369 0.34870369 0.34870369 0.34870369
## 5 5 0.19990381 0.19990381 0.19990381 0.19990381
## 6 6 0.26859608 0.26859608 0.26859608 0.26859608
I tried 4 different methods (including the one from the graph package). I have achieved the consistent results for the PageRank vector, aka convergence for landing on the six urls.
Final Problem 2
(a)
Using the training.csv file, plot representations of the first 10 images to understand the data format. Go ahead and divide all pixels by 255 to produce values between 0 and 1. (This is equivalent to min-max scaling.) (5 points)
trainDf <- readit::readit("data/digit-recognizer/train.csv")
testDf <- readit::readit("data/digit-recognizer/test.csv")head(trainDf) %>% kable() %>% scroll_box()| label | pixel0 | pixel1 | pixel2 | pixel3 | pixel4 | pixel5 | pixel6 | pixel7 | pixel8 | pixel9 | pixel10 | pixel11 | pixel12 | pixel13 | pixel14 | pixel15 | pixel16 | pixel17 | pixel18 | pixel19 | pixel20 | pixel21 | pixel22 | pixel23 | pixel24 | pixel25 | pixel26 | pixel27 | pixel28 | pixel29 | pixel30 | pixel31 | pixel32 | pixel33 | pixel34 | pixel35 | pixel36 | pixel37 | pixel38 | pixel39 | pixel40 | pixel41 | pixel42 | pixel43 | pixel44 | pixel45 | pixel46 | pixel47 | pixel48 | pixel49 | pixel50 | pixel51 | pixel52 | pixel53 | pixel54 | pixel55 | pixel56 | pixel57 | pixel58 | pixel59 | pixel60 | pixel61 | pixel62 | pixel63 | pixel64 | pixel65 | pixel66 | pixel67 | pixel68 | pixel69 | pixel70 | pixel71 | pixel72 | pixel73 | pixel74 | pixel75 | pixel76 | pixel77 | pixel78 | pixel79 | pixel80 | pixel81 | pixel82 | pixel83 | pixel84 | pixel85 | pixel86 | pixel87 | pixel88 | pixel89 | pixel90 | pixel91 | pixel92 | pixel93 | pixel94 | pixel95 | pixel96 | pixel97 | pixel98 | pixel99 | pixel100 | pixel101 | pixel102 | pixel103 | pixel104 | pixel105 | pixel106 | pixel107 | pixel108 | pixel109 | pixel110 | pixel111 | pixel112 | pixel113 | pixel114 | pixel115 | pixel116 | pixel117 | pixel118 | pixel119 | pixel120 | pixel121 | pixel122 | pixel123 | pixel124 | pixel125 | pixel126 | pixel127 | pixel128 | pixel129 | pixel130 | pixel131 | pixel132 | pixel133 | pixel134 | pixel135 | pixel136 | pixel137 | pixel138 | pixel139 | pixel140 | pixel141 | pixel142 | pixel143 | pixel144 | pixel145 | pixel146 | pixel147 | pixel148 | pixel149 | pixel150 | pixel151 | pixel152 | pixel153 | pixel154 | pixel155 | pixel156 | pixel157 | pixel158 | pixel159 | pixel160 | pixel161 | pixel162 | pixel163 | pixel164 | pixel165 | pixel166 | pixel167 | pixel168 | pixel169 | pixel170 | pixel171 | pixel172 | pixel173 | pixel174 | pixel175 | pixel176 | pixel177 | pixel178 | pixel179 | pixel180 | pixel181 | pixel182 | pixel183 | pixel184 | pixel185 | pixel186 | pixel187 | pixel188 | pixel189 | pixel190 | pixel191 | pixel192 | pixel193 | pixel194 | pixel195 | pixel196 | pixel197 | pixel198 | pixel199 | pixel200 | pixel201 | pixel202 | pixel203 | pixel204 | pixel205 | pixel206 | pixel207 | pixel208 | pixel209 | pixel210 | pixel211 | pixel212 | pixel213 | pixel214 | pixel215 | pixel216 | pixel217 | pixel218 | pixel219 | pixel220 | pixel221 | pixel222 | pixel223 | pixel224 | pixel225 | pixel226 | pixel227 | pixel228 | pixel229 | pixel230 | pixel231 | pixel232 | pixel233 | pixel234 | pixel235 | pixel236 | pixel237 | pixel238 | pixel239 | pixel240 | pixel241 | pixel242 | pixel243 | pixel244 | pixel245 | pixel246 | pixel247 | pixel248 | pixel249 | pixel250 | pixel251 | pixel252 | pixel253 | pixel254 | pixel255 | pixel256 | pixel257 | pixel258 | pixel259 | pixel260 | pixel261 | pixel262 | pixel263 | pixel264 | pixel265 | pixel266 | pixel267 | pixel268 | pixel269 | pixel270 | pixel271 | pixel272 | pixel273 | pixel274 | pixel275 | pixel276 | pixel277 | pixel278 | pixel279 | pixel280 | pixel281 | pixel282 | pixel283 | pixel284 | pixel285 | pixel286 | pixel287 | pixel288 | pixel289 | pixel290 | pixel291 | pixel292 | pixel293 | pixel294 | pixel295 | pixel296 | pixel297 | pixel298 | pixel299 | pixel300 | pixel301 | pixel302 | pixel303 | pixel304 | pixel305 | pixel306 | pixel307 | pixel308 | pixel309 | pixel310 | pixel311 | pixel312 | pixel313 | pixel314 | pixel315 | pixel316 | pixel317 | pixel318 | pixel319 | pixel320 | pixel321 | pixel322 | pixel323 | pixel324 | pixel325 | pixel326 | pixel327 | pixel328 | pixel329 | pixel330 | pixel331 | pixel332 | pixel333 | pixel334 | pixel335 | pixel336 | pixel337 | pixel338 | pixel339 | pixel340 | pixel341 | pixel342 | pixel343 | pixel344 | pixel345 | pixel346 | pixel347 | pixel348 | pixel349 | pixel350 | pixel351 | pixel352 | pixel353 | pixel354 | pixel355 | pixel356 | pixel357 | pixel358 | pixel359 | pixel360 | pixel361 | pixel362 | pixel363 | pixel364 | pixel365 | pixel366 | pixel367 | pixel368 | pixel369 | pixel370 | pixel371 | pixel372 | pixel373 | pixel374 | pixel375 | pixel376 | pixel377 | pixel378 | pixel379 | pixel380 | pixel381 | pixel382 | pixel383 | pixel384 | pixel385 | pixel386 | pixel387 | pixel388 | pixel389 | pixel390 | pixel391 | pixel392 | pixel393 | pixel394 | pixel395 | pixel396 | pixel397 | pixel398 | pixel399 | pixel400 | pixel401 | pixel402 | pixel403 | pixel404 | pixel405 | pixel406 | pixel407 | pixel408 | pixel409 | pixel410 | pixel411 | pixel412 | pixel413 | pixel414 | pixel415 | pixel416 | pixel417 | pixel418 | pixel419 | pixel420 | pixel421 | pixel422 | pixel423 | pixel424 | pixel425 | pixel426 | pixel427 | pixel428 | pixel429 | pixel430 | pixel431 | pixel432 | pixel433 | pixel434 | pixel435 | pixel436 | pixel437 | pixel438 | pixel439 | pixel440 | pixel441 | pixel442 | pixel443 | pixel444 | pixel445 | pixel446 | pixel447 | pixel448 | pixel449 | pixel450 | pixel451 | pixel452 | pixel453 | pixel454 | pixel455 | pixel456 | pixel457 | pixel458 | pixel459 | pixel460 | pixel461 | pixel462 | pixel463 | pixel464 | pixel465 | pixel466 | pixel467 | pixel468 | pixel469 | pixel470 | pixel471 | pixel472 | pixel473 | pixel474 | pixel475 | pixel476 | pixel477 | pixel478 | pixel479 | pixel480 | pixel481 | pixel482 | pixel483 | pixel484 | pixel485 | pixel486 | pixel487 | pixel488 | pixel489 | pixel490 | pixel491 | pixel492 | pixel493 | pixel494 | pixel495 | pixel496 | pixel497 | pixel498 | pixel499 | pixel500 | pixel501 | pixel502 | pixel503 | pixel504 | pixel505 | pixel506 | pixel507 | pixel508 | pixel509 | pixel510 | pixel511 | pixel512 | pixel513 | pixel514 | pixel515 | pixel516 | pixel517 | pixel518 | pixel519 | pixel520 | pixel521 | pixel522 | pixel523 | pixel524 | pixel525 | pixel526 | pixel527 | pixel528 | pixel529 | pixel530 | pixel531 | pixel532 | pixel533 | pixel534 | pixel535 | pixel536 | pixel537 | pixel538 | pixel539 | pixel540 | pixel541 | pixel542 | pixel543 | pixel544 | pixel545 | pixel546 | pixel547 | pixel548 | pixel549 | pixel550 | pixel551 | pixel552 | pixel553 | pixel554 | pixel555 | pixel556 | pixel557 | pixel558 | pixel559 | pixel560 | pixel561 | pixel562 | pixel563 | pixel564 | pixel565 | pixel566 | pixel567 | pixel568 | pixel569 | pixel570 | pixel571 | pixel572 | pixel573 | pixel574 | pixel575 | pixel576 | pixel577 | pixel578 | pixel579 | pixel580 | pixel581 | pixel582 | pixel583 | pixel584 | pixel585 | pixel586 | pixel587 | pixel588 | pixel589 | pixel590 | pixel591 | pixel592 | pixel593 | pixel594 | pixel595 | pixel596 | pixel597 | pixel598 | pixel599 | pixel600 | pixel601 | pixel602 | pixel603 | pixel604 | pixel605 | pixel606 | pixel607 | pixel608 | pixel609 | pixel610 | pixel611 | pixel612 | pixel613 | pixel614 | pixel615 | pixel616 | pixel617 | pixel618 | pixel619 | pixel620 | pixel621 | pixel622 | pixel623 | pixel624 | pixel625 | pixel626 | pixel627 | pixel628 | pixel629 | pixel630 | pixel631 | pixel632 | pixel633 | pixel634 | pixel635 | pixel636 | pixel637 | pixel638 | pixel639 | pixel640 | pixel641 | pixel642 | pixel643 | pixel644 | pixel645 | pixel646 | pixel647 | pixel648 | pixel649 | pixel650 | pixel651 | pixel652 | pixel653 | pixel654 | pixel655 | pixel656 | pixel657 | pixel658 | pixel659 | pixel660 | pixel661 | pixel662 | pixel663 | pixel664 | pixel665 | pixel666 | pixel667 | pixel668 | pixel669 | pixel670 | pixel671 | pixel672 | pixel673 | pixel674 | pixel675 | pixel676 | pixel677 | pixel678 | pixel679 | pixel680 | pixel681 | pixel682 | pixel683 | pixel684 | pixel685 | pixel686 | pixel687 | pixel688 | pixel689 | pixel690 | pixel691 | pixel692 | pixel693 | pixel694 | pixel695 | pixel696 | pixel697 | pixel698 | pixel699 | pixel700 | pixel701 | pixel702 | pixel703 | pixel704 | pixel705 | pixel706 | pixel707 | pixel708 | pixel709 | pixel710 | pixel711 | pixel712 | pixel713 | pixel714 | pixel715 | pixel716 | pixel717 | pixel718 | pixel719 | pixel720 | pixel721 | pixel722 | pixel723 | pixel724 | pixel725 | pixel726 | pixel727 | pixel728 | pixel729 | pixel730 | pixel731 | pixel732 | pixel733 | pixel734 | pixel735 | pixel736 | pixel737 | pixel738 | pixel739 | pixel740 | pixel741 | pixel742 | pixel743 | pixel744 | pixel745 | pixel746 | pixel747 | pixel748 | pixel749 | pixel750 | pixel751 | pixel752 | pixel753 | pixel754 | pixel755 | pixel756 | pixel757 | pixel758 | pixel759 | pixel760 | pixel761 | pixel762 | pixel763 | pixel764 | pixel765 | pixel766 | pixel767 | pixel768 | pixel769 | pixel770 | pixel771 | pixel772 | pixel773 | pixel774 | pixel775 | pixel776 | pixel777 | pixel778 | pixel779 | pixel780 | pixel781 | pixel782 | pixel783 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 188 | 255 | 94 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 191 | 250 | 253 | 93 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 123 | 248 | 253 | 167 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 80 | 247 | 253 | 208 | 13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 29 | 207 | 253 | 235 | 77 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 54 | 209 | 253 | 253 | 88 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 93 | 254 | 253 | 238 | 170 | 17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 23 | 210 | 254 | 253 | 159 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 16 | 209 | 253 | 254 | 240 | 81 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 27 | 253 | 253 | 254 | 13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 206 | 254 | 254 | 198 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 168 | 253 | 253 | 196 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 203 | 253 | 248 | 76 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 22 | 188 | 253 | 245 | 93 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 103 | 253 | 253 | 191 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 89 | 240 | 253 | 195 | 25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 15 | 220 | 253 | 253 | 80 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 94 | 253 | 253 | 253 | 94 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 89 | 251 | 253 | 250 | 131 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 214 | 218 | 95 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 18 | 30 | 137 | 137 | 192 | 86 | 72 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 86 | 250 | 254 | 254 | 254 | 254 | 217 | 246 | 151 | 32 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 16 | 179 | 254 | 254 | 254 | 254 | 254 | 254 | 254 | 254 | 254 | 231 | 54 | 15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 72 | 254 | 254 | 254 | 254 | 254 | 254 | 254 | 254 | 254 | 254 | 254 | 254 | 104 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 61 | 191 | 254 | 254 | 254 | 254 | 254 | 109 | 83 | 199 | 254 | 254 | 254 | 254 | 243 | 85 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 172 | 254 | 254 | 254 | 202 | 147 | 147 | 45 | 0 | 11 | 29 | 200 | 254 | 254 | 254 | 171 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 174 | 254 | 254 | 89 | 67 | 0 | 0 | 0 | 0 | 0 | 0 | 128 | 252 | 254 | 254 | 212 | 76 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 47 | 254 | 254 | 254 | 29 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 83 | 254 | 254 | 254 | 153 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 80 | 254 | 254 | 240 | 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 25 | 240 | 254 | 254 | 153 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 64 | 254 | 254 | 186 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 166 | 254 | 254 | 224 | 12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 14 | 232 | 254 | 254 | 254 | 29 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 75 | 254 | 254 | 254 | 17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 18 | 254 | 254 | 254 | 254 | 29 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 48 | 254 | 254 | 254 | 17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 163 | 254 | 254 | 254 | 29 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 48 | 254 | 254 | 254 | 17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 94 | 254 | 254 | 254 | 200 | 12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 16 | 209 | 254 | 254 | 150 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 15 | 206 | 254 | 254 | 254 | 202 | 66 | 0 | 0 | 0 | 0 | 0 | 21 | 161 | 254 | 254 | 245 | 31 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 60 | 212 | 254 | 254 | 254 | 194 | 48 | 48 | 34 | 41 | 48 | 209 | 254 | 254 | 254 | 171 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 86 | 243 | 254 | 254 | 254 | 254 | 254 | 233 | 243 | 254 | 254 | 254 | 254 | 254 | 86 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 114 | 254 | 254 | 254 | 254 | 254 | 254 | 254 | 254 | 254 | 254 | 239 | 86 | 11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 182 | 254 | 254 | 254 | 254 | 254 | 254 | 254 | 254 | 243 | 70 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 76 | 146 | 254 | 255 | 254 | 255 | 146 | 19 | 15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 141 | 139 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 254 | 254 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 254 | 254 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 254 | 254 | 106 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 254 | 254 | 184 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 254 | 254 | 184 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 254 | 254 | 184 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 185 | 254 | 184 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 89 | 254 | 184 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 146 | 254 | 184 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 254 | 254 | 184 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 254 | 254 | 184 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 254 | 254 | 184 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 254 | 254 | 184 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 254 | 254 | 184 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 156 | 254 | 254 | 184 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 185 | 255 | 255 | 184 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 185 | 254 | 254 | 184 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 185 | 254 | 254 | 184 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 63 | 254 | 254 | 62 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 220 | 179 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 77 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 28 | 247 | 17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 27 | 202 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 242 | 155 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 27 | 254 | 63 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 160 | 207 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 27 | 254 | 65 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 127 | 254 | 21 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 239 | 65 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 77 | 254 | 21 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 195 | 65 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 70 | 254 | 21 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 195 | 142 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 56 | 251 | 21 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 195 | 227 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 222 | 153 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 120 | 240 | 13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 67 | 251 | 40 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 94 | 255 | 69 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 234 | 184 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19 | 245 | 69 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 234 | 169 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 199 | 182 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 154 | 205 | 4 | 0 | 0 | 26 | 72 | 128 | 203 | 208 | 254 | 254 | 131 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 61 | 254 | 129 | 113 | 186 | 245 | 251 | 189 | 75 | 56 | 136 | 254 | 73 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 15 | 216 | 233 | 233 | 159 | 104 | 52 | 0 | 0 | 0 | 38 | 254 | 73 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 18 | 254 | 73 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 18 | 254 | 73 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 206 | 106 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 186 | 159 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 209 | 101 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 25 | 130 | 155 | 254 | 254 | 254 | 157 | 30 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 103 | 253 | 253 | 253 | 253 | 253 | 253 | 253 | 253 | 114 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 11 | 208 | 253 | 253 | 253 | 253 | 253 | 253 | 253 | 253 | 253 | 253 | 107 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 31 | 253 | 253 | 253 | 253 | 253 | 253 | 253 | 253 | 253 | 253 | 253 | 215 | 101 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 23 | 210 | 253 | 253 | 253 | 248 | 161 | 222 | 222 | 246 | 253 | 253 | 253 | 253 | 253 | 39 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 136 | 253 | 253 | 253 | 229 | 77 | 0 | 0 | 0 | 70 | 218 | 253 | 253 | 253 | 253 | 215 | 91 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 214 | 253 | 253 | 253 | 195 | 0 | 0 | 0 | 0 | 0 | 104 | 224 | 253 | 253 | 253 | 253 | 215 | 29 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 116 | 253 | 253 | 253 | 247 | 75 | 0 | 0 | 0 | 0 | 0 | 0 | 26 | 200 | 253 | 253 | 253 | 253 | 216 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 254 | 253 | 253 | 253 | 195 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 26 | 200 | 253 | 253 | 253 | 253 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 254 | 253 | 253 | 253 | 99 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 25 | 231 | 253 | 253 | 253 | 36 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 254 | 253 | 253 | 253 | 99 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 223 | 253 | 253 | 253 | 129 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 254 | 253 | 253 | 253 | 99 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 127 | 253 | 253 | 253 | 129 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 254 | 253 | 253 | 253 | 99 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 139 | 253 | 253 | 253 | 90 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 254 | 253 | 253 | 253 | 99 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 78 | 248 | 253 | 253 | 253 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 254 | 253 | 253 | 253 | 216 | 34 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 33 | 152 | 253 | 253 | 253 | 107 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 206 | 253 | 253 | 253 | 253 | 140 | 0 | 0 | 0 | 0 | 0 | 30 | 139 | 234 | 253 | 253 | 253 | 154 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 16 | 205 | 253 | 253 | 253 | 250 | 208 | 106 | 106 | 106 | 200 | 237 | 253 | 253 | 253 | 253 | 209 | 22 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 82 | 253 | 253 | 253 | 253 | 253 | 253 | 253 | 253 | 253 | 253 | 253 | 253 | 253 | 209 | 22 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 91 | 253 | 253 | 253 | 253 | 253 | 253 | 253 | 253 | 253 | 253 | 213 | 90 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 18 | 129 | 208 | 253 | 253 | 253 | 253 | 159 | 129 | 90 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 141 | 202 | 254 | 193 | 44 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 165 | 254 | 179 | 163 | 249 | 244 | 72 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 135 | 254 | 150 | 0 | 0 | 189 | 254 | 243 | 31 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 82 | 248 | 209 | 5 | 0 | 0 | 164 | 236 | 254 | 115 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 211 | 254 | 58 | 0 | 0 | 0 | 0 | 33 | 230 | 212 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 119 | 254 | 156 | 3 | 0 | 0 | 0 | 0 | 18 | 230 | 254 | 33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10 | 212 | 254 | 35 | 0 | 0 | 0 | 0 | 0 | 33 | 254 | 254 | 33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 116 | 254 | 154 | 3 | 0 | 0 | 0 | 0 | 0 | 33 | 254 | 254 | 33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 124 | 254 | 115 | 0 | 0 | 0 | 0 | 0 | 0 | 160 | 254 | 239 | 23 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 203 | 254 | 35 | 0 | 0 | 0 | 0 | 0 | 0 | 197 | 254 | 178 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 23 | 239 | 221 | 11 | 0 | 0 | 0 | 0 | 0 | 0 | 198 | 255 | 123 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 23 | 238 | 178 | 0 | 0 | 0 | 0 | 0 | 0 | 10 | 219 | 254 | 96 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30 | 249 | 204 | 0 | 0 | 0 | 0 | 0 | 0 | 25 | 235 | 254 | 62 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 26 | 243 | 204 | 0 | 0 | 0 | 0 | 0 | 0 | 91 | 254 | 248 | 36 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 33 | 254 | 204 | 0 | 0 | 0 | 0 | 0 | 67 | 241 | 254 | 133 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 33 | 254 | 214 | 7 | 0 | 0 | 0 | 50 | 242 | 254 | 194 | 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 193 | 254 | 78 | 0 | 0 | 19 | 128 | 254 | 195 | 36 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 103 | 254 | 222 | 74 | 143 | 235 | 254 | 228 | 83 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30 | 242 | 254 | 254 | 254 | 254 | 252 | 84 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 23 | 64 | 158 | 200 | 174 | 61 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
print(paste0("trainDf: ", "nrow = ", nrow(trainDf), ", ncol = ", ncol(trainDf)))## [1] "trainDf: nrow = 42000, ncol = 785"
print(paste0("testDf: ", "nrow = ", nrow(testDf), ", ncol = ", ncol(testDf)))## [1] "testDf: nrow = 28000, ncol = 784"
print(paste0("the train set has ", colSums(is.na(trainDf)) %>% unique(), " missing values in all columns"))## [1] "the train set has 0 missing values in all columns"
print(paste0("the test set has ", colSums(is.na(testDf)) %>% unique(), " missing values in all columns"))## [1] "the test set has 0 missing values in all columns"
trainDf2 <- trainDf[, 2:ncol(trainDf)] / 255
trainDf2$label <- trainDf$label
trainDf2 <- trainDf2 %>% dplyr::select(label, everything())
testDf2 <- lapply(testDf[, 1:ncol(testDf)], function(x) x/255) %>%
dplyr::bind_cols() %>%
as.data.frame()plotDigit <- function(row, label){
x <- rep(0:27)/27
y <- rep(0:27)/27
z <- matrix(as.numeric(row), ncol = 28, byrow = TRUE)
rotate <- function(x) t(apply(x, 2, rev))
z <- rotate(z)
image(x, y, z,
main = paste("Actual label:", label),
#col = gray.colors(255, start = 1, end = 0),
asp = 1,
xlim = c(0, 1),
ylim = c(-0.1, 1.1),
useRaster = TRUE,
axes = FALSE,
xlab = "",
ylab = "")
}
par(mfrow = c(2, 5))
for(i in 1:10){
r = trainDf2[i, ] %>% dplyr::select(-label)
lab = trainDf2[i, ] %>% dplyr::select(label)
plotDigit(row = r, label = lab)
}(b)
What is the frequency distribution of the numbers in the dataset? (5 points)
par(mfrow = c(1, 1))
trainDf2 %>%
group_by(label) %>%
count(count = n()) %>%
ggplot(aes(x = factor(label),
y = n)) +
geom_bar(stat = "identity") +
xlab(label = "") +
ylab(label = "")table(trainDf2$label)##
## 0 1 2 3 4 5 6 7 8 9
## 4132 4684 4177 4351 4072 3795 4137 4401 4063 4188
(c)
For each number, provide the mean pixel intensity. What does this tell you? (5 points)
pixel_intensity = trainDf2 %>%
rowwise() %>%
dplyr::mutate(pixel_intensity = sum(c_across(starts_with("pixel")))) %>%
ungroup()
mean_pixel_intensity = pixel_intensity %>%
group_by(label) %>%
summarise(mean_pixel_intensity = mean(pixel_intensity)) %>%
ungroup()
mean_pixel_intensity## # A tibble: 10 x 2
## label mean_pixel_intensity
## <dbl> <dbl>
## 1 0 136.
## 2 1 59.6
## 3 2 117.
## 4 3 111.
## 5 4 95.0
## 6 5 101.
## 7 6 109.
## 8 7 89.9
## 9 8 118.
## 10 9 96.3
pixel_intensity %>%
dplyr::select(label, pixel_intensity) %>%
ggplot(aes(pixel_intensity, color = as.factor(label))) +
geom_density(aes(fill = as.factor(label)), alpha = 0.5) +
geom_vline(data = aggregate(pixel_intensity ~ label,
data = pixel_intensity %>%
dplyr::select(label, pixel_intensity),
FUN = median),
aes(xintercept = pixel_intensity,
color = "red"),
linetype = "dashed",
size = 1) +
theme_classic() +
facet_wrap(~ label,
#scales = "free",
nrow = 5) +
theme(legend.position = "none")Above is the mean pixel intensity after scaling (divided by 255 for each pixel column). We sum across all pixel columns and then calculate the mean by group (i.e. label column). We see that some labels appear to be very close (visually) to each other, such as “2” and “8”, “3” and “6”, “5” and “9”, and so forth. Letter “1” is possibly the most unique and easiest to be recognized.
(d)
Reduce the data by using principal components that account for 95% of the variance. How many components did you generate? Use PCA to generate all possible components (100% of the variance). How many components are possible? Why? (5 points)
# get the covariance matrix of the predictors
trainCov <- cov(trainDf2 %>% dplyr::select(-label))
# conduct pca
trainPC <- prcomp(trainCov)
# summary of pca
summary(trainPC)## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 0.1660 0.1351 0.1158 0.10108 0.08829 0.08025 0.06049
## Proportion of Variance 0.2533 0.1678 0.1232 0.09395 0.07168 0.05922 0.03365
## Cumulative Proportion 0.2533 0.4211 0.5443 0.63827 0.70995 0.76918 0.80282
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 0.05458 0.0499 0.04430 0.03926 0.03805 0.03204 0.03194
## Proportion of Variance 0.02739 0.0229 0.01804 0.01417 0.01332 0.00944 0.00938
## Cumulative Proportion 0.83022 0.8531 0.87116 0.88533 0.89865 0.90808 0.91746
## PC15 PC16 PC17 PC18 PC19 PC20 PC21
## Standard deviation 0.02975 0.02799 0.02485 0.02410 0.02230 0.02171 0.02024
## Proportion of Variance 0.00814 0.00720 0.00568 0.00534 0.00457 0.00433 0.00377
## Cumulative Proportion 0.92560 0.93280 0.93848 0.94382 0.94840 0.95273 0.95649
## PC22 PC23 PC24 PC25 PC26 PC27 PC28
## Standard deviation 0.01916 0.01821 0.01722 0.01670 0.01579 0.01531 0.01467
## Proportion of Variance 0.00338 0.00305 0.00273 0.00256 0.00229 0.00216 0.00198
## Cumulative Proportion 0.95987 0.96292 0.96565 0.96821 0.97050 0.97266 0.97464
## PC29 PC30 PC31 PC32 PC33 PC34 PC35
## Standard deviation 0.01396 0.01294 0.01240 0.01206 0.01131 0.01107 0.01065
## Proportion of Variance 0.00179 0.00154 0.00141 0.00134 0.00118 0.00113 0.00104
## Cumulative Proportion 0.97643 0.97797 0.97938 0.98072 0.98189 0.98302 0.98406
## PC36 PC37 PC38 PC39 PC40 PC41
## Standard deviation 0.01021 0.009607 0.009195 0.008975 0.008794 0.008539
## Proportion of Variance 0.00096 0.000850 0.000780 0.000740 0.000710 0.000670
## Cumulative Proportion 0.98502 0.985870 0.986650 0.987390 0.988100 0.988770
## PC42 PC43 PC44 PC45 PC46 PC47
## Standard deviation 0.008397 0.007889 0.007502 0.007246 0.007074 0.006812
## Proportion of Variance 0.000650 0.000570 0.000520 0.000480 0.000460 0.000430
## Cumulative Proportion 0.989420 0.989990 0.990510 0.990990 0.991450 0.991880
## PC48 PC49 PC50 PC51 PC52 PC53
## Standard deviation 0.006551 0.00635 0.00604 0.005924 0.005832 0.005538
## Proportion of Variance 0.000390 0.00037 0.00034 0.000320 0.000310 0.000280
## Cumulative Proportion 0.992270 0.99264 0.99298 0.993300 0.993610 0.993900
## PC54 PC55 PC56 PC57 PC58 PC59
## Standard deviation 0.005387 0.005288 0.005089 0.005014 0.004831 0.004781
## Proportion of Variance 0.000270 0.000260 0.000240 0.000230 0.000210 0.000210
## Cumulative Proportion 0.994160 0.994420 0.994660 0.994890 0.995100 0.995310
## PC60 PC61 PC62 PC63 PC64 PC65
## Standard deviation 0.004646 0.004521 0.004504 0.00429 0.004177 0.004036
## Proportion of Variance 0.000200 0.000190 0.000190 0.00017 0.000160 0.000150
## Cumulative Proportion 0.995510 0.995700 0.995890 0.99606 0.996220 0.996370
## PC66 PC67 PC68 PC69 PC70 PC71
## Standard deviation 0.003877 0.003827 0.003691 0.003653 0.003557 0.003515
## Proportion of Variance 0.000140 0.000130 0.000130 0.000120 0.000120 0.000110
## Cumulative Proportion 0.996500 0.996640 0.996760 0.996890 0.997000 0.997120
## PC72 PC73 PC74 PC75 PC76 PC77
## Standard deviation 0.003411 0.003334 0.003256 0.003132 0.003081 0.00302
## Proportion of Variance 0.000110 0.000100 0.000100 0.000090 0.000090 0.00008
## Cumulative Proportion 0.997220 0.997330 0.997420 0.997510 0.997600 0.99768
## PC78 PC79 PC80 PC81 PC82 PC83
## Standard deviation 0.002909 0.002757 0.002686 0.002662 0.002645 0.00262
## Proportion of Variance 0.000080 0.000070 0.000070 0.000070 0.000060 0.00006
## Cumulative Proportion 0.997760 0.997830 0.997900 0.997960 0.998030 0.99809
## PC84 PC85 PC86 PC87 PC88 PC89
## Standard deviation 0.002549 0.002497 0.002468 0.002445 0.002345 0.002306
## Proportion of Variance 0.000060 0.000060 0.000060 0.000050 0.000050 0.000050
## Cumulative Proportion 0.998150 0.998210 0.998260 0.998320 0.998370 0.998420
## PC90 PC91 PC92 PC93 PC94 PC95
## Standard deviation 0.002257 0.00218 0.002149 0.002119 0.002083 0.002038
## Proportion of Variance 0.000050 0.00004 0.000040 0.000040 0.000040 0.000040
## Cumulative Proportion 0.998470 0.99851 0.998550 0.998590 0.998630 0.998670
## PC96 PC97 PC98 PC99 PC100 PC101
## Standard deviation 0.002026 0.001958 0.001942 0.001916 0.001887 0.001839
## Proportion of Variance 0.000040 0.000040 0.000030 0.000030 0.000030 0.000030
## Cumulative Proportion 0.998710 0.998740 0.998780 0.998810 0.998850 0.998880
## PC102 PC103 PC104 PC105 PC106 PC107
## Standard deviation 0.001783 0.00177 0.001722 0.001712 0.001674 0.001629
## Proportion of Variance 0.000030 0.00003 0.000030 0.000030 0.000030 0.000020
## Cumulative Proportion 0.998910 0.99893 0.998960 0.998990 0.999010 0.999040
## PC108 PC109 PC110 PC111 PC112 PC113
## Standard deviation 0.001593 0.001576 0.001541 0.00148 0.001475 0.001465
## Proportion of Variance 0.000020 0.000020 0.000020 0.00002 0.000020 0.000020
## Cumulative Proportion 0.999060 0.999080 0.999110 0.99913 0.999150 0.999170
## PC114 PC115 PC116 PC117 PC118 PC119
## Standard deviation 0.00145 0.001429 0.001411 0.001385 0.00137 0.00135
## Proportion of Variance 0.00002 0.000020 0.000020 0.000020 0.00002 0.00002
## Cumulative Proportion 0.99919 0.999200 0.999220 0.999240 0.99926 0.99927
## PC120 PC121 PC122 PC123 PC124 PC125
## Standard deviation 0.001322 0.001307 0.001294 0.001282 0.001256 0.001237
## Proportion of Variance 0.000020 0.000020 0.000020 0.000020 0.000010 0.000010
## Cumulative Proportion 0.999290 0.999310 0.999320 0.999340 0.999350 0.999370
## PC126 PC127 PC128 PC129 PC130 PC131
## Standard deviation 0.001216 0.001198 0.00118 0.001159 0.001142 0.00114
## Proportion of Variance 0.000010 0.000010 0.00001 0.000010 0.000010 0.00001
## Cumulative Proportion 0.999380 0.999390 0.99940 0.999420 0.999430 0.99944
## PC132 PC133 PC134 PC135 PC136 PC137
## Standard deviation 0.001113 0.001106 0.001103 0.001086 0.001075 0.001065
## Proportion of Variance 0.000010 0.000010 0.000010 0.000010 0.000010 0.000010
## Cumulative Proportion 0.999450 0.999460 0.999480 0.999490 0.999500 0.999510
## PC138 PC139 PC140 PC141 PC142 PC143
## Standard deviation 0.001044 0.001008 0.000991 0.0009851 0.0009645 0.0009518
## Proportion of Variance 0.000010 0.000010 0.000010 0.0000100 0.0000100 0.0000100
## Cumulative Proportion 0.999520 0.999530 0.999540 0.9995400 0.9995500 0.9995600
## PC144 PC145 PC146 PC147 PC148
## Standard deviation 0.000942 0.0009318 0.0009263 0.0009116 0.0008994
## Proportion of Variance 0.000010 0.0000100 0.0000100 0.0000100 0.0000100
## Cumulative Proportion 0.999570 0.9995800 0.9995900 0.9995900 0.9996000
## PC149 PC150 PC151 PC152 PC153
## Standard deviation 0.0008927 0.0008803 0.0008777 0.0008707 0.0008608
## Proportion of Variance 0.0000100 0.0000100 0.0000100 0.0000100 0.0000100
## Cumulative Proportion 0.9996100 0.9996100 0.9996200 0.9996300 0.9996400
## PC154 PC155 PC156 PC157 PC158
## Standard deviation 0.0008526 0.0008483 0.000832 0.000828 0.0007979
## Proportion of Variance 0.0000100 0.0000100 0.000010 0.000010 0.0000100
## Cumulative Proportion 0.9996400 0.9996500 0.999660 0.999660 0.9996700
## PC159 PC160 PC161 PC162 PC163
## Standard deviation 0.0007931 0.0007851 0.0007794 0.0007683 0.0007521
## Proportion of Variance 0.0000100 0.0000100 0.0000100 0.0000100 0.0000100
## Cumulative Proportion 0.9996700 0.9996800 0.9996800 0.9996900 0.9997000
## PC164 PC165 PC166 PC167 PC168
## Standard deviation 0.0007442 0.0007417 0.0007386 0.00072 0.0007156
## Proportion of Variance 0.0000100 0.0000100 0.0000100 0.00000 0.0000000
## Cumulative Proportion 0.9997000 0.9997100 0.9997100 0.99972 0.9997200
## PC169 PC170 PC171 PC172 PC173
## Standard deviation 0.0007081 0.0007054 0.0006905 0.0006852 0.0006777
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9997200 0.9997300 0.9997300 0.9997400 0.9997400
## PC174 PC175 PC176 PC177 PC178
## Standard deviation 0.0006736 0.0006662 0.0006657 0.0006509 0.0006474
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9997500 0.9997500 0.9997500 0.9997600 0.9997600
## PC179 PC180 PC181 PC182 PC183
## Standard deviation 0.0006441 0.0006393 0.0006305 0.0006221 0.0006137
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9997700 0.9997700 0.9997700 0.9997800 0.9997800
## PC184 PC185 PC186 PC187 PC188 PC189
## Standard deviation 0.00061 0.0006048 0.0006039 0.0005984 0.0005978 0.000586
## Proportion of Variance 0.00000 0.0000000 0.0000000 0.0000000 0.0000000 0.000000
## Cumulative Proportion 0.99978 0.9997900 0.9997900 0.9997900 0.9998000 0.999800
## PC190 PC191 PC192 PC193 PC194
## Standard deviation 0.0005845 0.0005761 0.0005734 0.0005644 0.000563
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.000000
## Cumulative Proportion 0.9998000 0.9998100 0.9998100 0.9998100 0.999820
## PC195 PC196 PC197 PC198 PC199
## Standard deviation 0.0005549 0.0005529 0.0005517 0.0005459 0.0005378
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9998200 0.9998200 0.9998200 0.9998300 0.9998300
## PC200 PC201 PC202 PC203 PC204
## Standard deviation 0.0005354 0.0005299 0.0005213 0.0005173 0.0005126
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9998300 0.9998300 0.9998400 0.9998400 0.9998400
## PC205 PC206 PC207 PC208 PC209
## Standard deviation 0.0005038 0.0005031 0.0004961 0.0004945 0.0004935
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9998400 0.9998500 0.9998500 0.9998500 0.9998500
## PC210 PC211 PC212 PC213 PC214
## Standard deviation 0.0004873 0.0004842 0.0004816 0.0004796 0.0004763
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9998600 0.9998600 0.9998600 0.9998600 0.9998600
## PC215 PC216 PC217 PC218 PC219
## Standard deviation 0.0004733 0.0004681 0.0004673 0.0004613 0.000459
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.000000
## Cumulative Proportion 0.9998700 0.9998700 0.9998700 0.9998700 0.999870
## PC220 PC221 PC222 PC223 PC224
## Standard deviation 0.0004543 0.0004535 0.0004517 0.0004502 0.0004463
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9998800 0.9998800 0.9998800 0.9998800 0.9998800
## PC225 PC226 PC227 PC228 PC229
## Standard deviation 0.0004379 0.0004359 0.0004321 0.0004286 0.0004261
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9998800 0.9998900 0.9998900 0.9998900 0.9998900
## PC230 PC231 PC232 PC233 PC234
## Standard deviation 0.0004239 0.000417 0.0004142 0.0004101 0.0004062
## Proportion of Variance 0.0000000 0.000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9998900 0.999890 0.9999000 0.9999000 0.9999000
## PC235 PC236 PC237 PC238 PC239
## Standard deviation 0.0004044 0.0004021 0.0003991 0.0003978 0.0003925
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9999000 0.9999000 0.9999000 0.9999100 0.9999100
## PC240 PC241 PC242 PC243 PC244
## Standard deviation 0.0003872 0.0003845 0.0003836 0.0003808 0.0003768
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9999100 0.9999100 0.9999100 0.9999100 0.9999100
## PC245 PC246 PC247 PC248 PC249
## Standard deviation 0.0003727 0.0003714 0.0003673 0.0003656 0.0003633
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9999100 0.9999200 0.9999200 0.9999200 0.9999200
## PC250 PC251 PC252 PC253 PC254
## Standard deviation 0.0003614 0.000359 0.0003554 0.0003523 0.000349
## Proportion of Variance 0.0000000 0.000000 0.0000000 0.0000000 0.000000
## Cumulative Proportion 0.9999200 0.999920 0.9999200 0.9999200 0.999930
## PC255 PC256 PC257 PC258 PC259
## Standard deviation 0.0003482 0.0003467 0.000345 0.0003434 0.0003423
## Proportion of Variance 0.0000000 0.0000000 0.000000 0.0000000 0.0000000
## Cumulative Proportion 0.9999300 0.9999300 0.999930 0.9999300 0.9999300
## PC260 PC261 PC262 PC263 PC264
## Standard deviation 0.0003389 0.0003335 0.0003313 0.00033 0.0003273
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.00000 0.0000000
## Cumulative Proportion 0.9999300 0.9999300 0.9999300 0.99994 0.9999400
## PC265 PC266 PC267 PC268 PC269
## Standard deviation 0.0003256 0.0003236 0.0003226 0.0003179 0.0003173
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9999400 0.9999400 0.9999400 0.9999400 0.9999400
## PC270 PC271 PC272 PC273 PC274
## Standard deviation 0.0003165 0.0003136 0.0003101 0.0003091 0.0003077
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9999400 0.9999400 0.9999400 0.9999400 0.9999500
## PC275 PC276 PC277 PC278 PC279
## Standard deviation 0.0003056 0.0003028 0.0003022 0.0002997 0.0002989
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9999500 0.9999500 0.9999500 0.9999500 0.9999500
## PC280 PC281 PC282 PC283 PC284
## Standard deviation 0.0002967 0.0002933 0.0002915 0.0002899 0.0002865
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9999500 0.9999500 0.9999500 0.9999500 0.9999500
## PC285 PC286 PC287 PC288 PC289
## Standard deviation 0.0002847 0.0002838 0.0002811 0.0002799 0.0002765
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9999500 0.9999500 0.9999600 0.9999600 0.9999600
## PC290 PC291 PC292 PC293 PC294
## Standard deviation 0.0002754 0.0002708 0.0002705 0.0002698 0.0002677
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9999600 0.9999600 0.9999600 0.9999600 0.9999600
## PC295 PC296 PC297 PC298 PC299
## Standard deviation 0.0002669 0.0002648 0.0002611 0.0002594 0.0002574
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9999600 0.9999600 0.9999600 0.9999600 0.9999600
## PC300 PC301 PC302 PC303 PC304
## Standard deviation 0.0002555 0.0002549 0.0002537 0.0002517 0.0002496
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9999600 0.9999600 0.9999700 0.9999700 0.9999700
## PC305 PC306 PC307 PC308 PC309
## Standard deviation 0.0002468 0.0002448 0.0002425 0.0002412 0.0002394
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9999700 0.9999700 0.9999700 0.9999700 0.9999700
## PC310 PC311 PC312 PC313 PC314
## Standard deviation 0.0002389 0.0002375 0.0002361 0.0002341 0.0002312
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9999700 0.9999700 0.9999700 0.9999700 0.9999700
## PC315 PC316 PC317 PC318 PC319
## Standard deviation 0.0002306 0.0002287 0.0002281 0.0002263 0.0002246
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9999700 0.9999700 0.9999700 0.9999700 0.9999700
## PC320 PC321 PC322 PC323 PC324
## Standard deviation 0.0002231 0.0002212 0.0002182 0.0002177 0.0002162
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9999700 0.9999800 0.9999800 0.9999800 0.9999800
## PC325 PC326 PC327 PC328 PC329
## Standard deviation 0.0002144 0.0002136 0.0002109 0.000208 0.0002073
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.000000 0.0000000
## Cumulative Proportion 0.9999800 0.9999800 0.9999800 0.999980 0.9999800
## PC330 PC331 PC332 PC333 PC334
## Standard deviation 0.0002061 0.000206 0.000204 0.0002034 0.0002018
## Proportion of Variance 0.0000000 0.000000 0.000000 0.0000000 0.0000000
## Cumulative Proportion 0.9999800 0.999980 0.999980 0.9999800 0.9999800
## PC335 PC336 PC337 PC338 PC339
## Standard deviation 0.0002014 0.0001997 0.0001971 0.000197 0.0001945
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.000000 0.0000000
## Cumulative Proportion 0.9999800 0.9999800 0.9999800 0.999980 0.9999800
## PC340 PC341 PC342 PC343 PC344
## Standard deviation 0.0001929 0.0001912 0.0001899 0.0001883 0.0001873
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9999800 0.9999800 0.9999800 0.9999800 0.9999800
## PC345 PC346 PC347 PC348 PC349
## Standard deviation 0.0001861 0.0001831 0.000183 0.0001817 0.0001785
## Proportion of Variance 0.0000000 0.0000000 0.000000 0.0000000 0.0000000
## Cumulative Proportion 0.9999800 0.9999800 0.999980 0.9999900 0.9999900
## PC350 PC351 PC352 PC353 PC354
## Standard deviation 0.0001776 0.0001756 0.0001746 0.0001728 0.0001719
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9999900 0.9999900 0.9999900 0.9999900 0.9999900
## PC355 PC356 PC357 PC358 PC359
## Standard deviation 0.0001708 0.0001699 0.0001691 0.0001678 0.0001665
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9999900 0.9999900 0.9999900 0.9999900 0.9999900
## PC360 PC361 PC362 PC363 PC364
## Standard deviation 0.0001651 0.0001644 0.0001621 0.0001605 0.0001604
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9999900 0.9999900 0.9999900 0.9999900 0.9999900
## PC365 PC366 PC367 PC368 PC369
## Standard deviation 0.0001597 0.0001582 0.0001561 0.0001557 0.0001551
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9999900 0.9999900 0.9999900 0.9999900 0.9999900
## PC370 PC371 PC372 PC373 PC374
## Standard deviation 0.0001526 0.0001504 0.000149 0.0001479 0.0001473
## Proportion of Variance 0.0000000 0.0000000 0.000000 0.0000000 0.0000000
## Cumulative Proportion 0.9999900 0.9999900 0.999990 0.9999900 0.9999900
## PC375 PC376 PC377 PC378 PC379 PC380
## Standard deviation 0.0001462 0.0001446 0.000143 0.0001414 0.00014 0.0001381
## Proportion of Variance 0.0000000 0.0000000 0.000000 0.0000000 0.00000 0.0000000
## Cumulative Proportion 0.9999900 0.9999900 0.999990 0.9999900 0.99999 0.9999900
## PC381 PC382 PC383 PC384 PC385
## Standard deviation 0.0001378 0.0001369 0.0001363 0.0001354 0.0001341
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9999900 0.9999900 0.9999900 0.9999900 0.9999900
## PC386 PC387 PC388 PC389 PC390
## Standard deviation 0.000131 0.0001305 0.0001294 0.000128 0.0001264
## Proportion of Variance 0.000000 0.0000000 0.0000000 0.000000 0.0000000
## Cumulative Proportion 0.999990 0.9999900 0.9999900 0.999990 0.9999900
## PC391 PC392 PC393 PC394 PC395
## Standard deviation 0.0001254 0.000124 0.0001227 0.0001206 0.0001187
## Proportion of Variance 0.0000000 0.000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9999900 0.999990 0.9999900 0.9999900 0.9999900
## PC396 PC397 PC398 PC399 PC400
## Standard deviation 0.0001183 0.0001153 0.0001146 0.0001137 0.0001122
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 0.9999900 1.0000000 1.0000000 1.0000000 1.0000000
## PC401 PC402 PC403 PC404 PC405
## Standard deviation 0.0001108 0.0001099 0.000109 0.000108 0.0001061
## Proportion of Variance 0.0000000 0.0000000 0.000000 0.000000 0.0000000
## Cumulative Proportion 1.0000000 1.0000000 1.000000 1.000000 1.0000000
## PC406 PC407 PC408 PC409 PC410
## Standard deviation 0.0001056 0.0001038 0.0001028 0.0001018 0.0001016
## Proportion of Variance 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## Cumulative Proportion 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## PC411 PC412 PC413 PC414 PC415
## Standard deviation 0.0001002 9.943e-05 9.851e-05 9.694e-05 9.65e-05
## Proportion of Variance 0.0000000 0.000e+00 0.000e+00 0.000e+00 0.00e+00
## Cumulative Proportion 1.0000000 1.000e+00 1.000e+00 1.000e+00 1.00e+00
## PC416 PC417 PC418 PC419 PC420
## Standard deviation 9.489e-05 9.445e-05 9.314e-05 9.296e-05 9.098e-05
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC421 PC422 PC423 PC424 PC425
## Standard deviation 8.898e-05 8.831e-05 8.703e-05 8.671e-05 8.601e-05
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC426 PC427 PC428 PC429 PC430
## Standard deviation 8.372e-05 8.223e-05 8.083e-05 8.04e-05 8.008e-05
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.00e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.00e+00 1.000e+00
## PC431 PC432 PC433 PC434 PC435
## Standard deviation 7.881e-05 7.756e-05 7.639e-05 7.622e-05 7.563e-05
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC436 PC437 PC438 PC439 PC440
## Standard deviation 7.241e-05 7.154e-05 7.11e-05 7.067e-05 6.936e-05
## Proportion of Variance 0.000e+00 0.000e+00 0.00e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.00e+00 1.000e+00 1.000e+00
## PC441 PC442 PC443 PC444 PC445
## Standard deviation 6.884e-05 6.807e-05 6.664e-05 6.627e-05 6.559e-05
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC446 PC447 PC448 PC449 PC450
## Standard deviation 6.493e-05 6.42e-05 6.361e-05 6.267e-05 6.251e-05
## Proportion of Variance 0.000e+00 0.00e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.00e+00 1.000e+00 1.000e+00 1.000e+00
## PC451 PC452 PC453 PC454 PC455
## Standard deviation 6.059e-05 5.947e-05 5.897e-05 5.817e-05 5.684e-05
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC456 PC457 PC458 PC459 PC460
## Standard deviation 5.626e-05 5.557e-05 5.487e-05 5.447e-05 5.365e-05
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC461 PC462 PC463 PC464 PC465
## Standard deviation 5.319e-05 5.265e-05 5.074e-05 5.029e-05 4.827e-05
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC466 PC467 PC468 PC469 PC470
## Standard deviation 4.796e-05 4.786e-05 4.684e-05 4.665e-05 4.601e-05
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC471 PC472 PC473 PC474 PC475
## Standard deviation 4.573e-05 4.536e-05 4.457e-05 4.352e-05 4.277e-05
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC476 PC477 PC478 PC479 PC480 PC481
## Standard deviation 4.217e-05 4.108e-05 4.033e-05 4e-05 3.983e-05 3.921e-05
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1e+00 1.000e+00 1.000e+00
## PC482 PC483 PC484 PC485 PC486
## Standard deviation 3.888e-05 3.832e-05 3.76e-05 3.733e-05 3.675e-05
## Proportion of Variance 0.000e+00 0.000e+00 0.00e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.00e+00 1.000e+00 1.000e+00
## PC487 PC488 PC489 PC490 PC491
## Standard deviation 3.568e-05 3.555e-05 3.501e-05 3.408e-05 3.375e-05
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC492 PC493 PC494 PC495 PC496
## Standard deviation 3.329e-05 3.279e-05 3.248e-05 3.038e-05 2.983e-05
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC497 PC498 PC499 PC500 PC501
## Standard deviation 2.957e-05 2.936e-05 2.886e-05 2.865e-05 2.832e-05
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC502 PC503 PC504 PC505 PC506
## Standard deviation 2.781e-05 2.729e-05 2.696e-05 2.68e-05 2.62e-05
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.00e+00 0.00e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.00e+00 1.00e+00
## PC507 PC508 PC509 PC510 PC511
## Standard deviation 2.615e-05 2.581e-05 2.522e-05 2.494e-05 2.483e-05
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC512 PC513 PC514 PC515 PC516 PC517
## Standard deviation 2.35e-05 2.325e-05 2.317e-05 2.305e-05 2.236e-05 2.2e-05
## Proportion of Variance 0.00e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.0e+00
## Cumulative Proportion 1.00e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.0e+00
## PC518 PC519 PC520 PC521 PC522
## Standard deviation 2.195e-05 2.186e-05 2.164e-05 2.089e-05 2.061e-05
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC523 PC524 PC525 PC526 PC527
## Standard deviation 2.011e-05 1.969e-05 1.955e-05 1.939e-05 1.853e-05
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC528 PC529 PC530 PC531 PC532
## Standard deviation 1.793e-05 1.752e-05 1.732e-05 1.71e-05 1.693e-05
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.00e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.00e+00 1.000e+00
## PC533 PC534 PC535 PC536 PC537 PC538
## Standard deviation 1.682e-05 1.618e-05 1.59e-05 1.543e-05 1.51e-05 1.49e-05
## Proportion of Variance 0.000e+00 0.000e+00 0.00e+00 0.000e+00 0.00e+00 0.00e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.00e+00 1.000e+00 1.00e+00 1.00e+00
## PC539 PC540 PC541 PC542 PC543
## Standard deviation 1.456e-05 1.445e-05 1.401e-05 1.394e-05 1.373e-05
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC544 PC545 PC546 PC547 PC548 PC549
## Standard deviation 1.33e-05 1.308e-05 1.261e-05 1.25e-05 1.22e-05 1.21e-05
## Proportion of Variance 0.00e+00 0.000e+00 0.000e+00 0.00e+00 0.00e+00 0.00e+00
## Cumulative Proportion 1.00e+00 1.000e+00 1.000e+00 1.00e+00 1.00e+00 1.00e+00
## PC550 PC551 PC552 PC553 PC554
## Standard deviation 1.143e-05 1.133e-05 1.101e-05 1.068e-05 1.045e-05
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC555 PC556 PC557 PC558 PC559
## Standard deviation 1.024e-05 1.007e-05 9.81e-06 9.738e-06 9.631e-06
## Proportion of Variance 0.000e+00 0.000e+00 0.00e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.00e+00 1.000e+00 1.000e+00
## PC560 PC561 PC562 PC563 PC564
## Standard deviation 9.45e-06 9.293e-06 9.049e-06 8.955e-06 8.562e-06
## Proportion of Variance 0.00e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.00e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC565 PC566 PC567 PC568 PC569
## Standard deviation 8.226e-06 8.085e-06 7.919e-06 7.639e-06 7.543e-06
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC570 PC571 PC572 PC573 PC574
## Standard deviation 7.487e-06 7.233e-06 7.191e-06 7.11e-06 6.706e-06
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.00e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.00e+00 1.000e+00
## PC575 PC576 PC577 PC578 PC579
## Standard deviation 6.422e-06 6.365e-06 6.14e-06 5.888e-06 5.75e-06
## Proportion of Variance 0.000e+00 0.000e+00 0.00e+00 0.000e+00 0.00e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.00e+00 1.000e+00 1.00e+00
## PC580 PC581 PC582 PC583 PC584
## Standard deviation 5.734e-06 5.633e-06 5.333e-06 5.109e-06 5.016e-06
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC585 PC586 PC587 PC588 PC589
## Standard deviation 4.941e-06 4.906e-06 4.802e-06 4.749e-06 4.619e-06
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC590 PC591 PC592 PC593 PC594
## Standard deviation 4.452e-06 4.362e-06 4.273e-06 4.203e-06 4.144e-06
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC595 PC596 PC597 PC598 PC599
## Standard deviation 4.031e-06 3.88e-06 3.677e-06 3.519e-06 3.356e-06
## Proportion of Variance 0.000e+00 0.00e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.00e+00 1.000e+00 1.000e+00 1.000e+00
## PC600 PC601 PC602 PC603 PC604
## Standard deviation 3.325e-06 3.317e-06 3.118e-06 3.029e-06 2.998e-06
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC605 PC606 PC607 PC608 PC609
## Standard deviation 2.97e-06 2.777e-06 2.703e-06 2.616e-06 2.609e-06
## Proportion of Variance 0.00e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.00e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC610 PC611 PC612 PC613 PC614
## Standard deviation 2.415e-06 2.395e-06 2.338e-06 2.273e-06 2.23e-06
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.00e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.00e+00
## PC615 PC616 PC617 PC618 PC619
## Standard deviation 2.143e-06 2.04e-06 2.008e-06 1.891e-06 1.783e-06
## Proportion of Variance 0.000e+00 0.00e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.00e+00 1.000e+00 1.000e+00 1.000e+00
## PC620 PC621 PC622 PC623 PC624
## Standard deviation 1.721e-06 1.645e-06 1.626e-06 1.575e-06 1.562e-06
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC625 PC626 PC627 PC628 PC629
## Standard deviation 1.509e-06 1.477e-06 1.433e-06 1.354e-06 1.317e-06
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC630 PC631 PC632 PC633 PC634
## Standard deviation 1.312e-06 1.285e-06 1.232e-06 1.202e-06 1.169e-06
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC635 PC636 PC637 PC638 PC639
## Standard deviation 1.13e-06 1.046e-06 1.027e-06 9.778e-07 9.598e-07
## Proportion of Variance 0.00e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.00e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC640 PC641 PC642 PC643 PC644
## Standard deviation 9.353e-07 9.217e-07 9.076e-07 7.152e-07 6.97e-07
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.00e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.00e+00
## PC645 PC646 PC647 PC648 PC649
## Standard deviation 6.805e-07 6.652e-07 6.109e-07 6.081e-07 5.981e-07
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC650 PC651 PC652 PC653 PC654
## Standard deviation 5.746e-07 5.661e-07 5.284e-07 4.682e-07 4.631e-07
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC655 PC656 PC657 PC658 PC659
## Standard deviation 4.402e-07 4.339e-07 4.125e-07 3.874e-07 3.726e-07
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC660 PC661 PC662 PC663 PC664
## Standard deviation 3.582e-07 3.574e-07 3.506e-07 3.309e-07 3.206e-07
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC665 PC666 PC667 PC668 PC669
## Standard deviation 3.095e-07 2.115e-07 2.065e-07 1.708e-07 1.635e-07
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC670 PC671 PC672 PC673 PC674
## Standard deviation 1.412e-07 1.404e-07 1.3e-07 1.279e-07 9.703e-08
## Proportion of Variance 0.000e+00 0.000e+00 0.0e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.0e+00 1.000e+00 1.000e+00
## PC675 PC676 PC677 PC678 PC679
## Standard deviation 9.057e-08 8.619e-08 8.013e-08 7.437e-08 7.172e-08
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC680 PC681 PC682 PC683 PC684
## Standard deviation 6.084e-08 4.417e-08 4.165e-08 3.901e-08 3.704e-08
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC685 PC686 PC687 PC688 PC689
## Standard deviation 3.624e-08 3.326e-08 3.208e-08 2.847e-08 2.462e-08
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC690 PC691 PC692 PC693 PC694
## Standard deviation 2.219e-08 1.839e-08 1.397e-08 1.327e-08 1.086e-08
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC695 PC696 PC697 PC698 PC699
## Standard deviation 7.174e-09 6.211e-09 3.776e-09 3.49e-09 1.637e-09
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.00e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.00e+00 1.000e+00
## PC700 PC701 PC702 PC703 PC704
## Standard deviation 3.063e-10 1.996e-10 1.671e-10 3.308e-11 3.014e-11
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC705 PC706 PC707 PC708 PC709
## Standard deviation 2.389e-16 1.869e-16 1.513e-16 1.446e-16 1.005e-16
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC710 PC711 PC712 PC713 PC714
## Standard deviation 7.544e-17 5.035e-17 4.695e-17 4.302e-17 3.603e-17
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC715 PC716 PC717 PC718 PC719
## Standard deviation 3.339e-17 3.058e-17 2.802e-17 2.62e-17 2.525e-17
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.00e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.00e+00 1.000e+00
## PC720 PC721 PC722 PC723 PC724
## Standard deviation 2.368e-17 1.948e-17 1.615e-17 1.566e-17 1.479e-17
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC725 PC726 PC727 PC728 PC729
## Standard deviation 1.479e-17 1.479e-17 1.479e-17 1.479e-17 1.479e-17
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC730 PC731 PC732 PC733 PC734
## Standard deviation 1.479e-17 1.479e-17 1.479e-17 1.479e-17 1.479e-17
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC735 PC736 PC737 PC738 PC739
## Standard deviation 1.479e-17 1.479e-17 1.479e-17 1.479e-17 1.479e-17
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC740 PC741 PC742 PC743 PC744
## Standard deviation 1.479e-17 1.479e-17 1.479e-17 1.479e-17 1.479e-17
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC745 PC746 PC747 PC748 PC749
## Standard deviation 1.479e-17 1.479e-17 1.479e-17 1.479e-17 1.479e-17
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC750 PC751 PC752 PC753 PC754
## Standard deviation 1.479e-17 1.479e-17 1.479e-17 1.479e-17 1.479e-17
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC755 PC756 PC757 PC758 PC759
## Standard deviation 1.479e-17 1.479e-17 1.479e-17 1.479e-17 1.479e-17
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC760 PC761 PC762 PC763 PC764
## Standard deviation 1.479e-17 1.479e-17 1.479e-17 1.479e-17 1.479e-17
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC765 PC766 PC767 PC768 PC769
## Standard deviation 1.479e-17 1.479e-17 1.479e-17 1.479e-17 1.479e-17
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC770 PC771 PC772 PC773 PC774
## Standard deviation 1.479e-17 1.479e-17 1.479e-17 1.479e-17 1.479e-17
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC775 PC776 PC777 PC778 PC779
## Standard deviation 1.479e-17 1.479e-17 1.479e-17 1.479e-17 1.469e-17
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC780 PC781 PC782 PC783 PC784
## Standard deviation 1.183e-17 1.004e-17 8.345e-18 7.018e-18 3.15e-18
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.00e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.00e+00
# scree, cumulative variance plots
par(mfrow = c(1, 2))
# scree plot
screeplot(trainPC, type = "l", npcs = 30, main = "Screeplot of the first 30 PCs")
# cumulative variance plot
cumpro <- cumsum(trainPC$sdev^2 / sum(trainPC$sdev^2))
plot(cumpro[0:30], xlab = "PC #", ylab = "Amount of explained variance", main = "Cumulative variance plot")
abline(v = 20, col = "red", lty = 2)trainPC_percent_var <- trainPC$sdev^2/sum(trainPC$sdev^2)
options(digits = 4)
result.pc <- data.frame(PCs = 1:length(trainPC_percent_var),
Covariance = cumsum(trainPC_percent_var))
result.pc[seq(from = 10, to = 100, by = 5), ]## PCs Covariance
## 10 10 0.8712
## 15 15 0.9256
## 20 20 0.9527
## 25 25 0.9682
## 30 30 0.9780
## 35 35 0.9841
## 40 40 0.9881
## 45 45 0.9910
## 50 50 0.9930
## 55 55 0.9944
## 60 60 0.9955
## 65 65 0.9964
## 70 70 0.9970
## 75 75 0.9975
## 80 80 0.9979
## 85 85 0.9982
## 90 90 0.9985
## 95 95 0.9987
## 100 100 0.9988
train_score <- as.matrix(trainDf2[, 2:ncol(trainDf2)]) %*% trainPC$rotation[, 1:20]
label <- as.factor(trainDf2[[1]])
train_score_label <- cbind(label, as.data.frame(train_score))
colors <- rainbow(length(unique(train_score_label$label)))
names(colors) <- unique(train_score_label$label)
plot(train_score_label$PC1,
train_score_label$PC2,
type = "n",
main = "First two principal components")
text(train_score_label$PC1,
train_score_label$PC2,
label = train_score_label$label,
col = colors[train_score_label$label])PCA uses orthogonal projection of highly correlated variables to a set of values of linearly uncorrelated variables called principal components. The number of principal components is less than or equal to the number of original variables, i.e. min(n - 1, p). This linear transformation is defined in such a way that the first principal component has the largest possible variance. It accounts for as much of the variability in the data as possible by considering highly correlated features. Each succeeding component in turn has the highest variance using the features that are less correlated with the first principal component and that are orthogonal to the preceding component. Features that are strongly correlated with each other are more suitable for PCA than those loosely related.
In our train set, we run the principal components on the covariance matrix. The first 20 components account for 95% of the variance in our train set. The scree plot (“elbow” method) would indicate that the first 7 or 8 components are the most important ones. They capture over 80% of the variance in the data. Above plot indicates that it is good enough to classify the labels using just the first two components.
(e)
Plot the first 10 images generated by PCA. They will appear to be noise. Why? (5 points)
X_loading <- trainPC$rotation
X_train <- as.matrix(trainDf2 %>% dplyr::select(-label)) %*% X_loading
y_train <- trainDf2[['label']]
par(mfrow = c (2, 5))
for(i in 1:10){
r = X_train[i, ]
lab = y_train[i]
plotDigit(row = r, label = lab)
}I am guessing that it’s because the PCA is conducted based on the entire data set. It should have been built based on each of the 10 subsets in order to come up with meaningful components for each of the 10 labels.
(f)
Now, select only those images that have labels that are 8’s. Re-run PCA that accounts for all of the variance (100%). Plot the first 10 images. What do you see? (5 points)
# get the covariance matrix of the predictors
trainCov8 <- cov(trainDf2 %>% dplyr::filter(label == 8) %>% dplyr::select(-label))
# conduct pca
trainPC8 <- prcomp(trainCov8)
X_loading8 <- trainPC8$rotation
X_train8 <- as.matrix(trainDf2 %>% dplyr::filter(label == 8) %>% dplyr::select(-label)) %*% X_loading8
y_train8 <- trainDf2 %>% dplyr::filter(label == 8) %>% .$label
par(mfrow = c (2, 5))
for(i in 1:10){
r = X_train8[i, ]
lab = y_train8[i]
plotDigit(row = r, label = lab)
}I am still seeing noise here, and perhaps that’s because I have not properly configured my plotDigit function to account for the data transformation from the PCA.
(g)
An incorrect approach to predicting the images would be to build a linear regression model with y as the digit values and X as the pixel matrix. Instead, we can build a multinomial model that classifies the digits. Build a multinomial model on the entirety of the training set. Then provide its classification accuracy (percent correctly identified) as well as a matrix of observed versus forecast values (confusion matrix). This matrix will be a 10 x 10, and correct classifications will be on the diagonal. (10 points)
# split the data in 80/20
set.seed(1234)
index <- sample(1:nrow(trainDf2), 0.8 * nrow(trainDf2))
# data prep
train_x <- trainDf2[index, ] %>% dplyr::select(-label)
train_y <- trainDf2[index, "label"]
test_x <- trainDf2[-index, ] %>% dplyr::select(-label)
test_y <- trainDf2[-index, "label"]
# convert every variable to numeric
train_x <- as.data.frame(lapply(train_x, as.numeric))
test_x <- as.data.frame(lapply(test_x, as.numeric))
# convert data to xgboost format
train_x_xgb <- xgb.DMatrix(data = data.matrix(train_x), label = train_y)
test_x_xgb <- xgb.DMatrix(data = data.matrix(test_x), label = test_y)
# set parameters
params <- list (
eta = 0.3,
max_depth = 5,
min_child_weight = 1,
subsample = 1,
colsample_bytree = 1,
objective = "multi:softmax",
eval_metric = "merror",
num_class = 11
)
# build model
xgb.model <- xgb.train(params, train_x_xgb, nrounds = 10)
# make prediction
xgb.predict <- predict(xgb.model, test_x_xgb)
# display 10x10 confusion matrix
xgb.cm = caret::confusionMatrix(as.factor(xgb.predict), as.factor(test_y))
xgb.cm## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 777 1 6 6 2 5 9 3 1 4
## 1 0 902 4 8 5 3 1 5 18 5
## 2 1 1 790 17 5 4 3 13 10 1
## 3 2 6 5 798 0 22 0 6 13 12
## 4 2 2 12 7 754 5 8 11 5 23
## 5 1 2 0 19 4 705 16 1 10 6
## 6 2 2 8 3 6 19 784 0 7 0
## 7 2 1 12 6 1 5 1 819 6 11
## 8 9 4 8 23 4 15 13 6 705 11
## 9 2 0 4 14 33 6 0 30 17 734
##
## Overall Statistics
##
## Accuracy : 0.925
## 95% CI : (0.919, 0.93)
## No Information Rate : 0.11
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.916
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity 0.9737 0.979 0.931 0.886 0.9263 0.8935
## Specificity 0.9951 0.993 0.993 0.991 0.9901 0.9922
## Pos Pred Value 0.9545 0.948 0.935 0.924 0.9095 0.9228
## Neg Pred Value 0.9972 0.997 0.992 0.986 0.9921 0.9890
## Prevalence 0.0950 0.110 0.101 0.107 0.0969 0.0939
## Detection Rate 0.0925 0.107 0.094 0.095 0.0898 0.0839
## Detection Prevalence 0.0969 0.113 0.101 0.103 0.0987 0.0910
## Balanced Accuracy 0.9844 0.986 0.962 0.938 0.9582 0.9429
## Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity 0.9389 0.9161 0.8902 0.9095
## Specificity 0.9938 0.9940 0.9878 0.9860
## Pos Pred Value 0.9434 0.9479 0.8835 0.8738
## Neg Pred Value 0.9933 0.9900 0.9886 0.9903
## Prevalence 0.0994 0.1064 0.0943 0.0961
## Detection Rate 0.0933 0.0975 0.0839 0.0874
## Detection Prevalence 0.0989 0.1029 0.0950 0.1000
## Balanced Accuracy 0.9664 0.9551 0.9390 0.9478
# report accuracy
xgb.model.accuracy = round(xgb.cm$overall[1]*100, 1)
names(xgb.model.accuracy) <- NULL
print(glue("The accuracy of the XGBoost model is ", {xgb.model.accuracy}, "%."))## The accuracy of the XGBoost model is 92.5%.
Final Problem 3
You are to compete in the House Prices: Advanced Regression Techniques competition https://www.kaggle.com/c/house-prices-advanced-regression-techniques. I want you to do the following.
(a) Descriptive and Inferential Statistics
# read df
house_price_train <- readit::readit("data/house-prices-advanced-regression-techniques/train.csv")
house_price_test <- readit::readit("data/house-prices-advanced-regression-techniques/test.csv")summary_df = sapply(house_price_train, class) %>%
as.data.frame() %>%
tibble::rownames_to_column() %>%
dplyr::select(fields = rowname, dtypes = ".") %>%
dplyr::mutate(dtypes = as.character(dtypes),
missing = colSums(is.na(house_price_train)) %>% unlist %>% as.vector(),
percent_missing_value = round(missing / nrow(house_price_train), 4),
unique_value_ct = lapply(house_price_train, function(x) length(unique(x))) %>% unlist %>% as.vector())
print(paste0("There are ", nrow(house_price_train), " rows and ", ncol(house_price_train), " columns in the train set."))## [1] "There are 1460 rows and 81 columns in the train set."
summary_df %>% kable() %>% scroll_box()| fields | dtypes | missing | percent_missing_value | unique_value_ct |
|---|---|---|---|---|
| Id | numeric | 0 | 0.0000 | 1460 |
| MSSubClass | numeric | 0 | 0.0000 | 15 |
| MSZoning | character | 0 | 0.0000 | 5 |
| LotFrontage | numeric | 259 | 0.1774 | 111 |
| LotArea | numeric | 0 | 0.0000 | 1073 |
| Street | character | 0 | 0.0000 | 2 |
| Alley | character | 1369 | 0.9377 | 3 |
| LotShape | character | 0 | 0.0000 | 4 |
| LandContour | character | 0 | 0.0000 | 4 |
| Utilities | character | 0 | 0.0000 | 2 |
| LotConfig | character | 0 | 0.0000 | 5 |
| LandSlope | character | 0 | 0.0000 | 3 |
| Neighborhood | character | 0 | 0.0000 | 25 |
| Condition1 | character | 0 | 0.0000 | 9 |
| Condition2 | character | 0 | 0.0000 | 8 |
| BldgType | character | 0 | 0.0000 | 5 |
| HouseStyle | character | 0 | 0.0000 | 8 |
| OverallQual | numeric | 0 | 0.0000 | 10 |
| OverallCond | numeric | 0 | 0.0000 | 9 |
| YearBuilt | numeric | 0 | 0.0000 | 112 |
| YearRemodAdd | numeric | 0 | 0.0000 | 61 |
| RoofStyle | character | 0 | 0.0000 | 6 |
| RoofMatl | character | 0 | 0.0000 | 8 |
| Exterior1st | character | 0 | 0.0000 | 15 |
| Exterior2nd | character | 0 | 0.0000 | 16 |
| MasVnrType | character | 8 | 0.0055 | 5 |
| MasVnrArea | numeric | 8 | 0.0055 | 328 |
| ExterQual | character | 0 | 0.0000 | 4 |
| ExterCond | character | 0 | 0.0000 | 5 |
| Foundation | character | 0 | 0.0000 | 6 |
| BsmtQual | character | 37 | 0.0253 | 5 |
| BsmtCond | character | 37 | 0.0253 | 5 |
| BsmtExposure | character | 38 | 0.0260 | 5 |
| BsmtFinType1 | character | 37 | 0.0253 | 7 |
| BsmtFinSF1 | numeric | 0 | 0.0000 | 637 |
| BsmtFinType2 | character | 38 | 0.0260 | 7 |
| BsmtFinSF2 | numeric | 0 | 0.0000 | 144 |
| BsmtUnfSF | numeric | 0 | 0.0000 | 780 |
| TotalBsmtSF | numeric | 0 | 0.0000 | 721 |
| Heating | character | 0 | 0.0000 | 6 |
| HeatingQC | character | 0 | 0.0000 | 5 |
| CentralAir | character | 0 | 0.0000 | 2 |
| Electrical | character | 1 | 0.0007 | 6 |
| 1stFlrSF | numeric | 0 | 0.0000 | 753 |
| 2ndFlrSF | numeric | 0 | 0.0000 | 417 |
| LowQualFinSF | numeric | 0 | 0.0000 | 24 |
| GrLivArea | numeric | 0 | 0.0000 | 861 |
| BsmtFullBath | numeric | 0 | 0.0000 | 4 |
| BsmtHalfBath | numeric | 0 | 0.0000 | 3 |
| FullBath | numeric | 0 | 0.0000 | 4 |
| HalfBath | numeric | 0 | 0.0000 | 3 |
| BedroomAbvGr | numeric | 0 | 0.0000 | 8 |
| KitchenAbvGr | numeric | 0 | 0.0000 | 4 |
| KitchenQual | character | 0 | 0.0000 | 4 |
| TotRmsAbvGrd | numeric | 0 | 0.0000 | 12 |
| Functional | character | 0 | 0.0000 | 7 |
| Fireplaces | numeric | 0 | 0.0000 | 4 |
| FireplaceQu | character | 690 | 0.4726 | 6 |
| GarageType | character | 81 | 0.0555 | 7 |
| GarageYrBlt | numeric | 81 | 0.0555 | 98 |
| GarageFinish | character | 81 | 0.0555 | 4 |
| GarageCars | numeric | 0 | 0.0000 | 5 |
| GarageArea | numeric | 0 | 0.0000 | 441 |
| GarageQual | character | 81 | 0.0555 | 6 |
| GarageCond | character | 81 | 0.0555 | 6 |
| PavedDrive | character | 0 | 0.0000 | 3 |
| WoodDeckSF | numeric | 0 | 0.0000 | 274 |
| OpenPorchSF | numeric | 0 | 0.0000 | 202 |
| EnclosedPorch | numeric | 0 | 0.0000 | 120 |
| 3SsnPorch | numeric | 0 | 0.0000 | 20 |
| ScreenPorch | numeric | 0 | 0.0000 | 76 |
| PoolArea | numeric | 0 | 0.0000 | 8 |
| PoolQC | character | 1453 | 0.9952 | 4 |
| Fence | character | 1179 | 0.8075 | 5 |
| MiscFeature | character | 1406 | 0.9630 | 5 |
| MiscVal | numeric | 0 | 0.0000 | 21 |
| MoSold | numeric | 0 | 0.0000 | 12 |
| YrSold | numeric | 0 | 0.0000 | 5 |
| SaleType | character | 0 | 0.0000 | 9 |
| SaleCondition | character | 0 | 0.0000 | 6 |
| SalePrice | numeric | 0 | 0.0000 | 663 |
summary_df %>%
dplyr::mutate(missing_flag = dplyr::case_when(missing == 0 ~ 'no missing', TRUE ~'with missing')) %>%
group_by(missing_flag) %>%
summarise(num_of_variables = n())## # A tibble: 2 x 2
## missing_flag num_of_variables
## <chr> <int>
## 1 no missing 62
## 2 with missing 19
summary_df %>%
dplyr::mutate(missing_flag = dplyr::case_when(missing == 0 ~ 'no missing', TRUE ~'with missing')) %>%
group_by(missing_flag, dtypes) %>%
summarise(num_of_variables = n()) %>%
tidyr::spread(missing_flag, num_of_variables) %>%
dplyr::inner_join(., summary_df %>%
group_by(dtypes) %>%
summarise(num_of_variables = n()),
by = "dtypes")## # A tibble: 2 x 4
## dtypes `no missing` `with missing` num_of_variables
## <chr> <int> <int> <int>
## 1 character 27 16 43
## 2 numeric 35 3 38
# let's just look at non-missing numeric data
numFields <- summary_df %>%
dplyr::filter(percent_missing_value == 0 & dtypes == "numeric") %>%
.$fields
numFields <- setdiff(numFields, "Id")
#numFields
# corrplot()
house_price_train %>%
dplyr::select(all_of(numFields)) %>%
#GGally::ggpairs()
cor() %>%
corrplot(type = "upper")# scatterplot matrix, let's just look at c("SalePrice", "TotRmsAbvGrd", "GrLivArea")
house_price_train %>%
dplyr::select("Id", "SalePrice", "TotRmsAbvGrd", "GrLivArea") %>%
tidyr::gather(key, value, TotRmsAbvGrd:GrLivArea) %>%
ggplot(aes(x = value, y = SalePrice)) +
geom_jitter() +
geom_smooth(method = lm) +
scale_y_continuous(labels = scales::dollar_format()) +
xlab("") +
facet_wrap(~key, scales = "free")tempDf <- house_price_train %>%
dplyr::select(all_of(
summary_df %>%
filter(fields != "Id" &
percent_missing_value == 0 &
dtypes == "numeric") %>%
arrange(desc(unique_value_ct)) %>%
head(3) %>%
.$fields
))
cor.mat = cor(tempDf)
cor.mat## LotArea GrLivArea BsmtUnfSF
## LotArea 1.000000 0.2631 -0.002618
## GrLivArea 0.263116 1.0000 0.240257
## BsmtUnfSF -0.002618 0.2403 1.000000
# just get the unique pair of the two, i.e. order of the pairs doesn't matter, such as A <-> B is the same as B <-> A
loops <- data.frame(Var1 = c(1, 2, 3),
Var2 = c(2, 3, 1))
tempList <- list()
for(i in 1:nrow(loops)){
j = loops$Var1[i]
k = loops$Var2[i]
variable_1 = tempDf[[j]]
variable_2 = tempDf[[k]]
result = cor.test(variable_1, variable_2, method = "pearson", conf.level = 0.8) %>%
broom::tidy() %>%
dplyr::mutate(Var1 = j, Var2 = k)
tempList <- c(tempList, list(result))
}
tempList %>%
dplyr::bind_rows() %>%
dplyr::inner_join(., data.frame(Var1 = 1:3,
field_1 = names(tempDf))) %>%
dplyr::inner_join(., data.frame(Var2 = 1:3,
field_2 = names(tempDf))) %>%
dplyr::mutate(flag = dplyr::case_when(`p.value` < .05~ 1, TRUE ~0)) %>%
dplyr::select(-Var1, -Var2, -method, -alternative) %>%
arrange(`p.value`) %>%
kable() %>%
scroll_box()| estimate | statistic | p.value | parameter | conf.low | conf.high | field_1 | field_2 | flag |
|---|---|---|---|---|---|---|---|---|
| 0.2631 | 10.414 | 0.0000 | 1458 | 0.2316 | 0.2941 | LotArea | GrLivArea | 1 |
| 0.2403 | 9.451 | 0.0000 | 1458 | 0.2084 | 0.2716 | GrLivArea | BsmtUnfSF | 1 |
| -0.0026 | -0.100 | 0.9204 | 1458 | -0.0362 | 0.0309 | BsmtUnfSF | LotArea | 0 |
There are many missing variables from the data set, e.g. 16 out of 43 “character” variables have missing data, 3 out of 38 “numeric” variables have missing data. We only look at non-missing numeric variables in above statistics. The corplot() displays correlation among these numeric variables, and subsequently we look at scatter plot for “SalePrice” vs “TotRmsAbvGrd”, and “SalePrice” vs “GrLivArea”. These are variables that display strongly positive correlation with “SalePrice” from the corplot().
We pick out three quantitative variables, i.e. “LotArea”, “GrLivArea” & “BsmtUnfSF”, and then we look at the correlation between each pair. There is no correlation between “BsmtUnfSF” and “LotArea” as the p-value is way above .05. The 80% confidence interval of the correlation is between -0.0362 and 0.0309, a sign of no relationship between the two. On the other hand, the other two pairs display statistically significant relationship. Their small p-value (<.05) rejects the null hypothesis that the true correlation between the two is zero. Given their small p-value, we shouldn’t worry too much about familywise/type I error.
(b) Linear Algebra and Correlation
First, let’s invert the above correlation matrix
inv.cor.mat <- solve(cor.mat)
inv.cor.mat## LotArea GrLivArea BsmtUnfSF
## LotArea 1.07972 -0.3022 0.07544
## GrLivArea -0.30221 1.1459 -0.27609
## BsmtUnfSF 0.07544 -0.2761 1.06653
We multiply the correlation matrix by the precision matrix, and then it results in the identity matrix.
cor.by.prec <- cor.mat %*% inv.cor.mat
cor.by.prec <- zapsmall(cor.by.prec)
cor.by.prec## LotArea GrLivArea BsmtUnfSF
## LotArea 1 0 0
## GrLivArea 0 1 0
## BsmtUnfSF 0 0 1
We multiply the precision matrix by the correlation matrix, and again it results in the identity matrix.
prec.by.cor <- inv.cor.mat %*% cor.mat
prec.by.cor <- zapsmall(prec.by.cor)
prec.by.cor## LotArea GrLivArea BsmtUnfSF
## LotArea 1 0 0
## GrLivArea 0 1 0
## BsmtUnfSF 0 0 1
Let’s conduct the LU decomposition on the matrix.
# output the lower triangular matrix
lu.decomp <- lu.decomposition(cor.mat)
lower <- lu.decomp$L
print("output the lower triangular matrix")## [1] "output the lower triangular matrix"
lower## [,1] [,2] [,3]
## [1,] 1.000000 0.0000 0
## [2,] 0.263116 1.0000 0
## [3,] -0.002618 0.2589 1
# output the upper triangular matrix
upper <- lu.decomp$U
print("output the upper triangular matrix")## [1] "output the upper triangular matrix"
upper## [,1] [,2] [,3]
## [1,] 1 0.2631 -0.002618
## [2,] 0 0.9308 0.240946
## [3,] 0 0.0000 0.937620
We can verify the accuracy of LU decomposition by multiplying the lower and upper triangular matrices to return to initial correlation matrix. We would see identical matrices.
lower %*% upper## [,1] [,2] [,3]
## [1,] 1.000000 0.2631 -0.002618
## [2,] 0.263116 1.0000 0.240257
## [3,] -0.002618 0.2403 1.000000
cor.mat## LotArea GrLivArea BsmtUnfSF
## LotArea 1.000000 0.2631 -0.002618
## GrLivArea 0.263116 1.0000 0.240257
## BsmtUnfSF -0.002618 0.2403 1.000000
(c) Calculus-Based Probability & Statistics
Let’s begin by plotting some density curves by numeric variables with non-missing data and that has at least more than 100 unique data value (to avoid plotting ordinal or categorical variables)
# density curves by numeric variables with non-missing data and that has at least more than 100 unique data value
house_price_train %>%
dplyr::select(all_of(
summary_df %>%
dplyr::filter(percent_missing_value == 0 & dtypes == "numeric" & unique_value_ct >100) %>%
.$fields
)) %>%
tidyr::gather(key, value, -Id) %>%
ggplot(aes(value, color = as.factor(key))) +
geom_density(aes(fill = as.factor(key)), alpha = 0.5) +
geom_vline(data = aggregate(value ~ key,
data = house_price_train %>%
dplyr::select(all_of(
summary_df %>%
dplyr::filter(percent_missing_value == 0 &
dtypes == "numeric" &
unique_value_ct >100) %>%
.$fields)) %>%
tidyr::gather(key, value, -Id),
FUN = median),
aes(xintercept = value,
color = "red"),
linetype = "dashed",
size = 1) +
theme_classic() +
facet_wrap(~ key,
scales = "free",
ncol = 2) +
xlab("") +
theme(legend.position = "none")Let’s pick “OpenPorchSF”, which is heavily skewed to the right. We would shift it by doing the “plus 1” so that we can ensure the minimal value is above zero.
# OpenPorchSF, i.e. heavily skewed to the right
open.por <- house_price_train$OpenPorchSF
# shift +1, i.e. make it above zero
open.por <- open.por + 1
# take a look
summary(open.por)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 1.0 26.0 47.7 69.0 548.0
Fit an exponential probability distribution using MASS::fitdistr(), and then find the optimal value of lambda for the distribution.
# fit an exponential probability density function
exp.pdf <- MASS::fitdistr(open.por, densfun = "exponential")
# find the optimal value of lambda for this distribution
lambda <- exp.pdf$estimate
print(paste0("The optimal value of lambda is ", round(lambda, 4), "."))## [1] "The optimal value of lambda is 0.021."
Let’s generate 1000 samples using the lambda value.
# generate random numbers using the rexp() and the lambda value
set.seed(605)
hist(rexp(1000, lambda), main = "1000 samples from the exponential distribution")# find 5th and 95th percentiles of this newly fitted distribution
lo = qexp(.05, rate = lambda, lower.tail = TRUE)
up = qexp(.95, rate = lambda, lower.tail = TRUE)
print(paste0("The 5th percentile is approximately ", round(lo, 2), " whereas the 95th percentile is about ", round(up, 2), "."))## [1] "The 5th percentile is approximately 2.44 whereas the 95th percentile is about 142.78."
And we compare the above with below empirical data.
# generate a 95% confidence interval from the empirical data, assuming normality
m <- mean(open.por)
s <- sd(open.por)
se <- qnorm(0.975) * (s/sqrt(length(open.por)-1))
left <- m - se
right <- m + se
set.seed(2358)
hist(sample(open.por, 1000), main = "1000 samples from the empirical data")print(paste0("The 95% confidence interval is roughly between ", round(left, 2), " and ", round(right, 2)))## [1] "The 95% confidence interval is roughly between 44.26 and 51.06"
# run a t test to confirm the CI
t.test(open.por) %>% broom::tidy()## # A tibble: 1 x 8
## estimate statistic p.value parameter conf.low conf.high method alternative
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 47.7 27.5 2.28e-134 1459 44.3 51.1 One Sam~ two.sided
# empirical 5th and 95th percentile of the data
q <- quantile(open.por, c(0.05, 0.95)) %>% unlist %>% as.vector()
print(paste0("The empirical 5th percentile is approximately ", round(q[1], 2), " whereas the empirical 95th percentile is about ", round(q[2], 2), "."))## [1] "The empirical 5th percentile is approximately 1 whereas the empirical 95th percentile is about 176.05."
We pick the “OpenPorchSF” which is heavily skewed to the right for our use case. The values are shifted by +1 to ensure that the minimum value is above zero. We fit the exponential distribution using the transformed data. We find the optimal value of lambda for the distribution. Subsequently, we sample 1000 data points from the distribution and then plot the histogram. In contrast to the original data, the histogram shows a smaller range of values across the x-axis along with a less concentrated count along the y-axis. That is still rightly skewed, but as expected from the exponetial distribution.
In addition, the 5th and 95th percentiles from the newly transformed distribution show a smaller range [2.44, 142.78] in comparison to the empirical’s, i.e. [1, 176.05]. Moreover, the 95% CI from the empirical data suggests that the mean is between 44.26 and 51.06. We run a t.test in order to extract the CI interval (which is between 44.3 and 51.1), and that the result is extremely close to our calculation.
With the help of exponential distribution, the skewed data is now providing a more realistic outlook than the original observations.
(d) Modeling
Below is a correlation matrix that shows the correlation between a list of numeric variables that have no missing values.
# clean up df name, which I should have done in the beginning before running all the analyses
house_price_train2 <- janitor::clean_names(house_price_train)
summary_df2 = sapply(house_price_train2, class) %>%
as.data.frame() %>%
tibble::rownames_to_column() %>%
dplyr::select(fields = rowname, dtypes = ".") %>%
dplyr::mutate(dtypes = as.character(dtypes),
missing = colSums(is.na(house_price_train2)) %>% unlist %>% as.vector(),
percent_missing_value = round(missing / nrow(house_price_train2), 4),
unique_value_ct = lapply(house_price_train2, function(x) length(unique(x))) %>% unlist %>% as.vector())
# correlation matrix
house_price_train %>%
dplyr::select(all_of(
summary_df %>%
dplyr::filter(percent_missing_value == 0 &
dtypes == "numeric" &
unique_value_ct >20) %>%
.$fields)) %>%
dplyr::select(-Id) %>%
cor() %>%
as.data.frame() %>%
dplyr::mutate(flag = dplyr::case_when(abs(SalePrice) > .5 ~1, TRUE ~0)) %>%
dplyr::arrange(desc(abs(SalePrice))) %>%
kable() %>%
scroll_box()| LotArea | YearBuilt | YearRemodAdd | BsmtFinSF1 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | GarageArea | WoodDeckSF | OpenPorchSF | EnclosedPorch | ScreenPorch | MiscVal | SalePrice | flag | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SalePrice | 0.2638 | 0.5229 | 0.5071 | 0.3864 | -0.0114 | 0.2145 | 0.6136 | 0.6059 | 0.3193 | -0.0256 | 0.7086 | 0.6234 | 0.3244 | 0.3159 | -0.1286 | 0.1114 | -0.0212 | 1.0000 | 1 |
| GrLivArea | 0.2631 | 0.1990 | 0.2874 | 0.2082 | -0.0096 | 0.2403 | 0.4549 | 0.5660 | 0.6875 | 0.1347 | 1.0000 | 0.4690 | 0.2474 | 0.3302 | 0.0091 | 0.1015 | -0.0024 | 0.7086 | 1 |
| GarageArea | 0.1804 | 0.4790 | 0.3716 | 0.2970 | -0.0182 | 0.1833 | 0.4867 | 0.4898 | 0.1383 | -0.0676 | 0.4690 | 1.0000 | 0.2247 | 0.2414 | -0.1218 | 0.0514 | -0.0274 | 0.6234 | 1 |
| TotalBsmtSF | 0.2608 | 0.3915 | 0.2911 | 0.5224 | 0.1048 | 0.4154 | 1.0000 | 0.8195 | -0.1745 | -0.0332 | 0.4549 | 0.4867 | 0.2320 | 0.2473 | -0.0955 | 0.0845 | -0.0185 | 0.6136 | 1 |
| 1stFlrSF | 0.2995 | 0.2820 | 0.2404 | 0.4459 | 0.0971 | 0.3180 | 0.8195 | 1.0000 | -0.2026 | -0.0142 | 0.5660 | 0.4898 | 0.2355 | 0.2117 | -0.0653 | 0.0888 | -0.0211 | 0.6059 | 1 |
| YearBuilt | 0.0142 | 1.0000 | 0.5929 | 0.2495 | -0.0491 | 0.1490 | 0.3915 | 0.2820 | 0.0103 | -0.1838 | 0.1990 | 0.4790 | 0.2249 | 0.1887 | -0.3873 | -0.0504 | -0.0344 | 0.5229 | 1 |
| YearRemodAdd | 0.0138 | 0.5929 | 1.0000 | 0.1285 | -0.0678 | 0.1811 | 0.2911 | 0.2404 | 0.1400 | -0.0624 | 0.2874 | 0.3716 | 0.2057 | 0.2263 | -0.1939 | -0.0387 | -0.0103 | 0.5071 | 1 |
| BsmtFinSF1 | 0.2141 | 0.2495 | 0.1285 | 1.0000 | -0.0501 | -0.4953 | 0.5224 | 0.4459 | -0.1371 | -0.0645 | 0.2082 | 0.2970 | 0.2043 | 0.1118 | -0.1023 | 0.0620 | 0.0036 | 0.3864 | 0 |
| WoodDeckSF | 0.1717 | 0.2249 | 0.2057 | 0.2043 | 0.0679 | -0.0053 | 0.2320 | 0.2355 | 0.0922 | -0.0254 | 0.2474 | 0.2247 | 1.0000 | 0.0587 | -0.1260 | -0.0742 | -0.0096 | 0.3244 | 0 |
| 2ndFlrSF | 0.0510 | 0.0103 | 0.1400 | -0.1371 | -0.0993 | 0.0045 | -0.1745 | -0.2026 | 1.0000 | 0.0634 | 0.6875 | 0.1383 | 0.0922 | 0.2080 | 0.0620 | 0.0406 | 0.0162 | 0.3193 | 0 |
| OpenPorchSF | 0.0848 | 0.1887 | 0.2263 | 0.1118 | 0.0031 | 0.1290 | 0.2473 | 0.2117 | 0.2080 | 0.0183 | 0.3302 | 0.2414 | 0.0587 | 1.0000 | -0.0931 | 0.0743 | -0.0186 | 0.3159 | 0 |
| LotArea | 1.0000 | 0.0142 | 0.0138 | 0.2141 | 0.1112 | -0.0026 | 0.2608 | 0.2995 | 0.0510 | 0.0048 | 0.2631 | 0.1804 | 0.1717 | 0.0848 | -0.0183 | 0.0432 | 0.0381 | 0.2638 | 0 |
| BsmtUnfSF | -0.0026 | 0.1490 | 0.1811 | -0.4953 | -0.2093 | 1.0000 | 0.4154 | 0.3180 | 0.0045 | 0.0282 | 0.2403 | 0.1833 | -0.0053 | 0.1290 | -0.0025 | -0.0126 | -0.0238 | 0.2145 | 0 |
| EnclosedPorch | -0.0183 | -0.3873 | -0.1939 | -0.1023 | 0.0365 | -0.0025 | -0.0955 | -0.0653 | 0.0620 | 0.0611 | 0.0091 | -0.1218 | -0.1260 | -0.0931 | 1.0000 | -0.0829 | 0.0184 | -0.1286 | 0 |
| ScreenPorch | 0.0432 | -0.0504 | -0.0387 | 0.0620 | 0.0889 | -0.0126 | 0.0845 | 0.0888 | 0.0406 | 0.0268 | 0.1015 | 0.0514 | -0.0742 | 0.0743 | -0.0829 | 1.0000 | 0.0319 | 0.1114 | 0 |
| LowQualFinSF | 0.0048 | -0.1838 | -0.0624 | -0.0645 | 0.0148 | 0.0282 | -0.0332 | -0.0142 | 0.0634 | 1.0000 | 0.1347 | -0.0676 | -0.0254 | 0.0183 | 0.0611 | 0.0268 | -0.0038 | -0.0256 | 0 |
| MiscVal | 0.0381 | -0.0344 | -0.0103 | 0.0036 | 0.0049 | -0.0238 | -0.0185 | -0.0211 | 0.0162 | -0.0038 | -0.0024 | -0.0274 | -0.0096 | -0.0186 | 0.0184 | 0.0319 | 1.0000 | -0.0212 | 0 |
| BsmtFinSF2 | 0.1112 | -0.0491 | -0.0678 | -0.0501 | 1.0000 | -0.2093 | 0.1048 | 0.0971 | -0.0993 | 0.0148 | -0.0096 | -0.0182 | 0.0679 | 0.0031 | 0.0365 | 0.0889 | 0.0049 | -0.0114 | 0 |
We would build a simple multi-linear regression model using some of the above numeric variables. These variables have no missing values, and they are strongly correlated with the dependent variable.
# let's just look at numeric variables only and that have no missing, also strongly correlate with the DV
IVs <- house_price_train2 %>%
dplyr::select(all_of(
summary_df2 %>%
dplyr::filter(percent_missing_value == 0 &
dtypes == "numeric" &
unique_value_ct >20) %>%
.$fields)) %>%
dplyr::select(-id) %>%
cor() %>%
as.data.frame() %>%
dplyr::mutate(flag = dplyr::case_when(abs(sale_price) > .5 ~1, TRUE ~0)) %>%
tibble::rownames_to_column() %>%
dplyr::filter(flag == 1) %>%
dplyr::select(fields = rowname) %>%
.$fields
DV <- "sale_price"
IVs <- setdiff(IVs, DV)
f <- as.formula(
paste(DV,
paste(IVs, collapse = " + "),
sep = "~")
)
print(f)## sale_price ~ year_built + year_remod_add + total_bsmt_sf + x1st_flr_sf +
## gr_liv_area + garage_area
Let’s split our train data (80/20), and then build a simple lm model with the help of stepAIC().
# split dataset 80/20
set.seed(1314)
index <- sample(1:nrow(trainDf2), 0.8 * nrow(trainDf2))
train <- house_price_train2[index, ]
test <- house_price_train2[-index, ]
# build model
house_price_model = eval(bquote(lm(f, data = train)))
house_price_model.aic <- stepAIC(house_price_model, direction = "both")## Start: AIC=25162
## sale_price ~ year_built + year_remod_add + total_bsmt_sf + x1st_flr_sf +
## gr_liv_area + garage_area
##
## Df Sum of Sq RSS AIC
## - x1st_flr_sf 1 1.45e+07 1.82e+12 25160
## <none> 1.82e+12 25162
## - year_remod_add 1 8.52e+10 1.91e+12 25214
## - year_built 1 1.11e+11 1.93e+12 25230
## - garage_area 1 1.14e+11 1.94e+12 25232
## - total_bsmt_sf 1 1.52e+11 1.97e+12 25255
## - gr_liv_area 1 1.07e+12 2.90e+12 25710
##
## Step: AIC=25160
## sale_price ~ year_built + year_remod_add + total_bsmt_sf + gr_liv_area +
## garage_area
##
## Df Sum of Sq RSS AIC
## <none> 1.82e+12 25160
## + x1st_flr_sf 1 1.45e+07 1.82e+12 25162
## - year_remod_add 1 8.53e+10 1.91e+12 25212
## - year_built 1 1.12e+11 1.93e+12 25229
## - garage_area 1 1.16e+11 1.94e+12 25231
## - total_bsmt_sf 1 3.26e+11 2.15e+12 25353
## - gr_liv_area 1 1.21e+12 3.03e+12 25764
summary(house_price_model.aic)##
## Call:
## lm(formula = sale_price ~ year_built + year_remod_add + total_bsmt_sf +
## gr_liv_area + garage_area, data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -382295 -19396 -2636 15936 263534
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.91e+06 1.23e+05 -15.52 < 2e-16 ***
## year_built 4.38e+02 5.15e+01 8.51 < 2e-16 ***
## year_remod_add 5.20e+02 6.99e+01 7.44 1.9e-13 ***
## total_bsmt_sf 4.83e+01 3.32e+00 14.54 < 2e-16 ***
## gr_liv_area 7.37e+01 2.63e+00 28.05 < 2e-16 ***
## garage_area 5.94e+01 6.86e+00 8.66 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 39300 on 1183 degrees of freedom
## (32411 observations deleted due to missingness)
## Multiple R-squared: 0.76, Adjusted R-squared: 0.759
## F-statistic: 748 on 5 and 1183 DF, p-value: <2e-16
The residual chart and QQ plot look acceptable. However, we can build a better model next time by removing some outliers from our data. The below “actual” versus “predicted” sale price chart indicated that the prediction is on par with actual sale price in majority of cases, i.e. no over- or under- estimation. However, there is one extreme case that is significantly over-predicted and that drives up the RMSE. Overall, this simple model is doing a decent job with adjusted r-squared above .75.
# plot
par(mfrow = c(2, 2))
plot(house_price_model.aic)# prediction
test["pred"] = predict(house_price_model.aic, newdata = test)
par(mfrow = c(1, 1))
plot(test$sale_price/1000, test$pred/1000,
main = "Actual vs. Predicted Sale Price",
xlab = "actual sale price in '000",
ylab = "predicted sale price in '000")
abline(coef = c(0, 1), col = "red", lty = "dashed") print(paste0("The root mean square error of the model is ", round(rmse(test$sale_price, test$pred), 1), "."))## [1] "The root mean square error of the model is 52381.2."
# submission to kaggle
house_price_test2 <- janitor::clean_names(house_price_test)
sale_price_pred <- predict(house_price_model.aic, newdata = house_price_test2)
submission <- data.frame(Id = house_price_test2$id,
SalePrice = sale_price_pred) %>%
dplyr::mutate(SalePrice = dplyr::case_when(is.na(SalePrice) ~ median(SalePrice, na.rm = TRUE),
TRUE ~SalePrice))
# write output
#write.table(submission, "clipboard-16384", sep = "\t", row.names = FALSE)
# report
print("My Kaggle username is 'JimmyNg' and my score is 0.23099")## [1] "My Kaggle username is 'JimmyNg' and my score is 0.23099"