library(HSAUR2)Loading required package: tools
data(USairpollution)SUBMISSION INSTRUCTIONS
Reconsider the US air pollution data set:
library(HSAUR2)Loading required package: tools
data(USairpollution)Perform singular value decomposition of this data matrix. Then create the matrix \(D\). Describe what this matrix looks like.
library(tidyverse)── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
decomp_usair <- svd(USairpollution)
U <- decomp_usair$u
D <- decomp_usair$d
V <- decomp_usair$v
matrix(D) %>% round(2) [,1]
[1,] 7051.95
[2,] 931.12
[3,] 540.46
[4,] 92.71
[5,] 85.24
[6,] 52.95
[7,] 10.14
D is one diagonal row of measures
Verify that \(X=UDV^T\) by plotting all the entries of \(X\) versus all the entries of \(UDV^T\) with the 0/1 line.
decomp_usair$d
[1] 7051.94936 931.12109 540.46297 92.70909 85.23724 52.94650 10.14090
$u
[,1] [,2] [,3] [,4] [,5]
[1,] -0.01841623 0.10181630 0.2091659528 -0.25110469 0.0066263386
[2,] -0.03130764 0.14812091 -0.0004043841 0.04012803 -0.1905982644
[3,] -0.08887282 0.08371317 0.1077274000 0.11286840 -0.0601742081
[4,] -0.15620922 0.13142656 -0.0218222317 -0.18626470 -0.1072911475
[5,] -0.08773289 0.05962107 0.1957870856 0.10730132 0.4416222684
[6,] -0.01291040 0.08879269 0.2527249241 -0.05135988 0.0907375365
[7,] -0.67085189 -0.44475680 -0.1960048444 0.01250518 -0.0390134558
[8,] -0.09294108 -0.01139006 0.1948170257 0.15676726 0.1215014235
[9,] -0.17545555 -0.23912303 0.3179618657 0.03038291 0.0760287831
[10,] -0.08418465 0.19081886 0.0514564479 -0.13636679 0.1395184073
[11,] -0.15070376 0.07185332 -0.0365663291 0.29187169 -0.1297223626
[12,] -0.09823394 0.01139147 0.0710114610 0.13335905 -0.0063383326
[13,] -0.03259278 0.09592527 0.1275475015 0.03872776 0.0185448034
[14,] -0.26181299 0.16971507 -0.1687630504 -0.17431370 0.2119801250
[15,] -0.05684499 -0.15456872 0.3549913343 0.02027424 -0.1198671989
[16,] -0.20000968 0.25838181 -0.1905776185 0.09894074 -0.0072008199
[17,] -0.11486242 0.23903185 -0.0455339920 -0.18706856 0.0690216078
[18,] -0.07092358 0.28277441 -0.0171282263 0.01147666 -0.0659473582
[19,] -0.09086303 0.07091844 0.0715187875 0.15766454 -0.0008081356
[20,] -0.02408128 0.06783672 0.1715520408 0.22291958 -0.1173605693
[21,] -0.09209971 0.20158541 0.0215336995 -0.13557521 0.0128712818
[22,] -0.09944086 0.18088071 -0.0036728723 0.13033635 -0.0176857164
[23,] -0.05688465 0.11763506 0.1564494233 0.29019450 -0.0612479821
[24,] -0.13085554 0.06086289 0.0607319341 0.07320631 0.2351689149
[25,] -0.14590904 -0.01769799 0.1286643576 0.02563224 0.2634155187
[26,] -0.07505363 0.12556012 0.0946877786 0.09046699 0.0140792596
[27,] -0.05918143 0.12816135 0.1113978859 0.24753918 -0.0707433010
[28,] -0.04358714 0.17586706 0.0953421697 -0.09196991 -0.1024406162
[29,] -0.05525638 0.12440693 0.0679669970 0.05115091 0.0101337198
[30,] -0.36651401 -0.04696092 -0.1377379001 -0.12909685 -0.0705355671
[31,] -0.08268986 0.22052639 -0.1521429575 0.06085074 -0.3666696102
[32,] -0.08978257 0.12133335 0.1337532775 -0.37611452 0.0015668094
[33,] -0.05285937 -0.08983487 0.3220217081 -0.38751879 -0.3768644467
[34,] -0.05189261 0.09231212 0.1429985876 0.03972485 -0.0553634280
[35,] -0.03281473 0.04861043 0.1359693697 -0.02341628 -0.1041491928
[36,] -0.11947066 0.12923102 -0.0750809221 0.10050571 -0.1154311495
[37,] -0.09397514 0.11042153 0.1561483823 -0.08131274 0.2871386646
[38,] -0.13973275 -0.15746062 0.2134064701 0.07507032 -0.2142526073
[39,] -0.12269566 0.18745469 -0.0291716760 -0.09621129 -0.0278059315
[40,] -0.04244099 0.11997575 0.0613455864 0.14882510 -0.0910619703
[41,] -0.01768084 0.04715182 0.2199230323 -0.02735761 -0.1158445632
[,6] [,7]
[1,] -0.06864647 -0.037320413
[2,] -0.40490988 0.031128158
[3,] 0.13096949 -0.023647236
[4,] 0.10220939 0.073850709
[5,] -0.10577782 0.149138716
[6,] -0.03524838 -0.370395995
[7,] 0.04802022 -0.076082274
[8,] -0.02190754 -0.279945502
[9,] -0.11570639 -0.012044535
[10,] -0.01381735 -0.097205512
[11,] -0.02991912 0.156399781
[12,] -0.34818120 -0.029354640
[13,] -0.06591367 0.283310186
[14,] -0.05522494 0.000983729
[15,] 0.08213658 0.025861862
[16,] 0.14337083 0.088230468
[17,] 0.05314046 0.067170725
[18,] 0.20519591 -0.075000728
[19,] 0.01756070 0.136178129
[20,] 0.15119963 -0.027420665
[21,] 0.09139110 -0.085182339
[22,] 0.17668012 0.033167279
[23,] 0.18379454 -0.174884145
[24,] -0.10285187 0.244263428
[25,] -0.17211678 0.065181227
[26,] 0.10030727 -0.143820999
[27,] 0.22301596 -0.103966467
[28,] 0.08750682 0.150775059
[29,] -0.07569157 0.238183518
[30,] 0.10302701 -0.019828015
[31,] -0.43834073 -0.319676055
[32,] -0.04239892 -0.055859213
[33,] 0.09527252 0.211553251
[34,] 0.05736043 -0.149645189
[35,] -0.31307337 -0.016006169
[36,] -0.18681244 0.022952890
[37,] -0.06017830 -0.147571151
[38,] -0.02823273 0.023366115
[39,] 0.03641520 0.008731176
[40,] -0.08035990 0.429052213
[41,] 0.03529523 0.032059290
$v
[,1] [,2] [,3] [,4] [,5]
[1,] -0.027692428 -0.001269295 0.21181956 -0.81547957 -0.537176769
[2,] -0.034941176 0.203689421 0.30433829 0.50447279 -0.665958138
[3,] -0.650399386 -0.710043204 0.25526802 0.08728162 0.003075502
[4,] -0.754280696 0.565004847 -0.32765610 -0.06597223 0.008832095
[5,] -0.006221096 0.029842191 0.05488853 0.04457335 -0.038773823
[6,] -0.023206784 0.128489389 0.23957587 0.23548645 -0.215107991
[7,] -0.074001735 0.343099257 0.79346144 -0.10530532 0.469125309
[,6] [,7]
[1,] 0.027425245 0.0067850608
[2,] -0.391813271 -0.1147051300
[3,] -0.005336507 -0.0012272750
[4,] 0.006412732 0.0004103824
[5,] -0.107951581 0.9904111542
[6,] 0.905504478 0.0623830988
[7,] -0.118611600 -0.0445995083
(U %*% diag(D) %*% t(V)) %>% round(2) [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 46 47.6 44 116 8.8 33.36 135
[2,] 11 56.8 46 244 8.9 7.77 58
[3,] 24 61.5 368 497 9.1 48.34 115
[4,] 47 55.0 625 905 9.6 41.31 111
[5,] 11 47.1 391 463 12.4 36.11 166
[6,] 31 55.2 35 71 6.5 40.75 148
[7,] 110 50.6 3344 3369 10.4 34.44 122
[8,] 23 54.0 462 453 7.1 39.04 132
[9,] 65 49.7 1007 751 10.9 34.99 155
[10,] 26 51.5 266 540 8.6 37.01 134
[11,] 9 66.2 641 844 10.9 35.94 78
[12,] 17 51.9 454 515 9.0 12.95 86
[13,] 17 49.0 104 201 11.2 30.85 103
[14,] 35 49.9 1064 1513 10.1 30.96 129
[15,] 56 49.1 412 158 9.0 43.37 127
[16,] 10 68.9 721 1233 10.8 48.19 103
[17,] 28 52.3 361 746 9.7 38.74 121
[18,] 14 68.4 136 529 8.8 54.47 116
[19,] 14 54.5 381 507 10.0 37.00 99
[20,] 13 61.0 91 132 8.2 48.52 100
[21,] 30 55.6 291 593 8.3 43.11 123
[22,] 10 61.6 337 624 9.2 49.10 105
[23,] 10 75.5 207 335 9.0 59.80 128
[24,] 16 45.7 569 717 11.8 29.07 123
[25,] 29 43.5 699 744 10.6 25.94 137
[26,] 18 59.4 275 448 7.9 46.00 119
[27,] 9 68.3 204 361 8.4 56.77 113
[28,] 31 59.3 96 308 10.6 44.68 116
[29,] 14 51.5 181 347 10.9 30.18 98
[30,] 69 54.6 1692 1950 9.6 39.93 115
[31,] 10 70.3 213 582 6.0 7.05 36
[32,] 61 50.4 347 520 9.4 36.22 147
[33,] 94 50.0 343 179 10.6 42.75 125
[34,] 26 57.8 197 299 7.6 42.59 115
[35,] 28 51.0 137 176 8.7 15.17 89
[36,] 12 56.7 453 716 8.7 20.66 67
[37,] 29 51.1 379 531 9.4 38.79 164
[38,] 56 55.9 775 622 9.5 35.89 105
[39,] 29 57.3 434 757 9.3 38.89 111
[40,] 8 56.6 125 277 12.7 30.58 82
[41,] 36 54.0 80 80 9.0 40.25 114
Consider low-dimensional approximations of the data matrix. What is the fewest number of dimensions required to yield a correlation between the entries of \(X\) and \(\tilde X\) of at least 0.9?
decomp_usair$d
[1] 7051.94936 931.12109 540.46297 92.70909 85.23724 52.94650 10.14090
$u
[,1] [,2] [,3] [,4] [,5]
[1,] -0.01841623 0.10181630 0.2091659528 -0.25110469 0.0066263386
[2,] -0.03130764 0.14812091 -0.0004043841 0.04012803 -0.1905982644
[3,] -0.08887282 0.08371317 0.1077274000 0.11286840 -0.0601742081
[4,] -0.15620922 0.13142656 -0.0218222317 -0.18626470 -0.1072911475
[5,] -0.08773289 0.05962107 0.1957870856 0.10730132 0.4416222684
[6,] -0.01291040 0.08879269 0.2527249241 -0.05135988 0.0907375365
[7,] -0.67085189 -0.44475680 -0.1960048444 0.01250518 -0.0390134558
[8,] -0.09294108 -0.01139006 0.1948170257 0.15676726 0.1215014235
[9,] -0.17545555 -0.23912303 0.3179618657 0.03038291 0.0760287831
[10,] -0.08418465 0.19081886 0.0514564479 -0.13636679 0.1395184073
[11,] -0.15070376 0.07185332 -0.0365663291 0.29187169 -0.1297223626
[12,] -0.09823394 0.01139147 0.0710114610 0.13335905 -0.0063383326
[13,] -0.03259278 0.09592527 0.1275475015 0.03872776 0.0185448034
[14,] -0.26181299 0.16971507 -0.1687630504 -0.17431370 0.2119801250
[15,] -0.05684499 -0.15456872 0.3549913343 0.02027424 -0.1198671989
[16,] -0.20000968 0.25838181 -0.1905776185 0.09894074 -0.0072008199
[17,] -0.11486242 0.23903185 -0.0455339920 -0.18706856 0.0690216078
[18,] -0.07092358 0.28277441 -0.0171282263 0.01147666 -0.0659473582
[19,] -0.09086303 0.07091844 0.0715187875 0.15766454 -0.0008081356
[20,] -0.02408128 0.06783672 0.1715520408 0.22291958 -0.1173605693
[21,] -0.09209971 0.20158541 0.0215336995 -0.13557521 0.0128712818
[22,] -0.09944086 0.18088071 -0.0036728723 0.13033635 -0.0176857164
[23,] -0.05688465 0.11763506 0.1564494233 0.29019450 -0.0612479821
[24,] -0.13085554 0.06086289 0.0607319341 0.07320631 0.2351689149
[25,] -0.14590904 -0.01769799 0.1286643576 0.02563224 0.2634155187
[26,] -0.07505363 0.12556012 0.0946877786 0.09046699 0.0140792596
[27,] -0.05918143 0.12816135 0.1113978859 0.24753918 -0.0707433010
[28,] -0.04358714 0.17586706 0.0953421697 -0.09196991 -0.1024406162
[29,] -0.05525638 0.12440693 0.0679669970 0.05115091 0.0101337198
[30,] -0.36651401 -0.04696092 -0.1377379001 -0.12909685 -0.0705355671
[31,] -0.08268986 0.22052639 -0.1521429575 0.06085074 -0.3666696102
[32,] -0.08978257 0.12133335 0.1337532775 -0.37611452 0.0015668094
[33,] -0.05285937 -0.08983487 0.3220217081 -0.38751879 -0.3768644467
[34,] -0.05189261 0.09231212 0.1429985876 0.03972485 -0.0553634280
[35,] -0.03281473 0.04861043 0.1359693697 -0.02341628 -0.1041491928
[36,] -0.11947066 0.12923102 -0.0750809221 0.10050571 -0.1154311495
[37,] -0.09397514 0.11042153 0.1561483823 -0.08131274 0.2871386646
[38,] -0.13973275 -0.15746062 0.2134064701 0.07507032 -0.2142526073
[39,] -0.12269566 0.18745469 -0.0291716760 -0.09621129 -0.0278059315
[40,] -0.04244099 0.11997575 0.0613455864 0.14882510 -0.0910619703
[41,] -0.01768084 0.04715182 0.2199230323 -0.02735761 -0.1158445632
[,6] [,7]
[1,] -0.06864647 -0.037320413
[2,] -0.40490988 0.031128158
[3,] 0.13096949 -0.023647236
[4,] 0.10220939 0.073850709
[5,] -0.10577782 0.149138716
[6,] -0.03524838 -0.370395995
[7,] 0.04802022 -0.076082274
[8,] -0.02190754 -0.279945502
[9,] -0.11570639 -0.012044535
[10,] -0.01381735 -0.097205512
[11,] -0.02991912 0.156399781
[12,] -0.34818120 -0.029354640
[13,] -0.06591367 0.283310186
[14,] -0.05522494 0.000983729
[15,] 0.08213658 0.025861862
[16,] 0.14337083 0.088230468
[17,] 0.05314046 0.067170725
[18,] 0.20519591 -0.075000728
[19,] 0.01756070 0.136178129
[20,] 0.15119963 -0.027420665
[21,] 0.09139110 -0.085182339
[22,] 0.17668012 0.033167279
[23,] 0.18379454 -0.174884145
[24,] -0.10285187 0.244263428
[25,] -0.17211678 0.065181227
[26,] 0.10030727 -0.143820999
[27,] 0.22301596 -0.103966467
[28,] 0.08750682 0.150775059
[29,] -0.07569157 0.238183518
[30,] 0.10302701 -0.019828015
[31,] -0.43834073 -0.319676055
[32,] -0.04239892 -0.055859213
[33,] 0.09527252 0.211553251
[34,] 0.05736043 -0.149645189
[35,] -0.31307337 -0.016006169
[36,] -0.18681244 0.022952890
[37,] -0.06017830 -0.147571151
[38,] -0.02823273 0.023366115
[39,] 0.03641520 0.008731176
[40,] -0.08035990 0.429052213
[41,] 0.03529523 0.032059290
$v
[,1] [,2] [,3] [,4] [,5]
[1,] -0.027692428 -0.001269295 0.21181956 -0.81547957 -0.537176769
[2,] -0.034941176 0.203689421 0.30433829 0.50447279 -0.665958138
[3,] -0.650399386 -0.710043204 0.25526802 0.08728162 0.003075502
[4,] -0.754280696 0.565004847 -0.32765610 -0.06597223 0.008832095
[5,] -0.006221096 0.029842191 0.05488853 0.04457335 -0.038773823
[6,] -0.023206784 0.128489389 0.23957587 0.23548645 -0.215107991
[7,] -0.074001735 0.343099257 0.79346144 -0.10530532 0.469125309
[,6] [,7]
[1,] 0.027425245 0.0067850608
[2,] -0.391813271 -0.1147051300
[3,] -0.005336507 -0.0012272750
[4,] 0.006412732 0.0004103824
[5,] -0.107951581 0.9904111542
[6,] 0.905504478 0.0623830988
[7,] -0.118611600 -0.0445995083
k <- 5
(Xtilde2 <- U[,1:k] %*% diag(D[1:k]) %*% t(V[,1:k])) %>% round(2) [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 46.10 46.13 43.98 116.02 8.78 36.67 134.55
[2,] 11.59 48.44 45.89 244.14 6.27 27.16 55.47
[3,] 23.81 64.19 368.04 496.96 10.09 42.08 115.81
[4,] 46.85 57.21 625.03 904.96 9.44 36.36 111.68
[5,] 11.14 45.08 390.97 463.04 10.30 41.09 165.40
[6,] 31.08 54.04 34.99 71.01 10.02 42.67 147.61
[7,] 109.94 51.51 3344.01 3368.98 11.44 32.19 122.27
[8,] 23.05 53.22 461.99 453.01 9.79 40.27 131.74
[9,] 65.17 47.29 1006.97 751.04 10.36 40.54 154.27
[10,] 26.03 51.10 265.99 540.01 9.50 37.73 133.87
[11,] 9.03 65.76 640.99 844.01 9.16 37.28 77.88
[12,] 17.51 44.64 453.90 515.12 7.30 29.66 83.80
[13,] 17.08 47.96 103.98 201.02 7.98 33.83 102.71
[14,] 35.08 48.76 1063.98 1513.02 9.77 33.61 128.65
[15,] 55.88 50.83 412.02 157.97 9.21 39.42 127.53
[16,] 9.79 71.98 721.04 1232.95 10.73 41.26 103.94
[17,] 27.92 53.48 361.02 745.98 9.33 36.15 121.36
[18,] 13.71 72.57 136.06 528.93 10.73 44.68 117.25
[19,] 13.97 55.02 381.01 506.99 8.73 36.07 99.17
[20,] 12.78 64.10 91.04 131.95 9.34 41.29 100.94
[21,] 29.87 57.40 291.02 592.97 9.68 38.78 123.54
[22,] 9.74 65.30 337.05 623.94 9.88 40.61 106.12
[23,] 9.75 79.11 207.05 334.94 11.81 51.10 129.08
[24,] 16.13 43.85 568.97 717.03 8.76 33.85 122.46
[25,] 29.25 40.01 698.95 744.06 8.96 34.15 135.95
[26,] 17.86 61.31 275.03 447.97 9.92 41.28 119.56
[27,] 8.68 72.81 204.06 360.92 10.72 46.14 114.35
[28,] 30.86 61.29 96.03 307.97 9.59 40.39 116.62
[29,] 14.09 50.21 180.98 347.02 8.08 33.66 97.63
[30,] 68.85 56.71 1692.03 1949.97 10.39 35.00 115.64
[31,] 10.66 60.83 212.87 582.15 6.71 28.27 33.10
[32,] 61.07 49.46 346.99 520.01 9.72 38.29 146.71
[33,] 93.85 52.22 343.03 178.97 9.02 38.05 125.69
[34,] 25.93 58.82 197.01 298.98 9.43 39.93 115.29
[35,] 28.46 44.49 136.91 176.11 7.07 30.19 87.03
[36,] 12.27 52.85 452.95 716.06 7.40 29.60 65.84
[37,] 29.10 49.68 378.98 531.02 10.54 41.77 163.56
[38,] 56.04 55.34 774.99 622.01 9.10 37.23 104.83
[39,] 28.95 58.07 434.01 756.99 9.42 37.14 111.23
[40,] 8.09 55.43 124.98 277.03 7.93 34.16 81.69
[41,] 35.95 54.77 80.01 79.99 8.88 38.54 114.24
4 dimensions
Find \(\Sigma\), the covariance matrix of this data set. Then perform eigen-decomposition of this matrix. Verify that
eigen_usair <- eigen(cov(USairpollution))
val <- eigen_usair$values
vec <- eigen_usair$vectors
cov_recon <- vec %*% diag(val) %*% t(vec)
cov_orig <- cov(USairpollution)
all.equal(cov_recon, cov_orig)[1] "Attributes: < Length mismatch: comparison on first 1 components >"
In this problem we explore how “high” a low-dimensional SVD approximation of an image has to be before you can recognize it.
.Rdata objects are essentially R workspace memory snapshots that, when loaded, load any type of R object that you want into your global environment. The command below, when executed, will load three objects into your memory: mysteryU4, mysteryD4, and mysteryV4. These are the first \(k\) vectors and singular values of an SVD I performed on a 700-pixels-tall \(\times\) 600-pixels-wide image of a well-known villain.
load('Data/mystery_person_k4.Rdata')Write a function that takes SVD ingredients u, d and v and renders the \(700 \times 600\) image produced by this approximation using functions from the magick package. Use your function to determine whether a 4-dimensional approximation to this image is enough for you to tell who the mystery villain is. Recall that you will likely need to rescale your recomposed approximation so that all pixels are in [0,1].
Image_4 <- mysteryU4 %*% diag(mysteryD4) %*% t(mysteryV4)
image(t(Image_4), col=gray.colors(256), axes=FALSE)I’m giving you slightly higher-dimensional approximations (\(k=10\) and \(k=50\), respectively) in the objects below:
load('Data/mystery_person_k10.Rdata')
load('Data/mystery_person_k50.Rdata')Create both of the images produced by these approximations. At what point can you tell who the mystery villain is?
Image_50 <- mysteryU50 %*% diag(mysteryD50) %*% t(mysteryV50)
image(t(Image_50), col=gray.colors(256), axes=FALSE)Image_10 <- mysteryU10 %*% diag(mysteryD10) %*% t(mysteryV10)
image(t(Image_10), col=gray.colors(256), axes=FALSE) suspicion at 10 k Knew the villian at 50 k
How many numbers need to be stored in memory for each of the following:
total = k(700+600+1) = 1301k