##ANALYSES
Running descriptive statistics for all quantitative variables and assigning them to the descriptives data frame.
library(psych)
projectdata<-read.csv("Project2Data.csv")
descriptives<-describe(projectdata[,5:30], na.rm = TRUE)
options(digits=6)
options(scipen = 999)
Determining the skewed distributions: the variables in SkewedVar have a skewness below -1 or above 1, so they do not have a normal distribution.
SkewedVar <- subset(descriptives[descriptives$skew < -1.0 | descriptives$skew > 1.0,])
print(SkewedVar)
## vars n mean sd median trimmed mad min
## UGDS 5 6990 2332.16 5438.85 406.00 1052.09 526.32 0
## UGDS_BLACK 7 6990 0.19 0.22 0.10 0.14 0.12 0
## UGDS_HISP 8 6990 0.16 0.22 0.07 0.11 0.08 0
## UGDS_ASIAN 9 6990 0.03 0.08 0.01 0.02 0.02 0
## UGDS_AIAN 10 6990 0.01 0.07 0.00 0.00 0.00 0
## UGDS_NHPI 11 6990 0.00 0.03 0.00 0.00 0.00 0
## UGDS_2MOR 12 6990 0.02 0.03 0.02 0.02 0.03 0
## UGDS_NRA 13 6990 0.02 0.05 0.00 0.01 0.00 0
## UGDS_UNKN 14 6990 0.05 0.09 0.01 0.02 0.02 0
## PPTUG_EF 15 6969 0.23 0.25 0.15 0.19 0.22 0
## TUITFTE 19 7270 10401.08 17375.87 9015.00 9227.39 6676.15 0
## INEXPFTE 20 7270 7360.21 12726.34 5490.00 5875.23 3399.60 0
## RET_FT4 25 2293 0.71 0.20 0.74 0.73 0.15 0
## max range skew kurtosis se
## UGDS 151558.00 151558.00 6.84 104.77 65.05
## UGDS_BLACK 1.00 1.00 1.73 2.49 0.00
## UGDS_HISP 1.00 1.00 2.24 4.83 0.00
## UGDS_ASIAN 0.97 0.97 6.38 55.98 0.00
## UGDS_AIAN 1.00 1.00 11.16 136.65 0.00
## UGDS_NHPI 1.00 1.00 22.92 608.03 0.00
## UGDS_2MOR 0.53 0.53 4.11 36.08 0.00
## UGDS_NRA 0.93 0.93 8.29 101.58 0.00
## UGDS_UNKN 0.90 0.90 4.37 23.94 0.00
## PPTUG_EF 1.00 1.00 1.02 0.22 0.00
## TUITFTE 1292154.00 1292154.00 55.94 4076.80 203.79
## INEXPFTE 735077.00 735077.00 31.03 1570.11 149.26
## RET_FT4 1.00 1.00 -1.26 2.31 0.00
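To make the skew column easier to read, here is a minimal sketch of a moment-based skewness calculation on made-up values. The function name `sample_skewness` and the toy vectors are assumptions for illustration; psych may apply a slightly different small-sample correction, so its numbers need not match this formula exactly.

```r
# Moment-based sample skewness: third central moment divided by the
# second central moment raised to the power 3/2.
# Assumption: this simple form is only an illustration of what "skew" measures.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]    # drop missing values, like na.rm = TRUE
  d <- x - mean(x)     # deviations from the mean
  m2 <- mean(d^2)      # second central moment
  m3 <- mean(d^3)      # third central moment
  m3 / m2^1.5
}

symmetric_x <- c(1, 2, 3, 4, 5)    # made-up symmetric values
right_tail_x <- c(1, 1, 1, 10)     # made-up values with a long right tail

skew_symmetric <- sample_skewness(symmetric_x)
skew_right <- sample_skewness(right_tail_x)

# Apply the same |skew| > 1 rule that was used to build SkewedVar
flagged <- abs(c(symmetric = skew_symmetric, right_tail = skew_right)) > 1
print(flagged)
```

Only the long-tailed vector is flagged, which is the same logic that keeps variables such as UGDS and TUITFTE in SkewedVar.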
Determining which of the skewed distributions also have a kurtosis > 3:
SkewedKurtosisVar <- subset(SkewedVar[ SkewedVar$kurtosis > 3.0,])
print(SkewedKurtosisVar)
## vars n mean sd median trimmed mad min
## UGDS 5 6990 2332.16 5438.85 406.00 1052.09 526.32 0
## UGDS_HISP 8 6990 0.16 0.22 0.07 0.11 0.08 0
## UGDS_ASIAN 9 6990 0.03 0.08 0.01 0.02 0.02 0
## UGDS_AIAN 10 6990 0.01 0.07 0.00 0.00 0.00 0
## UGDS_NHPI 11 6990 0.00 0.03 0.00 0.00 0.00 0
## UGDS_2MOR 12 6990 0.02 0.03 0.02 0.02 0.03 0
## UGDS_NRA 13 6990 0.02 0.05 0.00 0.01 0.00 0
## UGDS_UNKN 14 6990 0.05 0.09 0.01 0.02 0.02 0
## TUITFTE 19 7270 10401.08 17375.87 9015.00 9227.39 6676.15 0
## INEXPFTE 20 7270 7360.21 12726.34 5490.00 5875.23 3399.60 0
## max range skew kurtosis se
## UGDS 151558.00 151558.00 6.84 104.77 65.05
## UGDS_HISP 1.00 1.00 2.24 4.83 0.00
## UGDS_ASIAN 0.97 0.97 6.38 55.98 0.00
## UGDS_AIAN 1.00 1.00 11.16 136.65 0.00
## UGDS_NHPI 1.00 1.00 22.92 608.03 0.00
## UGDS_2MOR 0.53 0.53 4.11 36.08 0.00
## UGDS_NRA 0.93 0.93 8.29 101.58 0.00
## UGDS_UNKN 0.90 0.90 4.37 23.94 0.00
## TUITFTE 1292154.00 1292154.00 55.94 4076.80 203.79
## INEXPFTE 735077.00 735077.00 31.03 1570.11 149.26
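The kurtosis cutoff depends on which definition is reported. As far as I know, psych reports kurtosis with 3 already subtracted (excess kurtosis), so a normal distribution would score near 0 rather than 3; this is worth confirming in the package documentation. The sketch below shows the excess form on made-up values. The function name `excess_kurtosis` and the vectors are assumptions for illustration.

```r
# Moment-based excess kurtosis: fourth central moment over the squared
# second central moment, minus 3 so that a normal distribution scores about 0.
# Assumption: illustrative formula, not necessarily the exact psych estimator.
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  m2 <- mean(d^2)
  m4 <- mean(d^4)
  m4 / m2^2 - 3
}

two_point_x <- c(-1, 1, -1, 1)            # made-up flat, two-point data
heavy_tail_x <- c(rep(0, 20), -10, 10)    # made-up data with two extreme values

kurt_flat <- excess_kurtosis(two_point_x)
kurt_heavy <- excess_kurtosis(heavy_tail_x)

# Same kurtosis > 3 rule that was used to build SkewedKurtosisVar
print(c(flat = kurt_flat, heavy = kurt_heavy) > 3)
```

With this definition only the heavy-tailed vector passes the cutoff, so a threshold of 3 on excess kurtosis is a fairly strict rule.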
###Question 1
The variables UGDS, UGDS_HISP, UGDS_ASIAN, UGDS_AIAN, UGDS_NHPI, UGDS_2MOR, UGDS_NRA, UGDS_UNKN, TUITFTE and INEXPFTE do not have a normal distribution according to both their skewness and their kurtosis. The variables UGDS_BLACK, PPTUG_EF and RET_FT4 do not have a normal distribution according to their skewness only (their kurtosis is below 3).
Calculating variances:
variances_data<-data.frame(descriptives[,4]*descriptives[,4])
print(variances_data)
## descriptives...4....descriptives...4.
## 1 0.699779293
## 2 130.935905752
## 3 0.043651078
## 4 17784.083824963
## 5 29581070.838806145
## 6 0.082554181
## 7 0.050492190
## 8 0.049550061
## 9 0.005654128
## 10 0.004851592
## 11 0.001080576
## 12 0.000972495
## 13 0.002492304
## 14 0.008708988
## 15 0.060708464
## 16 21805832.123101581
## 17 52883812.827343225
## 18 162884718.941394806
## 19 301920777.617115259
## 20 161959774.003822237
## 21 0.093798135
## 22 0.051049405
## 23 0.045700584
## 24 0.066115544
## 25 0.038276927
## 26 0.080706267
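The printed variances are the squared standard deviations, listed in the same row order as the descriptives table but without variable names. A minimal sketch of a labelled version on a made-up data frame follows; the data frame `toy` and its columns are assumptions for illustration, not the project data.

```r
# Made-up data frame with missing values, standing in for the project data
toy <- data.frame(
  enrollment = c(120, 450, 80, NA, 300),
  pct_loan = c(0.40, 0.55, NA, 0.20, 0.65)
)

# Variance as the squared standard deviation, as done in the analysis
sd_values <- sapply(toy, sd, na.rm = TRUE)
var_from_sd <- sd_values^2

# Variance computed directly; both routes give the same numbers
var_direct <- sapply(toy, var, na.rm = TRUE)

# Keep the variable names as row names so each variance is identifiable
variances_named <- data.frame(
  variance = unname(var_direct),
  row.names = names(var_direct)
)
print(variances_named)
```

Keeping the names avoids having to count rows to find out which variance belongs to which variable.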
The variables TUITFTE, INEXPFTE and UGDS_NHPI present outliers. The trimmed versions of these variables are stored in new data frames. These data frames were created only to compare the distributions with and without outliers and will not be included in the analysis, since I am not sure that removing these values is justified.
#trimming TUITFTE outliers
boxplot(projectdata$TUITFTE)
boxplot(projectdata$TUITFTE)$out
## [1] 33714 71514 52798 30191 29651 29456 35778 27468
## [9] 30407 30090 41132 30042 28151 33136 28202 31045
## [17] 29570 27323 30355 47001 28399 27427 29662 31738
## [25] 33097 37339 30675 34134 45599 28609 28347 35692
## [33] 32490 27462 30787 29384 30440 28464 34183 29248
## [41] 29286 34848 37194 38242 35867 28496 31502 27614
## [49] 43479 27753 32781 36925 28158 32693 28395 29645
## [57] 42907 31326 33907 50828 33733 37489 41051 28543
## [65] 28496 36302 47530 33960 30778 29527 51665 28408
## [73] 31367 62516 30239 30294 28155 28992 30965 37015
## [81] 32990 29609 29681 28118 32881 29707 31418 39920
## [89] 29358 27708 28282 43240 29892 58704 29924 29760
## [97] 30401 33371 34733 64887 30824 30208 27721 41862
## [105] 41508 32018 29171 29583 29420 30389 30996 27513
## [113] 31892 29439 66287 30430 29409 32051 29026 28527
## [121] 29422 31788 36119 27496 109761 29020 29130 57333
## [129] 27378 87002 34560 34269 32135 32258 27431 33544
## [137] 28816 30816 36379 34663 28482 28727 42962 148376
## [145] 30649 29574 82287 27917 30363 38322 31715 27917
## [153] 35953 41623 27940 28924 31130 97668 29469 47231
## [161] 33765 62836 30888 35407 30110 39443 29112 31573
## [169] 44185 29353 27630 46351 30696 28036 52353 29890
## [177] 30494 32575 37667 27473 39198 40481 33369 51396
## [185] 35088 35928 61752 31193 34675 45692 44996 36857
## [193] 71583 35813 31949 35312 47796 37098 37721 29853
## [201] 44096 30001 39362 27984 30964 31038 34253 32822
## [209] 57588 48680 43041 41191 30523 39119 37084 248996
## [217] 29727 29006 56456 61433 32907 43931 37575 39634
## [225] 37738 43458 43610 36985 55197 29091 36374 32206
## [233] 27734 1292154 33953 42779 70354 44918 51452 163977
## [241] 62687 28735
outliers_TUITFTE <- boxplot(projectdata$TUITFTE, plot=FALSE)$out
projectdata_TUITFE_Trimmed <- projectdata[!(projectdata$TUITFTE %in% outliers_TUITFTE),]
boxplot(projectdata_TUITFE_Trimmed$TUITFTE)
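The values returned in `$out` are the points beyond the boxplot whiskers. A minimal sketch of that rule on made-up numbers, assuming the default whisker coefficient of 1.5 and the hinge-based spread used by `boxplot.stats`:

```r
# Made-up values with one extreme point
x <- c(1, 2, 3, 4, 5, 100)

# Tukey five-number summary: min, lower hinge, median, upper hinge, max
five <- fivenum(x)
hinge_spread <- five[4] - five[2]

# Fences at 1.5 times the hinge spread beyond each hinge
lower_fence <- five[2] - 1.5 * hinge_spread
upper_fence <- five[4] + 1.5 * hinge_spread

# Points outside the fences, computed by hand and by boxplot.stats
manual_out <- x[x < lower_fence | x > upper_fence]
stats_out <- boxplot.stats(x)$out

print(c(lower = lower_fence, upper = upper_fence))
print(stats_out)
```

Because the fences depend only on the hinges, a strongly right-skewed variable will have many points flagged on the upper side even when they are legitimate values.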
#trimming INEXPFTE outliers
boxplot(projectdata$INEXPFTE)
boxplot(projectdata$INEXPFTE)$out
## [1] 17920 16892 21957 55110 38544 62580 17070 17174 19131 15586
## [11] 23717 92590 18988 22820 18713 40020 24195 116672 21618 18391
## [21] 18911 27210 25915 28406 15743 30078 16048 25295 21675 17405
## [31] 18464 19228 18824 25056 37082 15669 24420 22285 28832 17809
## [41] 41743 43170 26708 21002 19122 22554 33453 23628 25359 107982
## [51] 19979 17395 16368 35595 20938 31102 22663 15547 15962 28897
## [61] 18075 18531 40450 138585 19856 38813 22541 19027 15606 26316
## [71] 83779 18050 17456 26053 17132 16167 20947 39052 29847 17333
## [81] 37339 25848 28776 17894 15675 18823 27622 18608 21481 17148
## [91] 21898 26632 20942 26091 18087 23944 17782 15813 69655 17259
## [101] 16874 60533 27497 17605 22995 21627 21684 27158 19918 77339
## [111] 33355 19343 25196 20631 20571 30750 26542 27699 32388 20890
## [121] 49500 46634 40311 18884 35861 62770 55789 22711 28846 26936
## [131] 15688 17905 15593 25201 18161 26786 21543 34705 39712 23528
## [141] 46336 22728 17259 26922 19210 37432 16837 21056 18549 17879
## [151] 20532 18186 35616 28657 17005 22992 16155 18230 18233 16440
## [161] 105933 49995 26182 25102 16809 16320 17080 19530 20653 52224
## [171] 18006 20507 39856 42245 23461 21263 16368 17511 19621 31328
## [181] 20625 22975 80944 21635 23818 193088 41614 33424 45328 33646
## [191] 25879 30436 16162 17646 17346 28982 18431 25582 26430 28869
## [201] 30033 30237 19412 18834 18698 19069 17731 28983 15966 20449
## [211] 19668 16834 15806 20564 55586 72057 60181 18301 19835 26884
## [221] 15985 29919 31658 33568 19228 50756 25710 22850 26803 30754
## [231] 15698 15817 17961 64962 30078 23413 17967 20711 54483 22319
## [241] 25737 28894 16075 22200 17261 23763 115982 47557 15982 15866
## [251] 21005 64019 18558 22586 105951 27370 20198 31807 52148 18027
## [261] 80997 19474 16497 28622 21109 19976 20469 29256 64202 17872
## [271] 49018 16511 16522 21426 16267 18097 57712 22330 17221 21294
## [281] 28135 32124 18435 41171 15732 18281 33278 23184 16533 18080
## [291] 20889 39537 32856 23526 22579 79372 17794 72196 24974 40056
## [301] 16152 16035 357489 116877 85666 16749 15779 20073 20116 118456
## [311] 46499 20658 21424 24640 111574 20798 16336 16893 59753 15806
## [321] 17109 28914 21467 16287 36226 21942 33533 16988 31496 16268
## [331] 27673 20016 16430 18755 24561 19220 16815 27076 27048 26244
## [341] 22859 25033 15635 55613 24333 40582 93146 16568 16043 34563
## [351] 16715 17344 38032 22543 45489 18983 18879 20186 18446 20148
## [361] 16936 64474 19807 19263 30299 17613 28733 22571 61857 16260
## [371] 94762 30352 28611 15607 25310 16462 16486 18925 16199 37599
## [381] 19037 16521 15917 15759 42605 16550 16308 17038 20904 36492
## [391] 44494 16189 16393 38265 91469 27472 28201 26330 18283 16120
## [401] 17550 30582 18174 19040 25627 18513 16306 21167 27397 20551
## [411] 29285 15697 20566 17037 17215 38718 21433 16496 17194 22750
## [421] 20382 49593 37165 17324 17740 30169 30513 18059 38734 15612
## [431] 17747 15885 18305 24553 20455 199372 16254 33569 16775 16892
## [441] 24621 18581 21921 16467 19615 53541 22740 18963 29338 17647
## [451] 20309 735077 24175 16783 42259 19355 23999 24815 16348 38669
## [461] 37932
outliers_INEXPFTE <- boxplot(projectdata$INEXPFTE, plot=FALSE)$out
projectdata_INEXPFTE_Trimmed <- projectdata[!(projectdata$INEXPFTE %in% outliers_INEXPFTE),]
boxplot(projectdata_INEXPFTE_Trimmed$INEXPFTE)
#trimming UGDS_NHPI outliers
boxplot(projectdata$UGDS_NHPI)
boxplot(projectdata$UGDS_NHPI)$out
## [1] 0.0096 0.0161 0.0086 0.0147 0.0109 0.0112 0.0577 0.0084 0.0400 0.0079
## [11] 0.0449 0.0625 0.0430 0.0063 0.0067 0.0103 0.0087 0.0101 0.0081 0.0084
## [21] 0.0342 0.0079 0.0098 0.0075 0.0090 0.0081 0.0085 0.0095 0.0069 0.0113
## [31] 0.0076 0.0106 0.0119 0.0073 0.0108 0.0091 0.0085 0.0158 0.0127 0.0316
## [41] 0.0345 0.0104 0.0170 0.0128 0.0095 0.0070 0.0114 0.0486 0.0231 0.0078
## [51] 0.0067 0.0064 0.0192 0.0110 0.0243 0.0073 0.0228 0.0256 0.0123 0.0071
## [61] 0.0068 0.0274 0.0069 0.0394 0.0211 0.0104 0.0495 0.0301 0.0092 0.0391
## [71] 0.0065 0.0090 0.0207 0.0120 0.0188 0.0074 0.0170 0.0185 0.0072 0.0087
## [81] 0.0063 0.0133 0.0196 0.0205 0.0164 0.0233 0.0348 0.0083 0.0067 0.0265
## [91] 0.0231 0.0105 0.0098 0.0103 0.0118 0.0068 0.0109 0.0278 0.0181 0.0271
## [101] 0.0306 0.0260 0.0067 0.0098 0.0403 0.0103 0.0208 0.0086 0.0181 0.0076
## [111] 0.0079 0.0444 0.0122 0.0096 0.0179 0.0067 0.0087 0.0065 0.0100 0.0078
## [121] 0.0200 0.0099 0.0460 0.0256 0.0073 0.0504 0.0124 0.0071 0.0082 0.0077
## [131] 0.0298 0.0112 0.0114 0.0240 0.0067 0.0120 0.0403 0.0209 0.0120 0.0182
## [141] 0.0091 0.0193 0.0123 0.0114 0.0064 0.0237 0.0138 0.0117 0.0073 0.0107
## [151] 0.0075 0.0385 0.0123 0.0107 0.0087 0.0132 0.0152 0.0070 0.0260 0.0097
## [161] 0.0070 0.0092 0.0079 0.1770 0.1587 0.1111 0.0349 0.3333 0.0230 0.0776
## [171] 0.0338 0.0842 0.0596 0.1075 0.2391 0.6144 0.0693 0.1072 0.0101 0.0132
## [181] 0.0065 0.0113 0.0131 0.0074 0.0189 0.0105 0.0082 0.0065 0.0071 0.0089
## [191] 0.0159 0.0767 0.0093 0.0086 0.0113 0.0152 0.0164 0.0080 0.0066 0.0078
## [201] 0.0159 0.0086 0.0065 0.0065 0.0076 0.0098 0.0157 0.0200 0.0167 0.0120
## [211] 0.0071 0.0082 0.0073 0.0095 0.0263 0.0104 0.0164 0.0656 0.0072 0.0086
## [221] 0.0252 0.0107 0.0075 0.0155 0.0116 0.0753 0.0455 0.0075 0.0067 0.0222
## [231] 0.0069 0.0182 0.0400 0.0109 0.0144 0.0081 0.0092 0.0091 0.0069 0.0080
## [241] 0.0093 0.0090 0.0231 0.0816 0.0070 0.0091 0.0127 0.0108 0.0962 0.0375
## [251] 0.0066 0.0064 0.0075 0.0675 0.0179 0.0082 0.0343 0.0142 0.0071 0.0082
## [261] 0.0683 0.0179 0.0070 0.0064 0.0096 0.0099 0.0088 0.0097 0.0091 0.0063
## [271] 0.0084 0.0157 0.0113 0.0278 0.0114 0.0513 0.0109 0.0082 0.0072 0.0115
## [281] 0.0114 0.0432 0.0294 0.0127 0.0138 0.0064 0.0108 0.0283 0.0102 0.1743
## [291] 0.0110 0.0066 0.0133 0.0065 0.0081 0.0097 0.0113 0.0148 0.0432 0.0066
## [301] 0.0069 0.0984 0.0069 0.0069 0.0068 0.0250 0.0110 0.0073 0.0156 0.0092
## [311] 0.0067 0.0163 0.0065 0.0161 0.0248 0.0288 0.0317 0.0111 0.0227 0.0120
## [321] 0.0082 0.0090 0.0064 0.0233 0.0105 0.0202 0.0088 0.0074 0.0167 0.0090
## [331] 0.0105 0.0063 0.0086 0.0071 0.0087 0.0148 0.0069 0.0102 0.3509 0.0100
## [341] 0.0162 0.0065 0.0071 0.0139 0.0145 0.0067 0.0066 0.0152 0.0120 0.0091
## [351] 0.0081 0.0229 0.0801 0.0102 0.0500 0.0317 0.0141 0.0323 0.0092 0.0115
## [361] 0.0691 0.0247 0.0120 0.0185 0.0234 0.0094 0.0128 0.0110 0.0066 0.0109
## [371] 0.0163 0.0095 0.0100 0.0070 0.0076 0.0086 0.0517 0.0556 0.0130 0.0173
## [381] 0.0086 0.0323 0.0142 0.0199 0.0078 0.0154 0.0110 0.0322 0.0102 0.0143
## [391] 0.0191 0.0165 0.0115 0.0077 0.0091 0.0220 0.0088 0.0163 0.0068 0.0102
## [401] 0.0097 0.0121 0.0102 0.0082 0.0092 0.0068 0.0533 0.0073 0.0111 0.9193
## [411] 0.5318 0.4946 0.5232 0.9881 0.9983 0.0179 0.0066 0.0072 0.0139 0.0300
## [421] 0.0413 0.0100 0.0282 0.0152 0.0119 0.0067 0.0076 0.0115 0.0139 0.0075
## [431] 0.0117 0.0083 0.0144 0.0171 0.0087 0.0526 0.0078 0.0099 0.0200 0.0109
## [441] 0.0091 0.6730 0.0270 0.0064 0.0078 0.0118 0.0247 0.0113 0.0195 0.0530
## [451] 0.0204 0.0992 0.0071 0.0226 0.0146 0.0155 0.0073 0.0070 0.3009 0.0152
## [461] 0.0360 0.0265 0.9917 0.0108 0.0095 0.0213 0.0098 0.0169 0.0071 0.0163
## [471] 0.0250 0.0072 0.0450 0.0082 0.0204 0.0082 0.0093 0.0162 0.0164 0.0189
## [481] 0.0078 0.0074 0.0088 0.0322 0.1189 0.0096 0.0120 0.0080 0.0119 0.0419
## [491] 0.0194 0.0158 0.0190 0.0528 0.0071 0.0256 0.0063 0.0106 0.0070 0.0130
## [501] 0.0072 0.0132 0.0081 0.0112 0.0243 0.0172 0.0513 0.0242 0.0400 0.0139
## [511] 0.0459 0.0175 0.0161 0.0096 0.0130 0.0625 0.0094 0.0303 0.0070 0.0102
## [521] 0.0098 0.0078 0.0088 0.1215 0.0139 0.0092 0.0476 0.0092 0.0073 0.0123
## [531] 0.0127 0.0075 0.0233 0.0074 0.0121 0.0095 0.0098 0.0068 0.0132 0.0113
## [541] 0.0423 0.0092 0.0065 0.0891 0.0511 0.0121 0.0183 0.0073 0.0080 0.0508
## [551] 0.0459 0.0063 0.0098 0.0225 0.0063 0.0119 0.0298 0.0109 0.0099 0.0250
## [561] 0.0089 0.0263 0.0127 0.0170 0.0253 0.0471 0.9538 0.0347 0.0500 0.0173
## [571] 0.0139 0.0101 0.0117 0.0228 0.0095 0.0283 0.0108 0.0120 0.0317 0.0163
## [581] 0.0092 0.0304 0.0070 0.0080 0.0076 0.0103 0.0455 0.0349 0.0122 0.0200
## [591] 0.0095 0.0273 0.0088 0.0133 0.0667 0.0180 0.0080 0.0471 0.0143 0.0064
## [601] 0.0268 0.0143 0.0063 0.0626 0.0358 0.0063 0.0071 0.0076 0.0112 0.0136
## [611] 0.0134 0.0065 0.0112 0.0072 0.0212 0.0171 0.0156 0.0400 0.0068 0.0090
## [621] 0.0212 0.0201 0.0087 0.0401 0.0185 0.0086 0.0067 0.0119 0.0087 0.0132
## [631] 0.0152 0.0155 0.0223 0.0321 0.0068 0.0216 0.0526 0.0111 0.0226 0.0085
## [641] 0.0072 0.0188 0.0238 0.0076 0.0182 0.0160 0.0094 0.0256 0.0142 0.0096
## [651] 0.0093 0.0080 0.0200 0.0133 0.0099 0.0108 0.0241 0.0070 0.0121 0.0075
## [661] 0.0169 0.0500 0.0109 0.0256 0.0093 0.0107 0.0094 0.0080 0.0273 0.0271
## [671] 0.0168 0.0076 0.0084 0.0147 0.0117 0.0204 0.0086 0.0303 0.0105 0.0085
## [681] 0.0078 0.0088 0.0096 0.0132 0.0091 0.0080 0.0093 0.0229 0.0131 0.0283
## [691] 0.0310 0.0063 0.0167 0.0164 0.0241 0.0724 0.0096 0.0379 0.0502 0.0107
## [701] 0.0159 0.0072 0.0112 0.0196 0.0185 0.0067 0.0110 0.0105 0.0075 0.0094
## [711] 0.0114 0.0086 0.0108 0.0244 0.0119 0.2000 0.0202 0.0677 0.0262 0.0082
## [721] 0.0125 0.0263 0.0087 0.0088 0.1771 0.0101 0.0230 0.0084 0.0136 0.0127
## [731] 0.0100 0.0132 0.0328 0.0179 0.0196 0.0254 0.0294 0.0286 0.0085 0.0158
## [741] 0.0078 0.0106 0.0065 0.0066 0.0150 0.0116 0.0080 0.0210 0.0197 0.0367
## [751] 0.0181 0.0083 0.0179 0.0118 0.0500 0.0444 0.0129 0.0323 0.0097 0.0119
## [761] 0.0170 0.0095 0.3333 0.0154 0.0196 0.0081 0.0111 0.0086 0.1250 0.0227
## [771] 0.0642 0.0238 0.3425 0.0233 0.0333 0.0090 0.0182 0.0092 0.0169 0.0320
## [781] 0.0176 0.0134 0.0103 0.0189 0.0185 0.0129 0.0100 0.0107 0.0130 0.0204
## [791] 0.0091 0.0091 0.0157 0.0101 0.0114 0.0070 0.0094 0.0066 0.0063 0.0476
## [801] 0.0207 0.0072 0.0066 0.0084 0.0140 0.0208 0.0267 0.0357 0.0676 0.0175
## [811] 0.0143 0.0159 0.0217 0.0093 0.0543 0.0200 0.0847 0.0175 0.0138 0.0063
## [821] 0.0151 0.0063 0.0156 0.0080 0.0159 0.0066 0.0133 0.0256 0.0690 0.0366
## [831] 0.0139 0.0341 0.0136 0.0089 0.0112 0.0089 0.0156 0.0519 0.0086 0.0094
## [841] 0.0156 0.0066 0.0085 0.0253 0.0067 0.0588 0.0108 0.0206 0.0084 0.0074
## [851] 0.0086 0.0242 0.0089 0.0345 0.0294 0.0632 0.0526 0.0068 0.0173 0.0089
## [861] 0.0199 0.0105 0.0200 0.1842 0.0132 0.0149
outliers_UGDS_NHPI <- boxplot(projectdata$UGDS_NHPI, plot=FALSE)$out
projectdata_UGDS_NHPI_Trimmed <- projectdata[!(projectdata$UGDS_NHPI %in% outliers_UGDS_NHPI),]
boxplot(projectdata_UGDS_NHPI_Trimmed$UGDS_NHPI)
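The three trimming steps follow the same pattern, so they could be wrapped in one helper. This is only a sketch: the function name `trim_boxplot_outliers` and the toy data frames are hypothetical. It uses a logical index, which stays safe when a column has no outliers, whereas negative indexing with an empty `which()` result would return zero rows.

```r
# Hypothetical helper: drop the rows whose value in `column` is a boxplot outlier.
# Rows with a missing value in `column` are kept.
trim_boxplot_outliers <- function(data, column) {
  values <- data[[column]]
  outliers <- boxplot.stats(values)$out   # same rule as boxplot(...)$out
  keep <- !(values %in% outliers)         # logical index, TRUE for rows to keep
  data[keep, , drop = FALSE]
}

# Made-up data with one extreme value and one missing value
toy <- data.frame(id = 1:7, spend = c(10, 12, 11, 13, 12, 500, NA))
trimmed <- trim_boxplot_outliers(toy, "spend")
print(trimmed)

# Made-up data without outliers: all rows are returned
no_outliers <- data.frame(id = 1:5, spend = c(10, 12, 11, 13, 12))
unchanged <- trim_boxplot_outliers(no_outliers, "spend")
print(nrow(unchanged))
```

Each trimmed data frame removes outliers for one variable only, so the three results contain different sets of institutions.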
###Question 2
Plotting the scatterplot matrix for the variables C150_4, SAT_AVG, UGDS_WHITE and PCTFLOAN:
pairs(~ C150_4 + SAT_AVG + UGDS_WHITE + PCTFLOAN, data = projectdata, row1attop=FALSE)
The scatterplot matrix suggests a positive linear relation between C150_4 and SAT_AVG and a negative linear relation between SAT_AVG and PCTFLOAN; the former appears to be stronger. The remaining pairs of variables do not show a clear linear relation.
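A visual impression from a scatterplot matrix can be checked numerically with pairwise correlations. The sketch below uses a made-up data frame; the column names `grad_rate`, `sat_avg` and `pct_loan` and all values are assumptions for illustration, not the project data.

```r
# Made-up values imitating a positive and a negative linear pattern
toy <- data.frame(
  grad_rate = c(0.30, 0.45, 0.50, 0.62, 0.75, 0.90),
  sat_avg = c(900, 980, 1050, 1120, 1250, 1400),
  pct_loan = c(0.80, 0.70, 0.72, 0.55, 0.50, 0.30)
)

# Pearson correlations, using all complete pairs for each pair of columns
pair_cor <- cor(toy, use = "pairwise.complete.obs", method = "pearson")

# The sign gives the direction, the absolute value the strength
r_grad_sat <- pair_cor["grad_rate", "sat_avg"]
r_sat_loan <- pair_cor["sat_avg", "pct_loan"]
print(round(pair_cor, 2))
```

A coefficient close to 1 or -1 supports a linear reading of the plot, while a value near 0 matches panels with no visible trend.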
###Question 3
Finding the covariance matrix:
projectdata_matrix_quant<-data.matrix(projectdata[7:30], rownames.force = NA)
covar_matrix<-cov(projectdata_matrix_quant, y = projectdata_matrix_quant, use = "pairwise.complete.obs", method = "pearson")
print(covar_matrix)
## ADM_RATE SAT_AVG UGDS UGDS_WHITE
## ADM_RATE 0.043651078 -9.1520820 -168.15310 0.005891843
## SAT_AVG -9.152081974 17784.0838250 243791.75977 6.366862739
## UGDS -168.153102500 243791.7597651 29581070.83881 6.762644608
## UGDS_WHITE 0.005891843 6.3668627 6.76264 0.082554181
## UGDS_BLACK -0.001159646 -11.6270340 -119.05278 -0.032796138
## UGDS_HISP 0.000754761 -0.3110483 36.36321 -0.035446161
## UGDS_ASIAN -0.002837813 3.7112738 52.37768 -0.004642505
## UGDS_AIAN 0.000320177 -0.3600650 -14.31268 -0.001798072
## UGDS_NHPI 0.000249786 -0.0712613 -2.65967 -0.000876311
## UGDS_2MOR -0.000380094 0.7298147 17.13435 -0.000212685
## UGDS_NRA -0.002637014 2.2638397 30.74179 -0.000971644
## UGDS_UNKN -0.000202460 -0.7023401 -5.01941 -0.005300157
## PPTUG_EF 0.002788057 -5.8663104 217.06344 -0.001857259
## NPT4_PUB 22.287908711 154640.1284935 5346029.39297 260.162736663
## NPT4_PRIV -162.639885169 314374.7201384 3119630.40280 81.207081643
## COSTT4_A -746.514831982 925104.2174771 -13647207.01894 295.148815436
## TUITFTE -215.481576282 467680.9591030 -4938152.91700 -53.860150986
## INEXPFTE -669.256980891 735007.6231667 2801231.36981 228.723734132
## PFTFAC -0.010183456 5.2872382 94.63431 0.014265993
## PCTPELL 0.010327329 -13.7517582 -319.31466 -0.022802708
## C150_4 -0.012291160 18.6214134 212.97709 0.013378217
## PFTFTUG1_EF -0.002831299 10.4913317 -113.23971 0.005217475
## RET_FT4 -0.006093242 11.1577961 236.65239 0.011387344
## PCTFLOAN 0.004891957 -11.5521207 -281.30726 0.000890360
## UGDS_BLACK UGDS_HISP UGDS_ASIAN UGDS_AIAN
## ADM_RATE -0.001159646 0.000754761 -0.002837813 0.000320177
## SAT_AVG -11.627033986 -0.311048251 3.711273821 -0.360064984
## UGDS -119.052780239 36.363207141 52.377679715 -14.312681861
## UGDS_WHITE -0.032796138 -0.035446161 -0.004642505 -0.001798072
## UGDS_BLACK 0.050492190 -0.010256731 -0.002185419 -0.001476804
## UGDS_HISP -0.010256731 0.049550061 0.000552435 -0.000922533
## UGDS_ASIAN -0.002185419 0.000552435 0.005654128 -0.000255289
## UGDS_AIAN -0.001476804 -0.000922533 -0.000255289 0.004851592
## UGDS_NHPI -0.000372339 -0.000116499 0.000258271 -0.000012265
## UGDS_2MOR -0.000678691 -0.000615635 0.000250582 -0.000037438
## UGDS_NRA -0.001361128 -0.000636489 0.000609475 -0.000118046
## UGDS_UNKN -0.001174675 -0.001946117 -0.000207806 -0.000217406
## PPTUG_EF 0.001450254 -0.000362191 0.000718448 0.000702498
## NPT4_PUB -24.657390012 -204.326024722 7.956597567 -50.327450792
## NPT4_PRIV -48.454593345 -322.347557680 44.368208662 -19.539684747
## COSTT4_A -189.039018340 -465.880822029 116.110053363 -120.299835993
## TUITFTE 55.258771677 -178.241323861 33.395045806 -53.976849602
## INEXPFTE -177.902368858 -210.801636229 59.561253236 8.012348373
## PFTFAC -0.008998135 -0.006329072 -0.000339975 0.001005123
## PCTPELL 0.017991040 0.010375593 -0.002565856 0.000186754
## C150_4 -0.013754691 -0.002437542 0.003741012 -0.001613990
## PFTFTUG1_EF -0.003383117 -0.001987702 -0.000371023 -0.000431654
## RET_FT4 -0.012263766 -0.000788216 0.002978017 -0.000813807
## PCTFLOAN 0.013667358 -0.011181201 -0.003315926 -0.002840080
## UGDS_NHPI UGDS_2MOR UGDS_NRA UGDS_UNKN
## ADM_RATE 0.0002497858 -0.0003800944 -0.0026370141 -0.0002024596
## SAT_AVG -0.0712613221 0.7298147184 2.2638396660 -0.7023401403
## UGDS -2.6596659709 17.1343532577 30.7417895619 -5.0194075289
## UGDS_WHITE -0.0008763112 -0.0002126852 -0.0009716440 -0.0053001574
## UGDS_BLACK -0.0003723387 -0.0006786910 -0.0013611281 -0.0011746753
## UGDS_HISP -0.0001164985 -0.0006156348 -0.0006364885 -0.0019461172
## UGDS_ASIAN 0.0002582712 0.0002505822 0.0006094753 -0.0002078060
## UGDS_AIAN -0.0000122650 -0.0000374380 -0.0001180458 -0.0002174059
## UGDS_NHPI 0.0010805755 0.0000674821 -0.0000104207 -0.0000139173
## UGDS_2MOR 0.0000674821 0.0009724945 0.0000467315 0.0002310853
## UGDS_NRA -0.0000104207 0.0000467315 0.0024923038 -0.0000347095
## UGDS_UNKN -0.0000139173 0.0002310853 -0.0000347095 0.0087089875
## PPTUG_EF 0.0000935616 0.0002083970 -0.0009876967 0.0000339962
## NPT4_PUB -17.9221260879 5.1570790469 21.8246296769 2.1374055035
## NPT4_PRIV -0.0893201556 43.8284227786 92.2463517510 127.9576167144
## COSTT4_A -25.8975956709 38.5875710136 213.3744366696 133.4981393558
## TUITFTE -2.3474377591 24.1699733818 74.0864556336 97.8220751594
## INEXPFTE -0.5790653933 19.2105469156 73.9838572128 -0.9153065496
## PFTFAC -0.0000151138 0.0001831850 0.0019258589 -0.0017790384
## PCTPELL 0.0002954484 -0.0007441574 -0.0031394392 0.0004440981
## C150_4 -0.0000527187 0.0006519264 0.0029001986 -0.0029494838
## PFTFTUG1_EF -0.0000365025 -0.0000549873 0.0023148606 -0.0012676337
## RET_FT4 -0.0001330618 0.0004767810 0.0026655226 -0.0035090560
## PCTFLOAN -0.0004383001 0.0004538808 -0.0013690732 0.0042744289
## PPTUG_EF NPT4_PUB NPT4_PRIV
## ADM_RATE 0.0027880573 22.28791 -162.6398852
## SAT_AVG -5.8663104270 154640.12849 314374.7201384
## UGDS 217.0634360336 5346029.39297 3119630.4028008
## UGDS_WHITE -0.0018572589 260.16274 81.2070816
## UGDS_BLACK 0.0014502540 -24.65739 -48.4545933
## UGDS_HISP -0.0003621910 -204.32602 -322.3475577
## UGDS_ASIAN 0.0007184478 7.95660 44.3682087
## UGDS_AIAN 0.0007024979 -50.32745 -19.5396847
## UGDS_NHPI 0.0000935616 -17.92213 -0.0893202
## UGDS_2MOR 0.0002083970 5.15708 43.8284228
## UGDS_NRA -0.0009876967 21.82463 92.2463518
## UGDS_UNKN 0.0000339962 2.13741 127.9576167
## PPTUG_EF 0.0607084636 -494.74628 -33.1514380
## NPT4_PUB -494.7462837076 21805832.12310 NA
## NPT4_PRIV -33.1514380193 NA 52883812.8273432
## COSTT4_A -1464.0351911282 20420278.27211 60259561.8222371
## TUITFTE -214.9796535216 11215184.60953 29011477.6565112
## INEXPFTE -79.9221479595 6239926.76979 12658912.3186321
## PFTFAC -0.0175401442 292.43019 -69.2458067
## PCTPELL -0.0073582758 -82.78810 -499.5590628
## C150_4 -0.0171589736 450.89530 520.9487913
## PFTFTUG1_EF -0.0427906546 359.34794 236.9126486
## RET_FT4 -0.0137613119 161.13032 293.7806259
## PCTFLOAN -0.0160746129 729.64347 525.8665706
## COSTT4_A TUITFTE INEXPFTE PFTFAC
## ADM_RATE -746.5148 -215.48158 -669.256981 -0.0101834565
## SAT_AVG 925104.2175 467680.95910 735007.623167 5.2872381713
## UGDS -13647207.0189 -4938152.91700 2801231.369813 94.6343055005
## UGDS_WHITE 295.1488 -53.86015 228.723734 0.0142659935
## UGDS_BLACK -189.0390 55.25877 -177.902369 -0.0089981355
## UGDS_HISP -465.8808 -178.24132 -210.801636 -0.0063290716
## UGDS_ASIAN 116.1101 33.39505 59.561253 -0.0003399748
## UGDS_AIAN -120.2998 -53.97685 8.012348 0.0010051229
## UGDS_NHPI -25.8976 -2.34744 -0.579065 -0.0000151138
## UGDS_2MOR 38.5876 24.16997 19.210547 0.0001831850
## UGDS_NRA 213.3744 74.08646 73.983857 0.0019258589
## UGDS_UNKN 133.4981 97.82208 -0.915307 -0.0017790384
## PPTUG_EF -1464.0352 -214.97965 -79.922148 -0.0175401442
## NPT4_PUB 20420278.2721 11215184.60953 6239926.769789 292.4301896586
## NPT4_PRIV 60259561.8222 29011477.65651 12658912.318632 -69.2458066978
## COSTT4_A 162884718.9414 75891018.51725 38078362.824948 498.6624125201
## TUITFTE 75891018.5173 301920777.61712 161489240.074271 202.8546875181
## INEXPFTE 38078362.8249 161489240.07427 161959774.003822 732.1314752178
## PFTFAC 498.6624 202.85469 732.131475 0.0937981354
## PCTPELL -686.4275 -208.65408 -566.606051 -0.0172571941
## C150_4 1501.5868 555.91617 763.661161 0.0186312462
## PFTFTUG1_EF 1522.7458 746.49789 450.675307 0.0170839536
## RET_FT4 809.5945 266.67038 530.979354 0.0159396898
## PCTFLOAN 1551.4418 664.30383 -178.500531 -0.0036342407
## PCTPELL C150_4 PFTFTUG1_EF RET_FT4
## ADM_RATE 0.010327329 -0.0122911600 -0.0028312987 -0.006093242
## SAT_AVG -13.751758181 18.6214133765 10.4913316735 11.157796086
## UGDS -319.314661688 212.9770901298 -113.2397061814 236.652385191
## UGDS_WHITE -0.022802708 0.0133782167 0.0052174752 0.011387344
## UGDS_BLACK 0.017991040 -0.0137546905 -0.0033831173 -0.012263766
## UGDS_HISP 0.010375593 -0.0024375422 -0.0019877022 -0.000788216
## UGDS_ASIAN -0.002565856 0.0037410123 -0.0003710226 0.002978017
## UGDS_AIAN 0.000186754 -0.0016139900 -0.0004316542 -0.000813807
## UGDS_NHPI 0.000295448 -0.0000527187 -0.0000365025 -0.000133062
## UGDS_2MOR -0.000744157 0.0006519264 -0.0000549873 0.000476781
## UGDS_NRA -0.003139439 0.0029001986 0.0023148606 0.002665523
## UGDS_UNKN 0.000444098 -0.0029494838 -0.0012676337 -0.003509056
## PPTUG_EF -0.007358276 -0.0171589736 -0.0427906546 -0.013761312
## NPT4_PUB -82.788101963 450.8952965380 359.3479413797 161.130321649
## NPT4_PRIV -499.559062819 520.9487912600 236.9126486444 293.780625852
## COSTT4_A -686.427546284 1501.5867861709 1522.7458450175 809.594518252
## TUITFTE -208.654081105 555.9161716995 746.4978857114 266.670382196
## INEXPFTE -566.606050543 763.6611606261 450.6753065749 530.979353542
## PFTFAC -0.017257194 0.0186312462 0.0170839536 0.015939690
## PCTPELL 0.051049405 -0.0229234103 -0.0014985256 -0.018235528
## C150_4 -0.022923410 0.0457005844 0.0217898121 0.023206013
## PFTFTUG1_EF -0.001498526 0.0217898121 0.0661155441 0.018617523
## RET_FT4 -0.018235528 0.0232060134 0.0186175229 0.038276927
## PCTFLOAN 0.026869500 -0.0044591631 0.0156008719 -0.010957764
## PCTFLOAN
## ADM_RATE 0.004891957
## SAT_AVG -11.552120720
## UGDS -281.307264777
## UGDS_WHITE 0.000890360
## UGDS_BLACK 0.013667358
## UGDS_HISP -0.011181201
## UGDS_ASIAN -0.003315926
## UGDS_AIAN -0.002840080
## UGDS_NHPI -0.000438300
## UGDS_2MOR 0.000453881
## UGDS_NRA -0.001369073
## UGDS_UNKN 0.004274429
## PPTUG_EF -0.016074613
## NPT4_PUB 729.643474564
## NPT4_PRIV 525.866570647
## COSTT4_A 1551.441788778
## TUITFTE 664.303830814
## INEXPFTE -178.500530954
## PFTFAC -0.003634241
## PCTPELL 0.026869500
## C150_4 -0.004459163
## PFTFTUG1_EF 0.015600872
## RET_FT4 -0.010957764
## PCTFLOAN 0.080706267
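Covariances depend on the units of each variable, so dollar amounts and proportions are not directly comparable in the matrix above. Dividing each covariance by the two standard deviations gives a unit-free correlation. The sketch below shows this on made-up data, and also shows that a pair of variables with no shared observations yields NA under pairwise deletion, which is presumably why the NPT4_PUB and NPT4_PRIV entry is NA. All names and values here are assumptions for illustration.

```r
# Made-up complete data on very different scales
toy <- data.frame(
  tuition = c(9000, 12000, 15000, 21000, 30000),
  pct_pell = c(0.60, 0.50, 0.45, 0.30, 0.20)
)

cov_m <- cov(toy)
sds <- sqrt(diag(cov_m))                 # standard deviations from the diagonal

cor_manual <- cov_m / outer(sds, sds)    # divide each entry by sd_i * sd_j
cor_builtin <- cov2cor(cov_m)            # base R shortcut for the same rescaling

# Two made-up variables that are never observed together
public_only <- c(1, 2, 3, NA, NA)
private_only <- c(NA, NA, NA, 4, 5)
no_overlap <- cov(public_only, private_only, use = "pairwise.complete.obs")

print(round(cor_builtin, 3))
print(no_overlap)
```

With missing data and pairwise deletion, rescaling the covariance matrix this way can differ slightly from calling `cor()` directly, because each entry may be based on a different subset of rows.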
Finding the correlation matrix:
correlation_matrix<-cor(projectdata_matrix_quant, y = projectdata_matrix_quant, use = "pairwise.complete.obs", method = "pearson")
print(correlation_matrix)
## ADM_RATE SAT_AVG UGDS UGDS_WHITE UGDS_BLACK
## ADM_RATE 1.0000000 -0.3543624 -0.12727960 0.11168300 -0.0290549
## SAT_AVG -0.3543624 1.0000000 0.24961110 0.21669481 -0.4694846
## UGDS -0.1272796 0.2496111 1.00000000 0.00432753 -0.0974138
## UGDS_WHITE 0.1116830 0.2166948 0.00432753 1.00000000 -0.5079735
## UGDS_BLACK -0.0290549 -0.4694846 -0.09741380 -0.50797351 1.0000000
## UGDS_HISP 0.0203580 -0.0207152 0.03003538 -0.55421410 -0.2050571
## UGDS_ASIAN -0.2185492 0.4657344 0.12807268 -0.21488201 -0.1293421
## UGDS_AIAN 0.0465511 -0.0835808 -0.03778087 -0.08984532 -0.0943558
## UGDS_NHPI 0.0312677 -0.0727155 -0.01487622 -0.09278148 -0.0504079
## UGDS_2MOR -0.0712327 0.2437690 0.10102229 -0.02373690 -0.0968537
## UGDS_NRA -0.2261192 0.3609904 0.11321962 -0.06773875 -0.1213351
## UGDS_UNKN -0.0153158 -0.1044088 -0.00988921 -0.19766756 -0.0560173
## PPTUG_EF 0.0790390 -0.3470369 0.16177720 -0.02627129 0.0261768
## NPT4_PUB 0.0278255 0.3416979 0.14680391 0.21739600 -0.0298171
## NPT4_PRIV -0.1054266 0.3629225 0.13204619 0.03835043 -0.0278392
## COSTT4_A -0.2775023 0.5211713 -0.15917285 0.08629270 -0.0715568
## TUITFTE -0.1414303 0.5252813 -0.05231641 -0.01081433 0.0141871
## INEXPFTE -0.4089865 0.6351096 0.04265807 0.06601190 -0.0656525
## PFTFAC -0.1725667 0.1689907 0.04540840 0.17884731 -0.1470991
## PCTPELL 0.2448835 -0.7122496 -0.25943897 -0.35141655 0.3540659
## C150_4 -0.3201865 0.8014783 0.13824336 0.23617301 -0.3172370
## PFTFTUG1_EF -0.0590828 0.3992876 -0.06883520 0.07690362 -0.0645547
## RET_FT4 -0.2102650 0.7419929 0.16824880 0.21713437 -0.3053423
## PCTFLOAN 0.1076242 -0.5103375 -0.18178246 0.01091330 0.2139277
## UGDS_HISP UGDS_ASIAN UGDS_AIAN UGDS_NHPI UGDS_2MOR
## ADM_RATE 0.02035797 -0.2185492 0.0465511 0.031267683 -0.07123272
## SAT_AVG -0.02071523 0.4657344 -0.0835808 -0.072715506 0.24376903
## UGDS 0.03003538 0.1280727 -0.0377809 -0.014876220 0.10102229
## UGDS_WHITE -0.55421410 -0.2148820 -0.0898453 -0.092781475 -0.02373690
## UGDS_BLACK -0.20505708 -0.1293421 -0.0943558 -0.050407889 -0.09685375
## UGDS_HISP 1.00000000 0.0330047 -0.0595001 -0.015921018 -0.08868649
## UGDS_ASIAN 0.03300474 1.0000000 -0.0487424 0.104487793 0.10686212
## UGDS_AIAN -0.05950010 -0.0487424 1.0000000 -0.005356698 -0.01723561
## UGDS_NHPI -0.01592102 0.1044878 -0.0053567 1.000000000 0.06582903
## UGDS_2MOR -0.08868649 0.1068621 -0.0172356 0.065829028 1.00000000
## UGDS_NRA -0.05727538 0.1623577 -0.0339475 -0.006349936 0.03001691
## UGDS_UNKN -0.09368349 -0.0296136 -0.0334461 -0.004536746 0.07940446
## PPTUG_EF -0.00659837 0.0391123 0.0408755 0.011534641 0.02711095
## NPT4_PUB -0.24145384 0.0294243 -0.0978197 -0.077583855 0.03663669
## NPT4_PRIV -0.18494949 0.0824256 -0.0605881 -0.000501632 0.19581515
## COSTT4_A -0.18425313 0.1582516 -0.1107736 -0.050629060 0.10983277
## TUITFTE -0.04620914 0.0257312 -0.0446480 -0.004114318 0.04468052
## INEXPFTE -0.07855471 0.0659662 0.0095265 -0.001458849 0.05104587
## PFTFAC -0.10325707 -0.0193071 0.0380120 -0.001318111 0.02074955
## PCTPELL 0.20614434 -0.1522731 0.0118459 0.039706884 -0.10555957
## C150_4 -0.06106946 0.2888132 -0.1082722 -0.007114467 0.12051172
## PFTFTUG1_EF -0.04066460 -0.0246686 -0.0186501 -0.003444423 -0.00792163
## RET_FT4 -0.02128777 0.2524328 -0.0626292 -0.018949487 0.09734256
## PCTFLOAN -0.17668579 -0.1565130 -0.1432786 -0.046850088 0.05120700
## UGDS_NRA UGDS_UNKN PPTUG_EF NPT4_PUB NPT4_PRIV
## ADM_RATE -0.22611922 -0.015315791 0.07903904 0.0278255 -0.105426630
## SAT_AVG 0.36099040 -0.104408770 -0.34703695 0.3416979 0.362922516
## UGDS 0.11321962 -0.009889212 0.16177720 0.1468039 0.132046190
## UGDS_WHITE -0.06773875 -0.197667558 -0.02627129 0.2173960 0.038350432
## UGDS_BLACK -0.12133509 -0.056017266 0.02617678 -0.0298171 -0.027839186
## UGDS_HISP -0.05727538 -0.093683489 -0.00659837 -0.2414538 -0.184949487
## UGDS_ASIAN 0.16235768 -0.029613622 0.03911231 0.0294243 0.082425571
## UGDS_AIAN -0.03394752 -0.033446091 0.04087551 -0.0978197 -0.060588083
## UGDS_NHPI -0.00634994 -0.004536746 0.01153464 -0.0775839 -0.000501632
## UGDS_2MOR 0.03001691 0.079404465 0.02711095 0.0366367 0.195815154
## UGDS_NRA 1.00000000 -0.007450126 -0.08280563 0.1859134 0.247282716
## UGDS_UNKN -0.00745013 1.000000000 0.00147759 0.0119817 0.166799674
## PPTUG_EF -0.08280563 0.001477585 1.00000000 -0.4557973 -0.022019218
## NPT4_PUB 0.18591339 0.011981654 -0.45579731 1.0000000 NA
## NPT4_PRIV 0.24728272 0.166799674 -0.02201922 NA 1.000000000
## COSTT4_A 0.31615029 0.115186024 -0.48886305 0.9227081 0.708253429
## TUITFTE 0.08550662 0.060405858 -0.05028508 0.5806977 0.552026283
## INEXPFTE 0.12273734 -0.000812433 -0.02684930 0.2999074 0.269072986
## PFTFAC 0.11459295 -0.076987123 -0.23922490 0.2348420 -0.031521872
## PCTPELL -0.28692824 0.021042973 -0.13224343 -0.1060222 -0.308497254
## C150_4 0.23192301 -0.144386225 -0.40474959 0.5679805 0.336135199
## PFTFTUG1_EF 0.15824294 -0.063471918 -0.71555914 0.4000409 0.121247282
## RET_FT4 0.21675128 -0.193997676 -0.36163965 0.3238603 0.190148653
## PCTFLOAN -0.09951822 0.161087155 -0.22986218 0.6133979 0.291874643
## COSTT4_A TUITFTE INEXPFTE PFTFAC PCTPELL
## ADM_RATE -0.2775023 -0.14143032 -0.408986460 -0.17256667 0.2448835
## SAT_AVG 0.5211713 0.52528127 0.635109575 0.16899071 -0.7122496
## UGDS -0.1591728 -0.05231641 0.042658066 0.04540840 -0.2594390
## UGDS_WHITE 0.0862927 -0.01081433 0.066011899 0.17884731 -0.3514166
## UGDS_BLACK -0.0715568 0.01418705 -0.065652524 -0.14709910 0.3540659
## UGDS_HISP -0.1842531 -0.04620914 -0.078554714 -0.10325707 0.2061443
## UGDS_ASIAN 0.1582516 0.02573125 0.065966199 -0.01930712 -0.1522731
## UGDS_AIAN -0.1107736 -0.04464802 0.009526498 0.03801198 0.0118459
## UGDS_NHPI -0.0506291 -0.00411432 -0.001458849 -0.00131811 0.0397069
## UGDS_2MOR 0.1098328 0.04468052 0.051045871 0.02074955 -0.1055596
## UGDS_NRA 0.3161503 0.08550662 0.122737340 0.11459295 -0.2869282
## UGDS_UNKN 0.1151860 0.06040586 -0.000812433 -0.07698712 0.0210430
## PPTUG_EF -0.4888631 -0.05028508 -0.026849302 -0.23922490 -0.1322434
## NPT4_PUB 0.9227081 0.58069767 0.299907392 0.23484204 -0.1060222
## NPT4_PRIV 0.7082534 0.55202628 0.269072986 -0.03152187 -0.3084973
## COSTT4_A 1.0000000 0.76021421 0.455454439 0.12116344 -0.2527832
## TUITFTE 0.7602142 1.00000000 0.730286898 0.03024187 -0.0532675
## INEXPFTE 0.4554544 0.73028690 1.000000000 0.15337005 -0.2077151
## PFTFAC 0.1211634 0.03024187 0.153370047 1.00000000 -0.2773455
## PCTPELL -0.2527832 -0.05326752 -0.207715114 -0.27734550 1.0000000
## C150_4 0.5508938 0.37689253 0.470649399 0.29737479 -0.5196974
## PFTFTUG1_EF 0.4500205 0.36173404 0.266256095 0.22371045 -0.0282553
## RET_FT4 0.3208939 0.19783707 0.350911976 0.30244389 -0.4661687
## PCTFLOAN 0.4346896 0.13489866 -0.052051315 -0.04399367 0.4186110
## C150_4 PFTFTUG1_EF RET_FT4 PCTFLOAN
## ADM_RATE -0.32018650 -0.05908279 -0.2102650 0.1076242
## SAT_AVG 0.80147835 0.39928761 0.7419929 -0.5103375
## UGDS 0.13824336 -0.06883520 0.1682488 -0.1817825
## UGDS_WHITE 0.23617301 0.07690362 0.2171344 0.0109133
## UGDS_BLACK -0.31723700 -0.06455469 -0.3053423 0.2139277
## UGDS_HISP -0.06106946 -0.04066460 -0.0212878 -0.1766858
## UGDS_ASIAN 0.28881316 -0.02466865 0.2524328 -0.1565130
## UGDS_AIAN -0.10827221 -0.01865012 -0.0626292 -0.1432786
## UGDS_NHPI -0.00711447 -0.00344442 -0.0189495 -0.0468501
## UGDS_2MOR 0.12051172 -0.00792163 0.0973426 0.0512070
## UGDS_NRA 0.23192301 0.15824294 0.2167513 -0.0995182
## UGDS_UNKN -0.14438622 -0.06347192 -0.1939977 0.1610872
## PPTUG_EF -0.40474959 -0.71555914 -0.3616396 -0.2298622
## NPT4_PUB 0.56798052 0.40004087 0.3238603 0.6133979
## NPT4_PRIV 0.33613520 0.12124728 0.1901487 0.2918746
## COSTT4_A 0.55089376 0.45002045 0.3208939 0.4346896
## TUITFTE 0.37689253 0.36173404 0.1978371 0.1348987
## INEXPFTE 0.47064940 0.26625610 0.3509120 -0.0520513
## PFTFAC 0.29737479 0.22371045 0.3024439 -0.0439937
## PCTPELL -0.51969735 -0.02825530 -0.4661687 0.4186110
## C150_4 1.00000000 0.42646167 0.5810345 -0.0934037
## PFTFTUG1_EF 0.42646167 1.00000000 0.4135862 0.2171052
## RET_FT4 0.58103449 0.41358619 1.0000000 -0.2543560
## PCTFLOAN -0.09340366 0.21710522 -0.2543560 1.0000000
correlation_matrix_round<-round(cor(projectdata_matrix_quant, y = projectdata_matrix_quant, use = "pairwise.complete.obs", method = "pearson"),2)
library(corrplot)
## corrplot 0.84 loaded
corrplot(correlation_matrix_round, method="circle")
The variables with significant correlations could not be determined in R, so Excel was used to find the correlations above 0.3 and below -0.3. Here "significant" means an absolute correlation above 0.3, not the result of a statistical test.
-SAT_AVG has a significant correlation with UGDS_ASIAN (0.47), UGDS_NRA (0.36), NPT4_PUB (0.34), NPT4_PRIV (0.36), COSTT4_A (0.52), TUITFTE (0.53), INEXPFTE (0.64), C150_4 (0.8), PFTFTUG1_EF (0.4) and RET_FT4 (0.74).
-UGDS_BLACK has a significant correlation with PCTPELL (0.35) and C150_4 (-0.32).
-UGDS_NRA has a significant correlation with COSTT4_A (0.32).
-NPT4_PUB has a significant correlation with COSTT4_A (0.92), TUITFTE (0.58), C150_4 (0.57), PFTFTUG1_EF (0.4), RET_FT4 (0.32) and PCTFLOAN (0.61).
-NPT4_PRIV has a significant correlation with COSTT4_A (0.71), TUITFTE (0.55) and C150_4 (0.34).
-COSTT4_A has a significant correlation with TUITFTE (0.76), INEXPFTE (0.46), C150_4 (0.55), PFTFTUG1_EF (0.45), RET_FT4 (0.32) and PCTFLOAN (0.43).
-TUITFTE has a significant correlation with INEXPFTE (0.73), C150_4 (0.38) and PFTFTUG1_EF (0.36).
-INEXPFTE has a significant correlation with C150_4 (0.47) and RET_FT4 (0.35).
-PCTPELL has a significant correlation with PCTFLOAN (0.42).
Cells that look blank in the correlation matrix visualization correspond to correlations close to zero, i.e. no notable relation.
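The thresholding that was done in Excel can also be sketched in base R. This is only an illustrative sketch, not the method used for the results reported here: `strong_pairs` is a hypothetical helper name, and `toy` is made-up data standing in for `projectdata_matrix_quant`.

```r
# Toy data (hypothetical values), used only to make the example runnable.
set.seed(1)
toy <- data.frame(a = rnorm(50))
toy$b <- toy$a + rnorm(50, sd = 0.5)   # built to be related to a
toy$c <- rnorm(50)                     # independent noise

# Hypothetical helper: list each variable pair whose absolute
# correlation exceeds the cutoff.
strong_pairs <- function(cor_mat, cutoff = 0.3) {
  # Use the upper triangle only, so each pair is reported once;
  # NA entries (e.g. pairs with no overlapping data) are skipped.
  keep <- upper.tri(cor_mat) & !is.na(cor_mat) & abs(cor_mat) > cutoff
  idx <- which(keep, arr.ind = TRUE)
  data.frame(
    var1 = rownames(cor_mat)[idx[, "row"]],
    var2 = colnames(cor_mat)[idx[, "col"]],
    r    = cor_mat[idx],
    stringsAsFactors = FALSE
  )
}

toy_cor <- cor(toy, use = "pairwise.complete.obs", method = "pearson")
print(strong_pairs(toy_cor, cutoff = 0.3))
```

Passing the real correlation matrix instead of `toy_cor` would return one row per pair above the cutoff, which could then be sorted by `abs(r)`.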
###Question 4
Strongest relation: INEXPFTE and TUITFTE have the largest covariance, 161489240.1. The correlation matrix indicates that their correlation is 0.73028690.
Weakest relation: UGDS_NRA and UGDS_NHPI have the smallest covariance in magnitude, -0.0000104207. The correlation matrix indicates that their correlation is -0.006349936.
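Covariance depends on the units of the two variables, so a ranking by covariance and a ranking by correlation need not agree. The sketch below, on made-up data (the `toy` data frame and its column names are assumptions for illustration), shows how base R's `cov2cor()` rescales a covariance matrix into correlations and how the strongest off-diagonal pair can be located.

```r
# Toy data (hypothetical values): dollar amounts and a 0-1 share.
set.seed(2)
toy <- data.frame(dollars = rnorm(100, mean = 10000, sd = 3000),
                  share   = runif(100))
toy$dollars2 <- toy$dollars * 1.5 + rnorm(100, sd = 1000)

cov_mat <- cov(toy, use = "pairwise.complete.obs", method = "pearson")
cor_mat <- cor(toy, use = "pairwise.complete.obs", method = "pearson")

# cov2cor() applies r_xy = cov_xy / (sd_x * sd_y) to every entry
rescaled <- cov2cor(cov_mat)
print(all.equal(rescaled, cor_mat))

# Strongest relation by absolute correlation, ignoring the diagonal
off_diag <- cor_mat
diag(off_diag) <- NA
pos <- arrayInd(which.max(abs(off_diag)), dim(off_diag))
strongest_pair <- c(rownames(off_diag)[pos[1, 1]],
                    colnames(off_diag)[pos[1, 2]])
print(strongest_pair)
```

Replacing `which.max` with `which.min` would give the weakest pair on the same scale-free basis.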
###Question 5
Grouping the schools:
projectdata_public <- subset(projectdata , CONTROL == "1")
projectdata_private_np <- subset(projectdata , CONTROL == "2")
projectdata_private_fp <- subset(projectdata , CONTROL == "3")
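As an alternative to three separate `subset()` calls, the grouping can be sketched with `split()`, which returns one data frame per CONTROL value. The data below are made up for illustration, and the group labels are assumptions taken from the object names used in this report (1 = public, 2 = private non-profit, 3 = private for-profit).

```r
# Toy data (hypothetical values), not taken from Project2Data.csv.
toy <- data.frame(
  CONTROL = c(1, 1, 2, 2, 3, 3),
  UGDS    = c(6000, 5000, 1500, NA, 400, 600)
)

# One data frame per CONTROL value, returned as a list ordered 1, 2, 3
by_control <- split(toy, toy$CONTROL)
names(by_control) <- c("public", "private_np", "private_fp")

# Apply the same summary to every group in a single step
group_means <- sapply(by_control, function(d) mean(d$UGDS, na.rm = TRUE))
print(group_means)
```

Keeping the groups in one list makes it easy to run the same descriptive call on each group with `lapply()`.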
Running the descriptives for groups of schools:
-Descriptives for public schools:
options(digits=6)
options(scipen = 999)
# describe() is sufficient here because no grouping variable is passed
descriptives_public<-describe(projectdata_public[,5:30], na.rm = TRUE)
## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning
## Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
print(descriptives_public)
## vars n mean sd median trimmed mad min
## CONTROL 1 2044 1.00 0.00 1.00 1.00 0.00 1.00
## CCBASIC* 2 2044 15.73 11.95 13.00 15.29 16.31 1.00
## ADM_RATE 3 636 0.69 0.18 0.71 0.70 0.19 0.16
## SAT_AVG 4 516 1038.56 111.64 1029.00 1032.59 102.30 774.00
## UGDS 5 1970 6032.04 7745.08 3211.50 4462.04 4077.89 5.00
## UGDS_WHITE 6 1970 0.59 0.26 0.64 0.61 0.25 0.00
## UGDS_BLACK 7 1970 0.14 0.18 0.08 0.10 0.09 0.00
## UGDS_HISP 8 1970 0.14 0.18 0.06 0.10 0.07 0.00
## UGDS_ASIAN 9 1970 0.03 0.06 0.01 0.02 0.02 0.00
## UGDS_AIAN 10 1970 0.03 0.11 0.00 0.01 0.00 0.00
## UGDS_NHPI 11 1970 0.01 0.05 0.00 0.00 0.00 0.00
## UGDS_2MOR 12 1970 0.03 0.03 0.02 0.02 0.02 0.00
## UGDS_NRA 13 1970 0.01 0.02 0.00 0.01 0.01 0.00
## UGDS_UNKN 14 1970 0.03 0.04 0.01 0.02 0.02 0.00
## PPTUG_EF 15 1970 0.35 0.24 0.35 0.35 0.31 0.00
## NPT4_PUB 16 1911 9624.66 4669.67 8751.00 9341.96 4293.61 -2434.00
## NPT4_PRIV 17 0 NaN NA NA NaN NA Inf
## COSTT4_A 18 1619 14922.46 4956.71 13646.00 14485.10 4318.81 4610.00
## TUITFTE 19 1984 4505.03 5395.95 3119.50 3726.21 2775.43 9.00
## INEXPFTE 20 1984 8340.32 11578.61 6332.50 6743.12 2484.10 0.00
## PFTFAC 21 1627 0.60 0.28 0.57 0.60 0.37 0.00
## PCTPELL 22 1968 0.42 0.17 0.40 0.41 0.16 0.00
## C150_4 23 668 0.46 0.18 0.44 0.45 0.18 0.04
## PFTFTUG1_EF 24 1597 0.45 0.20 0.43 0.44 0.23 0.02
## RET_FT4 25 624 0.74 0.12 0.75 0.75 0.11 0.00
## PCTFLOAN 26 1968 0.32 0.26 0.31 0.30 0.34 0.00
## max range skew kurtosis se
## CONTROL 1.00 0.00 NaN NaN 0.00
## CCBASIC* 35.00 34.00 0.28 -1.40 0.26
## ADM_RATE 1.00 0.84 -0.44 -0.36 0.01
## SAT_AVG 1400.00 626.00 0.53 0.27 4.91
## UGDS 77657.00 77652.00 2.60 10.47 174.50
## UGDS_WHITE 1.00 1.00 -0.66 -0.44 0.01
## UGDS_BLACK 0.96 0.96 2.36 6.11 0.00
## UGDS_HISP 1.00 1.00 2.41 6.36 0.00
## UGDS_ASIAN 0.44 0.44 3.56 15.03 0.00
## UGDS_AIAN 1.00 1.00 7.12 53.41 0.00
## UGDS_NHPI 1.00 1.00 17.81 335.12 0.00
## UGDS_2MOR 0.43 0.43 5.46 51.55 0.00
## UGDS_NRA 0.29 0.29 3.53 19.80 0.00
## UGDS_UNKN 0.44 0.44 3.67 21.92 0.00
## PPTUG_EF 1.00 1.00 0.17 -1.08 0.01
## NPT4_PUB 28201.00 30635.00 0.61 0.10 106.82
## NPT4_PRIV -Inf -Inf NA NA NA
## COSTT4_A 33826.00 29216.00 0.86 0.43 123.19
## TUITFTE 109761.00 109752.00 8.62 131.16 121.14
## INEXPFTE 357489.00 357489.00 16.72 435.66 259.95
## PFTFAC 1.00 1.00 0.14 -1.35 0.01
## PCTPELL 1.00 1.00 0.57 0.40 0.00
## C150_4 0.94 0.90 0.24 -0.35 0.01
## PFTFTUG1_EF 1.00 0.98 0.24 -0.77 0.01
## RET_FT4 1.00 1.00 -0.79 2.74 0.00
## PCTFLOAN 1.00 1.00 0.33 -0.91 0.01
-Descriptives for Private non-profit schools:
# describe() is sufficient here because no grouping variable is passed
descriptives_private_np<-describe(projectdata_private_np[,5:30], na.rm = TRUE)
## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning
## Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
print(descriptives_private_np)
## vars n mean sd median trimmed mad min
## CONTROL 1 1956 2.00 0.00 2.00 2.00 0.00 2
## CCBASIC* 2 1956 15.04 7.52 15.00 14.89 5.93 1
## ADM_RATE 3 1177 0.65 0.21 0.67 0.66 0.20 0
## SAT_AVG 4 782 1073.09 144.36 1049.00 1060.34 109.71 720
## UGDS 5 1636 1694.12 3054.70 930.50 1133.83 1146.79 0
## UGDS_WHITE 6 1636 0.57 0.28 0.64 0.60 0.25 0
## UGDS_BLACK 7 1636 0.14 0.20 0.06 0.09 0.07 0
## UGDS_HISP 8 1636 0.12 0.21 0.06 0.07 0.06 0
## UGDS_ASIAN 9 1636 0.04 0.07 0.02 0.02 0.02 0
## UGDS_AIAN 10 1636 0.01 0.07 0.00 0.00 0.00 0
## UGDS_NHPI 11 1636 0.00 0.03 0.00 0.00 0.00 0
## UGDS_2MOR 12 1636 0.02 0.03 0.02 0.02 0.03 0
## UGDS_NRA 13 1636 0.04 0.08 0.01 0.02 0.02 0
## UGDS_UNKN 14 1636 0.05 0.07 0.02 0.03 0.03 0
## PPTUG_EF 15 1623 0.15 0.22 0.06 0.10 0.09 0
## NPT4_PUB 16 0 NaN NA NA NaN NA Inf
## NPT4_PRIV 17 1477 20266.36 7663.64 20045.00 20210.20 6932.64 1881
## COSTT4_A 18 1396 35607.98 13581.30 35299.00 35346.38 14768.18 6428
## TUITFTE 19 1872 15178.31 30872.95 13017.50 13641.76 6831.08 0
## INEXPFTE 20 1872 11332.52 20326.59 8285.00 9015.70 4754.70 0
## PFTFAC 21 1521 0.66 0.29 0.67 0.68 0.41 0
## PCTPELL 22 1623 0.42 0.21 0.40 0.41 0.20 0
## C150_4 23 1274 0.54 0.21 0.54 0.54 0.21 0
## PFTFTUG1_EF 24 1321 0.63 0.25 0.68 0.65 0.26 0
## RET_FT4 25 1272 0.74 0.17 0.76 0.75 0.14 0
## PCTFLOAN 26 1623 0.56 0.26 0.63 0.59 0.20 0
## max range skew kurtosis se
## CONTROL 2.00 0.00 NaN NaN 0.00
## CCBASIC* 35.00 34.00 0.33 0.86 0.17
## ADM_RATE 1.00 1.00 -0.48 -0.02 0.01
## SAT_AVG 1545.00 825.00 0.81 0.88 5.16
## UGDS 49340.00 49340.00 7.37 83.70 75.52
## UGDS_WHITE 1.00 1.00 -0.67 -0.47 0.01
## UGDS_BLACK 1.00 1.00 2.63 6.78 0.01
## UGDS_HISP 1.00 1.00 3.22 10.19 0.01
## UGDS_ASIAN 0.95 0.95 5.22 42.56 0.00
## UGDS_AIAN 0.95 0.95 11.74 142.12 0.00
## UGDS_NHPI 0.95 0.95 29.87 1036.52 0.00
## UGDS_2MOR 0.53 0.53 5.22 73.88 0.00
## UGDS_NRA 0.93 0.93 5.20 40.78 0.00
## UGDS_UNKN 0.71 0.71 3.60 20.53 0.00
## PPTUG_EF 1.00 1.00 2.06 4.06 0.01
## NPT4_PUB -Inf -Inf NA NA NA
## NPT4_PRIV 46509.00 44628.00 0.15 0.15 199.41
## COSTT4_A 64988.00 58560.00 0.12 -0.78 363.50
## TUITFTE 1292154.00 1292154.00 37.87 1560.77 713.55
## INEXPFTE 735077.00 735077.00 25.19 861.92 469.80
## PFTFAC 1.00 1.00 -0.34 -1.11 0.01
## PCTPELL 1.00 1.00 0.52 -0.23 0.01
## C150_4 1.00 1.00 -0.17 -0.26 0.01
## PFTFTUG1_EF 1.00 1.00 -0.62 -0.45 0.01
## RET_FT4 1.00 1.00 -1.30 3.21 0.00
## PCTFLOAN 1.00 1.00 -0.90 -0.07 0.01
-Descriptives for Private for-profit schools:
# describe() is sufficient here because no grouping variable is passed
descriptives_private_fp<-describe(projectdata_private_fp[,5:30], na.rm = TRUE)
## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning
## Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
print(descriptives_private_fp)
## vars n mean sd median trimmed mad min
## CONTROL 1 3703 3.00 0.00 3.00 3.00 0.00 3.0
## CCBASIC* 2 3703 8.74 11.86 1.00 6.46 0.00 1.0
## ADM_RATE 3 385 0.83 0.17 0.88 0.86 0.15 0.1
## SAT_AVG 4 6 995.67 129.57 971.50 995.67 103.04 855.0
## UGDS 5 3384 486.73 3180.20 161.00 219.51 166.05 0.0
## UGDS_WHITE 6 3384 0.43 0.29 0.42 0.43 0.36 0.0
## UGDS_BLACK 7 3384 0.24 0.25 0.16 0.20 0.19 0.0
## UGDS_HISP 8 3384 0.20 0.25 0.09 0.14 0.12 0.0
## UGDS_ASIAN 9 3384 0.03 0.09 0.01 0.01 0.02 0.0
## UGDS_AIAN 10 3384 0.01 0.02 0.00 0.00 0.00 0.0
## UGDS_NHPI 11 3384 0.00 0.02 0.00 0.00 0.00 0.0
## UGDS_2MOR 12 3384 0.02 0.03 0.01 0.02 0.02 0.0
## UGDS_NRA 13 3384 0.01 0.04 0.00 0.00 0.00 0.0
## UGDS_UNKN 14 3384 0.06 0.12 0.01 0.02 0.02 0.0
## PPTUG_EF 15 3376 0.19 0.24 0.08 0.14 0.11 0.0
## NPT4_PUB 16 0 NaN NA NA NaN NA Inf
## NPT4_PRIV 17 3211 17293.57 6886.68 17346.00 17135.56 6763.62 -581.0
## COSTT4_A 18 1015 25902.30 6036.42 25962.00 25912.42 4213.55 8160.0
## TUITFTE 19 3414 11207.99 8380.16 9899.00 10430.98 4869.60 0.0
## INEXPFTE 20 3414 4612.49 4876.88 3701.00 4031.71 2258.00 0.0
## PFTFAC 21 897 0.35 0.28 0.26 0.31 0.22 0.0
## PCTPELL 22 3375 0.65 0.20 0.67 0.66 0.19 0.0
## C150_4 23 539 0.37 0.20 0.34 0.35 0.17 0.0
## PFTFTUG1_EF 24 746 0.52 0.31 0.53 0.52 0.40 0.0
## RET_FT4 25 397 0.54 0.28 0.52 0.55 0.26 0.0
## PCTFLOAN 26 3375 0.62 0.25 0.67 0.65 0.21 0.0
## max range skew kurtosis se
## CONTROL 3.00 0.00 NaN NaN 0.00
## CCBASIC* 35.00 34.00 1.25 0.01 0.19
## ADM_RATE 1.00 0.90 -1.48 2.41 0.01
## SAT_AVG 1211.00 356.00 0.48 -1.44 52.90
## UGDS 151558.00 151558.00 34.90 1540.42 54.67
## UGDS_WHITE 1.00 1.00 0.16 -1.14 0.00
## UGDS_BLACK 1.00 1.00 1.23 0.74 0.00
## UGDS_HISP 1.00 1.00 1.80 2.72 0.00
## UGDS_ASIAN 0.97 0.97 6.86 56.74 0.00
## UGDS_AIAN 0.40 0.40 7.45 82.40 0.00
## UGDS_NHPI 0.67 0.67 18.44 443.00 0.00
## UGDS_2MOR 0.44 0.44 3.21 20.21 0.00
## UGDS_NRA 0.87 0.87 13.05 209.30 0.00
## UGDS_UNKN 0.90 0.90 3.57 14.47 0.00
## PPTUG_EF 1.00 1.00 1.36 1.38 0.00
## NPT4_PUB -Inf -Inf NA NA NA
## NPT4_PRIV 89406.00 89987.00 1.06 7.33 121.53
## COSTT4_A 79212.00 71052.00 0.71 7.16 189.47
## TUITFTE 248996.00 248996.00 10.05 229.30 143.42
## INEXPFTE 199372.00 199372.00 20.23 753.35 83.47
## PFTFAC 1.00 1.00 1.03 0.09 0.01
## PCTPELL 1.00 1.00 -0.69 0.39 0.00
## C150_4 1.00 1.00 0.75 0.75 0.01
## PFTFTUG1_EF 1.00 1.00 -0.12 -1.27 0.01
## RET_FT4 1.00 1.00 -0.11 -0.49 0.01
## PCTFLOAN 1.00 1.00 -1.02 0.52 0.00
TUITFTE and INEXPFTE are more skewed in private schools than in public schools: the TUITFTE skew is 8.62 for public, 37.87 for non-profit and 10.05 for for-profit schools, and the INEXPFTE skew is 16.72, 25.19 and 20.23 respectively. The average net tuition revenue per full-time equivalent student (TUITFTE) and the average instructional expenditures per full-time equivalent student (INEXPFTE) are both higher in non-profit private schools (15178.31 and 11332.52) than in for-profit private schools (11207.99 and 4612.49).
Within each group, the variables are generally closer to a normal distribution than in the ungrouped data. Average UGDS decreases from public (6032.04) to non-profit (1694.12) to for-profit schools (486.73). Average ADM_RATE is similar for public (0.69) and non-profit schools (0.65).
###Question 6
-Finding scatterplot matrix for the variables C150_4, SAT_AVG, UGDS_WHITE, PCTFLOAN for public schools.
pairs(~ C150_4 + SAT_AVG + UGDS_WHITE + PCTFLOAN, data = projectdata_public, row1attop=FALSE)
The variables C150_4 and SAT_AVG display a positive linear relation, while SAT_AVG and PCTFLOAN display a negative linear relation. These results are similar to the ungrouped scatterplot. The variables SAT_AVG and UGDS_WHITE display a non-linear relation.
-Finding scatterplot matrix for the variables C150_4, SAT_AVG, UGDS_WHITE, PCTFLOAN for non-profit private schools.
pairs(~ C150_4 + SAT_AVG + UGDS_WHITE + PCTFLOAN, data = projectdata_private_np, row1attop=FALSE)
The variables C150_4 and SAT_AVG display a positive linear relation, while SAT_AVG and PCTFLOAN display a negative linear relation, which seems to be stronger than for public schools. These results are similar to the ungrouped scatterplot. The variables SAT_AVG and UGDS_WHITE display a non-linear relation, but it does not seem to be as strong as for public schools.
-Finding scatterplot matrix for the variables C150_4, SAT_AVG, UGDS_WHITE, PCTFLOAN for for-profit private schools.
pairs(~ C150_4 + SAT_AVG + UGDS_WHITE + PCTFLOAN, data = projectdata_private_fp, row1attop=FALSE)
No clear relation between these variables is visible for for-profit private schools. SAT_AVG has only 6 non-missing values in this group, so the panels involving it contain very few points.
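A visual impression from a scatterplot matrix can be backed up with a formal test of the correlation. The sketch below uses base R's `cor.test()` on made-up data; the `toy` values are assumptions for illustration and only the column names mirror the project variables.

```r
# Toy data (hypothetical values) standing in for one group of schools.
set.seed(3)
toy <- data.frame(SAT_AVG = rnorm(40, mean = 1050, sd = 100))
toy$C150_4 <- 0.5 + 0.002 * (toy$SAT_AVG - 1050) + rnorm(40, sd = 0.05)

# Pearson correlation with a test of H0: the true correlation is zero.
# cor.test() uses only the pairs where both values are present.
res <- cor.test(toy$SAT_AVG, toy$C150_4, method = "pearson")
print(res$estimate)   # sample correlation
print(res$p.value)    # p-value of the test
print(res$conf.int)   # 95% confidence interval for the correlation
```

With very few observations in a group, the confidence interval becomes wide, so a missing visual pattern may simply reflect a lack of data.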
###Question 7
-Covariance matrix for public schools:
projectdata_p_matrix_quant<-data.matrix(projectdata_public[7:30], rownames.force = NA)
covar_matrix_p<-cov(projectdata_p_matrix_quant, y = projectdata_p_matrix_quant, use = "pairwise.complete.obs", method = "pearson")
print(covar_matrix_p)
## ADM_RATE SAT_AVG UGDS UGDS_WHITE
## ADM_RATE 0.0336011727 -3.8686471 -212.5850 0.012402598
## SAT_AVG -3.8686471337 12463.5085234 545068.3450 7.768439572
## UGDS -212.5849738248 545068.3450102 59986256.4915 -402.643176991
## UGDS_WHITE 0.0124025981 7.7684396 -402.6432 0.065545908
## UGDS_BLACK -0.0076538385 -11.2890796 -70.2881 -0.020816997
## UGDS_HISP -0.0017084621 -0.0575393 305.1411 -0.027691548
## UGDS_ASIAN -0.0033843472 2.9721245 164.0611 -0.005566508
## UGDS_AIAN 0.0007446680 -0.5342317 -107.4524 -0.006626126
## UGDS_NHPI 0.0006792911 -0.0324790 -12.1503 -0.001916665
## UGDS_2MOR 0.0002595622 0.3434692 30.8256 -0.000936299
## UGDS_NRA -0.0004671890 1.2601887 76.8146 -0.000905470
## UGDS_UNKN -0.0008725182 -0.4301800 15.6980 -0.001085878
## PPTUG_EF 0.0000954369 -4.2294166 89.4676 -0.010066535
## NPT4_PUB 22.2879087113 154640.1284935 5346029.3930 260.162736663
## NPT4_PRIV NA NA NA NA
## COSTT4_A -103.2138961521 226194.8570397 11142000.7720 169.917311164
## TUITFTE -10.1815344619 227986.7303426 6336576.3369 187.983704810
## INEXPFTE -151.0771627139 241038.7476521 -1806406.3802 58.037873702
## PFTFAC 0.0016301072 2.4444223 -43.6577 0.005844623
## PCTPELL -0.0005362106 -9.4702374 -284.4552 -0.006598399
## C150_4 -0.0052418089 14.2573628 757.4312 0.011827856
## PFTFTUG1_EF 0.0011840643 5.2694309 181.4096 0.004703051
## RET_FT4 -0.0043001302 7.8400274 491.8125 0.000410025
## PCTFLOAN 0.0015668700 -6.5138583 91.0523 0.015825849
## UGDS_BLACK UGDS_HISP UGDS_ASIAN UGDS_AIAN
## ADM_RATE -0.007653838 -0.0017084621 -0.0033843472 0.0007446680
## SAT_AVG -11.289079618 -0.0575393001 2.9721244502 -0.5342317250
## UGDS -70.288139721 305.1411086094 164.0611333528 -107.4523552103
## UGDS_WHITE -0.020816997 -0.0276915476 -0.0055665082 -0.0066261256
## UGDS_BLACK 0.030931962 -0.0050938181 -0.0010045128 -0.0024622749
## UGDS_HISP -0.005093818 0.0326939649 0.0022412020 -0.0018724187
## UGDS_ASIAN -0.001004513 0.0022412020 0.0033785450 -0.0006222649
## UGDS_AIAN -0.002462275 -0.0018724187 -0.0006222649 0.0122932759
## UGDS_NHPI -0.000483496 -0.0002986142 0.0003475660 -0.0000765116
## UGDS_2MOR -0.000572416 -0.0000759129 0.0006569181 -0.0001897741
## UGDS_NRA -0.000302662 0.0001092018 0.0004981175 -0.0002098941
## UGDS_UNKN -0.000196143 -0.0000123227 0.0000708987 -0.0002340234
## PPTUG_EF -0.000344859 0.0082208271 0.0009672637 0.0006569355
## NPT4_PUB -24.657390012 -204.3260247216 7.9565975666 -50.3274507922
## NPT4_PRIV NA NA NA NA
## COSTT4_A -36.206960690 -169.4033123712 52.4599092872 -53.2376832673
## TUITFTE -60.699057245 -135.4007046202 26.4656643767 -43.3675931753
## INEXPFTE -43.923711823 -80.8312285453 50.2257485056 17.2605623571
## PFTFAC -0.000770282 -0.0066239972 -0.0012416101 0.0013896573
## PCTPELL 0.010390380 -0.0013850462 -0.0018940078 0.0011892883
## C150_4 -0.010463509 -0.0020199748 0.0037532020 -0.0035351973
## PFTFTUG1_EF -0.001083308 -0.0032139455 -0.0011638976 0.0006136038
## RET_FT4 -0.005770553 0.0025524841 0.0030285946 -0.0011300382
## PCTFLOAN 0.005410460 -0.0145011063 -0.0014205867 -0.0053096948
## UGDS_NHPI UGDS_2MOR UGDS_NRA UGDS_UNKN
## ADM_RATE 0.0006792911 0.00025956225 -0.0004671890 -0.0008725182
## SAT_AVG -0.0324789636 0.34346922669 1.2601886848 -0.4301800045
## UGDS -12.1503253439 30.82564738595 76.8146345980 15.6979901242
## UGDS_WHITE -0.0019166651 -0.00093629865 -0.0009054697 -0.0010858781
## UGDS_BLACK -0.0004834959 -0.00057241603 -0.0003026625 -0.0001961427
## UGDS_HISP -0.0002986142 -0.00007591289 0.0001092018 -0.0000123227
## UGDS_ASIAN 0.0003475660 0.00065691814 0.0004981175 0.0000708987
## UGDS_AIAN -0.0000765116 -0.00018977411 -0.0002098941 -0.0002340234
## UGDS_NHPI 0.0023745684 0.00008111316 0.0000247540 -0.0000526212
## UGDS_2MOR 0.0000811132 0.00090088652 0.0001255937 0.0000099156
## UGDS_NRA 0.0000247540 0.00012559370 0.0006214688 0.0000389606
## UGDS_UNKN -0.0000526212 0.00000991560 0.0000389606 0.0014612872
## PPTUG_EF 0.0000975370 0.00025801446 -0.0011201983 0.0013311214
## NPT4_PUB -17.9221260879 5.15707904691 21.8246296769 2.1374055035
## NPT4_PRIV NA NA NA NA
## COSTT4_A -23.5268873687 13.39656493215 43.9638837981 2.6420805828
## TUITFTE -6.3587580810 2.77551642169 27.9439680321 0.6592410772
## INEXPFTE -5.0313468968 -3.97746566296 16.2428071613 -8.0177549656
## PFTFAC 0.0002952458 0.00006410420 0.0012561471 -0.0002140569
## PCTPELL 0.0009818011 -0.00093171297 -0.0007846350 -0.0009678444
## C150_4 -0.0007818610 0.00023910492 0.0016349433 -0.0006543867
## PFTFTUG1_EF 0.0007842150 -0.00051591510 0.0009330706 -0.0010569228
## RET_FT4 0.0002091550 0.00006617733 0.0009719275 -0.0003378427
## PCTFLOAN -0.0009334880 0.00000454354 0.0007577333 0.0001666833
## PPTUG_EF NPT4_PUB NPT4_PRIV COSTT4_A
## ADM_RATE 0.0000954369 22.28791 NA -103.21390
## SAT_AVG -4.2294166268 154640.12849 NA 226194.85704
## UGDS 89.4675752305 5346029.39297 NA 11142000.77200
## UGDS_WHITE -0.0100665353 260.16274 NA 169.91731
## UGDS_BLACK -0.0003448588 -24.65739 NA -36.20696
## UGDS_HISP 0.0082208271 -204.32602 NA -169.40331
## UGDS_ASIAN 0.0009672637 7.95660 NA 52.45991
## UGDS_AIAN 0.0006569355 -50.32745 NA -53.23768
## UGDS_NHPI 0.0000975370 -17.92213 NA -23.52689
## UGDS_2MOR 0.0002580145 5.15708 NA 13.39656
## UGDS_NRA -0.0011201983 21.82463 NA 43.96388
## UGDS_UNKN 0.0013311214 2.13741 NA 2.64208
## PPTUG_EF 0.0559198372 -494.74628 NA -662.95187
## NPT4_PUB -494.7462837076 21805832.12310 NA 20420278.27211
## NPT4_PRIV NA NA NA NA
## COSTT4_A -662.9518662241 20420278.27211 NA 24568968.58818
## TUITFTE -399.4380944136 11215184.60953 NA 13486786.88130
## INEXPFTE -464.0903082576 6239926.76979 NA 8950068.16662
## PFTFAC -0.0255763136 292.43019 NA 368.13577
## PCTPELL -0.0075163227 -82.78810 NA -153.10940
## C150_4 -0.0177376815 450.89530 NA 595.65463
## PFTFTUG1_EF -0.0328960439 359.34794 NA 444.70495
## RET_FT4 -0.0078700556 161.13032 NA 255.23165
## PCTFLOAN -0.0271153765 729.64347 NA 717.23173
## TUITFTE INEXPFTE PFTFAC PCTPELL
## ADM_RATE -10.181534 -151.07716 0.0016301072 -0.0005362106
## SAT_AVG 227986.730343 241038.74765 2.4444222799 -9.4702374145
## UGDS 6336576.336850 -1806406.38016 -43.6577038172 -284.4552191197
## UGDS_WHITE 187.983705 58.03787 0.0058446230 -0.0065983987
## UGDS_BLACK -60.699057 -43.92371 -0.0007702821 0.0103903800
## UGDS_HISP -135.400705 -80.83123 -0.0066239972 -0.0013850462
## UGDS_ASIAN 26.465664 50.22575 -0.0012416101 -0.0018940078
## UGDS_AIAN -43.367593 17.26056 0.0013896573 0.0011892883
## UGDS_NHPI -6.358758 -5.03135 0.0002952458 0.0009818011
## UGDS_2MOR 2.775516 -3.97747 0.0000641042 -0.0009317130
## UGDS_NRA 27.943968 16.24281 0.0012561471 -0.0007846350
## UGDS_UNKN 0.659241 -8.01775 -0.0002140569 -0.0009678444
## PPTUG_EF -399.438094 -464.09031 -0.0255763136 -0.0075163227
## NPT4_PUB 11215184.609534 6239926.76979 292.4301896586 -82.7881019627
## NPT4_PRIV NA NA NA NA
## COSTT4_A 13486786.881299 8950068.16662 368.1357704102 -153.1094027403
## TUITFTE 29116322.530603 25045435.39216 297.7349183726 -98.9527060368
## INEXPFTE 25045435.392156 134064195.62761 370.6366835022 -169.9340449429
## PFTFAC 297.734918 370.63668 0.0779592446 -0.0000969015
## PCTPELL -98.952706 -169.93404 -0.0000969015 0.0288607627
## C150_4 422.699344 375.04814 0.0058391470 -0.0138992187
## PFTFTUG1_EF 323.381000 211.82217 0.0164332154 0.0028016766
## RET_FT4 197.187210 208.24043 0.0022860569 -0.0066317863
## PCTFLOAN 481.686849 385.05094 0.0196303662 0.0105145812
## C150_4 PFTFTUG1_EF RET_FT4 PCTFLOAN
## ADM_RATE -0.005241809 0.001184064 -0.0043001302 0.00156687003
## SAT_AVG 14.257362770 5.269430885 7.8400274284 -6.51385826338
## UGDS 757.431230038 181.409646380 491.8124856690 91.05230287144
## UGDS_WHITE 0.011827856 0.004703051 0.0004100246 0.01582584942
## UGDS_BLACK -0.010463509 -0.001083308 -0.0057705531 0.00541045978
## UGDS_HISP -0.002019975 -0.003213946 0.0025524841 -0.01450110632
## UGDS_ASIAN 0.003753202 -0.001163898 0.0030285946 -0.00142058665
## UGDS_AIAN -0.003535197 0.000613604 -0.0011300382 -0.00530969480
## UGDS_NHPI -0.000781861 0.000784215 0.0002091550 -0.00093348797
## UGDS_2MOR 0.000239105 -0.000515915 0.0000661773 0.00000454354
## UGDS_NRA 0.001634943 0.000933071 0.0009719275 0.00075773329
## UGDS_UNKN -0.000654387 -0.001056923 -0.0003378427 0.00016668327
## PPTUG_EF -0.017737682 -0.032896044 -0.0078700556 -0.02711537653
## NPT4_PUB 450.895296538 359.347941380 161.1303216488 729.64347456410
## NPT4_PRIV NA NA NA NA
## COSTT4_A 595.654629266 444.704951963 255.2316529109 717.23172941724
## TUITFTE 422.699344143 323.381000004 197.1872102852 481.68684943052
## INEXPFTE 375.048144975 211.822174023 208.2404316644 385.05094120121
## PFTFAC 0.005839147 0.016433215 0.0022860569 0.01963036622
## PCTPELL -0.013899219 0.002801677 -0.0066317863 0.01051458120
## C150_4 0.032312755 0.015134162 0.0164080616 0.00415976947
## PFTFTUG1_EF 0.015134162 0.040962807 0.0077027546 0.01932123718
## RET_FT4 0.016408062 0.007702755 0.0136728973 -0.00279392609
## PCTFLOAN 0.004159769 0.019321237 -0.0027939261 0.06593407537
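The NA row and column for NPT4_PRIV arise because that variable has no observed values for public schools. One possible way to keep such columns out of the matrix is sketched here on made-up data; `toy`, `has_data` and `toy_clean` are hypothetical names, and this step was not part of the analysis reported in this section.

```r
# Toy matrix (hypothetical values); NPT4_PRIV is entirely missing,
# as it is for public schools.
toy <- cbind(
  NPT4_PUB  = c(9000, 8000, 12000, 7000),
  NPT4_PRIV = c(NA, NA, NA, NA),
  COSTT4_A  = c(15000, 13000, 20000, NA)
)

# Keep only the columns with at least one observed value
has_data <- colSums(!is.na(toy)) > 0
toy_clean <- toy[, has_data, drop = FALSE]

cov_clean <- cov(toy_clean, use = "pairwise.complete.obs", method = "pearson")
print(cov_clean)
```

The same filter can be applied before calling `cor()`, so that the correlation plot does not contain an empty row and column.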
-Correlation matrix for public schools:
correlation_matrix_p<-cor(projectdata_p_matrix_quant, y = projectdata_p_matrix_quant, use = "pairwise.complete.obs", method = "pearson")
print(correlation_matrix_p)
## ADM_RATE SAT_AVG UGDS UGDS_WHITE UGDS_BLACK
## ADM_RATE 1.00000000 -0.19654913 -0.1289392 0.2609101 -0.20674287
## SAT_AVG -0.19654913 1.00000000 0.5370659 0.2907833 -0.49420723
## UGDS -0.12893924 0.53706595 1.0000000 -0.2030587 -0.05160032
## UGDS_WHITE 0.26091012 0.29078330 -0.2030587 1.0000000 -0.46231881
## UGDS_BLACK -0.20674287 -0.49420723 -0.0516003 -0.4623188 1.00000000
## UGDS_HISP -0.04954429 -0.00358081 0.2178919 -0.5981921 -0.16017912
## UGDS_ASIAN -0.28345346 0.39289918 0.3644307 -0.3740634 -0.09826229
## UGDS_AIAN 0.07237888 -0.09819105 -0.1251285 -0.2334281 -0.12626966
## UGDS_NHPI 0.06639876 -0.04654875 -0.0321936 -0.1536318 -0.05641527
## UGDS_2MOR 0.05115751 0.12441009 0.1326024 -0.1218447 -0.10843588
## UGDS_NRA -0.08244967 0.38371121 0.3978400 -0.1418703 -0.06903116
## UGDS_UNKN -0.12634535 -0.11152722 0.0530213 -0.1109534 -0.02917431
## PPTUG_EF 0.00367178 -0.32617487 0.0488491 -0.1662740 -0.00829191
## NPT4_PUB 0.02782553 0.34169792 0.1468039 0.2173960 -0.02981708
## NPT4_PRIV NA NA NA NA NA
## COSTT4_A -0.12742621 0.48318171 0.2802674 0.1358600 -0.04083219
## TUITFTE -0.01200375 0.58939726 0.1944799 0.1745755 -0.08203029
## INEXPFTE -0.20485437 0.54944882 -0.0217912 0.0211846 -0.02333119
## PFTFAC 0.04953418 0.12249666 -0.0194689 0.0833372 -0.01564803
## PCTPELL -0.01777064 -0.65483119 -0.2161195 -0.1516916 0.34760307
## C150_4 -0.18390170 0.78405891 0.4630921 0.2560546 -0.29273837
## PFTFTUG1_EF 0.03725845 0.27929752 0.1116179 0.0924172 -0.03007132
## RET_FT4 -0.25587034 0.73079261 0.4765554 0.0134587 -0.24170436
## PCTFLOAN 0.04480260 -0.39718439 0.0457688 0.2407071 0.11975269
## UGDS_HISP UGDS_ASIAN UGDS_AIAN UGDS_NHPI UGDS_2MOR
## ADM_RATE -0.04954429 -0.2834535 0.0723789 0.06639876 0.051157507
## SAT_AVG -0.00358081 0.3928992 -0.0981911 -0.04654875 0.124410094
## UGDS 0.21789188 0.3644307 -0.1251285 -0.03219361 0.132602369
## UGDS_WHITE -0.59819206 -0.3740634 -0.2334281 -0.15363179 -0.121844678
## UGDS_BLACK -0.16017912 -0.0982623 -0.1262697 -0.05641527 -0.108435880
## UGDS_HISP 1.00000000 0.2132467 -0.0933975 -0.03389099 -0.013987712
## UGDS_ASIAN 0.21324672 1.0000000 -0.0965554 0.12271002 0.376540208
## UGDS_AIAN -0.09339752 -0.0965554 1.0000000 -0.01416123 -0.057025369
## UGDS_NHPI -0.03389099 0.1227100 -0.0141612 1.00000000 0.055457965
## UGDS_2MOR -0.01398771 0.3765402 -0.0570254 0.05545796 1.000000000
## UGDS_NRA 0.02422625 0.3437615 -0.0759376 0.02037714 0.167850694
## UGDS_UNKN -0.00178281 0.0319085 -0.0552151 -0.02824886 0.008642035
## PPTUG_EF 0.19226435 0.0703715 0.0250557 0.00846436 0.036351811
## NPT4_PUB -0.24145384 0.0294243 -0.0978197 -0.07758385 0.036636685
## NPT4_PRIV NA NA NA NA NA
## COSTT4_A -0.18504184 0.1724786 -0.0939407 -0.08858280 0.089495452
## TUITFTE -0.17799642 0.1082204 -0.0929608 -0.03101322 0.021985286
## INEXPFTE -0.04176526 0.0807232 0.0145424 -0.00964505 -0.012383414
## PFTFAC -0.12819576 -0.0712472 0.0434149 0.01968683 0.007593058
## PCTPELL -0.04507280 -0.1917202 0.0631076 0.11853811 -0.182696543
## C150_4 -0.06191522 0.3105361 -0.1877793 -0.09553903 0.047267729
## PFTFTUG1_EF -0.08622650 -0.0934539 0.0257779 0.07182541 -0.084243762
## RET_FT4 0.11813979 0.3793869 -0.1030183 0.03812590 0.021151584
## PCTFLOAN -0.31221251 -0.0951377 -0.1864073 -0.07456623 0.000589443
## UGDS_NRA UGDS_UNKN PPTUG_EF NPT4_PUB NPT4_PRIV
## ADM_RATE -0.0824497 -0.12634535 0.00367178 0.0278255 NA
## SAT_AVG 0.3837112 -0.11152722 -0.32617487 0.3416979 NA
## UGDS 0.3978400 0.05302130 0.04884913 0.1468039 NA
## UGDS_WHITE -0.1418703 -0.11095337 -0.16627397 0.2173960 NA
## UGDS_BLACK -0.0690312 -0.02917431 -0.00829191 -0.0298171 NA
## UGDS_HISP 0.0242263 -0.00178281 0.19226435 -0.2414538 NA
## UGDS_ASIAN 0.3437615 0.03190845 0.07037155 0.0294243 NA
## UGDS_AIAN -0.0759376 -0.05521507 0.02505567 -0.0978197 NA
## UGDS_NHPI 0.0203771 -0.02824886 0.00846436 -0.0775839 NA
## UGDS_2MOR 0.1678507 0.00864203 0.03635181 0.0366367 NA
## UGDS_NRA 1.0000000 0.04088352 -0.19002131 0.1859134 NA
## UGDS_UNKN 0.0408835 1.00000000 0.14725394 0.0119817 NA
## PPTUG_EF -0.1900213 0.14725394 1.00000000 -0.4557973 NA
## NPT4_PUB 0.1859134 0.01198165 -0.45579731 1.0000000 NA
## NPT4_PRIV NA NA NA NA NA
## COSTT4_A 0.3355728 0.01378607 -0.60014713 0.9227081 NA
## TUITFTE 0.2664402 0.00409934 -0.40157978 0.5806977 NA
## INEXPFTE 0.0608721 -0.01959601 -0.18338783 0.2999074 NA
## PFTFAC 0.1696686 -0.02048827 -0.41317250 0.2348420 NA
## PCTPELL -0.1851993 -0.14898227 -0.18706290 -0.1060222 NA
## C150_4 0.2871060 -0.11057601 -0.59422589 0.5679805 NA
## PFTFTUG1_EF 0.1738276 -0.13971730 -0.73767993 0.4000409 NA
## RET_FT4 0.2688806 -0.08874030 -0.48068026 0.3238603 NA
## PCTFLOAN 0.1183278 0.01697540 -0.44647466 0.6133979 NA
## COSTT4_A TUITFTE INEXPFTE PFTFAC PCTPELL
## ADM_RATE -0.1274262 -0.01200375 -0.20485437 0.04953418 -0.01777064
## SAT_AVG 0.4831817 0.58939726 0.54944882 0.12249666 -0.65483119
## UGDS 0.2802674 0.19447995 -0.02179121 -0.01946888 -0.21611950
## UGDS_WHITE 0.1358600 0.17457550 0.02118460 0.08333716 -0.15169159
## UGDS_BLACK -0.0408322 -0.08203029 -0.02333119 -0.01564803 0.34760307
## UGDS_HISP -0.1850418 -0.17799642 -0.04176526 -0.12819576 -0.04507280
## UGDS_ASIAN 0.1724786 0.10822037 0.08072318 -0.07124724 -0.19172018
## UGDS_AIAN -0.0939407 -0.09296081 0.01454237 0.04341494 0.06310760
## UGDS_NHPI -0.0885828 -0.03101322 -0.00964505 0.01968683 0.11853811
## UGDS_2MOR 0.0894955 0.02198529 -0.01238341 0.00759306 -0.18269654
## UGDS_NRA 0.3355728 0.26644025 0.06087213 0.16966860 -0.18519928
## UGDS_UNKN 0.0137861 0.00409934 -0.01959601 -0.02048827 -0.14898227
## PPTUG_EF -0.6001471 -0.40157978 -0.18338783 -0.41317250 -0.18706290
## NPT4_PUB 0.9227081 0.58069767 0.29990739 0.23484204 -0.10602216
## NPT4_PRIV NA NA NA NA NA
## COSTT4_A 1.0000000 0.78742176 0.49775812 0.26572184 -0.20918041
## TUITFTE 0.7874218 1.00000000 0.40087034 0.23098977 -0.13850326
## INEXPFTE 0.4977581 0.40087034 1.00000000 0.11164839 -0.09348854
## PFTFAC 0.2657218 0.23098977 0.11164839 1.00000000 -0.00240791
## PCTPELL -0.2091804 -0.13850326 -0.09348854 -0.00240791 1.00000000
## C150_4 0.6872294 0.65096780 0.51615425 0.15561680 -0.54165168
## PFTFTUG1_EF 0.4443971 0.47193010 0.30866292 0.29140123 0.09639879
## RET_FT4 0.4783138 0.47833487 0.46812526 0.09968770 -0.39660397
## PCTFLOAN 0.6252842 0.44606284 0.14015064 0.30634055 0.24103667
## C150_4 PFTFTUG1_EF RET_FT4 PCTFLOAN
## ADM_RATE -0.1839017 0.0372585 -0.2558703 0.044802604
## SAT_AVG 0.7840589 0.2792975 0.7307926 -0.397184388
## UGDS 0.4630921 0.1116179 0.4765554 0.045768841
## UGDS_WHITE 0.2560546 0.0924172 0.0134587 0.240707119
## UGDS_BLACK -0.2927384 -0.0300713 -0.2417044 0.119752693
## UGDS_HISP -0.0619152 -0.0862265 0.1181398 -0.312212514
## UGDS_ASIAN 0.3105361 -0.0934539 0.3793869 -0.095137722
## UGDS_AIAN -0.1877793 0.0257779 -0.1030183 -0.186407327
## UGDS_NHPI -0.0955390 0.0718254 0.0381259 -0.074566228
## UGDS_2MOR 0.0472677 -0.0842438 0.0211516 0.000589443
## UGDS_NRA 0.2871060 0.1738276 0.2688806 0.118327842
## UGDS_UNKN -0.1105760 -0.1397173 -0.0887403 0.016975399
## PPTUG_EF -0.5942259 -0.7376799 -0.4806803 -0.446474665
## NPT4_PUB 0.5679805 0.4000409 0.3238603 0.613397901
## NPT4_PRIV NA NA NA NA
## COSTT4_A 0.6872294 0.4443971 0.4783138 0.625284169
## TUITFTE 0.6509678 0.4719301 0.4783349 0.446062843
## INEXPFTE 0.5161543 0.3086629 0.4681253 0.140150643
## PFTFAC 0.1556168 0.2914012 0.0996877 0.306340546
## PCTPELL -0.5416517 0.0963988 -0.3966040 0.241036669
## C150_4 1.0000000 0.4631214 0.7911738 0.123278190
## PFTFTUG1_EF 0.4631214 1.0000000 0.3729936 0.419293438
## RET_FT4 0.7911738 0.3729936 1.0000000 -0.139265961
## PCTFLOAN 0.1232782 0.4192934 -0.1392660 1.000000000
library(corrplot)
corrplot(correlation_matrix_p, method="circle")
The UGDS_BLACK and UGDS_HISP variables are negatively related to UGDS_WHITE. PCTPELL is negatively related to SAT_AVG (r = -0.65). PFTFTUG1_EF (share of undergraduate students who are first-time, full-time, degree-seeking undergraduates) is negatively related to PPTUG_EF (share of undergraduate degree/certificate-seeking students who are part-time) (r = -0.74), which is expected because the part-time and full-time shares are complementary. SAT_AVG is positively related to RET_FT4 (first-time, full-time student retention rate at four-year institutions) (r = 0.73), which may reflect a more informed choice of institution by students with higher SAT scores. The financial variables (tuition, loans, etc.) are generally positively related, though with less strength. The relation of the ethnic distributions to the financial variables also seems to be weaker. However, PCTPELL (percentage of undergraduates who receive a Pell Grant) is related in opposite directions to UGDS_WHITE (negative, r = -0.15) and to UGDS_BLACK (positive, r = 0.35).
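The pairs discussed above can also be read off the matrix directly instead of from the plot. The following is a minimal sketch, assuming a square correlation matrix such as correlation_matrix_p; the helper name top_correlations and the toy data are hypothetical and only illustrate the idea.

```r
# Hypothetical helper: list the variable pairs with the largest absolute
# correlation in a square correlation matrix.
top_correlations <- function(cor_mat, n = 5) {
  # Look only at the upper triangle so each pair appears once.
  idx <- which(upper.tri(cor_mat), arr.ind = TRUE)
  pairs <- data.frame(
    var1 = rownames(cor_mat)[idx[, "row"]],
    var2 = colnames(cor_mat)[idx[, "col"]],
    r    = cor_mat[idx],
    stringsAsFactors = FALSE
  )
  # Drop pairs that could not be computed (e.g. an all-NA column).
  pairs <- pairs[!is.na(pairs$r), ]
  # Sort by strength, ignoring sign.
  pairs <- pairs[order(abs(pairs$r), decreasing = TRUE), ]
  rownames(pairs) <- NULL
  head(pairs, n)
}

# Toy data standing in for the project data (made-up values).
set.seed(1)
toy <- data.frame(a = rnorm(50))
toy$b <- -toy$a + rnorm(50, sd = 0.1)   # strongly negative with a
toy$c <- rnorm(50)                      # unrelated noise
toy$d <- NA_real_                       # all-missing column
toy_cor <- suppressWarnings(cor(toy, use = "pairwise.complete.obs"))
res <- top_correlations(toy_cor, n = 3)
print(res)
```

With the real data, a call such as top_correlations(correlation_matrix_p, n = 10) would give a ranked list to check against the visual impression from corrplot.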
-Covariance matrix for private non-profit schools:
projectdata_np_matrix_quant<-data.matrix(projectdata_private_np[7:30], rownames.force = NA)
covar_matrix_np<-cov(projectdata_np_matrix_quant, y = projectdata_np_matrix_quant, use = "pairwise.complete.obs", method = "pearson")
print(covar_matrix_np)
## ADM_RATE SAT_AVG UGDS UGDS_WHITE
## ADM_RATE 0.0453689515 -11.908729 -110.76148 0.012746322
## SAT_AVG -11.9087289581 20838.907239 155728.63317 5.106529069
## UGDS -110.7614787900 155728.633174 9331218.64065 -20.594208659
## UGDS_WHITE 0.0127463220 5.106529 -20.59421 0.077553693
## UGDS_BLACK -0.0030065986 -11.607361 -40.18257 -0.029650378
## UGDS_HISP -0.0023196088 -0.126625 11.42529 -0.032652941
## UGDS_ASIAN -0.0031350879 4.317976 21.03326 -0.004098247
## UGDS_AIAN -0.0000871576 -0.193983 -9.41054 -0.002588628
## UGDS_NHPI -0.0000507946 -0.100983 -1.79909 -0.000569806
## UGDS_2MOR -0.0009704476 1.023550 9.21648 -0.000139274
## UGDS_NRA -0.0029370425 2.752199 15.93747 -0.003803603
## UGDS_UNKN -0.0002404558 -1.171670 19.54116 -0.002297023
## PPTUG_EF 0.0015815362 -6.234855 -3.96359 -0.006997340
## NPT4_PUB NA NA NA NA
## NPT4_PRIV -178.8793297321 320389.904046 5167513.08795 258.692059381
## COSTT4_A -800.1455263117 1113884.973655 10438624.22181 588.729348241
## TUITFTE -404.4904502340 517356.750215 3227969.28929 -97.161577467
## INEXPFTE -853.1034785319 1032819.728866 3532012.28581 48.659956834
## PFTFAC -0.0033877668 7.623504 23.94198 0.007097292
## PCTPELL 0.0053266341 -16.304788 -122.88680 -0.023807666
## C150_4 -0.0132518603 20.460465 131.29893 0.009853973
## PFTFTUG1_EF -0.0077760267 12.719346 73.50802 0.004431194
## RET_FT4 -0.0056044842 13.333832 95.72951 0.007280116
## PCTFLOAN 0.0029752079 -16.271363 -16.27088 0.001903675
## UGDS_BLACK UGDS_HISP UGDS_ASIAN
## ADM_RATE -0.003006599 -0.0023196088 -0.00313508788
## SAT_AVG -11.607360984 -0.1266249693 4.31797630260
## UGDS -40.182569660 11.4252946328 21.03326428748
## UGDS_WHITE -0.029650378 -0.0326529408 -0.00409824677
## UGDS_BLACK 0.041404575 -0.0055318546 -0.00178374345
## UGDS_HISP -0.005531855 0.0429432174 -0.00016361327
## UGDS_ASIAN -0.001783743 -0.0001636133 0.00463245988
## UGDS_AIAN -0.000708290 -0.0006828854 -0.00023905014
## UGDS_NHPI -0.000192794 -0.0000422158 0.00011943681
## UGDS_2MOR -0.000595397 -0.0004574627 0.00035459357
## UGDS_NRA -0.001947165 -0.0012528201 0.00130168874
## UGDS_UNKN -0.000570368 -0.0017786575 -0.00000959184
## PPTUG_EF 0.006105967 0.0020299213 -0.00114918383
## NPT4_PUB NA NA NA
## NPT4_PRIV -197.306048083 -405.8424164275 112.52150417329
## COSTT4_A -633.316257688 -604.4282642539 266.37872495491
## TUITFTE 37.640371438 -203.0514036913 96.46181580454
## INEXPFTE -143.810851630 -202.8239868737 173.51245863820
## PFTFAC -0.001420142 -0.0032362485 -0.00049797771
## PCTPELL 0.017144172 0.0145329908 -0.00260347249
## C150_4 -0.012219382 -0.0046496476 0.00414471275
## PFTFTUG1_EF -0.005915548 -0.0006838199 0.00138854396
## RET_FT4 -0.009843601 -0.0013492707 0.00279455864
## PCTFLOAN 0.013921967 -0.0101884062 -0.00250230920
## UGDS_AIAN UGDS_NHPI UGDS_2MOR
## ADM_RATE -0.00008715762 -0.00005079462 -0.0009704476
## SAT_AVG -0.19398334370 -0.10098307600 1.0235497292
## UGDS -9.41053645858 -1.79909269061 9.2164786896
## UGDS_WHITE -0.00258862776 -0.00056980559 -0.0001392742
## UGDS_BLACK -0.00070829013 -0.00019279361 -0.0005953970
## UGDS_HISP -0.00068288545 -0.00004221583 -0.0004574627
## UGDS_ASIAN -0.00023905014 0.00011943681 0.0003545936
## UGDS_AIAN 0.00469859540 0.00000242073 -0.0000982416
## UGDS_NHPI 0.00000242073 0.00069664266 0.0000556151
## UGDS_2MOR -0.00009824162 0.00005561508 0.0007657043
## UGDS_NRA -0.00023532369 -0.00003241483 0.0001630202
## UGDS_UNKN -0.00011546092 -0.00002626496 0.0000217808
## PPTUG_EF 0.00032368053 0.00008230183 -0.0004914570
## NPT4_PUB NA NA NA
## NPT4_PRIV -65.63268523242 -3.86793811331 64.4358427011
## COSTT4_A -137.20053279940 -22.07545476995 134.2530488192
## TUITFTE -66.79438955305 -10.23563023858 81.6440330612
## INEXPFTE -15.72634727466 -7.20926909172 62.1188635466
## PFTFAC 0.00015439198 -0.00004781543 -0.0000838565
## PCTPELL 0.00185053150 0.00036780322 -0.0012181417
## C150_4 -0.00114052948 0.00029056971 0.0014005282
## PFTFTUG1_EF -0.00056061368 -0.00030392701 0.0008282901
## RET_FT4 -0.00114303563 -0.00027706113 0.0006279306
## PCTFLOAN -0.00165586128 -0.00033147641 0.0008719972
## UGDS_NRA UGDS_UNKN PPTUG_EF NPT4_PUB
## ADM_RATE -0.00293704245 -0.00024045579 0.0015815362 NA
## SAT_AVG 2.75219870289 -1.17167007263 -6.2348550578 NA
## UGDS 15.93747031261 19.54115695326 -3.9635885352 NA
## UGDS_WHITE -0.00380360334 -0.00229702258 -0.0069973397 NA
## UGDS_BLACK -0.00194716522 -0.00057036753 0.0061059666 NA
## UGDS_HISP -0.00125282009 -0.00177865750 0.0020299213 NA
## UGDS_ASIAN 0.00130168874 -0.00000959184 -0.0011491838 NA
## UGDS_AIAN -0.00023532369 -0.00011546092 0.0003236805 NA
## UGDS_NHPI -0.00003241483 -0.00002626496 0.0000823018 NA
## UGDS_2MOR 0.00016302017 0.00002178079 -0.0004914570 NA
## UGDS_NRA 0.00591900769 0.00000790681 -0.0019970661 NA
## UGDS_UNKN 0.00000790681 0.00490897074 0.0020931840 NA
## PPTUG_EF -0.00199706606 0.00209318398 0.0463292422 NA
## NPT4_PUB NA NA NA NA
## NPT4_PRIV 127.24092957824 111.26084492356 -235.0124229864 NA
## COSTT4_A 260.27267622177 150.10181839110 -919.5262339096 NA
## TUITFTE 87.86793405931 70.08076511929 511.3895290661 NA
## INEXPFTE 117.19028369426 -29.00857894024 74.8328648514 NA
## PFTFAC -0.00072988636 -0.00149102242 -0.0092037444 NA
## PCTPELL -0.00498958210 -0.00153071952 0.0039124762 NA
## C150_4 0.00289521278 -0.00079732767 -0.0120833723 NA
## PFTFTUG1_EF 0.00238261759 -0.00156715181 -0.0303882137 NA
## RET_FT4 0.00245667251 -0.00054661046 -0.0079115045 NA
## PCTFLOAN -0.00388669549 0.00248778217 0.0028908245 NA
## NPT4_PRIV COSTT4_A TUITFTE INEXPFTE
## ADM_RATE -178.87933 -800.1455 -404.4905 -853.10348
## SAT_AVG 320389.90405 1113884.9737 517356.7502 1032819.72887
## UGDS 5167513.08795 10438624.2218 3227969.2893 3532012.28581
## UGDS_WHITE 258.69206 588.7293 -97.1616 48.65996
## UGDS_BLACK -197.30605 -633.3163 37.6404 -143.81085
## UGDS_HISP -405.84242 -604.4283 -203.0514 -202.82399
## UGDS_ASIAN 112.52150 266.3787 96.4618 173.51246
## UGDS_AIAN -65.63269 -137.2005 -66.7944 -15.72635
## UGDS_NHPI -3.86794 -22.0755 -10.2356 -7.20927
## UGDS_2MOR 64.43584 134.2530 81.6440 62.11886
## UGDS_NRA 127.24093 260.2727 87.8679 117.19028
## UGDS_UNKN 111.26084 150.1018 70.0808 -29.00858
## PPTUG_EF -235.01242 -919.5262 511.3895 74.83286
## NPT4_PUB NA NA NA NA
## NPT4_PRIV 58731353.43213 79611798.0941 38021056.4114 19866656.71815
## COSTT4_A 79611798.09413 184451822.5193 71409621.9854 65152783.84114
## TUITFTE 38021056.41143 71409621.9854 953139208.7634 542929991.97454
## INEXPFTE 19866656.71815 65152783.8411 542929991.9745 413170211.25653
## PFTFAC -75.20620 316.5501 294.9318 589.58433
## PCTPELL -839.79281 -1882.0627 -896.8566 -897.30906
## C150_4 708.65096 1800.9177 731.5198 961.19691
## PFTFTUG1_EF 424.01687 1497.0406 519.0187 681.79408
## RET_FT4 344.58691 1051.2906 425.5470 563.24991
## PCTFLOAN 526.22334 420.6266 -243.4516 -688.55275
## PFTFAC PCTPELL C150_4 PFTFTUG1_EF
## ADM_RATE -0.0033877668 0.005326634 -0.013251860 -0.007776027
## SAT_AVG 7.6235039809 -16.304788297 20.460465273 12.719346411
## UGDS 23.9419796076 -122.886799153 131.298927028 73.508015485
## UGDS_WHITE 0.0070972925 -0.023807666 0.009853973 0.004431194
## UGDS_BLACK -0.0014201421 0.017144172 -0.012219382 -0.005915548
## UGDS_HISP -0.0032362485 0.014532991 -0.004649648 -0.000683820
## UGDS_ASIAN -0.0004979777 -0.002603472 0.004144713 0.001388544
## UGDS_AIAN 0.0001543920 0.001850532 -0.001140529 -0.000560614
## UGDS_NHPI -0.0000478154 0.000367803 0.000290570 -0.000303927
## UGDS_2MOR -0.0000838565 -0.001218142 0.001400528 0.000828290
## UGDS_NRA -0.0007298864 -0.004989582 0.002895213 0.002382618
## UGDS_UNKN -0.0014910224 -0.001530720 -0.000797328 -0.001567152
## PPTUG_EF -0.0092037444 0.003912476 -0.012083372 -0.030388214
## NPT4_PUB NA NA NA NA
## NPT4_PRIV -75.2061973363 -839.792811020 708.650963767 424.016870727
## COSTT4_A 316.5500623777 -1882.062735106 1800.917662807 1497.040594097
## TUITFTE 294.9317664121 -896.856597545 731.519820716 519.018653502
## INEXPFTE 589.5843259118 -897.309061919 961.196907620 681.794083882
## PFTFAC 0.0826429088 -0.007665825 0.010467380 0.013912811
## PCTPELL -0.0076658250 0.046162849 -0.024428260 -0.012576102
## C150_4 0.0104673800 -0.024428260 0.045770300 0.020763589
## PFTFTUG1_EF 0.0139128113 -0.012576102 0.020763589 0.061324537
## RET_FT4 0.0067340070 -0.016154747 0.019680184 0.015086202
## PCTFLOAN -0.0043503049 0.009849093 -0.005069604 -0.004878319
## RET_FT4 PCTFLOAN
## ADM_RATE -0.005604484 0.002975208
## SAT_AVG 13.333832398 -16.271363429
## UGDS 95.729509510 -16.270877988
## UGDS_WHITE 0.007280116 0.001903675
## UGDS_BLACK -0.009843601 0.013921967
## UGDS_HISP -0.001349271 -0.010188406
## UGDS_ASIAN 0.002794559 -0.002502309
## UGDS_AIAN -0.001143036 -0.001655861
## UGDS_NHPI -0.000277061 -0.000331476
## UGDS_2MOR 0.000627931 0.000871997
## UGDS_NRA 0.002456673 -0.003886695
## UGDS_UNKN -0.000546610 0.002487782
## PPTUG_EF -0.007911504 0.002890825
## NPT4_PUB NA NA
## NPT4_PRIV 344.586910636 526.223336920
## COSTT4_A 1051.290593681 420.626556191
## TUITFTE 425.547029882 -243.451586630
## INEXPFTE 563.249908317 -688.552747737
## PFTFAC 0.006734007 -0.004350305
## PCTPELL -0.016154747 0.009849093
## C150_4 0.019680184 -0.005069604
## PFTFTUG1_EF 0.015086202 -0.004878319
## RET_FT4 0.028206445 -0.010527662
## PCTFLOAN -0.010527662 0.065085252
-Correlation matrix for private non-profit schools:
correlation_matrix_np<-cor(projectdata_np_matrix_quant, y = projectdata_np_matrix_quant, use = "pairwise.complete.obs", method = "pearson")
print(correlation_matrix_np)
## ADM_RATE SAT_AVG UGDS UGDS_WHITE UGDS_BLACK
## ADM_RATE 1.00000000 -0.4081278 -0.17562870 0.25171516 -0.08274692
## SAT_AVG -0.40812784 1.0000000 0.34235941 0.17115911 -0.46664959
## UGDS -0.17562870 0.3423594 1.00000000 -0.02420888 -0.06464640
## UGDS_WHITE 0.25171516 0.1711591 -0.02420888 1.00000000 -0.52324468
## UGDS_BLACK -0.08274692 -0.4666496 -0.06464640 -0.52324468 1.00000000
## UGDS_HISP -0.07168950 -0.0103186 0.01804891 -0.56581410 -0.13118957
## UGDS_ASIAN -0.25105772 0.5559890 0.10116532 -0.21621759 -0.12879601
## UGDS_AIAN -0.03056691 -0.1035242 -0.04494291 -0.13560770 -0.05078120
## UGDS_NHPI -0.00781908 -0.0873136 -0.02231410 -0.07752119 -0.03589744
## UGDS_2MOR -0.18793468 0.3417968 0.10903484 -0.01807336 -0.10574307
## UGDS_NRA -0.20472686 0.3520770 0.06781500 -0.17752909 -0.12438104
## UGDS_UNKN -0.01730108 -0.1460293 0.09130317 -0.11772503 -0.04000695
## PPTUG_EF 0.04644420 -0.3340795 -0.00601116 -0.11736925 0.13905951
## NPT4_PUB NA NA NA NA NA
## NPT4_PRIV -0.11602308 0.3736482 0.22292129 0.12262826 -0.12553620
## COSTT4_A -0.29794493 0.7097195 0.24921743 0.16331019 -0.23390664
## TUITFTE -0.27089575 0.5726043 0.03241599 -0.01072718 0.00567438
## INEXPFTE -0.41632623 0.6722937 0.05700451 0.00863415 -0.03484281
## PFTFAC -0.06060472 0.2013073 0.02679969 0.09642936 -0.02573481
## PCTPELL 0.13760766 -0.7355873 -0.18669795 -0.39867084 0.39103271
## C150_4 -0.32304393 0.8121478 0.19280411 0.17772058 -0.29202472
## PFTFTUG1_EF -0.15745050 0.4259549 0.10212831 0.06778534 -0.12121931
## RET_FT4 -0.18620908 0.7568601 0.17878605 0.16663777 -0.29896125
## PCTFLOAN 0.06276960 -0.6526462 -0.02081854 0.02684698 0.26742511
## UGDS_HISP UGDS_ASIAN UGDS_AIAN UGDS_NHPI UGDS_2MOR
## ADM_RATE -0.07168950 -0.25105772 -0.03056691 -0.00781908 -0.1879347
## SAT_AVG -0.01031857 0.55598897 -0.10352416 -0.08731361 0.3417968
## UGDS 0.01804891 0.10116532 -0.04494291 -0.02231410 0.1090348
## UGDS_WHITE -0.56581410 -0.21621759 -0.13560770 -0.07752119 -0.0180734
## UGDS_BLACK -0.13118957 -0.12879601 -0.05078120 -0.03589744 -0.1057431
## UGDS_HISP 1.00000000 -0.01160020 -0.04807470 -0.00771832 -0.0797770
## UGDS_ASIAN -0.01160020 1.00000000 -0.05123882 0.06648556 0.1882758
## UGDS_AIAN -0.04807470 -0.05123882 1.00000000 0.00133801 -0.0517942
## UGDS_NHPI -0.00771832 0.06648556 0.00133801 1.00000000 0.0761477
## UGDS_2MOR -0.07977704 0.18827579 -0.05179418 0.07614773 1.0000000
## UGDS_NRA -0.07858089 0.24858611 -0.04462284 -0.01596299 0.0765749
## UGDS_UNKN -0.12250387 -0.00201141 -0.02404118 -0.01420288 0.0112343
## PPTUG_EF 0.04537615 -0.08136710 0.02185314 0.01443016 -0.0825216
## NPT4_PUB NA NA NA NA NA
## NPT4_PRIV -0.25010803 0.22724819 -0.11888230 -0.01829407 0.3492992
## COSTT4_A -0.23325534 0.33173370 -0.13678672 -0.05735471 0.4163865
## TUITFTE -0.03022032 0.04347091 -0.02988263 -0.01189237 0.0905414
## INEXPFTE -0.04851425 0.12566981 -0.01130743 -0.01346177 0.1107142
## PFTFAC -0.05884598 -0.03213546 0.00747275 -0.02076612 -0.0108205
## PCTPELL 0.32541626 -0.18464830 0.12516230 0.06460384 -0.2048507
## C150_4 -0.11925499 0.32150919 -0.08935202 0.04626030 0.2849257
## PFTFTUG1_EF -0.01448851 0.09489338 -0.03064254 -0.04417119 0.1422659
## RET_FT4 -0.04318268 0.27678361 -0.11404995 -0.05612715 0.1585182
## PCTFLOAN -0.19213015 -0.14946458 -0.09432048 -0.04903436 0.1234979
## UGDS_NRA UGDS_UNKN PPTUG_EF NPT4_PUB NPT4_PRIV
## ADM_RATE -0.20472686 -0.01730108 0.04644420 NA -0.1160231
## SAT_AVG 0.35207705 -0.14602928 -0.33407952 NA 0.3736482
## UGDS 0.06781500 0.09130317 -0.00601116 NA 0.2229213
## UGDS_WHITE -0.17752909 -0.11772503 -0.11736925 NA 0.1226283
## UGDS_BLACK -0.12438104 -0.04000695 0.13905951 NA -0.1255362
## UGDS_HISP -0.07858089 -0.12250387 0.04537615 NA -0.2501080
## UGDS_ASIAN 0.24858611 -0.00201141 -0.08136710 NA 0.2272482
## UGDS_AIAN -0.04462284 -0.02404118 0.02185314 NA -0.1188823
## UGDS_NHPI -0.01596299 -0.01420288 0.01443016 NA -0.0182941
## UGDS_2MOR 0.07657488 0.01123435 -0.08252155 NA 0.3492992
## UGDS_NRA 1.00000000 0.00146684 -0.12051537 NA 0.2211421
## UGDS_UNKN 0.00146684 1.00000000 0.13869313 NA 0.2174576
## PPTUG_EF -0.12051537 0.13869313 1.00000000 NA -0.1691922
## NPT4_PUB NA NA NA NA NA
## NPT4_PRIV 0.22114209 0.21745762 -0.16919218 NA 1.0000000
## COSTT4_A 0.25016485 0.16772262 -0.37104040 NA 0.7717459
## TUITFTE 0.03503441 0.03068879 0.07266169 NA 0.6739957
## INEXPFTE 0.07509536 -0.02041571 0.01709140 NA 0.2770262
## PFTFAC -0.03214406 -0.07302097 -0.16108951 NA -0.0358702
## PCTPELL -0.30162399 -0.10164310 0.08493861 NA -0.5168906
## C150_4 0.18891061 -0.05468976 -0.33811189 NA 0.4487164
## PFTFTUG1_EF 0.11982486 -0.09769252 -0.69149972 NA 0.2261556
## RET_FT4 0.19833296 -0.04851303 -0.28669646 NA 0.2753361
## PCTFLOAN -0.19787332 0.13912323 0.05297763 NA 0.2765490
## COSTT4_A TUITFTE INEXPFTE PFTFAC PCTPELL
## ADM_RATE -0.2979449 -0.27089575 -0.41632623 -0.06060472 0.1376077
## SAT_AVG 0.7097195 0.57260434 0.67229367 0.20130734 -0.7355873
## UGDS 0.2492174 0.03241599 0.05700451 0.02679969 -0.1866980
## UGDS_WHITE 0.1633102 -0.01072718 0.00863415 0.09642936 -0.3986708
## UGDS_BLACK -0.2339066 0.00567438 -0.03484281 -0.02573481 0.3910327
## UGDS_HISP -0.2332553 -0.03022032 -0.04851425 -0.05884598 0.3254163
## UGDS_ASIAN 0.3317337 0.04347091 0.12566981 -0.03213546 -0.1846483
## UGDS_AIAN -0.1367867 -0.02988263 -0.01130743 0.00747275 0.1251623
## UGDS_NHPI -0.0573547 -0.01189237 -0.01346177 -0.02076612 0.0646038
## UGDS_2MOR 0.4163865 0.09054137 0.11071416 -0.01082051 -0.2048507
## UGDS_NRA 0.2501649 0.03503441 0.07509536 -0.03214406 -0.3016240
## UGDS_UNKN 0.1677226 0.03068879 -0.02041571 -0.07302097 -0.1016431
## PPTUG_EF -0.3710404 0.07266169 0.01709140 -0.16108951 0.0849386
## NPT4_PUB NA NA NA NA NA
## NPT4_PRIV 0.7717459 0.67399569 0.27702619 -0.03587016 -0.5168906
## COSTT4_A 1.0000000 0.73577001 0.50867706 0.08439357 -0.6749549
## TUITFTE 0.7357700 1.00000000 0.86516943 0.03028048 -0.1279547
## INEXPFTE 0.5086771 0.86516943 1.00000000 0.09467623 -0.2057503
## PFTFAC 0.0843936 0.03028048 0.09467623 1.00000000 -0.1335008
## PCTPELL -0.6749549 -0.12795475 -0.20575025 -0.13350079 1.0000000
## C150_4 0.6405116 0.48939824 0.47243477 0.18015840 -0.5672350
## PFTFTUG1_EF 0.4478425 0.29162092 0.29248156 0.20648485 -0.2528092
## RET_FT4 0.4744694 0.35959455 0.35140911 0.15085146 -0.4769415
## PCTFLOAN 0.1292588 -0.02923821 -0.13290468 -0.06808458 0.1796837
## C150_4 PFTFTUG1_EF RET_FT4 PCTFLOAN
## ADM_RATE -0.3230439 -0.1574505 -0.1862091 0.0627696
## SAT_AVG 0.8121478 0.4259549 0.7568601 -0.6526462
## UGDS 0.1928041 0.1021283 0.1787861 -0.0208185
## UGDS_WHITE 0.1777206 0.0677853 0.1666378 0.0268470
## UGDS_BLACK -0.2920247 -0.1212193 -0.2989613 0.2674251
## UGDS_HISP -0.1192550 -0.0144885 -0.0431827 -0.1921301
## UGDS_ASIAN 0.3215092 0.0948934 0.2767836 -0.1494646
## UGDS_AIAN -0.0893520 -0.0306425 -0.1140499 -0.0943205
## UGDS_NHPI 0.0462603 -0.0441712 -0.0561271 -0.0490344
## UGDS_2MOR 0.2849257 0.1422659 0.1585182 0.1234979
## UGDS_NRA 0.1889106 0.1198249 0.1983330 -0.1978733
## UGDS_UNKN -0.0546898 -0.0976925 -0.0485130 0.1391232
## PPTUG_EF -0.3381119 -0.6914997 -0.2866965 0.0529776
## NPT4_PUB NA NA NA NA
## NPT4_PRIV 0.4487164 0.2261556 0.2753361 0.2765490
## COSTT4_A 0.6405116 0.4478425 0.4744694 0.1292588
## TUITFTE 0.4893982 0.2916209 0.3595946 -0.0292382
## INEXPFTE 0.4724348 0.2924816 0.3514091 -0.1329047
## PFTFAC 0.1801584 0.2064848 0.1508515 -0.0680846
## PCTPELL -0.5672350 -0.2528092 -0.4769415 0.1796837
## C150_4 1.0000000 0.4162888 0.5665879 -0.1012121
## PFTFTUG1_EF 0.4162888 1.0000000 0.4010402 -0.0813882
## RET_FT4 0.5665879 0.4010402 1.0000000 -0.2647972
## PCTFLOAN -0.1012121 -0.0813882 -0.2647972 1.0000000
library(corrplot)
corrplot(correlation_matrix_np, method="circle")
The correlation plot indicates a moderate to strong positive relation between SAT_AVG and the financial variables (r between 0.37 for NPT4_PRIV and 0.71 for COSTT4_A), with the exception of PCTPELL (r = -0.74) and PCTFLOAN (r = -0.65), which are negatively related to SAT_AVG. NPT4_PRIV (average net price for Title IV institutions, private for-profit and nonprofit) and COSTT4_A (average cost of attendance) are negatively related to PCTPELL (r = -0.52 and r = -0.67). UGDS_WHITE is also negatively related to UGDS_BLACK (r = -0.52) and UGDS_HISP (r = -0.57).
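Because the matrices are computed with use = "pairwise.complete.obs", each coefficient may rest on a different number of schools, which matters for variables that are often missing. The following is a minimal sketch of how the number of complete pairs behind each coefficient could be checked; the helper name pairwise_n, the threshold and the toy data are hypothetical.

```r
# Hypothetical helper: number of rows where both variables are observed,
# i.e. the sample size behind each pairwise-complete coefficient.
pairwise_n <- function(x) {
  present <- !is.na(as.matrix(x))
  storage.mode(present) <- "numeric"   # TRUE/FALSE -> 1/0, dimnames kept
  crossprod(present)                   # t(present) %*% present
}

# Toy data (made-up values): 'score' is missing for half of the rows.
toy <- data.frame(
  score = c(1000, NA, NA, 1200, NA, 1100),
  cost  = c(20, 25, 30, 40, NA, 35),
  grant = c(0.5, 0.6, 0.4, 0.2, 0.7, 0.3)
)
n_pairs <- pairwise_n(toy)
print(n_pairs)

# Blank out coefficients that rest on fewer than a chosen number of pairs.
min_pairs <- 4   # arbitrary threshold for this toy example
toy_cor <- cor(toy, use = "pairwise.complete.obs")
toy_cor[n_pairs < min_pairs] <- NA
print(toy_cor)
```

Applied to projectdata_np_matrix_quant, the same count would show how many schools support each coefficient before it is interpreted.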
-Covariance matrix for private for-profit schools:
projectdata_fp_matrix_quant<-data.matrix(projectdata_private_fp[7:30], rownames.force = NA)
covar_matrix_fp<-cov(projectdata_fp_matrix_quant, y = projectdata_fp_matrix_quant, use = "pairwise.complete.obs", method = "pearson")
print(covar_matrix_fp)
## ADM_RATE SAT_AVG UGDS
## ADM_RATE 0.030185280 -5.8130800 -47.8750644
## SAT_AVG -5.813080000 16788.6666667 -140465.8000000
## UGDS -47.875064386 -140465.8000000 10113641.9849641
## UGDS_WHITE -0.001894686 -1.0583733 -34.9333201
## UGDS_BLACK -0.000591637 -10.2013067 0.0174207
## UGDS_HISP 0.002906497 -2.5640133 1.1417769
## UGDS_ASIAN -0.001526879 2.2801600 -2.5344615
## UGDS_AIAN 0.000267522 -0.1828133 -0.1088131
## UGDS_NHPI 0.000175481 -0.0592467 0.4945693
## UGDS_2MOR 0.000302380 1.6495800 5.1066832
## UGDS_NRA -0.000460534 7.6161667 1.8725798
## UGDS_UNKN 0.000821507 2.5142600 29.2327989
## PPTUG_EF -0.003562037 -14.8132867 33.2051063
## NPT4_PUB NA NA NA
## NPT4_PRIV -263.540648813 79303.2000000 937483.0697138
## COSTT4_A -282.334004175 101399.0000000 -1054563.6294216
## TUITFTE -54.828833594 364376.3333333 361288.6912671
## INEXPFTE -70.297512995 367148.6666667 -525283.5728357
## PFTFAC -0.003889060 15.0304000 -66.6861893
## PCTPELL 0.006797696 -8.2362400 -10.1543959
## C150_4 -0.002131956 14.3739467 -100.9311464
## PFTFTUG1_EF 0.016947413 15.7048733 -132.1670958
## RET_FT4 -0.008682872 11.3060667 -115.8528014
## PCTFLOAN -0.000922060 -7.8462867 3.2957711
## UGDS_WHITE UGDS_BLACK UGDS_HISP UGDS_ASIAN
## ADM_RATE -0.00189468641 -0.000591637 0.0029064968 -0.00152687922
## SAT_AVG -1.05837333333 -10.201306667 -2.5640133333 2.28016000000
## UGDS -34.93332007912 0.017420719 1.1417768701 -2.53446151445
## UGDS_WHITE 0.08337790966 -0.033444264 -0.0363193825 -0.00468934410
## UGDS_BLACK -0.03344426402 0.060892007 -0.0190279862 -0.00284058207
## UGDS_HISP -0.03631938248 -0.019027986 0.0603273507 0.00006762256
## UGDS_ASIAN -0.00468934410 -0.002840582 0.0000676226 0.00746468639
## UGDS_AIAN 0.00048268090 -0.000704966 -0.0001546391 -0.00006426224
## UGDS_NHPI -0.00041816930 -0.000402310 -0.0000557663 0.00027503423
## UGDS_2MOR -0.00000307124 -0.000673063 -0.0009452677 -0.00003839737
## UGDS_NRA -0.00110480882 -0.000602317 -0.0000148682 0.00027930027
## UGDS_UNKN -0.00762612894 -0.003052765 -0.0037610664 -0.00043539328
## PPTUG_EF -0.00102267550 0.004049642 -0.0045413167 0.00141289628
## NPT4_PUB NA NA NA NA
## NPT4_PRIV -124.00657978294 114.638361612 -218.0677274519 6.73071699699
## COSTT4_A 136.62126605727 107.193396071 -390.8326208749 68.71750661478
## TUITFTE 25.82910801684 29.934961988 -230.7020828285 4.95773438022
## INEXPFTE 76.44523790220 -32.574193794 -129.6399315509 -1.96172537035
## PFTFAC 0.00370567694 -0.008937824 0.0034985059 -0.00000125873
## PCTPELL -0.01429712612 0.010864005 0.0075761234 -0.00245861940
## C150_4 0.00266467594 -0.009138176 0.0083801056 0.00228254530
## PFTFTUG1_EF 0.00338693415 -0.001067939 0.0037227346 -0.00130161746
## RET_FT4 0.00803120835 -0.011971118 0.0036121878 0.00190125603
## PCTFLOAN 0.00731198380 0.008454763 -0.0155242172 -0.00452514202
## UGDS_AIAN UGDS_NHPI UGDS_2MOR UGDS_NRA
## ADM_RATE 0.0002675216 0.00017548106 0.00030238000 -0.00046053363
## SAT_AVG -0.1828133333 -0.05924666667 1.64958000000 7.61616666667
## UGDS -0.1088131472 0.49456925613 5.10668321178 1.87257975607
## UGDS_WHITE 0.0004826809 -0.00041816930 -0.00000307124 -0.00110480882
## UGDS_BLACK -0.0007049658 -0.00040231046 -0.00067306258 -0.00060231669
## UGDS_HISP -0.0001546391 -0.00005576631 -0.00094526771 -0.00001486824
## UGDS_ASIAN -0.0000642622 0.00027503423 -0.00003839737 0.00027930027
## UGDS_AIAN 0.0004743664 0.00001114916 0.00005467154 -0.00002517022
## UGDS_NHPI 0.0000111492 0.00051267416 0.00006369530 -0.00000667052
## UGDS_2MOR 0.0000546715 0.00006369530 0.00110916191 -0.00005444729
## UGDS_NRA -0.0000251702 -0.00000667052 -0.00005444729 0.00155199495
## UGDS_UNKN -0.0000691083 0.00002305510 0.00050016566 -0.00001969338
## PPTUG_EF -0.0003013514 0.00000252910 0.00025562899 0.00008796026
## NPT4_PUB NA NA NA NA
## NPT4_PRIV -1.5991696091 2.58375807661 33.88250271621 41.99022906075
## COSTT4_A 5.0136015877 -0.05355118625 14.02868667298 58.64872513685
## TUITFTE -0.5818547961 8.88246206522 20.83927416210 47.63389258223
## INEXPFTE -2.2562229522 6.67966354700 9.21244940657 23.67812145687
## PFTFAC 0.0004505304 -0.00011253086 0.00020654764 0.00113484903
## PCTPELL 0.0001131439 -0.00015155570 -0.00015426804 -0.00121552725
## C150_4 -0.0000723199 0.00035142700 -0.00029852673 0.00027380861
## PFTFTUG1_EF 0.0000352866 -0.00081739147 0.00010670493 -0.00007699709
## RET_FT4 0.0000872715 0.00024518451 -0.00009552644 0.00184503234
## PCTFLOAN 0.0000770569 -0.00009339149 0.00094653304 -0.00093168696
## UGDS_UNKN PPTUG_EF NPT4_PUB NPT4_PRIV
## ADM_RATE 0.0008215073 -0.0035620368 NA -263.54065
## SAT_AVG 2.5142600000 -14.8132866667 NA 79303.20000
## UGDS 29.2327989294 33.2051063463 NA 937483.06971
## UGDS_WHITE -0.0076261289 -0.0010226755 NA -124.00658
## UGDS_BLACK -0.0030527650 0.0040496424 NA 114.63836
## UGDS_HISP -0.0037610664 -0.0045413167 NA -218.06773
## UGDS_ASIAN -0.0004353933 0.0014128963 NA 6.73072
## UGDS_AIAN -0.0000691083 -0.0003013514 NA -1.59917
## UGDS_NHPI 0.0000230551 0.0000025291 NA 2.58376
## UGDS_2MOR 0.0005001657 0.0002556290 NA 33.88250
## UGDS_NRA -0.0000196934 0.0000879603 NA 41.99023
## UGDS_UNKN 0.0144737638 0.0000566491 NA 143.85835
## PPTUG_EF 0.0000566491 0.0570955492 NA 94.44090
## NPT4_PUB NA NA NA NA
## NPT4_PRIV 143.8583517192 94.4408979587 NA 47426314.07995
## COSTT4_A 0.6722633108 78.1106850835 NA 36529821.66504
## TUITFTE 93.5043605247 173.5878360520 NA 21807216.30778
## INEXPFTE 50.3107405402 23.3256439843 NA 4297221.04031
## PFTFAC -0.0000444073 -0.0170696659 NA -25.05588
## PCTPELL -0.0000967285 -0.0039372726 NA -132.61733
## C150_4 -0.0044432380 -0.0089963812 NA 199.98953
## PFTFTUG1_EF -0.0039881814 -0.0428607141 NA -37.57500
## RET_FT4 -0.0036557521 -0.0133598090 NA 295.01101
## PCTFLOAN 0.0043713202 0.0012708673 NA 577.29911
## COSTT4_A TUITFTE INEXPFTE
## ADM_RATE -282.3340042 -54.828834 -70.29751
## SAT_AVG 101399.0000000 364376.333333 367148.66667
## UGDS -1054563.6294216 361288.691267 -525283.57284
## UGDS_WHITE 136.6212661 25.829108 76.44524
## UGDS_BLACK 107.1933961 29.934962 -32.57419
## UGDS_HISP -390.8326209 -230.702083 -129.63993
## UGDS_ASIAN 68.7175066 4.957734 -1.96173
## UGDS_AIAN 5.0136016 -0.581855 -2.25622
## UGDS_NHPI -0.0535512 8.882462 6.67966
## UGDS_2MOR 14.0286867 20.839274 9.21245
## UGDS_NRA 58.6487251 47.633893 23.67812
## UGDS_UNKN 0.6722633 93.504361 50.31074
## PPTUG_EF 78.1106851 173.587836 23.32564
## NPT4_PUB NA NA NA
## NPT4_PRIV 36529821.6650392 21807216.307780 4297221.04031
## COSTT4_A 36438414.8549606 18136430.015341 5311616.67379
## TUITFTE 18136430.0153409 70227037.783148 26923072.09514
## INEXPFTE 5311616.6737896 26923072.095138 23783911.59706
## PFTFAC 152.3768831 26.077511 254.88947
## PCTPELL -355.9557290 -191.643388 -109.74078
## C150_4 195.4150430 88.350315 108.25320
## PFTFTUG1_EF -84.2746411 200.759507 -85.61791
## RET_FT4 292.0636731 226.081248 87.75900
## PCTFLOAN 253.9930416 337.304509 32.63181
## PFTFAC PCTPELL C150_4
## ADM_RATE -0.00388906044 0.0067976960 -0.0021319555
## SAT_AVG 15.03040000000 -8.2362400000 14.3739466667
## UGDS -66.68618928737 -10.1543958705 -100.9311463560
## UGDS_WHITE 0.00370567694 -0.0142971261 0.0026646759
## UGDS_BLACK -0.00893782448 0.0108640046 -0.0091381759
## UGDS_HISP 0.00349850594 0.0075761234 0.0083801056
## UGDS_ASIAN -0.00000125873 -0.0024586194 0.0022825453
## UGDS_AIAN 0.00045053036 0.0001131439 -0.0000723199
## UGDS_NHPI -0.00011253086 -0.0001515557 0.0003514270
## UGDS_2MOR 0.00020654764 -0.0001542680 -0.0002985267
## UGDS_NRA 0.00113484903 -0.0012155272 0.0002738086
## UGDS_UNKN -0.00004440726 -0.0000967285 -0.0044432380
## PPTUG_EF -0.01706966589 -0.0039372726 -0.0089963812
## NPT4_PUB NA NA NA
## NPT4_PRIV -25.05588485374 -132.6173302476 199.9895334019
## COSTT4_A 152.37688310732 -355.9557290053 195.4150430381
## TUITFTE 26.07751071030 -191.6433881382 88.3503145909
## INEXPFTE 254.88947315842 -109.7407820002 108.2532014480
## PFTFAC 0.07681148567 -0.0079480216 0.0160575752
## PCTPELL -0.00794802163 0.0397572102 -0.0028238148
## C150_4 0.01605757515 -0.0028238148 0.0417717027
## PFTFTUG1_EF 0.00099900570 0.0119318084 0.0115778778
## RET_FT4 0.01481703235 -0.0063026671 0.0210652247
## PCTFLOAN -0.00479599544 0.0220653582 0.0000224020
## PFTFTUG1_EF RET_FT4 PCTFLOAN
## ADM_RATE 0.0169474129 -0.0086828725 -0.0009220602
## SAT_AVG 15.7048733333 11.3060666667 -7.8462866667
## UGDS -132.1670957760 -115.8528014216 3.2957710673
## UGDS_WHITE 0.0033869341 0.0080312084 0.0073119838
## UGDS_BLACK -0.0010679388 -0.0119711183 0.0084547628
## UGDS_HISP 0.0037227346 0.0036121878 -0.0155242172
## UGDS_ASIAN -0.0013016175 0.0019012560 -0.0045251420
## UGDS_AIAN 0.0000352866 0.0000872715 0.0000770569
## UGDS_NHPI -0.0008173915 0.0002451845 -0.0000933915
## UGDS_2MOR 0.0001067049 -0.0000955264 0.0009465330
## UGDS_NRA -0.0000769971 0.0018450323 -0.0009316870
## UGDS_UNKN -0.0039881814 -0.0036557521 0.0043713202
## PPTUG_EF -0.0428607141 -0.0133598090 0.0012708673
## NPT4_PUB NA NA NA
## NPT4_PRIV -37.5749984908 295.0110115608 577.2991089007
## COSTT4_A -84.2746410746 292.0636731489 253.9930416133
## TUITFTE 200.7595069851 226.0812475434 337.3045088767
## INEXPFTE -85.6179147112 87.7589996572 32.6318137137
## PFTFAC 0.0009990057 0.0148170324 -0.0047959954
## PCTPELL 0.0119318084 -0.0063026671 0.0220653582
## C150_4 0.0115778778 0.0210652247 0.0000224020
## PFTFTUG1_EF 0.0941052022 0.0129155141 0.0019342202
## RET_FT4 0.0129155141 0.0774854966 0.0001649988
## PCTFLOAN 0.0019342202 0.0001649988 0.0627023010
-Correlation matrix for private for-profit schools:
correlation_matrix_fp<-cor(projectdata_fp_matrix_quant, y = projectdata_fp_matrix_quant, use = "pairwise.complete.obs", method = "pearson")
print(correlation_matrix_fp)
## ADM_RATE SAT_AVG UGDS UGDS_WHITE UGDS_BLACK
## ADM_RATE 1.0000000 -0.5568969 -0.1139367866 -0.045495357 -0.0165705203
## SAT_AVG -0.5568969 1.0000000 -0.5145798977 -0.048155961 -0.9272817308
## UGDS -0.1139368 -0.5145799 1.0000000000 -0.038041758 0.0000221989
## UGDS_WHITE -0.0454954 -0.0481560 -0.0380417576 1.000000000 -0.4693707255
## UGDS_BLACK -0.0165705 -0.9272817 0.0000221989 -0.469370725 1.0000000000
## UGDS_HISP 0.0781878 -0.4483872 0.0014617407 -0.512101240 -0.3139464378
## UGDS_ASIAN -0.1308863 0.3712411 -0.0092241452 -0.187966657 -0.1332359478
## UGDS_AIAN 0.0760779 -0.4142103 -0.0015709799 0.076749873 -0.1311689212
## UGDS_NHPI 0.0532776 -0.2407702 0.0068683531 -0.063959627 -0.0720046366
## UGDS_2MOR 0.0681265 0.6210040 0.0482156090 -0.000319368 -0.0818988554
## UGDS_NRA -0.0734653 0.4827147 0.0149465646 -0.097121746 -0.0619583156
## UGDS_UNKN 0.0570248 0.1361392 0.0764057197 -0.219526979 -0.1028306291
## PPTUG_EF -0.0992728 -0.5284660 0.0436462386 -0.014834485 0.0686640249
## NPT4_PUB NA NA NA NA NA
## NPT4_PRIV -0.2451089 0.0609736 0.0417131129 -0.062326404 0.0673526370
## COSTT4_A -0.2903676 0.0816682 -0.0306653773 0.086925768 0.0770745745
## TUITFTE -0.0513230 0.2889447 0.0138040814 0.010879298 0.0147653811
## INEXPFTE -0.1283049 0.6052686 -0.0339516154 0.054469751 -0.0271802286
## PFTFAC -0.0800636 0.4105594 -0.0398558401 0.055537330 -0.1453901634
## PCTPELL 0.2163453 -0.5351338 -0.0159882678 -0.248328018 0.2206885528
## C150_4 -0.0838293 0.6311848 -0.0637333790 0.055175020 -0.2216000958
## PFTFTUG1_EF 0.3171937 0.5718464 -0.1658424506 0.042691099 -0.0150193033
## RET_FT4 -0.1735640 0.6074220 -0.0462813508 0.121976395 -0.2041337024
## PCTFLOAN -0.0276072 -0.3479708 0.0041321521 0.101130962 0.1367613052
## UGDS_HISP UGDS_ASIAN UGDS_AIAN UGDS_NHPI
## ADM_RATE 0.07818781 -0.1308863132 0.07607786 0.053277614
## SAT_AVG -0.44838718 0.3712411163 -0.41421026 -0.240770245
## UGDS 0.00146174 -0.0092241452 -0.00157098 0.006868353
## UGDS_WHITE -0.51210124 -0.1879666569 0.07674987 -0.063959627
## UGDS_BLACK -0.31394644 -0.1332359478 -0.13116892 -0.072004637
## UGDS_HISP 1.00000000 0.0031866086 -0.02890716 -0.010027533
## UGDS_ASIAN 0.00318661 1.0000000000 -0.03415019 0.140591883
## UGDS_AIAN -0.02890716 -0.0341501899 1.00000000 0.022608115
## UGDS_NHPI -0.01002753 0.1405918832 0.02260811 1.000000000
## UGDS_2MOR -0.11555806 -0.0133443745 0.07537145 0.084467412
## UGDS_NRA -0.00153659 0.0820579051 -0.02933492 -0.007478147
## UGDS_UNKN -0.12728087 -0.0418875729 -0.02637438 0.008463609
## PPTUG_EF -0.07734242 0.0686222379 -0.05790028 0.000466929
## NPT4_PUB NA NA NA NA
## NPT4_PRIV -0.12748663 0.0125485808 -0.01058156 0.016373075
## COSTT4_A -0.28911473 0.2417902780 0.05935236 -0.000337176
## TUITFTE -0.11422730 0.0070338168 -0.00324646 0.047668259
## INEXPFTE -0.10858521 -0.0047082497 -0.02129568 0.060640671
## PFTFAC 0.05485111 -0.0000856367 0.07649870 -0.014880133
## PCTPELL 0.15465220 -0.1430376858 0.02604121 -0.033517211
## C150_4 0.20669041 0.2190671611 -0.02037234 0.056560038
## PFTFTUG1_EF 0.06102147 -0.0858416650 0.00738219 -0.097799395
## RET_FT4 0.06455256 0.1548141778 0.02326999 0.025316590
## PCTFLOAN -0.25234268 -0.2096347747 0.01412256 -0.016446551
## UGDS_2MOR UGDS_NRA UGDS_UNKN PPTUG_EF NPT4_PUB
## ADM_RATE 0.068126495 -0.07346531 0.057024811 -0.099272758 NA
## SAT_AVG 0.621003957 0.48271470 0.136139177 -0.528466037 NA
## UGDS 0.048215609 0.01494656 0.076405720 0.043646239 NA
## UGDS_WHITE -0.000319368 -0.09712175 -0.219526979 -0.014834485 NA
## UGDS_BLACK -0.081898855 -0.06195832 -0.102830629 0.068664025 NA
## UGDS_HISP -0.115558055 -0.00153659 -0.127280870 -0.077342418 NA
## UGDS_ASIAN -0.013344375 0.08205791 -0.041887573 0.068622238 NA
## UGDS_AIAN 0.075371449 -0.02933492 -0.026374381 -0.057900285 NA
## UGDS_NHPI 0.084467412 -0.00747815 0.008463609 0.000466929 NA
## UGDS_2MOR 1.000000000 -0.04149862 0.124831921 0.032102090 NA
## UGDS_NRA -0.041498615 1.00000000 -0.004155127 0.010432075 NA
## UGDS_UNKN 0.124831921 -0.00415513 1.000000000 0.001969471 NA
## PPTUG_EF 0.032102090 0.01043207 0.001969471 1.000000000 NA
## NPT4_PUB NA NA NA NA NA
## NPT4_PRIV 0.147230594 0.21158213 0.175455457 0.063226846 NA
## COSTT4_A 0.084108668 0.29786438 0.000745963 0.056007057 NA
## TUITFTE 0.076083794 0.14691938 0.094455492 0.088948630 NA
## INEXPFTE 0.056898153 0.12354459 0.085974477 0.020075514 NA
## PFTFAC 0.025923718 0.11140965 -0.001378630 -0.258219918 NA
## PCTPELL -0.023216129 -0.17268570 -0.004028267 -0.082703168 NA
## C150_4 -0.055908652 0.03438323 -0.136293799 -0.176071281 NA
## PFTFTUG1_EF 0.014081728 -0.00582273 -0.097429529 -0.612138572 NA
## RET_FT4 -0.013354879 0.11908737 -0.080655490 -0.178099556 NA
## PCTFLOAN 0.113428335 -0.10539831 0.144960029 0.021248057 NA
## NPT4_PRIV COSTT4_A TUITFTE INEXPFTE PFTFAC
## ADM_RATE -0.2451089 -0.290367647 -0.05132298 -0.12830488 -0.0800635926
## SAT_AVG 0.0609736 0.081668228 0.28894473 0.60526858 0.4105594313
## UGDS 0.0417131 -0.030665377 0.01380408 -0.03395162 -0.0398558401
## UGDS_WHITE -0.0623264 0.086925768 0.01087930 0.05446975 0.0555373300
## UGDS_BLACK 0.0673526 0.077074574 0.01476538 -0.02718023 -0.1453901634
## UGDS_HISP -0.1274866 -0.289114733 -0.11422730 -0.10858521 0.0548511078
## UGDS_ASIAN 0.0125486 0.241790278 0.00703382 -0.00470825 -0.0000856367
## UGDS_AIAN -0.0105816 0.059352356 -0.00324646 -0.02129568 0.0764986957
## UGDS_NHPI 0.0163731 -0.000337176 0.04766826 0.06064067 -0.0148801326
## UGDS_2MOR 0.1472306 0.084108668 0.07608379 0.05689815 0.0259237181
## UGDS_NRA 0.2115821 0.297864383 0.14691938 0.12354459 0.1114096525
## UGDS_UNKN 0.1754555 0.000745963 0.09445549 0.08597448 -0.0013786304
## PPTUG_EF 0.0632268 0.056007057 0.08894863 0.02007551 -0.2582199178
## NPT4_PUB NA NA NA NA NA
## NPT4_PRIV 1.0000000 0.967357557 0.45727302 0.18241881 -0.0150769478
## COSTT4_A 0.9673576 1.000000000 0.40204302 0.31963997 0.1019986964
## TUITFTE 0.4572730 0.402043024 1.00000000 0.65876543 0.0139443941
## INEXPFTE 0.1824188 0.319639973 0.65876543 1.00000000 0.2956579316
## PFTFAC -0.0150769 0.101998696 0.01394439 0.29565793 1.0000000000
## PCTPELL -0.1017748 -0.332307279 -0.11780181 -0.11328996 -0.1658382295
## C150_4 0.1804407 0.185581964 0.08245811 0.18622118 0.3429518514
## PFTFTUG1_EF -0.0199389 -0.046508764 0.08331965 -0.09867887 0.0133467776
## RET_FT4 0.1756141 0.177174856 0.14915732 0.12505271 0.2309278001
## PCTFLOAN 0.3412980 0.208851934 0.16520422 0.02684142 -0.0878965077
## PCTPELL C150_4 PFTFTUG1_EF RET_FT4 PCTFLOAN
## ADM_RATE 0.21634529 -0.083829278 0.31719373 -0.17356395 -0.027607175
## SAT_AVG -0.53513378 0.631184774 0.57184638 0.60742201 -0.347970803
## UGDS -0.01598827 -0.063733379 -0.16584245 -0.04628135 0.004132152
## UGDS_WHITE -0.24832802 0.055175020 0.04269110 0.12197640 0.101130962
## UGDS_BLACK 0.22068855 -0.221600096 -0.01501930 -0.20413370 0.136761305
## UGDS_HISP 0.15465220 0.206690405 0.06102147 0.06455256 -0.252342682
## UGDS_ASIAN -0.14303769 0.219067161 -0.08584166 0.15481418 -0.209634775
## UGDS_AIAN 0.02604121 -0.020372343 0.00738219 0.02326999 0.014122556
## UGDS_NHPI -0.03351721 0.056560038 -0.09779939 0.02531659 -0.016446551
## UGDS_2MOR -0.02321613 -0.055908652 0.01408173 -0.01335488 0.113428335
## UGDS_NRA -0.17268570 0.034383230 -0.00582273 0.11908737 -0.105398310
## UGDS_UNKN -0.00402827 -0.136293799 -0.09742953 -0.08065549 0.144960029
## PPTUG_EF -0.08270317 -0.176071281 -0.61213857 -0.17809956 0.021248057
## NPT4_PUB NA NA NA NA NA
## NPT4_PRIV -0.10177479 0.180440701 -0.01993887 0.17561405 0.341297971
## COSTT4_A -0.33230728 0.185581964 -0.04650876 0.17717486 0.208851934
## TUITFTE -0.11780181 0.082458112 0.08331965 0.14915732 0.165204223
## INEXPFTE -0.11328996 0.186221181 -0.09867887 0.12505271 0.026841418
## PFTFAC -0.16583823 0.342951851 0.01334678 0.23092780 -0.087896508
## PCTPELL 1.00000000 -0.092793128 0.24013401 -0.14047872 0.441937946
## C150_4 -0.09279313 1.000000000 0.20299115 0.37974183 0.000721788
## PFTFTUG1_EF 0.24013401 0.202991152 1.00000000 0.17019919 0.036983617
## RET_FT4 -0.14047872 0.379741827 0.17019919 1.00000000 0.003430510
## PCTFLOAN 0.44193795 0.000721788 0.03698362 0.00343051 1.000000000
library(corrplot)
corrplot(correlation_matrix_fp, method="circle")
The picture for for-profit schools differs from that for non-profit schools. There is a strong negative relation between UGDS_BLACK and SAT_AVG (r = -0.93) and a weaker negative relation between UGDS_HISP and SAT_AVG (r = -0.45); both pairs also have a negative covariance. SAT_AVG has a strong positive relation with INEXPFTE (r = 0.61), so instructional expenditure per full-time-equivalent student rises with the average SAT score; these variables also have a large positive covariance (367148.66667). SAT_AVG and ADM_RATE are negatively related (r = -0.56, covariance -5.81308000), which is an expected result: admission rates decrease as institutions admit students with higher SAT scores.
###Question 8
The analyses indicate that:
-The relation of SAT_AVG to ADM_RATE is markedly stronger for for-profit private schools than for non-profit private and public schools.
-The negative relation of UGDS_BLACK to SAT_AVG is also markedly stronger for for-profit private schools.
-The positive relation of SAT_AVG to INEXPFTE is stronger in private schools than in public schools.
-PCTPELL (percentage of undergraduates who receive a Pell Grant) is negatively related to COSTT4_A (average cost of attendance) for private schools, while the two have a positive relation in public schools; the covariance is reported as negative in each case.
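Whether the correlation in one group of schools is statistically stronger than in another can be checked with a Fisher r-to-z comparison of two independent correlations. The sketch below is only an illustration: the function name `compare_correlations` is made up for this example, `r1` is the for-profit SAT_AVG/ADM_RATE correlation from the matrix above, and `r2`, `n1` and `n2` are hypothetical placeholders that would have to be replaced with the real values for the groups being compared.

```r
# Fisher r-to-z comparison of two independent Pearson correlations.
# compare_correlations() is a hypothetical helper written for this sketch.
compare_correlations <- function(r1, n1, r2, n2) {
  z1 <- atanh(r1)                              # Fisher transform of each correlation
  z2 <- atanh(r2)
  se <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))      # standard error of the difference
  z  <- (z1 - z2) / se
  p  <- 2 * pnorm(abs(z), lower.tail = FALSE)  # two-sided p-value
  list(z = z, p.value = p)
}

# r1 comes from the for-profit correlation matrix above.
# r2, n1 and n2 are HYPOTHETICAL placeholders, not values from the data.
result <- compare_correlations(r1 = -0.5568969, n1 = 50,
                               r2 = -0.30,      n2 = 1500)
print(result)
```

The sample sizes matter a great deal here: with few schools in one group, even a large difference in correlations may not be statistically significant.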
###Question 9
REGRESSION ANALYSIS: y=COSTT4_A, x=SAT_AVG
Loading the necessary packages:
library(tidyverse)
## -- Attaching packages ---------------------------------------------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.1 v purrr 0.3.2
## v tibble 2.1.3 v dplyr 0.8.3
## v tidyr 0.8.3 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## -- Conflicts ------------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x ggplot2::%+%() masks psych::%+%()
## x ggplot2::alpha() masks psych::alpha()
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(ggpubr)
## Loading required package: magrittr
##
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
##
## set_names
## The following object is masked from 'package:tidyr':
##
## extract
detach(package:dplyr)
theme_set(theme_pubr())
Scatter Plot for the variables:
ggplot(projectdata, aes(x = SAT_AVG, y = COSTT4_A)) +
geom_point() +
stat_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 6405 rows containing non-finite values (stat_smooth).
## Warning: Removed 6405 rows containing missing values (geom_point).
cor(x=projectdata$SAT_AVG, y = projectdata$COSTT4_A, use = "complete.obs",
method = "pearson")
## [1] 0.521171
Looking for best-fit line:
model_9 <- lm(COSTT4_A ~ SAT_AVG, data = projectdata)
model_9
##
## Call:
## lm(formula = COSTT4_A ~ SAT_AVG, data = projectdata)
##
## Coefficients:
## (Intercept) SAT_AVG
## -22454.5 51.9
Plotting with regression line:
ggplot(projectdata, aes(x = SAT_AVG, y = COSTT4_A)) +
geom_point() +
stat_smooth(method = lm)
## Warning: Removed 6405 rows containing non-finite values (stat_smooth).
## Warning: Removed 6405 rows containing missing values (geom_point).
Model assessment:
print(summary(model_9))
##
## Call:
## lm(formula = COSTT4_A ~ SAT_AVG, data = projectdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -32053 -10000 942 9354 25867
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -22454.50 2520.61 -8.91 <0.0000000000000002 ***
## SAT_AVG 51.90 2.36 21.98 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11400 on 1296 degrees of freedom
## (6405 observations deleted due to missingness)
## Multiple R-squared: 0.272, Adjusted R-squared: 0.271
## F-statistic: 483 on 1 and 1296 DF, p-value: <0.0000000000000002
The model explains only 27% of the variation in cost of attendance (R-squared = 0.272). Cost of attendance and average SAT score have a positive linear relation with a correlation of 0.52.
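In a regression with a single predictor, R-squared is the square of the Pearson correlation, so the two numbers reported above can be checked against each other. The short sketch below does this with the printed correlation, and then confirms the identity on simulated data (the simulated `x` and `y` are stand-ins, not the project data).

```r
# Check that the reported R-squared matches the squared correlation.
r <- 0.521171                 # Pearson correlation printed by cor() above
r_squared <- r^2
print(round(r_squared, 3))    # compare with "Multiple R-squared" in the summary

# The same identity on simulated data (NOT the project data).
set.seed(1)
x <- rnorm(200)
y <- 2 * x + rnorm(200)
fit <- lm(y ~ x)
identity_holds <- isTRUE(all.equal(summary(fit)$r.squared, cor(x, y)^2))
print(identity_holds)
```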
###Question 10
REGRESSION ANALYSIS: y=COSTT4_A, x=ADM_RATE
ggplot(projectdata, aes(x = ADM_RATE, y = COSTT4_A)) +
geom_point() +
stat_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 5707 rows containing non-finite values (stat_smooth).
## Warning: Removed 5707 rows containing missing values (geom_point).
cor(x=projectdata$ADM_RATE, y = projectdata$COSTT4_A, use = "complete.obs",
method = "pearson")
## [1] -0.277502
cor= -0.277502
model_10 <- lm(COSTT4_A ~ ADM_RATE, data = projectdata)
ggplot(projectdata, aes(x = ADM_RATE, y = COSTT4_A)) +
geom_point() +
stat_smooth(method = lm)
## Warning: Removed 5707 rows containing non-finite values (stat_smooth).
## Warning: Removed 5707 rows containing missing values (geom_point).
print(summary(model_10))
##
## Call:
## lm(formula = COSTT4_A ~ ADM_RATE, data = projectdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -31765 -9986 -1323 9705 36241
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 43930 982 44.8 <0.0000000000000002 ***
## ADM_RATE -17799 1380 -12.9 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12600 on 1994 degrees of freedom
## (5707 observations deleted due to missingness)
## Multiple R-squared: 0.077, Adjusted R-squared: 0.0765
## F-statistic: 166 on 1 and 1994 DF, p-value: <0.0000000000000002
The residual distribution is positively skewed (median = -1323, below the residual mean of 0). R-squared = 0.077, meaning that the model explains only 7.7% of the variance, which is low. There are also a number of outliers in the data.
###Question 11
REGRESSION ANALYSIS: y=UGDS, x=SAT_AVG
ggplot(projectdata, aes(x = SAT_AVG, y = UGDS)) +
geom_point() +
stat_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 6399 rows containing non-finite values (stat_smooth).
## Warning: Removed 6399 rows containing missing values (geom_point).
cor(x=projectdata$SAT_AVG, y = projectdata$UGDS, use = "complete.obs",
method = "pearson")
## [1] 0.249611
model_11 <- lm(UGDS ~ SAT_AVG, data = projectdata)
ggplot(projectdata, aes(SAT_AVG, UGDS)) +
geom_point() +
stat_smooth(method = lm)
## Warning: Removed 6399 rows containing non-finite values (stat_smooth).
## Warning: Removed 6399 rows containing missing values (geom_point).
print(summary(model_11))
##
## Call:
## lm(formula = UGDS ~ SAT_AVG, data = projectdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11197 -3981 -2485 1077 45035
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8999.26 1573.21 -5.72 0.000000013 ***
## SAT_AVG 13.71 1.47 9.30 < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7090 on 1302 degrees of freedom
## (6399 observations deleted due to missingness)
## Multiple R-squared: 0.0623, Adjusted R-squared: 0.0616
## F-statistic: 86.5 on 1 and 1302 DF, p-value: <0.0000000000000002
The residual distribution is positively skewed (median = -2485, below the residual mean of 0). R-squared = 0.0623, meaning that the model explains only 6.2% of the variance, which is low. The concentration of observations at low levels of UGDS indicates that there are outliers in UGDS. Removing these outliers may give a more accurate prediction.
Trying to remove outliers from UGDS in projectdata.
Running the regression analysis with the trimmed UGDS:
model_11_2 <- lm(UGDS ~ SAT_AVG, data = projectdata_UGDS_Trimmed)
ggplot(projectdata_UGDS_Trimmed, aes(SAT_AVG, UGDS)) +
geom_point() +
stat_smooth(method = lm)
## Warning: Removed 5878 rows containing non-finite values (stat_smooth).
## Warning: Removed 5878 rows containing missing values (geom_point).
print(summary(model_11_2))
##
## Call:
## lm(formula = UGDS ~ SAT_AVG, data = projectdata_UGDS_Trimmed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2222 -806 -202 647 2993
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 305.96 303.08 1.01 0.31
## SAT_AVG 1.41 0.29 4.88 0.0000013 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1080 on 872 degrees of freedom
## (5878 observations deleted due to missingness)
## Multiple R-squared: 0.0266, Adjusted R-squared: 0.0254
## F-statistic: 23.8 on 1 and 872 DF, p-value: 0.00000128
The residuals of the trimmed model are closer to a normal distribution. However, the new R-squared (0.0266) is lower than that of the untrimmed model (0.0623).
###Question 12
REGRESSION ANALYSIS: y=UGDS, x=ADM_RATE
I will run this analysis with the trimmed values of UGDS.
ggplot(projectdata_UGDS_Trimmed, aes(x = ADM_RATE, y = UGDS)) +
geom_point() +
stat_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 5045 rows containing non-finite values (stat_smooth).
## Warning: Removed 5045 rows containing missing values (geom_point).
cor(x=projectdata_UGDS_Trimmed$ADM_RATE, y = projectdata_UGDS_Trimmed$UGDS, use = "complete.obs",
method = "pearson")
## [1] -0.154717
cor= -0.154717
model_12 <- lm(UGDS ~ ADM_RATE, data = projectdata_UGDS_Trimmed)
ggplot(projectdata_UGDS_Trimmed, aes(x = ADM_RATE, y = UGDS)) +
geom_point() +
stat_smooth(method = lm)
## Warning: Removed 5045 rows containing non-finite values (stat_smooth).
## Warning: Removed 5045 rows containing missing values (geom_point).
print(summary(model_12))
##
## Call:
## lm(formula = UGDS ~ ADM_RATE, data = projectdata_UGDS_Trimmed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1724 -850 -351 608 3483
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1822 94 19.37 < 0.0000000000000002 ***
## ADM_RATE -827 128 -6.47 0.00000000013 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1100 on 1705 degrees of freedom
## (5045 observations deleted due to missingness)
## Multiple R-squared: 0.0239, Adjusted R-squared: 0.0234
## F-statistic: 41.8 on 1 and 1705 DF, p-value: 0.000000000131
The correlation between UGDS and ADM_RATE is cor= -0.154717, which is small. The relation is statistically significant (p < 0.001) but weak: the model explains only 2.4% of the variance in UGDS.
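A small correlation can still be statistically significant when the sample is large. As a check, the t statistic for a Pearson correlation can be computed from the correlation and the degrees of freedom alone; the sketch below uses only the values printed by `cor()` and `summary(model_12)` above and should reproduce the t value of the ADM_RATE slope.

```r
# t statistic for a Pearson correlation: t = r * sqrt(df) / sqrt(1 - r^2)
r  <- -0.154717   # correlation printed by cor() above
df <- 1705        # residual degrees of freedom from summary(model_12)

t_value <- r * sqrt(df) / sqrt(1 - r^2)
p_value <- 2 * pt(abs(t_value), df = df, lower.tail = FALSE)  # two-sided

print(round(t_value, 2))  # compare with the t value of ADM_RATE in the summary
print(p_value)
print(round(r^2, 4))      # compare with "Multiple R-squared"
```

Significance here says the slope is unlikely to be exactly zero; it does not say that ADM_RATE explains much of the variation in UGDS.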
###Question 13
REGRESSION ANALYSIS: y=C150_4, x=SAT_AVG
ggplot(projectdata, aes(x = SAT_AVG, y = C150_4)) +
geom_point() +
stat_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 6432 rows containing non-finite values (stat_smooth).
## Warning: Removed 6432 rows containing missing values (geom_point).
cor(x=projectdata$SAT_AVG, y = projectdata$C150_4, use = "complete.obs",
method = "pearson")
## [1] 0.801478
cor= 0.801478
model_13 <- lm(C150_4 ~ SAT_AVG, data = projectdata)
ggplot(projectdata, aes(SAT_AVG, C150_4)) +
geom_point() +
stat_smooth(method = lm)
## Warning: Removed 6432 rows containing non-finite values (stat_smooth).
## Warning: Removed 6432 rows containing missing values (geom_point).
print(summary(model_13))
##
## Call:
## lm(formula = C150_4 ~ SAT_AVG, data = projectdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.5219 -0.0654 0.0072 0.0690 0.4613
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.5741737 0.0237624 -24.2 <0.0000000000000002 ***
## SAT_AVG 0.0010598 0.0000222 47.7 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.105 on 1269 degrees of freedom
## (6432 observations deleted due to missingness)
## Multiple R-squared: 0.642, Adjusted R-squared: 0.642
## F-statistic: 2.28e+03 on 1 and 1269 DF, p-value: <0.0000000000000002
This regression has a much higher R-squared value of 0.642, meaning that 64% of the variance can be explained by the regression model. The residuals are roughly symmetric, since the absolute values of the minimum and maximum are close and the median (0.0072) is close to zero. These findings indicate that the average SAT score can be used to predict the completion rate. The correlation is correspondingly high (0.801478).
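To see what the fitted line implies in practice, the coefficients printed in the summary can be plugged into the regression equation. The sketch below is a hand calculation with those two coefficients; the helper name `predict_completion` and the chosen SAT values are assumptions made for illustration only.

```r
# Predicted completion rate from the coefficients printed by summary(model_13).
intercept <- -0.5741737
slope     <-  0.0010598

# predict_completion() is a hypothetical helper for this sketch.
predict_completion <- function(sat_avg) intercept + slope * sat_avg

sat_values <- c(1000, 1200, 1400, 1600)   # illustrative SAT averages
predicted  <- predict_completion(sat_values)
print(data.frame(SAT_AVG = sat_values, predicted_C150_4 = round(predicted, 3)))
```

At an SAT average of 1600 the line gives a completion rate above 1, a reminder that a straight line is not bounded to the 0 to 1 range of a proportion and should not be trusted at the extremes.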
###Question 14
REGRESSION ANALYSIS: y=C150_4, x=ADM_RATE
ggplot(projectdata, aes(x = ADM_RATE, y = C150_4)) +
geom_point() +
stat_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 5897 rows containing non-finite values (stat_smooth).
## Warning: Removed 5897 rows containing missing values (geom_point).
cor(x=projectdata$ADM_RATE, y = projectdata$C150_4, use = "complete.obs",
method = "pearson")
## [1] -0.320186
cor=-0.320186
model_14 <- lm(C150_4 ~ ADM_RATE, data = projectdata)
ggplot(projectdata, aes(ADM_RATE, C150_4)) +
geom_point() +
stat_smooth(method = lm)
## Warning: Removed 5897 rows containing non-finite values (stat_smooth).
## Warning: Removed 5897 rows containing missing values (geom_point).
print(summary(model_14))
##
## Call:
## lm(formula = C150_4 ~ ADM_RATE, data = projectdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.5848 -0.1237 0.0017 0.1391 0.5728
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.7348 0.0151 48.7 <0.0000000000000002 ***
## ADM_RATE -0.3076 0.0214 -14.4 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.182 on 1804 degrees of freedom
## (5897 observations deleted due to missingness)
## Multiple R-squared: 0.103, Adjusted R-squared: 0.102
## F-statistic: 206 on 1 and 1804 DF, p-value: <0.0000000000000002
The correlation is -0.320186, therefore rate of completion and admission rate are negatively related. The R-squared is 0.103, meaning that this model can only explain 10% of the variance in the prediction. The residuals have a relatively normal distribution.
Therefore the admission rate is not a very good predictor of the completion rate.
###Question 15
REGRESSION ANALYSIS: y=PCTFLOAN, x=SAT_AVG
ggplot(projectdata, aes(x = SAT_AVG, y = PCTFLOAN)) +
geom_point() +
stat_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 6401 rows containing non-finite values (stat_smooth).
## Warning: Removed 6401 rows containing missing values (geom_point).
cor(x=projectdata$SAT_AVG, y = projectdata$PCTFLOAN, use = "complete.obs",
method = "pearson")
## [1] -0.510337
cor=(-0.510337)
model_15 <- lm(PCTFLOAN ~ SAT_AVG, data = projectdata)
ggplot(projectdata, aes(SAT_AVG, PCTFLOAN)) +
geom_point() +
stat_smooth(method = lm)
## Warning: Removed 6401 rows containing non-finite values (stat_smooth).
## Warning: Removed 6401 rows containing missing values (geom_point).
print(summary(model_15))
##
## Call:
## lm(formula = PCTFLOAN ~ SAT_AVG, data = projectdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.7234 -0.0792 0.0172 0.1009 0.3460
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.2759381 0.0323876 39.4 <0.0000000000000002 ***
## SAT_AVG -0.0006493 0.0000303 -21.4 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.146 on 1300 degrees of freedom
## (6401 observations deleted due to missingness)
## Multiple R-squared: 0.26, Adjusted R-squared: 0.26
## F-statistic: 458 on 1 and 1300 DF, p-value: <0.0000000000000002
The regression model can explain 26% of the variation in PCTFLOAN. The correlation is cor=(-0.510337), which indicates a negative relation between the two variables. The residuals have a long tail to the left. SAT_AVG is a better predictor of PCTFLOAN for higher values of SAT_AVG.
###Question 16
REGRESSION ANALYSIS: y=PCTFLOAN, x=ADM_RATE
ggplot(projectdata, aes(x = ADM_RATE, y = PCTFLOAN)) +
geom_point() +
stat_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 5508 rows containing non-finite values (stat_smooth).
## Warning: Removed 5508 rows containing missing values (geom_point).
cor(x=projectdata$ADM_RATE, y = projectdata$PCTFLOAN, use = "complete.obs",
method = "pearson")
## [1] 0.107624
cor=0.107624
model_16 <- lm(PCTFLOAN ~ ADM_RATE, data = projectdata)
ggplot(projectdata, aes(ADM_RATE, PCTFLOAN)) +
geom_point() +
stat_smooth(method = lm)
## Warning: Removed 5508 rows containing non-finite values (stat_smooth).
## Warning: Removed 5508 rows containing missing values (geom_point).
print(summary(model_16))
##
## Call:
## lm(formula = PCTFLOAN ~ ADM_RATE, data = projectdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.6243 -0.1127 0.0309 0.1566 0.4431
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.5119 0.0160 31.92 < 0.0000000000000002 ***
## ADM_RATE 0.1125 0.0222 5.07 0.00000043 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.217 on 2193 degrees of freedom
## (5508 observations deleted due to missingness)
## Multiple R-squared: 0.0116, Adjusted R-squared: 0.0111
## F-statistic: 25.7 on 1 and 2193 DF, p-value: 0.000000432
The model can only explain 1% of the variance in the prediction. The correlation is not strong since cor=(0.107624). Therefore ADM_RATE is not a good predictor of PCTFLOAN.
###EXTRA REGRESSION ANALYSIS 1
REGRESSION ANALYSIS: y=UGDS_ASIAN, x=COSTT4_A for private for-profit schools
ggplot(projectdata_private_fp, aes(x = COSTT4_A, y = UGDS_ASIAN)) +
geom_point() +
stat_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 2688 rows containing non-finite values (stat_smooth).
## Warning: Removed 2688 rows containing missing values (geom_point).
cor(x=projectdata_private_fp$COSTT4_A, y = projectdata_private_fp$UGDS_ASIAN, use = "complete.obs",
method = "pearson")
## [1] 0.24179
cor= 0.24179
model_17 <- lm(UGDS_ASIAN ~ COSTT4_A, data = projectdata_private_fp)
ggplot(projectdata_private_fp, aes(COSTT4_A, UGDS_ASIAN)) +
geom_point() +
stat_smooth(method = lm)
## Warning: Removed 2688 rows containing non-finite values (stat_smooth).
## Warning: Removed 2688 rows containing missing values (geom_point).
print(summary(model_17))
##
## Call:
## lm(formula = UGDS_ASIAN ~ COSTT4_A, data = projectdata_private_fp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0593 -0.0198 -0.0111 0.0032 0.6091
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.023182923 0.006324043 -3.67 0.00026 ***
## COSTT4_A 0.000001886 0.000000238 7.93 0.0000000000000057 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0457 on 1013 degrees of freedom
## (2688 observations deleted due to missingness)
## Multiple R-squared: 0.0585, Adjusted R-squared: 0.0575
## F-statistic: 62.9 on 1 and 1013 DF, p-value: 0.00000000000000572
The R-squared has a low value (6%) and the correlation is 0.24179. The residual distribution has a long tail to the right. This regression analysis is affected by the outliers at high values of cost of attendance. In this analysis, for private for-profit schools, COSTT4_A is not a good predictor of UGDS_ASIAN.
###EXTRA REGRESSION ANALYSIS 2
REGRESSION ANALYSIS: y=INEXPFTE, x=COSTT4_A
ggplot(projectdata, aes(x = COSTT4_A, y = INEXPFTE)) +
geom_point() +
stat_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 3675 rows containing non-finite values (stat_smooth).
## Warning: Removed 3675 rows containing missing values (geom_point).
cor(x=projectdata$COSTT4_A, y = projectdata$INEXPFTE, use = "complete.obs",
method = "pearson")
## [1] 0.455454
cor=0.455454
model_18 <- lm(INEXPFTE ~ COSTT4_A, data = projectdata)
ggplot(projectdata, aes(COSTT4_A, INEXPFTE)) +
geom_point() +
stat_smooth(method = lm)
## Warning: Removed 3675 rows containing non-finite values (stat_smooth).
## Warning: Removed 3675 rows containing missing values (geom_point).
print(summary(model_18))
##
## Call:
## lm(formula = INEXPFTE ~ COSTT4_A, data = projectdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10453 -3085 -518 1655 92003
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1577.6523 201.1727 7.84 0.0000000000000056 ***
## COSTT4_A 0.2337 0.0072 32.46 < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5830 on 4026 degrees of freedom
## (3675 observations deleted due to missingness)
## Multiple R-squared: 0.207, Adjusted R-squared: 0.207
## F-statistic: 1.05e+03 on 1 and 4026 DF, p-value: <0.0000000000000002
The regression model can explain 21% of the variance in INEXPFTE and cor=0.455454. The residuals are right-skewed: the median is -518 and the maximum (92003) is far larger in magnitude than the minimum (-10453). There seem to be outliers in INEXPFTE, so it is hard to see the residuals in this plot. Cleaning the outliers from the data may provide a more accurate prediction.
###Question 17
-Checking normality of SAT_AVG:
describe(projectdata$SAT_AVG)
qqnorm(projectdata$SAT_AVG,main="QQ plot of SAT_AVG")
qqline(projectdata$SAT_AVG)
The mean and the median are relatively close. The distribution has moderate skewness and low kurtosis. However, the Q-Q plot indicates that the scores deviate from the normal distribution at higher values. Since the mean and median are close and the skew is moderate, we can say that the distribution is approximately normal.
-Checking normality of ADM_RATE:
describe(projectdata$ADM_RATE)
qqnorm(projectdata$ADM_RATE,main="QQ plot of ADM_RATE")
qqline(projectdata$ADM_RATE)
The mean and median of the distribution are close. The distribution is moderately skewed (-0.58). The Q-Q plot indicates a strong deviation from the normal line at the minimum and maximum values. The distribution of ADM_RATE is not normal for higher scores.
-Checking normality of UGDS:
describe(projectdata$UGDS)
qqnorm(projectdata$UGDS,main="QQ plot of UGDS")
qqline(projectdata$UGDS)
UGDS has a large positive skew, since the mean is much higher than the median. There are outliers at the maximum values and a strong deviation from the normal line. The distribution of UGDS is not normal.
###Question 18
The previous plots indicate that ADM_RATE and UGDS deviate from a normal distribution, while SAT_AVG can be considered approximately normal, although not perfectly so.
#-First, normalizing the ADM_RATE:
Looking at the boxplot to see the outliers:
boxplot(projectdata$ADM_RATE)
There are outlier scores below 0.2.
-Storing outliers in a vector:
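The code for this step does not appear in the report. One common way to collect the points a boxplot flags as outliers is `boxplot.stats()`, whose `out` element holds the values beyond 1.5 times the hinge spread. The sketch below is an assumption about how such a vector could be built, shown on made-up numbers rather than the project data; `adm_rate`, `outliers_example` and `cleaned` are illustrative names.

```r
# Toy stand-in for projectdata$ADM_RATE (values are made up for illustration).
adm_rate <- c(0.05, 0.62, 0.70, 0.71, 0.75, 0.78, 0.80, 0.83, 0.90, NA)

# Values lying beyond 1.5 * (upper hinge - lower hinge) from the hinges,
# i.e. the points a default boxplot() draws as outliers. NAs are ignored.
outliers_example <- boxplot.stats(adm_rate)$out
print(outliers_example)

# Keep everything that is not in the outlier vector (missing values stay in).
cleaned <- adm_rate[!(adm_rate %in% outliers_example)]
print(cleaned)
```

With the real data, the same call on `projectdata$ADM_RATE` (or `projectdata$UGDS`) would give a vector that can be used with `%in%` as in the following steps.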
The outlier values are as follows:
print(outliers_ADM_RATE)
## [1] 0.0883 0.1077 0.1302 0.1219 0.0630 0.0876 0.1311 0.0000 0.1384 0.0596
## [11] 0.0788 0.0831 0.1150 0.0000 0.0744 0.0695 0.0843 0.1141 0.0505 0.1037
## [21] 0.0874 0.1309 0.1000 0.0509 0.1346 0.0000 0.1200 0.1111
-Finding the location of outliers:
projectdata[which(projectdata$ADM_RATE %in% outliers_ADM_RATE),]
-Removing the outliers:
projectdata_ADM_RATE_clean <- projectdata[!(projectdata$ADM_RATE %in% outliers_ADM_RATE),]
-Checking with boxplot:
boxplot(projectdata_ADM_RATE_clean$ADM_RATE)
library(psych)
describe(projectdata$ADM_RATE)
describe(projectdata_ADM_RATE_clean$ADM_RATE)
The mean and median are much closer now with moderate values of skewness and kurtosis.
-Checking with Q-Q plot:
qqnorm(projectdata_ADM_RATE_clean$ADM_RATE,main="QQ plot of ADM_RATE_outliers_removed")
qqline(projectdata_ADM_RATE_clean$ADM_RATE)
The scores fit the line better. (I am not sure what to do with the scores at the top-right of the plot.) The new mean is (0.7), median is (0.72), standard deviation is (0.2) and variance is (0.04).
#-Second, normalizing the UGDS:
Looking at the outliers with a box plot:
boxplot(projectdata$UGDS)
Storing outliers in a vector:
Finding the location of outliers:
projectdata[which(projectdata$UGDS %in% outliers_UGDS),]
Removing the outliers:
projectdata_UGDS_clean <- projectdata[!(projectdata$UGDS %in% outliers_UGDS),]
Checking the new distribution with a boxplot:
boxplot(projectdata_UGDS_clean$UGDS)
The outcome still does not look very good. Checking with describe:
library(psych)
describe(projectdata_UGDS_clean$UGDS)
The distribution is still largely skewed.
Transforming UGDS with square-root to decrease the skew:
projectdata_UGDS_clean$sqrt_UGDS <- sqrt(projectdata_UGDS_clean$UGDS)
describe(projectdata_UGDS_clean$sqrt_UGDS)
The mean and median are closer now with less skewness.
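The effect of a square-root transform on a right-skewed variable can be illustrated on simulated data. The sketch below is not run on the project data: `sizes` is a simulated stand-in for enrollment counts, and `sample_skewness` is a simple moment-based helper written for this example, so its values may differ slightly from the skew that `describe()` reports.

```r
# Moment-based sample skewness, written out to avoid package dependencies.
# sample_skewness() is a helper defined for this sketch only.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^1.5
}

set.seed(42)
sizes <- rexp(5000, rate = 1 / 2000)   # simulated right-skewed "enrollment" values

skew_raw  <- sample_skewness(sizes)
skew_sqrt <- sample_skewness(sqrt(sizes))
print(skew_raw)    # skewness before the transform
print(skew_sqrt)   # skewness after the square-root transform
```

The square root compresses large values more than small ones, which pulls in the long right tail; a log transform does the same more aggressively but cannot be applied to zeros.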
Checking with qqplot:
qqnorm(projectdata_UGDS_clean$sqrt_UGDS,main="QQ plot of sqrt_UGDS_outliers_removed")
qqline(projectdata_UGDS_clean$sqrt_UGDS)
The scores still do not follow the line but it is better than the previous version. The new mean is (22.33), median is (16.97), standard deviation is (15.92) and variance is (253.44).
###Question 19
#The probability that the average SAT score for a school is greater than 1400:
Finding the mean of SAT_AVG:
mean(projectdata$SAT_AVG, na.rm=TRUE)
## [1] 1059.07
Finding the standard deviation of SAT_AVG:
sd(projectdata$SAT_AVG, na.rm=TRUE)
## [1] 133.357
-Probability of SAT_AVG > 1400:
pnorm(1400, mean=1059.072, sd=133.357, lower.tail=FALSE)
## [1] 0.00528646
Assuming SAT_AVG is normally distributed, there is a 0.53% probability of SAT_AVG > 1400.
-Probability of SAT_AVG < 800:
pnorm(800, mean=1059.072, sd=133.357, lower.tail=TRUE)
## [1] 0.0260265
Under the same assumption, there is a 2.6% probability of SAT_AVG < 800.
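The same probabilities can be obtained by first converting each cutoff to a z-score, which makes it easier to see how far each cutoff lies from the mean. The sketch below uses only the mean and standard deviation printed above; the variable names are illustrative.

```r
# Standardise the cutoffs and use the standard normal distribution.
sat_mean <- 1059.072
sat_sd   <- 133.357

z_1400 <- (1400 - sat_mean) / sat_sd   # about 2.56 standard deviations above the mean
z_800  <- (800  - sat_mean) / sat_sd   # about 1.94 standard deviations below the mean

p_above_1400 <- pnorm(z_1400, lower.tail = FALSE)
p_below_800  <- pnorm(z_800)

print(c(z_1400 = z_1400, p_above_1400 = p_above_1400))
print(c(z_800 = z_800, p_below_800 = p_below_800))
```

These are model-based probabilities: they are only as good as the assumption that SAT_AVG follows a normal distribution, which the Q-Q plot suggests is weaker in the upper tail.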
###QUESTION 20
I need to build a sampling distribution of means but could not figure out how to do it.
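One possible way to build a sampling distribution of means is to draw many random samples, compute the mean of each, and look at how those means are distributed. The sketch below is an illustration under assumptions, not the assignment's required method: `sat` is simulated to resemble SAT_AVG, and the number of samples and the sample size are arbitrary choices.

```r
set.seed(123)

# Simulated stand-in for the non-missing SAT_AVG values (NOT the project data).
# With the real data this could be: sat <- na.omit(projectdata$SAT_AVG)
sat <- rnorm(1300, mean = 1059, sd = 133)

n_samples   <- 1000   # number of repeated samples (assumed)
sample_size <- 30     # observations per sample (assumed)

# Draw a sample, take its mean, and repeat n_samples times.
sample_means <- replicate(n_samples,
                          mean(sample(sat, size = sample_size, replace = TRUE)))

print(mean(sample_means))            # close to mean(sat)
print(sd(sample_means))              # close to sd(sat) / sqrt(sample_size)
print(sd(sat) / sqrt(sample_size))   # theoretical standard error of the mean
```

Calling `hist(sample_means)` afterwards would show the sampling distribution, which should look approximately normal and much narrower than the distribution of the individual scores.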