# Introduction to Regression in R
Regression analysis is a statistical tool used to explain the relationship between an outcome variable and one or more predictor variables. In a linear regression model, we assume that this relationship can be described by a linear function. A simple linear regression model has only one outcome variable and one predictor variable.
The data in a simple regression model (SRM) come in pairs. As in the correlation topic from high school statistics, x is the independent (predictor) variable and y is the dependent (response) variable here as well.
\(X\): the predictor, with observed values \(x_1, x_2, \ldots, x_n\), where \(n\) is the number of observations
\(Y\): the response, with observed values \(y_1, y_2, \ldots, y_n\)
In this model, the linear function is determined by the intercept and the slope of the line,
\[ y=\beta_1x+\beta_0 \tag{1.1} \]
where \(\beta_1\) is the slope and \(\beta_0\) is the intercept.
When plotting the model in the Cartesian plane, the slope is the familiar rise over run, \(\frac{\Delta y}{\Delta x}\).
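As a quick illustration (with made-up intercept and slope values), we can draw such a line in R:
beta0 <- 2    # illustrative intercept
beta1 <- 0.5  # illustrative slope
curve(beta0 + beta1 * x, from = 0, to = 10,
      xlab = "x", ylab = "y")  # the line y = beta0 + beta1 * x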
The model in (1.1) is rarely applicable to real-life situations, because observed data contain errors. The regression model therefore has two parts: a linear function and an error term. \(\beta_0\) and \(\beta_1\) are called parameters and are estimated from the data.
\[ Y=\beta_0 +\beta_1x+\epsilon \tag{1.2} \]
The linear function predicts the mean value of the outcome for a given value of the predictor variable.
The error term represents the unexplained uncertainty of the model and is a random variable with mean zero and unknown variance \(\sigma^2\). The errors are usually assumed to follow a normal distribution. In a simple regression model, for each individual or unit \(i = 1, 2, \ldots, n\) in the study, we have a pair of observations \((x_i, y_i)\).
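To make the role of the error term concrete, here is a small simulation (with arbitrary parameter values) that generates pairs \((x_i, y_i)\) from model (1.2):
set.seed(1)                          # for reproducibility
n     <- 50
beta0 <- 10                          # arbitrary true intercept
beta1 <- 2                           # arbitrary true slope
sigma <- 3                           # arbitrary error standard deviation
x <- runif(n, 0, 10)                 # predictor values
e <- rnorm(n, mean = 0, sd = sigma)  # errors with mean zero
y <- beta0 + beta1 * x + e           # model (1.2)
plot(x, y)                           # points scatter around the true line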
# adjust the file path to wherever elemapi2v2.csv is saved on your machine
d <- read.csv("C:/Users/USER/Dropbox/PC/Downloads/elemapi2v2.csv")
The data are in .csv format, and we use the read.csv() function to read and load them into R. The result is a data.frame object that appears in the Environment tab. By clicking on the data.frame object in the Environment pane we can view it as a spreadsheet.
Next we check the nature of the data to make sure they are suitable for linear regression.
The function class() returns the class of an object, telling us what kind of object our data is.
class(d)
## [1] "data.frame"
The function names() returns the names of the variables, which are stored as columns.
names(d)
## [1] "snum" "dnum" "api00" "api99" "growth" "meals"
## [7] "ell" "yr_rnd" "mobility" "acs_k3" "acs_46" "not_hsg"
## [13] "hsg" "some_col" "col_grad" "grad_sch" "avg_ed" "full"
## [19] "emer" "enroll" "mealcat" "collcat" "abv_hsg" "lgenroll"
The function dim() returns the number of rows (observations) and columns (variables).
dim(d)
## [1] 400 24
To get a quick look at the data without printing all 400 rows, we display the first six rows with head():
head(d)
##   snum dnum api00 api99 growth meals ell yr_rnd mobility acs_k3 acs_46
## 1  906   41   693   600     93    67   9      0       11     16     22
## 2  889   41   570   501     69    92  21      0       33     15     32
## 3  887   41   546   472     74    97  29      0       36     17     25
## 4  876   41   571   487     84    90  27      0       27     20     30
## 5  888   41   478   425     53    89  30      0       44     18     31
## 6 4284   98   858   844     14    10   3      0       10     20     33
##   not_hsg hsg some_col col_grad grad_sch avg_ed full emer enroll mealcat
## 1       0   0        0        0        0     NA   76   24    247       2
## 2       0   0        0        0        0     NA   79   19    463       3
## 3       0   0        0        0        0     NA   68   29    395       3
## 4      36  45        9        9        0   1.91   87   11    418       3
## 5      50  50        0        0        0   1.50   87   13    520       3
## 6       1   8       24       36       31   3.89  100    0    343       1
##   collcat abv_hsg lgenroll
## 1       1     100 2.392697
## 2       1     100 2.665581
## 3       1     100 2.596597
## 4       1      64 2.621176
## 5       1      50 2.716003
## 6       2      99 2.535294
Descriptive analysis helps us understand our data better and investigate whether there are any problems in the data. In this workshop we skip this part, but in a real study descriptive analysis is an important part of a good data analysis.
In descriptive analysis we usually examine measures of central tendency (such as the mean and median) and measures of variability (such as the variance and standard deviation).
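For instance, for the outcome api00 in our data:
mean(d$api00)    # central tendency: mean
median(d$api00)  # central tendency: median
var(d$api00)     # variability: variance
sd(d$api00)      # variability: standard deviation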
str(d)
## 'data.frame': 400 obs. of 24 variables:
## $ snum : int 906 889 887 876 888 4284 4271 2910 2899 2887 ...
## $ dnum : int 41 41 41 41 41 98 98 108 108 108 ...
## $ api00 : int 693 570 546 571 478 858 918 831 860 737 ...
## $ api99 : int 600 501 472 487 425 844 864 791 838 703 ...
## $ growth : int 93 69 74 84 53 14 54 40 22 34 ...
## $ meals : int 67 92 97 90 89 10 5 2 5 29 ...
## $ ell : int 9 21 29 27 30 3 2 3 6 15 ...
## $ yr_rnd : int 0 0 0 0 0 0 0 0 0 0 ...
## $ mobility: int 11 33 36 27 44 10 16 44 10 17 ...
## $ acs_k3 : int 16 15 17 20 18 20 19 20 20 21 ...
## $ acs_46 : int 22 32 25 30 31 33 28 31 30 29 ...
## $ not_hsg : int 0 0 0 36 50 1 1 0 2 8 ...
## $ hsg : int 0 0 0 45 50 8 4 4 9 25 ...
## $ some_col: int 0 0 0 9 0 24 18 16 15 34 ...
## $ col_grad: int 0 0 0 9 0 36 34 50 42 27 ...
## $ grad_sch: int 0 0 0 0 0 31 43 30 33 7 ...
## $ avg_ed : num NA NA NA 1.91 1.5 ...
## $ full : int 76 79 68 87 87 100 100 96 100 96 ...
## $ emer : int 24 19 29 11 13 0 0 2 0 7 ...
## $ enroll : int 247 463 395 418 520 343 303 1513 660 362 ...
## $ mealcat : int 2 3 3 3 3 1 1 1 1 1 ...
## $ collcat : int 1 1 1 1 1 2 2 2 2 3 ...
## $ abv_hsg : int 100 100 100 64 50 99 99 100 98 92 ...
## $ lgenroll: num 2.39 2.67 2.6 2.62 2.72 ...
summary(d)
## snum dnum api00 api99
## Min. : 58 Min. : 41.0 Min. :369.0 Min. :333.0
## 1st Qu.:1720 1st Qu.:395.0 1st Qu.:523.8 1st Qu.:484.8
## Median :3008 Median :401.0 Median :643.0 Median :602.0
## Mean :2867 Mean :457.7 Mean :647.6 Mean :610.2
## 3rd Qu.:4198 3rd Qu.:630.0 3rd Qu.:762.2 3rd Qu.:731.2
## Max. :6072 Max. :796.0 Max. :940.0 Max. :917.0
##
## growth meals ell yr_rnd
## Min. :-69.00 Min. : 0.00 Min. : 0.00 Min. :0.00
## 1st Qu.: 19.00 1st Qu.: 31.00 1st Qu.: 9.75 1st Qu.:0.00
## Median : 36.00 Median : 67.50 Median :25.00 Median :0.00
## Mean : 37.41 Mean : 60.31 Mean :31.45 Mean :0.23
## 3rd Qu.: 53.25 3rd Qu.: 90.00 3rd Qu.:50.25 3rd Qu.:0.00
## Max. :134.00 Max. :100.00 Max. :91.00 Max. :1.00
##
## mobility acs_k3 acs_46 not_hsg
## Min. : 2.00 Min. :14.00 Min. :20.00 Min. : 0.00
## 1st Qu.:13.00 1st Qu.:18.00 1st Qu.:27.00 1st Qu.: 4.00
## Median :17.00 Median :19.00 Median :29.00 Median : 14.00
## Mean :18.25 Mean :19.16 Mean :29.69 Mean : 21.25
## 3rd Qu.:22.00 3rd Qu.:20.00 3rd Qu.:31.00 3rd Qu.: 34.00
## Max. :47.00 Max. :25.00 Max. :50.00 Max. :100.00
## NA's :1 NA's :2 NA's :3
## hsg some_col col_grad grad_sch
## Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.000
## 1st Qu.: 17.00 1st Qu.:12.00 1st Qu.: 7.0 1st Qu.: 1.000
## Median : 26.00 Median :19.00 Median : 16.0 Median : 4.000
## Mean : 26.02 Mean :19.71 Mean : 19.7 Mean : 8.637
## 3rd Qu.: 34.00 3rd Qu.:28.00 3rd Qu.: 30.0 3rd Qu.:10.000
## Max. :100.00 Max. :67.00 Max. :100.0 Max. :67.000
##
## avg_ed full emer enroll
## Min. :1.000 Min. : 37.00 Min. : 0.00 Min. : 130.0
## 1st Qu.:2.070 1st Qu.: 76.00 1st Qu.: 3.00 1st Qu.: 320.0
## Median :2.600 Median : 88.00 Median :10.00 Median : 435.0
## Mean :2.668 Mean : 84.55 Mean :12.66 Mean : 483.5
## 3rd Qu.:3.220 3rd Qu.: 97.00 3rd Qu.:19.00 3rd Qu.: 608.0
## Max. :4.620 Max. :100.00 Max. :59.00 Max. :1570.0
## NA's :19
## mealcat collcat abv_hsg lgenroll
## Min. :1.000 Min. :1.00 Min. : 0.00 Min. :2.114
## 1st Qu.:1.000 1st Qu.:1.00 1st Qu.: 66.00 1st Qu.:2.505
## Median :2.000 Median :2.00 Median : 86.00 Median :2.638
## Mean :2.015 Mean :2.02 Mean : 78.75 Mean :2.640
## 3rd Qu.:3.000 3rd Qu.:3.00 3rd Qu.: 96.00 3rd Qu.:2.784
## Max. :3.000 Max. :3.00 Max. :100.00 Max. :3.196
##
The function lm() is used to fit linear models; it can carry out regression, analysis of variance, and analysis of covariance.
In our data example we are interested in studying the relationship between student academic performance and characteristics of the school. For example, we could use the variable api00, a school-wide measure of academic performance, as the outcome, and the variable enroll, the number of students in the school, as the predictor.
For our first example, let us use api00 as the measure of academic performance and ell, the percentage of English language learners in the school, as the predictor.
We fit the model with lm() and check that the result is an object of class lm:
m1 <- lm(api00 ~ ell, data = d)
class(m1)
## [1] "lm"
print(m1)
##
## Call:
## lm(formula = api00 ~ ell, data = d)
##
## Coefficients:
## (Intercept) ell
## 785.890 -4.396
The estimated linear function is: \[ \widehat{api00} = 785.890 - 4.396 \times ell \] The coefficient for \(ell\) is about -4.4; hence, for every one-unit increase in ell, we would expect about a 4.4-point decrease in api00. For example, a school with ell = 30 would be expected, on average, to have an api00 score about 44 points lower than a school with ell = 20.
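We can check this arithmetic directly from the fitted coefficients:
# predicted difference in api00 between ell = 30 and ell = 20
coef(m1)["ell"] * (30 - 20)  # about -44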
summary(m1)
##
## Call:
## lm(formula = api00 ~ ell, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -262.741 -63.605 1.443 68.242 212.310
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 785.890 7.370 106.64 <2e-16 ***
## ell -4.396 0.184 -23.89 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 91.28 on 398 degrees of freedom
## Multiple R-squared: 0.5893, Adjusted R-squared: 0.5882
## F-statistic: 571 on 1 and 398 DF, p-value: < 2.2e-16
We can see in the summary that both the intercept and the slope are statistically significant.
The multiple R-squared of the model is 0.5893 and the adjusted R-squared, which adjusts for the number of predictors, is 0.5882.
In the simple linear regression model \(R^2\) is equal to the square of the correlation between the response and predictor variables. We can run the function cor() to confirm this.
r <- cor(d$ell, d$api00)
r^2
## [1] 0.5892615
The last line gives the overall F-statistic, testing the fit of the current model against the null model (the model with only an intercept): F = 570.99 on 1 and 398 degrees of freedom, with p-value < 2.2e-16. In a simple regression model this p-value is exactly equal to the p-value of the slope. We can see the ANOVA table by using the function anova().
anova(m1)
## Analysis of Variance Table
##
## Response: api00
## Df Sum Sq Mean Sq F value Pr(>F)
## ell 1 4757504 4757504 570.99 < 2.2e-16 ***
## Residuals 398 3316168 8332
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
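In a simple regression the overall F-statistic equals the square of the slope's t-statistic; we can verify this from the model summary:
t_ell <- summary(m1)$coefficients["ell", "t value"]
t_ell^2  # about 571, the F-statistic above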
As we said, the output of the lm() function is an object of class lm. We can use the functions ls() or str() to list the components of the object m1.
ls(m1)
## [1] "assign" "call" "coefficients" "df.residual"
## [5] "effects" "fitted.values" "model" "qr"
## [9] "rank" "residuals" "terms" "xlevels"
m1$coefficients
## (Intercept) ell
## 785.890265 -4.396082
The component fitted.values is a vector of fitted values; here are the first ten:
m1$fitted.values[1:10]
## 1 2 3 4 5 6 7 8
## 746.3255 693.5725 658.4039 667.1961 654.0078 772.7020 777.0981 772.7020
## 9 10
## 759.5138 719.9490
We can store extracted components in a new object:
residuals <- m1$resid
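Equivalently, R provides extractor functions, which are generally preferred over accessing components with $:
res  <- resid(m1)   # same as m1$residuals
fits <- fitted(m1)  # same as m1$fitted.values
coef(m1)            # same as m1$coefficients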
To get confidence intervals for the coefficients of the model:
confint(m1)
## 2.5 % 97.5 %
## (Intercept) 771.401848 800.378681
## ell -4.757761 -4.034403
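These intervals can be reproduced by hand as estimate ± t-quantile × standard error; a quick check for the slope:
est <- summary(m1)$coefficients["ell", "Estimate"]
se  <- summary(m1)$coefficients["ell", "Std. Error"]
est + c(-1, 1) * qt(0.975, df = df.residual(m1)) * se  # matches confint(m1)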
We can use anova() to extract the model sum of squares:
anova(m1)
## Analysis of Variance Table
##
## Response: api00
## Df Sum Sq Mean Sq F value Pr(>F)
## ell 1 4757504 4757504 570.99 < 2.2e-16 ***
## Residuals 398 3316168 8332
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The F-test in the ANOVA table confirms that the relationship between api00 and ell is statistically significant.
Another important function is predict(), which can be used to get predicted values for new data points. Note, however, that the ell values used below (500, 600, 700) lie far outside the observed range of 0 to 91, so the resulting predictions are extrapolations, and the negative api00 scores they produce are not meaningful.
new.data <- data.frame(ell = c(500, 600, 700))
predict(m1, newdata = new.data)
## 1 2 3
## -1412.151 -1851.759 -2291.367
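For predictor values inside the observed range of ell, predict() can also return a confidence interval for the mean response:
in.range <- data.frame(ell = c(10, 25, 50))  # values within the observed range
predict(m1, newdata = in.range, interval = "confidence")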
library(sjPlot)
## Warning: package 'sjPlot' was built under R version 4.3.2
## Learn more about sjPlot with 'browseVignettes("sjPlot")'.
tab_model(m1)
| Predictors (outcome: api00) | Estimates | CI | p |
|---|---|---|---|
| (Intercept) | 785.89 | 771.40 – 800.38 | <0.001 |
| ell | -4.40 | -4.76 – -4.03 | <0.001 |
| Observations | 400 | | |
| R2 / R2 adjusted | 0.589 / 0.588 | | |
plot(api00 ~ ell, data = d)
abline(m1, col = "blue")
The scatter plot with the fitted line suggests there are no extreme outliers. To identify individual schools, we can label the points with their school numbers:
plot(api00 ~ ell, data = d)
text(d$ell, d$api00+20, labels = d$snum, cex = .7)
abline(m1, col = "blue")
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.3.2
ggplot(d, aes(x = ell, y = api00)) +
geom_point() +
stat_smooth(method = "lm", col = "red")
## `geom_smooth()` using formula = 'y ~ x'
### Exercise 1

In the data set elemapi2v2 the variable full is the percentage of teachers with full credentials at each school. Run the regression model of api00 on full. Use this model to predict the mean of api00 for full equal to 25%, 50%, 75%, and 90%. Is the predicted value for full = 25% valid?
In regression analysis, it is possible to include categorical predictors.
R uses the object class factor to store and manipulate categorical variables. The lm() function automatically treats variables of type character as factors, but it is always safer to convert a variable's class to factor before the analysis. The function factor() is used to encode a variable (a vector) as a factor.
mealcat_F <- factor(d$mealcat)
str(mealcat_F)
## Factor w/ 3 levels "1","2","3": 2 3 3 3 3 1 1 1 1 1 ...
d$yr_rnd_F <- factor(d$yr_rnd)
levels(d$yr_rnd_F) <- c("NO", "Yes")
table(d$yr_rnd_F)
##
## NO Yes
## 308 92
In R, when we include a factor as a predictor in a model, the lm() function by default generates dummy variables for the categories of the factor. A dummy variable is a variable that takes the values 0 and 1: it is 1 for observations in that category and 0 otherwise. The number of dummy variables equals the number of levels of the categorical variable minus 1. For example, if a categorical variable has 3 levels, lm() generates 2 dummy variables; if both dummy variables are 0, the observation is in the remaining category, called the reference category (by default, R uses the first level of the factor as the reference).
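For example, for the three-level factor mealcat_F created above, the design matrix contains two dummy columns (for levels "2" and "3"), with level "1" as the reference category:
# first rows of the design matrix for a three-level factor
head(model.matrix(~ mealcat_F))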
### Regression of api00 on yr_rnd, a two-level categorical variable
m2 <- lm(api00 ~ yr_rnd_F, data = d)
summary(m2)
##
## Call:
## lm(formula = api00 ~ yr_rnd_F, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -273.539 -95.662 0.967 103.341 297.967
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 684.54 7.14 95.88 <2e-16 ***
## yr_rnd_FYes -160.51 14.89 -10.78 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 125.3 on 398 degrees of freedom
## Multiple R-squared: 0.226, Adjusted R-squared: 0.2241
## F-statistic: 116.2 on 1 and 398 DF, p-value: < 2.2e-16
model.matrix(m2)[1:6,]
## (Intercept) yr_rnd_FYes
## 1 1 0
## 2 1 0
## 3 1 0
## 4 1 0
## 5 1 0
## 6 1 0
Writing out the regression equation, we have
\[api00 = 684.539 - 160.5064 \times yr\_rnd\_F(Yes)\]
If a school is not a year-round school (i.e., the dummy variable is 0), the regression equation simplifies to
\[api00 = 684.539\]
If a school is a year-round school, it simplifies to \[api00 = 684.539 - 160.5064 = 524.0326\] This indicates that the mean api00 for year-round schools is 524.03, while the intercept, 684.54, is the mean for schools that are not year-round.
In general, when we have a two-level categorical variable, the intercept is the expected outcome for the reference group, and the slope for the other category is the difference between the expected outcome of that group and the reference group.
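We can confirm these two group means directly from the data:
tapply(d$api00, d$yr_rnd_F, mean)  # NO: about 684.54, Yes: about 524.03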
plot(api00 ~ yr_rnd, data = d)
abline(m2)
Now, let’s look at an example of multiple regression, in which we have one outcome (dependent) variable and multiple predictors.
In the simple regression of api00 on enroll (fit earlier in the workshop, not shown here), enroll explained only 10.12% of the variability of api00. In order to explain more variation, we can add more predictors. In R we do this by adding variables with + in the formula of the lm() function. We add meals, the percentage of students receiving free meals (an indicator of socioeconomic status), and full, the percentage of teachers with full credentials, to our model.
m3 <- lm(api00 ~ enroll + meals + full, data = d)
summary(m3)
##
## Call:
## lm(formula = api00 ~ enroll + meals + full, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -181.721 -40.802 1.129 39.983 158.774
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 801.82983 26.42660 30.342 < 2e-16 ***
## enroll -0.05146 0.01384 -3.719 0.000229 ***
## meals -3.65973 0.10880 -33.639 < 2e-16 ***
## full 1.08109 0.23945 4.515 8.37e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 58.73 on 396 degrees of freedom
## Multiple R-squared: 0.8308, Adjusted R-squared: 0.8295
## F-statistic: 648.2 on 3 and 396 DF, p-value: < 2.2e-16
anova(m3)
## Analysis of Variance Table
##
## Response: api00
## Df Sum Sq Mean Sq F value Pr(>F)
## enroll 1 817326 817326 236.947 < 2.2e-16 ***
## meals 1 5820066 5820066 1687.263 < 2.2e-16 ***
## full 1 70313 70313 20.384 8.369e-06 ***
## Residuals 396 1365967 3449
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We see that, on the F-test, the overall model is a significant improvement in fit compared to the intercept-only model. Also, all of the tests of the coefficients against the null value of zero are significant.
The R-squared is 0.8308, meaning that approximately 83% of the variability of api00 is accounted for by the variables in the model. The adjusted R-squared shows that, after taking into account the number of predictors in the model, the R-squared is still about 0.83.
The coefficient for each variable indicates the expected change in api00 for a one-unit increase in that variable, holding all other variables in the model constant. For example, consider the variable meals: we would expect a decrease of about 3.66 in the api00 score for every one-unit increase in percent free meals, assuming all other variables are held constant.
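To make this "holding other variables constant" interpretation concrete, we can compare predictions for two hypothetical schools (made-up values) that are identical except for a one-unit difference in meals:
two.schools <- data.frame(enroll = c(500, 500),  # same enrollment
                          meals  = c(50, 51),    # differ by one unit
                          full   = c(85, 85))    # same credential rate
diff(predict(m3, newdata = two.schools))  # about -3.66, the meals coefficient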
We see quite a difference in the coefficient of enroll compared to the simple linear regression: before, the coefficient was −0.1999; now it is −0.05146.
The ANOVA table shows the sum of squares explained by adding each variable sequentially to the model, or equivalently, the reduction in the residual sum of squares achieved by each additional variable.
For example, the variable enroll reduces the total error (RSS) by 817326. By adding the variable meals we reduce the residual sum of squares by an additional 5820066, and by adding the variable full we reduce the error by 70313. Finally, we have 1365967 left as unexplained error. The total sum of squares (TSS) is the sum of all of these sums of squares. To get the total sum of squares of api00 we can multiply its variance by (n−1).
sum(anova(m3)$Sum)
## [1] 8073672
(400-1)*var(d$api00)
## [1] 8073672
Some researchers are interested in comparing the relative strength of the various predictors within the model. To address this, we use standardized regression coefficients, which can be obtained by transforming the outcome and predictor variables to their standardized scores, also called z-scores, before running the regression.
m3.sd <- lm(scale(api00) ~ scale(enroll) + scale(meals) + scale(full), data = d)
summary(m3.sd)
##
## Call:
## lm(formula = scale(api00) ~ scale(enroll) + scale(meals) + scale(full),
## data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.27749 -0.28683 0.00793 0.28108 1.11617
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.454e-16 2.064e-02 0.000 1.000000
## scale(enroll) -8.191e-02 2.203e-02 -3.719 0.000229 ***
## scale(meals) -8.210e-01 2.441e-02 -33.639 < 2e-16 ***
## scale(full) 1.136e-01 2.517e-02 4.515 8.37e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4129 on 396 degrees of freedom
## Multiple R-squared: 0.8308, Adjusted R-squared: 0.8295
## F-statistic: 648.2 on 3 and 396 DF, p-value: < 2.2e-16
In the standardized regression summary we see that the intercept is zero and that all t-statistics (and p-values) for the other coefficients are exactly the same as in the original model.
Because the coefficients are all in the same standardized units, standard deviations, you can compare them to assess the relative strength of each predictor. In this example, meals has the largest coefficient in absolute value, −0.821.
Thus, a one standard deviation increase in meals leads to a 0.821 standard deviation decrease in predicted api00, with the other variables held constant.
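An equivalent way to obtain the standardized coefficients, shown here as a sketch, is to rescale each raw slope from m3 by the ratio of the predictor's standard deviation to the outcome's standard deviation:
# beta_std = b * sd(x) / sd(y), applied to each predictor of m3
coef(m3)[-1] * sapply(d[, c("enroll", "meals", "full")], sd) / sd(d$api00)
# should reproduce -0.0819, -0.8210 and 0.1136 from the summary above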
In this section, we consider the interaction between two continuous predictors.
m4 <- lm(api00 ~ enroll + meals + enroll:meals , data = d)
summary(m4)
##
## Call:
## lm(formula = api00 ~ enroll + meals + enroll:meals, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -211.186 -38.834 -1.137 38.997 163.713
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.833e+02 1.665e+01 53.034 <2e-16 ***
## enroll 8.835e-04 3.362e-02 0.026 0.9790
## meals -3.425e+00 2.344e-01 -14.614 <2e-16 ***
## enroll:meals -9.537e-04 4.292e-04 -2.222 0.0269 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 59.85 on 396 degrees of freedom
## Multiple R-squared: 0.8243, Adjusted R-squared: 0.823
## F-statistic: 619.3 on 3 and 396 DF, p-value: < 2.2e-16
The intercept is interpreted as before: it is the expected api00 when both enroll and meals are zero.
The coefficient of enroll is interpreted as the increase in the mean of api00 by 0.0008835 for a one-unit increase in enrollment, given that meals is zero.
The coefficient of meals is interpreted as the decrease in the mean of api00 by 3.425 for a one-unit (one percentage point) increase in meals, given that enroll is zero.
Adding the interaction term means that the effect of each of enroll and meals is no longer constant; it depends on the value of the other predictor.
The interaction term can be interpreted in two ways.
If we use enroll as the moderator variable, then we can say that the effect of meals changes by −0.0009537 for each one-unit increase in enroll.
If we use meals as the moderator variable, then we can say that the effect of enroll changes by −0.0009537 for each one-unit increase in meals.
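For instance, we can compute the conditional slope of meals at a few values of enroll. A minimal sketch (the enrollment values 200 and 800 are arbitrary illustrations, not from the text):
b <- coef(m4)
# slope of meals when enroll = 200 and when enroll = 800
b["meals"] + b["enroll:meals"] * c(200, 800)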
Here, instead of two continuous variables, we consider the interaction between two categorical variables.
d$mealcat_F <- relevel(d$mealcat_F, ref = "3")
m6 <- lm(api00 ~ d$yr_rnd_F*mealcat_F, data = d)
summary(m6)
##
## Call:
## lm(formula = api00 ~ d$yr_rnd_F * mealcat_F, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -207.533 -50.764 -1.843 48.874 179.000
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 521.493 8.414 61.978 < 2e-16 ***
## d$yr_rnd_FYes -33.493 11.771 -2.845 0.00467 **
## mealcat_F1 288.193 10.443 27.597 < 2e-16 ***
## mealcat_F2 123.781 10.552 11.731 < 2e-16 ***
## d$yr_rnd_FYes:mealcat_F1 -40.764 29.231 -1.395 0.16394
## d$yr_rnd_FYes:mealcat_F2 -18.248 22.256 -0.820 0.41278
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 68.87 on 394 degrees of freedom
## Multiple R-squared: 0.7685, Adjusted R-squared: 0.7656
## F-statistic: 261.6 on 5 and 394 DF, p-value: < 2.2e-16
The intercept is interpreted as the expected api00 when both yr_rnd_F and mealcat_F are at their reference levels. Here, it is the expected api00 for a school that is not year-round and whose meals category is at level "3".
The coefficient of yr_rnd_FYes is the difference in expected api00 between year-round and non-year-round schools when the meals category is at level "3".
The coefficient of mealcat_F1 is the difference in expected api00 between mealcat_F at level "1" and mealcat_F at level "3", given the school is not year-round.
The coefficient of mealcat_F2 is the difference in expected api00 between mealcat_F at level "2" and mealcat_F at level "3", given the school is not year-round.
The interaction terms can be interpreted in two ways. Here we only interpret them in one way:
The coefficient of yr_rnd_FYes:mealcat_F1 is the difference in expected api00 between year-round and non-year-round schools when the meals category is "1", minus the same difference when the meals category is "3".
The coefficient of yr_rnd_FYes:mealcat_F2 is the difference in expected api00 between year-round and non-year-round schools when the meals category is "2", minus the same difference when the meals category is "3".
For example, the expected mean of api00 for a school that is year-round with mealcat equal to "2" is:
\[521.493+(-33.493)+123.781+(-18.248) = 593.533\tag{5.2.1}\]
The expected mean of api00 for a school that is not year-round with mealcat equal to "2" is:
\[521.493+123.781 = 645.274 \tag{5.2.2}\]
Equation (5.2.1) is the expected mean of api00 for a year-round school with mealcat = "2", and (5.2.2) is the expected mean of api00 for a non-year-round school with mealcat = "2". Subtracting (5.2.2) from (5.2.1), the expected difference in api00 between year-round and non-year-round schools when the meals category is "2" is:
\[593.533-645.274 = -51.741 \tag{5.2.3}\]
The corresponding difference when the meals category is at level "3" is the coefficient of yr_rnd_FYes, −33.493.
If we calculate the difference between these two differences we get: \[-51.741-(-33.493) = -18.248 \tag{5.2.4}\] which is the coefficient for yr_rnd_FYes:mealcat_F2.
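Because m6 has one parameter per cell of the 2 x 3 table of yr_rnd_F by mealcat_F, its fitted means equal the observed cell means, which gives a quick way to check the arithmetic above. A minimal sketch:
with(d, tapply(api00, list(yr_rnd_F, mealcat_F), mean))
# the year-round, mealcat "2" cell should be about 593.5 and
# the non-year-round, mealcat "2" cell about 645.3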
If we need to test the overall effect of the interaction (i.e., simultaneously test that both interaction coefficients are zero), we can use an F test for nested models via the anova function. To do so, we fit a model without the interaction terms and use anova to test the difference between the two models.
m0 <- lm(api00 ~ yr_rnd_F + mealcat_F , data = d)
anova(m0, m6)
## Analysis of Variance Table
##
## Model 1: api00 ~ yr_rnd_F + mealcat_F
## Model 2: api00 ~ d$yr_rnd_F * mealcat_F
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 396 1879528
## 2 394 1868944 2 10584 1.1156 0.3288
The F test shows that we do not have enough evidence to reject the simpler model m0; therefore, the effect of yr_rnd is unlikely to change across levels of mealcat.
Here we consider two predictors, one continuous and one categorical. Continuing the example above, we use enroll as the continuous predictor and yr_rnd_F as the categorical one. For their interaction, we simply multiply the two predictors in the linear model formula.
m7 <- lm(api00 ~ yr_rnd_F * enroll , data = d)
summary(m7)
##
## Call:
## lm(formula = api00 ~ yr_rnd_F * enroll, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -274.043 -94.781 0.417 97.666 309.573
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 682.068556 18.797436 36.285 <2e-16 ***
## yr_rnd_FYes -74.858236 48.224281 -1.552 0.1214
## enroll 0.006021 0.042396 0.142 0.8871
## yr_rnd_FYes:enroll -0.120218 0.072075 -1.668 0.0961 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 125 on 396 degrees of freedom
## Multiple R-squared: 0.2335, Adjusted R-squared: 0.2277
## F-statistic: 40.21 on 3 and 396 DF, p-value: < 2.2e-16
The intercept is interpreted as before: it is the expected api00 when enroll is zero and the school is not year-round.
The coefficient of enroll is interpreted as the increase in the mean of api00 by 0.006021 for a one-unit increase in enrollment, given the school is not year-round.
The coefficient of yr_rnd_FYes is interpreted as the difference in expected api00 between year-round and non-year-round schools, given that enroll is zero.
The interaction term is the change in the effect of enroll on the expected api00 when the school is year-round compared with when it is not.
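Putting numbers on this, the model implies a separate slope of enroll for each school type. A short sketch:
b7 <- coef(m7)
slope_no  <- unname(b7["enroll"])                             # 0.006021
slope_yes <- unname(b7["enroll"] + b7["yr_rnd_FYes:enroll"])  # about -0.1142
c(not_yr_rnd = slope_no, yr_rnd = slope_yes)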
Sometimes it is a good idea to plot the predicted values for different levels of the predictors to visualize the interaction. We use the predict function on new data to plot this interaction.
First, get the range of the continuous predictor:
range(d$enroll)
## [1] 130 1570
# make a sequence over the range of enroll (130 to 1570) in increments of 5
new.enroll <- seq(130, 1570, 5)
new.yr_rnd <- rep(levels(d$yr_rnd_F), each = length(new.enroll))
new.data.plot <- data.frame(enroll = rep(new.enroll, times = 2), yr_rnd_F = new.yr_rnd)
new.data.plot$predicted_api00 <- predict(m7, newdata = new.data.plot)
library(ggplot2)
ggplot(new.data.plot, aes(x = enroll, y = predicted_api00, colour = yr_rnd_F)) +
geom_line(lwd=1.2)
In the previous part, we learned how to fit ordinary linear regression models in R. Without verifying that the data meet the assumptions underlying OLS regression, the results of a regression analysis may be misleading. Here we will explore how you can use R to check how well your data meet the assumptions of OLS regression. In particular, we will consider the following assumptions.
Homogeneity of variance (homoscedasticity): The error variance should be constant
Linearity: the relationships between the predictors and the outcome variable should be linear.
Independence: The errors associated with one observation are not correlated with the errors of any other observation
Normality: the errors should be normally distributed. Technically normality is necessary only for hypothesis tests to be valid.
Model specification: The model should be properly specified (including all relevant variables, and excluding irrelevant variables)
Additionally, there are issues that can arise during the analysis that, while not strictly assumptions of regression, are nonetheless of great concern to data analysts.
Influence: individual observations that exert undue influence on the coefficients
Collinearity: predictors that are highly collinear, i.e., linearly related, can cause problems in estimating the regression coefficients.
Many graphical methods and numerical tests have been developed over the years for regression diagnostics.
R provides many of these methods in the stats package, which is installed and loaded by default. Other tools are available in additional packages that we can install and load into our R environment.
### Unusual and Influential Data
A single observation that is substantially different from all other observations can make a large difference in the results of your regression analysis. If a single observation (or small group of observations) substantially changes your results, you would want to know about this and investigate further. There are three ways that an observation can be unusual.
Outliers: In linear regression, an outlier is an observation with a large residual. In other words, it is an observation whose dependent-variable value is unusual given its values on the predictor variables. An outlier may indicate a sample peculiarity, a data entry error, or another problem.
Leverage: An observation with an extreme value on a predictor variable is called a point with high leverage. Leverage is a measure of how far an observation deviates from the mean of that variable. These leverage points can have an effect on the estimate of regression coefficients.
Influence: An observation is said to be influential if removing the observation substantially changes the estimate of coefficients. Influence can be thought of as the product of leverage and outlierness.
library(car)
## Warning: package 'car' was built under R version 4.3.2
## Loading required package: carData
## Warning: package 'carData' was built under R version 4.3.2
m9<-lm(api00~enroll+meals+full, data=d)
scatterplotMatrix(~api00 + enroll + meals+full, data=d)
The graphs of api00 against the other variables show some potential problems. In every plot, we see one or more data points far away from the rest.
Standardized residuals can be used to identify outliers. In R we use the rstandard() function to compute standardized (internally studentized) residuals.
res.std <- rstandard(m9)
plot(res.std, ylab="Standardized Residual", ylim=c(-3.5,3.5))
abline(h =c(-3,0,3),lty =2)
index <- which(res.std > 3 | res.std < -3)
text(index-20, res.std[index] , labels = d$snum[index])
print(index)
## 226
## 226
print(d$snum[index])
## [1] 211
We should pay attention to standardized residuals that exceed +2 or −2, become more concerned about residuals that exceed +2.5 or −2.5, and even more concerned about residuals that exceed +3 or −3. These results show that school number 211 is the most worrisome observation.
outlierTest(m9)
## No Studentized residuals with Bonferroni p < 0.05
## Largest |rstudent|:
## rstudent unadjusted p-value Bonferroni p
## 226 -3.151186 0.00175 0.70001
library(faraway)
## Warning: package 'faraway' was built under R version 4.3.2
##
## Attaching package: 'faraway'
## The following objects are masked from 'package:car':
##
## logit, vif
h <- influence(m9)$hat
halfnorm(h, ylab = "leverage")
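A common rule of thumb, stated here as an assumption rather than something from this text, flags observations whose leverage exceeds twice the average hat value, 2(p + 1)/n:
# hat values sum to the number of coefficients, so their average is length(coef(m9))/n
which(h > 2 * length(coef(m9)) / nrow(d))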
Cook’s distance is a measure of influence. A point with a large Cook’s distance is considered a highly influential point. A cutoff value for Cook’s distance can be calculated as \[\frac{4}{n-p-1}\] where n is the sample size and p is the number of predictors. We can plot Cook’s distance using the following code.
cutoff <- 4/(nrow(d) - length(m9$coefficients))  # 4/(n - p - 1) for m9, with p = 3 predictors
plot(m9, which = 4, cook.levels = cutoff)
We can use the influencePlot() function in the car package to identify influential points. It plots studentized residuals against leverage (hat values), with circle size proportional to Cook’s distance.
influencePlot(m9, main="Influence Plot",
sub="Circle size is proportional to Cook's Distance" )
## StudRes Hat CookD
## 8 0.18718812 0.08016299 7.652779e-04
## 93 2.76307269 0.02940688 5.687488e-02
## 210 0.03127861 0.06083329 1.588292e-05
## 226 -3.15118603 0.01417076 3.489753e-02
## 346 -2.83932062 0.00412967 8.211170e-03
infIndexPlot(m9)
### Checking Homoscedasticity
One of the main assumptions for the ordinary least squares regression is the homogeneity of variance of the residuals. If the model is well-fitted, there should be no pattern to the residuals plotted against the fitted values. If the variance of the residuals is non-constant then the residual variance is said to be “heteroscedastic.” There are graphical and non-graphical methods for detecting heteroscedasticity. A commonly used graphical method is to plot the residuals versus fitted (predicted) values.
plot(resid(m9) ~ fitted(m9))
abline(h = 0, lty = 2)
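A commonly used non-graphical check is the score test for non-constant error variance, available as ncvTest() in the car package loaded above; a small p-value would suggest heteroscedasticity. A minimal sketch:
ncvTest(m9)  # score test of constant variance against fitted values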
### Checking Linearity
To check linearity, residuals should be plotted against the fitted values as well as against the other predictors. If any of these plots show systematic shapes, then the linear model is not appropriate and some nonlinear terms may need to be added.
residualPlots(m9)
## Test stat Pr(>|Test stat|)
## enroll 0.0111 0.9911
## meals -0.6238 0.5331
## full 1.1565 0.2482
## Tukey test -0.8411 0.4003
### Checking Independence
The errors should be uncorrelated across observations. We do not have a time variable in these data, so as a rough visual check we plot the residuals against the school number.
plot(resid(m9) ~ d$snum)
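If the row order reflected a genuine sequence (e.g., time of data collection), a numerical check such as the Durbin-Watson test for autocorrelated errors could complement the plot; the car package provides one. A sketch, meaningful only under that ordering assumption:
durbinWatsonTest(m9)  # tests for first-order autocorrelation in the residuals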
### Checking Normality of Residuals
Normality of residuals is only required for valid hypothesis testing, that is, the normality assumption assures that the p-values for the t-tests and F-test will be valid. Normality is not required in order to obtain unbiased estimates of the regression coefficients.
OLS regression merely requires that the residuals (errors) be identically and independently distributed. Furthermore, there is no assumption or requirement that the predictor variables be normally distributed. If that were the case, then we would not be able to use dummy-coded variables in our models.
Because of large-sample theory, if we have a large enough sample size we do not even need the residuals to be normally distributed. For small sample sizes, however, normality is required for valid inference.
qqnorm(resid(m9))
qqline(resid(m9))
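A numerical complement to the Q-Q plot is the Shapiro-Wilk test on the residuals, sketched below; note that with n = 400 even trivial departures from normality can produce small p-values, so it should not replace the plot:
shapiro.test(resid(m9))  # Shapiro-Wilk normality test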
### Checking Collinearity
When there is a perfect linear relationship among the predictors, the estimates for a regression model cannot be uniquely computed. The term collinearity implies that two variables are near-perfect linear combinations of one another. When more than two variables are involved, this is often called multicollinearity, although the two terms are often used interchangeably.
The primary concern is that as the degree of multicollinearity increases, the regression model estimates of the coefficients become unstable and the standard errors for the coefficients can get wildly inflated.
VIF, the variance inflation factor, is used to measure the degree of multicollinearity. As a rule of thumb, a variable whose VIF value is greater than 10 may merit further investigation. Tolerance, defined as 1/VIF, is used by many researchers to check the degree of collinearity. A tolerance value lower than 0.1 is comparable to a VIF of 10, and means that the variable could be considered a linear combination of the other independent variables.
car::vif(m9)
## enroll meals full
## 1.135733 1.394279 1.482305
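Since tolerance is simply the reciprocal of VIF, it can be computed directly; a minimal sketch:
1 / car::vif(m9)  # tolerances; values below 0.1 would be worrisome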
### Checking Model Specification
A model specification error can occur when one or more relevant variables are omitted from the model, or when one or more irrelevant variables are included in the model.
If relevant variables are omitted from the model, the common variance they share with the included variables may be wrongly attributed to those included variables, and the error term is inflated. On the other hand, if irrelevant variables are included in the model, the common variance they share with the included variables may be wrongly attributed to them. Model specification errors can substantially affect the estimates of the regression coefficients.
avPlots(m9)