BLOG1: Since I used to code in Base SAS, I wanted to
connect to SAS online and follow a tutorial to fit a logistic
regression in SAS. It was straightforward: the interface was
easy to use, and the tabular output of the various statistics
was easy to read and follow.
Tutorial -
https://stats.oarc.ucla.edu/sas/dae/logit-regression/
[Screenshot: My console (Program Summary - Program 1; Code: Program 1)]
Here we have connected to SAS Studio and read in the dataset,
which was downloaded from the tutorial link above and uploaded to the SAS
online server (screenshot above). The dataset has 400 records and a binary
response variable called admit (1 if admitted to grad school, 0 if not
admitted). The three predictor variables are gre, gpa and rank. Institutions
with a rank of 1 have the highest prestige, while those with a rank of 4 have
the lowest.
We first report descriptive statistics with PROC MEANS and PROC FREQ.
proc means data="/home/u1102326/sasuser.v94/data/binary.sas7bdat";
  var gre gpa;
run;
The MEANS Procedure
| Variable | N | Mean | Std Dev | Minimum | Maximum |
|---|---|---|---|---|---|
| GRE | 400 | 587.7000000 | 115.5165364 | 220.0000000 | 800.0000000 |
| GPA | 400 | 3.3899000 | 0.3805668 | 2.2600000 | 4.0000000 |
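Every step in this post refers to the dataset by its full physical path. As a minimal sketch of an alternative, assuming the same directory as above, a libref can point at the folder once so that later steps use a two-level name. The libref name `blog` is hypothetical and is not part of the original program.

```sas
/* Hypothetical libref "blog"; the directory is the one used in this post. */
libname blog "/home/u1102326/sasuser.v94/data";

/* List the variables and their types before analysing them. */
proc contents data=blog.binary;
run;

/* Same descriptive statistics as above, requested through the libref. */
proc means data=blog.binary;
  var gre gpa;
run;
```

Either form should work; the quoted path keeps each step self-contained, while the libref avoids repeating the path.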
proc freq data="/home/u1102326/sasuser.v94/data/binary.sas7bdat";
  tables rank admit admit*rank;
run;
The FREQ Procedure
| RANK | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| 1 | 61 | 15.25 | 61 | 15.25 |
| 2 | 151 | 37.75 | 212 | 53.00 |
| 3 | 121 | 30.25 | 333 | 83.25 |
| 4 | 67 | 16.75 | 400 | 100.00 |
| ADMIT | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| 0 | 273 | 68.25 | 273 | 68.25 |
| 1 | 127 | 31.75 | 400 | 100.00 |
proc logistic data="/home/u1102326/sasuser.v94/data/binary.sas7bdat" descending;
  class rank / param=ref;
  model admit = gre gpa rank;
run;
I have used a logistic model. To model the 1s rather than the 0s,
we can use the descending option. The class statement specifies that rank
is a categorical variable; with param=ref, the last level (rank=4) serves as
the reference category. SAS has modeled the probability of admit = 1.
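A minimal sketch of an equivalent way to write the same model, assuming the dataset path used in this post. The event='1' response option names the modeled level directly instead of relying on descending, and ref='4' spells out the reference level that param=ref otherwise takes by default. This is an alternative spelling, not the code that produced the output in this post.

```sas
proc logistic data="/home/u1102326/sasuser.v94/data/binary.sas7bdat";
  /* State the reference level for rank explicitly. */
  class rank(ref='4') / param=ref;
  /* Model P(admit = 1) without the DESCENDING option. */
  model admit(event='1') = gre gpa rank;
run;
```

Being explicit about the event and the reference level makes the code easier to read when the output is revisited later.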
The LOGISTIC Procedure
Model Information
| Data Set | /home/u1102326/sasuser.v94/data/binary.sas7bdat | Written by SAS |
| Response Variable | ADMIT | |
| Number of Response Levels | 2 | |
| Model | binary logit | |
| Optimization Technique | Fisher's scoring | |
| Number of Observations Read | 400 |
| Number of Observations Used | 400 |
Response Profile
| Ordered Value | ADMIT | Total Frequency |
|---|---|---|
| 1 | 1 | 127 |
| 2 | 0 | 273 |
Probability modeled is ADMIT=1.
Class Level Information
| Class | Value | Design Variable 1 | Design Variable 2 | Design Variable 3 |
|---|---|---|---|---|
| RANK | 1 | 1 | 0 | 0 |
| | 2 | 0 | 1 | 0 |
| | 3 | 0 | 0 | 1 |
| | 4 | 0 | 0 | 0 |
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
In the output section below, SAS reports a likelihood ratio
chi-square of 41.4590 with 5 degrees of freedom and a p-value of <.0001, which
indicates that the model as a whole is statistically significant compared
with the intercept-only model. SAS also performs the Score and Wald tests
automatically; they give similar results and are also statistically significant.
Model Fit Statistics
| Criterion | Intercept Only | Intercept and Covariates |
|---|---|---|
| AIC | 501.977 | 470.517 |
| SC | 505.968 | 494.466 |
| -2 Log L | 499.977 | 458.517 |
Testing Global Null Hypothesis: BETA=0
| Test | Chi-Square | DF | Pr > ChiSq |
|---|---|---|---|
| Likelihood Ratio | 41.4590 | 5 | <.0001 |
| Score | 40.1603 | 5 | <.0001 |
| Wald | 36.1390 | 5 | <.0001 |
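The numbers in the two tables above are linked by simple arithmetic, which can be checked in a DATA step. This is a sketch for illustration, not part of the original program. It assumes the usual definitions AIC = -2 Log L + 2k and SC = -2 Log L + k log(n), with k = 6 estimated parameters (intercept, gre, gpa and three rank terms) and n = 400 observations. All variable names are made up for the example.

```sas
data _null_;
  n = 400;                 /* observations used */
  k = 6;                   /* intercept, gre, gpa, three rank terms */
  neg2ll_null = 499.977;   /* -2 Log L, intercept only */
  neg2ll_full = 458.517;   /* -2 Log L, intercept and covariates */

  /* The likelihood ratio chi-square is the drop in -2 Log L. */
  lr_chisq = neg2ll_null - neg2ll_full;
  /* Upper-tail chi-square probability with k - 1 = 5 degrees of freedom. */
  p_value = 1 - probchi(lr_chisq, k - 1);

  /* Information criteria for the fitted model. */
  aic_full = neg2ll_full + 2 * k;
  sc_full  = neg2ll_full + k * log(n);

  put lr_chisq= p_value= aic_full= sc_full=;
run;
```

The log should show a likelihood ratio chi-square of about 41.46, a p-value far below 0.0001, and AIC and SC values that match the table up to rounding.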
The Type 3 Analysis of Effects table shows the hypothesis test for
each variable in the model. gre, gpa and rank are each statistically
significant at the 0.05 level; the test for rank is an overall test with
3 degrees of freedom.
Type 3 Analysis of Effects
| Effect | DF | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|
| GRE | 1 | 4.2842 | 0.0385 |
| GPA | 1 | 5.8714 | 0.0154 |
| RANK | 3 | 20.8949 | 0.0001 |
The table below shows the coefficients (labeled Estimate),
their standard errors, the Wald Chi-Square statistics, and the associated
p-values.
The coefficients for gre and gpa and the terms for rank=1 and rank=2 (versus
the omitted category rank=4) are statistically significant; the term for
rank=3 is not (p = 0.5908). Each coefficient gives the change in the log odds
of the outcome for a one unit increase in the predictor variable.
1. For every one unit increase in gre, the log odds of admission (versus
non-admission) increases by 0.002.
2. For a one unit increase in gpa, the log odds of being admitted to graduate
school increases by 0.804.
3. The coefficients for the categories of rank are interpreted differently. If
you attended an undergraduate institution with a rank of 1, rather than one
with a rank of 4, your log odds of admission increases by 1.55.
Analysis of Maximum Likelihood Estimates
| Parameter | Level | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|---|
| Intercept | | 1 | -5.5414 | 1.1381 | 23.7081 | <.0001 |
| GRE | | 1 | 0.00226 | 0.00109 | 4.2842 | 0.0385 |
| GPA | | 1 | 0.8040 | 0.3318 | 5.8714 | 0.0154 |
| RANK | 1 | 1 | 1.5514 | 0.4178 | 13.7870 | 0.0002 |
| RANK | 2 | 1 | 0.8760 | 0.3667 | 5.7056 | 0.0169 |
| RANK | 3 | 1 | 0.2112 | 0.3929 | 0.2891 | 0.5908 |
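For a single parameter, the Wald Chi-Square in the table above is the squared ratio of the estimate to its standard error, compared against a chi-square distribution with 1 degree of freedom. The sketch below checks this for the GPA row. It uses the rounded values printed in the table, so the result should agree only approximately, and the variable names are made up for the example.

```sas
data _null_;
  /* GPA row of the Analysis of Maximum Likelihood Estimates table */
  estimate = 0.8040;
  std_err  = 0.3318;

  /* Wald chi-square: (estimate / standard error) squared */
  wald = (estimate / std_err)**2;
  /* Upper-tail probability with 1 degree of freedom */
  p_value = 1 - probchi(wald, 1);

  put wald= p_value=;
run;
```

The result should be close to the reported 5.8714 and 0.0154; small differences come from rounding in the printed estimate and standard error.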
An odds ratio, as reported below, is the exponentiated
coefficient and can be interpreted as the multiplicative change in the odds
for a one unit increase in the predictor variable. For each one unit increase
in gpa, the odds of being admitted to graduate school (versus not being
admitted) increase by a factor of 2.24. The odds ratio is computed by raising
e to the power of the logistic coefficient from the table above:
EXP(0.804) = 2.24.
Odds Ratio Estimates
| Effect | Point Estimate | 95% Wald Confidence Limits (Lower) | 95% Wald Confidence Limits (Upper) |
|---|---|---|---|
| GRE | 1.002 | 1.000 | 1.004 |
| GPA | 2.235 | 1.166 | 4.282 |
| RANK 1 vs 4 | 4.718 | 2.080 | 10.701 |
| RANK 2 vs 4 | 2.401 | 1.170 | 4.927 |
| RANK 3 vs 4 | 1.235 | 0.572 | 2.668 |
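The GPA row of the odds ratio table can be reproduced by hand from the coefficient and its standard error. This sketch assumes the Wald limits are computed as exp(estimate ± z × standard error) with z taken from the standard normal distribution; the variable names are made up for the example.

```sas
data _null_;
  /* GPA coefficient and standard error from the estimates table */
  estimate = 0.8040;
  std_err  = 0.3318;

  /* Normal quantile for a two-sided 95% interval (about 1.96) */
  z = probit(0.975);

  /* Exponentiate the coefficient and the interval endpoints */
  odds_ratio = exp(estimate);
  lower = exp(estimate - z * std_err);
  upper = exp(estimate + z * std_err);

  put odds_ratio= lower= upper=;
run;
```

The three values should land close to 2.235, 1.166 and 4.282. Swapping in another row's estimate and standard error checks that row in the same way.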
Association of Predicted Probabilities and Observed Responses
| Percent Concordant | 69.3 | Somers' D | 0.386 |
| Percent Discordant | 30.7 | Gamma | 0.386 |
| Percent Tied | 0.0 | Tau-a | 0.168 |
| Pairs | 34671 | c | 0.693 |
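Several entries in the association table follow from the others. As a rough sketch, assuming the usual definitions: the number of pairs is admitted times non-admitted observations, c is the concordant share plus half the tied share, and Somers' D is the concordant share minus the discordant share. Variable names are made up for the example.

```sas
data _null_;
  n_admit = 127;          /* observations with admit = 1 */
  n_not   = 273;          /* observations with admit = 0 */
  pct_conc = 69.3;
  pct_disc = 30.7;
  pct_tied = 0.0;

  /* Each admitted observation is paired with each non-admitted one. */
  pairs = n_admit * n_not;

  /* c statistic and Somers' D from the percentages */
  c = (pct_conc + 0.5 * pct_tied) / 100;
  somers_d = (pct_conc - pct_disc) / 100;

  put pairs= c= somers_d=;
run;
```

This should give 34671 pairs, c of 0.693 and Somers' D of 0.386, matching the table.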
To test the difference between the coefficients for rank=2 and rank=3,
that is, to compare the odds of admission for students who attended a rank 2
institution with those who attended a rank 3 institution, we can use the
contrast statement below. The estimate=parm option reports the difference in
coefficients. The p-value (0.0190) shows that the difference between rank 2
and rank 3 is statistically significant. The difference is 0.6648, indicating
that having attended an undergraduate institution with a rank of 2, versus one
with a rank of 3, increases the log odds of admission by 0.66.
proc logistic data="/home/u1102326/sasuser.v94/data/binary.sas7bdat" descending;
  class rank / param=ref;
  model admit = gre gpa rank;
  contrast 'rank 2 vs 3' rank 0 1 -1 / estimate=parm;
run;
Contrast Test Results
| Contrast | DF | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|
| rank 2 vs 3 | 1 | 5.5052 | 0.0190 |
Contrast Estimation and Testing Results by Row
| Contrast | Type | Row | Estimate | Standard Error | Alpha | Confidence Limits (Lower) | Confidence Limits (Upper) | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|---|---|---|---|
| rank 2 vs 3 | PARM | 1 | 0.6648 | 0.2833 | 0.05 | 0.1095 | 1.2200 | 5.5052 | 0.0190 |
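The contrast estimate above is the rank=2 coefficient minus the rank=3 coefficient from the earlier estimates table, and exponentiating it gives the odds ratio for rank 2 versus rank 3. A short sketch of that arithmetic, using the rounded printed coefficients and made-up variable names:

```sas
data _null_;
  b_rank2 = 0.8760;   /* rank=2 vs rank=4 coefficient */
  b_rank3 = 0.2112;   /* rank=3 vs rank=4 coefficient */

  /* Difference in log odds between rank 2 and rank 3 */
  diff = b_rank2 - b_rank3;

  /* The same comparison expressed as an odds ratio */
  odds_ratio = exp(diff);

  put diff= odds_ratio=;
run;
```

The difference comes out to 0.6648, as in the contrast table. The odds ratio is not shown in the output above; it is simply the exponentiated difference.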
We can also use the contrast statement to estimate predicted probabilities
at different GRE scores while holding gpa at its mean value of 3.3899 and rank
at 2. The code below fits the logit model and requests these probabilities
with estimate=prob.
As the Estimate column of the second table below shows, the predicted
probability of being admitted is 0.18 if the GRE score is 200 and rises to
0.47 if the score is 800, when gpa is held at its mean and rank at 2.
proc logistic data="/home/u1102326/sasuser.v94/data/binary.sas7bdat" descending;
  class rank / param=ref;
  model admit = gre gpa rank;
  contrast 'gre=200' intercept 1 gre 200 gpa 3.3899 rank 0 1 0 / estimate=prob;
  contrast 'gre=300' intercept 1 gre 300 gpa 3.3899 rank 0 1 0 / estimate=prob;
  contrast 'gre=400' intercept 1 gre 400 gpa 3.3899 rank 0 1 0 / estimate=prob;
  contrast 'gre=500' intercept 1 gre 500 gpa 3.3899 rank 0 1 0 / estimate=prob;
  contrast 'gre=600' intercept 1 gre 600 gpa 3.3899 rank 0 1 0 / estimate=prob;
  contrast 'gre=700' intercept 1 gre 700 gpa 3.3899 rank 0 1 0 / estimate=prob;
  contrast 'gre=800' intercept 1 gre 800 gpa 3.3899 rank 0 1 0 / estimate=prob;
run;
The LOGISTIC Procedure
Contrast Test Results
| Contrast | DF | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|
| gre=200 | 1 | 9.7752 | 0.0018 |
| gre=300 | 1 | 11.2483 | 0.0008 |
| gre=400 | 1 | 13.3231 | 0.0003 |
| gre=500 | 1 | 15.0984 | 0.0001 |
| gre=600 | 1 | 11.2291 | 0.0008 |
| gre=700 | 1 | 3.0769 | 0.0794 |
| gre=800 | 1 | 0.2175 | 0.6409 |
Contrast Estimation and Testing Results by Row
| Contrast | Type | Row | Estimate | Standard Error | Alpha | Confidence Limits (Lower) | Confidence Limits (Upper) | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|---|---|---|---|
| gre=200 | PROB | 1 | 0.1844 | 0.0715 | 0.05 | 0.0817 | 0.3648 | 9.7752 | 0.0018 |
| gre=300 | PROB | 1 | 0.2209 | 0.0647 | 0.05 | 0.1195 | 0.3719 | 11.2483 | 0.0008 |
| gre=400 | PROB | 1 | 0.2623 | 0.0548 | 0.05 | 0.1695 | 0.3825 | 13.3231 | 0.0003 |
| gre=500 | PROB | 1 | 0.3084 | 0.0443 | 0.05 | 0.2288 | 0.4013 | 15.0984 | 0.0001 |
| gre=600 | PROB | 1 | 0.3587 | 0.0399 | 0.05 | 0.2847 | 0.4400 | 11.2291 | 0.0008 |
| gre=700 | PROB | 1 | 0.4122 | 0.0490 | 0.05 | 0.3206 | 0.5104 | 3.0769 | 0.0794 |
| gre=800 | PROB | 1 | 0.4680 | 0.0685 | 0.05 | 0.3391 | 0.6013 | 0.2175 | 0.6409 |
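The probabilities in the Estimate column can be approximated by hand from the coefficients reported earlier: build the linear predictor for gpa = 3.3899 and rank = 2, then apply the inverse logit. The sketch below uses the rounded printed coefficients, so it should agree with the table only to about two or three decimal places. The variable names are made up for the example.

```sas
data _null_;
  /* Rounded coefficients from the Analysis of Maximum Likelihood Estimates */
  b0      = -5.5414;
  b_gre   = 0.00226;
  b_gpa   = 0.8040;
  b_rank2 = 0.8760;   /* rank=2 design variable equals 1 */
  gpa     = 3.3899;   /* mean gpa, held constant */

  do gre = 200 to 800 by 100;
    /* Linear predictor (log odds) for this GRE score */
    xb = b0 + b_gre * gre + b_gpa * gpa + b_rank2;
    /* Inverse logit turns log odds into a probability */
    prob = 1 / (1 + exp(-xb));
    put gre= prob= 6.4;
  end;
run;
```

The log should show probabilities rising from roughly 0.18 at gre=200 to roughly 0.47 at gre=800, in line with the PROB estimates above.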