BLOG1: Since I used to code in Base SAS, I wanted to connect to SAS online and follow a tutorial to fit a logistic regression model in SAS. It was extremely easy: I found the interface very easy to use, and the tabular output of the various statistics was easy to read and follow.

Tutorial - https://stats.oarc.ucla.edu/sas/dae/logit-regression/

Program Summary - Program 1

Code: Program 1

My console


 

Here we have connected to SAS Studio and read in the dataset, which we downloaded from the tutorial link above and uploaded to the SAS online server (screenshot above). The dataset has 400 records and a binary response variable called admit (1 if admitted to graduate school and 0 if not admitted). The three predictor variables are gre, gpa and rank. rank takes the values 1 to 4: institutions with a rank of 1 have the highest prestige, while those with a rank of 4 have the lowest.

We first report descriptive statistics with PROC MEANS and PROC FREQ.

/* summary statistics for the two continuous predictors */
proc means data="/home/u1102326/sasuser.v94/data/binary.sas7bdat";
  var gre gpa;
run;

 

The MEANS Procedure

Variable   N     Mean          Std Dev       Minimum       Maximum
GRE        400   587.7000000   115.5165364   220.0000000   800.0000000
GPA        400   3.3899000     0.3805668     2.2600000     4.0000000

/* one-way tables for rank and admit, plus the admit-by-rank crosstab */
proc freq data="/home/u1102326/sasuser.v94/data/binary.sas7bdat";
  tables rank admit admit*rank;
run;

 


The FREQ Procedure

RANK   Frequency   Percent   Cumulative Frequency   Cumulative Percent
1      61          15.25     61                     15.25
2      151         37.75     212                    53.00
3      121         30.25     333                    83.25
4      67          16.75     400                    100.00

 

ADMIT   Frequency   Percent   Cumulative Frequency   Cumulative Percent
0       273         68.25     273                    68.25
1       127         31.75     400                    100.00

 

Table of ADMIT by RANK
(each cell lists Frequency, Percent, Row Pct and Col Pct)

ADMIT                RANK 1   RANK 2   RANK 3   RANK 4    Total
0      Frequency         28       97       93       55      273
       Percent         7.00    24.25    23.25    13.75    68.25
       Row Pct        10.26    35.53    34.07    20.15
       Col Pct        45.90    64.24    76.86    82.09
1      Frequency         33       54       28       12      127
       Percent         8.25    13.50     7.00     3.00    31.75
       Row Pct        25.98    42.52    22.05     9.45
       Col Pct        54.10    35.76    23.14    17.91
Total  Frequency         61      151      121       67      400
       Percent        15.25    37.75    30.25    16.75   100.00

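proc logistic data="/home/u1102326/sasuser.v94/data/binary.sas7bdat" descending;
  /* rank is categorical; param=ref uses reference coding with the last level (rank=4) as reference */
  class rank / param=ref;
  model admit = gre gpa rank;
run;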

 

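I fitted a logistic model with PROC LOGISTIC. To model the probability of 1s rather than 0s, we use the DESCENDING option. The CLASS statement specifies that rank is a categorical variable, and param=ref requests reference coding, so rank=4 serves as the reference category. As the output confirms, SAS models the probability that admit = 1.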
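For reference, the model being fitted can be written out as below. The notation (p for Pr(admit = 1), β for the coefficients, 1{·} for an indicator variable) is ours, not SAS output. With param=ref, rank=4 has all three design variables equal to 0, so it is the baseline absorbed into the intercept β0.

```latex
\ln\frac{p}{1-p}
  = \beta_0 + \beta_1\,\mathrm{gre} + \beta_2\,\mathrm{gpa}
  + \beta_3\,\mathbf{1}\{\mathrm{rank}=1\}
  + \beta_4\,\mathbf{1}\{\mathrm{rank}=2\}
  + \beta_5\,\mathbf{1}\{\mathrm{rank}=3\}
```

There are six parameters in total; the global tests compare this model with an intercept-only model, so they have 6 - 1 = 5 degrees of freedom.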

 

The LOGISTIC Procedure

Model Information
Data Set                    /home/u1102326/sasuser.v94/data/binary.sas7bdat   Written by SAS
Response Variable           ADMIT
Number of Response Levels   2
Model                       binary logit
Optimization Technique      Fisher's scoring

Number of Observations Read   400
Number of Observations Used   400

Response Profile
Ordered Value   ADMIT   Total Frequency
1               1       127
2               0       273

Probability modeled is ADMIT=1.

Class Level Information
Class   Value   Design Variables
RANK    1       1   0   0
        2       0   1   0
        3       0   0   1
        4       0   0   0

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

 

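In the output below, the likelihood ratio chi-square is 41.4590 with 5 degrees of freedom and a p-value below 0.0001, so the model as a whole fits significantly better than an intercept-only model. SAS also performs the Score and Wald tests automatically; they test the same null hypothesis (that all coefficients other than the intercept are zero), give similar results, and are also statistically significant.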

 

Model Fit Statistics
Criterion   Intercept Only   Intercept and Covariates
AIC         501.977          470.517
SC          505.968          494.466
-2 Log L    499.977          458.517

Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio   41.4590      5    <.0001
Score              40.1603      5    <.0001
Wald               36.1390      5    <.0001
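As a quick consistency check, the numbers in these two tables can be reproduced by hand. The sketch below uses the standard definitions AIC = -2 Log L + 2k and SC = -2 Log L + k ln(n), where k is the number of estimated parameters (6 for the full model: intercept, gre, gpa and three rank terms; 1 for the intercept-only model) and n = 400. The likelihood ratio statistic is the drop in -2 Log L; it matches 41.4590 up to the rounding of the printed values.

```latex
\begin{aligned}
G^2 &= 499.977 - 458.517 = 41.460, \qquad df = 6 - 1 = 5\\
\mathrm{AIC}_{\text{full}} &= 458.517 + 2(6) = 470.517\\
\mathrm{SC}_{\text{full}} &= 458.517 + 6\ln 400 \approx 458.517 + 35.949 = 494.466\\
\mathrm{AIC}_{\text{int}} &= 499.977 + 2(1) = 501.977, \qquad
\mathrm{SC}_{\text{int}} = 499.977 + \ln 400 \approx 505.968
\end{aligned}
```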

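The Type 3 Analysis of Effects table tests the overall effect of each variable, given the other variables in the model. All three effects are significant at the 0.05 level; the test for rank has 3 degrees of freedom because it jointly tests the three rank coefficients.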

 

Type 3 Analysis of Effects
Effect   DF   Wald Chi-Square   Pr > ChiSq
GRE      1    4.2842            0.0385
GPA      1    5.8714            0.0154
RANK     3    20.8949           0.0001

The table below shows the coefficients (labeled Estimate), their standard errors (labeled Standard Error), the Wald chi-square statistics, and the associated p-values.

The coefficients for gre and gpa, and the terms for rank=1 and rank=2 (versus the omitted reference category rank=4), are statistically significant; the rank=3 term is not (p = 0.5908). These coefficients give the change in the log odds of the outcome for a one-unit increase in the predictor variable, holding the other predictors constant.

1.    For every one-unit increase in gre, the log odds of admission (versus non-admission) increase by 0.00226 (about 0.002).

2.    For a one-unit increase in gpa, the log odds of being admitted to graduate school increase by 0.804.

3.    The coefficients for the categories of rank are interpreted differently, as comparisons with the reference category. Having attended an undergraduate institution with a rank of 1, rather than one with a rank of 4, increases the log odds of admission by 1.55.
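Because gre is measured in single points, a one-point change is tiny, which is why its coefficient looks so small. A worked sketch rescaling it to a 100-point difference, using the gre estimate of 0.00226 from the table below and holding gpa and rank constant:

```latex
\Delta(\text{log odds}) = 100 \times 0.00226 = 0.226,
\qquad
e^{0.226} \approx 1.25
```

So a GRE score 100 points higher multiplies the odds of admission by roughly 1.25. PROC LOGISTIC's UNITS statement (for example, units gre=100;) can report odds ratios for such a rescaled unit directly; it is not part of the program above.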

 

 

Analysis of Maximum Likelihood Estimates
Parameter      DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept      1    -5.5414    1.1381           23.7081           <.0001
GRE            1    0.00226    0.00109          4.2842            0.0385
GPA            1    0.8040     0.3318           5.8714            0.0154
RANK 1         1    1.5514     0.4178           13.7870           0.0002
RANK 2         1    0.8760     0.3667           5.7056            0.0169
RANK 3         1    0.2112     0.3929           0.2891            0.5908
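Putting the estimates together gives the fitted log-odds equation below (notation ours; rank=4 is the reference level and adds nothing beyond the intercept). Each Wald chi-square in the table is the squared ratio of the estimate to its standard error, shown here for gpa.

```latex
\begin{aligned}
\ln\frac{\hat p}{1-\hat p} &= -5.5414 + 0.00226\,\mathrm{gre} + 0.8040\,\mathrm{gpa}
  + 1.5514\,\mathbf{1}\{\mathrm{rank}=1\}
  + 0.8760\,\mathbf{1}\{\mathrm{rank}=2\}
  + 0.2112\,\mathbf{1}\{\mathrm{rank}=3\}\\
\chi^2_{\mathrm{Wald}}(\mathrm{gpa}) &= \left(\frac{0.8040}{0.3318}\right)^2 \approx 5.87
\end{aligned}
```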

 

An odds ratio, as reported below, is the exponentiated coefficient and can be interpreted as the multiplicative change in the odds for a one-unit change in the predictor variable, holding the other predictors constant. It is computed by raising e to the power of the logistic coefficient: for gpa, exp(0.804) ≈ 2.23 (2.235 in the table below, which uses the unrounded coefficient). So for each one-unit increase in gpa, the odds of being admitted to graduate school (versus not being admitted) increase by a factor of about 2.23.

 

Odds Ratio Estimates
Effect        Point Estimate   95% Wald Confidence Limits
GRE           1.002            1.000     1.004
GPA           2.235            1.166     4.282
RANK 1 vs 4   4.718            2.080     10.701
RANK 2 vs 4   2.401            1.170     4.927
RANK 3 vs 4   1.235            0.572     2.668
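The odds ratios and their confidence limits follow directly from the estimates and standard errors in the previous table: the point estimate is exp(β) and the 95% Wald limits are exp(β ± 1.96 × SE). A worked sketch for gpa and for rank 1 vs 4, using the rounded printed coefficients, so the last digit can differ slightly from the SAS output:

```latex
\begin{aligned}
\mathrm{OR}_{\mathrm{gpa}} &= e^{0.8040} \approx 2.234\\
95\%\ \mathrm{CI}_{\mathrm{gpa}} &= \exp(0.8040 \pm 1.96 \times 0.3318)
  = \left(e^{0.1537},\ e^{1.4543}\right) \approx (1.166,\ 4.282)\\
\mathrm{OR}_{\mathrm{rank\,1\ vs\ 4}} &= e^{1.5514} \approx 4.718
\end{aligned}
```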

 

Association of Predicted Probabilities and Observed Responses
Percent Concordant   69.3    Somers' D   0.386
Percent Discordant   30.7    Gamma       0.386
Percent Tied         0.0     Tau-a       0.168
Pairs                34671   c           0.693
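The entries in this table are linked. Following the definitions in the PROC LOGISTIC documentation, each pair consists of one admitted and one non-admitted student; a pair is concordant when the admitted student has the higher predicted probability, discordant when the reverse holds, and tied otherwise. Writing n_c, n_d and n_t for the numbers of concordant, discordant and tied pairs, t for the total number of pairs and N = 400 for the number of observations (notation ours), with the percentages expressed as proportions of t:

```latex
\begin{aligned}
t &= 127 \times 273 = 34671\\
c &= \frac{n_c + 0.5\,n_t}{t} = 0.693 + 0.5(0) = 0.693\\
D_{\mathrm{Somers}} &= \frac{n_c - n_d}{t} = 0.693 - 0.307 = 0.386\\
\gamma &= \frac{n_c - n_d}{n_c + n_d} = 0.386 \quad \text{(equal to Somers' D here because there are no ties)}\\
\tau_a &= \frac{n_c - n_d}{N(N-1)/2} = \frac{0.386 \times 34671}{79800} \approx 0.168
\end{aligned}
```

The c statistic equals the area under the ROC curve, so 0.693 means the model ranks a randomly chosen admitted student above a randomly chosen non-admitted student about 69% of the time.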

 

To test the difference between the coefficients for rank=2 and rank=3, that is, to compare the odds of admission for students who attended a rank 2 institution with those of students who attended a rank 3 institution, we can use the CONTRAST statement below. The values 0 1 -1 apply to the three rank design variables (rank=1, 2 and 3), so the contrast is the rank=2 coefficient minus the rank=3 coefficient, and ESTIMATE=PARM reports this difference on the log-odds scale. The p-value (0.0190) shows that rank 2 and rank 3 differ significantly. The difference is 0.6648, indicating that having attended an undergraduate institution with a rank of 2, versus one with a rank of 3, increases the log odds of admission by about 0.66.

 

proc logistic data="/home/u1102326/sasuser.v94/data/binary.sas7bdat" descending;
  class rank / param=ref;
  model admit = gre gpa rank;
  /* 0 1 -1 on the rank=1, 2, 3 design variables: coefficient(rank 2) - coefficient(rank 3) */
  contrast 'rank 2 vs 3' rank 0 1 -1 / estimate=parm;
run;

Contrast Test Results
Contrast      DF   Wald Chi-Square   Pr > ChiSq
rank 2 vs 3   1    5.5052            0.0190

Contrast Estimation and Testing Results by Row
Contrast      Type   Row   Estimate   Standard Error   Alpha   Confidence Limits   Wald Chi-Square   Pr > ChiSq
rank 2 vs 3   PARM   1     0.6648     0.2833           0.05    0.1095   1.2200     5.5052            0.0190
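The contrast output can be reproduced from the coefficient table: the estimate is simply the difference between the rank=2 and rank=3 coefficients, and its exponent is the rank 2 vs 3 odds ratio, which also equals the ratio of the two odds ratios against rank 4. The standard error (0.2833) cannot be obtained from the two individual standard errors alone, because it also depends on their covariance, so it is taken from the SAS output here; small last-digit differences come from rounding.

```latex
\begin{aligned}
\hat\beta_{\mathrm{rank}=2} - \hat\beta_{\mathrm{rank}=3} &= 0.8760 - 0.2112 = 0.6648\\
\chi^2_{\mathrm{Wald}} &= \left(\frac{0.6648}{0.2833}\right)^2 \approx 5.51\\
95\%\ \mathrm{CI} &= 0.6648 \pm 1.96 \times 0.2833 \approx (0.1095,\ 1.2201)\\
e^{0.6648} &\approx 1.944 \approx \frac{2.401}{1.235}
\end{aligned}
```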

 

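We can also use the CONTRAST statement to estimate predicted probabilities at different GRE scores while holding gpa at its mean of 3.3899 and rank at 2. The code below fits the same logit model and requests these probabilities with ESTIMATE=PROB; in each row, rank 0 1 0 selects rank=2 under reference coding.

As the Estimate column of the second table below shows, the predicted probability of admission is 0.18 for a GRE score of 200 and rises to 0.47 for a score of 800, with gpa held at its mean and rank fixed at 2. Note that the Wald chi-square tests in these tables test whether the linear predictor equals zero, that is, whether the predicted probability equals 0.5; they are not tests of the GRE effect, which is why the gre=800 row, with a predicted probability of 0.47, is not significant.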

 

proc logistic data="/home/u1102326/sasuser.v94/data/binary.sas7bdat" descending;
  class rank / param=ref;
  model admit = gre gpa rank;
  /* each row: intercept, a GRE value, gpa at its mean, rank=2 (design 0 1 0); estimate=prob reports a probability */
  contrast 'gre=200' intercept 1 gre 200 gpa 3.3899 rank 0 1 0 / estimate=prob;
  contrast 'gre=300' intercept 1 gre 300 gpa 3.3899 rank 0 1 0 / estimate=prob;
  contrast 'gre=400' intercept 1 gre 400 gpa 3.3899 rank 0 1 0 / estimate=prob;
  contrast 'gre=500' intercept 1 gre 500 gpa 3.3899 rank 0 1 0 / estimate=prob;
  contrast 'gre=600' intercept 1 gre 600 gpa 3.3899 rank 0 1 0 / estimate=prob;
  contrast 'gre=700' intercept 1 gre 700 gpa 3.3899 rank 0 1 0 / estimate=prob;
  contrast 'gre=800' intercept 1 gre 800 gpa 3.3899 rank 0 1 0 / estimate=prob;
run;

 

 

The LOGISTIC Procedure

 

 

Contrast Test Results
Contrast   DF   Wald Chi-Square   Pr > ChiSq
gre=200    1    9.7752            0.0018
gre=300    1    11.2483           0.0008
gre=400    1    13.3231           0.0003
gre=500    1    15.0984           0.0001
gre=600    1    11.2291           0.0008
gre=700    1    3.0769            0.0794
gre=800    1    0.2175            0.6409

Contrast Estimation and Testing Results by Row
Contrast   Type   Row   Estimate   Standard Error   Alpha   Confidence Limits   Wald Chi-Square   Pr > ChiSq
gre=200    PROB   1     0.1844     0.0715           0.05    0.0817   0.3648     9.7752            0.0018
gre=300    PROB   1     0.2209     0.0647           0.05    0.1195   0.3719     11.2483           0.0008
gre=400    PROB   1     0.2623     0.0548           0.05    0.1695   0.3825     13.3231           0.0003
gre=500    PROB   1     0.3084     0.0443           0.05    0.2288   0.4013     15.0984           0.0001
gre=600    PROB   1     0.3587     0.0399           0.05    0.2847   0.4400     11.2291           0.0008
gre=700    PROB   1     0.4122     0.0490           0.05    0.3206   0.5104     3.0769            0.0794
gre=800    PROB   1     0.4680     0.0685           0.05    0.3391   0.6013     0.2175            0.6409
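Each of these CONTRAST rows is a linear combination of the coefficients (intercept 1, a GRE value, gpa = 3.3899, and the rank design 0 1 0 for rank=2), and ESTIMATE=PROB passes it through the inverse logit. A worked sketch with the rounded coefficients from the estimates table (notation ours; results differ from the SAS table in the third decimal because of that rounding):

```latex
\begin{aligned}
\hat\eta(\mathrm{gre}) &= -5.5414 + 0.00226\,\mathrm{gre} + 0.8040 \times 3.3899 + 0.8760
  \approx -1.940 + 0.00226\,\mathrm{gre}\\
\hat p(\mathrm{gre}) &= \frac{1}{1 + e^{-\hat\eta(\mathrm{gre})}}\\
\hat p(200) &= \frac{1}{1 + e^{1.488}} \approx 0.184, \qquad
\hat p(800) = \frac{1}{1 + e^{0.132}} \approx 0.467
\end{aligned}
```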