Dataset and its variables

We will use the birthwt dataset from the MASS package in R.

| # | Variable name | Variable label | Coded levels |
|---|---|---|---|
| 1 | low | indicator of birth weight less than 2.5 kg | 0, 1 |
| 2 | age | mother’s age in years | continuous variable |
| 3 | lwt | mother’s weight in pounds at last menstrual period | continuous variable |
| 4 | race | mother’s race (1 = white, 2 = black, 3 = other) | 1, 2, 3 |
| 5 | smoke | smoking status during pregnancy | 0, 1 |
| 6 | ptl | number of previous premature labours | 0, 1, 2, 3 |
| 7 | ht | history of hypertension | 0, 1 |
| 8 | ui | presence of uterine irritability | 0, 1 |
| 9 | ftv | number of physician visits during the first trimester | 0, 1, 2, 3, 4, 6 |
| 10 | bwt | birth weight in grams | continuous variable |

Import the dataset

%let path=/folders/myfolders/birthwt;

proc import datafile="&path/birthwt.csv" out=ibwt dbms=csv replace;
run;  /* PROC IMPORT executes when it reaches RUN */
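Every variable arrives from the CSV as a numeric code (see the variable table above), so tables and plots show 1, 2, 3 rather than white, black, other. If you want readable values, one option is to define formats with PROC FORMAT and attach them, together with variable labels, in a DATA step. This is a minimal sketch. The format names `racef` and `yesnof` and the output data set `bwtlab` are hypothetical choices and are not part of the original workflow.

```sas
/* Hypothetical formats for the coded variables (codes taken from the variable table) */
proc format;
    value racef  1 = "white" 2 = "black" 3 = "other";
    value yesnof 0 = "no"    1 = "yes";
run;

/* Copy ibwt and attach labels and formats; bwtlab is a hypothetical name */
data bwtlab;
    set ibwt;
    label low   = "Birth weight < 2.5 kg"
          age   = "Mother's age (years)"
          lwt   = "Mother's weight at last menstrual period (lb)"
          race  = "Mother's race"
          smoke = "Smoking during pregnancy"
          ptl   = "Previous premature labours"
          ht    = "History of hypertension"
          ui    = "Uterine irritability"
          ftv   = "Physician visits, first trimester"
          bwt   = "Birth weight (g)";
    format race racef. low smoke ht ui yesnof.;
run;
```

To use the labelled version, point the later steps at `data=bwtlab` instead of `data=ibwt`. The stored codes stay the same; only their display changes.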


Describe and display the dataset

Information about the variables and the dataset: proc contents

proc contents data=ibwt varnum;
run;

The SAS System

The CONTENTS Procedure

| Attribute | Value | Attribute | Value |
|---|---|---|---|
| Data Set Name | WORK.IBWT | Observations | 189 |
| Member Type | DATA | Variables | 10 |
| Engine | V9 | Indexes | 0 |
| Created | 10/14/2019 21:04:41 | Observation Length | 80 |
| Last Modified | 10/14/2019 21:04:41 | Deleted Observations | 0 |
| Protection | | Compressed | NO |
| Data Set Type | | Sorted | NO |
| Label | | | |
| Data Representation | SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64 | | |
| Encoding | utf-8 Unicode (UTF-8) | | |

Engine/Host Dependent Information

| Attribute | Value |
|---|---|
| Data Set Page Size | 65536 |
| Number of Data Set Pages | 1 |
| First Data Page | 1 |
| Max Obs per Page | 817 |
| Obs in First Data Page | 189 |
| Number of Data Set Repairs | 0 |
| Filename | /tmp/SAS_work2F0900000E52_localhost.localdomain/ibwt.sas7bdat |
| Release Created | 9.0401M6 |
| Host Created | Linux |
| Inode Number | 671803 |
| Access Permission | rw-r--r-- |
| Owner Name | sasdemo |
| File Size | 128KB |
| File Size (bytes) | 131072 |

Variables in Creation Order

| # | Variable | Type | Len | Format | Informat |
|---|---|---|---|---|---|
| 1 | low | Num | 8 | BEST12. | BEST32. |
| 2 | age | Num | 8 | BEST12. | BEST32. |
| 3 | lwt | Num | 8 | BEST12. | BEST32. |
| 4 | race | Num | 8 | BEST12. | BEST32. |
| 5 | smoke | Num | 8 | BEST12. | BEST32. |
| 6 | ptl | Num | 8 | BEST12. | BEST32. |
| 7 | ht | Num | 8 | BEST12. | BEST32. |
| 8 | ui | Num | 8 | BEST12. | BEST32. |
| 9 | ftv | Num | 8 | BEST12. | BEST32. |
| 10 | bwt | Num | 8 | BEST12. | BEST32. |
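PROC CONTENTS reports the structure of the data set (189 observations, 10 numeric variables) but gives no summary statistics. For the three continuous variables (age, lwt, bwt), a short PROC MEANS step is one way to fill that gap. The statistics requested below are only a suggestion; adjust them as needed.

```sas
/* Summary statistics for the continuous variables listed in the variable table */
proc means data=ibwt n nmiss mean std min median max maxdec=1;
    var age lwt bwt;
run;
```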

Glimpse the dataset: proc print

proc print data=ibwt (obs=5);
run;

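| Obs | low | age | lwt | race | smoke | ptl | ht | ui | ftv | bwt |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 19 | 182 | 2 | 0 | 0 | 0 | 1 | 0 | 2523 |
| 2 | 0 | 33 | 155 | 3 | 0 | 0 | 0 | 0 | 3 | 2551 |
| 3 | 0 | 20 | 105 | 1 | 1 | 0 | 0 | 0 | 1 | 2557 |
| 4 | 0 | 21 | 108 | 1 | 1 | 0 | 0 | 1 | 2 | 2594 |
| 5 | 0 | 18 | 107 | 1 | 1 | 0 | 0 | 1 | 0 | 2600 |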
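The first five rows already hint at how ‘low’ relates to ‘bwt’: all five have low = 0 and a birth weight above 2500 g. The check below assumes that low is defined as bwt < 2500 g, since the variable table says “less than 2.5 kg”. It keeps only the rows where the two variables disagree. The data set name `check_low` and the variable `low_from_bwt` are hypothetical.

```sas
/* Keep only rows where low does not match the rule bwt < 2500 g */
data check_low;
    set ibwt;
    low_from_bwt = (bwt < 2500);   /* 1 if true, 0 if false */
    if low ne low_from_bwt;        /* subsetting IF: keep mismatches only */
run;

/* If the coding is consistent, the log should report 0 observations in WORK.CHECK_LOW */
proc print data=check_low;
run;
```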

Two-way frequency tables: proc freq

proc freq data=ibwt;
tables low*(race smoke ptl ht ui ftv) / nopercent norow nocol;  /* show cell counts only */
run;

The FREQ Procedure

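Table of low by race (rows: low; columns: race; cells: frequency)

| low | 1 | 2 | 3 | Total |
|---|---|---|---|---|
| 0 | 73 | 15 | 42 | 130 |
| 1 | 23 | 11 | 25 | 59 |
| Total | 96 | 26 | 67 | 189 |

Table of low by smoke (rows: low; columns: smoke; cells: frequency)

| low | 0 | 1 | Total |
|---|---|---|---|
| 0 | 86 | 44 | 130 |
| 1 | 29 | 30 | 59 |
| Total | 115 | 74 | 189 |

Table of low by ptl (rows: low; columns: ptl; cells: frequency)

| low | 0 | 1 | 2 | 3 | Total |
|---|---|---|---|---|---|
| 0 | 118 | 8 | 3 | 1 | 130 |
| 1 | 41 | 16 | 2 | 0 | 59 |
| Total | 159 | 24 | 5 | 1 | 189 |

Table of low by ht (rows: low; columns: ht; cells: frequency)

| low | 0 | 1 | Total |
|---|---|---|---|
| 0 | 125 | 5 | 130 |
| 1 | 52 | 7 | 59 |
| Total | 177 | 12 | 189 |

Table of low by ui (rows: low; columns: ui; cells: frequency)

| low | 0 | 1 | Total |
|---|---|---|---|
| 0 | 116 | 14 | 130 |
| 1 | 45 | 14 | 59 |
| Total | 161 | 28 | 189 |

Table of low by ftv (rows: low; columns: ftv; cells: frequency)

| low | 0 | 1 | 2 | 3 | 4 | 6 | Total |
|---|---|---|---|---|---|---|---|
| 0 | 64 | 36 | 23 | 3 | 3 | 1 | 130 |
| 1 | 36 | 11 | 7 | 4 | 1 | 0 | 59 |
| Total | 100 | 47 | 30 | 7 | 4 | 1 | 189 |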
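These tables show counts only. To ask whether each candidate predictor is associated with ‘low’, the TABLES request can also carry test options. The sketch below assumes a chi-square test is adequate for race, smoke, ht and ui. For 2×2 tables such as low by ht, the CHISQ output also includes Fisher’s exact test. The ptl and ftv tables have very sparse cells (ptl = 3 and ftv = 6 each contain a single mother), so they get Fisher’s exact test through the EXACT statement instead.

```sas
/* Chi-square statistics for the predictors with reasonably filled cells */
proc freq data=ibwt;
    tables low*(race smoke ht ui) / chisq nopercent norow nocol;
run;

/* Odds ratio and relative risks for the 2x2 table of smoking status by low */
proc freq data=ibwt;
    tables smoke*low / relrisk nopercent norow nocol;
run;

/* Fisher's exact test for the sparse tables */
proc freq data=ibwt;
    tables low*(ptl ftv) / nopercent norow nocol;
    exact fisher;
run;
```

As a hand check on the RELRISK output, the sample odds ratio from the low-by-smoke table above is (86 × 30) / (44 × 29) = 2580 / 1276 ≈ 2.02.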

PLOT CATEGORICAL OUTCOME VARIABLE AND ITS ASSOCIATED VARIABLES

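Let’s assume that we are interested in how ‘low’ could be predicted from the other variables, except ‘bwt’. That variable is excluded because ‘low’ is itself derived from birth weight: it indicates a birth weight below 2.5 kg.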

Plot a categorical variable

Bar plots

ods graphics/noborder;

ods layout Start rows=2 columns=2;

ods region row=1 column=1;
proc sgplot data=ibwt;
    vbar low;
    yaxis grid;
title "Frequency (count) for variable low";
run;
title;


ods region row=1 column=2;
proc sgplot data=ibwt;
    vbar low/stat=percent;
    yaxis grid;
title "Frequency (percent) for variable low";
run;
title;

ods region row=2 column=1;
proc sgplot data=ibwt;
    vbar race;
    yaxis grid;
title "Frequency (count) for variable race";
run;
title;


ods region row=2 column=2;
proc sgplot data=ibwt;
    vbar race/stat=percent;
    yaxis grid;
title "Frequency (percent) for variable race";
run;
title;


ods layout end;
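The four PROC SGPLOT steps above differ only in the variable and the STAT= value. One way to avoid the repetition is a small macro. The macro name `catbar` is hypothetical and not part of the original code. DATALABEL prints the count or percentage on top of each bar.

```sas
/* Hypothetical helper: bar chart of counts (stat=freq) or percentages (stat=percent) */
%macro catbar(var=, stat=freq);
    proc sgplot data=ibwt;
        vbar &var / stat=&stat datalabel;
        yaxis grid;
        title "Bar chart of &var (stat=&stat)";
    run;
    title;
%mend catbar;

/* Example calls for a variable not plotted above */
%catbar(var=smoke);
%catbar(var=smoke, stat=percent);
```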

Frequency and dot plot

ods graphics/noborder;

ods noproctitle;

ods layout start rows=2 columns=2;


ods region row=1 column=1;
proc freq data=ibwt;
ods select FreqPlot;
    tables low /plots=FreqPlot;
run;

ods region row=1 column=2;
proc freq data=ibwt;
ods select FreqPlot;
    tables low /plots=FreqPlot (scale=percent);
run;

ods region row=2 column=1;
proc freq data=ibwt;
ods select FreqPlot;
    tables low /plots=FreqPlot (type=dotplot);
run;

ods region row=2 column=2;
proc freq data=ibwt;
ods select FreqPlot;
    tables low /plots=FreqPlot (type=dotplot scale=percent);
run;

ods layout end;
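FreqPlot accepts other plot options besides TYPE= and SCALE=. For example, ORIENT=HORIZONTAL draws the bars sideways, which leaves more room for long category labels. Here it is applied to race as an illustration:

```sas
proc freq data=ibwt;
ods select FreqPlot;
    /* horizontal bar chart of race, scaled to percentages */
    tables race / plots=FreqPlot(orient=horizontal scale=percent);
run;
```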

Plot a categorical variable against another categorical variable

Cluster bars

ods graphics/noborder;

ods layout start rows=2 columns=2;


ods region row=1 column=1;
proc sgplot data=ibwt;
    vbar low/ group=race groupdisplay=cluster;
    yaxis grid;
title "Distribution (count) of low by race";
run;
title;

ods region row=1 column=2;
proc sgplot data=ibwt;
    vbar low/ group=race stat=percent groupdisplay=cluster;
    yaxis grid;
title "Distribution (percent) of low by race";
run;
title;


ods region row=2 column=1;
proc sgplot data=ibwt;
    vbar low/ group=smoke groupdisplay=cluster;
    yaxis grid;
title "Distribution (count) of low by smoke";
run;
title;



ods region row=2 column=2;
proc sgplot data=ibwt;
    vbar low/ group=smoke stat=percent groupdisplay=cluster;
    yaxis grid;
title "Distribution (percent) of low by smoke";
run;
title;

ods layout end;
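The cluster bars above compare counts or overall percentages. They do not directly show what share of each race has low = 1. The sketch below answers that question. PROC FREQ computes row percentages, and the OUTPCT option writes them to the output variable PCT_ROW. Those percentages are then plotted as 100% stacked bars. The data set name `pctrace` is hypothetical.

```sas
/* Row percentages of low within each race; output data set name is hypothetical */
proc freq data=ibwt noprint;
    tables race*low / outpct out=pctrace;
run;

/* One bar per race, stacked by low; each bar sums to 100 */
proc sgplot data=pctrace;
    vbar race / response=pct_row group=low groupdisplay=stack;
    yaxis grid label="Percent within race";
    title "Share of low birth weight within each race";
run;
title;
```

From the low-by-race frequency table, the low = 1 segments should be 23/96 ≈ 24.0% (race 1), 11/26 ≈ 42.3% (race 2) and 25/67 ≈ 37.3% (race 3).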

Mosaic plot

ods noproctitle;

ods layout start rows=1 columns=2;


ods region row=1 column=1;
proc freq data=ibwt;
ods select MosaicPlot;
    tables low*race/plots=mosaic;
run;


ods region row=1 column=2;
proc freq data=ibwt;
ods select MosaicPlot;
    tables low*smoke/plots=mosaic;
run;

ods layout end;
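The same request extends to other predictors: group them in one TABLES statement and each crossing gets its own mosaic plot. Here it is applied to the binary predictors ht and ui:

```sas
ods graphics on;   /* PROC FREQ plots require ODS Graphics */
proc freq data=ibwt;
ods select MosaicPlot;
    tables low*(ht ui) / plots=mosaicplot;
run;
```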

Frequency and dot plot for a two-way table

ods graphics/noborder;

ods noproctitle;

ods layout start rows=2 columns=2;


ods region row=1 column=1;
proc freq data=ibwt;
ods select FreqPlot;
    tables low*race /plots=FreqPlot;
run;

ods region row=1 column=2;
proc freq data=ibwt;
ods select FreqPlot;
    tables low*race /plots=FreqPlot(scale=percent);
run;

ods region row=2 column=1;
proc freq data=ibwt;
ods select FreqPlot;
    tables low*race /plots=FreqPlot (type=dotplot);
run;

ods region row=2 column=2;
proc freq data=ibwt;
ods select FreqPlot;
    tables low*race /plots=FreqPlot (type=dotplot scale=percent);
run;


ods layout end;
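For two-way tables, FreqPlot also takes a TWOWAY= option that controls how the groups are arranged, with values such as CLUSTER or STACKED. Here it is applied to the low-by-smoke table as an illustration:

```sas
proc freq data=ibwt;
ods select FreqPlot;
    /* stacked bar chart of the low-by-smoke frequencies */
    tables low*smoke / plots=FreqPlot(twoway=stacked);
run;
```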