Dataset and its variables
We will use the birthwt dataset from the MASS library in R.
| # | Variable | Description | Values |
|---|---|---|---|
| 1 | low | indicator of birth weight less than 2.5 kg | 0, 1 |
| 2 | age | mother’s age in years | continuous variable |
| 3 | lwt | mother’s weight in pounds at last menstrual period | continuous variable |
| 4 | race | mother’s race (1 = white, 2 = black, 3 = other) | 1, 2, 3 |
| 5 | smoke | smoking status during pregnancy | 0, 1 |
| 6 | ptl | number of previous premature labours | 0, 1, 2, 3 |
| 7 | ht | history of hypertension | 0, 1 |
| 8 | ui | presence of uterine irritability | 0, 1 |
| 9 | ftv | number of physician visits during the first trimester | 0, 1, 2, 3, 4, 6 |
| 10 | bwt | birth weight in grams | continuous variable |
Import the dataset
%let path=/folders/myfolders/birthwt;
proc import datafile="&path/birthwt.csv" out=ibwt dbms=csv replace;
run;
Describe and display the dataset
Information about the variables and the dataset: proc contents
proc contents data=ibwt varnum;
run;
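Data set attributes:

| Attribute | Value | Attribute | Value |
|---|---|---|---|
| Data Set Name | WORK.IBWT | Observations | 189 |
| Member Type | DATA | Variables | 10 |
| Engine | V9 | Indexes | 0 |
| Created | 10/14/2019 21:04:41 | Observation Length | 80 |
| Last Modified | 10/14/2019 21:04:41 | Deleted Observations | 0 |
| Protection | | Compressed | NO |
| Data Set Type | | Sorted | NO |
| Label | | | |
| Data Representation | SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64 | | |
| Encoding | utf-8 Unicode (UTF-8) | | |

Engine/host dependent information:

| Attribute | Value |
|---|---|
| Data Set Page Size | 65536 |
| Number of Data Set Pages | 1 |
| First Data Page | 1 |
| Max Obs per Page | 817 |
| Obs in First Data Page | 189 |
| Number of Data Set Repairs | 0 |
| Filename | /tmp/SAS_work2F0900000E52_localhost.localdomain/ibwt.sas7bdat |
| Release Created | 9.0401M6 |
| Host Created | Linux |
| Inode Number | 671803 |
| Access Permission | rw-r--r-- |
| Owner Name | sasdemo |
| File Size | 128KB |
| File Size (bytes) | 131072 |

Variables in creation order:

| # | Variable | Type | Len | Format | Informat |
|---|---|---|---|---|---|
| 1 | low | Num | 8 | BEST12. | BEST32. |
| 2 | age | Num | 8 | BEST12. | BEST32. |
| 3 | lwt | Num | 8 | BEST12. | BEST32. |
| 4 | race | Num | 8 | BEST12. | BEST32. |
| 5 | smoke | Num | 8 | BEST12. | BEST32. |
| 6 | ptl | Num | 8 | BEST12. | BEST32. |
| 7 | ht | Num | 8 | BEST12. | BEST32. |
| 8 | ui | Num | 8 | BEST12. | BEST32. |
| 9 | ftv | Num | 8 | BEST12. | BEST32. |
| 10 | bwt | Num | 8 | BEST12. | BEST32. |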
Glimpse the dataset: proc print
proc print data=ibwt (obs=5);
run;
| Obs | low | age | lwt | race | smoke | ptl | ht | ui | ftv | bwt |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 19 | 182 | 2 | 0 | 0 | 0 | 1 | 0 | 2523 |
| 2 | 0 | 33 | 155 | 3 | 0 | 0 | 0 | 0 | 3 | 2551 |
| 3 | 0 | 20 | 105 | 1 | 1 | 0 | 0 | 0 | 1 | 2557 |
| 4 | 0 | 21 | 108 | 1 | 1 | 0 | 0 | 1 | 2 | 2594 |
| 5 | 0 | 18 | 107 | 1 | 1 | 0 | 0 | 1 | 0 | 2600 |
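The categorical variables are stored as numeric codes, which makes tables and plot legends harder to read. A minimal sketch of one way to attach labels, using the codes listed in the variable table: the format names `racefmt` and `yesno` are hypothetical, and the "no"/"yes" wording for the 0/1 variables is an assumption, since the variable table only lists the values 0 and 1.

```sas
/* User-defined formats; the names racefmt and yesno are hypothetical. */
proc format;
  value racefmt 1 = "white"
                2 = "black"
                3 = "other";
  /* Assumed reading of the 0/1 indicators: 0 = no, 1 = yes. */
  value yesno   0 = "no"
                1 = "yes";
run;

/* The FORMAT statement applies only inside this step,
   so the stored values in ibwt are left unchanged. */
proc freq data=ibwt;
  format race racefmt. low smoke ht ui yesno.;
  tables race smoke;
run;
```

Because the formats are applied per procedure, the same FORMAT statement can be added to any of the later steps where labelled categories are preferred.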
Two-way frequency tables: proc freq
proc freq data=ibwt;
tables low*(race smoke ptl ht ui ftv) / nopercent norow nocol;
run;
PLOT CATEGORICAL OUTCOME VARIABLE AND ITS ASSOCIATED VARIABLES
Let’s assume that we are interested in how ‘low’ can be predicted from the other variables (except ‘bwt’, from which ‘low’ is derived).
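The reason for leaving out ‘bwt’ is that ‘low’ is defined from it (birth weight less than 2.5 kg). A quick sketch of how this could be checked on the imported data: the data set name `check_low` and the variable `low_from_bwt` are hypothetical, and the 2500 g cut-off is the 2.5 kg threshold from the variable table expressed in grams.

```sas
/* Recompute the indicator from bwt; names here are hypothetical. */
data check_low;
  set ibwt;
  low_from_bwt = (bwt < 2500);   /* 1 if under 2.5 kg, otherwise 0 */
run;

/* Cross-tabulate the stored and the recomputed indicator. */
proc freq data=check_low;
  tables low*low_from_bwt / norow nocol nopercent;
run;
```

If the definition holds for every row, all observations should fall in the two cells where ‘low’ and `low_from_bwt` agree.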
Plot a categorical variable
Bar plots
ods graphics/noborder;
ods layout Start rows=2 columns=2;
ods region row=1 column=1;
proc sgplot data=ibwt;
vbar low;
yaxis grid;
title "Frequency (count) for variable low";
run;
title;
ods region row=1 column=2;
proc sgplot data=ibwt;
vbar low/stat=percent;
yaxis grid;
title "Frequency (percent) for variable low";
run;
title;
ods region row=2 column=1;
proc sgplot data=ibwt;
vbar race;
yaxis grid;
title "Frequency (count) for variable race";
run;
title;
ods region row=2 column=2;
proc sgplot data=ibwt;
vbar race/stat=percent;
yaxis grid;
title "Frequency (percent) for variable race";
run;
title;
ods layout end;
Frequency and dot plot
ods graphics/noborder;
ods noproctitle;
ods layout start rows=2 columns=2;
ods region row=1 column=1;
proc freq data=ibwt;
ods select FreqPlot;
tables low /plots=FreqPlot;
run;
ods region row=1 column=2;
proc freq data=ibwt;
ods select FreqPlot;
tables low /plots=FreqPlot (scale=percent);
run;
ods region row=2 column=1;
proc freq data=ibwt;
ods select FreqPlot;
tables low /plots=FreqPlot (type=dotplot);
run;
ods region row=2 column=2;
proc freq data=ibwt;
ods select FreqPlot;
tables low /plots=FreqPlot (type=dotplot scale=percent);
run;
ods layout end;
Plot a categorical variable against another categorical variable
Cluster bars
ods graphics/noborder;
ods layout start rows=2 columns=2;
ods region row=1 column=1;
proc sgplot data=ibwt;
vbar low/ group=race groupdisplay=cluster;
yaxis grid;
title "Distribution (count) of low by race";
run;
title;
ods region row=1 column=2;
proc sgplot data=ibwt;
vbar low/ group=race stat=percent groupdisplay=cluster;
yaxis grid;
title "Distribution (percent) of low by race";
run;
title;
ods region row=2 column=1;
proc sgplot data=ibwt;
vbar low/ group=smoke groupdisplay=cluster;
yaxis grid;
title "Distribution (count) of low by smoke";
run;
title;
ods region row=2 column=2;
proc sgplot data=ibwt;
vbar low/ group=smoke stat=percent groupdisplay=cluster;
yaxis grid;
title "Distribution (percent) of low by smoke";
run;
title;
ods layout end;
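In the percent versions of these plots, stat=percent computes each bar as a share of all observations in the graph, as far as the default PCTLEVEL=GRAPH setting goes. To compare the proportion of low birth weight between groups, percentages within each group are often more useful. A minimal sketch, assuming SAS 9.4 or later, where the PCTLEVEL= option is available on the PROC SGPLOT statement:

```sas
/* PCTLEVEL=GROUP computes percentages within each level of the
   GROUP= variable, so the bars for each race sum to 100%. */
proc sgplot data=ibwt pctlevel=group;
  vbar low / group=race stat=percent groupdisplay=cluster;
  yaxis grid;
  title "Percent of low within each race";
run;
title;
```

The same option can be used with group=smoke to compare smokers and non-smokers on a common scale.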
Mosaic plot
ods noproctitle;
ods layout start rows=1 columns=2;
ods region row=1 column=1;
proc freq data=ibwt;
ods select MosaicPlot;
tables low*race/plots=mosaic;
run;
ods region row=1 column=2;
proc freq data=ibwt;
ods select MosaicPlot;
tables low*smoke/plots=mosaic;
run;
ods layout end;
Frequency and dot plot
ods graphics/noborder;
ods noproctitle;
ods layout start rows=2 columns=2;
ods region row=1 column=1;
proc freq data=ibwt;
ods select FreqPlot;
tables low*race /plots=FreqPlot;
run;
ods region row=1 column=2;
proc freq data=ibwt;
ods select FreqPlot;
tables low*race /plots=FreqPlot(scale=percent);
run;
ods region row=2 column=1;
proc freq data=ibwt;
ods select FreqPlot;
tables low*race /plots=FreqPlot (type=dotplot);
run;
ods region row=2 column=2;
proc freq data=ibwt;
ods select FreqPlot;
tables low*race /plots=FreqPlot (type=dotplot scale=percent);
run;
ods layout end;