Structure of the dataset

library(MASS)

data("birthwt")

str(birthwt)
## 'data.frame':    189 obs. of  10 variables:
##  $ low  : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ age  : int  19 33 20 21 18 21 22 17 29 26 ...
##  $ lwt  : int  182 155 105 108 107 124 118 103 123 113 ...
##  $ race : int  2 3 1 1 1 3 1 3 1 1 ...
##  $ smoke: int  0 0 1 1 1 0 0 0 1 1 ...
##  $ ptl  : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ ht   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ ui   : int  1 0 0 1 1 0 0 0 0 0 ...
##  $ ftv  : int  0 3 1 2 0 0 1 1 1 0 ...
##  $ bwt  : int  2523 2551 2557 2594 2600 2622 2637 2637 2663 2665 ...

Describe the variables

#   variable name  variable label                                         coded levels
1   low            indicator of birth weight less than 2.5 kg             0, 1
2   age            mother’s age in years                                  continuous variable
3   lwt            mother’s weight in pounds at last menstrual period     continuous variable
4   race           mother’s race (1 = white, 2 = black, 3 = other)        1, 2, 3
5   smoke          smoking status during pregnancy                        0, 1
6   ptl            number of previous premature labours                   0, 1, 2, 3
7   ht             history of hypertension                                0, 1
8   ui             presence of uterine irritability                       0, 1
9   ftv            number of physician visits during the first trimester  0, 1, 2, 3, 4, 6
10  bwt            birth weight in grams                                  continuous variable

Assign a library

%let path=/folders/myfolders/birthwt;
libname r "&path";

1682 %let path=/folders/myfolders/birthwt;
1683 libname r "&path";
NOTE: Libref R was successfully assigned as follows:
Engine: V9
Physical Name: /folders/myfolders/birthwt

Import the data file

proc import

proc import datafile="&path/birthwt.csv" out=r.mbwt dbms=csv replace;
run;

1690 proc import datafile="&path/birthwt.csv" out=r.mbwt dbms=csv replace;
1691 run;
1692 /**********************************************************************
1693 * PRODUCT: SAS
1694 * VERSION: 9.4
1695 * CREATOR: External File Interface
1696 * DATE: 12JAN20
1697 * DESC: Generated SAS Datastep Code
1698 * TEMPLATE SOURCE: (None Specified.)
1699 ***********************************************************************/
1700 data R.MBWT ;
1701 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
1702 infile '/folders/myfolders/birthwt/birthwt.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
1703 informat low best32. ;
1704 informat age best32. ;
1705 informat lwt best32. ;
1706 informat race best32. ;
1707 informat smoke best32. ;
1708 informat ptl best32. ;
1709 informat ht best32. ;
1710 informat ui best32. ;
1711 informat ftv best32. ;
1712 informat bwt best32. ;
1713 format low best12. ;
1714 format age best12. ;
1715 format lwt best12. ;
1716 format race best12. ;
1717 format smoke best12. ;
1718 format ptl best12. ;
1719 format ht best12. ;
1720 format ui best12. ;
1721 format ftv best12. ;
1722 format bwt best12. ;
1723 input
1724 low
1725 age
1726 lwt
1727 race
1728 smoke
1729 ptl
1730 ht
1731 ui
1732 ftv
1733 bwt
1734 ;
1735 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
1736 run;
NOTE: The infile '/folders/myfolders/birthwt/birthwt.csv' is:
Filename=/folders/myfolders/birthwt/birthwt.csv,
Owner Name=root,Group Name=vboxsf,
Access Permission=-rwxrwx---,
Last Modified=22Sep2019:22:51:35,
File Size (bytes)=4935

NOTE: 189 records were read from the infile '/folders/myfolders/birthwt/birthwt.csv'.
The minimum record length was 24.
The maximum record length was 25.
NOTE: The data set R.MBWT has 189 observations and 10 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.00 seconds

189 rows created in R.MBWT from /folders/myfolders/birthwt/birthwt.csv.



NOTE: R.MBWT data set was successfully created.
NOTE: The data set R.MBWT has 189 observations and 10 variables.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.14 seconds
cpu time 0.08 seconds


Glimpse the data set

proc contents

title "Data content";
proc contents data=r.mbwt;
run;

title;

Data content

The CONTENTS Procedure

Data Set Name R.MBWT Observations 189
Member Type DATA Variables 10
Engine V9 Indexes 0
Created 01/12/2020 02:46:27 Observation Length 80
Last Modified 01/12/2020 02:46:27 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64    
Encoding utf-8 Unicode (UTF-8)    
Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 1
First Data Page 1
Max Obs per Page 817
Obs in First Data Page 189
Number of Data Set Repairs 0
Filename /folders/myfolders/birthwt/mbwt.sas7bdat
Release Created 9.0401M6
Host Created Linux
Inode Number 1542
Access Permission rwxrwx---
Owner Name root
File Size 128KB
File Size (bytes) 131072
Alphabetic List of Variables and Attributes
# Variable Type Len Format Informat
2 age Num 8 BEST12. BEST32.
10 bwt Num 8 BEST12. BEST32.
9 ftv Num 8 BEST12. BEST32.
7 ht Num 8 BEST12. BEST32.
1 low Num 8 BEST12. BEST32.
3 lwt Num 8 BEST12. BEST32.
6 ptl Num 8 BEST12. BEST32.
4 race Num 8 BEST12. BEST32.
5 smoke Num 8 BEST12. BEST32.
8 ui Num 8 BEST12. BEST32.
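The listing above is alphabetical. A minimal sketch, assuming the same libref r, of how the variables could be listed in their creation order instead by adding the VARNUM option to proc contents:

```sas
/* Sketch: list variables by their position in the data set, not by name */
proc contents data=r.mbwt varnum;
run;
```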

proc print

title "Print 5 rows of the data set";
proc print data=r.mbwt (obs=5);
run;

title;

Print 5 rows of the data set

Obs low age lwt race smoke ptl ht ui ftv bwt
1 0 19 182 2 0 0 0 1 0 2523
2 0 33 155 3 0 0 0 0 3 2551
3 0 20 105 1 1 0 0 0 1 2557
4 0 21 108 1 1 0 0 1 2 2594
5 0 18 107 1 1 0 0 1 0 2600

Transform continuous variables

bwt

Determine quantiles using proc univariate

/*Determine quantiles */

title "Bwt quantiles";
ods select quantiles;
proc univariate data=r.mbwt;
    var bwt;
run;

title;

ods select all;

Bwt quantiles

The UNIVARIATE Procedure

Variable: bwt

Quantiles (Definition 5)
Level Quantile
100% Max 4990
99% 4593
95% 3997
90% 3884
75% Q3 3487
50% Median 2977
25% Q1 2414
10% 1970
5% 1790
1% 1021
0% Min 709
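The quartiles printed above are later typed into the code by hand. As a hedged alternative, the sketch below stores them in a data set with proc means, which by default uses the same quantile definition (5) as the table above, so it should give the same three cut points. The data set name work.bwt_q and the variables q1, q2 and q3 are hypothetical names chosen for this example.

```sas
/* Sketch: save the bwt quartiles to a one-row data set */
proc means data=r.mbwt noprint;
    var bwt;
    output out=work.bwt_q (drop=_type_ _freq_) p25=q1 median=q2 p75=q3;
run;

/* Inspect the stored cut points */
proc print data=work.bwt_q;
run;
```

Reading the cut points from a data set avoids transcription errors if the input file changes.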

Create a numeric categorical variable based on quantiles

/* classify bwt as (quantile) multinomials */

data r.mbwt;
    set r.mbwt;
    
lowq=99;

if bwt < 2414 then lowq=3;
if (bwt >= 2414) and (bwt < 2977) then lowq=2;
if (bwt >= 2977) and (bwt < 3487) then lowq=1;
if bwt >=3487 then lowq=0;

run;

proc print data=r.mbwt (obs=5);
run;
Obs low age lwt race smoke ptl ht ui ftv bwt lowq
1 0 19 182 2 0 0 0 1 0 2523 2
2 0 33 155 3 0 0 0 0 3 2551 2
3 0 20 105 1 1 0 0 0 1 2557 2
4 0 21 108 1 1 0 0 1 2 2594 2
5 0 18 107 1 1 0 0 1 0 2600 2
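The step above uses four independent IF statements and a sentinel value of 99. A possible alternative, assuming the same cut points, is an IF/ELSE IF chain with an explicit test for missing values: SAS orders a missing numeric value below every number, so a missing bwt would otherwise satisfy bwt < 2414. The names work.bwt_chk and lowq_chk are hypothetical and used only for this sketch.

```sas
/* Sketch: same quartile coding, with missing values handled explicitly */
data work.bwt_chk;
    set r.mbwt;
    if missing(bwt)     then lowq_chk = .;   /* keep missing as missing */
    else if bwt < 2414  then lowq_chk = 3;   /* below Q1 */
    else if bwt < 2977  then lowq_chk = 2;   /* Q1 to below the median */
    else if bwt < 3487  then lowq_chk = 1;   /* median to below Q3 */
    else                     lowq_chk = 0;   /* Q3 and above */
run;

/* If both codings agree, only matching pairs of lowq and lowq_chk appear */
proc freq data=work.bwt_chk;
    tables lowq*lowq_chk / list missing;
run;
```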

Create a character categorical variable based on quantiles

/* classify bwt as (quantile) labelled multinomials */

data r.mbwt;
    length lowcq $ 8;
set r.mbwt;
    
if bwt < 2414 then lowcq="1q_bwt";
if (bwt >= 2414) and (bwt < 2977) then lowcq="2q_bwt";
if (bwt >= 2977) and (bwt < 3487) then lowcq="3q_bwt";
if bwt >=3487 then lowcq="4q_bwt";

run;

proc print data=r.mbwt (obs=5);
run;
Obs lowcq low age lwt race smoke ptl ht ui ftv bwt lowq
1 2q_bwt 0 19 182 2 0 0 0 1 0 2523 2
2 2q_bwt 0 33 155 3 0 0 0 0 3 2551 2
3 2q_bwt 0 20 105 1 1 0 0 0 1 2557 2
4 2q_bwt 0 21 108 1 1 0 0 1 2 2594 2
5 2q_bwt 0 18 107 1 1 0 0 1 0 2600 2

Check categorization using proc freq

proc freq data=r.mbwt;
    tables lowq*lowcq/list;
run;

The FREQ Procedure

lowq  lowcq   Frequency  Percent  Cumulative Frequency  Cumulative Percent
0     4q_bwt  48         25.40    48                    25.40
1     3q_bwt  48         25.40    96                    50.79
2     2q_bwt  46         24.34    142                   75.13
3     1q_bwt  47         24.87    189                   100.00
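The table above confirms four groups of roughly equal size. As a cross-check, proc rank with GROUPS=4 can form quartile groups without hand-typed cut points. This is only a sketch: work.bwt_ranked and bwt_grp are hypothetical names, proc rank numbers the groups from 0 (lowest) to 3 (highest), which is the reverse of lowq, and its group boundaries are derived from ranks, so a few observations near a cut point may be assigned differently.

```sas
/* Sketch: quartile groups computed by proc rank */
proc rank data=r.mbwt groups=4 out=work.bwt_ranked;
    var bwt;
    ranks bwt_grp;            /* 0 = lowest quarter, 3 = highest quarter */
run;

/* Compare the hand-coded groups with the ranked groups */
proc freq data=work.bwt_ranked;
    tables lowq*bwt_grp / list;
run;
```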

lwt

Determine quantiles

/*Determine quantiles */

title "lwt quantiles";
ods select quantiles;
proc univariate data=r.mbwt;
    var lwt;
run;

title;

ods select all;

lwt quantiles

The UNIVARIATE Procedure

Variable: lwt

Quantiles (Definition 5)
Level Quantile
100% Max 250
99% 241
95% 189
90% 170
75% Q3 140
50% Median 121
25% Q1 110
10% 98
5% 94
1% 85
0% Min 80

Create a categorical numeric variable based on quantiles

/* classify lwt based on quantiles to multinomials */

data r.mbwt;
    set r.mbwt;
    
lwtq=99;

if lwt < 110 then lwtq=3;
if (lwt >= 110) and (lwt < 121) then lwtq=2;
if (lwt >= 121) and (lwt < 140) then lwtq=1;
if lwt >=140 then lwtq=0;

run;

proc print data=r.mbwt (obs=5);
run;
Obs lowcq low age lwt race smoke ptl ht ui ftv bwt lowq lwtq
1 2q_bwt 0 19 182 2 0 0 0 1 0 2523 2 0
2 2q_bwt 0 33 155 3 0 0 0 0 3 2551 2 0
3 2q_bwt 0 20 105 1 1 0 0 0 1 2557 2 3
4 2q_bwt 0 21 108 1 1 0 0 1 2 2594 2 3
5 2q_bwt 0 18 107 1 1 0 0 1 0 2600 2 3

Create a character categorical variable based on quantiles

/* classify lwt based on quantiles to labelled-multinomials */

data r.mbwt;
    length lwtcq $8;
set r.mbwt;
    
if lwt < 110 then lwtcq="1q_lwt";
if (lwt >= 110) and (lwt < 121) then lwtcq="2q_lwt";
if (lwt >= 121) and (lwt < 140) then lwtcq="3q_lwt";
if lwt >=140 then lwtcq="4q_lwt";

run;

proc print data=r.mbwt (obs=5);
run;
Obs lwtcq lowcq low age lwt race smoke ptl ht ui ftv bwt lowq lwtq
1 4q_lwt 2q_bwt 0 19 182 2 0 0 0 1 0 2523 2 0
2 4q_lwt 2q_bwt 0 33 155 3 0 0 0 0 3 2551 2 0
3 1q_lwt 2q_bwt 0 20 105 1 1 0 0 0 1 2557 2 3
4 1q_lwt 2q_bwt 0 21 108 1 1 0 0 1 2 2594 2 3
5 1q_lwt 2q_bwt 0 18 107 1 1 0 0 1 0 2600 2 3

Check categorization using proc freq

title "Frequency check for lwtq and lwtcq";
proc freq data=r.mbwt;
    tables lwtq*lwtcq/list;
run;
title;

Frequency check for lwtq and lwtcq

The FREQ Procedure

lwtq  lwtcq   Frequency  Percent  Cumulative Frequency  Cumulative Percent
0     4q_lwt  50         26.46    50                    26.46
1     3q_lwt  47         24.87    97                    51.32
2     2q_lwt  50         26.46    147                   77.78
3     1q_lwt  42         22.22    189                   100.00

Make dummy variables

/* make dummies */

data r.mbwt;
    set r.mbwt;
    
lwtq1=0;
lwtq2=0;
lwtq3=0;
lwtq4=0;


if lwtq=0 then lwtq1=1;
if lwtq=1 then lwtq2=1;
if lwtq=2 then lwtq3=1;
if lwtq=3 then lwtq4=1;

run;

proc print data=r.mbwt (obs=5);
run;
Obs lwtcq lowcq low age lwt race smoke ptl ht ui ftv bwt lowq lwtq lwtq1 lwtq2 lwtq3 lwtq4
1 4q_lwt 2q_bwt 0 19 182 2 0 0 0 1 0 2523 2 0 1 0 0 0
2 4q_lwt 2q_bwt 0 33 155 3 0 0 0 0 3 2551 2 0 1 0 0 0
3 1q_lwt 2q_bwt 0 20 105 1 1 0 0 0 1 2557 2 3 0 0 0 1
4 1q_lwt 2q_bwt 0 21 108 1 1 0 0 1 2 2594 2 3 0 0 0 1
5 1q_lwt 2q_bwt 0 18 107 1 1 0 0 1 0 2600 2 3 0 0 0 1
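The dummy variables above are set with one IF statement each. A more compact sketch relies on the fact that a SAS comparison evaluates to 1 when true and 0 when false, so each dummy can be assigned directly inside an array loop. The names work.lwt_dummies, lwtd1 to lwtd4 and k are hypothetical and chosen so that the existing variables are not overwritten.

```sas
/* Sketch: build the four dummies from lwtq with a boolean expression */
data work.lwt_dummies;
    set r.mbwt;
    array d{4} lwtd1-lwtd4;
    do k = 1 to 4;
        d{k} = (lwtq = k - 1);   /* 1 if lwtq equals k-1, otherwise 0 */
    end;
    drop k;
run;

/* The new dummies should match lwtq1 to lwtq4 row by row */
proc freq data=work.lwt_dummies;
    tables lwtq1*lwtd1 lwtq4*lwtd4 / list;
run;
```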

Transform categorical variables

race

Determine frequency

title "Frequency of variable race";
ods noproctitle;
proc freq data=r.mbwt;
    table race;
run;
title;  

Frequency of variable race

race  Frequency  Percent  Cumulative Frequency  Cumulative Percent
1     96         50.79    96                    50.79
2     26         13.76    122                   64.55
3     67         35.45    189                   100.00

Recode to numeric, character and dummy variables

title "Recategorize race to numeric (0, 1 and 2), character (white, black and other) and dummy variables";

data r.mbwt;
    length racename $ 8;
set r.mbwt;

/* recode race to a new character variable */
    
if race=1 then racename="white";
if race=2 then racename="black";
if race=3 then racename="other";


/* recode race to a new numeric variable */

racegr=99;

if race=1 then racegr=0;
if race=2 then racegr=1;
if race=3 then racegr=2;


/* make dummies */

if racegr=0 then white=1;
else white=0;

if racegr=1 then black=1;
else black=0;

if racegr=2 then other=1;
else other=0;

run;

proc print data=r.mbwt (obs=6);
run;

title;

Recategorize race to numeric (0, 1 and 2), character (white, black and other) and dummy variables

Obs racename lwtcq lowcq low age lwt race smoke ptl ht ui ftv bwt lowq lwtq lwtq1 lwtq2 lwtq3 lwtq4 racegr white black other
1 black 4q_lwt 2q_bwt 0 19 182 2 0 0 0 1 0 2523 2 0 1 0 0 0 1 0 1 0
2 other 4q_lwt 2q_bwt 0 33 155 3 0 0 0 0 3 2551 2 0 1 0 0 0 2 0 0 1
3 white 1q_lwt 2q_bwt 0 20 105 1 1 0 0 0 1 2557 2 3 0 0 0 1 0 1 0 0
4 white 1q_lwt 2q_bwt 0 21 108 1 1 0 0 1 2 2594 2 3 0 0 0 1 0 1 0 0
5 white 1q_lwt 2q_bwt 0 18 107 1 1 0 0 1 0 2600 2 3 0 0 0 1 0 1 0 0
6 other 3q_lwt 2q_bwt 0 21 124 3 0 0 0 0 0 2622 2 1 0 1 0 0 2 0 0 1

Check recoding using proc freq

/* check frequency distribution */

proc freq data=r.mbwt;
    tables race*racename/list;
run;

proc freq data=r.mbwt;
    tables race*racegr/list;
run;


proc freq data=r.mbwt;
    tables white black other;
run;
race  racename  Frequency  Percent  Cumulative Frequency  Cumulative Percent
1     white     96         50.79    96                    50.79
2     black     26         13.76    122                   64.55
3     other     67         35.45    189                   100.00

race  racegr  Frequency  Percent  Cumulative Frequency  Cumulative Percent
1     0       96         50.79    96                    50.79
2     1       26         13.76    122                   64.55
3     2       67         35.45    189                   100.00

white  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0      93         49.21    93                    49.21
1      96         50.79    189                   100.00

black  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0      163        86.24    163                   86.24
1      26         13.76    189                   100.00

other  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0      122        64.55    122                   64.55
1      67         35.45    189                   100.00
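The character recode of race above is written as a series of IF statements. A hedged alternative keeps the mapping in one place with a user-defined format and applies it with put(). The format name racefmt, the data set work.race_fmt and the variable racename2 are hypothetical; the 'unknown' label for unexpected codes (which would also catch missing values) is likewise an assumption of this sketch.

```sas
/* Sketch: define the code-to-label mapping once */
proc format;
    value racefmt
        1     = 'white'
        2     = 'black'
        3     = 'other'
        other = 'unknown';    /* any value not listed above */
run;

/* Apply the format to create a labelled character variable */
data work.race_fmt;
    set r.mbwt;
    length racename2 $ 8;
    racename2 = put(race, racefmt.);
run;

/* The two character codings should agree on every row */
proc freq data=work.race_fmt;
    tables racename*racename2 / list;
run;
```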

ptl

Determine frequency

title "Frequency for variable ptl";
ods noproctitle;
proc freq data=r.mbwt;
    table ptl;
run;
title; 
ods proctitle;

Frequency for variable ptl

ptl  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0    159        84.13    159                   84.13
1    24         12.70    183                   96.83
2    5          2.65     188                   99.47
3    1          0.53     189                   100.00

Recode to numeric, character and dummy variables

title "Recategorize ptl to numeric (0, 1 and 2), character (...) and dummy variables";

data r.mbwt;
length prterm_labor $ 25;
    set r.mbwt;
    
/* recode to a new numeric variable */
    
ptlgr=99;

if ptl=0 then ptlgr=0;
if ptl=1 then ptlgr=1;
if ptl=2 or ptl=3 then ptlgr=2;


/* recode to a new character variable */

if ptlgr=0 then prterm_labor="zero";
if ptlgr=1 then prterm_labor="one";
if ptlgr=2 then prterm_labor="two_or_more";


/* make dummies */

if ptlgr=0 then no_ptm_labor=1;
else no_ptm_labor=0;

if ptlgr=1 then one_ptm_labor=1;
else one_ptm_labor=0;

if ptlgr=2 then two_or_more_ptm_labor=1;
else two_or_more_ptm_labor=0;

run;

proc print data=r.mbwt (obs=6);
run;

title;

Recategorize ptl to numeric (0, 1 and 2), character (…) and dummy variables

Obs prterm_labor racename lwtcq lowcq low age lwt race smoke ptl ht ui ftv bwt lowq lwtq lwtq1 lwtq2 lwtq3 lwtq4 racegr white black other ptlgr no_ptm_labor one_ptm_labor two_or_more_ptm_labor
1 zero black 4q_lwt 2q_bwt 0 19 182 2 0 0 0 1 0 2523 2 0 1 0 0 0 1 0 1 0 0 1 0 0
2 zero other 4q_lwt 2q_bwt 0 33 155 3 0 0 0 0 3 2551 2 0 1 0 0 0 2 0 0 1 0 1 0 0
3 zero white 1q_lwt 2q_bwt 0 20 105 1 1 0 0 0 1 2557 2 3 0 0 0 1 0 1 0 0 0 1 0 0
4 zero white 1q_lwt 2q_bwt 0 21 108 1 1 0 0 1 2 2594 2 3 0 0 0 1 0 1 0 0 0 1 0 0
5 zero white 1q_lwt 2q_bwt 0 18 107 1 1 0 0 1 0 2600 2 3 0 0 0 1 0 1 0 0 0 1 0 0
6 zero other 3q_lwt 2q_bwt 0 21 124 3 0 0 0 0 0 2622 2 1 0 1 0 0 2 0 0 1 0 1 0 0

Check recoding using proc freq

ods noproctitle;

title "Frequency for variable ptl and its derivatives";

proc freq data=r.mbwt;
    table ptl*ptlgr/list;
run;

proc freq data=r.mbwt;
    table ptl*prterm_labor/list;
run;


proc freq data=r.mbwt;
    table no_ptm_labor one_ptm_labor two_or_more_ptm_labor;
run;

title; 

ods proctitle;

Frequency for variable ptl and its derivatives

ptl  ptlgr  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0    0      159        84.13    159                   84.13
1    1      24         12.70    183                   96.83
2    2      5          2.65     188                   99.47
3    2      1          0.53     189                   100.00

Frequency for variable ptl and its derivatives

ptl  prterm_labor  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0    zero          159        84.13    159                   84.13
1    one           24         12.70    183                   96.83
2    two_or_more   5          2.65     188                   99.47
3    two_or_more   1          0.53     189                   100.00

Frequency for variable ptl and its derivatives

no_ptm_labor  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0             30         15.87    30                    15.87
1             159        84.13    189                   100.00

one_ptm_labor  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0              165        87.30    165                   87.30
1              24         12.70    189                   100.00

two_or_more_ptm_labor  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0                      183        96.83    183                   96.83
1                      6          3.17     189                   100.00
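The one-way tables above check each dummy separately. A complementary sketch checks them jointly: because the three categories are mutually exclusive, exactly one dummy should equal 1 on every row. The names work.ptl_dummy_check and n_set are hypothetical.

```sas
/* Sketch: count how many of the three ptl dummies are set on each row */
data work.ptl_dummy_check;
    set r.mbwt;
    n_set = sum(no_ptm_labor, one_ptm_labor, two_or_more_ptm_labor);
run;

/* A single level, n_set = 1, is the expected result */
proc freq data=work.ptl_dummy_check;
    tables n_set / missing;
run;
```

Since the dummies always sum to 1, a regression model with an intercept would normally include only two of them and treat the third as the reference category.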

ftv

Determine frequency

title "Frequency distribution for variable ftv";
proc freq data=r.mbwt;
    table ftv;
run;
title; 

Frequency distribution for variable ftv

The FREQ Procedure

ftv  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0    100        52.91    100                   52.91
1    47         24.87    147                   77.78
2    30         15.87    177                   93.65
3    7          3.70     184                   97.35
4    4          2.12     188                   99.47
6    1          0.53     189                   100.00

Recode to new numeric, character and dummy variables

title "Recategorize ftv to numeric (0, 1 and 2), character (...) and dummy variables";

data r.mbwt;
    length ft_dr_visit $ 25;
    set r.mbwt;
    
    
/* recode to a new character variable (i.e., ft_dr_visit) initiated above */
    
if ftv=0 then ft_dr_visit="none";
if ftv=1 then ft_dr_visit="one";
if ftv in (2, 3, 4, 6) then ft_dr_visit="two_or_more";


/* recode to a new numeric variable */

ftvgr=99;

if ftv=0 then ftvgr=0;
if ftv=1 then ftvgr=1;
if ftv in (2, 3, 4, 6) then ftvgr=2;


/* make dummy variables */

if ftvgr=0 then no_dr_visit=1;
else no_dr_visit=0;

if ftvgr=1 then one_dr_visit=1;
else one_dr_visit=0;

if ftvgr=2 then two_or_more_dr_visit=1;
else two_or_more_dr_visit=0;

run;

proc print data=r.mbwt (obs=6);
run;

title;

Recategorize ftv to numeric (0, 1 and 2), character (…) and dummy variables

Obs ft_dr_visit prterm_labor racename lwtcq lowcq low age lwt race smoke ptl ht ui ftv bwt lowq lwtq lwtq1 lwtq2 lwtq3 lwtq4 racegr white black other ptlgr no_ptm_labor one_ptm_labor two_or_more_ptm_labor ftvgr no_dr_visit one_dr_visit two_or_more_dr_visit
1 none zero black 4q_lwt 2q_bwt 0 19 182 2 0 0 0 1 0 2523 2 0 1 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0
2 two_or_more zero other 4q_lwt 2q_bwt 0 33 155 3 0 0 0 0 3 2551 2 0 1 0 0 0 2 0 0 1 0 1 0 0 2 0 0 1
3 one zero white 1q_lwt 2q_bwt 0 20 105 1 1 0 0 0 1 2557 2 3 0 0 0 1 0 1 0 0 0 1 0 0 1 0 1 0
4 two_or_more zero white 1q_lwt 2q_bwt 0 21 108 1 1 0 0 1 2 2594 2 3 0 0 0 1 0 1 0 0 0 1 0 0 2 0 0 1
5 none zero white 1q_lwt 2q_bwt 0 18 107 1 1 0 0 1 0 2600 2 3 0 0 0 1 0 1 0 0 0 1 0 0 0 1 0 0
6 none zero other 3q_lwt 2q_bwt 0 21 124 3 0 0 0 0 0 2622 2 1 0 1 0 0 2 0 0 1 0 1 0 0 0 1 0 0

Check recoding using proc freq

ods noproctitle;

title "Frequency for variable ftv and its derivatives";

proc freq data=r.mbwt;
    table ftv*ftvgr/list;
run;

proc freq data=r.mbwt;
    table ftv*ft_dr_visit/list;
run;

proc freq data=r.mbwt;
    table no_dr_visit one_dr_visit two_or_more_dr_visit;
run;

title; 

ods proctitle;

Frequency for variable ftv and its derivatives

ftv  ftvgr  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0    0      100        52.91    100                   52.91
1    1      47         24.87    147                   77.78
2    2      30         15.87    177                   93.65
3    2      7          3.70     184                   97.35
4    2      4          2.12     188                   99.47
6    2      1          0.53     189                   100.00

Frequency for variable ftv and its derivatives

ftv  ft_dr_visit  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0    none         100        52.91    100                   52.91
1    one          47         24.87    147                   77.78
2    two_or_more  30         15.87    177                   93.65
3    two_or_more  7          3.70     184                   97.35
4    two_or_more  4          2.12     188                   99.47
6    two_or_more  1          0.53     189                   100.00

Frequency for variable ftv and its derivatives

no_dr_visit  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0            89         47.09    89                    47.09
1            100        52.91    189                   100.00

one_dr_visit  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0             142        75.13    142                   75.13
1             47         24.87    189                   100.00

two_or_more_dr_visit  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0                     147        77.78    147                   77.78
1                     42         22.22    189                   100.00
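The recode of ftv above lists the observed values 2, 3, 4 and 6 explicitly, so a value such as 5 in new data would keep the sentinel 99. One possible alternative, shown here only as a sketch, caps the count at 2 with min(). The names work.ftv_chk and ftvgr_chk are hypothetical; the explicit missing check is there because min() ignores missing arguments and would otherwise return 2.

```sas
/* Sketch: collapse "two or more" visits by capping the count at 2 */
data work.ftv_chk;
    set r.mbwt;
    if missing(ftv) then ftvgr_chk = .;
    else ftvgr_chk = min(ftv, 2);    /* 0 and 1 unchanged; 2, 3, 4, 6 become 2 */
run;

/* Compare with the hand-coded ftvgr */
proc freq data=work.ftv_chk;
    tables ftvgr*ftvgr_chk / list missing;
run;
```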

smoke

Recode to character and dummy variables

title "Recode smoke to character and dummy variable";

data r.mbwt;
    length smokec $ 12;
    set r.mbwt;
    
/* recode to a character variable (i.e., smokec) initiated above */

if smoke=0 then smokec="non_smoker";
if smoke=1 then smokec="smoker";


/* create dummy variables */

if smoke=0 then non_smoker=1;
else non_smoker=0;

if smoke=1 then smoker=1;
else smoker=0;

run;

proc print data=r.mbwt (obs=5);
run;

title;

Recode smoke to character and dummy variable

Obs smokec ft_dr_visit prterm_labor racename lwtcq lowcq low age lwt race smoke ptl ht ui ftv bwt lowq lwtq lwtq1 lwtq2 lwtq3 lwtq4 racegr white black other ptlgr no_ptm_labor one_ptm_labor two_or_more_ptm_labor ftvgr no_dr_visit one_dr_visit two_or_more_dr_visit non_smoker smoker
1 non_smoker none zero black 4q_lwt 2q_bwt 0 19 182 2 0 0 0 1 0 2523 2 0 1 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0 1 0
2 non_smoker two_or_more zero other 4q_lwt 2q_bwt 0 33 155 3 0 0 0 0 3 2551 2 0 1 0 0 0 2 0 0 1 0 1 0 0 2 0 0 1 1 0
3 smoker one zero white 1q_lwt 2q_bwt 0 20 105 1 1 0 0 0 1 2557 2 3 0 0 0 1 0 1 0 0 0 1 0 0 1 0 1 0 0 1
4 smoker two_or_more zero white 1q_lwt 2q_bwt 0 21 108 1 1 0 0 1 2 2594 2 3 0 0 0 1 0 1 0 0 0 1 0 0 2 0 0 1 0 1
5 smoker none zero white 1q_lwt 2q_bwt 0 18 107 1 1 0 0 1 0 2600 2 3 0 0 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 1

Check recoding using proc freq

ods noproctitle;

title "Frequency for variable smoke and its derivatives";

proc freq data=r.mbwt;
    table smoke*smokec/list;
run;

proc freq data=r.mbwt;
    table non_smoker smoker;
run;

title; 

ods proctitle;

Frequency for variable smoke and its derivatives

smoke  smokec      Frequency  Percent  Cumulative Frequency  Cumulative Percent
0      non_smoker  115        60.85    115                   60.85
1      smoker      74         39.15    189                   100.00

Frequency for variable smoke and its derivatives

non_smoker  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0           74         39.15    74                    39.15
1           115        60.85    189                   100.00

smoker  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0       115        60.85    115                   60.85
1       74         39.15    189                   100.00
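For a two-level variable such as smoke, the recode above could also be written with the IFC function and a boolean expression. This is a sketch under assumptions: work.smoke_ifc, smokec2 and smoker2 are hypothetical names, and a missing value of smoke would be labelled non_smoker here, so this form is only suitable when smoke has no missing values (as in this data set).

```sas
/* Sketch: binary recode with IFC and a boolean expression */
data work.smoke_ifc;
    length smokec2 $ 12;
    set r.mbwt;
    smokec2 = ifc(smoke = 1, 'smoker', 'non_smoker');
    smoker2 = (smoke = 1);           /* 1 for smokers, 0 otherwise */
run;

/* Both versions should agree on every row */
proc freq data=work.smoke_ifc;
    tables smokec*smokec2 smoker*smoker2 / list;
run;
```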

ht

Recode to character and dummy variables

title "Recode ht to character and dummy variables";

data r.mbwt;
    length htc $ 12;
    set r.mbwt;
    
/* recode to a character variable (i.e., htc) initiated above */
    
if ht=0 then htc="non_hypert";
if ht=1 then htc="hypert";

/* create dummy variables */

if ht=0 then non_hypert=1;
else non_hypert=0;

if ht=1 then hypert=1;
else hypert=0;

run;

proc print data=r.mbwt (obs=5);
run;

title;

Recode ht to character and dummy variables

Obs htc smokec ft_dr_visit prterm_labor racename lwtcq lowcq low age lwt race smoke ptl ht ui ftv bwt lowq lwtq lwtq1 lwtq2 lwtq3 lwtq4 racegr white black other ptlgr no_ptm_labor one_ptm_labor two_or_more_ptm_labor ftvgr no_dr_visit one_dr_visit two_or_more_dr_visit non_smoker smoker non_hypert hypert
1 non_hypert non_smoker none zero black 4q_lwt 2q_bwt 0 19 182 2 0 0 0 1 0 2523 2 0 1 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0 1 0 1 0
2 non_hypert non_smoker two_or_more zero other 4q_lwt 2q_bwt 0 33 155 3 0 0 0 0 3 2551 2 0 1 0 0 0 2 0 0 1 0 1 0 0 2 0 0 1 1 0 1 0
3 non_hypert smoker one zero white 1q_lwt 2q_bwt 0 20 105 1 1 0 0 0 1 2557 2 3 0 0 0 1 0 1 0 0 0 1 0 0 1 0 1 0 0 1 1 0
4 non_hypert smoker two_or_more zero white 1q_lwt 2q_bwt 0 21 108 1 1 0 0 1 2 2594 2 3 0 0 0 1 0 1 0 0 0 1 0 0 2 0 0 1 0 1 1 0
5 non_hypert smoker none zero white 1q_lwt 2q_bwt 0 18 107 1 1 0 0 1 0 2600 2 3 0 0 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 1 1 0

Check recoding using proc freq

ods noproctitle;

title "Frequency for variable ht and its derivatives";

proc freq data=r.mbwt;
    table ht*htc/list;
run;

proc freq data=r.mbwt;
    table non_hypert hypert;
run;

title; 

ods proctitle;

Frequency for variable ht and its derivatives

ht  htc         Frequency  Percent  Cumulative Frequency  Cumulative Percent
0   non_hypert  177        93.65    177                   93.65
1   hypert      12         6.35     189                   100.00

Frequency for variable ht and its derivatives

non_hypert  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0           12         6.35     12                    6.35
1           177        93.65    189                   100.00

hypert  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0       177        93.65    177                   93.65
1       12         6.35     189                   100.00

ui

Recode to character and dummy variables

title "Recode ui to character and dummy variables";

data r.mbwt;
    length uic $ 12;
    set r.mbwt;
    
/* populate values of a character variable (i.e., uic) initiated above */
    
if ui=0 then uic="non_u_irrit";
if ui=1 then uic="u_irrit";


/* make dummies for variable ui */

if ui=0 then non_u_irrit=1;
else non_u_irrit=0;

if ui=1 then u_irrit=1;
else u_irrit=0;

run;

proc print data=r.mbwt (obs=5);
run;

title;

Recode ui to character and dummy variables

Obs uic htc smokec ft_dr_visit prterm_labor racename lwtcq lowcq low age lwt race smoke ptl ht ui ftv bwt lowq lwtq lwtq1 lwtq2 lwtq3 lwtq4 racegr white black other ptlgr no_ptm_labor one_ptm_labor two_or_more_ptm_labor ftvgr no_dr_visit one_dr_visit two_or_more_dr_visit non_smoker smoker non_hypert hypert non_u_irrit u_irrit
1 u_irrit non_hypert non_smoker none zero black 4q_lwt 2q_bwt 0 19 182 2 0 0 0 1 0 2523 2 0 1 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0 1 0 1 0 0 1
2 non_u_irrit non_hypert non_smoker two_or_more zero other 4q_lwt 2q_bwt 0 33 155 3 0 0 0 0 3 2551 2 0 1 0 0 0 2 0 0 1 0 1 0 0 2 0 0 1 1 0 1 0 1 0
3 non_u_irrit non_hypert smoker one zero white 1q_lwt 2q_bwt 0 20 105 1 1 0 0 0 1 2557 2 3 0 0 0 1 0 1 0 0 0 1 0 0 1 0 1 0 0 1 1 0 1 0
4 u_irrit non_hypert smoker two_or_more zero white 1q_lwt 2q_bwt 0 21 108 1 1 0 0 1 2 2594 2 3 0 0 0 1 0 1 0 0 0 1 0 0 2 0 0 1 0 1 1 0 0 1
5 u_irrit non_hypert smoker none zero white 1q_lwt 2q_bwt 0 18 107 1 1 0 0 1 0 2600 2 3 0 0 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 1 1 0 0 1

Check recoding using proc freq

ods noproctitle;

title "Frequency for variable ui and its derivatives";

proc freq data=r.mbwt;
    table ui*uic/list;
run;

proc freq data=r.mbwt;
    table non_u_irrit u_irrit;
run;

title; 

ods proctitle;

Frequency for variable ui and its derivatives

ui  uic          Frequency  Percent  Cumulative Frequency  Cumulative Percent
0   non_u_irrit  161        85.19    161                   85.19
1   u_irrit      28         14.81    189                   100.00

Frequency for variable ui and its derivatives

non_u_irrit  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0            28         14.81    28                    14.81
1            161        85.19    189                   100.00

u_irrit  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0        161        85.19    161                   85.19
1        28         14.81    189                   100.00

By-group processing

proc sort data=r.mbwt;
   by smokec racename;
run;

data first last;
    set r.mbwt;
    by smokec racename;
    if FIRST.smokec or FIRST.racename then output first;
    if LAST.smokec or LAST.racename then output last;
run;

2337 proc sort data=r.mbwt;
2338 by smokec racename;
2339 run;
NOTE: There were 189 observations read from the data set R.MBWT.
NOTE: The data set R.MBWT has 189 observations and 42 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.08 seconds
cpu time 0.03 seconds

2340
2341 data first last;
2342 set r.mbwt;
2343 by smokec racename;
2344 if FIRST.smokec or FIRST.racename then output first;
2345 if LAST.smokec or LAST.racename then output last;
2346 run;
NOTE: There were 189 observations read from the data set R.MBWT.
NOTE: The data set WORK.FIRST has 6 observations and 42 variables.
NOTE: The data set WORK.LAST has 6 observations and 42 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds

proc print data=first;
    var smokec racename age lwt bwt;
run;
Obs smokec racename age lwt bwt
1 non_smoker black 19 182 2523
2 non_smoker other 33 155 2551
3 non_smoker white 22 118 2637
4 smoker black 26 168 2920
5 smoker other 22 85 3090
6 smoker white 20 105 2557
proc print data=last;
     var smokec racename age lwt bwt;
run;
Obs smokec racename age lwt bwt
1 non_smoker black 17 142 2495
2 non_smoker other 14 100 2495
3 non_smoker white 15 110 2353
4 smoker black 24 105 2381
5 smoker other 23 94 2495
6 smoker white 21 130 2495
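The first and last data sets above each hold one row per smokec and racename combination. The same FIRST. and LAST. flags can also accumulate a value within each group; the sketch below counts the observations per group. It assumes r.mbwt is still sorted by smokec and racename, as done with proc sort above, and the names work.group_counts and n_in_group are hypothetical.

```sas
/* Sketch: count rows per by-group with FIRST./LAST. and a sum statement */
data work.group_counts;
    set r.mbwt;
    by smokec racename;
    if first.racename then n_in_group = 0;   /* reset at the start of each group */
    n_in_group + 1;                          /* sum statement: value is retained */
    if last.racename then output;            /* write one row per group */
    keep smokec racename n_in_group;
run;

proc print data=work.group_counts;
run;
```

Because racename is the last BY variable, FIRST.racename is already set to 1 whenever FIRST.smokec is 1, so testing FIRST.racename alone is enough.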

String processing

Find word using findw()

data _null_;
set r.mbwt;

x=FINDW(smokec,'smoker');

put x;
run;

2371 data _null_;
2372 set r.mbwt;
2373
2374 x=FINDW(smokec,'smoker');
2375
2376 put x;
2377 run;
0    [printed 115 times: one line for each non_smoker observation]
1    [printed 74 times: one line for each smoker observation]
NOTE: There were 189 observations read from the data set R.MBWT.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
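The zeros for the non_smoker rows in the log above are consistent with FINDW matching whole words only: the underscore is not one of its default delimiters, so non_smoker is treated as a single word that is not equal to smoker. The sketch below contrasts three calls on a hypothetical test string s; passing '_ ' as the third argument to make underscore and blank the delimiters is an assumption made for this illustration.

```sas
/* Sketch: whole-word search versus substring search */
data _null_;
    length s $ 12;
    s = 'non_smoker';
    w_default    = findw(s, 'smoker');         /* default delimiters */
    w_underscore = findw(s, 'smoker', '_ ');   /* underscore and blank as delimiters */
    sub_pos      = find(s, 'smoker');          /* plain substring search */
    put w_default= w_underscore= sub_pos=;
run;
```

If the goal is to flag any value that contains smoker, FIND is the better fit; if the goal is an exact category match, comparing smokec with 'smoker' directly is simpler still.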


Extract string using substr()

data _null_;
set r.mbwt;

rcs=substr(racename,1,1);

put rcs;

run;

2384 data _null_;
2385 set r.mbwt;
2386
2387 rcs=substr(racename,1,1);
2388
2389 put rcs;
2390
2391 run;
b    [printed 16 times: non_smoker, black]
o    [printed 55 times: non_smoker, other]
w    [printed 44 times: non_smoker, white]
b    [printed 10 times: smoker, black]
o    [printed 12 times: smoker, other]
w    [printed 52 times: smoker, white]
NOTE: There were 189 observations read from the data set R.MBWT.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
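Writing one line per observation to the log makes the result hard to read. A hedged alternative stores the extracted character in a variable and summarises it with proc freq. The names work.race_initial and rcs are used here for illustration, the upcase() call is an addition that is not in the original step, and the LENGTH statement is included because a variable created from substr() otherwise takes the length of its source variable.

```sas
/* Sketch: keep the first letter of racename and tabulate it */
data work.race_initial;
    length rcs $ 1;                          /* one character is enough */
    set r.mbwt;
    rcs = upcase(substr(racename, 1, 1));    /* first character, upper-cased */
run;

proc freq data=work.race_initial;
    tables racename*rcs / list;
run;
```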


Length of the values in a numeric variable

Using length()

data _null_;
set r.mbwt;

agn=length(age);

put agn;

run;

2398 data _null_;
2399 set r.mbwt;
2400
2401 agn=length(age);
2402
2403 put agn;
2404
2405 run;
NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
2401:12
12    [printed 189 times: one line for each observation]
NOTE: There were 189 observations read from the data set R.MBWT.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
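
The constant result of 12 reflects an implicit conversion, not the number of digits in age. LENGTH expects a character argument, so SAS first converts the numeric value with the BEST12. format (hence the conversion NOTE in the log), and the resulting string is right-aligned in 12 characters. A minimal sketch of one way to count the digits instead, assuming the values are whole numbers; the names age_chr and age_len are illustrative only:

```sas
data _null_;
    set r.mbwt (obs=5);                  /* first five rows only, to keep the log short */

    /* explicit conversion, then remove the leading blanks added by BEST12. */
    age_chr = strip(put(age, best12.));

    /* length of the stripped string, e.g. 2 for an age of 19 */
    age_len = length(age_chr);

    put age= age_len=;
run;
```

Because the conversion is explicit, this version should not write the numeric-to-character NOTE to the log.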

Using array

data _null_;
    set r.mbwt;

array ag{*} age;

do i=1 to dim(ag);

    ag(i)=length(ag(i));

    put ag(i);

end;

run;

2412 data _null_;
2413 set r.mbwt;
2414
2415 array ag{*} age;
2416
2417 do i=1 to dim(ag);
2418
2419 ag(i)=length(ag(i));
2420
2421 put ag(i);
2422
2423 end;
2424
2425 run;
NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
2419:18
12  (the same value on all 189 lines)
NOTE: There were 189 observations read from the data set R.MBWT.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds

Length of the observations in a character variable

Using length()

data _null_;
set r.mbwt;

smkn=length(smokec);

put smkn;

run;

2432 data _null_;
2433 set r.mbwt;
2434
2435 smkn=length(smokec);
2436
2437 put smkn;
2438
2439 run;
10  (115 lines)
6   (74 lines)
NOTE: There were 189 observations read from the data set R.MBWT.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
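
LENGTH ignores trailing blanks, which is why non_smoker gives 10 and smoker gives 6 even though both values are stored in the same variable. A short sketch contrasting it with two related functions: LENGTHN, which returns 0 rather than 1 for a blank value, and VLENGTH, which returns the storage length of the variable. The names len_trim, len_n and len_store are illustrative:

```sas
data _null_;
    set r.mbwt (obs=5);

    len_trim  = length(smokec);    /* position of the last non-blank character */
    len_n     = lengthn(smokec);   /* same, but 0 for an all-blank value */
    len_store = vlength(smokec);   /* declared length of the variable */

    put smokec= len_trim= len_n= len_store=;
run;
```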

Using array

data _null_;
    set r.mbwt;

array rc{*} $ smokec;

do i=1 to dim(rc);

    rc(i)=length(rc(i));

    put rc(i);

end;

run;

2446 data _null_;
2447 set r.mbwt;
2448
2449 array rc{*} $ smokec;
2450
2451 do i=1 to dim(rc);
2452
2453 rc(i)=length(rc(i));
2454
2455 put rc(i);
2456
2457 end;
2458
2459 run;
NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
2453:5
10  (115 lines)
6   (74 lines)
NOTE: There were 189 observations read from the data set R.MBWT.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
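
In the step above the array holds a single variable, and the assignment writes the numeric length back into smokec itself. That triggers the conversion NOTE and would overwrite the original values if the step created a data set. An array is more useful when the same operation is applied to several variables and the result is stored separately. A sketch under those assumptions, using four character variables from this data set and two illustrative helper variables (vname_i and len_i):

```sas
data _null_;
    set r.mbwt (obs=3);                       /* three rows are enough for a check */
    length vname_i $ 32;

    array cv{*} $ smokec racename uic htc;    /* existing character variables */

    do i = 1 to dim(cv);
        vname_i = vname(cv{i});               /* name of the current array element */
        len_i   = length(cv{i});              /* numeric result kept in its own variable */
        put vname_i= len_i=;
    end;
run;
```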

Select/keep variables of interest

Set statement has the keep option

/* keep option is in the set statement */

data r.mbwt1;
    set r.mbwt (keep=bwt age lwt smokec racename uic htc ft_dr_visit prterm_labor);
run;

2466 /* keep option is in the set statement */
2467
2468 data r.mbwt1;
2469 set r.mbwt (keep=bwt age lwt smokec racename uic htc ft_dr_visit prterm_labor);
2470 run;
NOTE: There were 189 observations read from the data set R.MBWT.
NOTE: The data set R.MBWT1 has 189 observations and 9 variables.
NOTE: DATA statement used (Total process time):
real time 0.05 seconds
cpu time 0.02 seconds

proc print data=r.mbwt1 (obs=5);
run;
SAS Output

Obs  uic          htc         smokec      ft_dr_visit  prterm_labor  racename  age  lwt  bwt
1    u_irrit      non_hypert  non_smoker  none         zero          black     19   182  2523
2    non_u_irrit  non_hypert  non_smoker  none         zero          black     15   98   2778
3    non_u_irrit  non_hypert  non_smoker  one          zero          black     17   113  2920
4    non_u_irrit  non_hypert  non_smoker  one          zero          black     17   113  2920
5    non_u_irrit  non_hypert  non_smoker  none         zero          black     25   125  2977

Data statement has the keep option

/* keep option is in the data statement */

data r.mbwt2 (keep=bwt age lwt smokec racename uic htc ft_dr_visit prterm_labor);
    set r.mbwt;
run;

2485 /* keep option is in the data statement */
2486
2487 data r.mbwt2 (keep=bwt age lwt smokec racename uic htc ft_dr_visit prterm_labor);
2488 set r.mbwt;
2489 run;
NOTE: There were 189 observations read from the data set R.MBWT.
NOTE: The data set R.MBWT2 has 189 observations and 9 variables.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.01 seconds

proc print data=r.mbwt2 (obs=5);
run;
SAS Output

Obs  uic          htc         smokec      ft_dr_visit  prterm_labor  racename  age  lwt  bwt
1    u_irrit      non_hypert  non_smoker  none         zero          black     19   182  2523
2    non_u_irrit  non_hypert  non_smoker  none         zero          black     15   98   2778
3    non_u_irrit  non_hypert  non_smoker  one          zero          black     17   113  2920
4    non_u_irrit  non_hypert  non_smoker  one          zero          black     17   113  2920
5    non_u_irrit  non_hypert  non_smoker  none         zero          black     25   125  2977

Keep statement specifies the list of variables

/* keep statement specifies the variables of interest */

data r.mbwt3;
    set r.mbwt;
    keep bwt age lwt smokec racename uic htc ft_dr_visit prterm_labor;
run;

2504 /* keep statement specify variables of interest */
2505
2506 data r.mbwt3;
2507 set r.mbwt;
2508 keep bwt age lwt smokec racename uic htc ft_dr_visit prterm_labor;
2509 run;
NOTE: There were 189 observations read from the data set R.MBWT.
NOTE: The data set R.MBWT3 has 189 observations and 9 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.00 seconds

proc print data=r.mbwt3 (obs=5);
run;
SAS Output

Obs  uic          htc         smokec      ft_dr_visit  prterm_labor  racename  age  lwt  bwt
1    u_irrit      non_hypert  non_smoker  none         zero          black     19   182  2523
2    non_u_irrit  non_hypert  non_smoker  none         zero          black     15   98   2778
3    non_u_irrit  non_hypert  non_smoker  one          zero          black     17   113  2920
4    non_u_irrit  non_hypert  non_smoker  one          zero          black     17   113  2920
5    non_u_irrit  non_hypert  non_smoker  none         zero          black     25   125  2977
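
All three forms produce the same nine-variable data set here. They differ in timing: the KEEP= option on the SET statement restricts which variables are read, whereas the KEEP= option on the DATA statement and the KEEP statement restrict which variables are written. The distinction matters as soon as a step computes something from a variable that is not wanted in the output. A sketch, assuming a hypothetical output data set work.mbwt_kg and a derived variable lwt_kg that are not part of the original notebook:

```sas
data work.mbwt_kg (keep=bwt age lwt_kg);   /* written: only these three variables */
    set r.mbwt (keep=bwt age lwt);         /* read: only these three variables */
    lwt_kg = lwt * 0.45359237;             /* mother's weight, pounds to kilograms */
run;

proc print data=work.mbwt_kg (obs=5);
run;
```

Here lwt must be listed in the SET option because it is needed for the calculation, and it is left out of the DATA option so that it is not stored in the result.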

Export a data set

proc export data=r.mbwt1 outfile="&path/mbwt1.csv";
run;

2524 proc export data=r.mbwt1 outfile="&path/mbwt1.csv";
2525 run;
NOTE: Export cancelled. Output file /folders/myfolders/birthwt/mbwt1.csv already exists. Specify REPLACE option to overwrite it.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE EXPORT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
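
The log shows that the export was cancelled because mbwt1.csv was already present in the target folder. As the NOTE suggests, adding the REPLACE option lets PROC EXPORT overwrite the existing file. DBMS=CSV is added in this sketch to state the file type explicitly, mirroring the PROC IMPORT call used to read the data:

```sas
proc export data=r.mbwt1
    outfile="&path/mbwt1.csv"
    dbms=csv      /* write a comma-separated file */
    replace;      /* overwrite mbwt1.csv if it already exists */
run;
```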

Descriptive statistics

For the whole data set using proc means

proc means data=r.mbwt;
run;
SAS Output

The MEANS Procedure

Variable               N    Mean         Std Dev      Minimum      Maximum
low                    189  0.3121693    0.4646093    0            1.0000000
age                    189  23.2380952   5.2986779    14.0000000   45.0000000
lwt                    189  129.8148148  30.5793804   80.0000000   250.0000000
race                   189  1.8465608    0.9183422    1.0000000    3.0000000
smoke                  189  0.3915344    0.4893898    0            1.0000000
ptl                    189  0.1957672    0.4933419    0            3.0000000
ht                     189  0.0634921    0.2444936    0            1.0000000
ui                     189  0.1481481    0.3561903    0            1.0000000
ftv                    189  0.7936508    1.0592861    0            6.0000000
bwt                    189  2944.59      729.2142952  709.0000000  4990.00
lowq                   189  1.4867725    1.1232952    0            3.0000000
lwtq                   189  1.4444444    1.1076779    0            3.0000000
lwtq1                  189  0.2645503    0.4422650    0            1.0000000
lwtq2                  189  0.2486772    0.4333944    0            1.0000000
lwtq3                  189  0.2645503    0.4422650    0            1.0000000
lwtq4                  189  0.2222222    0.4168439    0            1.0000000
racegr                 189  0.8465608    0.9183422    0            2.0000000
white                  189  0.5079365    0.5012649    0            1.0000000
black                  189  0.1375661    0.3453589    0            1.0000000
other                  189  0.3544974    0.4796313    0            1.0000000
ptlgr                  189  0.1904762    0.4678087    0            2.0000000
no_ptm_labor           189  0.8412698    0.3663949    0            1.0000000
one_ptm_labor          189  0.1269841    0.3338395    0            1.0000000
two_or_more_ptm_labor  189  0.0317460    0.1757889    0            1.0000000
ftvgr                  189  0.6931217    0.8128001    0            2.0000000
no_dr_visit            189  0.5291005    0.5004782    0            1.0000000
one_dr_visit           189  0.2486772    0.4333944    0            1.0000000
two_or_more_dr_visit   189  0.2222222    0.4168439    0            1.0000000
non_smoker             189  0.6084656    0.4893898    0            1.0000000
smoker                 189  0.3915344    0.4893898    0            1.0000000
non_hypert             189  0.9365079    0.2444936    0            1.0000000
hypert                 189  0.0634921    0.2444936    0            1.0000000
non_u_irrit            189  0.8518519    0.3561903    0            1.0000000
u_irrit                189  0.1481481    0.3561903    0            1.0000000

For continuous variables of interest using proc means

proc means data=r.mbwt;
    var age lwt bwt;
run;
SAS Output

The MEANS Procedure

Variable  N    Mean         Std Dev      Minimum      Maximum
age       189  23.2380952   5.2986779    14.0000000   45.0000000
lwt       189  129.8148148  30.5793804   80.0000000   250.0000000
bwt       189  2944.59      729.2142952  709.0000000  4990.00
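
By default PROC MEANS reports N, the mean, the standard deviation, the minimum and the maximum, as seen above. Statistic keywords and the MAXDEC= option on the PROC MEANS statement can be used to request other statistics and to limit the number of decimals shown. A sketch; the particular selection of statistics is only an example:

```sas
proc means data=r.mbwt n mean std median q1 q3 min max maxdec=2;
    var age lwt bwt;    /* the three continuous variables */
run;
```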

By classification/categorical variables using proc means

proc means data=r.mbwt;
class racename;
var age lwt bwt;
run;
SAS Output

The MEANS Procedure

racename  N Obs  Variable  N   Mean         Std Dev      Minimum      Maximum
black     26     age       26  21.5384615   5.1086653    15.0000000   35.0000000
                 lwt       26  146.8076923  39.6393938   98.0000000   241.0000000
                 bwt       26  2719.69      638.6838823  1135.00      3860.00
other     67     age       67  22.3880597   4.5359013    14.0000000   33.0000000
                 lwt       67  120.0149254  25.1302622   80.0000000   250.0000000
                 bwt       67  2805.28      722.1943583  709.0000000  4054.00
white     96     age       96  24.2916667   5.6548380    14.0000000   45.0000000
                 lwt       96  132.0520833  29.0938119   90.0000000   235.0000000
                 bwt       96  3102.72      727.8861493  1021.00      4990.00

For a single continuous variable using proc univariate

proc univariate data=r.mbwt;
    var bwt;
run;
SAS Output

The UNIVARIATE Procedure

Variable: bwt

Moments
N                189         Sum Weights       189
Mean             2944.5873   Sum Observations  556527
Std Deviation    729.214295  Variance          531753.488
Skewness         -0.208637   Kurtosis          -0.0838389
Uncorrected SS   1738711993  Corrected SS      99969655.8
Coeff Variation  24.764567   Std Error Mean    53.042535

Basic Statistical Measures
Location            Variability
Mean    2944.587    Std Deviation        729.21430
Median  2977.000    Variance             531753
Mode    3062.000    Range                4281
                    Interquartile Range  1073

Tests for Location: Mu0=0
Test         Statistic    p Value
Student's t  t  55.5137   Pr > |t|   <.0001
Sign         M  94.5      Pr >= |M|  <.0001
Signed Rank  S  8977.5    Pr >= |S|  <.0001

Quantiles (Definition 5)
Level       Quantile
100% Max    4990
99%         4593
95%         3997
90%         3884
75% Q3      3487
50% Median  2977
25% Q1      2414
10%         1970
5%          1790
1%          1021
0% Min      709

Extreme Observations
Lowest        Highest
Value  Obs    Value  Obs
709    133    4167   108
1021   112    4174   109
1135   120    4238   170
1330   52     4593   110
1474   53     4990   111

For a categorical variable using proc freq

proc freq data=r.mbwt;
    tables racename;
run;
SAS Output

The FREQ Procedure

racename  Frequency  Percent  Cumulative Frequency  Cumulative Percent
black     26         13.76    26                    13.76
other     67         35.45    93                    49.21
white     96         50.79    189                   100.00

For cross-tabulated variables using proc freq

proc freq data=r.mbwt;
    tables racename*smokec/nopercent norow nocol;
run;
SAS Output

The FREQ Procedure

Table of racename by smokec
Cell contents: Frequency

racename  non_smoker  smoker  Total
black     16          10      26
other     55          12      67
white     44          52      96
Total     115         74      189