SPSS to R - recoding of a Survey Syntax

The libraries needed to reproduce the SPSS-syntax household survey table in R are loaded below. The author of the ‘expss’ package notes that it should be loaded only after the ‘haven’ package.

## 
## Use 'expss_output_viewer()' to display tables in the RStudio Viewer.
##  To return to the console output, use 'expss_output_default()'.

## 
## Attaching package: 'expss'

## The following objects are masked from 'package:haven':
## 
##     is.labelled, read_spss
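
The chunk that loads the packages is not shown in this document, so the following is only a sketch of a load order consistent with the messages above. The package list is an assumption inferred from the functions called later: 'foreign' for read.spss(), 'dplyr' for filter(), and 'haven' before 'expss' as the package note recommends.

```r
# Assumed package list, inferred from the functions used in this document;
# the original loading chunk is not shown. 'expss' is deliberately last.
pkgs <- c("foreign", "haven", "dplyr", "expss")

# Check which packages are installed before attaching anything.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

if (all(installed)) {
  # Attach in the listed order, so 'expss' is attached after 'haven'.
  for (p in pkgs) library(p, character.only = TRUE)
} else {
  message("Missing packages: ", paste(pkgs[!installed], collapse = ", "))
}
```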

Path to the folder containing the *.sav file with the survey dataset.

path = 'C:/Users/mitro/OneDrive - UNICEF/MICS Pacific/MICS6 Countries/MICS6TON/15 Outputs/Output Tonga v10a Oct 01 - SE/MICS6TON/SPSS'
setwd(path)

data = read.spss('mn.sav', to.data.frame = TRUE, use.value.labels = FALSE) # the dataset is loaded with numeric codes instead of value labels, to save keystrokes

## re-encoding from UTF-8

data = data %>% filter(MWM17 == 1) # keeps only responses from completed interviews with men age 15-49 in the survey
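
A filter only changes the dataset when its result is stored. The sketch below illustrates this in base R on made-up data; the toy values are hypothetical, and only the variable name MWM17 and its code 1 for completed interviews come from the text above.

```r
# Hypothetical toy data: MWM17 is the interview result code (1 = completed).
toy <- data.frame(id = 1:5, MWM17 = c(1, 2, 1, NA, 1))

# Evaluating a subset without assignment leaves 'toy' unchanged.
invisible(toy[which(toy$MWM17 == 1), ])
rows_before <- nrow(toy)

# Storing the result is what actually drops the other rows.
# which() also discards rows where MWM17 is NA.
completed <- toy[which(toy$MWM17 == 1), ]
rows_after <- nrow(completed)

c(before = rows_before, after = rows_after)
```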

Computing literacy as in the SPSS syntax, rewritten in R.

data = compute(data, {
  literate = 2
})

data$literate<- ifelse((data$MWB6A >=2 & data$MWB6A <8), 1,
                       ifelse(data$MWB14==3,1,2))
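
One point worth checking in a recode like this is how missing values behave: a comparison with NA yields NA, and ifelse() passes that NA through instead of falling back to the default. The sketch below uses made-up values for MWB6A and MWB14 and assumes both hold integer codes, as they appear to here. The `%in%` variant is one possible alternative, not the method used in this document.

```r
# Hypothetical toy values; NA marks a question that was not answered.
MWB6A <- c(1, 2, 7, 8, NA, NA)
MWB14 <- c(3, 1, NA, 2, 3, 1)

# Nested ifelse(), as in the recode above: NA in MWB6A propagates to the result.
literate_nested <- ifelse(MWB6A >= 2 & MWB6A < 8, 1,
                          ifelse(MWB14 == 3, 1, 2))

# %in% never returns NA, so rows with missing codes fall back to 2
# unless one of the two conditions is actually met.
# Assumes integer codes, so 'MWB6A >= 2 & MWB6A < 8' equals 'MWB6A %in% 2:7'.
literate_safe <- ifelse(MWB6A %in% 2:7 | MWB14 %in% 3, 1, 2)

data.frame(MWB6A, MWB14, literate_nested, literate_safe)
```

Which behavior is wanted depends on how the SPSS syntax treats missing answers, so the two results should be compared against the SPSS output before choosing.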

# making data numeric in R dataframe for calculations
data$literate<-as.numeric(as.character(data$literate))
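
The as.numeric(as.character()) idiom matters when a column is a factor, because as.numeric() alone returns the internal level codes. A minimal base-R illustration on made-up values follows; in this document the column comes from ifelse() and so appears to be numeric already, which makes the conversion a harmless safeguard.

```r
# A factor whose labels look like numbers (made-up values).
f <- factor(c("10", "20", "10", "30"))

# as.numeric() alone returns the internal level codes, not the labels.
codes <- as.numeric(f)

# Converting to character first recovers the values shown in the labels.
values <- as.numeric(as.character(f))

data.frame(label = as.character(f), codes, values)
```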

# adding value labels in R, comparable to 'value labels' function in SPSS. 
val_lab(data$literate) = num_lab("
                                 1 Literate
                                 2 Illiterate
                                 ")
# adding a variable for the overall percentage of literate respondents (100 if literate, 0 otherwise). This is comparable to the 'compute' command in SPSS. All variables have the same names as in the SPSS syntax for easier reference.
data = compute(data, {
  literateP = 0
})

data$literateP<- ifelse(data$literate ==1,100,0)
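
Coding the indicator as 0/100 means that its weighted mean is directly the percentage literate. The sketch below shows that arithmetic in base R with made-up values; the weight name mnweight is taken from the table code further down, while the numbers are hypothetical. The table code itself derives the column with net() and row percentages, so this is only a cross-check of the idea.

```r
# Hypothetical toy data: three literate men and one illiterate man.
literate <- c(1, 1, 2, 1)
mnweight <- c(1, 2, 1, 1)

# Same recode as above: 100 for literate, 0 otherwise.
literateP <- ifelse(literate == 1, 100, 0)

# Weighted mean of the 0/100 indicator = weighted percentage literate.
pct_literate <- weighted.mean(literateP, w = mnweight)
pct_literate
```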


#recode(data$mwelevel) = c(0 ~ 0, 1 ~ 1, 2:3 ~ 2, 9 ~ 9, other ~ NA)

# adds value labels to the column headings of the table in R. This is not in the SPSS syntax because the labels are already defined in the *.sav dataset
data$mwelevel<-as.numeric(as.character(data$mwelevel))
val_lab(data$mwelevel) = num_lab("
1 Up to primary
2 Lower secondary
3 Upper Secondary or higher [A]
9 Don't know/ Missing
                                 ")
# computes additional columns as in the SPSS syntax, for the overall number of cases and a total percentage. 'numMen' and 'tot' are the same variable names as in the SPSS syntax
data = compute(data, {
  numMen = 1
})

data = compute(data, {
  tot = 1
})

# adding variable labels. The labels are identical to those in the SPSS syntax. In addition, labels are added to the background characteristic variables shown in the rows. These are not in the SPSS syntax because they are already in the *.sav file.
data<-apply_labels(data, 
                   literateP =  "Total percentage literate [1]",
                   numMen = "Number of men",
                   HH6 = "Area", # up to variable 'mwelevel' all variables are background characteristics. 
                   HH7 = "Region",
                   mdisability = "Functional difficulty",
                   religion = "Religion of the household head",
                   ethnicity = "Ethnicity of the household head",
                   windex5 = "Wealth quintile",
                   mwelevel = "Percent Distribution of highest level attended and literacy",
                   tot = "Total",
                   literate ="Literacy rate")

# adding value labels as in SPSS. In addition to the syntax-specific value labels shown in the comparative document, labels for the background characteristic variables are also added.
val_lab(data$tot) = num_lab("
1 |  " )

val_lab(data$HH6) = num_lab("
1 Urban
2 Rural")

val_lab(data$mdisability) = num_lab("
1 Has functional difficulty
2 No functional difficulty")

val_lab(data$windex5) = num_lab("
1 Poorest
2 Second
3 Middle
4 Fourth
5 Richest")

val_lab(data$religion) = num_lab("
1 Free Wesleyan Church
2 Latter Day Saints
3 Roman Catholic
4 Free Church of Tonga
5 Other religion
99 Don't know/missing")

val_lab(data$ethnicity) = num_lab("
1 Tongan
2 Chinese 
3 Fijian
4 Other ethnicity 
99 Don't know/missing")


val_lab(data$HH7) = num_lab("
1 Tongatapu
2 Vava'u
3 Ha'apai
4 'Eua
5 Ongo Niua")
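
To inspect labelled codes outside 'expss', base R's factor() offers a roughly comparable mapping from codes to labels. This is a different mechanism from val_lab(), which keeps the numeric codes in place, and it is shown here on made-up data only.

```r
# Hypothetical toy codes for the area variable (1 = Urban, 2 = Rural).
HH6 <- c(1, 2, 2, 1, 2)

# factor() attaches the labels by replacing the codes with named levels.
area <- factor(HH6, levels = c(1, 2), labels = c("Urban", "Rural"))

# Unweighted counts per label.
area_counts <- table(area)
area_counts
```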

expss_output_viewer() # shows the table in the RStudio Viewer
data %>%
  tab_total_row_position("none")%>% # suppresses the total values for each row. As SPSS syntax 'ctables' does not show totals, it is disabled in R code as well
  tab_cells(tot, HH6, HH7, mdisability, ethnicity ,religion, windex5) %>% #defines rows for the table
  tab_cols(literate, mwelevel, tot) %>% # defines the columns for which 'tab_stat_rpct' shows percentages
  tab_weight(weight = mnweight) %>% # adds weights from the dataset
  tab_stat_rpct(total_label = NULL, total_statistic = "w_cases") %>%
  tab_cols(net(literateP, "Total percentage literate" = greater_or_equal(1), "TO_DELETE" = other)) %>% # defines the additional column, which uses a different calculation method, and marks a column to be deleted
  tab_stat_rpct(total_label = NULL, total_statistic = "w_cases") %>%
  tab_cols(total(numMen))%>% # last column and different calculation method only to count total cases 
  tab_stat_cases()%>%
  tab_last_round(digits = get_expss_digits())%>%
  tab_pivot(stat_position = "outside_columns")%>%
  where(!grepl("TO_DELETE", row_labels)) %>%
   drop_empty_rows()%>%
  set_caption( "Table SR.6.1M: Literacy (men)
               Percent distribution of men age 15-49 years by highest level of school attended and literacy, and the total percentage literate, " ) #adds caption on the top of the table

Table SR.6.1M: Literacy (men) Percent distribution of men age 15-49 years by highest level of school attended and literacy, and the total percentage literate,
	Literacy rate		Percent Distribution of highest level attended and literacy				Total	Total percentage literate [1]		Number of men
	Literate	Illiterate	Up to primary	Lower secondary	Upper Secondary or higher [A]	Don’t know/ Missing		Total percentage literate	TO_DELETE	#Total
Total
	99.7	0.3	1.0	26.0	72.9	0.1	100	99.7	100	1232.0
Area
Urban	99.7	0.3	0.8	22.5	76.4	0.3	100	99.7	100	275.5
Rural	99.7	0.3	1.1	27.0	71.9	0.0	100	99.7	100	956.5
Region
Tongatapu	99.9	0.1	0.8	23.3	75.7	0.1	100	99.9	100	873.8
Vava’u	99.2	0.8	1.9	32.7	65.4		100	99.2	100	198.1
Ha’apai	99.0	1.0	2.1	31.0	66.9		100	99.0	100	81.8
’Eua	99.5	0.5		34.3	65.2	0.5	100	99.5	100	63.5
Ongo Niua	98.6	1.4	1.4	28.9	69.8		100	98.6	100	14.8
Functional difficulty
Has functional difficulty	96.9	3.1	7.0	37.6	55.4		100	96.9	100	28.3
No functional difficulty	99.7	0.3	1.1	22.7	76.2	0.1	100	99.7	100	1026.3
Ethnicity of the household head
Tongan	99.7	0.3	0.8	25.6	73.4	0.1	100	99.7	100	1200.2
Chinese	100.0		11.8	44.9	43.2		100	100.0	100	24.3
Other ethnicity	100.0			18.3	81.7		100	100.0	100	7.4
Religion of the household head
Free Wesleyan Church	99.8	0.2	0.4	24.0	75.5	0.1	100	99.8	100	435.0
Latter Day Saints	99.8	0.2	0.2	30.1	69.7		100	99.8	100	231.3
Roman Catholic	99.2	0.8	2.0	24.0	73.4	0.5	100	99.2	100	163.7
Free Church of Tonga	99.4	0.6	1.0	24.4	74.6		100	99.4	100	142.2
Other religion	100.0		2.2	27.7	70.1		100	100.0	100	259.8
Wealth quintile
Poorest	99.5	0.5	1.3	35.9	62.7	0.1	100	99.5	100	270.6
Second	99.4	0.6	3.3	29.3	67.1	0.3	100	99.4	100	241.1
Middle	99.6	0.4	0.6	24.8	74.6		100	99.6	100	239.5
Fourth	100.0			24.9	75.1		100	100.0	100	242.4
Richest	100.0			13.7	86.3		100	100.0	100	238.4
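
The percentages in the table are weighted row percentages: within each row category, the weights are summed per column category and divided by the row total. The following base-R sketch reproduces that kind of calculation on made-up data. It is not a reimplementation of 'expss'; the variable names are borrowed from the code above and the values are hypothetical.

```r
# Hypothetical toy data: area, literacy code and sample weight.
toy <- data.frame(
  HH6      = c(1, 1, 2, 2, 2),
  literate = c(1, 2, 1, 1, 1),
  mnweight = c(1.0, 1.0, 0.5, 1.5, 2.0)
)

# Sum of weights in each HH6 x literate cell.
weighted_counts <- xtabs(mnweight ~ HH6 + literate, data = toy)

# Divide each cell by its row total and express it as a percentage.
row_pct <- prop.table(weighted_counts, margin = 1) * 100
round(row_pct, 1)
```

Each row of row_pct sums to 100, which corresponds to the "Total" column of the table.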

Filip Mitrovic
