Importing Data from SPSS

The Haven Package

The haven package allows us to import many file types, including .sav. First, you need to install the haven package:

install.packages("haven")

From here, we can use the haven package to import the .sav file:

dat_haven <- haven::read_sav(file.choose())
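
Because file.choose() opens an interactive file picker, a script that uses it cannot be rerun unattended. A minimal sketch of the alternative is shown here, assuming a hypothetical file at data/survey.sav (the folder and file name are placeholders, not part of the original example):

```r
# Hypothetical location of the SPSS file; replace with your own path.
sav_path <- file.path("data", "survey.sav")

# Only attempt the import when the file is actually there.
if (file.exists(sav_path)) {
  dat_haven <- haven::read_sav(sav_path)
} else {
  message("No file found at ", sav_path)
}
```

Storing the path in a variable also makes it easy to reuse the same file with the other import functions covered later.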

Now, we can use the head function to examine our data:

head(dat_haven)
## # A tibble: 6 x 13
##      ID gender   age class   agen_1   agen_2   agen_3 achiev_1 achiev_2 achiev_3
##   <dbl> <chr>  <dbl> <chr> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb>
## 1  1028 female    14 A     2 [Disa~ 2 [Disa~ 3 [Neit~ 3 [Neit~ 4 [Agre~ 3 [Neit~
## 2  1049 male      13 A     3 [Neit~ 3 [Neit~ 2 [Disa~ 2 [Disa~ 3 [Neit~ 3 [Neit~
## 3  1052 female    14 B     3 [Neit~ 3 [Neit~ 3 [Neit~ 2 [Disa~ 2 [Disa~ 3 [Neit~
## 4  1063 male      14 A     3 [Neit~ 3 [Neit~ 3 [Neit~ 2 [Disa~ 2 [Disa~ 3 [Neit~
## 5  1119 male      13 A     3 [Neit~ 3 [Neit~ 3 [Neit~ 3 [Neit~ 4 [Agre~ 4 [Agre~
## 6  1142 female    13 A     3 [Neit~ 3 [Neit~ 3 [Neit~ 3 [Neit~ 4 [Agre~ 3 [Neit~
## # ... with 3 more variables: motiv_1 <dbl+lbl>, motiv_2 <dbl+lbl>,
## #   motiv_3 <dbl+lbl>

Looking at the output, we can see that our data was read in as a tibble (a type of data frame used in the tidyverse), and includes label information that was part of the original .sav file.
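
The labelled columns (shown as <dbl+lbl> above) can be turned into ordinary R vectors if the labels get in the way. The following is a small sketch rather than part of the original workflow: it builds a toy labelled vector using the three value labels visible in the output, then converts it with haven::as_factor() and haven::zap_labels(). The vector agen_demo is made up for illustration.

```r
# Toy labelled vector mimicking one scale item (values and labels taken
# from the printed output; the real scale may have more categories).
agen_demo <- haven::labelled(
  c(2, 3, 3, 4),
  labels = c("Disagree" = 2, "Neither Agree or Disagree" = 3, "Agree" = 4)
)

# Option 1: replace the codes with their labels, as a factor.
agen_factor <- haven::as_factor(agen_demo)

# Option 2: drop the labels and keep the plain numeric codes.
agen_codes <- haven::zap_labels(agen_demo)

print(agen_factor)
print(agen_codes)
```

Both functions also accept a whole data frame, so something like haven::zap_labels(dat_haven) should strip the labels from every column at once.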

The Foreign Package

Another package that can be used to import a .sav file is the foreign package. This package comes pre-installed with R as part of the default package list, so we do not need to install it.

By default, foreign reads the data as a list, but we can include the argument to.data.frame = TRUE to have foreign convert the data into a data frame.

dat_foreign <- foreign::read.spss(file.choose(),
                                  to.data.frame = TRUE)
## re-encoding from UTF-8

Now, we can use the head function again to examine our foreign data:

head(dat_foreign)
##     ID gender age class                    agen_1                    agen_2
## 1 1028 female  14     A                  Disagree                  Disagree
## 2 1049 male    13     A Neither Agree or Disagree Neither Agree or Disagree
## 3 1052 female  14     B Neither Agree or Disagree Neither Agree or Disagree
## 4 1063 male    14     A Neither Agree or Disagree Neither Agree or Disagree
## 5 1119 male    13     A Neither Agree or Disagree Neither Agree or Disagree
## 6 1142 female  13     A Neither Agree or Disagree Neither Agree or Disagree
##                      agen_3                  achiev_1                  achiev_2
## 1 Neither Agree or Disagree Neither Agree or Disagree                     Agree
## 2                  Disagree                  Disagree Neither Agree or Disagree
## 3 Neither Agree or Disagree                  Disagree                  Disagree
## 4 Neither Agree or Disagree                  Disagree                  Disagree
## 5 Neither Agree or Disagree Neither Agree or Disagree                     Agree
## 6 Neither Agree or Disagree Neither Agree or Disagree                     Agree
##                    achiev_3                   motiv_1                   motiv_2
## 1 Neither Agree or Disagree Neither Agree or Disagree                  Disagree
## 2 Neither Agree or Disagree Neither Agree or Disagree                     Agree
## 3 Neither Agree or Disagree                  Disagree Neither Agree or Disagree
## 4 Neither Agree or Disagree Neither Agree or Disagree Neither Agree or Disagree
## 5                     Agree                  Disagree Neither Agree or Disagree
## 6 Neither Agree or Disagree                  Disagree                     Agree
##                     motiv_3
## 1 Neither Agree or Disagree
## 2                     Agree
## 3 Neither Agree or Disagree
## 4                  Disagree
## 5                  Disagree
## 6                     Agree

Looking at this output, we can see that our data was imported as a data frame, without some of the label information that was part of the original .sav file. We also see that our scale variables were read in as factors, with the value labels used as the factor levels.
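
If numeric codes are needed later, note that as.integer() on a factor returns level positions, which need not match the original SPSS codes. The sketch below illustrates this with a toy factor; the lookup vector label_to_code is an assumption built from the three label/code pairs visible in the haven output above, and the real scale may contain more categories.

```r
# Assumed mapping from value label to SPSS code (only the pairs seen above).
label_to_code <- c("Disagree" = 2,
                   "Neither Agree or Disagree" = 3,
                   "Agree" = 4)

# Toy factor shaped like a column returned by foreign::read.spss().
agen_1 <- factor(
  c("Disagree", "Neither Agree or Disagree", "Neither Agree or Disagree"),
  levels = names(label_to_code)
)

# Level positions, not the SPSS codes.
level_positions <- as.integer(agen_1)

# Look up the code for each label to recover the original values.
agen_1_num <- unname(label_to_code[as.character(agen_1)])

print(level_positions)
print(agen_1_num)
```

Alternatively, foreign::read.spss() accepts use.value.labels = FALSE, which keeps the numeric codes at import time instead of creating factors.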

Save .csv from SPSS and read in .csv file

The final option we’ll cover involves exporting a .csv file from SPSS and importing that .csv file directly into R. This option is only available if you have access to SPSS, of course. Once you have saved a .csv file, you can import it using the base R function read.csv().

dat_base <- read.csv(file.choose())

Now, we will once again use head() to examine our data:

head(dat_base)
##   ï..ID gender age class agen_1 agen_2 agen_3 achiev_1 achiev_2 achiev_3
## 1  1028 female  14     A      2      2      3        3        4        3
## 2  1049   male  13     A      3      3      2        2        3        3
## 3  1052 female  14     B      3      3      3        2        2        3
## 4  1063   male  14     A      3      3      3        2        2        3
## 5  1119   male  13     A      3      3      3        3        4        4
## 6  1142 female  13     A      3      3      3        3        4        3
##   motiv_1 motiv_2 motiv_3
## 1       3       2       3
## 2       3       4       4
## 3       2       3       3
## 4       3       3       2
## 5       2       3       2
## 6       2       4       4

We can see from the output that the data was read in as a data frame, and all of the original label information has been stripped. R assigns a class to each column based on the kind of data it contains, and we can see that our scale items are now read in as integers instead of factors (which is preferable for SEM). We also see that the first variable name has an artifact at the start, which is common for .csv files created by other programs; it is usually a byte order mark (BOM) written at the start of the file. We can fix this easily by renaming the first variable back to “ID”.

names(dat_base)
##  [1] "ï..ID"    "gender"   "age"      "class"    "agen_1"   "agen_2"  
##  [7] "agen_3"   "achiev_1" "achiev_2" "achiev_3" "motiv_1"  "motiv_2" 
## [13] "motiv_3"
names(dat_base)[1] <- "ID"
names(dat_base)
##  [1] "ID"       "gender"   "age"      "class"    "agen_1"   "agen_2"  
##  [7] "agen_3"   "achiev_1" "achiev_2" "achiev_3" "motiv_1"  "motiv_2" 
## [13] "motiv_3"
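
Another way to handle the artifact is to tell read.csv() that the file starts with a byte order mark, so the name never gets mangled. The sketch below is an illustration only: it writes a small temporary .csv with a BOM (the rows are copied from the output above) and reads it back with fileEncoding = "UTF-8-BOM".

```r
# Write a tiny CSV that starts with a UTF-8 byte order mark (EF BB BF).
csv_path <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
csv_text <- "ID,gender,age\n1028,female,14\n1049,male,13\n"
writeBin(c(bom, charToRaw(csv_text)), csv_path)

# "UTF-8-BOM" tells R to drop the byte order mark while reading.
dat_bom <- read.csv(csv_path, fileEncoding = "UTF-8-BOM")

print(names(dat_bom))
```

With a real export, the same argument would be added to the original call, for example read.csv(file.choose(), fileEncoding = "UTF-8-BOM").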

If you have the option, I recommend saving a .csv from the program your data is saved in and using the read.csv() function to import the .csv into R. This prevents any of the extra label information from causing issues in analysis down the line.

Exporting data from R for Mplus

If you have been cleaning and working with your data in R, but want to bring your data into Mplus for modeling, there is a simple way to do so using the MplusAutomation package. We’ll have to install this package first:

install.packages("MplusAutomation")

Next, we use this package to automatically convert our data into a .dat file ready for Mplus, as well as create a starter .inp file that includes our variable names!

MplusAutomation::prepareMplusData(dat_base, filename = "AAM_Mplus.dat")
## 
## -----
## Factor: gender 
## Conversion:
##   level number
##  female      1
##    male      2
## -----
## 
## 
## -----
## Factor: class 
## Conversion:
##  level number
##      A      1
##      B      2
## -----
## 
## TITLE: Your title goes here
## DATA: FILE = "AAM_Mplus.dat";
## VARIABLE: 
## NAMES = ID gender age class agen_1 agen_2 agen_3 achiev_1 achiev_2 achiev_3 motiv_1
##      motiv_2 motiv_3; 
## MISSING=.;

This converts our gender and class variables into numeric values, and prints a starter .inp script to the R console. If you want MplusAutomation to create the .inp file in your working directory, you can use the following argument:

MplusAutomation::prepareMplusData(dat_base, filename = "AAM_Mplus2.dat", inpfile = TRUE)
## 
## -----
## Factor: gender 
## Conversion:
##   level number
##  female      1
##    male      2
## -----
## 
## 
## -----
## Factor: class 
## Conversion:
##  level number
##      A      1
##      B      2
## -----
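
The conversion tables printed above show how each factor level was mapped to a number. If you would rather control that coding yourself, one option is to recode the variables before exporting. This is a minimal sketch using a toy data frame (dat_demo is made up for illustration), with the level order chosen to match the tables above:

```r
# Toy data frame standing in for the grouping columns of dat_base.
dat_demo <- data.frame(
  gender = c("female", "male", "female"),
  class  = c("A", "B", "A"),
  stringsAsFactors = FALSE
)

# Set the level order explicitly, then take the level positions as codes:
# female = 1, male = 2; A = 1, B = 2.
dat_demo$gender <- as.integer(factor(dat_demo$gender, levels = c("female", "male")))
dat_demo$class  <- as.integer(factor(dat_demo$class, levels = c("A", "B")))

print(dat_demo)
```

Recoding this way keeps a record of the coding in your script, which helps when interpreting group variables in the Mplus output.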

Now you’re all set up to use your data to model in Mplus!