To import an SPSS .sav data file into R, follow the steps below.

Step 1: Load the haven package with library(haven), then call read_sav() to read the SPSS .sav file into R as a tibble.

You can download sample .sav files here and try it yourself.

http://spss.allenandunwin.com.s3-website-ap-southeast-2.amazonaws.com/data-files.html

This example imports the file experim.sav.
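
A wrong working directory is a common reason for read_sav() to fail, so it can help to check the path first. The sketch below is only an illustration: `check_sav_path()` is a hypothetical helper, not part of haven, and the demonstration uses an empty temporary file instead of a real .sav file.

```r
# Hypothetical helper (not part of haven): fail early with a clear message
# when the .sav file is not where R is looking.
check_sav_path <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path, " (working directory: ", getwd(), ")")
  }
  normalizePath(path)
}

# Demonstration with an empty temporary file standing in for a real .sav file
demo_path <- tempfile(fileext = ".sav")
file.create(demo_path)
checked <- check_sav_path(demo_path)
```

With the real data you would pass the checked path on, for example `read_sav(check_sav_path("experim.sav"))`.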

options(digits = 2)
options(width = 200)
library(haven)
data_spss <- read_sav("experim.sav") ## the SPSS file needs to be in the project directory
sapply(data_spss, class)
## $id
## [1] "numeric"
## 
## $sex
## [1] "haven_labelled" "vctrs_vctr"     "double"        
## 
## $age
## [1] "numeric"
## 
## $group
## [1] "haven_labelled" "vctrs_vctr"     "double"        
## 
## $fost1
## [1] "numeric"
## 
## $confid1
## [1] "numeric"
## 
## $depress1
## [1] "numeric"
## 
## $fost2
## [1] "numeric"
## 
## $confid2
## [1] "numeric"
## 
## $depress2
## [1] "numeric"
## 
## $fost3
## [1] "numeric"
## 
## $confid3
## [1] "numeric"
## 
## $depress3
## [1] "numeric"
## 
## $exam
## [1] "numeric"
## 
## $mah_1
## [1] "numeric"
## 
## $DepT1gp2
## [1] "haven_labelled" "vctrs_vctr"     "double"        
## 
## $DepT2Gp2
## [1] "haven_labelled" "vctrs_vctr"     "double"        
## 
## $DepT3gp2
## [1] "haven_labelled" "vctrs_vctr"     "double"
print(data_spss, n = 30) # view the imported SPSS data stored in the object `data_spss`
## # A tibble: 30 × 18
##       id sex          age group                   fost1 confid1 depress1 fost2 confid2 depress2 fost3 confid3 depress3  exam  mah_1 DepT1gp2          DepT2Gp2          DepT3gp2         
##    <dbl> <dbl+lbl>  <dbl> <dbl+lbl>               <dbl>   <dbl>    <dbl> <dbl>   <dbl>    <dbl> <dbl>   <dbl>    <dbl> <dbl>  <dbl> <dbl+lbl>         <dbl+lbl>         <dbl+lbl>        
##  1     4 1 [male]      23 2 [confidence building]    50      15       44    48      16       44    45      14       40    52  0.570 0 [not depressed] 0 [not depressed] 0 [not depressed]
##  2    10 1 [male]      21 2 [confidence building]    47      14       42    45      15       42    44      18       40    55  1.66  0 [not depressed] 0 [not depressed] 0 [not depressed]
##  3     9 1 [male]      25 1 [maths skills]           44      12       40    39      18       40    36      19       38    58  3.54  0 [not depressed] 0 [not depressed] 0 [not depressed]
##  4     3 1 [male]      30 1 [maths skills]           47      11       43    42      16       43    41      20       43    60  2.45  0 [not depressed] 0 [not depressed] 0 [not depressed]
##  5    12 1 [male]      45 2 [confidence building]    46      16       44    45      16       45    43      20       43    58  0.944 0 [not depressed] 1 [depressed]     0 [not depressed]
##  6    11 1 [male]      22 1 [maths skills]           39      13       43    40      20       42    39      22       38    62  1.63  0 [not depressed] 0 [not depressed] 0 [not depressed]
##  7     6 1 [male]      22 2 [confidence building]    32      21       37    33      22       36    32      23       35    59  4.17  0 [not depressed] 0 [not depressed] 0 [not depressed]
##  8     5 1 [male]      26 1 [maths skills]           44      17       46    37      20       47    32      26       42    70  1.03  1 [depressed]     1 [depressed]     0 [not depressed]
##  9     8 1 [male]      23 2 [confidence building]    40      22       37    40      23       37    40      26       35    60  1.71  0 [not depressed] 0 [not depressed] 0 [not depressed]
## 10    13 1 [male]      21 1 [maths skills]           47      20       50    45      25       48    46      27       46    70  3.09  1 [depressed]     1 [depressed]     1 [depressed]    
## 11    14 1 [male]      23 2 [confidence building]    38      28       39    37      27       36    32      29       34    72  2.91  0 [not depressed] 0 [not depressed] 0 [not depressed]
## 12     1 1 [male]      19 1 [maths skills]           32      20       44    28      25       43    23      30       40    82  0.347 0 [not depressed] 0 [not depressed] 0 [not depressed]
## 13    15 1 [male]      23 1 [maths skills]           39      21       47    35      26       47    35      30       47    79  1.59  1 [depressed]     1 [depressed]     1 [depressed]    
## 14     7 1 [male]      19 1 [maths skills]           36      24       38    32      28       35    30      32       35    80  1.51  0 [not depressed] 0 [not depressed] 0 [not depressed]
## 15     2 1 [male]      21 2 [confidence building]    37      29       50    36      30       47    34      34       45    90 10.2   1 [depressed]     1 [depressed]     1 [depressed]    
## 16    27 2 [female]    20 1 [maths skills]           41      16       45    40      14       44    38      18       40    56  1.18  1 [depressed]     0 [not depressed] 0 [not depressed]
## 17    25 2 [female]    24 1 [maths skills]           38      14       42    37      14       40    35      19       39    53  1.06  0 [not depressed] 0 [not depressed] 0 [not depressed]
## 18    19 2 [female]    27 1 [maths skills]           42      15       49    41      13       49    40      20       44    59  3.87  1 [depressed]     1 [depressed]     0 [not depressed]
## 19    18 2 [female]    23 2 [confidence building]    44      13       39    39      20       30    34      22       30    64  2.71  0 [not depressed] 0 [not depressed] 0 [not depressed]
## 20    23 2 [female]    22 1 [maths skills]           32      22       39    31      18       38    32      22       36    63  3.55  0 [not depressed] 0 [not depressed] 0 [not depressed]
## 21    21 2 [female]    46 1 [maths skills]           39      21       44    40      19       44    38      23       44    64  0.501 0 [not depressed] 0 [not depressed] 0 [not depressed]
## 22    26 2 [female]    19 2 [confidence building]    42      13       43    38      20       39    36      23       37    63  1.47  0 [not depressed] 0 [not depressed] 0 [not depressed]
## 23    29 2 [female]    22 1 [maths skills]           37      28       33    38      22       33    36      26       32    67  9.13  0 [not depressed] 0 [not depressed] 0 [not depressed]
## 24    17 2 [female]    37 1 [maths skills]           41      29       39    40      22       40    40      27       40    71  6.21  0 [not depressed] 0 [not depressed] 0 [not depressed]
## 25    20 2 [female]    32 2 [confidence building]    43      17       47    36      26       45    34      28       42    73  1.72  1 [depressed]     1 [depressed]     0 [not depressed]
## 26    28 2 [female]    30 2 [confidence building]    46      20       38    40      28       30    37      29       29    80  1.50  0 [not depressed] 0 [not depressed] 0 [not depressed]
## 27    22 2 [female]    25 2 [confidence building]    30      24       45    28      28       40    25      30       38    83  1.92  1 [depressed]     0 [not depressed] 0 [not depressed]
## 28    24 2 [female]    21 2 [confidence building]    33      12       50    29      20       48    25      30       50    85  7.56  1 [depressed]     1 [depressed]     1 [depressed]    
## 29    16 2 [female]    45 2 [confidence building]    40      22       45    30      35       40    25      32       42    78  1.19  1 [depressed]     0 [not depressed] 0 [not depressed]
## 30    30 2 [female]    21 2 [confidence building]    39      21       34    36      30       30    30      32       32    84  6.05  0 [not depressed] 0 [not depressed] 0 [not depressed]

Step 2: Convert the tibble to a data.frame

Reason: SPSS columns that carry value labels (these are the categorical variables) look cluttered when they are printed as a tibble in R. Converting to a data.frame gives a more compact display.

After that, if needed, we convert the labelled columns to factors.

data_ok <- as.data.frame(data_spss)
sapply(data_ok, class) ## note: even after the conversion to a data frame, the column classes in `data_ok` are exactly the same as in `data_spss` (the label information is still there)
## $id
## [1] "numeric"
## 
## $sex
## [1] "haven_labelled" "vctrs_vctr"     "double"        
## 
## $age
## [1] "numeric"
## 
## $group
## [1] "haven_labelled" "vctrs_vctr"     "double"        
## 
## $fost1
## [1] "numeric"
## 
## $confid1
## [1] "numeric"
## 
## $depress1
## [1] "numeric"
## 
## $fost2
## [1] "numeric"
## 
## $confid2
## [1] "numeric"
## 
## $depress2
## [1] "numeric"
## 
## $fost3
## [1] "numeric"
## 
## $confid3
## [1] "numeric"
## 
## $depress3
## [1] "numeric"
## 
## $exam
## [1] "numeric"
## 
## $mah_1
## [1] "numeric"
## 
## $DepT1gp2
## [1] "haven_labelled" "vctrs_vctr"     "double"        
## 
## $DepT2Gp2
## [1] "haven_labelled" "vctrs_vctr"     "double"        
## 
## $DepT3gp2
## [1] "haven_labelled" "vctrs_vctr"     "double"
print(data_ok)
##    id sex age group fost1 confid1 depress1 fost2 confid2 depress2 fost3 confid3 depress3 exam mah_1 DepT1gp2 DepT2Gp2 DepT3gp2
## 1   4   1  23     2    50      15       44    48      16       44    45      14       40   52  0.57        0        0        0
## 2  10   1  21     2    47      14       42    45      15       42    44      18       40   55  1.66        0        0        0
## 3   9   1  25     1    44      12       40    39      18       40    36      19       38   58  3.54        0        0        0
## 4   3   1  30     1    47      11       43    42      16       43    41      20       43   60  2.45        0        0        0
## 5  12   1  45     2    46      16       44    45      16       45    43      20       43   58  0.94        0        1        0
## 6  11   1  22     1    39      13       43    40      20       42    39      22       38   62  1.63        0        0        0
## 7   6   1  22     2    32      21       37    33      22       36    32      23       35   59  4.17        0        0        0
## 8   5   1  26     1    44      17       46    37      20       47    32      26       42   70  1.03        1        1        0
## 9   8   1  23     2    40      22       37    40      23       37    40      26       35   60  1.71        0        0        0
## 10 13   1  21     1    47      20       50    45      25       48    46      27       46   70  3.09        1        1        1
## 11 14   1  23     2    38      28       39    37      27       36    32      29       34   72  2.91        0        0        0
## 12  1   1  19     1    32      20       44    28      25       43    23      30       40   82  0.35        0        0        0
## 13 15   1  23     1    39      21       47    35      26       47    35      30       47   79  1.59        1        1        1
## 14  7   1  19     1    36      24       38    32      28       35    30      32       35   80  1.51        0        0        0
## 15  2   1  21     2    37      29       50    36      30       47    34      34       45   90 10.24        1        1        1
## 16 27   2  20     1    41      16       45    40      14       44    38      18       40   56  1.18        1        0        0
## 17 25   2  24     1    38      14       42    37      14       40    35      19       39   53  1.06        0        0        0
## 18 19   2  27     1    42      15       49    41      13       49    40      20       44   59  3.87        1        1        0
## 19 18   2  23     2    44      13       39    39      20       30    34      22       30   64  2.71        0        0        0
## 20 23   2  22     1    32      22       39    31      18       38    32      22       36   63  3.55        0        0        0
## 21 21   2  46     1    39      21       44    40      19       44    38      23       44   64  0.50        0        0        0
## 22 26   2  19     2    42      13       43    38      20       39    36      23       37   63  1.47        0        0        0
## 23 29   2  22     1    37      28       33    38      22       33    36      26       32   67  9.13        0        0        0
## 24 17   2  37     1    41      29       39    40      22       40    40      27       40   71  6.21        0        0        0
## 25 20   2  32     2    43      17       47    36      26       45    34      28       42   73  1.72        1        1        0
## 26 28   2  30     2    46      20       38    40      28       30    37      29       29   80  1.50        0        0        0
## 27 22   2  25     2    30      24       45    28      28       40    25      30       38   83  1.92        1        0        0
## 28 24   2  21     2    33      12       50    29      20       48    25      30       50   85  7.56        1        1        1
## 29 16   2  45     2    40      22       45    30      35       40    25      32       42   78  1.19        1        0        0
## 30 30   2  21     2    39      21       34    36      30       30    30      32       32   84  6.05        0        0        0

Compare the two printouts: the data.frame looks more compact than the tibble.

Next we do a little processing to turn the labelled values into factors.

Step 3: Convert the categorical columns to factors

## find the columns that have labels
class_ok <- sapply(data_ok, class)
names(grep(pattern = "labelled", class_ok, value = TRUE))
## [1] "sex"      "group"    "DepT1gp2" "DepT2Gp2" "DepT3gp2"
## extract the label information from the `sex` column
names(attributes(data_ok$sex)$labels)
## [1] "male"   "female"
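## assign those labels to the `sex` column and convert it to a factor
## (labels are matched by position to the sorted unique values, so every labelled value must occur in the column)
data_ok$sex <- factor(data_ok$sex, labels = names(attributes(data_ok$sex)$labels))

## likewise for the other labelled columns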
data_ok$group <- factor(data_ok$group, labels = names(attributes(data_ok$group)$labels))
data_ok$DepT1gp2 <- factor(data_ok$DepT1gp2, labels = names(attributes(data_ok$DepT1gp2)$labels))
data_ok$DepT2Gp2 <- factor(data_ok$DepT2Gp2, labels = names(attributes(data_ok$DepT2Gp2)$labels))
data_ok$DepT3gp2 <- factor(data_ok$DepT3gp2, labels = names(attributes(data_ok$DepT3gp2)$labels))
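
The five near-identical lines above can also be handled in one call. haven provides as_factor(), which converts the labelled columns of a data frame to factors and matches each label to its value instead of relying on the sorted order of the data. The sketch below assumes haven is installed and uses a small toy data frame built with labelled(); `toy` and `toy_fct` are illustrative names, not the experim.sav data.

```r
library(haven)

# Toy data standing in for the imported file (illustrative values only)
toy <- data.frame(id = c(4, 10, 27))
toy$sex <- labelled(c(1, 1, 2), labels = c(male = 1, female = 2))
toy$group <- labelled(c(2, 2, 1),
                      labels = c("maths skills" = 1, "confidence building" = 2))

# Convert every labelled column to a factor; other columns are left alone
toy_fct <- as_factor(toy)

sapply(toy_fct, class)
```

On the real data the equivalent call would be `as_factor(data_ok)`, applied before any column has been converted by hand.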

## the finished dataset
print(data_ok) 
##    id    sex age               group fost1 confid1 depress1 fost2 confid2 depress2 fost3 confid3 depress3 exam mah_1      DepT1gp2      DepT2Gp2      DepT3gp2
## 1   4   male  23 confidence building    50      15       44    48      16       44    45      14       40   52  0.57 not depressed not depressed not depressed
## 2  10   male  21 confidence building    47      14       42    45      15       42    44      18       40   55  1.66 not depressed not depressed not depressed
## 3   9   male  25        maths skills    44      12       40    39      18       40    36      19       38   58  3.54 not depressed not depressed not depressed
## 4   3   male  30        maths skills    47      11       43    42      16       43    41      20       43   60  2.45 not depressed not depressed not depressed
## 5  12   male  45 confidence building    46      16       44    45      16       45    43      20       43   58  0.94 not depressed     depressed not depressed
## 6  11   male  22        maths skills    39      13       43    40      20       42    39      22       38   62  1.63 not depressed not depressed not depressed
## 7   6   male  22 confidence building    32      21       37    33      22       36    32      23       35   59  4.17 not depressed not depressed not depressed
## 8   5   male  26        maths skills    44      17       46    37      20       47    32      26       42   70  1.03     depressed     depressed not depressed
## 9   8   male  23 confidence building    40      22       37    40      23       37    40      26       35   60  1.71 not depressed not depressed not depressed
## 10 13   male  21        maths skills    47      20       50    45      25       48    46      27       46   70  3.09     depressed     depressed     depressed
## 11 14   male  23 confidence building    38      28       39    37      27       36    32      29       34   72  2.91 not depressed not depressed not depressed
## 12  1   male  19        maths skills    32      20       44    28      25       43    23      30       40   82  0.35 not depressed not depressed not depressed
## 13 15   male  23        maths skills    39      21       47    35      26       47    35      30       47   79  1.59     depressed     depressed     depressed
## 14  7   male  19        maths skills    36      24       38    32      28       35    30      32       35   80  1.51 not depressed not depressed not depressed
## 15  2   male  21 confidence building    37      29       50    36      30       47    34      34       45   90 10.24     depressed     depressed     depressed
## 16 27 female  20        maths skills    41      16       45    40      14       44    38      18       40   56  1.18     depressed not depressed not depressed
## 17 25 female  24        maths skills    38      14       42    37      14       40    35      19       39   53  1.06 not depressed not depressed not depressed
## 18 19 female  27        maths skills    42      15       49    41      13       49    40      20       44   59  3.87     depressed     depressed not depressed
## 19 18 female  23 confidence building    44      13       39    39      20       30    34      22       30   64  2.71 not depressed not depressed not depressed
## 20 23 female  22        maths skills    32      22       39    31      18       38    32      22       36   63  3.55 not depressed not depressed not depressed
## 21 21 female  46        maths skills    39      21       44    40      19       44    38      23       44   64  0.50 not depressed not depressed not depressed
## 22 26 female  19 confidence building    42      13       43    38      20       39    36      23       37   63  1.47 not depressed not depressed not depressed
## 23 29 female  22        maths skills    37      28       33    38      22       33    36      26       32   67  9.13 not depressed not depressed not depressed
## 24 17 female  37        maths skills    41      29       39    40      22       40    40      27       40   71  6.21 not depressed not depressed not depressed
## 25 20 female  32 confidence building    43      17       47    36      26       45    34      28       42   73  1.72     depressed     depressed not depressed
## 26 28 female  30 confidence building    46      20       38    40      28       30    37      29       29   80  1.50 not depressed not depressed not depressed
## 27 22 female  25 confidence building    30      24       45    28      28       40    25      30       38   83  1.92     depressed not depressed not depressed
## 28 24 female  21 confidence building    33      12       50    29      20       48    25      30       50   85  7.56     depressed     depressed     depressed
## 29 16 female  45 confidence building    40      22       45    30      35       40    25      32       42   78  1.19     depressed not depressed not depressed
## 30 30 female  21 confidence building    39      21       34    36      30       30    30      32       32   84  6.05 not depressed not depressed not depressed
## check dataset
sapply(data_ok, class)
##        id       sex       age     group     fost1   confid1  depress1     fost2   confid2  depress2     fost3   confid3  depress3      exam     mah_1  DepT1gp2  DepT2Gp2  DepT3gp2 
## "numeric"  "factor" "numeric"  "factor" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"  "factor"  "factor"  "factor"
summary(data_ok)
##        id           sex          age                     group        fost1       confid1        depress1      fost2       confid2      depress2      fost3       confid3      depress3       exam   
##  Min.   : 1.0   male  :15   Min.   :19   maths skills       :15   Min.   :30   Min.   :11.0   Min.   :33   Min.   :28   Min.   :13   Min.   :30   Min.   :23   Min.   :14   Min.   :29   Min.   :52  
##  1st Qu.: 8.2   female:15   1st Qu.:21   confidence building:15   1st Qu.:37   1st Qu.:14.2   1st Qu.:39   1st Qu.:35   1st Qu.:18   1st Qu.:37   1st Qu.:32   1st Qu.:20   1st Qu.:35   1st Qu.:59  
##  Median :15.5               Median :23                            Median :40   Median :20.0   Median :43   Median :38   Median :21   Median :41   Median :36   Median :26   Median :40   Median :66  
##  Mean   :15.5               Mean   :26                            Mean   :40   Mean   :19.0   Mean   :43   Mean   :38   Mean   :22   Mean   :41   Mean   :35   Mean   :25   Mean   :39   Mean   :68  
##  3rd Qu.:22.8               3rd Qu.:27                            3rd Qu.:44   3rd Qu.:22.0   3rd Qu.:45   3rd Qu.:40   3rd Qu.:26   3rd Qu.:45   3rd Qu.:40   3rd Qu.:30   3rd Qu.:43   3rd Qu.:79  
##  Max.   :30.0               Max.   :46                            Max.   :50   Max.   :29.0   Max.   :50   Max.   :48   Max.   :35   Max.   :49   Max.   :46   Max.   :34   Max.   :50   Max.   :90  
##      mah_1               DepT1gp2           DepT2Gp2           DepT3gp2 
##  Min.   : 0.3   not depressed:20   not depressed:22   not depressed:26  
##  1st Qu.: 1.3   depressed    :10   depressed    : 8   depressed    : 4  
##  Median : 1.7                                                           
##  Mean   : 2.9                                                           
##  3rd Qu.: 3.5                                                           
##  Max.   :10.2
str(data_ok)
## 'data.frame':    30 obs. of  18 variables:
##  $ id      : num  4 10 9 3 12 11 6 5 8 13 ...
##   ..- attr(*, "format.spss")= chr "F8.0"
##  $ sex     : Factor w/ 2 levels "male","female": 1 1 1 1 1 1 1 1 1 1 ...
##  $ age     : num  23 21 25 30 45 22 22 26 23 21 ...
##   ..- attr(*, "format.spss")= chr "F8.0"
##  $ group   : Factor w/ 2 levels "maths skills",..: 2 2 1 1 2 1 2 1 2 1 ...
##  $ fost1   : num  50 47 44 47 46 39 32 44 40 47 ...
##   ..- attr(*, "label")= chr "fear of stats time1"
##   ..- attr(*, "format.spss")= chr "F8.0"
##  $ confid1 : num  15 14 12 11 16 13 21 17 22 20 ...
##   ..- attr(*, "label")= chr "confidence time1"
##   ..- attr(*, "format.spss")= chr "F8.0"
##  $ depress1: num  44 42 40 43 44 43 37 46 37 50 ...
##   ..- attr(*, "label")= chr "depression time1"
##   ..- attr(*, "format.spss")= chr "F8.0"
##  $ fost2   : num  48 45 39 42 45 40 33 37 40 45 ...
##   ..- attr(*, "label")= chr "fear of stats time2"
##   ..- attr(*, "format.spss")= chr "F8.0"
##  $ confid2 : num  16 15 18 16 16 20 22 20 23 25 ...
##   ..- attr(*, "label")= chr "confidence time2"
##   ..- attr(*, "format.spss")= chr "F8.0"
##  $ depress2: num  44 42 40 43 45 42 36 47 37 48 ...
##   ..- attr(*, "label")= chr "depression time2"
##   ..- attr(*, "format.spss")= chr "F8.0"
##  $ fost3   : num  45 44 36 41 43 39 32 32 40 46 ...
##   ..- attr(*, "label")= chr "fear of stats time3"
##   ..- attr(*, "format.spss")= chr "F8.0"
##  $ confid3 : num  14 18 19 20 20 22 23 26 26 27 ...
##   ..- attr(*, "label")= chr "confidence time3"
##   ..- attr(*, "format.spss")= chr "F8.0"
##  $ depress3: num  40 40 38 43 43 38 35 42 35 46 ...
##   ..- attr(*, "label")= chr "depression time3"
##   ..- attr(*, "format.spss")= chr "F8.0"
##  $ exam    : num  52 55 58 60 58 62 59 70 60 70 ...
##   ..- attr(*, "label")= chr "stats exam"
##   ..- attr(*, "format.spss")= chr "F8.0"
##  $ mah_1   : num  0.57 1.659 3.54 2.454 0.944 ...
##   ..- attr(*, "label")= chr "Mahalanobis Distance"
##   ..- attr(*, "format.spss")= chr "F11.5"
##  $ DepT1gp2: Factor w/ 2 levels "not depressed",..: 1 1 1 1 1 1 1 2 1 2 ...
##  $ DepT2Gp2: Factor w/ 2 levels "not depressed",..: 1 1 1 1 2 1 1 2 1 2 ...
##  $ DepT3gp2: Factor w/ 2 levels "not depressed",..: 1 1 1 1 1 1 1 1 1 2 ...

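Step 4: Clean the dataset completely

Strictly speaking, after step 3 you already have a complete data.frame. However, str() shows that some columns still carry SPSS-related attributes (format.spss and the variable label). To remove this information, assign NULL to the attributes of those columns. Do this only for the numeric columns: a factor stores its levels and class as attributes, so clearing them would break it.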

sapply(data_ok, attributes)
## $id
## $id$format.spss
## [1] "F8.0"
## 
## 
## $sex
## $sex$levels
## [1] "male"   "female"
## 
## $sex$class
## [1] "factor"
## 
## 
## $age
## $age$format.spss
## [1] "F8.0"
## 
## 
## $group
## $group$levels
## [1] "maths skills"        "confidence building"
## 
## $group$class
## [1] "factor"
## 
## 
## $fost1
## $fost1$label
## [1] "fear of stats time1"
## 
## $fost1$format.spss
## [1] "F8.0"
## 
## 
## $confid1
## $confid1$label
## [1] "confidence time1"
## 
## $confid1$format.spss
## [1] "F8.0"
## 
## 
## $depress1
## $depress1$label
## [1] "depression time1"
## 
## $depress1$format.spss
## [1] "F8.0"
## 
## 
## $fost2
## $fost2$label
## [1] "fear of stats time2"
## 
## $fost2$format.spss
## [1] "F8.0"
## 
## 
## $confid2
## $confid2$label
## [1] "confidence time2"
## 
## $confid2$format.spss
## [1] "F8.0"
## 
## 
## $depress2
## $depress2$label
## [1] "depression time2"
## 
## $depress2$format.spss
## [1] "F8.0"
## 
## 
## $fost3
## $fost3$label
## [1] "fear of stats time3"
## 
## $fost3$format.spss
## [1] "F8.0"
## 
## 
## $confid3
## $confid3$label
## [1] "confidence time3"
## 
## $confid3$format.spss
## [1] "F8.0"
## 
## 
## $depress3
## $depress3$label
## [1] "depression time3"
## 
## $depress3$format.spss
## [1] "F8.0"
## 
## 
## $exam
## $exam$label
## [1] "stats exam"
## 
## $exam$format.spss
## [1] "F8.0"
## 
## 
## $mah_1
## $mah_1$label
## [1] "Mahalanobis Distance"
## 
## $mah_1$format.spss
## [1] "F11.5"
## 
## 
## $DepT1gp2
## $DepT1gp2$levels
## [1] "not depressed" "depressed"    
## 
## $DepT1gp2$class
## [1] "factor"
## 
## 
## $DepT2Gp2
## $DepT2Gp2$levels
## [1] "not depressed" "depressed"    
## 
## $DepT2Gp2$class
## [1] "factor"
## 
## 
## $DepT3gp2
## $DepT3gp2$levels
## [1] "not depressed" "depressed"    
## 
## $DepT3gp2$class
## [1] "factor"
attributes(data_ok$id) <- NULL
attributes(data_ok$age) <- NULL
attributes(data_ok$fost1) <- NULL
attributes(data_ok$confid1) <- NULL
attributes(data_ok$depress1) <- NULL
attributes(data_ok$fost2) <- NULL
attributes(data_ok$confid2) <- NULL
attributes(data_ok$depress2) <- NULL
attributes(data_ok$fost3) <- NULL
attributes(data_ok$confid3) <- NULL
attributes(data_ok$depress3) <- NULL
attributes(data_ok$exam) <- NULL
attributes(data_ok$mah_1) <- NULL
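
The thirteen assignments above can be written as one loop over the numeric columns. This is a minimal base R sketch on a toy data frame; `toy` and `strip_attrs` are illustrative names, and the toy values only mimic the attributes seen in the str() output.

```r
# Toy data frame: one factor column and one numeric column carrying
# SPSS-style attributes like those shown by str()
toy <- data.frame(sex = factor(c("male", "female")))
toy$fost1 <- structure(c(50, 47), label = "fear of stats time1", format.spss = "F8.0")

# Illustrative helper: drop every attribute from a vector
strip_attrs <- function(x) {
  attributes(x) <- NULL
  x
}

# Apply it to the numeric columns only, so factors keep their levels and class
num_cols <- vapply(toy, is.numeric, logical(1))
toy[num_cols] <- lapply(toy[num_cols], strip_attrs)

str(toy)
```

If you prefer to stay within haven, its zap_formats() and zap_label() functions remove the format and variable-label attributes and also accept a whole data frame.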

## the finished dataset
print(data_ok) 
##    id    sex age               group fost1 confid1 depress1 fost2 confid2 depress2 fost3 confid3 depress3 exam mah_1      DepT1gp2      DepT2Gp2      DepT3gp2
## 1   4   male  23 confidence building    50      15       44    48      16       44    45      14       40   52  0.57 not depressed not depressed not depressed
## 2  10   male  21 confidence building    47      14       42    45      15       42    44      18       40   55  1.66 not depressed not depressed not depressed
## 3   9   male  25        maths skills    44      12       40    39      18       40    36      19       38   58  3.54 not depressed not depressed not depressed
## 4   3   male  30        maths skills    47      11       43    42      16       43    41      20       43   60  2.45 not depressed not depressed not depressed
## 5  12   male  45 confidence building    46      16       44    45      16       45    43      20       43   58  0.94 not depressed     depressed not depressed
## 6  11   male  22        maths skills    39      13       43    40      20       42    39      22       38   62  1.63 not depressed not depressed not depressed
## 7   6   male  22 confidence building    32      21       37    33      22       36    32      23       35   59  4.17 not depressed not depressed not depressed
## 8   5   male  26        maths skills    44      17       46    37      20       47    32      26       42   70  1.03     depressed     depressed not depressed
## 9   8   male  23 confidence building    40      22       37    40      23       37    40      26       35   60  1.71 not depressed not depressed not depressed
## 10 13   male  21        maths skills    47      20       50    45      25       48    46      27       46   70  3.09     depressed     depressed     depressed
## 11 14   male  23 confidence building    38      28       39    37      27       36    32      29       34   72  2.91 not depressed not depressed not depressed
## 12  1   male  19        maths skills    32      20       44    28      25       43    23      30       40   82  0.35 not depressed not depressed not depressed
## 13 15   male  23        maths skills    39      21       47    35      26       47    35      30       47   79  1.59     depressed     depressed     depressed
## 14  7   male  19        maths skills    36      24       38    32      28       35    30      32       35   80  1.51 not depressed not depressed not depressed
## 15  2   male  21 confidence building    37      29       50    36      30       47    34      34       45   90 10.24     depressed     depressed     depressed
## 16 27 female  20        maths skills    41      16       45    40      14       44    38      18       40   56  1.18     depressed not depressed not depressed
## 17 25 female  24        maths skills    38      14       42    37      14       40    35      19       39   53  1.06 not depressed not depressed not depressed
## 18 19 female  27        maths skills    42      15       49    41      13       49    40      20       44   59  3.87     depressed     depressed not depressed
## 19 18 female  23 confidence building    44      13       39    39      20       30    34      22       30   64  2.71 not depressed not depressed not depressed
## 20 23 female  22        maths skills    32      22       39    31      18       38    32      22       36   63  3.55 not depressed not depressed not depressed
## 21 21 female  46        maths skills    39      21       44    40      19       44    38      23       44   64  0.50 not depressed not depressed not depressed
## 22 26 female  19 confidence building    42      13       43    38      20       39    36      23       37   63  1.47 not depressed not depressed not depressed
## 23 29 female  22        maths skills    37      28       33    38      22       33    36      26       32   67  9.13 not depressed not depressed not depressed
## 24 17 female  37        maths skills    41      29       39    40      22       40    40      27       40   71  6.21 not depressed not depressed not depressed
## 25 20 female  32 confidence building    43      17       47    36      26       45    34      28       42   73  1.72     depressed     depressed not depressed
## 26 28 female  30 confidence building    46      20       38    40      28       30    37      29       29   80  1.50 not depressed not depressed not depressed
## 27 22 female  25 confidence building    30      24       45    28      28       40    25      30       38   83  1.92     depressed not depressed not depressed
## 28 24 female  21 confidence building    33      12       50    29      20       48    25      30       50   85  7.56     depressed     depressed     depressed
## 29 16 female  45 confidence building    40      22       45    30      35       40    25      32       42   78  1.19     depressed not depressed not depressed
## 30 30 female  21 confidence building    39      21       34    36      30       30    30      32       32   84  6.05 not depressed not depressed not depressed
## check again: the dataset is now completely free of SPSS information
sapply(data_ok, class)
##        id       sex       age     group     fost1   confid1  depress1     fost2   confid2  depress2     fost3   confid3  depress3      exam     mah_1  DepT1gp2  DepT2Gp2  DepT3gp2 
## "numeric"  "factor" "numeric"  "factor" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"  "factor"  "factor"  "factor"
summary(data_ok)
##        id           sex          age                     group        fost1       confid1        depress1      fost2       confid2      depress2      fost3       confid3      depress3       exam   
##  Min.   : 1.0   male  :15   Min.   :19   maths skills       :15   Min.   :30   Min.   :11.0   Min.   :33   Min.   :28   Min.   :13   Min.   :30   Min.   :23   Min.   :14   Min.   :29   Min.   :52  
##  1st Qu.: 8.2   female:15   1st Qu.:21   confidence building:15   1st Qu.:37   1st Qu.:14.2   1st Qu.:39   1st Qu.:35   1st Qu.:18   1st Qu.:37   1st Qu.:32   1st Qu.:20   1st Qu.:35   1st Qu.:59  
##  Median :15.5               Median :23                            Median :40   Median :20.0   Median :43   Median :38   Median :21   Median :41   Median :36   Median :26   Median :40   Median :66  
##  Mean   :15.5               Mean   :26                            Mean   :40   Mean   :19.0   Mean   :43   Mean   :38   Mean   :22   Mean   :41   Mean   :35   Mean   :25   Mean   :39   Mean   :68  
##  3rd Qu.:22.8               3rd Qu.:27                            3rd Qu.:44   3rd Qu.:22.0   3rd Qu.:45   3rd Qu.:40   3rd Qu.:26   3rd Qu.:45   3rd Qu.:40   3rd Qu.:30   3rd Qu.:43   3rd Qu.:79  
##  Max.   :30.0               Max.   :46                            Max.   :50   Max.   :29.0   Max.   :50   Max.   :48   Max.   :35   Max.   :49   Max.   :46   Max.   :34   Max.   :50   Max.   :90  
##      mah_1               DepT1gp2           DepT2Gp2           DepT3gp2 
##  Min.   : 0.3   not depressed:20   not depressed:22   not depressed:26  
##  1st Qu.: 1.3   depressed    :10   depressed    : 8   depressed    : 4  
##  Median : 1.7                                                           
##  Mean   : 2.9                                                           
##  3rd Qu.: 3.5                                                           
##  Max.   :10.2
str(data_ok)
## 'data.frame':    30 obs. of  18 variables:
##  $ id      : num  4 10 9 3 12 11 6 5 8 13 ...
##  $ sex     : Factor w/ 2 levels "male","female": 1 1 1 1 1 1 1 1 1 1 ...
##  $ age     : num  23 21 25 30 45 22 22 26 23 21 ...
##  $ group   : Factor w/ 2 levels "maths skills",..: 2 2 1 1 2 1 2 1 2 1 ...
##  $ fost1   : num  50 47 44 47 46 39 32 44 40 47 ...
##  $ confid1 : num  15 14 12 11 16 13 21 17 22 20 ...
##  $ depress1: num  44 42 40 43 44 43 37 46 37 50 ...
##  $ fost2   : num  48 45 39 42 45 40 33 37 40 45 ...
##  $ confid2 : num  16 15 18 16 16 20 22 20 23 25 ...
##  $ depress2: num  44 42 40 43 45 42 36 47 37 48 ...
##  $ fost3   : num  45 44 36 41 43 39 32 32 40 46 ...
##  $ confid3 : num  14 18 19 20 20 22 23 26 26 27 ...
##  $ depress3: num  40 40 38 43 43 38 35 42 35 46 ...
##  $ exam    : num  52 55 58 60 58 62 59 70 60 70 ...
##  $ mah_1   : num  0.57 1.659 3.54 2.454 0.944 ...
##  $ DepT1gp2: Factor w/ 2 levels "not depressed",..: 1 1 1 1 1 1 1 2 1 2 ...
##  $ DepT2Gp2: Factor w/ 2 levels "not depressed",..: 1 1 1 1 2 1 1 2 1 2 ...
##  $ DepT3gp2: Factor w/ 2 levels "not depressed",..: 1 1 1 1 1 1 1 1 1 2 ...
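
The class(), summary() and str() checks above are done by eye. As a complementary sketch, the same check can be done programmatically by scanning each column's attributes. The helper name has_spss_attrs is hypothetical (it is not part of haven), and the toy data frame only stands in for a real import; it assumes leftover metadata shows up as a "labels" attribute or as an attribute whose name contains "spss".

```r
# Hypothetical helper: TRUE for each column that still carries SPSS metadata
# (a "labels" attribute, or any attribute whose name contains "spss").
has_spss_attrs <- function(df) {
  vapply(df, function(col) {
    attr_names <- names(attributes(col))
    any(grepl("spss", attr_names)) || "labels" %in% attr_names
  }, logical(1))
}

# Toy data standing in for an imported file: `score` mimics a column that
# still has a display-format attribute left over from the import.
toy <- data.frame(id = 1:3, score = c(10, 20, 30))
attr(toy$score, "format.spss") <- "F8.2"

before <- has_spss_attrs(toy)   # flags `score` only

attributes(toy$score) <- NULL   # same clean-up idea as in the manual steps
after <- has_spss_attrs(toy)    # nothing flagged any more

print(before)
print(any(after))
```

Running any(has_spss_attrs(data_ok)) on the cleaned dataset should then give FALSE if no such attributes remain.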

TURNING THE SPSS IMPORT INTO A FUNCTION

If you follow the 4 steps above, you do everything by hand, one command at a time in R. Alternatively, you can read about other ways of handling SPSS data in R at this link.

https://cran.r-project.org/web/packages/labelled/vignettes/intro_labelled.html

However, reading through all the different approaches to importing SPSS data into R takes a while, so for convenience I wrote the function below: just copy it into R and run it. The result is a clean dataset, ready for the next processing steps in R.

options(digits = 2)
options(width = 200)

library(haven)
data_experim <- read_sav("experim.sav") ## import the example SPSS file

clean_file_spss <- function(input_data) {
  data_ok <- as.data.frame(input_data)

  # find the positions of the columns that carry value labels
  class_ok <- sapply(data_ok, class)
  vi_tri_cot <- grep(pattern = "labelled", class_ok)

  # convert the labelled columns to factors, using the label names as levels
  for (i in vi_tri_cot) {
    data_ok[, i] <- factor(data_ok[, i],
      labels = names(attributes(data_ok[, i])$labels)
    )
  }

  # find the positions of the columns whose attributes still mention "spss"
  class_spss <- sapply(data_ok, attributes)
  vi_tri_cot_spss <- grep(pattern = "spss", class_spss)

  # strip the remaining SPSS attributes from those columns
  for (j in vi_tri_cot_spss) {
    attributes(data_ok[, j]) <- NULL
  }

  return(data_ok)
}

data_ok_2 <- clean_file_spss(data_experim) ## run the function to clean the dataset

print(data_ok_2)
##    id    sex age               group fost1 confid1 depress1 fost2 confid2 depress2 fost3 confid3 depress3 exam mah_1      DepT1gp2      DepT2Gp2      DepT3gp2
## 1   4   male  23 confidence building    50      15       44    48      16       44    45      14       40   52  0.57 not depressed not depressed not depressed
## 2  10   male  21 confidence building    47      14       42    45      15       42    44      18       40   55  1.66 not depressed not depressed not depressed
## 3   9   male  25        maths skills    44      12       40    39      18       40    36      19       38   58  3.54 not depressed not depressed not depressed
## 4   3   male  30        maths skills    47      11       43    42      16       43    41      20       43   60  2.45 not depressed not depressed not depressed
## 5  12   male  45 confidence building    46      16       44    45      16       45    43      20       43   58  0.94 not depressed     depressed not depressed
## 6  11   male  22        maths skills    39      13       43    40      20       42    39      22       38   62  1.63 not depressed not depressed not depressed
## 7   6   male  22 confidence building    32      21       37    33      22       36    32      23       35   59  4.17 not depressed not depressed not depressed
## 8   5   male  26        maths skills    44      17       46    37      20       47    32      26       42   70  1.03     depressed     depressed not depressed
## 9   8   male  23 confidence building    40      22       37    40      23       37    40      26       35   60  1.71 not depressed not depressed not depressed
## 10 13   male  21        maths skills    47      20       50    45      25       48    46      27       46   70  3.09     depressed     depressed     depressed
## 11 14   male  23 confidence building    38      28       39    37      27       36    32      29       34   72  2.91 not depressed not depressed not depressed
## 12  1   male  19        maths skills    32      20       44    28      25       43    23      30       40   82  0.35 not depressed not depressed not depressed
## 13 15   male  23        maths skills    39      21       47    35      26       47    35      30       47   79  1.59     depressed     depressed     depressed
## 14  7   male  19        maths skills    36      24       38    32      28       35    30      32       35   80  1.51 not depressed not depressed not depressed
## 15  2   male  21 confidence building    37      29       50    36      30       47    34      34       45   90 10.24     depressed     depressed     depressed
## 16 27 female  20        maths skills    41      16       45    40      14       44    38      18       40   56  1.18     depressed not depressed not depressed
## 17 25 female  24        maths skills    38      14       42    37      14       40    35      19       39   53  1.06 not depressed not depressed not depressed
## 18 19 female  27        maths skills    42      15       49    41      13       49    40      20       44   59  3.87     depressed     depressed not depressed
## 19 18 female  23 confidence building    44      13       39    39      20       30    34      22       30   64  2.71 not depressed not depressed not depressed
## 20 23 female  22        maths skills    32      22       39    31      18       38    32      22       36   63  3.55 not depressed not depressed not depressed
## 21 21 female  46        maths skills    39      21       44    40      19       44    38      23       44   64  0.50 not depressed not depressed not depressed
## 22 26 female  19 confidence building    42      13       43    38      20       39    36      23       37   63  1.47 not depressed not depressed not depressed
## 23 29 female  22        maths skills    37      28       33    38      22       33    36      26       32   67  9.13 not depressed not depressed not depressed
## 24 17 female  37        maths skills    41      29       39    40      22       40    40      27       40   71  6.21 not depressed not depressed not depressed
## 25 20 female  32 confidence building    43      17       47    36      26       45    34      28       42   73  1.72     depressed     depressed not depressed
## 26 28 female  30 confidence building    46      20       38    40      28       30    37      29       29   80  1.50 not depressed not depressed not depressed
## 27 22 female  25 confidence building    30      24       45    28      28       40    25      30       38   83  1.92     depressed not depressed not depressed
## 28 24 female  21 confidence building    33      12       50    29      20       48    25      30       50   85  7.56     depressed     depressed     depressed
## 29 16 female  45 confidence building    40      22       45    30      35       40    25      32       42   78  1.19     depressed not depressed not depressed
## 30 30 female  21 confidence building    39      21       34    36      30       30    30      32       32   84  6.05 not depressed not depressed not depressed
sapply(data_ok_2, class)
##        id       sex       age     group     fost1   confid1  depress1     fost2   confid2  depress2     fost3   confid3  depress3      exam     mah_1  DepT1gp2  DepT2Gp2  DepT3gp2 
## "numeric"  "factor" "numeric"  "factor" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"  "factor"  "factor"  "factor"
summary(data_ok_2)
##        id           sex          age                     group        fost1       confid1        depress1      fost2       confid2      depress2      fost3       confid3      depress3       exam   
##  Min.   : 1.0   male  :15   Min.   :19   maths skills       :15   Min.   :30   Min.   :11.0   Min.   :33   Min.   :28   Min.   :13   Min.   :30   Min.   :23   Min.   :14   Min.   :29   Min.   :52  
##  1st Qu.: 8.2   female:15   1st Qu.:21   confidence building:15   1st Qu.:37   1st Qu.:14.2   1st Qu.:39   1st Qu.:35   1st Qu.:18   1st Qu.:37   1st Qu.:32   1st Qu.:20   1st Qu.:35   1st Qu.:59  
##  Median :15.5               Median :23                            Median :40   Median :20.0   Median :43   Median :38   Median :21   Median :41   Median :36   Median :26   Median :40   Median :66  
##  Mean   :15.5               Mean   :26                            Mean   :40   Mean   :19.0   Mean   :43   Mean   :38   Mean   :22   Mean   :41   Mean   :35   Mean   :25   Mean   :39   Mean   :68  
##  3rd Qu.:22.8               3rd Qu.:27                            3rd Qu.:44   3rd Qu.:22.0   3rd Qu.:45   3rd Qu.:40   3rd Qu.:26   3rd Qu.:45   3rd Qu.:40   3rd Qu.:30   3rd Qu.:43   3rd Qu.:79  
##  Max.   :30.0               Max.   :46                            Max.   :50   Max.   :29.0   Max.   :50   Max.   :48   Max.   :35   Max.   :49   Max.   :46   Max.   :34   Max.   :50   Max.   :90  
##      mah_1               DepT1gp2           DepT2Gp2           DepT3gp2 
##  Min.   : 0.3   not depressed:20   not depressed:22   not depressed:26  
##  1st Qu.: 1.3   depressed    :10   depressed    : 8   depressed    : 4  
##  Median : 1.7                                                           
##  Mean   : 2.9                                                           
##  3rd Qu.: 3.5                                                           
##  Max.   :10.2
str(data_ok_2)
## 'data.frame':    30 obs. of  18 variables:
##  $ id      : num  4 10 9 3 12 11 6 5 8 13 ...
##  $ sex     : Factor w/ 2 levels "male","female": 1 1 1 1 1 1 1 1 1 1 ...
##  $ age     : num  23 21 25 30 45 22 22 26 23 21 ...
##  $ group   : Factor w/ 2 levels "maths skills",..: 2 2 1 1 2 1 2 1 2 1 ...
##  $ fost1   : num  50 47 44 47 46 39 32 44 40 47 ...
##  $ confid1 : num  15 14 12 11 16 13 21 17 22 20 ...
##  $ depress1: num  44 42 40 43 44 43 37 46 37 50 ...
##  $ fost2   : num  48 45 39 42 45 40 33 37 40 45 ...
##  $ confid2 : num  16 15 18 16 16 20 22 20 23 25 ...
##  $ depress2: num  44 42 40 43 45 42 36 47 37 48 ...
##  $ fost3   : num  45 44 36 41 43 39 32 32 40 46 ...
##  $ confid3 : num  14 18 19 20 20 22 23 26 26 27 ...
##  $ depress3: num  40 40 38 43 43 38 35 42 35 46 ...
##  $ exam    : num  52 55 58 60 58 62 59 70 60 70 ...
##  $ mah_1   : num  0.57 1.659 3.54 2.454 0.944 ...
##  $ DepT1gp2: Factor w/ 2 levels "not depressed",..: 1 1 1 1 1 1 1 2 1 2 ...
##  $ DepT2Gp2: Factor w/ 2 levels "not depressed",..: 1 1 1 1 2 1 1 2 1 2 ...
##  $ DepT3gp2: Factor w/ 2 levels "not depressed",..: 1 1 1 1 1 1 1 1 1 2 ...
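
One caveat about the conversion loop inside clean_file_spss(): factor(x, labels = ...) pairs the labels with the values that actually occur in the column, so it relies on every labelled value being present in the data (which is the case for experim.sav). The base-R sketch below uses a made-up vector rather than the real file and shows a variant that passes levels = explicitly. The name labelled_to_factor is hypothetical, and the sketch assumes the value labels are stored as a named "labels" attribute.

```r
# Hypothetical helper: convert a vector carrying a named "labels" attribute
# (value -> label) into a factor, using the labelled values themselves as
# the factor levels. Values that have no label become NA.
labelled_to_factor <- function(x) {
  value_labels <- attr(x, "labels")
  factor(as.vector(x),
    levels = unname(value_labels),
    labels = names(value_labels)
  )
}

# Made-up example: codes 1, 2 and 3 are labelled, but only 1 and 3 occur.
x <- c(1, 3, 3, 1)
attr(x, "labels") <- c(low = 1, medium = 2, high = 3)

f <- labelled_to_factor(x)
print(levels(f))   # all three labels are kept as levels
print(table(f))    # the unused level "medium" appears with a count of 0
```

With two distinct values and three labels, a plain factor(x, labels = ...) call would stop with an error instead, so this variant is worth considering for files where some categories are empty.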

Check that the dataset produced by the function and the one produced by the manual code are exactly the same.

identical(data_ok, data_ok_2) ## TRUE shows that the function cleaned the dataset successfully.
## [1] TRUE
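
For comparison, haven also ships its own helpers for the same two clean-up steps: as_factor() converts labelled columns to factors and zap_formats() drops the format attributes. The sketch below is only an illustration on a tiny made-up data frame built with haven::labelled(), not on experim.sav, and its result is not guaranteed to be identical() to data_ok (for instance, as_factor() may retain variable labels). The requireNamespace() guard simply skips the example when haven is not installed.

```r
if (requireNamespace("haven", quietly = TRUE)) {
  # Tiny stand-in for an imported .sav file (made-up values).
  toy_spss <- data.frame(id = c(1, 2, 3))
  toy_spss$sex <- haven::labelled(c(1, 2, 1), labels = c(male = 1, female = 2))
  attr(toy_spss$id, "format.spss") <- "F8.0"

  # Labelled columns become factors; other columns are left alone.
  toy_clean <- haven::as_factor(toy_spss)

  # Remove the SPSS display-format attributes from every column.
  toy_clean <- haven::zap_formats(toy_clean)

  print(sapply(toy_clean, class))
  print(levels(toy_clean$sex))
}
```

The hand-written function stays useful when you want full control over which attributes are removed, while the helpers keep the code shorter.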

Summary

That wraps up this guide to importing SPSS files into R. To learn R systematically from A to Z, you are warmly invited to join the course “HDSD R để xử lý dữ liệu” (a user guide to R for data processing) and build a solid foundation in R, so that you can put together your own data stories yourself!

REGISTER NOW: https://www.tuhocr.com/register