www.tuhocr.com
To import an SPSS .sav data file into R, you can follow the approach below.
Step 1: Load library(haven) and use read_sav() so that R reads the SPSS .sav file into a tibble for further processing.
You can download sample .sav files here and try it yourself.
http://spss.allenandunwin.com.s3-website-ap-southeast-2.amazonaws.com/data-files.html
In this example I import the file experim.sav from the link above.
options(digits = 2)
options(width = 200)
library(haven)
data_spss <- read_sav("experim.sav") ## the SPSS file must be placed in the project folder (the working directory)
sapply(data_spss, class)
## $id
## [1] "numeric"
##
## $sex
## [1] "haven_labelled" "vctrs_vctr" "double"
##
## $age
## [1] "numeric"
##
## $group
## [1] "haven_labelled" "vctrs_vctr" "double"
##
## $fost1
## [1] "numeric"
##
## $confid1
## [1] "numeric"
##
## $depress1
## [1] "numeric"
##
## $fost2
## [1] "numeric"
##
## $confid2
## [1] "numeric"
##
## $depress2
## [1] "numeric"
##
## $fost3
## [1] "numeric"
##
## $confid3
## [1] "numeric"
##
## $depress3
## [1] "numeric"
##
## $exam
## [1] "numeric"
##
## $mah_1
## [1] "numeric"
##
## $DepT1gp2
## [1] "haven_labelled" "vctrs_vctr" "double"
##
## $DepT2Gp2
## [1] "haven_labelled" "vctrs_vctr" "double"
##
## $DepT3gp2
## [1] "haven_labelled" "vctrs_vctr" "double"
print(data_spss, n = 30) # view the contents of the SPSS file imported into the object `data_spss`
## # A tibble: 30 × 18
## id sex age group fost1 confid1 depress1 fost2 confid2 depress2 fost3 confid3 depress3 exam mah_1 DepT1gp2 DepT2Gp2 DepT3gp2
## <dbl> <dbl+lbl> <dbl> <dbl+lbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl+lbl> <dbl+lbl> <dbl+lbl>
## 1 4 1 [male] 23 2 [confidence building] 50 15 44 48 16 44 45 14 40 52 0.570 0 [not depressed] 0 [not depressed] 0 [not depressed]
## 2 10 1 [male] 21 2 [confidence building] 47 14 42 45 15 42 44 18 40 55 1.66 0 [not depressed] 0 [not depressed] 0 [not depressed]
## 3 9 1 [male] 25 1 [maths skills] 44 12 40 39 18 40 36 19 38 58 3.54 0 [not depressed] 0 [not depressed] 0 [not depressed]
## 4 3 1 [male] 30 1 [maths skills] 47 11 43 42 16 43 41 20 43 60 2.45 0 [not depressed] 0 [not depressed] 0 [not depressed]
## 5 12 1 [male] 45 2 [confidence building] 46 16 44 45 16 45 43 20 43 58 0.944 0 [not depressed] 1 [depressed] 0 [not depressed]
## 6 11 1 [male] 22 1 [maths skills] 39 13 43 40 20 42 39 22 38 62 1.63 0 [not depressed] 0 [not depressed] 0 [not depressed]
## 7 6 1 [male] 22 2 [confidence building] 32 21 37 33 22 36 32 23 35 59 4.17 0 [not depressed] 0 [not depressed] 0 [not depressed]
## 8 5 1 [male] 26 1 [maths skills] 44 17 46 37 20 47 32 26 42 70 1.03 1 [depressed] 1 [depressed] 0 [not depressed]
## 9 8 1 [male] 23 2 [confidence building] 40 22 37 40 23 37 40 26 35 60 1.71 0 [not depressed] 0 [not depressed] 0 [not depressed]
## 10 13 1 [male] 21 1 [maths skills] 47 20 50 45 25 48 46 27 46 70 3.09 1 [depressed] 1 [depressed] 1 [depressed]
## 11 14 1 [male] 23 2 [confidence building] 38 28 39 37 27 36 32 29 34 72 2.91 0 [not depressed] 0 [not depressed] 0 [not depressed]
## 12 1 1 [male] 19 1 [maths skills] 32 20 44 28 25 43 23 30 40 82 0.347 0 [not depressed] 0 [not depressed] 0 [not depressed]
## 13 15 1 [male] 23 1 [maths skills] 39 21 47 35 26 47 35 30 47 79 1.59 1 [depressed] 1 [depressed] 1 [depressed]
## 14 7 1 [male] 19 1 [maths skills] 36 24 38 32 28 35 30 32 35 80 1.51 0 [not depressed] 0 [not depressed] 0 [not depressed]
## 15 2 1 [male] 21 2 [confidence building] 37 29 50 36 30 47 34 34 45 90 10.2 1 [depressed] 1 [depressed] 1 [depressed]
## 16 27 2 [female] 20 1 [maths skills] 41 16 45 40 14 44 38 18 40 56 1.18 1 [depressed] 0 [not depressed] 0 [not depressed]
## 17 25 2 [female] 24 1 [maths skills] 38 14 42 37 14 40 35 19 39 53 1.06 0 [not depressed] 0 [not depressed] 0 [not depressed]
## 18 19 2 [female] 27 1 [maths skills] 42 15 49 41 13 49 40 20 44 59 3.87 1 [depressed] 1 [depressed] 0 [not depressed]
## 19 18 2 [female] 23 2 [confidence building] 44 13 39 39 20 30 34 22 30 64 2.71 0 [not depressed] 0 [not depressed] 0 [not depressed]
## 20 23 2 [female] 22 1 [maths skills] 32 22 39 31 18 38 32 22 36 63 3.55 0 [not depressed] 0 [not depressed] 0 [not depressed]
## 21 21 2 [female] 46 1 [maths skills] 39 21 44 40 19 44 38 23 44 64 0.501 0 [not depressed] 0 [not depressed] 0 [not depressed]
## 22 26 2 [female] 19 2 [confidence building] 42 13 43 38 20 39 36 23 37 63 1.47 0 [not depressed] 0 [not depressed] 0 [not depressed]
## 23 29 2 [female] 22 1 [maths skills] 37 28 33 38 22 33 36 26 32 67 9.13 0 [not depressed] 0 [not depressed] 0 [not depressed]
## 24 17 2 [female] 37 1 [maths skills] 41 29 39 40 22 40 40 27 40 71 6.21 0 [not depressed] 0 [not depressed] 0 [not depressed]
## 25 20 2 [female] 32 2 [confidence building] 43 17 47 36 26 45 34 28 42 73 1.72 1 [depressed] 1 [depressed] 0 [not depressed]
## 26 28 2 [female] 30 2 [confidence building] 46 20 38 40 28 30 37 29 29 80 1.50 0 [not depressed] 0 [not depressed] 0 [not depressed]
## 27 22 2 [female] 25 2 [confidence building] 30 24 45 28 28 40 25 30 38 83 1.92 1 [depressed] 0 [not depressed] 0 [not depressed]
## 28 24 2 [female] 21 2 [confidence building] 33 12 50 29 20 48 25 30 50 85 7.56 1 [depressed] 1 [depressed] 1 [depressed]
## 29 16 2 [female] 45 2 [confidence building] 40 22 45 30 35 40 25 32 42 78 1.19 1 [depressed] 0 [not depressed] 0 [not depressed]
## 30 30 2 [female] 21 2 [confidence building] 39 21 34 36 30 30 30 32 32 84 6.05 0 [not depressed] 0 [not depressed] 0 [not depressed]
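The `<dbl+lbl>` columns above store numeric codes together with their value labels. As a minimal sketch of where that information lives, the snippet below builds a small labelled vector by hand with `haven::labelled()` instead of reading experim.sav; the vector `sex_demo` and its variable label are made up for illustration.

```r
library(haven)

# Hypothetical stand-in for a labelled column such as `sex`
sex_demo <- labelled(
  c(1, 2, 2, 1),
  labels = c(male = 1, female = 2),
  label  = "sex of participant"   # assumed variable label, not from the file
)

attr(sex_demo, "labels")   # named numeric vector: codes and their value labels
attr(sex_demo, "label")    # the variable label
is.labelled(sex_demo)      # checks for the haven_labelled class
```

The same `attr(x, "labels")` lookup should work on a column of `data_spss`, for example `attr(data_spss$sex, "labels")`.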
Step 2: Convert the tibble to a data.frame
Reason: SPSS columns that carry value labels (these are the categorical variables) look rather cluttered when they are printed as a tibble in R. Converting to a data.frame gives a more compact display.
After that, if needed, we convert the labelled columns to factors.
data_ok <- as.data.frame(data_spss)
sapply(data_ok, class) ## note: even after converting to a data frame, the column classes in `data_ok` are exactly the same as in `data_spss` (the label information is still there)
## $id
## [1] "numeric"
##
## $sex
## [1] "haven_labelled" "vctrs_vctr" "double"
##
## $age
## [1] "numeric"
##
## $group
## [1] "haven_labelled" "vctrs_vctr" "double"
##
## $fost1
## [1] "numeric"
##
## $confid1
## [1] "numeric"
##
## $depress1
## [1] "numeric"
##
## $fost2
## [1] "numeric"
##
## $confid2
## [1] "numeric"
##
## $depress2
## [1] "numeric"
##
## $fost3
## [1] "numeric"
##
## $confid3
## [1] "numeric"
##
## $depress3
## [1] "numeric"
##
## $exam
## [1] "numeric"
##
## $mah_1
## [1] "numeric"
##
## $DepT1gp2
## [1] "haven_labelled" "vctrs_vctr" "double"
##
## $DepT2Gp2
## [1] "haven_labelled" "vctrs_vctr" "double"
##
## $DepT3gp2
## [1] "haven_labelled" "vctrs_vctr" "double"
print(data_ok)
## id sex age group fost1 confid1 depress1 fost2 confid2 depress2 fost3 confid3 depress3 exam mah_1 DepT1gp2 DepT2Gp2 DepT3gp2
## 1 4 1 23 2 50 15 44 48 16 44 45 14 40 52 0.57 0 0 0
## 2 10 1 21 2 47 14 42 45 15 42 44 18 40 55 1.66 0 0 0
## 3 9 1 25 1 44 12 40 39 18 40 36 19 38 58 3.54 0 0 0
## 4 3 1 30 1 47 11 43 42 16 43 41 20 43 60 2.45 0 0 0
## 5 12 1 45 2 46 16 44 45 16 45 43 20 43 58 0.94 0 1 0
## 6 11 1 22 1 39 13 43 40 20 42 39 22 38 62 1.63 0 0 0
## 7 6 1 22 2 32 21 37 33 22 36 32 23 35 59 4.17 0 0 0
## 8 5 1 26 1 44 17 46 37 20 47 32 26 42 70 1.03 1 1 0
## 9 8 1 23 2 40 22 37 40 23 37 40 26 35 60 1.71 0 0 0
## 10 13 1 21 1 47 20 50 45 25 48 46 27 46 70 3.09 1 1 1
## 11 14 1 23 2 38 28 39 37 27 36 32 29 34 72 2.91 0 0 0
## 12 1 1 19 1 32 20 44 28 25 43 23 30 40 82 0.35 0 0 0
## 13 15 1 23 1 39 21 47 35 26 47 35 30 47 79 1.59 1 1 1
## 14 7 1 19 1 36 24 38 32 28 35 30 32 35 80 1.51 0 0 0
## 15 2 1 21 2 37 29 50 36 30 47 34 34 45 90 10.24 1 1 1
## 16 27 2 20 1 41 16 45 40 14 44 38 18 40 56 1.18 1 0 0
## 17 25 2 24 1 38 14 42 37 14 40 35 19 39 53 1.06 0 0 0
## 18 19 2 27 1 42 15 49 41 13 49 40 20 44 59 3.87 1 1 0
## 19 18 2 23 2 44 13 39 39 20 30 34 22 30 64 2.71 0 0 0
## 20 23 2 22 1 32 22 39 31 18 38 32 22 36 63 3.55 0 0 0
## 21 21 2 46 1 39 21 44 40 19 44 38 23 44 64 0.50 0 0 0
## 22 26 2 19 2 42 13 43 38 20 39 36 23 37 63 1.47 0 0 0
## 23 29 2 22 1 37 28 33 38 22 33 36 26 32 67 9.13 0 0 0
## 24 17 2 37 1 41 29 39 40 22 40 40 27 40 71 6.21 0 0 0
## 25 20 2 32 2 43 17 47 36 26 45 34 28 42 73 1.72 1 1 0
## 26 28 2 30 2 46 20 38 40 28 30 37 29 29 80 1.50 0 0 0
## 27 22 2 25 2 30 24 45 28 28 40 25 30 38 83 1.92 1 0 0
## 28 24 2 21 2 33 12 50 29 20 48 25 30 50 85 7.56 1 1 1
## 29 16 2 45 2 40 22 45 30 35 40 25 32 42 78 1.19 1 0 0
## 30 30 2 21 2 39 21 34 36 30 30 30 32 32 84 6.05 0 0 0
Compare the two results: the data.frame output is more compact than the tibble output.
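To make the point of this step concrete, here is a small sketch, assuming a toy tibble built by hand rather than experim.sav (the names `tb_demo` and `df_demo` are illustrative). It shows that `as.data.frame()` only changes the class of the container, while the labelled column inside keeps its class.

```r
library(haven)

# Toy tibble with one plain column and one labelled column
tb_demo <- tibble::tibble(
  id  = c(1, 2, 3),
  sex = labelled(c(1, 2, 1), labels = c(male = 1, female = 2))
)
df_demo <- as.data.frame(tb_demo)

class(tb_demo)   # a tibble: "tbl_df" "tbl" "data.frame"
class(df_demo)   # a plain "data.frame"

# The column itself is untouched by the conversion
identical(class(tb_demo$sex), class(df_demo$sex))
```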
Now we do a little processing to turn the value labels into factors.
Step 3: Convert the categorical columns to factors
## find the columns that have labels
class_ok <- sapply(data_ok, class)
names(grep(pattern = "labelled", class_ok, value = TRUE))
## [1] "sex" "group" "DepT1gp2" "DepT2Gp2" "DepT3gp2"
## extract the label information of column `sex`
names(attributes(data_ok$sex)$labels)
## [1] "male" "female"
## assign that information to column `sex` and convert it to a factor
data_ok$sex <- factor(data_ok$sex, labels = names(attributes(data_ok$sex)$labels))
## likewise for the other labelled columns
data_ok$group <- factor(data_ok$group, labels = names(attributes(data_ok$group)$labels))
data_ok$DepT1gp2 <- factor(data_ok$DepT1gp2, labels = names(attributes(data_ok$DepT1gp2)$labels))
data_ok$DepT2Gp2 <- factor(data_ok$DepT2Gp2, labels = names(attributes(data_ok$DepT2Gp2)$labels))
data_ok$DepT3gp2 <- factor(data_ok$DepT3gp2, labels = names(attributes(data_ok$DepT3gp2)$labels))
## the finished dataset
print(data_ok)
## id sex age group fost1 confid1 depress1 fost2 confid2 depress2 fost3 confid3 depress3 exam mah_1 DepT1gp2 DepT2Gp2 DepT3gp2
## 1 4 male 23 confidence building 50 15 44 48 16 44 45 14 40 52 0.57 not depressed not depressed not depressed
## 2 10 male 21 confidence building 47 14 42 45 15 42 44 18 40 55 1.66 not depressed not depressed not depressed
## 3 9 male 25 maths skills 44 12 40 39 18 40 36 19 38 58 3.54 not depressed not depressed not depressed
## 4 3 male 30 maths skills 47 11 43 42 16 43 41 20 43 60 2.45 not depressed not depressed not depressed
## 5 12 male 45 confidence building 46 16 44 45 16 45 43 20 43 58 0.94 not depressed depressed not depressed
## 6 11 male 22 maths skills 39 13 43 40 20 42 39 22 38 62 1.63 not depressed not depressed not depressed
## 7 6 male 22 confidence building 32 21 37 33 22 36 32 23 35 59 4.17 not depressed not depressed not depressed
## 8 5 male 26 maths skills 44 17 46 37 20 47 32 26 42 70 1.03 depressed depressed not depressed
## 9 8 male 23 confidence building 40 22 37 40 23 37 40 26 35 60 1.71 not depressed not depressed not depressed
## 10 13 male 21 maths skills 47 20 50 45 25 48 46 27 46 70 3.09 depressed depressed depressed
## 11 14 male 23 confidence building 38 28 39 37 27 36 32 29 34 72 2.91 not depressed not depressed not depressed
## 12 1 male 19 maths skills 32 20 44 28 25 43 23 30 40 82 0.35 not depressed not depressed not depressed
## 13 15 male 23 maths skills 39 21 47 35 26 47 35 30 47 79 1.59 depressed depressed depressed
## 14 7 male 19 maths skills 36 24 38 32 28 35 30 32 35 80 1.51 not depressed not depressed not depressed
## 15 2 male 21 confidence building 37 29 50 36 30 47 34 34 45 90 10.24 depressed depressed depressed
## 16 27 female 20 maths skills 41 16 45 40 14 44 38 18 40 56 1.18 depressed not depressed not depressed
## 17 25 female 24 maths skills 38 14 42 37 14 40 35 19 39 53 1.06 not depressed not depressed not depressed
## 18 19 female 27 maths skills 42 15 49 41 13 49 40 20 44 59 3.87 depressed depressed not depressed
## 19 18 female 23 confidence building 44 13 39 39 20 30 34 22 30 64 2.71 not depressed not depressed not depressed
## 20 23 female 22 maths skills 32 22 39 31 18 38 32 22 36 63 3.55 not depressed not depressed not depressed
## 21 21 female 46 maths skills 39 21 44 40 19 44 38 23 44 64 0.50 not depressed not depressed not depressed
## 22 26 female 19 confidence building 42 13 43 38 20 39 36 23 37 63 1.47 not depressed not depressed not depressed
## 23 29 female 22 maths skills 37 28 33 38 22 33 36 26 32 67 9.13 not depressed not depressed not depressed
## 24 17 female 37 maths skills 41 29 39 40 22 40 40 27 40 71 6.21 not depressed not depressed not depressed
## 25 20 female 32 confidence building 43 17 47 36 26 45 34 28 42 73 1.72 depressed depressed not depressed
## 26 28 female 30 confidence building 46 20 38 40 28 30 37 29 29 80 1.50 not depressed not depressed not depressed
## 27 22 female 25 confidence building 30 24 45 28 28 40 25 30 38 83 1.92 depressed not depressed not depressed
## 28 24 female 21 confidence building 33 12 50 29 20 48 25 30 50 85 7.56 depressed depressed depressed
## 29 16 female 45 confidence building 40 22 45 30 35 40 25 32 42 78 1.19 depressed not depressed not depressed
## 30 30 female 21 confidence building 39 21 34 36 30 30 30 32 32 84 6.05 not depressed not depressed not depressed
## check dataset
sapply(data_ok, class)
## id sex age group fost1 confid1 depress1 fost2 confid2 depress2 fost3 confid3 depress3 exam mah_1 DepT1gp2 DepT2Gp2 DepT3gp2
## "numeric" "factor" "numeric" "factor" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "factor" "factor" "factor"
summary(data_ok)
## id sex age group fost1 confid1 depress1 fost2 confid2 depress2 fost3 confid3 depress3 exam
## Min. : 1.0 male :15 Min. :19 maths skills :15 Min. :30 Min. :11.0 Min. :33 Min. :28 Min. :13 Min. :30 Min. :23 Min. :14 Min. :29 Min. :52
## 1st Qu.: 8.2 female:15 1st Qu.:21 confidence building:15 1st Qu.:37 1st Qu.:14.2 1st Qu.:39 1st Qu.:35 1st Qu.:18 1st Qu.:37 1st Qu.:32 1st Qu.:20 1st Qu.:35 1st Qu.:59
## Median :15.5 Median :23 Median :40 Median :20.0 Median :43 Median :38 Median :21 Median :41 Median :36 Median :26 Median :40 Median :66
## Mean :15.5 Mean :26 Mean :40 Mean :19.0 Mean :43 Mean :38 Mean :22 Mean :41 Mean :35 Mean :25 Mean :39 Mean :68
## 3rd Qu.:22.8 3rd Qu.:27 3rd Qu.:44 3rd Qu.:22.0 3rd Qu.:45 3rd Qu.:40 3rd Qu.:26 3rd Qu.:45 3rd Qu.:40 3rd Qu.:30 3rd Qu.:43 3rd Qu.:79
## Max. :30.0 Max. :46 Max. :50 Max. :29.0 Max. :50 Max. :48 Max. :35 Max. :49 Max. :46 Max. :34 Max. :50 Max. :90
## mah_1 DepT1gp2 DepT2Gp2 DepT3gp2
## Min. : 0.3 not depressed:20 not depressed:22 not depressed:26
## 1st Qu.: 1.3 depressed :10 depressed : 8 depressed : 4
## Median : 1.7
## Mean : 2.9
## 3rd Qu.: 3.5
## Max. :10.2
str(data_ok)
## 'data.frame': 30 obs. of 18 variables:
## $ id : num 4 10 9 3 12 11 6 5 8 13 ...
## ..- attr(*, "format.spss")= chr "F8.0"
## $ sex : Factor w/ 2 levels "male","female": 1 1 1 1 1 1 1 1 1 1 ...
## $ age : num 23 21 25 30 45 22 22 26 23 21 ...
## ..- attr(*, "format.spss")= chr "F8.0"
## $ group : Factor w/ 2 levels "maths skills",..: 2 2 1 1 2 1 2 1 2 1 ...
## $ fost1 : num 50 47 44 47 46 39 32 44 40 47 ...
## ..- attr(*, "label")= chr "fear of stats time1"
## ..- attr(*, "format.spss")= chr "F8.0"
## $ confid1 : num 15 14 12 11 16 13 21 17 22 20 ...
## ..- attr(*, "label")= chr "confidence time1"
## ..- attr(*, "format.spss")= chr "F8.0"
## $ depress1: num 44 42 40 43 44 43 37 46 37 50 ...
## ..- attr(*, "label")= chr "depression time1"
## ..- attr(*, "format.spss")= chr "F8.0"
## $ fost2 : num 48 45 39 42 45 40 33 37 40 45 ...
## ..- attr(*, "label")= chr "fear of stats time2"
## ..- attr(*, "format.spss")= chr "F8.0"
## $ confid2 : num 16 15 18 16 16 20 22 20 23 25 ...
## ..- attr(*, "label")= chr "confidence time2"
## ..- attr(*, "format.spss")= chr "F8.0"
## $ depress2: num 44 42 40 43 45 42 36 47 37 48 ...
## ..- attr(*, "label")= chr "depression time2"
## ..- attr(*, "format.spss")= chr "F8.0"
## $ fost3 : num 45 44 36 41 43 39 32 32 40 46 ...
## ..- attr(*, "label")= chr "fear of stats time3"
## ..- attr(*, "format.spss")= chr "F8.0"
## $ confid3 : num 14 18 19 20 20 22 23 26 26 27 ...
## ..- attr(*, "label")= chr "confidence time3"
## ..- attr(*, "format.spss")= chr "F8.0"
## $ depress3: num 40 40 38 43 43 38 35 42 35 46 ...
## ..- attr(*, "label")= chr "depression time3"
## ..- attr(*, "format.spss")= chr "F8.0"
## $ exam : num 52 55 58 60 58 62 59 70 60 70 ...
## ..- attr(*, "label")= chr "stats exam"
## ..- attr(*, "format.spss")= chr "F8.0"
## $ mah_1 : num 0.57 1.659 3.54 2.454 0.944 ...
## ..- attr(*, "label")= chr "Mahalanobis Distance"
## ..- attr(*, "format.spss")= chr "F11.5"
## $ DepT1gp2: Factor w/ 2 levels "not depressed",..: 1 1 1 1 1 1 1 2 1 2 ...
## $ DepT2Gp2: Factor w/ 2 levels "not depressed",..: 1 1 1 1 2 1 1 2 1 2 ...
## $ DepT3gp2: Factor w/ 2 levels "not depressed",..: 1 1 1 1 1 1 1 1 1 2 ...
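One caveat about the `factor(x, labels = ...)` pattern used in this step: it relies on every labelled code actually occurring in the column, which happens to hold for experim.sav. The base-R sketch below uses made-up codes and labels (`codes`, `labs`) to show what can go wrong when a labelled code is absent, and one way to make the conversion independent of the data by passing `levels` explicitly. `haven::as_factor()` is the package's own converter and may be the simpler choice in practice.

```r
codes <- c(1, 3, 3, 1)                      # code 2 never occurs in the data
labs  <- c(low = 1, medium = 2, high = 3)   # but it still has a value label

# Two distinct values but three labels: factor() refuses this combination
naive <- tryCatch(
  factor(codes, labels = names(labs)),
  error = function(e) "error"
)

# Passing the codes as levels ties each label to its own code
robust <- factor(codes, levels = unname(labs), labels = names(labs))
levels(robust)          # all three labels are kept as levels
as.character(robust)    # each value is mapped through its own code
```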
Step 4: Clean the dataset completely
In fact, after step 3 you already have a complete data.frame. However, str() shows that some columns still carry attributes that mention SPSS (format.spss). If you want to clear all of this information, assign NULL to the attributes of those columns. Note that this also drops the variable labels stored in the label attribute.
sapply(data_ok, attributes)
## $id
## $id$format.spss
## [1] "F8.0"
##
##
## $sex
## $sex$levels
## [1] "male" "female"
##
## $sex$class
## [1] "factor"
##
##
## $age
## $age$format.spss
## [1] "F8.0"
##
##
## $group
## $group$levels
## [1] "maths skills" "confidence building"
##
## $group$class
## [1] "factor"
##
##
## $fost1
## $fost1$label
## [1] "fear of stats time1"
##
## $fost1$format.spss
## [1] "F8.0"
##
##
## $confid1
## $confid1$label
## [1] "confidence time1"
##
## $confid1$format.spss
## [1] "F8.0"
##
##
## $depress1
## $depress1$label
## [1] "depression time1"
##
## $depress1$format.spss
## [1] "F8.0"
##
##
## $fost2
## $fost2$label
## [1] "fear of stats time2"
##
## $fost2$format.spss
## [1] "F8.0"
##
##
## $confid2
## $confid2$label
## [1] "confidence time2"
##
## $confid2$format.spss
## [1] "F8.0"
##
##
## $depress2
## $depress2$label
## [1] "depression time2"
##
## $depress2$format.spss
## [1] "F8.0"
##
##
## $fost3
## $fost3$label
## [1] "fear of stats time3"
##
## $fost3$format.spss
## [1] "F8.0"
##
##
## $confid3
## $confid3$label
## [1] "confidence time3"
##
## $confid3$format.spss
## [1] "F8.0"
##
##
## $depress3
## $depress3$label
## [1] "depression time3"
##
## $depress3$format.spss
## [1] "F8.0"
##
##
## $exam
## $exam$label
## [1] "stats exam"
##
## $exam$format.spss
## [1] "F8.0"
##
##
## $mah_1
## $mah_1$label
## [1] "Mahalanobis Distance"
##
## $mah_1$format.spss
## [1] "F11.5"
##
##
## $DepT1gp2
## $DepT1gp2$levels
## [1] "not depressed" "depressed"
##
## $DepT1gp2$class
## [1] "factor"
##
##
## $DepT2Gp2
## $DepT2Gp2$levels
## [1] "not depressed" "depressed"
##
## $DepT2Gp2$class
## [1] "factor"
##
##
## $DepT3gp2
## $DepT3gp2$levels
## [1] "not depressed" "depressed"
##
## $DepT3gp2$class
## [1] "factor"
attributes(data_ok$id) <- NULL
attributes(data_ok$age) <- NULL
attributes(data_ok$fost1) <- NULL
attributes(data_ok$confid1) <- NULL
attributes(data_ok$depress1) <- NULL
attributes(data_ok$fost2) <- NULL
attributes(data_ok$confid2) <- NULL
attributes(data_ok$depress2) <- NULL
attributes(data_ok$fost3) <- NULL
attributes(data_ok$confid3) <- NULL
attributes(data_ok$depress3) <- NULL
attributes(data_ok$exam) <- NULL
attributes(data_ok$mah_1) <- NULL
## the finished dataset
print(data_ok)
## id sex age group fost1 confid1 depress1 fost2 confid2 depress2 fost3 confid3 depress3 exam mah_1 DepT1gp2 DepT2Gp2 DepT3gp2
## 1 4 male 23 confidence building 50 15 44 48 16 44 45 14 40 52 0.57 not depressed not depressed not depressed
## 2 10 male 21 confidence building 47 14 42 45 15 42 44 18 40 55 1.66 not depressed not depressed not depressed
## 3 9 male 25 maths skills 44 12 40 39 18 40 36 19 38 58 3.54 not depressed not depressed not depressed
## 4 3 male 30 maths skills 47 11 43 42 16 43 41 20 43 60 2.45 not depressed not depressed not depressed
## 5 12 male 45 confidence building 46 16 44 45 16 45 43 20 43 58 0.94 not depressed depressed not depressed
## 6 11 male 22 maths skills 39 13 43 40 20 42 39 22 38 62 1.63 not depressed not depressed not depressed
## 7 6 male 22 confidence building 32 21 37 33 22 36 32 23 35 59 4.17 not depressed not depressed not depressed
## 8 5 male 26 maths skills 44 17 46 37 20 47 32 26 42 70 1.03 depressed depressed not depressed
## 9 8 male 23 confidence building 40 22 37 40 23 37 40 26 35 60 1.71 not depressed not depressed not depressed
## 10 13 male 21 maths skills 47 20 50 45 25 48 46 27 46 70 3.09 depressed depressed depressed
## 11 14 male 23 confidence building 38 28 39 37 27 36 32 29 34 72 2.91 not depressed not depressed not depressed
## 12 1 male 19 maths skills 32 20 44 28 25 43 23 30 40 82 0.35 not depressed not depressed not depressed
## 13 15 male 23 maths skills 39 21 47 35 26 47 35 30 47 79 1.59 depressed depressed depressed
## 14 7 male 19 maths skills 36 24 38 32 28 35 30 32 35 80 1.51 not depressed not depressed not depressed
## 15 2 male 21 confidence building 37 29 50 36 30 47 34 34 45 90 10.24 depressed depressed depressed
## 16 27 female 20 maths skills 41 16 45 40 14 44 38 18 40 56 1.18 depressed not depressed not depressed
## 17 25 female 24 maths skills 38 14 42 37 14 40 35 19 39 53 1.06 not depressed not depressed not depressed
## 18 19 female 27 maths skills 42 15 49 41 13 49 40 20 44 59 3.87 depressed depressed not depressed
## 19 18 female 23 confidence building 44 13 39 39 20 30 34 22 30 64 2.71 not depressed not depressed not depressed
## 20 23 female 22 maths skills 32 22 39 31 18 38 32 22 36 63 3.55 not depressed not depressed not depressed
## 21 21 female 46 maths skills 39 21 44 40 19 44 38 23 44 64 0.50 not depressed not depressed not depressed
## 22 26 female 19 confidence building 42 13 43 38 20 39 36 23 37 63 1.47 not depressed not depressed not depressed
## 23 29 female 22 maths skills 37 28 33 38 22 33 36 26 32 67 9.13 not depressed not depressed not depressed
## 24 17 female 37 maths skills 41 29 39 40 22 40 40 27 40 71 6.21 not depressed not depressed not depressed
## 25 20 female 32 confidence building 43 17 47 36 26 45 34 28 42 73 1.72 depressed depressed not depressed
## 26 28 female 30 confidence building 46 20 38 40 28 30 37 29 29 80 1.50 not depressed not depressed not depressed
## 27 22 female 25 confidence building 30 24 45 28 28 40 25 30 38 83 1.92 depressed not depressed not depressed
## 28 24 female 21 confidence building 33 12 50 29 20 48 25 30 50 85 7.56 depressed depressed depressed
## 29 16 female 45 confidence building 40 22 45 30 35 40 25 32 42 78 1.19 depressed not depressed not depressed
## 30 30 female 21 confidence building 39 21 34 36 30 30 30 32 32 84 6.05 not depressed not depressed not depressed
## check again: the dataset no longer carries any SPSS information
sapply(data_ok, class)
## id sex age group fost1 confid1 depress1 fost2 confid2 depress2 fost3 confid3 depress3 exam mah_1 DepT1gp2 DepT2Gp2 DepT3gp2
## "numeric" "factor" "numeric" "factor" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "factor" "factor" "factor"
summary(data_ok)
## id sex age group fost1 confid1 depress1 fost2 confid2 depress2 fost3 confid3 depress3 exam
## Min. : 1.0 male :15 Min. :19 maths skills :15 Min. :30 Min. :11.0 Min. :33 Min. :28 Min. :13 Min. :30 Min. :23 Min. :14 Min. :29 Min. :52
## 1st Qu.: 8.2 female:15 1st Qu.:21 confidence building:15 1st Qu.:37 1st Qu.:14.2 1st Qu.:39 1st Qu.:35 1st Qu.:18 1st Qu.:37 1st Qu.:32 1st Qu.:20 1st Qu.:35 1st Qu.:59
## Median :15.5 Median :23 Median :40 Median :20.0 Median :43 Median :38 Median :21 Median :41 Median :36 Median :26 Median :40 Median :66
## Mean :15.5 Mean :26 Mean :40 Mean :19.0 Mean :43 Mean :38 Mean :22 Mean :41 Mean :35 Mean :25 Mean :39 Mean :68
## 3rd Qu.:22.8 3rd Qu.:27 3rd Qu.:44 3rd Qu.:22.0 3rd Qu.:45 3rd Qu.:40 3rd Qu.:26 3rd Qu.:45 3rd Qu.:40 3rd Qu.:30 3rd Qu.:43 3rd Qu.:79
## Max. :30.0 Max. :46 Max. :50 Max. :29.0 Max. :50 Max. :48 Max. :35 Max. :49 Max. :46 Max. :34 Max. :50 Max. :90
## mah_1 DepT1gp2 DepT2Gp2 DepT3gp2
## Min. : 0.3 not depressed:20 not depressed:22 not depressed:26
## 1st Qu.: 1.3 depressed :10 depressed : 8 depressed : 4
## Median : 1.7
## Mean : 2.9
## 3rd Qu.: 3.5
## Max. :10.2
str(data_ok)
## 'data.frame': 30 obs. of 18 variables:
## $ id : num 4 10 9 3 12 11 6 5 8 13 ...
## $ sex : Factor w/ 2 levels "male","female": 1 1 1 1 1 1 1 1 1 1 ...
## $ age : num 23 21 25 30 45 22 22 26 23 21 ...
## $ group : Factor w/ 2 levels "maths skills",..: 2 2 1 1 2 1 2 1 2 1 ...
## $ fost1 : num 50 47 44 47 46 39 32 44 40 47 ...
## $ confid1 : num 15 14 12 11 16 13 21 17 22 20 ...
## $ depress1: num 44 42 40 43 44 43 37 46 37 50 ...
## $ fost2 : num 48 45 39 42 45 40 33 37 40 45 ...
## $ confid2 : num 16 15 18 16 16 20 22 20 23 25 ...
## $ depress2: num 44 42 40 43 45 42 36 47 37 48 ...
## $ fost3 : num 45 44 36 41 43 39 32 32 40 46 ...
## $ confid3 : num 14 18 19 20 20 22 23 26 26 27 ...
## $ depress3: num 40 40 38 43 43 38 35 42 35 46 ...
## $ exam : num 52 55 58 60 58 62 59 70 60 70 ...
## $ mah_1 : num 0.57 1.659 3.54 2.454 0.944 ...
## $ DepT1gp2: Factor w/ 2 levels "not depressed",..: 1 1 1 1 1 1 1 2 1 2 ...
## $ DepT2Gp2: Factor w/ 2 levels "not depressed",..: 1 1 1 1 2 1 1 2 1 2 ...
## $ DepT3gp2: Factor w/ 2 levels "not depressed",..: 1 1 1 1 1 1 1 1 1 2 ...
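Writing one `attributes(...) <- NULL` line per column works, but the same clean-up can be done in a single pass. The sketch below is a base-R illustration on a small hand-made data frame (`df_demo` and its values are assumptions, not the real dataset): it strips all attributes from the numeric columns only, so factor columns keep their levels. If you would rather remove only specific attributes, haven also provides `zap_formats()` and `zap_label()`.

```r
# Toy data frame that mimics leftover SPSS attributes
df_demo <- data.frame(id = c(4, 10, 9), exam = c(52, 55, 58))
df_demo$sex <- factor(c("male", "male", "female"))
attr(df_demo$id, "format.spss")   <- "F8.0"
attr(df_demo$exam, "label")       <- "stats exam"
attr(df_demo$exam, "format.spss") <- "F8.0"

# Pick the numeric columns (factors are not numeric) and strip their attributes
num_cols <- vapply(df_demo, is.numeric, logical(1))
df_demo[num_cols] <- lapply(df_demo[num_cols], function(x) {
  attributes(x) <- NULL
  x
})

str(df_demo)   # numeric columns are now plain vectors; `sex` is still a factor
```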
The four steps above are done entirely by hand, one command at a time. Alternatively, you can look at other ways of handling SPSS data in R at this link.
https://cran.r-project.org/web/packages/labelled/vignettes/intro_labelled.html
However, reading through the different approaches to importing SPSS data into R takes a while. For convenience, I wrote the function below, which you can simply copy into R and run. The result is a clean dataset, ready for the next processing steps in R.
options(digits = 2)
options(width = 200)
library(haven)
data_experim <- read_sav("experim.sav") ## import the example SPSS file
clean_file_spss <- function(input_data) {
  data_ok <- as.data.frame(input_data)
  # find the positions of the columns that have value labels
  class_ok <- sapply(data_ok, class)
  vi_tri_cot <- grep(pattern = "labelled", class_ok)
  # convert the labelled columns to factors
  for (i in vi_tri_cot) {
    data_ok[, i] <- factor(data_ok[, i],
      labels = names(attributes(data_ok[, i])$labels)
    )
  }
  # find the positions of the columns whose attributes mention spss
  class_spss <- sapply(data_ok, attributes)
  vi_tri_cot_spss <- grep(pattern = "spss", class_spss)
  # remove the SPSS attributes from those columns
  for (j in vi_tri_cot_spss) {
    attributes(data_ok[, j]) <- NULL
  }
  return(data_ok)
}
data_ok_2 <- clean_file_spss(data_experim) ## run the function to clean the dataset
print(data_ok_2)
## id sex age group fost1 confid1 depress1 fost2 confid2 depress2 fost3 confid3 depress3 exam mah_1 DepT1gp2 DepT2Gp2 DepT3gp2
## 1 4 male 23 confidence building 50 15 44 48 16 44 45 14 40 52 0.57 not depressed not depressed not depressed
## 2 10 male 21 confidence building 47 14 42 45 15 42 44 18 40 55 1.66 not depressed not depressed not depressed
## 3 9 male 25 maths skills 44 12 40 39 18 40 36 19 38 58 3.54 not depressed not depressed not depressed
## 4 3 male 30 maths skills 47 11 43 42 16 43 41 20 43 60 2.45 not depressed not depressed not depressed
## 5 12 male 45 confidence building 46 16 44 45 16 45 43 20 43 58 0.94 not depressed depressed not depressed
## 6 11 male 22 maths skills 39 13 43 40 20 42 39 22 38 62 1.63 not depressed not depressed not depressed
## 7 6 male 22 confidence building 32 21 37 33 22 36 32 23 35 59 4.17 not depressed not depressed not depressed
## 8 5 male 26 maths skills 44 17 46 37 20 47 32 26 42 70 1.03 depressed depressed not depressed
## 9 8 male 23 confidence building 40 22 37 40 23 37 40 26 35 60 1.71 not depressed not depressed not depressed
## 10 13 male 21 maths skills 47 20 50 45 25 48 46 27 46 70 3.09 depressed depressed depressed
## 11 14 male 23 confidence building 38 28 39 37 27 36 32 29 34 72 2.91 not depressed not depressed not depressed
## 12 1 male 19 maths skills 32 20 44 28 25 43 23 30 40 82 0.35 not depressed not depressed not depressed
## 13 15 male 23 maths skills 39 21 47 35 26 47 35 30 47 79 1.59 depressed depressed depressed
## 14 7 male 19 maths skills 36 24 38 32 28 35 30 32 35 80 1.51 not depressed not depressed not depressed
## 15 2 male 21 confidence building 37 29 50 36 30 47 34 34 45 90 10.24 depressed depressed depressed
## 16 27 female 20 maths skills 41 16 45 40 14 44 38 18 40 56 1.18 depressed not depressed not depressed
## 17 25 female 24 maths skills 38 14 42 37 14 40 35 19 39 53 1.06 not depressed not depressed not depressed
## 18 19 female 27 maths skills 42 15 49 41 13 49 40 20 44 59 3.87 depressed depressed not depressed
## 19 18 female 23 confidence building 44 13 39 39 20 30 34 22 30 64 2.71 not depressed not depressed not depressed
## 20 23 female 22 maths skills 32 22 39 31 18 38 32 22 36 63 3.55 not depressed not depressed not depressed
## 21 21 female 46 maths skills 39 21 44 40 19 44 38 23 44 64 0.50 not depressed not depressed not depressed
## 22 26 female 19 confidence building 42 13 43 38 20 39 36 23 37 63 1.47 not depressed not depressed not depressed
## 23 29 female 22 maths skills 37 28 33 38 22 33 36 26 32 67 9.13 not depressed not depressed not depressed
## 24 17 female 37 maths skills 41 29 39 40 22 40 40 27 40 71 6.21 not depressed not depressed not depressed
## 25 20 female 32 confidence building 43 17 47 36 26 45 34 28 42 73 1.72 depressed depressed not depressed
## 26 28 female 30 confidence building 46 20 38 40 28 30 37 29 29 80 1.50 not depressed not depressed not depressed
## 27 22 female 25 confidence building 30 24 45 28 28 40 25 30 38 83 1.92 depressed not depressed not depressed
## 28 24 female 21 confidence building 33 12 50 29 20 48 25 30 50 85 7.56 depressed depressed depressed
## 29 16 female 45 confidence building 40 22 45 30 35 40 25 32 42 78 1.19 depressed not depressed not depressed
## 30 30 female 21 confidence building 39 21 34 36 30 30 30 32 32 84 6.05 not depressed not depressed not depressed
sapply(data_ok_2, class)
## id sex age group fost1 confid1 depress1 fost2 confid2 depress2 fost3 confid3 depress3 exam mah_1 DepT1gp2 DepT2Gp2 DepT3gp2
## "numeric" "factor" "numeric" "factor" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "factor" "factor" "factor"
summary(data_ok_2)
## id sex age group fost1 confid1 depress1 fost2 confid2 depress2 fost3 confid3 depress3 exam
## Min. : 1.0 male :15 Min. :19 maths skills :15 Min. :30 Min. :11.0 Min. :33 Min. :28 Min. :13 Min. :30 Min. :23 Min. :14 Min. :29 Min. :52
## 1st Qu.: 8.2 female:15 1st Qu.:21 confidence building:15 1st Qu.:37 1st Qu.:14.2 1st Qu.:39 1st Qu.:35 1st Qu.:18 1st Qu.:37 1st Qu.:32 1st Qu.:20 1st Qu.:35 1st Qu.:59
## Median :15.5 Median :23 Median :40 Median :20.0 Median :43 Median :38 Median :21 Median :41 Median :36 Median :26 Median :40 Median :66
## Mean :15.5 Mean :26 Mean :40 Mean :19.0 Mean :43 Mean :38 Mean :22 Mean :41 Mean :35 Mean :25 Mean :39 Mean :68
## 3rd Qu.:22.8 3rd Qu.:27 3rd Qu.:44 3rd Qu.:22.0 3rd Qu.:45 3rd Qu.:40 3rd Qu.:26 3rd Qu.:45 3rd Qu.:40 3rd Qu.:30 3rd Qu.:43 3rd Qu.:79
## Max. :30.0 Max. :46 Max. :50 Max. :29.0 Max. :50 Max. :48 Max. :35 Max. :49 Max. :46 Max. :34 Max. :50 Max. :90
## mah_1 DepT1gp2 DepT2Gp2 DepT3gp2
## Min. : 0.3 not depressed:20 not depressed:22 not depressed:26
## 1st Qu.: 1.3 depressed :10 depressed : 8 depressed : 4
## Median : 1.7
## Mean : 2.9
## 3rd Qu.: 3.5
## Max. :10.2
str(data_ok_2)
## 'data.frame': 30 obs. of 18 variables:
## $ id : num 4 10 9 3 12 11 6 5 8 13 ...
## $ sex : Factor w/ 2 levels "male","female": 1 1 1 1 1 1 1 1 1 1 ...
## $ age : num 23 21 25 30 45 22 22 26 23 21 ...
## $ group : Factor w/ 2 levels "maths skills",..: 2 2 1 1 2 1 2 1 2 1 ...
## $ fost1 : num 50 47 44 47 46 39 32 44 40 47 ...
## $ confid1 : num 15 14 12 11 16 13 21 17 22 20 ...
## $ depress1: num 44 42 40 43 44 43 37 46 37 50 ...
## $ fost2 : num 48 45 39 42 45 40 33 37 40 45 ...
## $ confid2 : num 16 15 18 16 16 20 22 20 23 25 ...
## $ depress2: num 44 42 40 43 45 42 36 47 37 48 ...
## $ fost3 : num 45 44 36 41 43 39 32 32 40 46 ...
## $ confid3 : num 14 18 19 20 20 22 23 26 26 27 ...
## $ depress3: num 40 40 38 43 43 38 35 42 35 46 ...
## $ exam : num 52 55 58 60 58 62 59 70 60 70 ...
## $ mah_1 : num 0.57 1.659 3.54 2.454 0.944 ...
## $ DepT1gp2: Factor w/ 2 levels "not depressed",..: 1 1 1 1 1 1 1 2 1 2 ...
## $ DepT2Gp2: Factor w/ 2 levels "not depressed",..: 1 1 1 1 2 1 1 2 1 2 ...
## $ DepT3gp2: Factor w/ 2 levels "not depressed",..: 1 1 1 1 1 1 1 1 1 2 ...
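For comparison, here is a rough sketch of a similar clean-up that leans on haven's own helpers instead of `grep()` on class names. The function name `clean_with_haven` and the toy input `demo` are hypothetical; the sketch assumes `is.labelled()`, `as_factor()`, `zap_formats()` and `zap_label()` behave as described in the haven documentation, and it is not guaranteed to give a result identical to `clean_file_spss()` on every file.

```r
library(haven)

# Hypothetical alternative, not the function from this post
clean_with_haven <- function(input_data) {
  out <- as.data.frame(input_data)
  # convert only the labelled columns to factors
  out[] <- lapply(out, function(col) {
    if (is.labelled(col)) as_factor(col) else col
  })
  out <- zap_formats(out)   # drop format.spss and similar format attributes
  out <- zap_label(out)     # drop variable labels
  out
}

# Toy input that mimics what read_sav() returns
demo <- data.frame(id = c(1, 2, 3))
attr(demo$id, "format.spss") <- "F8.0"
demo$sex <- labelled(c(1, 2, 1), labels = c(male = 1, female = 2), label = "sex")

res <- clean_with_haven(demo)
str(res)
```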
Check that the dataset produced by the function and the one produced by the manual code are exactly the same.
identical(data_ok, data_ok_2) ## TRUE shows that the function cleaned the dataset successfully
## [1] TRUE
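`identical()` is a strict test: it compares attributes as well as values, so a single leftover `format.spss` attribute would have made it return FALSE. The short base-R sketch below, using two made-up vectors `a` and `b`, illustrates the difference from `all.equal()`.

```r
a <- c(1, 2, 3)
b <- a
attr(b, "format.spss") <- "F8.0"   # same values, one extra attribute

identical(a, b)                                       # FALSE: attributes differ
same_values <- isTRUE(all.equal(a, b, check.attributes = FALSE))
same_values                                           # TRUE: the values match
```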
That concludes this guide to importing an SPSS file into R. To learn R systematically from A to Z, you are warmly invited to join the course “HDSD R để xử lý dữ liệu” and build a solid foundation in R so that you can tell your own data stories!
REGISTER NOW:
https://www.tuhocr.com/register