Hands-on session materials, 15 September 2021: Processing raw data with R: the 2018 Vietnam Household Living Standards Survey (VHLSS 2018) dataset
Speakers: Dr. Nguyễn Thị Nhung, Thăng Long University, and
Dr. Trịnh Thị Hường, Thương Mại University
Details at: https://sites.google.com/view/tkud/home?authuser=1
The hands-on materials can be downloaded from this link: https://drive.google.com/drive/folders/1VN2TMp-jOk-6NIZcx845Rgv4GqKTcg7s?usp=sharing
SUMMARY: In any empirical research project, processing the raw data is a mandatory and important step. It corrects the "errors" introduced during data collection and aggregates survey answers into the variables needed to address the research question. It also covers merging data from different sources. We illustrate raw-data processing for the research topic "Factors affecting education expenditure" in Vietnam, using the 2018 Household Living Standards Survey dataset. R is used to carry out the processing steps that prepare the data for analysis. Some descriptive statistics from the cleaned dataset are also shown.
We reproduce the raw-data processing of the following article: Ngoan, Ngô Thị, Nguyễn Thị Tuyết Mai, Đàm Thị Thu Trang, and Trịnh Thị Hường. “NGHIÊN CỨU CÁC YẾU TỐ ẢNH HƯỞNG ĐẾN VIỆC CHI TIÊU CHO GIÁO DỤC CỦA HỘ GIA ĐÌNH TẠI CÁC TỈNH ĐỒNG BẰNG SÔNG HỒNG” [A study of the factors affecting household education expenditure in the Red River Delta provinces]. TNU Journal of Science and Technology 226, no. 04 (2021): 53-61.
Article link: http://jst.tnu.edu.vn/jst/article/view/4084.
Full text of the Results of the Viet Nam Household Living Standards Survey 2018: https://www.gso.gov.vn/wp-content/uploads/2020/05/VHLSS2018.pdf
The questionnaire and the data used in the study are in the Google Drive folder shared above.
The R scripts prepared for the session have been uploaded to the same Google Drive folder; participants can download them and try running them.
The dplyr package, or the tidyverse meta-package that includes it, provides many convenient functions for raw-data processing.
A guide can be downloaded from this link: https://drive.google.com/file/d/1USPBCVHmzBgkezYb_em0d79ww_uChFL_/view
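# Working directory holding the .dta files (adjust to your machine)
setwd("D:/Tap huan VIASM/Chuoi Seminar/Huong Trinh/15Sept")
library(tidyverse)    # dplyr, stringr, ...
library(readstata13)  # read.dta13() for Stata 13+ files
library(stringi)      # stri_trans_general() for transliteration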
#=====================HO1.dta==================================
HO1 <- read.dta13("HO1.dta")
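#head(HO1)
# Relabel the 63 province-name levels of tinh with two-digit province codes (assigned by position)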
levels(HO1$tinh) <- c("01","02", "04", "06", "08", "10", "11","12" , "14", "15",
"17", "19", "20", "22", "24","25", "26", "27", "30", "31","33","34",
"35", "36", "37", "38","40", "42", "44", "45", "46", "48", "49", "51",
"52","54","56", "58","60", "62","64","66","67","68",
"70","72","74","75","77", "79","80", "82", "83","84","86",
"87","89","91","92","93", "94", "95","96")
#head(HO1 %>% select(tinh, huyen, xa, diaban, hoso))
# Household ID = province_district_commune_enumeration area_household number
HO1 <- HO1 %>% mutate(ID = paste(tinh, huyen, xa, diaban, hoso, sep = "_"))
#head(HO1$ID)
HO1 %>% select(ID) %>% head()
## ID
## 1 96_973_32239_19_13
## 2 96_973_32239_19_14
## 3 96_973_32239_19_15
## 4 02_24_691_15_13
## 5 02_24_691_15_14
## 6 02_24_691_15_15
save(HO1, file="HO1.RData")
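The same 63-element code vector is assigned to `levels(...$tinh)` in every file that follows. Because `levels<-` relabels by position, a file whose province factor had a different number of levels would be miscoded without any error. A minimal sketch of a reusable check, assuming each file lists its provinces in the same order; `recode_by_position` and the three-province toy data are hypothetical and not part of the session scripts:

```r
# Hypothetical helper: relabel factor levels by position, but refuse to run
# when the number of existing levels does not match the number of new codes.
recode_by_position <- function(f, codes) {
  stopifnot(is.factor(f), nlevels(f) == length(codes))
  levels(f) <- codes
  f
}

# Toy data with three provinces (the real files have 63 levels).
toy <- data.frame(
  tinh  = factor(c("Ha Noi", "Ca Mau", "Ha Giang", "Ca Mau"),
                 levels = c("Ha Noi", "Ha Giang", "Ca Mau")),
  huyen = c(1, 973, 24, 973)
)
toy_codes <- c("01", "02", "96")

toy$tinh <- recode_by_position(toy$tinh, toy_codes)
toy$ID <- paste(toy$tinh, toy$huyen, sep = "_")
print(toy)

# A mismatch in the number of levels is reported instead of being miscoded.
mismatch <- try(recode_by_position(toy$tinh, c("01", "02")), silent = TRUE)
print(inherits(mismatch, "try-error"))
```

Defining the code vector once and passing it to such a helper would also avoid repeating the long literal for each file.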
#=====================HO3.dta==================================
HO3 <- read.dta13("HO3.dta")
#head(HO3)
unique(HO3$tinh)[1:5]
## [1] Tỉnh Cà Mau Tỉnh Hà Giang Tỉnh Cao Bằng Tỉnh Bắc Kạn
## [5] Tỉnh Tuyên Quang
## 63 Levels: Thành phố Hà Nội Tỉnh Hà Giang Tỉnh Cao Bằng ... Tỉnh Cà Mau
# Transliterate province names to ASCII, then drop the "Tinh " / "Thanh pho " prefixes
HO3 <- HO3 %>%
  mutate(tentinh = stri_trans_general(tinh, 'latin-ascii'))
HO3 <- HO3 %>% mutate(tentinh = str_remove_all(tentinh, "Tinh "),
                      tentinh = str_remove_all(tentinh, "Thanh pho "))
unique(HO3$tentinh)[1:5]
## [1] "Ca Mau" "Ha Giang" "Cao Bang" "Bac Kan" "Tuyen Quang"
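A small sketch of the transliteration step on three literal province names, assuming the stringi package is installed. The names are written with `\u` escapes so the example does not depend on file encoding. `clean_name` is a hypothetical helper; it anchors the prefix removal at the start of the string, which is slightly stricter than the `str_remove_all()` calls above.

```r
library(stringi)

# Three province names with Vietnamese diacritics, written as Unicode escapes.
raw <- c("T\u1ec9nh C\u00e0 Mau",
         "Th\u00e0nh ph\u1ed1 H\u00e0 N\u1ed9i",
         "T\u1ec9nh Tuy\u00ean Quang")

# Hypothetical helper: strip diacritics, then remove a leading
# "Tinh " or "Thanh pho " prefix.
clean_name <- function(x) {
  ascii <- stri_trans_general(x, "Latin-ASCII")
  sub("^(Tinh|Thanh pho) ", "", ascii)
}

tentinh <- clean_name(raw)
print(tentinh)
```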
levels(HO3$tinh) <- c("01","02", "04", "06", "08", "10", "11","12" , "14", "15",
"17", "19", "20", "22", "24","25", "26", "27", "30", "31","33","34",
"35", "36", "37", "38","40", "42", "44", "45", "46", "48", "49", "51",
"52","54","56", "58","60", "62","64","66","67","68",
"70","72","74","75","77", "79","80", "82", "83","84","86",
"87","89","91","92","93", "94", "95","96" )
HO3 <- HO3 %>%
mutate(ID = paste(tinh, huyen, xa, diaban, hoso, sep = "_"))
HO3 <- HO3 %>% filter(ID %in% HO1$ID)
save(HO3, file="HO3.RData")
#=====================Muc1A.dta==================================
Muc1A <- read.dta13("Muc1A.dta")
head(Muc1A)
## tinh huyen xa diaban hoso m1ama m1ac2
## 1 T<U+1EC9>nh Cà Mau 973 32239 19 1 1 N<U+1EEF>
## 2 T<U+1EC9>nh Cà Mau 973 32239 19 1 2 Nam
## 3 T<U+1EC9>nh Cà Mau 973 32239 19 1 3 Nam
## 4 T<U+1EC9>nh Cà Mau 973 32239 19 1 4 Nam
## 5 T<U+1EC9>nh Cà Mau 973 32239 19 1 5 Nam
## 6 T<U+1EC9>nh Cà Mau 973 32239 19 3 1 Nam
## m1ac3 m1ac4a m1ac4b m1ac5 m1ac6 m1ac7a m1ac7b m1ac7c
## 1 Ch<U+1EE7> h<U+1ED9> 9 1958 60 <NA> NA NA NA
## 2 V<U+1EE3>/ch<U+1ED3>ng 1 1954 64 <NA> NA NA NA
## 3 Cháu n<U+1ED9>i ngo<U+1EA1>i 8 1997 21 <NA> NA NA NA
## 4 Cháu n<U+1ED9>i ngo<U+1EA1>i 3 1999 19 <NA> NA NA NA
## 5 Cháu n<U+1ED9>i ngo<U+1EA1>i 3 2003 15 Có 0 0 2
## 6 Ch<U+1EE7> h<U+1ED9> 5 1963 55 <NA> NA NA NA
## m1ac8 m1ac9 m1ac10 m1ama1 m1ac11
## 1 Ðang có v<U+1EE3>/ch<U+1ED3>ng 10 <NA> 1 T<U+1EC9>nh Cà Mau
## 2 Ðang có v<U+1EE3>/ch<U+1ED3>ng 10 <NA> 2 T<U+1EC9>nh Cà Mau
## 3 Chua v<U+1EE3>/ch<U+1ED3>ng 10 <NA> 3 T<U+1EC9>nh Cà Mau
## 4 Chua v<U+1EE3>/ch<U+1ED3>ng 10 <NA> 4 T<U+1EC9>nh Cà Mau
## 5 Chua v<U+1EE3>/ch<U+1ED3>ng 10 <NA> 5 T<U+1EC9>nh Cà Mau
## 6 Goá 12 <NA> 1 T<U+1EC9>nh Cà Mau
## m1ac12 m1ac13 m1ac14a m1ac14b m1ac15a m1ac15b m1ac15c m1ac15d
## 1 Trong xã phu<U+1EDD>ng <NA> NA NA Không Không Không Không
## 2 Trong xã phu<U+1EDD>ng <NA> NA NA Không Không Không Không
## 3 Trong xã phu<U+1EDD>ng <NA> NA NA Không Không Không Không
## 4 Trong xã phu<U+1EDD>ng <NA> NA NA Không Không Không Không
## 5 Trong xã phu<U+1EDD>ng <NA> NA NA Không Không Không Không
## 6 Trong xã phu<U+1EDD>ng <NA> NA NA Không Không Không Không
## m1ac16 m1ac1
## 1 Có d<U+1EB7>ng thu hà
## 2 Có h<U+1ED3> van l<U+1EAF>m
## 3 Có lâm van cu<U+1EDD>ng
## 4 Có lâm van b<U+1EA3>o
## 5 Có lâm van du<U+1EE3>c
## 6 Không ph<U+1EA1>m van d<U+1EA1>n
levels(Muc1A$tinh)<-c("01","02", "04", "06", "08", "10", "11","12" , "14", "15",
"17", "19", "20", "22", "24","25", "26", "27", "30", "31","33","34",
"35", "36", "37", "38","40", "42", "44", "45", "46", "48", "49", "51",
"52","54","56", "58","60", "62","64","66","67","68",
"70","72","74","75","77", "79","80", "82", "83","84","86",
"87","89","91","92","93", "94", "95","96" )
Muc1A <- Muc1A %>%
mutate(ID = paste(tinh, huyen, xa, diaban, hoso, sep = "_"))
Muc1A <- Muc1A %>% filter(ID %in% HO1$ID)
# Translate factor labels to English (labels are assigned by position)
levels(Muc1A$m1ac2) <- c("Male","Female")
levels(Muc1A$m1ac3) <- c("head","spouse","children","parents","grandparents","grandchildren","other","NA")
levels(Muc1A$m1ac6) <- c("Yes","No","NA")
levels(Muc1A$m1ac8) <- c("Single","Married","Widowed","Divorced","Separated","NA")
save(Muc1A, file="Muc1A.RData")
#=====================MUC2X.dta==================================
MUC2X <- read.dta13("MUC2X.dta")
head(MUC2X, 2)
## tinh huyen xa diaban hoso m2xma m2xc1
## 1 T<U+1EC9>nh Cà Mau 973 32239 19 13 1 12
## 2 T<U+1EC9>nh Cà Mau 973 32239 19 13 2 5
## m2xc2a m2xc2b m2xc3 m2xc4 m2xc5 m2xc6 m2xc7
## 1 THPT Không Công l<U+1EAD>p Không Không <NA> NA
## 2 K0 b<U+1EB1>ng c<U+1EA5>p Không Tu th<U+1EE5>c Không Không <NA> NA
## m2xc8 m2xma1 m2xc9 m2xc10a m2xc10b m2xc10a1 m2xc10a2 m2xc11a m2xc11b m2xc11c
## 1 <NA> 1 <NA> NA NA NA NA NA NA NA
## 2 <NA> 2 <NA> NA NA NA NA NA NA NA
## m2xc11d m2xc11e m2xc11f m2xc11g m2xc11h m2xc11i m2xc11k m2xma2 m2xc12 m2xc13
## 1 NA NA NA NA NA NA NA 1 <NA> NA
## 2 NA NA NA NA NA NA NA 2 <NA> NA
## m2xc14 m2xc15 m2xc16 m2xc17 m2xc18a m2xc18b m2xc19
## 1 NA NA NA 0 <NA> <NA> NA
## 2 NA NA NA 0 <NA> <NA> NA
MUC2X <- MUC2X %>%
mutate( tentinh = stri_trans_general(tinh, 'latin-ascii'))
MUC2X <- MUC2X %>% mutate(tentinh = str_remove_all(tentinh, "Tinh "),
tentinh = str_remove_all(tentinh, "Thanh pho "))
levels(MUC2X$tinh) <- c("01","02", "04", "06", "08", "10", "11","12" , "14", "15",
"17", "19", "20", "22", "24","25", "26", "27", "30", "31","33","34",
"35", "36", "37", "38","40", "42", "44", "45", "46", "48", "49", "51",
"52","54","56", "58","60", "62","64","66","67","68",
"70","72","74","75","77", "79","80", "82", "83","84","86",
"87","89","91","92","93", "94", "95","96" )
MUC2X <- MUC2X %>% mutate(ID = paste(tinh, huyen, xa, diaban, hoso, sep = "_"))
MUC2X <- MUC2X %>% filter(ID %in% HO1$ID)
save(MUC2X, file="MUC2X.RData")
#=====================chitieu.dta==================================
chitieu <- read.dta13("chitieu.dta")
head(chitieu,2)
## tinh huyen xa diaban hoso chi_letet chi_thuongxuyen wt9
## 1 Thành phố Hà Nội 1 4 21 13 9103 6569 6176
## 2 Thành phố Hà Nội 1 4 21 15 7671 8893 6176
## tongchitieu tongchi_bq tongchi_gd tongchi_yte
## 1 178762.5 4965.625 6930 1925
## 2 177719.5 3702.490 6250 18621
levels(chitieu$tinh) <- c("01","02", "04", "06", "08", "10", "11","12" , "14", "15",
"17", "19", "20", "22", "24","25", "26", "27", "30", "31","33","34",
"35", "36", "37", "38","40", "42", "44", "45", "46", "48", "49", "51",
"52","54","56", "58","60", "62","64","66","67","68",
"70","72","74","75","77", "79","80", "82", "83","84","86",
"87","89","91","92","93", "94", "95","96" )
chitieu <- chitieu %>% mutate(ID = paste(tinh, huyen, xa, diaban, hoso, sep = "_"))
save(chitieu, file="chitieu.RData")
#=====================Muc4a.dta==================================
Muc4a <- read.dta13("Muc4a.dta")
head(Muc4a, 2)
## tinh huyen xa diaban hoso m4ama m4ac1a m4ac1b m4ac1c m4ac2
## 1 T<U+1EC9>nh Cà Mau 973 32239 19 1 1 Không Có Không Có
## 2 T<U+1EC9>nh Cà Mau 973 32239 19 1 2 Không Có Không Có
## m4ac2a m4ac3a m4ac3 m4ac4 m4ac5 m4ac6 m4ac7 m4ama1 m4ac8a
## 1 <NA> 240 92 3 Có 8 3 1 H<U+1ED9> NLT/cá nhân
## 2 <NA> 290 92 3 Có 10 6 2 H<U+1ED9> NLT/cá nhân
## m4ac8b m4ac9 m4ac10 m4ac11 m4ac12a m4ac12b m4ac13a m4ac13b m4ac13c
## 1 <NA> Không NA NA NA NA <NA> <NA> <NA>
## 2 <NA> Không NA NA NA NA <NA> <NA> <NA>
## m4ac14 m4ama2 m4ac15 m4ac16 m4ac17 m4ac18a m4ac18
## 1 Làm vi<U+1EC7>c t<U+1EA1>i nhà 1 NA NA Không NA NA
## 2 Ði b<U+1ED9> 2 NA 3 Không NA NA
## m4ac19 m4ac20 m4ama3 m4ac21 m4ac22 m4ac23 m4ac24 m4ac25 m4ac26 m4ac27a
## 1 NA <NA> 1 NA NA <NA> <NA> NA NA NA
## 2 NA <NA> 2 NA NA <NA> <NA> NA NA NA
## m4ac27b m4ama4 m4ac28 m4ac29 m4ac30 m4ac31a m4ac31b m4ac31c m4ac31d m4ac31e
## 1 NA 1 <NA> NA Không NA NA NA NA NA
## 2 NA 2 <NA> NA Không NA NA NA NA NA
## m4ac3m m4ac4c m4ac4m m4ac18m m4ac19a m4ac19m
## 1 nuôi tôm, cua d<U+1EB7>ng thu hà tôm, cua, cá
## 2 nuôi tôm, cua d<U+1EB7>ng thu hà tôm cua, cá
levels(Muc4a$tinh) <- c("01","02", "04", "06", "08", "10", "11","12" , "14", "15",
"17", "19", "20", "22", "24","25", "26", "27", "30", "31","33","34",
"35", "36", "37", "38","40", "42", "44", "45", "46", "48", "49", "51",
"52","54","56", "58","60", "62","64","66","67","68",
"70","72","74","75","77", "79","80", "82", "83","84","86",
"87","89","91","92","93", "94", "95","96" )
Muc4a <- Muc4a %>% mutate(ID = paste(tinh, huyen, xa, diaban, hoso, sep = "_"))
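save(Muc4a, file="Muc4a.RData")
#=====================1. HO3.RData==================================
load("HO3.RData")
names(HO3)
## [1] "tinh" "huyen" "xa" "diaban" "hoso" "thunhap" "thubq"
## [8] "tongthu" "chisxkd" "chikhac" "tentinh" "ID"
head(HO3, 2)
## tinh huyen xa diaban hoso thunhap thubq tongthu chisxkd chikhac tentinh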
## 1 96 973 32239 19 13 62990 1749 63000 0 0 Ca Mau
## 2 96 973 32239 19 13 51000 NA 0 0 0 Ca Mau
## ID
## 1 96_973_32239_19_13
## 2 96_973_32239_19_13
#thubq: average income per person per month (thousand VND)
#tongthu: total household income per year (thousand VND)
HO3 <- HO3 %>% filter(!is.na(thubq))
HO3 <- HO3 %>% select(ID, tinh, tentinh, thubq, tongthu) %>% unique()
summary(HO3)
## ID tinh tentinh thubq
## Length:9168 79 : 351 Length:9168 Min. : -440
## Class :character 01 : 315 Class :character 1st Qu.: 1749
## Mode :character 38 : 246 Mode :character Median : 2910
## 40 : 225 Mean : 3610
## 75 : 207 3rd Qu.: 4407
## 36 : 195 Max. :129884
## (Other):7629
## tongthu
## Min. : 2800
## 1st Qu.: 81842
## Median : 143762
## Mean : 229608
## 3rd Qu.: 241024
## Max. :73383606
##
HO3 <- HO3 %>% filter(thubq > 0) # delete 2 observations
#=====================2. Muc1A.dta==================================
load("Muc1A.RData")
head(Muc1A, 2)
## tinh huyen xa diaban hoso m1ama m1ac2 m1ac3 m1ac4a m1ac4b m1ac5 m1ac6
## 1 96 973 32239 19 13 1 Male head 1 1974 44 <NA>
## 2 96 973 32239 19 13 2 Female parents 1 1933 85 <NA>
## m1ac7a m1ac7b m1ac7c m1ac8 m1ac9 m1ac10 m1ama1 m1ac11
## 1 NA NA NA Married 12 <NA> 1 T<U+1EC9>nh Cà Mau
## 2 NA NA NA Widowed 12 <NA> 2 T<U+1EC9>nh Cà Mau
## m1ac12 m1ac13 m1ac14a m1ac14b m1ac15a m1ac15b m1ac15c m1ac15d
## 1 Trong xã phu<U+1EDD>ng <NA> NA NA Không Không Không Không
## 2 Trong xã phu<U+1EDD>ng <NA> NA NA Không Không Không Không
## m1ac16 m1ac1 ID
## 1 Có nguy<U+1EC5>n út anh 96_973_32239_19_13
## 2 Không tr<U+1EA7>n th<U+1ECB> phèn 96_973_32239_19_13
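#==================Extract the following variables:
#GIOITINH_CH (sex of the household head): m1ac2
#TUOI_CH (age of the household head): m1ac5
#HONNHAN_CH (marital status of the household head): m1ac8
summary(Muc1A$m1ac3)
## head spouse children parents grandparents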
## 9168 7092 13163 955 32
## grandchildren other NA
## 3247 580 0
## m1ama
## Min. :1
## 1st Qu.:1
## Median :1
## Mean :1
## 3rd Qu.:1
## Max. :1
#IMPORTANT NOTE: household head = m1ac3 == "head" and ALSO m1ama == 1
Muc1A.head <- Muc1A %>%
rename(GIOITINH_CH = m1ac2, TUOI_CH =m1ac5, HONNHAN_CH = m1ac8) %>%
filter(m1ac3 == "head") %>%
select(ID, GIOITINH_CH, TUOI_CH, HONNHAN_CH)
head(Muc1A.head, 2)
## ID GIOITINH_CH TUOI_CH HONNHAN_CH
## 1 96_973_32239_19_13 Male 44 Married
## 2 96_973_32239_19_14 Male 69 Widowed
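The filter above relies on each household having exactly one row with m1ac3 == "head". A minimal base-R sketch of how that assumption could be checked before joining; the toy `members` table and its values are invented for illustration:

```r
# Invented member-level data: household 96_2 has two heads, 96_3 has none.
members <- data.frame(
  ID    = c("96_1", "96_1", "96_2", "96_2", "96_3"),
  m1ama = c(1, 2, 1, 2, 1),
  m1ac3 = c("head", "spouse", "head", "head", "children"),
  stringsAsFactors = FALSE
)

# Number of rows flagged as head in each household (0 when there is none).
heads_per_hh <- tapply(members$m1ac3 == "head", members$ID, sum)
print(heads_per_hh)

# Households that would be duplicated or dropped by filter(m1ac3 == "head").
problem_ids <- names(heads_per_hh)[heads_per_hh != 1]
print(problem_ids)

# Heads whose member code is not 1, the second condition in the note above.
bad_code <- members$ID[members$m1ac3 == "head" & members$m1ama != 1]
print(bad_code)
```

If both `problem_ids` and `bad_code` come back empty on the real data, filtering on m1ac3 alone and filtering on m1ama == 1 select the same rows.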
#summary(Muc1A.head)
# Drop NA factor in HONNHAN_CH
Muc1A.head <- Muc1A.head %>%
filter(HONNHAN_CH != "NA") %>%
droplevels()
summary(Muc1A.head)
## ID GIOITINH_CH TUOI_CH HONNHAN_CH
## Length:9168 Male :6841 Min. : 19.00 Single : 239
## Class :character Female:2327 1st Qu.: 42.00 Married :7296
## Mode :character Median : 52.00 Widowed :1317
## Mean : 52.28 Divorced : 253
## 3rd Qu.: 61.00 Separated: 63
## Max. :101.00
GIAODUC18 <- left_join(HO3, Muc1A.head)
head(GIAODUC18, 2)
## ID tinh tentinh thubq tongthu GIOITINH_CH TUOI_CH HONNHAN_CH
## 1 96_973_32239_19_13 96 Ca Mau 1749 63000 Male 44 Married
## 2 96_973_32239_19_14 96 Ca Mau 1199 69100 Male 69 Widowed
# Collapse marital status into two groups: "Married" vs "Other"
levels(GIAODUC18$HONNHAN_CH) <- c("Other", "Married" ,
                                  "Other", "Other",
                                  "Other", "Other" )
summary(GIAODUC18)
## ID tinh tentinh thubq
## Length:9166 79 : 351 Length:9166 Min. : 10
## Class :character 01 : 315 Class :character 1st Qu.: 1749
## Mode :character 38 : 246 Mode :character Median : 2911
## 40 : 225 Mean : 3611
## 75 : 207 3rd Qu.: 4408
## 36 : 195 Max. :129884
## (Other):7627
## tongthu GIOITINH_CH TUOI_CH HONNHAN_CH
## Min. : 2800 Male :6839 Min. : 19.00 Other :1872
## 1st Qu.: 81835 Female:2327 1st Qu.: 42.00 Married:7294
## Median : 143762 Median : 52.00
## Mean : 229585 Mean : 52.29
## 3rd Qu.: 241008 3rd Qu.: 61.00
## Max. :73383606 Max. :101.00
##
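Assigning the same label to several positions in `levels<-`, as done for HONNHAN_CH above, merges those levels into one. A minimal base-R sketch on an invented five-level factor; the list form shown second is an equivalent that names the old levels explicitly instead of relying on their order:

```r
status <- factor(c("Single", "Married", "Widowed", "Married", "Divorced"),
                 levels = c("Single", "Married", "Widowed", "Divorced", "Separated"))

# Positional form: one new label per existing level; duplicates are merged.
by_position <- status
levels(by_position) <- c("Other", "Married", "Other", "Other", "Other")
print(table(by_position))

# List form: new level = vector of old levels, independent of level order.
by_name <- status
levels(by_name) <- list(Other   = c("Single", "Widowed", "Divorced", "Separated"),
                        Married = "Married")
print(table(by_name))
```

The list form is longer but keeps working if the level order of the input ever changes.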
Muc1A.head <- NULL
#==================3. HO1.RData============================
#==================Extract the following variables:
#DANTOC_CH (ethnicity of the household head): "Kinh", "Minority"
#NOISONG (place of residence): "URBAN","RURAL"
#TSNGUOI: number of household members
load("HO1.RData")
head(HO1, 2)
## vung tinh huyen xa diaban hoso
## 1 Đồng bằng sông Cửu Long 96 973 32239 19 13
## 2 Đồng bằng sông Cửu Long 96 973 32239 19 14
## quyen tsphieu ttnt dantoc phdich dtv dt ngaydt thangdt
## 1 1B-/KSMS-QSG18-HO 1 Nông thôn 1 Không 29 10 11 12
## 2 1B-/KSMS-QSG18-HO 1 Nông thôn 1 Không 29 10 12 12
## namdt tsnguoi tinh16 huyen16 xa16 diaban16 ttnt16 hoso16 ky m1b1 tsmuc1b
## 1 2018 3 0 NA NA NA <NA> NA 4 Không NA
## 2 2018 4 0 NA NA NA <NA> NA 4 Không NA
## m1c1 tsmuc1c m2dct m2xct m2vct m2xtn m2vtn m3c1g m3ct1 m3ct2 m3ct3 m3c13
## 1 Không NA NA 0 NA 0 NA Có 60 3000 0 600
## 2 Không NA NA 0 NA 0 NA Có 0 19000 0 0
## m3c14 m3c15 m3ct m3tn test loaiphieu ID
## 1 0 0 3660 0 2 1B-/KSMS-QSG18-HO 96_973_32239_19_13
## 2 0 0 19000 0 2 1B-/KSMS-QSG18-HO 96_973_32239_19_14
HO1 <- HO1 %>% rename(NOISONG = ttnt, DANTOC_CH = dantoc, TSNGUOI = tsnguoi ) %>%
select(ID, NOISONG, DANTOC_CH, TSNGUOI)
HO1 <- HO1 %>% mutate(DANTOC_CH = as.factor(ifelse(DANTOC_CH == 1, "Kinh", "Minority")))
levels(HO1$NOISONG) <- c("URBAN","RURAL")
names(GIAODUC18)
## [1] "ID" "tinh" "tentinh" "thubq" "tongthu"
## [6] "GIOITINH_CH" "TUOI_CH" "HONNHAN_CH"
names(HO1)
## [1] "ID" "NOISONG" "DANTOC_CH" "TSNGUOI"
GIAODUC18 <- left_join(GIAODUC18, HO1)
names(GIAODUC18)
## [1] "ID" "tinh" "tentinh" "thubq" "tongthu"
## [6] "GIOITINH_CH" "TUOI_CH" "HONNHAN_CH" "NOISONG" "DANTOC_CH"
## [11] "TSNGUOI"
summary(GIAODUC18)
## ID tinh tentinh thubq
## Length:9166 79 : 351 Length:9166 Min. : 10
## Class :character 01 : 315 Class :character 1st Qu.: 1749
## Mode :character 38 : 246 Mode :character Median : 2911
## 40 : 225 Mean : 3611
## 75 : 207 3rd Qu.: 4408
## 36 : 195 Max. :129884
## (Other):7627
## tongthu GIOITINH_CH TUOI_CH HONNHAN_CH NOISONG
## Min. : 2800 Male :6839 Min. : 19.00 Other :1872 URBAN:2747
## 1st Qu.: 81835 Female:2327 1st Qu.: 42.00 Married:7294 RURAL:6419
## Median : 143762 Median : 52.00
## Mean : 229585 Mean : 52.29
## 3rd Qu.: 241008 3rd Qu.: 61.00
## Max. :73383606 Max. :101.00
##
## DANTOC_CH TSNGUOI
## Kinh :7523 Min. : 1.000
## Minority:1643 1st Qu.: 3.000
## Median : 4.000
## Mean : 3.734
## 3rd Qu.: 5.000
## Max. :15.000
##
#==================4. Muc4a.RData============================
#-----------------Occupation of the household head
#LAMCONGANLUONG: wage or salaried work
#NONGLAMTHUYSAN: agriculture, forestry and fishery
#KINHDOANHDICHVU: business and services
load("Muc4a.RData")
head(Muc4a, 2)
## tinh huyen xa diaban hoso m4ama m4ac1a m4ac1b m4ac1c m4ac2 m4ac2a m4ac3a
## 1 96 973 32239 19 1 1 Không Có Không Có <NA> 240
## 2 96 973 32239 19 1 2 Không Có Không Có <NA> 290
## m4ac3 m4ac4 m4ac5 m4ac6 m4ac7 m4ama1 m4ac8a m4ac8b m4ac9
## 1 92 3 Có 8 3 1 H<U+1ED9> NLT/cá nhân <NA> Không
## 2 92 3 Có 10 6 2 H<U+1ED9> NLT/cá nhân <NA> Không
## m4ac10 m4ac11 m4ac12a m4ac12b m4ac13a m4ac13b m4ac13c
## 1 NA NA NA NA <NA> <NA> <NA>
## 2 NA NA NA NA <NA> <NA> <NA>
## m4ac14 m4ama2 m4ac15 m4ac16 m4ac17 m4ac18a m4ac18
## 1 Làm vi<U+1EC7>c t<U+1EA1>i nhà 1 NA NA Không NA NA
## 2 Ði b<U+1ED9> 2 NA 3 Không NA NA
## m4ac19 m4ac20 m4ama3 m4ac21 m4ac22 m4ac23 m4ac24 m4ac25 m4ac26 m4ac27a
## 1 NA <NA> 1 NA NA <NA> <NA> NA NA NA
## 2 NA <NA> 2 NA NA <NA> <NA> NA NA NA
## m4ac27b m4ama4 m4ac28 m4ac29 m4ac30 m4ac31a m4ac31b m4ac31c m4ac31d m4ac31e
## 1 NA 1 <NA> NA Không NA NA NA NA NA
## 2 NA 2 <NA> NA Không NA NA NA NA NA
## m4ac3m m4ac4c m4ac4m m4ac18m m4ac19a m4ac19m
## 1 nuôi tôm, cua d<U+1EB7>ng thu hà tôm, cua, cá
## 2 nuôi tôm, cua d<U+1EB7>ng thu hà tôm cua, cá
## ID
## 1 96_973_32239_19_1
## 2 96_973_32239_19_1
## [1] 45839
# m4ama == 1 identifies the household head; keep only households already in GIAODUC18
Muc4a <- Muc4a %>% filter(m4ama == 1 & ID %in% GIAODUC18$ID) %>%
  rename(LAMCONGANLUONG = m4ac1a, NONGLAMTHUYSAN = m4ac1b,
         KINHDOANHDICHVU = m4ac1c) %>%
  select(ID, LAMCONGANLUONG, NONGLAMTHUYSAN, KINHDOANHDICHVU)
levels(Muc4a$LAMCONGANLUONG) <- c("Yes", "No")
levels(Muc4a$NONGLAMTHUYSAN) <- c("Yes", "No")
levels(Muc4a$KINHDOANHDICHVU) <- c("Yes", "No")
# One occupation per head. Conditions are tested in order, so wage work takes priority,
# and every head matching neither of the first two conditions is labelled "KINHDOANHDICHVU".
Muc4a <- Muc4a %>%
  mutate(NGHENGHIEP_CH = case_when(LAMCONGANLUONG == "Yes" ~ "LAMCONGANLUONG",
                                   NONGLAMTHUYSAN == "Yes" ~ "NONGLAMTHUYSAN",
                                   TRUE ~ "KINHDOANHDICHVU")) %>%
  select(ID, NGHENGHIEP_CH)
GIAODUC18 <- left_join(GIAODUC18, Muc4a)
summary(GIAODUC18)
## ID tinh tentinh thubq
## Length:9166 79 : 351 Length:9166 Min. : 10
## Class :character 01 : 315 Class :character 1st Qu.: 1749
## Mode :character 38 : 246 Mode :character Median : 2911
## 40 : 225 Mean : 3611
## 75 : 207 3rd Qu.: 4408
## 36 : 195 Max. :129884
## (Other):7627
## tongthu GIOITINH_CH TUOI_CH HONNHAN_CH NOISONG
## Min. : 2800 Male :6839 Min. : 19.00 Other :1872 URBAN:2747
## 1st Qu.: 81835 Female:2327 1st Qu.: 42.00 Married:7294 RURAL:6419
## Median : 143762 Median : 52.00
## Mean : 229585 Mean : 52.29
## 3rd Qu.: 241008 3rd Qu.: 61.00
## Max. :73383606 Max. :101.00
##
## DANTOC_CH TSNGUOI NGHENGHIEP_CH
## Kinh :7523 Min. : 1.000 Length:9166
## Minority:1643 1st Qu.: 3.000 Class :character
## Median : 4.000 Mode :character
## Mean : 3.734
## 3rd Qu.: 5.000
## Max. :15.000
##
Muc4a <- NULL
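`case_when()` returns the value of the first condition that is TRUE, so the order of the branches used for NGHENGHIEP_CH decides how a head with several activities is classified, and the final `TRUE ~` branch also catches heads who answered "No" or NA to every question. A minimal sketch on invented data, assuming dplyr is installed:

```r
library(dplyr)

# Invented answers for four household heads.
jobs <- tibble(
  ID = c("a", "b", "c", "d"),
  LAMCONGANLUONG  = c("Yes", "No",  "No", NA),
  NONGLAMTHUYSAN  = c("Yes", "Yes", "No", "No"),
  KINHDOANHDICHVU = c("No",  "No",  "No", "No")
)

jobs <- jobs %>%
  mutate(NGHENGHIEP_CH = case_when(
    LAMCONGANLUONG == "Yes" ~ "LAMCONGANLUONG",   # tested first
    NONGLAMTHUYSAN == "Yes" ~ "NONGLAMTHUYSAN",   # tested second
    TRUE                    ~ "KINHDOANHDICHVU"   # everything else
  ))
print(jobs)
```

Heads "c" and "d" end up in the default category even though neither reported a business or service activity, which may or may not be the intended coding.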
#-------------------------MUC2X.RData: BANGCAP_CH----------------------
load("MUC2X.RData")
names(MUC2X)
## [1] "tinh" "huyen" "xa" "diaban" "hoso" "m2xma"
## [7] "m2xc1" "m2xc2a" "m2xc2b" "m2xc3" "m2xc4" "m2xc5"
## [13] "m2xc6" "m2xc7" "m2xc8" "m2xma1" "m2xc9" "m2xc10a"
## [19] "m2xc10b" "m2xc10a1" "m2xc10a2" "m2xc11a" "m2xc11b" "m2xc11c"
## [25] "m2xc11d" "m2xc11e" "m2xc11f" "m2xc11g" "m2xc11h" "m2xc11i"
## [31] "m2xc11k" "m2xma2" "m2xc12" "m2xc13" "m2xc14" "m2xc15"
## [37] "m2xc16" "m2xc17" "m2xc18a" "m2xc18b" "m2xc19" "tentinh"
## [43] "ID"
#head(MUC2X, 2)
# m2xc2a: highest qualification obtained
MUC2X.head <- MUC2X %>% filter(m2xma == 1 & ID %in% GIAODUC18$ID) %>%
  select(ID, m2xc2a)
MUC2X.head$m2xc2a <- as.factor(MUC2X.head$m2xc2a)
summary(MUC2X.head$m2xc2a)
## NR K0 bằng cấp Tiểu học THCS THPT Cao đẳng
## 0 1477 2316 2699 1428 120
## Đại học Thạc sĩ Tiến sĩ Khác NA's
## 573 30 3 3 517
levels(MUC2X.head$m2xc2a)
## [1] "NR" "K0 bằng cấp" "Tiểu học" "THCS" "THPT"
## [6] "Cao đẳng" "Đại học" "Thạc sĩ" "Tiến sĩ" "Khác"
levels(MUC2X.head$m2xc2a) <- c("No qualification", "No qualification",
"Primary school",
"Secondary-high school",
"Secondary-high school",
"University", "University", "University",
"University",
"No qualification")
summary(MUC2X.head$m2xc2a)
## No qualification Primary school Secondary-high school
## 1480 2316 4127
## University NA's
## 726 517
# Heads with a missing answer are counted as "No qualification"
MUC2X.head$m2xc2a[is.na(MUC2X.head$m2xc2a)] <- "No qualification"
MUC2X.head <- MUC2X.head %>% rename(BANGCAP_CH = m2xc2a)
GIAODUC18 <- left_join(GIAODUC18, MUC2X.head)
summary(GIAODUC18)
## ID tinh tentinh thubq
## Length:9166 79 : 351 Length:9166 Min. : 10
## Class :character 01 : 315 Class :character 1st Qu.: 1749
## Mode :character 38 : 246 Mode :character Median : 2911
## 40 : 225 Mean : 3611
## 75 : 207 3rd Qu.: 4408
## 36 : 195 Max. :129884
## (Other):7627
## tongthu GIOITINH_CH TUOI_CH HONNHAN_CH NOISONG
## Min. : 2800 Male :6839 Min. : 19.00 Other :1872 URBAN:2747
## 1st Qu.: 81835 Female:2327 1st Qu.: 42.00 Married:7294 RURAL:6419
## Median : 143762 Median : 52.00
## Mean : 229585 Mean : 52.29
## 3rd Qu.: 241008 3rd Qu.: 61.00
## Max. :73383606 Max. :101.00
##
## DANTOC_CH TSNGUOI NGHENGHIEP_CH
## Kinh :7523 Min. : 1.000 Length:9166
## Minority:1643 1st Qu.: 3.000 Class :character
## Median : 4.000 Mode :character
## Mean : 3.734
## 3rd Qu.: 5.000
## Max. :15.000
##
## BANGCAP_CH
## No qualification :1997
## Primary school :2316
## Secondary-high school:4127
## University : 726
##
##
##
MUC2X.head <- NULL
#-------------------------MUC2X.RData: SONAMDANGHOC, SONUDANGHOC----------------------
# SONAMDANGHOC / SONUDANGHOC: number of male / female household members currently in school
#head(Muc1A)
#-----------Individual ID
Muc1A <- Muc1A %>% mutate(IDind = paste0(ID, "_", m1ama))
head(Muc1A$IDind, 2)
## [1] "96_973_32239_19_13_1" "96_973_32239_19_13_2"
#Muc1A$m1ac2: GIOITINH (sex)
Muc1A.GIOITINH <- Muc1A %>% rename(GIOITINH = m1ac2) %>%
  select(ID, IDind, GIOITINH)
head(Muc1A.GIOITINH, 2)
## ID IDind GIOITINH
## 1 96_973_32239_19_13 96_973_32239_19_13_1 Male
## 2 96_973_32239_19_13 96_973_32239_19_13_2 Female
unique(MUC2X$m2xc4)
## [1] Không <NA> Có Nghỉ hè
## Levels: Có Nghỉ hè Không
#summary(MUC2X$m2xc4)
#levels(MUC2X$m2xc4)
# m2xc4: currently attending school? "Có" (yes) and "Nghỉ hè" (summer break) both count as "Yes"
levels(MUC2X$m2xc4) <- c("Yes", "Yes", "No")
MUC2X$m2xc4[is.na(MUC2X$m2xc4)] <- "No"
#summary(MUC2X$m2xc4)
#-----------Individual ID
#head(MUC2X)
MUC2X <- MUC2X %>% mutate(IDind = paste0(ID, "_", m2xma))
#head(MUC2X$IDind)
#Check that every MUC2X$IDind also appears in Muc1A$IDind
setdiff(MUC2X$IDind, Muc1A$IDind) #DONE
## character(0)
# Members currently in school, with their sex attached
Muc2AB_NAMNU <- MUC2X %>% filter(m2xc4 == "Yes") %>%
  select(ID, IDind, m2xc4)
Muc2AB_NAMNU <- left_join(Muc2AB_NAMNU, Muc1A.GIOITINH)
#head(Muc2AB_NAMNU)
#summary(Muc2AB_NAMNU)
# Count students per household and sex
Muc2AB_NAMNU_Count <- Muc2AB_NAMNU %>% group_by(ID, GIOITINH) %>%
  summarise(nvalue = n())
#head(Muc2AB_NAMNU_Count)
# SONAMDANGHOC: number of male members in school
SONAMDANGHOC <- Muc2AB_NAMNU_Count %>%
  filter(GIOITINH == "Male") %>%
  rename(SONAMDANGHOC = nvalue) %>%
  select(ID, SONAMDANGHOC)
# SONUDANGHOC: number of female members in school
SONUDANGHOC <- Muc2AB_NAMNU_Count %>%
  filter(GIOITINH == "Female") %>%
  rename(SONUDANGHOC = nvalue) %>%
  select(ID, SONUDANGHOC)
GIAODUC18 <- left_join(GIAODUC18, SONAMDANGHOC)
GIAODUC18 <- left_join(GIAODUC18, SONUDANGHOC)
# Households with no students have no match in the join, so set their counts to 0
GIAODUC18$SONAMDANGHOC[is.na(GIAODUC18$SONAMDANGHOC)] <- 0
GIAODUC18$SONUDANGHOC[is.na(GIAODUC18$SONUDANGHOC)] <- 0
# songuoidihoc: total number of members in school
GIAODUC18 <- GIAODUC18 %>%
  mutate(songuoidihoc = SONAMDANGHOC + SONUDANGHOC)
GIAODUC18 <- GIAODUC18 %>% mutate(songuoidihocFactor = as.factor(songuoidihoc))
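The counting steps for SONAMDANGHOC and SONUDANGHOC (count students per household and sex, join the counts onto the household table, replace missing counts with 0) can be sketched on invented data as follows. This assumes dplyr is installed; `students` and `households` are hypothetical stand-ins for Muc2AB_NAMNU and GIAODUC18, and `count()` is used here as a shorthand for `group_by()` followed by `summarise(n())`.

```r
library(dplyr)

# Invented data: household h3 has no member in school.
students <- tibble(
  ID       = c("h1", "h1", "h1", "h2"),
  GIOITINH = c("Male", "Male", "Female", "Female")
)
households <- tibble(ID = c("h1", "h2", "h3"))

# One row per household and sex, with the number of students.
counts <- students %>% count(ID, GIOITINH, name = "nvalue")

boys  <- counts %>% filter(GIOITINH == "Male") %>%
  select(ID, SONAMDANGHOC = nvalue)
girls <- counts %>% filter(GIOITINH == "Female") %>%
  select(ID, SONUDANGHOC = nvalue)

# Join both counts; households without a match get NA, replaced by 0.
households <- households %>%
  left_join(boys,  by = "ID") %>%
  left_join(girls, by = "ID") %>%
  mutate(SONAMDANGHOC = coalesce(SONAMDANGHOC, 0L),
         SONUDANGHOC  = coalesce(SONUDANGHOC, 0L),
         songuoidihoc = SONAMDANGHOC + SONUDANGHOC)
print(households)
```

Passing `by = "ID"` explicitly makes the join key visible and avoids the message dplyr prints when it has to guess the key.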
summary(GIAODUC18)
## ID tinh tentinh thubq
## Length:9166 79 : 351 Length:9166 Min. : 10
## Class :character 01 : 315 Class :character 1st Qu.: 1749
## Mode :character 38 : 246 Mode :character Median : 2911
## 40 : 225 Mean : 3611
## 75 : 207 3rd Qu.: 4408
## 36 : 195 Max. :129884
## (Other):7627
## tongthu GIOITINH_CH TUOI_CH HONNHAN_CH NOISONG
## Min. : 2800 Male :6839 Min. : 19.00 Other :1872 URBAN:2747
## 1st Qu.: 81835 Female:2327 1st Qu.: 42.00 Married:7294 RURAL:6419
## Median : 143762 Median : 52.00
## Mean : 229585 Mean : 52.29
## 3rd Qu.: 241008 3rd Qu.: 61.00
## Max. :73383606 Max. :101.00
##
## DANTOC_CH TSNGUOI NGHENGHIEP_CH
## Kinh :7523 Min. : 1.000 Length:9166
## Minority:1643 1st Qu.: 3.000 Class :character
## Median : 4.000 Mode :character
## Mean : 3.734
## 3rd Qu.: 5.000
## Max. :15.000
##
## BANGCAP_CH SONAMDANGHOC SONUDANGHOC
## No qualification :1997 Min. :0.0000 Min. :0.0000
## Primary school :2316 1st Qu.:0.0000 1st Qu.:0.0000
## Secondary-high school:4127 Median :0.0000 Median :0.0000
## University : 726 Mean :0.4743 Mean :0.4655
## 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :5.0000 Max. :5.0000
##
## songuoidihoc songuoidihocFactor
## Min. :0.0000 0:3904
## 1st Qu.:0.0000 1:2533
## Median :1.0000 2:2205
## Mean :0.9398 3: 438
## 3rd Qu.:2.0000 4: 74
## Max. :6.0000 5: 11
## 6: 1
levels(MUC2X$m2xc9)
## [1] "Có" "Khác"
# TROCAP: "Yes" if at least one household member has m2xc9 == "Có", otherwise "No"
levels(MUC2X$m2xc9) <- c("Yes", "No")
MUC2X_TROCAP <- MUC2X %>% select(ID, m2xc9) %>%
  filter(m2xc9 == "Yes") %>% unique()
MUC2X_TROCAP <- MUC2X_TROCAP %>% rename(TROCAP = m2xc9)
GIAODUC18 <- left_join(GIAODUC18, MUC2X_TROCAP)
GIAODUC18$TROCAP[is.na(GIAODUC18$TROCAP)] <- "No"
summary(GIAODUC18$TROCAP)
## Yes No
## 2885 6281
#------------------------chitieu.RData
load("chitieu.RData")
#head(chitieu)
# CHIGD: household education expenditure; TONGCHIGD: its share of total household expenditure (%)
chitieu <- chitieu %>% mutate(CHIGD = tongchi_gd,
                              TONGCHIGD = tongchi_gd*100/tongchitieu)
GIAODUC18 <- GIAODUC18 %>% rename(THUBQ = thubq)
GIAODUC18 <- GIAODUC18 %>% filter(THUBQ >= 0)
GIAODUC18$tongthu <- NULL
GIAODUC18 <- left_join(GIAODUC18, chitieu %>% select(ID, CHIGD, TONGCHIGD))
summary(GIAODUC18)
## ID tinh tentinh THUBQ
## Length:9166 79 : 351 Length:9166 Min. : 10
## Class :character 01 : 315 Class :character 1st Qu.: 1749
## Mode :character 38 : 246 Mode :character Median : 2911
## 40 : 225 Mean : 3611
## 75 : 207 3rd Qu.: 4408
## 36 : 195 Max. :129884
## (Other):7627
## GIOITINH_CH TUOI_CH HONNHAN_CH NOISONG DANTOC_CH
## Male :6839 Min. : 19.00 Other :1872 URBAN:2747 Kinh :7523
## Female:2327 1st Qu.: 42.00 Married:7294 RURAL:6419 Minority:1643
## Median : 52.00
## Mean : 52.29
## 3rd Qu.: 61.00
## Max. :101.00
##
## TSNGUOI NGHENGHIEP_CH BANGCAP_CH
## Min. : 1.000 Length:9166 No qualification :1997
## 1st Qu.: 3.000 Class :character Primary school :2316
## Median : 4.000 Mode :character Secondary-high school:4127
## Mean : 3.734 University : 726
## 3rd Qu.: 5.000
## Max. :15.000
##
## SONAMDANGHOC SONUDANGHOC songuoidihoc songuoidihocFactor
## Min. :0.0000 Min. :0.0000 Min. :0.0000 0:3904
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1:2533
## Median :0.0000 Median :0.0000 Median :1.0000 2:2205
## Mean :0.4743 Mean :0.4655 Mean :0.9398 3: 438
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:2.0000 4: 74
## Max. :5.0000 Max. :5.0000 Max. :6.0000 5: 11
## 6: 1
## TROCAP CHIGD TONGCHIGD
## Yes:2885 Min. : 0 Min. : 0.000
## No :6281 1st Qu.: 0 1st Qu.: 0.000
## Median : 1095 Median : 1.326
## Mean : 5647 Mean : 4.121
## 3rd Qu.: 5124 3rd Qu.: 5.228
## Max. :353480 Max. :83.219
##
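TONGCHIGD is the percentage of total household expenditure that goes to education. As a worked check, using the first row of `chitieu` printed earlier (tongchi_gd = 6930, tongchitieu = 178762.5):

```r
tongchi_gd  <- 6930
tongchitieu <- 178762.5

# Share of education in total expenditure, in percent.
TONGCHIGD <- tongchi_gd * 100 / tongchitieu
print(round(TONGCHIGD, 2))   # about 3.88 percent

# A household with zero total expenditure would give 0/0 = NaN,
# so such rows would need to be checked before modelling.
zero_case <- 0 * 100 / 0
print(zero_case)
```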
#==================Back to MUC2X.RData
setwd("D:/Tap huan VIASM/Chuoi Seminar/Huong Trinh/15Sept")
library(tableone)
library(reshape2)
library(gridExtra)
#---- Breakdown of education expenditure by individual and by level of schooling -------
load("MUC2X.RData")
#MUC2X$m2xc4: currently attending school?
head(MUC2X)
## tinh huyen xa diaban hoso m2xma m2xc1 m2xc2a m2xc2b
## 1 96 973 32239 19 13 1 12 THPT Không
## 2 96 973 32239 19 13 2 5 K0 b<U+1EB1>ng c<U+1EA5>p Không
## 3 96 973 32239 19 13 3 5 Ti<U+1EC3>u h<U+1ECD>c Không
## 4 96 973 32239 19 14 1 5 K0 b<U+1EB1>ng c<U+1EA5>p Không
## 5 96 973 32239 19 14 2 11 THCS Không
## 6 96 973 32239 19 14 3 8 Ti<U+1EC3>u h<U+1ECD>c Không
## m2xc3 m2xc4 m2xc5 m2xc6 m2xc7 m2xc8 m2xma1 m2xc9 m2xc10a m2xc10b
## 1 Công l<U+1EAD>p Không Không <NA> NA <NA> 1 <NA> NA NA
## 2 Tu th<U+1EE5>c Không Không <NA> NA <NA> 2 <NA> NA NA
## 3 Công l<U+1EAD>p Không Không <NA> NA <NA> 3 <NA> NA NA
## 4 Tu th<U+1EE5>c Không Không <NA> NA <NA> 1 <NA> NA NA
## 5 Công l<U+1EAD>p Không Không <NA> NA <NA> 2 <NA> NA NA
## 6 Công l<U+1EAD>p Không Không <NA> NA <NA> 3 <NA> NA NA
## m2xc10a1 m2xc10a2 m2xc11a m2xc11b m2xc11c m2xc11d m2xc11e m2xc11f m2xc11g
## 1 NA NA NA NA NA NA NA NA NA
## 2 NA NA NA NA NA NA NA NA NA
## 3 NA NA NA NA NA NA NA NA NA
## 4 NA NA NA NA NA NA NA NA NA
## 5 NA NA NA NA NA NA NA NA NA
## 6 NA NA NA NA NA NA NA NA NA
## m2xc11h m2xc11i m2xc11k m2xma2 m2xc12 m2xc13 m2xc14 m2xc15 m2xc16 m2xc17
## 1 NA NA NA 1 <NA> NA NA NA NA 0
## 2 NA NA NA 2 <NA> NA NA NA NA 0
## 3 NA NA NA 3 <NA> NA NA NA NA 0
## 4 NA NA NA 1 <NA> NA NA NA NA 0
## 5 NA NA NA 2 <NA> NA NA NA NA 0
## 6 NA NA NA 3 <NA> NA NA NA NA 0
## m2xc18a m2xc18b m2xc19 tentinh ID
## 1 <NA> <NA> NA Ca Mau 96_973_32239_19_13
## 2 <NA> <NA> NA Ca Mau 96_973_32239_19_13
## 3 <NA> <NA> NA Ca Mau 96_973_32239_19_13
## 4 <NA> <NA> 0 Ca Mau 96_973_32239_19_14
## 5 <NA> <NA> NA Ca Mau 96_973_32239_19_14
## 6 <NA> <NA> NA Ca Mau 96_973_32239_19_14
unique(MUC2X$m2xc4)
## [1] Không <NA> Có Nghỉ hè
## Levels: Có Nghỉ hè Không
summary(MUC2X$m2xc4)
## Có Nghỉ hè Không NA's
## 6822 1795 22352 3268
levels(MUC2X$m2xc4)
## [1] "Có" "Nghỉ hè" "Không"
levels(MUC2X$m2xc4) <- c("Yes", "Yes", "No")
MUC2X$m2xc4[is.na(MUC2X$m2xc4)] <- "No"
summary(MUC2X$m2xc4)
## Yes No
## 8617 25620
#-----------Individual ID
#head(MUC2X)
MUC2X <- MUC2X %>% mutate(IDind = paste0(ID, "_", m2xma))
#head(MUC2X$IDind)
names(MUC2X)
## [1] "tinh" "huyen" "xa" "diaban" "hoso" "m2xma"
## [7] "m2xc1" "m2xc2a" "m2xc2b" "m2xc3" "m2xc4" "m2xc5"
## [13] "m2xc6" "m2xc7" "m2xc8" "m2xma1" "m2xc9" "m2xc10a"
## [19] "m2xc10b" "m2xc10a1" "m2xc10a2" "m2xc11a" "m2xc11b" "m2xc11c"
## [25] "m2xc11d" "m2xc11e" "m2xc11f" "m2xc11g" "m2xc11h" "m2xc11i"
## [31] "m2xc11k" "m2xma2" "m2xc12" "m2xc13" "m2xc14" "m2xc15"
## [37] "m2xc16" "m2xc17" "m2xc18a" "m2xc18b" "m2xc19" "tentinh"
## [43] "ID" "IDind"
CCCHIGD <- MUC2X %>% filter(m2xc4 == "Yes") %>%
  select(ID, IDind, m2xc1, m2xc5, m2xc6, m2xc7, m2xc8)
summary(CCCHIGD)
## ID IDind m2xc1 m2xc5
## Length:8617 Length:8617 Length:8617 Có : 0
## Class :character Class :character Class :character Không: 0
## Mode :character Mode :character Mode :character NA's :8617
##
##
##
##
## m2xc6 m2xc7 m2xc8
## Tiểu học:2822 Min. : 1.000 Công lập:8224
## THCS :2134 1st Qu.: 3.000 Dân lập : 165
## Nhà trẻ, MG :1556 Median : 6.000 Tư thục : 200
## THPT :1171 Mean : 6.056 Khác : 28
## Đại học : 684 3rd Qu.: 9.000 NR : 0
## Cao đẳng : 113 Max. :12.000
## (Other) : 137 NA's :2490
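# Education expenditure items for each student
MUC2X <- MUC2X %>% mutate(HOCPHI = m2xc11a,             # tuition fees
                          TRAITUYEN = m2xc11b,          # out-of-catchment enrolment fees
                          DONGGOP = m2xc11c + m2xc11d,  # contributions
                          QUANAO = m2xc11e,             # uniforms and clothing
                          SGK = m2xc11f,                # textbooks
                          DUNGCU = m2xc11g,             # school supplies
                          HOCTHEM = m2xc11h,            # extra classes
                          CHIGDKHAC = m2xc11i,          # other education expenses
                          CHIALL = m2xc11k)             # total
#ctgd: education expenditure (chi tieu giao duc)
MUC2X.ctgd <- MUC2X %>% filter(IDind %in% CCCHIGD$IDind) %>%
  select(ID, IDind, tinh, tentinh, HOCPHI, TRAITUYEN, DONGGOP, QUANAO,
         SGK, DUNGCU, HOCTHEM, CHIGDKHAC, CHIALL, m2xc9) %>% unique()
summary(MUC2X.ctgd)
## ID IDind tinh tentinh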
## Length:8617 Length:8617 01 : 314 Length:8617
## Class :character Class :character 79 : 292 Class :character
## Mode :character Mode :character 40 : 251 Mode :character
## 38 : 238
## 66 : 189
## 75 : 186
## (Other):7147
## HOCPHI TRAITUYEN DONGGOP QUANAO
## Min. : -2 Min. : -2.0 Min. : -4.0 Min. : -2.0
## 1st Qu.: 0 1st Qu.: 0.0 1st Qu.: 100.0 1st Qu.: 0.0
## Median : 225 Median : 0.0 Median : 250.0 Median : 200.0
## Mean : 1848 Mean : 31.8 Mean : 426.6 Mean : 248.9
## 3rd Qu.: 675 3rd Qu.: 0.0 3rd Qu.: 500.0 3rd Qu.: 350.0
## Max. :350000 Max. :250000.0 Max. :69000.0 Max. :5000.0
##
## SGK DUNGCU HOCTHEM CHIGDKHAC
## Min. : -2.0 Min. : -2.0 Min. : -2 Min. : -2
## 1st Qu.: 0.0 1st Qu.: 100.0 1st Qu.: 0 1st Qu.: 0
## Median : 200.0 Median : 200.0 Median : 0 Median : 100
## Mean : 261.9 Mean : 280.1 Mean : 1074 Mean : 958
## 3rd Qu.: 350.0 3rd Qu.: 360.0 3rd Qu.: 952 3rd Qu.: 300
## Max. :6000.0 Max. :22000.0 Max. :48000 Max. :137350
##
## CHIALL m2xc9
## Min. : 0 Có :3892
## 1st Qu.: 1069 Khác:4725
## Median : 2230
## Mean : 5366
## 3rd Qu.: 5106
## Max. :353480
##
## ID IDind tinh tentinh
## Length:142 Length:142 02 :36 Length:142
## Class :character Class :character 12 :23 Class :character
## Mode :character Mode :character 62 :12 Mode :character
## 06 : 8
## 10 : 6
## 15 : 5
## (Other):52
## HOCPHI TRAITUYEN DONGGOP QUANAO SGK DUNGCU
## Min. :0 Min. :0 Min. :0 Min. :0 Min. :0 Min. :0
## 1st Qu.:0 1st Qu.:0 1st Qu.:0 1st Qu.:0 1st Qu.:0 1st Qu.:0
## Median :0 Median :0 Median :0 Median :0 Median :0 Median :0
## Mean :0 Mean :0 Mean :0 Mean :0 Mean :0 Mean :0
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0
## Max. :0 Max. :0 Max. :0 Max. :0 Max. :0 Max. :0
##
## HOCTHEM CHIGDKHAC CHIALL m2xc9
## Min. :0 Min. :0 Min. :0 Có :127
## 1st Qu.:0 1st Qu.:0 1st Qu.:0 Khác: 15
## Median :0 Median :0 Median :0
## Mean :0 Mean :0 Mean :0
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0
## Max. :0 Max. :0 Max. :0
##
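The summary of `MUC2X.ctgd` above shows minimum values of -2 and -4 in the expenditure items, which cannot be real amounts. Before filtering such records out, it can help to count how many are affected. A sketch in base R on hypothetical toy data (the column names match the text, the values are invented):

```r
toy <- data.frame(HOCPHI = c(100, -2, 0, 50),
                  SGK    = c(20, 10, -2, 30),
                  DUNGCU = c(5, 5, 5, NA))
items <- c("HOCPHI", "SGK", "DUNGCU")

# TRUE for rows where at least one item is negative (NA counts as "not negative").
has_negative <- rowSums(toy[items] < 0, na.rm = TRUE) > 0
sum(has_negative)  # number of records with an invalid amount

# Keep only rows without negative amounts. Unlike a chain of `>= 0` conditions
# in dplyr::filter(), this keeps the row whose only problem is a missing value.
clean <- toy[!has_negative, ]
nrow(clean)
```

Whether rows with missing amounts should be kept or dropped is a separate decision; `filter()` with `>= 0` conditions drops them silently.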
# Drop records with a negative amount in any expenditure item
MUC2X.ctgd <- MUC2X.ctgd %>%
  filter(HOCPHI >= 0 & TRAITUYEN >= 0 & DONGGOP >= 0 & QUANAO >= 0 &
           SGK >= 0 & DUNGCU >= 0 & HOCTHEM >= 0 & CHIGDKHAC >= 0)
# ID and IDind are the only columns the two tables share, so name them explicitly
CCCHIGD <- left_join(CCCHIGD, MUC2X.ctgd, by = c("ID", "IDind"))
CCCHIGD <- CCCHIGD %>% rename(TROCAP = m2xc9)
levels(CCCHIGD$TROCAP) <- c("Yes", "No")
CCCHIGD <- CCCHIGD %>% filter(CHIALL > 0) #906
levels(CCCHIGD$m2xc6) <- c("Nursery School", "Primary School",
                           "Secondary school", "Secondary school", "Vocational school",
                           "Vocational school", "Vocational school", "Vocational school",
                           "University", "University", "University", "University", "Others")
summary(CCCHIGD)
##       ID               IDind              m2xc1             m2xc5
##  Length:7801        Length:7801        Length:7801        Có   :   0
##  Class :character   Class :character   Class :character   Không:   0
##  Mode  :character   Mode  :character   Mode  :character   NA's :7801
##
##
##
##
##                m2xc6          m2xc7            m2xc8
##  Nursery School   :1338   Min.   : 1.000   Công lập:7435
##  Primary School   :2584   1st Qu.: 3.000   Dân lập : 156
##  Secondary school :3037   Median : 6.000   Tư thục : 184
##  Vocational school:  84   Mean   : 6.065   Khác    :  26
##  University       : 742   3rd Qu.: 9.000   NR      :   0
##  Others           :  16   Max.   :12.000
##                           NA's   :2180
##       tinh        tentinh              HOCPHI         TRAITUYEN
##  01     : 298   Length:7801        Min.   :     0   Min.   :     0.00
##  79     : 261   Class :character   1st Qu.:     0   1st Qu.:     0.00
##  40     : 250   Mode  :character   Median :   270   Median :     0.00
##  38     : 233                      Mean   :  1895   Mean   :    35.15
##  66     : 183                      3rd Qu.:   705   3rd Qu.:     0.00
##  31     : 179                      Max.   :350000   Max.   :250000.00
##  (Other):6397
##     DONGGOP            QUANAO            SGK             DUNGCU
##  Min.   :    0.0   Min.   :   0.0   Min.   :   0.0   Min.   :    0.0
##  1st Qu.:  100.0   1st Qu.:  80.0   1st Qu.:  85.0   1st Qu.:  120.0
##  Median :  300.0   Median : 200.0   Median : 210.0   Median :  230.0
##  Mean   :  462.6   Mean   : 263.9   Mean   : 281.2   Mean   :  300.7
##  3rd Qu.:  550.0   3rd Qu.: 360.0   3rd Qu.: 370.0   3rd Qu.:  380.0
##  Max.   :69000.0   Max.   :5000.0   Max.   :6000.0   Max.   :22000.0
##
##     HOCTHEM        CHIGDKHAC          CHIALL       TROCAP
##  Min.   :    0   Min.   :     0   Min.   :    20   Yes:3480
##  1st Qu.:    0   1st Qu.:     0   1st Qu.:  1080   No :4321
##  Median :    0   Median :   100   Median :  2290
##  Mean   : 1141   Mean   :  1017   Mean   :  5396
##  3rd Qu.: 1052   3rd Qu.:   360   3rd Qu.:  5190
##  Max.   :48000   Max.   :137350   Max.   :353480
##
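The drop from 8617 to 7801 records comes from the join followed by the `CHIALL > 0` filter: persons whose expenditure record was removed get `NA` after the left join, and `filter()` discards `NA` along with zeros. When `left_join()` is called without `by`, dplyr joins on every column name the two tables share, so stating the keys explicitly and checking the row count is a cheap safeguard. A sketch with hypothetical toy tables `persons` and `spending`:

```r
library(dplyr)

persons  <- data.frame(ID = c("h1", "h1", "h2"),
                       IDind = c("h1_1", "h1_2", "h2_1"))
spending <- data.frame(ID = c("h1", "h2"),
                       IDind = c("h1_1", "h2_1"),
                       CHIALL = c(1200, 0))

joined <- left_join(persons, spending, by = c("ID", "IDind"))

# A left join on a unique key must not change the number of rows.
stopifnot(nrow(joined) == nrow(persons))

# Persons without a matching spending record get NA ...
sum(is.na(joined$CHIALL))
# ... and filter() drops both that NA row and the zero-spending row.
kept <- joined %>% filter(CHIALL > 0)
nrow(kept)
```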
#--------SEC1: FILTER THE RED RIVER DELTA PROVINCES--
#-------------------------------------------------------------------
GIAODUC18 <- GIAODUC18 %>%
  filter(tinh %in% c("01", "26", "27", "22",
                     "30", "31", "33", "34", "35", "36", "37"))
#=========SEC2: DO CLEANING MORE!!!!!!!!!!!!!!!!1
CCCHIGD_HH <- CCCHIGD %>% group_by(ID) %>% summarise(HOCTHEM = sum(HOCTHEM))
CCCHIGD_HH <- CCCHIGD_HH %>% mutate(HOCTHEM = ifelse(HOCTHEM > 0 , "Yes", "No"))
#GIAODUC18
GIAODUC18 <- GIAODUC18[which(GIAODUC18$CHIGD < 60000), ] # delete some outliers
summary(GIAODUC18$TONGCHIGD)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   0.000   0.000   1.650   4.735   7.014  46.243
## [1] 844
## [1] 1866 19
GIAODUC18 <- left_join(GIAODUC18, CCCHIGD_HH, by = "ID")
# Households with no record in CCCHIGD_HH are treated as having no extra classes
GIAODUC18$HOCTHEM[is.na(GIAODUC18$HOCTHEM)] <- "No"
summary(GIAODUC18)
## ID tinh tentinh THUBQ
## Length:1866 01 :306 Length:1866 Min. : 152
## Class :character 36 :193 Class :character 1st Qu.: 2341
## Mode :character 34 :188 Mode :character Median : 3434
## 31 :184 Mean : 4038
## 30 :180 3rd Qu.: 4873
## 22 :147 Max. :66599
## (Other):668
## GIOITINH_CH TUOI_CH HONNHAN_CH NOISONG DANTOC_CH
## Male :1443 Min. : 19.00 Other : 345 URBAN: 531 Kinh :1841
## Female: 423 1st Qu.: 45.00 Married:1521 RURAL:1335 Minority: 25
## Median : 54.00
## Mean : 54.67
## 3rd Qu.: 63.00
## Max. :101.00
##
## TSNGUOI NGHENGHIEP_CH BANGCAP_CH
## Min. : 1.000 Length:1866 No qualification : 156
## 1st Qu.: 2.000 Class :character Primary school : 307
## Median : 3.000 Mode :character Secondary-high school:1240
## Mean : 3.531 University : 163
## 3rd Qu.: 5.000
## Max. :11.000
##
## SONAMDANGHOC SONUDANGHOC songuoidihoc songuoidihocFactor
## Min. :0.0000 Min. :0.0000 Min. :0.0000 0:864
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1:421
## Median :0.0000 Median :0.0000 Median :1.0000 2:480
## Mean :0.4614 Mean :0.4432 Mean :0.9046 3: 97
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:2.0000 4: 4
## Max. :3.0000 Max. :3.0000 Max. :4.0000 5: 0
## 6: 0
## TROCAP CHIGD TONGCHIGD HOCTHEM
## Yes: 479 Min. : 0 Min. : 0.000 Length:1866
## No :1387 1st Qu.: 0 1st Qu.: 0.000 Class :character
## Median : 1800 Median : 1.650 Mode :character
## Mean : 6372 Mean : 4.735
## 3rd Qu.: 7754 3rd Qu.: 7.014
## Max. :59990 Max. :46.243
##
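The household-level `HOCTHEM` flag shown in the summary above is built in three steps: sum person-level spending within each household, turn the sum into Yes/No, then attach it to the household table and fill the gaps. The same logic traced on a toy example (the tables `persons` and `households` and their values are hypothetical):

```r
library(dplyr)

# Person-level spending on extra classes (toy values).
persons <- data.frame(ID = c("h1", "h1", "h2"),
                      HOCTHEM = c(0, 500, 0))
# Household-level table; "h3" has no person record at all.
households <- data.frame(ID = c("h1", "h2", "h3"))

hh_flag <- persons %>%
  group_by(ID) %>%
  summarise(HOCTHEM = sum(HOCTHEM)) %>%
  mutate(HOCTHEM = ifelse(HOCTHEM > 0, "Yes", "No"))

households <- left_join(households, hh_flag, by = "ID")
households$HOCTHEM[is.na(households$HOCTHEM)] <- "No"
households
```

Note that `sum()` returns `NA` if any member's amount is missing, which would then be recoded to "No"; `sum(HOCTHEM, na.rm = TRUE)` is an option if that is not intended.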
#====================SEC3: DESCRIPTIVE STATISTICS TABLE
factorVars <- c("NOISONG", "DANTOC_CH", "HONNHAN_CH", "GIOITINH_CH",
                "NGHENGHIEP_CH", "songuoidihocFactor",
                "BANGCAP_CH", "HOCTHEM", "TROCAP")
vars <- c("TUOI_CH", "TSNGUOI", "songuoidihoc",
          "NOISONG", "DANTOC_CH", "HONNHAN_CH", "GIOITINH_CH",
          "NGHENGHIEP_CH", "songuoidihocFactor",
          "BANGCAP_CH", "HOCTHEM", "TROCAP")
Des1 <- CreateTableOne(vars = vars,
                       factorVars = factorVars,
                       data = GIAODUC18)
Des1 <- print(Des1, format = "p")
##
## Overall
## n 1866
## TUOI_CH (mean (SD)) 54.67 (13.79)
## TSNGUOI (mean (SD)) 3.53 (1.57)
## songuoidihoc (mean (SD)) 0.90 (0.97)
## NOISONG = RURAL (%) 71.5
## DANTOC_CH = Minority (%) 1.3
## HONNHAN_CH = Married (%) 81.5
## GIOITINH_CH = Female (%) 22.7
## NGHENGHIEP_CH (%)
## KINHDOANHDICHVU 31.8
## LAMCONGANLUONG 40.1
## NONGLAMTHUYSAN 28.0
## songuoidihocFactor (%)
## 0 46.3
## 1 22.6
## 2 25.7
## 3 5.2
## 4 0.2
## BANGCAP_CH (%)
## No qualification 8.4
## Primary school 16.5
## Secondary-high school 66.5
## University 8.7
## HOCTHEM = Yes (%) 35.6
## TROCAP = No (%) 74.3
write.csv(Des1, file = "Des1.csv")
#=====================SEC2: DO CLEANING MORE!!!!!!!!!!!!!!!!1
summary(GIAODUC18$songuoidihocFactor)
##   0   1   2   3   4   5   6
## 864 421 480  97   4   0   0
# Collapse the sparse levels 3 to 6 into a single level "3" (three or more)
levels(GIAODUC18$songuoidihocFactor) <- c("0", "1", "2", "3", "3", "3", "3")
# One descriptive table per province, stored in a list keyed by province name
DesALL <- list()
for (i in unique(GIAODUC18$tentinh)) {
  Tempt <- CreateTableOne(vars = vars,
                          factorVars = factorVars,
                          data = GIAODUC18 %>%
                            filter(tentinh == i))
  DesALL[[i]] <- print(Tempt, format = "p")
}
##
## Overall
## n 147
## TUOI_CH (mean (SD)) 53.00 (13.11)
## TSNGUOI (mean (SD)) 3.59 (1.38)
## songuoidihoc (mean (SD)) 0.85 (0.90)
## NOISONG = RURAL (%) 40.8
## DANTOC_CH = Minority (%) 9.5
## HONNHAN_CH = Married (%) 83.7
## GIOITINH_CH = Female (%) 17.0
## NGHENGHIEP_CH (%)
## KINHDOANHDICHVU 36.7
## LAMCONGANLUONG 40.1
## NONGLAMTHUYSAN 23.1
## songuoidihocFactor (%)
## 0 45.6
## 1 27.2
## 2 23.8
## 3 3.4
## BANGCAP_CH (%)
## No qualification 12.9
## Primary school 18.4
## Secondary-high school 57.8
## University 10.9
## HOCTHEM = Yes (%) 15.6
## TROCAP = No (%) 72.8
##
## Overall
## n 134
## TUOI_CH (mean (SD)) 50.25 (12.74)
## TSNGUOI (mean (SD)) 4.07 (1.58)
## songuoidihoc (mean (SD)) 1.22 (1.02)
## NOISONG = RURAL (%) 73.9
## DANTOC_CH = Minority (%) 5.2
## HONNHAN_CH = Married (%) 84.3
## GIOITINH_CH = Female (%) 25.4
## NGHENGHIEP_CH (%)
## KINHDOANHDICHVU 25.4
## LAMCONGANLUONG 42.5
## NONGLAMTHUYSAN 32.1
## songuoidihocFactor (%)
## 0 32.8
## 1 22.4
## 2 34.3
## 3 10.4
## BANGCAP_CH (%)
## No qualification 7.5
## Primary school 19.4
## Secondary-high school 62.7
## University 10.4
## HOCTHEM = Yes (%) 54.5
## TROCAP = No (%) 56.0
##
## Overall
## n 137
## TUOI_CH (mean (SD)) 51.27 (11.27)
## TSNGUOI (mean (SD)) 4.18 (1.71)
## songuoidihoc (mean (SD)) 1.20 (1.02)
## NOISONG = RURAL (%) 72.3
## DANTOC_CH = Kinh (%) 100.0
## HONNHAN_CH = Married (%) 86.1
## GIOITINH_CH = Female (%) 19.7
## NGHENGHIEP_CH (%)
## KINHDOANHDICHVU 34.3
## LAMCONGANLUONG 42.3
## NONGLAMTHUYSAN 23.4
## songuoidihocFactor (%)
## 0 33.6
## 1 23.4
## 2 32.8
## 3 10.2
## BANGCAP_CH (%)
## No qualification 10.2
## Primary school 24.1
## Secondary-high school 58.4
## University 7.3
## HOCTHEM = Yes (%) 30.7
## TROCAP = No (%) 74.5
##
## Overall
## n 180
## TUOI_CH (mean (SD)) 55.12 (13.70)
## TSNGUOI (mean (SD)) 3.57 (1.58)
## songuoidihoc (mean (SD)) 0.82 (0.95)
## NOISONG = RURAL (%) 78.3
## DANTOC_CH = Kinh (%) 100.0
## HONNHAN_CH = Married (%) 83.3
## GIOITINH_CH = Female (%) 18.9
## NGHENGHIEP_CH (%)
## KINHDOANHDICHVU 26.1
## LAMCONGANLUONG 40.6
## NONGLAMTHUYSAN 33.3
## songuoidihocFactor (%)
## 0 51.1
## 1 20.6
## 2 23.9
## 3 4.4
## BANGCAP_CH (%)
## No qualification 5.6
## Primary school 15.0
## Secondary-high school 75.0
## University 4.4
## HOCTHEM = Yes (%) 35.0
## TROCAP = No (%) 76.7
##
## Overall
## n 184
## TUOI_CH (mean (SD)) 53.11 (14.41)
## TSNGUOI (mean (SD)) 3.51 (1.47)
## songuoidihoc (mean (SD)) 0.95 (0.91)
## NOISONG = RURAL (%) 53.8
## DANTOC_CH = Minority (%) 0.5
## HONNHAN_CH = Married (%) 78.8
## GIOITINH_CH = Female (%) 27.2
## NGHENGHIEP_CH (%)
## KINHDOANHDICHVU 33.7
## LAMCONGANLUONG 46.7
## NONGLAMTHUYSAN 19.6
## songuoidihocFactor (%)
## 0 41.3
## 1 24.5
## 2 32.1
## 3 2.2
## BANGCAP_CH (%)
## No qualification 6.0
## Primary school 19.6
## Secondary-high school 62.0
## University 12.5
## HOCTHEM = Yes (%) 46.7
## TROCAP = No (%) 68.5
##
## Overall
## n 145
## TUOI_CH (mean (SD)) 56.56 (14.34)
## TSNGUOI (mean (SD)) 3.30 (1.57)
## songuoidihoc (mean (SD)) 0.78 (0.93)
## NOISONG = RURAL (%) 88.3
## DANTOC_CH = Kinh (%) 100.0
## HONNHAN_CH = Married (%) 77.2
## GIOITINH_CH = Female (%) 21.4
## NGHENGHIEP_CH (%)
## KINHDOANHDICHVU 26.2
## LAMCONGANLUONG 42.1
## NONGLAMTHUYSAN 31.7
## songuoidihocFactor (%)
## 0 53.1
## 1 18.6
## 2 26.2
## 3 2.1
## BANGCAP_CH (%)
## No qualification 11.0
## Primary school 11.0
## Secondary-high school 71.7
## University 6.2
## HOCTHEM = Yes (%) 31.7
## TROCAP = No (%) 75.9
##
## Overall
## n 188
## TUOI_CH (mean (SD)) 57.19 (14.38)
## TSNGUOI (mean (SD)) 3.03 (1.51)
## songuoidihoc (mean (SD)) 0.69 (0.89)
## NOISONG = RURAL (%) 90.4
## DANTOC_CH = Kinh (%) 100.0
## HONNHAN_CH = Married (%) 80.9
## GIOITINH_CH = Female (%) 18.1
## NGHENGHIEP_CH (%)
## KINHDOANHDICHVU 25.5
## LAMCONGANLUONG 39.9
## NONGLAMTHUYSAN 34.6
## songuoidihocFactor (%)
## 0 56.4
## 1 22.3
## 2 17.6
## 3 3.7
## BANGCAP_CH (%)
## No qualification 4.3
## Primary school 10.6
## Secondary-high school 81.4
## University 3.7
## HOCTHEM = Yes (%) 28.2
## TROCAP = No (%) 79.3
##
## Overall
## n 123
## TUOI_CH (mean (SD)) 56.41 (14.35)
## TSNGUOI (mean (SD)) 3.11 (1.34)
## songuoidihoc (mean (SD)) 0.81 (0.91)
## NOISONG = RURAL (%) 87.8
## DANTOC_CH = Kinh (%) 100.0
## HONNHAN_CH = Married (%) 78.0
## GIOITINH_CH = Female (%) 20.3
## NGHENGHIEP_CH (%)
## KINHDOANHDICHVU 30.1
## LAMCONGANLUONG 35.0
## NONGLAMTHUYSAN 35.0
## songuoidihocFactor (%)
## 0 48.0
## 1 26.8
## 2 21.1
## 3 4.1
## BANGCAP_CH (%)
## No qualification 12.2
## Primary school 13.0
## Secondary-high school 72.4
## University 2.4
## HOCTHEM = Yes (%) 41.5
## TROCAP = No (%) 77.2
##
## Overall
## n 193
## TUOI_CH (mean (SD)) 57.35 (13.10)
## TSNGUOI (mean (SD)) 3.15 (1.53)
## songuoidihoc (mean (SD)) 0.75 (0.98)
## NOISONG = RURAL (%) 81.3
## DANTOC_CH = Kinh (%) 100.0
## HONNHAN_CH = Married (%) 76.2
## GIOITINH_CH = Female (%) 22.8
## NGHENGHIEP_CH (%)
## KINHDOANHDICHVU 24.4
## LAMCONGANLUONG 37.3
## NONGLAMTHUYSAN 38.3
## songuoidihocFactor (%)
## 0 58.0
## 1 15.0
## 2 21.2
## 3 5.7
## BANGCAP_CH (%)
## No qualification 11.4
## Primary school 16.6
## Secondary-high school 67.4
## University 4.7
## HOCTHEM = Yes (%) 30.1
## TROCAP = No (%) 82.4
##
## Overall
## n 129
## TUOI_CH (mean (SD)) 51.15 (15.32)
## TSNGUOI (mean (SD)) 3.41 (1.36)
## songuoidihoc (mean (SD)) 1.00 (0.97)
## NOISONG = RURAL (%) 79.1
## DANTOC_CH = Minority (%) 1.6
## HONNHAN_CH = Married (%) 88.4
## GIOITINH_CH = Female (%) 19.4
## NGHENGHIEP_CH (%)
## KINHDOANHDICHVU 19.4
## LAMCONGANLUONG 50.4
## NONGLAMTHUYSAN 30.2
## songuoidihocFactor (%)
## 0 39.5
## 1 27.9
## 2 25.6
## 3 7.0
## BANGCAP_CH (%)
## No qualification 7.0
## Primary school 17.1
## Secondary-high school 70.5
## University 5.4
## HOCTHEM = Yes (%) 43.4
## TROCAP = No (%) 65.9
##
## Overall
## n 306
## TUOI_CH (mean (SD)) 56.25 (13.05)
## TSNGUOI (mean (SD)) 3.84 (1.63)
## songuoidihoc (mean (SD)) 0.97 (1.01)
## NOISONG = RURAL (%) 56.2
## DANTOC_CH = Minority (%) 0.3
## HONNHAN_CH = Married (%) 82.0
## GIOITINH_CH = Female (%) 30.7
## NGHENGHIEP_CH (%)
## KINHDOANHDICHVU 50.7
## LAMCONGANLUONG 32.7
## NONGLAMTHUYSAN 16.7
## songuoidihocFactor (%)
## 0 43.8
## 1 22.9
## 2 26.5
## 3 6.9
## BANGCAP_CH (%)
## No qualification 7.2
## Primary school 17.0
## Secondary-high school 57.2
## University 18.6
## HOCTHEM = Yes (%) 37.3
## TROCAP = No (%) 78.8
# Combine the per-province tables into one data frame, one column per province
DesALPrint <- data.frame(DesALL[["Ha Noi"]])
names(DesALPrint) <- "Ha Noi"
for (i in c("Quang Ninh", "Vinh Phuc", "Bac Ninh",
            "Hai Duong", "Hai Phong", "Hung Yen",
            "Thai Binh", "Ha Nam", "Nam Dinh", "Ninh Binh")) {
  DesALPrint[, i] <- DesALL[[i]]
}
names(DesALPrint) <- c("Ha Noi", "Quang Ninh", "Vinh Phuc", "Bac Ninh",
                       "Hai Duong", "Hai Phong", "Hung Yen",
                       "Thai Binh", "Ha Nam", "Nam Dinh", "Ninh Binh")
DesALPrint
## Ha Noi Overall Overall
## n 306 147 134
## TUOI_CH (mean (SD)) 56.25 (13.05) 53.00 (13.11) 50.25 (12.74)
## TSNGUOI (mean (SD)) 3.84 (1.63) 3.59 (1.38) 4.07 (1.58)
## songuoidihoc (mean (SD)) 0.97 (1.01) 0.85 (0.90) 1.22 (1.02)
## NOISONG = RURAL (%) 56.2 40.8 73.9
## DANTOC_CH = Minority (%) 0.3 9.5 5.2
## HONNHAN_CH = Married (%) 82.0 83.7 84.3
## GIOITINH_CH = Female (%) 30.7 17.0 25.4
## NGHENGHIEP_CH (%)
## KINHDOANHDICHVU 50.7 36.7 25.4
## LAMCONGANLUONG 32.7 40.1 42.5
## NONGLAMTHUYSAN 16.7 23.1 32.1
## songuoidihocFactor (%)
## 0 43.8 45.6 32.8
## 1 22.9 27.2 22.4
## 2 26.5 23.8 34.3
## 3 6.9 3.4 10.4
## BANGCAP_CH (%)
## No qualification 7.2 12.9 7.5
## Primary school 17.0 18.4 19.4
## Secondary-high school 57.2 57.8 62.7
## University 18.6 10.9 10.4
## HOCTHEM = Yes (%) 37.3 15.6 54.5
## TROCAP = No (%) 78.8 72.8 56.0
## Overall Overall Overall
## n 137 180 184
## TUOI_CH (mean (SD)) 51.27 (11.27) 55.12 (13.70) 53.11 (14.41)
## TSNGUOI (mean (SD)) 4.18 (1.71) 3.57 (1.58) 3.51 (1.47)
## songuoidihoc (mean (SD)) 1.20 (1.02) 0.82 (0.95) 0.95 (0.91)
## NOISONG = RURAL (%) 72.3 78.3 53.8
## DANTOC_CH = Minority (%) 100.0 100.0 0.5
## HONNHAN_CH = Married (%) 86.1 83.3 78.8
## GIOITINH_CH = Female (%) 19.7 18.9 27.2
## NGHENGHIEP_CH (%)
## KINHDOANHDICHVU 34.3 26.1 33.7
## LAMCONGANLUONG 42.3 40.6 46.7
## NONGLAMTHUYSAN 23.4 33.3 19.6
## songuoidihocFactor (%)
## 0 33.6 51.1 41.3
## 1 23.4 20.6 24.5
## 2 32.8 23.9 32.1
## 3 10.2 4.4 2.2
## BANGCAP_CH (%)
## No qualification 10.2 5.6 6.0
## Primary school 24.1 15.0 19.6
## Secondary-high school 58.4 75.0 62.0
## University 7.3 4.4 12.5
## HOCTHEM = Yes (%) 30.7 35.0 46.7
## TROCAP = No (%) 74.5 76.7 68.5
## Overall Overall Overall
## n 145 188 123
## TUOI_CH (mean (SD)) 56.56 (14.34) 57.19 (14.38) 56.41 (14.35)
## TSNGUOI (mean (SD)) 3.30 (1.57) 3.03 (1.51) 3.11 (1.34)
## songuoidihoc (mean (SD)) 0.78 (0.93) 0.69 (0.89) 0.81 (0.91)
## NOISONG = RURAL (%) 88.3 90.4 87.8
## DANTOC_CH = Minority (%) 100.0 100.0 100.0
## HONNHAN_CH = Married (%) 77.2 80.9 78.0
## GIOITINH_CH = Female (%) 21.4 18.1 20.3
## NGHENGHIEP_CH (%)
## KINHDOANHDICHVU 26.2 25.5 30.1
## LAMCONGANLUONG 42.1 39.9 35.0
## NONGLAMTHUYSAN 31.7 34.6 35.0
## songuoidihocFactor (%)
## 0 53.1 56.4 48.0
## 1 18.6 22.3 26.8
## 2 26.2 17.6 21.1
## 3 2.1 3.7 4.1
## BANGCAP_CH (%)
## No qualification 11.0 4.3 12.2
## Primary school 11.0 10.6 13.0
## Secondary-high school 71.7 81.4 72.4
## University 6.2 3.7 2.4
## HOCTHEM = Yes (%) 31.7 28.2 41.5
## TROCAP = No (%) 75.9 79.3 77.2
## Overall Overall
## n 193 129
## TUOI_CH (mean (SD)) 57.35 (13.10) 51.15 (15.32)
## TSNGUOI (mean (SD)) 3.15 (1.53) 3.41 (1.36)
## songuoidihoc (mean (SD)) 0.75 (0.98) 1.00 (0.97)
## NOISONG = RURAL (%) 81.3 79.1
## DANTOC_CH = Minority (%) 100.0 1.6
## HONNHAN_CH = Married (%) 76.2 88.4
## GIOITINH_CH = Female (%) 22.8 19.4
## NGHENGHIEP_CH (%)
## KINHDOANHDICHVU 24.4 19.4
## LAMCONGANLUONG 37.3 50.4
## NONGLAMTHUYSAN 38.3 30.2
## songuoidihocFactor (%)
## 0 58.0 39.5
## 1 15.0 27.9
## 2 21.2 25.6
## 3 5.7 7.0
## BANGCAP_CH (%)
## No qualification 11.4 7.0
## Primary school 16.6 17.1
## Secondary-high school 67.4 70.5
## University 4.7 5.4
## HOCTHEM = Yes (%) 30.1 43.4
## TROCAP = No (%) 82.4 65.9
write.csv(DesALPrint, file = "DesALPrint.csv")
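One caveat when reading the combined table: its row labels come from the Ha Noi table, but for provinces with no minority households the per-province output reports `DANTOC_CH = Kinh (%) 100.0` rather than `DANTOC_CH = Minority (%)`. The 100.0 entries in the `DANTOC_CH = Minority (%)` row are therefore Kinh shares, not minority shares. A cross-tabulation that keeps both factor levels avoids the mismatch. A base R sketch on hypothetical toy data:

```r
toy <- data.frame(
  tentinh   = c("Ha Noi", "Ha Noi", "Ha Noi", "Ha Nam", "Ha Nam"),
  DANTOC_CH = factor(c("Kinh", "Kinh", "Minority", "Kinh", "Kinh"),
                     levels = c("Kinh", "Minority"))
)

# Row percentages by province; the empty level is kept and shows up as 0.
share <- round(100 * prop.table(table(toy$tentinh, toy$DANTOC_CH), margin = 1), 1)
share
```

`CreateTableOne()` in tableone also accepts a `strata` argument, which may be a simpler way to get one aligned table with a column per province.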
#========================SEC4. Education expenditure as a share of total expenditure
# Title: "Percentage (%) of expenditure on education"; y label: "Percent"
p1 <- ggplot(GIAODUC18, aes(y = TONGCHIGD, x = tentinh)) +
  geom_boxplot() + ylim(c(0, 20)) +
  ylab("Phantram") +
  stat_summary(fun = mean, colour = "darkred", geom = "point",
               shape = 18, size = 3, show.legend = FALSE) +
  ggtitle("Phan tram (%) chi tieu cho giao duc") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
p1means <- aggregate(CHIGD ~ tentinh, GIAODUC18, mean)
# Title: "Expenditure on education (thousand dong)"
p2 <- ggplot(GIAODUC18, aes(y = CHIGD, x = tentinh)) +
  geom_boxplot() + ylim(c(0, 36000)) + ylab("nghin dong") +
  stat_summary(fun = mean, colour = "darkred", geom = "point",
               shape = 18, size = 3, show.legend = FALSE) +
  ggtitle("Chi tieu cho giao duc (Nghin dong)") +
  theme(axis.text.x = element_text(angle = 90,
                                   vjust = 0.5, hjust = 1))
p2
#-----save to folder
op <- par(mfrow = c(1, 1))
png(filename = "Hinh1a.png", width = 800, height = 480)
print(p1) # explicit print so the plot is also drawn when the script is source()d
par(op)
dev.off()
## png
##   2
op <- par(mfrow = c(1, 1))
png(filename = "Hinh1b.png", width = 800, height = 480)
print(p2)
par(op)
dev.off()
## png
##   2
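Note that `ylim()` sets scale limits, so observations outside the range are removed before the boxplot statistics and the mean are computed. If the intent is only to zoom in on the lower part of the distribution, `coord_cartesian()` keeps all observations. A sketch on hypothetical toy data, assuming ggplot2 3.3.0 or later (for the `fun` argument of `stat_summary()`):

```r
library(ggplot2)

toy <- data.frame(g = "a", y = c(1, 2, 3, 100))

# ylim(): the value 100 is dropped, so the plotted mean is 2.
p_ylim <- ggplot(toy, aes(x = g, y = y)) +
  stat_summary(fun = mean, geom = "point") +
  ylim(c(0, 10))

# coord_cartesian(): only the view is clipped, the mean stays 26.5.
p_zoom <- ggplot(toy, aes(x = g, y = y)) +
  stat_summary(fun = mean, geom = "point") +
  coord_cartesian(ylim = c(0, 10))

# layer_data() returns the values a layer actually draws.
mean_ylim <- suppressWarnings(layer_data(p_ylim)$y)
mean_zoom <- layer_data(p_zoom)$y
c(mean_ylim, mean_zoom)
```

Which behaviour is wanted depends on the analysis; the point is only that the two are not interchangeable.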
#---------------SEC4: Work with CCCHIGD
#------------------4.1. FILTER THE RED RIVER DELTA PROVINCES
CCCHIGD <- CCCHIGD %>%
  filter(tinh %in% c("01", "26", "27", "22", "30", "31",
                     "33", "34", "35", "36", "37"))
dim(CCCHIGD)
## [1] 1635   19
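The eleven codes in the filter correspond to the Red River Delta provinces named elsewhere in the script. Keeping the codes as zero-padded character strings matters, since a numeric `1` would not match `"01"`. A sketch with a named lookup vector; `rrd` and `toy` are hypothetical names, and the code-to-name mapping follows the GSO administrative codes and is stated here as an assumption:

```r
# Province codes (zero-padded strings) mapped to province names (assumed mapping).
rrd <- c("01" = "Ha Noi",   "22" = "Quang Ninh", "26" = "Vinh Phuc",
         "27" = "Bac Ninh", "30" = "Hai Duong",  "31" = "Hai Phong",
         "33" = "Hung Yen", "34" = "Thai Binh",  "35" = "Ha Nam",
         "36" = "Nam Dinh", "37" = "Ninh Binh")

toy <- data.frame(tinh = c(1, 79, 31))              # codes read in as numbers
toy$tinh <- sprintf("%02d", as.integer(toy$tinh))   # restore the leading zero

in_rrd <- toy[toy$tinh %in% names(rrd), , drop = FALSE]
in_rrd$tentinh <- unname(rrd[in_rrd$tinh])
in_rrd
```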
#CCCHIGD %>% filter(CHIALL >= 40000)
CCCHIGD <- CCCHIGD %>% filter(CHIALL < 40000) # delete 13 observations
CCCHIGD$m2xc6 <- as.character(CCCHIGD$m2xc6)
# Title: "Cost of one school year by education level and province"
p3 <- ggplot(CCCHIGD, aes(x = tentinh, y = CHIALL, fill = m2xc6)) +
  geom_boxplot(outlier.shape = NA) + ylim(0, 35000) +
  ggtitle("Chi phi 1 nam hoc theo cap hoc va theo tinh") +
  theme(axis.text.x = element_text(angle = 90,
                                   vjust = 0.5, hjust = 1))
p3
op <- par(mfrow = c(1, 1))
png(filename = "Hinh3.png", width = 800, height = 480)
print(p3)
par(op)
dev.off()
## png
##   2
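As an alternative to opening and closing a `png()` device by hand, `ggsave()` writes a ggplot object to a file in one call. A sketch, assuming ggplot2 is installed; the toy plot and the temporary file path are placeholders, not part of the original script:

```r
library(ggplot2)

toy <- data.frame(g = c("a", "a", "b", "b"), y = c(1, 2, 3, 5))
p <- ggplot(toy, aes(x = g, y = y)) + geom_boxplot()

# 8 x 4.8 inches at 100 dpi corresponds to an 800 x 480 pixel image.
out <- file.path(tempdir(), "Hinh_demo.png")
ggsave(filename = out, plot = p, width = 8, height = 4.8, dpi = 100)
file.exists(out)
```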
Next step: continue processing expenditure by individual item.
See more here: https://rpubs.com/Lucie/746058
WE CORDIALLY INVITE ALL DELEGATES TO ATTEND, AND THANK YOU VERY MUCH!