Import the data with “File / Import Dataset / Excel”, or read the CSV file directly:

dat1 <- read.csv("E:/CONG VIEC/Ky nang ngoai/Xu ly so lieu Bang R/Van Lang R and Machine Learning 2023/Thuc hanh ngay 1/Salaries.csv", header = TRUE, na.strings = "NA")
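
To see what `header` and `na.strings` do without depending on the file path above, here is a minimal sketch that writes a tiny temporary CSV and reads it back. The file contents and the name `csv_demo` are made up for illustration; they are not part of the Salaries data.

```r
# Write a small illustrative CSV to a temporary file (values are invented)
tmp <- tempfile(fileext = ".csv")
writeLines(c("ID,Sex,Salary",
             "1,Male,139750",
             "2,Female,NA",
             "3,NA,79750"), tmp)

# header = TRUE: the first line holds the column names
# na.strings = "NA": cells containing the text NA become missing values
csv_demo <- read.csv(tmp, header = TRUE, na.strings = "NA")

# Count the missing values in each column
na_counts <- colSums(is.na(csv_demo))
print(na_counts)

unlink(tmp)  # remove the temporary file
```

Counting missing values right after import is a cheap way to confirm that the missing-value code in the file matches `na.strings`.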

Check the number of rows and columns (observations and variables), view the first 6 rows, and inspect the variable types and overall structure of the data

dim(dat1)
## [1] 397   7
head(dat1)
##   ID      Rank Discipline Yrs.since.phd Yrs.service  Sex Salary
## 1  1      Prof          B            19          18 Male 139750
## 2  2      Prof          B            20          16 Male 173200
## 3  3  AsstProf          B             4           3 Male  79750
## 4  4      Prof          B            45          39 Male 115000
## 5  5      Prof          B            40          41 Male 141500
## 6  6 AssocProf          B             6           6 Male  97000
str(dat1)
## 'data.frame':    397 obs. of  7 variables:
##  $ ID           : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Rank         : chr  "Prof" "Prof" "AsstProf" "Prof" ...
##  $ Discipline   : chr  "B" "B" "B" "B" ...
##  $ Yrs.since.phd: int  19 20 4 45 40 6 30 45 21 18 ...
##  $ Yrs.service  : int  18 16 3 39 41 6 23 45 20 18 ...
##  $ Sex          : chr  "Male" "Male" "Male" "Male" ...
##  $ Salary       : int  139750 173200 79750 115000 141500 97000 175000 147765 119250 129000 ...

int: integer; chr: character (text/categorical); num: real number (double)
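
Since `Rank`, `Discipline` and `Sex` are read as `chr`, their values are listed alphabetically in summaries (AssocProf, AsstProf, Prof). If a meaningful order is wanted, one option is to convert them to factors with explicit levels. The sketch below uses a small made-up data frame named `fac_demo` as a stand-in for `dat1`; the level order chosen for `Rank` is an assumption about the intended career order.

```r
# Toy data frame standing in for dat1 (values are illustrative only)
fac_demo <- data.frame(
  Rank = c("Prof", "AsstProf", "AssocProf", "Prof"),
  Sex  = c("Male", "Female", "Male", "Male"),
  stringsAsFactors = FALSE
)

# Convert character columns to factors; give Rank an explicit order
fac_demo$Rank <- factor(fac_demo$Rank,
                        levels = c("AsstProf", "AssocProf", "Prof"))
fac_demo$Sex  <- factor(fac_demo$Sex)

str(fac_demo)                       # both columns are now factors
rank_counts <- table(fac_demo$Rank) # counts follow the level order
print(rank_counts)
```

The same two `factor()` calls can be applied to `dat1$Rank` and `dat1$Sex` before building summary tables.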

List the variable names (predictors)

names(dat1)
## [1] "ID"            "Rank"          "Discipline"    "Yrs.since.phd"
## [5] "Yrs.service"   "Sex"           "Salary"

Summarize the data with table1 and compareGroups

Load the two packages “table1” and “compareGroups”

library(table1)
## 
## Attaching package: 'table1'
## The following objects are masked from 'package:base':
## 
##     units, units<-
library(compareGroups)

Summarize the whole dataset with table1

table1(~ Rank + Discipline + Yrs.since.phd + Yrs.service + Sex + Salary, data = dat1)
                     Overall
                     (N=397)
Rank
  AssocProf          64 (16.1%)
  AsstProf           67 (16.9%)
  Prof               266 (67.0%)
Discipline
  A                  181 (45.6%)
  B                  216 (54.4%)
Yrs.since.phd
  Mean (SD)          22.3 (12.9)
  Median [Min, Max]  21.0 [1.00, 56.0]
Yrs.service
  Mean (SD)          17.6 (13.0)
  Median [Min, Max]  16.0 [0, 60.0]
Sex
  Female             39 (9.8%)
  Male               358 (90.2%)
Salary
  Mean (SD)          114000 (30300)
  Median [Min, Max]  107000 [57800, 232000]

Summarize the data with table1, stratified by Sex

table1(~ Rank + Discipline + Yrs.since.phd + Yrs.service + Salary | Sex, data = dat1)
                     Female                  Male                    Overall
                     (N=39)                  (N=358)                 (N=397)
Rank
  AssocProf          10 (25.6%)              54 (15.1%)              64 (16.1%)
  AsstProf           11 (28.2%)              56 (15.6%)              67 (16.9%)
  Prof               18 (46.2%)              248 (69.3%)             266 (67.0%)
Discipline
  A                  18 (46.2%)              163 (45.5%)             181 (45.6%)
  B                  21 (53.8%)              195 (54.5%)             216 (54.4%)
Yrs.since.phd
  Mean (SD)          16.5 (9.78)             22.9 (13.0)             22.3 (12.9)
  Median [Min, Max]  17.0 [2.00, 39.0]       22.0 [1.00, 56.0]       21.0 [1.00, 56.0]
Yrs.service
  Mean (SD)          11.6 (8.81)             18.3 (13.2)             17.6 (13.0)
  Median [Min, Max]  10.0 [0, 36.0]          18.0 [0, 60.0]          16.0 [0, 60.0]
Salary
  Mean (SD)          101000 (26000)          115000 (30400)          114000 (30300)
  Median [Min, Max]  104000 [62900, 161000]  108000 [57800, 232000]  107000 [57800, 232000]

Explanation: for continuous variables this table reports the mean (SD) and the median [min, max]. As a rule of thumb, if the SD is well below half the mean, the variable is close to normally distributed and we report the mean and SD. Otherwise, we report the median with percentiles (for example the 25th and 75th). Note: do not report both at once in a paper or table. Use either the mean or the median.
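
Applying this rule of thumb to the summary values above: Salary has SD/mean = 30300/114000 ≈ 0.27 (report mean and SD), while Yrs.service has 13.0/17.6 ≈ 0.74 (report the median). The sketch below wraps that check in a small helper. The function name `summarise_numeric` and the two toy vectors are hypothetical, and the 0.5 cutoff is only the heuristic from these notes, not a formal normality test.

```r
# Hypothetical helper: choose the summary according to the SD < mean/2 heuristic
summarise_numeric <- function(x) {
  x <- x[!is.na(x)]                       # drop missing values
  m <- mean(x)
  s <- sd(x)
  if (s < m / 2) {
    # Roughly symmetric: report mean (SD)
    sprintf("%.1f (%.1f)", m, s)
  } else {
    # Skewed: report median [25th percentile, 75th percentile]
    q <- quantile(x, c(0.25, 0.5, 0.75))
    sprintf("%.1f [%.1f, %.1f]", q[2], q[1], q[3])
  }
}

# Toy vectors (illustrative only)
symmetric_x <- c(10, 11, 12, 13, 14)
skewed_x    <- c(0, 0, 1, 2, 50)

print(summarise_numeric(symmetric_x))
print(summarise_numeric(skewed_x))
```

With the real data the helper could be called as `summarise_numeric(dat1$Salary)` or `summarise_numeric(dat1$Yrs.service)`.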

Summarize the data stratified at two levels: Sex and Discipline

table1(~ Rank + Yrs.since.phd + Yrs.service + Salary | Sex + Discipline, data = dat1)
                     Female                                          Male                                            Overall
                     A                       B                       A                       B                       A                       B
                     (N=18)                  (N=21)                  (N=163)                 (N=195)                 (N=181)                 (N=216)
Rank
  AssocProf          4 (22.2%)               6 (28.6%)               22 (13.5%)              32 (16.4%)              26 (14.4%)              38 (17.6%)
  AsstProf           6 (33.3%)               5 (23.8%)               18 (11.0%)              38 (19.5%)              24 (13.3%)              43 (19.9%)
  Prof               8 (44.4%)               10 (47.6%)              123 (75.5%)             125 (64.1%)             131 (72.4%)             135 (62.5%)
Yrs.since.phd
  Mean (SD)          17.5 (11.9)             15.7 (7.72)             26.3 (13.0)             20.2 (12.5)             25.4 (13.1)             19.7 (12.1)
  Median [Min, Max]  15.0 [2.00, 39.0]       17.0 [3.00, 36.0]       28.0 [2.00, 56.0]       19.0 [1.00, 56.0]       27.0 [2.00, 56.0]       18.5 [1.00, 56.0]
Yrs.service
  Mean (SD)          11.4 (10.5)             11.7 (7.36)             20.9 (13.7)             16.1 (12.4)             20.0 (13.7)             15.7 (12.1)
  Median [Min, Max]  8.00 [0, 36.0]          10.0 [0, 26.0]          19.0 [0, 57.0]          15.0 [0, 60.0]          19.0 [0, 57.0]          14.0 [0, 60.0]
Salary
  Mean (SD)          89100 (21600)           111000 (25400)          111000 (30700)          119000 (29800)          109000 (30500)          118000 (29500)
  Median [Min, Max]  78000 [62900, 137000]   105000 [71100, 161000]  105000 [57800, 206000]  114000 [67600, 232000]  104000 [57800, 206000]  113000 [67600, 232000]

Explanation: the table1 function supports stratification by at most two levels.

Summarize the whole dataset with compareGroups and createTable

# Named tab rather than t, so the base function t() (transpose) is not masked
tab <- compareGroups(Sex ~ Rank + Discipline + Yrs.since.phd + Yrs.service + Salary, data = dat1)
createTable(tab)
## 
## --------Summary descriptives table by 'Sex'---------
## 
## _____________________________________________________ 
##                   Female          Male      p.overall 
##                    N=39          N=358                
## ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ 
## Rank:                                         0.014   
##     AssocProf   10 (25.6%)     54 (15.1%)             
##     AsstProf    11 (28.2%)     56 (15.6%)             
##     Prof        18 (46.2%)    248 (69.3%)             
## Discipline:                                   1.000   
##     A           18 (46.2%)    163 (45.5%)             
##     B           21 (53.8%)    195 (54.5%)             
## Yrs.since.phd  16.5 (9.78)    22.9 (13.0)    <0.001   
## Yrs.service    11.6 (8.81)    18.3 (13.2)    <0.001   
## Salary        101002 (25952) 115090 (30437)   0.003   
## ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

Explanation: this table reports only counts, means and SDs, split by sex, but it adds a p.overall column that indicates whether the groups differ. If p < 0.05, the difference between groups is statistically significant. With more than two groups, however, p.overall alone cannot tell us exactly which group differs from which.
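
When a grouping variable has more than two levels (for example `Rank`), one common follow-up is a set of pairwise comparisons with a multiplicity correction. The sketch below uses base R's `pairwise.t.test` with the Holm adjustment on invented salary values (in thousands). It is one possible approach under these assumptions, not the method used by `compareGroups`, and the names `pw_salary`, `pw_rank` and `pw_res` are hypothetical.

```r
# Invented salaries (in thousands) for three ranks, four people each
pw_salary <- c(70, 72, 74, 76,      # AsstProf
               90, 92, 94, 96,      # AssocProf
               120, 122, 124, 126)  # Prof
pw_rank <- factor(rep(c("AsstProf", "AssocProf", "Prof"), each = 4),
                  levels = c("AsstProf", "AssocProf", "Prof"))

# Overall test: is there any difference among the three groups?
overall <- anova(lm(pw_salary ~ pw_rank))
print(overall)

# Pairwise t tests (pooled SD) with Holm-adjusted p-values:
# each cell compares the row group with the column group
pw_res <- pairwise.t.test(pw_salary, pw_rank, p.adjust.method = "holm")
print(pw_res$p.value)
```

Each non-empty cell of `pw_res$p.value` answers the question that p.overall leaves open: whether that specific pair of groups differs.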