Data manipulation means organizing and rearranging data so that R can analyze it effectively.
setwd("/Volumes/DATA/6. NGHIEN CUU KHOA HOC/R")
chol3=read.csv("chol3.csv", header=TRUE)
attach(chol3)
chol3.new= na.omit(chol3)
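The na.omit command drops every row that contains at least one missing value (NA). A minimal sketch of its effect, assuming a small made-up data frame (toy and its values are hypothetical and do not come from chol3.csv):

```r
# Hypothetical toy data; not taken from chol3.csv
toy <- data.frame(id  = 1:4,
                  age = c(57, NA, 60, 65),
                  tc  = c(4.0, 3.5, NA, 7.7))

# Keep only the rows that have no NA in any column
toy.new <- na.omit(toy)

nrow(toy)      # number of rows before
nrow(toy.new)  # number of rows after dropping incomplete rows
```

Rows 2 and 3 each contain an NA, so only the rows with id 1 and 4 should remain.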
If we only want to analyze men, we can split “chol3” into 2 data.frames, which we will call “nam” (male) and “nu” (female). We use the command “subset(data, cond)”, where data is the data.frame we want to split and cond is the condition. For example:
nam = subset(chol3, sex == "nam")
nu = subset(chol3, sex == "nu")
We now have 2 new data sets (2 data.frames) named nam and nu. Note: use == rather than = to express an exact-match condition (a single = is used for assignment).
Example:
old=subset(chol3, age>60)
dim(old)
## [1] 5 8
Or a new data.frame with the male patients aged 60 and over:
n60=subset(chol3, age>=60 & sex=="nam")
dim(n60)
## [1] 3 8
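To make the role of == and & concrete, here is a small sketch on made-up data. The data frame toy and its values are hypothetical; only the column names mirror chol3. The last line shows bracket indexing, which should select the same rows as subset when there are no missing values.

```r
# Hypothetical toy data; column names mirror chol3 but the values are made up
toy <- data.frame(id  = 1:6,
                  sex = c("nam", "nu", "nu", "nam", "nam", "nu"),
                  age = c(57, 64, 60, 65, 47, 71))

men     <- subset(toy, sex == "nam")               # == tests equality
old.men <- subset(toy, age >= 60 & sex == "nam")   # & requires both conditions
same    <- toy[toy$age >= 60 & toy$sex == "nam", ] # bracket-indexing version

dim(men)
dim(old.men)
```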
chol3 contains 8 variables. To check:
names(chol3)
## [1] "id" "sex" "age" "bmi" "hdl" "ldl" "tc" "tg"
Suppose we want to reduce chol3 to the 3 variables we need: id, age and tc. The variable “id” is column 1, “age” is column 3 and “tc” is column 7:
data2=chol3[, c(1,3,7)]
Leaving the position before the comma empty means that we select all rows of the data.frame chol3.
If we want only the first 5 rows, the command is:
data3=chol3[1:5, c(1,3,7)]
print(data3)
## id age tc
## 1 1 57 4.0
## 2 2 64 3.5
## 3 3 60 4.7
## 4 4 65 7.7
## 5 5 47 5.0
Note that “print(arg)” simply lists all the data in the data.frame arg. In fact we only need to type data3; the result is exactly the same as print(data3):
data3
## id age tc
## 1 1 57 4.0
## 2 2 64 3.5
## 3 3 60 4.7
## 4 4 65 7.7
## 5 5 47 5.0
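Selecting columns by position stops working as intended if the column order of the file changes, so selecting by name is a common alternative. The sketch below assumes a made-up data frame toy with the same column names as chol3 (the values are hypothetical) and checks that both approaches return the same result.

```r
# Hypothetical toy data with the same column names as chol3; values are made up
toy <- data.frame(id  = 1:6,
                  sex = rep(c("nam", "nu"), 3),
                  age = c(57, 64, 60, 65, 47, 71),
                  bmi = c(21, 23, 25, 22, 27, 24),
                  hdl = 1, ldl = 2,
                  tc  = c(4.0, 3.5, 4.7, 7.7, 5.0, 4.2),
                  tg  = 1.5)

by.pos  <- toy[1:5, c(1, 3, 7)]            # select by column position
by.name <- toy[1:5, c("id", "age", "tc")]  # same columns, selected by name

all.equal(by.pos, by.name)                 # both selections should match
```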
setwd("/Volumes/DATA/6. NGHIEN CUU KHOA HOC/R")
d1=read.csv("d1.csv", header=TRUE)
save(d1,file="d1.rda")
d1
## id sex tc
## 1 1 nam 4.0
## 2 2 nu 3.5
## 3 3 nu 4.7
## 4 4 nam 7.7
## 5 5 nam 5.0
## 6 6 nu 4.2
## 7 7 nam 5.9
## 8 8 nam 6.1
## 9 9 nam 5.9
## 10 10 nu 4.0
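A file written with save can be read back with load, which restores the object under its original name. A minimal sketch, assuming a small made-up d1 and a temporary file instead of the working directory used above:

```r
# Hypothetical small data frame standing in for d1
d1 <- data.frame(id = 1:3, tc = c(4.0, 3.5, 4.7))

f <- tempfile(fileext = ".rda")  # temporary file, so nothing is overwritten
save(d1, file = f)

rm(d1)      # remove the object from the workspace
load(f)     # restores the object under the name d1
d1
```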
Example 2:
d2=read.csv("d2.csv", header=TRUE)
save(d2,file="d2.rda")
d2
## id sex tg
## 1 1 nam 1.1
## 2 2 nu 2.1
## 3 3 nu 0.8
## 4 4 nam 1.1
## 5 5 nam 2.1
## 6 6 nu 1.5
## 7 7 nam 2.6
## 8 8 nam 1.5
## 9 9 nam 5.4
## 10 10 nu 1.9
## 11 11 nu 1.7
These two data sets share 2 variables, id and sex. However, d1 has 10 rows and d2 has 11 rows. We can combine them into one data.frame with the merge command, using the variable “id” as the key. The option all=TRUE keeps rows that appear in only one of the two data sets and fills the missing values with NA.
d= merge(d1, d2, by="id", all=TRUE)
d
## id sex.x tc sex.y tg
## 1 1 nam 4.0 nam 1.1
## 2 2 nu 3.5 nu 2.1
## 3 3 nu 4.7 nu 0.8
## 4 4 nam 7.7 nam 1.1
## 5 5 nam 5.0 nam 2.1
## 6 6 nu 4.2 nu 1.5
## 7 7 nam 5.9 nam 2.6
## 8 8 nam 6.1 nam 1.5
## 9 9 nam 5.9 nam 5.4
## 10 10 nu 4.0 nu 1.9
## 11 11 <NA> NA nu 1.7
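Because sex exists in both data sets but is not part of the key, merge keeps two copies (sex.x and sex.y). One possible way to avoid this, sketched here as an alternative rather than the method used above, is to merge on both shared variables. d1 and d2 are rebuilt by hand from the listings above so that no CSV file is needed.

```r
# d1 and d2 rebuilt by hand from the printed listings
d1 <- data.frame(id  = 1:10,
                 sex = c("nam", "nu", "nu", "nam", "nam",
                         "nu", "nam", "nam", "nam", "nu"),
                 tc  = c(4.0, 3.5, 4.7, 7.7, 5.0, 4.2, 5.9, 6.1, 5.9, 4.0))
d2 <- data.frame(id  = 1:11,
                 sex = c("nam", "nu", "nu", "nam", "nam",
                         "nu", "nam", "nam", "nam", "nu", "nu"),
                 tg  = c(1.1, 2.1, 0.8, 1.1, 2.1, 1.5, 2.6, 1.5, 5.4, 1.9, 1.7))

# Use both shared variables as the key; all = TRUE keeps unmatched rows
d <- merge(d1, d2, by = c("id", "sex"), all = TRUE)
names(d)
```

With this key there should be a single sex column, and the subject with id 11 keeps sex "nu" with NA only for tc.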
bmd=c(0.92, 0.21, 0.17, -3.21, -1.80, -2.60, -2.00, 1.71, 2.12, -2.11)
bmd
## [1] 0.92 0.21 0.17 -3.21 -1.80 -2.60 -2.00 1.71 2.12 -2.11
To classify subjects into 3 groups (osteoporosis, osteopenia and normal) we can use the codes 1, 2 and 3. In other words, we want to create another variable (diagnosis) that takes these 3 values depending on the value of bmd.
# Temporarily set diagnosis equal to bmd
diagnosis=bmd
# Recode bmd into diagnosis
diagnosis[bmd <= -2.5] = 1
diagnosis[bmd > -2.5 & bmd <= -1.0] = 2
diagnosis[bmd > -1.0] = 3
# Combine the two variables into 1 data.frame
data=data.frame(bmd, diagnosis)
# List the data to check that the commands worked
data
## bmd diagnosis
## 1 0.92 3
## 2 0.21 3
## 3 0.17 3
## 4 -3.21 1
## 5 -1.80 2
## 6 -2.60 1
## 7 -2.00 2
## 8 1.71 3
## 9 2.12 3
## 10 -2.11 2
V.1) RECODING DATA USING “replace”
This approach is somewhat more complicated. Continuing the example above, we recode bmd into diagnosis as follows:
diagnosis=bmd
diagnosis = replace(diagnosis, bmd <= -2.5, 1)
diagnosis = replace(diagnosis, bmd > -2.5 & bmd <= -1.0, 2)
diagnosis = replace(diagnosis, bmd > -1.0, 3)
diagnosis
## [1] 3 3 3 1 2 1 2 3 3 2
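The same recoding can also be written in one expression with nested ifelse calls. This is only a sketch of another base R option, not the method used above; it relies on the conditions being checked in order, so the second test only applies to values greater than -2.5.

```r
bmd <- c(0.92, 0.21, 0.17, -3.21, -1.80, -2.60, -2.00, 1.71, 2.12, -2.11)

# 1 if bmd <= -2.5, otherwise 2 if bmd <= -1.0, otherwise 3
diagnosis <- ifelse(bmd <= -2.5, 1,
                    ifelse(bmd <= -1.0, 2, 3))
diagnosis
```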
V.2) CONVERTING TO A FACTOR
In statistical analysis we distinguish between a factor (categorical) variable and an ordinary continuous variable. Arithmetic such as addition, subtraction, multiplication and division is not meaningful for a factor variable. Take diagnosis above: the mean of the codes 1 and 2 has no practical meaning at all. At this point, however, diagnosis is still treated as a numeric variable. To turn it into a factor, we use the function factor as follows:
diag = factor(diagnosis)
diag
## [1] 3 3 3 1 2 1 2 3 3 2
## Levels: 1 2 3
Note that R now tells us that diag has 3 levels: 1, 2 and 3. If we ask R to compute the mean of diag, R will not do so, because diag is not a numeric variable; it returns NA with a warning.
mean(diag)
## Warning in mean.default(diag): argument is not numeric or logical: returning NA
## [1] NA
Of course we can still compute the mean of diagnosis:
mean(diagnosis)
## [1] 2.3
But this result of 2.3 has no practical meaning.
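When converting to a factor it can help to attach descriptive labels to the codes, so that tables show group names instead of 1, 2 and 3. A minimal sketch, assuming the English label names below (they are our own choice for illustration, following the 3 groups described earlier):

```r
diagnosis <- c(3, 3, 3, 1, 2, 1, 2, 3, 3, 2)

# levels gives the codes, labels gives the name shown for each code
diag <- factor(diagnosis,
               levels = c(1, 2, 3),
               labels = c("osteoporosis", "osteopenia", "normal"))

table(diag)   # counts per named group
```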
age = c(17, 19, 22, 43, 14, 8, 12, 19, 20, 51, 8, 12, 27, 31, 44)
The lowest age is 8 and the highest is 51. If we want to split the subjects into 2 age groups:
cut(age, 2)
## [1] (7.96,29.5] (7.96,29.5] (7.96,29.5] (29.5,51] (7.96,29.5] (7.96,29.5]
## [7] (7.96,29.5] (7.96,29.5] (7.96,29.5] (29.5,51] (7.96,29.5] (7.96,29.5]
## [13] (7.96,29.5] (29.5,51] (29.5,51]
## Levels: (7.96,29.5] (29.5,51]
cut splits the variable age into 2 groups of equal width: group 1 from 7.96 to 29.5 years, group 2 from 29.5 to 51 years. We can count the number of subjects in each age group with the function table as follows:
table(cut(age, 2))
##
## (7.96,29.5] (29.5,51]
## 11 4
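Instead of letting cut choose equal-width intervals, we can supply the cut points ourselves through the breaks argument. In the sketch below the cut points 0, 18 and 60 and the group names are hypothetical, chosen only for illustration; intervals are closed on the right by default, so the groups are (0,18] and (18,60].

```r
age <- c(17, 19, 22, 43, 14, 8, 12, 19, 20, 51, 8, 12, 27, 31, 44)

# Hypothetical cut points: (0,18] is "child", (18,60] is "adult"
ageg <- cut(age, breaks = c(0, 18, 60), labels = c("child", "adult"))

table(ageg)
```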
In the following command we split age into 3 groups and name them “low”, “medium” and “high”:
ageg=cut(age, 3, labels = c("low", "medium", "high"))
table(ageg)
## ageg
## low medium high
## 10 2 3
ageg
## [1] low low low high low low low low low high
## [11] low low medium medium high
## Levels: low medium high
Of course we can also split age into 4 groups (quartiles) by supplying the probabilities 0, 0.25, 0.50, 0.75 and 1 as follows:
cut(age,
breaks = quantile(age, c(0, 0.25, 0.50, 0.75, 1)),
labels = c("q1", "q2", "q3", "q4"),
include.lowest = TRUE)
## [1] q2 q2 q3 q4 q2 q1 q1 q2 q3 q4 q1 q1 q3 q4 q4
## Levels: q1 q2 q3 q4
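The option include.lowest = TRUE matters here because the first break equals the minimum age. Intervals are open on the left by default, so without this option the smallest value falls outside the first interval and becomes NA. A small sketch comparing the two calls:

```r
age <- c(17, 19, 22, 43, 14, 8, 12, 19, 20, 51, 8, 12, 27, 31, 44)
br  <- quantile(age, c(0, 0.25, 0.50, 0.75, 1))  # quartile cut points

with.lowest <- cut(age, breaks = br, include.lowest = TRUE)
without     <- cut(age, breaks = br)

sum(is.na(with.lowest))  # subjects left unclassified with the option
sum(is.na(without))      # subjects left unclassified without it
table(with.lowest)
```

Here the two subjects aged 8 are the ones that would be lost without include.lowest = TRUE.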
# load the Hmisc package so that we can use the function cut2
library(Hmisc)
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
##
## format.pval, units
bmd=c(-0.92, 0.21, 0.17, -3.21, -1.80, -2.60, -2.00, 1.71, 2.12, -2.11)
# split the variable bmd into 2 groups and store the result in the object group
group= cut2(bmd, g=2)
table(group)
## group
## [-3.21,-0.92) [-0.92, 2.12]
## 5 5
g=2 means splitting into 2 groups (g = group). Each group contains 5 values. Of course we can also split into 3 groups with the command:
group=cut2(bmd, g=3)
table(group)
## group
## [-3.21,-1.80) [-1.80, 0.21) [ 0.21, 2.12]
## 4 3 3
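If Hmisc is not available, a similar two-group split can be sketched in base R by cutting at the median. This is only an approximation of cut2(bmd, g=2): the group sizes agree for these data, but the cut point and the interval labels are not the same as those produced by cut2.

```r
bmd <- c(-0.92, 0.21, 0.17, -3.21, -1.80, -2.60, -2.00, 1.71, 2.12, -2.11)

# Cut points at the minimum, the median and the maximum of bmd
br <- quantile(bmd, c(0, 0.5, 1))

group <- cut(bmd, breaks = br, include.lowest = TRUE)
table(group)   # number of values in each half
```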