Day 2: Comparing two groups - Continuous variables

Task 1. Descriptive analysis

1.1 Read the data into R

ob = read.csv("C:\\Thach\\VN trips\\2025_7Oct\\Truong DH Kien Truc Da Nang\\Datasets\\Obesity data.csv")
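
The absolute Windows path above only resolves on the author's machine, so it helps to check what was actually imported. A minimal sketch of such a check, assuming a tiny stand-in CSV written to a temporary file (the three rows are invented for illustration and are not taken from the Obesity dataset):

```r
# Hypothetical three-row stand-in for the real CSV; the values are invented
tmp = tempfile(fileext = ".csv")
writeLines(c("id,gender,age,hypertension",
             "1,F,53,0",
             "2,M,65,1",
             "3,F,64,1"), tmp)

demo = read.csv(tmp)

dim(demo)    # number of rows and columns
str(demo)    # column types as R inferred them
head(demo)   # first rows of the data
```

Running the same three calls on `ob` should report 1217 rows if the import worked, matching the N shown in the tables that follow.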

1.2 Describe the data

library(table1)

table1(~ age + gender + weight + height + pcfat + hypertension + diabetes, data = ob)

	Overall (N=1217)
age
Mean (SD)	47.2 (17.3)
Median [Min, Max]	48.0 [13.0, 88.0]
gender
F	862 (70.8%)
M	355 (29.2%)
weight
Mean (SD)	55.1 (9.40)
Median [Min, Max]	54.0 [34.0, 95.0]
height
Mean (SD)	157 (7.98)
Median [Min, Max]	155 [136, 185]
pcfat
Mean (SD)	31.6 (7.18)
Median [Min, Max]	32.4 [9.20, 48.4]
hypertension
Mean (SD)	0.507 (0.500)
Median [Min, Max]	1.00 [0, 1.00]
diabetes
Mean (SD)	0.111 (0.314)
Median [Min, Max]	0 [0, 1.00]

1.3 Comment on the results for hypertension and diabetes (compared with gender and the other characteristics)

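# hypertension and diabetes are coded 0/1, so table1 summarised them as continuous;
# convert them to factors to get counts and percentages instead
ob$hyper = as.factor(ob$hypertension)
ob$diab = as.factor(ob$diabetes)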

table1(~ age + gender + weight + height + pcfat + hypertension + hyper + diabetes + diab, data = ob)

	Overall (N=1217)
age
Mean (SD)	47.2 (17.3)
Median [Min, Max]	48.0 [13.0, 88.0]
gender
F	862 (70.8%)
M	355 (29.2%)
weight
Mean (SD)	55.1 (9.40)
Median [Min, Max]	54.0 [34.0, 95.0]
height
Mean (SD)	157 (7.98)
Median [Min, Max]	155 [136, 185]
pcfat
Mean (SD)	31.6 (7.18)
Median [Min, Max]	32.4 [9.20, 48.4]
hypertension
Mean (SD)	0.507 (0.500)
Median [Min, Max]	1.00 [0, 1.00]
hyper
0	600 (49.3%)
1	617 (50.7%)
diabetes
Mean (SD)	0.111 (0.314)
Median [Min, Max]	0 [0, 1.00]
diab
0	1082 (88.9%)
1	135 (11.1%)
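
The two versions of each variable carry the same information: the mean of a 0/1 variable is the proportion coded 1, which is why hypertension has mean 0.507 and hyper reports 617 (50.7%). A small base R sketch of the same idea, assuming a made-up indicator vector (the eight values are invented for illustration):

```r
# Hypothetical 0/1 indicator; these eight values are invented for illustration
hypertension = c(0, 1, 1, 0, 1, 0, 1, 1)

# Treated as a number, the mean is simply the proportion of 1s
mean(hypertension)

# Treated as a factor, R tabulates the categories instead
hyper = as.factor(hypertension)
counts = table(hyper)
counts
round(100 * prop.table(counts), 1)   # percentage in each category
```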

1.4 Report the median (Q1, Q3)

table1(~ age + weight + height + pcfat, data = ob, render.continuous = c(. = "Mean (SD)", . = "Median [Q1, Q3]"))

	Overall (N=1217)
age
Mean (SD)	47.2 (17.3)
Median [Q1, Q3]	48.0 [35.0, 58.0]
weight
Mean (SD)	55.1 (9.40)
Median [Q1, Q3]	54.0 [49.0, 61.0]
height
Mean (SD)	157 (7.98)
Median [Q1, Q3]	155 [151, 162]
pcfat
Mean (SD)	31.6 (7.18)
Median [Q1, Q3]	32.4 [27.0, 36.8]

1.5 Describe by gender

table1(~ age + weight + height + pcfat + hyper + diab | gender, data = ob)

	F (N=862)	M (N=355)	Overall (N=1217)
age
Mean (SD)	48.6 (16.4)	43.7 (18.8)	47.2 (17.3)
Median [Min, Max]	49.0 [14.0, 85.0]	44.0 [13.0, 88.0]	48.0 [13.0, 88.0]
weight
Mean (SD)	52.3 (7.72)	62.0 (9.59)	55.1 (9.40)
Median [Min, Max]	51.0 [34.0, 95.0]	62.0 [38.0, 95.0]	54.0 [34.0, 95.0]
height
Mean (SD)	153 (5.55)	165 (6.73)	157 (7.98)
Median [Min, Max]	153 [136, 170]	165 [146, 185]	155 [136, 185]
pcfat
Mean (SD)	34.7 (5.19)	24.2 (5.76)	31.6 (7.18)
Median [Min, Max]	34.7 [14.6, 48.4]	24.6 [9.20, 39.0]	32.4 [9.20, 48.4]
hyper
0	430 (49.9%)	170 (47.9%)	600 (49.3%)
1	432 (50.1%)	185 (52.1%)	617 (50.7%)
diab
0	760 (88.2%)	322 (90.7%)	1082 (88.9%)
1	102 (11.8%)	33 (9.3%)	135 (11.1%)

1.6 Assess the differences between the two groups

library(compareGroups)
createTable(compareGroups(gender ~ age + weight + height + pcfat + hyper + diab, data = ob))

## 
## --------Summary descriptives table by 'gender'---------
## 
## ________________________________________ 
##             F           M      p.overall 
##           N=862       N=355              
## ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ 
## age    48.6 (16.4) 43.7 (18.8)  <0.001   
## weight 52.3 (7.72) 62.0 (9.59)  <0.001   
## height 153 (5.55)  165 (6.73)   <0.001   
## pcfat  34.7 (5.19) 24.2 (5.76)  <0.001   
## hyper:                           0.569   
##     0  430 (49.9%) 170 (47.9%)           
##     1  432 (50.1%) 185 (52.1%)           
## diab:                            0.238   
##     0  760 (88.2%) 322 (90.7%)           
##     1  102 (11.8%) 33 (9.30%)            
## ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
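
The p.overall values for the two categorical variables can be rebuilt from the counts alone. The sketch below assumes they come from a chi-squared test with Yates' continuity correction, which is the default of `chisq.test()` for 2x2 tables; this is an inference from the numbers, not something the output states.

```r
# Counts copied from the table above (rows: status 0/1, columns: F/M)
hyper_tab = matrix(c(430, 432, 170, 185), nrow = 2,
                   dimnames = list(hyper = c("0", "1"), gender = c("F", "M")))
diab_tab  = matrix(c(760, 102, 322, 33), nrow = 2,
                   dimnames = list(diab = c("0", "1"), gender = c("F", "M")))

# chisq.test() applies Yates' continuity correction to 2x2 tables by default
p_hyper = chisq.test(hyper_tab)$p.value
p_diab  = chisq.test(diab_tab)$p.value

round(c(hyper = p_hyper, diab = p_diab), 3)
```

The results should be close to the 0.569 and 0.238 reported by compareGroups. The p-values for the continuous variables cannot be rebuilt this way because they need the individual observations.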

Task 2. Analyse the difference between two groups

2.1 Load measurement data

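# Load measurements for the two groups
A = c(14, 4, 10, 6, 3, 11, 12)
B = c(16, 17, 13, 12, 7, 16, 11, 8, 7)

# Stack both groups into one long-format data frame: one row per measurement
wt = c(A, B)
group = c(rep("A", 7), rep("B", 9))
df = data.frame(wt, group)
dim(df)  # rows, columns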

## [1] 16  2

2.2 Assess the distribution

Using a plot

library(lessR)

Histogram(wt, data = df)

## >>> Note: wt is not in a data frame (table)
## >>> Note: wt is not in a data frame (table)

## >>> Suggestions 
## bin_width: set the width of each bin 
## bin_start: set the start of the first bin 
## bin_end: set the end of the last bin 
## Histogram(wt, density=TRUE)  # smoothed curve + histogram 
## Plot(wt)  # Violin/Box/Scatterplot (VBS) plot 
## 
## --- wt --- 
##  
##      n   miss     mean       sd      min      mdn      max 
##      16      0    10.44     4.29     3.00    11.00    17.00 
## 
## No (Box plot) outliers 
## 
## 
## Bin Width: 2 
## Number of Bins: 8 
##  
##      Bin  Midpnt  Count    Prop  Cumul.c  Cumul.p 
## ------------------------------------------------- 
##   2 >  4       3      2    0.12        2     0.12 
##   4 >  6       5      1    0.06        3     0.19 
##   6 >  8       7      3    0.19        6     0.38 
##   8 > 10       9      1    0.06        7     0.44 
##  10 > 12      11      4    0.25       11     0.69 
##  12 > 14      13      2    0.12       13     0.81 
##  14 > 16      15      2    0.12       15     0.94 
##  16 > 18      17      1    0.06       16     1.00
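
The same binning can be reproduced without lessR. A minimal base R sketch, assuming the bins in the table above are right-closed intervals of width 2 starting at 2 (an assumption inferred from the counts, since the value 4 falls in the first bin):

```r
A = c(14, 4, 10, 6, 3, 11, 12)
B = c(16, 17, 13, 12, 7, 16, 11, 8, 7)
wt = c(A, B)

# Width-2 bins from 2 to 18; hist() uses right-closed intervals by default
h = hist(wt, breaks = seq(2, 18, by = 2), main = "Load (wt)", xlab = "wt")

h$mids     # bin midpoints, comparable to the Midpnt column
h$counts   # counts per bin, comparable to the Count column
```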

Using a test

shapiro.test(df$wt)

## 
##  Shapiro-Wilk normality test
## 
## data:  df$wt
## W = 0.96213, p-value = 0.7006
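
The test above pools both groups. Because the normality assumption of the t-test concerns each group, one possible complement, which is not part of the original analysis, is to test each group separately and to look at a Q-Q plot. With only 7 and 9 observations the test has little power, so a non-significant result is weak evidence of normality.

```r
A = c(14, 4, 10, 6, 3, 11, 12)
B = c(16, 17, 13, 12, 7, 16, 11, 8, 7)
df = data.frame(wt = c(A, B),
                group = rep(c("A", "B"), times = c(length(A), length(B))))

# One Shapiro-Wilk test per group; keep only the p-value of each
p_by_group = tapply(df$wt, df$group, function(x) shapiro.test(x)$p.value)
p_by_group

# Normal Q-Q plot of the pooled sample as a visual complement
qqnorm(df$wt)
qqline(df$wt)
```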

2.3 Describe the load measurements

library(table1)
table1(~ wt | group, data = df, render.continuous = c(. = "Mean (SD)", . = "Median [Q1, Q3]"))

	A (N=7)	B (N=9)	Overall (N=16)
wt
Mean (SD)	8.57 (4.24)	11.9 (3.95)	10.4 (4.29)
Median [Q1, Q3]	10.0 [5.00, 11.5]	12.0 [8.00, 16.0]	11.0 [7.00, 13.3]
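
The figures in this table can be checked with base R. The sketch below assumes the quartiles follow the default `quantile()` definition (type 7); the helper `five_stats` is a name introduced here for illustration.

```r
A = c(14, 4, 10, 6, 3, 11, 12)
B = c(16, 17, 13, 12, 7, 16, 11, 8, 7)

# Hypothetical helper: mean, SD, and quartiles of one numeric vector
five_stats = function(x) {
  q = quantile(x, c(0.25, 0.5, 0.75))   # default type 7 quantiles
  c(mean = mean(x), sd = sd(x), q1 = q[[1]], median = q[[2]], q3 = q[[3]])
}

round(rbind(A = five_stats(A),
            B = five_stats(B),
            Overall = five_stats(c(A, B))), 2)
```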

2.4 Run the t-test

t.test(A, B)

## 
##  Welch Two Sample t-test
## 
## data:  A and B
## t = -1.6, df = 12.554, p-value = 0.1345
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -7.813114  1.178194
## sample estimates:
## mean of x mean of y 
##  8.571429 11.888889

t.test(wt ~ group, data = df)

## 
##  Welch Two Sample t-test
## 
## data:  wt by group
## t = -1.6, df = 12.554, p-value = 0.1345
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -7.813114  1.178194
## sample estimates:
## mean in group A mean in group B 
##        8.571429       11.888889
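
Both calls run the Welch test, which does not assume equal variances and is the default of `t.test()`. As a sketch of what the output reports, the statistic, the Welch-Satterthwaite degrees of freedom, and the two-sided p-value can be computed by hand (the variable names are chosen here for illustration):

```r
A = c(14, 4, 10, 6, 3, 11, 12)
B = c(16, 17, 13, 12, 7, 16, 11, 8, 7)

nA = length(A)
nB = length(B)
seA2 = var(A) / nA   # squared standard error of the mean of A
seB2 = var(B) / nB   # squared standard error of the mean of B

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat = (mean(A) - mean(B)) / sqrt(seA2 + seB2)
df_welch = (seA2 + seB2)^2 / (seA2^2 / (nA - 1) + seB2^2 / (nB - 1))
p_value = 2 * pt(-abs(t_stat), df_welch)

round(c(t = t_stat, df = df_welch, p = p_value), 4)

# Student's t-test (pooled variance), for comparison with the Welch default
t.test(A, B, var.equal = TRUE)
```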

2.5 Run the bootstrap

library(simpleboot)

## Simple Bootstrap Routines (1.1-7)

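library(boot)
# Bootstrap the difference in means (A - B) with 1000 resamples;
# no seed is set, so the intervals change slightly from run to run
b = two.boot(A, B, mean, R = 1000)
# boot.ci() comes from the boot package
boot.ci(b)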

## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = b)
## 
## Intervals : 
## Level      Normal              Basic         
## 95%   (-7.300,  0.381 )   (-7.206,  0.396 )  
## 
## Level     Percentile            BCa          
## 95%   (-7.031,  0.571 )   (-7.796,  0.211 )  
## Calculations and Intervals on Original Scale

hist(b, breaks = 50)
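
To see what the bootstrap is doing, the percentile interval can be sketched in base R. This assumes each group is resampled with replacement independently, which is how I understand `two.boot` to work; the seed is arbitrary and only makes the sketch reproducible.

```r
A = c(14, 4, 10, 6, 3, 11, 12)
B = c(16, 17, 13, 12, 7, 16, 11, 8, 7)

set.seed(123)   # arbitrary seed so the sketch is reproducible
R = 1000        # number of bootstrap replicates

# Resample each group with replacement and record the difference in means
boot_diff = replicate(R, {
  mean(sample(A, replace = TRUE)) - mean(sample(B, replace = TRUE))
})

# Percentile 95% interval for mean(A) - mean(B)
ci = quantile(boot_diff, c(0.025, 0.975))
ci

hist(boot_diff, breaks = 50, main = "Bootstrap distribution",
     xlab = "mean(A) - mean(B)")
```

The interval will not match the output above digit for digit, because the resamples differ, but it should be of similar width and location to the Percentile interval.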

Data Analysis Workshop - Danang Architecture University (21-23Oct2025) - Day 2

Thach Tran

2025-10-11

Task 6. Record all the functions/commands above and share them on your RPubs account (https://rpubs.com/ThachTran/1354649)