This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.

Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.

Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Ctrl+Alt+I.

When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Ctrl+Shift+K to preview the HTML file).

The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike Knit, Preview does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.

library(summarytools)  # provides the tobacco data set and dfSummary(), freq(), descr(), ctable()
head(tobacco)
summary(tobacco)

  gender         age          age.gr         BMI         smoker
 F   :489   Min.   :18.0   18-34:258   Min.   : 8.826   Yes:298
 M   :489   1st Qu.:34.0   35-50:241   1st Qu.:22.927   No :702
 NA's: 22   Median :50.0   51-70:317   Median :25.620
            Mean   :49.6   71 + :159   Mean   :25.731
            3rd Qu.:66.0   NA's : 25   3rd Qu.:28.649
            Max.   :80.0               Max.   :39.439
            NA's   :25                 NA's   :26
  cigs.per.day    diseased    disease            samp.wgts
 Min.   : 0.000   Yes:224   Length:1000        Min.   :0.8614
 1st Qu.: 0.000   No :776   Class :character   1st Qu.:0.8614
 Median : 0.000             Mode  :character   Median :1.0442
 Mean   : 6.782                                Mean   :1.0000
 3rd Qu.:11.000                                3rd Qu.:1.0494
 Max.   :40.000                                Max.   :1.0625
 NA's   :35

by(tobacco,tobacco$gender, summary)

tobacco$gender: F
 gender       age          age.gr         BMI         smoker
 F:489   Min.   :18.00   18-34:123   Min.   : 9.009   Yes:147
 M:  0   1st Qu.:34.00   35-50:118   1st Qu.:22.983   No :342
         Median :50.00   51-70:157   Median :25.867
         Mean   :49.56   71 + : 77   Mean   :26.099
         3rd Qu.:66.00   NA's : 14   3rd Qu.:29.472
         Max.   :80.00               Max.   :39.439
         NA's   :14                  NA's   :14
  cigs.per.day    diseased    disease            samp.wgts
 Min.   : 0.000   Yes:111   Length:489         Min.   :0.8614
 1st Qu.: 0.000   No :378   Class :character   1st Qu.:0.8614
 Median : 0.000             Mode  :character   Median :1.0442
 Mean   : 6.876                                Mean   :0.9998
 3rd Qu.:10.250                                3rd Qu.:1.0494
 Max.   :40.000                                Max.   :1.0625
 NA's   :21
--------------------------------------------------------
tobacco$gender: M
 gender       age          age.gr         BMI         smoker
 F:  0   Min.   :18.00   18-34:130   Min.   : 8.826   Yes:143
 M:489   1st Qu.:34.00   35-50:118   1st Qu.:22.517   No :346
         Median :49.50   51-70:151   Median :25.140
         Mean   :49.58   71 + : 79   Mean   :25.308
         3rd Qu.:66.00   NA's : 11   3rd Qu.:27.956
         Max.   :80.00               Max.   :36.761
         NA's   :11                  NA's   :12
  cigs.per.day    diseased    disease            samp.wgts
 Min.   : 0.000   Yes:110   Length:489         Min.   :0.8614
 1st Qu.: 0.000   No :379   Class :character   1st Qu.:0.8614
 Median : 0.000             Mode  :character   Median :1.0442
 Mean   : 6.722                                Mean   :0.9998
 3rd Qu.:11.000                                3rd Qu.:1.0494
 Max.   :40.000                                Max.   :1.0625
 NA's   :14

Summarising the whole data frame with dfSummary()

print(dfSummary(tobacco, method = "browser", plain.ascii = FALSE, style="grid", valid.col = FALSE, tmp.img.dir = "/tmp"))
temporary images written to 'D:\tmp'

Data Frame Summary

tobacco
Dimensions: 1000 x 9
Duplicates: 2

| No | Variable | Stats / Values | Freqs (% of Valid) | Graph | Missing |
|---:|:---|:---|:---|:---|:---|
| 1 | gender [factor] | 1. F<br>2. M | 489 (50.0%)<br>489 (50.0%) | | 22 (2.2%) |
| 2 | age [numeric] | Mean (sd) : 49.6 (18.3)<br>min < med < max:<br>18 < 50 < 80<br>IQR (CV) : 32 (0.4) | 63 distinct values | | 25 (2.5%) |
| 3 | age.gr [factor] | 1. 18-34<br>2. 35-50<br>3. 51-70<br>4. 71 + | 258 (26.5%)<br>241 (24.7%)<br>317 (32.5%)<br>159 (16.3%) | | 25 (2.5%) |
| 4 | BMI [numeric] | Mean (sd) : 25.7 (4.5)<br>min < med < max:<br>8.8 < 25.6 < 39.4<br>IQR (CV) : 5.7 (0.2) | 974 distinct values | | 26 (2.6%) |
| 5 | smoker [factor] | 1. Yes<br>2. No | 298 (29.8%)<br>702 (70.2%) | | 0 (0%) |
| 6 | cigs.per.day [numeric] | Mean (sd) : 6.8 (11.9)<br>min < med < max:<br>0 < 0 < 40<br>IQR (CV) : 11 (1.8) | 37 distinct values | | 35 (3.5%) |
| 7 | diseased [factor] | 1. Yes<br>2. No | 224 (22.4%)<br>776 (77.6%) | | 0 (0%) |
| 8 | disease [character] | 1. Hypertension<br>2. Cancer<br>3. Cholesterol<br>4. Heart<br>5. Pulmonary<br>6. Musculoskeletal<br>7. Diabetes<br>8. Hearing<br>9. Digestive<br>10. Hypotension<br>[ 3 others ] | 36 (16.2%)<br>34 (15.3%)<br>21 ( 9.5%)<br>20 ( 9.0%)<br>20 ( 9.0%)<br>19 ( 8.6%)<br>14 ( 6.3%)<br>14 ( 6.3%)<br>12 ( 5.4%)<br>11 ( 5.0%)<br>21 ( 9.5%) | | 778 (77.8%) |
| 9 | samp.wgts [numeric] | Mean (sd) : 1 (0.1)<br>min < med < max:<br>0.9 < 1 < 1.1<br>IQR (CV) : 0.2 (0.1) | 0.86!: 267 (26.7%)<br>1.04!: 249 (24.9%)<br>1.05!: 324 (32.4%)<br>1.06!: 160 (16.0%)<br>! rounded | | 0 (0%) |

freq(iris$Species, plain.ascii = FALSE, style = "rmarkdown")

Frequencies

iris$Species
Type: Factor

| | Freq | % Valid | % Valid Cum. | % Total | % Total Cum. |
|:---|---:|---:|---:|---:|---:|
| setosa | 50 | 33.33 | 33.33 | 33.33 | 33.33 |
| versicolor | 50 | 33.33 | 66.67 | 33.33 | 66.67 |
| virginica | 50 | 33.33 | 100.00 | 33.33 | 100.00 |
| \<NA\> | 0 | | | 0.00 | 100.00 |
| Total | 150 | 100.00 | 100.00 | 100.00 | 100.00 |

freq(iris$Species, report.nas = FALSE, plain.ascii = FALSE, style = "rmarkdown")

Frequencies

iris$Species
Type: Factor

| | Freq | % | % Cum. |
|:---|---:|---:|---:|
| setosa | 50 | 33.33 | 33.33 |
| versicolor | 50 | 33.33 | 66.67 |
| virginica | 50 | 33.33 | 100.00 |
| Total | 150 | 100.00 | 100.00 |
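
The two `freq()` calls differ in whether missing values are reported and counted in the denominator. As a minimal base R sketch of that distinction (not summarytools' own implementation), the toy factor `x` below is hypothetical and is not taken from `iris` or `tobacco`:

```r
# Hypothetical toy factor with one missing value (not from the source data)
x <- factor(c("a", "a", "b", NA))

# Percentages over valid (non-missing) values only, as in a "% Valid" column
pct_valid <- 100 * prop.table(table(x))

# Percentages over all values, with NA counted as its own category ("% Total")
pct_total <- 100 * prop.table(table(x, useNA = "always"))

round(pct_valid, 2)
round(pct_total, 2)
```

Because `iris$Species` has no missing values, the valid and total percentages coincide in the output above, and `report.nas = FALSE` leaves a single set of percentage columns.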

Create a descriptive table for the numeric variables

descr(tobacco, style = "rmarkdown")
Non-numerical variable(s) ignored: gender, age.gr, smoker, diseased, disease

Descriptive Statistics

tobacco
N: 1000

| | age | BMI | cigs.per.day | samp.wgts |
|:---|---:|---:|---:|---:|
| Mean | 49.60 | 25.73 | 6.78 | 1.00 |
| Std.Dev | 18.29 | 4.49 | 11.88 | 0.08 |
| Min | 18.00 | 8.83 | 0.00 | 0.86 |
| Q1 | 34.00 | 22.93 | 0.00 | 0.86 |
| Median | 50.00 | 25.62 | 0.00 | 1.04 |
| Q3 | 66.00 | 28.65 | 11.00 | 1.05 |
| Max | 80.00 | 39.44 | 40.00 | 1.06 |
| MAD | 23.72 | 4.18 | 0.00 | 0.01 |
| IQR | 32.00 | 5.72 | 11.00 | 0.19 |
| CV | 0.37 | 0.17 | 1.75 | 0.08 |
| Skewness | -0.04 | 0.02 | 1.54 | -1.04 |
| SE.Skewness | 0.08 | 0.08 | 0.08 | 0.08 |
| Kurtosis | -1.26 | 0.26 | 0.90 | -0.90 |
| N.Valid | 975.00 | 974.00 | 965.00 | 1000.00 |
| Pct.Valid | 97.50 | 97.40 | 96.50 | 100.00 |
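
Some rows of this table are less familiar than the mean or median. The sketch below recomputes a few of them in base R, assuming `descr()` follows the usual definitions. The `age` column is consistent with that assumption (18.29 / 49.60 ≈ 0.37 for CV, 66 − 34 = 32 for IQR, and 16 × 1.4826 ≈ 23.72 for MAD). The built-in `iris$Sepal.Length` stands in for a `tobacco` column so the example runs without extra packages:

```r
# Built-in example vector; any numeric vector works the same way
x <- iris$Sepal.Length

cv        <- sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)  # coefficient of variation
iqr       <- IQR(x, na.rm = TRUE)                         # Q3 - Q1
mad_x     <- mad(x, na.rm = TRUE)                         # median absolute deviation, scaled by 1.4826
n_valid   <- sum(!is.na(x))                               # number of non-missing values
pct_valid <- 100 * n_valid / length(x)                    # share of non-missing values

round(c(CV = cv, IQR = iqr, MAD = mad_x,
        N.Valid = n_valid, Pct.Valid = pct_valid), 2)
```

A CV well above 1, as for `cigs.per.day`, indicates a spread larger than the mean, which fits a variable where most values are zero.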

Create a two-way table and compute the chi-square test

with(tobacco, print(ctable(smoker, diseased), method = 'render'))

Cross-Tabulation, Row Proportions

smoker * diseased
Data Frame: tobacco
| smoker | diseased: Yes | diseased: No | Total |
|:---|---:|---:|---:|
| Yes | 125 (41.9%) | 173 (58.1%) | 298 (100.0%) |
| No | 99 (14.1%) | 603 (85.9%) | 702 (100.0%) |
| Total | 224 (22.4%) | 776 (77.6%) | 1000 (100.0%) |

with(tobacco, 
     print(ctable(smoker, diseased, prop = 'n', totals = FALSE), 
           omit.headings = TRUE, method = "render"))

Cross-Tabulation

smoker * diseased
Data Frame: tobacco
| smoker | diseased: Yes | diseased: No |
|:---|---:|---:|
| Yes | 125 | 173 |
| No | 99 | 603 |
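
The `ctable()` calls shown here only tabulate the counts; they do not run a test. A minimal sketch of the chi-square step, using base R's `chisq.test()` on the 2 x 2 counts copied from the table above (the object names `tab` and `res` are illustrative). Recent summarytools versions appear to offer a `chisq` argument in `ctable()` as well, but that is an assumption to verify against the installed version:

```r
# 2 x 2 counts copied from the smoker * diseased cross-tabulation
tab <- matrix(c(125, 173,
                 99, 603),
              nrow = 2, byrow = TRUE,
              dimnames = list(smoker   = c("Yes", "No"),
                              diseased = c("Yes", "No")))

# Pearson chi-square test of independence; for 2 x 2 tables chisq.test()
# applies Yates' continuity correction by default (correct = TRUE)
res <- chisq.test(tab)

res$statistic  # chi-square statistic
res$parameter  # degrees of freedom
res$p.value    # p-value
res$expected   # expected counts under independence
```

With the data frame loaded, `chisq.test(table(tobacco$smoker, tobacco$diseased))` should give the same test without retyping the counts.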