We have all taken a survey at some point, but have we ever wondered what happens to our answers? Surveys are given to a carefully selected sample of people with the goal of generalizing the results to a much larger population.
The National Health and Nutrition Examination Survey (NHANES) is a complex survey of tens of thousands of people designed to assess the health and nutritional status of adults and children in the United States. The NHANES data include many measurements related to overall health, physical activity, diet, psychological health, socioeconomic factors, and more.
Depending on the sampling design, each person has a sampling weight that quantifies how many people in the larger population their data represent. In this notebook, we will apply survey methods that use sampling weights to estimate and model relationships between measurements.
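To make the idea of a sampling weight concrete, here is a minimal sketch on made-up data. It assumes the simplest case, where a weight is the inverse of a person's selection probability; real NHANES weights also include further adjustments, so the column names and probabilities below are purely hypothetical.

```r
# Toy illustration with hypothetical numbers (not NHANES data)
toy <- data.frame(
  group = c("A", "A", "B", "B", "B", "B"),
  # Assumed selection probabilities: group B is sampled four times as heavily
  prob  = c(0.001, 0.001, 0.004, 0.004, 0.004, 0.004)
)

# In the simplest design, the sampling weight is the inverse of the
# selection probability: the number of people each respondent represents
toy$weight <- 1 / toy$prob

# Heavily sampled groups get smaller weights
toy

# Summing the weights estimates the size of the population
pop_estimate <- sum(toy$weight)
pop_estimate
```

Each respondent in the heavily sampled group stands in for fewer people, which is the same pattern the notebook later observes across the NHANES race groups.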
We will focus on a common health indicator, Body Mass Index (BMI, kg/m²), and how it relates to physical activity. We will visualize the data and use measures of central tendency (mean, median, mode) together with the standard deviation to examine the relationships between variables.
# Load the NHANES and dplyr packages
library(NHANES)
library(dplyr)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
# Load the NHANESraw data
data("NHANESraw")
# Take a glimpse at the contents
glimpse(NHANESraw)
Rows: 20,293
Columns: 78
$ ID <int> 51624, 51625, 51626, 51627, 51628, 51629, 51630, 51631, 51632, 51633, 51634, 51635, 51636, 51637, 51638, 51639, 51640, 51641, 5164~
$ SurveyYr <fct> 2009_10, 2009_10, 2009_10, 2009_10, 2009_10, 2009_10, 2009_10, 2009_10, 2009_10, 2009_10, 2009_10, 2009_10, 2009_10, 2009_10, 2009~
$ Gender <fct> male, male, male, male, female, male, female, female, male, male, male, male, male, female, male, male, male, male, male, female, ~
$ Age <int> 34, 4, 16, 10, 60, 26, 49, 1, 10, 80, 10, 80, 4, 35, 9, 4, 17, 13, 7, 42, 0, 66, 8, 45, 28, 8, 11, 19, 1, 44, 66, 49, 58, 54, 26, ~
$ AgeMonths <int> 409, 49, 202, 131, 722, 313, 596, 12, 124, NA, 121, NA, 48, 431, 115, 58, 208, 156, 85, 514, 9, 795, 101, 541, 338, 107, 139, 232,~
$ Race1 <fct> White, Other, Black, Black, Black, Mexican, White, White, Hispanic, White, Mexican, White, Other, White, White, Mexican, Hispanic,~
$ Race3 <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
$ Education <fct> High School, NA, NA, NA, High School, 9 - 11th Grade, Some College, NA, NA, Some College, NA, 9 - 11th Grade, NA, Some College, NA~
$ MaritalStatus <fct> Married, NA, NA, NA, Widowed, Married, LivePartner, NA, NA, Married, NA, Widowed, NA, Married, NA, NA, NA, NA, NA, Married, NA, Ma~
$ HHIncome <fct> 25000-34999, 20000-24999, 45000-54999, 20000-24999, 10000-14999, 25000-34999, 35000-44999, 35000-44999, 65000-74999, 15000-19999, ~
$ HHIncomeMid <int> 30000, 22500, 50000, 22500, 12500, 30000, 40000, 40000, 70000, 17500, 22500, 17500, 30000, NA, 87500, 40000, 12500, 87500, NA, 400~
$ Poverty <dbl> 1.36, 1.07, 2.27, 0.81, 0.69, 1.01, 1.91, 1.36, 2.68, 1.27, 0.93, 1.69, 1.36, NA, 1.84, 0.95, 0.30, 2.91, NA, 2.35, NA, 0.41, 2.33~
$ HomeRooms <int> 6, 9, 5, 6, 6, 4, 5, 5, 7, 4, 5, 5, 7, NA, 6, 6, 5, 6, 4, 6, 9, 5, 7, 6, 4, 7, 5, 7, 5, 4, 5, 5, 10, 6, 5, 10, 3, 6, 9, 4, 9, 5, 1~
$ HomeOwn <fct> Own, Own, Own, Rent, Rent, Rent, Rent, Rent, Own, Own, Own, Own, Own, NA, Rent, Rent, Other, Rent, Rent, Rent, Own, Own, Own, Own,~
$ Work <fct> NotWorking, NA, NotWorking, NA, NotWorking, Working, NotWorking, NA, NA, NotWorking, NA, NotWorking, NA, Working, NA, NA, Looking,~
$ Weight <dbl> 87.4, 17.0, 72.3, 39.8, 116.8, 97.6, 86.7, 9.4, 26.0, 79.1, 44.7, 89.6, NA, NA, 29.8, 17.9, 74.7, 40.6, 22.2, 107.7, 9.3, 82.9, 35~
$ Length <dbl> NA, NA, NA, NA, NA, NA, NA, 75.7, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 72.6, NA, NA, NA, NA, NA, NA, NA, 79.5, NA, NA, ~
$ HeadCirc <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
$ Height <dbl> 164.7, 105.4, 181.3, 147.8, 166.0, 173.0, 168.4, NA, 140.3, 174.3, 143.6, 180.1, NA, NA, 133.1, 103.0, 169.6, 156.4, 120.2, 164.3,~
$ BMI <dbl> 32.22, 15.30, 22.00, 18.22, 42.39, 32.61, 30.57, NA, 13.21, 26.04, 21.68, 27.62, NA, NA, 16.82, 16.87, 25.97, 16.60, 15.37, 39.90,~
$ BMICatUnder20yrs <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
$ BMI_WHO <fct> 30.0_plus, 12.0_18.5, 18.5_to_24.9, 12.0_18.5, 30.0_plus, 30.0_plus, 30.0_plus, NA, 12.0_18.5, 25.0_to_29.9, 18.5_to_24.9, 25.0_to~
$ Pulse <int> 70, NA, 68, 68, 72, 72, 86, NA, 70, 88, 84, 54, NA, NA, 82, NA, 68, 102, NA, 86, NA, 92, 72, 62, 68, NA, 76, 70, NA, 88, 60, 86, 6~
$ BPSysAve <int> 113, NA, 109, 93, 150, 104, 112, NA, 108, 139, 94, 121, NA, NA, 86, NA, 114, 112, NA, 116, NA, 138, 107, 118, 100, NA, 107, 93, NA~
$ BPDiaAve <int> 85, NA, 59, 41, 68, 49, 75, NA, 53, 43, 45, 60, NA, NA, 47, NA, 78, 37, NA, 100, NA, 62, 37, 64, 68, NA, 64, 70, NA, 85, 63, 78, 7~
$ BPSys1 <int> 114, NA, 112, 92, 154, 102, 118, NA, 106, 142, 94, 126, NA, NA, 84, NA, 122, 106, NA, NA, NA, 146, 114, 106, 108, NA, 112, NA, NA,~
$ BPDia1 <int> 88, NA, 62, 36, 70, 50, 82, NA, 60, 62, 38, 62, NA, NA, 50, NA, 76, 12, NA, NA, NA, 68, 46, 62, 62, NA, 70, NA, NA, 90, 64, 80, 76~
$ BPSys2 <int> 114, NA, 114, 94, 150, 104, 108, NA, 106, 140, 92, 124, NA, NA, 84, NA, 112, 110, NA, 116, NA, 138, 108, 118, 100, NA, 110, 90, NA~
$ BPDia2 <int> 88, NA, 60, 44, 68, 48, 74, NA, 50, 46, 40, 62, NA, NA, 50, NA, 82, 38, NA, 100, NA, 64, 36, 68, 66, NA, 60, 74, NA, 82, 62, 80, 7~
$ BPSys3 <int> 112, NA, 104, 92, 150, 104, 116, NA, 110, 138, 96, 118, NA, NA, 88, NA, 116, 114, NA, NA, NA, 138, 106, 118, 100, NA, 104, 96, NA,~
$ BPDia3 <int> 82, NA, 58, 38, 68, 50, 76, NA, 56, 40, 50, 58, NA, NA, 44, NA, 74, 36, NA, NA, NA, 60, 38, 60, 70, NA, 68, 66, NA, 88, 64, 76, 76~
$ Testosterone <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
$ DirectChol <dbl> 1.29, NA, 1.55, 1.89, 1.16, 1.16, 1.16, NA, 1.58, 1.94, 1.60, 1.27, NA, NA, 1.34, NA, 1.71, 1.63, 1.34, 0.91, NA, 1.03, 1.55, 2.12~
$ TotChol <dbl> 3.49, NA, 4.97, 4.16, 5.22, 4.14, 6.70, NA, 4.14, 4.71, 2.87, 3.83, NA, NA, 4.86, NA, 4.60, 5.04, 3.59, 4.40, NA, 5.61, 4.09, 5.82~
$ UrineVol1 <int> 352, NA, 281, 139, 30, 202, 77, NA, 39, 128, 109, 38, NA, NA, 123, NA, 315, 290, 60, 137, NA, 70, 238, 106, 153, 248, 60, 56, NA, ~
$ UrineFlow1 <dbl> NA, NA, 0.415, 1.078, 0.476, 0.563, 0.094, NA, 0.300, 1.208, 0.956, 0.197, NA, NA, 1.538, NA, NA, NA, 0.221, 1.223, NA, 0.467, 1.3~
$ UrineVol2 <int> NA, NA, NA, NA, 246, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
$ UrineFlow2 <dbl> NA, NA, NA, NA, 2.510, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
$ Diabetes <fct> No, No, No, No, Yes, No, No, No, No, No, No, Yes, No, No, No, No, No, No, No, Yes, NA, No, No, No, No, No, No, No, No, No, No, No,~
$ DiabetesAge <int> NA, NA, NA, NA, 56, NA, NA, NA, NA, NA, NA, 70, NA, NA, NA, NA, NA, NA, NA, 34, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
$ HealthGen <fct> Good, NA, Vgood, NA, Fair, Good, Good, NA, NA, Excellent, NA, Good, NA, NA, NA, NA, Good, Good, NA, Poor, NA, Good, NA, Vgood, Fai~
$ DaysPhysHlthBad <int> 0, NA, 2, NA, 20, 2, 0, NA, NA, 0, NA, 0, NA, NA, NA, NA, 0, 0, NA, 0, NA, 0, NA, 0, 0, NA, NA, 4, NA, 1, 10, 0, 0, 4, 0, NA, 0, 3~
$ DaysMentHlthBad <int> 15, NA, 0, NA, 25, 14, 10, NA, NA, 0, NA, 0, NA, NA, NA, NA, 18, 2, NA, 30, NA, 0, NA, 3, 0, NA, NA, 30, NA, 1, 0, 0, 0, 0, 0, NA,~
$ LittleInterest <fct> Most, NA, NA, NA, Most, None, Several, NA, NA, None, NA, None, NA, NA, NA, NA, NA, NA, NA, Several, NA, None, NA, None, None, NA, ~
$ Depressed <fct> Several, NA, NA, NA, Most, Most, Several, NA, NA, None, NA, None, NA, NA, NA, NA, NA, NA, NA, Most, NA, None, NA, None, None, NA, ~
$ nPregnancies <int> NA, NA, NA, NA, 1, NA, 2, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, 5, NA, NA,~
$ nBabies <int> NA, NA, NA, NA, 1, NA, 2, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 5, NA, NA~
$ Age1stBaby <int> NA, NA, NA, NA, NA, NA, 27, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 18, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 16, NA~
$ SleepHrsNight <int> 4, NA, 8, NA, 4, 4, 8, NA, NA, 6, NA, 9, NA, 7, NA, NA, 7, NA, NA, 10, NA, 7, NA, 8, 8, NA, NA, 6, NA, 6, 7, 7, 5, 4, 4, NA, 7, 4,~
$ SleepTrouble <fct> Yes, NA, No, NA, No, No, Yes, NA, NA, No, NA, No, NA, No, NA, NA, No, NA, NA, Yes, NA, No, NA, No, No, NA, NA, No, NA, Yes, No, No~
$ PhysActive <fct> No, NA, Yes, NA, No, Yes, No, NA, NA, Yes, NA, No, NA, No, NA, NA, Yes, Yes, NA, No, NA, No, NA, Yes, Yes, NA, NA, Yes, NA, No, Ye~
$ PhysActiveDays <int> NA, NA, 5, NA, NA, 2, NA, NA, NA, 4, NA, NA, NA, NA, NA, NA, 6, 2, NA, NA, NA, NA, NA, 5, 2, NA, NA, 1, NA, NA, 7, NA, 5, 1, NA, N~
$ TVHrsDay <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
$ CompHrsDay <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
$ TVHrsDayChild <int> NA, 4, NA, 1, NA, NA, NA, NA, 1, NA, 3, NA, 2, NA, 5, 2, NA, NA, 2, NA, NA, NA, 1, NA, NA, 3, 1, NA, NA, NA, NA, NA, NA, NA, NA, 4~
$ CompHrsDayChild <int> NA, 1, NA, 1, NA, NA, NA, NA, 0, NA, 0, NA, 1, NA, 0, 0, NA, NA, 6, NA, NA, NA, 6, NA, NA, 1, 0, NA, NA, NA, NA, NA, NA, NA, NA, 3~
$ Alcohol12PlusYr <fct> Yes, NA, NA, NA, No, Yes, Yes, NA, NA, Yes, NA, No, NA, NA, NA, NA, NA, NA, NA, No, NA, Yes, NA, Yes, Yes, NA, NA, NA, NA, Yes, Ye~
$ AlcoholDay <int> NA, NA, NA, NA, NA, 19, 2, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 8, NA, 3, 3, NA, NA, NA, NA, 2, 1, NA, 2, 6, 5, ~
$ AlcoholYear <int> 0, NA, NA, NA, 0, 48, 20, NA, NA, 52, NA, 0, NA, NA, NA, NA, NA, NA, NA, 0, NA, 104, NA, 52, 12, NA, NA, NA, NA, 104, 100, NA, 104~
$ SmokeNow <fct> No, NA, NA, NA, Yes, No, Yes, NA, NA, No, NA, No, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, No, NA, NA, NA, NA, NA, No, NA, ~
$ Smoke100 <fct> Yes, NA, NA, NA, Yes, Yes, Yes, NA, NA, Yes, NA, Yes, NA, No, NA, NA, NA, NA, NA, No, NA, No, NA, No, Yes, NA, NA, NA, NA, No, Yes~
$ SmokeAge <int> 18, NA, NA, NA, 16, 15, 38, NA, NA, 16, NA, 21, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 16, NA, NA, NA, NA, NA, 13, NA, NA~
$ Marijuana <fct> Yes, NA, NA, NA, NA, Yes, Yes, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, No, NA, NA, NA, Yes, No, NA, NA, Yes, NA, Yes, NA, ~
$ AgeFirstMarij <int> 17, NA, NA, NA, NA, 10, 18, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 13, NA, NA, NA, 18, NA, 15, NA, NA, 19~
$ RegularMarij <fct> No, NA, NA, NA, NA, Yes, No, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, No, NA, NA, NA, No, NA, Yes, NA, NA, ~
$ AgeRegMarij <int> NA, NA, NA, NA, NA, 12, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 18, NA, NA, 20~
$ HardDrugs <fct> Yes, NA, NA, NA, No, Yes, Yes, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, No, NA, No, NA, No, No, NA, NA, No, NA, Yes, No, No~
$ SexEver <fct> Yes, NA, NA, NA, Yes, Yes, Yes, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, Yes, NA, Yes, NA, Yes, No, NA, NA, Yes, NA, Yes, Y~
$ SexAge <int> 16, NA, NA, NA, 15, 9, 12, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 18, NA, 15, NA, 13, NA, NA, NA, 15, NA, 15, 17, 15, 22,~
$ SexNumPartnLife <int> 8, NA, NA, NA, 4, 10, 10, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 100, NA, 15, NA, 20, 0, NA, NA, 3, NA, 50, 15, 1, 7, 100~
$ SexNumPartYear <int> 1, NA, NA, NA, NA, 1, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, 0, 0, NA, NA, 1, NA, 3, NA, 1, 1, 1, 2, NA~
$ SameSex <fct> No, NA, NA, NA, No, No, Yes, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, No, NA, No, NA, Yes, No, NA, NA, No, NA, No, No, No, ~
$ SexOrientation <fct> Heterosexual, NA, NA, NA, NA, Heterosexual, Heterosexual, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, Heterosexual, NA, NA, NA~
$ WTINT2YR <dbl> 80100.544, 53901.104, 13953.078, 11664.899, 20090.339, 22537.827, 74212.270, 23306.398, 8056.943, 11998.401, 9805.508, 21806.929, ~
$ WTMEC2YR <dbl> 81528.772, 56995.035, 14509.279, 12041.635, 21000.339, 22633.582, 74112.487, 24776.492, 8175.946, 12381.115, 10232.612, 22502.507,~
$ SDMVPSU <int> 1, 2, 1, 2, 2, 1, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1, 2, 1, 2, 1, 2, 2, 1, 2, 1, 1, 2, 1, 2, 2, 2, 1, 2, 2, 1~
$ SDMVSTRA <int> 83, 79, 84, 86, 75, 88, 85, 86, 88, 77, 86, 79, 84, 77, 88, 89, 81, 79, 88, 75, 80, 75, 77, 78, 76, 78, 86, 82, 87, 79, 86, 82, 77~
$ PregnantNow <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, Unknown, NA, NA, NA, NA, NA, No, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
summary(NHANESraw)
ID SurveyYr Gender Age AgeMonths Race1 Race3 Education MaritalStatus
Min. :51624 2009_10:10537 female:10212 Min. : 0.00 Min. : 0.0 Black :4640 Asian : 1282 8th Grade :1321 Divorced :1250
1st Qu.:56697 2011_12: 9756 male :10081 1st Qu.:10.00 1st Qu.: 90.0 Hispanic:2209 Black : 2683 9 - 11th Grade:1787 LivePartner : 923
Median :61770 Median :28.00 Median :285.0 Mexican :3739 Hispanic: 1076 High School :2595 Married :5869
Mean :61770 Mean :32.02 Mean :351.6 White :7393 Mexican : 1355 Some College :3399 NeverMarried:2287
3rd Qu.:66843 3rd Qu.:53.00 3rd Qu.:592.0 Other :2312 White : 2973 College Grad :2656 Separated : 411
Max. :71916 Max. :80.00 Max. :959.0 Other : 387 NA's :8535 Widowed :1027
NA's :9555 NA's :10537 NA's :8526
HHIncome HHIncomeMid Poverty HomeRooms HomeOwn Work Weight Length HeadCirc
more 99999 :2892 Min. : 2500 Min. :0.000 Min. : 1.000 Own :10939 Looking : 576 Min. : 2.70 Min. : 45.00 Min. :32.50
25000-34999:2483 1st Qu.: 22500 1st Qu.:0.890 1st Qu.: 4.000 Rent : 8715 NotWorking:5890 1st Qu.: 37.70 1st Qu.: 70.70 1st Qu.:39.40
35000-44999:1789 Median : 40000 Median :1.650 Median : 6.000 Other: 502 Working :6594 Median : 65.70 Median : 83.90 Median :41.50
75000-99999:1697 Mean : 47386 Mean :2.217 Mean : 5.806 NA's : 137 NA's :7233 Mean : 62.45 Mean : 82.17 Mean :41.22
20000-24999:1682 3rd Qu.: 87500 3rd Qu.:3.510 3rd Qu.: 7.000 3rd Qu.: 83.60 3rd Qu.: 93.60 3rd Qu.:43.10
(Other) :7674 Max. :100000 Max. :5.000 Max. :13.000 Max. :239.40 Max. :115.60 Max. :48.40
NA's :2076 NA's :2076 NA's :1836 NA's :145 NA's :888 NA's :18008 NA's :19819
Height BMI BMICatUnder20yrs BMI_WHO Pulse BPSysAve BPDiaAve BPSys1 BPDia1
Min. : 79.1 Min. :12.40 UnderWeight: 126 12.0_18.5 :3641 Min. : 0.00 Min. : 74.0 Min. : 0.0 Min. : 72.0 Min. : 0.00
1st Qu.:149.8 1st Qu.:19.79 NormWeight : 2155 18.5_to_24.9:5354 1st Qu.: 66.00 1st Qu.:105.0 1st Qu.: 58.0 1st Qu.:106.0 1st Qu.: 58.00
Median :162.4 Median :24.92 OverWeight : 481 25.0_to_29.9:4387 Median : 74.00 Median :115.0 Median : 67.0 Median :116.0 Median : 68.00
Mean :156.0 Mean :25.65 Obese : 593 30.0_plus :4565 Mean : 74.08 Mean :118.1 Mean : 65.6 Mean :119.2 Mean : 66.52
3rd Qu.:171.6 3rd Qu.:30.10 NA's :16938 NA's :2346 3rd Qu.: 82.00 3rd Qu.:127.0 3rd Qu.: 75.0 3rd Qu.:128.0 3rd Qu.: 76.00
Max. :204.5 Max. :84.87 Max. :172.00 Max. :233.0 Max. :131.0 Max. :238.0 Max. :134.00
NA's :2258 NA's :2279 NA's :5397 NA's :5426 NA's :5426 NA's :6008 NA's :6008
BPSys2 BPDia2 BPSys3 BPDia3 Testosterone DirectChol TotChol UrineVol1 UrineFlow1
Min. : 74.0 Min. : 0.00 Min. : 74.0 Min. : 0.0 Min. : 0.25 Min. :0.280 Min. : 1.530 Min. : 0.0 Min. : 0.000
1st Qu.:106.0 1st Qu.: 58.00 1st Qu.:104.0 1st Qu.: 58.0 1st Qu.: 14.99 1st Qu.:1.090 1st Qu.: 4.010 1st Qu.: 47.0 1st Qu.: 0.368
Median :116.0 Median : 68.00 Median :116.0 Median : 66.0 Median : 35.88 Median :1.320 Median : 4.650 Median : 88.0 Median : 0.638
Mean :118.5 Mean : 65.74 Mean :117.8 Mean : 65.4 Mean : 184.94 Mean :1.361 Mean : 4.771 Mean :113.6 Mean : 0.910
3rd Qu.:128.0 3rd Qu.: 76.00 3rd Qu.:128.0 3rd Qu.: 76.0 3rd Qu.: 342.65 3rd Qu.:1.580 3rd Qu.: 5.430 3rd Qu.:156.0 3rd Qu.: 1.122
Max. :234.0 Max. :134.00 Max. :232.0 Max. :128.0 Max. :2543.99 Max. :4.630 Max. :13.650 Max. :524.0 Max. :39.800
NA's :5812 NA's :5812 NA's :5788 NA's :5788 NA's :13467 NA's :5458 NA's :5459 NA's :4210 NA's :5603
UrineVol2 UrineFlow2 Diabetes DiabetesAge HealthGen DaysPhysHlthBad DaysMentHlthBad LittleInterest Depressed
Min. : 0.0 Min. : 0.000 No :17754 Min. : 1.0 Excellent:1309 Min. : 0.000 Min. : 0.000 None :7825 None :7926
1st Qu.: 43.0 1st Qu.: 0.437 Yes : 1706 1st Qu.:40.0 Vgood :3461 1st Qu.: 0.000 1st Qu.: 0.000 Several:1790 Several:1774
Median : 82.0 Median : 0.724 NA's: 833 Median :50.0 Good :4959 Median : 0.000 Median : 0.000 Most : 893 Most : 814
Mean :112.2 Mean : 1.136 Mean :49.5 Fair :2284 Mean : 3.719 Mean : 4.151 NA's :9785 NA's :9779
3rd Qu.:160.0 3rd Qu.: 1.445 3rd Qu.:60.0 Poor : 436 3rd Qu.: 3.000 3rd Qu.: 4.000
Max. :420.0 Max. :62.333 Max. :80.0 NA's :7844 Max. :30.000 Max. :30.000
NA's :17585 NA's :17596 NA's :18856 NA's :7862 NA's :7867
nPregnancies nBabies Age1stBaby SleepHrsNight SleepTrouble PhysActive PhysActiveDays TVHrsDay CompHrsDay
Min. : 1.000 Min. : 0.000 Min. :14.00 Min. : 2.000 No :10077 No :6901 Min. : 1.000 2_hr : 2389 0_hrs : 2586
1st Qu.: 2.000 1st Qu.: 2.000 1st Qu.:18.00 1st Qu.: 6.000 Yes : 2981 Yes :7377 1st Qu.: 2.000 1_hr : 1616 0_to_1_hr: 2354
Median : 3.000 Median : 2.000 Median :21.00 Median : 7.000 NA's: 7235 NA's:6015 Median : 3.000 3_hr : 1533 1_hr : 1646
Mean : 3.428 Mean : 2.772 Mean :21.73 Mean : 6.891 Mean : 3.783 More_4_hr: 1275 2_hr : 1129
3rd Qu.: 4.000 3rd Qu.: 3.000 3rd Qu.:24.00 3rd Qu.: 8.000 3rd Qu.: 5.000 0_to_1_hr: 1175 3_hr : 562
Max. :32.000 Max. :17.000 Max. :39.00 Max. :12.000 Max. :99.000 (Other) : 1077 (Other) : 797
NA's :16091 NA's :16354 NA's :17135 NA's :7261 NA's :12918 NA's :11228 NA's :11219
TVHrsDayChild CompHrsDayChild Alcohol12PlusYr AlcoholDay AlcoholYear SmokeNow Smoke100 SmokeAge Marijuana AgeFirstMarij
Min. : 0.000 Min. : 0.000 No :2820 Min. : 1.000 Min. : 0.00 No : 2779 No :6536 Min. : 6.00 No : 3353 Min. : 0.00
1st Qu.: 1.000 1st Qu.: 0.000 Yes :7483 1st Qu.: 1.000 1st Qu.: 1.00 Yes : 2454 Yes :5235 1st Qu.:15.00 Yes : 3719 1st Qu.:15.00
Median : 2.000 Median : 1.000 NA's:9990 Median : 2.000 Median : 12.00 NA's:15060 NA's:8522 Median :17.00 NA's:13221 Median :16.00
Mean : 2.134 Mean : 2.647 Mean : 2.968 Mean : 64.36 Mean :18.06 Mean :16.96
3rd Qu.: 3.000 3rd Qu.: 6.000 3rd Qu.: 4.000 3rd Qu.:104.00 3rd Qu.:20.00 3rd Qu.:18.00
Max. :99.000 Max. :77.000 Max. :82.000 Max. :364.00 Max. :72.00 Max. :56.00
NA's :18065 NA's :18065 NA's :13300 NA's :11462 NA's :15244 NA's :16579
RegularMarij AgeRegMarij HardDrugs SexEver SexAge SexNumPartnLife SexNumPartYear SameSex SexOrientation
No : 1892 Min. : 0.00 No : 7207 No : 471 Min. : 9.00 Min. : 0.00 Min. : 0.000 No : 8057 Bisexual : 202
Yes : 1820 1st Qu.:15.00 Yes : 1434 Yes : 8167 1st Qu.:15.00 1st Qu.: 2.00 1st Qu.: 1.000 Yes : 579 Heterosexual: 6534
NA's:16581 Median :17.00 NA's:11652 NA's:11655 Median :17.00 Median : 5.00 Median : 1.000 NA's:11657 Homosexual : 111
Mean :17.63 Mean :17.39 Mean : 14.52 Mean : 1.382 NA's :13446
3rd Qu.:19.00 3rd Qu.:19.00 3rd Qu.: 12.00 3rd Qu.: 1.000
Max. :52.00 Max. :55.00 Max. :2000.00 Max. :100.000
NA's :18473 NA's :12157 NA's :11761 NA's :13253
WTINT2YR WTMEC2YR SDMVPSU SDMVSTRA PregnantNow
Min. : 3280 Min. : 0 Min. :1.000 Min. : 75.00 Yes : 125
1st Qu.: 11709 1st Qu.: 11622 1st Qu.:1.000 1st Qu.: 81.00 No : 2332
Median : 18913 Median : 18971 Median :2.000 Median : 88.00 Unknown: 156
Mean : 29987 Mean : 29987 Mean :1.594 Mean : 88.38 NA's :17680
3rd Qu.: 34954 3rd Qu.: 35312 3rd Qu.:2.000 3rd Qu.: 95.00
Max. :220233 Max. :222580 Max. :3.000 Max. :103.00
The glimpse() output shows that NHANESraw contains many health measurement variables. It also contains the sampling weight variable WTMEC2YR.
Because NHANESraw spans 4 years (2009-2012) and the sampling weights are based on 2 years of data, we first need to create a weight variable that scales the sample across the full 4 years. As provided, the weights sum to twice the US population, so we halve the 2-year weights so that, in total, the weights sum to the US population.
NHANES oversamples certain geographic areas and certain minority groups. Examining the distribution of sampling weights by race shows that White participants are undersampled and have higher weights, while the oversampled Black, Mexican, and Hispanic participants have lower weights, because each sampled person in these minority groups represents fewer people in the US.
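The halving argument can be checked on a small made-up example. The sketch below assumes two survey cycles whose 2-year weights each sum to the same hypothetical population of 1,000; the column names `cycle`, `wt2yr`, and `wt4yr` are illustrative and not part of NHANESraw.

```r
# Hypothetical two-cycle sample: within each cycle the 2-year weights
# sum to the same population size (1,000)
toy <- data.frame(
  cycle = c("2009_10", "2009_10", "2011_12", "2011_12"),
  wt2yr = c(600, 400, 700, 300)
)

# Each cycle on its own represents the whole population
per_cycle <- tapply(toy$wt2yr, toy$cycle, sum)
per_cycle

# Pooling both cycles counts the population twice
pooled_total <- sum(toy$wt2yr)

# Halving the 2-year weights gives 4-year weights that count it once
toy$wt4yr <- toy$wt2yr / 2
four_year_total <- sum(toy$wt4yr)

c(pooled = pooled_total, four_year = four_year_total)
```

Dividing by 2 is appropriate here only because exactly two equally long cycles are pooled; a different number of cycles would need a different divisor.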
# Load the ggplot2 package
library(ggplot2)
# Use mutate to create a 4-year weight variable and call it WTMEC4YR
NHANESraw <- NHANESraw %>% mutate(WTMEC4YR = WTMEC2YR/2)
# Calculate the sum of this weight variable
NHANESraw %>% summarize(total_weight = sum(WTMEC4YR))
# Plot the sample weights using boxplots, with Race1 on the x-axis
ggplot(NHANESraw, aes(x = Race1, y = WTMEC4YR, fill = Race1)) + geom_boxplot()
Now we will use the survey package in R to specify the complex survey design that we will use in later analyses. We need to specify the design so that the weights and the sampling design are used correctly in statistical models.
NHANESraw contains a strata variable, SDMVSTRA, and a cluster id variable (also known as the primary sampling unit, PSU), SDMVPSU, which account for the design effects of clustering. These clusters (PSUs) are nested within strata.
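What "nested within strata" means can be seen on a toy table. This sketch uses hypothetical stratum and PSU labels to show that the same PSU label in two different strata refers to two different clusters, which is why a design specification has to be told that the ids are nested.

```r
# Hypothetical design variables: PSU labels 1 and 2 are reused in every stratum
toy <- data.frame(
  stratum = c(75, 75, 75, 76, 76, 76),
  psu     = c(1, 2, 1, 1, 2, 2)
)

# Read naively, there appear to be only two clusters
n_labels <- length(unique(toy$psu))

# With nesting, a cluster is identified by stratum AND PSU together
toy$cluster <- interaction(toy$stratum, toy$psu, drop = TRUE)
n_clusters <- nlevels(toy$cluster)

c(psu_labels = n_labels, nested_clusters = n_clusters)
```

In the survey package, passing `nest = TRUE` to `svydesign()` requests this kind of relabelling, so PSU 1 in one stratum is not confused with PSU 1 in another.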
# Load the survey package
library(survey)
Loading required package: grid
Loading required package: Matrix
Loading required package: survival
Attaching package: ‘survey’
The following object is masked from ‘package:graphics’:
dotchart
# Specify the survey design
nhanes_design <- svydesign(
data = NHANESraw,
strata = ~SDMVSTRA,
id = ~SDMVPSU,
nest = TRUE,
weights = ~WTMEC4YR)
# Print a summary of this design
summary(nhanes_design)
Stratified 1 - level Cluster Sampling design (with replacement)
With (62) clusters.
svydesign(data = NHANESraw, strata = ~SDMVSTRA, id = ~SDMVPSU,
nest = TRUE, weights = ~WTMEC4YR)
Probabilities:
Min. 1st Qu. Median Mean 3rd Qu. Max.
8.986e-06 5.664e-05 1.054e-04 Inf 1.721e-04 Inf
Stratum Sizes:
75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103
obs 803 785 823 829 696 751 696 724 713 683 592 946 598 647 251 862 998 875 602 688 722 676 608 708 682 700 715 624 296
design.PSU 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 3 3 3 2 2 2 2 2 2 2 2 2 2 2
actual.PSU 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 3 3 3 2 2 2 2 2 2 2 2 2 2 2
Data variables:
[1] "ID" "SurveyYr" "Gender" "Age" "AgeMonths" "Race1" "Race3" "Education"
[9] "MaritalStatus" "HHIncome" "HHIncomeMid" "Poverty" "HomeRooms" "HomeOwn" "Work" "Weight"
[17] "Length" "HeadCirc" "Height" "BMI" "BMICatUnder20yrs" "BMI_WHO" "Pulse" "BPSysAve"
[25] "BPDiaAve" "BPSys1" "BPDia1" "BPSys2" "BPDia2" "BPSys3" "BPDia3" "Testosterone"
[33] "DirectChol" "TotChol" "UrineVol1" "UrineFlow1" "UrineVol2" "UrineFlow2" "Diabetes" "DiabetesAge"
[41] "HealthGen" "DaysPhysHlthBad" "DaysMentHlthBad" "LittleInterest" "Depressed" "nPregnancies" "nBabies" "Age1stBaby"
[49] "SleepHrsNight" "SleepTrouble" "PhysActive" "PhysActiveDays" "TVHrsDay" "CompHrsDay" "TVHrsDayChild" "CompHrsDayChild"
[57] "Alcohol12PlusYr" "AlcoholDay" "AlcoholYear" "SmokeNow" "Smoke100" "SmokeAge" "Marijuana" "AgeFirstMarij"
[65] "RegularMarij" "AgeRegMarij" "HardDrugs" "SexEver" "SexAge" "SexNumPartnLife" "SexNumPartYear" "SameSex"
[73] "SexOrientation" "WTINT2YR" "WTMEC2YR" "SDMVPSU" "SDMVSTRA" "PregnantNow" "WTMEC4YR"
Analyzing survey data requires careful attention to the sampling design and the weights at every step. Something as simple as filtering the data becomes complicated once weights are involved.
When we want to examine a subset of the data (for example, the subpopulation of Hispanic adults with diabetes, or pregnant women), we must specify this explicitly in the design. We cannot simply drop the other rows by filtering the raw data, because the survey weights would no longer be correct and would no longer add up to the full US population.
BMI categories are different for children and young adults under 20 years old, so we will subset the data to analyze only adults aged 20 and over.
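One way to see why filtering the raw rows is risky is to count the PSUs that remain in each stratum afterwards. The sketch below uses a small hypothetical data set (all values invented) in which filtering to adults removes an entire PSU, and then shows the alternative of keeping every row and marking the subpopulation with an indicator.

```r
# Hypothetical survey rows: 2 strata, 2 PSUs each
toy <- data.frame(
  stratum = c(1, 1, 1, 1, 2, 2, 2, 2),
  psu     = c(1, 1, 2, 2, 1, 1, 2, 2),
  age     = c(34, 8, 12, 15, 45, 60, 10, 52),
  bmi     = c(30, 16, 18, 20, 28, 26, 17, 24),
  weight  = c(100, 50, 50, 50, 200, 100, 50, 100)
)

count_psus <- function(d) {
  tapply(d$psu, d$stratum, function(p) length(unique(p)))
}

# Full design: every stratum has 2 PSUs
psu_full <- count_psus(toy)

# Filtering raw rows: stratum 1 loses a whole PSU (it contained no adults)
adults <- toy[toy$age >= 20, ]
psu_filtered <- count_psus(adults)

rbind(full = psu_full, filtered = psu_filtered)

# Domain approach: keep all rows and flag the subpopulation instead
in_domain <- toy$age >= 20
domain_mean <- sum(toy$weight * toy$bmi * in_domain) /
  sum(toy$weight * in_domain)
domain_mean
```

Calling `subset()` on the design object, as the notebook does next, follows the second approach: the design structure stays intact while estimates are restricted to the subpopulation.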
# Select adults of Age >= 20 with subset
nhanes_adult <- subset(nhanes_design, Age >= 20)
# Print a summary of this subset
summary(nhanes_adult)
Stratified 1 - level Cluster Sampling design (with replacement)
With (62) clusters.
subset(nhanes_design, Age >= 20)
Probabilities:
Min. 1st Qu. Median Mean 3rd Qu. Max.
8.986e-06 4.303e-05 8.107e-05 Inf 1.240e-04 Inf
Stratum Sizes:
75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103
obs 471 490 526 500 410 464 447 400 411 395 357 512 327 355 153 509 560 483 376 368 454 362 315 414 409 377 460 308 165
design.PSU 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 3 3 3 2 2 2 2 2 2 2 2 2 2 2
actual.PSU 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 3 3 3 2 2 2 2 2 2 2 2 2 2 2
Data variables:
[1] "ID" "SurveyYr" "Gender" "Age" "AgeMonths" "Race1" "Race3" "Education"
[9] "MaritalStatus" "HHIncome" "HHIncomeMid" "Poverty" "HomeRooms" "HomeOwn" "Work" "Weight"
[17] "Length" "HeadCirc" "Height" "BMI" "BMICatUnder20yrs" "BMI_WHO" "Pulse" "BPSysAve"
[25] "BPDiaAve" "BPSys1" "BPDia1" "BPSys2" "BPDia2" "BPSys3" "BPDia3" "Testosterone"
[33] "DirectChol" "TotChol" "UrineVol1" "UrineFlow1" "UrineVol2" "UrineFlow2" "Diabetes" "DiabetesAge"
[41] "HealthGen" "DaysPhysHlthBad" "DaysMentHlthBad" "LittleInterest" "Depressed" "nPregnancies" "nBabies" "Age1stBaby"
[49] "SleepHrsNight" "SleepTrouble" "PhysActive" "PhysActiveDays" "TVHrsDay" "CompHrsDay" "TVHrsDayChild" "CompHrsDayChild"
[57] "Alcohol12PlusYr" "AlcoholDay" "AlcoholYear" "SmokeNow" "Smoke100" "SmokeAge" "Marijuana" "AgeFirstMarij"
[65] "RegularMarij" "AgeRegMarij" "HardDrugs" "SexEver" "SexAge" "SexNumPartnLife" "SexNumPartYear" "SameSex"
[73] "SexOrientation" "WTINT2YR" "WTMEC2YR" "SDMVPSU" "SDMVSTRA" "PregnantNow" "WTMEC4YR"
# Compare the number of observations in the full data to the adult data
nrow(nhanes_design)
[1] 20293
nrow(nhanes_adult)
[1] 11778
With survey methods, we can use the sampling weights to estimate the true distribution of a measurement in the whole population. This works for many statistics, such as means, proportions, and standard deviations.
We will use survey methods to estimate the mean BMI in the US adult population and to draw a weighted histogram of its distribution.
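As a rough sketch of what a weighted mean does, the example below compares the unweighted and weighted means on four invented observations. The numbers are hypothetical; only the formula, sum(w * x) / sum(w) over the non-missing values, is the point.

```r
# Hypothetical observations: BMI values with sampling weights
toy <- data.frame(
  bmi    = c(22, 31, NA, 27),
  weight = c(3000, 1000, 2000, 4000)
)

# Drop rows with missing BMI, mirroring na.rm = TRUE
keep <- !is.na(toy$bmi)
x <- toy$bmi[keep]
w <- toy$weight[keep]

# Unweighted mean treats every respondent equally
unweighted <- mean(x)

# Weighted mean: each respondent counts in proportion to their weight
weighted <- sum(w * x) / sum(w)

# Base R's weighted.mean() applies the same formula
weighted_check <- weighted.mean(x, w)

c(unweighted = unweighted, weighted = weighted, check = weighted_check)
```

A design-based function such as `svymean()` adds a standard error that accounts for strata and clusters, which this plain formula cannot provide.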
# Compute the unweighted mean BMI of adults in the raw NHANESraw data
bmi_mean_raw <- NHANESraw %>%
filter(Age >= 20) %>%
summarize(mean(BMI, na.rm = TRUE))
bmi_mean_raw
# Compute the mean BMI of US adults, taking the sampling weights into account
bmi_mean <- svymean(~BMI, design = nhanes_adult, na.rm = TRUE)
bmi_mean
mean SE
BMI 28.734 0.1235
# Draw a weighted histogram of BMI for the US adult population and mark the estimated mean BMI
NHANESraw %>%
filter(Age >= 20) %>%
ggplot(mapping = aes(x = BMI, weight = WTMEC4YR)) +
geom_histogram()+
geom_vline(xintercept = coef(bmi_mean), color="red")
# Compute the unweighted standard deviation of adult BMI
adults_raw <- NHANESraw %>%
  filter(Age >= 20)
sd(adults_raw$BMI, na.rm = TRUE)
[1] 6.870053
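The standard deviation above ignores the sampling weights. A weighted counterpart can be sketched as follows, using invented values and a simple plug-in formula in which squared deviations from the weighted mean are averaged using the weights. This is an illustrative assumption, not the notebook's method; the survey package's `svyvar()` is the design-based route and may use a slightly different denominator.

```r
# Hypothetical BMI values and sampling weights
x <- c(22, 31, 27, 35)
w <- c(3000, 1000, 4000, 2000)

# Weighted mean
w_mean <- sum(w * x) / sum(w)

# Plug-in weighted variance: weighted average of squared deviations
w_var <- sum(w * (x - w_mean)^2) / sum(w)

# Weighted standard deviation
w_sd <- sqrt(w_var)

c(mean = w_mean, variance = w_var, sd = w_sd)
```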
Visualizing the distribution of BMI categories by race can help us find the mode, that is, the most common BMI category within each race. A stacked bar chart is a convenient way to do this.
In the stacked bar chart, the x-axis shows the BMI category and the y-axis shows the frequency or proportion of individuals in each BMI category. Each bar segment carries a label with the number of people. By grouping the data by race, we can see how the distribution of BMI categories differs between racial groups.
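The numbers behind such a chart can be sketched as a weighted cross-tabulation. The example below uses invented data and assumed cut points (18.5, 25, 30, with intervals closed on the left), which are not necessarily the thresholds used in the notebook's own code; the group labels "A" and "B" are placeholders.

```r
# Hypothetical respondents with a group label, BMI, and sampling weight
toy <- data.frame(
  race   = c("A", "A", "A", "B", "B", "B"),
  bmi    = c(17.0, 23.5, 31.0, 27.0, 28.5, 33.0),
  weight = c(100, 300, 200, 150, 250, 100)
)

# Assumed cut points; right = FALSE gives intervals [lower, upper)
toy$bmi_cat <- cut(
  toy$bmi,
  breaks = c(0, 18.5, 25, 30, Inf),
  labels = c("underweight", "normal", "overweight", "obese"),
  right  = FALSE
)

# Weighted counts: sum of weights in each category-by-race cell
wt_counts <- xtabs(weight ~ bmi_cat + race, data = toy)

# Proportions within each race (columns sum to 1)
wt_props <- prop.table(wt_counts, margin = 2)

# Weighted mode: the category with the largest weighted count per race
modal_cat <- sapply(colnames(wt_counts), function(r) {
  rownames(wt_counts)[which.max(wt_counts[, r])]
})

wt_props
modal_cat
```

The columns of `wt_props` correspond to the stacked segments of one bar per group when the chart is drawn on a proportion scale.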
library("purrr")
NHANESrawAdult <- NHANESraw[NHANESraw$Age >= 20 & !is.na(NHANESraw$BMI), ]
bmi = NHANESrawAdult$BMI
ras = NHANESrawAdult$Race1
weight = NHANESrawAdult$WTMEC4YR
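# BMI category break points (fixed thresholds, not sample quantiles)
quantiles <- c(0,18.4,24,29,Inf)
# Bin BMI into categories with cut(); intervals are right-closed by default
bmiCat <- cut(bmi, quantiles, c("underWeight", "Normweight", "overweight", "obese"))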
bmiCat
[1] obese obese obese obese overweight overweight obese overweight overweight obese obese Normweight overweight Normweight overweight
[16] obese obese obese overweight overweight overweight overweight obese overweight overweight obese Normweight Normweight obese overweight
[31] overweight obese Normweight Normweight overweight obese Normweight obese overweight obese overweight obese Normweight overweight obese
[46] obese overweight overweight obese overweight Normweight overweight obese obese obese obese overweight overweight Normweight Normweight
[61] obese Normweight obese Normweight Normweight overweight overweight overweight obese overweight obese obese obese overweight obese
[76] obese overweight overweight obese Normweight obese obese overweight obese overweight overweight overweight obese overweight Normweight
[91] obese overweight overweight overweight overweight obese overweight Normweight obese obese overweight obese obese overweight Normweight
[106] overweight overweight Normweight Normweight obese obese Normweight obese underWeight overweight obese obese obese Normweight Normweight
[121] Normweight Normweight Normweight obese obese obese obese obese Normweight overweight overweight Normweight overweight overweight overweight
[136] Normweight overweight obese Normweight obese obese obese overweight obese Normweight overweight Normweight Normweight overweight overweight
[151] obese obese obese overweight overweight Normweight overweight obese obese obese Normweight Normweight obese Normweight obese
[166] obese overweight Normweight obese obese obese obese obese Normweight obese obese obese overweight overweight obese
[181] obese obese obese overweight obese obese obese overweight obese overweight Normweight Normweight obese Normweight Normweight
[196] overweight obese overweight obese obese obese obese overweight obese overweight obese Normweight obese overweight overweight
[211] overweight Normweight obese overweight obese obese obese overweight obese overweight obese overweight Normweight Normweight overweight
[226] obese obese overweight obese underWeight obese overweight overweight obese obese Normweight overweight obese obese obese
[241] overweight obese overweight obese overweight obese obese Normweight obese obese overweight overweight obese obese overweight
[256] Normweight overweight obese obese obese Normweight obese obese overweight obese overweight overweight obese obese overweight
[271] obese obese obese overweight obese obese Normweight obese obese Normweight Normweight overweight Normweight obese obese
[286] obese obese overweight obese obese obese overweight obese obese Normweight Normweight overweight obese Normweight Normweight
[301] obese overweight obese obese obese overweight obese obese overweight obese overweight obese obese obese overweight
[316] Normweight obese overweight overweight obese Normweight obese overweight obese obese obese Normweight obese overweight obese
[331] Normweight obese obese Normweight Normweight overweight overweight overweight obese overweight Normweight obese obese obese obese
[346] Normweight Normweight obese obese overweight obese overweight obese obese Normweight overweight Normweight Normweight Normweight overweight
[361] overweight Normweight overweight overweight overweight overweight obese obese Normweight obese obese overweight obese overweight obese
[376] obese overweight overweight obese Normweight obese overweight obese obese obese overweight Normweight overweight Normweight Normweight
[391] Normweight Normweight overweight Normweight Normweight overweight obese overweight overweight Normweight obese overweight overweight overweight overweight
[406] overweight Normweight Normweight Normweight overweight overweight overweight obese obese obese overweight obese obese overweight obese
[421] Normweight obese Normweight overweight overweight overweight obese Normweight overweight obese overweight obese Normweight overweight obese
[436] obese Normweight Normweight overweight overweight Normweight overweight obese overweight obese obese overweight Normweight overweight overweight
[451] Normweight Normweight obese obese obese overweight overweight obese overweight overweight obese obese obese obese obese
[466] Normweight obese obese Normweight obese overweight obese obese obese obese overweight Normweight obese obese overweight
[481] overweight obese overweight overweight Normweight obese obese obese Normweight obese obese overweight obese Normweight Normweight
[496] overweight obese Normweight obese Normweight obese overweight overweight overweight Normweight overweight obese obese obese overweight
[511] overweight obese obese overweight obese obese overweight Normweight obese overweight overweight obese overweight obese overweight
[526] overweight obese Normweight obese Normweight Normweight overweight Normweight overweight obese overweight obese overweight obese overweight
[541] overweight Normweight overweight underWeight obese overweight overweight obese obese obese obese overweight obese Normweight obese
[556] overweight Normweight obese obese Normweight overweight obese overweight overweight Normweight obese obese Normweight overweight overweight
[571] obese overweight Normweight obese overweight Normweight obese obese overweight overweight Normweight overweight overweight obese overweight
[586] obese Normweight obese obese obese obese overweight obese obese Normweight obese overweight overweight obese obese
[601] obese obese obese obese obese Normweight overweight obese Normweight obese Normweight obese Normweight overweight obese
[616] overweight overweight obese obese underWeight obese obese Normweight obese obese obese obese overweight obese obese
[631] overweight overweight obese overweight overweight Normweight obese obese Normweight Normweight obese obese obese overweight obese
[646] overweight overweight Normweight overweight obese obese obese Normweight obese obese obese overweight obese obese obese
[661] overweight obese overweight obese obese Normweight Normweight Normweight obese obese Normweight obese overweight obese obese
[676] obese obese obese Normweight obese Normweight obese obese overweight overweight obese Normweight Normweight obese obese
[691] overweight obese obese overweight overweight obese Normweight obese overweight Normweight obese obese Normweight obese overweight
[706] obese Normweight obese obese obese obese Normweight obese overweight obese obese obese overweight overweight overweight
[721] Normweight obese Normweight obese obese Normweight overweight overweight obese obese obese obese obese obese overweight
[736] obese Normweight obese Normweight Normweight obese obese overweight overweight obese overweight overweight obese obese obese
[751] obese overweight overweight obese overweight overweight obese obese obese overweight overweight overweight Normweight obese obese
[766] obese overweight overweight obese Normweight Normweight obese obese obese obese obese Normweight Normweight obese obese
[781] overweight Normweight Normweight Normweight overweight overweight Normweight obese Normweight overweight obese overweight underWeight Normweight Normweight
[796] obese obese obese overweight overweight obese obese overweight Normweight Normweight Normweight overweight overweight overweight overweight
[811] overweight obese obese overweight overweight overweight obese obese obese obese Normweight overweight obese obese overweight
[826] overweight obese overweight overweight Normweight Normweight overweight obese obese obese overweight Normweight overweight obese obese
[841] overweight Normweight overweight obese Normweight obese obese obese Normweight obese obese obese overweight Normweight Normweight
[856] overweight overweight overweight overweight underWeight Normweight overweight obese overweight obese overweight Normweight overweight obese Normweight
[871] obese obese overweight Normweight overweight overweight obese Normweight Normweight Normweight overweight overweight Normweight obese Normweight
[886] obese overweight obese overweight Normweight Normweight obese obese obese obese overweight overweight overweight obese obese
[901] overweight overweight obese obese overweight obese overweight obese obese obese overweight Normweight obese overweight overweight
[916] overweight overweight overweight obese obese obese Normweight obese overweight overweight overweight overweight obese obese obese
[931] overweight overweight obese overweight obese overweight overweight obese obese Normweight overweight overweight overweight Normweight obese
[946] obese overweight obese obese obese overweight obese obese obese Normweight obese Normweight Normweight Normweight obese
[961] obese obese overweight overweight obese Normweight obese overweight overweight obese overweight Normweight obese Normweight obese
[976] overweight overweight overweight obese overweight Normweight Normweight obese Normweight obese Normweight obese obese obese overweight
[991] Normweight overweight overweight obese Normweight obese overweight overweight obese overweight
[ reached getOption("max.print") -- omitted 10231 entries ]
Levels: underWeight Normweight overweight obese
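# Combine group, BMI category, and sampling weight into one data frame
data <- data.frame(Ras = ras, BMI_Category = bmiCat, population = weight)
# Cross-tabulate the unweighted counts and append a row-total column
cross_table <- table(data$Ras, data$BMI_Category)
cross_table_with_total <- addmargins(cross_table, margin = 2)
# Stacked proportions per group, labeled with the underlying counts
ggplot(data, aes(x = Ras, fill = BMI_Category)) +
  geom_bar(position = "fill") +
  ylab("proportion") +
  stat_count(geom = "text",
             aes(label = after_stat(count)),
             position = position_fill(vjust = 0.5), colour = "white")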
cross_table_with_total
           underWeight Normweight overweight obese  Sum
  Black             45        425        710  1289 2469
  Hispanic           7        221        415   505 1148
  Mexican            9        215        576   805 1605
  White             86       1135       1611  1950 4782
  Other             43        510        430   244 1227
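The filled bar chart shows within-group proportions, while the table prints raw counts. As a minimal sketch, the printed counts can be re-entered by hand and converted to row proportions with `prop.table()`. The object names `counts` and `row_prop` are illustrative and do not come from the notebook. These are unweighted sample counts, so the proportions describe the sample rather than the U.S. population.

```r
# Counts copied from the cross table printed above (unweighted sample counts)
counts <- matrix(
  c(45,  425,  710, 1289,
     7,  221,  415,  505,
     9,  215,  576,  805,
    86, 1135, 1611, 1950,
    43,  510,  430,  244),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    Ras = c("Black", "Hispanic", "Mexican", "White", "Other"),
    BMI_Category = c("underWeight", "Normweight", "overweight", "obese")
  )
)

# Row proportions: share of each BMI category within each group
row_prop <- prop.table(counts, margin = 1)
round(row_prop, 3)
```

Each row of `row_prop` sums to 1, which matches the height of each stacked bar in the plot.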
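A boxplot is a graphical summary of a numeric variable based on its quartiles. The box spans the first and third quartiles (Q1 and Q3), with a line at the median, and the whiskers extend toward the minimum and maximum. In ggplot2's geom_boxplot(), each whisker stops at the most extreme value within 1.5 times the interquartile range of the box, and points beyond the whiskers are drawn individually as outliers.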
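The pieces of a boxplot can be computed by hand, as in the following sketch. It uses toy values chosen only for illustration (not NHANES data), and the variable names are hypothetical. Base R's `boxplot.stats()` reports hinges rather than `quantile()` quartiles, so the two can differ slightly for other sample sizes.

```r
# Toy values for illustration only (not NHANES data)
x <- c(19, 21, 23, 25, 27, 29, 31, 33, 55)

# Quartiles that define the box
q <- quantile(x, probs = c(0.25, 0.5, 0.75))

# Fences at 1.5 * IQR beyond the box
iqr <- q[["75%"]] - q[["25%"]]
lower_fence <- q[["25%"]] - 1.5 * iqr
upper_fence <- q[["75%"]] + 1.5 * iqr

# Whiskers end at the most extreme observations inside the fences
whiskers <- range(x[x >= lower_fence & x <= upper_fence])
outliers <- x[x < lower_fence | x > upper_fence]

# boxplot.stats() from base R (grDevices) reports the same kind of summary
bs <- boxplot.stats(x)
bs$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out    # points beyond the whiskers
```

Here the value 55 lies above the upper fence, so it would be drawn as a separate point rather than included in the whisker.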
To visualize the relationship between BMI and physical activity, we draw a boxplot of BMI for adults aged 20 and over, split by the two levels of PhysActive: active and inactive. In the plot, the median BMI of the active group is lower than that of the inactive group, which suggests that physically active adults tend to have a lower BMI than inactive adults. This plot uses the raw sample without sampling weights.
# Boxplot of BMI by physical activity for adults aged 20 and over
NHANESraw %>%
  filter(Age >= 20) %>%
  ggplot(mapping = aes(x = PhysActive, y = BMI)) +
  geom_boxplot()
To visualize the relationship between BMI and smoking, we draw a boxplot of BMI split by the two levels of SmokeNow: current smokers and non-smokers. Rows with a missing SmokeNow value are dropped, and the weight aesthetic applies the sampling weight WTMEC4YR. In the plot, the median BMI of smokers is lower than that of non-smokers, which suggests that people who currently smoke tend to have a lower BMI. In the NHANES package, SmokeNow is recorded only for respondents who reported having smoked at least 100 cigarettes in their lifetime, so the non-smoker group here consists of former smokers.
# Weighted boxplot of BMI by current smoking status for adults aged 20 and over
NHANESraw %>%
  filter(Age >= 20, !is.na(SmokeNow)) %>%
  ggplot(mapping = aes(x = SmokeNow, y = BMI, weight = WTMEC4YR)) +
  geom_boxplot()
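The `weight` aesthetic makes each respondent count in proportion to the number of people they represent, which can shift the median. The base R sketch below illustrates one common weighted-median definition. The helper `weighted_median` and the toy values are hypothetical; ggplot2 computes its weighted quantiles internally and may use a different method, so the numbers need not match the plot exactly.

```r
# Hypothetical helper: smallest value whose cumulative weight reaches half the total
weighted_median <- function(x, w) {
  keep <- !is.na(x) & !is.na(w)   # drop incomplete pairs
  x <- x[keep]
  w <- w[keep]
  ord <- order(x)                 # sort values and carry the weights along
  x <- x[ord]
  w <- w[ord]
  cum_w <- cumsum(w)
  x[which(cum_w >= sum(w) / 2)[1]]
}

# Toy values for illustration only (not NHANES data)
bmi <- c(20, 25, 30, 35)
wt  <- c(1, 1, 1, 5)   # the last respondent represents five times as many people

median(bmi)                # unweighted median
weighted_median(bmi, wt)   # pulled toward the heavily weighted value
```

With real survey data, the same idea applies with WTMEC4YR as the weight vector, although a design-based estimator from a dedicated survey package would also account for the sampling design when computing standard errors.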