The code below examines relationships between diabetes and insomnia. In doing so, we use the NHANES data set in the NHANES R package. You may find more about the data set in the NHANES package’s manual. To find the manual, Google the package’s name or click [here] (https://cran.r-project.org/web/packages/NHANES/NHANES.pdf).
Def of variables
Given the data, it is shown below whether or not is appropriate to reject the null hypothesis in favor of the research claim of interest.
Your assignment
Suppose that your boss is more interested in relationships between smoking marijuana and diabetes. Are those who regularly smoke marijuana are more likely to be diabetic than those who don’t? Run the program after replacing SleepTrouble by RegularMarij.
Would you reject the null hypothesis? In other words, would you conclude that those who smoking marijuana are more likely to be diabetic than those who don’t. At what significance level? Write your conclusion by revising the interpretation given in the Ex_Inference file.
Hint: The row represents survey participants.
There are 10,000 participants.
Hint: Check the result of the summary() function.
1366 survey participants reported regularly soming marijuana.
# Load packages
library(NHANES)
library(tidyverse)
str(NHANES)
## Classes 'tbl_df', 'tbl' and 'data.frame': 10000 obs. of 76 variables:
## $ ID : int 51624 51624 51624 51625 51630 51638 51646 51647 51647 51647 ...
## $ SurveyYr : Factor w/ 2 levels "2009_10","2011_12": 1 1 1 1 1 1 1 1 1 1 ...
## $ Gender : Factor w/ 2 levels "female","male": 2 2 2 2 1 2 2 1 1 1 ...
## $ Age : int 34 34 34 4 49 9 8 45 45 45 ...
## $ AgeDecade : Factor w/ 8 levels " 0-9"," 10-19",..: 4 4 4 1 5 1 1 5 5 5 ...
## $ AgeMonths : int 409 409 409 49 596 115 101 541 541 541 ...
## $ Race1 : Factor w/ 5 levels "Black","Hispanic",..: 4 4 4 5 4 4 4 4 4 4 ...
## $ Race3 : Factor w/ 6 levels "Asian","Black",..: NA NA NA NA NA NA NA NA NA NA ...
## $ Education : Factor w/ 5 levels "8th Grade","9 - 11th Grade",..: 3 3 3 NA 4 NA NA 5 5 5 ...
## $ MaritalStatus : Factor w/ 6 levels "Divorced","LivePartner",..: 3 3 3 NA 2 NA NA 3 3 3 ...
## $ HHIncome : Factor w/ 12 levels " 0-4999"," 5000-9999",..: 6 6 6 5 7 11 9 11 11 11 ...
## $ HHIncomeMid : int 30000 30000 30000 22500 40000 87500 60000 87500 87500 87500 ...
## $ Poverty : num 1.36 1.36 1.36 1.07 1.91 1.84 2.33 5 5 5 ...
## $ HomeRooms : int 6 6 6 9 5 6 7 6 6 6 ...
## $ HomeOwn : Factor w/ 3 levels "Own","Rent","Other": 1 1 1 1 2 2 1 1 1 1 ...
## $ Work : Factor w/ 3 levels "Looking","NotWorking",..: 2 2 2 NA 2 NA NA 3 3 3 ...
## $ Weight : num 87.4 87.4 87.4 17 86.7 29.8 35.2 75.7 75.7 75.7 ...
## $ Length : num NA NA NA NA NA NA NA NA NA NA ...
## $ HeadCirc : num NA NA NA NA NA NA NA NA NA NA ...
## $ Height : num 165 165 165 105 168 ...
## $ BMI : num 32.2 32.2 32.2 15.3 30.6 ...
## $ BMICatUnder20yrs: Factor w/ 4 levels "UnderWeight",..: NA NA NA NA NA NA NA NA NA NA ...
## $ BMI_WHO : Factor w/ 4 levels "12.0_18.5","18.5_to_24.9",..: 4 4 4 1 4 1 2 3 3 3 ...
## $ Pulse : int 70 70 70 NA 86 82 72 62 62 62 ...
## $ BPSysAve : int 113 113 113 NA 112 86 107 118 118 118 ...
## $ BPDiaAve : int 85 85 85 NA 75 47 37 64 64 64 ...
## $ BPSys1 : int 114 114 114 NA 118 84 114 106 106 106 ...
## $ BPDia1 : int 88 88 88 NA 82 50 46 62 62 62 ...
## $ BPSys2 : int 114 114 114 NA 108 84 108 118 118 118 ...
## $ BPDia2 : int 88 88 88 NA 74 50 36 68 68 68 ...
## $ BPSys3 : int 112 112 112 NA 116 88 106 118 118 118 ...
## $ BPDia3 : int 82 82 82 NA 76 44 38 60 60 60 ...
## $ Testosterone : num NA NA NA NA NA NA NA NA NA NA ...
## $ DirectChol : num 1.29 1.29 1.29 NA 1.16 1.34 1.55 2.12 2.12 2.12 ...
## $ TotChol : num 3.49 3.49 3.49 NA 6.7 4.86 4.09 5.82 5.82 5.82 ...
## $ UrineVol1 : int 352 352 352 NA 77 123 238 106 106 106 ...
## $ UrineFlow1 : num NA NA NA NA 0.094 ...
## $ UrineVol2 : int NA NA NA NA NA NA NA NA NA NA ...
## $ UrineFlow2 : num NA NA NA NA NA NA NA NA NA NA ...
## $ Diabetes : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ DiabetesAge : int NA NA NA NA NA NA NA NA NA NA ...
## $ HealthGen : Factor w/ 5 levels "Excellent","Vgood",..: 3 3 3 NA 3 NA NA 2 2 2 ...
## $ DaysPhysHlthBad : int 0 0 0 NA 0 NA NA 0 0 0 ...
## $ DaysMentHlthBad : int 15 15 15 NA 10 NA NA 3 3 3 ...
## $ LittleInterest : Factor w/ 3 levels "None","Several",..: 3 3 3 NA 2 NA NA 1 1 1 ...
## $ Depressed : Factor w/ 3 levels "None","Several",..: 2 2 2 NA 2 NA NA 1 1 1 ...
## $ nPregnancies : int NA NA NA NA 2 NA NA 1 1 1 ...
## $ nBabies : int NA NA NA NA 2 NA NA NA NA NA ...
## $ Age1stBaby : int NA NA NA NA 27 NA NA NA NA NA ...
## $ SleepHrsNight : int 4 4 4 NA 8 NA NA 8 8 8 ...
## $ SleepTrouble : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 1 1 1 ...
## $ PhysActive : Factor w/ 2 levels "No","Yes": 1 1 1 NA 1 NA NA 2 2 2 ...
## $ PhysActiveDays : int NA NA NA NA NA NA NA 5 5 5 ...
## $ TVHrsDay : Factor w/ 7 levels "0_hrs","0_to_1_hr",..: NA NA NA NA NA NA NA NA NA NA ...
## $ CompHrsDay : Factor w/ 7 levels "0_hrs","0_to_1_hr",..: NA NA NA NA NA NA NA NA NA NA ...
## $ TVHrsDayChild : int NA NA NA 4 NA 5 1 NA NA NA ...
## $ CompHrsDayChild : int NA NA NA 1 NA 0 6 NA NA NA ...
## $ Alcohol12PlusYr : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 2 2 2 ...
## $ AlcoholDay : int NA NA NA NA 2 NA NA 3 3 3 ...
## $ AlcoholYear : int 0 0 0 NA 20 NA NA 52 52 52 ...
## $ SmokeNow : Factor w/ 2 levels "No","Yes": 1 1 1 NA 2 NA NA NA NA NA ...
## $ Smoke100 : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 1 1 1 ...
## $ Smoke100n : Factor w/ 2 levels "Non-Smoker","Smoker": 2 2 2 NA 2 NA NA 1 1 1 ...
## $ SmokeAge : int 18 18 18 NA 38 NA NA NA NA NA ...
## $ Marijuana : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 2 2 2 ...
## $ AgeFirstMarij : int 17 17 17 NA 18 NA NA 13 13 13 ...
## $ RegularMarij : Factor w/ 2 levels "No","Yes": 1 1 1 NA 1 NA NA 1 1 1 ...
## $ AgeRegMarij : int NA NA NA NA NA NA NA NA NA NA ...
## $ HardDrugs : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 1 1 1 ...
## $ SexEver : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 2 2 2 ...
## $ SexAge : int 16 16 16 NA 12 NA NA 13 13 13 ...
## $ SexNumPartnLife : int 8 8 8 NA 10 NA NA 20 20 20 ...
## $ SexNumPartYear : int 1 1 1 NA 1 NA NA 0 0 0 ...
## $ SameSex : Factor w/ 2 levels "No","Yes": 1 1 1 NA 2 NA NA 2 2 2 ...
## $ SexOrientation : Factor w/ 3 levels "Bisexual","Heterosexual",..: 2 2 2 NA 2 NA NA 1 1 1 ...
## $ PregnantNow : Factor w/ 3 levels "Yes","No","Unknown": NA NA NA NA NA NA NA NA NA NA ...
summary(NHANES)
## ID SurveyYr Gender Age
## Min. :51624 2009_10:5000 female:5020 Min. : 0.00
## 1st Qu.:56904 2011_12:5000 male :4980 1st Qu.:17.00
## Median :62160 Median :36.00
## Mean :61945 Mean :36.74
## 3rd Qu.:67039 3rd Qu.:54.00
## Max. :71915 Max. :80.00
##
## AgeDecade AgeMonths Race1 Race3
## 40-49 :1398 Min. : 0.0 Black :1197 Asian : 288
## 0-9 :1391 1st Qu.:199.0 Hispanic: 610 Black : 589
## 10-19 :1374 Median :418.0 Mexican :1015 Hispanic: 350
## 20-29 :1356 Mean :420.1 White :6372 Mexican : 480
## 30-39 :1338 3rd Qu.:624.0 Other : 806 White :3135
## (Other):2810 Max. :959.0 Other : 158
## NA's : 333 NA's :5038 NA's :5000
## Education MaritalStatus HHIncome
## 8th Grade : 451 Divorced : 707 more 99999 :2220
## 9 - 11th Grade: 888 LivePartner : 560 75000-99999:1084
## High School :1517 Married :3945 25000-34999: 958
## Some College :2267 NeverMarried:1380 35000-44999: 863
## College Grad :2098 Separated : 183 45000-54999: 784
## NA's :2779 Widowed : 456 (Other) :3280
## NA's :2769 NA's : 811
## HHIncomeMid Poverty HomeRooms HomeOwn
## Min. : 2500 Min. :0.000 Min. : 1.000 Own :6425
## 1st Qu.: 30000 1st Qu.:1.240 1st Qu.: 5.000 Rent :3287
## Median : 50000 Median :2.700 Median : 6.000 Other: 225
## Mean : 57206 Mean :2.802 Mean : 6.249 NA's : 63
## 3rd Qu.: 87500 3rd Qu.:4.710 3rd Qu.: 8.000
## Max. :100000 Max. :5.000 Max. :13.000
## NA's :811 NA's :726 NA's :69
## Work Weight Length HeadCirc
## Looking : 311 Min. : 2.80 Min. : 47.10 Min. :34.20
## NotWorking:2847 1st Qu.: 56.10 1st Qu.: 75.70 1st Qu.:39.58
## Working :4613 Median : 72.70 Median : 87.00 Median :41.45
## NA's :2229 Mean : 70.98 Mean : 85.02 Mean :41.18
## 3rd Qu.: 88.90 3rd Qu.: 96.10 3rd Qu.:42.92
## Max. :230.70 Max. :112.20 Max. :45.40
## NA's :78 NA's :9457 NA's :9912
## Height BMI BMICatUnder20yrs BMI_WHO
## Min. : 83.6 Min. :12.88 UnderWeight: 55 12.0_18.5 :1277
## 1st Qu.:156.8 1st Qu.:21.58 NormWeight : 805 18.5_to_24.9:2911
## Median :166.0 Median :25.98 OverWeight : 193 25.0_to_29.9:2664
## Mean :161.9 Mean :26.66 Obese : 221 30.0_plus :2751
## 3rd Qu.:174.5 3rd Qu.:30.89 NA's :8726 NA's : 397
## Max. :200.4 Max. :81.25
## NA's :353 NA's :366
## Pulse BPSysAve BPDiaAve BPSys1
## Min. : 40.00 Min. : 76.0 Min. : 0.00 Min. : 72.0
## 1st Qu.: 64.00 1st Qu.:106.0 1st Qu.: 61.00 1st Qu.:106.0
## Median : 72.00 Median :116.0 Median : 69.00 Median :116.0
## Mean : 73.56 Mean :118.2 Mean : 67.48 Mean :119.1
## 3rd Qu.: 82.00 3rd Qu.:127.0 3rd Qu.: 76.00 3rd Qu.:128.0
## Max. :136.00 Max. :226.0 Max. :116.00 Max. :232.0
## NA's :1437 NA's :1449 NA's :1449 NA's :1763
## BPDia1 BPSys2 BPDia2 BPSys3
## Min. : 0.00 Min. : 76.0 Min. : 0.00 Min. : 76.0
## 1st Qu.: 62.00 1st Qu.:106.0 1st Qu.: 60.00 1st Qu.:106.0
## Median : 70.00 Median :116.0 Median : 68.00 Median :116.0
## Mean : 68.28 Mean :118.5 Mean : 67.66 Mean :117.9
## 3rd Qu.: 76.00 3rd Qu.:128.0 3rd Qu.: 76.00 3rd Qu.:126.0
## Max. :118.00 Max. :226.0 Max. :118.00 Max. :226.0
## NA's :1763 NA's :1647 NA's :1647 NA's :1635
## BPDia3 Testosterone DirectChol TotChol
## Min. : 0.0 Min. : 0.25 Min. :0.390 Min. : 1.530
## 1st Qu.: 60.0 1st Qu.: 17.70 1st Qu.:1.090 1st Qu.: 4.110
## Median : 68.0 Median : 43.82 Median :1.290 Median : 4.780
## Mean : 67.3 Mean : 197.90 Mean :1.365 Mean : 4.879
## 3rd Qu.: 76.0 3rd Qu.: 362.41 3rd Qu.:1.580 3rd Qu.: 5.530
## Max. :116.0 Max. :1795.60 Max. :4.030 Max. :13.650
## NA's :1635 NA's :5874 NA's :1526 NA's :1526
## UrineVol1 UrineFlow1 UrineVol2 UrineFlow2
## Min. : 0.0 Min. : 0.0000 Min. : 0.0 Min. : 0.000
## 1st Qu.: 50.0 1st Qu.: 0.4030 1st Qu.: 52.0 1st Qu.: 0.475
## Median : 94.0 Median : 0.6990 Median : 95.0 Median : 0.760
## Mean :118.5 Mean : 0.9793 Mean :119.7 Mean : 1.149
## 3rd Qu.:164.0 3rd Qu.: 1.2210 3rd Qu.:171.8 3rd Qu.: 1.513
## Max. :510.0 Max. :17.1670 Max. :409.0 Max. :13.692
## NA's :987 NA's :1603 NA's :8522 NA's :8524
## Diabetes DiabetesAge HealthGen DaysPhysHlthBad
## No :9098 Min. : 1.00 Excellent: 878 Min. : 0.000
## Yes : 760 1st Qu.:40.00 Vgood :2508 1st Qu.: 0.000
## NA's: 142 Median :50.00 Good :2956 Median : 0.000
## Mean :48.42 Fair :1010 Mean : 3.335
## 3rd Qu.:58.00 Poor : 187 3rd Qu.: 3.000
## Max. :80.00 NA's :2461 Max. :30.000
## NA's :9371 NA's :2468
## DaysMentHlthBad LittleInterest Depressed nPregnancies
## Min. : 0.000 None :5103 None :5246 Min. : 1.000
## 1st Qu.: 0.000 Several:1130 Several:1009 1st Qu.: 2.000
## Median : 0.000 Most : 434 Most : 418 Median : 3.000
## Mean : 4.127 NA's :3333 NA's :3327 Mean : 3.027
## 3rd Qu.: 4.000 3rd Qu.: 4.000
## Max. :30.000 Max. :32.000
## NA's :2466 NA's :7396
## nBabies Age1stBaby SleepHrsNight SleepTrouble
## Min. : 0.000 Min. :14.00 Min. : 2.000 No :5799
## 1st Qu.: 2.000 1st Qu.:19.00 1st Qu.: 6.000 Yes :1973
## Median : 2.000 Median :22.00 Median : 7.000 NA's:2228
## Mean : 2.457 Mean :22.65 Mean : 6.928
## 3rd Qu.: 3.000 3rd Qu.:26.00 3rd Qu.: 8.000
## Max. :12.000 Max. :39.00 Max. :12.000
## NA's :7584 NA's :8116 NA's :2245
## PhysActive PhysActiveDays TVHrsDay CompHrsDay
## No :3677 Min. :1.000 2_hr :1275 0_to_1_hr:1409
## Yes :4649 1st Qu.:2.000 1_hr : 884 0_hrs :1073
## NA's:1674 Median :3.000 3_hr : 836 1_hr :1030
## Mean :3.744 0_to_1_hr: 638 2_hr : 589
## 3rd Qu.:5.000 More_4_hr: 615 3_hr : 347
## Max. :7.000 (Other) : 611 (Other) : 415
## NA's :5337 NA's :5141 NA's :5137
## TVHrsDayChild CompHrsDayChild Alcohol12PlusYr AlcoholDay
## Min. :0.000 Min. :0.000 No :1368 Min. : 1.000
## 1st Qu.:1.000 1st Qu.:0.000 Yes :5212 1st Qu.: 1.000
## Median :2.000 Median :1.000 NA's:3420 Median : 2.000
## Mean :1.939 Mean :2.198 Mean : 2.914
## 3rd Qu.:3.000 3rd Qu.:6.000 3rd Qu.: 3.000
## Max. :6.000 Max. :6.000 Max. :82.000
## NA's :9347 NA's :9347 NA's :5086
## AlcoholYear SmokeNow Smoke100 Smoke100n SmokeAge
## Min. : 0.0 No :1745 No :4024 Non-Smoker:4024 Min. : 6.00
## 1st Qu.: 3.0 Yes :1466 Yes :3211 Smoker :3211 1st Qu.:15.00
## Median : 24.0 NA's:6789 NA's:2765 NA's :2765 Median :17.00
## Mean : 75.1 Mean :17.83
## 3rd Qu.:104.0 3rd Qu.:19.00
## Max. :364.0 Max. :72.00
## NA's :4078 NA's :6920
## Marijuana AgeFirstMarij RegularMarij AgeRegMarij HardDrugs
## No :2049 Min. : 1.00 No :3575 Min. : 5.00 No :4700
## Yes :2892 1st Qu.:15.00 Yes :1366 1st Qu.:15.00 Yes :1065
## NA's:5059 Median :16.00 NA's:5059 Median :17.00 NA's:4235
## Mean :17.02 Mean :17.69
## 3rd Qu.:19.00 3rd Qu.:19.00
## Max. :48.00 Max. :52.00
## NA's :7109 NA's :8634
## SexEver SexAge SexNumPartnLife SexNumPartYear
## No : 223 Min. : 9.00 Min. : 0.00 Min. : 0.000
## Yes :5544 1st Qu.:15.00 1st Qu.: 2.00 1st Qu.: 1.000
## NA's:4233 Median :17.00 Median : 5.00 Median : 1.000
## Mean :17.43 Mean : 15.09 Mean : 1.342
## 3rd Qu.:19.00 3rd Qu.: 12.00 3rd Qu.: 1.000
## Max. :50.00 Max. :2000.00 Max. :69.000
## NA's :4460 NA's :4275 NA's :5072
## SameSex SexOrientation PregnantNow
## No :5353 Bisexual : 119 Yes : 72
## Yes : 415 Heterosexual:4638 No :1573
## NA's:4232 Homosexual : 85 Unknown: 51
## NA's :5158 NA's :8304
##
##
##
The percentage of regular smokers who are diabetic is 0.06881406%.
The difference in the percentages is 0.01398888%.
# What are the variables in the NHANES_Ph dataset?
NHANES_Ph <-
NHANES %>%
select(RegularMarij, Diabetes) %>%
filter(!is.na(RegularMarij))
NHANES_Ph
## # A tibble: 4,941 x 2
## RegularMarij Diabetes
## <fct> <fct>
## 1 No No
## 2 No No
## 3 No No
## 4 No No
## 5 No No
## 6 No No
## 7 No No
## 8 Yes No
## 9 Yes No
## 10 No No
## # ... with 4,931 more rows
# Create a contingency table summarizing the data
table(NHANES_Ph)
## Diabetes
## RegularMarij No Yes
## No 3379 196
## Yes 1272 94
# Find proportion of each SleepTrouble who were Diabetes
NHANES_Ph %>%
group_by(RegularMarij) %>%
summarise(Diabetes_prop = mean(Diabetes == "Yes", na.rm = TRUE))
## # A tibble: 2 x 2
## RegularMarij Diabetes_prop
## <fct> <dbl>
## 1 No 0.0548
## 2 Yes 0.0688
# Find the difference in proporations between SleepTrouble
NHANES_Ph %>%
group_by(RegularMarij) %>%
summarise(Diabetes_prop = mean(Diabetes == "Yes", na.rm = TRUE)) %>%
summarise(Diabetes_prop_diff = diff(Diabetes_prop)) # Yes - No
## # A tibble: 1 x 1
## Diabetes_prop_diff
## <dbl>
## 1 0.0140
Interpretation
The code below creates 1,000 sets of permuted dataset (which represents no relationships between diabetes and insomnia because they were created by random shuffling) and then compare the difference between observed dataset and permuted dataset.
Hint: Answer this question by describing the histogram below. In addition, read the interpretation below for the case of diabetes and insomnia.
# The oilabs package is failed to be installed b/c it's not available for this R version.
# Define function
rep_sample_n <- function(tbl, size, replace = FALSE, reps = 1) {
n <- nrow(tbl)
i <- unlist(replicate(reps, sample.int(n, size, replace = replace), simplify = FALSE))
rep_tbl <- cbind(replicate = rep(1:reps,rep(size,reps)), tbl[i,])
dplyr::group_by(rep_tbl, replicate)
}
# Create a data frame of differences in diabetic rates
# Create 1,000 replicate sets
NHANES_Ph %>%
rep_sample_n(size = nrow(NHANES_Ph), reps = 2000)
## # A tibble: 9,882,000 x 3
## # Groups: replicate [2,000]
## replicate RegularMarij Diabetes
## <int> <fct> <fct>
## 1 1 No No
## 2 1 No No
## 3 1 No No
## 4 1 No No
## 5 1 Yes No
## 6 1 No No
## 7 1 Yes No
## 8 1 No No
## 9 1 Yes No
## 10 1 No No
## # ... with 9,881,990 more rows
# Create a new variable, Diabetes_perm, which represents no relationships between diabetes and insomonia becasue they were created by random shuffling
NHANES_Ph %>%
rep_sample_n(size = nrow(NHANES_Ph), reps = 2000) %>%
mutate(Diabetes_perm = sample(Diabetes))
## # A tibble: 9,882,000 x 4
## # Groups: replicate [2,000]
## replicate RegularMarij Diabetes Diabetes_perm
## <int> <fct> <fct> <fct>
## 1 1 No No No
## 2 1 No No No
## 3 1 No No No
## 4 1 Yes Yes No
## 5 1 No Yes Yes
## 6 1 No No No
## 7 1 No No No
## 8 1 No No No
## 9 1 No No No
## 10 1 No No No
## # ... with 9,881,990 more rows
# Calculate percentages of diabetes
NHANES_Ph %>%
rep_sample_n(size = nrow(NHANES_Ph), reps = 2000) %>%
mutate(Diabetes_perm = sample(Diabetes)) %>%
group_by(replicate, RegularMarij) %>%
summarize(prop_Diabetes = mean(Diabetes == "Yes", na.rm = TRUE),
prop_Diabetes_perm = mean(Diabetes_perm == "Yes", na.rm = TRUE))
## # A tibble: 4,000 x 4
## # Groups: replicate [?]
## replicate RegularMarij prop_Diabetes prop_Diabetes_perm
## <int> <fct> <dbl> <dbl>
## 1 1 No 0.0548 0.0579
## 2 1 Yes 0.0688 0.0608
## 3 2 No 0.0548 0.0557
## 4 2 Yes 0.0688 0.0666
## 5 3 No 0.0548 0.0601
## 6 3 Yes 0.0688 0.0549
## 7 4 No 0.0548 0.0568
## 8 4 Yes 0.0688 0.0637
## 9 5 No 0.0548 0.0601
## 10 5 Yes 0.0688 0.0549
## # ... with 3,990 more rows
# Calculate the difference in the percentages between those who have sleep trouble and those who are not
NHANES_Ph %>%
rep_sample_n(size = nrow(NHANES_Ph), reps = 2000) %>%
mutate(Diabetes_perm = sample(Diabetes)) %>%
group_by(replicate, RegularMarij) %>%
summarize(prop_Diabetes = mean(Diabetes == "Yes", na.rm = TRUE),
prop_Diabetes_perm = mean(Diabetes_perm == "Yes", na.rm = TRUE)) %>%
summarize(diff_orig = diff(prop_Diabetes),
diff_perm = diff(prop_Diabetes_perm))
## # A tibble: 2,000 x 3
## replicate diff_orig diff_perm
## <int> <dbl> <dbl>
## 1 1 0.0140 -0.00422
## 2 2 0.0140 0.00488
## 3 3 0.0140 0.0110
## 4 4 0.0140 0.00691
## 5 5 0.0140 -0.00928
## 6 6 0.0140 0.00286
## 7 7 0.0140 0.00792
## 8 8 0.0140 0.00893
## 9 9 0.0140 0.0120
## 10 10 0.0140 0.00286
## # ... with 1,990 more rows
# Save the above results under the new object, NHANES_Ph_perm
NHANES_Ph_perm <-
NHANES_Ph %>%
rep_sample_n(size = nrow(NHANES_Ph), reps = 2000) %>%
mutate(Diabetes_perm = sample(Diabetes)) %>%
group_by(replicate, RegularMarij) %>%
summarize(prop_Diabetes = mean(Diabetes == "Yes", na.rm = TRUE),
prop_Diabetes_perm = mean(Diabetes_perm == "Yes", na.rm = TRUE)) %>%
summarize(diff_orig = diff(prop_Diabetes),
diff_perm = diff(prop_Diabetes_perm))
# Histogram of permuted differences
ggplot(NHANES_Ph_perm, aes(x = diff_perm)) +
geom_histogram(binwidth = 0.01) +
geom_vline(aes(xintercept = diff_orig), col = "red")
Interpretation