-Hi Frances! This HTML file illustrates the initial data cleaning and primarily shows output. Several large code chunks have been hidden from the HTML file to improve readability, but I can re-include them if you'd like.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.7 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(naniar)
library(gtsummary)
knitr::opts_chunk$set(include = FALSE)
-Code hidden
dim(df)
## [1] 892 616
#892 people with 616 variables
##
## Do not use my data. I did not devote my full attention.
## 87
## Use my data. I devoted my full attention.
## 708
##
##   A little likely A little unlikely            Likely          Unlikely
##                23                21                11                16
##       Very likely     Very unlikely
##                23               746
##
##    1 - Agree Strongly    2 - Agree Somewhat 3 - Disagree Somewhat
##                    14                    42                    23
## 4 - Disagree Strongly
##                   755
##
## (1) Not at all (2) A little (3) Somewhat (4) Well (5) Very well
## 3 5 18 9 235
##
## (1) Not at all (2) A little (3) Somewhat (4) Well (5) Very well
## 2 3 15 17 236
##
## (1) Not at all (2) A little (3) Somewhat (4) Well (5) Very well
## 3 10 12 12 229
## # A tibble: 1 × 14
## Total_Participants Failed_ATTN_Checks_Count Failed_ATTN_Chec… Data_Use_Exclud…
## <int> <dbl> <dbl> <dbl>
## 1 892 168 0.199 87
## # … with 10 more variables: Data_Use_Exclude_Percent <dbl>,
## # BCaffEQ_19_Exclude_Count <dbl>, BCaffEQ_19_Exclude_Percent <dbl>,
## # UPPS_P_57_Exclude_Count <dbl>, UPPS_P_57_Exclude_Percent <dbl>,
## # SMS_ATTN_Neg_Exclude_Count <dbl>, SMS_ATTN_Neg_Exclude_Percent <dbl>,
## # SMS8_Neutral_Exclude_Count <dbl>, SMS8_Neutral_Exclude_Percent <dbl>,
## # SMS8_Positive_Exclude_Count <dbl>
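The summary above reports, for each attention check, how many participants would be excluded and what share of responses that represents. Since the code is hidden, here is a minimal sketch of one way such a count/percent table could be built in base R. The toy data, the choice of the two checks shown, and the "passing" responses in `pass_key` are assumptions for illustration, not the actual scoring key.

```r
# Toy data standing in for the real survey export (hypothetical values).
toy <- data.frame(
  Data_Use = c("Use my data. I devoted my full attention.",
               "Do not use my data. I did not devote my full attention.",
               "Use my data. I devoted my full attention.",
               NA),
  BCaffEQ_19 = c("Very unlikely", "Likely", "Very unlikely", "Very unlikely"),
  stringsAsFactors = FALSE
)

# Assumed passing response for each check (an assumption, not the author's key).
pass_key <- c(
  Data_Use   = "Use my data. I devoted my full attention.",
  BCaffEQ_19 = "Very unlikely"
)

# A response fails when it is present and differs from the key;
# missing responses stay NA rather than being counted as failures.
failed <- sapply(names(pass_key), function(v) toy[[v]] != pass_key[[v]])

exclusion_summary <- data.frame(
  Check           = names(pass_key),
  Exclude_Count   = colSums(failed, na.rm = TRUE),
  # Share of non-missing responses that failed the check.
  Exclude_Percent = colMeans(failed, na.rm = TRUE),
  row.names = NULL
)
print(exclusion_summary)
```

Note that the denominator matters here: dividing by non-missing responses (as in this sketch) and dividing by all participants give different percentages whenever a check has missing answers.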
-Code hidden.
## [1] 597 624
##
## Four or more times a week           Monthly or less                     Never
##                        22                       171                       169
## Two to four times a month Two to three times a week
##                       129                       106
## Warning: `gather_()` was deprecated in tidyr 1.2.0.
## Please use `gather()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
## Warning: The `fmt_missing()` function is deprecated and will soon be removed
## * Use the `sub_missing()` function instead
| Characteristic | Full Sample, N = 428¹ | Negative condition, N = 139¹ | Neutral condition, N = 149¹ | Positive condition, N = 140¹ |
|---|---|---|---|---|
| Age | M(SD)=19.49(1.93) | M(SD)=19.70(2.18) | M(SD)=19.52(1.82) | M(SD)=19.25(1.75) |
| Sex-at-Birth | ||||
| Female | 247 (58%) | 75 (54%) | 100 (67%) | 72 (51%) |
| Male | 181 (42%) | 64 (46%) | 49 (33%) | 68 (49%) |
| Gender | ||||
| Female | 248 (58%) | 75 (54%) | 101 (68%) | 72 (51%) |
| Male | 179 (42%) | 63 (45%) | 48 (32%) | 68 (49%) |
| Non-binary | 1 (0.2%) | 1 (0.7%) | 0 (0%) | 0 (0%) |
| Sexual Orientation | ||||
| Asexual | 3 (0.7%) | 2 (1.4%) | 1 (0.7%) | 0 (0%) |
| Bisexual | 22 (5.1%) | 3 (2.2%) | 8 (5.4%) | 11 (7.9%) |
| Heterosexual | 391 (91%) | 130 (94%) | 136 (91%) | 125 (89%) |
| Homosexual | 12 (2.8%) | 4 (2.9%) | 4 (2.7%) | 4 (2.9%) |
| Race/Ethnicity | ||||
| American Indian or Alaska Native | 5 (1.2%) | 2 (1.4%) | 3 (2.0%) | 0 (0%) |
| Asian | 8 (1.9%) | 1 (0.7%) | 5 (3.4%) | 2 (1.4%) |
| Black or African American | 19 (4.5%) | 6 (4.3%) | 7 (4.7%) | 6 (4.3%) |
| Hispanic or Latino | 30 (7.0%) | 10 (7.2%) | 10 (6.7%) | 10 (7.2%) |
| Middle Eastern | 2 (0.5%) | 0 (0%) | 1 (0.7%) | 1 (0.7%) |
| Multiracial | 10 (2.3%) | 4 (2.9%) | 4 (2.7%) | 2 (1.4%) |
| White (non-Hispanic) | 352 (83%) | 116 (83%) | 119 (80%) | 117 (85%) |
| Student Status | ||||
| Yes | 428 (100%) | 139 (100%) | 149 (100%) | 140 (100%) |
| Student Year | ||||
| Freshman | 241 (56%) | 71 (51%) | 80 (54%) | 90 (64%) |
| Junior | 46 (11%) | 14 (10%) | 21 (14%) | 11 (7.9%) |
| Senior | 39 (9.1%) | 18 (13%) | 12 (8.1%) | 9 (6.4%) |
| Sophomore | 102 (24%) | 36 (26%) | 36 (24%) | 30 (21%) |
| ¹ M(SD) = Mean(SD); n (%) | ||||
##
## Negative Neutral Positive
## Yes 139 149 140
## # A tibble: 8 × 4
## Variable df `Chi_Square/F_Value` p_value
## <chr> <int> <dbl> <dbl>
## 1 SAB.f 2 8.46 0.0145
## 2 Gender.f 4 11.2 0.0244
## 3 Marital_Status.f 6 11.4 0.0758
## 4 Student_Year.f 6 9.27 0.159
## 5 Sexual_Orientation.f 6 6.64 0.355
## 6 Employment.f 6 5.93 0.431
## 7 Native_Language.f 6 4.87 0.561
## 8 Race_Ethnicity.f 12 7.44 0.827
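The table above lists one chi-square test per categorical variable against Condition, sorted by p-value. The sketch below shows one way to produce a table of that shape with base R's `chisq.test()`. The simulated data and the two variables chosen are stand-ins, and this is not necessarily how the hidden chunk does it.

```r
set.seed(1)
# Simulated stand-in data (hypothetical; not the study data).
toy <- data.frame(
  Condition      = factor(sample(c("Negative", "Neutral", "Positive"), 300, replace = TRUE)),
  SAB.f          = factor(sample(c("Female", "Male"), 300, replace = TRUE)),
  Student_Year.f = factor(sample(c("Freshman", "Sophomore", "Junior", "Senior"),
                                 300, replace = TRUE))
)

cat_vars <- c("SAB.f", "Student_Year.f")

# One Pearson chi-square test of independence per variable against Condition.
chisq_rows <- lapply(cat_vars, function(v) {
  test <- chisq.test(table(toy[[v]], toy$Condition))
  data.frame(
    Variable   = v,
    df         = unname(test$parameter),
    Chi_Square = unname(test$statistic),
    p_value    = test$p.value
  )
})

# Stack the per-variable rows and sort by p-value, smallest first.
chisq_results <- do.call(rbind, chisq_rows)
chisq_results <- chisq_results[order(chisq_results$p_value), ]
print(chisq_results)
```

For sparse tables (for example, a level with a single participant), `chisq.test()` warns that the approximation may be inaccurate; `fisher.test()` or `chisq.test(..., simulate.p.value = TRUE)` are possible alternatives in that case.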
## Df Sum Sq Mean Sq F value Pr(>F)
## Condition 2 14.5 7.242 1.961 0.142
## Residuals 424 1566.2 3.694
## 1 observation deleted due to missingness
-Note: Some code has been hidden to improve readability
-Note: Non-drinkers have been filtered out of this dataset
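The sample sizes in this file appear consistent with that filter: the earlier drinking-frequency table sums to 597 participants, 169 of whom answered "Never", and 597 - 169 = 428, the N reported in the tables. A minimal sketch of the filtering step, rebuilding a toy data frame from those published counts (the column name `AUDIT1` is an assumption):

```r
# Counts copied from the drinking-frequency table earlier in this file.
freq_counts <- c(
  "Four or more times a week" = 22,
  "Monthly or less"           = 171,
  "Never"                     = 169,
  "Two to four times a month" = 129,
  "Two to three times a week" = 106
)

# Rebuild a one-column toy data frame with those counts
# (AUDIT1 is an assumed column name).
toy <- data.frame(AUDIT1 = rep(names(freq_counts), times = freq_counts))

# Keep only participants who report any drinking;
# which() also drops rows where the response is missing.
drinkers <- toy[which(toy$AUDIT1 != "Never"), , drop = FALSE]

nrow(toy)       # 597
nrow(drinkers)  # 428
```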
-Substance Use Missingness
-Substance Use Descriptives
## Warning: The `fmt_missing()` function is deprecated and will soon be removed
## * Use the `sub_missing()` function instead
| Characteristic | Full Sample, N = 428¹ | Negative condition, N = 139¹ | Neutral condition, N = 149¹ | Positive condition, N = 140¹ |
|---|---|---|---|---|
| Drinking Frequency | ||||
| Never | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) |
| Monthly or less | 171 (40%) | 65 (47%) | 58 (39%) | 48 (34%) |
| 2-4x/month | 129 (30%) | 33 (24%) | 48 (32%) | 48 (34%) |
| 2-3x/week | 106 (25%) | 32 (23%) | 38 (26%) | 36 (26%) |
| 4+ x/week | 22 (5.1%) | 9 (6.5%) | 5 (3.4%) | 8 (5.7%) |
| Drinking Quantity | ||||
| 1-2 | 150 (35%) | 53 (38%) | 56 (38%) | 41 (29%) |
| 3-4 | 148 (35%) | 44 (32%) | 53 (36%) | 51 (36%) |
| 5-6 | 81 (19%) | 28 (20%) | 29 (19%) | 24 (17%) |
| 7-9 | 40 (9.3%) | 12 (8.6%) | 8 (5.4%) | 20 (14%) |
| 10+ | 9 (2.1%) | 2 (1.4%) | 3 (2.0%) | 4 (2.9%) |
| Binge Drinking Frequency | ||||
| Never | 155 (36%) | 49 (35%) | 63 (42%) | 43 (31%) |
| < Monthly | 141 (33%) | 49 (35%) | 46 (31%) | 46 (33%) |
| Monthly | 77 (18%) | 23 (17%) | 27 (18%) | 27 (19%) |
| Weekly | 54 (13%) | 17 (12%) | 13 (8.7%) | 24 (17%) |
| Daily or ~Daily | 1 (0.2%) | 1 (0.7%) | 0 (0%) | 0 (0%) |
| AUDIT Total | M(SD)=6.5(5.2) | M(SD)=6.4(5.0) | M(SD)=6.0(5.0) | M(SD)=7.2(5.4) |
| DUDIT_Total | M(SD)=2.4(4.6) | M(SD)=2.2(4.6) | M(SD)=2.6(5.4) | M(SD)=2.4(3.8) |
| AUD Criteria Endorsed | M(SD)=2.10(2.14) | M(SD)=1.96(2.07) | M(SD)=2.18(2.30) | M(SD)=2.16(2.06) |
| SUD Criteria Endorsed | M(SD)=1.28(2.26) | M(SD)=0.95(1.69) | M(SD)=1.35(2.44) | M(SD)=1.52(2.53) |
| AUD Diagnostic Status | ||||
| Mild | 126 (29%) | 29 (21%) | 41 (28%) | 56 (40%) |
| Moderate | 66 (15%) | 25 (18%) | 24 (16%) | 17 (12%) |
| None | 203 (47%) | 76 (55%) | 71 (48%) | 56 (40%) |
| Severe | 33 (7.7%) | 9 (6.5%) | 13 (8.7%) | 11 (7.9%) |
| SUD Diagnostic Status | ||||
| Mild | 62 (14%) | 16 (12%) | 18 (12%) | 28 (20%) |
| Moderate | 28 (6.5%) | 12 (8.6%) | 10 (6.7%) | 6 (4.3%) |
| None | 312 (73%) | 108 (78%) | 110 (74%) | 94 (67%) |
| Severe | 26 (6.1%) | 3 (2.2%) | 11 (7.4%) | 12 (8.6%) |
| ¹ n (%); M(SD) = Mean(SD) | ||||
-Chi-Square Test of Categorical SUD Variables by Condition
## # A tibble: 7 × 4
## Variable df `Chi_Square/F_Value` p_value
## <chr> <int> <dbl> <dbl>
## 1 MINI_AUD_Dx 6 14.2 0.0272
## 2 MINI_SUD_Dx 6 12.9 0.0452
## 3 Favorite_Caff.f 8 9.91 0.271
## 4 AUDIT2.f 8 9.80 0.280
## 5 AUDIT1.f 6 7.43 0.283
## 6 AUDIT3.f 8 9.59 0.295
## 7 Favorite_Alcohol.f 6 3.82 0.701
-ANOVAs for Continuous SUD Variables by Condition
## Variable F_value df_n df_d p_value
## 1 MINI_SUD_Sum 2.3582996 2 425 0.09581754
## 2 AUDIT_Sum 2.0582221 2 425 0.12895159
## 3 MINI_AUD_Sum 0.4344687 2 425 0.64789596
## 4 DUDIT_Sum 0.2091036 2 425 0.81139458
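The ANOVA table above has one row per continuous variable with the F value, numerator and denominator degrees of freedom, and p-value. As a rough sketch of how such a table could be assembled with base R's `aov()`, using simulated stand-in data (the hidden code may well differ):

```r
set.seed(2)
# Simulated stand-in data (hypothetical; not the study data).
toy <- data.frame(
  Condition = factor(rep(c("Negative", "Neutral", "Positive"), each = 50)),
  AUDIT_Sum = rpois(150, lambda = 6),
  DUDIT_Sum = rpois(150, lambda = 2)
)

cont_vars <- c("AUDIT_Sum", "DUDIT_Sum")

# Fit a one-way ANOVA per variable and pull the pieces out of the summary table.
anova_rows <- lapply(cont_vars, function(v) {
  fit <- aov(reformulate("Condition", response = v), data = toy)
  tab <- summary(fit)[[1]]   # row 1 = Condition, row 2 = Residuals
  data.frame(
    Variable = v,
    F_value  = tab[["F value"]][1],
    df_n     = tab[["Df"]][1],
    df_d     = tab[["Df"]][2],
    p_value  = tab[["Pr(>F)"]][1]
  )
})

# Stack the rows and sort by p-value, smallest first.
anova_results <- do.call(rbind, anova_rows)
anova_results <- anova_results[order(anova_results$p_value), ]
print(anova_results)
```

With three conditions of 50 simulated participants each, `df_n` is 2 and `df_d` is 147, mirroring the 2 and 425 seen above for 428 participants.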
## Variable T_stat T_df T_p_value T_Mdiff
## t Negative 11.417252 138 1.136635e-21 2.3453237
## t1 Neutral -3.710182 148 2.924186e-04 -0.6644295
## t2 Positive -9.287051 139 2.901962e-16 -1.6000000
## Adding missing grouping variables: `Condition`
## # A tibble: 3 × 5
## Condition AG1_Valence_M AG2_Valence_M AG1_Valence_SD AG2_Valence_SD
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 Negative 5.71 3.36 2.05 2.07
## 2 Neutral 5.79 6.46 2.10 1.98
## 3 Positive 5.39 6.99 2.00 1.87
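In the t-test output above, the degrees of freedom (138, 148, 139) equal each condition's N minus one, and the mean differences line up with AG1_Valence_M minus AG2_Valence_M (for example, 5.71 - 3.36 ≈ 2.35 for Negative). That pattern is what a paired t-test of AG1 versus AG2 valence within each condition would give, although this is an inference from the output, not something the hidden code confirms. A sketch under that assumption, with simulated ratings:

```r
set.seed(3)
n <- 40
# Simulated stand-in data (hypothetical; not the study data).
toy <- data.frame(
  Condition   = factor(rep(c("Negative", "Neutral", "Positive"), each = n)),
  AG1_Valence = sample(1:9, 3 * n, replace = TRUE)
)

# Simulated second rating: shifted down in Negative, up in Positive
# (made-up effect sizes), plus a little noise, clamped to the 1-9 scale.
shift <- unname(c(Negative = -2, Neutral = 0, Positive = 1)[as.character(toy$Condition)])
noise <- sample(-1:1, 3 * n, replace = TRUE)
toy$AG2_Valence <- pmin(9, pmax(1, toy$AG1_Valence + shift + noise))

# Paired t-test of AG1 vs AG2 valence, run separately within each condition.
t_rows <- lapply(split(toy, toy$Condition), function(d) {
  test <- t.test(d$AG1_Valence, d$AG2_Valence, paired = TRUE)
  data.frame(
    Variable  = as.character(d$Condition[1]),
    T_stat    = unname(test$statistic),
    T_df      = unname(test$parameter),
    T_p_value = test$p.value,
    T_Mdiff   = unname(test$estimate)   # mean of (AG1 - AG2)
  )
})
t_results <- do.call(rbind, t_rows)
print(t_results)
```

A positive `T_Mdiff` means valence dropped from the first to the second rating, which is the direction one would expect after a negative induction.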
## # A tibble: 3 × 3
## Condition SD_Ratio Cor_Ratio
## <fct> <dbl> <dbl>
## 1 Negative 0.991 0.308
## 2 Neutral 1.06 0.426
## 3 Positive 1.07 0.448
-Note: Some code hidden
## # A tibble: 3 × 5
## Condition MAAS_M MAAS_SD MDIS_Pre_M MDIS_Pre_SD
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 Negative 3.68 3.68 3.16 1.38
## 2 Neutral 3.67 3.67 2.75 1.25
## 3 Positive 3.59 3.59 2.91 1.11
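One thing that stands out in the summary above: the MAAS_SD column repeats the MAAS_M values exactly in every condition. That may mean `mean()` was used where `sd()` was intended, though this can't be confirmed while the code is hidden. For reference, a minimal base R sketch of a per-condition mean and SD computed separately, on simulated scores:

```r
set.seed(4)
# Simulated stand-in data (hypothetical; not the study data).
toy <- data.frame(
  Condition = factor(rep(c("Negative", "Neutral", "Positive"), each = 30)),
  MAAS_Avg  = round(runif(90, min = 1, max = 6), 2)
)
toy$MAAS_Avg[c(5, 47)] <- NA   # a couple of missing scores, as in real survey data

# Mean and SD computed with their own functions, ignoring missing values.
maas_m  <- tapply(toy$MAAS_Avg, toy$Condition, mean, na.rm = TRUE)
maas_sd <- tapply(toy$MAAS_Avg, toy$Condition, sd,   na.rm = TRUE)

maas_summary <- data.frame(
  Condition = names(maas_m),
  MAAS_M    = as.numeric(maas_m),
  MAAS_SD   = as.numeric(maas_sd)
)
print(maas_summary)
```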
## Df Sum Sq Mean Sq F value Pr(>F)
## Condition 2 0.65 0.3247 0.49 0.613
## Residuals 423 280.11 0.6622
## 2 observations deleted due to missingness
## Df Sum Sq Mean Sq F value Pr(>F)
## Condition 2 8.2 4.116 2.827 0.0603 .
## Residuals 425 618.7 1.456
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Df Sum Sq Mean Sq F value Pr(>F)
## Condition 2 12.0 6.020 3.849 0.0221 *
## Residuals 424 663.2 1.564
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1 observation deleted due to missingness
## Rows: 428
## Columns: 35
## $ ID <int> 23, 24, 27, 28, 47, 48, 49, 50, 51, 52, 53, 54, 5…
## $ Condition <fct> Neutral, Negative, Neutral, Negative, Negative, N…
## $ Age <dbl> 22, 21, 21, 24, 20, 19, 19, 19, 20, 19, 19, 20, 2…
## $ SAB.f <fct> Female, Male, Female, Female, Male, Male, Female,…
## $ Gender.f <fct> Female, Male, Female, Non-binary, Male, Male, Fem…
## $ Sexual_Orientation.f <fct> Heterosexual, Heterosexual, Heterosexual, Asexual…
## $ Race_Ethnicity.f <fct> Hispanic or Latino, White (non-Hispanic), White (…
## $ Student_Status.f <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes,…
## $ Student_Year.f <fct> Senior, Junior, Senior, Senior, Sophomore, Freshm…
## $ Marital_Status.f <fct> Single, Single, Single, Married, Single, Single, …
## $ Employment.f <fct> Unemployed, Employed 1-20 hours per week, Employe…
## $ Native_Language.f <fct> English, English, English, English, English, Engl…
## $ AUDIT1.f <fct> 2-4x/month, Monthly or less, 2-3x/week, 2-4x/mont…
## $ AUDIT2.f <fct> 3-4, 1-2, 1-2, 3-4, 3-4, 3-4, 3-4, 5-6, 1-2, 1-2,…
## $ AUDIT3.f <fct> < Monthly, Never, < Monthly, < Monthly, Never, Mo…
## $ AUDIT_Sum <dbl> 6, 1, 6, 14, 3, 10, 4, 9, 1, 1, 3, 7, 8, 2, 20, 1…
## $ DUDIT_Sum <dbl> 0, 9, 4, 24, 0, 3, 0, 0, 0, 0, 0, 3, 3, 0, 1, 0, …
## $ MINI_AUD_Sum <dbl> 7, 0, 9, 6, 1, 1, 1, 3, 0, 0, 2, 4, 4, 2, 7, 0, 1…
## $ MINI_SUD_Sum <dbl> 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 3, 0, 3, 0, 5…
## $ MINI_AUD_Dx <fct> Severe, None, Severe, Severe, None, None, None, M…
## $ MINI_SUD_Dx <fct> None, None, None, None, None, None, None, None, N…
## $ AG1 <dbl> 16, 42, 73, 77, 61, 62, 59, 60, 52, 60, 52, 62, 5…
## $ AG2 <dbl> 41, 40, 73, 73, 56, 35, 41, 29, 52, 21, 62, 68, 3…
## $ AG1_Valence <dbl> 7, 6, 1, 5, 7, 8, 5, 6, 7, 6, 7, 8, 8, 9, 5, 7, 7…
## $ AG1_Arousal <dbl> 2, 5, 9, 9, 7, 7, 7, 7, 6, 7, 6, 7, 6, 6, 9, 7, 7…
## $ AG2_Valence <dbl> 5, 4, 1, 1, 2, 8, 5, 2, 7, 3, 8, 5, 3, 9, 8, 4, 7…
## $ AG2_Arousal <dbl> 5, 5, 9, 9, 7, 4, 5, 4, 6, 3, 7, 8, 5, 7, 7, 4, 9…
## $ NU_Avg <dbl> 2.083333, 1.250000, 1.666667, 2.833333, 2.833333,…
## $ PU_Avg <dbl> 1.500000, 1.142857, 1.142857, 3.000000, 2.428571,…
## $ SS_Avg <dbl> 3.250000, 2.916667, 1.583333, 2.750000, 3.166667,…
## $ LoPM_Avg <dbl> 2.000000, 1.545455, 1.363636, 2.818182, 1.818182,…
## $ LoPER_Avg <dbl> 1.5, 1.4, 2.4, 2.3, 2.0, 2.1, 1.4, 1.6, 1.5, 1.4,…
## $ MDIS_Pre_Avg <dbl> 1.333333, 1.333333, 5.000000, 4.000000, 4.333333,…
## $ MDIS_Post_Avg <dbl> 2.666667, 1.666667, 5.666667, 7.000000, 4.666667,…
## $ MAAS_Avg <dbl> 3.400000, 5.266667, 2.400000, 2.666667, 2.866667,…
-Full Data Set
-For Negative condition
-For Neutral condition
-For Positive condition
-Code below is hashed out (commented out) so the CSV is not rewritten every time the markdown is published.
# Full_df %>% write_csv("/Users/noahwolkowicz/Desktop/CT/West Haven/Postdoc/Postdoc Research/F&N Collab/FN_Collab_6.28.22.csv")
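An alternative to commenting the export out is to guard it, so the file is written only when it doesn't exist yet or when a fresh export is explicitly requested. The sketch below is one possible pattern, not the approach used in this file; `export_csv`, `out_path`, and the tiny `Full_df` are hypothetical, and base R's `write.csv()` stands in for `write_csv()` to keep the example dependency-free.

```r
# Hypothetical stand-ins: a tiny data frame and a temp-file path.
Full_df  <- data.frame(ID = c(23, 24, 27),
                       Condition = c("Neutral", "Negative", "Neutral"))
out_path <- file.path(tempdir(), "FN_Collab_example.csv")

# Flip to TRUE only when a fresh export is wanted.
export_csv <- FALSE

# Write the file if it is missing, or if a re-export was requested.
if (export_csv || !file.exists(out_path)) {
  write.csv(Full_df, out_path, row.names = FALSE)
}

# Read it back to confirm the export is intact.
reloaded <- read.csv(out_path)
```

In an R Markdown file, setting the chunk option `eval = FALSE` on the export chunk is another way to keep the code visible without running it on every knit.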