library(knitr)
library(kableExtra)
library(readxl)
library(tidyverse)
-- Attaching packages ---------------------------------------------------------------------- tidyverse 1.2.1 --
v ggplot2 3.1.0 v purrr 0.2.5
v tibble 1.4.2 v dplyr 0.7.8
v tidyr 0.8.2 v stringr 1.3.1
v readr 1.1.1 v forcats 0.3.0
-- Conflicts ------------------------------------------------------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
I downloaded the 2015 Public Use File in R format from http://grc.osu.edu/OMAS/2015Survey and exported it to a comma-separated file, omas2015.csv, so that I could import it here in a more familiar way. This is a huge file.
omas2015 <- read_csv("omas_data/omas2015.csv")
Parsed with column specification:
cols(
.default = col_integer(),
S10C = col_character(),
NOCHILD_CK = col_character(),
B4C2CON = col_character(),
B4C2AGE = col_character(),
B4I_7M2 = col_character(),
B20AM3 = col_character(),
C26CON = col_character(),
E65A = col_character(),
BF_28 = col_character(),
BF_31 = col_character(),
BF_32 = col_character(),
G72CM6 = col_character(),
G72CM7 = col_character(),
J100BCON = col_character(),
J100G1M3 = col_character(),
POSTJ113 = col_character(),
NJ117AM1 = col_character(),
NJ117AM2 = col_character(),
K98 = col_character(),
K98A = col_character()
# ... with 41 more columns
)
See spec(...) for full column specifications.
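The parse message above shows that read_csv() guessed most columns as integer and a handful as character. When those guesses matter, the types can be declared explicitly. The following is only a sketch on a tiny made-up file: the two column names are borrowed from the message, but the values and the temporary file are invented for illustration.

```r
library(readr)

# Hypothetical two-row stand-in for omas2015.csv; the values are made up.
toy_path <- tempfile(fileext = ".csv")
writeLines(c("A1,S10C", "1,OH", "2,PA"), toy_path)

# Declare the types up front instead of relying on guessing:
# integer by default, with S10C read as character.
toy <- read_csv(
  toy_path,
  col_types = cols(.default = col_integer(), S10C = col_character())
)

# spec() reports the column specification that was actually used.
spec(toy)
```

Supplying col_types also suppresses the "Parsed with column specification" message, since nothing is left to guess.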
I then selected the observations and variables from the original data set that met several inclusion and exclusion criteria, creating the “omas_431raw” data set, as shown below. I also created a .csv of this file, which I can provide as needed.
omas_431raw <- omas2015 %>%
    # conditions separated by commas are combined with AND
    filter(A1 == 1, B29BC == 1, D30 < 98, D30I < 98,
           D30BINC > 47, D30BINC < 84, D30A_UNIT == 1,
           D30A_VALUE < 998, D45 < 98, D46 < 98,
           !is.na(E59DAYS), E60 < 98, E62 < 98,
           !is.na(E63DAYS), F69 < 98, H76 < 98, H77 < 98,
           S9_REGION < 8, S14_REC_85 < 90, S15 < 3,
           # at least one of the four coverage items equals 1
           B4A == 1 | B4B == 1 | B4C == 1 | B4E == 1,
           B4A < 9, B4B < 9, B4C < 9, B4E < 9,
           H84_A3 > 100, H84_A3 < 600000) %>%
    # arbitrary identification codes for the 1007 retained respondents
    mutate(resp_ID = 1001:2007) %>%
    select(resp_ID, A1, B4A, B4B, B4C, B4E, B29BC,
           D30, D30I, D30BINC, D30A_UNIT, D30A_VALUE,
           D45, D46, E59DAYS, E60, E62, E63DAYS, F69,
           H76, H77, H84_A3, Region,
           S9_REGION, S14_REC_85, S15)
write_csv(omas_431raw, "omas_431raw.csv")
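The filter() call combines two kinds of condition: comma-separated arguments, which must all hold, and a single argument built with `|`, which holds when any one of its parts does. A minimal sketch on made-up rows follows. It assumes that values such as 98 and 9 are non-substantive response codes, which is my reading of the thresholds used above rather than something stated in the survey documentation.

```r
library(dplyr)

# Made-up rows; 98 and 9 stand in for non-substantive response codes (assumed).
toy <- data.frame(
  id  = 1:4,
  D30 = c(1, 98, 3, 2),
  B4A = c(1, 2, 2, 9),
  B4B = c(2, 1, 2, 1)
)

kept <- toy %>%
  filter(D30 < 98,              # drop the high code on D30
         B4A == 1 | B4B == 1,   # at least one coverage item equals 1
         B4A < 9, B4B < 9)      # no high codes on the coverage items

kept$id
```

Only the first row survives: row 2 fails on D30, row 3 has no coverage item equal to 1, and row 4 has a coded value on B4A even though B4B equals 1.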
Then, I built the actual data set to use in 431, by recasting many variables.
omas_431 <- omas_431raw %>%
    mutate(prob_access = fct_recode(factor(B29BC), Yes = "1"),
           insurance = fct_recode(factor(A1), Yes = "1"),
           ins_employer = fct_recode(factor(B4A), Yes = "1", No = "2"),
           ins_medicare = fct_recode(factor(B4B), Yes = "1", No = "2"),
           ins_medicaid = fct_recode(factor(B4C), Yes = "1", No = "2"),
           ins_private = fct_recode(factor(B4E), Yes = "1", No = "2"),
           health_stat = fct_recode(factor(D30), E = "1", VG = "2", G = "3", F = "4", P = "5"),
           mental_30 = D30I,
           height = D30BINC,
           weight = D30A_VALUE,
           # BMI from weight in pounds and height in inches
           bmi = 703 * weight / (height^2),
           smoke_100 = fct_recode(factor(D45), Yes = "1", No = "2"),
           alcohol_30 = D46,
           doc_days = E59DAYS,
           hospital = E60,
           er_visits = E62,
           dent_days = E63DAYS,
           care_now = fct_recode(factor(F69), Easier = "1", Harder = "2", Same = "3"),
           # put the levels in a natural order
           care_now = fct_relevel(care_now, "Easier", "Same", "Harder"),
           marital = fct_recode(factor(H76), Married = "1", Divorced = "2", Widowed = "3",
                                Separated = "4", Never = "5", Coupled = "6"),
           # several H77 codes are collapsed into a single level
           education = fct_recode(factor(H77), BelowHSGrad = "2", BelowHSGrad = "3",
                                  HSGrad = "4", SomeCollege = "5", SomeCollege = "6",
                                  CollegeGrad = "7", PostCollege = "8"),
           # income in thousands of dollars
           income = H84_A3/1000,
           county_type = fct_recode(factor(Region), Rural_App = "1", Metro = "2",
                                    Rural_NonApp = "3", Suburban = "4"),
           ohio_region = fct_recode(factor(S9_REGION), N_Cent = "1", NE = "2", NE_Cent = "3",
                                    NW = "4", S_Cent = "5", SE = "6", SW = "7"),
           age = S14_REC_85,
           gender = fct_recode(factor(S15), M = "1", F = "2")) %>%
    select(resp_ID, prob_access, insurance, ins_employer,
           ins_medicare, ins_medicaid, ins_private, age,
           gender, marital, education, income, county_type,
           ohio_region, care_now, health_stat, mental_30, height,
           weight, bmi, smoke_100, alcohol_30, doc_days,
           dent_days, hospital, er_visits)
write_csv(omas_431, "omas_431.csv")
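Two of the recoding steps in the mutate() call are easier to see on a small vector: giving several old codes the same new name in fct_recode() collapses them into one level, and fct_relevel() changes the order of the levels without touching the data. The sketch below uses made-up codes on the same scales as H77 and F69, and finishes with the BMI formula applied to one invented respondent.

```r
library(forcats)

# Made-up education codes on the H77 scale used above.
h77 <- c(2, 3, 4, 6, 8)

# Codes 2 and 3 share a new name, so they collapse into one level.
education <- fct_recode(factor(h77),
                        BelowHSGrad = "2", BelowHSGrad = "3",
                        HSGrad = "4", SomeCollege = "6",
                        PostCollege = "8")
levels(education)

# Recode first, then move "Same" between "Easier" and "Harder".
care_now <- fct_recode(factor(c(1, 2, 3)),
                       Easier = "1", Harder = "2", Same = "3")
care_now <- fct_relevel(care_now, "Easier", "Same", "Harder")
levels(care_now)

# BMI for an invented respondent: 180 pounds, 66 inches.
bmi_example <- 703 * 180 / 66^2
bmi_example
```

The five input codes yield four education levels, and the BMI works out to a little over 29.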
The omas_431 file contains data from the 2015 Public Use File of the Ohio Medicaid Assessment Survey for 1007 respondents, with 26 variables describing each respondent.

To be included in the omas_431 data frame, a respondent needed to meet the following criteria:

- complete data on every variable in omas_431 (so that, for instance, all responses to YES/NO questions are either YES or NO)
- problems getting needed care in the past 12 months (B29BC = 1 in the public use file, prob_access = “Yes” in omas_431)
- covered by health insurance (A1 = 1 in the public use file, insurance = “Yes” in omas_431)
- insurance from at least one of the following sources:
    - an employer or union (B4A = 1 in the public use file, ins_employer = “Yes” in omas_431)
    - Medicare (B4B = 1 in the public use file, ins_medicare = “Yes” in omas_431)
    - Medicaid (B4C = 1 in the public use file, ins_medicaid = “Yes” in omas_431)
    - a private plan (B4E = 1 in the public use file, ins_private = “Yes” in omas_431)
- height greater than 47 and less than 84 inches (D30BINC in public use file, height in omas_431)
- income greater than 100 and less than 600,000 dollars (H84_A3 in public use file, income in omas_431)

All 1007 respondents meeting these criteria are included in omas_431.
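The inclusion rules encoded in the filter() call can be re-checked on the finished data frame. The helper below is a hypothetical sketch, not part of the original analysis: `meets_criteria` is a name I made up, it covers only the rules on insurance, access, height and income, and it is demonstrated on two invented rows rather than on omas_431 itself. Income is compared in thousands of dollars, matching the income variable.

```r
# Hypothetical helper: TRUE when every row of df satisfies the rules checked here.
meets_criteria <- function(df) {
  any_source <- df$ins_employer == "Yes" | df$ins_medicare == "Yes" |
    df$ins_medicaid == "Yes" | df$ins_private == "Yes"
  !anyNA(df) &&
    all(df$prob_access == "Yes") &&
    all(df$insurance == "Yes") &&
    all(any_source) &&
    all(df$height > 47 & df$height < 84) &&
    all(df$income > 0.1 & df$income < 600)
}

# Two invented respondents who satisfy every rule.
toy <- data.frame(
  prob_access = c("Yes", "Yes"), insurance = c("Yes", "Yes"),
  ins_employer = c("Yes", "No"), ins_medicare = c("No", "No"),
  ins_medicaid = c("No", "Yes"), ins_private = c("No", "No"),
  height = c(66, 70), income = c(30, 54.5)
)
meets_criteria(toy)

# Changing one height to an out-of-range value makes the check fail.
toy_bad <- toy
toy_bad$height[2] <- 90
meets_criteria(toy_bad)
```

Calling `meets_criteria(omas_431)` after the data frame is built would be one way to confirm that the filter did what was intended.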
I built a code book in Excel, and display it here, as follows.
omas_431_codes <- read_xlsx("omas_data/omas_431_codebook.xlsx")
omas_431_codes %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"))
Variable Name | type | Description | Source |
---|---|---|---|
resp_ID | character | Identification Code - arbitrary (codes are numerical: 1001 - 2007) | Dr. Love |
prob_access | Yes/No | Problems getting the care you needed in the past 12 months? (All responses are Yes) | B29BC |
insurance | Yes/No | Are you covered by health insurance? (All responses are Yes) | A1 |
ins_employer | Yes/No | Insurance through an employer or union? | B4A |
ins_medicare | Yes/No | Insurance through Medicare? | B4B |
ins_medicaid | Yes/No | Insurance through Medicaid? | B4C |
ins_private | Yes/No | Insurance through a private plan? | B4E |
age | quantitative | Age in years (19-85, 85 and above are reported as 85) | S14_REC_85 |
gender | 2-level factor | F = female, M = male | S15 |
marital | 6-level factor | Married, Divorced, Widowed, Separated, Never (been married), Coupled (but unmarried) | H76 |
education | 5-level factor | BelowHSGrad, HSGrad, SomeCollege, CollegeGrad, PostCollege | H77 (some collapsing) |
income | quantitative | Total family income in 2014, in thousands of dollars (observed range = 0.500 to 500.000) | H84_A3 (divided by 1000) |
county_type | 4-level factor | Metro, Suburban, Rural_App (Rural Appalachian), Rural_NonApp (Rural Non-Appalachian) | Region |
ohio_region | 7-level factor | N_Cent = North Central, NE, NE_Cent = North East Central, NW, S_Cent, SE, SW | S9_REGION |
care_now | 3-level factor | Is health care now easier to get than 3 years ago? Easier, Same (as 3 years ago), Harder | F69 |
health_stat | 5-level factor | Self-reported overall health: E = Excellent, VG = Very Good, G = Good, F = Fair, P = Poor | D30 |
mental_30 | quantitative | # of days in the past 30 where your mental health prevented you from doing normal activities | D30I |
height | quantitative | height in inches | D30BINC |
weight | quantitative | weight in pounds | D30A_VALUE, D30A_UNIT |
bmi | quantitative | body-mass index | Calculated |
smoke_100 | Yes/No | Have you smoked 100 cigarettes in your life? | D45 |
alcohol_30 | quantitative | # of days in the past 30 where you consumed alcohol | D46 |
doc_days | quantitative | # of days since last non-emergency visit to a doctor / care professional about your health | E59DAYS |
dent_days | quantitative | # of days since you last visited a dentist | E63DAYS |
hospital | quantitative | # of times in the past 12 months where you were admitted to a hospital for an overnight stay | E60 |
er_visits | quantitative | # of times in the past 12 months where you were a patient in a hospital emergency room | E62 |
omas_431
Hmisc::describe(omas_431)
omas_431
26 Variables 1007 Observations
---------------------------------------------------------------------------
resp_ID
n missing distinct Info Mean Gmd .05 .10
1007 0 1007 1 1504 336 1051 1102
.25 .50 .75 .90 .95
1252 1504 1756 1906 1957
lowest : 1001 1002 1003 1004 1005, highest: 2003 2004 2005 2006 2007
---------------------------------------------------------------------------
prob_access
n missing distinct value
1007 0 1 Yes
Value Yes
Frequency 1007
Proportion 1
---------------------------------------------------------------------------
insurance
n missing distinct value
1007 0 1 Yes
Value Yes
Frequency 1007
Proportion 1
---------------------------------------------------------------------------
ins_employer
n missing distinct
1007 0 2
Value Yes No
Frequency 446 561
Proportion 0.443 0.557
---------------------------------------------------------------------------
ins_medicare
n missing distinct
1007 0 2
Value Yes No
Frequency 398 609
Proportion 0.395 0.605
---------------------------------------------------------------------------
ins_medicaid
n missing distinct
1007 0 2
Value Yes No
Frequency 340 667
Proportion 0.338 0.662
---------------------------------------------------------------------------
ins_private
n missing distinct
1007 0 2
Value Yes No
Frequency 145 862
Proportion 0.144 0.856
---------------------------------------------------------------------------
age
n missing distinct Info Mean Gmd .05 .10
1007 0 66 1 51.65 16.53 26.3 31.0
.25 .50 .75 .90 .95
40.5 53.0 62.0 70.0 75.0
lowest : 19 20 21 22 23, highest: 80 81 82 84 85
---------------------------------------------------------------------------
gender
n missing distinct
1007 0 2
Value M F
Frequency 362 645
Proportion 0.359 0.641
---------------------------------------------------------------------------
marital
n missing distinct
1007 0 6
Value Married Divorced Widowed Separated Never Coupled
Frequency 436 244 81 42 157 47
Proportion 0.433 0.242 0.080 0.042 0.156 0.047
---------------------------------------------------------------------------
education
n missing distinct
1007 0 5
Value BelowHSGrad HSGrad SomeCollege CollegeGrad PostCollege
Frequency 78 290 371 163 105
Proportion 0.077 0.288 0.368 0.162 0.104
---------------------------------------------------------------------------
income
n missing distinct Info Mean Gmd .05 .10
1007 0 207 1 43.58 42.93 4.59 8.06
.25 .50 .75 .90 .95
14.00 30.00 54.50 95.00 128.40
lowest : 0.500 0.700 0.714 0.731 0.733
highest: 250.000 300.000 315.000 450.000 500.000
---------------------------------------------------------------------------
county_type
n missing distinct
1007 0 4
Value Rural_App Metro Rural_NonApp Suburban
Frequency 176 538 137 156
Proportion 0.175 0.534 0.136 0.155
---------------------------------------------------------------------------
ohio_region
n missing distinct
1007 0 7
Value N_Cent NE NE_Cent NW S_Cent SE SW
Frequency 73 265 77 49 193 91 259
Proportion 0.072 0.263 0.076 0.049 0.192 0.090 0.257
---------------------------------------------------------------------------
care_now
n missing distinct
1007 0 3
Value Easier Same Harder
Frequency 106 345 556
Proportion 0.105 0.343 0.552
---------------------------------------------------------------------------
health_stat
n missing distinct
1007 0 5
Value E VG G F P
Frequency 59 204 301 274 169
Proportion 0.059 0.203 0.299 0.272 0.168
---------------------------------------------------------------------------
mental_30
n missing distinct Info Mean Gmd .05 .10
1007 0 24 0.705 5.725 9.042 0 0
.25 .50 .75 .90 .95
0 0 5 30 30
lowest : 0 1 2 3 4, highest: 26 27 28 29 30
---------------------------------------------------------------------------
height
n missing distinct Info Mean Gmd .05 .10
1007 0 28 0.994 66.43 4.533 60.3 62.0
.25 .50 .75 .90 .95
64.0 66.0 69.0 72.0 74.0
lowest : 53.97 54.00 55.00 56.00 57.00, highest: 75.00 76.00 77.00 78.00 79.00
---------------------------------------------------------------------------
weight
n missing distinct Info Mean Gmd .05 .10
1007 0 172 0.999 187.6 55.19 120 130
.25 .50 .75 .90 .95
150 180 214 250 285
lowest : 82 85 86 95 98, highest: 365 370 381 420 426
---------------------------------------------------------------------------
bmi
n missing distinct Info Mean Gmd .05 .10
1007 0 616 1 29.82 8.006 20.22 21.50
.25 .50 .75 .90 .95
24.40 28.79 33.30 39.70 42.79
lowest : 14.52406 15.33173 16.05886 16.35804 16.49898
highest: 59.06864 61.56450 63.50342 66.71374 76.21811
---------------------------------------------------------------------------
smoke_100
n missing distinct
1007 0 2
Value Yes No
Frequency 600 407
Proportion 0.596 0.404
---------------------------------------------------------------------------
alcohol_30
n missing distinct Info Mean Gmd .05 .10
1007 0 24 0.803 2.833 4.582 0 0
.25 .50 .75 .90 .95
0 0 3 10 15
lowest : 0 1 2 3 4, highest: 25 27 28 29 30
---------------------------------------------------------------------------
doc_days
n missing distinct Info Mean Gmd .05 .10
1007 0 57 0.992 152 244.8 1 3
.25 .50 .75 .90 .95
7 30 90 312 540
lowest : 1 2 3 4 5, highest: 2920 3650 4380 5110 22630
---------------------------------------------------------------------------
dent_days
n missing distinct Info Mean Gmd .05 .10
1007 0 69 0.997 1108 1668 7 21
.25 .50 .75 .90 .95
90 240 1095 2555 4270
lowest : 0 1 2 3 4, highest: 25550 25915 26280 28105 36427
---------------------------------------------------------------------------
hospital
n missing distinct Info Mean Gmd .05 .10
1007 0 13 0.565 0.4826 0.8197 0 0
.25 .50 .75 .90 .95
0 0 0 2 2
Value 0 1 2 3 4 5 7 8 9 10
Frequency 761 143 59 23 3 8 1 2 1 2
Proportion 0.756 0.142 0.059 0.023 0.003 0.008 0.001 0.002 0.001 0.002
Value 11 13 14
Frequency 1 1 2
Proportion 0.001 0.001 0.002
---------------------------------------------------------------------------
er_visits
n missing distinct Info Mean Gmd .05 .10
1007 0 17 0.809 1.148 1.744 0 0
.25 .50 .75 .90 .95
0 0 1 3 5
Value 0 1 2 3 4 5 6 7 8 9
Frequency 571 203 92 53 27 13 15 8 6 2
Proportion 0.567 0.202 0.091 0.053 0.027 0.013 0.015 0.008 0.006 0.002
Value 10 12 13 15 18 20 21
Frequency 8 2 2 1 1 1 2
Proportion 0.008 0.002 0.002 0.001 0.001 0.001 0.002
---------------------------------------------------------------------------
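The Gmd column in the describe() output is, as I understand the Hmisc documentation, Gini's mean difference: the average absolute difference between all pairs of distinct observations. A minimal base R sketch of that definition follows; `gini_md` is a name chosen here for illustration, and the input vector is made up.

```r
# Gini's mean difference: the average of |x_i - x_j| over all pairs i != j.
gini_md <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  # outer() builds every pairwise difference; the diagonal contributes zero.
  sum(abs(outer(x, x, "-"))) / (n * (n - 1))
}

gini_md(c(1, 2, 3, 4))
```

For the four values shown, the six pairwise differences sum to 10, so the result is 10/6. The outer() approach builds an n-by-n matrix, which is fine for 1007 observations but would be wasteful for very large vectors.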
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2.2 forcats_0.3.0 stringr_1.3.1 dplyr_0.7.8
[5] purrr_0.2.5 readr_1.1.1 tidyr_0.8.2 tibble_1.4.2
[9] ggplot2_3.1.0 tidyverse_1.2.1 readxl_1.1.0 kableExtra_0.9.0
[13] knitr_1.20
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 lubridate_1.7.4 lattice_0.20-35
[4] assertthat_0.2.0 rprojroot_1.3-2 digest_0.6.18
[7] R6_2.3.0 cellranger_1.1.0 plyr_1.8.4
[10] backports_1.1.2 acepack_1.4.1 evaluate_0.12
[13] httr_1.3.1 highr_0.7 pillar_1.3.0
[16] rlang_0.3.0.1 lazyeval_0.2.1 data.table_1.11.8
[19] rstudioapi_0.8 rpart_4.1-13 Matrix_1.2-14
[22] checkmate_1.8.5 rmarkdown_1.10 splines_3.5.1
[25] foreign_0.8-70 htmlwidgets_1.3 munsell_0.5.0
[28] broom_0.5.0 compiler_3.5.1 modelr_0.1.2
[31] pkgconfig_2.0.2 base64enc_0.1-3 htmltools_0.3.6
[34] nnet_7.3-12 tidyselect_0.2.5 htmlTable_1.12
[37] gridExtra_2.3 Hmisc_4.1-1 viridisLite_0.3.0
[40] crayon_1.3.4 withr_2.1.2 grid_3.5.1
[43] nlme_3.1-137 jsonlite_1.5 gtable_0.2.0
[46] magrittr_1.5 scales_1.0.0 cli_1.0.1
[49] stringi_1.2.4 latticeExtra_0.6-28 xml2_1.2.0
[52] Formula_1.2-3 RColorBrewer_1.1-2 tools_3.5.1
[55] glue_1.3.0 hms_0.4.2 survival_2.43-1
[58] yaml_2.2.0 colorspace_1.3-2 cluster_2.0.7-1
[61] rvest_0.3.2 bindr_0.1.1 haven_1.1.2