This script relies on a private dataset that will be provided to authorized collaborators on HSC20170563N separately. No simulated version of the data currently exists for testing.
However, the code is decoupled from the data and is available for anybody to review in the 0eb085b commit of the bokov/emr2nis repository master branch
The most current compiled version of this report can be viewed at https://rpubs.com/bokov/emr2nis_data_prep
rc_disp <- fread('recode_dispuniform.csv');
rc_pay1 <- fread('recode_pay1.csv');
rc_race <- fread('recode_race.csv');
requestedcols <- c('YEAR','DISPUNIFORM','PAY1','AGE','RACE','FEMALE'
,'diabetes','liver','kidney','laproscopic_neph'
,'robotic_neph', 'neph', 'h_neph','died', 'hhc', 'routine'
,'transferother', 'transfershort', 'hispanic', 'newpay');
raw00 <- fread(unzip(inputdata),na.strings='',key=c('patient_num','start_date'))[
,start_date := as.Date(start_date)][,YEAR:=format(start_date,'%Y')];
dat00 <- raw00;
DISPUNIFORM variable to mimic HCUPCaveat: there is often more than one discharge code per patient-year, and some guesswork was involved in matching them to the closest unified HCUP code. Please see recode_dispuniform.csv and draw your own conclusions. Note in particular that 6, Home Health Care does not have any equivalent codes in our local data under our current ETL. Therefore the hhc analytic variable will always be 0.
dat00[,DISPUNIFORM := submulti(v000_Sts__ptnts_cd,rc_disp,method='full')][
,DISPUNIFORM := ifelse(is.na(DISPUNIFORM),'.',DISPUNIFORM)];
PAY1 variable to mimic HCUPCaveat: there is usually more than one payer code per patient-year, and some guesswork was involved in matching them to the closest unified HCUP code. Please see recode_pay1.csv and draw your own conclusions.
dat00[,PAY1 := submulti(v001_Fncl_ptnts_cd,rc_pay1,method='full')];
AGE, FEMALE, and RACE variables to mimic HCUPdat00[,AGE:=round(min(age_at_visit_days)/365.25)
,by=list(patient_num,YEAR)][
,AGE:=ifelse(AGE<0,'.A',AGE)][
,FEMALE:=recode(sex_cd,m='0',f='1',.default='.')][
,RACE := submulti(race_cd,rc_race,method='full')];
Hispanic ethnicity is a separate variable in our EMR, but we overwrite the patient’s actual race with 3 if they are of Hispanic ethnicity
dat00[v002_Ethnct__ptnts_cd=='DEM|ETHNICITY:Y'
,RACE := '3'];
These variables are: diabetes,liver,kidney,laproscopic_neph, robotic_neph, neph, and h_neph should not be (re)calculated analytically within SAS when using this dataset because the underlying diagnosis codes are not in the same format or column structure as HCUP
Caveat: Do not trust h_neph– in this version it should have been V4573 but instead V457 was pulled which seems to not include V4573 though it’s supposed to. Will try to fix this in next update of data pull.
Caveat: neph is under-reporting nephrectomies. The SAS code currently only pulls partial nephrectomies (55.4). I made mine do the equivalent, but in my opinion both should also pull full nephrectomies (55.5) (the commented out line).
Caveat: robotic_neph is not necessarily a robotically assisted nephrectomy, both here and in the SAS code. The code 17.4 is for robotc assisted procedures in general. Likewise for laproscopic_neph.
dat00[,`:=`(diabetes=1*v010_mlts__ptnts_tf
,liver=1*v009_intrhptc_ptnts_tf
,kidney=1*v008_Mlgnt_ptnts_tf
,laproscopic_neph=1*v004_Lprscp_ptnts_tf
,robotic_neph=1*v003_prcdrs_ptnts_tf
,neph=1*v005_nphrctm_ptnts_tf
#,neph=1*(v005_nphrctm_ptnts_tf|v006_nphrctm_ptnts_tf)
,h_neph=0)];
These variables are: died, hhc, routine, transferother, transfershort, hispanic, newpay. They can be (re)calculated from the main NIS variables and perhaps should be, to check the validity of this script.
Caveat: died only reflects those patients whose discharge disposition indicates death, it ignores the age_at_death_days variable available in our local data, for consistency with NIS. As mentioned above, hhc is always 0 because I don’t know which of our codes would correspond to NIS’s 6.
dat00[,`:=`( died=1*(DISPUNIFORM=='20')
,hhc=1*(DISPUNIFORM=='6')
,routine=1*(DISPUNIFORM=='1')
,transferother=1*(DISPUNIFORM=='5')
,transfershort=1*(DISPUNIFORM=='2')
,hispanic=1*(RACE=='3')
,newpay=ifelse(PAY1 %in% 1:4,PAY1,'5')
)];
dat2012 <- dat00[start_date < as.Date('2013-01-01')];
dat2013 <- dat00[!patient_num %in% dat2012$patient_num &
start_date < as.Date('2014-01-01')];
Number of distinct patients without a previous diagnosis of kidney cancer seen at UHS in 2012 and how many of them were diagnosed with kidney or with liver cancer that year.
| Neither | Kidney | Liver | Sum | |
|---|---|---|---|---|
| Non-Hispanic | 77653 | 47 | 190 | 77890 |
| Hispanic | 83504 | 101 | 138 | 83743 |
| Sum | 161157 | 148 | 328 | 161633 |
Number of distinct patients without a previous diagnosis of kidney cancer seen at UHS in 2013 and how many of them were diagnosed with kidney or with liver cancer that year.
| Neither | Kidney | Liver | Sum | |
|---|---|---|---|---|
| Non-Hispanic | 62924 | 34 | 88 | 63046 |
| Hispanic | 62276 | 42 | 59 | 62377 |
| Sum | 125200 | 76 | 147 | 125423 |
export(dat2012[,..requestedcols],'dat2012.csv');
export(dat2013[,..requestedcols],'dat2013.csv');
c()
## NULL