HCUP Analysis Review 1

Kaihua (William) Hou

Reading data

CORE2010 = read.csv("~/OneDrive/Johns Hopkins/Ali Sobhi Afshar - HCUP/Data/SIDC_MD_2010/rds/MD_SID_2010_CORE.csv");
CORE2011 = read.csv("~/OneDrive/Johns Hopkins/Ali Sobhi Afshar - HCUP/Data/SIDC_MD_2011/rds/MD_SID_2011_CORE.csv");
CORE2012 = read.csv("~/OneDrive/Johns Hopkins/Ali Sobhi Afshar - HCUP/Data/SIDC_MD_2012/rds/MD_SID_2012_CORE.csv");
CORE2013 = read.csv("~/OneDrive/Johns Hopkins/Ali Sobhi Afshar - HCUP/Data/SIDC_MD_2013/rds/MD_SID_2013_CORE.csv");
CORE2014 = read.csv("~/OneDrive/Johns Hopkins/Ali Sobhi Afshar - HCUP/Data/SIDC_MD_2014/rds/MD_SID_2014_CORE.csv");
CORE2015q4 = read.csv("~/OneDrive/Johns Hopkins/Ali Sobhi Afshar - HCUP/Data/SIDC_MD_2015/rds/MD_SID_2015q4_CORE.csv");
CORE2015q1q3 = read.csv("~/OneDrive/Johns Hopkins/Ali Sobhi Afshar - HCUP/Data/SIDC_MD_2015/rds/MD_SID_2015q1q3_CORE.csv");
colnames(CORE2015q1q3) = colnames(CORE2015q4);
CORE2015=rbind(CORE2015q1q3, CORE2015q4);
CORE2016 = read.csv("~/OneDrive/Johns Hopkins/Ali Sobhi Afshar - HCUP/Data/SIDC_MD_2016/rds/MD_SID_2016_CORE.csv");
CORE2017 = read.csv("~/OneDrive/Johns Hopkins/Ali Sobhi Afshar - HCUP/Data/SIDC_MD_2017/rds/MD_SID_2017_CORE.csv");
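
The 2015 step above copies the column names of the q4 file onto the q1q3 file, which is only safe if both files hold the same columns in the same order. A minimal sketch of a check that could run before the rename is shown here. The helper `sameColumns` and the toy data frames `q1q3` and `q4` are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper: TRUE when two data frames have the same number of
# columns and the same names in the same order, ignoring letter case.
sameColumns <- function(a, b) {
    ncol(a) == ncol(b) && all(toupper(colnames(a)) == toupper(colnames(b)))
}

# Toy stand-ins for the two 2015 files (not real HCUP data).
q1q3 <- data.frame(dshospid = c(1, 2), zip3 = c(212, 207))
q4   <- data.frame(DSHOSPID = 3, ZIP3 = 210)

# Rename and stack only when the columns line up.
if (sameColumns(q1q3, q4)) {
    colnames(q1q3) <- colnames(q4)
    combined <- rbind(q1q3, q4)
}
```

If the check fails, the mismatch is worth inspecting by hand before any renaming, since a forced rename would silently mislabel columns.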

Functions

uniqueHospitals <- function (yearData) {
    # Number of distinct hospital IDs in the year's data.
    # A blank ID, if present, counts as one level.
    nlevels(as.factor(yearData$DSHOSPID))
}
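missingHospitalIDs <- function (yearData) {
    # Count the records whose hospital ID is blank.
    # sum() on the logical vector counts matches; sum(which(...)) would add up row indices.
    numMissingID = sum(yearData$DSHOSPID == "", na.rm = TRUE);
    numMissingID;
}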
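
When counting blank hospital IDs, it matters whether `sum()` is applied to the logical vector or to the result of `which()`. The toy example below (made-up IDs, not HCUP data) illustrates the difference; the variable names are assumptions for illustration only.

```r
# Toy hospital IDs (not real HCUP data); two entries are blank.
ids <- c("21001", "", "21002", "", "21001")

# Summing the logical vector counts the blank entries.
nBlank <- sum(ids == "")

# Summing which() adds up the positions of the blank entries (2 + 4).
idxSum <- sum(which(ids == ""))

nBlank  # 2
idxSum  # 6
```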
totalAdmissionsPerHospital <- function (yearData) {
    totalAdmissions <- as.data.frame(table(yearData$DSHOSPID));
    totalAdmissions <-totalAdmissions[order(-totalAdmissions$Freq), ];
    colnames(totalAdmissions) = c("DSHOSPID", "totalAdmissions");
    totalAdmissions;
}
uniquePatients <- function (yearData) {
    # For each hospital, count the distinct VisitLink values (one per patient).
    # Base R only, so no extra packages need to be loaded.
    counts <- tapply(yearData$VisitLink, yearData$DSHOSPID,
                     function (v) length(unique(v)));
    numPatients <- data.frame(DSHOSPID = names(counts),
                              numUnique = as.integer(counts),
                              row.names = NULL);
    numPatients;
}
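
As an illustration of the per-hospital unique-patient count, here is a small sketch on toy data. The data frame `toy` and its values are made up; only the column names `DSHOSPID` and `VisitLink` come from the analysis.

```r
# Toy admissions (not real HCUP data): one row per admission.
toy <- data.frame(
    DSHOSPID  = c(21001, 21001, 21001, 21002, 21002),
    VisitLink = c(1, 1, 2, 3, 4)
)

# For each hospital, count the distinct VisitLink values.
counts <- tapply(toy$VisitLink, toy$DSHOSPID, function(v) length(unique(v)))
perHospital <- data.frame(DSHOSPID = names(counts),
                          numUnique = as.integer(counts),
                          row.names = NULL)
perHospital
```

Hospital 21001 has three admissions but only two distinct patients, so admissions and unique patients can differ for the same hospital.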
zipFrequency <- function (yearData) {
    yearData$ZIP3 <- as.factor(yearData$ZIP3);
    zipFreq <- as.data.frame(table(yearData$ZIP3));
    zipFreq <-zipFreq[order(-zipFreq$Freq), ];
    colnames(zipFreq) = c("zip", "zipFreq");
    zipFreq
}
UrbanOrRural <- function (yearData) {
    # Tabulate only the six valid NCHS codes (1~6), so each description
    # lines up with its code even if a year contains invalid values or lacks a code.
    UorR <- as.data.frame(table(factor(yearData$PL_NCHS, levels = 1:6)));
    description = c('Central of >=1 million', 'Fringe of >=1 million',
                    '250,000-999,999', '50,000-249,999', 'Micropolitan', 'Rural');
    UorR = cbind(UorR, description);
    colnames(UorR) = c("urban/rural", "Freq", "description");
    UorR <- UorR[order(-UorR$Freq), ];
    UorR;
}

Number of Admissions Per Hospital (2010~2017)

Total admissions, 2010~2017

[Figure: plot of chunk q2.analysisPlot]

Number of Unique Patients Per Hospital (2012~2017)

[Figure: plot of chunk q3.analysisPlot]

*The chart uses the 'VisitLink' variable in the HCUP data, which was first implemented in 2012.

Frequency of Patients' Home Locations (2010-2017)

           Area   ZIP3   Number of admissions
 Main Baltimore    212                1534478
      Annapolis    207                 627743
  Baltimore A-L    210                 618730
      Frederick    217                 388135
  Baltimore M-Z    211                 384414
       Bethesda    208                 376002
        Waldorf    206                 238806
  Silver Spring    209                 225010
      Salisbury    218                 173328
    other areas  other                 509140
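
The table above keeps the nine most frequent three-digit ZIP prefixes and groups the rest as "other". A minimal sketch of how such a summary could be built from a sorted frequency table is given here. The toy counts, the `topN` value, and the names `top`, `other`, and `summaryTable` are assumptions for illustration, not the code used for the report.

```r
# Toy ZIP3 frequencies (not real HCUP data), already sorted in decreasing order.
zipFreq <- data.frame(zip = c("212", "207", "210", "217"),
                      zipFreq = c(50, 30, 15, 5))

# Keep the first topN rows and collapse the remainder into one "other" row.
topN  <- 2
top   <- head(zipFreq, topN)
other <- data.frame(zip = "other",
                    zipFreq = sum(zipFreq$zipFreq[-seq_len(topN)]))
summaryTable <- rbind(top, other)
summaryTable
```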

Urban Or Rural Division of Patients' Home Locations (2010~2017)

   Urban/Rural Division Number of Admissions
  Fringe of >=1 million              3240855
 Central of >=1 million               871432
         50,000-249,999               289097
        250,000-999,999               187554
           Micropolitan               173899
                  Rural                92007

*The chart uses the 'PL_NCHS' variable in the HCUP data. The 2015 values of this variable are corrupted: about 80% of that year's records are invalid or missing.

The Problem of PL_NCHS in 2015 Data

PL_NCHS should take only six values (1~6), but the 2015 data contain the following values:

 [1] "1"  "2"  "3"  "4"  "5"  "6"  "8"  "32" "35" "37" "39" "40" "41" "42" "47"

Number of NAs in PL_NCHS for 2015

[1] 472910

Percentage of NAs in PL_NCHS for 2015

[1] 75.27405
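
The checks reported above (unexpected codes, number of NAs, percentage of NAs) can be sketched as follows on a toy vector. The values in `plNchs` are made up, and the variable names are hypothetical; only the valid range 1~6 comes from the text.

```r
# Toy PL_NCHS values (not real HCUP data), including invalid codes and NAs.
plNchs <- c(1, 2, 6, 8, 32, 41, NA, NA, NA, NA, NA, NA)

validCodes <- 1:6

# Codes that are present but fall outside the valid range.
invalidCodes <- sort(unique(plNchs[!is.na(plNchs) & !(plNchs %in% validCodes)]))

# Count and percentage of missing values.
numNA <- sum(is.na(plNchs))
pctNA <- 100 * mean(is.na(plNchs))

invalidCodes
numNA
pctNA
```

Running the same three checks on each year separately would show whether the problem is confined to 2015.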