\(\underline{Step\ 1}\)

data<-read.csv("CSV_HCV.csv", header=T)
data[1:4,]
##   X.Required.Variables.for.All.Patients.Positive.for.HIV..Tested.or.Not.Tested.Known.Positive.Individuals.Out.of.Care...HCV..Antibody..RNA..or.Not.Tested.Known.Positive.Individuals.Out.of.Care...or.HBV..Tested.or.Not.Tested.Known.Positive.Individuals.
## 1                                                                                                                                                                                                                                                          
## 2                                                                                                                                                                                                                                                          
## 3                                                                                                                                                                                                                                     Client ID\n(Unique #)
## 4                                                                                                                                                                                                                                                 A11549001
##                                   X                    X.1
## 1                                                         
## 2                                                         
## 3                         Site Name Data Reporting Quarter
## 4 262F - CSV: Alameda Health Center                2017-Q1
##                                                                       X.2
## 1                                                                        
## 2                                                    Patient Information 
## 3 Year of Birth \n(YYYY)\n(enter \xd29999\xd3 for patients older than 89)
## 4                                                                    1962
##                            X.3             X.4     X.5
## 1                                                     
## 2                                                     
## 3                         Race      Ethnicity  Gender 
## 4 03-Black or African American 02-Non-Hispanic 01-Male
##                                                                                                                                                                                                                                                                                        X.6
## 1 HCV Positives (Antibody, RNA, or  Not Tested/Known Positive Individuals Out of Care) \nRequired for partners conducting HCV testing or linkage to care\n*Only complete the HCV section if the patient tested HCV positive through FOCUS or is part of FOCUS HCV linkage to care efforts*
## 2                                                                                                                                                                                                                                                   HCV Risk Factor - Ever Injected Drugs 
## 3                                                                                                                                                                                                            Ever Injected Drugs \n(only required for patients not born between 1945-1965)
## 4                                                                                                                                                                                                                                                                                     0-No
##                           X.7                X.8
## 1                                               
## 2 Previous HCV Ab Test Result HCV Ab Test Result
## 3 Previous HCV Ab Test Result HCV Ab Test Result
## 4  01-Positive\xd0Self-report        01-Positive
##                                 X.9
## 1                                  
## 2                      HCV RNA Test
## 3 HCV RNA Test Conducted by Partner
## 4  1-Yes, test conducted by partner
##                                                             X.10
## 1                                                               
## 2                                                               
## 3 HCV RNA Test Result\n(leave blank for tests not yet conducted)
## 4                                                    01-Positive
##                                                                                                                               X.11
## 1                                                                                                                                 
## 2 HCV RNA Positive\n(Required for all RNA positives)\n(Please update linkages in progress, when new information becomes available)
## 3                                                                                        HCV RNA Positive Patient Newly Identified
## 4                                                                                                                             0-No
##                                                                 X.12
## 1                                                                   
## 2                                                                   
## 3      HCV RNA Positive Patient\n Attended First Appointment for HCV
## 4 2-Linked - attended medical appointment with primary care provider
##                                                                                                                                      X.13
## 1                                                                                                                                        
## 2                                                                                                                                        
## 3 HCV RNA Positive Patient Reason Did Not Attend First Appointment (required for HCV RNA-positive patients not keeping first appointment)
## 4                                                                                                                                        
##                                                       X.14                 X.15
## 1                                                                              
## 2 HCV Co-Infection\n(Required for all Ab or RNA positives)                     
## 3                                     HCV/HIV Co-Infection HCV/HBV Co-Infection
## 4                                                     0-No                 0-No
##                                                                                                                                                                                                                                                                     X.16
## 1 HBsAg Positives (Tested or Not Tested/Known Positive Individuals)\nRequired for partners conducting HBV testing or linkage to care\n*Only complete the HBV section if the patient tested HBsAg positive through FOCUS or is part of FOCUS HBV linkage to care efforts*
## 2                                                                                                                                                                                                                                   Country of Origin - HBsAg Prevalence
## 3                                                                                                                                                                                                                        Country of Origin has a HBsAg Prevalence > = 2%
## 4                                                                                                                                                                                                                                                                       
##                                                                                                          X.17
## 1                                                                                                            
## 2                                                                      HBV Risk Factor - Ever Injected Drugs 
## 3 Ever Injected Drugs (only required for those whose country of origin's HBsAg prevalence is < 2% or unknown)
## 4                                                                                                            
##                 X.18
## 1                   
## 2 HBsAg Test Result 
## 3 HBsAg Test Result 
## 4                   
##                                                                                                                               X.19
## 1                                                                                                                                 
## 2 HBsAg Positive\n(Required for all HBsAg positives)\n(Please update linkages in progress, when new information becomes available)
## 3                                                                                          HBsAg Positive Patient Newly Identified
## 4                                                                                                                                 
##                                                                 X.20
## 1                                                                   
## 2                                                                   
## 3 HBsAg Positive Patient: Attended First Medical Appointment for HBV
## 4                                                                   
##                                                                                                                                                   X.21
## 1                                                                                                                                                     
## 2                                                                                                                                                     
## 3 HBsAg Positive Patient: Reason Did Not Attend First Medical Appointment (required for HBsAg-positive patients not keeping first medical appointment)
## 4

The data used for this activity consist of de-identified Hepatitis C patient encounter records. Patients were screened at a local health facility within the Paso Del Norte region. The dataset comprises 564 entries and 17 variables. A quick look at the data shows several questionable inputs, which will require re-coding to be usable in a data visualization.

First, some adjustments can be made to the race and ethnicity variables. These were collapsed into one joint variable (e.g. White_non_His). In addition, gender (Male & Female) can be re-coded as (0 & 1). Several of the values entered combine a numeric code and a text label (0-No / 1-Yes); these were re-coded to numeric/factor values only. All variable responses with similar characteristics were addressed using the same approach. A final step in addressing the ambiguity within the dataset was to collapse several of the levels, limiting the variables to a positive (1) or negative (0) response only.
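The re-coding described above might look roughly like the following base R sketch. It runs on a small made-up data frame, not the project file. The column names and the codes `05-White`, `01-Hispanic`, `02-Female` and `1-Yes` are assumptions for illustration, and the helper names `strip_code` and `leading_code` are hypothetical.

```r
# Toy rows that mimic the raw coding style; values are illustrative only.
raw <- data.frame(
  Race      = c("03-Black or African American", "05-White", "05-White"),
  Ethnicity = c("02-Non-Hispanic", "01-Hispanic", "02-Non-Hispanic"),
  Gender    = c("01-Male", "01-Male", "02-Female"),
  IDU       = c("0-No", "1-Yes", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop the numeric prefix ("03-", "0-") and keep the label.
strip_code <- function(x) sub("^[0-9]+-", "", x)

# Hypothetical helper: keep only the leading numeric code; blanks become NA.
leading_code <- function(x) {
  x[x == ""] <- NA
  as.integer(sub("-.*$", "", x))
}

race_lab <- strip_code(raw$Race)
is_hisp  <- strip_code(raw$Ethnicity) == "Hispanic"

clean <- data.frame(
  # Joint race/ethnicity variable: append "-His" for Hispanic patients.
  Race   = ifelse(is_hisp, paste0(race_lab, "-His"), race_lab),
  # Male = 0, Female = 1; any other label becomes NA.
  Gender = match(strip_code(raw$Gender), c("Male", "Female")) - 1L,
  # "0-No" / "1-Yes" reduced to 0 / 1.
  IDU    = leading_code(raw$IDU),
  stringsAsFactors = FALSE
)
print(clean)
```

Mapping unexpected gender labels to NA, and not silently to 1, keeps any third level visible for later review.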

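# stringsAsFactors = TRUE reproduces the factor columns shown by str() below on R >= 4.0.0
dat <- read.csv("CSV_HCV1.csv", header = TRUE, stringsAsFactors = TRUE)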
head(dat)
##   Client.ID..Unique... Site.Name       Date  YoB      Race Gender
## 1                    3      Alam 2017-01-02 1962       BLK   Male
## 2                   36      Alam 2017-01-05 1970 White-His   Male
## 3                   37      Alam 2017-01-06 1972 White-His Female
## 4                   38      Alam 2017-01-07 1961     White   Male
## 5                   76      Alam 2017-04-05 1970 White-His   Male
## 6                   77      Alam 2017-04-06 1966 White-His   Male
##   IDU.betweem..1945.1965. Previous_Ab_Test HCV_Ab Test.Conducted.Partner
## 1                       0                1      1                      1
## 2                      NA                1      1                      1
## 3                       0                1      1                      1
## 4                       1                1      1                      1
## 5                      NA                1      1                      1
## 6                      NA                1      1                      1
##   RNA.Test RNA_New_Poz RNA.Pos.1st.App. RNA.Pos..No.1st.App. HCV.HIV.Co.Infe
## 1        1           0        Link_Prov                 <NA>               0
## 2        0          NA             <NA>                 <NA>              NA
## 3        0          NA             <NA>                 <NA>               0
## 4        1           0        Link_Prov                 <NA>               0
## 5        1           0        Link_Prov                 <NA>               0
## 6        1           0        Link_Prov                 <NA>               0
##   HCV.HBV.Co.Infe
## 1               0
## 2              NA
## 3               0
## 4               0
## 5               0
## 6               0
str(dat)
## 'data.frame':    564 obs. of  16 variables:
##  $ Client.ID..Unique...   : int  3 36 37 38 76 77 78 79 80 81 ...
##  $ Site.Name              : Factor w/ 5 levels "Alam","H_less ",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Date                   : Factor w/ 473 levels "2017-01-01","2017-01-02",..: 2 5 6 7 49 50 51 52 53 54 ...
##  $ YoB                    : int  1962 1970 1972 1961 1970 1966 1955 1944 1969 1946 ...
##  $ Race                   : Factor w/ 7 levels "AI","BLK","Blk_His",..: 2 7 7 6 7 7 7 7 7 7 ...
##  $ Gender                 : Factor w/ 3 levels "Female","Male",..: 2 2 1 2 2 2 2 1 1 1 ...
##  $ IDU.betweem..1945.1965.: int  0 NA 0 1 NA NA NA 0 0 0 ...
##  $ Previous_Ab_Test       : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ HCV_Ab                 : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Test.Conducted.Partner : int  1 1 1 1 1 1 0 1 1 1 ...
##  $ RNA.Test               : int  1 0 0 1 1 1 1 0 0 1 ...
##  $ RNA_New_Poz            : int  0 NA NA 0 0 0 0 NA NA 0 ...
##  $ RNA.Pos.1st.App.       : Factor w/ 4 levels "Link_Prov","Link-_pec",..: 1 NA NA 1 1 1 3 NA NA 1 ...
##  $ RNA.Pos..No.1st.App.   : Factor w/ 8 levels "Decline","In Care",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ HCV.HIV.Co.Infe        : int  0 NA 0 0 0 0 0 0 0 0 ...
##  $ HCV.HBV.Co.Infe        : int  0 NA 0 0 0 0 0 0 0 0 ...
dat[1:4, ]
##   Client.ID..Unique... Site.Name       Date  YoB      Race Gender
## 1                    3      Alam 2017-01-02 1962       BLK   Male
## 2                   36      Alam 2017-01-05 1970 White-His   Male
## 3                   37      Alam 2017-01-06 1972 White-His Female
## 4                   38      Alam 2017-01-07 1961     White   Male
##   IDU.betweem..1945.1965. Previous_Ab_Test HCV_Ab Test.Conducted.Partner
## 1                       0                1      1                      1
## 2                      NA                1      1                      1
## 3                       0                1      1                      1
## 4                       1                1      1                      1
##   RNA.Test RNA_New_Poz RNA.Pos.1st.App. RNA.Pos..No.1st.App. HCV.HIV.Co.Infe
## 1        1           0        Link_Prov                 <NA>               0
## 2        0          NA             <NA>                 <NA>              NA
## 3        0          NA             <NA>                 <NA>               0
## 4        1           0        Link_Prov                 <NA>               0
##   HCV.HBV.Co.Infe
## 1               0
## 2              NA
## 3               0
## 4               0

\(\underline{Step\ 1\ cont.}\)

A closer look at the full dataset also shows several missing values (NAs). To address this, the missing values in each variable are first counted, using (R-Code 1) below. Next, Multivariate Imputation by Chained Equations (MICE), via the mice function, is used to replace the missing values (imputation). MICE imputes each incomplete variable from a model conditioned on the other variables in the dataset; with the predictive mean matching method (pmm) used here, each missing entry is filled with an observed value borrowed from a record with a similar predicted value. This is done with (R-Code 2) below.

\(\underline{(R-Code\ 1)}\)

library(VIM)  # provides aggr()

# Label with the names of dat (the cleaned data), not data (the raw import)
aggr(dat, col=c('navyblue', 'yellow'), numbers=TRUE, sortVars=TRUE, 
     labels=names(dat), cex.axis=.7, gap=3, ylab=c("Missing%", "Pattern"))

## 
##  Variables sorted by number of missings: 
##                 Variable       Count
##     RNA.Pos..No.1st.App. 0.875886525
##          HCV.HBV.Co.Infe 0.377659574
##  IDU.betweem..1945.1965. 0.356382979
##              RNA_New_Poz 0.187943262
##         RNA.Pos.1st.App. 0.186170213
##          HCV.HIV.Co.Infe 0.182624113
##                 RNA.Test 0.015957447
##         Previous_Ab_Test 0.007092199
##     Client.ID..Unique... 0.000000000
##                Site.Name 0.000000000
##                     Date 0.000000000
##                      YoB 0.000000000
##                     Race 0.000000000
##                   Gender 0.000000000
##                   HCV_Ab 0.000000000
##   Test.Conducted.Partner 0.000000000
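As a package-free cross-check of the missing-value summary, the same per-variable proportions can be sketched in base R. The toy data frame below is an assumption for illustration. On the real data the call would be `colMeans(is.na(dat))`. The "Count" column printed above holds proportions, not raw counts: for example, 0.8759 of 564 rows is about 494 missing entries.

```r
# Toy data frame with a few missing values (illustrative only).
toy <- data.frame(
  RNA.Test    = c(1, 0, NA, 1),
  RNA_New_Poz = c(0, NA, NA, 0),
  HCV_Ab      = c(1, 1, 1, 1)
)

# Proportion of missing values per column, sorted from most to least missing.
miss_prop <- sort(colMeans(is.na(toy)), decreasing = TRUE)
print(miss_prop)

# Absolute number of missing values per column, for reference.
miss_count <- colSums(is.na(toy))
print(miss_count)
```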

\(\underline{(R-Code\ 2)}\)

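library(mice)  # provides mice() and complete()

# Two imputed datasets (m = 2) by predictive mean matching, five iterations each
fit.mice <- mice(dat, m = 2, maxit = 5, method = 'pmm', seed = 500)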
...
## 
##  iter imp variable
##   1   1  IDU.betweem..1945.1965.*  Previous_Ab_Test*  RNA.Test*  RNA_New_Poz*  RNA.Pos.1st.App.*  RNA.Pos..No.1st.App.*  HCV.HIV.Co.Infe*  HCV.HBV.Co.Infe*
##   1   2  IDU.betweem..1945.1965.*  Previous_Ab_Test*  RNA.Test*  RNA_New_Poz*  RNA.Pos.1st.App.*  RNA.Pos..No.1st.App.*  HCV.HIV.Co.Infe*  HCV.HBV.Co.Infe*
...
## Warning: Number of logged events: 173
# Keep the first of the two imputed datasets
dat <- complete(fit.mice, 1)
dat[1:10,]
...
##    Client.ID..Unique... Site.Name       Date  YoB      Race Gender
## 1                     3      Alam 2017-01-02 1962       BLK   Male
## 2                    36      Alam 2017-01-05 1970 White-His   Male
## 3                    37      Alam 2017-01-06 1972 White-His Female
...
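To give a feel for what predictive mean matching does, here is a deliberately simplified, single-variable sketch in base R. It is not the algorithm as implemented in the mice package, which also draws the regression parameters at random and cycles through all incomplete variables over several iterations. The toy data, the function name `pmm_once` and the choice of three donors are assumptions for illustration.

```r
set.seed(500)

# Toy data: y has missing values, x is fully observed (illustrative only).
toy <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8),
                  y = c(2.1, 3.9, NA, 8.2, NA, 12.1, 13.8, 16.2))

# Hypothetical helper: one pass of simplified predictive mean matching.
pmm_once <- function(data, target, predictor, donors = 3) {
  obs  <- !is.na(data[[target]])
  fit  <- lm(reformulate(predictor, response = target), data = data[obs, ])
  pred <- predict(fit, newdata = data)           # predictions for every row
  out  <- data[[target]]
  for (i in which(!obs)) {
    # Rank observed rows by how close their prediction is to row i's prediction.
    dist  <- abs(pred[obs] - pred[i])
    pool  <- which(obs)[order(dist)][seq_len(min(donors, sum(obs)))]
    donor <- pool[sample.int(length(pool), 1)]   # pick one donor at random
    out[i] <- data[[target]][donor]              # borrow the donor's observed value
  }
  out
}

toy$y_imp <- pmm_once(toy, "y", "x")
print(toy)
```

Because every imputed entry is copied from an observed record, the filled-in values always stay within the range of values actually seen in the data.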

Post-MICE check for remaining NAs.

# Label with the names of dat (the imputed data), not data (the raw import)
aggr(dat, col=c('navyblue', 'yellow'), numbers=TRUE, sortVars=TRUE, 
     labels=names(dat), cex.axis=.7, gap=3, ylab=c("Missing%", "Pattern"))

## 
##  Variables sorted by number of missings: 
##                 Variable Count
##     Client.ID..Unique...     0
##                Site.Name     0
##                     Date     0
##                      YoB     0
##                     Race     0
##                   Gender     0
##  IDU.betweem..1945.1965.     0
##         Previous_Ab_Test     0
##                   HCV_Ab     0
##   Test.Conducted.Partner     0
##                 RNA.Test     0
##              RNA_New_Poz     0
##         RNA.Pos.1st.App.     0
##     RNA.Pos..No.1st.App.     0
##          HCV.HIV.Co.Infe     0
##          HCV.HBV.Co.Infe     0

\(\underline{Step\ 2}\)

As a low-level visualization, the visualize and plot commands in R were utilized. These commands provide a visual representation of each variable, allowing any concerns or abnormalities in the dataset to be reviewed. Any abnormalities noted are addressed at this point; once they are adjusted, Step 1 is repeated. From the boxplot below, we can see that some of the variables need to be revisited.

plot(dat[,-1])

boxplot(dat[-c(1:4)])

visualize (dat[,-1])

\(\underline{Step\ 3}\)

This assignment evaluates the HCV-RNA positivity rates at five health centers (N=5). Tufte’s principles are used to produce a sound data visualization (datviz).

First, we envision the story we wish the datviz to tell. For the dataset presented above, the datviz focuses on showing the differences and commonalities in HCV incidence among the five local health centers over the first three years of the program’s implementation.

The first time series plot looks at the overall growth of the program from its inception. Here we see steady growth, reaching an intake of 564 patients by the end of July 2019. Within this plot, we note several peaks and troughs over the life of the program. A cross-check of the troughs shows consistency with known breaks in service experienced by the program (e.g. spring break and seasonal vacations taken by the medical provider). While many of the drops are explained this way, several require a second look at the data capture method, as they raise questions about how the data were charted (e.g. January 2019).

The bar plot provided below shows a count per health center. Here we see that the majority of the patients were screened at the Alameda location.

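library(lubridate)  # ymd()
library(xts)        # xts()
library(dygraphs)   # dygraph() and the dy*() helpers
library(ggplot2)    # ggplot()

# Parse the dates, then index the client IDs by encounter date
dat$Date <- ymd(dat$Date)
don <- xts(x = dat$Client.ID..Unique..., order.by = dat$Date)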


p <- dygraph(don,ylab = "Count", xlab = "Date", main = "Patient Intake Graph (2017-2019)") %>%
  dyOptions(labelsUTC = TRUE, fillGraph=TRUE, fillAlpha=0.1, drawGrid = F, colors="#D8AE5A") %>%
  dyRangeSelector() %>%
  dyCrosshair(direction = "vertical") %>%
  dyHighlight(highlightCircleSize = 5, highlightSeriesBackgroundAlpha = 0.2, hideOnMouseOut = FALSE)  %>%
  dyRoller(rollPeriod = 1)
p
ggplot(dat, aes(x =Site.Name, colour=Site.Name, fill=Site.Name)) +
  geom_bar()
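The time series above uses the client ID as the plotted value, which tracks intake only if IDs are assigned sequentially. If that assumption does not hold, a running count of encounters per date could be computed instead. The sketch below uses made-up dates and base R only. The names `enc` and `daily` are hypothetical.

```r
# Toy encounter dates (illustrative only); each row is one patient encounter.
enc <- data.frame(Date = as.Date(c("2017-01-02", "2017-01-05", "2017-01-05",
                                   "2017-04-05", "2017-04-06", "2017-04-06")))

# Encounters per day, then the running total over time.
daily <- as.data.frame(table(Date = enc$Date), stringsAsFactors = FALSE)
daily$Date <- as.Date(daily$Date)
daily <- daily[order(daily$Date), ]
daily$Cumulative <- cumsum(daily$Freq)
print(daily)
```

The `Cumulative` column, ordered by `Date`, could then be passed to `xts()` in place of the client ID.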

The plot below shows the race and ethnicity of the screened population at each health center, with the size of each point giving the count for that group at that site. This allows several of the key demographics to be compared across sites.

ggplot(dat, aes(Site.Name, Race, colour=Race)) + 
  geom_count()

The plots below show RNA-positive and RNA-negative results by gender at each site. This gives the audience a clear idea of which site screened the most RNA-reactive patients and of their identified gender.

dat$RNA.Test<-as.factor(dat$RNA.Test)
dat$Gender<-as.factor(dat$Gender)

ggplot(dat, aes(x = Site.Name, y =Gender, colour = RNA.Test, fill=RNA.Test))+
  geom_point()

ggplot(dat) + geom_point(aes(x=Site.Name, y=Gender, color=RNA.Test))  + 
  facet_wrap( ~ RNA.Test, scales="free")
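Because points with the same site and gender are drawn on top of each other, a numeric summary can complement these plots. The sketch below tabulates counts and a per-site positivity share on made-up records. It assumes `RNA.Test` is coded 1 for positive and 0 for negative, which is an interpretation and not something the data dictionary confirms.

```r
# Toy records (illustrative only); RNA.Test assumed 1 = positive, 0 = negative.
toy <- data.frame(
  Site.Name = c("Alam", "Alam", "Alam", "H_less", "H_less"),
  Gender    = c("Male", "Female", "Male", "Male", "Female"),
  RNA.Test  = c(1, 0, 1, 0, 1)
)

# Three-way count: site x gender x RNA result.
counts <- xtabs(~ Site.Name + Gender + RNA.Test, data = toy)
print(ftable(counts))

# Share of RNA-positive records per site (mean of a 0/1 indicator).
pos_rate <- tapply(toy$RNA.Test, toy$Site.Name, mean)
print(round(pos_rate, 3))
```

On the real data, `RNA.Test` would need to be numeric (not a factor) when the mean is taken.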

Second

To achieve “Graphical Integrity” (GI), all variations/manipulations to the dataset have been documented and articulated in this publication/presentation. All data formatting steps are presented along with their corresponding code. In keeping with GI, all scales are clearly legible (i.e. margins are labeled and scales start from 0).

Third

Specific attention was paid to the layering of the variables. Several graphic devices were used to separate the categories, and the layers were applied hierarchically to reduce confusion in the story being told. For this specific assignment, the graphs were layered to show HCV incidence per site. Close attention was paid to the color schemes and scale, with extra emphasis placed on depicting small changes within the data. As the assignments progress, a focus will be placed on the “Parallelism” of the datviz. Specifically, isomorphism and visual juxtapositions that reveal connections will also be layered in.

To avoid “Chart-junk”, a cookbook approach to datviz was first utilized to develop sound graphs. Once a graph was developed, a more inventive/liberal approach to datviz was applied, focusing specifically on reducing visual noise that may disrupt the story being told.

\(\underline{Step\ 4}\)

The bar plot best tells the story of the patients seen at each health center.

ct<- ggplot(dat, aes(x =Site.Name, colour=Site.Name, fill=Site.Name)) +
  geom_bar()

\(\underline{Step\ 5}\)

A layer of site totals has been added to the plot to provide clarity. The y-axis limits have also been adjusted to fit the population.

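# after_stat(count) replaces the deprecated ..count.. notation (ggplot2 >= 3.3.0);
# vjust is a fixed setting, so it sits outside aes()
ct + ylim(0, 350) + ggtitle("Patient Flow Per Site (2017-2019)") +
  geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.2)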

\(\underline{Step\ 6}\)

From the provided plot, the audience will be able to see how many patients were seen at each of the five health centers.