1 Data Cleaning and Import

1.1 Cleaning in Excel

The Excel workbook ‘lit review v3’ was received from A Debes, and the following variables were copied to a new sheet for analysis:

  • First Author, Year Pub, Type RDT, Manufacturer, Country where tested, if field…, technical level…, Direct or Enriched…, Gold standard…, N, sensitivity, 95 CI LL, 95 CI UL, specificity, 95 CI LL, 95 CI UL.

  • Several variable names were shortened; for the confidence limits, sens or spec tags were added to avoid duplicate variable names.

  • Created a new variable, Reference, that combines First_Author and Year_Pub using the Excel function CONCATENATE. This variable is used to label the forest plots.

  • Manually screened the Excel sheet for misspellings.

  • Filtered out rows with no information on confidence limits, leaving 42 of 59 rows.

  • Converted percentages to numbers for display in the graphs.

  • Removed rows tagged ‘exclude’.

  • Saved to OneDrive as ‘rdt_review_v3’ in ‘C:/Users/QMP0/OneDrive - CDC/Data Science/r_practice’.
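For comparison, the same cleaning steps could be scripted in R so they can be rerun if the workbook changes. This is a minimal sketch, not the process actually used (that was done by hand in Excel). The toy rows and the column names `sens_95CI_LL`, `sens_95CI_UL` and `Notes` (assumed to hold the ‘exclude’ tag) are hypothetical.

```r
# Hypothetical raw rows; column names are assumptions, not the real sheet's
raw <- data.frame(
  First_Author = c("Smith", "Lee", "Diaz"),
  Year_Pub     = c(2019, 2020, 2018),
  sens         = c("85.2%", "70.1%", "92.0%"),   # percentages stored as text
  sens_95CI_LL = c(78.0, NA, 88.5),
  sens_95CI_UL = c(90.5, NA, 95.1),
  Notes        = c("", "", "exclude"),           # hypothetical tag column
  stringsAsFactors = FALSE
)

# Equivalent of Excel's CONCATENATE: build the Reference label
raw$Reference <- paste(raw$First_Author, raw$Year_Pub)

# Convert "85.2%" style text into plain numbers
raw$sens <- as.numeric(sub("%", "", raw$sens, fixed = TRUE))

# Drop rows with no confidence-limit information
clean <- raw[complete.cases(raw[, c("sens_95CI_LL", "sens_95CI_UL")]), ]

# Drop rows tagged 'exclude'
clean <- clean[clean$Notes != "exclude", ]
```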

1.2 Import and Cleaning in R

Imported the CSV file with the rio package.
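A minimal import sketch, assuming the cleaned sheet was exported as `rdt_review_v3.csv` (the extension is an assumption). A small stand-in file is written to a temporary directory first so the chunk runs on its own. `rio::import()` picks its reader from the file extension; the fallback to base `read.csv()` is only there in case rio is not installed.

```r
# Write a tiny stand-in file; the real file and its columns will differ
path <- file.path(tempdir(), "rdt_review_v3.csv")
write.csv(data.frame(Reference = c("Smith 2019", "Lee 2020"),
                     sens = c(85.2, 70.1)),
          path, row.names = FALSE)

rdt <- if (requireNamespace("rio", quietly = TRUE)) {
  rio::import(path)        # reader chosen from the .csv extension
} else {
  utils::read.csv(path)    # base-R fallback if rio is not installed
}
str(rdt)
```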

1.3 Tweaking Data to Arrange by Order

To get the forest plots to display in ascending order of the confidence interval lower limit, I had to convert the Reference variable from character to factor and then order the factor levels by the value of the lower limit. For each graph to display correctly, this has to be done separately for each of the four data configurations: Direct Sensitivity, Enriched Sensitivity, Direct Specificity, and Enriched Specificity.
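The reordering can be sketched with base R's `reorder()`, which turns `Reference` into a factor whose levels are sorted by a second variable. The toy data and the column name `sens_95CI_LL` are hypothetical, and `order_by_ll()` is a hypothetical helper so the same step can be repeated for each configuration.

```r
# Toy data; 'sens_95CI_LL' is a hypothetical column name
dir_sens <- data.frame(
  Reference    = c("Smith 2019", "Lee 2020", "Diaz 2018"),
  sens_95CI_LL = c(80, 65, 90),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert Reference to a factor whose levels follow the
# lower CI limit. reorder() sorts levels by the mean of the second argument,
# so a study appearing in several rows is placed by its average lower limit.
order_by_ll <- function(d, ll_col) {
  d$Reference <- reorder(factor(d$Reference), d[[ll_col]])
  d
}

dir_sens <- order_by_ll(dir_sens, "sens_95CI_LL")
levels(dir_sens$Reference)  # Lee 2020, Smith 2019, Diaz 2018
```

Calling the helper once per data frame (with the specificity lower-limit column for the two specificity configurations) keeps each plot's ordering independent. Note that on a discrete y axis ggplot2 draws the first level at the bottom, whereas in `facet_grid()` rows the first level is the top panel.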

2 Visualizations

2.1 Forest Plots w/ ggplot Version 1

Libraries

For my first attempt at making a forest plot, I found some code which uses the ggplot2 package at this site.

I’ll try it out in the code chunk below for sensitivity:

This comes out really crowded, and I’m not sure how some of the settings work, so I’m going to try the code again on a small subset of our data.

One problem I need to fix is the reference label on each facet: I’d like it to display horizontally. The fix I found is in the theme settings, where I had to switch from strip.text.y with angle = 180 to strip.text.y.left with angle = 0.
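A minimal sketch of this facet-per-study layout, using toy data and hypothetical column names. It is not the code from the linked site; it only shows where `switch = "y"` and `strip.text.y.left` go. In the default theme the left-hand strips have their own `strip.text.y.left` element with its own angle, which is why changing `strip.text.y` alone does not rotate them (the `.left` element needs ggplot2 3.3.0 or later).

```r
library(ggplot2)

# Hypothetical rows; the column names are assumptions, not the real sheet's
d <- data.frame(
  Reference    = c("Smith 2019", "Lee 2020", "Diaz 2018"),
  sens         = c(85, 72, 94),
  sens_95CI_LL = c(80, 65, 90),
  sens_95CI_UL = c(89, 78, 97)
)

p <- ggplot(d, aes(x = sens, y = "")) +
  # Point estimate with a horizontal line spanning the 95% CI
  geom_pointrange(aes(xmin = sens_95CI_LL, xmax = sens_95CI_UL)) +
  # One facet row per study; switch = "y" moves the strip labels to the left
  facet_grid(Reference ~ ., switch = "y") +
  labs(x = "Sensitivity (%)", y = NULL) +
  theme(
    # Left-hand strips are styled by strip.text.y.left; angle = 0 is horizontal
    strip.text.y.left = element_text(angle = 0),
    axis.ticks.y = element_blank()
  )
p
```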

Now I’ll see how many rows I can fit back in and have it still look nice.

OK, it would have been nice to have direct and enriched samples on one graph, but I think I’ll have to filter by sample type.

For simplicity’s sake, I’ll leave off the rows labeled as ‘both’ under enrichment.
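Filtering by sample type and leaving out the ‘both’ rows can be sketched in base R. The column name `Direct_Enriched` and its values are assumptions standing in for the shortened ‘Direct or Enriched…’ variable.

```r
# Hypothetical rows; 'Direct_Enriched' stands in for the real column name
d <- data.frame(
  Reference       = c("Smith 2019", "Lee 2020", "Diaz 2018", "Ono 2021"),
  Direct_Enriched = c("Direct", "Enriched", "Both", "Direct"),
  sens            = c(85, 72, 94, 60),
  stringsAsFactors = FALSE
)

# Case-insensitive match, since the exact spelling in the sheet is unknown
sample_type <- tolower(d$Direct_Enriched)

# which() drops NA comparisons; rows labelled 'Both' match neither subset
direct   <- d[which(sample_type == "direct"), ]
enriched <- d[which(sample_type == "enriched"), ]
```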

Next, I’ll try the graphs with colors assigned by RDT type.
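Coloring by RDT type amounts to mapping the RDT type column to `colour` inside `aes()`. A sketch using the per-study facet layout described above; `Type_RDT` and its values ‘Type A’ and ‘Type B’ are hypothetical placeholders.

```r
library(ggplot2)

# Hypothetical rows; Type_RDT values are placeholders, not real RDT types
d <- data.frame(
  Reference    = c("Smith 2019", "Lee 2020", "Diaz 2018"),
  Type_RDT     = c("Type A", "Type B", "Type A"),
  sens         = c(85, 72, 94),
  sens_95CI_LL = c(80, 65, 90),
  sens_95CI_UL = c(89, 78, 97)
)

p <- ggplot(d, aes(x = sens, y = "", colour = Type_RDT)) +
  # Both the point and its CI line take the colour of the RDT type
  geom_pointrange(aes(xmin = sens_95CI_LL, xmax = sens_95CI_UL)) +
  facet_grid(Reference ~ ., switch = "y") +
  labs(x = "Sensitivity (%)", y = NULL, colour = "RDT type") +
  theme(strip.text.y.left = element_text(angle = 0),
        axis.ticks.y = element_blank())
p
```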

This looks pretty nice. I’ll need to find out what’s going on in the facets with overlapping CIs. I suspect this is due to different raw data calculations based on different interpretations of faint lines, but that needs checking.

Let’s see how the other 3 graph configurations look now.

Enriched Sensitivity

I’ll need to deal with the Grandesso row.

2.2 Forest Plots w/ ggplot Version 2

I found another plotting method here that seems like it will be less crowded if I can get it to work.
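The method at the linked page is not reproduced here. As one common, less crowded alternative (an assumption, not necessarily that method), each study can be a row on a discrete y axis in a single panel, and a stratifying variable such as gold standard can then become the facet, which is one way the stratified plots in the following sections could be drawn. Column names and the gold-standard values are hypothetical.

```r
library(ggplot2)

# Hypothetical rows; 'Gold_Standard' and its values are placeholders
d <- data.frame(
  Reference     = c("Smith 2019", "Lee 2020", "Diaz 2018", "Ono 2021"),
  Gold_Standard = c("Culture", "PCR", "Culture", "PCR"),
  sens          = c(85, 72, 94, 60),
  sens_95CI_LL  = c(80, 65, 90, 51),
  sens_95CI_UL  = c(89, 78, 97, 69)
)

# Order studies by lower CI limit (first level is drawn at the bottom)
d$Reference <- reorder(factor(d$Reference), d$sens_95CI_LL)

# One panel, one row per study: labels sit on the y axis, not in facet strips
p <- ggplot(d, aes(x = sens, y = Reference)) +
  geom_pointrange(aes(xmin = sens_95CI_LL, xmax = sens_95CI_UL)) +
  labs(x = "Sensitivity (%)", y = NULL)

# Stratified version: one panel per gold standard; free y scale and space
# keep only each stratum's studies in its panel, sized by row count
p_strat <- p + facet_grid(Gold_Standard ~ ., scales = "free_y", space = "free_y")
p_strat
```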

2.2.1 Basic Graphs

2.2.1.1 Direct Sensitivity

2.2.1.2 Enriched Sensitivity

2.2.1.3 Direct Specificity

2.2.1.4 Enriched Specificity

2.2.2 Stratified by Gold Standard

2.2.2.1 Direct Sensitivity

2.2.2.2 Enriched Sensitivity

2.2.2.3 Direct Specificity

2.2.2.4 Enriched Specificity

2.2.3 Stratified by Technician Level

2.2.3.1 Direct Sensitivity

2.2.3.2 Enriched Sensitivity

2.2.3.3 Direct Specificity

2.2.3.4 Enriched Specificity

2.2.4 Stratified by Setting

2.2.4.1 Direct Sensitivity

2.2.4.2 Enriched Sensitivity

2.2.4.3 Direct Specificity

2.2.4.4 Enriched Specificity