Qualitative (Categorical) Data Analysis for R’s transplant data

The transplant data set contains information for 815 subjects on a liver transplant waiting list from 1990-1999, and their disposition: received a transplant, died while waiting, withdrew from the list, or censored (still waiting at the time the data set was created). The variables are:

age = age at addition to the waiting list
sex = m or f
abo = blood type: A, B, AB or O
year = year in which they entered the waiting list
futime = time from entry to final disposition
event = final disposition: censored (still waiting), death (died while waiting), ltx (got transplant) or withdraw (withdrew from list)

This data set is in R’s built-in “survival” package. To use data from a built-in package, use the library() function and put the package name in the parentheses, for example library(survival). This tells R to load that package from the library where it is stored, so we can use it. The Console (bottom of this section) will show it has been called up from the library. Type answers after the -> sign.

#Get the structure of the transplant data set to view the variables in it
library(survival) #Call up the survival package from the library
str(transplant) #View transplant data structure
## 'data.frame':    815 obs. of  6 variables:
##  $ age   : num  47 55 52 40 70 66 41 55 50 61 ...
##  $ sex   : Factor w/ 2 levels "m","f": 1 1 1 2 1 2 2 1 1 1 ...
##  $ abo   : Factor w/ 4 levels "A","B","AB","O": 2 1 2 4 4 4 1 4 1 1 ...
##  $ year  : num  1994 1991 1996 1995 1996 ...
##  $ futime: num  1197 28 85 231 1271 ...
##  $ event : Factor w/ 4 levels "censored","death",..: 2 3 3 3 1 3 3 3 3 3 ...

Write code to get help in the chunk below before you answer any questions:

# WRITE YOUR CODE TO GET HELP FOR THE DATASET transplant USING ?
?transplant
  1. What type of variable did R list year as, and is it accurate?

-> year is listed as num, when it should have been a categorical variable.

  2. How many categories does the abo variable have?

-> 4
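If we wanted R to treat year as categorical, as the answer to the first question suggests, one option is to convert it with factor(). This is only a sketch: the object name year_factor is made up for illustration, and the original year column is left unchanged.

```r
library(survival) # provides the transplant data set

# Assumption: store the categorical version in a new object
# (year_factor is a hypothetical name) instead of overwriting the column
year_factor <- factor(transplant$year)

class(transplant$year) # the original column is numeric
class(year_factor)     # the converted copy is a factor
table(year_factor)     # frequency table with one count per entry year
```

Keeping a separate copy means later code that expects a numeric year still works.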

#Create frequency table for the sex variable in the transplant data frame
table(transplant$sex)
## 
##   m   f 
## 447 368

Next, update the code below and run it in order to answer the questions below:

#Create frequency table for the abo variable in the transplant data frame
#FINISH FOLLOWING CODE BY TYPING THE VARIABLE NAME abo AFTER THE DOLLAR SIGN
table(transplant$abo)
## 
##   A   B  AB   O 
## 325 103  41 346
  3. Which blood type is most common among those on this liver transplant waiting list?

-> O

  4. Which blood type is least common among those on this liver transplant waiting list?

-> AB

#Create frequency table with sum total
addmargins(table(transplant$abo)) #the addmargins() function adds the sum of the row
## 
##   A   B  AB   O Sum 
## 325 103  41 346 815
  5. For how many patients was blood type recorded?

-> 815

#Create relative frequency table
prop.table(table(transplant$abo))
## 
##          A          B         AB          O 
## 0.39877301 0.12638037 0.05030675 0.42453988
  6. What proportion (or, equivalently, what percentage) of patients have AB blood type?

-> 5.03%
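The conversion from proportion to percentage can also be done in R. The sketch below retypes the counts from the frequency table above rather than reading the data set, and the object names abo_counts and abo_pct are made up for illustration.

```r
# Counts copied from the abo frequency table above
abo_counts <- c(A = 325, B = 103, AB = 41, O = 346)

# Proportion = count / total; multiply by 100 and round for a percentage
abo_pct <- round(100 * abo_counts / sum(abo_counts), 2)

abo_pct # AB comes out as 5.03 (percent)
```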

Make a bar chart (called a barplot in R)

Making a graph in R requires that you first create the data table and then save it as an object. We will create and save the abo table as an object called aboTable. Then, in the same code chunk, we’ll use the barplot() function to make a bar chart.

aboTable <- table(transplant$abo) #Save abo variable's frequency table as an object called aboTable (this is required to make the bar chart)

barplot(aboTable, #This tells R to use the barplot function on the aboTable object
        main = "Basic Bar Chart of Blood Type", #This adds a title
        xlab = "Blood Type", #This adds an x-axis label
        ylab = "Count", #This adds a y-axis label
        col = "hotpink") #This gives the bars a color just for the fun of it

  7. Based on the bar chart, which blood type had the fewest liver patients on the list?

-> AB

Create a pie chart

Conceptually, a pie chart is a bar chart wrapped into a circle. We’ll use the event variable to walk through the whole process. And, of course, it starts with creating a frequency table and saving it as an object!

#1 create a frequency table and save it as an object
event <- table(transplant$event) 
#2 Convert frequency table object to a data frame
event_df  <- as.data.frame(event) 
#3 Change variable names from Var1 & Freq to original variable's name & Count
names(event_df) <- c("Event","Count") 
#4 Calculate the percentage using one decimal place and save it as an object we will call pct for percent
pct <- round(100*event_df$Count/sum(event_df$Count), 1) 
#5 Add the category name and a '%' sign to each slice
pie(event_df$Count, #This makes the pie reflect the counts
    labels = paste(event_df$Event, pct, "%", sep = " "), #This adds the category names and percentages to the pie slices
    radius = 1, #Radius = 1 controls size of pie & can keep labels from overlapping if slices are really small (not an issue on our chart)
    main = "Base R Pie Chart of Event") #Add title

8. What percentage of the 815 people on the list got a liver transplant?

-> 78%

Create a two-way (contingency) table with row & column totals

#The variable listed first gets put on the rows; addmargins adds the sum for each row & column; We believe abo is the explanatory variable, so we'll put it on the rows - that is why it is listed first
addmargins(table(transplant$abo,transplant$event))
##      
##       censored death ltx withdraw Sum
##   A         27    21 269        8 325
##   B          9    10  78        6 103
##   AB         2     3  33        3  41
##   O         38    32 256       20 346
##   Sum       76    66 636       37 815
  9. Which event is the most common outcome?

-> ltx, liver transplant

  10. Which blood type had the fewest deaths, and why might that be, based on the data?

-> AB, because it has the fewest patients

Create a relative frequency table using the prop.table function (proportion table) by row percentages

Save the contingency table as an object and use the object in the proportion function

#save contingency as object, and then use the object in the prop.table function
abo_event <- table(transplant$abo,transplant$event)
row_prop <- prop.table(abo_event, 1) # the 1 tells it to give row proportions; save it as an object to use when we make a bar chart
row_prop
##     
##        censored      death        ltx   withdraw
##   A  0.08307692 0.06461538 0.82769231 0.02461538
##   B  0.08737864 0.09708738 0.75728155 0.05825243
##   AB 0.04878049 0.07317073 0.80487805 0.07317073
##   O  0.10982659 0.09248555 0.73988439 0.05780347
  11. Of all those with AB blood type, what proportion got a transplant?

-> 80.5%

  12. Of all those with B blood type, what proportion withdrew?

-> 5.83%

  13. Which blood type had the lowest proportion of deaths?

-> A
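The contrast between the fewest deaths (a count) and the lowest proportion of deaths (a row proportion) can be checked directly. This is a sketch that retypes the death counts and row totals from the contingency table above; the object names deaths, totals and death_prop are made up for illustration.

```r
# Deaths and row totals copied from the contingency table above
deaths <- c(A = 21, B = 10, AB = 3, O = 32)
totals <- c(A = 325, B = 103, AB = 41, O = 346)

# Row proportion of deaths = deaths / patients with that blood type
death_prop <- deaths / totals
round(death_prop, 4)

names(which.min(deaths))     # blood type with the fewest deaths: "AB"
names(which.min(death_prop)) # blood type with the lowest proportion of deaths: "A"
```

The two answers differ because AB has by far the fewest patients, so a small count of deaths is still a larger share of that group than it is for blood type A.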

#Get column proportions
(prop.table(abo_event, 2)) # the 2 tells it to give column proportions
##     
##        censored      death        ltx   withdraw
##   A  0.35526316 0.31818182 0.42295597 0.21621622
##   B  0.11842105 0.15151515 0.12264151 0.16216216
##   AB 0.02631579 0.04545455 0.05188679 0.08108108
##   O  0.50000000 0.48484848 0.40251572 0.54054054
  14. Out of all who withdrew, what proportion had blood type B?

-> 16.2%

Make side-by-side bar chart for abo and event

We created a two-way table and saved it as abo_event. We will use that table to make the barplot by frequencies. Again, notice that to make the graph, we had to create the table and save it as an object first.

We put the event variable in the columns and blood type on the rows in our table, so we need to label the x-axis for the columns. The row categories (blood types) will show up in the legend, and the height of each bar shows the value for one blood type within an event.

Then, we’ll use the row proportion table we saved as row_prop and create a bar chart by row proportions.

#Bar chart by frequencies
barplot(abo_event, main = "Blood Type and Event", #Add title
        xlab = "Event", #Label x-axis
        ylab = "Frequencies", #Label y-axis
        col = c("hotpink", "lightgreen", "lightblue", "lightyellow"), #Color code bars; use one color per blood type, so we need four
        legend = rownames(abo_event), #Add legend
        beside = TRUE) #Put bars beside each other instead of stacked (omitting beside = TRUE results in a stacked bar chart)

#Bar chart by row proportions
barplot(row_prop, main = "Blood Type and Event", #Add title
        xlab = "Event", #Label x-axis
        ylab = "Row Proportions", #Label y-axis
        col = c("hotpink", "lightgreen", "lightblue", "lightyellow"), #Color code bars; use one color per blood type, so we need four
        legend = rownames(abo_event), #Add legend
        beside = TRUE) #Put bars beside each other instead of stacked (omitting beside = TRUE results in a stacked bar chart)
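The comments above mention that leaving out beside = TRUE gives a stacked bar chart. A minimal sketch of that variant follows. It rebuilds abo_event so the chunk runs on its own, and the name bar_mids and the chart title are made up for illustration.

```r
library(survival) # provides the transplant data set

# Rebuild the blood type by event table so this chunk runs on its own
abo_event <- table(transplant$abo, transplant$event)

# beside defaults to FALSE, so the blood types are stacked within each event
bar_mids <- barplot(abo_event,
                    main = "Blood Type and Event (Stacked)", # hypothetical title
                    xlab = "Event",
                    ylab = "Frequencies",
                    col = c("hotpink", "lightgreen", "lightblue", "lightyellow"),
                    legend.text = rownames(abo_event)) # legend shows the blood types
```

In the stacked version each event gets a single bar whose total height is that event's count, which makes event totals easier to compare and blood types within an event harder to compare.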

15. Based on the frequencies bar chart, which blood type received the fewest transplants?

-> AB

Next, we combine blood types into 2 groups, O vs others. We also combine event into death (died while waiting) vs others and generate a contingency table:

transplant$bloodtypenew <- ifelse(transplant$abo == "O", "O", "others")
transplant$eventnew <- ifelse(transplant$event == "death", "death", "others")

# Get contingency table with blood type group on the rows and event group on the columns and add the sum to each row & column
addmargins(table(transplant$bloodtypenew,transplant$eventnew))
##         
##          death others Sum
##   O         32    314 346
##   others    34    435 469
##   Sum       66    749 815
  16. What is the risk of death if a patient has blood type O?

-> Risk1= 32/346= 9.25%

  17. Interpret the risk above:

-> The risk of dying while waiting for a liver transplant as a patient with blood type O is 9.25%.

  18. What is the relative risk (i.e. risk ratio) of death if the patient is O blood type as compared to other blood types?

-> Risk2 = 34/469 = 7.25%; relative risk = 9.25/7.25 = 1.28

  19. Interpret the relative risk above:

-> Patients with blood type O are 1.28 times as likely to die while waiting for a liver transplant as patients with other blood types.
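The risk and relative risk arithmetic can be reproduced in R. This is a sketch that retypes the counts from the contingency table above; the object names deaths, totals, risk and relative_risk are made up for illustration.

```r
# Counts copied from the contingency table above
deaths <- c(O = 32, others = 34)   # died while waiting
totals <- c(O = 346, others = 469) # everyone in each blood type group

# Risk = deaths / total in the group
risk <- deaths / totals

# Relative risk = risk in group O / risk in the other blood types
relative_risk <- risk[["O"]] / risk[["others"]]

round(100 * risk, 2)    # O: 9.25, others: 7.25 (percent)
round(relative_risk, 2) # 1.28
```

Note that the relative risk is a ratio of two risks, so it has no percent sign.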

  20. What are the odds of death if a patient has blood type O?

-> 32/314= 0.1019

  21. What is the odds ratio of death if the patient is O blood type as compared to other blood types?

-> 0.1019/0.0782= 1.30

  22. Interpret the odds ratio above:

-> The odds of death while waiting for a liver transplant for patients with blood type O are 1.30 times the odds for patients with other blood types.
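The odds and odds ratio can be checked the same way. This is a sketch that retypes the counts from the contingency table above; the object names deaths, not_death, odds and odds_ratio are made up for illustration.

```r
# Counts copied from the contingency table above
deaths    <- c(O = 32, others = 34)   # died while waiting
not_death <- c(O = 314, others = 435) # censored, transplanted, or withdrew

# Odds = deaths / non-deaths within each group
odds <- deaths / not_death

# Odds ratio = odds in group O / odds in the other blood types
odds_ratio <- odds[["O"]] / odds[["others"]]

round(odds, 4)       # O: 0.1019, others: 0.0782
round(odds_ratio, 2) # about 1.30
```

Unlike risk, odds divide by the number who did not die rather than by the group total, which is why the odds ratio (1.30) is slightly larger than the relative risk (1.28).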

Knit the file as either an html or a PDF. Submitting as a PDF may require installing the tinytex package. If you can, great. If not, then just knit as html. Submit your file in the D2L Categorical EDA submission folder.