The transplant data set contains information for 815 subjects on a liver transplant waiting list from 1990-1999, and their disposition: received a transplant, died while waiting, withdrew from the list, or censored (still waiting at the time the data set was created). The variables are:
age = age at addition to the waiting list
sex = m or f
abo = blood type: A, B, AB or O
year = year in which they entered the waiting list
futime = time from entry to final disposition
event = final disposition: censored (still waiting), death (died while waiting), ltx (got transplant) or withdraw (withdrew from list)
This data set is in R’s built-in “survival” package. To use data from a built-in package, use the library() function and put the package name in the parentheses, for example library(survival). This tells R to load that package from the library where it is stored so we can use it. The Console (bottom of this section) will show it has been called up from the library. Type answers after the -> sign.
#Get the structure of the transplant data set to view the variables in it
library(survival) #Call up the survival package from the library
str(transplant) #View transplant data structure
## 'data.frame': 815 obs. of 6 variables:
## $ age : num 47 55 52 40 70 66 41 55 50 61 ...
## $ sex : Factor w/ 2 levels "m","f": 1 1 1 2 1 2 2 1 1 1 ...
## $ abo : Factor w/ 4 levels "A","B","AB","O": 2 1 2 4 4 4 1 4 1 1 ...
## $ year : num 1994 1991 1996 1995 1996 ...
## $ futime: num 1197 28 85 231 1271 ...
## $ event : Factor w/ 4 levels "censored","death",..: 2 3 3 3 1 3 3 3 3 3 ...
Write code to get help in the chunk below before you answer any questions:
# WRITE YOUR CODE TO GET HELP FOR THE DATASET transplant USING ?
?transplant
-> year is listed as num, when it should have been a categorical variable.
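If year is meant to be treated as categorical, one option is to convert it with factor(). A minimal sketch, using only the first five year values shown in the str() output above rather than the full data set; years and year_f are illustrative names that are not part of the lab:

```r
# First five values of year, copied from the str() output above
years <- c(1994, 1991, 1996, 1995, 1996)

is.numeric(years)        # TRUE: stored as numbers, like transplant$year

# Convert to a factor so R treats each year as a category
year_f <- factor(years)
levels(year_f)           # the distinct years, in sorted order
table(year_f)            # counts per year instead of a numeric summary
```

With the survival package loaded, the same idea would be factor(transplant$year).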
-> 4
#Create frequency table for the sex variable in the transplant data frame
table(transplant$sex)
##
## m f
## 447 368
Next, update the code below and run it in order to answer the questions below:
#Create frequency table for the abo variable in the transplant data frame
#FINISH FOLLOWING CODE BY TYPING THE VARIABLE NAME abo AFTER THE DOLLAR SIGN
table(transplant$abo)
##
## A B AB O
## 325 103 41 346
-> O
-> AB
#Create frequency table with sum total
addmargins(table(transplant$abo)) #the addmargins() function adds the total (Sum) to the table
##
## A B AB O Sum
## 325 103 41 346 815
-> 815
#Create relative frequency table
prop.table(table(transplant$abo))
##
## A B AB O
## 0.39877301 0.12638037 0.05030675 0.42453988
-> 5.03%
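The proportions above can also be turned into percentages directly, which makes answers such as 5.03% easier to read off. A minimal sketch, with the counts typed in from the frequency table above so that it runs without the survival package; abo_counts and abo_pct are illustrative names:

```r
# Counts copied from the output of table(transplant$abo)
abo_counts <- c(A = 325, B = 103, AB = 41, O = 346)

# Proportions times 100, rounded to two decimal places
abo_pct <- round(100 * prop.table(abo_counts), 2)
abo_pct

# Most and least common blood types
names(which.max(abo_counts))
names(which.min(abo_counts))
```

With the package loaded, replacing abo_counts with table(transplant$abo) should give the same percentages.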
The barplot() function draws a bar chart from a table of counts, not from the raw variable, so we first create the frequency table. We will save the abo table as an object called aboTable, which keeps the plotting code easy to read. Then, in the same code chunk, we’ll use the barplot() function to make a bar chart.
aboTable <- table(transplant$abo) #Save abo variable's frequency table as an object called aboTable (barplot() needs the table of counts, not the raw variable)
barplot(aboTable, #This tells R to use the barplot function on the aboTable object
main="Basic Bar Chart of Blood Type", #This adds a title
xlab="Blood Type", #This adds an x-axis label
ylab = "Count", #This adds a y-axis label
col = "hotpink") #This gives the bars a color just for the fun of it
-> AB
Conceptually, a pie chart is a bar chart wrapped into a circle. We’ll use the event variable to walk through the whole process. And, of course, it starts with creating a frequency table and saving it as an object!
#1 create a frequency table and save it as an object
event <- table(transplant$event)
#2 Convert frequency table object to a data frame
event_df <- as.data.frame(event)
#3 Change variable names from Var1 & Freq to original variable's name & Count
names(event_df) <- c("Event","Count")
#4 Calculate the percentage using one decimal place and save it as an object we will call pct for percent
pct <- round(100*event_df$Count/sum(event_df$Count), 1)
#5 Add the category name and a '%' sign to each slice
pie(event_df$Count, #This makes the pie reflect the counts
labels = paste0(event_df$Event, " ", pct, "%"), #This adds the category names and percentages to the pie slices
radius = 1, #Radius = 1 controls size of pie & can keep labels from overlapping if slices are really small (not an issue on our chart)
main = "Base R Pie Chart of Event") #Add title
8. What percentage of the 815 people on the list got a liver transplant?
-> 78%
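Steps 4 and 5 can be checked without drawing the chart by building the slice labels on their own. A sketch, with the event counts typed in by hand (they match the Sum row of the abo by event table later in this lab); event_counts and slice_labels are illustrative names:

```r
# Event counts, as produced by table(transplant$event)
event_counts <- c(censored = 76, death = 66, ltx = 636, withdraw = 37)

# Percentage of all subjects in each category, one decimal place
pct <- round(100 * event_counts / sum(event_counts), 1)

# Build "name percent%" labels; %% prints a literal percent sign
slice_labels <- sprintf("%s %.1f%%", names(event_counts), pct)
slice_labels
```

The resulting character vector can be passed as the labels argument of pie().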
#The variable listed first gets put on the rows; addmargins adds the sum for each row & column; We believe abo is the explanatory variable, so we'll put it on the rows - that is why it is listed first
addmargins(table(transplant$abo,transplant$event))
##
## censored death ltx withdraw Sum
## A 27 21 269 8 325
## B 9 10 78 6 103
## AB 2 3 33 3 41
## O 38 32 256 20 346
## Sum 76 66 636 37 815
-> ltx, liver transplant
-> AB, because it has the fewest patients
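The Sum row and column that addmargins() appends can also be pulled out on their own, which is handy for questions about the largest or smallest group. A sketch with the counts typed in from the table above, so that it runs without the survival package; blood_totals and event_totals are illustrative names:

```r
# Two-way table of counts, copied from the output above (without margins)
abo_event <- as.table(matrix(
  c(27, 21, 269,  8,
     9, 10,  78,  6,
     2,  3,  33,  3,
    38, 32, 256, 20),
  nrow = 4, byrow = TRUE,
  dimnames = list(abo = c("A", "B", "AB", "O"),
                  event = c("censored", "death", "ltx", "withdraw"))
))

blood_totals <- rowSums(abo_event)  # same values as the Sum column
event_totals <- colSums(abo_event)  # same values as the Sum row

names(which.max(event_totals))      # most common disposition
names(which.min(blood_totals))      # blood type with the fewest patients
```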
#Save the contingency table as an object, and then use the object in the prop.table function
abo_event <- table(transplant$abo,transplant$event)
row_prop <- prop.table(abo_event, 1)# the 1 tells it to give row proportions; save it as an object to use when we make a bar chart
row_prop
##
## censored death ltx withdraw
## A 0.08307692 0.06461538 0.82769231 0.02461538
## B 0.08737864 0.09708738 0.75728155 0.05825243
## AB 0.04878049 0.07317073 0.80487805 0.07317073
## O 0.10982659 0.09248555 0.73988439 0.05780347
-> 80.5%
-> 5.83%
-> A
#Get column proportions
(prop.table(abo_event, 2)) # the 2 tells it to give column proportions
##
## censored death ltx withdraw
## A 0.35526316 0.31818182 0.42295597 0.21621622
## B 0.11842105 0.15151515 0.12264151 0.16216216
## AB 0.02631579 0.04545455 0.05188679 0.08108108
## O 0.50000000 0.48484848 0.40251572 0.54054054
-> 16.2%
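A quick way to tell row proportions from column proportions is to check which direction sums to 1. A sketch, with the counts typed in from the two-way table above so that it runs on its own; col_prop is an illustrative name:

```r
# Two-way table of counts, copied from the addmargins() output (without margins)
abo_event <- as.table(matrix(
  c(27, 21, 269,  8,
     9, 10,  78,  6,
     2,  3,  33,  3,
    38, 32, 256, 20),
  nrow = 4, byrow = TRUE,
  dimnames = list(abo = c("A", "B", "AB", "O"),
                  event = c("censored", "death", "ltx", "withdraw"))
))

row_prop <- prop.table(abo_event, 1)  # margin 1: each row sums to 1
col_prop <- prop.table(abo_event, 2)  # margin 2: each column sums to 1

rowSums(row_prop)
colSums(col_prop)

# Percent of AB patients who received a transplant (a row proportion)
round(100 * row_prop["AB", "ltx"], 1)
# Percent of withdrawals who were blood type B (a column proportion)
round(100 * col_prop["B", "withdraw"], 1)
```

Row proportions answer "of this blood type, what share had each event?", while column proportions answer "of this event, what share had each blood type?".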
#Make side-by-side bar chart for abo and event
We created a two-way (4x4) table and saved it as abo_event. We will use that table to make the barplot by frequencies. Again, notice that barplot() works from the saved table, not from the raw variables.
We put the event variable in the columns and blood type on the rows in our table, so the x-axis shows the events (the columns). The row categories (blood types) show up in the legend, with one colored bar per blood type within each event, and the height of each bar is the count.
Then, we’ll use the row proportion table we saved as row_prop and create a bar chart by row proportions.
#Bar chart by frequencies
barplot(abo_event, main="Blood Type and Event", #Add title
xlab="Event", #Label x-axis
ylab = "Frequencies", #Label y-axis
col=c("hotpink","lightgreen", "lightblue", "lightyellow"), #Color code bars; there must be same number of colors as categories, so we need four
legend = rownames(abo_event), #Add legend
beside=TRUE) #Put bars beside each other instead of stacked (excluding beside = TRUE results in stacked bar chart) We can try it for fun
#Bar chart by row proportions
barplot(row_prop, main="Blood Type and Event", #Add title
xlab="Event", #Label x-axis
ylab = "Row Proportions", #Label y-axis
col=c("hotpink","lightgreen", "lightblue", "lightyellow"), #Color code bars; there must be same number of colors as categories, so we need four
legend = rownames(abo_event), #Add legend
beside=TRUE) #Put bars beside each other instead of stacked (excluding beside = TRUE results in stacked bar chart) We can try it for fun
15. Based on the frequencies bar chart, which blood type received the fewest transplants?
-> AB
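As the comments on beside = TRUE note, leaving that argument out gives a stacked chart. A sketch of that variant using the row proportions, with the counts typed in from the two-way table. The table is transposed with t() here so that each bar is one blood type and its segments add up to 1; that transposition is a choice made for this illustration, not something the lab does:

```r
# Two-way table of counts, copied from the addmargins() output (without margins)
abo_event <- as.table(matrix(
  c(27, 21, 269,  8,
     9, 10,  78,  6,
     2,  3,  33,  3,
    38, 32, 256, 20),
  nrow = 4, byrow = TRUE,
  dimnames = list(abo = c("A", "B", "AB", "O"),
                  event = c("censored", "death", "ltx", "withdraw"))
))
row_prop <- prop.table(abo_event, 1)

# t() swaps rows and columns: one bar per blood type, one segment per event.
# beside is left at its default (FALSE), so the segments are stacked.
bar_mids <- barplot(t(row_prop),
                    main = "Blood Type and Event",
                    xlab = "Blood Type",
                    ylab = "Row Proportions",
                    col = c("hotpink", "lightgreen", "lightblue", "lightyellow"),
                    legend.text = colnames(row_prop))
```

Because every bar reaches 1, the legend may overlap the bars; the args.legend argument of barplot() can be used to move it.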
Next, we combine blood types into 2 groups, O vs others. We also combine event into death (died while waiting) vs others and generate a contingency table:
transplant$bloodtypenew <- ifelse(transplant$abo == "O", "O", "others")
transplant$eventnew <- ifelse(transplant$event == "death", "death", "others")
# Get contingency table with blood type group on the rows and event group on the columns, and add the sum to each row & column
addmargins(table(transplant$bloodtypenew,transplant$eventnew))
##
## death others Sum
## O 32 314 346
## others 34 435 469
## Sum 66 749 815
-> Risk1= 32/346= 9.25%
-> The risk of dying while waiting for a liver transplant as a patient with blood type O is 9.25%.
-> Risk2 = 34/469 = 7.25%; relative risk = 9.25/7.25 = 1.28
-> Patients with blood type O are 1.28 times as likely to die while waiting for a liver transplant as patients with other blood types.
-> 32/314 = 0.1019
-> Odds for other blood types = 34/435 = 0.0782; odds ratio = 0.1019/0.0782 = 1.30
-> The odds of death while waiting for a liver transplant for a patient with blood type O are 1.30 times the odds for patients with other blood types.
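The hand calculations above can be reproduced in R. A minimal sketch, with the 2x2 counts typed in from the contingency table so that it runs without the survival package; tab, risk, relative_risk, odds and odds_ratio are illustrative names:

```r
# 2x2 table copied from the output above (without the Sum margins)
tab <- matrix(c(32, 314,
                34, 435),
              nrow = 2, byrow = TRUE,
              dimnames = list(bloodtype = c("O", "others"),
                              event = c("death", "others")))

# Risk = deaths / all patients in the group
risk <- tab[, "death"] / rowSums(tab)
relative_risk <- unname(risk["O"] / risk["others"])

# Odds = deaths / non-deaths in the group
odds <- tab[, "death"] / tab[, "others"]
odds_ratio <- unname(odds["O"] / odds["others"])

round(risk, 4)
round(relative_risk, 2)
round(odds, 4)
round(odds_ratio, 2)
```

This gives a relative risk of about 1.28 and an odds ratio of about 1.30. Both are ratios, so neither takes a percent sign.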