This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown, see http://rmarkdown.rstudio.com.
When you click the Knit button, a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
Now let’s display the head of the Data data frame:
library(MASS)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.1 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ dplyr::select() masks MASS::select()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(naniar)
library(dendextend)
##
## ---------------------
## Welcome to dendextend version 1.17.1
## Type citation('dendextend') for how to cite the package.
##
## Type browseVignettes(package = 'dendextend') for the package vignette.
## The github page is: https://github.com/talgalili/dendextend/
##
## Suggestions and bug-reports can be submitted at: https://github.com/talgalili/dendextend/issues
## You may ask questions at stackoverflow, use the r and dendextend tags:
## https://stackoverflow.com/questions/tagged/dendextend
##
## To suppress this message use: suppressPackageStartupMessages(library(dendextend))
## ---------------------
##
##
## Attaching package: 'dendextend'
##
## The following object is masked from 'package:stats':
##
## cutree
Data <- biopsy
head(Data)
## ID V1 V2 V3 V4 V5 V6 V7 V8 V9 class
## 1 1000025 5 1 1 1 2 1 3 1 1 benign
## 2 1002945 5 4 4 5 7 10 3 2 1 benign
## 3 1015425 3 1 1 1 2 2 3 1 1 benign
## 4 1016277 6 8 8 1 3 4 3 7 1 benign
## 5 1017023 4 1 1 3 2 1 3 1 1 benign
## 6 1017122 8 10 10 8 7 10 9 7 1 malignant
dim(Data)
## [1] 699 11
# Remove ID column
Data <- dplyr::select(Data, -ID) # relate to dendextend , make sure run libray #first, oltherwise error
head(Data)
## V1 V2 V3 V4 V5 V6 V7 V8 V9 class
## 1 5 1 1 1 2 1 3 1 1 benign
## 2 5 4 4 5 7 10 3 2 1 benign
## 3 3 1 1 1 2 2 3 1 1 benign
## 4 6 8 8 1 3 4 3 7 1 benign
## 5 4 1 1 3 2 1 3 1 1 benign
## 6 8 10 10 8 7 10 9 7 1 malignant
# How many missing values are there?
sum(is.na(Data))
## [1] 16
vis_miss(Data)
Data <- na.omit(Data)
This R code snippet performs two main operations on the Data data frame using the dplyr package, which is part of the tidyverse. Let’s break it down step by step:
The %>% operator (pipe operator) is used to chain operations in a more readable manner. It takes the output from the left-hand side operation and passes it as the first argument to the right-hand side function.
dplyr::select(-class): The select() function from the dplyr package is used to remove the class column from the Data data frame. The - sign before the column name indicates that this column should be removed.
scale(): This is a base R function that standardizes the remaining columns of the data frame by centering (subtracting the mean) and scaling (dividing by the standard deviation) each numeric variable. This operation is useful when you want to compare variables with different units or scales, and is often done as a preprocessing step in machine learning and statistical modeling.
#Data <- biopsy
# Scale the column
DataScaled <- Data %>%
dplyr::select(-class) %>%
scale()
set.seed(51773)
kM <- kmeans(DataScaled,2)
pca <- prcomp(DataScaled)
df <- data.frame(pca$x, cluster = paste("cluster", kM$cluster, sep = "_"), Data)
ggplot(df, aes(x = PC1, y = PC2, colour = class, shape = cluster)) + geom_point()
set.seed(51773): This sets the seed for the random number generator in R. Setting a seed ensures that any random operation performed later in the code will yield the same results each time the script is run. This is useful for reproducibility purposes.
kmeans(DataScaled, 2): This performs k-means clustering on the DataScaled data frame, using 2 clusters. The k-means algorithm is a popular clustering technique that tries to find cluster centers that minimize the within-cluster sum of squares. The result is saved in the kM object.
prcomp(DataScaled): This performs principal component analysis (PCA) on the DataScaled data frame. PCA is a dimensionality reduction technique that linearly transforms the original variables into a new set of uncorrelated variables called principal components. The result is saved in the pca object.
data.frame(pca\(x, cluster = paste("cluster", kM\)cluster, sep = “”), Data): This creates a new data frame that includes the PCA scores (the transformed data points in the principal component space), the cluster assignments from the k-means clustering (with the prefix ”cluster”), and the original Data data frame.
ggplot(df, aes(x = PC1, y = PC2, colour = class, shape = cluster)) + geom_point(): This creates a scatter plot using the ggplot2 package, with the x-axis representing the first principal component (PC1) and the y-axis representing the second principal component (PC2). The points are colored according to the class variable and shaped based on their cluster assignment from the k-means clustering. This visualization helps in understanding the relationship between the clusters found by the k-means algorithm and the original class labels in the context of the first two principal components.
# Perform clustering on the scaled data
hC <- hclust(dist(DataScaled))
# Plot the dendrogram
plot(hC)
# We can cut the dendrogram at a height such that there are two clusters
hCC <- cutree(hC, k = 2)
# There are lots of packages for plotting dendrograms
hC %>%
as.dendrogram() %>%
set("branches_k_color", k = 2) %>%
set("labels","") %>%
plot()
## Warning in `labels<-.dendrogram`(dend, value = value, ...): The lengths of the
## new labels is shorter than the number of leaves in the dendrogram - labels are
## recycled.
hclust(dist(DataScaled)): This performs hierarchical clustering on the DataScaled data frame. The dist() function calculates the distance matrix using Euclidean distances by default. The hclust() function then performs hierarchical clustering using the distance matrix. The result is saved in the hC object.
plot(hC): This creates a basic dendrogram plot of the hierarchical clustering result (hC). A dendrogram is a tree-like diagram that represents the hierarchy of the clusters.
cutree(hC, k = 2): This function cuts the hierarchical clustering tree (hC) into a specified number of clusters, in this case, 2. The result is saved in the hCC object.
The next block of code uses the dendextend package to create a more visually appealing dendrogram:
hC %>% as.dendrogram(): This converts the hC hierarchical clustering object into a dendrogram object.
set(“branches_k_color”, k = 2): This sets the colors of the branches in the dendrogram based on the number of clusters specified (k = 2).
set(“labels”, ““): This removes the labels from the dendrogram’s leaves.
plot(): This creates a plot of the modified dendrogram.
DATA <- data.frame(Data, kmeans = kM$cluster, hclust = hCC)
(tabk <- table(DATA$kmeans, DATA$class))
##
## benign malignant
## 1 10 220
## 2 434 19
chisq.test(tabk)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: tabk
## X-squared = 556.92, df = 1, p-value < 2.2e-16
(tabh <- table(DATA$hclust, DATA$class))
##
## benign malignant
## 1 442 99
## 2 2 140
(tabhk <- table(DATA$hclust, DATA$kmeans))
##
## 1 2
## 1 88 453
## 2 142 0
DATA <- data.frame(Data, kmeans = kM\(cluster, hclust = hCC): This creates a new data frame DATA that includes the original Data data frame along with the cluster assignments from k-means clustering (stored in kM\)cluster) and hierarchical clustering (stored in hCC).
(tabk <- table(DATA\(kmeans, DATA\)class)): This creates a contingency table showing the relationship between the k-means cluster assignments (DATA$kmeans) and the class labels from the original data. The result is saved in the tabk object.
chisq.test(tabk): This performs a chi-squared test for independence using the contingency table tabk. The test checks whether the k-means cluster assignments are independent of the class labels. A significant result (low p-value) would indicate that there is a relationship between the k-means clusters and the class labels.
(tabh <- table(DATA\(hclust, DATA\)class)): This creates a contingency table showing the relationship between the hierarchical clustering assignments (DATA$hclust) and the class labels from the original data. The result is saved in the tabh object.
(tabhk <- table(DATA\(hclust, DATA\)kmeans)): This creates a contingency table showing the relationship between the hierarchical clustering assignments (DATA\(hclust) and the k-means clustering assignments (DATA\)kmeans). The result is saved in the tabhk object.
Data2 <- Data %>%
dplyr::select(-class)
set.seed(51773)
kM2 <- kmeans(Data2,2)
pca2 <- prcomp(Data2)
df2 <- data.frame(pca2$x, cluster = paste("cluster", kM2$cluster, sep = "_"), Data)
ggplot(df2, aes(x = PC1, y = PC2, colour = class, shape = cluster)) + geom_point()
library(ggplot2)
library(tidyr)
library(datasets)
# The ToothGrowth dataset is loaded in the 'datasets' package
tooth = ToothGrowth
head(tooth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
ggplot(tooth,aes(y = len, x = supp, colour = factor(dose)))+geom_boxplot() + theme_classic() + xlab('Supplement') + ylab("Length")+ labs(colour = "Supplement")
VCdata = tooth[tooth[,"supp"] == "VC",]
dim(VCdata)
## [1] 30 3
head(VCdata)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
boxplot(len~dose, VCdata)
aov1 = aov(len~dose, VCdata)
summary(aov1)
## Df Sum Sq Mean Sq F value Pr(>F)
## dose 1 1601.3 1601.3 117.9 1.51e-11 ***
## Residuals 28 380.1 13.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
aov2 = aov(len~factor(dose), VCdata)
summary(aov2)
## Df Sum Sq Mean Sq F value Pr(>F)
## factor(dose) 2 1650 824.7 67.07 3.36e-11 ***
## Residuals 27 332 12.3
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ggplot(VCdata,aes(y = len, x = factor(dose)))+geom_boxplot() + theme_classic() + xlab('Dose') + ylab("Length")
## Another way of fitting anova in R
fit = lm(len~factor(dose), VCdata)
anova(fit)
## Analysis of Variance Table
##
## Response: len
## Df Sum Sq Mean Sq F value Pr(>F)
## factor(dose) 2 1649.5 824.74 67.072 3.357e-11 ***
## Residuals 27 332.0 12.30
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Change dose to factor
tooth$dose = as.factor(tooth$dose)
## Without ggplot
attach(tooth)
interaction.plot(dose, supp, len)
detach()
## With ggplot
ggplot(tooth, aes(x=factor(dose), y = len, colour=supp)) + geom_boxplot() +
stat_summary(fun.y = mean, geom = "line", aes(group = supp)) + theme_classic() + xlab('Dose') + ylab("Length")+ labs(colour = "Supplement")
## Warning: The `fun.y` argument of `stat_summary()` is deprecated as of ggplot2 3.3.0.
## ℹ Please use the `fun` argument instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
This R code is creating interaction plots to visualize the relationship between dose, supplement type, and tooth length using the built-in interaction.plot() function and ggplot2. Let’s break it down step by step:
tooth\(dose = as.factor(tooth\)dose): This line of code converts the ‘dose’ variable in the ‘tooth’ data frame to a factor. This is done to treat ‘dose’ as a categorical variable instead of a numeric one.
attach(tooth): This line of code temporarily adds the columns of the ‘tooth’ data frame to the search path. This allows you to use the column names directly in your analysis without specifying the data frame.
interaction.plot(dose, supp, len): This line of code creates an interaction plot using the built-in interaction.plot() function. The x-axis represents the ‘dose’ variable, different line colors represent the ‘supp’ variable (supplement type), and the y-axis shows the ‘len’ variable (tooth length).
detach(): This line of code removes the ‘tooth’ data frame from the search path to avoid potential conflicts with other data frames that have columns with the same names.
The following code creates an interaction plot using the ggplot2 package:
scss Copy code ggplot(tooth, aes(x=factor(dose), y = len, colour=supp)) + geom_boxplot() + stat_summary(fun.y = mean, geom = “line”, aes(group = supp)) + theme_classic() + xlab(‘Dose’) + ylab(“Length”) + labs(colour = “Supplement”) The plot is created by specifying the data frame (tooth), the aesthetics (x, y, and color), adding a boxplot layer, adding a line to connect the means of each group, setting a classic theme, and customizing the axis labels and legend.
In summary, the code creates interaction plots to visualize the relationship between dose, supplement type, and tooth length using both the built-in interaction.plot() function and the ggplot2 package. These plots help understand how the combination of dose and supplement type affects tooth length.
tooth.aov = aov(len ~ supp * dose, tooth)
summary(tooth.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## supp 1 205.4 205.4 15.572 0.000231 ***
## dose 2 2426.4 1213.2 92.000 < 2e-16 ***
## supp:dose 2 108.3 54.2 4.107 0.021860 *
## Residuals 54 712.1 13.2
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tooth.aov.additive = aov(len ~ supp + dose, tooth)
summary(tooth.aov.additive)
## Df Sum Sq Mean Sq F value Pr(>F)
## supp 1 205.4 205.4 14.02 0.000429 ***
## dose 2 2426.4 1213.2 82.81 < 2e-16 ***
## Residuals 56 820.4 14.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(tooth.aov,tooth.aov.additive)
## Analysis of Variance Table
##
## Model 1: len ~ supp * dose
## Model 2: len ~ supp + dose
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 54 712.11
## 2 56 820.43 -2 -108.32 4.107 0.02186 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
library(tidyverse)
patientData <- read.csv('C:/Users/u6977/Desktop/Patient.csv')
transplantData <- read.csv('C:/Users/u6977/Desktop/Transplant.csv')
creatineData <- read.csv('C:/Users/u6977/Desktop/SerumCreatinine.csv')
# For simplicity let's ignore patient's multiple transplants. What does the !
# do?
transplantData2 <- transplantData[!duplicated(transplantData$id), ]
# Join patientData and transplantData
Data <- full_join(patientData, transplantData2, by = "id")
# Extract some interesting variables
use <- c("gendercode", "latereferralcode", "creatinineatentry", "height", "weight",
"smokingcode", "cancerever", "chroniclungcode", "coronaryarterycode", "peripheralvascularcode",
"cerebrovasularcode", "diabetescode", "graftno", "transplantcentrestate", "recipientantibodycmvcode",
"recipientantibodyebvcode", "donorsourcecode", "donorage", "donorgendercode",
"ischaemia", "ageattransplant", "hlamismatchesa", "hlamismatchesb", "hlamismatchesdr",
"hlamismatchesdq", "maxcytotoxicantibodies", "currentcytotoxicantibodies", "timeondialysis",
"transplantstatus")
Data <- Data[, use]
# Recode some NA
Data[Data == ""] = NA
Data[Data == "-"] = NA
head(Data)
## gendercode latereferralcode creatinineatentry height weight smokingcode
## 1 M N 600 112.7 21.8 Never
## 2 M N 1234 152.0 33.6 Never
## 3 M N 1025 162.0 53.0 Never
## 4 M N 646 175.0 85.0 Never
## 5 M N NA 154.0 51.5 Never
## 6 M N 449 163.0 NA Former
## cancerever chroniclungcode coronaryarterycode peripheralvascularcode
## 1 No N N Y
## 2 Yes N N N
## 3 No N N Y
## 4 No N N Y
## 5 Yes N N N
## 6 Yes N N N
## cerebrovasularcode diabetescode graftno transplantcentrestate
## 1 Y N 2 SA
## 2 N N 2 NSW
## 3 N N 3 VIC
## 4 N N 2 NZ
## 5 N N 1 NSW
## 6 N N 1 WA
## recipientantibodycmvcode recipientantibodyebvcode donorsourcecode donorage
## 1 Positive Positive Friend 46
## 2 Positive Positive Deceased 59
## 3 Positive Positive Deceased 45
## 4 Positive Positive Sister 58
## 5 Positive Positive Deceased 55
## 6 Positive Negative Deceased 22
## donorgendercode ischaemia ageattransplant hlamismatchesa hlamismatchesb
## 1 F 20 27 2 2
## 2 F 20 60 0 2
## 3 F 7 34 2 0
## 4 F 16 54 0 1
## 5 M 5 48 2 2
## 6 F 11 59 2 2
## hlamismatchesdr hlamismatchesdq maxcytotoxicantibodies
## 1 0 NA 0
## 2 0 0 0
## 3 0 NA 64
## 4 0 0 0
## 5 2 NA 99
## 6 1 1 0
## currentcytotoxicantibodies timeondialysis transplantstatus
## 1 0 18.01506 0
## 2 0 24.41068 0
## 3 56 18.06981 1
## 4 0 24.26283 0
## 5 3 13.89459 1
## 6 0 9.59343 1
## You can read in "dates" in R using as.Dates
#### Make Data L
DataPrep <- full_join(patientData, transplantData2, by = "id")
use <- c("gendercode", "lastfollowupdate", "transplantdate", "graftfailuredate", "transplantdate", "ageattransplant")
DataL <- DataPrep[, use]
DataL[DataL == ""] = NA
DataL[DataL == "-"] = NA
DataL <- na.omit(DataL)
## Time till last follow-up
DataL
## gendercode lastfollowupdate transplantdate graftfailuredate
## 7 M 31-Dec-16 23-Aug-07 24-Feb-15
## 9 M 16-Sep-05 25-Oct-00 26-Oct-00
## 12 M 31-Dec-16 30-Apr-08 31-May-11
## 17 F 31-Dec-16 2-Oct-08 28-Jul-16
## 24 M 28-Nov-15 16-Jul-03 1-Dec-11
## 26 M 31-Dec-16 3-Jun-08 5-May-15
## 36 M 31-Dec-16 30-Apr-07 21-Jul-16
## 42 F 14-May-11 24-Oct-00 13-May-11
## 74 F 31-Dec-16 28-Dec-11 22-Dec-16
## 75 M 31-Dec-16 15-Jan-09 16-Jan-09
## 85 M 31-Dec-16 17-Dec-08 29-Aug-16
## 89 F 31-Dec-16 29-Oct-12 31-May-13
## 104 F 31-Dec-16 7-Jul-11 12-Mar-12
## 113 M 31-Dec-16 4-Aug-10 30-Nov-16
## 115 M 31-Dec-16 27-Nov-08 5-Jun-15
## 132 M 31-Dec-16 18-Jun-09 12-Apr-12
## 135 M 31-Dec-16 2-Feb-10 10-Feb-10
## 154 M 31-Dec-16 16-Aug-14 2-Dec-16
## 160 F 31-Dec-16 18-Jun-09 16-Aug-11
## 168 F 1-Apr-12 24-Dec-08 24-Mar-11
## 174 M 31-Dec-16 18-Jul-07 9-Feb-09
## 178 M 31-Dec-16 5-May-07 3-Feb-11
## 180 M 31-Dec-16 16-Aug-08 30-Jun-14
## 198 F 17-Oct-10 12-Jul-09 13-Jul-09
## 210 M 31-Dec-16 2-Mar-11 26-Oct-15
## 297 F 31-Dec-16 26-Sep-13 8-Sep-16
## 300 F 31-Dec-16 5-Sep-13 19-Aug-16
## 326 M 31-Dec-16 8-Sep-14 1-Mar-15
## 362 F 31-Dec-16 4-Dec-15 16-Apr-16
## 392 F 31-Dec-16 30-Nov-11 19-Jan-15
## 398 M 31-Dec-16 12-Dec-08 16-Jun-14
## 404 M 31-Dec-16 27-Mar-09 12-Nov-15
## 424 M 31-Dec-16 21-Feb-12 19-Apr-13
## 440 M 31-Dec-16 9-Mar-13 20-May-13
## 447 M 31-Dec-16 23-May-15 24-May-15
## 461 F 31-Dec-16 22-Jun-11 25-Sep-13
## 490 M 2-Feb-15 22-Jun-10 23-Aug-12
## 492 M 31-Dec-16 27-Apr-11 26-Aug-14
## 511 F 16-Mar-15 23-May-13 5-Aug-14
## 516 M 31-Dec-16 1-Sep-02 23-Mar-10
## 519 F 21-Mar-13 30-Jul-03 1-Mar-07
## 521 F 31-Dec-16 12-Jun-01 27-May-09
## 524 M 24-Jun-09 20-Apr-06 22-Jun-07
## 528 F 31-Dec-16 23-Jan-06 17-Jun-16
## 558 F 31-Dec-16 2-Mar-08 4-Aug-14
## 562 M 31-Dec-16 9-Jan-08 15-Jun-10
## 565 F 31-Dec-16 21-May-08 5-Jul-16
## 566 M 31-Dec-16 8-Jun-06 1-Oct-11
## 587 M 31-Dec-16 6-Oct-10 3-Jul-13
## 593 F 17-Apr-16 18-Sep-15 17-Apr-16
## 599 F 31-Dec-16 14-Oct-10 22-Dec-14
## 611 F 31-Dec-16 5-Jul-10 5-Jul-10
## 618 M 31-Dec-16 17-May-12 31-May-12
## 627 F 18-Nov-15 23-Oct-01 30-Aug-10
## 628 M 31-Dec-16 2-Oct-02 10-Sep-12
## 634 F 15-Jan-05 5-Jun-04 12-Oct-04
## 635 M 13-May-13 12-Oct-06 11-May-13
## 638 M 31-Dec-16 2-May-01 7-May-08
## 644 M 31-Dec-16 29-Jan-01 14-Mar-05
## 651 M 31-Dec-16 12-Apr-06 24-Oct-13
## 657 F 31-Dec-16 30-Jun-05 15-Mar-16
## 660 M 31-Dec-16 8-Dec-08 21-Apr-15
## 668 M 31-Dec-16 9-Feb-05 9-Jun-14
## 673 M 31-Dec-16 12-Jun-06 5-Mar-15
## 674 M 9-Jul-06 14-Jun-06 14-Jun-06
## 676 M 31-Dec-16 2-May-07 11-Aug-16
## 686 F 30-Mar-10 20-Jun-08 5-Sep-08
## 688 M 31-Dec-16 13-Aug-09 18-May-15
## 694 F 31-Dec-16 27-Oct-03 3-Sep-04
## 696 F 31-Dec-16 29-Sep-06 7-May-08
## 698 M 31-Dec-16 21-Jul-04 27-Nov-14
## 713 M 31-Dec-16 21-Jul-04 6-Aug-08
## 725 F 31-Dec-16 2-Dec-02 5-Mar-12
## 728 F 31-Dec-16 14-Apr-03 1-Jul-14
## 729 F 31-Dec-16 8-Apr-11 25-Oct-16
## 730 F 31-Dec-16 19-Feb-10 1-Jun-10
## 737 F 11-Nov-15 9-Sep-03 19-Sep-14
## 738 M 31-Dec-16 29-Aug-08 27-Oct-08
## 740 M 31-Dec-16 4-Sep-13 19-Jun-15
## 746 M 31-Dec-16 31-Dec-04 7-Mar-14
## 747 M 31-Dec-16 16-Apr-03 17-Apr-03
## 752 F 8-Oct-13 15-Feb-05 20-Nov-06
## 757 M 9-May-06 25-Jul-02 1-Feb-05
## 761 M 31-Dec-16 22-Aug-03 12-Nov-08
## 762 F 18-Mar-05 19-Nov-04 20-Nov-04
## 766 F 31-Dec-16 23-Dec-07 24-Dec-07
## 768 M 28-Feb-14 30-Dec-04 30-Dec-09
## 772 M 31-Dec-16 31-May-04 4-Jul-04
## 774 M 31-Dec-16 26-May-04 19-Jun-08
## 783 F 31-Dec-16 5-Mar-05 24-Apr-08
## 795 M 29-Dec-15 5-Jan-06 29-Dec-15
## 796 M 1-Apr-04 14-Mar-04 14-Mar-04
## 797 M 4-Feb-08 14-Apr-04 29-Nov-07
## 798 M 31-Dec-16 10-Aug-03 12-Nov-07
## 800 M 13-Apr-16 12-Aug-03 8-Jun-04
## 807 M 6-Dec-10 9-Sep-02 5-Dec-10
## 809 M 27-Feb-10 20-May-01 20-Apr-09
## 811 M 8-Dec-16 21-Jun-01 26-Jun-14
## 812 M 31-Dec-16 30-Oct-12 31-Aug-16
## 816 F 1-Oct-06 29-Aug-05 30-Aug-05
## 818 M 31-Dec-16 10-Jun-05 10-Jun-05
## 819 M 31-Dec-16 6-May-04 3-Oct-12
## 820 M 31-Dec-16 8-Jul-02 31-Aug-08
## 821 M 4-Nov-14 15-Nov-02 9-Jul-12
## 822 M 31-Dec-16 17-Jul-06 18-Jul-06
## 826 M 22-Sep-16 20-Jun-03 21-Jun-06
## 830 F 3-Apr-14 5-Dec-02 25-Nov-11
## 831 M 31-Dec-16 24-Apr-03 18-Mar-16
## 839 F 31-Dec-16 19-Mar-00 10-Sep-12
## 840 M 31-Dec-16 18-Oct-05 17-Oct-16
## 841 M 31-Dec-16 23-Jul-08 24-Dec-09
## 843 M 31-Dec-16 31-Mar-03 13-Mar-05
## 852 M 24-Jul-15 12-Oct-01 24-Jul-15
## 858 M 31-Dec-16 22-May-01 21-Feb-05
## 859 M 28-Jan-16 23-Jun-04 28-Aug-14
## 863 M 31-Dec-16 31-Jan-05 20-Apr-12
## 868 M 31-Dec-16 18-Jan-01 10-Jan-14
## 869 M 12-Jul-09 17-Nov-06 4-Nov-08
## 872 M 17-Apr-16 15-Aug-01 18-Oct-06
## 873 F 10-Jul-09 22-Feb-07 3-Jul-09
## 874 M 31-Dec-16 5-Jul-00 4-Dec-07
## 882 F 31-Dec-16 16-Feb-04 24-Mar-14
## 885 M 31-Dec-16 17-Aug-06 21-Feb-13
## 897 F 23-Mar-12 13-Jun-05 25-Sep-05
## 898 M 3-Apr-07 22-Mar-07 1-Apr-07
## 903 M 31-Dec-16 25-Jul-04 21-Aug-04
## 908 M 31-Dec-16 30-Oct-00 30-Oct-00
## 912 F 31-Dec-16 17-Jul-06 18-Jul-06
## 915 F 9-Nov-15 9-Nov-03 4-Jul-05
## 916 M 31-Dec-16 17-Apr-03 11-Nov-06
## 919 M 31-Dec-16 3-Nov-01 6-Jan-09
## 923 F 31-Dec-16 22-Oct-06 3-Feb-16
## 926 M 31-Dec-16 14-Jul-11 30-Mar-12
## 928 M 28-Sep-03 23-Sep-03 24-Sep-03
## 931 F 31-Dec-16 16-Feb-02 4-Mar-02
## 938 F 31-Dec-16 17-Nov-02 1-Dec-02
## 941 F 31-Dec-16 1-Jul-05 14-Aug-11
## 942 F 2-Dec-01 7-Aug-00 8-Aug-00
## 943 M 31-Dec-16 12-Dec-02 11-Jan-07
## 948 F 31-Dec-16 6-Jun-10 26-Jun-12
## 951 M 11-Oct-10 18-Oct-03 1-Jun-10
## 952 M 31-Dec-16 2-Dec-02 8-Jul-11
## 954 M 31-Dec-16 11-Mar-01 3-Sep-03
## 957 F 31-Dec-16 7-Nov-00 25-May-04
## 960 M 8-Dec-13 12-Dec-01 29-Jan-07
## 961 F 31-Dec-16 12-Jun-08 9-May-11
## 962 F 31-Dec-16 13-Jul-00 1-Jul-08
## 965 F 31-Dec-16 6-Dec-06 27-May-16
## 966 M 31-Dec-16 14-Oct-03 19-Aug-15
## 968 F 11-Jan-16 9-Aug-03 29-Sep-15
## 969 M 31-Dec-16 13-Oct-00 13-Oct-00
## 988 M 31-Dec-16 9-Mar-01 19-Oct-09
## 991 M 25-Jul-00 3-Jan-00 12-Jan-00
## transplantdate.1 ageattransplant
## 7 23-Aug-07 24
## 9 25-Oct-00 48
## 12 30-Apr-08 51
## 17 2-Oct-08 68
## 24 16-Jul-03 53
## 26 3-Jun-08 35
## 36 30-Apr-07 65
## 42 24-Oct-00 57
## 74 28-Dec-11 33
## 75 15-Jan-09 41
## 85 17-Dec-08 15
## 89 29-Oct-12 16
## 104 7-Jul-11 60
## 113 4-Aug-10 62
## 115 27-Nov-08 14
## 132 18-Jun-09 45
## 135 2-Feb-10 37
## 154 16-Aug-14 51
## 160 18-Jun-09 26
## 168 24-Dec-08 40
## 174 18-Jul-07 22
## 178 5-May-07 52
## 180 16-Aug-08 43
## 198 12-Jul-09 52
## 210 2-Mar-11 55
## 297 26-Sep-13 52
## 300 5-Sep-13 17
## 326 8-Sep-14 67
## 362 4-Dec-15 56
## 392 30-Nov-11 16
## 398 12-Dec-08 13
## 404 27-Mar-09 58
## 424 21-Feb-12 43
## 440 9-Mar-13 60
## 447 23-May-15 55
## 461 22-Jun-11 29
## 490 22-Jun-10 24
## 492 27-Apr-11 20
## 511 23-May-13 43
## 516 1-Sep-02 38
## 519 30-Jul-03 38
## 521 12-Jun-01 51
## 524 20-Apr-06 29
## 528 23-Jan-06 54
## 558 2-Mar-08 56
## 562 9-Jan-08 57
## 565 21-May-08 34
## 566 8-Jun-06 9
## 587 6-Oct-10 36
## 593 18-Sep-15 62
## 599 14-Oct-10 20
## 611 5-Jul-10 52
## 618 17-May-12 43
## 627 23-Oct-01 62
## 628 2-Oct-02 9
## 634 5-Jun-04 57
## 635 12-Oct-06 48
## 638 2-May-01 9
## 644 29-Jan-01 14
## 651 12-Apr-06 36
## 657 30-Jun-05 48
## 660 8-Dec-08 55
## 668 9-Feb-05 27
## 673 12-Jun-06 29
## 674 14-Jun-06 63
## 676 2-May-07 54
## 686 20-Jun-08 66
## 688 13-Aug-09 48
## 694 27-Oct-03 16
## 696 29-Sep-06 12
## 698 21-Jul-04 27
## 713 21-Jul-04 27
## 725 2-Dec-02 25
## 728 14-Apr-03 43
## 729 8-Apr-11 66
## 730 19-Feb-10 52
## 737 9-Sep-03 39
## 738 29-Aug-08 53
## 740 4-Sep-13 37
## 746 31-Dec-04 61
## 747 16-Apr-03 35
## 752 15-Feb-05 40
## 757 25-Jul-02 31
## 761 22-Aug-03 54
## 762 19-Nov-04 59
## 766 23-Dec-07 54
## 768 30-Dec-04 50
## 772 31-May-04 48
## 774 26-May-04 43
## 783 5-Mar-05 16
## 795 5-Jan-06 59
## 796 14-Mar-04 58
## 797 14-Apr-04 25
## 798 10-Aug-03 39
## 800 12-Aug-03 58
## 807 9-Sep-02 67
## 809 20-May-01 35
## 811 21-Jun-01 33
## 812 30-Oct-12 35
## 816 29-Aug-05 64
## 818 10-Jun-05 46
## 819 6-May-04 52
## 820 8-Jul-02 28
## 821 15-Nov-02 49
## 822 17-Jul-06 58
## 826 20-Jun-03 43
## 830 5-Dec-02 52
## 831 24-Apr-03 2
## 839 19-Mar-00 29
## 840 18-Oct-05 38
## 841 23-Jul-08 21
## 843 31-Mar-03 28
## 852 12-Oct-01 65
## 858 22-May-01 1
## 859 23-Jun-04 58
## 863 31-Jan-05 22
## 868 18-Jan-01 10
## 869 17-Nov-06 20
## 872 15-Aug-01 58
## 873 22-Feb-07 48
## 874 5-Jul-00 36
## 882 16-Feb-04 29
## 885 17-Aug-06 9
## 897 13-Jun-05 21
## 898 22-Mar-07 66
## 903 25-Jul-04 42
## 908 30-Oct-00 48
## 912 17-Jul-06 51
## 915 9-Nov-03 47
## 916 17-Apr-03 30
## 919 3-Nov-01 54
## 923 22-Oct-06 36
## 926 14-Jul-11 29
## 928 23-Sep-03 48
## 931 16-Feb-02 31
## 938 17-Nov-02 24
## 941 1-Jul-05 32
## 942 7-Aug-00 33
## 943 12-Dec-02 35
## 948 6-Jun-10 24
## 951 18-Oct-03 45
## 952 2-Dec-02 41
## 954 11-Mar-01 41
## 957 7-Nov-00 45
## 960 12-Dec-01 51
## 961 12-Jun-08 30
## 962 13-Jul-00 42
## 965 6-Dec-06 32
## 966 14-Oct-03 31
## 968 9-Aug-03 50
## 969 13-Oct-00 23
## 988 9-Mar-01 25
## 991 3-Jan-00 45
#timeFU = as.Date(DataL$lastfollowupdate, format = "%d%b%Y") - #as.Date(DataL$transplantdate, format = "%d%b%Y")
##the date format in the as.Date() function is incorrect. In the data, #the date format is "%d-%b-%y", which means the day is represented by two #digits, the month is represented by a three-letter abbreviation, and the year #is represented by two digits.
# Convert character variables to date variables with the correct format
timeFU <- as.Date(DataL$lastfollowupdate, format = "%d-%b-%y") - as.Date(DataL$transplantdate, format = "%d-%b-%y")
timeFailure <- as.Date(DataL$graftfailuredate, format = "%d-%b-%y") - as.Date(DataL$transplantdate, format = "%d-%b-%y")
failures = !is.na(timeFailure)
## Simple survival model
library(survival)
fit = coxph(Surv(timeFailure,failures)~DataL$ageattransplant)
fit
## Call:
## coxph(formula = Surv(timeFailure, failures) ~ DataL$ageattransplant)
##
## coef exp(coef) se(coef) z p
## DataL$ageattransplant 0.003975 1.003983 0.004880 0.814 0.415
##
## Likelihood ratio test=0.67 on 1 df, p=0.4139
## n= 153, number of events= 153
plot(survfit(Surv(timeFailure/365,failures)~DataL$gendercode),col = 1:2,lty = 1:2,xlab = 'years',ylab = 'Probability of success')
legend('topright',c('Women','Men'),col = 1:2,lty = 1:2)
fit = coxph(Surv(timeFailure,failures)~DataL$gendercode)
fit
## Call:
## coxph(formula = Surv(timeFailure, failures) ~ DataL$gendercode)
##
## coef exp(coef) se(coef) z p
## DataL$gendercodeM -0.07485 0.92788 0.17028 -0.44 0.66
##
## Likelihood ratio test=0.19 on 1 df, p=0.6612
## n= 153, number of events= 153