R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown, see http://rmarkdown.rstudio.com.

When you click the Knit button, a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

Head of Data

Now let’s display the head of the Data data frame:

library(MASS)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.1     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ✖ dplyr::select() masks MASS::select()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(naniar)
library(dendextend)
## 
## ---------------------
## Welcome to dendextend version 1.17.1
## Type citation('dendextend') for how to cite the package.
## 
## Type browseVignettes(package = 'dendextend') for the package vignette.
## The github page is: https://github.com/talgalili/dendextend/
## 
## Suggestions and bug-reports can be submitted at: https://github.com/talgalili/dendextend/issues
## You may ask questions at stackoverflow, use the r and dendextend tags: 
##   https://stackoverflow.com/questions/tagged/dendextend
## 
##  To suppress this message use:  suppressPackageStartupMessages(library(dendextend))
## ---------------------
## 
## 
## Attaching package: 'dendextend'
## 
## The following object is masked from 'package:stats':
## 
##     cutree
Data <- biopsy
head(Data)
##        ID V1 V2 V3 V4 V5 V6 V7 V8 V9     class
## 1 1000025  5  1  1  1  2  1  3  1  1    benign
## 2 1002945  5  4  4  5  7 10  3  2  1    benign
## 3 1015425  3  1  1  1  2  2  3  1  1    benign
## 4 1016277  6  8  8  1  3  4  3  7  1    benign
## 5 1017023  4  1  1  3  2  1  3  1  1    benign
## 6 1017122  8 10 10  8  7 10  9  7  1 malignant
dim(Data)
## [1] 699  11
# Remove ID column
Data <- dplyr::select(Data, -ID) # relate to dendextend , make sure run libray #first, oltherwise error
head(Data)
##   V1 V2 V3 V4 V5 V6 V7 V8 V9     class
## 1  5  1  1  1  2  1  3  1  1    benign
## 2  5  4  4  5  7 10  3  2  1    benign
## 3  3  1  1  1  2  2  3  1  1    benign
## 4  6  8  8  1  3  4  3  7  1    benign
## 5  4  1  1  3  2  1  3  1  1    benign
## 6  8 10 10  8  7 10  9  7  1 malignant
# How many missing values are there?
sum(is.na(Data))
## [1] 16
vis_miss(Data)

Data <- na.omit(Data)

R Markdown

This R code snippet performs two main operations on the Data data frame using the dplyr package, which is part of the tidyverse. Let’s break it down step by step:

The %>% operator (pipe operator) is used to chain operations in a more readable manner. It takes the output from the left-hand side operation and passes it as the first argument to the right-hand side function.

dplyr::select(-class): The select() function from the dplyr package is used to remove the class column from the Data data frame. The - sign before the column name indicates that this column should be removed.

scale(): This is a base R function that standardizes the remaining columns of the data frame by centering (subtracting the mean) and scaling (dividing by the standard deviation) each numeric variable. This operation is useful when you want to compare variables with different units or scales, and is often done as a preprocessing step in machine learning and statistical modeling.

#Data <- biopsy
# Scale the column
DataScaled <- Data %>%
  dplyr::select(-class) %>%
  scale()
set.seed(51773)
kM <- kmeans(DataScaled,2)
pca <- prcomp(DataScaled)
df <- data.frame(pca$x, cluster = paste("cluster", kM$cluster, sep = "_"), Data)
ggplot(df, aes(x = PC1, y = PC2, colour = class, shape = cluster)) + geom_point()

R Markdown

set.seed(51773): This sets the seed for the random number generator in R. Setting a seed ensures that any random operation performed later in the code will yield the same results each time the script is run. This is useful for reproducibility purposes.

kmeans(DataScaled, 2): This performs k-means clustering on the DataScaled data frame, using 2 clusters. The k-means algorithm is a popular clustering technique that tries to find cluster centers that minimize the within-cluster sum of squares. The result is saved in the kM object.

prcomp(DataScaled): This performs principal component analysis (PCA) on the DataScaled data frame. PCA is a dimensionality reduction technique that linearly transforms the original variables into a new set of uncorrelated variables called principal components. The result is saved in the pca object.

data.frame(pca\(x, cluster = paste("cluster", kM\)cluster, sep = “”), Data): This creates a new data frame that includes the PCA scores (the transformed data points in the principal component space), the cluster assignments from the k-means clustering (with the prefix ”cluster”), and the original Data data frame.

ggplot(df, aes(x = PC1, y = PC2, colour = class, shape = cluster)) + geom_point(): This creates a scatter plot using the ggplot2 package, with the x-axis representing the first principal component (PC1) and the y-axis representing the second principal component (PC2). The points are colored according to the class variable and shaped based on their cluster assignment from the k-means clustering. This visualization helps in understanding the relationship between the clusters found by the k-means algorithm and the original class labels in the context of the first two principal components.

# Perform clustering on the scaled data
hC <- hclust(dist(DataScaled))

# Plot the dendrogram
plot(hC)

# We can cut the dendrogram at a height such that there are two clusters
hCC <- cutree(hC, k = 2)

# There are lots of packages for plotting dendrograms
hC %>% 
  as.dendrogram() %>%
  set("branches_k_color", k = 2) %>% 
  set("labels","") %>%
  plot() 
## Warning in `labels<-.dendrogram`(dend, value = value, ...): The lengths of the
## new labels is shorter than the number of leaves in the dendrogram - labels are
## recycled.

Including Plots

hclust(dist(DataScaled)): This performs hierarchical clustering on the DataScaled data frame. The dist() function calculates the distance matrix using Euclidean distances by default. The hclust() function then performs hierarchical clustering using the distance matrix. The result is saved in the hC object.

plot(hC): This creates a basic dendrogram plot of the hierarchical clustering result (hC). A dendrogram is a tree-like diagram that represents the hierarchy of the clusters.

cutree(hC, k = 2): This function cuts the hierarchical clustering tree (hC) into a specified number of clusters, in this case, 2. The result is saved in the hCC object.

The next block of code uses the dendextend package to create a more visually appealing dendrogram:

  1. hC %>% as.dendrogram(): This converts the hC hierarchical clustering object into a dendrogram object.

  2. set(“branches_k_color”, k = 2): This sets the colors of the branches in the dendrogram based on the number of clusters specified (k = 2).

  3. set(“labels”, ““): This removes the labels from the dendrogram’s leaves.

  4. plot(): This creates a plot of the modified dendrogram.

DATA <- data.frame(Data, kmeans = kM$cluster, hclust = hCC)

(tabk <- table(DATA$kmeans, DATA$class))
##    
##     benign malignant
##   1     10       220
##   2    434        19
chisq.test(tabk)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  tabk
## X-squared = 556.92, df = 1, p-value < 2.2e-16
(tabh <- table(DATA$hclust, DATA$class))
##    
##     benign malignant
##   1    442        99
##   2      2       140
(tabhk <- table(DATA$hclust, DATA$kmeans))
##    
##       1   2
##   1  88 453
##   2 142   0

Including Plots

DATA <- data.frame(Data, kmeans = kM\(cluster, hclust = hCC): This creates a new data frame DATA that includes the original Data data frame along with the cluster assignments from k-means clustering (stored in kM\)cluster) and hierarchical clustering (stored in hCC).

(tabk <- table(DATA\(kmeans, DATA\)class)): This creates a contingency table showing the relationship between the k-means cluster assignments (DATA$kmeans) and the class labels from the original data. The result is saved in the tabk object.

chisq.test(tabk): This performs a chi-squared test for independence using the contingency table tabk. The test checks whether the k-means cluster assignments are independent of the class labels. A significant result (low p-value) would indicate that there is a relationship between the k-means clusters and the class labels.

(tabh <- table(DATA\(hclust, DATA\)class)): This creates a contingency table showing the relationship between the hierarchical clustering assignments (DATA$hclust) and the class labels from the original data. The result is saved in the tabh object.

(tabhk <- table(DATA\(hclust, DATA\)kmeans)): This creates a contingency table showing the relationship between the hierarchical clustering assignments (DATA\(hclust) and the k-means clustering assignments (DATA\)kmeans). The result is saved in the tabhk object.

Data2 <- Data %>%
  dplyr::select(-class)

set.seed(51773)
kM2 <- kmeans(Data2,2)
pca2 <- prcomp(Data2)
df2 <- data.frame(pca2$x, cluster = paste("cluster", kM2$cluster, sep = "_"), Data)
ggplot(df2, aes(x = PC1, y = PC2, colour = class, shape = cluster)) + geom_point()

library(ggplot2)
library(tidyr)
library(datasets)
# The ToothGrowth dataset is loaded in the 'datasets' package
tooth = ToothGrowth
head(tooth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
ggplot(tooth,aes(y = len, x = supp, colour = factor(dose)))+geom_boxplot() + theme_classic() + xlab('Supplement') + ylab("Length")+ labs(colour = "Supplement")

VCdata = tooth[tooth[,"supp"] == "VC",]
dim(VCdata)  
## [1] 30  3
head(VCdata)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
boxplot(len~dose, VCdata)

aov1 = aov(len~dose, VCdata)
summary(aov1)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## dose         1 1601.3  1601.3   117.9 1.51e-11 ***
## Residuals   28  380.1    13.6                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
aov2 = aov(len~factor(dose), VCdata)
summary(aov2)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## factor(dose)  2   1650   824.7   67.07 3.36e-11 ***
## Residuals    27    332    12.3                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ggplot(VCdata,aes(y = len, x = factor(dose)))+geom_boxplot() + theme_classic() + xlab('Dose') + ylab("Length")

## Another way of fitting anova in R
fit = lm(len~factor(dose), VCdata)
anova(fit)
## Analysis of Variance Table
## 
## Response: len
##              Df Sum Sq Mean Sq F value    Pr(>F)    
## factor(dose)  2 1649.5  824.74  67.072 3.357e-11 ***
## Residuals    27  332.0   12.30                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Change dose to factor
tooth$dose = as.factor(tooth$dose)

## Without ggplot
attach(tooth)
interaction.plot(dose, supp, len)

detach()

## With ggplot
ggplot(tooth, aes(x=factor(dose), y = len, colour=supp)) + geom_boxplot() + 
  stat_summary(fun.y = mean, geom = "line", aes(group = supp))  + theme_classic() + xlab('Dose') + ylab("Length")+ labs(colour = "Supplement")
## Warning: The `fun.y` argument of `stat_summary()` is deprecated as of ggplot2 3.3.0.
## ℹ Please use the `fun` argument instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

Including Plots

This R code is creating interaction plots to visualize the relationship between dose, supplement type, and tooth length using the built-in interaction.plot() function and ggplot2. Let’s break it down step by step:

tooth\(dose = as.factor(tooth\)dose): This line of code converts the ‘dose’ variable in the ‘tooth’ data frame to a factor. This is done to treat ‘dose’ as a categorical variable instead of a numeric one.

attach(tooth): This line of code temporarily adds the columns of the ‘tooth’ data frame to the search path. This allows you to use the column names directly in your analysis without specifying the data frame.

interaction.plot(dose, supp, len): This line of code creates an interaction plot using the built-in interaction.plot() function. The x-axis represents the ‘dose’ variable, different line colors represent the ‘supp’ variable (supplement type), and the y-axis shows the ‘len’ variable (tooth length).

detach(): This line of code removes the ‘tooth’ data frame from the search path to avoid potential conflicts with other data frames that have columns with the same names.

The following code creates an interaction plot using the ggplot2 package:

scss Copy code ggplot(tooth, aes(x=factor(dose), y = len, colour=supp)) + geom_boxplot() + stat_summary(fun.y = mean, geom = “line”, aes(group = supp)) + theme_classic() + xlab(‘Dose’) + ylab(“Length”) + labs(colour = “Supplement”) The plot is created by specifying the data frame (tooth), the aesthetics (x, y, and color), adding a boxplot layer, adding a line to connect the means of each group, setting a classic theme, and customizing the axis labels and legend.

In summary, the code creates interaction plots to visualize the relationship between dose, supplement type, and tooth length using both the built-in interaction.plot() function and the ggplot2 package. These plots help understand how the combination of dose and supplement type affects tooth length.

tooth.aov = aov(len ~ supp * dose, tooth)
summary(tooth.aov)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## supp         1  205.4   205.4  15.572 0.000231 ***
## dose         2 2426.4  1213.2  92.000  < 2e-16 ***
## supp:dose    2  108.3    54.2   4.107 0.021860 *  
## Residuals   54  712.1    13.2                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tooth.aov.additive = aov(len ~ supp + dose, tooth)
summary(tooth.aov.additive)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## supp         1  205.4   205.4   14.02 0.000429 ***
## dose         2 2426.4  1213.2   82.81  < 2e-16 ***
## Residuals   56  820.4    14.7                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(tooth.aov,tooth.aov.additive)
## Analysis of Variance Table
## 
## Model 1: len ~ supp * dose
## Model 2: len ~ supp + dose
##   Res.Df    RSS Df Sum of Sq     F  Pr(>F)  
## 1     54 712.11                             
## 2     56 820.43 -2   -108.32 4.107 0.02186 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
library(tidyverse)



patientData <- read.csv('C:/Users/u6977/Desktop/Patient.csv')

transplantData <- read.csv('C:/Users/u6977/Desktop/Transplant.csv')


creatineData <- read.csv('C:/Users/u6977/Desktop/SerumCreatinine.csv')

# For simplicity let's ignore patient's multiple transplants.  What does the !
# do?
transplantData2 <- transplantData[!duplicated(transplantData$id), ]

# Join patientData and transplantData
Data <- full_join(patientData, transplantData2, by = "id")

# Extract some interesting variables
use <- c("gendercode", "latereferralcode", "creatinineatentry", "height", "weight",
    "smokingcode", "cancerever", "chroniclungcode", "coronaryarterycode", "peripheralvascularcode",
    "cerebrovasularcode", "diabetescode", "graftno", "transplantcentrestate", "recipientantibodycmvcode",
    "recipientantibodyebvcode", "donorsourcecode", "donorage", "donorgendercode",
    "ischaemia", "ageattransplant", "hlamismatchesa", "hlamismatchesb", "hlamismatchesdr",
    "hlamismatchesdq", "maxcytotoxicantibodies", "currentcytotoxicantibodies", "timeondialysis",
    "transplantstatus")

Data <- Data[, use]

# Recode some NA
Data[Data == ""] = NA
Data[Data == "-"] = NA
head(Data)
##   gendercode latereferralcode creatinineatentry height weight smokingcode
## 1          M                N               600  112.7   21.8       Never
## 2          M                N              1234  152.0   33.6       Never
## 3          M                N              1025  162.0   53.0       Never
## 4          M                N               646  175.0   85.0       Never
## 5          M                N                NA  154.0   51.5       Never
## 6          M                N               449  163.0     NA      Former
##   cancerever chroniclungcode coronaryarterycode peripheralvascularcode
## 1         No               N                  N                      Y
## 2        Yes               N                  N                      N
## 3         No               N                  N                      Y
## 4         No               N                  N                      Y
## 5        Yes               N                  N                      N
## 6        Yes               N                  N                      N
##   cerebrovasularcode diabetescode graftno transplantcentrestate
## 1                  Y            N       2                    SA
## 2                  N            N       2                   NSW
## 3                  N            N       3                   VIC
## 4                  N            N       2                    NZ
## 5                  N            N       1                   NSW
## 6                  N            N       1                    WA
##   recipientantibodycmvcode recipientantibodyebvcode donorsourcecode donorage
## 1                 Positive                 Positive          Friend       46
## 2                 Positive                 Positive        Deceased       59
## 3                 Positive                 Positive        Deceased       45
## 4                 Positive                 Positive          Sister       58
## 5                 Positive                 Positive        Deceased       55
## 6                 Positive                 Negative        Deceased       22
##   donorgendercode ischaemia ageattransplant hlamismatchesa hlamismatchesb
## 1               F        20              27              2              2
## 2               F        20              60              0              2
## 3               F         7              34              2              0
## 4               F        16              54              0              1
## 5               M         5              48              2              2
## 6               F        11              59              2              2
##   hlamismatchesdr hlamismatchesdq maxcytotoxicantibodies
## 1               0              NA                      0
## 2               0               0                      0
## 3               0              NA                     64
## 4               0               0                      0
## 5               2              NA                     99
## 6               1               1                      0
##   currentcytotoxicantibodies timeondialysis transplantstatus
## 1                          0       18.01506                0
## 2                          0       24.41068                0
## 3                         56       18.06981                1
## 4                          0       24.26283                0
## 5                          3       13.89459                1
## 6                          0        9.59343                1
## You can read in "dates" in R using as.Dates

#### Make Data L
DataPrep <- full_join(patientData, transplantData2, by = "id") 
use <- c("gendercode", "lastfollowupdate", "transplantdate", "graftfailuredate", "transplantdate", "ageattransplant")
DataL <- DataPrep[, use] 
DataL[DataL == ""] = NA 
DataL[DataL == "-"] = NA
DataL <- na.omit(DataL)
## Time till last follow-up
DataL
##     gendercode lastfollowupdate transplantdate graftfailuredate
## 7            M        31-Dec-16      23-Aug-07        24-Feb-15
## 9            M        16-Sep-05      25-Oct-00        26-Oct-00
## 12           M        31-Dec-16      30-Apr-08        31-May-11
## 17           F        31-Dec-16       2-Oct-08        28-Jul-16
## 24           M        28-Nov-15      16-Jul-03         1-Dec-11
## 26           M        31-Dec-16       3-Jun-08         5-May-15
## 36           M        31-Dec-16      30-Apr-07        21-Jul-16
## 42           F        14-May-11      24-Oct-00        13-May-11
## 74           F        31-Dec-16      28-Dec-11        22-Dec-16
## 75           M        31-Dec-16      15-Jan-09        16-Jan-09
## 85           M        31-Dec-16      17-Dec-08        29-Aug-16
## 89           F        31-Dec-16      29-Oct-12        31-May-13
## 104          F        31-Dec-16       7-Jul-11        12-Mar-12
## 113          M        31-Dec-16       4-Aug-10        30-Nov-16
## 115          M        31-Dec-16      27-Nov-08         5-Jun-15
## 132          M        31-Dec-16      18-Jun-09        12-Apr-12
## 135          M        31-Dec-16       2-Feb-10        10-Feb-10
## 154          M        31-Dec-16      16-Aug-14         2-Dec-16
## 160          F        31-Dec-16      18-Jun-09        16-Aug-11
## 168          F         1-Apr-12      24-Dec-08        24-Mar-11
## 174          M        31-Dec-16      18-Jul-07         9-Feb-09
## 178          M        31-Dec-16       5-May-07         3-Feb-11
## 180          M        31-Dec-16      16-Aug-08        30-Jun-14
## 198          F        17-Oct-10      12-Jul-09        13-Jul-09
## 210          M        31-Dec-16       2-Mar-11        26-Oct-15
## 297          F        31-Dec-16      26-Sep-13         8-Sep-16
## 300          F        31-Dec-16       5-Sep-13        19-Aug-16
## 326          M        31-Dec-16       8-Sep-14         1-Mar-15
## 362          F        31-Dec-16       4-Dec-15        16-Apr-16
## 392          F        31-Dec-16      30-Nov-11        19-Jan-15
## 398          M        31-Dec-16      12-Dec-08        16-Jun-14
## 404          M        31-Dec-16      27-Mar-09        12-Nov-15
## 424          M        31-Dec-16      21-Feb-12        19-Apr-13
## 440          M        31-Dec-16       9-Mar-13        20-May-13
## 447          M        31-Dec-16      23-May-15        24-May-15
## 461          F        31-Dec-16      22-Jun-11        25-Sep-13
## 490          M         2-Feb-15      22-Jun-10        23-Aug-12
## 492          M        31-Dec-16      27-Apr-11        26-Aug-14
## 511          F        16-Mar-15      23-May-13         5-Aug-14
## 516          M        31-Dec-16       1-Sep-02        23-Mar-10
## 519          F        21-Mar-13      30-Jul-03         1-Mar-07
## 521          F        31-Dec-16      12-Jun-01        27-May-09
## 524          M        24-Jun-09      20-Apr-06        22-Jun-07
## 528          F        31-Dec-16      23-Jan-06        17-Jun-16
## 558          F        31-Dec-16       2-Mar-08         4-Aug-14
## 562          M        31-Dec-16       9-Jan-08        15-Jun-10
## 565          F        31-Dec-16      21-May-08         5-Jul-16
## 566          M        31-Dec-16       8-Jun-06         1-Oct-11
## 587          M        31-Dec-16       6-Oct-10         3-Jul-13
## 593          F        17-Apr-16      18-Sep-15        17-Apr-16
## 599          F        31-Dec-16      14-Oct-10        22-Dec-14
## 611          F        31-Dec-16       5-Jul-10         5-Jul-10
## 618          M        31-Dec-16      17-May-12        31-May-12
## 627          F        18-Nov-15      23-Oct-01        30-Aug-10
## 628          M        31-Dec-16       2-Oct-02        10-Sep-12
## 634          F        15-Jan-05       5-Jun-04        12-Oct-04
## 635          M        13-May-13      12-Oct-06        11-May-13
## 638          M        31-Dec-16       2-May-01         7-May-08
## 644          M        31-Dec-16      29-Jan-01        14-Mar-05
## 651          M        31-Dec-16      12-Apr-06        24-Oct-13
## 657          F        31-Dec-16      30-Jun-05        15-Mar-16
## 660          M        31-Dec-16       8-Dec-08        21-Apr-15
## 668          M        31-Dec-16       9-Feb-05         9-Jun-14
## 673          M        31-Dec-16      12-Jun-06         5-Mar-15
## 674          M         9-Jul-06      14-Jun-06        14-Jun-06
## 676          M        31-Dec-16       2-May-07        11-Aug-16
## 686          F        30-Mar-10      20-Jun-08         5-Sep-08
## 688          M        31-Dec-16      13-Aug-09        18-May-15
## 694          F        31-Dec-16      27-Oct-03         3-Sep-04
## 696          F        31-Dec-16      29-Sep-06         7-May-08
## 698          M        31-Dec-16      21-Jul-04        27-Nov-14
## 713          M        31-Dec-16      21-Jul-04         6-Aug-08
## 725          F        31-Dec-16       2-Dec-02         5-Mar-12
## 728          F        31-Dec-16      14-Apr-03         1-Jul-14
## 729          F        31-Dec-16       8-Apr-11        25-Oct-16
## 730          F        31-Dec-16      19-Feb-10         1-Jun-10
## 737          F        11-Nov-15       9-Sep-03        19-Sep-14
## 738          M        31-Dec-16      29-Aug-08        27-Oct-08
## 740          M        31-Dec-16       4-Sep-13        19-Jun-15
## 746          M        31-Dec-16      31-Dec-04         7-Mar-14
## 747          M        31-Dec-16      16-Apr-03        17-Apr-03
## 752          F         8-Oct-13      15-Feb-05        20-Nov-06
## 757          M         9-May-06      25-Jul-02         1-Feb-05
## 761          M        31-Dec-16      22-Aug-03        12-Nov-08
## 762          F        18-Mar-05      19-Nov-04        20-Nov-04
## 766          F        31-Dec-16      23-Dec-07        24-Dec-07
## 768          M        28-Feb-14      30-Dec-04        30-Dec-09
## 772          M        31-Dec-16      31-May-04         4-Jul-04
## 774          M        31-Dec-16      26-May-04        19-Jun-08
## 783          F        31-Dec-16       5-Mar-05        24-Apr-08
## 795          M        29-Dec-15       5-Jan-06        29-Dec-15
## 796          M         1-Apr-04      14-Mar-04        14-Mar-04
## 797          M         4-Feb-08      14-Apr-04        29-Nov-07
## 798          M        31-Dec-16      10-Aug-03        12-Nov-07
## 800          M        13-Apr-16      12-Aug-03         8-Jun-04
## 807          M         6-Dec-10       9-Sep-02         5-Dec-10
## 809          M        27-Feb-10      20-May-01        20-Apr-09
## 811          M         8-Dec-16      21-Jun-01        26-Jun-14
## 812          M        31-Dec-16      30-Oct-12        31-Aug-16
## 816          F         1-Oct-06      29-Aug-05        30-Aug-05
## 818          M        31-Dec-16      10-Jun-05        10-Jun-05
## 819          M        31-Dec-16       6-May-04         3-Oct-12
## 820          M        31-Dec-16       8-Jul-02        31-Aug-08
## 821          M         4-Nov-14      15-Nov-02         9-Jul-12
## 822          M        31-Dec-16      17-Jul-06        18-Jul-06
## 826          M        22-Sep-16      20-Jun-03        21-Jun-06
## 830          F         3-Apr-14       5-Dec-02        25-Nov-11
## 831          M        31-Dec-16      24-Apr-03        18-Mar-16
## 839          F        31-Dec-16      19-Mar-00        10-Sep-12
## 840          M        31-Dec-16      18-Oct-05        17-Oct-16
## 841          M        31-Dec-16      23-Jul-08        24-Dec-09
## 843          M        31-Dec-16      31-Mar-03        13-Mar-05
## 852          M        24-Jul-15      12-Oct-01        24-Jul-15
## 858          M        31-Dec-16      22-May-01        21-Feb-05
## 859          M        28-Jan-16      23-Jun-04        28-Aug-14
## 863          M        31-Dec-16      31-Jan-05        20-Apr-12
## 868          M        31-Dec-16      18-Jan-01        10-Jan-14
## 869          M        12-Jul-09      17-Nov-06         4-Nov-08
## 872          M        17-Apr-16      15-Aug-01        18-Oct-06
## 873          F        10-Jul-09      22-Feb-07         3-Jul-09
## 874          M        31-Dec-16       5-Jul-00         4-Dec-07
## 882          F        31-Dec-16      16-Feb-04        24-Mar-14
## 885          M        31-Dec-16      17-Aug-06        21-Feb-13
## 897          F        23-Mar-12      13-Jun-05        25-Sep-05
## 898          M         3-Apr-07      22-Mar-07         1-Apr-07
## 903          M        31-Dec-16      25-Jul-04        21-Aug-04
## 908          M        31-Dec-16      30-Oct-00        30-Oct-00
## 912          F        31-Dec-16      17-Jul-06        18-Jul-06
## 915          F         9-Nov-15       9-Nov-03         4-Jul-05
## 916          M        31-Dec-16      17-Apr-03        11-Nov-06
## 919          M        31-Dec-16       3-Nov-01         6-Jan-09
## 923          F        31-Dec-16      22-Oct-06         3-Feb-16
## 926          M        31-Dec-16      14-Jul-11        30-Mar-12
## 928          M        28-Sep-03      23-Sep-03        24-Sep-03
## 931          F        31-Dec-16      16-Feb-02         4-Mar-02
## 938          F        31-Dec-16      17-Nov-02         1-Dec-02
## 941          F        31-Dec-16       1-Jul-05        14-Aug-11
## 942          F         2-Dec-01       7-Aug-00         8-Aug-00
## 943          M        31-Dec-16      12-Dec-02        11-Jan-07
## 948          F        31-Dec-16       6-Jun-10        26-Jun-12
## 951          M        11-Oct-10      18-Oct-03         1-Jun-10
## 952          M        31-Dec-16       2-Dec-02         8-Jul-11
## 954          M        31-Dec-16      11-Mar-01         3-Sep-03
## 957          F        31-Dec-16       7-Nov-00        25-May-04
## 960          M         8-Dec-13      12-Dec-01        29-Jan-07
## 961          F        31-Dec-16      12-Jun-08         9-May-11
## 962          F        31-Dec-16      13-Jul-00         1-Jul-08
## 965          F        31-Dec-16       6-Dec-06        27-May-16
## 966          M        31-Dec-16      14-Oct-03        19-Aug-15
## 968          F        11-Jan-16       9-Aug-03        29-Sep-15
## 969          M        31-Dec-16      13-Oct-00        13-Oct-00
## 988          M        31-Dec-16       9-Mar-01        19-Oct-09
## 991          M        25-Jul-00       3-Jan-00        12-Jan-00
##     transplantdate.1 ageattransplant
## 7          23-Aug-07              24
## 9          25-Oct-00              48
## 12         30-Apr-08              51
## 17          2-Oct-08              68
## 24         16-Jul-03              53
## 26          3-Jun-08              35
## 36         30-Apr-07              65
## 42         24-Oct-00              57
## 74         28-Dec-11              33
## 75         15-Jan-09              41
## 85         17-Dec-08              15
## 89         29-Oct-12              16
## 104         7-Jul-11              60
## 113         4-Aug-10              62
## 115        27-Nov-08              14
## 132        18-Jun-09              45
## 135         2-Feb-10              37
## 154        16-Aug-14              51
## 160        18-Jun-09              26
## 168        24-Dec-08              40
## 174        18-Jul-07              22
## 178         5-May-07              52
## 180        16-Aug-08              43
## 198        12-Jul-09              52
## 210         2-Mar-11              55
## 297        26-Sep-13              52
## 300         5-Sep-13              17
## 326         8-Sep-14              67
## 362         4-Dec-15              56
## 392        30-Nov-11              16
## 398        12-Dec-08              13
## 404        27-Mar-09              58
## 424        21-Feb-12              43
## 440         9-Mar-13              60
## 447        23-May-15              55
## 461        22-Jun-11              29
## 490        22-Jun-10              24
## 492        27-Apr-11              20
## 511        23-May-13              43
## 516         1-Sep-02              38
## 519        30-Jul-03              38
## 521        12-Jun-01              51
## 524        20-Apr-06              29
## 528        23-Jan-06              54
## 558         2-Mar-08              56
## 562         9-Jan-08              57
## 565        21-May-08              34
## 566         8-Jun-06               9
## 587         6-Oct-10              36
## 593        18-Sep-15              62
## 599        14-Oct-10              20
## 611         5-Jul-10              52
## 618        17-May-12              43
## 627        23-Oct-01              62
## 628         2-Oct-02               9
## 634         5-Jun-04              57
## 635        12-Oct-06              48
## 638         2-May-01               9
## 644        29-Jan-01              14
## 651        12-Apr-06              36
## 657        30-Jun-05              48
## 660         8-Dec-08              55
## 668         9-Feb-05              27
## 673        12-Jun-06              29
## 674        14-Jun-06              63
## 676         2-May-07              54
## 686        20-Jun-08              66
## 688        13-Aug-09              48
## 694        27-Oct-03              16
## 696        29-Sep-06              12
## 698        21-Jul-04              27
## 713        21-Jul-04              27
## 725         2-Dec-02              25
## 728        14-Apr-03              43
## 729         8-Apr-11              66
## 730        19-Feb-10              52
## 737         9-Sep-03              39
## 738        29-Aug-08              53
## 740         4-Sep-13              37
## 746        31-Dec-04              61
## 747        16-Apr-03              35
## 752        15-Feb-05              40
## 757        25-Jul-02              31
## 761        22-Aug-03              54
## 762        19-Nov-04              59
## 766        23-Dec-07              54
## 768        30-Dec-04              50
## 772        31-May-04              48
## 774        26-May-04              43
## 783         5-Mar-05              16
## 795         5-Jan-06              59
## 796        14-Mar-04              58
## 797        14-Apr-04              25
## 798        10-Aug-03              39
## 800        12-Aug-03              58
## 807         9-Sep-02              67
## 809        20-May-01              35
## 811        21-Jun-01              33
## 812        30-Oct-12              35
## 816        29-Aug-05              64
## 818        10-Jun-05              46
## 819         6-May-04              52
## 820         8-Jul-02              28
## 821        15-Nov-02              49
## 822        17-Jul-06              58
## 826        20-Jun-03              43
## 830         5-Dec-02              52
## 831        24-Apr-03               2
## 839        19-Mar-00              29
## 840        18-Oct-05              38
## 841        23-Jul-08              21
## 843        31-Mar-03              28
## 852        12-Oct-01              65
## 858        22-May-01               1
## 859        23-Jun-04              58
## 863        31-Jan-05              22
## 868        18-Jan-01              10
## 869        17-Nov-06              20
## 872        15-Aug-01              58
## 873        22-Feb-07              48
## 874         5-Jul-00              36
## 882        16-Feb-04              29
## 885        17-Aug-06               9
## 897        13-Jun-05              21
## 898        22-Mar-07              66
## 903        25-Jul-04              42
## 908        30-Oct-00              48
## 912        17-Jul-06              51
## 915         9-Nov-03              47
## 916        17-Apr-03              30
## 919         3-Nov-01              54
## 923        22-Oct-06              36
## 926        14-Jul-11              29
## 928        23-Sep-03              48
## 931        16-Feb-02              31
## 938        17-Nov-02              24
## 941         1-Jul-05              32
## 942         7-Aug-00              33
## 943        12-Dec-02              35
## 948         6-Jun-10              24
## 951        18-Oct-03              45
## 952         2-Dec-02              41
## 954        11-Mar-01              41
## 957         7-Nov-00              45
## 960        12-Dec-01              51
## 961        12-Jun-08              30
## 962        13-Jul-00              42
## 965         6-Dec-06              32
## 966        14-Oct-03              31
## 968         9-Aug-03              50
## 969        13-Oct-00              23
## 988         9-Mar-01              25
## 991         3-Jan-00              45
#timeFU = as.Date(DataL$lastfollowupdate, format = "%d%b%Y") - #as.Date(DataL$transplantdate, format = "%d%b%Y")

##the date format  in the as.Date() function is incorrect. In the data, #the date format is "%d-%b-%y", which means the day is represented by two #digits, the month is represented by a three-letter abbreviation, and the year #is represented by two digits.
# Convert character variables to date variables with the correct format
timeFU <- as.Date(DataL$lastfollowupdate, format = "%d-%b-%y") - as.Date(DataL$transplantdate, format = "%d-%b-%y")
timeFailure <- as.Date(DataL$graftfailuredate, format = "%d-%b-%y") - as.Date(DataL$transplantdate, format = "%d-%b-%y")

failures = !is.na(timeFailure)

## Simple survival model

library(survival)
fit = coxph(Surv(timeFailure,failures)~DataL$ageattransplant)
fit
## Call:
## coxph(formula = Surv(timeFailure, failures) ~ DataL$ageattransplant)
## 
##                           coef exp(coef) se(coef)     z     p
## DataL$ageattransplant 0.003975  1.003983 0.004880 0.814 0.415
## 
## Likelihood ratio test=0.67  on 1 df, p=0.4139
## n= 153, number of events= 153
plot(survfit(Surv(timeFailure/365,failures)~DataL$gendercode),col = 1:2,lty = 1:2,xlab = 'years',ylab = 'Probability of success')
legend('topright',c('Women','Men'),col = 1:2,lty = 1:2)

fit = coxph(Surv(timeFailure,failures)~DataL$gendercode)
fit
## Call:
## coxph(formula = Surv(timeFailure, failures) ~ DataL$gendercode)
## 
##                       coef exp(coef) se(coef)     z    p
## DataL$gendercodeM -0.07485   0.92788  0.17028 -0.44 0.66
## 
## Likelihood ratio test=0.19  on 1 df, p=0.6612
## n= 153, number of events= 153