7CS039 – Coursework Assessment

Statistics in AI

Author
Affiliation

Njinju Zilefac Fogap

MSc Data Science

Published

December 18, 2025

1 Exploratory Data Analysis of Survival from Malignant Melanoma

2 Project Objectives

This project presents a systematic Exploratory Data Analysis (EDA) of the Melanoma Survival data set. The data set is open source and contains historical records of patients who underwent surgical treatment for malignant melanoma at the Department of Plastic Surgery, University Hospital of Odense, Denmark, as described by Andersen et al. (1993).

The main objective of this EDA is to investigate the distributions of the features in the data set and their interrelationships, and to provide detailed interpretations and recommendations based on the analysis. The most important variables considered in this project are survival time, tumour thickness, patient age and sex.

The entire analysis was carried out in R using the RStudio IDE, and incorporates the relevant descriptive statistics, visualisations, regression techniques, statistical hypothesis tests and checks of distributional assumptions required by the assessment guidelines.

3 Description and Preparation

3.1 Packages

Code
library(skimr) 
library(ggplot2)
library(gridExtra)
library(dplyr)
library(patchwork)

3.2 Data Import and Inspection

Code
melanoma <- read.csv("melanoma.csv")
str(melanoma)
'data.frame':   205 obs. of  8 variables:
 $ X        : int  1 2 3 4 5 6 7 8 9 10 ...
 $ time     : int  10 30 35 99 185 204 210 232 232 279 ...
 $ status   : int  3 3 2 3 1 1 1 3 1 1 ...
 $ sex      : int  1 1 1 0 1 1 1 0 1 0 ...
 $ age      : int  76 56 41 71 52 28 77 60 49 68 ...
 $ year     : int  1972 1968 1977 1968 1965 1971 1972 1974 1968 1971 ...
 $ thickness: num  6.76 0.65 1.34 2.9 12.08 ...
 $ ulcer    : int  1 0 0 0 1 1 1 1 1 1 ...
Code
summary(melanoma)
       X            time          status          sex              age       
 Min.   :  1   Min.   :  10   Min.   :1.00   Min.   :0.0000   Min.   : 4.00  
 1st Qu.: 52   1st Qu.:1525   1st Qu.:1.00   1st Qu.:0.0000   1st Qu.:42.00  
 Median :103   Median :2005   Median :2.00   Median :0.0000   Median :54.00  
 Mean   :103   Mean   :2153   Mean   :1.79   Mean   :0.3854   Mean   :52.46  
 3rd Qu.:154   3rd Qu.:3042   3rd Qu.:2.00   3rd Qu.:1.0000   3rd Qu.:65.00  
 Max.   :205   Max.   :5565   Max.   :3.00   Max.   :1.0000   Max.   :95.00  
      year        thickness         ulcer      
 Min.   :1962   Min.   : 0.10   Min.   :0.000  
 1st Qu.:1968   1st Qu.: 0.97   1st Qu.:0.000  
 Median :1970   Median : 1.94   Median :0.000  
 Mean   :1970   Mean   : 2.92   Mean   :0.439  
 3rd Qu.:1972   3rd Qu.: 3.56   3rd Qu.:1.000  
 Max.   :1977   Max.   :17.42   Max.   :1.000  

3.3 Metadata of the Data Set

  • time: Survival time in days since operation

  • status: Patient status at end of study (1 = died from melanoma, 2 = alive, 3 = died from other causes)

  • sex: Gender (1 = male, 0 = female)

  • age: Age in years at operation

  • year: Year of operation

  • thickness: Tumour thickness in millimetres

  • ulcer: Ulceration indicator (1 = present, 0 = absent)

The Melanoma Survival data set consists of 205 observations and eight variables. The variables X, time, status, sex, age, year and ulcer are stored as integers, while thickness is stored as a numeric (double) variable. According to the metadata above, however, status, sex and ulcer are categorical codes rather than numerical measurements, so their integer type is not appropriate; they should be converted to categorical (factor) variables to ensure correct interpretation and analysis (Davison and Hinkley, 1997). X is simply a row index and carries no analytical information.

4 Summary Statistics

Code
melanoma <- melanoma %>%
  mutate(
    status = recode(as.character(status),
                    "1" = "Died from melanoma",
                    "2" = "Alive",
                    "3" = "Died from other causes"),
    sex = recode(as.character(sex),
                 "0" = "Female",
                 "1" = "Male"),
    ulcer = recode(as.character(ulcer),
                   "0" = "Absent",
                   "1" = "Present")
  )

melanoma$status <- factor(melanoma$status)
melanoma$sex <- factor(melanoma$sex)
melanoma$ulcer <- factor(melanoma$ulcer)

head(melanoma)
Code
summary(melanoma[, c("time", "age", "thickness", "year")])
      time           age          thickness          year     
 Min.   :  10   Min.   : 4.00   Min.   : 0.10   Min.   :1962  
 1st Qu.:1525   1st Qu.:42.00   1st Qu.: 0.97   1st Qu.:1968  
 Median :2005   Median :54.00   Median : 1.94   Median :1970  
 Mean   :2153   Mean   :52.46   Mean   : 2.92   Mean   :1970  
 3rd Qu.:3042   3rd Qu.:65.00   3rd Qu.: 3.56   3rd Qu.:1972  
 Max.   :5565   Max.   :95.00   Max.   :17.42   Max.   :1977  
Code
# Frequency tables
table(melanoma$sex)

Female   Male 
   126     79 
Code
table(melanoma$status)

                 Alive     Died from melanoma Died from other causes 
                   134                     57                     14 
Code
table(melanoma$ulcer)

 Absent Present 
    115      90 
Code
skim(melanoma)
Data summary
Name melanoma
Number of rows 205
Number of columns 8
_______________________
Column type frequency:
factor 3
numeric 5
________________________
Group variables None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
status 0 1 FALSE 3 Ali: 134, Die: 57, Die: 14
sex 0 1 FALSE 2 Fem: 126, Mal: 79
ulcer 0 1 FALSE 2 Abs: 115, Pre: 90

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
X 0 1 103.00 59.32 1.0 52.00 103.00 154.00 205.00 ▇▇▇▇▇
time 0 1 2152.80 1122.06 10.0 1525.00 2005.00 3042.00 5565.00 ▃▇▅▂▁
age 0 1 52.46 16.67 4.0 42.00 54.00 65.00 95.00 ▁▃▇▆▁
year 0 1 1969.91 2.58 1962.0 1968.00 1970.00 1972.00 1977.00 ▂▆▇▇▁
thickness 0 1 2.92 2.96 0.1 0.97 1.94 3.56 17.42 ▇▂▁▁▁

Based on the summary statistics, several important features of the data set can be observed. Tumour thickness ranges from 0.10 mm to 17.42 mm, with a mean of 2.92 mm and a standard deviation of 2.96 mm. The median (1.94 mm) lies below the mean and the standard deviation is as large as the mean itself, which shows that most patients had relatively thin tumours while a few had much thicker ones. The maximum is far above the third quartile (3.56 mm), which points to a right-skewed distribution with possible outliers; this is examined further using box plots. Patient age covers a wide range, from 4 to 95 years, with a mean of about 52 years and a median of 54, which suggests that malignant melanoma in this data set mainly affects adults. The standard deviation of 16.67 years shows considerable variability in age, and the very young minimum may be an outlier, which is also examined visually.

In terms of gender, there are more female patients (126) than male patients (79), so the sample is unbalanced. Looking at patient status, most individuals were still alive at the end of the study (134 patients), while 71 had died: 57 from malignant melanoma and 14 from other causes. This may indicate that detection happened relatively early or that treatment was effective within this group, although it also means that the survival times of most patients are censored rather than fully observed. Ulceration was absent in 115 patients and present in 90.

Survival time shows a lot of variability, ranging from 10 days to 5,565 days (about 15 years), with a mean of 2,153 days and a standard deviation of 1,122 days. The mean exceeds the median (2,005 days), which suggests a right-skewed distribution that is investigated further using histograms and box plots. Lastly, the year of operation runs from 1962 to 1977, confirming that the data were collected over a 15-year period.
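As a quick numeric check on the outlier remarks above, the quartiles printed by summary() can be fed into Tukey's 1.5 × IQR rule. The sketch below is illustrative only: the values are typed in by hand from the summary output rather than recomputed from the data, and the column names of the helper table are made up for this example.

```r
# Five-number summaries copied by hand from the summary() output above.
q <- data.frame(
  variable = c("thickness", "age", "time"),
  min = c(0.10, 4, 10),
  q1  = c(0.97, 42, 1525),
  q3  = c(3.56, 65, 3042),
  max = c(17.42, 95, 5565)
)

# Tukey's rule: values beyond Q1 - 1.5*IQR or Q3 + 1.5*IQR are potential outliers.
iqr <- q$q3 - q$q1
q$lower_fence <- q$q1 - 1.5 * iqr
q$upper_fence <- q$q3 + 1.5 * iqr

# Does the observed minimum or maximum fall outside the fences?
q$min_outside <- q$min < q$lower_fence
q$max_outside <- q$max > q$upper_fence

print(q)
```

On these figures the rule flags the thickest tumours (above about 7.4 mm), the longest survival times (above about 5,318 days) and the youngest patient (age 4), while the oldest patient (95) stays inside the upper fence of 99.5 years.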

5 Graphical Summaries (ggplot2)

Graphical techniques were used to visualise the distributions and to detect potential patterns, anomalies and relationships between the features in the Melanoma Survival data set.

Code
library(ggplot2)


# Survival time distribution
ggplot(melanoma, aes(x = time)) +
geom_histogram(bins = 30) +
labs(title = "Distribution of Survival Time", x = "Days", y = "Frequency")

Code
# Tumour thickness distribution
ggplot(melanoma, aes(x = thickness)) +
geom_histogram(bins = 30) +
labs(title = "Distribution of Tumour Thickness", x = "Thickness (mm)", y = "Frequency")

Code
# Age distribution
ggplot(melanoma, aes(x = age)) +
geom_histogram(bins = 30) +
labs(title = "Distribution of Age", x = "Age (Years)", y = "Frequency")

Code
# Gender distribution
ggplot(melanoma, aes(x = factor(sex, labels = c("Female", "Male")))) +
geom_bar() +
labs(title = "Gender Distribution", x = "Gender", y = "Count")

From the graphical distributions above, we notice the following:

  1. Gender distribution: There were more female than male patients in the sample, so the data set is unbalanced with respect to gender.

  2. Age: The age variable appears only approximately normally distributed, with a slight left skew, which is consistent with the mean (52.46) lying below the median (54) in the descriptive analysis above. The wide age range, from very young patients (as young as 4 years) to elderly individuals (up to 95 years), together with the high standard deviation, indicates substantial dispersion around the mean. The few very young patients sit in the lower tail and reduce the symmetry of the distribution.

  3. Distribution of tumour thickness: The tumour thickness variable is heavily right-skewed, with the majority of observations concentrated at low values close to 0 mm. Outliers or extreme values, specifically those exceeding 10 mm, extend the upper tail of the distribution. This right skew suggests that most patients in the data set presented with relatively thin tumours, while very thick tumours were comparatively rare. Such a distribution is in line with clinical expectations and highlights the presence of a few severe cases that substantially influence the overall spread of the data.

  4. Distribution of survival time: The survival time histogram shows several irregular peaks, which can be attributed to the very wide range of values highlighted in the summary statistics. This behaviour is typical of survival data, where a small number of patients experience exceptionally long survival times, resulting in pronounced variability and a right-skewed distribution (Kleinbaum and Klein, 2012; Venables and Ripley, 2002).
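The direction of skew read off the histograms can also be checked numerically with a moment-based skewness coefficient. The sketch below assumes that melanoma.csv holds the same records as the Melanoma data frame shipped with the MASS package (205 patients, same column names); that is an assumption made for this example, not something stated in the report. The helper name sample_skewness is also introduced here for illustration.

```r
# Assumption: MASS::Melanoma contains the same records as melanoma.csv.
data("Melanoma", package = "MASS")

# Moment-based sample skewness:
# positive = longer right tail, negative = longer left tail, near 0 = roughly symmetric.
sample_skewness <- function(x) {
  centred <- x - mean(x)
  mean(centred^3) / mean(centred^2)^1.5
}

skew <- sapply(Melanoma[, c("time", "age", "thickness")], sample_skewness)
print(round(skew, 2))
```

A clearly positive value for thickness would agree with the heavy right tail seen in its histogram, while a value near zero for age would agree with its roughly symmetric shape.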

6 Regression and Correlation Analysis

Linear regression models and correlation coefficients were used to assess relationships between:

  • time and thickness

  • time and age

  • thickness and age

Code
cor(melanoma$time, melanoma$thickness)
[1] -0.2354087
Code
cor(melanoma$time, melanoma$age)
[1] -0.3015179
Code
cor(melanoma$thickness, melanoma$age)
[1] 0.2124798
Code
# Linear models
model1 <- lm(time ~ thickness, data = melanoma)
model2 <- lm(time ~ age, data = melanoma)
model3 <- lm(thickness ~ age, data = melanoma)


summary(model1)

Call:
lm(formula = time ~ thickness, data = melanoma)

Residuals:
    Min      1Q  Median      3Q     Max 
-2325.4  -707.6  -210.6   744.9  3410.4 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2413.41     107.39  22.473  < 2e-16 ***
thickness     -89.25      25.86  -3.451 0.000679 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1093 on 203 degrees of freedom
Multiple R-squared:  0.05542,   Adjusted R-squared:  0.05076 
F-statistic: 11.91 on 1 and 203 DF,  p-value: 0.0006793
Code
summary(model2)

Call:
lm(formula = time ~ age, data = melanoma)

Residuals:
    Min      1Q  Median      3Q     Max 
-2464.3  -646.2   -54.4   712.1  3179.6 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 3217.448    247.879  12.980  < 2e-16 ***
age          -20.293      4.504  -4.506 1.12e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1072 on 203 degrees of freedom
Multiple R-squared:  0.09091,   Adjusted R-squared:  0.08643 
F-statistic:  20.3 on 1 and 203 DF,  p-value: 1.116e-05
Code
summary(model3)

Call:
lm(formula = thickness ~ age, data = melanoma)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.6853 -1.7727 -0.9155  0.9558 14.0273 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  0.94105    0.67004   1.404  0.16170   
age          0.03772    0.01217   3.098  0.00222 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.899 on 203 degrees of freedom
Multiple R-squared:  0.04515,   Adjusted R-squared:  0.04044 
F-statistic: 9.598 on 1 and 203 DF,  p-value: 0.002223
Code
library(patchwork)

base_theme <- theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),
    axis.title = element_text(face = "bold"),
    panel.grid.minor = element_blank()
  )

# Time vs Thickness
plot1 <- ggplot(melanoma, aes(x = thickness, y = time)) +
  geom_point(alpha = 0.35, size = 2) +
  geom_smooth(method = "lm", se = TRUE, linewidth = 1) +
  labs(
    title = "Survival Time vs Tumour Thickness",
    x = "Tumour Thickness (mm)",
    y = "Survival Time (days)"
  ) +
  base_theme

# Time vs Age
plot2 <- ggplot(melanoma, aes(x = age, y = time)) +
  geom_point(alpha = 0.35, size = 2) +
  geom_smooth(method = "lm", se = TRUE, linewidth = 1) +
  labs(
    title = "Survival Time vs Age",
    x = "Age (years)",
    y = "Survival Time (days)"
  ) +
  base_theme

# Thickness vs Age
plot3 <- ggplot(melanoma, aes(x = age, y = thickness)) +
  geom_point(alpha = 0.35, size = 2) +
  geom_smooth(method = "lm", se = TRUE, linewidth = 1) +
  labs(
    title = "Tumour Thickness vs Age",
    x = "Age (years)",
    y = "Tumour Thickness (mm)"
  ) +
  base_theme

# Combine plots vertically
combined_plot <- plot1 / plot2 / plot3

options(repr.plot.width = 10, repr.plot.height = 50)

print(combined_plot)

Correlation is a statistical measure that quantifies the strength and direction of the relationship between two variables; it ranges from −1 to +1 (Mukaka, 2012). Regression is a statistical modelling technique used to examine the relationship between a dependent (outcome) variable and one or more independent variables (Montgomery, Peck and Vining, 2021).

The statistical analysis above shows that age, tumour thickness and survival time are only weakly associated.

Survival time has a weak negative relationship with tumour thickness (r = −0.235), with the regression model time = 2413.41 − 89.25 × thickness. Each additional millimetre of tumour thickness is therefore associated with a decrease of about 89 days in predicted survival time. Although the relationship is statistically significant (p < 0.001), the model explains little of the variation (R² = 0.055).

Similarly, a weak negative relationship is observed between survival time and age (r = −0.302), with time = 3217.45 − 20.29 × age, so each additional year of age is associated with a decrease of about 20 days in predicted survival time (R² = 0.091, p < 0.001).

In contrast, tumour thickness shows a weak positive association with age (r = 0.212), described by thickness = 0.94 + 0.038 × age (R² = 0.045, p = 0.002). Overall, while statistically significant, these relationships are weak and indicate limited explanatory power.
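The link between the correlations and the R² values reported above can be verified directly, since in a simple linear regression R² equals the squared Pearson correlation. The short sketch below uses only numbers copied from the cor() and summary() output; predict_time is a helper name introduced here for illustration, and the thickness values of 1 mm and 5 mm are arbitrary examples.

```r
# Values copied by hand from the cor() and summary(lm) output above.
r  <- c(time_thickness = -0.2354087, time_age = -0.3015179, thickness_age = 0.2124798)
r2 <- c(time_thickness = 0.05542,    time_age = 0.09091,    thickness_age = 0.04515)

# In simple linear regression, R-squared is the square of the Pearson correlation.
print(round(r^2, 5))
print(r2)

# Predicted survival time (days) from model1: time = 2413.41 - 89.25 * thickness.
predict_time <- function(thickness_mm) 2413.41 - 89.25 * thickness_mm
print(predict_time(c(1, 5)))
```

A 4 mm difference in thickness shifts the fitted line by 357 days, which is small compared with the residual standard error of 1,093 days and illustrates why the model has limited explanatory power.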

7 Two-Sample Significance Tests by Gender

The two-sample t-test is a statistical hypothesis test used to determine whether there is a statistically significant difference between the means of two independent groups (Ruxton, 2006).

Two-sample t-tests were conducted to compare males and females for the variables of interest.
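R's t.test() defaults to the Welch (unequal-variance) form of the test, so it may help to see that statistic written out. The sketch below uses simulated data, not the melanoma records, and welch_t is a helper name invented for illustration; it is checked against t.test() rather than offered as a replacement for it.

```r
# Welch's two-sample t statistic with Welch-Satterthwaite degrees of freedom.
welch_t <- function(x, y) {
  vx <- var(x) / length(x)   # squared standard error of mean(x)
  vy <- var(y) / length(y)   # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  c(t = t_stat, df = df, p.value = 2 * pt(-abs(t_stat), df))
}

set.seed(1)
x <- rnorm(126, mean = 52, sd = 17)  # simulated values, group sizes chosen for illustration
y <- rnorm(79,  mean = 54, sd = 16)

manual  <- welch_t(x, y)
builtin <- t.test(x, y)              # Welch test by default (var.equal = FALSE)

print(manual)
print(c(t = unname(builtin$statistic),
        df = unname(builtin$parameter),
        p.value = builtin$p.value))
```

The fractional degrees of freedom seen in the test output in this section come from the Welch-Satterthwaite approximation computed in the df line.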

  1. Gender of patient and their ages
Code
# Box plot of age by gender (ggplot() is used because qplot() is deprecated)
ggplot(melanoma, aes(x = sex, y = age)) +
  geom_boxplot(fill = "blue") +
  labs(x = "gender of patient", y = "age of patient")

Code
melanoma %>%
  group_by(sex) %>%
  summarize(num.obs = n(),
            mean_age = round(mean(age), 0),
            sd_age = round(sd(age), 0),
            se_age = round(sd(age) / sqrt(num.obs), 0))

H0 (null hypothesis): the mean age is the same for female and male patients.

H1 (alternative hypothesis): the mean age differs between female and male patients.

Code
age_t_test <- t.test(age ~ sex, data = melanoma)

age_t_test 

    Welch Two Sample t-test

data:  age by sex
t = -0.95559, df = 154.42, p-value = 0.3408
alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
95 percent confidence interval:
 -7.162764  2.492280
sample estimates:
mean in group Female   mean in group Male 
            51.56349             53.89873 

A two-sample t-test was conducted to compare the mean age of female and male patients. The null hypothesis (H0) stated that the mean ages of the two genders are equal, while the alternative hypothesis (H1) stated that they differ. The results indicated no statistically significant difference in mean age between females (M = 51.56) and males (M = 53.90), t(154.42) = −0.956, p = 0.341. The 95% confidence interval for the difference in means ranged from −7.16 to 2.49.

Since the p-value exceeds the significance level of 0.05, we fail to reject the null hypothesis, suggesting that there is insufficient evidence to conclude a significant difference in mean age between genders.

  2. Gender of patient and tumour thickness in mm
Code
p <- ggplot(melanoma, aes(x = sex, y = thickness)) +
  geom_boxplot(fill = "blue")

print(p)

Code
melanoma %>%
  group_by(sex) %>%
  summarize(num.obs = n(),
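            # Thickness is measured in mm with values mostly below 5,
            # so keep two decimals instead of rounding to whole numbers
            mean_thickness = round(mean(thickness), 2),
            sd_thickness = round(sd(thickness), 2),
            se_thickness = round(sd(thickness) / sqrt(num.obs), 2))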

H0: the mean tumour thickness is the same for female and male patients.

H1: the mean tumour thickness differs between female and male patients.

Code
thickness_t_test <- t.test(thickness ~ sex, data = melanoma)

thickness_t_test

    Welch Two Sample t-test

data:  thickness by sex
t = -2.6059, df = 149.09, p-value = 0.01009
alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
95 percent confidence interval:
 -1.9775560 -0.2718653
sample estimates:
mean in group Female   mean in group Male 
            2.486429             3.611139 

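The results showed a statistically significant difference in mean tumour thickness, with females (M = 2.49 mm) having a lower mean thickness than males (M = 3.61 mm), t(149.09) = −2.606, p = 0.010. The 95% confidence interval for the difference in means ranged from −1.978 to −0.272 mm. The standard errors of the group means are well below 1 mm (the test statistic and the difference in means imply a standard error of about 0.43 mm for the difference), so they appear as 0 when rounded to whole numbers and should be reported to at least two decimal places.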

Since the p-value is below the significance level of 0.05, we reject the null hypothesis, indicating that there is sufficient evidence to conclude a significant difference in tumor thickness between genders.

  3. Gender of patient and their survival time in days
Code
# Box plot of survival time by gender (ggplot() is used because qplot() is deprecated)
ggplot(melanoma, aes(x = sex, y = time)) +
  geom_boxplot(fill = "blue") +
  labs(x = "gender of patient", y = "survival time in days")

Code
melanoma %>%
  group_by(sex) %>%
  summarize(num.obs = n(),
            mean_time = round(mean(time), 0),
            sd_time = round(sd(time), 0),
            se_time = round(sd(time) / sqrt(num.obs), 0))

H0: the mean survival time is the same for female and male patients.

H1: the mean survival time differs between female and male patients.

Code
time_t_test <- t.test(time ~ sex, data = melanoma)

time_t_test

    Welch Two Sample t-test

data:  time by sex
t = 2.0848, df = 159.27, p-value = 0.03868
alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
95 percent confidence interval:
  17.74767 656.12032
sample estimates:
mean in group Female   mean in group Male 
            2282.643             1945.709 

The analysis indicated a statistically significant difference in mean survival time, with females (M = 2282.64) showing higher mean survival time than males (M = 1945.71), t(159.27)=2.085 , p=0.039. The 95% confidence interval for the difference in means ranged from 17.75 to 656.12.

Since the p-value is below the significance level of 0.05, we reject the null hypothesis, suggesting that there is sufficient evidence to conclude a significant difference in mean survival time between genders.
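Because tumour thickness and survival time are visibly skewed, a rank-based comparison is a reasonable cross-check on the Welch tests above. The sketch below applies the Wilcoxon rank-sum (Mann-Whitney) test to the same three variables. It assumes that melanoma.csv holds the same records as MASS::Melanoma, where sex is still coded 0 = female and 1 = male. No results are quoted here, because this test was not part of the analysis above.

```r
# Assumption: MASS::Melanoma contains the same records as melanoma.csv.
data("Melanoma", package = "MASS")
Melanoma$sex <- factor(Melanoma$sex, levels = c(0, 1), labels = c("Female", "Male"))

# Wilcoxon rank-sum test per variable, females versus males.
# exact = FALSE requests the normal approximation, which copes with tied values.
rank_tests <- sapply(c("age", "thickness", "time"), function(v) {
  groups <- split(Melanoma[[v]], Melanoma$sex)
  res <- wilcox.test(groups$Female, groups$Male, exact = FALSE)
  c(W = unname(res$statistic), p.value = res$p.value)
})

print(round(t(rank_tests), 4))
```

If the rank-based p-values lead to the same decisions as the t-tests, the conclusions are less likely to be driven by a few extreme observations.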

8 Q–Q Plots

Q–Q plots are used to check for normality, that is, to see whether the values of a variable within each group are approximately normally distributed.

TIME and GENDER

Code
p_time <- ggplot(data = melanoma, aes(sample = time))
p_time + stat_qq() + stat_qq_line() + facet_grid(. ~ sex)

THICKNESS and GENDER

Code
p_thickness <- ggplot(data = melanoma, aes(sample = thickness))
p_thickness + stat_qq() + stat_qq_line() + facet_grid(. ~ sex)

AGE and GENDER

Code
p_age <- ggplot(data = melanoma, aes(sample = age))
#p_age + stat_qq() + stat_qq_line()
p_age + stat_qq() + stat_qq_line() + facet_grid(. ~ sex)

TIME ~ GENDER

For survival time by gender, the Q–Q plots show differing patterns between the two groups. The observations for male patients align more closely with the theoretical normal line, indicating a better approximation to normality. Conversely, the female group exhibits pronounced deviations at higher quantiles, suggesting departures from normality in the upper tail. Although both groups demonstrate some non-normal behaviour at extreme values, the normality assumption appears more plausible for males than for females.

THICKNESS ~ GENDER

In contrast, the Q–Q plots for tumour thickness by gender reveal clear departures from normality in both male and female groups. Substantial deviations from the theoretical quantiles, particularly in the tails, indicate that the observations are not normally distributed. This is consistent with the summary statistics and box plots, which suggest the presence of outliers and skewed underlying distributions for tumour thickness.

AGE ~ GENDER

The Q–Q plot for age stratified by gender indicates that the observations for both male and female groups approximate normality reasonably well. However, mild departures from the theoretical reference line are evident at the distribution tails. These deviations suggest the possible presence of outliers or slight skewness, but overall the assumption of normality is largely acceptable for this variable.
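Q–Q plots are judged by eye, so a formal test can serve as a complement. The sketch below runs the Shapiro-Wilk test for each variable within each gender. It assumes that melanoma.csv holds the same records as MASS::Melanoma (sex coded 0 = female, 1 = male); no p-values are quoted because this test was not run in the analysis above.

```r
# Assumption: MASS::Melanoma contains the same records as melanoma.csv.
data("Melanoma", package = "MASS")
Melanoma$sex <- factor(Melanoma$sex, levels = c(0, 1), labels = c("Female", "Male"))

# Shapiro-Wilk p-value for each variable within each gender.
# Small p-values are evidence against normality for that group.
shapiro_p <- sapply(c("time", "thickness", "age"), function(v) {
  tapply(Melanoma[[v]], Melanoma$sex, function(x) shapiro.test(x)$p.value)
})

print(signif(shapiro_p, 3))
```

With group sizes of 126 and 79 the test can detect even mild departures from normality, so its output is best read alongside the plots rather than instead of them.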

9 Discussion and Recommendations

The exploratory data analysis above supports tumour thickness as a relevant prognostic factor in malignant melanoma: it exhibits substantial variability and a statistically significant, though weak, negative association with survival time (r = −0.235). The strong right skew and the presence of extreme values indicate that a small number of advanced cases may disproportionately influence survival patterns (Kleinbaum and Klein, 2012). Gender-based comparisons reveal differences in tumour thickness and survival time; however, departures from normality, particularly for tumour thickness and survival time, limit the reliability of purely parametric analyses (Field, 2018).

Diagnostic assessments further indicate that normality assumptions are not consistently satisfied, as evidenced by Q–Q plots showing skewness and outliers in key variables (Ghasemi and Zahediasl, 2012). These violations, together with sample size imbalance between gender groups, suggest that results from standard parametric tests should be interpreted with caution.

Future analyses should therefore adopt survival-specific methods such as Kaplan–Meier estimation and Cox proportional hazards models, alongside multivariate approaches incorporating clinically relevant covariates. In addition, non-parametric or robust statistical techniques are recommended to account for skewed distributions and improve the validity of inference (Conover, 1999).
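A minimal sketch of the survival-specific methods recommended above is given below, using the survival package that is distributed with R. It assumes that melanoma.csv holds the same records as MASS::Melanoma, and it treats death from melanoma as the event while counting patients who were alive or who died from other causes as censored. That last choice is a simplification made for this example; a competing-risks analysis would handle deaths from other causes differently.

```r
library(survival)

# Assumption: MASS::Melanoma contains the same records as melanoma.csv.
data("Melanoma", package = "MASS")
Melanoma$sex <- factor(Melanoma$sex, levels = c(0, 1), labels = c("Female", "Male"))

# Event indicator: 1 = died from melanoma (status 1), 0 = censored (status 2 or 3).
Melanoma$event <- as.integer(Melanoma$status == 1)

# Kaplan-Meier survival curves by gender.
km <- survfit(Surv(time, event) ~ sex, data = Melanoma)
print(km)

# Cox proportional hazards model with the covariates discussed in this report.
cox <- coxph(Surv(time, event) ~ sex + age + thickness + ulcer, data = Melanoma)
print(summary(cox)$coefficients)
```

Unlike the linear models on raw survival time, both methods use the censoring information, so patients who were still alive at the end of follow-up contribute the time they were observed without being treated as deaths.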

10 References

  1. Andersen, P.K., Borgan, O., Gill, R.D. and Keiding, N. (1993) Statistical models based on counting processes. New York: Springer-Verlag.
  2. Davison, A.C. and Hinkley, D.V. (1997) Bootstrap methods and their application. Cambridge: Cambridge University Press.
  3. Kleinbaum, D.G. and Klein, M. (2012) Survival analysis: A self-learning text. 3rd edn. New York: Springer.
  4. Mukaka, M.M. (2012) ‘Statistics corner: A guide to appropriate use of correlation coefficient in medical research’, Malawi Medical Journal, 24(3), pp. 69–71.
  5. Montgomery, D.C., Peck, E.A. and Vining, G.G. (2021) Introduction to linear regression analysis. 6th edn. Hoboken, NJ: Wiley.
  6. Ruxton, G.D. (2006) ‘The unequal variance t-test is an underused alternative to Student’s t-test and the Mann–Whitney U test’, Behavioral Ecology, 17(4), pp. 688–690.
  7. Conover, W.J. (1999) Practical nonparametric statistics. 3rd edn. New York: Wiley.
  8. Field, A. (2018) Discovering statistics using IBM SPSS statistics. 5th edn. London: Sage.
  9. Klein, J.P. and Moeschberger, M.L. (2003) Survival analysis: Techniques for censored and truncated data. 2nd edn. New York: Springer.