Project Timeline : 1 Day

Project-Description :

  • This project analyzes survival outcomes among patients diagnosed with AIDS in Australia prior to 1 July 1991. The dataset includes demographic characteristics, reported transmission categories, dates of diagnosis and death (or censoring), and survival status, providing a historical snapshot of the Australian AIDS epidemic in the pre–combination antiretroviral therapy era.

Load Data :

library(MASS)
library(dplyr)
aids_df <- data.frame(Aids2) # Copy the MASS data set into a working data frame
head(aids_df, 5)

Column Definitions :

  • state Grouped state of origin: “NSW” includes ACT and “other” is WA, SA, NT and TAS.

  • sex Sex of patient.

  • diag (Julian) date of diagnosis.

  • death (Julian) date of death or end of observation.

  • status “A” (alive) or “D” (dead) at end of observation.

  • T.categ Reported transmission category.

  • age Age (years) at diagnosis.

Convert Julian-Dates :

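# The day counts appear to be relative to 1960-01-01 (inferred, not documented).
# An origin of 1970-01-01 would push dates past the 1 July 1991 cut-off.
aids_df$death <- as.Date(aids_df$death, origin = "1960-01-01")
aids_df$diag <- as.Date(aids_df$diag, origin = "1960-01-01")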
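
Since diag and death are described only as Julian dates, the origin has to be inferred. One way to check a candidate origin, sketched below, is to convert the raw day counts and compare the resulting date range with the 1 July 1991 cut-off from the project description. The helper name date_range_for_origin is hypothetical, and 1960-01-01 is tried as an assumption, not a documented fact.

```r
library(MASS)  # provides the Aids2 data set

# Hypothetical helper: convert day counts under a candidate origin and
# return the earliest and latest resulting dates.
date_range_for_origin <- function(days, origin) {
  range(as.Date(days, origin = origin))
}

# Compare two candidate origins on the raw (numeric) death column.
# A plausible origin should not produce dates after 1 July 1991.
date_range_for_origin(Aids2$death, "1960-01-01")
date_range_for_origin(Aids2$death, "1970-01-01")
```

The same check can be run on Aids2$diag; both columns should use the same origin so that differences between them stay unchanged.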

Number of Unique Values per Column :

# Number of unique values in each column
lapply(as.list(aids_df), function(x){length(unique(x))})
## $state
## [1] 4
## 
## $sex
## [1] 2
## 
## $diag
## [1] 1580
## 
## $death
## [1] 1148
## 
## $status
## [1] 2
## 
## $T.categ
## [1] 8
## 
## $age
## [1] 74

Numerical variables :

aids_df |> dplyr::select(where(is.numeric)) |> colnames()
## [1] "age"

Non-Numerical Variables (factors and dates) :

aids_df |> dplyr::select(!where(is.numeric)) |> colnames()
## [1] "state"   "sex"     "diag"    "death"   "status"  "T.categ"
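
A column that is not numeric is not necessarily categorical: once converted, diag and death are dates. The sketch below labels each column by kind so factors and dates can be listed separately. The helper column_kind and the working copy aids_chk are names introduced here for illustration, and the origin is an assumption that does not affect the column class.

```r
library(MASS)  # provides the Aids2 data set

aids_chk <- data.frame(Aids2)
# Assumed origin; any origin yields the same column class (Date).
aids_chk$diag  <- as.Date(aids_chk$diag,  origin = "1960-01-01")
aids_chk$death <- as.Date(aids_chk$death, origin = "1960-01-01")

# Hypothetical helper: label a column as date, factor, numeric or other.
column_kind <- function(x) {
  if (inherits(x, "Date")) {
    "date"
  } else if (is.factor(x)) {
    "factor"
  } else if (is.numeric(x)) {
    "numeric"
  } else {
    "other"
  }
}

kinds <- sapply(aids_chk, column_kind)
split(names(kinds), kinds)  # column names grouped by kind
```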

Column Level-Definitions :

  • state :

  • sex :

  • diag :

  • death :

  • status :

  • T.categ :
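
The levels of each factor column can be read directly from the data instead of being typed by hand. A minimal sketch in base R, working on the raw Aids2 data frame (factor_cols and factor_levels are names introduced here); diag and death are dates and therefore have no levels:

```r
library(MASS)  # provides the Aids2 data set

# Keep only the factor columns, then extract the levels of each one.
factor_cols <- Filter(is.factor, Aids2)
factor_levels <- lapply(factor_cols, levels)
factor_levels
```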

Number of Unique Values per level :
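
One possible way to fill this section is to tabulate every factor column, which gives the number of patients at each level. This is a sketch in base R; level_counts is a name introduced here.

```r
library(MASS)  # provides the Aids2 data set

# One frequency table per factor column (state, sex, status, T.categ).
level_counts <- lapply(Filter(is.factor, Aids2), table)
level_counts
```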

Row Definition :

aids_df[1,]
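
Each row appears to hold one patient's record. Because diag and death are day counts on the same scale, their difference gives the observed follow-up time in days, whatever origin is used for display. For patients with status "A" this is a censored time, not a time to death. A minimal sketch on the raw data, where followup_days and first_row are names introduced here:

```r
library(MASS)  # provides the Aids2 data set

# Days between diagnosis and death (or end of observation).
followup_days <- Aids2$death - Aids2$diag

# First patient, with the follow-up time attached.
first_row <- cbind(Aids2[1, ], followup_days = followup_days[1])
first_row

# Follow-up summarised separately for censored ("A") and dead ("D") patients.
tapply(followup_days, Aids2$status, summary)
```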

Research Question :

  • What patterns emerge when studying AIDS deaths?

Basic Numerical Summaries :

Question : How many patients were alive or dead at the end of observation?

aids_df |>
  count(status) |>
  mutate(pct = round(n / sum(n), 2)) # share of all patients, no hard-coded total

Observation :

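  • 62% of patients had died by the end of observation.

Question : How are patients distributed across transmission categories?

# Patients per transmission category, largest first
T.categ_sum <- 
  aids_df |>
  count(T.categ) |>
  mutate(pct = round(n / sum(n), 2)) |>
  arrange(desc(n))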

T.categ_sum
barplot(T.categ_sum$n, names.arg = T.categ_sum$T.categ)

Observations :

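  • Most patients fall in the hs transmission category (male homosexual or bisexual contact), so most recorded deaths also come from this group. hs is a route of transmission, not a cause of death.

# Deaths per transmission category
aids_df |> filter(status == "D") |> count(T.categ)
rm(T.categ_sum) # clean up the summary table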
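
Raw death counts mostly reflect how many patients each category contains. To compare categories on a more equal footing, one option is the share of patients in each category who had died by the end of observation. The sketch below uses base R on the raw data; categ_by_status and death_share are names introduced here, and the shares ignore differences in follow-up time, so they are descriptive only.

```r
library(MASS)  # provides the Aids2 data set

# Cross-tabulate transmission category (rows) against status (columns).
categ_by_status <- table(Aids2$T.categ, Aids2$status)

# Row-wise proportions: within each category, the share alive vs dead.
death_share <- prop.table(categ_by_status, margin = 1)

# Share of patients recorded as dead ("D") in each category.
round(death_share[, "D"], 2)
```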