Part 1

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(ade4)
## Warning: package 'ade4' was built under R version 4.5.2
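# Read the expression values and the sample annotations.
expr <- read.delim("Data/expr.txt", header = TRUE)
pheno <- read.delim("Data/pheno.txt")
# Treat the annotation columns as categorical variables.
pheno$Cancer <- as.factor(pheno$Cancer)
pheno$Batch <- as.factor(pheno$Batch)
pheno$Outcome <- as.factor(pheno$Outcome)
# Join annotations and expression values on the shared "Sample" column.
(mergedData <- full_join(pheno, expr, by = "Sample"))
# Run the PCA on columns 5 to 22283 only, keeping 3 axes.
cancerPCA <- dudi.pca(mergedData[, 5:22283], scannf = FALSE, nf = 3)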
summary(cancerPCA)
## Class: pca dudi
## Call: dudi.pca(df = mergedData[, 5:22283], scannf = FALSE, nf = 3)
## 
## Total inertia: 22280
## 
## Eigenvalues:
##     Ax1     Ax2     Ax3     Ax4     Ax5 
##  7967.5  2464.0  1369.6   764.6   627.4 
## 
## Projected inertia (%):
##     Ax1     Ax2     Ax3     Ax4     Ax5 
##  35.762  11.060   6.147   3.432   2.816 
## 
## Cumulative projected inertia (%):
##     Ax1   Ax1:2   Ax1:3   Ax1:4   Ax1:5 
##   35.76   46.82   52.97   56.40   59.22 
## 
## (Only 5 dimensions (out of 56) are shown)

The first three PCA axes together account for 52.97% of the variation.
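
The percentage above can be checked by hand: each axis's share is its eigenvalue divided by the total inertia, and the cumulative share is the running sum. A minimal sketch, assuming the rounded values printed in the summary (so the result is only accurate to about two decimals):

```r
# Eigenvalues of the first three axes and the total inertia, copied from the
# summary() output above (rounded as printed).
eig <- c(Ax1 = 7967.5, Ax2 = 2464.0, Ax3 = 1369.6)
total_inertia <- 22280

# Share of inertia per axis and the running total, in percent.
projected <- 100 * eig / total_inertia
cumulative <- cumsum(projected)

print(round(projected, 2))
print(round(cumulative, 2))
```

With the fitted object itself, the same quantity should be available as `100 * cumsum(cancerPCA$eig) / sum(cancerPCA$eig)`.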

1

s.class(
  cancerPCA$li,
  fac = mergedData$Cancer, 
  col = rainbow(3),
  axesell = FALSE,
  grid = FALSE,
  cstar = 0,
  cpoint = 2,
  sub = "PCA of bladder samples by cancer status"
)

There is some similarity in expression levels between normal and cancerous samples, indicated by the slight overlap of the ellipses, but there are also differences, since the ellipses do not fully overlap.

2

s.class(
  cancerPCA$li,
  fac = mergedData$Batch, 
  col = rainbow(5),
  axesell = FALSE,
  grid = FALSE,
  cstar = 0,
  cpoint = 2,
  sub = "PCA of bladder samples by batch number"
)

There are differences between batches. Specifically, Batch 1 differs from Batches 2 and 4; Batch 2 differs from Batches 1, 3, and 4; Batch 3 differs from Batches 2 and 4; and Batch 4 differs from Batches 1, 2, and 3. Batch 5 is not different from any other batch.

batch_outcome_table <- table(mergedData$Batch, mergedData$Outcome)
print(batch_outcome_table)
##    
##     Biopsy mTCC Normal sTCC-CIS sTCC+CIS
##   1      0   11      0        0        0
##   2      0    1      4       13        0
##   3      0    0      4        0        0
##   4      5    0      0        0        0
##   5      4    0      0        3       12

The samples did not sort cleanly by batch and outcome. Only the most severe outcome (sTCC+CIS) fell within a single batch (Batch 5); every other outcome was split across two batches, and Batches 2 and 5 each contain three different outcomes, indicating that batch and outcome overlap in the data. To correct this, I would analyze all samples in the same batch to reduce batch effects.
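
The overlap between batch and outcome can be counted directly from the table. The sketch below re-enters the printed counts by hand (an assumption made so the example runs on its own; with the real data, `batch_outcome_table` could be used in place of `counts`):

```r
# Batch-by-outcome counts copied from the table printed above.
counts <- matrix(
  c(0, 11, 0,  0,  0,
    0,  1, 4, 13,  0,
    0,  0, 4,  0,  0,
    5,  0, 0,  0,  0,
    4,  0, 0,  3, 12),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    Batch = as.character(1:5),
    Outcome = c("Biopsy", "mTCC", "Normal", "sTCC-CIS", "sTCC+CIS")
  )
)

# Number of batches in which each outcome appears.
batches_per_outcome <- colSums(counts > 0)
# Number of distinct outcomes present in each batch.
outcomes_per_batch <- rowSums(counts > 0)

print(batches_per_outcome)
print(outcomes_per_batch)
```

An outcome that appears in exactly one batch cannot be separated from that batch's effect, while a batch that holds several outcomes gives some handle on distinguishing the two.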

Part 2

hobbits <- 
  read.csv("Data/Hobbits.csv")
hobbits$Species <- as.factor(hobbits$Species)

hobbitsLog <- log10(hobbits[,3:6])

hobbitPCA <- dudi.pca(hobbitsLog,scannf = FALSE, nf = 3)
summary(hobbitPCA)
## Class: pca dudi
## Call: dudi.pca(df = hobbitsLog, scannf = FALSE, nf = 3)
## 
## Total inertia: 4
## 
## Eigenvalues:
##     Ax1     Ax2     Ax3     Ax4 
##  1.8023  1.3115  0.5859  0.3003 
## 
## Projected inertia (%):
##     Ax1     Ax2     Ax3     Ax4 
##  45.058  32.786  14.648   7.507 
## 
## Cumulative projected inertia (%):
##     Ax1   Ax1:2   Ax1:3   Ax1:4 
##   45.06   77.84   92.49  100.00

The first three PCA axes together account for 92.49% of the variance.

1

s.class(
  hobbitPCA$li,
  fac = hobbits$Species, 
  col = rainbow(9),
  axesell = FALSE,
  grid = FALSE,
  cstar = 0,
  cpoint = 2,
  clab = 0.5,
  sub = "PCA of hobbits by hominid species"
)

The Homo floresiensis skull most closely resembles those of H. erectus and H. habilis. It is hard to tell which of the two it resembles more, since its cluster sits between the two species and does not overlap with either.
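
One way to put a number on "most closely resembles" is to compare the distances between group centroids in PCA space. The sketch below is illustrative only: the data frame `toy`, its group labels, and its measurements are made up (they are not taken from Hobbits.csv), and base R's `prcomp` stands in for `dudi.pca` so the example has no package dependency.

```r
# Hypothetical toy data: two log-scaled measurements for three made-up groups.
toy <- data.frame(
  Species = factor(rep(c("A", "B", "X"), each = 3)),
  m1 = c(1.0, 1.1, 0.9, 3.0, 3.1, 2.9, 1.4, 1.5, 1.6),
  m2 = c(2.0, 2.1, 1.9, 4.0, 4.2, 3.8, 2.4, 2.6, 2.5)
)

# Centered, scaled PCA with base R.
pca <- prcomp(toy[, c("m1", "m2")], center = TRUE, scale. = TRUE)

# Mean score of each group on the principal axes (the group centroids).
centroids <- aggregate(pca$x, by = list(Species = toy$Species), FUN = mean)
rownames(centroids) <- as.character(centroids$Species)

# Euclidean distances between centroids; a smaller distance means more similar.
centroid_dist <- as.matrix(dist(centroids[, -1]))
print(round(centroid_dist, 2))
```

Applied to the real analysis, the same idea would use the rows of `hobbitPCA$li` grouped by `hobbits$Species`; two species at nearly equal distance from the hobbit centroid would match the ambiguity seen in the plot.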

2

hobbit.dist <- dist(hobbitsLog)
hobbit.hc <- hclust(hobbit.dist, method = "mcquitty")
plot(hobbit.hc, 
     labels = hobbits$Species, 
     cex = 0.25)

In my analysis, the hobbit is most similar to Homo erectus and Homo habilis.
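
The dendrogram can also be read programmatically: the cophenetic distance between two samples is the height at which they first join in the tree, so the samples that join a given label at the lowest height are its closest relatives in the clustering. A minimal sketch on made-up data (the matrix `toy` and its labels are hypothetical, not values from Hobbits.csv):

```r
# Hypothetical toy measurements; row names act as sample labels.
toy <- matrix(
  c(1.0, 2.0,
    1.1, 2.1,
    3.0, 4.0,
    3.1, 4.1,
    1.3, 2.3),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("A1", "A2", "B1", "B2", "X"), c("m1", "m2"))
)

# Same steps as above: Euclidean distances, then "mcquitty" (WPGMA) clustering.
hc <- hclust(dist(toy), method = "mcquitty")

# Cophenetic distance: height at which two samples first join in the tree.
coph <- as.matrix(cophenetic(hc))

# Labels that merge with "X" at the lowest height.
others <- setdiff(rownames(coph), "X")
nearest <- others[coph["X", others] == min(coph["X", others])]
print(nearest)
```

For the real tree, the same lookup would use `cophenetic(hobbit.hc)`, with the row for the H. floresiensis sample in place of `"X"`.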