HW2_JH

HW #2

For this assignment, we will be evaluating different normalization methods on 2-channel arrays in which 4 biological samples were run. The study is from GEO and the description of the experiment is provided as follows.

Series GSE12050: Subcutaneous adipose tissue from lean and obese subjects

Obtaining adipose tissue samples are paramount to the understanding of human obesity. We have examined the impact of needle-aspirated and surgical biopsy techniques on the study of subcutaneous adipose tissue (scAT) gene expression in both obese and lean subjects. Biopsy sampling methods have a significant impact on data interpretation and revealed that gene expression profiles derived from surgical tissue biopsies better capture the significant changes in molecular pathways associated with obesity. We hypothesize that this is because needle biopsies do not aspirate the fibrotic fraction of scAT; which subsequently results in an under-representation of the inflammatory and metabolic changes that coincide with obesity. This analysis revealed that the biopsy technique influences the gene expression underlying the biological themes commonly discussed in obesity (e.g. inflammation, extracellular matrix, metabolism, etc), and is therefore a caveat to consider when designing microarray experiments. These results have crucial implications for the clinical and physiopathological understanding of human obesity and therapeutic approaches.

We will be working with 4 lean subjects from which a needle biopsy was taken.

1.) First load the marray library, then load the 4 GenePix files, making sure to extract the foreground and background median values from the Cy5 and Cy3 channels. (2.5 pts)

#load marray library
library(marray)

## Loading required package: limma

## Warning: package 'limma' was built under R version 4.1.3

#load limma
library(limma)

#load 4 GenePix files
dir_path <- "C:/Users/jacly/Desktop/Gene Expression Data Analysis and Visualization"
data <- read.GenePix(
    path = dir_path, 
    name.Gf = "F532 Median", name.Gb = "B532 Median", 
    name.Rf = "F635 Median", name.Rb = "B635 Median", 
    name.W ="Flags" 
    )

## Reading ...  C:/Users/jacly/Desktop/Gene Expression Data Analysis and Visualization/GSM304445.gpr 
## Reading ...  C:/Users/jacly/Desktop/Gene Expression Data Analysis and Visualization/GSM304446.gpr 
## Reading ...  C:/Users/jacly/Desktop/Gene Expression Data Analysis and Visualization/GSM304447.gpr 
## Reading ...  C:/Users/jacly/Desktop/Gene Expression Data Analysis and Visualization/GSM304448.gpr

#extract the foreground and background median values from Cy5 and Cy3

2.) Normalize each array using median global, loess, and print-tip-group loess methods. Then plot MvA plots of all 4 arrays comparing no normalization to the other 3 normalization approaches. (2 pts)

#Subject 1

#median global
mediannorm1 <- maNorm(data[,1], norm = "median", span=0.45)
#loess
loessnorm1 <- maNorm(data[,1], norm = "loess", span=0.45)
#print-tip-group loess
ptlgloess1 <- maNorm(data[,1], norm = "printTipLoess", span=0.45)
#MvA plots
#arrange plots
par(mfrow=c(4,1))
#normalize arrays and create MvA plots via:
#no normalization
maPlot(data[,1],main='Subject 1 No Normalization', legend = NULL)
#median global
maPlot(mediannorm1, main='Subject 1 Medial Global Normalization', legend = NULL)
#loess
maPlot(loessnorm1, main='Subject 1 Loess Normalization', legend = NULL)
#print-tip loess normalization
maPlot(ptlgloess1, main='Subject 1 Print-tip-group Loess Normalization', legend = NULL)

#Subject 2

#median global
mediannorm2 <- maNorm(data[,2], norm = "median", span=0.45)
#loess
loessnorm2 <- maNorm(data[,2], norm = "loess", span=0.45)
#print-tip-group loess
ptlgloess2 <- maNorm(data[,2], norm = "printTipLoess", span=0.45)
#MvA plots
#arrange plots
par(mfrow=c(4,1))
#normalize arrays and create MvA plots via:
#no normalization
maPlot(data[,2],main='Subject 2 No Normalization', legend = NULL)
#median global
maPlot(mediannorm2, main='Subject 2 Medial Global Normalization', legend = NULL)
#loess
maPlot(loessnorm2, main='Subject 2 Loess Normalization', legend = NULL)
#print-tip loess normalization
maPlot(ptlgloess2, main='Subject 2 Print-tip-group Loess Normalization', legend = NULL)

#Subject 3

#median global
mediannorm3 <- maNorm(data[,3], norm = "median", span=0.45)
#loess
loessnorm3 <- maNorm(data[,3], norm = "loess", span=0.45)
#print-tip-group loess
ptlgloess3 <- maNorm(data[,3], norm = "printTipLoess", span=0.45)
#MvA plots
#arrange plots
par(mfrow=c(4,1))
#normalize arrays and create MvA plots via:
#no normalization
maPlot(data[,3],main='Subject 3 No Normalization', legend = NULL)
#median global
maPlot(mediannorm3, main='Subject 3 Medial Global Normalization', legend = NULL)
#loess
maPlot(loessnorm3, main='Subject 3 Loess Normalization', legend = NULL)
#print-tip loess normalization
maPlot(ptlgloess3, main='Subject 3 Print-tip-group Loess Normalization', legend = NULL)

#Subject 4

#median global
mediannorm4 <- maNorm(data[,4], norm = "median", span=0.45)
#loess
loessnorm4 <- maNorm(data[,4], norm = "loess", span=0.45)
#print-tip-group loess
ptlgloess4 <- maNorm(data[,4], norm = "printTipLoess", span=0.45)
#MvA plots
#arrange plots
par(mfrow=c(4,1))
#normalize arrays and create MvA plots via:
#no normalization
maPlot(data[,4],main='Subject 4 No Normalization', legend = NULL)
#median global
maPlot(mediannorm4, main='Subject 4 Medial Global Normalization', legend = NULL)
#loess
maPlot(loessnorm4, main='Subject 4 Loess Normalization', legend = NULL)
#print-tip loess normalization
maPlot(ptlgloess4, main='Subject 4 Print-tip-group Loess Normalization', legend = NULL)

3.) Plot density plots of the log ratio values for each normalization (and pre normalization) for only array #4. Put them all on the same plot. Make sure to label the axes and provide a legend. (2 pts)

#density plots of the log ratio values for each normalization (and pre normalization) 
plot(density(na.omit(maM(data[,4]))), 
    main = "Array 4 Density Plot (Pre- and Post-normalized)", 
    ylim=c(0, 0.9),xlim=c(-8, 10.5),
    col="black")
#median global line
lines(density(na.omit(maM(mediannorm4))), col = "sky blue", lwd = 2)
#loess line
lines(density(na.omit(maM(loessnorm4))), col = "magenta", lwd = 2) 
#print-tip-group loess line
lines(density(na.omit(maM(ptlgloess4))), col = "purple", lwd = 2) 
#add legend
leg <- c(
    "No normalization", 
    "Global Median \nNormalization", 
    "Loess Normalization", 
    "Print-tip-group Loess \nNormalization"
    )
legend(2, 0.85, legend = leg, lty = c(1, 1), lwd = c(2.5, 2.5),col = c("black", "sky blue", "magenta", "purple"))

4.) Based on the plots generated so far, which normalization do you think is most preferred for this dataset? (2 pts)

Based on the plots generated so far, the normalization that is most preferred for this dataset is the loess or the print-tip-group loess methods because it better aligns to the horizontal reference line and is more closely packed with less outliers. Similar to using a qq-plot to visually confirm a normal distribution, the loess and the print-tip-group loess methods better aligns to the horizontal reference line and exhibits less deviations from it compared the the other normalization methods which still shows a curved shape. Additionally, the loess and the print-tip-group loess methods most closely resemble a normally distributed density curve.

5.) Research has demonstrated that often a single channel, background subtracted provides as good a normalization as using both channels. To test this, we will be utilizing the fact that these 4 samples are replicates and calculate the correlation between them. So, first extract the Cy5 foreground and background values for each of the 4 arrays and subtract the background from the foreground values, then log2 transform these values. Then calculate global median normalization on these 4 arrays using these background subtracted Cy5 values. Hint, you need to use the median of each array to scale, such that after normalization, all arrays will have a median of 1. (4 pts)

for(i in 1:4){
    name <- paste("sample", i, sep = ".")
    bg <- maRb(data[ , i])
    fg <- maRf(data[ , i])
    diff <- fg - bg
    assign(name, log2(diff))
}

## Warning in assign(name, log2(diff)): NaNs produced

## Warning in assign(name, log2(diff)): NaNs produced

## Warning in assign(name, log2(diff)): NaNs produced

## Warning in assign(name, log2(diff)): NaNs produced

data.prenorm <- cbind(sample.1, sample.2, sample.3, sample.4)
data.median  <- apply(data.prenorm, 2, median, na.rm = T)
data.norm    <- sweep(data.prenorm, 2, data.median)

colnames(data.norm) <- c("Array 1", "Array 2", "Array 3", "Array 4")

median(data.norm[ , 1], na.rm = T)

## [1] 0

median(data.norm[ , 2], na.rm = T)

## [1] 0

median(data.norm[ , 3], na.rm = T)

## [1] 0

median(data.norm[ , 4], na.rm = T)

## [1] 0

6.) Next calculate a Spearman’s rank correlation between all 4 arrays that you normalized in #5 and do the same with the M values from loess normalized data that you generated in #2. Plot a scatter plot matrix for each of the two normalizations (pairs() function), and be sure to label the arrays and title the plot. Print the correlation coefficients to the screen. (4 pts)

s1 <- maM(loessnorm1)
s2 <- maM(loessnorm2)
s3 <- maM(loessnorm3)
s4 <- maM(loessnorm4)

names <- c("Array 1", "Array 2", "Array 3", "Array 4")

data.loess               <- cbind(s1, s2, s3, s4)
colnames(data.loess)     <- make.names(names, unique = TRUE)
data.loess.cor           <- cor(data.loess, use = "complete.obs", method = "spearman") 
colnames(data.loess.cor) <- make.names(names, unique = TRUE)
rownames(data.loess.cor) <- make.names(names, unique = TRUE)
data.loess.cor

##           Array.1   Array.2   Array.3   Array.4
## Array.1 1.0000000 0.6926262 0.7529097 0.6888445
## Array.2 0.6926262 1.0000000 0.7255670 0.7082149
## Array.3 0.7529097 0.7255670 1.0000000 0.7419454
## Array.4 0.6888445 0.7082149 0.7419454 1.0000000

data.cor             <- cor(data.norm, use = "complete.obs", method = "spearman") 
rownames(data.cor)   <- make.names(names, unique = TRUE)
colnames(data.cor)   <- make.names(names, unique = TRUE)
data.cor

##           Array.1   Array.2   Array.3   Array.4
## Array.1 1.0000000 0.8957946 0.8784760 0.8985979
## Array.2 0.8957946 1.0000000 0.8758774 0.9075121
## Array.3 0.8784760 0.8758774 1.0000000 0.8848059
## Array.4 0.8985979 0.9075121 0.8848059 1.0000000

pairs(data.norm, main = "Scatterplot Matrix Global Median Normalization")

pairs(data.loess, main = "Scatterplot Matrix Loess Normalization")

7.) Now we want to compare these normalizations to quantile normalized data to see if we gain anything by leveraging the distributions across all 4 arrays. Carry out the steps in the lecture or use the paper from Bolstad et al. entitled: “A comparison of normalization methods for high density oligonucleotide array data based on variance and bias” (on the course website), but we are only going to conduct this on the Cy5 channel. The basic steps are as follows (these 6 steps are calculated on non-logged data; the data is logged after these steps are carried out): (8 pts) 1. Subtract the foreground – background for each of the 4 chips for only the Cy5 channel. This should all be on the linear or raw scale (no logging yet). 2. Sort each column independently in this new matrix 3. Calculate row means for the sorted matrix 4. Create a new matrix with each row having the same values as the sorted row mean vectors from step #3 (you should have a new R matrix) 5. Rank the columns independently on the original background subtracted matrix (from step #1) Hint: use the rank() function with the argument ties=”first” or order() 6. Reorder the columns in the new matrix from step #4 using the ranks from step #5

To verify that each array has the same distribution, use the hist() function to look at various arrays (e.g., hist(c5.norm[,1]); hist(c5.norm[,2]); etc.). Slight differences in distributions are a result of the ties in the ranking.

for(i in 1:4){
    name <- paste("samp", i, sep = "")
    bg <- maRb(data[, i])
    fg <- maRf(data[, i])
    assign(name, fg - bg)
} 

samp           <- cbind(samp1, samp2, samp3, samp4)
colnames(samp) <- c("samp1", "samp2", "samp3", "samp4")
samp.sorted    <- apply(samp, 2, sort) 
row.means      <- rowMeans(samp.sorted, na.rm = FALSE)

order1 <- order(samp1)  
order2 <- order(samp2)
order3 <- order(samp3)
order4 <- order(samp4)

norm1 <- rep(NA, nrow(samp))
norm2 <- rep(NA, nrow(samp))
norm3 <- rep(NA, nrow(samp))
norm4 <- rep(NA, nrow(samp))

norm1[order1] <- row.means
norm2[order2] <- row.means
norm3[order3] <- row.means
norm4[order4] <- row.means

quant.norm <- cbind(norm1, norm2, norm3, norm4)
head(samp)

##      samp1 samp2 samp3 samp4
## [1,]    53    27    21    23
## [2,]    48    27    20    27
## [3,]    51    21    21    16
## [4,]  1012  1905   258  1900
## [5,]    59    20    22    24
## [6,]    55    24    25    21

head(quant.norm)

##        norm1  norm2  norm3   norm4
## [1,]   40.00   30.0  24.50   26.50
## [2,]   35.25   30.0  23.25   29.50
## [3,]   38.00   23.5  24.50   21.00
## [4,] 1005.25 1926.5 406.00 1411.25
## [5,]   45.50   22.5  26.25   27.25
## [6,]   41.75   27.0  31.50   25.25

par(mfrow = c(4, 1))
hist(log2(quant.norm[ , 1]))

## Warning in hist(log2(quant.norm[, 1])): NaNs produced

hist(log2(quant.norm[ , 2]))

## Warning in hist(log2(quant.norm[, 2])): NaNs produced

hist(log2(quant.norm[ , 3]))

## Warning in hist(log2(quant.norm[, 3])): NaNs produced

hist(log2(quant.norm[ , 4]))

## Warning in hist(log2(quant.norm[, 4])): NaNs produced

par(mfrow = c(1, 1))        
boxplot(log2(quant.norm), xlab="Sample", ylab="Log2 Signal", axes=F)

## Warning in boxplot(log2(quant.norm), xlab = "Sample", ylab = "Log2 Signal", :
## NaNs produced

## Warning in bplt(at[i], wid = width[i], stats = z$stats[, i], out = z$out[z$group
## == : Outlier (-Inf) in boxplot 1 is not drawn

## Warning in bplt(at[i], wid = width[i], stats = z$stats[, i], out = z$out[z$group
## == : Outlier (-Inf) in boxplot 2 is not drawn

## Warning in bplt(at[i], wid = width[i], stats = z$stats[, i], out = z$out[z$group
## == : Outlier (-Inf) in boxplot 3 is not drawn

## Warning in bplt(at[i], wid = width[i], stats = z$stats[, i], out = z$out[z$group
## == : Outlier (-Inf) in boxplot 4 is not drawn

axis(1, labels = 1:ncol(quant.norm), at = 1:ncol(quant.norm))
axis(2)

8.) Now log (base 2) the new R matrix you created from step 6 (question #7) and calculate a Spearman’s rank correlation between the 4 arrays and plot a scatter plot matrix as you did before. Print the correlation coefficients to the screen. (5 pts)

quant.norm.log <- log2(quant.norm)

## Warning: NaNs produced

quant.norm.cor <- cor(quant.norm.log, use = "complete.obs", method = "spearman") 
quant.norm.cor

##           norm1     norm2     norm3     norm4
## norm1 1.0000000 0.8953134 0.8777633 0.8979712
## norm2 0.8953134 1.0000000 0.8765842 0.9073289
## norm3 0.8777633 0.8765842 1.0000000 0.8840355
## norm4 0.8979712 0.9073289 0.8840355 1.0000000

pairs(quant.norm.log, main = "Scatterplot Matrix Quantile Normalization")

9.) Of the 4 normalization methods, which do you suggest as optimal and why? (2.5 pts)

Of the 4 normalization method, the one I would suggest as optimal is global median normalization because it has the highest correlation values (Array 1 - 2: 0.8957946, Array 1 - 3: 0.8784760, Array 1 - 4: 0.8985979). In addition, the scatterplot matrix shows a relatively tight positive correlative relationship that looks like a straight line. In comparison, the loess method scatter plots have a curved relationship.

10.) Now we want to work with a qRT-PCR dataset from patients with an inflammatory disease. The genes measured for this experiment included a set of proinflammatory chemokines and cytokines that are related to the disease. Download the raw qRT-PCR file called Inflammation_qRT-PCR.csv. Then change the normalization script from the lecture notes to include the housekeeping genes beta actin, GAPDH, and 18S. Look at the file to make sure the housekeepers are spelled correctly.

Run the normalization script and output a data matrix of fold change values.

qrt_pcr <- read.csv("C:/Users/jacly/Desktop/Gene Expression Data Analysis and Visualization/Inflammation_qRT-PCR.csv")
qrt_pcr

#normalization script
f.parse <- function(path=pa,file=fi,out=out.fi) {
    d <- read.table(paste(path,file,sep=""),skip=11,sep=",",header=T)
    u <- as.character(unique(d$Name))
    u <- u[u!=""]; u <- u[!is.na(u)];
    ref <- unique(as.character(d$Name[d$Type=="Reference"]))
    u <- unique(c(ref,u))
    # housekeeping genes beta actin, GAPDH, and 18S
    hg <- c("B-ACTIN","GAPDH","18S")
    hg <- toupper(hg)
    p <- unique(toupper(as.character(d$Name.1)))
    p <- sort(setdiff(p,c("",hg)))

    mat <- matrix(0,nrow=length(u),ncol=length(p))
    dimnames(mat) <- list(u,p)
    for (i in 1:length(u)) {
        print(paste(i,": ",u[i],sep=""))
        tmp <- d[d$Name %in% u[i],c(1:3,6,9)]
        g <- toupper(unique(as.character(tmp$Name.1)))
        g <- sort(setdiff(g,c("",hg)))
        
for (j in 1:length(g)) {
            v <- tmp[toupper(as.character(tmp$Name.1)) %in% g[j],5]
            v <- v[v!=999]
            v <- v[((v/mean(v))<1.5) & ((v/mean(v))>0.67)]  #gene j vector

            hv3 <- NULL
            for (k in 1:length(hg)) {   #housekeeping gene vector (each filtered by reps)
                hv <- tmp[toupper(as.character(tmp$Name.1)) %in% hg[k],5]
                hv <- hv[hv!=999]
                hv3 <- c(hv3,hv[((hv/mean(hv))<1.5) & ((hv/mean(hv))>0.67)])    
            }
# qRT-PCR file formatting and calculation of fold changes (cont)
            sv <- mean(as.numeric(v)) - mean(as.numeric(hv3))   #scaled value for gene j
            if(i==1) { #reference sample only
                mat[u[i],g[j]] <- sv
                next
            }
            
            mat[u[i],g[j]] <- sv - mat[u[1],g[j]]
        }
    }

    mat[1,][!is.na(mat[1,])] <- 0
    fc <- 2^(-1 * mat)
    write.table(t(c("Subject",dimnames(mat)[[2]])),paste(path,out,sep=""),quote=F,sep="\t",col.names=F,row.names=F)
    write.table(round(fc,3),paste(path,out,sep=""),quote=F,sep="\t",append=T,col.names=F)
}

# run function
pa <- "C:/Users/jacly/Desktop/Gene Expression Data Analysis and Visualization/"
fi <- "Inflammation_qRT-PCR.csv"
out.fi <- "fold_chg_matrix1.txt"

f.parse(pa,fi,out.fi)

## [1] "1: 434_1"
## [1] "2: 434_15"
## [1] "3: 434_2"
## [1] "4: 434_14"
## [1] "5: 434_3"
## [1] "6: 434_13"
## [1] "7: 434_4"
## [1] "8: 434_12"
## [1] "9: 434_5"
## [1] "10: 434_11"
## [1] "11: 434_6"
## [1] "12: 434_10"
## [1] "13: 434_7"
## [1] "14: 434_9"
## [1] "15: 434_8"

11.) Read the normalized qRT-PCR data matrix into R, using a Spearman’s rank correlation, which two patients are most correlated? Plot these two patients against each other in a scatter plot. (3 pts)

norm_mat <- read.table("C:/Users/jacly/Desktop/Gene Expression Data Analysis and Visualization/fold_chg_matrix.txt", sep = "\t", stringsAsFactors= F, header = TRUE)
norm_mat

t_norm_mat <- t(norm_mat)
t_norm_mat

##             [,1]       [,2]       [,3]       [,4]       [,5]       [,6]      
## Subject     "434_1"    "434_15"   "434_2"    "434_14"   "434_3"    "434_13"  
## APOBEC3B    "1.000"    "4.796"    "1.603"    "1.147"    "2.205"    "0.602"   
## CCL.24      "1.000"    "1.801"    "7.474"    "2.263"    "1.643"    "0.818"   
## CCL2        "  1.000"  "  0.471"  "  2.848"  "  4.161"  " 48.361"  "  1.070" 
## CCL20       " 1.000"   " 0.724"   "13.548"   "16.354"   " 6.148"   " 6.505"  
## CCL22       "1.000"    "0.896"    "2.769"    "0.771"    "1.171"    "3.243"   
## DDX58.RIG.1 " 1.000"   " 5.309"   " 1.077"   " 0.783"   " 8.124"   " 5.259"  
## G1P2.ISG15  "  1.000"  "  0.913"  "  1.122"  "  1.275"  " 63.905"  "  3.940" 
## G1P3        " 1.000"   " 0.814"   " 1.050"   " 0.982"   "30.831"   " 0.559"  
## IFI44       "  1.000"  "  0.765"  "  1.535"  "  0.982"  " 31.378"  "  6.499" 
## IFIT1       "  1.000"  "  1.415"  "  1.078"  "  0.765"  " 40.758"  "  1.771" 
## IFIT4       "  1.000"  "  4.613"  "  0.845"  "  0.620"  " 39.712"  "  2.218" 
## IL.10       "1.000"    "2.954"    "0.240"    "0.371"    "0.344"    "0.439"   
## IL.6        NA         NA         NA         NA         NA         NA        
## IL1A        "1.000"    "2.220"    "4.461"    "9.190"    "5.496"    "1.456"   
## IL1B        " 1.000"   " 2.982"   " 4.338"   " 9.359"   " 4.004"   " 1.534"  
## INFA5       "   1.000" "  16.704" "   0.473" "   2.611" "   6.598" "   2.068"
## INFB1       "   1.000" "   2.093" "   1.213" "   0.729" "   5.368" "   5.571"
## IRF5        "1.000"    "0.786"    "1.222"    "0.580"    "1.914"    "0.982"   
## IRF7        " 1.000"   " 1.136"   " 1.098"   " 0.801"   " 9.806"   " 1.255"  
## LPL         "1.000"    "2.255"    "4.882"    "1.392"    "4.697"    "1.635"   
## LY6E        " 1.000"   " 1.224"   " 1.486"   " 0.998"   "18.473"   "11.838"  
## MX1         " 1.000"   " 1.167"   " 1.755"   " 0.964"   "33.381"   " 2.766"  
## NK4.IL.32.  "1.000"    "1.121"    "1.111"    "1.226"    "1.520"    "1.262"   
## OAS3        " 1.000"   " 1.447"   " 1.416"   " 1.474"   "19.658"   "13.247"  
## OASL        " 1.000"   " 1.424"   " 0.968"   " 0.873"   " 4.882"   " 1.095"  
## OPN.SPP1.   "1.000"    "0.637"    "5.604"    "1.366"    "3.598"    "0.532"   
## PBEF        "1.000"    "2.679"    "0.609"    "1.606"    "0.997"    "0.186"   
## PI3         "1.000"    "0.940"    "0.602"    "0.894"    "1.108"    "4.982"   
## PRKR        " 1.000"   " 1.253"   " 1.283"   " 0.959"   "12.953"   " 4.347"  
## TNF.ALPHA   "1.000"    "1.292"    "1.334"    "1.151"    "3.701"    "6.174"   
##             [,7]       [,8]       [,9]       [,10]      [,11]      [,12]     
## Subject     "434_4"    "434_12"   "434_5"    "434_11"   "434_6"    "434_10"  
## APOBEC3B    "2.820"    "3.761"    "2.251"    "2.337"    "1.634"    "2.008"   
## CCL.24      "0.387"    "1.097"    "1.091"    "2.215"    "0.296"    "0.583"   
## CCL2        "109.893"  " 51.407"  " 21.467"  "  5.721"  " 21.937"  "  4.424" 
## CCL20       "10.407"   "36.499"   "24.787"   "11.218"   "58.910"   " 5.724"  
## CCL22       "2.066"    "0.986"    "2.101"    "1.056"    "1.221"    "1.262"   
## DDX58.RIG.1 "23.824"   "25.852"   " 1.661"   " 7.771"   " 5.007"   " 3.567"  
## G1P2.ISG15  "206.269"  "279.518"  "  8.495"  " 64.094"  " 34.233"  " 19.828" 
## G1P3        "98.416"   "67.244"   " 4.965"   "11.169"   "19.792"   "10.410"  
## IFI44       "100.559"  " 60.775"  "  9.029"  "  6.082"  " 23.513"  " 10.897" 
## IFIT1       "187.207"  "286.822"  "  5.638"  " 89.333"  " 27.189"  " 12.519" 
## IFIT4       "118.593"  "386.654"  "  5.698"  " 71.939"  " 40.018"  " 20.727" 
## IL.10       "4.440"    "1.802"    "1.605"    "0.214"    "2.365"    "1.890"   
## IL.6        NA         NA         NA         NA         NA         NA        
## IL1A        "4.801"    "3.776"    "7.034"    "6.671"    "8.774"    "0.790"   
## IL1B        " 4.396"   " 4.956"   " 9.352"   " 4.469"   "10.855"   " 2.171"  
## INFA5       "   6.688" "6482.234" "   3.817" "2113.959" "   3.565" "   0.987"
## INFB1       "  10.497" "2123.549" "   0.636" " 902.025" "   1.360" "   1.796"
## IRF5        "3.168"    "2.014"    "1.439"    "1.458"    "1.323"    "1.382"   
## IRF7        "17.375"   "15.753"   " 2.644"   " 5.884"   " 6.795"   " 5.021"  
## LPL         "1.356"    "8.314"    "2.763"    "3.985"    "1.825"    "1.451"   
## LY6E        "46.339"   "21.484"   " 5.736"   "12.415"   " 8.512"   " 6.147"  
## MX1         "89.118"   "84.557"   " 6.694"   "27.462"   "23.610"   "14.139"  
## NK4.IL.32.  "1.842"    "1.186"    "1.872"    "1.413"    "0.982"    "1.472"   
## OAS3        "62.788"   "64.225"   " 3.394"   "63.714"   "11.357"   " 5.565"  
## OASL        "12.495"   "17.884"   " 4.548"   " 6.869"   " 3.710"   " 3.230"  
## OPN.SPP1.   "2.141"    "4.123"    "3.527"    "3.746"    "1.139"    "0.515"   
## PBEF        "1.258"    "1.717"    "1.767"    "2.431"    "2.655"    "1.747"   
## PI3         "1.973"    "2.328"    "2.716"    "1.127"    "1.160"    "1.391"   
## PRKR        "29.496"   "37.847"   " 5.623"   "11.171"   "12.934"   " 7.744"  
## TNF.ALPHA   "3.558"    "8.976"    "2.140"    "3.228"    "3.187"    "1.737"   
##             [,13]      [,14]      [,15]     
## Subject     "434_7"    "434_9"    "434_8"   
## APOBEC3B    "1.707"    "1.644"    "1.909"   
## CCL.24      "0.196"    "1.510"    "1.265"   
## CCL2        " 26.541"  "  0.538"  " 51.919" 
## CCL20       "38.479"   " 0.862"   " 5.120"  
## CCL22       "1.286"    "0.840"    "1.746"   
## DDX58.RIG.1 " 8.043"   " 0.826"   "10.926"  
## G1P2.ISG15  " 68.663"  "  1.305"  " 56.498" 
## G1P3        "53.424"   " 0.985"   "43.425"  
## IFI44       " 30.633"  "  1.486"  " 43.512" 
## IFIT1       " 55.688"  "  0.935"  " 53.432" 
## IFIT4       " 63.582"  "  1.398"  " 35.451" 
## IL.10       "1.452"    "0.523"    "0.820"   
## IL.6        NA         NA         NA        
## IL1A        "8.107"    "0.985"    "2.628"   
## IL1B        "11.463"   " 0.897"   " 3.590"  
## INFA5       "   2.113" "   3.582" "  12.800"
## INFB1       "   1.827" "   2.339" "   9.892"
## IRF5        "1.548"    "0.948"    "2.268"   
## IRF7        "10.815"   " 1.225"   "13.553"  
## LPL         "0.677"    "1.612"    "1.688"   
## LY6E        "27.249"   " 1.704"   "21.034"  
## MX1         "32.424"   " 1.188"   "49.412"  
## NK4.IL.32.  "1.060"    "1.173"    "1.561"   
## OAS3        "39.469"   " 1.881"   "19.060"  
## OASL        " 5.210"   " 1.287"   " 8.046"  
## OPN.SPP1.   "1.540"    "0.836"    "2.697"   
## PBEF        "1.835"    "1.422"    "1.112"   
## PI3         "0.624"    "0.630"    "1.107"   
## PRKR        "12.556"   " 1.521"   "12.876"  
## TNF.ALPHA   "2.841"    "1.367"    "3.560"

names <- c(norm_mat$Subject)
header.true <- function(df) {
  names(df) <- as.character(unlist(df[1,]))
  df[-1,]
}
header.true(t_norm_mat)

##             [,1]       [,2]       [,3]       [,4]       [,5]       [,6]      
## APOBEC3B    "1.000"    "4.796"    "1.603"    "1.147"    "2.205"    "0.602"   
## CCL.24      "1.000"    "1.801"    "7.474"    "2.263"    "1.643"    "0.818"   
## CCL2        "  1.000"  "  0.471"  "  2.848"  "  4.161"  " 48.361"  "  1.070" 
## CCL20       " 1.000"   " 0.724"   "13.548"   "16.354"   " 6.148"   " 6.505"  
## CCL22       "1.000"    "0.896"    "2.769"    "0.771"    "1.171"    "3.243"   
## DDX58.RIG.1 " 1.000"   " 5.309"   " 1.077"   " 0.783"   " 8.124"   " 5.259"  
## G1P2.ISG15  "  1.000"  "  0.913"  "  1.122"  "  1.275"  " 63.905"  "  3.940" 
## G1P3        " 1.000"   " 0.814"   " 1.050"   " 0.982"   "30.831"   " 0.559"  
## IFI44       "  1.000"  "  0.765"  "  1.535"  "  0.982"  " 31.378"  "  6.499" 
## IFIT1       "  1.000"  "  1.415"  "  1.078"  "  0.765"  " 40.758"  "  1.771" 
## IFIT4       "  1.000"  "  4.613"  "  0.845"  "  0.620"  " 39.712"  "  2.218" 
## IL.10       "1.000"    "2.954"    "0.240"    "0.371"    "0.344"    "0.439"   
## IL.6        NA         NA         NA         NA         NA         NA        
## IL1A        "1.000"    "2.220"    "4.461"    "9.190"    "5.496"    "1.456"   
## IL1B        " 1.000"   " 2.982"   " 4.338"   " 9.359"   " 4.004"   " 1.534"  
## INFA5       "   1.000" "  16.704" "   0.473" "   2.611" "   6.598" "   2.068"
## INFB1       "   1.000" "   2.093" "   1.213" "   0.729" "   5.368" "   5.571"
## IRF5        "1.000"    "0.786"    "1.222"    "0.580"    "1.914"    "0.982"   
## IRF7        " 1.000"   " 1.136"   " 1.098"   " 0.801"   " 9.806"   " 1.255"  
## LPL         "1.000"    "2.255"    "4.882"    "1.392"    "4.697"    "1.635"   
## LY6E        " 1.000"   " 1.224"   " 1.486"   " 0.998"   "18.473"   "11.838"  
## MX1         " 1.000"   " 1.167"   " 1.755"   " 0.964"   "33.381"   " 2.766"  
## NK4.IL.32.  "1.000"    "1.121"    "1.111"    "1.226"    "1.520"    "1.262"   
## OAS3        " 1.000"   " 1.447"   " 1.416"   " 1.474"   "19.658"   "13.247"  
## OASL        " 1.000"   " 1.424"   " 0.968"   " 0.873"   " 4.882"   " 1.095"  
## OPN.SPP1.   "1.000"    "0.637"    "5.604"    "1.366"    "3.598"    "0.532"   
## PBEF        "1.000"    "2.679"    "0.609"    "1.606"    "0.997"    "0.186"   
## PI3         "1.000"    "0.940"    "0.602"    "0.894"    "1.108"    "4.982"   
## PRKR        " 1.000"   " 1.253"   " 1.283"   " 0.959"   "12.953"   " 4.347"  
## TNF.ALPHA   "1.000"    "1.292"    "1.334"    "1.151"    "3.701"    "6.174"   
##             [,7]       [,8]       [,9]       [,10]      [,11]      [,12]     
## APOBEC3B    "2.820"    "3.761"    "2.251"    "2.337"    "1.634"    "2.008"   
## CCL.24      "0.387"    "1.097"    "1.091"    "2.215"    "0.296"    "0.583"   
## CCL2        "109.893"  " 51.407"  " 21.467"  "  5.721"  " 21.937"  "  4.424" 
## CCL20       "10.407"   "36.499"   "24.787"   "11.218"   "58.910"   " 5.724"  
## CCL22       "2.066"    "0.986"    "2.101"    "1.056"    "1.221"    "1.262"   
## DDX58.RIG.1 "23.824"   "25.852"   " 1.661"   " 7.771"   " 5.007"   " 3.567"  
## G1P2.ISG15  "206.269"  "279.518"  "  8.495"  " 64.094"  " 34.233"  " 19.828" 
## G1P3        "98.416"   "67.244"   " 4.965"   "11.169"   "19.792"   "10.410"  
## IFI44       "100.559"  " 60.775"  "  9.029"  "  6.082"  " 23.513"  " 10.897" 
## IFIT1       "187.207"  "286.822"  "  5.638"  " 89.333"  " 27.189"  " 12.519" 
## IFIT4       "118.593"  "386.654"  "  5.698"  " 71.939"  " 40.018"  " 20.727" 
## IL.10       "4.440"    "1.802"    "1.605"    "0.214"    "2.365"    "1.890"   
## IL.6        NA         NA         NA         NA         NA         NA        
## IL1A        "4.801"    "3.776"    "7.034"    "6.671"    "8.774"    "0.790"   
## IL1B        " 4.396"   " 4.956"   " 9.352"   " 4.469"   "10.855"   " 2.171"  
## INFA5       "   6.688" "6482.234" "   3.817" "2113.959" "   3.565" "   0.987"
## INFB1       "  10.497" "2123.549" "   0.636" " 902.025" "   1.360" "   1.796"
## IRF5        "3.168"    "2.014"    "1.439"    "1.458"    "1.323"    "1.382"   
## IRF7        "17.375"   "15.753"   " 2.644"   " 5.884"   " 6.795"   " 5.021"  
## LPL         "1.356"    "8.314"    "2.763"    "3.985"    "1.825"    "1.451"   
## LY6E        "46.339"   "21.484"   " 5.736"   "12.415"   " 8.512"   " 6.147"  
## MX1         "89.118"   "84.557"   " 6.694"   "27.462"   "23.610"   "14.139"  
## NK4.IL.32.  "1.842"    "1.186"    "1.872"    "1.413"    "0.982"    "1.472"   
## OAS3        "62.788"   "64.225"   " 3.394"   "63.714"   "11.357"   " 5.565"  
## OASL        "12.495"   "17.884"   " 4.548"   " 6.869"   " 3.710"   " 3.230"  
## OPN.SPP1.   "2.141"    "4.123"    "3.527"    "3.746"    "1.139"    "0.515"   
## PBEF        "1.258"    "1.717"    "1.767"    "2.431"    "2.655"    "1.747"   
## PI3         "1.973"    "2.328"    "2.716"    "1.127"    "1.160"    "1.391"   
## PRKR        "29.496"   "37.847"   " 5.623"   "11.171"   "12.934"   " 7.744"  
## TNF.ALPHA   "3.558"    "8.976"    "2.140"    "3.228"    "3.187"    "1.737"   
##             [,13]      [,14]      [,15]     
## APOBEC3B    "1.707"    "1.644"    "1.909"   
## CCL.24      "0.196"    "1.510"    "1.265"   
## CCL2        " 26.541"  "  0.538"  " 51.919" 
## CCL20       "38.479"   " 0.862"   " 5.120"  
## CCL22       "1.286"    "0.840"    "1.746"   
## DDX58.RIG.1 " 8.043"   " 0.826"   "10.926"  
## G1P2.ISG15  " 68.663"  "  1.305"  " 56.498" 
## G1P3        "53.424"   " 0.985"   "43.425"  
## IFI44       " 30.633"  "  1.486"  " 43.512" 
## IFIT1       " 55.688"  "  0.935"  " 53.432" 
## IFIT4       " 63.582"  "  1.398"  " 35.451" 
## IL.10       "1.452"    "0.523"    "0.820"   
## IL.6        NA         NA         NA        
## IL1A        "8.107"    "0.985"    "2.628"   
## IL1B        "11.463"   " 0.897"   " 3.590"  
## INFA5       "   2.113" "   3.582" "  12.800"
## INFB1       "   1.827" "   2.339" "   9.892"
## IRF5        "1.548"    "0.948"    "2.268"   
## IRF7        "10.815"   " 1.225"   "13.553"  
## LPL         "0.677"    "1.612"    "1.688"   
## LY6E        "27.249"   " 1.704"   "21.034"  
## MX1         "32.424"   " 1.188"   "49.412"  
## NK4.IL.32.  "1.060"    "1.173"    "1.561"   
## OAS3        "39.469"   " 1.881"   "19.060"  
## OASL        " 5.210"   " 1.287"   " 8.046"  
## OPN.SPP1.   "1.540"    "0.836"    "2.697"   
## PBEF        "1.835"    "1.422"    "1.112"   
## PI3         "0.624"    "0.630"    "1.107"   
## PRKR        "12.556"   " 1.521"   "12.876"  
## TNF.ALPHA   "2.841"    "1.367"    "3.560"

t_norm_mat <- as.numeric(t_norm_mat[-1,])
rownames(t_norm_mat) <- NULL
t_norm_mat <- matrix(t_norm_mat,nrow=30,ncol=15,byrow=TRUE)
names

##  [1] "434_1"  "434_15" "434_2"  "434_14" "434_3"  "434_13" "434_4"  "434_12"
##  [9] "434_5"  "434_11" "434_6"  "434_10" "434_7"  "434_9"  "434_8"

mat.cor <-  cor(t_norm_mat, method = "spearman") 
rownames(mat.cor)   <- make.names(names)
colnames(mat.cor)   <- make.names(names)
mat.cor

##             X434_1     X434_15       X434_2    X434_14       X434_3   X434_13
## X434_1  1.00000000  0.46061415  0.015131286 0.13707165  0.260792167 0.5349355
## X434_15 0.46061415  1.00000000 -0.170004450 0.07743658  0.192701380 0.2977303
## X434_2  0.01513129 -0.17000445  1.000000000 0.77659101 -0.007120605 0.4459279
## X434_14 0.13707165  0.07743658  0.776591010 1.00000000  0.043613707 0.4615042
## X434_3  0.26079217  0.19270138 -0.007120605 0.04361371  1.000000000 0.1700045
## X434_13 0.53493547  0.29773031  0.445927904 0.46150423  0.170004450 1.0000000
## X434_4  0.32799288  0.03960837  0.784601691 0.70983534  0.092122830 0.8059635
## X434_12 0.23055525 -0.25637031  0.771781467 0.46823189  0.054300657 0.4804718
## X434_5  0.41433022  0.12105029  0.648420116 0.59813084  0.230974633 0.8575879
## X434_11 0.36493102 -0.11570984  0.796617713 0.62527815  0.052959502 0.7214063
## X434_6  0.17000445 -0.37828215  0.680907877 0.41611037  0.011125946 0.3983089
## X434_10 0.45527370 -0.25233645 -0.011125946 0.01557632  0.036938140 0.2456609
## X434_7          NA          NA           NA         NA           NA        NA
## X434_9  0.41477526  0.46328438  0.409434802 0.69737428  0.205162439 0.6408545
## X434_8  0.15932354  0.11481976  0.735647530 0.81975968  0.045393858 0.4196707
##             X434_4     X434_12    X434_5    X434_11      X434_6     X434_10
## X434_1  0.32799288  0.23055525 0.4143302  0.3649310  0.17000445  0.45527370
## X434_15 0.03960837 -0.25637031 0.1210503 -0.1157098 -0.37828215 -0.25233645
## X434_2  0.78460169  0.77178147 0.6484201  0.7966177  0.68090788 -0.01112595
## X434_14 0.70983534  0.46823189 0.5981308  0.6252781  0.41611037  0.01557632
## X434_3  0.09212283  0.05430066 0.2309746  0.0529595  0.01112595  0.03693814
## X434_13 0.80596351  0.48047180 0.8575879  0.7214063  0.39830886  0.24566088
## X434_4  1.00000000  0.73773228 0.8887405  0.9020917  0.66266133  0.11838006
## X434_12 0.73773228  1.00000000 0.6460443  0.8182931  0.68944031  0.13441638
## X434_5  0.88874054  0.64604429 1.0000000  0.8117490  0.52558968  0.22162884
## X434_11 0.90209168  0.81829310 0.8117490  1.0000000  0.78593680  0.27681353
## X434_6  0.66266133  0.68944031 0.5255897  0.7859368  1.00000000  0.19448153
## X434_10 0.11838006  0.13441638 0.2216288  0.2768135  0.19448153  1.00000000
## X434_7          NA          NA        NA         NA          NA          NA
## X434_9  0.59635069  0.19761878 0.5696484  0.4165554  0.10814419  0.10903427
## X434_8  0.60258122  0.52030711 0.5304851  0.5821095  0.45749889 -0.07031598
##         X434_7    X434_9      X434_8
## X434_1      NA 0.4147753  0.15932354
## X434_15     NA 0.4632844  0.11481976
## X434_2      NA 0.4094348  0.73564753
## X434_14     NA 0.6973743  0.81975968
## X434_3      NA 0.2051624  0.04539386
## X434_13     NA 0.6408545  0.41967067
## X434_4      NA 0.5963507  0.60258122
## X434_12     NA 0.1976188  0.52030711
## X434_5      NA 0.5696484  0.53048509
## X434_11     NA 0.4165554  0.58210948
## X434_6      NA 0.1081442  0.45749889
## X434_10     NA 0.1090343 -0.07031598
## X434_7       1        NA          NA
## X434_9      NA 1.0000000  0.60792167
## X434_8      NA 0.6079217  1.00000000

#Patients 434_3 and 434_4 are most correlated 0.921

HW2_JH

Jaclyn Hwang

10/12/2022

HW #2