####Run through this example and try to understand what is going on with the data

####So, let's load RMarkdown

library(rmarkdown)

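Set message and warning to FALSE for every chunk (this keeps package start-up messages and warnings out of the knitted results). Note that echo = FALSE also hides the code itself, which is why only output appears below.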

knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE, collapse = FALSE)

PACKAGES

COLORS

LOAD DATA AND MAKE AN OVERALL HEATMAP

Now let's visualize the dataset and look for initial trends. We can do this by converting the measurement columns to a matrix and then drawing a heatmap of that matrix.

##      MCF10A_1 MCF10A_2 MCF7_1 MCF7_2 MDA231_1 MDA231_2 MDA468_1 MDA468_2
## [1,]     9.54     4.58   5.07   5.42    25.43    27.42     4.56     3.88
## [2,]    14.00    11.58   6.49   6.64     9.80    10.31     6.84     8.75
## [3,]    10.22     8.29  11.55  10.82    12.48    10.11     9.54     8.46
## [4,]     9.00     6.35  13.40  14.82    14.94    11.33     6.87     7.92
## [5,]     8.21     4.44  12.08   9.82    16.51    12.34    15.43     8.50
## [6,]    12.51    15.84   8.05   8.38     6.78     8.46     4.48     7.18
##      SKBR3_1 SKBR3_2
## [1,]    8.46    5.64
## [2,]   11.91   13.67
## [3,]    9.10    9.43
## [4,]    8.54    6.82
## [5,]    7.93    4.75
## [6,]   13.27   15.05
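Because the knitted output hides the code, here is a minimal sketch of how a matrix like the one printed above could be built and drawn as a heatmap. It uses the first three rows shown above as toy data. Only `NES` is a gene symbol confirmed by this document (its values match row 1); `gene_2` and `gene_3` are placeholder labels, and base R's `heatmap()` stands in for whichever heatmap package the lab actually uses.

```r
# Toy data: the first three rows of the matrix printed above.
# "gene_2" and "gene_3" are placeholder names, not real gene symbols.
toy <- data.frame(
  Gene_Symbol = c("NES", "gene_2", "gene_3"),
  MCF10A_1 = c(9.54, 14.00, 10.22),
  MCF10A_2 = c(4.58, 11.58, 8.29),
  MCF7_1   = c(5.07, 6.49, 11.55),
  MCF7_2   = c(5.42, 6.64, 10.82),
  MDA231_1 = c(25.43, 9.80, 12.48),
  MDA231_2 = c(27.42, 10.31, 10.11),
  MDA468_1 = c(4.56, 6.84, 9.54),
  MDA468_2 = c(3.88, 8.75, 8.46),
  SKBR3_1  = c(8.46, 11.91, 9.10),
  SKBR3_2  = c(5.64, 13.67, 9.43)
)

# Keep only the numeric sample columns and label rows by gene
mat <- as.matrix(toy[, -1])
rownames(mat) <- toy$Gene_Symbol

# scale = "row" centers and scales each gene across the samples,
# so the colors show relative (not absolute) abundance per gene
heatmap(mat, scale = "row", margins = c(8, 6))
```

Row scaling is a choice, not a requirement: it makes patterns within a gene easier to see, but hides differences in overall abundance between genes.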

Data Manipulation

##  [1] "Gene_Symbol"             "Description"            
##  [3] "Peptides"                "MCF10A_1"               
##  [5] "MCF10A_2"                "MCF7_1"                 
##  [7] "MCF7_2"                  "MDA231_1"               
##  [9] "MDA231_2"                "MDA468_1"               
## [11] "MDA468_2"                "SKBR3_1"                
## [13] "SKBR3_2"                 "pvalue_MCF7_vs_MCF10A"  
## [15] "pvalue_MDA231_vs_MCF10A" "pvalue_MDA468_vs_MCF10A"
## [17] "pvalue_SKBR3_vs_MCF10A"  "mean_control"           
## [19] "mean_MCF10A"             "mean_MCF7"              
## [21] "mean_MDA231"             "mean_MDA468"            
## [23] "mean_SKBR3"              "log_MCF10A"             
## [25] "log_MCF7"                "log_MDA231"             
## [27] "log_MDA468"              "log_SKBR3"
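The column names above suggest that each cell line gets a mean, a log value, and a p-value against MCF10A. The exact formulas are not shown in this output, so the sketch below is only one plausible reading: it assumes `mean_control` is the MCF10A mean, that the `log_` columns are log2 ratios against that control, and that the p-values come from a two-sample t-test on the replicates. `gene_2` and `gene_3` are placeholder labels.

```r
# Toy data: three rows and two cell lines from the matrix shown earlier
toy <- data.frame(
  Gene_Symbol = c("NES", "gene_2", "gene_3"),  # last two are placeholders
  MCF10A_1 = c(9.54, 14.00, 10.22),
  MCF10A_2 = c(4.58, 11.58, 8.29),
  MDA231_1 = c(25.43, 9.80, 12.48),
  MDA231_2 = c(27.42, 10.31, 10.11)
)

# Mean of the two replicates for each cell line
toy$mean_MCF10A <- rowMeans(toy[, c("MCF10A_1", "MCF10A_2")])
toy$mean_MDA231 <- rowMeans(toy[, c("MDA231_1", "MDA231_2")])

# Assumption: the control is the MCF10A cell line
toy$mean_control <- toy$mean_MCF10A

# Assumption: log columns are log2 fold changes relative to the control
toy$log_MDA231 <- log2(toy$mean_MDA231 / toy$mean_control)

# Assumption: p-values come from a Welch t-test on the replicates, row by row
toy$pvalue_MDA231_vs_MCF10A <- apply(
  toy[, c("MCF10A_1", "MCF10A_2", "MDA231_1", "MDA231_2")], 1,
  function(x) t.test(x[3:4], x[1:2])$p.value
)

print(toy[, c("Gene_Symbol", "mean_MCF10A", "mean_MDA231",
              "log_MDA231", "pvalue_MDA231_vs_MCF10A")])
```

With only two replicates per group, a t-test has very little power, so treat p-values from data like this as a rough guide.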

Diving deeper with VOLCANO PLOTS
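A volcano plot puts the log2 fold change on the x-axis and -log10(p-value) on the y-axis, so proteins that change a lot and change reliably end up in the top corners. The sketch below illustrates the idea with simulated values, not the lab data. The column names mirror the ones listed above, and the cutoffs (p < 0.05, absolute log2 fold change > 1) are assumptions you may want to adjust.

```r
# Simulated values for illustration only (NOT the lab dataset)
set.seed(1)
sim <- data.frame(
  log_MDA231 = rnorm(200),
  pvalue_MDA231_vs_MCF10A = runif(200)
)

# The y-axis of a volcano plot: small p-values become large numbers
sim$neg_log10_p <- -log10(sim$pvalue_MDA231_vs_MCF10A)

# Classify each point; both thresholds are assumed cutoffs
p_cut <- 0.05
fc_cut <- 1
sim$status <- "not significant"
sim$status[sim$pvalue_MDA231_vs_MCF10A < p_cut & sim$log_MDA231 > fc_cut] <- "up"
sim$status[sim$pvalue_MDA231_vs_MCF10A < p_cut & sim$log_MDA231 < -fc_cut] <- "down"

# Color points by class and draw the threshold lines
cols <- c("up" = "red", "down" = "blue", "not significant" = "grey60")
plot(sim$log_MDA231, sim$neg_log10_p,
     col = cols[sim$status], pch = 19,
     xlab = "log2 fold change (MDA231 vs MCF10A)",
     ylab = "-log10(p-value)")
abline(h = -log10(p_cut), v = c(-fc_cut, fc_cut), lty = 2)

table(sim$status)
```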

Barplots of significant points of interest

EXAMPLES OF A COUPLE OF PROTEINS OR GENES

## # A tibble: 6 × 5
##   Gene_Symbol Description                             Peptides variable value
##   <chr>       <chr>                                      <dbl> <chr>    <dbl>
## 1 NES         Nestin OS=Homo sapiens GN=NES PE=1 SV=2        7 MCF10A_1  9.54
## 2 NES         Nestin OS=Homo sapiens GN=NES PE=1 SV=2        7 MCF10A_2  4.58
## 3 NES         Nestin OS=Homo sapiens GN=NES PE=1 SV=2        7 MCF7_1    5.07
## 4 NES         Nestin OS=Homo sapiens GN=NES PE=1 SV=2        7 MCF7_2    5.42
## 5 NES         Nestin OS=Homo sapiens GN=NES PE=1 SV=2        7 MDA231_1 25.4 
## 6 NES         Nestin OS=Homo sapiens GN=NES PE=1 SV=2        7 MDA231_2 27.4
## # A tibble: 6 × 5
##   Gene_Symbol Description                                Peptides variable value
##   <chr>       <chr>                                         <dbl> <chr>    <dbl>
## 1 HLA-A       HLA class I histocompatibility antigen, A…        3 MCF10A_1  5.42
## 2 HLA-A       HLA class I histocompatibility antigen, A…        3 MCF10A_2  6.62
## 3 HLA-A       HLA class I histocompatibility antigen, A…        3 MCF7_1    2.22
## 4 HLA-A       HLA class I histocompatibility antigen, A…        3 MCF7_2    2.69
## 5 HLA-A       HLA class I histocompatibility antigen, A…        3 MDA231_1  4.56
## 6 HLA-A       HLA class I histocompatibility antigen, A…        3 MDA231_2  4.23
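The tables above are in "long" format: one row per gene and sample, with the sample name in `variable` and the measurement in `value`. As a sketch of how one gene could be reshaped this way and plotted, the example below uses the NES values printed earlier. It assumes the `tidyr` package is installed; the lab's own code may use a different reshaping function.

```r
library(tidyr)

# One gene in wide format, using the NES values shown in this document
nes_wide <- data.frame(
  Gene_Symbol = "NES",
  Description = "Nestin OS=Homo sapiens GN=NES PE=1 SV=2",
  Peptides = 7,
  MCF10A_1 = 9.54, MCF10A_2 = 4.58,
  MCF7_1 = 5.07, MCF7_2 = 5.42,
  MDA231_1 = 25.43, MDA231_2 = 27.42
)

# Reshape: each sample column becomes one row with a name and a value
nes_long <- pivot_longer(
  nes_wide,
  cols = MCF10A_1:MDA231_2,
  names_to = "variable",
  values_to = "value"
)
print(nes_long)

# One bar per replicate, so replicate agreement is visible at a glance
barplot(nes_long$value, names.arg = nes_long$variable,
        las = 2, ylab = "value", main = "NES")
```

Plotting each replicate as its own bar, rather than only the group mean, makes it easier to answer the questions about how similar the reps are.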

##INTERPRETATION##

## What can you see in this figure? Are the repeated measures/reps similar or different? What does this say about their precision and accuracy?

## How does the control compare to the variables? Is this what you might expect? Why? What would you look for in the literature to support this idea?

## # A tibble: 6 × 5
##   Gene_Symbol Description                                Peptides variable value
##   <chr>       <chr>                                         <dbl> <chr>    <dbl>
## 1 TDP2        Isoform 2 of Tyrosyl-DNA phosphodiesteras…        3 MCF10A_1  2.36
## 2 TDP2        Isoform 2 of Tyrosyl-DNA phosphodiesteras…        3 MCF10A_2  1.72
## 3 TDP2        Isoform 2 of Tyrosyl-DNA phosphodiesteras…        3 MCF7_1    5.89
## 4 TDP2        Isoform 2 of Tyrosyl-DNA phosphodiesteras…        3 MCF7_2    4.71
## 5 TDP2        Isoform 2 of Tyrosyl-DNA phosphodiesteras…        3 MDA231_1  5.94
## 6 TDP2        Isoform 2 of Tyrosyl-DNA phosphodiesteras…        3 MDA231_2  7.38

##INTERPRETATION##

## What can you see in this figure? Are the repeated measures/reps similar or different? What does this say about their precision and accuracy?

## How does the control compare to the variables? Is this what you might expect? Why? What would you look for in the literature to support this idea?

#interpretation HINT: insert a chunk and write two separate lines of code, one that filters for your upregulated genes/proteins of interest and one that filters for your downregulated ones, each selecting only the gene symbols and descriptions. This will generate two lists of descriptors for your genes of interest, helping you understand your figures. Be sure to view each full table, not just the head of it.
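As a rough sketch of what the hint describes, the chunk below filters a small table for chosen gene symbols and keeps only the symbol and description columns. It assumes the `dplyr` package is installed. The table `dat` and the gene lists `up_genes` and `down_genes` are hypothetical stand-ins: replace them with your own data frame and the genes you identified from your volcano plot. The truncated descriptions are copied as they appear in the output above.

```r
library(dplyr)

# Hypothetical stand-in for the full lab table
dat <- data.frame(
  Gene_Symbol = c("NES", "HLA-A", "TDP2"),
  Description = c("Nestin OS=Homo sapiens GN=NES PE=1 SV=2",
                  "HLA class I histocompatibility antigen, A…",
                  "Isoform 2 of Tyrosyl-DNA phosphodiesteras…"),
  Peptides = c(7, 3, 3)
)

# Hypothetical picks; choose your own from your figures
up_genes <- c("NES", "TDP2")
down_genes <- c("HLA-A")

# One line per list: filter the rows, then select the two columns
up_info <- dat %>% filter(Gene_Symbol %in% up_genes) %>% select(Gene_Symbol, Description)
down_info <- dat %>% filter(Gene_Symbol %in% down_genes) %>% select(Gene_Symbol, Description)

# Print the whole table rather than head(); in RStudio, View(up_info) also works
print(up_info)
print(down_info)
```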

WRAP UP

##Now you can knit this and publish it to save and share your code. Use it with either the brain or breast cells and the Part_C_template to complete your lab 6 ELN.

##Annotate where you have trouble, and note which line of code you need help with.

##Good luck and have fun!