This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.

#Your Task Instructions

Work through the code chunks in the following notebook to read in and analyse your processed data on gene expression changes in response to stress in potato. There are 16 questions in this notebook. Follow the instructions for each code chunk, adding code where specified and using the exact variable names supplied. Note that all variable names are in lowercase letters. Load the libraries as instructed below to get started, and complete all tasks without loading any additional libraries.

#IMPORTANT

1) For some of the questions, you will see ottr::check cells. Do NOT comment out or modify those cells, because they will be used for marking your work. You will not be able to run the ottr::check cells, so please ignore them.
2) If you have used any commands such as View() that open a graphical output in a new tab, please COMMENT OUT these lines of code BEFORE submitting your work.
3) If you have used rm(list=ls()) in your code, please comment this line out before submitting your work.
4) Please check that all your code chunks run successfully before submitting your work.

#LOADING LIBRARIES TO GET STARTED

First, run the following chunk to load the necessary libraries (no other libraries will be required).

```r
library(testthat)
library(assertthat)
library(stringr)
library(readr)
```

<!-- rnb-source-end -->

<!-- rnb-output-begin eyJkYXRhIjoiXG5BdHRhY2hpbmcgcGFja2FnZTog4oCYcmVhZHLigJlcblxuVGhlIGZvbGxvd2luZyBvYmplY3RzIGFyZSBtYXNrZWQgZnJvbSDigJhwYWNrYWdlOnRlc3R0aGF04oCZOlxuXG4gICAgZWRpdGlvbl9nZXQsIGxvY2FsX2VkaXRpb25cbiJ9 -->

Attaching package: ‘readr’

The following objects are masked from ‘package:testthat’:

edition_get, local_edition



<!-- rnb-output-end -->

<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxubGlicmFyeShkcGx5cilcbmBgYFxuYGBgIn0= -->

```r
library(dplyr)
```

<!-- rnb-source-end -->

<!-- rnb-output-begin eyJkYXRhIjoiXG5BdHRhY2hpbmcgcGFja2FnZTog4oCYZHBseXLigJlcblxuVGhlIGZvbGxvd2luZyBvYmplY3QgaXMgbWFza2VkIGZyb20g4oCYcGFja2FnZTp0ZXN0dGhhdOKAmTpcblxuICAgIG1hdGNoZXNcblxuVGhlIGZvbGxvd2luZyBvYmplY3RzIGFyZSBtYXNrZWQgZnJvbSDigJhwYWNrYWdlOnN0YXRz4oCZOlxuXG4gICAgZmlsdGVyLCBsYWdcblxuVGhlIGZvbGxvd2luZyBvYmplY3RzIGFyZSBtYXNrZWQgZnJvbSDigJhwYWNrYWdlOmJhc2XigJk6XG5cbiAgICBpbnRlcnNlY3QsIHNldGRpZmYsIHNldGVxdWFsLCB1bmlvblxuIn0= -->

Attaching package: ‘dplyr’

The following object is masked from ‘package:testthat’:

matches

The following objects are masked from ‘package:stats’:

filter, lag

The following objects are masked from ‘package:base’:

intersect, setdiff, setequal, union



<!-- rnb-output-end -->

<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxubGlicmFyeShnZ3Bsb3QyKVxubGlicmFyeSh0aWR5cilcbmBgYFxuYGBgIn0= -->

```r
library(ggplot2)
library(tidyr)
```

<!-- rnb-source-end -->

<!-- rnb-output-begin eyJkYXRhIjoiXG5BdHRhY2hpbmcgcGFja2FnZTog4oCYdGlkeXLigJlcblxuVGhlIGZvbGxvd2luZyBvYmplY3QgaXMgbWFza2VkIGZyb20g4oCYcGFja2FnZTp0ZXN0dGhhdOKAmTpcblxuICAgIG1hdGNoZXNcbiJ9 -->

Attaching package: ‘tidyr’

The following object is masked from ‘package:testthat’:

matches



<!-- rnb-output-end -->

<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxubGlicmFyeShnZ3Zlbm4pXG5gYGBcbmBgYCJ9 -->

```r
library(ggvenn)
```

<!-- rnb-source-end -->

<!-- rnb-output-begin eyJkYXRhIjoiRXJyb3IgaW4gbGlicmFyeShnZ3Zlbm4pIDogdGhlcmUgaXMgbm8gcGFja2FnZSBjYWxsZWQg4oCYZ2d2ZW5u4oCZXG4ifQ== -->

Error in library(ggvenn) : there is no package called ‘ggvenn’




<!-- rnb-output-end -->

<!-- rnb-chunk-end -->


<!-- rnb-text-begin -->


#Q1) READING IN THE FULL GENE EXPRESSION DATASET

Add code to the chunk below that does the following:
* Reads in the data file you created using Python called `all_VarX_TwoTimePoints.csv` and assigns it to a data frame called `var_x_all`
* Reads in the data file you created using Python called `all_VarY_TwoTimePoints.csv` and assigns it to a data frame called `var_y_all`
# 4 marks / 30 (total 4 so far).


<!-- rnb-text-end -->


<!-- rnb-chunk-begin -->


<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuIyMjIyMjIyMjIyMjIyMjIyMjIEFERCBZT1VSIENPREUgVU5ERVIgVEhJUyBMSU5FICMjIyMjIyNcbmdldHdkKClcbmBgYFxuYGBgIn0= -->

```r
################## ADD YOUR CODE UNDER THIS LINE #######
getwd()
```

<!-- rnb-source-end -->

<!-- rnb-output-begin eyJkYXRhIjoiWzFdIFxcL3Jkcy9ob21lcy9tL21pYTIwNC9Vbml0OFxcXG4ifQ== -->

[1] "/rds/homes/m/mia204/Unit8"




<!-- rnb-output-end -->

<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc2V0d2QoXFwvcmRzL2hvbWVzL20vbWlhMjA0L1VuaXQ4L01hdHJpeFxcKVxuYGBgXG5gYGAifQ== -->

```r
setwd("/rds/homes/m/mia204/Unit8/Matrix")
```

<!-- rnb-source-end -->

<!-- rnb-output-begin eyJkYXRhIjoiV2FybmluZzogVGhlIHdvcmtpbmcgZGlyZWN0b3J5IHdhcyBjaGFuZ2VkIHRvIC9yZHMvaG9tZXMvbS9taWEyMDQvVW5pdDgvTWF0cml4IGluc2lkZSBhIG5vdGVib29rIGNodW5rLiBUaGUgd29ya2luZyBkaXJlY3Rvcnkgd2lsbCBiZSByZXNldCB3aGVuIHRoZSBjaHVuayBpcyBmaW5pc2hlZCBydW5uaW5nLiBVc2UgdGhlIGtuaXRyIHJvb3QuZGlyIG9wdGlvbiBpbiB0aGUgc2V0dXAgY2h1bmsgdG8gY2hhbmdlIHRoZSB3b3JraW5nIGRpcmVjdG9yeSBmb3Igbm90ZWJvb2sgY2h1bmtzLlxuIn0= -->

Warning: The working directory was changed to /rds/homes/m/mia204/Unit8/Matrix inside a notebook chunk. The working directory will be reset when the chunk is finished running. Use the knitr root.dir option in the setup chunk to change the working directory for notebook chunks.




<!-- rnb-output-end -->

<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxubGlzdC5maWxlcygpXG5gYGBcbmBgYCJ9 -->

```r
list.files()
```

<!-- rnb-source-end -->

<!-- rnb-output-begin eyJkYXRhIjoiIFsxXSBcXGFsbF9WYXJYX1R3b1RpbWVQb2ludHMuY3N2XFwgXFxhbGxfVmFyWV9Ud29UaW1lUG9pbnRzLmNzdlxcXG4gWzNdIFxcZmluYWxYX2ZpbGUuY3N2XFwgICAgICAgICAgICBcXGZpbmFsWV9maWxlLmNzdlxcICAgICAgICAgICBcbiBbNV0gXFxMZWFmX0RFR3NfVmFyWF9UMS5jc3ZcXCAgICAgIFxcTGVhZl9ERUdzX1ZhclguY3N2XFwgICAgICAgIFxuIFs3XSBcXExlYWZfREVHc19WYXJZX1QxLmNzdlxcICAgICAgXFxMZWFmX0RFR3NfVmFyWS5jc3ZcXCAgICAgICAgXG4gWzldIFxcTGludXhfTUEuc2hcXCAgICAgICAgICAgICAgICBcXG5ld19WYXJYX0hlYWRlci5jc3ZcXCAgICAgICBcblsxMV0gXFxuZXdfVmFyWV9IZWFkZXIuY3N2XFwgICAgICAgIFxcUHl0aG9uX01JICgxKS5pcHluYlxcICAgICAgIFxuWzEzXSBcXFB5dGhvbl9NSSAoMikuaXB5bmJcXCAgICAgICAgXFxQeXRob25fTUkuaXB5bmJcXCAgICAgICAgICAgXG5bMTVdIFxcU3RlcF8yLmlweW5iXFwgICAgICAgICAgICAgICBcXFVudGl0bGVkLmlweW5iXFwgICAgICAgICAgICBcblsxN10gXFxVbnRpdGxlZDEuaXB5bmJcXCAgICAgICAgICAgIFxcVmFyWFxcICAgICAgICAgICAgICAgICAgICAgIFxuWzE5XSBcXFZhclhfSGVhZGVyLmNzdlxcICAgICAgICAgICAgXFxWYXJYX291dHB1dC5jc3ZcXCAgICAgICAgICAgXG5bMjFdIFxcVmFyWF9zb3J0LmNzdlxcICAgICAgICAgICAgICBcXFZhcllcXCAgICAgICAgICAgICAgICAgICAgICBcblsyM10gXFxWYXJZX0hlYWRlci5jc3ZcXCAgICAgICAgICAgIFxcVmFyWV9sYXN0b3V0cHV0LmNzdlxcICAgICAgIFxuWzI1XSBcXFZhcllfc29ydC5jc3ZcXCAgICAgICAgICAgICBcbiJ9 -->

 [1] "all_VarX_TwoTimePoints.csv" "all_VarY_TwoTimePoints.csv"
 [3] "finalX_file.csv"            "finalY_file.csv"
 [5] "Leaf_DEGs_VarX_T1.csv"      "Leaf_DEGs_VarX.csv"
 [7] "Leaf_DEGs_VarY_T1.csv"      "Leaf_DEGs_VarY.csv"
 [9] "Linux_MA.sh"                "new_VarX_Header.csv"
[11] "new_VarY_Header.csv"        "Python_MI (1).ipynb"
[13] "Python_MI (2).ipynb"        "Python_MI.ipynb"
[15] "Step_2.ipynb"               "Untitled.ipynb"
[17] "Untitled1.ipynb"            "VarX"
[19] "VarX_Header.csv"            "VarX_output.csv"
[21] "VarX_sort.csv"              "VarY"
[23] "VarY_Header.csv"            "VarY_lastoutput.csv"
[25] "VarY_sort.csv"




<!-- rnb-output-end -->

<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxudmFyX3hfYWxsIDwtIHJlYWQuY3N2KFxcYWxsX1ZhclhfVHdvVGltZVBvaW50cy5jc3ZcXClcbmhlYWQodmFyX3hfYWxsKVxuYGBgXG5gYGByXG52YXJfeV9hbGwgPC0gcmVhZC5jc3YoXFxhbGxfVmFyWV9Ud29UaW1lUG9pbnRzLmNzdlxcKVxuaGVhZCh2YXJfeV9hbGwpXG5gYGBcbmBgYCJ9 -->

```r
var_x_all <- read.csv("all_VarX_TwoTimePoints.csv")
head(var_x_all)
var_y_all <- read.csv("all_VarY_TwoTimePoints.csv")
head(var_y_all)
```

<!-- rnb-source-end -->

<!-- rnb-frame-begin eyJtZXRhZGF0YSI6eyJjbGFzc2VzIjoiZGF0YS5mcmFtZSIsIm5yb3ciOjYsIm5jb2wiOjcsInN1bW1hcnkiOnsiRGVzY3JpcHRpb24iOiJkZiBbNiDDlyA3XSJ9fSwicmRmIjoiVWtSWU13cFlDZ0FBQUFNQUJBSUFBQU1GQUFBQUFBVlZWRVl0T0FBQUJBSUFBQUFCQUFRQUNRQUFBQUY0QUFBREV3QUFBQWNBQUFBUUFBQUFCZ0FFQUFrQUFBQVNVMjlzZEhVdVJFMHVNREZITURBd01ERXdBQVFBQ1FBQUFCSlRiMngwZFM1RVRTNHdNVWN3TURBd01qQUFCQUFKQUFBQUVsTnZiSFIxTGtSTkxqQXhSekF3TURBek1BQUVBQWtBQUFBU1UyOXNkSFV1UkUwdU1ERkhNREF3TURRd0FBUUFDUUFBQUJKVGIyeDBkUzVFVFM0d01VY3dNREF3TlRBQUJBQUpBQUFBRWxOdmJIUjFMa1JOTGpBeFJ6QXdNREEyTUFBQUFBNEFBQUFHUUhhTE0ybnZoK05BVnNmelNBRDFua0I3Y3JDczltMndRSERFOXFNcVBCUkFkVnQwRTMyTFJrQjlSRytvNm05MUFBQUFEZ0FBQUFaQWRwbDFCV1dhajBCZzRmRTR2Ti93UUlJY1hvbW9pczVBY3RrR2h3SytJRUIwSkpwamMyemZRSUFVSXEvZHFMMEFBQUFPQUFBQUJrQjAwVm1haXpFaFFGc3RKS05OZDRKQWNqUmsyWjVzWjBCdG5ZQ0thWHJ1UUhWdGNKUlNNZnhBZ1NqUHY2dDhHZ0FBQUE0QUFBQUdRSGl6cGVEcWtRTkFVVk9MWnliVnJVQm04aGw2eFpqOFFHY3VDaEQ2K2dkQWZmaExFejZoSlVDS09VR3cyaWYvQUFBQURnQUFBQVpBZUMxRC9DamE2MEJicHdpYUFuVWxRR1oxUlRmaXhWMUFiYU40OUhNRUJFQjRaKzU4dFYxRVFJSGsxcmN4V2hVQUFBQU9BQUFBQmtCM25Cc1dPNnVvUUV4Q25Jc1FsTTVBY0szTFlLRXFpRUJveGJ0emM1VWpRSHFQbm5qOGF5QkFncmFrQTlCdkdRQUFCQUlBQUFBQkFBUUFDUUFBQUFWdVlXMWxjd0FBQUJBQUFBQUhBQVFBQ1FBQUFBbG5aVzVsWDI1aGJXVUFCQUFKQUFBQUNsWmhjbGhEVW1Wd0xqRUFCQUFKQUFBQUNsWmhjbGhEVW1Wd0xqSUFCQUFKQUFBQUNsWmhjbGhEVW1Wd0xqTUFCQUFKQUFBQUNsWmhjbGd4VW1Wd0xqRUFCQUFKQUFBQUNsWmhjbGd4VW1Wd0xqSUFCQUFKQUFBQUNsWmhjbGd4VW1Wd0xqTUFBQVFDQUFBQUFRQUVBQWtBQUFBSmNtOTNMbTVoYldWekFBQUFEUUFBQUFLQUFBQUFBQUFBQmdBQUJBSUFBQUFCQUFRQUNRQUFBQVZqYkdGemN3QUFBQkFBQUFBQkFBUUFDUUFBQUFwa1lYUmhMbVp5WVcxbEFBQUEvZ0FBQkFJQUFBQUJBQVFBQ1FBQUFBZHZjSFJwYjI1ekFBQUNFd0FBQUFVQUFBQVFBQUFBQVFBRUFBa0FBQUFCY2dBQUFCQUFBQUFCQUFRQUNRQUFBQTkxYm01aGJXVmtMV05vZFc1ckxUSUFBQUFLQUFBQUFRQUFBQUVBQUFBTkFBQUFBUUFBQUFZQUFBQU5BQUFBQVFBQUFBY0FBQVFDQUFBQy93QUFBQkFBQUFBRkFBUUFDUUFBQUFabGJtZHBibVVBQkFBSkFBQUFCV3hoWW1Wc0FBUUFDUUFBQUE1eWIzZHVZVzFsY3k1d2NtbHVkQUFFQUFrQUFBQUtjbTkzY3k1MGIzUmhiQUFFQUFrQUFBQUtZMjlzY3k1MGIzUmhiQUFBQVA0QUFBRCsifQ== -->

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":[""],"name":["_rn_"],"type":[""],"align":["left"]},{"label":["gene_name"],"name":[1],"type":["chr"],"align":["left"]},{"label":["VarXCRep.1"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["VarXCRep.2"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["VarXCRep.3"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["VarX1Rep.1"],"name":[5],"type":["dbl"],"align":["right"]},{"label":["VarX1Rep.2"],"name":[6],"type":["dbl"],"align":["right"]},{"label":["VarX1Rep.3"],"name":[7],"type":["dbl"],"align":["right"]}],"data":[{"1":"Soltu.DM.01G000010","2":"360.70005","3":"361.5911","4":"333.0844","5":"395.22800","6":"386.8291","7":"377.7566","_rn_":"1"},{"1":"Soltu.DM.01G000020","2":"91.12422","3":"135.0607","4":"108.7054","5":"69.30538","6":"110.6099","7":"56.5204","_rn_":"2"},{"1":"Soltu.DM.01G000030","2":"439.16813","3":"579.5462","4":"291.2746","5":"183.56561","6":"179.6647","7":"266.8622","_rn_":"3"},{"1":"Soltu.DM.01G000040","2":"268.31021","3":"301.5641","4":"236.9219","5":"185.43873","6":"237.1085","7":"198.1791","_rn_":"4"},{"1":"Soltu.DM.01G000050","2":"341.71584","3":"322.2877","4":"342.8400","5":"479.51833","6":"390.4957","7":"424.9762","_rn_":"5"},{"1":"Soltu.DM.01G000060","2":"468.27726","3":"514.5169","4":"549.1014","5":"839.15708","6":"572.6048","7":"598.8301","_rn_":"6"}],"options":{"columns":{"min":{},"max":[10],"total":[7]},"rows":{"min":[10],"max":[10],"total":[6]},"pages":{}}}
  </script>
</div>

<!-- rnb-frame-end -->

<!-- rnb-frame-begin eyJtZXRhZGF0YSI6eyJjbGFzc2VzIjoiZGF0YS5mcmFtZSIsIm5yb3ciOjYsIm5jb2wiOjcsInN1bW1hcnkiOnsiRGVzY3JpcHRpb24iOiJkZiBbNiDDlyA3XSJ9fSwicmRmIjoiVWtSWU13cFlDZ0FBQUFNQUJBSUFBQU1GQUFBQUFBVlZWRVl0T0FBQUJBSUFBQUFCQUFRQUNRQUFBQUY0QUFBREV3QUFBQWNBQUFBUUFBQUFCZ0FFQUFrQUFBQVNVMjlzZEhVdVJFMHVNREZITURBd01ERXdBQVFBQ1FBQUFCSlRiMngwZFM1RVRTNHdNVWN3TURBd01qQUFCQUFKQUFBQUVsTnZiSFIxTGtSTkxqQXhSekF3TURBek1BQUVBQWtBQUFBU1UyOXNkSFV1UkUwdU1ERkhNREF3TURRd0FBUUFDUUFBQUJKVGIyeDBkUzVFVFM0d01VY3dNREF3TlRBQUJBQUpBQUFBRWxOdmJIUjFMa1JOTGpBeFJ6QXdNREEyTUFBQUFBNEFBQUFHUUdjaFJMazN4SkpBWk1VT3NKenJYa0J3SU1hTyt0b21RR2c3Tzg0TVorMUFlemlSVURuVU4wQ0dPYWQ2SDREUEFBQUFEZ0FBQUFaQWJNeWFnOW9VdEVCcEZzSko4YnVhUUhEZmlCTXBxSEJBYlNjUW9ka0RQRUI3dlRncUxkRDBRSVNyL2RSMjVUb0FBQUFPQUFBQUJrQnVvekhjR3g1cFFHanlRU3JZR3Q5QWMrdEtiRFRjZ1VCc1hHc3U3Vk5WUUhsVFlrZStCTE5BZzhiZUFXVmF6Z0FBQUE0QUFBQUdRRzg3TGJBOGMwNUFhaW1JSGFZRE5FQmxRYjZZSVBPV1FHMXV1U0dKdllSQWZQRWsrbkIwQmtDSFRQMDZleVlSQUFBQURnQUFBQVpBYmFoWWlGZFI5a0JsUXJTcm80ZFpRSEx3SDl1ZFNpaEFjMkozallXUElVQjVBeTdvTC8raVFJSUxjSGYxQkNNQUFBQU9BQUFBQmtCeE9PNUVNTEY1UUc5c1hXWU1wRk5BYzY3SEtWU05JRUJyYzhUdmlMbDRRSG5QM3dHQ3R2OUFoT1F0TnlBbnZnQUFCQUlBQUFBQkFBUUFDUUFBQUFWdVlXMWxjd0FBQUJBQUFBQUhBQVFBQ1FBQUFBbG5aVzVsWDI1aGJXVUFCQUFKQUFBQUNsWmhjbGxEVW1Wd0xqRUFCQUFKQUFBQUNsWmhjbGxEVW1Wd0xqSUFCQUFKQUFBQUNsWmhjbGxEVW1Wd0xqTUFCQUFKQUFBQUNsWmhjbGt4VW1Wd0xqRUFCQUFKQUFBQUNsWmhjbGt4VW1Wd0xqSUFCQUFKQUFBQUNsWmhjbGt4VW1Wd0xqTUFBQVFDQUFBQUFRQUVBQWtBQUFBSmNtOTNMbTVoYldWekFBQUFEUUFBQUFLQUFBQUFBQUFBQmdBQUJBSUFBQUFCQUFRQUNRQUFBQVZqYkdGemN3QUFBQkFBQUFBQkFBUUFDUUFBQUFwa1lYUmhMbVp5WVcxbEFBQUEvZ0FBQkFJQUFBQUJBQVFBQ1FBQUFBZHZjSFJwYjI1ekFBQUNFd0FBQUFVQUFBQVFBQUFBQVFBRUFBa0FBQUFCY2dBQUFCQUFBQUFCQUFRQUNRQUFBQTkxYm01aGJXVmtMV05vZFc1ckxUSUFBQUFLQUFBQUFRQUFBQUVBQUFBTkFBQUFBUUFBQUFZQUFBQU5BQUFBQVFBQUFBY0FBQVFDQUFBQy93QUFBQkFBQUFBRkFBUUFDUUFBQUFabGJtZHBibVVBQkFBSkFBQUFCV3hoWW1Wc0FBUUFDUUFBQUE1eWIzZHVZVzFsY3k1d2NtbHVkQUFFQUFrQUFBQUtjbTkzY3k1MGIzUmhiQUFFQUFrQUFBQUtZMjlzY3k1MGIzUmhiQUFBQVA0QUFBRCsifQ== -->

<div data-pagedtable="false">
  <script data-pagedtable-source type="application/json">
{"columns":[{"label":[""],"name":["_rn_"],"type":[""],"align":["left"]},{"label":["gene_name"],"name":[1],"type":["chr"],"align":["left"]},{"label":["VarYCRep.1"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["VarYCRep.2"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["VarYCRep.3"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["VarY1Rep.1"],"name":[5],"type":["dbl"],"align":["right"]},{"label":["VarY1Rep.2"],"name":[6],"type":["dbl"],"align":["right"]},{"label":["VarY1Rep.3"],"name":[7],"type":["dbl"],"align":["right"]}],"data":[{"1":"Soltu.DM.01G000010","2":"185.0396","3":"230.3939","4":"245.0998","5":"249.8493","6":"237.2608","7":"275.5582","_rn_":"1"},{"1":"Soltu.DM.01G000020","2":"166.1580","3":"200.7112","4":"199.5705","5":"209.2979","6":"170.0846","7":"251.3864","_rn_":"2"},{"1":"Soltu.DM.01G000030","2":"258.0485","3":"269.9707","4":"318.7057","5":"170.0545","6":"303.0078","7":"314.9236","_rn_":"3"},{"1":"Soltu.DM.01G000040","2":"193.8511","3":"233.2208","4":"226.8881","5":"235.4601","6":"310.1542","7":"219.6178","_rn_":"4"},{"1":"Soltu.DM.01G000050","2":"435.5355","3":"443.8262","4":"405.2115","5":"463.0715","6":"400.1990","7":"412.9919","_rn_":"5"},{"1":"Soltu.DM.01G000060","2":"711.2068","3":"661.4989","4":"632.8584","5":"745.6236","6":"577.4299","7":"668.5221","_rn_":"6"}],"options":{"columns":{"min":{},"max":[10],"total":[7]},"rows":{"min":[10],"max":[10],"total":[6]},"pages":{}}}
  </script>
</div>

<!-- rnb-frame-end -->

<!-- rnb-chunk-end -->


<!-- rnb-text-begin -->



<!-- rnb-text-end -->


<!-- rnb-chunk-begin -->


<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuYGBgclxuLiA9IG90dHI6OmNoZWNrKFxcdGVzdHMvcTEuUlxcKVxuYGBgXG5gYGBcbmBgYCJ9 -->

```r
. = ottr::check("tests/q1.R")
```

[1] 28378

```r
nrow(var_y_all)
```

<!-- rnb-source-end -->

<!-- rnb-output-begin eyJkYXRhIjoiWzFdIDI4Mzc4XG4ifQ== -->

[1] 28378




<!-- rnb-output-end -->

<!-- rnb-chunk-end -->


<!-- rnb-text-begin -->


#Q2) HOW MANY GENES ARE IN THE WHOLE DATASET?

Add code to the chunk below that does the following:
* Find out how many genes are in your dataset and assign the result to a variable called `num_genes`.
# 1 mark / 30 (total 5 so far).


<!-- rnb-text-end -->


<!-- rnb-chunk-begin -->


<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuYGBgclxuLiA9IG90dHI6OmNoZWNrKFxcdGVzdHMvcTIuUlxcKVxuYGBgXG5gYGBcbmBgYCJ9 -->

```r
. = ottr::check("tests/q2.R")
```

```r
################## ADD YOUR CODE UNDER THIS LINE #######
```
  

#Q3) READING IN THE DATA ON DIFFERENTIALLY EXPRESSED GENES (DEGs)

Add code to the chunk below that does the following:
* Reads in the data file you created using Python called `Leaf_DEGs_VarX.csv` and assigns it to a data frame called `var_x_degs`
* Reads in the data file you created using Python called `Leaf_DEGs_VarY.csv` and assigns it to a data frame called `var_y_degs`
# 4 marks / 30 (total 9 so far).

################## ADD YOUR CODE UNDER THIS LINE #######

getwd()
setwd("/rds/homes/m/mia204/Unit8/Matrix")
list.files()
var_x_degs <- read.csv("Leaf_DEGs_VarX.csv")
head(var_x_degs)
var_y_degs <- read.csv("Leaf_DEGs_VarY.csv")
head(var_y_degs)
. = ottr::check("tests/q3.R")

#Q4) INVESTIGATE THE DISTRIBUTION OF EXPRESSION VALUES FOR ALL GENES IN EACH SAMPLE (Variety X).

First, we have to recognise that our data is currently in WIDE FORMAT, with a column for each variable (in this case, each sample). However, as we have seen in previous practicals, it is much easier to work with data in LONG FORMAT, with one column for the variable type and one column for the values.

Run the following cell to use the tidyr `pivot_longer()` function to transform your `var_x_all` data frame into long format.

var_x_all.long <- pivot_longer(var_x_all,cols=VarXCRep.1:VarX1Rep.3,names_to = "sample", values_to = "expression")
#View(var_x_all.long)
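
As a minimal sketch of what this reshaping does, the toy example below applies `pivot_longer()` to a small made-up data frame. The gene names and expression values are hypothetical and only three sample columns are used; the point is simply that every (gene, sample) pair becomes its own row.

```r
library(tidyr)

# Toy wide-format data: one row per gene, one column per sample.
# All values here are invented for illustration.
toy_wide <- data.frame(
  gene_name  = c("geneA", "geneB"),
  VarXCRep.1 = c(10.5, 200.0),
  VarXCRep.2 = c(12.1, 180.3),
  VarX1Rep.1 = c(30.2, 90.7)
)

# Stack the three sample columns into a name column and a value column.
toy_long <- pivot_longer(
  toy_wide,
  cols      = VarXCRep.1:VarX1Rep.1,
  names_to  = "sample",
  values_to = "expression"
)

dim(toy_long)    # rows and columns after reshaping
names(toy_long)  # gene_name, sample, expression
```

With 2 genes and 3 sample columns, the long data frame has one row for each of the 6 gene and sample combinations.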

Now you have your data in long format. Add code to the chunk below that does the following:
* Create a suitable plot to look at the distribution of expression values for all the genes as a function of the sample, for Variety X.
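
One possible way to compare distributions across samples is a box plot per sample on a log-scaled axis. The sketch below uses simulated values rather than the real dataset, and the choice of box plot and log10 scale is an assumption; a violin or density plot would serve the same purpose.

```r
library(ggplot2)

# Simulated long-format data: two hypothetical samples, 50 values each.
set.seed(1)
toy_long <- data.frame(
  sample     = rep(c("VarXCRep.1", "VarX1Rep.1"), each = 50),
  expression = rlnorm(100, meanlog = 5, sdlog = 1)
)

# One box per sample; the log10 axis stops a few very large values
# from compressing the rest of the distribution.
p <- ggplot(toy_long, aes(x = sample, y = expression)) +
  geom_boxplot() +
  scale_y_log10() +
  labs(x = "Sample", y = "Expression (log10 scale)")

print(p)
```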

################## ADD YOUR CODE UNDER THIS LINE #######
  

#Q5) INVESTIGATE THE DISTRIBUTION OF EXPRESSION VALUES FOR ALL GENES IN EACH SAMPLE (Variety Y).

Now you can repeat the above process for Variety Y. Add code to the chunk below that does the following:
* Use the tidyr `pivot_longer()` function to transform your `var_y_all` data frame into long format and call the resulting data frame `var_y_all.long`.
* Create a suitable plot to look at the distribution of expression values for all the genes as a function of the sample, for Variety Y.

################## ADD YOUR CODE UNDER THIS LINE #######
var_y_all.long <- pivot_longer(var_y_all,cols=VarYCRep.1:VarY1Rep.3,names_to = "sample", values_to = "expression")
#View(var_y_all.long)

#Q6) INVESTIGATE THE DISTRIBUTION OF EXPRESSION VALUES FOR THE DEGs IN EACH SAMPLE (Variety X).

Add code to the chunk below that does the following:
* Use the tidyr `pivot_longer()` function to transform your `var_x_degs` data frame into long format and call the resulting data frame `var_x_degs.long`.
* Create a suitable plot to look at the distribution of expression values for DEGs as a function of the sample, for Variety X.

################## ADD YOUR CODE UNDER THIS LINE #######

#Q7) INVESTIGATE THE DISTRIBUTION OF EXPRESSION VALUES FOR THE DEGs IN EACH SAMPLE (Variety Y).

Add code to the chunk below that does the following:
* Use the tidyr `pivot_longer()` function to transform your `var_y_degs` data frame into long format and call the resulting data frame `var_y_degs.long`.
* Create a suitable plot to look at the distribution of expression values for DEGs as a function of the sample, for Variety Y.

################## ADD YOUR CODE UNDER THIS LINE #######

#Q8) HOW MANY DIFFERENTIALLY EXPRESSED GENES ARE THERE IN EACH VARIETY?

Add code to the chunk below that does the following:
* Find out how many duplicate Soltu gene names there are in the `var_x_degs` data frame and assign the result to a variable called `var_x_dup`
* Find out how many duplicate Soltu gene names there are in the `var_y_degs` data frame and assign the result to a variable called `var_y_dup`
# 2 marks / 30 (total 11 so far).
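
As a hedged illustration of counting repeated identifiers, the sketch below uses base R `duplicated()` on a made-up vector of gene names. The vector is hypothetical; `duplicated()` flags the second and later occurrences of each value, so summing the flags counts the extra copies.

```r
# Hypothetical gene identifiers; the first one appears three times.
toy_genes <- c(
  "Soltu.DM.01G000010", "Soltu.DM.01G000020", "Soltu.DM.01G000010",
  "Soltu.DM.01G000030", "Soltu.DM.01G000010"
)

# TRUE for every repeat after the first occurrence of a name.
n_dup <- sum(duplicated(toy_genes))

# Number of distinct gene names, for comparison.
n_unique <- length(unique(toy_genes))

n_dup
n_unique
```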

################## ADD YOUR CODE UNDER THIS LINE #######
. = ottr::check("tests/q8.R")

#Q9) INVESTIGATE IF THE SAME OR DIFFERENT GENES ARE DIFFERENTIALLY EXPRESSED IN THE TWO VARIETIES.

Add code to the chunk below that does the following:
* Create a suitable plot to look at the overlap in the DEGs between the two Varieties.
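
If a Venn diagram package such as ggvenn is not available in your session, the overlap can still be summarised with base R set operations. The sketch below uses two made-up gene lists and draws the three region sizes as a bar plot; the gene names and the choice of a bar plot are assumptions for illustration only.

```r
# Hypothetical DEG lists for two varieties.
genes_x <- c("g1", "g2", "g3", "g4")
genes_y <- c("g3", "g4", "g5")

# The three regions of a two-set Venn diagram.
shared <- intersect(genes_x, genes_y)
only_x <- setdiff(genes_x, genes_y)
only_y <- setdiff(genes_y, genes_x)

counts <- c(
  only_x = length(only_x),
  shared = length(shared),
  only_y = length(only_y)
)

# Simple visual summary of the overlap sizes.
barplot(counts, ylab = "Number of DEGs")
```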

################## ADD YOUR CODE UNDER THIS LINE #######

#Q10) SEPARATE OUT THE UP- AND DOWN- REGULATED DEGs (BETWEEN STRESS AND CONTROL CONDITION).

By looking at the gene expression data in the var_x_degs and var_y_degs data frames, you can see that some genes have a positive log 2 fold change and others have a negative log 2 fold change.

Add code to the chunk below that does the following:
* Create a data frame called `var_x_degs.up` containing only genes that are upregulated in Stress Treatment compared to control in Variety X.
* Create a data frame called `var_x_degs.down` containing only genes that are downregulated in Stress Treatment compared to control in Variety X.
* Create a data frame called `var_y_degs.up` containing only genes that are upregulated in Stress Treatment compared to control in Variety Y.
* Create a data frame called `var_y_degs.down` containing only genes that are downregulated in Stress Treatment compared to control in Variety Y.
# 4 marks / 30 (total 15 so far).
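
The sign of the log2 fold change separates the two groups: positive values mean higher expression under stress, negative values mean lower. A minimal sketch on toy data follows; the column name `log2FoldChange` is an assumption and may differ from the column name in your own DEG files.

```r
# Toy DEG table; the column name log2FoldChange is hypothetical.
toy_degs <- data.frame(
  gene_name      = c("g1", "g2", "g3", "g4"),
  log2FoldChange = c(2.3, -1.7, 0.8, -0.4)
)

# Positive log2 fold change: upregulated in stress relative to control.
toy_up <- toy_degs[toy_degs$log2FoldChange > 0, ]

# Negative log2 fold change: downregulated in stress relative to control.
toy_down <- toy_degs[toy_degs$log2FoldChange < 0, ]

toy_up$gene_name
toy_down$gene_name
```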

################## ADD YOUR CODE UNDER THIS LINE #######
. = ottr::check("tests/q10.R")

#Q11) INVESTIGATE THE FOLD CHANGE IN GENE EXPRESSION FOR THE DEGs, BETWEEN STRESS AND CONTROL CONDITION.

Add code to the chunk below that does the following:
* Create a box plot to show the distribution of log2 fold change for all DEGs by variety. Hint: the base R boxplot() command and the abs() function could be helpful here.
* Create a box plot to show the distribution of log2 fold change for upregulated DEGs by variety. Hint: the base R boxplot() command could be helpful here.
* Create a box plot to show the distribution of log2 fold change for downregulated DEGs by variety. Hint: the base R boxplot() command could be helpful here.
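
To illustrate the hint, the sketch below passes a named list of vectors to base R `boxplot()`, which draws one box per list element, and uses `abs()` so that up- and downregulated genes are compared by magnitude. The fold change values are simulated, not taken from the dataset.

```r
# Simulated log2 fold changes for two hypothetical varieties.
set.seed(42)
lfc_x <- rnorm(30, mean = 0, sd = 2)
lfc_y <- rnorm(40, mean = 0, sd = 1.5)

# abs() discards the direction of change so only its size is compared.
bp <- boxplot(
  list("Variety X" = abs(lfc_x), "Variety Y" = abs(lfc_y)),
  ylab = "|log2 fold change|"
)

bp$n  # number of values behind each box
```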

################## ADD YOUR CODE UNDER THIS LINE #######

#Q12) INVESTIGATE THE FUNCTIONS OF THE BOTTOM MOST DIFFERENTIALLY EXPRESSED (DOWNREGULATED) GENES

Add code to the chunk below that does the following:
* Find out the function of the least downregulated gene in Variety X and assign the result to a variable called `bottom_gene.x`.
* Find out the function of the least downregulated gene in Variety Y and assign the result to a variable called `bottom_gene.y`.
# 5 marks / 30 (total 20 so far).

################## ADD YOUR CODE UNDER THIS LINE #######
. = ottr::check("tests/q12.R")

#Q13) INVESTIGATE THE BEHAVIOUR OF THE BIOLOGICAL REPLICATES FOR THE DEGs in Variety X IN THE TREATMENT TIME POINT.

Add code to the chunk below that does the following:
* Create a set of scatterplots to visually inspect how well the different replicates agree/correlate for the DEGs in Variety X in the treatment time point.
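
One way to draw every replicate against every other replicate in a single figure is base R `pairs()`. The sketch below simulates three noisy replicates of the same underlying signal; the column names and the log2 transform are assumptions made for illustration.

```r
# Simulate three replicates that share a common signal plus small noise.
set.seed(7)
signal <- rlnorm(100, meanlog = 5, sdlog = 1)
toy_reps <- data.frame(
  Rep.1 = signal * rlnorm(100, meanlog = 0, sdlog = 0.1),
  Rep.2 = signal * rlnorm(100, meanlog = 0, sdlog = 0.1),
  Rep.3 = signal * rlnorm(100, meanlog = 0, sdlog = 0.1)
)

# Scatterplot matrix on a log2 scale (+1 avoids log2(0) for zero counts).
pairs(log2(toy_reps + 1), main = "Replicate agreement (toy data)")

# Pairwise Pearson correlations as a numeric companion to the plots.
r <- cor(log2(toy_reps + 1))
round(r, 3)
```

Points lying close to the diagonal in each panel indicate replicates that agree well.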

################## ADD YOUR CODE UNDER THIS LINE #######

#Q14) INVESTIGATE THE BEHAVIOUR OF THE BIOLOGICAL REPLICATES FOR THE DEGs in Variety X IN THE CONTROL TIME POINT.

Add code to the chunk below that does the following:
* Create a set of scatterplots to visually inspect how well the different replicates agree/correlate for the DEGs in Variety X in the control time point.

################## ADD YOUR CODE UNDER THIS LINE #######

#Q15) COMPARE THE MEAN EXPRESSION IN TREATMENT VERSUS CONTROL REPLICATES FOR EACH DEG.

Add code to the chunk below that does the following:
* Modify your data frame `var_x_degs` to include two new (additional) columns as follows:
    * The first new column should be named `control_mean` and contain the mean expression value for the three control replicates.
    * The second new column should be named `stress_mean` and contain the mean expression value for the three stress treatment replicates.
# 6 marks / 30 (total 26 so far).
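
Per-gene means across a set of columns can be computed with base R `rowMeans()`. The sketch below uses a toy data frame whose column names mirror those shown in the `head(var_x_all)` output; treating the `C` columns as control and the `1` columns as stress is an assumption, and the values are made up.

```r
# Toy data with three assumed control columns and three assumed stress columns.
toy <- data.frame(
  gene_name  = c("g1", "g2"),
  VarXCRep.1 = c(10, 100),
  VarXCRep.2 = c(20, 110),
  VarXCRep.3 = c(30, 120),
  VarX1Rep.1 = c(40, 50),
  VarX1Rep.2 = c(50, 60),
  VarX1Rep.3 = c(60, 70)
)

control_cols <- c("VarXCRep.1", "VarXCRep.2", "VarXCRep.3")
stress_cols  <- c("VarX1Rep.1", "VarX1Rep.2", "VarX1Rep.3")

# rowMeans() averages across the selected columns, one result per gene.
toy$control_mean <- rowMeans(toy[, control_cols])
toy$stress_mean  <- rowMeans(toy[, stress_cols])

toy[, c("gene_name", "control_mean", "stress_mean")]
```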

################## ADD YOUR CODE UNDER THIS LINE #######
. = ottr::check("tests/q15.R")

#Q16) PRIORITISE GENES OF INTEREST FOR FURTHER INVESTIGATION.

Add code to the chunk below that does the following:
* Create a data frame called `var_y_degs.down.big` containing only genes in Variety Y that are downregulated in Stress Treatment compared to control, have at least a 2-fold absolute change in expression and have a p value less than 1e-03. Hint: remember you are dealing with log2 fold change.
# 4 marks / 30 (total 30 so far).
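
To unpack the hint: a 2-fold change corresponds to a log2 fold change of log2(2) = 1, so a downregulated gene with at least a 2-fold change has a log2 fold change of -1 or lower. The sketch below applies this to toy data; the column names `log2FoldChange` and `pvalue` are assumptions and may not match the columns in your own files.

```r
# Convert the fold-change threshold to the log2 scale.
fold_cutoff <- 2
lfc_cutoff  <- log2(fold_cutoff)  # equals 1

# Toy DEG table; column names are hypothetical.
toy <- data.frame(
  gene_name      = c("g1", "g2", "g3", "g4"),
  log2FoldChange = c(-2.5, -0.6, 1.8, -1.0),
  pvalue         = c(1e-5, 1e-6, 1e-4, 5e-2)
)

# Keep genes that are downregulated by at least 2-fold AND pass the p value cut.
keep <- toy$log2FoldChange <= -lfc_cutoff & toy$pvalue < 1e-03
toy_down_big <- toy[keep, ]

toy_down_big
```

Here only `g1` passes both filters: `g2` changes too little, `g3` is upregulated, and `g4` fails the p value threshold.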

################## ADD YOUR CODE UNDER THIS LINE #######
. = ottr::check("tests/q16.R")

Perhaps these genes you have extracted could be important candidates for further analysis!

END OF ASSESSMENT.

