---
title: "BIN510 UNIT 1: Lab 1"
author: Alyson Barsalou
subtitle: Loading data, manipulating data, loading packages
output:
  html_notebook: default
  html_document:
    df_print: paged
  pdf_document: default
---

## Introduction:
In this lab we will learn:

* loading/storing data
* manipulating data
* saving data
* installing and using packages

## PART 1: Loading data
There are essentially two ways to load data files into R. R can read in many different file types (e.g., .csv, .xlsx). For the first portion of this course, we will work specifically with ".csv" files. Later on, we will work with other file types common in bioinformatics.

FIRST WAY: files can be imported using the "Import Dataset" drop-down in the "Environment" tab (top right). For ".csv" files, select "From Text (base)" and then browse your computer or cloud repository for the file you need. The import window lets you rename the data set (highly recommended for complicated file names; simple is best). Make sure that "Heading" is set to "Yes" and that "Separator" is set to "Comma". The window also previews what your data will look like. If it looks good, go ahead and import!

SECOND WAY: files can be imported using code. For this method, you will need to find and set an appropriate working directory, which tells R where to look for files. Code gives you the same options as the import window: naming your data set, telling R what separates the cells, and specifying whether the first row holds column names. We will practice this method below.

## Finding and Setting a working directory
Regardless of whether you use the cloud or desktop version of RStudio, create a folder on your desktop or in the cloud to store and save the files needed for this class, something like "R Data".

When you import a data set with code, R looks for the file in a default location called the working directory. You can check your current working directory with the code below:
```{r}
getwd()
```

You'll want to set your working directory to a location of your choice, most likely a folder you've created for this purpose. Use the following code, plugging in your own file path. (HINT: it is easy to copy the output of `getwd()` and paste it into the parentheses of `setwd()`. You will likely need to add the rest of the path to your folder.)
```{r}
setwd("/Users/alysonbarsalou/Downloads/R Data")
```

Here is what mine looks like when I switch from the default working directory to the one where my data lives: `getwd()` returned `[1] "/Users/gordonober"`, so I ran `setwd("/Users/gordonober/Desktop/R data")`.

I had to add "/Desktop/R data" to point R to the R data folder on my Desktop. (If you right-click on the folder you want to use, you can typically find its path.)
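If `setwd()` stops with "cannot change working directory", the path usually contains a typo. The minimal sketch below builds a path with `file.path()` and switches to it only if `dir.exists()` confirms that the folder is there. It then uses `list.files()` and `file.exists()` to check which files R can see. So that the example runs on any computer, a throwaway folder inside `tempdir()` stands in for your own "R Data" folder. The names `data_dir` and `old_wd` and the file "example.csv" are hypothetical.

```{r}
# Stand-in for your own data folder: a throwaway "R Data" folder in tempdir()
data_dir <- file.path(tempdir(), "R Data")
dir.create(data_dir, showWarnings = FALSE)
writeLines(c("a,b", "1,2"), file.path(data_dir, "example.csv"))  # hypothetical file

old_wd <- getwd()                      # remember where we started
if (dir.exists(data_dir)) {
  setwd(data_dir)                      # R now looks here by default
} else {
  message("Folder not found: ", data_dir)
}

list.files()                           # files visible from the working directory
file.exists("example.csv")             # TRUE: the file can be found by its bare name

setwd(old_wd)                          # switch back to the original working directory
```

For your own folder, you would replace the `tempdir()` part with something like `file.path("/Users/yourname", "Desktop", "R Data")`, where "yourname" is a placeholder.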


## Loading data and data frames
Assuming your file is in CSV format (as it will be for the first part of this course), you can read it into R with the `read.table()` command. The general syntax is `read.table('file_path/file_name.csv', sep=',', header=TRUE)`. Below is code to read in the remission .csv file (which you will download from Canvas and move to your working directory):
```{r}
remission_data <- read.table('remission.csv', sep=',', header=TRUE)
```
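Base R also provides `read.csv()`, which is `read.table()` with comma-friendly defaults (`sep = ","` and `header = TRUE`). The sketch below writes a tiny throwaway CSV with made-up values to a temporary file. Because it uses `tempfile()`, it does not touch your working directory. It then reads the file both ways. The file contents and the names `csv_path`, `via_table`, and `via_csv` are illustrative only.

```{r}
# Write a small CSV with made-up values to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,LI", "1,0.4", "2,0.7", "3,1.1"), csv_path)

# Read the same file two ways
via_table <- read.table(csv_path, sep = ",", header = TRUE)
via_csv   <- read.csv(csv_path)   # sep = "," and header = TRUE are the defaults

identical(via_table, via_csv)     # TRUE: both calls give the same data frame
```

Spelling out `TRUE` rather than `T` is the safer habit. `T` is an ordinary variable that can be overwritten (for example, by `T <- 0`), whereas `TRUE` cannot.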
The `remission_data` variable references an R *data frame*, which is essentially R's version of an Excel sheet. To see its contents, type the name and run the code chunk.
```{r}
remission_data
```
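Printing a whole data frame gets unwieldy once it has hundreds of rows. A few base R functions give a quicker overview. The sketch below uses a small made-up data frame called `toy` in place of `remission_data`, so the values are illustrative only. In RStudio, `View(remission_data)` also opens the data in a spreadsheet-style tab.

```{r}
# A small made-up data frame standing in for remission_data
toy <- data.frame(LI = c(0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0),
                  m  = c(1, 1, 1, 1, 1, 1, 1))

head(toy)        # first 6 rows
head(toy, 3)     # first 3 rows
dim(toy)         # number of rows and columns
names(toy)       # column names
str(toy)         # column types plus a preview of each column
```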


## Accessing data and statistical summaries
We can access the data in particular columns by using the `$` operator:
```{r}
remission_data$LI
```
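Once a column is extracted with `$`, it is an ordinary numeric vector, so base R's summary functions apply directly. Here is a minimal sketch with made-up values (the `toy` data frame is hypothetical):

```{r}
toy <- data.frame(LI = c(0.4, 0.8, 1.2, 1.6))

mean(toy$LI)      # average
median(toy$LI)    # middle value
sd(toy$LI)        # standard deviation
range(toy$LI)     # smallest and largest values
summary(toy$LI)   # min, quartiles, median, mean, and max in one call
summary(toy)      # the same summary for every column of the data frame
```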

#### Exercise 1:
*Below, create a code chunk and access the `m` column of the `remission_data` data frame.*

****
```{r}
# Access the 'm' column from the 'remission_data' data frame
remission_data$m
```
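`$` is not the only way to pull out a column. The sketch below uses a small hypothetical `toy` data frame to show two bracket forms that return the same vector. The `[[ ]]` form is handy when the column name is stored in a variable.

```{r}
toy <- data.frame(LI = c(0.4, 0.8), m = c(1, 1))

toy$m            # dollar-sign access
toy[["m"]]       # double brackets with the column name as a string
toy[, "m"]       # [rows, columns]: all rows, column "m"

col_name <- "LI"
toy[[col_name]]  # uses the name stored in col_name
toy$col_name     # NULL: $ looks for a column literally called "col_name"
```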

Once you have data loaded into R, you can easily manipulate it! This might mean adding to it, converting the values in a column, or removing certain values or rows.

#### Exercise 2: Applying some math!

*What do you think the following command will do? (Answer below the code chunk.)*
```{r}
remission_data$NewColumn <- log(remission_data$LI)
```

****
The code chunk above creates a new column named "NewColumn" that holds the natural
logarithm (base e) of each value in the existing LI column.
****
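`log()` uses base e by default. For other bases, base R also has `log10()`, `log2()`, and a `base` argument to `log()`. Zero and negative values do not have real logarithms, which matters if a column contains them. Here is a sketch on a hypothetical `toy` data frame:

```{r}
toy <- data.frame(LI = c(0.5, 1, 2))

toy$ln_LI    <- log(toy$LI)      # natural log (base e)
toy$log10_LI <- log10(toy$LI)    # base 10
toy$log2_LI  <- log2(toy$LI)     # base 2; equivalent to log(toy$LI, base = 2)
toy

log(0)    # -Inf
log(-1)   # NaN, and R warns "NaNs produced"
```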


#### Exercise 3: removing data
In some cases, you may want to exclude certain rows of data for one reason or another.

*What do you think the following command will do? (Answer below the code chunk.) BE SPECIFIC*
```{r}
remission_data <- remission_data[-c(2, 4, 6), ]
```

****
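This command removes the 2nd, 4th, and 6th rows from the `remission_data` data frame and stores the result back in `remission_data`:

* `c(2, 4, 6)` lists the positions of the rows to remove.
* The negative sign (`-`) tells R to exclude those rows rather than keep them.
* Leaving the spot after the comma empty tells R to keep all of the columns.

Because the result overwrites `remission_data`, running this chunk a second time would remove three more rows.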
****

NOTE: when you see `[ , ]`, you can read it as `[rows, columns]`.
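The sketch below applies `[rows, columns]` indexing to a small hypothetical `toy` data frame. It also shows two useful details. First, row names are kept after rows are dropped. Second, rows can be removed by a condition instead of by position.

```{r}
toy <- data.frame(id = 1:6, LI = c(0.4, 0.5, 0.6, 0.7, 0.8, 0.9))

by_position <- toy[-c(2, 4, 6), ]      # drop rows 2, 4, and 6; keep every column
rownames(by_position)                  # "1" "3" "5": the original row names survive

by_condition <- toy[toy$LI >= 0.6, ]   # keep only rows where LI is at least 0.6
nrow(by_condition)                     # 4

toy[1:2, "LI"]                         # rows 1 and 2 of the LI column
```

If you prefer renumbered rows, `rownames(by_position) <- NULL` resets them to 1, 2, 3.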

#### Exercise 4: saving your new data

Exercises 2 and 3 had you add a column derived from an existing one and remove rows you didn't want from the original dataset. Given these changes, it can be useful to save your data as a new .csv file. The code below saves your data as a .csv file in your working directory, under a new file name of your choosing.

```{r}
write.csv(remission_data, "new_remission_data.csv")
```
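By default, `write.csv()` also writes the row names as an extra first column. When the file is read back in, that column appears with the name `X`. Passing `row.names = FALSE` leaves it out. This sketch writes a hypothetical `toy` data frame to a temporary file so that it does not clutter your working directory:

```{r}
toy <- data.frame(id = 1:3, LI = c(0.4, 0.6, 0.8))
out_path <- tempfile(fileext = ".csv")

write.csv(toy, out_path)                     # row names written as an unnamed first column
names(read.csv(out_path))                    # "X" "id" "LI"

write.csv(toy, out_path, row.names = FALSE)  # overwrite the file without row names
names(read.csv(out_path))                    # "id" "LI"
```

Applied to this exercise, `write.csv(remission_data, "new_remission_data.csv", row.names = FALSE)` would save the data without that extra column.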

#### Exercise 5: Practice this with another data set!

Using a repository of data sets for R (https://vincentarelbundock.github.io/Rdatasets/datasets.html), choose any data set you wish and perform the following tasks:

1. Load the data into R, give it a unique name, and visualize it.

```{r}
bank_data <- read.table('BankWages.csv', sep=',', header=TRUE)
bank_data
```

2. Apply math of your choice to one of the numerical columns (make sure your dataset has one!). You can log-transform, add, subtract, multiply, divide, and so on. Create a new column when doing this.
```{r}
# Add 5 to each value in the numeric education column
bank_data$EducationAdd5 <- bank_data$education + 5
bank_data
```
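One quick way to confirm that a data set has a numeric column is to use `sapply()` with `is.numeric()`, which tests every column at once. The sketch uses a made-up `toy` data frame rather than the real BankWages file, so the column names and values are illustrative only.

```{r}
toy <- data.frame(job = c("clerk", "guard", "manager"),
                  education = c(12, 8, 16))

sapply(toy, is.numeric)                  # job FALSE, education TRUE

toy$education_x2 <- toy$education * 2    # math on a numeric column works
# toy$job * 2 would stop with "non-numeric argument to binary operator"
toy
```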

3. Remove rows from your data.
```{r}
bank_data <- bank_data[-c(30, 119, 204), ]
```

4. Save the data set as a new .csv file.
```{r}
write.csv(bank_data, "new_bank_data.csv")
```

NOTE: you can do this in one code chunk or in many code chunks! As long as you get there, that's what matters.

#### Knitting to HTML and submitting

We've learned some very useful commands for data analysis today, and we don't want to forget them! Let's hit the `Preview` button to create an HTML file where we can review these commands and their output.

Once the markdown file has been rendered to HTML, right-click on the page and save the file. Then upload this file to Canvas.
