Introduction:
In this lab we will learn about:
- loading/storing data
- manipulating data
- saving data
- installing and using packages
PART 1: Loading data
There are essentially two easy ways to load data
files into R. R can read many different file types
(e.g., .csv, .xlsx). For the first portion of this course, we are
going to work specifically with “.csv” files. Later on, we will
work with other file types common in bioinformatics.
FIRST WAY: files can be imported using the “Import Dataset” drop-down
under the “Environment” tab (top right). To read in a “.csv” file,
select “From Text (base)”, then search your computer or
your cloud repository for the specific file you need. You will get the
option to rename your dataset (highly recommended for
complicated file names; simple is best). Be sure
that Heading is set to “Yes” and that Separator is set to “Comma”.
RStudio will show you a preview of what your data will look like. If it
looks good, go ahead and import!
SECOND WAY: files can be imported using code. You will first have to find
and set an appropriate working directory, which tells R where to
look for files. The code gives you the same options as
above: naming your dataset, telling R what separates cells, and choosing
whether or not to keep column titles. We will practice this method below.
Finding and Setting a working directory
Regardless of whether you use the cloud or desktop version of
RStudio, you should create a folder on your desktop or in the cloud
to store the files needed for this class, named something
like “R Data”.
When you try to import datasets into R, the program has a default
location where it looks for files. You can check what that
location is by using the code below:
getwd()
[1] "/Users/alysonbarsalou/Downloads/R Data"
You’ll want to set your working directory to a location of your
choice, likely a folder you’ve created specifically for this purpose.
Use the following code, plugging in your specific file path. (HINT: it
can be really easy to copy and paste the output from getwd()
into the parentheses of setwd(); you will likely need to add an extra
folder to the end of the path.)
setwd("/Users/alysonbarsalou/Downloads/R Data")
Here is what mine looks like when I switch from the default working
directory to one where my data lives:
> getwd()
[1] "/Users/gordonober"
> setwd("/Users/gordonober/Desktop/R data")
I had to add “/Desktop/R data” to link to the R data folder on my
Desktop. (If you right-click on whatever folder you want to use, you can
typically find its path.)
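As a rough sketch of the check, set, and confirm cycle described above: the folder below is a hypothetical “R Data” folder created inside R's temporary directory, purely so the example runs on any computer. In practice you would plug in the path to your own folder.

```r
# Remember where R is currently looking so we can switch back later.
old_wd <- getwd()

# Hypothetical stand-in for your own data folder (assumption: yours will
# live somewhere like your Desktop, not in the temporary directory).
data_dir <- file.path(tempdir(), "R Data")
dir.create(data_dir, showWarnings = FALSE)

# Point R at the data folder and confirm that the change took effect.
setwd(data_dir)
new_wd <- getwd()
print(new_wd)

# List the files R can now see without a full path (empty for a new folder).
print(list.files())

# Switch back to the original working directory.
setwd(old_wd)
```

file.path() joins folder names with the correct separator, which can help avoid typos when a path has several levels.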
Loading data and data frames
Assuming your file is in CSV format (as most are), you can read
it into R by using the read.table() command. The general syntax is:
read.table('file_path/file_name.csv', sep=',', header=T)
Below is code to read in the remission .csv file (which you will
download from Canvas and move to your working directory):
remission_data<-read.table('remission.csv',sep=',', header=T)
The remission_data variable references an R data frame; this is
essentially the R version of an Excel sheet. To see
the contents, we can just type the name and run a code chunk.
remission_data
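Printing a whole data frame can get long, so here is a small sketch of the same loading step with a few quicker ways to inspect the result. The file is a tiny stand-in written to a temporary location with made-up values (it is not the real remission data), and the names demo_path, demo_a, and demo_b are hypothetical.

```r
# Write a tiny stand-in CSV to a temporary file (hypothetical values,
# not the real remission data).
demo_path <- tempfile(fileext = ".csv")
writeLines(c("LI,m", "0.4,1", "0.5,1", "0.6,0"), demo_path)

# Same call pattern as in the lab: comma separator, first row as column names.
demo_a <- read.table(demo_path, sep = ",", header = TRUE)

# read.csv() is a wrapper around read.table() that uses those two settings
# (and a few others) as its defaults.
demo_b <- read.csv(demo_path)

# Quick ways to inspect a data frame without printing every row.
head(demo_a)   # first rows
str(demo_a)    # column names and types
dim(demo_a)    # number of rows and columns
```

For a simple file like this one, both calls should give the same data frame, so which one you use is mostly a matter of preference.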
Accessing data and statistical summaries
We can access the data in a particular column by using the $ operator:
remission_data$LI
[1] 0.4 0.4 0.5 0.5 0.6 0.6 0.6 0.7 0.7 0.7 0.8 0.8 0.8 0.9 1.0 1.0 1.0 1.1
[19] 1.1 1.2 1.3 1.4 1.6 1.7 1.9 1.9 1.9
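Once a column has been pulled out, it can be passed to R's summary functions. The following is a minimal sketch under some assumptions: the small data frame uses only the first five LI values printed above, and the name demo is hypothetical.

```r
# Stand-in data frame built from the first five LI values shown above.
demo <- data.frame(LI = c(0.4, 0.4, 0.5, 0.5, 0.6))

# The $ operator returns the column as a plain vector...
li_values <- demo$LI

# ...and [["name"]] is an equivalent way to pull out the same column.
same_values <- demo[["LI"]]

# Common one-number summaries of a numeric column.
li_mean <- mean(li_values)
li_sd <- sd(li_values)
li_range <- range(li_values)   # smallest and largest value

# summary() reports minimum, quartiles, median, mean, and maximum at once.
print(summary(li_values))
```

The same calls should work on the full column, for example mean(remission_data$LI).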
Exercise 1:
Below, create a code chunk and access the m column of the
remission_data data frame.
# Access the 'm' column from the 'remission_data' data frame
remission_data$m
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Once you have data loaded into R, you can easily manipulate it! This
might mean adding to it, converting values in a certain column, or
removing certain values or rows.
Exercise 2: Applying some math!
What do you think the following command will do? (Answer below
the code chunk.)
remission_data$NewColumn <- log(remission_data$LI)
The code chunk above creates a new column titled “NewColumn” containing
the natural logarithm of each value in the existing LI column.
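To see why this is the natural logarithm, and how to get a different base, here is a short sketch. It uses a hypothetical four-row data frame called demo with LI values taken from the output earlier in the lab; the Log10LI column is an illustration and not part of the exercise.

```r
# Small stand-in data frame (values taken from the LI column shown earlier).
demo <- data.frame(LI = c(0.4, 0.5, 1.0, 1.9))

# log() with no base argument is the natural logarithm (base e).
demo$NewColumn <- log(demo$LI)

# Other bases are available if you need them.
demo$Log10LI <- log10(demo$LI)

# exp() undoes the natural log, recovering the original values.
recovered <- exp(demo$NewColumn)

print(demo)
```

Assigning to a column name that does not exist yet is what creates the new column; assigning to an existing name would overwrite it.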
Exercise 3: removing data
In some cases, you may want to exclude certain rows of data for
one reason or another.
What do you think the following command will do? (Answer below
the code chunk.) BE SPECIFIC
remission_data <- remission_data[-c(2, 4, 6), ]
This command removes the 2nd, 4th, and 6th rows from the
remission_data data frame.
- c(2, 4, 6) indicates the rows to remove.
- The negative sign (-) tells R to exclude those rows.
- The empty space after the comma (,) indicates that all columns are
retained.
NOTE: when you see [,] you can interpret this as [rows, columns]
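The [rows, columns] pattern can be tried out safely on a small example. The sketch below uses a hypothetical six-row data frame called demo and stores each result under a new name. This matters because the exercise command overwrites remission_data, so running it a second time would remove three more rows.

```r
# Small stand-in data frame with six rows.
demo <- data.frame(LI = c(0.4, 0.4, 0.5, 0.5, 0.6, 0.6),
                   m  = c(1, 1, 1, 1, 1, 1))

# Negative row indices drop rows; the blank after the comma keeps all columns.
trimmed <- demo[-c(2, 4, 6), ]

# Positive indices keep rows instead: rows 1 to 3, only the LI column.
first_three_LI <- demo[1:3, "LI"]

# A logical condition in the row position keeps the rows where it is TRUE.
above_half <- demo[demo$LI > 0.5, ]

print(trimmed)
```

Note that the remaining rows keep their original row labels, so the printed result is labelled by the old row numbers.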
Exercise 4: saving your new data
Exercises 2 and 3 had you add a column derived from an existing
one and remove rows you didn’t want from your original dataset.
After changes like these, it can be useful to save the result as a new
.csv file. The code below saves your data as a .csv file in your working
directory; the second argument is the file name, which you can change to
whatever you like.
write.csv(remission_data, "new_remission_data.csv")
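One detail worth knowing: by default, write.csv() also writes the row labels as an extra first column, which shows up as a column named X if you read the file back in. The sketch below illustrates this with a hypothetical data frame called demo and temporary files, so it does not touch your working directory. Whether you want row.names = FALSE depends on whether the row labels carry any meaning for you.

```r
# Small stand-in data frame (hypothetical values).
demo <- data.frame(LI = c(0.4, 0.5, 0.6), m = c(1, 1, 0))

# Temporary file paths so the example does not touch your working directory.
with_names <- tempfile(fileext = ".csv")
without_names <- tempfile(fileext = ".csv")

# Default behaviour: row names are written as an extra, unnamed first column.
write.csv(demo, with_names)

# row.names = FALSE leaves that column out.
write.csv(demo, without_names, row.names = FALSE)

# Read both files back and compare the column names.
cols_with <- names(read.csv(with_names))
cols_without <- names(read.csv(without_names))
print(cols_with)
print(cols_without)
```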
1. Load the data into R, give it a unique name, and display it
bank_data<-read.table('BankWages.csv',sep=',', header=T)
bank_data
3. Remove rows from your data
bank_data <- bank_data[-c(30, 119, 204), ]
4. Save the data set as a new csv file.
write.csv(bank_data, "new_bank_data.csv")
NOTE: you can do this in one code chunk or in many code chunks! As long
as you get there, that’s what matters.
Knitting to HTML and submitting
We’ve learned some very useful commands for data analysis today.
We don’t want to forget them! Let’s hit the Preview button
to create an HTML file where we can review these commands and their
output.
Once you get the markdown file into HTML format, please right-click
on the page and save the file. You can then upload this file to Canvas.
