Overview

In today’s lab we will first get set up with the accounts and software you will use in this course. Next, you will learn how to analyze data using R. By the end of today’s lab you will know how to import data sets into R, manipulate data, and produce visuals of the data. You are encouraged to work with your neighbor on the assignment, but do not copy each other: the work you turn in should be your own.

Lab Submission

At the end of lab today you will upload your assignment to our Canvas site as both an R Notebook file (.Rmd) and an HTML file. There is a template on the Canvas site to get you started, with some tips on formatting your document. Your lab must be neatly organized and have a title and a header. Name your file using the format LabName_Lastname. For example, my file for today’s lab would be named “Lab1_Olson”.

A very helpful guide on formatting R Markdown documents for assignments is available at http://www.stat.cmu.edu/~cshalizi/rmarkdown/.

Getting Started

Go to https://docs.posit.co/cloud/get_started/ and follow the instructions to make a free Posit Cloud account and create a new RStudio project.

Part 1 Working with R

Follow along with the instructions below by entering each command into your R console as you read through this document.

Below is a code chunk that says hello. Clicking the green arrow in the corner of the chunk runs the code and prints the message.

print('hello')
## [1] "hello"

You can also use R to do arithmetic and R will give you the answer as a calculator would.

100*2+5
## [1] 205

Note that R follows the standard order of operations (PEMDAS). To change the order, use parentheses. For example, adding parentheses around part of the expression above gives a different answer:

100*(2+5)
## [1] 700
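To see the precedence rules side by side, here is a short sketch using only base R. Its last lines also call + in ordinary function form, written with backticks, which performs the same operation as typing 100*2+5.

```r
# Multiplication happens before addition, so these two are the same
a <- 100 * 2 + 5
b <- (100 * 2) + 5

# Parentheses force the addition to happen first
c_val <- 100 * (2 + 5)
print(c(a, b, c_val))   # 205 205 700

# Exponents (^) come before multiplication
print(2 * 3^2)          # 18
print((2 * 3)^2)        # 36

# + is itself a function: calling it with backticks matches 100*2+5
plus_call <- `+`(100 * 2, 5)
print(plus_call)        # 205
```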

In the example above, the addition operator ‘+’ is a function. Base R includes many functions, and you can add more to your workspace by installing additional packages. To install a package, you can search for it under the Packages tab in the RStudio interface, or you can use the console as shown below:

install.packages('tidyverse')
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
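install.packages() downloads the package every time it runs. A common pattern is to install a package only when it is missing. The helper below, install_if_missing(), is hypothetical (written for this example, not part of any package); it relies on base R's requireNamespace(), which returns TRUE if the package can be loaded.

```r
# install_if_missing() is a hypothetical helper written for this example
install_if_missing <- function(pkg) {
  # requireNamespace() returns TRUE if the package can be loaded,
  # without attaching it the way library() does
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
    return(invisible(TRUE))  # TRUE: a new installation was started
  }
  invisible(FALSE)           # FALSE: already installed, nothing to do
}
```

With this helper defined, install_if_missing('tidyverse') installs tidyverse on the first run and skips the download on later runs. You still need library() afterwards to load the package.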

Once you have installed a package, you can load it with the library() command.

library('tidyverse')
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.1     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.2
## ✔ purrr     1.2.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

An important function to know is getwd(), the “get working directory” command. The working directory is the folder where your project is hosted, and it is where R looks for files when you do not give it a full file path.

getwd() 
## [1] "/cloud/project"
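A short sketch of how the working directory is used, with base R functions only. The file name my_data.csv is a hypothetical placeholder, not a file provided with the lab.

```r
# getwd() returns the working directory as a character string
wd <- getwd()

# list.files() with no arguments lists the files in the working directory
files_here <- list.files()

# A bare file name is looked up relative to the working directory, so these
# two paths point to the same place ("my_data.csv" is a hypothetical name)
relative_path <- "my_data.csv"
full_path <- file.path(wd, relative_path)
print(full_path)

# TRUE whether or not the file exists, because both paths refer to the same file
print(file.exists(relative_path) == file.exists(full_path))
```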

You can assign a value to a variable using the <- operator. Variables can hold numbers (numeric) or text (character).

name <- "Elizabeth"
height <- 63
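To check what kind of value a variable holds, use class(). The sketch below also converts the height to centimeters, which assumes the heights are in inches (the lab does not state the unit), and shows that arithmetic on a character variable produces an error.

```r
name <- "Elizabeth"
height <- 63

# class() reports the type of value a variable holds
print(class(name))    # "character"
print(class(height))  # "numeric"

# Numeric variables work in arithmetic (assumes height is in inches)
height_cm <- height * 2.54
print(height_cm)      # 160.02

# Arithmetic on a character variable fails; tryCatch() captures the error message
msg <- tryCatch(name * 2, error = function(e) conditionMessage(e))
print(msg)
```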

You can also combine several values into a vector using the c() function.

list_names <- c('lisa', 'joe', 'sam', 'trevor', 'eric')
list_heights <- c(63, 54, 68, 71, 75)
list_notes <- c(5, '10 ml', 3, 2, 1)
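Every element of a vector made with c() must have the same type. Because list_notes mixes numbers with the text '10 ml', R converts all of its entries to character strings. The sketch below checks this with class(), indexes a vector by position, and shows how R's separate list() type keeps mixed values apart.

```r
list_heights <- c(63, 54, 68, 71, 75)
list_notes <- c(5, '10 ml', 3, 2, 1)

print(class(list_heights))  # "numeric"
print(class(list_notes))    # "character": one text entry turned every entry into text
print(list_notes[1])        # "5", a string rather than the number 5

# Square brackets pick out elements by position
print(list_heights[2])      # 54
print(mean(list_heights))   # 66.2

# list() builds a true R list, whose elements keep their own types
mixed <- list(5, '10 ml', 3)
print(class(mixed[[1]]))    # "numeric"
print(class(mixed[[2]]))    # "character"
```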

You can then combine the vectors into a data frame, with each vector becoming a column.

heights_df <- data.frame(list_names, list_heights, list_notes)
print(heights_df)
##   list_names list_heights list_notes
## 1       lisa           63          5
## 2        joe           54      10 ml
## 3        sam           68          3
## 4     trevor           71          2
## 5       eric           75          1
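A few base R tools for looking inside a data frame are sketched below. The data frame is rebuilt so the block runs on its own. Since R 4.0.0, data.frame() keeps text columns as character by default; older versions converted them to factors.

```r
list_names <- c('lisa', 'joe', 'sam', 'trevor', 'eric')
list_heights <- c(63, 54, 68, 71, 75)
list_notes <- c(5, '10 ml', 3, 2, 1)
heights_df <- data.frame(list_names, list_heights, list_notes)

# str() shows each column's type: chr for text, num for numbers
str(heights_df)

# $ pulls out a single column as a vector
print(heights_df$list_heights)

# [row, column] indexing; leaving the column blank returns the whole row
print(heights_df[2, ])

# nrow() and ncol() give the dimensions
print(c(nrow(heights_df), ncol(heights_df)))  # 5 3
```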

Notice that when a data frame is displayed as a table in the rendered notebook, each column's data type is listed beneath its column header. This shows how R will read the data when you pass it to a function. The way data is coded is important and determines how you can use it. We can simplify our workflow by using the tidyverse packages, which work with tibbles: a modern version of the data frame that prints more compactly and shows each column's type.

heights_tib <- tibble(list_names, list_heights, list_notes) 
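Because list_notes is stored as character, R cannot do arithmetic on it until it is converted. A minimal sketch in base R: as.numeric() turns text into numbers, but an entry like '10 ml' cannot be parsed and becomes NA. Removing the unit first with sub() keeps the value; treating " ml" as the only unit present is an assumption about this small example.

```r
list_notes <- c(5, '10 ml', 3, 2, 1)

# Direct conversion: "10 ml" is not a number, so it becomes NA
# (suppressWarnings() hides the "NAs introduced by coercion" warning)
notes_naive <- suppressWarnings(as.numeric(list_notes))
print(notes_naive)        # 5 NA 3 2 1

# Strip the unit first, then convert; assumes " ml" is the only unit present
notes_clean <- as.numeric(sub(" ml", "", list_notes))
print(notes_clean)        # 5 10 3 2 1
print(sum(notes_clean))   # 21
```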

You can see which objects are in your workspace (the global environment) using the ls() function.

ls()
## [1] "height"       "heights_df"   "heights_tib"  "list_heights" "list_names"  
## [6] "list_notes"   "name"
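ls() also accepts a pattern argument, a regular expression that filters the object names, and exists() checks for a single object by name. The vectors are recreated so the sketch runs on its own.

```r
list_names <- c('lisa', 'joe', 'sam', 'trevor', 'eric')
list_heights <- c(63, 54, 68, 71, 75)
height <- 63

# pattern= takes a regular expression; "^list_" matches names starting with "list_"
list_objects <- ls(pattern = "^list_")
print(list_objects)

# exists() checks for a single object by name and returns TRUE or FALSE
print(exists("list_names"))
```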

You can remove objects with the rm() function.

rm(heights_df)
ls()
## [1] "height"       "heights_tib"  "list_heights" "list_names"   "list_notes"  
## [6] "name"
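rm() can remove several objects at once, or take a character vector of names through its list argument. The tmp_ names below are placeholders for this example.

```r
tmp_a <- 1
tmp_b <- 2
tmp_c <- 3

# rm() accepts several objects at once
rm(tmp_a, tmp_b)
print(exists("tmp_a"))  # FALSE

# list= takes a character vector of names, which pairs well with ls()
rm(list = ls(pattern = "^tmp_"))
print(exists("tmp_c"))  # FALSE

# rm(list = ls()) would remove every object in the workspace, so use it with care
```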

So far you have been running this code in your own R console. The page you are reading is the HTML output of an .Rmd file. You can see what this lab looks like in .Rmd format by clicking the link here.

Part 2 Getting Started with Data Analysis

For this portion of the lab, you will create your own .Rmd document in Posit Cloud and record your work in it. Follow the lab submission guidelines above. You can also find a helpful R Markdown “cheat sheet” at https://rstudio.github.io/cheatsheets/rmarkdown.pdf.

Importing Data
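A minimal sketch of importing a CSV file with base R's read.csv(). So that the example runs anywhere, it first writes a small file to a temporary location; the file and its contents are made up for illustration. In the lab you would instead pass the path to your own data file, relative to the working directory reported by getwd(). The tidyverse also loads readr, whose read_csv() function does the same job and returns a tibble.

```r
# Create a small, hypothetical CSV file in a temporary folder
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,height",
             "lisa,63",
             "joe,54",
             "sam,68"), csv_path)

# read.csv() reads the file into a data frame, using the first line as column names
heights_in <- read.csv(csv_path)
str(heights_in)

# head() shows the first rows (up to 6 by default)
print(head(heights_in))
```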

Plots

You can also embed plots, for example:

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.
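As a sketch of a plot you could embed in your own .Rmd document, base R's barplot() can draw the heights vector from Part 1 with one bar per name. The chart title and axis label are arbitrary choices for this example. In an R Markdown chunk, the plot appears directly below the code, or on its own if the chunk uses echo = FALSE.

```r
list_names <- c('lisa', 'joe', 'sam', 'trevor', 'eric')
list_heights <- c(63, 54, 68, 71, 75)

# barplot() draws one bar per value; names.arg labels the bars.
# It invisibly returns the x position of each bar's midpoint.
bar_mids <- barplot(list_heights,
                    names.arg = list_names,
                    ylab = "Height",
                    main = "Student heights")
```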