Learn to use R

Welcome to R! This is an RMarkdown notebook. It lets you intersperse text with calculations, making it perfect for lab writeups and reproducible data analyses.

Below this text, you can see a code chunk. It loads packages, which are collections of code other people have written to make your life easier. The tidyverse package is life changing and is very commonly used throughout the sciences. Different disciplines use different software, but whichever tools you end up using, a bonus here is that you will also develop light coding skills.

You need to install this package first, by typing install.packages("tidyverse") (with the quotation marks) into the console down at the bottom. You should only ever have to do this once on any given machine!

The first thing to do when you use R is to load packages. Pressing ctrl-shift-enter (all at once) will run the following cell/chunk. The same thing will happen if you press the green arrow. The text in the curly braces gives extra instructions to R. The name load-packages just labels this block of code; it is optional, but it makes code easier to troubleshoot (you should always name your blocks!). The other parts tell R not to print a bunch of information. For this class, you will almost never need to change what is in the curly braces, unless you really want to.

There is a line with a # in front of it in the code block that does nothing. The # is a comment symbol and will make R ignore anything that comes after it on a line.
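
As a small illustration of how comments behave, here is a sketch that can be pasted into the console. The name total and the numbers are made up for this example and are not part of the lab's own code chunk.

```r
# This whole line is a comment, so R skips it.
total <- 2 + 3  # R runs the code before the #, then ignores the rest of the line
total
```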

If you ever want to add another code block, press ctrl-alt-i to insert a block.
(Or go to Insert>Executable Cell>R, but that takes so many clicks…)

Exercise 1: Treating R like a calculator.

R can do most calculator computations. In addition, for multiple-step calculations it can store the value of a variable at intermediate steps.

Try running the following cells multiple times. Tweak a few things. Then, below the code chunk, answer the questions.

Note: you can also do any of these one-line calculations in the console (bottom left). You may want to test something in the console before putting it into a code chunk, or you may just want to use R as a calculator for a short calculation without tracking multiple code chunks.

5218+4683

## [1] 9901

x <- 500 
y <- 7

x^2

## [1] 250000

y+9

## [1] 16

x <- x+3   
x

## [1] 503

y <- y*2 
y

## [1] 14

x+y

## [1] 517

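When does a line print something, and when doesn’t it? (Hint: Consider when there are arrows <-)
-A line prints its value when it is just an expression, such as x^2 or x. A line with an arrow <- stores the value without printing it.
When is a value stored or updated? What does <- do?
-The arrow <- acts like an equal sign: it assigns the value on its right to the name on its left. A value is stored or updated whenever a line with <- runs.
Is anything about the behavior of these code chunks surprising to you?
-Running a chunk such as x <- x+3 again gives a new number every time you click play, because x is updated on each run.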
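
To see storing and printing side by side, here is a minimal sketch that can be pasted into the console. The variable name z is made up for this example.

```r
z <- 10        # assignment: stores 10 in z, prints nothing
z              # a bare name: prints the stored value
z + 1          # an expression: prints the result, but z itself is unchanged
z <- z + 1     # assignment: updates z to 11, prints nothing
z <- z + 1     # running the same update again changes z again, now 12
print(z)       # print() shows the value explicitly
```

This is also why re-running a chunk that contains a line like x <- x+3 keeps changing the result: each run starts from the value left behind by the previous run.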

Exercise 2: Looking at data

Our textbook gives us several datasets. The first line of code below instructs R to load some data. In this case, it is a list of cars that were sold in 1993 at a particular dealership. The command head displays the head, or first 6 rows, of the dataset. To see the FULL dataset, use the environment tab.

cars_table <- cars93

head(cars_table)
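
If the environment tab is not handy, a few functions report the size and layout of a table directly. The sketch below uses mtcars, a small dataset that ships with R, as a stand-in so that it runs without any extra packages; the same calls should work on cars_table. The name demo_table is made up for this example.

```r
demo_table <- mtcars      # built-in dataset, used here only as a stand-in

nrow(demo_table)          # number of rows (observations)
ncol(demo_table)          # number of columns (variables)
names(demo_table)         # the column names
head(demo_table, n = 3)   # first 3 rows instead of the default 6
str(demo_table)           # each column's type and its first few values
```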

To access just one column of this dataset, use the $ symbol. For example, the following gets the column of prices. Here price is in thousands of dollars, so a price of 8.0 corresponds to an $8,000 car.

cars_table$price

##  [1] 15.9 33.9 37.7 30.0 15.7 20.8 23.7 26.3 34.7 40.1 15.9 18.8 18.4 29.5  9.2
## [16] 11.3 15.6 12.2 19.3  7.4 10.1 20.2 20.9  8.4 12.1  8.0 10.0 13.9 47.9 28.0
## [31] 35.2 34.3 36.1  8.3 11.6 61.9 14.9 10.3 26.1 11.8 21.5 16.3 20.7  9.0 18.5
## [46] 24.4 11.1  8.4 10.9  8.6  9.8 18.2  9.1 26.7

You can ask R to get information from a single column by passing the column to a function, like max or min. The following computes the minimum (lowest) price of a car sold at this dealership in 1993. The function max computes the maximum (highest).

min(cars_table$price)

## [1] 7.4
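
The same pattern works with other summary functions. Here is a minimal sketch on a made-up three-row table; toy_cars and its values are hypothetical, chosen only to keep the example short.

```r
# Hypothetical table; price is in thousands of dollars, as in cars93
toy_cars <- data.frame(
  type  = c("small", "midsize", "large"),
  price = c(8.0, 15.9, 26.7)
)

toy_cars$price          # pull out one column as a vector
min(toy_cars$price)     # lowest price
max(toy_cars$price)     # highest price
range(toy_cars$price)   # lowest and highest together
mean(toy_cars$price)    # average price
```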

How many observations does this dataset have?
-54 observations in the data set
What are the variables in this dataset, and are they categorical or quantitative?
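-6 variables, and they are not all categorical. For example, type and drive_train are categorical, while price, weight, and mpg_city are quantitative.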
What was the most expensive car that was sold in this dataset? Use R to compute the answer.
-$61,900 was the most expensive (price is in thousands of dollars, so 61.9 means $61,900)
```
max(cars_table$price)
```
```
## [1] 61.9
```
What proportion of the cars were more expensive than $35,000? Count them, and use R as a calculator below to get the answer.
```
6/54
```
```
## [1] 0.1111111
```
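
Counting by eye gets tedious for larger datasets, so here is one way to let R do the counting. This is a sketch, not the method the question asks for. It retypes the 54 prices printed earlier into a vector named prices (a name made up for this example) so that it runs on its own. With the dataset loaded, cars_table$price could be used in its place.

```r
prices <- c(15.9, 33.9, 37.7, 30.0, 15.7, 20.8, 23.7, 26.3, 34.7, 40.1, 15.9, 18.8, 18.4, 29.5,  9.2,
            11.3, 15.6, 12.2, 19.3,  7.4, 10.1, 20.2, 20.9,  8.4, 12.1,  8.0, 10.0, 13.9, 47.9, 28.0,
            35.2, 34.3, 36.1,  8.3, 11.6, 61.9, 14.9, 10.3, 26.1, 11.8, 21.5, 16.3, 20.7,  9.0, 18.5,
            24.4, 11.1,  8.4, 10.9,  8.6,  9.8, 18.2,  9.1, 26.7)

expensive <- prices > 35   # TRUE or FALSE for each car (35 means $35,000)
sum(expensive)             # TRUE counts as 1, so this is the number of cars over $35,000
length(prices)             # total number of cars
mean(expensive)            # proportion of TRUE values, same as sum(expensive) / length(prices)
```

Run as written, the three results are 6, 54, and about 0.111, which agrees with the 6/54 computed by hand.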
Was this data collected from an observational study or an experiment? Explain how this tells you what types of conclusions you can draw from this data.
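- It is an observational study: the cars were simply recorded as they were sold, and nothing was assigned or controlled. This means we can describe associations between variables, but we cannot conclude that one variable causes another.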

Exercise 3: Visualizing Data.

R has some powerful functions for making graphics. We can create a simple plot of the relationship between the weight of a car and its price as follows. When we describe one of these plots, we always write y versus x. In this case, we are plotting price versus weight.

cars_table %>% 
  ggplot(aes(x=weight, y=price))+ 
  geom_point()+ 
  ggtitle("Price versus Weight in cars")
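
As a hedged sketch of how the pieces of a plot fit together, the example below builds the same kind of scatter plot on a small made-up table (toy_plot_data and its numbers are hypothetical). It assumes the ggplot2 package, which is loaded as part of the tidyverse, and it adds axis labels with labs(), which the plot above does not use.

```r
library(ggplot2)

# Hypothetical data: weight in pounds, price in thousands of dollars
toy_plot_data <- data.frame(
  weight = c(2300, 2700, 3100, 3500, 3900),
  price  = c(9.5, 14.0, 19.5, 27.0, 36.0)
)

p <- ggplot(toy_plot_data, aes(x = weight, y = price)) +  # aes() maps columns to the axes
  geom_point() +                                          # one point per car
  labs(x = "Weight (pounds)", y = "Price (thousands of dollars)") +
  ggtitle("Price versus weight in toy_plot_data")

p   # printing the plot object draws it
```

Storing the plot in p is optional. It just makes it easy to add more layers later or to draw the plot again.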

Is there an apparent trend between price and weight? How would you describe it?
-The heavier the car, the higher the price.
Make a second scatter plot plotting price versus mpg in the box below. Give it a new title. Is there an apparent trend? If so, how would you describe it?
```
cars_table %>% 
  ggplot(aes(x=mpg_city, y=price))+ 
  geom_point()+ 
  ggtitle("Price versus mpg_city in cars")
```
The below makes a different type of plot by using a different geometry. Give it a title by adding + ggtitle("YOUR TITLE HERE") that is appropriate for what it is measuring.
```
cars_table %>% 
  ggplot(aes(x=type, fill=drive_train))+
  geom_bar()+
  ggtitle("Number of cars by type and drive train")
```
Do you see any apparent relationships between the type of car and the type of drive train? Explain.
-Yes. For every type of car, most of the cars have front wheel drive trains.

Exercise 4: Try this yourself!

The cars04 dataset contains another set of cars from a different dealership that were sold in – you guessed it – 2004.

cars_table_2 <- cars04 
head(cars_table_2)

How many cars were represented by this dataset?
-263 cars in the data set
How do the prices of cars in this dataset compare to the prices of cars in the cars93 dataset? Explain what you are looking at to give your answer.
-I compared the prices in the cars93 dataset with the msrp prices in the cars04 dataset. The cars in the cars04 dataset are far more expensive than the 1993 cars.
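
One thing to watch when comparing the two datasets is units: price in cars93 is in thousands of dollars, while the msrp values appear to be in dollars (the largest msrp printed at the end of this exercise is 192465). Here is a minimal sketch of putting both on the same scale before comparing. The short vectors are made up and only stand in for the real columns.

```r
# Hypothetical values standing in for cars_table$price and cars_table_2$msrp
price_1993_thousands <- c(8.0, 15.9, 26.7, 40.1)
msrp_2004_dollars    <- c(11690, 22180, 35940, 69190)

price_1993_dollars <- price_1993_thousands * 1000   # convert thousands to dollars

median(price_1993_dollars)    # typical 1993 price, in dollars
median(msrp_2004_dollars)     # typical 2004 price, in dollars
summary(price_1993_dollars)   # min, quartiles, mean, and max in one call
summary(msrp_2004_dollars)
```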

Make a plot that displays the price of each car (from dealer_cost) plotted against weight, with an appropriate title. Do you see a similar trend as in cars93? Explain.

cars_table_2 %>% 
  ggplot(aes(x=weight, y=dealer_cost))+ 
  geom_point()+ 
  ggtitle("Dealer cost versus weight in cars")

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
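
The warning above means that 2 cars could not be drawn because at least one of their plotted values was missing or outside the scale range. R marks missing values as NA. Here is a minimal sketch of how to count them and work around them, using a made-up vector (toy_weight is hypothetical):

```r
toy_weight <- c(2300, NA, 3100, 3500, NA)   # NA marks a missing value

is.na(toy_weight)                # TRUE where a value is missing
sum(is.na(toy_weight))           # how many values are missing
max(toy_weight)                  # NA: the maximum is unknown if any value is missing
max(toy_weight, na.rm = TRUE)    # na.rm = TRUE drops the NAs first
```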

What was the most expensive car sold in this dealership in 2004? Use R code to answer.
```
max(cars_table_2$msrp)
```
```
## [1] 192465
```
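
max() returns the price but not which car it belongs to. One way to look up the whole row is which.max(), which returns the position of the largest value. The table below is made up for illustration (toy_cars_04 and its entries are hypothetical):

```r
# Hypothetical table standing in for cars_table_2
toy_cars_04 <- data.frame(
  name = c("Car A", "Car B", "Car C"),
  msrp = c(21500, 48000, 33000)
)

top_row <- which.max(toy_cars_04$msrp)   # position of the largest msrp
top_row
toy_cars_04[top_row, ]                   # the full row for that car
```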

Resources for learning R and working in RStudio

That was a short introduction to R and RStudio, but we will provide you with more functions and a more complete sense of the language as the course progresses.

In this course we will be using the suite of R packages from the tidyverse. The book R For Data Science by Grolemund and Wickham is a fantastic resource for data analysis in R with the tidyverse. If you are Googling for R code, make sure to also include these package names in your search query. For example, instead of Googling “scatterplot in R”, Google “scatterplot in R with the tidyverse”.

These may come in handy throughout the semester:

Note that some of the code on these cheatsheets may be too advanced for this course. However the majority of it will become useful throughout the semester.

Lab 1: Intro to R, Normal Distributions.

Victor Haimovitch-Gore

2025-01-28