When you make an Rmarkdown file, always keep this chunk:
Let’s also load the tidyverse:
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
To put it simply: R is a programming language, but it’s basically a spreadsheet program (like Excel) that you operate using text commands.
To put it even more simply: it’s a big calculator! Example:
1+1
## [1] 2
You’ll notice that there is a [1] in front of the “2”. This may seem strange, but it’s put there to tell you that the answer (2) is the 1st value in the output. This will make more sense later, when an answer contains multiple values.
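To see why the [1] is useful, here is a small sketch that prints enough values to wrap onto several lines. The name “many_numbers” and the range 101 to 140 are just illustrations, not part of the lesson data.

```r
# 40 whole numbers, from 101 up to 140 (the colon means "from ... to ...")
many_numbers <- 101:140

# Each printed line starts with the position of its first value.
# The first line always starts with [1]; where the later lines start
# depends on how wide your console is.
print(many_numbers)
```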
2*2
## [1] 4
Get the idea? It’s pretty simple.
More complex problems are do-able, and in my opinion, even easier than doing them with a regular calculator:
2*2+4*8-3*7
## [1] 15
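R follows the usual order of operations: multiplication and division happen before addition and subtraction. A quick sketch of how parentheses change the result (the object names here are only for illustration):

```r
# Without extra parentheses, the multiplications are done first:
# (2*2) + (4*8) - (3*7) = 4 + 32 - 21
without_parens <- 2*2+4*8-3*7
print(without_parens)   # 15

# Parentheses force R to add 2+4 first:
# 2 * 6 * 8 - 21 = 96 - 21
with_parens <- 2*(2+4)*8-3*7
print(with_parens)      # 75
```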
Division and multiplication work the same way:
12/4
## [1] 3
12*4
## [1] 48
We can also store a result in an object (a variable) with the “<-” arrow and reuse it later. Have a look:
# let's store "2+2" in an object called "a"
a<-2+2
print(a)
## [1] 4
It’s possible to make complex equations using this method:
b<-(4*a)+(3*a)
print(b)
## [1] 28
c<-b+a
print(c)
## [1] 32
It’s also possible to put entire lists of things into a single variable. We can put a bunch of numbers inside a single variable, with the “c()” command. The “c” in “c()” stands for “concatenate” or “combine”. When we put something inside a command like “c()”, we often say that we are “wrapping” it inside “c()”.
So, for example, “wrapping 1,2,3,4 in c()” looks like this: c(1,2,3,4)
Try wrapping 1,2,3,4 into a new variable called “test1234” in the console. It should look something like this:
test1234<-c(1,2,3,4)
Now let’s do the same in code chunks:
d_list<-c(1,2,3,4,5,6)
e_list<-c(1,2,3,4,5,6)
print(d_list)
## [1] 1 2 3 4 5 6
print(e_list)
## [1] 1 2 3 4 5 6
# these lists can be multiplied together, element by element (1*1, 2*2, 3*3, ...):
combined_list<-d_list*e_list
print(combined_list)
## [1] 1 4 9 16 25 36
When we create lists like these, the lists are called “vectors”.
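Vector arithmetic also works with a single number: R applies it to every element. A minimal sketch, reusing d_list from above (“doubled” and “shifted” are illustrative names):

```r
d_list <- c(1, 2, 3, 4, 5, 6)

# Multiply every element by 2
doubled <- d_list * 2
print(doubled)   # 2 4 6 8 10 12

# Add 10 to every element
shifted <- d_list + 10
print(shifted)   # 11 12 13 14 15 16
```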
We can also ask R how long a vector is (this will be very useful later, trust me):
length(d_list)
## [1] 6
This will also be useful: we can ask R to give us the specific value of a vector position. Here we are asking R, “give us the 3rd value of new_list”:
new_list<-c(5,10,15,20,25)
new_list[3]
## [1] 15
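The square brackets can pick out more than one position at a time. A short sketch with the same new_list (the names on the left are only for illustration):

```r
new_list <- c(5, 10, 15, 20, 25)

# Several specific positions: wrap the positions in c()
first_and_last <- new_list[c(1, 5)]
print(first_and_last)      # 5 25

# A run of positions: 2:4 means positions 2, 3 and 4
middle_three <- new_list[2:4]
print(middle_three)        # 10 15 20

# A negative position means "everything except this one"
all_but_first <- new_list[-1]
print(all_but_first)       # 10 15 20 25
```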
We can add all elements of a vector together:
sum(new_list)
## [1] 75
Or we can just add together some parts of the vector:
sum(new_list[3:4])
## [1] 35
There are easier ways to do this, but you can use “sum()” and “length()” to calculate an average. For example:
sum(new_list)/length(new_list)
## [1] 15
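One of those easier ways is the built-in “mean()” command. A quick check that it gives the same answer as the manual calculation:

```r
new_list <- c(5, 10, 15, 20, 25)

# The manual way: total divided by how many values there are
manual_mean <- sum(new_list) / length(new_list)

# The built-in way
builtin_mean <- mean(new_list)

print(manual_mean)                   # 15
print(builtin_mean)                  # 15
print(manual_mean == builtin_mean)   # TRUE
```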
You can also check logical statements, such as “is x bigger than y?”
x<-50
y<-2
x>y
## [1] TRUE
print(new_list)
## [1] 5 10 15 20 25
new_list[3]>new_list[4]
## [1] FALSE
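A comparison can also be applied to a whole vector at once, which gives one TRUE or FALSE per element. A small sketch (the cutoff of 12 and the name “is_big” are arbitrary choices for illustration):

```r
new_list <- c(5, 10, 15, 20, 25)

# Ask "is it bigger than 12?" for every element
is_big <- new_list > 12
print(is_big)             # FALSE FALSE TRUE TRUE TRUE

# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUEs
print(sum(is_big))        # 3

# Use the TRUE/FALSE values to keep only the matching elements
print(new_list[is_big])   # 15 20 25
```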
You can also discover the minimum and maximum element of a vector very easily:
min(new_list)
## [1] 5
max_b<-max(new_list)
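Notice that storing the maximum in max_b does not print anything by itself. A brief sketch of how to see it, plus two related commands, “range()” and “which.max()”, that are not covered above:

```r
new_list <- c(5, 10, 15, 20, 25)
max_b <- max(new_list)

# Printing the stored object shows the value
print(max_b)                  # 25

# range() gives the minimum and maximum together
print(range(new_list))        # 5 25

# which.max() gives the position of the maximum, not the value
print(which.max(new_list))    # 5
```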
We can import files from the working directory. Remember how we set the working directory earlier? The dataset you are working on should be there already.
For this lecture, we will be using a small dataset called “city.csv”:
wd<-getwd()
city<-read.csv("city.csv")
tibble(city)
## # A tibble: 7 × 6
## X city high_far low_far high_cent low_cent
## <int> <chr> <int> <int> <int> <int>
## 1 1 Mumbai 91 72 33 22
## 2 2 Nairobi 88 57 31 14
## 3 3 Paris 48 36 9 2
## 4 4 Sao Paulo 82 68 28 20
## 5 5 Sydney 79 68 27 20
## 6 6 Tokyo 48 37 9 3
## 7 7 Toronto 27 10 -3 -12
R can load almost any type of data: read.csv() reads CSV files, read_excel() (from the readxl package) reads Excel files, and so forth.
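If you want to try reading a file without having “city.csv” on hand, here is a minimal sketch that writes a tiny CSV to a temporary file and reads it back. The two rows are copied from the table above; “sample_cities”, “tmp_file” and “reloaded” are made-up names for this example only.

```r
# Two rows copied from the city table above
sample_cities <- data.frame(
  city = c("Mumbai", "Nairobi"),
  high_far = c(91, 88),
  low_far = c(72, 57)
)

# A throwaway file location (R picks the folder for us)
tmp_file <- tempfile(fileext = ".csv")

# row.names = FALSE keeps R from writing an extra column of row numbers
write.csv(sample_cities, tmp_file, row.names = FALSE)

# Read it back in, exactly as we did with city.csv
reloaded <- read.csv(tmp_file)
print(reloaded)
```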
What if something is off and you need to read its documentation? For specific commands, like “print()”, you can just write “?print” in the command console. Try it now, without the quotation marks.
This will open up a window on the lower right called “Help”. Here you see a brief description of what “print()” does, followed by allowed commands (or “arguments”) that can be passed to “print()”. At the very bottom of the page, you’ll see a section called “Examples” which is very valuable. This basically shows how the “print()” command can be used in real life.
You can do this with almost any command in R! I’m not kidding when I say that “RTFM” (Read The Freakin’ Manual) is great advice!
It’s also important to know: we use the dollar sign “$” to call specific variables (columns) in a data frame.
For example, “city” has 6 columns:
colnames(city)
## [1] "X" "city" "high_far" "low_far" "high_cent" "low_cent"
If we want to find the average of a specific column (say, “high_far”), we call mean(city$high_far), like this:
mean(city$high_far)
## [1] 66.14286
Keep this in mind for later.
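Here is a self-contained sketch of the same idea, in case you want to experiment without the file. It rebuilds two of the columns from the table above by hand; the double-bracket form and “which.max()” are extras not covered in this lesson.

```r
# Two columns copied from the city table above
city <- data.frame(
  city = c("Mumbai", "Nairobi", "Paris", "Sao Paulo", "Sydney", "Tokyo", "Toronto"),
  high_far = c(91, 88, 48, 82, 79, 48, 27)
)

# The dollar sign pulls out one column as a vector
print(city$high_far)
print(mean(city$high_far))

# Double square brackets with the column name do the same thing
print(city[["high_far"]])

# Combine two columns: which city has the highest high_far?
hottest <- city$city[which.max(city$high_far)]
print(hottest)   # "Mumbai"
```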
Unlike Excel, it’s very easy to make simple plots in R. For very simple x-y plots, there is just one command: plot()
plot(city$high_far,main="CityPlot",ylab="Y axis",type="b",sub="Data from Unknown Source")
Because we gave plot() only one vector, R automatically uses the position of each value (the index, 1 to 7, which here matches the “X” column) for the horizontal axis (X axis). The vertical axis (Y axis) is the “city$high_far” temperature for that city.
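To see what R is doing on the horizontal axis, here is a sketch that supplies the positions 1 through 7 explicitly. It should draw the same plot as above (minus the subtitle). The values are copied from the city table, and “positions” is a made-up name.

```r
# High temperatures copied from the city table above
high_far <- c(91, 88, 48, 82, 79, 48, 27)

# seq_along() gives the positions 1, 2, ..., 7
positions <- seq_along(high_far)
print(positions)

# Passing the positions as "x" ourselves, instead of letting R fill them in
plot(positions, high_far, main = "CityPlot", xlab = "Index", ylab = "Y axis", type = "b")
```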
Let’s explore “plot()” a little bit more. Type the following in the console:
?plot
In this case, we’ll click on the first link “Generic X-Y Plotting”
Under “Usage” we see this:
plot(x, y, ...)
This means that plot() takes an “x” and a “y”, and can also accept “...” (additional arguments). In the example we just did, we gave only one vector, so R used it for the vertical axis and inserted the index on the horizontal axis. We can add some additional arguments now. Look below “Usage” to “Arguments”:
plot(city$high_far,type="l")
# the "l" stands for "line". We are asking R to dispense with points and replace with lines.
Much nicer! Now try this:
plot(city$high_far,type="b")
# "b" stands for "both" -- both lines and points
I think you’ll agree this is a good plot to see the difference in cities. We can also add labels to the plot. Check out the arguments like “main” and “sub”.
plot(city$high_far,type="b",main="City High F temps",sub="Data From Unknown Source")
Can we add city names? Yes, but we have to explicitly call “city$city” in the “X” argument, as a “factor()”. Don’t worry about “factor()” for now, I’ll explain it in detail later.
plot(factor(city$city),city$high_far,type="b",main="City High F temps",sub="Data From Unknown Source")
This looks kind of weird. In future classes we will discuss easier ways
to make graphs literally however we want them – using ggplot.
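In the meantime, one possible way to keep the line-and-point plot and still show city names is to turn off the default horizontal axis and draw our own with “axis()”. This is only a sketch, with values copied from the city table; “xaxt”, “axis()” and “las” are base R features not covered in this lesson.

```r
# Names and high temperatures copied from the city table above
city_names <- c("Mumbai", "Nairobi", "Paris", "Sao Paulo", "Sydney", "Tokyo", "Toronto")
high_far <- c(91, 88, 48, 82, 79, 48, 27)

# xaxt = "n" tells plot() not to draw the usual numbered horizontal axis
plot(high_far, type = "b", xaxt = "n", xlab = "", ylab = "High (F)",
     main = "City High F temps", sub = "Data From Unknown Source")

# Draw our own axis at the bottom (side = 1), one label per position.
# las = 2 turns the labels sideways so they are less likely to overlap.
axis(side = 1, at = seq_along(city_names), labels = city_names, las = 2)
```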
Answer these questions in the Rmarkdown document. Show your code:
What is the average (mean()) mpg for all cars?
What is the minimum mpg for all cars?
What is the maximum mpg for all cars?
How many cars are there in the dataset? (Hint: length() of mpg)
Upload this to RPubs and submit the URL via Canvas. Good luck!