When you first open RStudio, this is what you see
Type 1 + 1, then hit “Enter” and R will return the answer
10 + 5
10 - 5
10 * 5
10 / 5
Use = to create the variables x and y by assigning some numbers to them
x = 10
y = 5
x + y
[1] 15
(Above, the top panel is what you run in your script, the bottom panel is the output)
In RStudio, you will see the variables we created in the top right panel
x
[1] 10
x = 20
x
[1] 20
In the top right panel you can see that the number stored in the variable x has changed
R has three main variable types
| Type | Description | Examples |
|---|---|---|
| character | letters and words | "z", "red", "H2O" |
| numeric | numbers | 1, 3.14, log(10) |
| logical | binary | TRUE, FALSE |
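To check which of the three types a value has, one option is the base R function class(). The short sketch below applies it to examples taken from the table; the variable names are chosen only for illustration.

```r
# Ask R for the type of each example value
type.of.word    = class("H2O")     # "character"
type.of.number  = class(3.14)      # "numeric"
type.of.log     = class(log(10))   # "numeric" (log() returns a number)
type.of.logical = class(TRUE)      # "logical"

type.of.word
type.of.number
type.of.log
type.of.logical
```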
There are several ways to group data to make them easier to work with:
c() as a container for vector elements
x = c(1, 2, 3, 4, 5)
x
[1] 1 2 3 4 5
list() as a container for list items
x = list("Benzene", 1.3, TRUE)
x
[[1]]
[1] "Benzene"
[[2]]
[1] 1.3
[[3]]
[1] TRUE
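One practical difference between c() and list() is how they treat mixed types. The sketch below reuses the values from the list example; the names as.vector.version and as.list.version are made up for illustration, and sapply() is a base R helper used here only to apply class() to each list item.

```r
# c() stores a single type, so mixed values are all converted to character
as.vector.version = c("Benzene", 1.3, TRUE)
as.vector.version          # "Benzene" "1.3" "TRUE"
class(as.vector.version)   # "character"

# list() keeps each item in its original type
as.list.version = list("Benzene", 1.3, TRUE)
item.types = sapply(as.list.version, class)
item.types                 # "character" "numeric" "logical"
```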
data.frame() as a container for many vectors of the same length
student = c("Bob", "Thomas", "Cory")
score = c(90, 15, 6)
pass = c(TRUE, FALSE, FALSE)
my.data = data.frame(student, score, pass)
my.data
  student score  pass
1     Bob    90  TRUE
2  Thomas    15 FALSE
3    Cory     6 FALSE
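A quick way to confirm that a data frame was built as intended is to check its size and structure. The sketch below rebuilds the same data frame and inspects it with the base R functions nrow(), ncol() and str(); the names n.rows and n.cols are illustrative only.

```r
student = c("Bob", "Thomas", "Cory")
score = c(90, 15, 6)
pass = c(TRUE, FALSE, FALSE)
my.data = data.frame(student, score, pass)

n.rows = nrow(my.data)   # one row per student: 3
n.cols = ncol(my.data)   # one column per vector: 3
n.rows
n.cols

str(my.data)             # prints each column with its type and first values
```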
x = c(4, 8, 1, 14, 34)
mean(x) # Calculate the mean of the data set
[1] 12.2
y = c(1, 4, 3, 5, 10)
mean(y) # Mean of a different data set
[1] 4.6
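Other base R functions are called the same way as mean(): the name, then the data inside the parentheses. As a small sketch, here are a few more summaries of the same x; the result names are chosen only for illustration.

```r
x = c(4, 8, 1, 14, 34)

x.median = median(x)   # middle value of the sorted data: 8
x.max    = max(x)      # largest value: 34
x.length = length(x)   # number of elements: 5
x.sum    = sum(x)      # total of all elements: 61

x.median
x.max
x.length
x.sum
```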
Put # in front of your comment
Anything after # will not be evaluated
# Full line comment
x # partial line comment
"new line"
Functions have the form function()
function is the name, which usually gives you a clue about what it does
() is where you put your data or indicate options
To find out what goes inside the (), type a question mark in front of the function and run it
?mean()
In RStudio, you will see the help page for mean() in the bottom right corner
Under Usage, you see mean(x, ...)
This means the only thing that has to go inside the () is x
Under Arguments you will find a description of what x needs to be
(For our purposes, we need x in the mean function to be a numeric vector)
Example: plot()
Usage: plot(x, y)
x is a numeric vector that will be the x-axis coordinates of the plot
y is a numeric vector (of the same length as x) that will be the y-axis coordinates of the plot
score = c(1.3, 4.5, 2.6, 3.4, 6.4)
day = c(1, 2, 3, 4, 5)
plot(x = day, y = score)
score = c(1.3, 3.5, 2.6, 3.4, 6.4)
day = c(1, 2, 3, 4, 5)
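op = par(no.readonly = TRUE) # Save the current graphical settings
par(oma = c(0, 0.5, 0, 0)) # Add a small outer margin on the left
plot(x = day, y = score, cex.axis = 2, cex.lab = 2, cex = 2) # Enlarge axis text, labels and points
par(op) # Restore the saved settings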
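Besides the data, the parentheses of plot() can also hold options. The sketch below uses the standard arguments type, xlab, ylab and main; the label and title text are assumptions chosen only for illustration.

```r
score = c(1.3, 4.5, 2.6, 3.4, 6.4)
day = c(1, 2, 3, 4, 5)

plot(x = day, y = score,
     type = "b",              # draw both points and connecting lines
     xlab = "Day",            # x-axis label (illustrative text)
     ylab = "Score",          # y-axis label (illustrative text)
     main = "Score by day")   # plot title (illustrative text)
```

Leaving out any of these options falls back to the defaults, so plot(x = day, y = score) still works on its own.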
Use the library() function to load the “descr” package (for example)
library("descr")
data(airquality)
head(airquality)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
airquality is a data frame with ozone readings from a monitor in New York
colnames(airquality)
[1] "Ozone" "Solar.R" "Wind" "Temp" "Month" "Day"
Use the nrow() function to get the number of rows
nrow(airquality)
[1] 153
RStudio has a special function called View() that makes it easier to look at data in a data frame
View(airquality)
Select a single column of a data frame with the $ operator
mean(airquality$Temp) # Calculate the mean temperature
[1] 77.88
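The head(airquality) output earlier shows NA (missing) values in the Ozone column, and mean() returns NA when its input contains missing values. One way to handle this, sketched below, is the na.rm = TRUE option of mean(); the names ozone.mean.raw, ozone.missing and ozone.mean.clean are illustrative only.

```r
data(airquality)

# With missing values present, the result is NA
ozone.mean.raw = mean(airquality$Ozone)
ozone.mean.raw

# Count how many Ozone readings are missing
ozone.missing = sum(is.na(airquality$Ozone))
ozone.missing

# na.rm = TRUE drops the missing values before averaging
ozone.mean.clean = mean(airquality$Ozone, na.rm = TRUE)
ozone.mean.clean
```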
Take a look at the data using plot(x, y)
plot(airquality$Temp, airquality$Ozone)
source("http://www.openintro.org/stat/data/cdc.R")
## A data frame called cdc is loaded.
names(cdc)
[1] "genhlth" "exerany" "hlthplan" "smoke100" "height" "weight"
[7] "wtdesire" "age" "gender"
head(cdc)
    genhlth exerany hlthplan smoke100 height weight wtdesire age gender
1      good       0        1        0     70    175      175  77      m
2      good       0        1        1     64    125      115  33      f
3      good       1        1        1     60    105      105  49      f
4      good       1        1        0     66    132      124  42      f
5 very good       0        1        0     61    150      130  55      f
6 very good       1        1        0     64    114      114  55      f