Nathan Byers, Eric Bailey, Kali Frost
April 28, 2014
When you first open RStudio, this is what you see
In the console, type 1 + 1 then hit “Enter” and R will return the answer
10 + 5
10 - 5
10 * 5
10 / 5
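The same arithmetic can be combined into longer expressions. The short sketch below is an illustration that is not part of the original slides; it shows exponentiation, the effect of parentheses on the order of operations, and the remainder operator.

```r
# Exponentiation uses ^
10^2

# Multiplication is done before addition unless parentheses say otherwise
10 + 5 * 2
(10 + 5) * 2

# %% gives the remainder after division
10 %% 3
```

Running each line in the console returns 100, 20, 30 and 1 respectively.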
Use the <- operator to create two variables, x and y, by assigning some numbers to them
x <- 10
y <- 5
x + y
[1] 15
(Above, the top panel is what you run in your script, the bottom panel is the output)
In RStudio, you will see the variables we created in the top right panel
x
[1] 10
x <- 20
x
[1] 20
In the top right panel you can see that the number stored in the variable x has changed
R has three main variable types
Type | Description | Examples |
---|---|---|
character | letters and words | "z", "red", "H2O" |
numeric | numbers | 1, 3.14, log(10) |
logical | binary | TRUE, FALSE |
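One way to check which type a value has is the base R function class(). The lines below are a small sketch using the example values from the table; the choice of class() is ours and is not shown in the original slides.

```r
# class() reports the type of a value
class("H2O")     # character
class(3.14)      # numeric
class(log(10))   # numeric, because log() returns a number
class(TRUE)      # logical
```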
There are several ways to group data to make them easier to work with:
c() as a container for vector elements
x <- c(1, 2, 3, 4, 5)
x
[1] 1 2 3 4 5
list() as a container for list items
x <- list("Benzene", 1.3, TRUE)
x
[[1]]
[1] "Benzene"
[[2]]
[1] 1.3
[[3]]
[1] TRUE
data.frame() as a container for many vectors of the same length
pollutant <- c("Benzene", "Toluene", "Xylenes")
concentration <- c(1.3, 5.5, 6.0)
carcinogen <- c(TRUE, FALSE, FALSE)
my.data <- data.frame(pollutant, concentration, carcinogen)
my.data
  pollutant concentration carcinogen
1   Benzene           1.3       TRUE
2   Toluene           5.5      FALSE
3   Xylenes           6.0      FALSE
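To see how a vector and a list differ, it can help to put the same mixed values into each. The following is a minimal sketch; the names mixed_vector and mixed_list are made up for illustration and do not appear in the original slides.

```r
# A vector holds one type only, so mixed values are all converted to character
mixed_vector <- c("Benzene", 1.3, TRUE)
mixed_vector
class(mixed_vector)

# A list keeps each item's own type
mixed_list <- list("Benzene", 1.3, TRUE)
sapply(mixed_list, class)
```

The vector comes back as "Benzene" "1.3" "TRUE", while the list reports character, numeric and logical for its three items. A data frame sits in between: each column is a vector of a single type, but different columns can have different types.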
x <- c(4, 8, 1, 14, 34)
mean(x) # Calculate the mean of the data set
[1] 12.2
y <- c(1, 4, 3, 5, 10)
mean(y) # Mean of a different data set
[1] 4.6
To write a comment, put a # in front of your comment
Anything after the # will not be evaluated
# Full line comment
x # partial line comment
"new line"
Functions have the form function()
function is the name, which usually gives you a clue about what it does
() is where you put your data or indicate options
To find out what goes inside the (), type a question mark in front of the function and run it
?mean()
In RStudio, you will see the help page for mean() in the bottom right corner
Under Usage, you see mean(x, ...)
The argument that goes inside the () is x
Under Arguments you will find a description of what x needs to be
(for our purposes, x in the mean function needs to be a numeric vector)
plot()
plot(x, y)
x is a numeric vector that will be the x-axis coordinates of the plot
y is a numeric vector (of the same length as x) that will be the y-axis coordinates of the plot
benzene <- c(1.3, 4.5, 2.6, 3.4, 6.4)
day <- c(1, 2, 3, 4, 5)
plot(x = day, y = benzene)
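Besides x and y, plot() accepts optional arguments that control how the plot looks. The sketch below reuses the benzene and day vectors and adds axis labels, a title and connected points; the label text is our own wording and is not taken from the original slides.

```r
benzene <- c(1.3, 4.5, 2.6, 3.4, 6.4)
day <- c(1, 2, 3, 4, 5)

# type = "b" draws both points and connecting lines
# xlab, ylab and main set the axis labels and the title
plot(x = day, y = benzene,
     type = "b",
     xlab = "Day",
     ylab = "Benzene concentration",
     main = "Benzene by day")
```

Naming the arguments (x =, y =, xlab =) makes the call easier to read and means the order in which they are written does not matter.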
The package EnvStats has a function called serialCorrelationTest()
x <- c(1.3, 3.5, 2.6, 3.4, 6.4)
serialCorrelationTest(x)
R cannot find the function until the package is loaded, so first load the package with the library() function
library("EnvStats")
Now we can use the function we want
x <- c(1.3, 3.5, 2.6, 3.4, 6.4)
serialCorrelationTest(x)
Results of Hypothesis Test
--------------------------
Null Hypothesis: rho = 0
Alternative Hypothesis: True rho is not equal to 0
Test Name: Rank von Neumann Test for
Lag-1 Autocorrelation
(Exact Method)
Estimated Parameter(s): rho = -0.01876
Estimation Method: Yule-Walker
Data: x
Sample Size: 5
Test Statistic: RVN = 1.8
P-value: 0.7833
Confidence Interval for: rho
Confidence Interval Method: Normal Approximation
Confidence Interval Type: two-sided
Confidence Level: 95%
Confidence Interval: LCL = -0.8951
UCL = 0.8576
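A package has to be installed before library() can load it. The sketch below is one possible way to check for that and is not part of the original slides; the variable name envstats_available is made up for illustration. It uses the base R function requireNamespace() to test whether EnvStats is installed, and only runs the test if it is.

```r
# TRUE if EnvStats is installed on this computer, FALSE otherwise
envstats_available <- requireNamespace("EnvStats", quietly = TRUE)

if (envstats_available) {
  library("EnvStats")
  x <- c(1.3, 3.5, 2.6, 3.4, 6.4)
  print(serialCorrelationTest(x))
} else {
  # install.packages() downloads a package; it only needs to be run once
  message("EnvStats is not installed; run install.packages(\"EnvStats\") and try again")
}
```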
Excel files can be read with packages such as xlsx and XLConnect
Accept the defaults in the popup window and click “Import”
This creates a variable called airquality that is a data frame of the spreadsheet we imported
read.csv() is a function that takes the name of a csv file as its main argument
Assign the output of read.csv() to a variable to be able to work with the data
airquality <- read.csv("C:/My Data/airquality.csv")
airquality
   Ozone Solar.R Wind Temp Month Day
1     41     190  7.4   67     5   1
2     36     118  8.0   72     5   2
3     12     149 12.6   74     5   3
4     18     313 11.5   62     5   4
5     NA      NA 14.3   56     5   5
6     28      NA 14.9   66     5   6
7     23     299  8.6   65     5   7
8     19      99 13.8   59     5   8
9      8      19 20.1   61     5   9
10    NA     194  8.6   69     5  10
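If you do not have a csv file at hand, read.csv() can still be tried out on a small temporary file. The sketch below writes three of the rows shown above to a temporary file and reads them back; the names csv_path and small_air are made up for illustration, and the temporary file stands in for a real path such as the one used above.

```r
# Create a temporary csv file containing a header and three rows
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Ozone,Solar.R,Wind,Temp,Month,Day",
             "41,190,7.4,67,5,1",
             "36,118,8.0,72,5,2",
             "NA,NA,14.3,56,5,5"),
           csv_path)

# Read the file and store the result in a variable
small_air <- read.csv(csv_path)
small_air

# The result is a data frame, and the text NA is read as a missing value
class(small_air)
is.na(small_air$Ozone)
```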
R also has built-in data sets, which you can load with the data() function
data(airquality)
airquality
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
airquality is a data frame with ozone readings from a monitor in New York
colnames(airquality)
[1] "Ozone" "Solar.R" "Wind" "Temp" "Month" "Day"
Use the nrow() function to get the number of rows
nrow(airquality)
[1] 153
RStudio has a special function called View() that makes it easier to look at data in a data frame
View(airquality)
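Outside of RStudio, or when you only want a quick look in the console, a few base R functions give similar information. This is a short sketch of some common ones; they are our additions and are not covered in the original slides.

```r
data(airquality)

head(airquality, 3)       # the first three rows
dim(airquality)           # number of rows and number of columns
str(airquality)           # the type of each column and its first few values
summary(airquality$Temp)  # minimum, quartiles, mean and maximum of one column
```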
Access a column of a data frame with the $ operator
mean(airquality$Temp) # Calculate the mean temperature
[1] 77.88
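The Temp column has no missing values, but the Ozone column does (shown as NA in the printed data). A minimal sketch of what happens with mean() in that case, using the na.rm option listed on the help page for mean():

```r
data(airquality)

# With missing values present, mean() returns NA
mean(airquality$Ozone)

# Count how many Ozone readings are missing
sum(is.na(airquality$Ozone))

# na.rm = TRUE removes the missing values before calculating the mean
mean(airquality$Ozone, na.rm = TRUE)
```

The first call returns NA, the count of missing readings is 37, and the mean of the remaining readings is about 42.13.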
Take a look at the data using plot(x, y)
plot(airquality$Temp, airquality$Ozone)
Use lm(y ~ x, data) to fit a linear regression to a data set and summary() to see the results
fit <- lm(Ozone ~ Temp, airquality)
summary(fit)
Add the fitted line to the plot with abline()
abline(fit)
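summary() prints everything at once, but individual pieces of a fitted model can also be pulled out with their own functions. The sketch below refits the same model and uses the base R functions coef() and nobs(); these are our additions and are not shown in the original slides.

```r
data(airquality)
fit <- lm(Ozone ~ Temp, data = airquality)

# The intercept and slope of the fitted line
coef(fit)

# Number of rows actually used; rows with a missing Ozone value are dropped
nobs(fit)

# Proportion of the variation in Ozone explained by Temp
summary(fit)$r.squared
```

Because 37 of the 153 Ozone readings are missing, the model is fitted to the remaining 116 rows.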
With a little extra code we can get a better fit with the data by using a quadratic model
temp <- airquality$Temp
temp2 <- temp^2 # ^ = exponentiation
ozone <- airquality$Ozone
fit2 <- lm(ozone ~ temp + temp2)
temp.curve <- seq(min(temp), max(temp), length.out = 50)
lines(temp.curve, predict(fit2, list(temp=temp.curve, temp2 = temp.curve^2)))
Based on this dataset and our quadratic model, what would we expect the ozone concentration to be when the temperature is 75 degrees?
prediction <- predict(fit2, list(temp = 75, temp2 = 75^2))
prediction
1
28.32
points(x = 75, y = prediction, col="red", pch=19)
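An alternative way to write the quadratic model, offered here as a sketch and not as the approach used in the slides, is to put the squared term directly in the formula with I(). The names fit_formula, pred_formula and pred_manual are made up for illustration. With this form, predict() only needs the temperature itself.

```r
data(airquality)

# I() tells lm() to treat Temp^2 as arithmetic rather than formula syntax
fit_formula <- lm(Ozone ~ Temp + I(Temp^2), data = airquality)
pred_formula <- predict(fit_formula, newdata = data.frame(Temp = 75))

# The same model built from separate vectors, as in the code above
temp <- airquality$Temp
temp2 <- temp^2
ozone <- airquality$Ozone
fit2 <- lm(ozone ~ temp + temp2)
pred_manual <- predict(fit2, list(temp = 75, temp2 = 75^2))

# Both approaches should give the same prediction
all.equal(unname(pred_formula), unname(pred_manual))
```

The formula version avoids creating the extra temp2 variable and keeps the squared term tied to the model.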