Nathan Byers, Eric Bailey, Kali Frost
April 28, 2014
When you first open RStudio, this is what you see
Type 1 + 1 in the console, then hit “Enter” and R will return the answer
10 + 5
10 - 5
10 * 5
10 / 5
Variables are created with the assignment operator <-
Create the variables x and y by assigning some numbers to them
x <- 10
y <- 5
x + y
[1] 15
(Above, the top panel is what you run in your script, the bottom panel is the output)
In RStudio, you will see the variables we created in the top right panel
x
[1] 10
x <- 20
x
[1] 20
In the top right panel you can see that the number stored in the variable x has changed
R has three main variable types
| Type | Description | Examples |
|---|---|---|
| character | letters and words | "z", "red", "H2O" |
| numeric | numbers | 1, 3.14, log(10) |
| logical | binary | TRUE, FALSE |
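To check which of these types a value has, base R's class() function can be used. The following is a minimal sketch; the variable names are made up for illustration only.

```r
# class() reports the type of the value stored in a variable
pollutant.name <- "Benzene"   # character
reading        <- 1.3         # numeric
is.carcinogen  <- TRUE        # logical

class(pollutant.name)
class(reading)
class(is.carcinogen)
```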
There are several ways to group data to make them easier to work with:
Use c() as a container for vector elements
x <- c(1, 2, 3, 4, 5)
x
[1] 1 2 3 4 5
Use list() as a container for list items
x <- list("Benzene", 1.3, TRUE)
x
[[1]]
[1] "Benzene"
[[2]]
[1] 1.3
[[3]]
[1] TRUE
Use data.frame() as a container for many vectors of the same length
pollutant <- c("Benzene", "Toluene", "Xylenes")
concentration <- c(1.3, 5.5, 6.0)
carcinogen <- c(TRUE, FALSE, FALSE)
my.data <- data.frame(pollutant, concentration, carcinogen)
my.data
pollutant concentration carcinogen
1 Benzene 1.3 TRUE
2 Toluene 5.5 FALSE
3 Xylenes 6.0 FALSE
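Once data are grouped, individual pieces can be pulled back out with square brackets. The sketch below rebuilds the three containers from above (the list is named items here only to avoid reusing x) and shows one common way to index each of them.

```r
# Vector: single brackets pick elements by position
x <- c(1, 2, 3, 4, 5)
x[2]

# List: double brackets return the item itself
items <- list("Benzene", 1.3, TRUE)
items[[1]]

# Data frame: [row, column]; leave one side empty to keep all of it
pollutant <- c("Benzene", "Toluene", "Xylenes")
concentration <- c(1.3, 5.5, 6.0)
carcinogen <- c(TRUE, FALSE, FALSE)
my.data <- data.frame(pollutant, concentration, carcinogen)
my.data[2, ]                 # second row, all columns
my.data[, "concentration"]   # all rows of one column, as a vector
```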
x <- c(4, 8, 1, 14, 34)
mean(x) # Calculate the mean of the data set
[1] 12.2
y <- c(1, 4, 3, 5, 10)
mean(y) # Mean of a different data set
[1] 4.6
To add a comment, put a # in front of your comment
Text after a # will not be evaluated
# Full line comment
x # partial line comment
"new line"
Functions have the form function()
function is the name, which usually gives you a clue about what it does
() is where you put your data or indicate options
To find out what goes inside (), type a question mark in front of the function and run it
?mean()
In RStudio, you will see the help page for mean() in the bottom right corner
Under Usage, you see mean(x, ...)
This means that the main thing that goes inside () is x
Under Arguments you will find a description of what x needs to be
(we need x in the mean function to be a numeric vector)
To make a plot, use the plot() function
The basic usage is plot(x, y)
x is a numeric vector that will be the x-axis coordinates of the plot
y is a numeric vector (of the same length as x) that will be the y-axis coordinates of the plot
benzene <- c(1.3, 4.5, 2.6, 3.4, 6.4)
day <- c(1, 2, 3, 4, 5)
plot(x = day, y = benzene)
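Besides x and y, plot() accepts optional named arguments that control how the plot looks. The sketch below uses type, xlab, ylab and main, which are standard graphical arguments in base R; the label text itself is made up for illustration.

```r
benzene <- c(1.3, 4.5, 2.6, 3.4, 6.4)
day <- c(1, 2, 3, 4, 5)

# type = "b" draws both points and connecting lines
# xlab, ylab and main set the axis labels and the title
plot(x = day, y = benzene,
     type = "b",
     xlab = "Day",
     ylab = "Benzene",
     main = "Benzene readings by day")
```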
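The EnvStats package has a function called serialCorrelationTest()
x <- c(1.3, 3.5, 2.6, 3.4, 6.4)
serialCorrelationTest(x)
This fails until the package is loaded, because R cannot find the function
Load the package with the library() function
library("EnvStats")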
Now we can use the function we want
x <- c(1.3, 3.5, 2.6, 3.4, 6.4)
serialCorrelationTest(x)
Results of Hypothesis Test
--------------------------
Null Hypothesis: rho = 0
Alternative Hypothesis: True rho is not equal to 0
Test Name: Rank von Neumann Test for
Lag-1 Autocorrelation
(Exact Method)
Estimated Parameter(s): rho = -0.01876
Estimation Method: Yule-Walker
Data: x
Sample Size: 5
Test Statistic: RVN = 1.8
P-value: 0.7833
Confidence Interval for: rho
Confidence Interval Method: Normal Approximation
Confidence Interval Type: two-sided
Confidence Level: 95%
Confidence Interval: LCL = -0.8951
UCL = 0.8576
Excel spreadsheets can be read with packages such as xlsx and XLConnect
Accept the defaults in the popup window and click “Import”
Now there is a variable called airquality that is a data frame of the spreadsheet we imported
read.csv() is a function that takes the name of a csv file as its main argument
Assign the output of read.csv() to a variable to be able to work with the data
Put the path to the file, in quotes, inside read.csv()
airquality <- read.csv("C:/My Data/airquality.csv")
airquality
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
5 NA NA 14.3 56 5 5
6 28 NA 14.9 66 5 6
7 23 299 8.6 65 5 7
8 19 99 13.8 59 5 8
9 8 19 20.1 61 5 9
10 NA 194 8.6 69 5 10
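If you do not have a csv file at hand, the round trip can be tried with a temporary file. This is only a sketch: the small data frame and the names example.data, csv.path and imported are made up for illustration, and write.csv() is the base R counterpart of read.csv().

```r
# A small made-up data set, for illustration only
example.data <- data.frame(
  pollutant = c("Benzene", "Toluene", "Xylenes"),
  concentration = c(1.3, 5.5, 6.0)
)

# Write it to a temporary csv file (no row-name column)
csv.path <- tempfile(fileext = ".csv")
write.csv(example.data, csv.path, row.names = FALSE)

# Read the file back in and assign the result to a variable
imported <- read.csv(csv.path)
imported
```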
Load a built-in data set with the data() function
data(airquality)
airquality
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
5 NA NA 14.3 56 5 5
6 28 NA 14.9 66 5 6
airquality is a data frame with ozone readings from a monitor in New York
colnames(airquality)
[1] "Ozone" "Solar.R" "Wind" "Temp" "Month" "Day"
Use the nrow() function to get the number of rows
nrow(airquality)
[1] 153
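A few other base R functions give a quick overview of a data frame without printing every row. This is a short sketch using the built-in airquality data.

```r
data(airquality)

head(airquality, n = 3)   # first three rows
dim(airquality)           # number of rows, then number of columns
str(airquality)           # each column's name, type and first values
summary(airquality)       # per-column summary statistics
```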
RStudio has a special function called View() that makes it easier to look at data in a data frame
View(airquality)
Access a single column of a data frame with the $ operator
mean(airquality$Temp) # Calculate the mean temperature
[1] 77.88
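The Temp column has no missing values, but the Ozone column contains NA entries, and mean() returns NA when any value is missing. The sketch below shows how the na.rm argument of mean() can be used to drop missing values first.

```r
data(airquality)

mean(airquality$Ozone)                 # NA, because some readings are missing
sum(is.na(airquality$Ozone))           # count the missing readings
mean(airquality$Ozone, na.rm = TRUE)   # mean of the non-missing readings
```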
Take a look at the data using plot(x, y)
plot(airquality$Temp, airquality$Ozone)
Use lm(y ~ x, data) to fit a linear regression to a data set and summary() to see the results
fit <- lm(Ozone ~ Temp, airquality)
summary(fit)
Add the regression line to the plot with abline()
abline(fit)
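The fitted model object can also be queried directly instead of reading numbers off the summary() printout. This is a minimal sketch using coef() and the r.squared element of the summary; it refits the same model so that it runs on its own.

```r
data(airquality)
fit <- lm(Ozone ~ Temp, airquality)

coef(fit)                # intercept and slope of the fitted line
coef(fit)["Temp"]        # the slope: change in ozone per degree
summary(fit)$r.squared   # share of the variance explained by the model
```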
With a little extra code we can get a better fit with the data by using a quadratic model
temp <- airquality$Temp
temp2 <- temp^2 # ^ = exponentiation
ozone <- airquality$Ozone
fit2 <- lm(ozone ~ temp + temp2)
temp.curve <- seq(min(temp), max(temp), length.out = 50)
lines(temp.curve, predict(fit2, list(temp=temp.curve, temp2 = temp.curve^2)))
Based on this dataset and our quadratic model, what would we expect the ozone concentration to be when the temperature is 75 degrees?
prediction <- predict(fit2, list(temp = 75, temp2 = 75^2))
prediction
1
28.32
points(x = 75, y = prediction, col="red", pch=19)
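The same quadratic model can be written without creating temp2 by hand, using I() inside the formula. This is an alternative sketch, not the approach used above; the names fit.quad, p.original and p.alt are made up for illustration. It refits both versions and checks that they give the same prediction at 75 degrees.

```r
data(airquality)

# Approach from above: separate vectors for temp and temp^2
temp <- airquality$Temp
temp2 <- temp^2
ozone <- airquality$Ozone
fit2 <- lm(ozone ~ temp + temp2)
p.original <- predict(fit2, list(temp = 75, temp2 = 75^2))

# Alternative: I() tells lm() to treat Temp^2 as arithmetic
fit.quad <- lm(Ozone ~ Temp + I(Temp^2), data = airquality)
p.alt <- predict(fit.quad, data.frame(Temp = 75))

# Both formulations should agree
all.equal(unname(p.original), unname(p.alt))
```

With this form, predict() only needs the temperature itself, because the squared term is computed from the formula.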