Jeho Park
October 22, 2014
HMC Scientific Computing Workshop Series, Fall 2014
(Slides are made with the R Presentations tool in RStudio) (Some materials are adapted from the R-Bootcamp by Jared Knowles: http://jaredknowles.com/r-bootcamp/)
Google is our friend!
1+1
2+runif(1,0,1)
2+runif(1,min=0,max=1)
3^2
3*3
sqrt(3*3) # comments
# comments are preceded by hash sign
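The two runif calls above differ only in how the arguments are passed. As a minimal sketch (the names u_pos and u_named are illustrative, and set.seed is used here only so that both calls draw the same number), positional and named arguments give the same result:

```r
# Fix the random seed so both calls draw the same number
set.seed(1)
u_pos <- runif(1, 0, 1)                # arguments matched by position: n, min, max
set.seed(1)
u_named <- runif(1, max = 1, min = 0)  # arguments matched by name, in any order
identical(u_pos, u_named)              # TRUE
2 + u_named                            # a value between 2 and 3
```

Named arguments are longer to type but make the call readable without looking up the argument order.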
Numerical Integral of
\( \displaystyle\int_0^{\infty} \frac{1}{(x+1)\sqrt{x}}dx \)
integrand <- function(x) {1/((x+1)*sqrt(x))} ## define the integrand (the function to be integrated)
integrate(integrand, lower=0, upper=Inf) ## integrate the function from 0 to infinity
3.142 with absolute error < 2.7e-05
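The numerical result can be checked against the closed form: substituting \( x = u^2 \) turns the integral into \( \int_0^{\infty} \frac{2}{u^2+1}du = \pi \). A short sketch of that check, assuming the same integrand as above (the name res is illustrative):

```r
integrand <- function(x) 1 / ((x + 1) * sqrt(x))
res <- integrate(integrand, lower = 0, upper = Inf)
res$value                                   # numerical estimate of the integral
res$abs.error                               # estimated absolute error
all.equal(res$value, pi, tolerance = 1e-4)  # TRUE: the estimate agrees with pi
```

integrate returns a list, so the estimate and its error bound can be reused in later calculations rather than only printed.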
demo() # display available demos
demo(graphics) # try graphics demo
library() # show available packages on the computer
search() # show loaded packages
?hist # search for the usage of hist function
??histogram # search for package documents containing the word "histogram"
The R workspace stores objects such as vectors, datasets, and functions in memory, so the space available for calculation is limited by the size of the RAM.
a <- 5 # notice a in your Environment window
A <- "text"
a
A
ls()
print(c(a,A))
print(a,A) # compare with the line above: here A is taken as the 'digits' argument, not as a value to print
VECTOR (homogeneous)
A vector is a one-dimensional array whose elements all have the same data type.
class(a)
class(A)
B <- c(a,A) # concatenation
print(B)
class(B) # why?
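One way to answer the "why?" above: because a vector must be homogeneous, c() converts all of its arguments to the most general type among them. A small sketch of that coercion order (the values are illustrative):

```r
class(c(TRUE, 2L))      # "integer": logical is promoted to integer
class(c(2L, 3.5))       # "numeric": integer is promoted to double
class(c(3.5, "text"))   # "character": numbers are converted to strings
B <- c(5, "text")
B                       # "5" "text"
as.numeric(B[1]) + 1    # 6: convert back explicitly before doing arithmetic
```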
LIST (heterogeneous)
A list is an object whose elements can have different types, for example a character vector, a logical value, and a number in the same list.
aList <- list(name=c("Joseph"), married=TRUE, kids=2)
aList
aList$kids <- aList$kids+1
aList$kids
aList <- list(numeric_data=a,character_data=A)
aList
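List elements can be reached in several ways. The sketch below uses a hypothetical list named person (not part of the workshop code) to contrast $, double brackets, and single brackets:

```r
person <- list(name = "Joseph", married = TRUE, kids = 2)  # hypothetical example list
person$kids               # 2
person[["kids"]]          # 2, same element as person$kids
class(person[["kids"]])   # "numeric": double brackets return the element itself
class(person["kids"])     # "list": single brackets return a shorter list
names(person)             # "name" "married" "kids"
str(person)               # compact overview of every element and its type
```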
Data Frame
A data frame is used for storing data tables. It is a list of vectors of equal length.
n <- c(2, 3, 5) # a vector
s <- c("aa", "bb", "cc") # a vector
b <- c(TRUE, FALSE, TRUE) # a vector
df <- data.frame(n, s, b) # a data frame
df
mtcars # a built-in (attached) data frame
mtcars$mpg
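Because a data frame is a list of equal-length vectors, it supports both list-style and table-style access. A minimal sketch, rebuilding the small data frame from above (stringsAsFactors = FALSE is added here as an assumption so that s stays a character column in every R version):

```r
df <- data.frame(n = c(2, 3, 5),
                 s = c("aa", "bb", "cc"),
                 b = c(TRUE, FALSE, TRUE),
                 stringsAsFactors = FALSE)
is.list(df)             # TRUE: a data frame is a list of columns
sapply(df, length)      # every column has length 3
df[2, ]                 # second row, all columns
df[df$b, c("n", "s")]   # rows where b is TRUE, columns n and s only
df$n * 2                # 4 6 10: a column behaves like an ordinary vector
```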
Data Frame (cont.)
myFrame <- data.frame(y1=rnorm(100),y2=rnorm(100), y3=rnorm(100))
head(myFrame) # display first few lines of data
names(myFrame) # display column names
summary(myFrame) # output depends on the data types
plot(myFrame)
myFrame2 <- read.table(file="http://scicomp.hmc.edu/data/R/Rtest.txt", header=TRUE, sep=",")
myFrame2
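To see what header and sep do without relying on a network connection, read.table can also parse text supplied inline through its text argument. The contents below are hypothetical and are not the workshop's Rtest.txt:

```r
# Hypothetical comma-separated data with a header line
csv_text <- "id,group,score
1,a,3.5
2,b,4.1
3,a,2.8"
demo_frame <- read.table(text = csv_text, header = TRUE, sep = ",")
names(demo_frame)   # "id" "group" "score": taken from the first line
nrow(demo_frame)    # 3: the header line is not counted as data
str(demo_frame)     # column types inferred by read.table
```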
FACTOR
v <- c("a","b","c","c","b")
x <- factor(v) # turn the character vector into a factor object
z <- factor(v, ordered = TRUE) # ordered factor
x
z
table(x)
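A factor stores its values as integer codes that point into a set of levels, and an ordered factor additionally allows comparisons. A short sketch (the dose factor is a hypothetical example with an explicitly chosen level order):

```r
v <- c("a", "b", "c", "c", "b")
x <- factor(v)
levels(x)        # "a" "b" "c": the distinct values, sorted
as.integer(x)    # 1 2 3 3 2: the underlying integer codes
z <- factor(v, ordered = TRUE)
z[1] < z[3]      # TRUE: "a" comes before "c" in the level order

# Hypothetical example: set the level order explicitly instead of alphabetically
dose <- factor(c("low", "high", "medium"),
               levels = c("low", "medium", "high"), ordered = TRUE)
dose[2] > dose[3]   # TRUE: "high" ranks above "medium"
```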
Single Sample Tests
Our questions for the sample might be:
Single Sample Tests (cont.)
For our inferences to be valid, we need to establish some facts about the distribution of the data:
Depending on these facts, standard parametric tests such as Student's t-test might not be applicable, and we may have to look for non-parametric techniques.
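As a hedged illustration of the parametric versus non-parametric choice, the sketch below runs Student's t-test and the Wilcoxon signed-rank test on the same simulated, right-skewed sample. The data, the variable names, and the hypothesized location mu = 1 are assumptions made for this example, not part of the workshop data:

```r
set.seed(1)
sample_y <- exp(rnorm(30))               # simulated, right-skewed sample
t_res <- t.test(sample_y, mu = 1)        # parametric: assumes approximate normality
w_res <- wilcox.test(sample_y, mu = 1)   # non-parametric: based on ranks
t_res$p.value
w_res$p.value
```

The two tests do not ask exactly the same question (the t-test concerns the mean, the signed-rank test a location shift), so their p-values can differ noticeably on skewed data.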
Single Sample Tests (cont.)
data <- read.table(file="http://scicomp.hmc.edu/data/R/das.txt", header=TRUE)
names(data)
attach(data)
par(mfrow=c(2,2))
plot(y)
boxplot(y)
hist(y,main="")
y2 <- y
y2[52] <- 21.75
plot(y2)
dev.off() # reset the graphic device
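The assignment y2[52] <- 21.75 above appears to plant a single extreme value so that its effect on the plot can be seen. A sketch of how such a value could be located numerically, using simulated data as a stand-in for y (y_sim and its mean and sd are assumptions, not the das.txt values):

```r
set.seed(1)
y_sim <- rnorm(100, mean = 2.4, sd = 0.2)  # hypothetical stand-in for y
y_sim[52] <- 21.75                         # plant one extreme value
which.max(y_sim)                           # 52: position of the largest value
boxplot.stats(y_sim)$out                   # values flagged as outliers; includes 21.75
c(mean = mean(y_sim), median = median(y_sim))  # the mean is pulled upward, the median barely moves
```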
Single Sample Tests (cont.)
summary(y)
# Graphical test for normality
qqnorm(y)
qqline(y,lty=2)
# Test for normality
x <- exp(rnorm(30)) # lognormally distributed
shapiro.test(x) # look at the p value
The null hypothesis of this test is that the population is normally distributed. Thus, if the p-value is less than the chosen alpha level (e.g., p < 0.05), the null hypothesis is rejected and there is evidence that the data are not from a normally distributed population.
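The decision rule described above can be written out in code. This is a minimal sketch with simulated data; alpha, x_lnorm, and sw are illustrative names, and the resulting p-value depends on the random sample:

```r
set.seed(1)
alpha <- 0.05                  # chosen significance level
x_lnorm <- exp(rnorm(30))      # lognormally distributed sample
sw <- shapiro.test(x_lnorm)
sw$statistic                   # the W statistic
sw$p.value                     # compare this with alpha
if (sw$p.value < alpha) {
  message("Reject the null hypothesis: evidence against normality")
} else {
  message("No evidence against normality at this alpha level")
}
```

Failing to reject does not prove normality; with small samples the test has limited power.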
Please submit your R environment file at http://bit.ly/hmc-r-workshop-homework (a digital badge requirement).
The file has an .RData extension. To attach it to the form, change the extension to .txt.
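One possible way to produce such a file from the R console is sketched below. The file names and the use of a temporary directory are assumptions for illustration; RStudio's own save dialog works as well:

```r
out_dir <- tempdir()                                # hypothetical output location
rdata_file <- file.path(out_dir, "workshop.RData")  # hypothetical file name
txt_file <- file.path(out_dir, "workshop.txt")      # same content, .txt extension
a <- 5                                              # an example object in the workspace
save.image(file = rdata_file)                       # save all workspace objects
file.copy(rdata_file, txt_file, overwrite = TRUE)   # copy under the .txt name
file.exists(txt_file)                               # TRUE if the copy succeeded
```

The copy is still an ordinary R data file, so load() can read it back despite the .txt extension.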
Useful links:
- R tutorial site: http://jaredknowles.com/r-bootcamp/
- R search site: http://rseek.org
- R cheat sheets: http://devcheatsheet.com/tag/r/
- R Markdown cheat sheet: http://shiny.rstudio.com/articles/rm-cheatsheet.html