For a brief overview of this document, follow this link: https://drive.google.com/open?id=0B7IWU1Fx-u0pQjMxckNnUGI3V0U
R has a number of built-in data sets that you may find useful in creating examples and practice problems. To access the list of built-in data sets, type:
data()
To access a specific data set from this list, type data() with the name of the data set inside the parentheses. For example:
data(ChickWeight)
Let’s do some examples with the ChickWeight data set. First, open RStudio. Then type:
data(ChickWeight)
attach(ChickWeight)
head(ChickWeight)
##   weight Time Chick Diet
## 1     42    0     1    1
## 2     51    2     1    1
## 3     59    4     1    1
## 4     64    6     1    1
## 5     76    8     1    1
## 6     93   10     1    1
This loads and attaches the data set and lists its first six rows. Looking at the first rows is a good idea so you can see the types of data in the set.
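To see the variable types more directly than head() allows, str() prints one line per column with its type. A minimal sketch using only base R functions:

```r
# Load the built-in data set (it ships with R's datasets package)
data(ChickWeight)

# Compact overview: one line per column with its type and first few values
str(ChickWeight)

# Number of rows and columns
dim(ChickWeight)

# Class of each column (Chick and Diet are factors, weight and Time are numeric)
sapply(ChickWeight, class)
```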
Now try the following commands:
mean(weight)
## [1] 121.8183
This calculates the arithmetic mean of the values of the variable “weight”.
median(weight)
## [1] 103
This returns the median, the middle value of the sorted weights.
range(weight)
## [1] 35 373
This returns the lowest and highest values of the variable, in that order.
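The two values returned by range() can also be obtained separately. The sketch below refers to the column as ChickWeight$weight so that it runs without attach(); the names w and r are only illustrative.

```r
data(ChickWeight)
w <- ChickWeight$weight   # refer to the column explicitly, no attach() needed

r <- range(w)             # vector of length 2: minimum, then maximum
min(w)                    # same value as r[1]
max(w)                    # same value as r[2]
diff(r)                   # spread of the data: maximum minus minimum
```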
quantile(weight)
##     0%    25%    50%    75%   100%
##  35.00  63.00 103.00 163.75 373.00
This gives the lowest value (0%), the 25th, 50th, and 75th percentiles, and the highest value (100%).
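quantile() is not limited to quartiles. Its probs argument takes any proportions between 0 and 1. A short sketch, where the chosen percentiles and the name pct are only illustrative:

```r
data(ChickWeight)
w <- ChickWeight$weight

# Ask for the 10th, 50th, and 90th percentiles instead of the default quartiles
pct <- quantile(w, probs = c(0.10, 0.50, 0.90))
pct

# Interquartile range: 75th percentile minus 25th percentile
IQR(w)
```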
sd(weight)
## [1] 71.07196
var(weight)
## [1] 5051.223
These return the sample standard deviation and the sample variance of the weights.
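The two results are related: the standard deviation is the square root of the variance, and both use n - 1 in the denominator. A small check, with var_by_hand as an illustrative name:

```r
data(ChickWeight)
w <- ChickWeight$weight
n <- length(w)

# Sample variance by hand: sum of squared deviations divided by n - 1
var_by_hand <- sum((w - mean(w))^2) / (n - 1)

all.equal(var_by_hand, var(w))   # TRUE
all.equal(sqrt(var(w)), sd(w))   # TRUE
```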
summary(weight)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    35.0    63.0   103.0   121.8   163.8   373.0
This returns many of the above values, but not all of them: the standard deviation and variance are missing. You may also want to try some plotting with the built-in data sets:
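If you want the missing values alongside the summary, one option is to append them yourself. This is only a sketch; the name stats and the labels SD and Var are arbitrary choices.

```r
data(ChickWeight)
w <- ChickWeight$weight

# Combine the six summary values with the standard deviation and variance
stats <- c(summary(w), SD = sd(w), Var = var(w))
stats

# summary() also accepts a whole data frame and summarizes each column
summary(ChickWeight)
```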
boxplot(weight ~ Diet)
This creates a boxplot of weight for each level of Diet.
plot(weight ~ Diet)
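This produces the same boxplot with the axes labeled automatically. Because Diet is a factor, plot() draws boxplots when given a formula of this form. Depending on your version of R, boxplot() may label the axes as well.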
plot(weight, Diet)
This produces a rather useless scatterplot of the data, as there are only four levels of Diet.
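A scatterplot is more informative when both variables are numeric. As a sketch, weight can be plotted against Time, the other numeric column of the data set. The data = argument is used here so the example does not depend on attach().

```r
data(ChickWeight)

# weight against Time: both variables are numeric
plot(weight ~ Time, data = ChickWeight)

# Same plot, with each point colored by its diet (factor levels 1 to 4)
plot(weight ~ Time, data = ChickWeight,
     col = as.integer(ChickWeight$Diet))
```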
hist(weight)
Produces a histogram of the distribution of weights.
All of these plots can be fine-tuned using the information in Mike Marin’s Series 2 R videos.
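As a starting point, the sketch below sets a few standard graphics arguments (main, xlab, ylab, col, breaks). The titles, color, and number of breaks are arbitrary choices for illustration. The unit "g" assumes the weights are recorded in grams, as described on the data set's help page (?ChickWeight).

```r
data(ChickWeight)

# Histogram with a title, an axis label, a fill color, and roughly 20 bins
h <- hist(ChickWeight$weight,
          main = "Distribution of chick weights",
          xlab = "Weight (g)",
          col = "lightblue",
          breaks = 20)

# Boxplot with a title and explicit axis labels
boxplot(weight ~ Diet, data = ChickWeight,
        main = "Chick weight by diet",
        xlab = "Diet", ylab = "Weight (g)")
```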