Text can be decorated with bold or italics. It is also possible to create bulleted lists and section headers and to include mathematical formulas.
Be sure to put a space after the * when you are creating bullets and a space after # when creating section headers, but not between $ and the mathematical formulas.
We will use RStudio Server for now. Later, as you become more comfortable, you may want to use RStudio, an integrated development environment (IDE) for R, together with R installed on your own PC or laptop. To do so, you need to install R first and then RStudio Desktop.
If the code of an R chunk produces a plot, this plot can be displayed in the resulting file.
library(mosaic)  # provides xyplot(), favstats(), and the Births78 data used below
xyplot(births ~ date, data = Births78)
Other forms of R output are also displayed as they are produced.
favstats(~ births, data=Births78)
##   min   Q1 median   Q3   max     mean       sd   n missing
##  7135 8554   9218 9705 10711 9132.162 817.8821 365       0
This file can be knit to HTML, PDF, or Word. In RStudio, click Knit HTML, Knit PDF, or Knit Word; use the dropdown menu next to that button to change the output file type.
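a <- c(1.8, 4.5)                        # numeric (double)
b <- c(1 + 2i, 3 - 6i)                  # complex
d <- c(23L, 44L)                        # integer: the L suffix is needed, c(23, 44) would be numeric
e <- vector("logical", length = 5)      # logical vector of five FALSE values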
a
## [1] 1.8 4.5
b
## [1] 1+2i 3-6i
d
## [1] 23 44
e
## [1] FALSE FALSE FALSE FALSE FALSE
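A quick way to confirm what type a vector really holds is to ask R directly. This minimal sketch (the names x_num and x_int are illustrative only) shows that a plain number such as 23 is stored as a double, while the L suffix makes it an integer.

```r
x_num <- c(23, 44)     # plain numbers are stored as doubles
x_int <- c(23L, 44L)   # the L suffix requests integers
class(x_num)           # "numeric"
typeof(x_num)          # "double"
class(x_int)           # "integer"
is.integer(x_num)      # FALSE
```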
class(qt) # to check the class of an R object
## [1] "function"
my_list <- list(22, "ab", TRUE, 1 + 2i) # a list can hold elements of different types
my_list
## [[1]]
## [1] 22
##
## [[2]]
## [1] "ab"
##
## [[3]]
## [1] TRUE
##
## [[4]]
## [1] 1+2i
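Elements of a list can be pulled out in two ways that are easy to confuse. In this short sketch (first_elem and first_sub are illustrative names), double brackets return the element itself, while single brackets return a smaller list that still wraps the element.

```r
my_list <- list(22, "ab", TRUE, 1 + 2i)
first_elem <- my_list[[1]]   # [[ ]] extracts the element itself: the number 22
first_sub  <- my_list[1]     # [ ] returns a list of length 1 containing 22
class(first_elem)            # "numeric"
class(first_sub)             # "list"
length(my_list)              # 4 elements in total
```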
my_matrix <- matrix(1:6, nrow = 3, ncol = 2) # 3 x 2 matrix, filled column by column
my_matrix
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
dim(my_matrix) #checking dimensions of the matrix
## [1] 3 2
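Matrix elements are selected with a row index and a column index separated by a comma; leaving one index empty selects a whole row or column. The sketch below also uses byrow = TRUE (the name by_row is illustrative) to show that matrix() fills column by column unless told otherwise.

```r
my_matrix <- matrix(1:6, nrow = 3, ncol = 2)
my_matrix[2, 1]   # row 2, column 1: 2
my_matrix[2, ]    # the whole second row: 2 5
my_matrix[, 2]    # the whole second column: 4 5 6
by_row <- matrix(1:6, nrow = 3, ncol = 2, byrow = TRUE)
by_row            # filled row by row: 1 2 / 3 4 / 5 6
```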
b1 <- 0:5
b1
## [1] 0 1 2 3 4 5
class(b1)
## [1] "integer"
b2<-as.factor(b1)
b2
## [1] 0 1 2 3 4 5
## Levels: 0 1 2 3 4 5
class(b2)
## [1] "factor"
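A factor stores its levels as character labels plus internal integer codes, which leads to a common pitfall when converting back to numbers. This sketch reuses b1 and b2 from above: as.numeric() on a factor returns the codes, and going through as.character() first recovers the original values.

```r
b1 <- 0:5
b2 <- as.factor(b1)
levels(b2)                     # "0" "1" "2" "3" "4" "5" (levels are character)
as.numeric(b2)                 # 1 2 3 4 5 6: the internal codes, not the values
as.numeric(as.character(b2))   # 0 1 2 3 4 5: the original values
nlevels(b2)                    # 6
```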
a1<-3+2 # Simple arithmetic, a1 is the object name which will store the result
a1
## [1] 5
a2<-c(1,2,3,4) # As above, the function c() combines (concatenates) values into a vector
a2 # the object a2 is called a vector
## [1] 1 2 3 4
a3<-sum(a2) # adds up all elements of a2
a3
## [1] 10
a4<-20:30 # creating a sequence incrementing by 1
a4
## [1] 20 21 22 23 24 25 26 27 28 29 30
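Arithmetic in R works on whole vectors at once, and seq() and rep() generalize the colon operator. A minimal sketch using a2 from above:

```r
a2 <- c(1, 2, 3, 4)
a2 * 2                    # applied element by element: 2 4 6 8
a2 + c(10, 20, 30, 40)    # element-wise addition: 11 22 33 44
seq(20, 30, by = 2)       # like 20:30 but in steps of 2: 20 22 24 26 28 30
rep(c(1, 2), times = 3)   # repeats a vector: 1 2 1 2 1 2
mean(a2)                  # 2.5
```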
# To get help on a command, type ? followed by the command's name:
?hist
head(Births78) # to see the first few rows
##         date births dayofyear  wday
## 1 1978-01-01   7701         1   Sun
## 2 1978-01-02   7527         2   Mon
## 3 1978-01-03   8825         3  Tues
## 4 1978-01-04   8859         4   Wed
## 5 1978-01-05   9043         5 Thurs
## 6 1978-01-06   9208         6   Fri
# To check the dimensions (number of rows and columns) of the data frame, type
dim(Births78)
## [1] 365 4
str(Births78) # compactly displays the internal structure of an R object, here a data frame
## 'data.frame': 365 obs. of 4 variables:
## $ date : POSIXct, format: "1978-01-01" "1978-01-02" ...
## $ births : int 7701 7527 8825 8859 9043 9208 8084 7611 9172 9089 ...
## $ dayofyear: int 1 2 3 4 5 6 7 8 9 10 ...
## $ wday : Ord.factor w/ 7 levels "Sun"<"Mon"<"Tues"<..: 1 2 3 4 5 6 7 1 2 3 ...
summary(Births78)
## date births dayofyear wday
## Min. :1978-01-01 Min. : 7135 Min. : 1 Sun :53
## 1st Qu.:1978-04-02 1st Qu.: 8554 1st Qu.: 92 Mon :52
## Median :1978-07-02 Median : 9218 Median :183 Tues :52
## Mean :1978-07-02 Mean : 9132 Mean :183 Wed :52
## 3rd Qu.:1978-10-01 3rd Qu.: 9705 3rd Qu.:274 Thurs:52
## Max. :1978-12-31 Max. :10711 Max. :365 Fri :52
## Sat :52
If you strike the
The columns are the variables. Variables come in two main types: numeric, for example the number of births, and factor (also called categorical), for example the day of the week (wday). The rows are called observations or cases.
The $ is one way to access the variables of a data frame. Another option, which we will learn later, is to create a new vector object by subsetting.
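For comparison, here is a minimal sketch of three equivalent ways to pull one column out of a data frame, plus a first taste of row subsetting. The data frame toy and its values are made up for illustration.

```r
# a small, made-up data frame for illustration
toy <- data.frame(day = c("Mon", "Tue", "Wed"),
                  births = c(9300, 9700, 9400))
toy$births                 # access with $
toy[["births"]]            # the same vector, column name given as a string
toy[, "births"]            # the same vector, matrix-style [row, column] indexing
toy[toy$births > 9500, ]   # keeps only the rows where births exceeds 9500 (Tue)
```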
R is case-sensitive! For example, a1 and A1 are different objects.
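barplot(table(Births78$wday)) # number of days in 1978 falling on each day of the week
hist(Births78$births) # distribution of daily births
boxplot(Births78$births) # daily births as a single boxplot
boxplot(births ~ wday, data = Births78) # births grouped by day of the week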
The boxplot command also accepts a formula. Because the data frame is passed through the data argument, we don’t need $ to access the variables.
In general, the variables in a data frame are not directly accessible by name. Try typing births and you will see an error:
Error: object ‘births’ not found
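One base-R way to refer to columns by name without $ is with(), which evaluates an expression using the columns of a data frame. The mini data frame toy below is illustrative; its values are the first three births counts from head(Births78) above.

```r
toy <- data.frame(births = c(7701, 7527, 8825))  # illustrative mini data frame
with(toy, mean(births))   # mean(births) is evaluated using the columns of toy
```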
If you use a variable frequently, you may want to extract it and store it as a vector:
b1<-Births78$births
mean(b1)
## [1] 9132.162
median(b1)
## [1] 9218
range(b1)
## [1] 7135 10711
var(b1)
## [1] 668931.1
quantile(b1)
## 0% 25% 50% 75% 100%
## 7135 8554 9218 9705 10711
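The variance above is the square of the standard deviation reported by favstats (817.8821² ≈ 668931.1). To see what var() computes, this sketch uses a small made-up vector x and checks that var() divides the sum of squared deviations by n − 1.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)       # made-up example data
n <- length(x)
mean(x)                              # 5
var(x)                               # 32 / 7, about 4.571
sum((x - mean(x))^2) / (n - 1)       # the same value computed by hand
sd(x)                                # square root of var(x)
```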
# tapply() computes a numeric summary of a vector separately for each level of a factor.
# For instance, to find the median births for each day of the week (wday):
bb<-tapply(b1, Births78$wday, median)
bb
##    Sun    Mon   Tues    Wed  Thurs    Fri    Sat
## 7936.0 9321.0 9667.5 9361.5 9397.0 9544.5 8260.5
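To see exactly what tapply() does, here is a sketch on a tiny made-up data set (counts and day are illustrative names): the values are split by the levels of the factor and the function is applied to each group.

```r
counts <- c(10, 12, 7, 9, 11, 8)
day <- factor(c("Mon", "Tue", "Mon", "Tue", "Mon", "Tue"))
med <- tapply(counts, day, median)   # one median per level of day
med                                  # Mon: median(10, 7, 11) = 10; Tue: median(12, 9, 8) = 9
tapply(counts, day, mean)            # Mon: 28/3; Tue: 29/3
```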
sd # typing a function's name without () prints its code; here it is mosaic's version of sd()
## function (x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm",
## FALSE))
## {
## if (lazyeval::is_formula(x)) {
## if (is.null(data))
## data <- lazyeval::f_env(x)
## formula <- mosaic_formula_q(x, groups = groups, max.slots = 3)
## return(maggregate(formula, data = data, FUN = stats::sd,
## ..., na.rm = na.rm, .multiple = FALSE))
## }
## stats::sd(x, ..., na.rm = na.rm)
## }
## <environment: namespace:mosaic>
getwd() # shows the current working directory (WD); it should be your project folder
## [1] "/Users/TASNEEM/Downloads"
# setwd() # sets the working directory; supply the folder path as a character string
R has powerful indexing features for accessing object elements. These features can be used to select and exclude variables and observations.
z <- c(8, 3, 0, 9, 9, 2, 1, 3)
z1<-z[4]
z1 #z1 is the fourth element of z
## [1] 9
z2<-z[c(1, 4, 5)]
z2 # the first, fourth and fifth elements
## [1] 8 9 9
z3<-z[-c(1, 3, 4)]
z3 #All elements except the first, third and fourth
## [1] 3 9 2 1 3
z4<-z[8:1]
z4 #The elements of z in reverse
## [1] 3 1 2 9 9 0 3 8
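Indexing also works with a logical condition instead of positions, which is the basis for selecting observations by value later on. A minimal sketch using the same z:

```r
z <- c(8, 3, 0, 9, 9, 2, 1, 3)
z > 3             # a logical vector: TRUE where the condition holds
z[z > 3]          # elements greater than 3: 8 9 9
which(z > 3)      # their positions: 1 4 5
z[z == 9] <- 10   # indexing also works on the left of an assignment
z                 # 8 3 0 10 10 2 1 3
```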
head(HELPrct)
## age anysubstatus anysub cesd d1 daysanysub dayslink drugrisk e2b female
## 1 37 1 yes 49 3 177 225 0 NA 0
## 2 37 1 yes 30 22 2 NA 0 NA 0
## 3 26 1 yes 39 0 3 365 20 NA 0
## 4 39 1 yes 15 2 189 343 0 1 1
## 5 32 1 yes 39 12 2 57 0 1 0
## 6 47 1 yes 6 1 31 365 0 NA 1
## sex g1b homeless i1 i2 id indtot linkstatus link mcs pcs
## 1 male yes housed 13 26 1 39 1 yes 25.111990 58.41369
## 2 male yes homeless 56 62 2 43 NA <NA> 26.670307 36.03694
## 3 male no housed 0 0 3 41 0 no 6.762923 74.80633
## 4 female no housed 5 5 4 28 0 no 43.967880 61.93168
## 5 male no homeless 10 13 5 38 1 yes 21.675755 37.34558
## 6 female no housed 4 4 6 29 0 no 55.508991 46.47521
## pss_fr racegrp satreat sexrisk substance treat
## 1 0 black no 4 cocaine yes
## 2 1 white no 7 alcohol yes
## 3 13 black no 2 heroin no
## 4 11 white yes 4 heroin no
## 5 10 black no 6 cocaine no
## 6 5 black no 5 cocaine yes
myvars<-c("age","racegrp","sex","substance","treat")
newdata<-HELPrct[myvars]
head(newdata)
##   age racegrp    sex substance treat
## 1  37   black   male   cocaine   yes
## 2  37   white   male   alcohol   yes
## 3  26   black   male    heroin    no
## 4  39   white female    heroin    no
## 5  32   black   male   cocaine    no
## 6  47   black female   cocaine   yes
newdata1 <- newdata[c(2, 6:8), ] # rows 2, 6, 7 and 8; all columns
head(newdata1)
##   age racegrp    sex substance treat
## 2  37   white   male   alcohol   yes
## 6  47   black female   cocaine   yes
## 7  49   black female   cocaine    no
## 8  28   white   male   alcohol   yes
newdata3 <- newdata[c(-3, -4), ] # all rows except rows 3 and 4
head(newdata3)
##   age racegrp    sex substance treat
## 1  37   black   male   cocaine   yes
## 2  37   white   male   alcohol   yes
## 5  32   black   male   cocaine    no
## 6  47   black female   cocaine   yes
## 7  49   black female   cocaine    no
## 8  28   white   male   alcohol   yes
newdata4 <- newdata[1:5, ] # the first five rows
head(newdata4)
##   age racegrp    sex substance treat
## 1  37   black   male   cocaine   yes
## 2  37   white   male   alcohol   yes
## 3  26   black   male    heroin    no
## 4  39   white female    heroin    no
## 5  32   black   male   cocaine    no
# select rows based on the value of a variable
newdata5<-newdata[which(newdata$sex=="female"),]
head(newdata5)
##    age racegrp    sex substance treat
## 4   39   white female    heroin    no
## 6   47   black female   cocaine   yes
## 7   49   black female   cocaine    no
## 9   50   white female   alcohol    no
## 11  34   white female    heroin   yes
## 12  58   black female   alcohol    no
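# or, after attach(), refer to the columns by name without $ (use attach() sparingly: attached columns can mask other objects with the same name)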
attach(newdata)
newdata6<-newdata[which(sex=="female" & age>45),]
head(newdata6)
##     age racegrp    sex substance treat
## 6    47   black female   cocaine   yes
## 7    49   black female   cocaine    no
## 9    50   white female   alcohol    no
## 12   58   black female   alcohol    no
## 25   48   black female   cocaine    no
## 122  50   white female   alcohol    no
detach(newdata)
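# note: age >= 45 | age < 55 is TRUE for every non-missing age, so all rows are kept (as the output shows);
# to keep only ages 45 to 54, use age >= 45 & age < 55 instead
newdata7 <- subset(newdata, age >= 45 | age < 55, select = c(age, sex, substance))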
head(newdata7)
##   age    sex substance
## 1  37   male   cocaine
## 2  37   male   alcohol
## 3  26   male    heroin
## 4  39 female    heroin
## 5  32   male   cocaine
## 6  47 female   cocaine
newdata8<-subset(newdata,sex=="female" & age>50,select=c(age,sex,substance))
head(newdata8)
##     age    sex substance
## 12   58 female   alcohol
## 156  57 female   alcohol
## 225  55 female    heroin
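The difference between | (or) and & (and) is what decides which rows subset() keeps. This sketch uses a few illustrative ages and a made-up data frame named toy to show why the | condition above keeps every row, while & restricts to an age range.

```r
age <- c(26, 37, 47, 49, 58)
age >= 45 | age < 55          # TRUE for every value: any age is >= 45 or < 55
age >= 45 & age < 55          # TRUE only for 47 and 49
age[age >= 45 & age < 55]     # 47 49
toy <- data.frame(age = age,
                  sex = c("male", "male", "female", "female", "female"))
females_under_55 <- subset(toy, sex == "female" & age < 55)
females_under_55              # the rows with ages 47 and 49
```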
# take a random sample of size 50 from a dataset
# sample without replacement
mysample <- HELPrct[sample(1:nrow(HELPrct), 50,
replace=FALSE),]
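Because sample() is random, the 50 rows drawn will differ every time the chunk runs. Calling set.seed() first makes the draw reproducible, which matters when a knitted report should show the same sample each time; the seed value 2024 below is arbitrary. The same idea applies before HELPrct[sample(1:nrow(HELPrct), 50, replace = FALSE), ].

```r
set.seed(2024)                            # any fixed seed makes the draw repeatable
idx1 <- sample(1:20, 5, replace = FALSE)
set.seed(2024)
idx2 <- sample(1:20, 5, replace = FALSE)
identical(idx1, idx2)                     # TRUE: same seed, same sample
anyDuplicated(idx1)                       # 0: no index repeats when replace = FALSE
```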
The Arbuthnot data set refers to Dr. John Arbuthnot, an 18th-century physician, writer, and mathematician. He was interested in the ratio of newborn boys to newborn girls, so he gathered the baptism records for children born in London for every year from 1629 to 1710.
source("http://www.openintro.org/stat/data/arbuthnot.R")
head(arbuthnot)
##   year boys girls
## 1 1629 5218  4683
## 2 1630 4858  4457
## 3 1631 4422  4102
## 4 1632 4994  4590
## 5 1633 5158  4839
## 6 1634 5035  4820
plot(x = arbuthnot$year, y = arbuthnot$girls, type = "l") # number of girls baptized each year, drawn as a line
?plot
# plot the total number of baptisms per year
plot(arbuthnot$year, arbuthnot$boys + arbuthnot$girls, type = "l")
# plot the proportion of boys over time: boys divided by total baptisms
plot(x = arbuthnot$year, y = arbuthnot$boys / (arbuthnot$boys + arbuthnot$girls), type = "l")
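New columns can also be added to a data frame with $ and vector arithmetic, which makes such derived quantities easier to reuse. This sketch works on a small copy (arb6, an illustrative name) of the six rows printed by head(arbuthnot) above, so it runs without downloading the data; plot(arb6$year, arb6$prop_boy, type = "l") would then draw the same kind of line plot.

```r
# the first six rows of arbuthnot, copied from the head() output above
arb6 <- data.frame(year  = 1629:1634,
                   boys  = c(5218, 4858, 4422, 4994, 5158, 5035),
                   girls = c(4683, 4457, 4102, 4590, 4839, 4820))
arb6$total    <- arb6$boys + arb6$girls   # total baptisms per year
arb6$prop_boy <- arb6$boys / arb6$total   # proportion of boys
round(arb6$prop_boy, 3)
all(arb6$prop_boy > 0.5)                  # boys outnumber girls in each of these years
```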
Open a new project, call it RLab1, and save it in your working directory.
Load the present-day birth records for the USA with the following command:
source("http://www.openintro.org/stat/data/present.R")
The data are stored in a data frame called present.
What years are included in this data set? What are the dimensions of the data frame and what are the variable or column names?
How do these counts compare to Arbuthnot’s? Are they on a similar scale?
Make a plot that displays the boy-to-girl ratio for every year in the data set. What do you see? Is it right to claim that boys are born in greater proportion than girls in the U.S.? Include the plot in your response.
In what year did we see the largest total number of births in the U.S.? You can refer to the help files or the R reference card to find helpful commands.
Obtain a subset of the data containing only the female (girls) counts for the years 1956 and later.
These data come from a report by the Centers for Disease Control (CDC).
If you’re interested in learning more, or would like more labs for practice, see the links provided.
It’s useful to record some information about how your file was created.
mosaic package version: 0.14.4
## R version 3.3.3 (2017-03-06)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: macOS 10.15.6
##
## locale:
## [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] mosaic_0.14.4 Matrix_1.2-10 mosaicData_0.14.0 ggplot2_2.2.1
## [5] lattice_0.20-35 dplyr_0.5.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.0 knitr_1.21 magrittr_1.5 MASS_7.3-51.1
## [5] splines_3.3.3 munsell_0.4.3 colorspace_1.3-2 R6_2.2.1
## [9] ggdendro_0.1-20 rlang_0.1.6 stringr_1.2.0 plyr_1.8.4
## [13] tools_3.3.3 grid_3.3.3 gtable_0.2.0 xfun_0.4
## [17] DBI_0.6-1 htmltools_0.3.6 yaml_2.2.0 lazyeval_0.2.0
## [21] assertthat_0.2.0 digest_0.6.12 tibble_1.3.1 gridExtra_2.2.1
## [25] tidyr_0.6.3 evaluate_0.10 rmarkdown_1.11 stringi_1.1.5
## [29] scales_0.4.1