Text can be decorated with bold or italics. It is also possible to
Be sure to put a space after the * when you are creating bullets and a space after # when creating section headers, but not between $ and the mathematical formulas.
We will be using RStudio Server for now. Later on, as you ease in, you might want to use RStudio, which is an interface (IDE) to the program R, in conjunction with R on your PC or laptop. In order to do so
If the code of an R chunk produces a plot, this plot can be displayed in the resulting file.
xyplot(births ~ date, data=Births78)
Other forms of R output are also displayed as they are produced.
favstats(~ births, data=Births78)
## min Q1 median Q3 max mean sd n missing
## 7135 8554 9218 9705 10711 9132.162 817.8821 365 0
This file can be knit to HTML, PDF, or Word. In RStudio, just select the desired output file type and click on Knit HTML, Knit PDF, or Knit Word. Use the dropdown menu next to that to change the desired file type.
a <- c(1.8, 4.5) #numeric
b <- c(1 + 2i, 3 - 6i) #complex
d <- c(23L, 44L) #integer (the L suffix is needed; plain 23 and 44 would be stored as numeric)
e <- vector("logical", length = 5) #logical
a
## [1] 1.8 4.5
b
## [1] 1+2i 3-6i
d
## [1] 23 44
e
## [1] FALSE FALSE FALSE FALSE FALSE
class(qt) # to check the class of an R object
## [1] "function"
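As a quick check of the vector types introduced above, the following is a minimal sketch that applies `class()` to a few small vectors. The names `x_num`, `x_int`, `x_cplx`, and `x_lgl` are made up for this illustration. It also shows that plain numbers such as 23 are stored as numeric unless the `L` suffix is used.

```r
# Hypothetical example vectors, not part of the lab data
x_num  <- c(23, 44)        # plain numbers are stored as numeric (double)
x_int  <- c(23L, 44L)      # the L suffix creates integers
x_cplx <- c(1 + 2i, 3 - 6i)
x_lgl  <- vector("logical", length = 2)

class(x_num)   # "numeric"
class(x_int)   # "integer"
class(x_cplx)  # "complex"
class(x_lgl)   # "logical"
```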
my_list <- list(22, "ab", TRUE, 1 + 2i) # a list can hold elements of different types
my_list
## [[1]]
## [1] 22
##
## [[2]]
## [1] "ab"
##
## [[3]]
## [1] TRUE
##
## [[4]]
## [1] 1+2i
my_matrix <- matrix(1:6, nrow=3, ncol=2) #matrix
my_matrix
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
dim(my_matrix) #checking dimensions of the matrix
## [1] 3 2
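The matrix above is filled column by column, which is the default in R. The sketch below, using the made-up names `m_col` and `m_row`, contrasts this with filling by row and shows how single elements and whole columns can be picked out with square brackets.

```r
# Hypothetical matrices for illustration
m_col <- matrix(1:6, nrow = 3, ncol = 2)                # filled column by column (the default)
m_row <- matrix(1:6, nrow = 3, ncol = 2, byrow = TRUE)  # filled row by row

m_col[2, 1]   # element in row 2, column 1 of m_col
m_row[2, 1]   # the same position in m_row holds a different value
m_col[, 2]    # the whole second column of m_col
```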
b1 <- 0:5
b1
## [1] 0 1 2 3 4 5
class(b1)
## [1] "integer"
b2<-as.factor(b1)
b2
## [1] 0 1 2 3 4 5
## Levels: 0 1 2 3 4 5
class(b2)
## [1] "factor"
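One thing to keep in mind with factors is that the values shown are labels (levels), not numbers. The following sketch, with the made-up name `f`, illustrates this under the assumption that you want to get the original numbers back from a factor.

```r
# Hypothetical example: a factor built from the integers 0 to 5
f <- as.factor(0:5)

levels(f)                      # the distinct categories, stored as character strings
as.integer(f)                  # internal level codes 1 to 6, not the original values
as.numeric(as.character(f))    # recovers the original values 0 to 5
```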
a1<-3+2 # Simple arithmetic, a1 is the object name which will store the result
a1
## [1] 5
a2<-c(1,2,3,4) # As above, c() is the concatenation function used to create vectors
a2 # the object a2 is called a vector
## [1] 1 2 3 4
a3<-sum(a2) # adds up all elements of a2
a3
## [1] 10
a4<-20:30 # creating a sequence incrementing by 1
a4
## [1] 20 21 22 23 24 25 26 27 28 29 30
#To obtain help on any command, type ? followed by the name of the command:
?hist
head(Births78) # to see the first few rows
## date births wday year month day_of_year day_of_month day_of_week
## 1 1978-01-01 7701 Sun 1978 1 1 1 1
## 2 1978-01-02 7527 Mon 1978 1 2 2 2
## 3 1978-01-03 8825 Tue 1978 1 3 3 3
## 4 1978-01-04 8859 Wed 1978 1 4 4 4
## 5 1978-01-05 9043 Thu 1978 1 5 5 5
## 6 1978-01-06 9208 Fri 1978 1 6 6 6
#To check the size (dimension) of the data frame, type
dim(Births78)
## [1] 365 8
str(Births78) # compactly displays the internal structure of an R object, in this case a data frame
## 'data.frame': 365 obs. of 8 variables:
## $ date : Date, format: "1978-01-01" "1978-01-02" ...
## $ births : int 7701 7527 8825 8859 9043 9208 8084 7611 9172 9089 ...
## $ wday : Ord.factor w/ 7 levels "Sun"<"Mon"<"Tue"<..: 1 2 3 4 5 6 7 1 2 3 ...
## $ year : num 1978 1978 1978 1978 1978 ...
## $ month : num 1 1 1 1 1 1 1 1 1 1 ...
## $ day_of_year : int 1 2 3 4 5 6 7 8 9 10 ...
## $ day_of_month: int 1 2 3 4 5 6 7 8 9 10 ...
## $ day_of_week : num 1 2 3 4 5 6 7 1 2 3 ...
summary(Births78)
## date births wday year
## Min. :1978-01-01 Min. : 7135 Sun:53 Min. :1978
## 1st Qu.:1978-04-02 1st Qu.: 8554 Mon:52 1st Qu.:1978
## Median :1978-07-02 Median : 9218 Tue:52 Median :1978
## Mean :1978-07-02 Mean : 9132 Wed:52 Mean :1978
## 3rd Qu.:1978-10-01 3rd Qu.: 9705 Thu:52 3rd Qu.:1978
## Max. :1978-12-31 Max. :10711 Fri:52 Max. :1978
## Sat:52
## month day_of_year day_of_month day_of_week
## Min. : 1.000 Min. : 1 Min. : 1.00 Min. :1.000
## 1st Qu.: 4.000 1st Qu.: 92 1st Qu.: 8.00 1st Qu.:2.000
## Median : 7.000 Median :183 Median :16.00 Median :4.000
## Mean : 6.526 Mean :183 Mean :15.72 Mean :3.992
## 3rd Qu.:10.000 3rd Qu.:274 3rd Qu.:23.00 3rd Qu.:6.000
## Max. :12.000 Max. :365 Max. :31.00 Max. :7.000
##
If you strike the
The columns are the variables. There are two types of variables: numeric (for example, the number of births) and factor, also called categorical (for example, the day of the week). The rows are called observations or cases.
The $ is one way to access the variables of a data frame. Another option, which we will learn later, is to create a new vector object by subsetting.
R is case-sensitive! For example, a1 and A1 are considered different objects.
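A minimal sketch of both points, using a small made-up data frame called `toy` (its three rows copy the first three rows of the `head(Births78)` output, but the object itself is hypothetical):

```r
# Hypothetical toy data frame, standing in for a data set such as Births78
toy <- data.frame(births = c(7701, 7527, 8825),
                  wday   = c("Sun", "Mon", "Tue"))

toy$births          # access a column with $
toy[["births"]]     # equivalent access by name
mean(toy$births)    # use the column like any other vector

# Names are case-sensitive: toy$Births does not exist, so $ returns NULL
is.null(toy$Births)
```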
barplot(table(Births78$wday))
hist(Births78$births)
boxplot(Births78$births)
boxplot(births ~ wday, data = Births78) #births grouped by week day
The boxplot command offers the option of using a formula syntax. Here, since we can specify the data set, we don’t have to use the $ to access the variables.
In general, the variables in a data frame are not immediately accessible for use. Try calling births on its own and you will see an error:
Error: object ‘births’ not found
If you use a variable frequently, you may want to extract it and store it as a vector.
b1<-Births78$births
mean(b1)
## [1] 9132.162
median(b1)
## [1] 9218
range(b1)
## [1] 7135 10711
var(b1)
## [1] 668931.1
quantile(b1)
## 0% 25% 50% 75% 100%
## 7135 8554 9218 9705 10711
#The tapply command allows you to compute numeric summaries on values based on levels
#of a factor variable. For instance, find the mean or median births by wday,
bb<-tapply(b1, Births78$wday, median)
bb
## Sun Mon Tue Wed Thu Fri Sat
## 7936.0 9321.0 9667.5 9361.5 9397.0 9544.5 8260.5
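To see what `tapply` does on a scale small enough to check by hand, here is a sketch with a made-up six-value vector and grouping factor. The object names `births`, `wday`, `group_means`, and `toy` are chosen for this illustration only. The base R function `aggregate` is shown as one alternative that returns a data frame.

```r
# Hypothetical toy data: a numeric vector and a grouping factor
births <- c(7701, 7527, 8825, 7611, 9172, 9089)
wday   <- factor(c("Sun", "Mon", "Tue", "Sun", "Mon", "Tue"),
                 levels = c("Sun", "Mon", "Tue"))

group_means <- tapply(births, wday, mean)   # mean of births within each level of wday
group_means

# The same summary in data-frame form, using the formula interface of aggregate()
toy <- data.frame(births = births, wday = wday)
aggregate(births ~ wday, data = toy, FUN = mean)
```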
sd
## function (x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm",
## FALSE))
## {
## if (lazyeval::is_formula(x)) {
## if (is.null(data))
## data <- lazyeval::f_env(x)
## formula <- mosaicCore::mosaic_formula_q(x, groups = groups,
## max.slots = 3)
## return(maggregate(formula, data = data, FUN = stats::sd,
## ..., na.rm = na.rm, .multiple = FALSE))
## }
## stats::sd(x, ..., na.rm = na.rm)
## }
## <bytecode: 0x7f96f7ef26d0>
## <environment: namespace:mosaic>
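Typing a function's name without parentheses, as with `sd` above, prints its definition instead of computing anything. The sketch below, on a made-up vector `x`, shows the function being called and checks it against the square root of the variance.

```r
# Hypothetical example vector
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

sd(x)            # calling the function with parentheses computes the standard deviation
sqrt(var(x))     # the standard deviation is the square root of the variance
```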
getwd() # shows the current working directory (WD); it should be your current project folder
## [1] "/Users/tzaihra/Downloads"
#setwd() # to set working directory
R has powerful indexing features for accessing object elements. These features can be used to select and exclude variables and observations.
z <- c(8, 3, 0, 9, 9, 2, 1, 3)
z1<-z[4]
z1 #z1 is the fourth element of z
## [1] 9
z2<-z[c(1, 4, 5)]
z2 #The first, fourth and fifth element
## [1] 8 9 9
z3<-z[-c(1, 3, 4)]
z3 #All elements except the first, third and fourth
## [1] 3 9 2 1 3
z4<-z[8:1]
z4 #The elements of z in reverse
## [1] 3 1 2 9 9 0 3 8
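Besides positions, elements can also be selected with a logical condition. The following is a short sketch on the same vector `z`; it is an illustration of logical indexing in base R and not part of the original lab.

```r
z <- c(8, 3, 0, 9, 9, 2, 1, 3)   # the same vector as above

z > 2              # a logical vector: TRUE where the element exceeds 2
z[z > 2]           # keeps only the elements greater than 2
which(z == 9)      # positions of the elements equal to 9
z[z > 2 & z < 9]   # conditions can be combined with & (and) or | (or)
```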
head(HELPrct)
## age anysubstatus anysub cesd d1 daysanysub dayslink drugrisk e2b female
## 1 37 1 yes 49 3 177 225 0 NA 0
## 2 37 1 yes 30 22 2 NA 0 NA 0
## 3 26 1 yes 39 0 3 365 20 NA 0
## 4 39 1 yes 15 2 189 343 0 1 1
## 5 32 1 yes 39 12 2 57 0 1 0
## 6 47 1 yes 6 1 31 365 0 NA 1
## sex g1b homeless i1 i2 id indtot linkstatus link mcs pcs
## 1 male yes housed 13 26 1 39 1 yes 25.111990 58.41369
## 2 male yes homeless 56 62 2 43 NA <NA> 26.670307 36.03694
## 3 male no housed 0 0 3 41 0 no 6.762923 74.80633
## 4 female no housed 5 5 4 28 0 no 43.967880 61.93168
## 5 male no homeless 10 13 5 38 1 yes 21.675755 37.34558
## 6 female no housed 4 4 6 29 0 no 55.508991 46.47521
## pss_fr racegrp satreat sexrisk substance treat avg_drinks max_drinks
## 1 0 black no 4 cocaine yes 13 26
## 2 1 white no 7 alcohol yes 56 62
## 3 13 black no 2 heroin no 0 0
## 4 11 white yes 4 heroin no 5 5
## 5 10 black no 6 cocaine no 10 13
## 6 5 black no 5 cocaine yes 4 4
myvars<-c("age","racegrp","sex","substance","treat")
newdata<-HELPrct[myvars]
head(newdata)
## age racegrp sex substance treat
## 1 37 black male cocaine yes
## 2 37 white male alcohol yes
## 3 26 black male heroin no
## 4 39 white female heroin no
## 5 32 black male cocaine no
## 6 47 black female cocaine yes
newdata1<-HELPrct[c(1,6:8)]
head(newdata1)
## age daysanysub dayslink drugrisk
## 1 37 177 225 0
## 2 37 2 NA 0
## 3 26 3 365 20
## 4 39 189 343 0
## 5 32 2 57 0
## 6 47 31 365 0
newdata3<-newdata[c(-1,-2)]
head(newdata3)
## sex substance treat
## 1 male cocaine yes
## 2 male alcohol yes
## 3 male heroin no
## 4 female heroin no
## 5 male cocaine no
## 6 female cocaine yes
newdata4<-newdata[1:5,]
head(newdata4)
## age racegrp sex substance treat
## 1 37 black male cocaine yes
## 2 37 white male alcohol yes
## 3 26 black male heroin no
## 4 39 white female heroin no
## 5 32 black male cocaine no
#based on variable value
newdata5<-newdata[which(newdata$sex=="female"),]
head(newdata5)
## age racegrp sex substance treat
## 4 39 white female heroin no
## 6 47 black female cocaine yes
## 7 49 black female cocaine no
## 9 50 white female alcohol no
## 11 34 white female heroin yes
## 12 58 black female alcohol no
#or
attach(newdata)
newdata6<-newdata[which(sex=="female" & age>45),]
head(newdata6)
## age racegrp sex substance treat
## 6 47 black female cocaine yes
## 7 49 black female cocaine no
## 9 50 white female alcohol no
## 12 58 black female alcohol no
## 25 48 black female cocaine no
## 122 50 white female alcohol no
detach(newdata)
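# Note: with | (or), every age satisfies the condition, so all rows are kept; use & (and) to keep only ages 45 to 54
newdata7<-subset(newdata,age>=45|age<55,select=c(age,sex,substance))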
head(newdata7)
## age sex substance
## 1 37 male cocaine
## 2 37 male alcohol
## 3 26 male heroin
## 4 39 female heroin
## 5 32 male cocaine
## 6 47 female cocaine
newdata8<-subset(newdata,sex=="female" & age>50,select=c(age,sex,substance))
head(newdata8)
## age sex substance
## 12 58 female alcohol
## 156 57 female alcohol
## 225 55 female heroin
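The difference between `&` (and) and `|` (or) matters when selecting a range of values. The sketch below uses a small made-up data frame called `people` to show that an "or" of two overlapping conditions keeps every row, while "and" restricts the result to the intended range.

```r
# Hypothetical toy data frame
people <- data.frame(age = c(26, 37, 47, 50, 58),
                     sex = c("male", "male", "female", "female", "female"))

# OR: every age is either >= 45 or < 55, so nothing is filtered out
nrow(subset(people, age >= 45 | age < 55))

# AND: keeps only ages from 45 up to (but not including) 55
in_range <- subset(people, age >= 45 & age < 55)
in_range$age
```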
# take a random sample of size 50 from a dataset
# sample without replacement
mysample <- HELPrct[sample(1:nrow(HELPrct), 50,
replace=FALSE),]
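Because `sample` draws at random, the selected rows change each time the code is run. If you want the same sample every time (for example, so a knitted report does not change), one common approach is to call `set.seed` first. The sketch below assumes a made-up data frame `dat`, and the seed value 123 is arbitrary.

```r
# Hypothetical stand-in for a data set: a data frame with 100 rows
dat <- data.frame(id = 1:100, value = (1:100)^2)

set.seed(123)                                   # the seed value 123 is arbitrary
s1 <- dat[sample(nrow(dat), 10, replace = FALSE), ]

set.seed(123)                                   # same seed, same draw
s2 <- dat[sample(nrow(dat), 10, replace = FALSE), ]

identical(s1, s2)      # the two samples match row for row
anyDuplicated(s1$id)   # 0 means no row was drawn twice
```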
The Arbuthnot data set refers to Dr. John Arbuthnot, an 18th century physician, writer, and mathematician. He was interested in the ratio of newborn boys to newborn girls, so he gathered the baptism records for children born in London for every year from 1629 to 1710.
source("http://www.openintro.org/stat/data/arbuthnot.R")
head(arbuthnot)
## year boys girls
## 1 1629 5218 4683
## 2 1630 4858 4457
## 3 1631 4422 4102
## 4 1632 4994 4590
## 5 1633 5158 4839
## 6 1634 5035 4820
plot(x = arbuthnot$year, y = arbuthnot$girls, type = "l")
?plot
# Plot of the total number of baptisms per year with the command
plot(arbuthnot$year, arbuthnot$boys + arbuthnot$girls, type = "l")
# plot of the proportion of boys over time
plot(x = arbuthnot$year, y = arbuthnot$boys / (arbuthnot$boys + arbuthnot$girls), type = "l")
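The totals and proportions used in these plots can also be stored as new columns, which makes them easier to inspect. The sketch below does not download the data; it assumes a small data frame `arb6` typed in by hand from the six rows of the `head(arbuthnot)` output, and the column names `total`, `prop_boys`, and `ratio` are made up for this illustration.

```r
# First six rows of the Arbuthnot data, typed in by hand from the head() output above
arb6 <- data.frame(year  = 1629:1634,
                   boys  = c(5218, 4858, 4422, 4994, 5158, 5035),
                   girls = c(4683, 4457, 4102, 4590, 4839, 4820))

arb6$total     <- arb6$boys + arb6$girls    # total baptisms per year
arb6$prop_boys <- arb6$boys / arb6$total    # proportion of boys
arb6$ratio     <- arb6$boys / arb6$girls    # boy-to-girl ratio

arb6
all(arb6$prop_boys > 0.5)   # TRUE here: boys outnumber girls in each of these six years
```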
Open up a new project, call it RLAb1 and save it in your working directory.
Load the present-day birth records for the USA with the following command.
source("http://www.openintro.org/stat/data/present.R")
The data are stored in a data frame called present.
What years are included in this data set? What are the dimensions of the data frame and what are the variable or column names?
How do these counts compare to Arbuthnot’s? Are they on a similar scale?
Make a plot that displays the boy-to-girl ratio for every year in the data set. What do you see? Is it right to claim that boys are born in greater proportion than girls in the U.S.? Include the plot in your response.
In what year did we see the largest total number of births in the U.S.? You can refer to the help files or the R reference card to find helpful commands.
Obtain a subset of the data that contains only females born in 1956 or later.
These data come from a report by the Centers for Disease Control.
If you’re interested in learning more, or find more labs for practice at links
It’s useful to record some information about how your file was created.
mosaic package version: 1.5.0
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] mosaic_1.5.0 Matrix_1.2-15 mosaicData_0.17.0 ggformula_0.9.1
## [5] ggstance_0.3.1 ggplot2_3.1.0 lattice_0.20-38 dplyr_0.7.8
##
## loaded via a namespace (and not attached):
## [1] tidyselect_0.2.5 xfun_0.7 purrr_0.2.5 splines_3.5.2
## [5] colorspace_1.4-0 generics_0.0.2 htmltools_0.3.6 yaml_2.2.0
## [9] rlang_0.3.1 pillar_1.3.1 later_0.8.0 glue_1.3.0
## [13] withr_2.1.2 bindrcpp_0.2.2 bindr_0.1.1 plyr_1.8.4
## [17] mosaicCore_0.6.0 stringr_1.3.1 munsell_0.5.0 gtable_0.2.0
## [21] htmlwidgets_1.3 evaluate_0.14 knitr_1.23 httpuv_1.5.1
## [25] crosstalk_1.0.0 broom_0.5.2 Rcpp_1.0.0 readr_1.3.1
## [29] xtable_1.8-4 scales_1.0.0 backports_1.1.4 promises_1.0.1
## [33] leaflet_2.0.2 mime_0.6 gridExtra_2.3 hms_0.4.2
## [37] digest_0.6.18 stringi_1.2.4 ggrepel_0.8.1 shiny_1.3.2
## [41] grid_3.5.2 tools_3.5.2 magrittr_1.5 lazyeval_0.2.1
## [45] tibble_2.0.1 ggdendro_0.1-20 crayon_1.3.4 tidyr_0.8.3
## [49] pkgconfig_2.0.2 MASS_7.3-51.1 assertthat_0.2.0 rmarkdown_1.13
## [53] R6_2.3.0 nlme_3.1-137 compiler_3.5.2