R Markdown adds an extra layer to R programming, but it is one of the best ways to present analyses and plots so that non-R users can understand them, and to do so reproducibly: your code can be copied and your analyses rerun with precision. This is different from other statistical programs that use GUIs (graphical user interfaces) with point-and-click functions. While these may be easier to learn and more intuitive, they are not reproducible and shareable in the precise way that R code and R Markdown documents are.
In this lab, we will illustrate the basic functions of R Markdown and R.
Do you know why headers have ##? This tells Markdown that the text following will be a header of some sort. The more #’s you put, the deeper the sub-header level in the knitted document. This is incredibly useful for breaking apart documents into meaningful units. You will see these ## at the beginning of each header and sub-header.
The other thing to note is the gray boxes. These gray boxes are the code chunks. Three backticks tell knitr that you are creating a code chunk. The {r} tells knitr which language is used in this chunk. The final three backticks indicate that the code chunk has ended. It is important that all three of these are present when creating a new code chunk.
Try creating one now! Look ahead at the other code chunks if you are having trouble.
Next, if you look to the left, you will see that lines are numbered. This is to help you and R easily reference sections of code. If you get an error, R will tell you in the console where it encountered that error. In addition, RStudio will sometimes show red error markers in the script. This is a sign that there is an issue. Most of the time, it is a misspelling, a missing comma (,), or a missing parenthesis.
Next to these numbers you will see little up and down arrows. These are tabs on chunks and headers. If they are pointing up, you have hidden that section, and if they are pointing down, you have revealed that chunk/section. If you think code has disappeared, it might just be hidden.
Last, within the code chunks (gray boxes) you will see a green arrow on the right-hand side. This is the run button: it will run the code in the code chunk and print the results below. This is one way to run the code. Another would be to copy the code from the chunk and paste it in the console below.
Note that R reads code from beginning to end. If you try to run code that uses objects you have not yet created, R will give you an error. You can always check whether the object you are calling exists by looking for it in the Environment window in the top right corner of RStudio.
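The order-of-execution point can be checked directly in code. The following is a minimal sketch, assuming a hypothetical object name `my_value` that has not been created yet in your session; `tryCatch()` is used only so that the error message is captured instead of stopping the script.

```r
# `my_value` is a hypothetical name used only for this illustration.
# Using an object before it has been created raises an error;
# tryCatch() captures the error message so the script keeps running.
msg <- tryCatch(
  my_value + 1,
  error = function(e) conditionMessage(e)
)
msg

# Once the object has been created, the same code works.
my_value <- 10
exists("my_value")   # TRUE: the object now exists in the environment
my_value + 1
```

This mirrors what you see in RStudio: the object only appears in the Environment window after the line that creates it has been run.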
R is an incredibly powerful tool, but it is also a calculator. To illustrate this and the information covered above, let’s run through some basics.
2+2
## [1] 4
7-1
## [1] 6
9*2
## [1] 18
27/3
## [1] 9
Describe what just happened!
EKA: The Calculations are done.
Now, let’s warm up for some statistics. To do statistics you will need data. Most of the data today will be “random” numbers.
runif(1)
## [1] 0.08196117
You can run the following to learn more about any function.
# <- these little symbols tell R not to read this as code and to exclude it. So to run the below, you will need to first remove the # and then add the additional code. But, since this is a help page, just copy the code without the # and paste it in the console.
#?runif
runif(n=1, min=5, max=16) #note that runif() defaults to min=0 and max=1, but you can change these to any numbers you want.
## [1] 5.645783
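Note n, min, and max. What are these in the context of R: function, parameter, variable, or data?
EKA: They are parameters (arguments) of the function. n is the number of observations; min and max define the lower and upper limits of the range the numbers are drawn from.
What about runif?
EKA: It is a function. The name stands for “random uniform”: it draws random numbers from a uniform distribution. It is not a conditional “run if”.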
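The distinction between a function and its arguments can be inspected from R itself. This is a small sketch using only base R; the seed value passed to `set.seed()` is an arbitrary assumption, chosen just so that the two calls draw the same number.

```r
# runif is a function object; n, min and max are its arguments.
is.function(runif)
names(formals(runif))   # the argument names of runif

# Arguments can be matched by name (in any order) or by position.
set.seed(1)             # arbitrary seed so both calls draw the same value
by_name <- runif(max = 16, min = 5, n = 1)
set.seed(1)
by_position <- runif(1, 5, 16)
identical(by_name, by_position)
```

Naming the arguments makes the call easier to read, which is why the lab code writes `runif(n=1, min=5, max=16)` rather than `runif(1, 5, 16)`.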
With R it is possible to give (part of) your data a name of your choice. This is known as creating an object. R is an object-oriented programming language. For now your “data” will be a random number between 0 and 1.
Create an object called r1 (the <- part of the code means “give the name”):
r1 <- runif(1)
The code only generated the number, but did not display it.
r1
## [1] 0.7018028
print(r1)
## [1] 0.7018028
Note that in your environment in the top right you now see the object
r1 along with its value.
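The difference between creating an object and displaying it can be seen side by side. A minimal sketch, assuming hypothetical object names `x` and `y` and an arbitrary seed:

```r
set.seed(42)      # arbitrary seed, only to make the example repeatable
x <- runif(1)     # assignment alone: the value is stored, nothing is displayed
x                 # typing the name displays the value
print(x)          # print() displays the same value explicitly
(y <- runif(1))   # wrapping an assignment in parentheses stores and displays it
```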
What does the print() function do? EKA: It displays the value of an object in the console.
Now, let us make some more random numbers:
r2 <- runif(1)
r2
## [1] 0.5933874
r3 <- runif(1)
r3
## [1] 0.7913255
r4 <- runif(1)
r4
## [1] 0.754808
r5 <- runif(1)
r5
## [1] 0.6424698
r6 <- runif(1)
r6
## [1] 0.9410589
Now it is your turn to write some code! Look carefully at the code chunks above. What do you notice about those that are R code, those that are text (like this), and those that are headers?
In your new R chunk, write code that generates and prints 4 more
random numbers: r7, r8, r9, r10
EKA: The () refers to ???
r7 <- runif(1)
r7
## [1] 0.9393032
r8 <- runif(1)
r8
## [1] 0.1126541
r9 <- runif(1)
r9
## [1] 0.9618108
r10 <- runif(1)
r10
## [1] 0.945175
Soon we will be ready to do some statistics on your random numbers.
But first we make a list (vector) with all your numbers. For that we
use c(), the concatenate function.
Include r7, r8, r9 and r10 in the vector:
r_list <- c(r1,r2,r3,r4,r5,r6,r7,r8,r9,r10)
print(r_list) # c means concatenate (def: link together in a chain or series)
## [1] 0.7018028 0.5933874 0.7913255 0.7548080 0.6424698 0.9410589 0.9393032
## [8] 0.1126541 0.9618108 0.9451750
r_list
## [1] 0.7018028 0.5933874 0.7913255 0.7548080 0.6424698 0.9410589 0.9393032
## [8] 0.1126541 0.9618108 0.9451750
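How c() builds a vector, and how single elements are read back out, can be sketched on a smaller example. The names `a`, `b`, `c3` and `v` and their values are hypothetical stand-ins for the r1 to r10 objects.

```r
# Hypothetical stand-ins for a few of the random numbers.
a  <- 0.25
b  <- 0.50
c3 <- 0.75

v <- c(a, b, c3)   # link the three values together into one vector
length(v)          # number of elements in the vector
v[2]               # square brackets pick out one element by position

v <- c(v, 1.00)    # c() can also append a value to an existing vector
v
```

The bracketed numbers at the start of each printed line, such as `[1]` and `[8]` above, are the positions of the first element shown on that line.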
Now we can calculate the mean
IMPORTANT: But before you do this, try to think what will be the mean of your random numbers. Note, that we are creating uniform distributions.
r_mean <- mean(r_list)
mean_dif <- 0.5 - r_mean # guess (0.5) minus the actual mean stored in r_mean
Also try to extend the code chunk that saves the mean under the name “r_mean” so that it prints the value.
Finally, write a code chunk that calculates the difference between your guess and the actual mean. You can get inspiration from exercise 0.
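For a uniform distribution between min and max, the theoretical mean is (min + max) / 2, which is a reasonable basis for a guess. The sketch below compares that guess with the mean of a small sample; the names `draws`, `guess` and `sample_mean` and the seed are assumptions made for illustration.

```r
set.seed(123)                 # arbitrary seed so the sample is repeatable
draws <- runif(10)            # ten random numbers between 0 and 1

guess <- (0 + 1) / 2          # theoretical mean of a uniform on [0, 1]
sample_mean <- mean(draws)    # mean of the numbers actually drawn

sample_mean
guess - sample_mean           # difference between the guess and the sample mean
```

With only ten numbers the sample mean can sit noticeably away from 0.5; using the stored object rather than a typed-in number keeps the difference correct every time the chunk is rerun.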
In the exercises above we created all the random numbers one at a time. We could also have created them all at once:
r_long <- runif(100)
r_long
## [1] 0.730070238 0.658520974 0.972493692 0.659948956 0.739603211 0.436245293
## [7] 0.711156723 0.550769480 0.636405011 0.784397414 0.930321685 0.086628236
## [13] 0.765174360 0.513990997 0.698986398 0.342948998 0.278876404 0.652822978
## [19] 0.499494605 0.547732381 0.105259286 0.454860187 0.476615187 0.747203789
## [25] 0.822798520 0.883884885 0.437633683 0.741495993 0.928417177 0.196087902
## [31] 0.373623308 0.963736031 0.873422671 0.972338451 0.053403564 0.359785156
## [37] 0.104324209 0.443684720 0.828328293 0.586310610 0.611180832 0.692971979
## [43] 0.672866126 0.054789532 0.009874568 0.729656289 0.152138554 0.774589846
## [49] 0.413302465 0.361305832 0.875190920 0.720269777 0.837610557 0.573141821
## [55] 0.570442036 0.258253170 0.402558297 0.698314931 0.737297381 0.834738616
## [61] 0.268925380 0.200546464 0.230257859 0.526267602 0.158737940 0.607945477
## [67] 0.814985045 0.369135464 0.723496686 0.345437354 0.018521146 0.146327385
## [73] 0.756736601 0.647371715 0.908180137 0.609784047 0.687517355 0.093861284
## [79] 0.624294871 0.332607105 0.270216087 0.037475637 0.772392970 0.083122959
## [85] 0.062133359 0.781305026 0.383265236 0.434085222 0.944576579 0.133595499
## [91] 0.218057332 0.894429188 0.828530233 0.317273935 0.096979904 0.821751233
## [97] 0.364403016 0.364448656 0.950836806 0.144320450
The (unfinished) code chunk below will calculate some standard statistics of r_long, that is, the mean, the minimum and the maximum.
Again each group member makes a guess for the mean, the minimum and the maximum.
After you have made your guesses try to finish and run the code chunk to see who had the better guess.
gs_mean <- 0.46
gs_min <- 0.87
gs_max <- 0.12
# <- these little symbols tell R not to read this as code and to exclude it. So to run the below, you will need to first remove the # and then add the additional code.
lg_mean <- mean(r_long)
lg_max <- max(r_long)
lg_min <- min(r_long)
df_meanlg <- gs_mean-lg_mean
df_maxlg <- gs_max-lg_max
df_minlg <- gs_min-lg_min
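To decide who had the better guess, the sign of the difference matters less than its size. One possible way to compare guesses, sketched here with hypothetical guess values and named vectors, is to look at absolute differences:

```r
set.seed(7)                       # arbitrary seed for a repeatable sample
x <- runif(100)

# Named vectors keep each statistic next to its label.
actual  <- c(mean = mean(x), min = min(x), max = max(x))
guesses <- c(mean = 0.50, min = 0.01, max = 0.99)   # hypothetical guesses

abs_error <- abs(guesses - actual)   # size of each miss, ignoring direction
abs_error
names(which.min(abs_error))          # the statistic that was guessed most closely
```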
A yet simpler way to obtain standard statistics on your data set is to use the “summary()” function
summary(r_long)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
## 0.009875 0.307675 0.571792 0.525044 0.749587 0.972494
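The result of summary() can itself be stored as an object, and single statistics can be pulled out of it by name. A brief sketch, with `x` and `s` as hypothetical names and an arbitrary seed:

```r
set.seed(2024)                        # arbitrary seed
x <- runif(100)

s <- summary(x)                       # store the six summary statistics
names(s)                              # the labels shown in the printed output
s[["Median"]]                         # pick out one statistic by its label

quantile(x, probs = c(0.25, 0.75))    # the 1st and 3rd quartiles on their own
```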
In the final exercise today we will move a little beyond only using numbers to describe and understand our data set (which, as a reminder, is a list of 100 random numbers between 0 and 1).
Instead we will try to make graphical visualizations of the data set. Such graphics are often called a plot in statistics lingo.
hist(r_long)
We will discuss histograms a lot more in later lectures, but for now we will just use what you already know
What does the histogram show?
Run the code chunk below to make an even larger data set,
r_longer, with 10000 random numbers
r_longer <- runif(n=10000,min=0,max=1)
summary(r_longer)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
## 2.245e-05 2.537e-01 5.038e-01 5.035e-01 7.575e-01 1.000e+00
hist(r_longer)
Make a histogram of r_longer and describe how it looks.
EKA: with that many unique numbers, most will look similar - many will begin with 0,1-9
What is the maximum number of unique numbers we can generate between 0 and 1?
EKA: Infinity
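A description of the r_longer histogram can be backed up with numbers by counting how many values fall in each bin. The sketch below is one way to do that, assuming ten equal-width bins and an arbitrary seed; `plot = FALSE` asks hist() for the counts without drawing anything.

```r
set.seed(99)                                   # arbitrary seed
x <- runif(10000)                              # same size as r_longer

# Ten bins of width 0.1; plot = FALSE returns the counts without a plot.
h <- hist(x, breaks = seq(0, 1, by = 0.1), plot = FALSE)

h$counts          # number of values in each bin
range(h$counts)   # smallest and largest bin count
sum(h$counts)     # every value lands in exactly one bin
```

For a uniform distribution each bin should hold roughly the same share of the values, so the bars of the histogram are of similar height.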
Sweet! Now let’s move on to some archaeological examples.
First, let’s talk about R packages. We are going to use a number of
packages, but for data we will rely on two packages. The first you
encountered in class, which is yarrr. The second is
archdata. First we need to install archdata.
We did this with knitr in the first class. In the viewer pane in the
lower right corner, select packages and install, then type
archdata and click on it when it appears. Once installed,
we will need to load it.
Once you have a package installed on your machine, all you need to do is tell R to load it. Below we will load the package we just installed.
library(archdata)
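Installing happens once per machine, while loading happens in every new R session. If library() fails, a quick check of whether the package is installed can help. The helper `is_installed()` below is a hypothetical convenience function written for this sketch, not part of R or of archdata.

```r
# Hypothetical helper: TRUE if a package is installed, without attaching it.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

is_installed("stats")   # stats ships with R, so it is always installed

# Report what to do instead of failing if archdata is missing.
if (!is_installed("archdata")) {
  message("archdata is not installed; run install.packages(\"archdata\") once.")
}
```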
Next, let’s load some measurements on Early and Late Bronze Age
ceramic cups from Italy analysed by Lukesh and Howe (1978). The data are
stored within the package archdata. Below, we use the
data() function to load it into R. Note that most data are
not stored within R and we will learn how to load your own data from
spreadsheets later.
data("BACups")
#na___()
Use ? (the help operator) to learn what the variable names represent.
?BACups
What do H and ND represent? EKA: Total height and Neck Diameter
What type of data do you think these are? EKA: Continuous measurements (in cm), not counts.
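R can report how each column of a data set is stored, which helps when deciding what type of data it holds. The sketch below uses a small made-up data frame called `cups`; the column names imitate BACups, but the values are hypothetical and are not taken from the real data set.

```r
# Hypothetical mini data set; values are invented for illustration only.
cups <- data.frame(
  H     = c(4.9, 6.3, 8.8),       # numeric measurements
  ND    = c(8.9, 10.9, 17.0),     # numeric measurements
  Phase = factor(c("Protoapennine", "Subapennine", "Subapennine"))  # categories
)

str(cups)                  # compact overview: one line per column
sapply(cups, class)        # storage class of each column
sapply(cups, is.numeric)   # TRUE for measurement columns, FALSE for categories
```

The same calls can be run on BACups once it has been loaded with data().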
Next, let’s try and figure how to describe these data. We will focus on height for now.
Create a code chunk below and find out the mean, min, and max
heights of BACups.
Next, let’s make a new code chunk and create a histogram of the
BACups heights.
Are you able to add a line indicating the mean? (See the lecture slide!)
Try renaming the main title, x axis label, and y label.
summary(BACups)
##        RD               ND               SD               H
##  Min.   : 6.600   Min.   : 6.200   Min.   : 7.000   Min.   : 3.300
##  1st Qu.: 9.725   1st Qu.: 8.975   1st Qu.: 9.875   1st Qu.: 4.950
##  Median :12.050   Median :10.900   Median :12.000   Median : 6.300
##  Mean   :14.020   Mean   :13.040   Mean   :14.063   Mean   : 6.987
##  3rd Qu.:18.500   3rd Qu.:17.000   3rd Qu.:18.125   3rd Qu.: 8.850
##  Max.   :29.500   Max.   :28.000   Max.   :28.000   Max.   :14.300
##        NH                  Phase
##  Min.   :1.400   Protoapennine:20
##  1st Qu.:2.275   Subapennine  :40
##  Median :3.000
##  Mean   :3.155
##  3rd Qu.:4.000
##  Max.   :5.300
hist(BACups$H,
     xlab="height (cm)",
     ylab="count",
     main="Cup height")
abline(v=mean(BACups$H)) # vertical line at the mean height
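On a histogram the x axis carries the measured values, so a mean is marked with a vertical line, abline(v = ...), placed at that value. The following sketch uses a made-up vector `heights` (hypothetical values, not from BACups) and adds the median as a second reference line; colours and line types are arbitrary choices.

```r
# Hypothetical cup heights in cm, invented for illustration.
heights <- c(3.5, 4.2, 4.8, 5.1, 5.9, 6.3, 6.8, 7.4, 9.0, 12.5)

hist(heights,
     xlab = "height (cm)",
     ylab = "count",
     main = "Cup height (made-up data)")

# v = ... draws a vertical line at a position on the x axis.
abline(v = mean(heights),   col = "red",  lwd = 2)   # mean
abline(v = median(heights), col = "blue", lty = 2)   # median

legend("topright",
       legend = c("mean", "median"),
       col    = c("red", "blue"),
       lty    = c(1, 2))
```

When a few large values pull the mean to the right of the median, as here, the two lines separate visibly.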
Let’s try this again, but with a different variable of
BACups. Choose your favorite!
What are the variable’s mean, min, and max (show in code chunk)?
Can you make a histogram?
summary(BACups)
##        RD               ND               SD               H
##  Min.   : 6.600   Min.   : 6.200   Min.   : 7.000   Min.   : 3.300
##  1st Qu.: 9.725   1st Qu.: 8.975   1st Qu.: 9.875   1st Qu.: 4.950
##  Median :12.050   Median :10.900   Median :12.000   Median : 6.300
##  Mean   :14.020   Mean   :13.040   Mean   :14.063   Mean   : 6.987
##  3rd Qu.:18.500   3rd Qu.:17.000   3rd Qu.:18.125   3rd Qu.: 8.850
##  Max.   :29.500   Max.   :28.000   Max.   :28.000   Max.   :14.300
##        NH                  Phase
##  Min.   :1.400   Protoapennine:20
##  1st Qu.:2.275   Subapennine  :40
##  Median :3.000
##  Mean   :3.155
##  3rd Qu.:4.000
##  Max.   :5.300
hist(BACups$ND,
     xlab="diameter (cm)",
     ylab="count",
     main="Neck diameter")
abline(v=mean(BACups$ND)) # vertical line at the mean neck diameter
Once you have completed the above, click the ‘knit’ button at the top of this window to compile it into a tidy HTML document. If you have any errors in your code, knitting will produce an error and show you where that error is.
Once the document is created, look through it and see if there are any issues. This HTML document is like a document in the form of a webpage. If you send the HTML document to someone, it will open in their web browser and show them the knitted document. This is another aspect of R and knitr that makes the work widely distributable. Anyone with a web browser can access your work! No need to install Adobe or MS Word.