Phillips 4.5 Q1-Q5
- Create a new R script. Using comments, write your name, the date, and “Testing my Chapter 2 R Might” at the top of the script.
#Silvana Montanola
#January 22
#Testing my Chapter 2 R Might
- Which (if any) of the following object names is/are invalid?
thisone <- 1
THISONE <- 2
#1This <- 3 #Invalid Name. Starts with 1
this.one <- 4
This.1 <- 5
ThIS.....ON...E <- 6
#This!On!e <- 7 #Invalid name. Contains characters that are not accepted
lkjasdfkjsdf <- 8
1This and This!On!e are invalid object names, since an object name cannot start with a number or contain characters such as !.
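As a quick check of the answer above, the names can be tested as strings instead of being assigned one by one. This is only a sketch: it relies on base R's make.names(), which rewrites a string into a syntactically valid name, and the helper names candidates and is.valid are my own.

```r
# Candidate names from the exercise, stored as strings so R does not try to parse them
candidates <- c("thisone", "THISONE", "1This", "this.one", "This.1",
                "ThIS.....ON...E", "This!On!e", "lkjasdfkjsdf")

# make.names() rewrites a string into a syntactically valid name;
# a string that comes back unchanged was already valid
is.valid <- candidates == make.names(candidates)

candidates[!is.valid]  # the names R would reject
```

Note that make.names() also alters reserved words such as if, so this check is slightly stricter than "starts with a number or contains a symbol".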
- 2015 was a good year for pirate booty - your ship collected 100,800 gold coins. Create an object called gold.in.2015 and assign the correct value to it.
gold.in.2015 <-100800 #Do not use , as dividers in numbers
#You must create the object and run the code before you can recall it in the console
gold.in.2015
[1] 100800
- Oops, during the last inspection we discovered that one of your pirates Skippy McGee hid 800 gold coins in his underwear. Go ahead and add those gold coins to the object gold.in.2015. Next, create an object called plank.list with the name of the pirate thief.
gold.in.2015 <- gold.in.2015 + 800 #800 coins hidden by Skippy McGee
gold.in.2015
[1] 101600
It is unclear whether the exercise meant to add the coins to the object permanently (i.e., reassign it) or only add them once without changing gold.in.2015. Here I reassigned it, so gold.in.2015 now holds 101600 instead of 100800, and recalling the object returns the new value.
plank.list <- "skippymcgee" #Always use "" for string data. Used all lowercase for the name to prevent capitalization mistakes
plank.list
[1] "skippymcgee"
- Look at the code below. What will R return after the third line? Make a prediction, then test the code yourself.
a <- 10
a + 10
a
I predict that after the third line, R will return a as 10, since a + 10 did not reassign a different value to a.
a <- 10
a+10
[1] 20
a
[1] 10
My prediction was correct. As Phillips states, ASSIGN IT AGAIN if you want the vector to have a different value
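The difference between computing with an object and reassigning it can be sketched as follows. The names unchanged and reassigned are only there to hold the two states for comparison; they are not part of the exercise.

```r
a <- 10
a + 10            # prints the sum, but a itself is unchanged
unchanged <- a    # still 10

a <- a + 10       # reassigning stores the new value in a
reassigned <- a   # now 20
```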
Phillips 5.4 Q1-Q9
- Create the vector [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] in three ways: once using c(), once using a:b, and once using seq().
x<- c(1,2,3,4,5,6,7,8,9,10) #Using the c function
x
[1] 1 2 3 4 5 6 7 8 9 10
y<- 1:10 #Using the a:b operator
y
[1] 1 2 3 4 5 6 7 8 9 10
z<- seq(from=1, to=10, by=1) #Using the seq function
z
[1] 1 2 3 4 5 6 7 8 9 10
- Create the vector [2.1, 4.1, 6.1, 8.1] in two ways, once using c() and once using seq()
a<- c(2.1,4.1,6.1,8.1) #Using the c function
a
[1] 2.1 4.1 6.1 8.1
b<- seq(from= 2.1, to= 8.1, by= 2) #Using the seq function
b
[1] 2.1 4.1 6.1 8.1
- Create the vector [0, 5, 10, 15] in 3 ways: using c(), seq() with a by argument, and seq() with a length.out argument.
c<- c(0,5,10,15) #Using the c function
c
[1] 0 5 10 15
d<- seq(from= 0, to=15, by=5) #Using the seq function and by argument
d
[1] 0 5 10 15
e<- seq(from= 0, to= 15, length.out =4) #Using the seq function and length.out argument
e
[1] 0 5 10 15
With the by argument, seq() steps through the range in increments of the given size. With the length.out argument, seq() spaces the given total number of elements evenly across the range.
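The relationship between the two arguments can be checked directly. This is a small sketch using the same numbers as the exercise; by.version, len.version, and implied.step are names I made up for the comparison.

```r
# by fixes the step between neighbouring elements
by.version <- seq(from = 0, to = 15, by = 5)

# length.out fixes how many elements the result has; R works out the step
len.version <- seq(from = 0, to = 15, length.out = 4)

# The implied step is (to - from) / (length.out - 1)
implied.step <- (15 - 0) / (4 - 1)

all.equal(by.version, len.version)
```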
- Create the vector [101, 102, 103, 200, 205, 210, 1000, 1100, 1200] using a combination of the c() and seq() functions
f<- c(seq(from= 101, to=103, by= 1), seq(from=200, to=210, length.out =3), seq(from= 1000, to=1200, by=100)) #Must add three different seq functions. Cannot combine all seq functions into one
f
[1] 101 102 103 200 205 210 1000 1100 1200
I used both length.out and by arguments to show that it can be done with a combination and still yield the same results.
- A new batch of 100 pirates are boarding your ship and need new swords. You have 10 scimitars, 40 broadswords, and 50 cutlasses that you need to distribute evenly to the 100 pirates as they board. Create a vector of length 100 where there is 1 scimitar, 4 broadswords, and 5 cutlasses in each group of 10. That is, in the first 10 elements there should be exactly 1 scimitar, 4 broadswords and 5 cutlasses. The next 10 elements should also have the same number of each sword (and so on).
swords10 <- c(rep("s", times= 1), rep("b", times= 4), rep("c", times =5)) #building a vector that gives 10 pirates the exact amount needed of each sword, where "s" stands for scimitars, "b" stands for broadswords, and "c" stands for cutlasses
swords10
[1] "s" "b" "b" "b" "b" "c" "c" "c" "c" "c"
swords100 <- rep(swords10, times=10) #building a vector that repeats the previous vector 10 times so all the pirates have a sword.
swords100
[1] "s" "b" "b" "b" "b" "c" "c" "c" "c" "c" "s" "b" "b" "b" "b" "c" "c" "c"
[19] "c" "c" "s" "b" "b" "b" "b" "c" "c" "c" "c" "c" "s" "b" "b" "b" "b" "c"
[37] "c" "c" "c" "c" "s" "b" "b" "b" "b" "c" "c" "c" "c" "c" "s" "b" "b" "b"
[55] "b" "c" "c" "c" "c" "c" "s" "b" "b" "b" "b" "c" "c" "c" "c" "c" "s" "b"
[73] "b" "b" "b" "c" "c" "c" "c" "c" "s" "b" "b" "b" "b" "c" "c" "c" "c" "c"
[91] "s" "b" "b" "b" "b" "c" "c" "c" "c" "c"
swords10 creates a vector with 10 swords and the right amount of each type per 10 pirates. swords100 repeats the previous vector 10 times to give all pirates swords. These are alternative ways of solving the problem:
swordstry2 <-rep(c(rep("s", times= 1), rep("b", times= 4), rep("c", times =5)), times=10)
swordstry2
[1] "s" "b" "b" "b" "b" "c" "c" "c" "c" "c" "s" "b" "b" "b" "b" "c" "c" "c"
[19] "c" "c" "s" "b" "b" "b" "b" "c" "c" "c" "c" "c" "s" "b" "b" "b" "b" "c"
[37] "c" "c" "c" "c" "s" "b" "b" "b" "b" "c" "c" "c" "c" "c" "s" "b" "b" "b"
[55] "b" "c" "c" "c" "c" "c" "s" "b" "b" "b" "b" "c" "c" "c" "c" "c" "s" "b"
[73] "b" "b" "b" "c" "c" "c" "c" "c" "s" "b" "b" "b" "b" "c" "c" "c" "c" "c"
[91] "s" "b" "b" "b" "b" "c" "c" "c" "c" "c"
In swordstry2, I joined the two previous lines of code into one expression. To do that, I simply pasted the code for the swords10 vector into the rep() call used to create swords100. This code is less wordy, but a little harder to understand at first.
swordstry3 <-rep(c("s","b","c"), c(1,4,5)) #This is a faster, easier to understand version of swords10.
swordstry3
[1] "s" "b" "b" "b" "b" "c" "c" "c" "c" "c"
swordstry4 <- rep(rep(c("s","b","c"), c(1,4,5)), times=10) #This is swords100 in an easier format.
swordstry4
[1] "s" "b" "b" "b" "b" "c" "c" "c" "c" "c" "s" "b" "b" "b" "b" "c" "c" "c"
[19] "c" "c" "s" "b" "b" "b" "b" "c" "c" "c" "c" "c" "s" "b" "b" "b" "b" "c"
[37] "c" "c" "c" "c" "s" "b" "b" "b" "b" "c" "c" "c" "c" "c" "s" "b" "b" "b"
[55] "b" "c" "c" "c" "c" "c" "s" "b" "b" "b" "b" "c" "c" "c" "c" "c" "s" "b"
[73] "b" "b" "b" "c" "c" "c" "c" "c" "s" "b" "b" "b" "b" "c" "c" "c" "c" "c"
[91] "s" "b" "b" "b" "b" "c" "c" "c" "c" "c"
swordstry3 is a simpler form that calls rep() only once, with the sword types and their repetition counts given as two c() vectors. swordstry4 repeats swordstry3 10 times.
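One way to confirm that every group of 10 pirates gets the right mix is to lay the 100 swords out in a matrix with one group per column. This is a sketch of a check, not part of the assigned answer; sword.types, sword.counts, one.group, all.swords, and groups are names chosen for illustration.

```r
sword.types  <- c("s", "b", "c")
sword.counts <- c(1, 4, 5)

# A vector passed to times repeats each element its own number of times
one.group  <- rep(sword.types, times = sword.counts)
all.swords <- rep(one.group, times = 10)

# Check: put each group of 10 in its own column and tabulate each column
groups <- matrix(all.swords, nrow = 10)
apply(groups, 2, table)
```

matrix() fills column by column, so column 1 holds pirates 1 to 10, column 2 holds pirates 11 to 20, and so on.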
- Create a vector that repeats the integers from 1 to 5, 10 times. That is [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, …]. The length of the vector should be 50!
g<- rep(1:5, times =10)
g
[1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1
[37] 2 3 4 5 1 2 3 4 5 1 2 3 4 5
galt <- c(1:5)
galt
[1] 1 2 3 4 5
galt2 <- rep(galt, times =10) #wordier code but more clear to understand if I forget the gist of it
galt2
[1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1
[37] 2 3 4 5 1 2 3 4 5 1 2 3 4 5
galt is used within galt2 to create the same result as g did. The code is wordier, but it is easier to understand how the vectors come together
- Now, create the same vector as before, but this time repeat 1, 10 times, then 2, 10 times, etc., That is [1, 1, 1, …, 2, 2, 2, …, … 5, 5, 5]. The length of the vector should also be 50
h <- rep(c(1:5), each =10) #no need to add times = 1, since that is the default
h
[1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4
[37] 4 4 4 4 5 5 5 5 5 5 5 5 5 5
This uses the same rep() function as galt2, except that times is left at its default of 1 and I have added the argument each = 10 to repeat each number 10 times.
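The two arguments can be compared side by side. A minimal sketch, with cycled and blocked as made-up names:

```r
cycled  <- rep(1:5, times = 10)  # whole vector repeated: 1 2 3 4 5 1 2 ...
blocked <- rep(1:5, each = 10)   # each element repeated in place: 1 1 1 ...

# Both hold the same values; only the order differs
identical(sort(cycled), blocked)
```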
- Create a vector containing 50 samples from a Normal distribution with a population mean of 20 and standard deviation of 2.
i<- rnorm(50, mean= 20, sd=2)
i
[1] 21.44587 21.77640 21.04310 20.14839 23.02322 18.66330 19.68733 15.33596
[9] 18.92312 24.32141 17.19646 17.75971 19.93962 18.56834 15.26917 17.92537
[17] 20.19802 19.05123 20.62410 22.35940 17.25237 21.13339 16.73193 17.78779
[25] 21.86034 20.00431 18.04051 24.30635 22.86871 21.80200 18.70542 21.57680
[33] 20.04373 24.75403 14.28635 22.06207 19.00333 20.46928 21.66797 22.57542
[41] 18.48260 20.35375 17.73336 20.77811 18.92271 23.01623 21.12487 22.39415
[49] 23.75842 18.85717
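i holds 50 random draws from a Normal distribution with a mean of 20 and a standard deviation of 2. The values will differ each time the code is run.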
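Because rnorm() draws new values on every run, the output above cannot be reproduced exactly. If reproducible draws are wanted, set.seed() can be called first. This is a sketch; the seed 42 is arbitrary and draw1 and draw2 are illustrative names.

```r
set.seed(42)   # arbitrary seed, chosen only for illustration
draw1 <- rnorm(50, mean = 20, sd = 2)

set.seed(42)   # resetting the same seed reproduces the same draws
draw2 <- rnorm(50, mean = 20, sd = 2)

identical(draw1, draw2)
mean(draw1)    # close to 20, but not exactly 20, because it is a sample
sd(draw1)      # close to 2 for the same reason
```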
- Create a vector containing 25 samples from a Uniform distribution with a lower bound of -100 and an upper bound of -50.
j<- runif(25, -100, -50) #I don't need to add min= and max=, since runif() matches unnamed arguments by position (n, min, max).
j
[1] -51.63333 -77.93536 -74.24919 -93.21453 -83.95558 -63.15832 -99.01899
[8] -54.87189 -70.16456 -82.75437 -98.61315 -85.43581 -71.12319 -68.97504
[15] -78.94778 -54.39112 -93.73673 -81.95808 -97.21357 -80.92272 -60.74707
[22] -63.26358 -72.90650 -90.70691 -50.66368
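To confirm that the positional and named forms of runif() are equivalent, both can be run from the same seed. This is only a sketch; the seed value and the names positional and named are my own.

```r
set.seed(1)    # arbitrary seed so both calls start from the same state
positional <- runif(25, -100, -50)

set.seed(1)
named <- runif(n = 25, min = -100, max = -50)

identical(positional, named)  # same draws either way
range(named)                  # smallest and largest value drawn
```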
Drennan Ch.1
KRV <- data.frame(Area=c(12.8, 11.5, 14, 1.3, 10.3, 9.8, 2.3, 15.3, 11.2, 3.4, 12.8, 13.9, 9, 10.6, 9.9, 13.4, 8.7, 3.8, 11.7, 1.7, 12.3, 11, 2.9, 10.7, 7.4, 8.2, 2, 2.2, 4.5))
This is the data from the Drennan chapter. It must be run in the console before it can be used to create a histogram. I also loaded the data set called Scrapers.
- Recreate the Kiskiminetas River Valley histogram in the chapter. Note the number of bins and how the bin members are decided. Is the cutoff at the bottom or top of the range? How can you adjust this?
KRV$Area #Subset only the area portion of the data frame in order to have a numeric value for the histogram
[1] 12.8 11.5 14.0 1.3 10.3 9.8 2.3 15.3 11.2 3.4 12.8 13.9 9.0 10.6
[15] 9.9 13.4 8.7 3.8 11.7 1.7 12.3 11.0 2.9 10.7 7.4 8.2 2.0 2.2
[29] 4.5
KRVhist<- hist(KRV$Area, main= "Areas of 29 Sites in the Kiskiminetas River Valley", xlab= "Area", breaks = c(1:16)) #Create a histogram. Making it an object and then running the object gives a breakdown summary of the histogram

KRVhist
$breaks
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
$counts
[1] 3 3 2 1 0 0 1 3 2 4 3 3 3 0 1
$density
[1] 0.10344828 0.10344828 0.06896552 0.03448276 0.00000000 0.00000000
[7] 0.03448276 0.10344828 0.06896552 0.13793103 0.10344828 0.10344828
[13] 0.10344828 0.00000000 0.03448276
$mids
[1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5 11.5 12.5 13.5 14.5
[15] 15.5
$xname
[1] "KRV$Area"
$equidist
[1] TRUE
attr(,"class")
[1] "histogram"
This one has 15 bins and 16 breaks. I had to set 16 breaks for the histogram to have the same cuts as the one in the book. By default (right=TRUE), each bin is a right-closed interval, so a value that falls exactly on a break, such as 9.0, is counted in the bin below it, (8, 9], and not in the next bin. This is why the histogram is not exactly like the one in the book. By changing the right argument, I can move the boundary values into the bin above instead.
KRV1 <-hist(KRV$Area, main= "Areas of Sites Right=True", xlab= "Area",breaks = c(1:16), right=TRUE)

KRV2 <-hist(KRV$Area, main= "Areas of Sites Right =False", xlab= "Area", breaks=c(1:16), right=FALSE)

The right=FALSE argument makes the cells left-closed intervals, so a value that falls exactly on a break is counted in the bin that starts at that break. Now, the histogram looks exactly like the one in the book.
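The effect of the right argument can be seen without drawing anything by asking hist() for the counts only (plot = FALSE). This is a sketch using the same Area values as above; area, closed.right, and closed.left are names chosen for illustration.

```r
area <- c(12.8, 11.5, 14, 1.3, 10.3, 9.8, 2.3, 15.3, 11.2, 3.4, 12.8, 13.9,
          9, 10.6, 9.9, 13.4, 8.7, 3.8, 11.7, 1.7, 12.3, 11, 2.9, 10.7, 7.4,
          8.2, 2, 2.2, 4.5)
breaks <- 1:16

# plot = FALSE returns the histogram object without opening a plot
closed.right <- hist(area, breaks = breaks, right = TRUE,  plot = FALSE)
closed.left  <- hist(area, breaks = breaks, right = FALSE, plot = FALSE)

length(breaks) - 1   # number of bins is always one less than the number of breaks

# Rows differ only in bins that touch a value sitting exactly on a break
rbind(right.TRUE = closed.right$counts, right.FALSE = closed.left$counts)
```

Only the values 2, 9, 11, and 14 sit exactly on a break, so those are the observations that move between bins.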
- Make two histograms of the scraper length data with different bin sizes. Do you notice anything different in the data distribution when you change the number of breaks?
Scrapers$Length #Subset only the length data to have numeric values and plot them on a histogram
[1] 25.8 6.3 44.6 21.3 25.7 20.6 22.2 10.5 18.9 25.9 23.8 22.0 10.6 33.2
[15] 16.8 21.8 48.3 15.8 39.4 43.5 39.8 16.3 40.5 91.7 21.7 17.9 29.3 39.1
[29] 42.5 49.6 13.7 19.1 40.6 49.1 41.7 15.2 21.2 30.2 40.0 20.2 31.9 42.3
[43] 47.2 50.5 10.6 23.1 44.1 45.8
Scrapershist <- hist(Scrapers$Length, main= "Scraper Lengths from Pine Ridge Cave and Willow Flats Site",xlab = "Scraper Length") #Create histogram. You can also run the object to find the breakdown of the histogram (i.e., the breaks for the bins)

Scrapershist5 <- hist(Scrapers$Length, main= "Scraper Lengths 5 breaks",xlab = "Scraper Length", breaks= 5)

Scrapershist2 <- hist(Scrapers$Length, main= "Scraper Lengths 2 breaks",xlab = "Scraper Length", breaks= 2)

Scrapershist20 <- hist(Scrapers$Length,main= "Scraper Lengths 20 breaks",xlab = "Scraper Length", breaks= 20)

As the number of breaks changes, the shape of the distribution becomes more or less obvious. For example, by default R creates a histogram with 10 breaks, and the data show a clear outlier. With breaks = 5, the distribution looks more uniform, but the outlier becomes less prominent as the gap between it and the rest of the data closes. With breaks = 2, the outlier is lost completely and the distribution becomes hard to read. As Drennan mentions, the ideal number of breaks is usually between 20 and 30. I created a histogram with 20 breaks, and it becomes easy to see the multiple peaks and the clear outlier.
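One caveat worth checking: when breaks is a single number, hist() treats it as a suggestion and picks "pretty" break points, so the number of bins actually drawn may differ from the number requested. The sketch below counts the bins for each request without plotting. The vector scraper.length is copied from the Scrapers$Length output above, and the names scraper.length, n, and h are my own.

```r
scraper.length <- c(25.8, 6.3, 44.6, 21.3, 25.7, 20.6, 22.2, 10.5, 18.9, 25.9,
                    23.8, 22.0, 10.6, 33.2, 16.8, 21.8, 48.3, 15.8, 39.4, 43.5,
                    39.8, 16.3, 40.5, 91.7, 21.7, 17.9, 29.3, 39.1, 42.5, 49.6,
                    13.7, 19.1, 40.6, 49.1, 41.7, 15.2, 21.2, 30.2, 40.0, 20.2,
                    31.9, 42.3, 47.2, 50.5, 10.6, 23.1, 44.1, 45.8)

# Compare the requested number of breaks with the bins R actually uses
for (n in c(2, 5, 20)) {
  h <- hist(scraper.length, breaks = n, plot = FALSE)
  cat("requested", n, "-> bins drawn:", length(h$counts), "\n")
}
```

To force exact bin edges, a vector of break points can be passed instead, as was done with breaks = c(1:16) for the KRV histogram.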
