If you are reading this, then you successfully downloaded, saved, and opened the RMarkdown file! Nice Job!
Please complete the following exercises by typing your code under each question. When finished, click Knit and include your name in the filename. Upload the file you saved back to Canvas as your submission.
As you write your code in the code chunks below, you can click run or use ctrl + enter to run/test your code as you go. When you are finished, click Knit to create the final document for submission. If you are getting errors that you cannot figure out, ask for help!
We are going to use the runif() function to complete some tasks. Using the code chunk below, open the help documentation for the function named runif (hint: use the ? symbol). Underneath the code chunk, explain what the function does.
?runif
#The runif() function generates random deviates of the uniform distribution on the interval from min to max.
Draw 20 random numbers (n = 20) from the Uniform distribution with the arguments min = 0 and max = 10.
runif(n = 20, min = 0, max = 10)
## [1] 6.2065641 8.1450313 0.0313013 6.2715751 1.1353400 4.4485919 5.5859726
## [8] 8.7550860 8.0529153 6.3076408 6.1655133 4.6373875 0.8571545 2.1636679
## [15] 7.6820102 3.4722197 5.5509027 7.4466877 2.4857735 0.3272843
Repeat question 2, except this time store your 20 numbers in a variable called: numbers.
runif(n = 20, min = 0, max = 10)
## [1] 3.6192673 1.1675209 4.9118029 8.9858831 3.9628462 0.4439872 5.0197602
## [8] 1.9010373 1.5679335 3.8267468 8.5342930 3.0782721 0.7986062 2.4136563
## [15] 5.2585676 1.9310907 0.0101714 9.7165511 2.3349594 9.3111983
numbers <- runif(n = 20, min = 0, max = 10)
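Note that the 20 values printed above are not the values stored in numbers: every call to runif() produces a fresh random draw. A minimal sketch of how a draw could be made repeatable, assuming a seed is set with set.seed() first. The seed value 123 and the names first_draw and second_draw are arbitrary choices for illustration, not part of the assignment:

```r
# Fix the random number generator's starting point (123 is an arbitrary choice)
set.seed(123)
first_draw <- runif(n = 20, min = 0, max = 10)

# Resetting the same seed reproduces exactly the same 20 numbers
set.seed(123)
second_draw <- runif(n = 20, min = 0, max = 10)

identical(first_draw, second_draw)  # TRUE
```

Without the set.seed() calls, the two draws would almost certainly differ.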
Use what you know or use Google to complete the following on your numbers variable:
- Find the mean of the 20 numbers, find the median of the 20 numbers
- Find the minimum value in your set, and the maximum value in your set.
mean(numbers)
## [1] 4.657741
median(numbers)
## [1] 3.89651
?min
min(numbers)
## [1] 0.0106498
max(numbers)
## [1] 9.443054
Try to use the function called summary() on your numbers variable. What does it do?
?summary
summary(numbers)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## 0.01065 2.74615 3.89651 4.65774 7.56674 9.44305
#The summary() function gives a six-number summary of the values stored in numbers (the output of my runif() call): the minimum, 1st quartile, median, mean, 3rd quartile, and maximum.
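As a rough check of what summary() reports for a numeric vector, the same six values can be computed separately with quantile() and mean(). This is only a sketch: the vector x and the seed value 1 are assumptions made for illustration and are not the numbers variable from this assignment.

```r
# Example vector (seed 1 is arbitrary; any numeric vector works)
set.seed(1)
x <- runif(n = 20, min = 0, max = 10)

# quantile() returns the 0%, 25%, 50%, 75%, and 100% points by default
quantile(x)
mean(x)

# summary() reports the same values in a single call
summary(x)
```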
Run the following code: numbers < 10. What did this do? What is the result telling you?
numbers < 10
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE TRUE TRUE
#The code "numbers < 10" compares every value stored in numbers (the values I drew with runif() earlier) against 10 and returns TRUE or FALSE for each one. The result shows that all 20 values are less than 10, which makes sense given that the maximum was set to 10.
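The element-by-element comparison described here can be seen on a small vector. The values in x below are made up for illustration. all() and any() are base R functions that collapse the logical vector into a single TRUE or FALSE:

```r
x <- c(2.5, 7.1, 9.9)   # made-up values for illustration

x < 10        # compares each element: TRUE TRUE TRUE
all(x < 10)   # TRUE only if every comparison is TRUE
any(x < 5)    # TRUE if at least one comparison is TRUE
```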
Instead, try running the following: numbers < 5. What did this do? What is the result telling you?
numbers < 5
## [1] TRUE FALSE FALSE TRUE TRUE FALSE TRUE FALSE FALSE FALSE TRUE FALSE
## [13] FALSE TRUE TRUE TRUE TRUE TRUE FALSE TRUE
#As before, the code "numbers < 5" compares every value stored in numbers against 5 and returns TRUE or FALSE for each one. The result shows that 11 of the 20 values (roughly half) are less than 5, each marked TRUE. This makes sense: the values were drawn uniformly between 0 and 10, so about half of them should fall below 5.
The data type “logical” is equivalent to the binary 0 (FALSE) and 1 (TRUE). Try to take the sum() of your result from step 7, what is the result? What does it mean?
seven <- numbers < 5
sum(seven)
## [1] 11
#The result is 11. Since FALSE counts as 0 and TRUE counts as 1, sum(seven) counts the TRUE values: 11 of my numbers are less than 5. Since there are 20 values in total, the remaining 9 are FALSE.
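The counting logic can be illustrated on a small vector. The values in x and the name below_five are made up for this sketch. Because TRUE is treated as 1 and FALSE as 0, sum() counts the TRUE values and mean() gives their proportion:

```r
x <- c(1.2, 6.8, 4.9, 5.0, 0.3, 9.7)  # made-up values for illustration
below_five <- x < 5

sum(below_five)    # number of TRUE values: 3
sum(!below_five)   # number of FALSE values: 3
mean(below_five)   # proportion of TRUE values: 0.5
```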
Install the package named palmerpenguins by using the install.packages() function. Alternatively, use Tools -> Install Packages. Then, load the package by running the following code: library(palmerpenguins). Open the help documentation for the dataset called penguins by using the ? symbol. Run the summary function on the dataset called penguins.
# install.packages("palmerpenguins")
library(palmerpenguins)
?penguins
summary(penguins)
##       species          island    bill_length_mm  bill_depth_mm
##  Adelie   :152   Biscoe   :168   Min.   :32.10   Min.   :13.10
##  Chinstrap: 68   Dream    :124   1st Qu.:39.23   1st Qu.:15.60
##  Gentoo   :124   Torgersen: 52   Median :44.45   Median :17.30
##                                  Mean   :43.92   Mean   :17.15
##                                  3rd Qu.:48.50   3rd Qu.:18.70
##                                  Max.   :59.60   Max.   :21.50
##                                  NA's   :2       NA's   :2
##  flipper_length_mm  body_mass_g       sex           year
##  Min.   :172.0     Min.   :2700   female:165   Min.   :2007
##  1st Qu.:190.0     1st Qu.:3550   male  :168   1st Qu.:2007
##  Median :197.0     Median :4050   NA's  : 11   Median :2008
##  Mean   :200.9     Mean   :4202                Mean   :2008
##  3rd Qu.:213.0     3rd Qu.:4750                3rd Qu.:2009
##  Max.   :231.0     Max.   :6300                Max.   :2009
##  NA's   :2         NA's   :2
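The NA's rows in this summary count missing measurements. summary() sets them aside automatically, but functions such as mean() return NA unless told to drop them. A minimal sketch, using a made-up vector named mass rather than the real penguins data:

```r
mass <- c(3750, 3800, NA, 3450)  # made-up values; NA marks a missing measurement

mean(mass)                # NA, because one value is missing
mean(mass, na.rm = TRUE)  # drops the NA before averaging
sum(is.na(mass))          # counts the missing values: 1
```

The same na.rm = TRUE argument is available in median(), min(), and max().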
Install the package named tidyverse. This package will be useful throughout the course for visualizations and data manipulation. Remove the “comments” (# symbols) from the following code (check the “Code” menu, next to File and Edit; highlight your code and then click comment/uncomment lines) to get a sneak preview of some of what we will learn this semester. Once you see the graph, answer the following question: Are we able to differentiate penguin species by their body mass and flipper length?
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
ggplot(data = penguins,
       aes(x = flipper_length_mm,
           y = body_mass_g)) +
  geom_point(aes(color = island,
                 shape = species),
             size = 3,
             alpha = 0.8) +
  scale_color_manual(values = c("darkorange","purple","cyan4")) +
  labs(title = "Penguin size, Palmer Station LTER",
       subtitle = "Flipper length and body mass for each island",
       x = "Flipper length (mm)",
       y = "Body mass (g)",
       color = "Penguin island",
       shape = "Penguin species") +
  theme_minimal()
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
# Based on the plot above, it appears that we can distinguish Gentoo penguins from the other species, as they tend to have longer flippers and a higher body mass. However, Adelie and Chinstrap penguins overlap considerably in both body mass and flipper length, which makes it difficult to tell these two species apart. The two variables included are not enough to differentiate them, indicating that another variable is needed.