Hello Penguins!

Note: The title above comes from the YAML header. The header is optional, but it sets the title of the document you are working on.

Data

For this analysis, we will be using the “penguins” dataset from the palmerpenguins package.

Note: The block below is called a “code chunk” and is identified by {r}. A hash sign (#) starts a comment, which lets you put text within the code that will not execute.
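As a small illustration of comments, here is a minimal sketch; the variable name `total` is made up for this example and is not part of the analysis.

```r
# This whole line is a comment and is not executed.
total <- 1 + 2  # a comment can also follow code on the same line
total
```

Only the assignment and the final line are run; everything after a # on a line is ignored by R.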

Code
#| label: load-packages
#| include: false

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Code
library(palmerpenguins)

Note: In the next example, using “echo: false” means that the code will not be shown in the output.

The code for this is:

{r}
#| label: load-packages
#| echo: false

library(ggplot2)

In the following example, echo is set to true, which means the code will be shown:

Code
ggplot(mpg, aes(x = hwy, y = cty, color = cyl)) +
  geom_point(alpha = 0.5, size = 2) +
  scale_color_viridis_c() +
  theme_minimal()

In the chunk below, we create a vector of numbers and calculate its mean.

Code
c1_1 <- c(8.3, 8.3, 8.2, 8.1, 8.2, 8.2, 8.2, 8.1, 7.8, 7.9, 7.8, 7.8)

c1_1
 [1] 8.3 8.3 8.2 8.1 8.2 8.2 8.2 8.1 7.8 7.9 7.8 7.8
Code
mean(c1_1)
[1] 8.075
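
As a quick check on the result above, the mean can also be computed by hand as the sum of the values divided by how many there are. This is only a sketch; the name `manual_mean` is introduced here for illustration.

```r
c1_1 <- c(8.3, 8.3, 8.2, 8.1, 8.2, 8.2, 8.2, 8.1, 7.8, 7.9, 7.8, 7.8)

# Sum of the values divided by the number of values
manual_mean <- sum(c1_1) / length(c1_1)

# Compare with the built-in mean(), allowing for floating-point rounding
all.equal(manual_mean, mean(c1_1))
```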

There are several other options available for cell output, which include:

- warning: controls whether warnings (which often occur when loading packages) are shown in the output

- include: when set to false, prevents both the code and its results from being included in the output

- error: when set to true, shows errors in the output instead of letting them halt rendering of the document
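
To make the warning option concrete, here is a minimal sketch of a chunk body. The label `options-demo` and the variable `x` are hypothetical, chosen only for this example. When run as plain R the #| lines are ordinary comments; inside a Quarto chunk they act as options.

```r
#| label: options-demo
#| warning: false

# as.numeric() returns NA for text it cannot convert and raises a warning.
# With `warning: false`, that warning is left out of the rendered output.
x <- as.numeric(c("1.5", "abc"))
is.na(x)
```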

You can also use “code folding”, which lets readers view the code at their discretion.

This is represented by the following:

title: "Quarto Computations"
format:
  html:
    code-fold: true

There is also an option called “code-link”, which is supposed to turn function names in the code into hyperlinks to their online documentation; however, I was unable to get it working.
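
One possible cause worth ruling out (an assumption, not something confirmed in these notes) is a missing helper package: as far as I know, Quarto's code linking relies on the downlit package, and xml2 is listed here as a likely companion requirement. A minimal sketch to check whether they are installed:

```r
# Packages assumed to be needed for code-link; adjust if the Quarto
# documentation for your version lists different ones.
needed <- c("downlit", "xml2")

# TRUE for each package that can be loaded, FALSE otherwise
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
installed
```

Any FALSE entry can be fixed with install.packages() before rendering again.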

Now we will change the way the scatterplot above looks by using fig-width and fig-height, as seen in the code.

#| label: fig-scatterplot
#| fig-cap: "City and highway mileage for 38 popular models of cars."
#| fig-alt: "Scatterplot of city vs. highway mileage for cars, where points are colored by the number of cylinders. The plot displays a positive, linear, and strong relationship between city and highway mileage, and mileage increases as the number of cylinders decreases."
#| fig-width: 6
#| fig-height: 3.5

Code
ggplot(mpg, aes(x = hwy, y = cty, color = cyl)) +
  geom_point(alpha = 0.5, size = 2) +
  scale_color_viridis_c() +
  theme_minimal()

ggplot(mpg, aes(x = hwy, y = cty, color = displ)) +
  geom_point(alpha = 0.5, size = 2) +
  scale_color_viridis_c(option = "E") +
  theme_minimal()
(a) Color by number of cylinders
(b) Color by engine displacement, in liters
Figure 1: City and highway mileage for 38 popular models of cars.

Week 3 Data - 16 October 2024:

Code
library(tidyverse)

Today we are working with the diamonds dataset.

Code
view(diamonds) # opens the diamonds dataset as a table in the data viewer
# str() shows the structure of the dataset, meaning the type of each column:
# for example numeric, character, logical (TRUE or FALSE), or factor (categorical).
str(diamonds)
tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
 $ carat  : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
 $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
 $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
 $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
 $ depth  : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
 $ table  : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
 $ price  : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
 $ x      : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
 $ y      : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
 $ z      : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
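
If only the column types are of interest, a more compact view than str() is possible. This is a sketch; the name `col_types` is introduced here for illustration.

```r
library(ggplot2)  # provides the diamonds dataset

# First class of each column, returned as a named character vector
col_types <- sapply(diamonds, function(col) class(col)[1])
col_types
```

The ordered factors (cut, color, clarity) show up as "ordered", matching the Ord.factor entries in the str() output.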
Code
# For fun, let's see if we can view the structure of the penguins dataset from palmerpenguins.
library(palmerpenguins)
view(penguins)
ggplot(penguins, 
       aes(x = flipper_length_mm, y = bill_length_mm)) +
  geom_point(aes(color = species, shape = species)) +
  scale_color_manual(values = c("darkorange","purple","cyan4")) +
  labs(
    title = "Flipper and bill length",
    subtitle = "Dimensions for penguins at Palmer Station LTER",
    x = "Flipper length (mm)", y = "Bill length (mm)",
    color = "Penguin species", shape = "Penguin species"
  ) +
  theme_minimal()
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
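
The warning reports that 2 rows were dropped. A minimal sketch to find them, by checking which rows have a missing value in either plotted column (the name `incomplete` is made up for this example):

```r
library(palmerpenguins)

# TRUE where either plotted measurement is missing
incomplete <- is.na(penguins$flipper_length_mm) | is.na(penguins$bill_length_mm)

sum(incomplete)    # number of rows ggplot2 cannot draw
which(incomplete)  # their row numbers
```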

Code
str(penguins) # After plotting the data, we view its structure with str().
tibble [344 × 8] (S3: tbl_df/tbl/data.frame)
 $ species          : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ island           : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ bill_length_mm   : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
 $ bill_depth_mm    : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
 $ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
 $ body_mass_g      : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
 $ sex              : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
 $ year             : int [1:344] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...

AND IT WORKED!!!

Code
data("penguins")
penguins %>%
  select(1:5)
# A tibble: 344 × 5
   species island    bill_length_mm bill_depth_mm flipper_length_mm
   <fct>   <fct>              <dbl>         <dbl>             <int>
 1 Adelie  Torgersen           39.1          18.7               181
 2 Adelie  Torgersen           39.5          17.4               186
 3 Adelie  Torgersen           40.3          18                 195
 4 Adelie  Torgersen           NA            NA                  NA
 5 Adelie  Torgersen           36.7          19.3               193
 6 Adelie  Torgersen           39.3          20.6               190
 7 Adelie  Torgersen           38.9          17.8               181
 8 Adelie  Torgersen           39.2          19.6               195
 9 Adelie  Torgersen           34.1          18.1               193
10 Adelie  Torgersen           42            20.2               190
# ℹ 334 more rows
Code
penguins %>%
  group_by(species) %>% 
  ggplot(aes(x=bill_length_mm, color=species, fill=species))+
  geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_bin()`).
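
The `stat_bin()` message asks for an explicit bin width. Here is a minimal sketch, assuming a bin width of 1 mm, which is an illustrative choice rather than a value from the original analysis. Setting `na.rm = TRUE` drops the two incomplete rows without a warning.

```r
library(ggplot2)
library(palmerpenguins)

# Histogram of bill length with an explicit bin width (1 mm is an assumption)
p <- ggplot(penguins, aes(x = bill_length_mm, color = species, fill = species)) +
  geom_histogram(binwidth = 1, na.rm = TRUE, alpha = 0.5)
p
```

Note that ggplot2 does its own grouping through the color and fill aesthetics, so a `group_by()` step before the plot is not needed.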

Code
data("penguins")
penguins %>%
  group_by(species) %>%
  ggplot(aes(x=species, 
             y=bill_length_mm, 
             color=species, 
             fill=species))+
  geom_boxplot(alpha=0.5)+
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=12))
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_boxplot()`).

Let’s look at species:

Code
penguins %>% 
  ggplot(aes(x=species,
             color=species, 
             fill=species))+
  geom_bar(alpha=0.5)+
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=12))
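
The bar heights can also be read off as numbers. A minimal sketch using dplyr's `count()`; the name `species_counts` is introduced here for illustration.

```r
library(dplyr)
library(palmerpenguins)

# One row per species, with the number of observations in column n
species_counts <- penguins %>%
  count(species)
species_counts
```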

And now observations per year.

Code
penguins %>% 
  ggplot(aes(x=year,
             color=species, 
             fill=species))+
  geom_bar()+
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=12))

Per Island:

Code
penguins %>% 
  ggplot(aes(x=island,
             color=species, 
             fill=species))+
  geom_bar()+
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=12))

Correlations:

Code
penguins %>%
  ggplot(aes(x=bill_length_mm, 
             y = bill_depth_mm))+
  geom_point()+
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=12))
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
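
The scatterplot shows the relationship visually; it can also be summarized with a correlation coefficient. This is a sketch, with `r_all` as a made-up name. The `use = "complete.obs"` argument skips the rows with missing values that triggered the warning above.

```r
library(palmerpenguins)

# Pearson correlation between bill length and bill depth, all species pooled
r_all <- cor(penguins$bill_length_mm, penguins$bill_depth_mm,
             use = "complete.obs")
r_all
```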

Correlations per species:

Code
penguins %>% 
  ggplot(aes(x=bill_length_mm, 
             y = bill_depth_mm,
             color=species, 
             fill=species))+
  geom_point()+
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=12))
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
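
To put numbers on the per-species pattern, the correlation can be computed within each species. A minimal sketch; the name `r_by_species` is introduced here for illustration.

```r
library(dplyr)
library(palmerpenguins)

# Pearson correlation of bill length and bill depth, one value per species
r_by_species <- penguins %>%
  group_by(species) %>%
  summarise(r = cor(bill_length_mm, bill_depth_mm, use = "complete.obs"))
r_by_species
```

Unlike in the plotting code, `group_by()` matters here, because `summarise()` computes one result per group.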

Body mass per sex:

Code
penguins %>% 
  na.omit() %>% 
  ggplot(aes(x=sex, 
             y = body_mass_g,
             color=species, 
             fill=species))+
  geom_boxplot(alpha=0.7)+
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=12))

Body mass per species, split by sex:

Code
penguins %>% 
  na.omit() %>% 
  ggplot(aes(x=species, 
             y = body_mass_g,
             color=sex, 
             fill=sex))+
  geom_boxplot(alpha=0.7)+
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=12))

Check distributions:

Code
penguins %>% 
  na.omit() %>% 
  pivot_longer(bill_length_mm:body_mass_g, names_to = "trait") %>% 
  ggplot(aes(x=value,
         group=species,
         fill=species,
         color=species))+
  geom_density(alpha=0.7)+
  facet_grid(~trait, scales = "free_x" )+
  # theme_minimal() goes first; placed last, it would reset the text sizes set in theme()
  theme_minimal()+
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=12))
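
The `pivot_longer()` step is what makes the faceting possible: it stacks the four measurement columns into one `value` column, with the original column name stored in `trait`. A minimal sketch of just that step, using `long` as a made-up name for the result:

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

# Reshape from one column per measurement to one row per measurement
long <- penguins %>%
  na.omit() %>%
  pivot_longer(bill_length_mm:body_mass_g, names_to = "trait")

head(long)
unique(long$trait)  # the four measurement names
```

Each complete penguin row becomes four rows in `long`, one per trait.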