Hello Penguins!

Note: The title above comes from the YAML header. The header is optional, but it sets the title of the document you are working on.

Data

For this analysis, we will be using the “penguins” dataset from the palmerpenguins package.

Note: the block below is called a “code chunk” and is identified by {r}. Lines starting with a hash (#) are comments: text within the code that will not execute. Lines starting with #| are read by Quarto as chunk options.

Code

#| label: load-packages
#| include: false

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Code

library(palmerpenguins)

Note: In the next example, using “echo: false” means that the code will not be shown in the rendered output, although it still runs.

The code for this is:

{r}
#| label: load-packages
#| echo: false

library(ggplot2)

In the following example, echo is set to true, which means the code will be shown:

Code

ggplot(mpg, aes(x = hwy, y = cty, color = cyl)) +
  geom_point(alpha = 0.5, size = 2) +
  scale_color_viridis_c() +
  theme_minimal()

In the chunk below, we create a numeric vector and calculate its mean.

Code

c1_1 <- c(8.3, 8.3, 8.2, 8.1, 8.2, 8.2, 8.2, 8.1, 7.8, 7.9, 7.8, 7.8)

c1_1

 [1] 8.3 8.3 8.2 8.1 8.2 8.2 8.2 8.1 7.8 7.9 7.8 7.8

Code

mean(c1_1)

[1] 8.075
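To see what mean() is doing, here is a minimal sketch that computes the same value by hand as the sum divided by the number of values. The name manual_mean is only for illustration.

```r
# Same values as the c1_1 vector in the notes.
c1_1 <- c(8.3, 8.3, 8.2, 8.1, 8.2, 8.2, 8.2, 8.1, 7.8, 7.9, 7.8, 7.8)

# The mean is the sum of the values divided by how many values there are.
manual_mean <- sum(c1_1) / length(c1_1)

# all.equal() compares numbers with a small tolerance, which is safer
# than == for decimal values.
all.equal(manual_mean, mean(c1_1))
```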

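There are several other options available for cell output, which include:

-warning: controls whether warnings are shown in the output (set it to false to hide them; package loading messages are handled by the separate message option)

-include: when set to false, prevents any output (code or results) from being included in the rendered document

-error: when set to true, includes errors in the output, so an error in code execution does not halt rendering of the document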
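As a rough sketch of how one of these options is used, the chunk body below sets warning to false. The label warning-demo and the vector x are made up for illustration. The code deliberately triggers a coercion warning, which should then be left out of the rendered page.

```r
#| label: warning-demo
#| warning: false

# "three" cannot be converted to a number, so R returns NA for it
# and raises a coercion warning.
x <- as.numeric(c("1", "2", "three"))
x
```

When this runs as plain R (outside Quarto), the #| lines are ordinary comments, so the warning still prints in the console.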

You can also use “code folding”, which collapses the code in the rendered page so that users can expand and view it at their discretion.

This is set in the YAML header as follows:

---
title: "Quarto Computations"
format:
  html:
    code-fold: true
---

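There is also an option called “code-link”, which is supposed to turn the functions in the code into hyperlinks to their online documentation; however, I was unable to get it working. One possible cause: code linking relies on the knitr engine and the downlit package, so it may fail if downlit is not installed.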

Now we will change the size of the scatterplot above by using fig-width and fig-height, as seen in the chunk options below.

#| label: fig-scatterplot
#| fig-cap: "City and highway mileage for 38 popular models of cars."
#| fig-alt: "Scatterplot of city vs. highway mileage for cars, where points are colored by the number of cylinders. The plot displays a positive, linear, and strong relationship between city and highway mileage, and mileage increases as the number of cylinders decreases."
#| fig-width: 6
#| fig-height: 3.5

Code

ggplot(mpg, aes(x = hwy, y = cty, color = cyl)) +
  geom_point(alpha = 0.5, size = 2) +
  scale_color_viridis_c() +
  theme_minimal()

ggplot(mpg, aes(x = hwy, y = cty, color = displ)) +
  geom_point(alpha = 0.5, size = 2) +
  scale_color_viridis_c(option = "E") +
  theme_minimal()

Week 3 Data - 16 October 2024:

Code

library(tidyverse)

Today we are working with the diamonds dataset.

Code

view(diamonds) #this opens the diamonds dataset as a table in the viewer
#str() shows the structure of the dataset: its size and the type of each column,
#for example numeric, integer, character, logical (TRUE or FALSE), or factor (categorical).
str(diamonds)

tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
 $ carat  : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
 $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
 $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
 $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
 $ depth  : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
 $ table  : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
 $ price  : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
 $ x      : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
 $ y      : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
 $ z      : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
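To make the type abbreviations in the str() output easier to read, here is a small sketch with a made-up data frame (not the diamonds data) that has one column of each common type. The names toy and col_types are only for illustration.

```r
# A small made-up data frame with one column per common type.
toy <- data.frame(
  weight  = c(0.23, 0.31, 0.24),        # numeric (shown as num)
  price   = c(326L, 335L, 336L),        # integer (shown as int)
  label   = c("a", "b", "c"),           # character (shown as chr)
  checked = c(TRUE, FALSE, TRUE),       # logical (shown as logi)
  cut     = factor(c("Good", "Ideal", "Fair"),
                   levels = c("Fair", "Good", "Ideal"),
                   ordered = TRUE),     # ordered factor (shown as Ord.factor)
  stringsAsFactors = FALSE
)

# str() prints one line per column with its type and first values.
str(toy)

# class() reports the type of a column; sapply() applies it to every column.
col_types <- sapply(toy, function(col) class(col)[1])
col_types
```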

Code

#For fun, let's see if we can do the same with the penguins dataset from palmerpenguins: view it, plot it, and then check its structure.
library(palmerpenguins)
view(penguins)
ggplot(penguins, 
       aes(x = flipper_length_mm, y = bill_length_mm)) +
  geom_point(aes(color = species, shape = species)) +
  scale_color_manual(values = c("darkorange","purple","cyan4")) +
  labs(
    title = "Flipper and bill length",
    subtitle = "Dimensions for penguins at Palmer Station LTER",
    x = "Flipper length (mm)", y = "Bill length (mm)",
    color = "Penguin species", shape = "Penguin species"
  ) +
  theme_minimal()

Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).

Code

str(penguins) #After plotting the data, we check its structure the same way we did for diamonds.

tibble [344 × 8] (S3: tbl_df/tbl/data.frame)
 $ species          : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ island           : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ bill_length_mm   : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
 $ bill_depth_mm    : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
 $ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
 $ body_mass_g      : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
 $ sex              : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
 $ year             : int [1:344] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...

AND IT WORKED!!!

Code

data("penguins")
penguins %>%
  select(1:5)

# A tibble: 344 × 5
   species island    bill_length_mm bill_depth_mm flipper_length_mm
   <fct>   <fct>              <dbl>         <dbl>             <int>
 1 Adelie  Torgersen           39.1          18.7               181
 2 Adelie  Torgersen           39.5          17.4               186
 3 Adelie  Torgersen           40.3          18                 195
 4 Adelie  Torgersen           NA            NA                  NA
 5 Adelie  Torgersen           36.7          19.3               193
 6 Adelie  Torgersen           39.3          20.6               190
 7 Adelie  Torgersen           38.9          17.8               181
 8 Adelie  Torgersen           39.2          19.6               195
 9 Adelie  Torgersen           34.1          18.1               193
10 Adelie  Torgersen           42            20.2               190
# ℹ 334 more rows
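Selecting by position works, but it depends on the column order. As a sketch of an alternative, the same five columns can be selected by name with a range. The names by_name and by_position are only for illustration.

```r
library(dplyr)
library(palmerpenguins)

# Select the same five columns by name instead of by position.
by_name <- penguins %>%
  select(species:flipper_length_mm)

# The positional version from the notes, kept for comparison.
by_position <- penguins %>%
  select(1:5)

# Both approaches should give the same column names.
identical(names(by_name), names(by_position))
```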

Code

penguins %>%
  ggplot(aes(x=bill_length_mm, color=species, fill=species))+
  geom_histogram()

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_bin()`).
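The message above suggests choosing a bin width ourselves. Here is one possible version; the value binwidth = 1 (1 mm per bar) is just an illustrative choice, and the names bills and p are made up for this sketch. Rows with a missing bill length are dropped first so that no warning is raised.

```r
library(ggplot2)
library(palmerpenguins)

# Keep only the rows that have a bill length.
bills <- subset(penguins, !is.na(bill_length_mm))

# binwidth = 1 means each bar covers 1 mm of bill length.
# position = "identity" overlays the species instead of stacking them,
# and alpha makes the overlapping bars see-through.
p <- ggplot(bills, aes(x = bill_length_mm, color = species, fill = species)) +
  geom_histogram(binwidth = 1, alpha = 0.5, position = "identity")

p
```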

Code

data("penguins")
penguins %>%
  ggplot(aes(x=species, 
             y=bill_length_mm, 
             color=species, 
             fill=species))+
  geom_boxplot(alpha=0.5)+
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=12))

Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_boxplot()`).

Let’s look at species:

Code

penguins %>% 
  ggplot(aes(x=species,
             color=species, 
             fill=species))+
  geom_bar(alpha=0.5)+
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=12))
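The bar chart shows the number of observations per species. To get the same counts as a table, one option is count() from dplyr. The name species_counts is only for illustration.

```r
library(dplyr)
library(palmerpenguins)

# count() returns one row per species, with the number of rows in column n.
species_counts <- penguins %>%
  count(species)

species_counts
```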

And now observations per year.

Code

penguins %>% 
  ggplot(aes(x=year,
             color=species, 
             fill=species))+
  geom_bar()+
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=12))

Per Island:

Code

penguins %>% 
  ggplot(aes(x=island,
             color=species, 
             fill=species))+
  geom_bar()+
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=12))

Correlations:

Code

penguins %>%
  ggplot(aes(x=bill_length_mm, 
             y = bill_depth_mm))+
  geom_point()+
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=12))

Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
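The scatterplot shows the relationship visually. To put a number on it, here is a sketch that calculates the Pearson correlation coefficient with base R's cor(). The name r_all is only for illustration.

```r
library(palmerpenguins)

# use = "complete.obs" drops the rows where either measurement is missing;
# without it, cor() would return NA.
r_all <- cor(penguins$bill_length_mm, penguins$bill_depth_mm,
             use = "complete.obs")

r_all
```

The result lies between -1 and 1; values near 0 indicate a weak linear relationship.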

Correlations per species:

Code

penguins %>% 
  ggplot(aes(x=bill_length_mm, 
             y = bill_depth_mm,
             color=species, 
             fill=species))+
  geom_point()+
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=12))

Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
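To go with the colored plot, here is a sketch that calculates the correlation separately for each species. Comparing these values with the correlation for all penguins together can show whether the pattern inside each group differs from the overall one. The name r_by_species is only for illustration.

```r
library(dplyr)
library(palmerpenguins)

r_by_species <- penguins %>%
  # Remove rows where either bill measurement is missing.
  filter(!is.na(bill_length_mm), !is.na(bill_depth_mm)) %>%
  group_by(species) %>%
  # One correlation coefficient (r) and one row count (n) per species.
  summarise(
    r = cor(bill_length_mm, bill_depth_mm),
    n = n()
  )

r_by_species
```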

Body mass per sex:

Code

penguins %>% 
  na.omit() %>% 
  ggplot(aes(x=sex, 
             y = body_mass_g,
             color=species, 
             fill=species))+
  geom_boxplot(alpha=0.7)+
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=12))

Body mass per species, split by sex:

Code

penguins %>% 
  na.omit() %>% 
  ggplot(aes(x=species, 
             y = body_mass_g,
             color=sex, 
             fill=sex))+
  geom_boxplot(alpha=0.7)+
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=12))
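The boxplots can be backed up with numbers. Here is a sketch that summarises body mass for each combination of species and sex, using the same na.omit() step as the plots. The name mass_summary and its column names are only for illustration.

```r
library(dplyr)
library(palmerpenguins)

mass_summary <- penguins %>%
  na.omit() %>%                      # drop rows with any missing value
  group_by(species, sex) %>%
  summarise(
    n = n(),                         # number of penguins in the group
    mean_mass_g = mean(body_mass_g),
    median_mass_g = median(body_mass_g),
    .groups = "drop"                 # return an ungrouped table
  )

mass_summary
```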

Check distributions:

Code

penguins %>% 
  na.omit() %>% 
  pivot_longer(bill_length_mm:body_mass_g, names_to = "trait") %>% 
  ggplot(aes(x=value,
         group=species,
         fill=species,
         color=species))+
  geom_density(alpha=0.7)+
  facet_grid(~trait, scales = "free_x" )+
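  #theme_minimal() goes first: it is a complete theme, so placing it last would overwrite the theme() settings
  theme_minimal()+
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=12))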
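The pivot_longer() step is what makes the faceted plot possible, so here is a sketch of what it does on a tiny made-up table (not the penguins data). The names wide and long and the values in the table are only for illustration.

```r
library(tidyr)
library(tibble)

# A small made-up table in "wide" form: one column per trait.
wide <- tibble(
  id      = c("a", "b"),
  bill_mm = c(39.1, 46.5),
  mass_g  = c(3750, 5000)
)

# pivot_longer() stacks the chosen columns: the old column names go into
# "trait" and the measurements go into "value" (the default column name).
long <- pivot_longer(wide, bill_mm:mass_g, names_to = "trait")

long
```

Each row of the wide table becomes one row per trait, which is why facet_grid(~trait) can then draw one panel per trait.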