Introductory Computer Science Term Project

Term Project:

Due Date: April 18, 2025

You will use the public PlantGrowth data set (included with R in the datasets package) to create a document, website, or PowerPoint presentation using Quarto. Quarto is an open-source scientific and technical publishing system. It lets you create dynamic documents, reports, presentations, and websites from plain text files that contain code, text, and visualizations.

The objective of this project is to use simple calculations, functions, and visualizations to describe data vectors and arrays. You will create a Quarto (.qmd) file that presents the code you used and the output it produced. Format the file as a scientific report. A grading rubric is attached below.

Example Case Study: Analyzing the palmerpenguins data set

Step 1: Simple Calculations

First, let’s perform some simple calculations. We will calculate the mean and median of a set of numbers.

# Simple calculations
numbers <- c(10, 20, 30, 40, 50)
mean_value <- mean(numbers)
median_value <- median(numbers)

mean_value
[1] 30
median_value
[1] 30
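The assignment itself asks for the PlantGrowth data set, which ships with base R in the datasets package. The same two calculations can therefore be applied to its weight column. The sketch below needs nothing beyond base R, and the variable names are only illustrative.

```r
# PlantGrowth is built into R (datasets package): 30 plants with a numeric
# weight column and a group factor (ctrl, trt1, trt2)
data("PlantGrowth")

plant_weights <- PlantGrowth$weight

# Same simple calculations as above, applied to a real data vector
mean_weight <- mean(plant_weights)
median_weight <- median(plant_weights)

mean_weight
median_weight
```

For these 30 plants the mean (about 5.07) sits slightly below the median (5.155). That gap is worth a sentence of interpretation in a report.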

Step 2: Working with Vectors

We will now load the palmerpenguins data set, create a vector of penguin flipper lengths, and summarize it with its mean and standard deviation.

# Load the palmerpenguins package
library(palmerpenguins)

# Load the penguins data set into the workspace
data("penguins", package = "palmerpenguins")

# Create a vector of flipper lengths
flipper_lengths <- penguins$flipper_length_mm

# Remove NA values
flipper_lengths <- flipper_lengths[!is.na(flipper_lengths)]

# Calculate the mean and standard deviation of flipper lengths
mean_flipper_length <- mean(flipper_lengths)
sd_flipper_length <- sd(flipper_lengths)

mean_flipper_length
[1] 200.9152
sd_flipper_length
[1] 14.06171
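The project objective also mentions arrays, and the same vector operations can be practiced on PlantGrowth. Below is a minimal base-R sketch. It assumes the data set's standard layout of 10 plants in each of the groups ctrl, trt1, and trt2. It shows logical indexing, vectorized arithmetic, and a 10 x 3 matrix (a two-dimensional array) with one column per group.

```r
# PlantGrowth ships with base R (datasets package)
data("PlantGrowth")

# Logical indexing: keep only the weights of the control group
ctrl_weights <- PlantGrowth$weight[PlantGrowth$group == "ctrl"]

# Vectorized arithmetic: subtract the group mean from every element at once
ctrl_deviations <- ctrl_weights - mean(ctrl_weights)

# split() returns one weight vector per group; cbind() stacks them as columns,
# giving a 10 x 3 matrix with column names ctrl, trt1, trt2
weight_matrix <- do.call(cbind, split(PlantGrowth$weight, PlantGrowth$group))

dim(weight_matrix)

# Column-wise summaries of the array
group_means <- colMeans(weight_matrix)
group_sds <- apply(weight_matrix, 2, sd)

group_means
group_sds
```

The matrix layout works only because every group has exactly 10 plants. With unequal group sizes, cbind() would recycle the shorter vectors, so the list returned by split() would be the safer structure.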

Step 3: Creating and Using Functions

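Next, we will create a function that approximates culmen (bill) area as the product of bill length and bill depth, then apply it to every penguin with complete bill measurements.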

# Function to approximate culmen (bill) area as length times depth
calculate_culmen_area <- function(length, depth) {
  return(length * depth)
}

# Remove rows with NA values in bill_length_mm or bill_depth_mm
penguins_clean <- penguins[!is.na(penguins$bill_length_mm) & !is.na(penguins$bill_depth_mm), ]

# Calculate culmen area for each penguin
culmen_areas <- calculate_culmen_area(penguins_clean$bill_length_mm, penguins_clean$bill_depth_mm)

# Add culmen area to the cleaned penguins dataset
penguins_clean$culmen_area <- culmen_areas

# View the first few rows of the updated dataset
head(penguins_clean)
# A tibble: 6 × 9
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
4 Adelie  Torgersen           36.7          19.3               193        3450
5 Adelie  Torgersen           39.3          20.6               190        3650
6 Adelie  Torgersen           38.9          17.8               181        3625
# ℹ 3 more variables: sex <fct>, year <int>, culmen_area <dbl>
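A user-defined function can also summarize the PlantGrowth groups. In the sketch below, summarize_vector() is a hypothetical helper written for illustration, not part of any package. It drops NA values and returns a named vector of summary statistics. sapply() then applies it to each treatment group.

```r
data("PlantGrowth")

# Hypothetical helper: summarize a numeric vector after dropping NA values
summarize_vector <- function(x) {
  x <- x[!is.na(x)]
  c(n = length(x), mean = mean(x), median = median(x), sd = sd(x))
}

# split() gives a named list of weight vectors (ctrl, trt1, trt2);
# sapply() calls the helper on each and binds the results into a 4 x 3 matrix
group_summary <- sapply(split(PlantGrowth$weight, PlantGrowth$group),
                        summarize_vector)

round(group_summary, 3)
```

Returning a named vector lets sapply() label the rows (n, mean, median, sd) and the columns (one per group) automatically.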

Step 4: Creating Visualizations with ggplot2

Finally, we will use ggplot2 to create a scatter plot of culmen area against flipper length, with a fitted linear regression line.

# Load ggplot2 library
library(ggplot2)

# Create a scatter plot
ggplot(penguins_clean, aes(x = flipper_length_mm, y = culmen_area)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", color = "red") +
  labs(title = "Culmen Area vs Flipper Length", x = "Flipper Length (mm)", y = "Culmen Area (mm^2)") +
  theme_minimal()
`geom_smooth()` using formula = 'y ~ x'
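For PlantGrowth, whose only predictor (group) is categorical, a box plot per group is a natural counterpart to the scatter plot above. The sketch below assumes ggplot2 is installed. The colors and labels are arbitrary choices, and the axis is labeled only "Dried weight" because the data set's help page does not state units.

```r
library(ggplot2)

data("PlantGrowth")

# Box plot of plant weight per group, with the individual plants overlaid.
# outlier.shape = NA hides the box plot's own outlier dots so that
# jittered points are not drawn twice.
weight_plot <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot(fill = "lightgreen", outlier.shape = NA) +
  geom_jitter(width = 0.1, height = 0, alpha = 0.6) +
  labs(title = "Plant Weight by Treatment Group",
       x = "Group", y = "Dried weight") +
  theme_minimal()

weight_plot
```

Overlaying the raw points keeps all 30 observations visible, which matters when each group has only 10 plants.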

Conclusion

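In this report, we used basic calculations, a user-defined function, and a ggplot2 visualization to analyze the Palmer penguins data. Across the penguins with a recorded flipper length, the mean flipper length was 200.9 mm (standard deviation 14.1 mm).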

References

Adélie penguins: Palmer Station Antarctica LTER and K. Gorman. 2020. Structural size measurements and isotopic signatures of foraging among adult male and female Adélie penguins (Pygoscelis adeliae) nesting along the Palmer Archipelago near Palmer Station, 2007-2009 ver 5. Environmental Data Initiative. doi: 10.6073/pasta/98b16d7d563f265cb52372c8ca99e60f

Gentoo penguins: Palmer Station Antarctica LTER and K. Gorman. 2020. Structural size measurements and isotopic signatures of foraging among adult male and female Gentoo penguin (Pygoscelis papua) nesting along the Palmer Archipelago near Palmer Station, 2007-2009 ver 5. Environmental Data Initiative. doi: 10.6073/pasta/7fca67fb28d56ee2ffa3d9370ebda689

Chinstrap penguins: Palmer Station Antarctica LTER and K. Gorman. 2020. Structural size measurements and isotopic signatures of foraging among adult male and female Chinstrap penguin (Pygoscelis antarcticus) nesting along the Palmer Archipelago near Palmer Station, 2007-2009 ver 6. Environmental Data Initiative. doi: 10.6073/pasta/c14dfcfada8ea13a17536e73eb6fbe9e

Originally published in: Gorman KB, Williams TD, Fraser WR (2014) Ecological Sexual Dimorphism and Environmental Variability within a Community of Antarctic Penguins (Genus Pygoscelis). PLoS ONE 9(3): e90081. doi: 10.1371/journal.pone.0090081

Grading Rubric for CPSC 111 Term Project

Total Score: ____/40
Criteria | Excellent | Points
Introduction | Clearly introduces the project and its objectives. | 5
Simple Calculations | Correctly performs calculations and explains the results clearly. Includes correct code and output. | 5
Creating and Using Functions | Correctly creates and uses functions with a clear explanation. Includes correct code and output. | 5
Working with Vectors and Arrays | Correctly manipulates vectors and arrays and explains the results clearly. Includes correct code and output. | 5
Visualizations in ggplot2 | Creates clear and accurate visualizations with ggplot2 and explains them. Includes correct code and output. | 5
Conclusion and References | Clearly summarizes the project and its findings. | 5
Document Formatting and Presentation | Well-organized, clear, and free of grammatical errors. | 5
Use of Quarto | Successfully creates the document using Quarto. | 5