Assignment 2B

Author

Michael Mayne

Data at a Glance

 library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   4.0.0     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Penguins_Raw <- read_csv("https://raw.githubusercontent.com/acatlin/data/refs/heads/master/penguin_predictions.csv")
Rows: 93 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): .pred_class, sex
dbl (1): .pred_female

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Penguins_Raw %>%
  count(.pred_class, sort = TRUE, name = "Predictions")
# A tibble: 2 × 2
  .pred_class Predictions
  <chr>             <int>
1 male                 54
2 female               39

Pre-Coding Approach

For Assignment 2B, I am asked to find the error rate for the penguin sex predictions if we always predict male (the value the data most often suggests after a cursory look), and then to build confusion matrices with fixed values. I will first calculate the null error rate by filtering for and counting the rows where the actual sex is female, then dividing that count by the total number of rows (93). As for the confusion matrix problems, I intend to solve them manually for simplicity. I will count via filter as before and make a table labeling the values in which:

Actual sex and prediction are both male (TP).

Prediction is male but true sex is female (FP).

Prediction is female but true sex is male (FN).

Prediction is female and true sex is female (TN).
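
The plan above could be sketched roughly as follows. This is only an illustration, not the final solution: `toy` is a made-up six-row stand-in for `Penguins_Raw` (same column names, invented values), and the helper names `null_error_rate()` and `confusion_counts()` are hypothetical. It assumes male is treated as the positive class, as in the four cases listed above.

```r
library(dplyr)

# Toy stand-in for Penguins_Raw (invented rows, NOT the real data).
# Column names match the CSV: .pred_female, .pred_class, sex.
toy <- data.frame(
  .pred_female = c(0.95, 0.70, 0.40, 0.10, 0.05, 0.60),
  .pred_class  = c("female", "female", "male", "male", "male", "female"),
  sex          = c("female", "female", "female", "male", "male", "male")
)

# Null error rate when always guessing "male":
# the share of rows whose actual sex is female.
null_error_rate <- function(df) {
  n_female <- df %>% filter(sex == "female") %>% nrow()
  n_female / nrow(df)
}

# Confusion-matrix counts via filter, with "male" as the positive class.
confusion_counts <- function(df) {
  c(
    TP = df %>% filter(.pred_class == "male",   sex == "male")   %>% nrow(),
    FP = df %>% filter(.pred_class == "male",   sex == "female") %>% nrow(),
    FN = df %>% filter(.pred_class == "female", sex == "male")   %>% nrow(),
    TN = df %>% filter(.pred_class == "female", sex == "female") %>% nrow()
  )
}

null_error_rate(toy)
confusion_counts(toy)
```

Calling the same two helpers on `Penguins_Raw` instead of `toy` should give the figures for the real data, provided the column names are unchanged. The four counts should always sum to the number of rows, which makes a quick sanity check.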

-End of Report