Assignment 1 – Loading Data into a Data Frame

Author

Shawn Ganz

Introduction

For this assignment I chose a CSV from NYC OpenData called “2018 Central Park Squirrel Census - Squirrel Data”, provided by The Squirrel Census.

Approach

Since the assignment is to transform this dataframe, I want to drop a couple of columns. Below is a list of all the columns:

df <- read.csv("https://raw.githubusercontent.com/Siganz/data_607_week_1/refs/heads/main/data/2018_Central_Park_Squirrel_Census_Squirrel_Data_20260126.csv")
colnames(df)

 [1] "X"                                         
 [2] "Y"                                         
 [3] "Unique.Squirrel.ID"                        
 [4] "Hectare"                                   
 [5] "Shift"                                     
 [6] "Date"                                      
 [7] "Hectare.Squirrel.Number"                   
 [8] "Age"                                       
 [9] "Primary.Fur.Color"                         
[10] "Highlight.Fur.Color"                       
[11] "Combination.of.Primary.and.Highlight.Color"
[12] "Color.notes"                               
[13] "Location"                                  
[14] "Above.Ground.Sighter.Measurement"          
[15] "Specific.Location"                         
[16] "Running"                                   
[17] "Chasing"                                   
[18] "Climbing"                                  
[19] "Eating"                                    
[20] "Foraging"                                  
[21] "Other.Activities"                          
[22] "Kuks"                                      
[23] "Quaas"                                     
[24] "Moans"                                     
[25] "Tail.flags"                                
[26] "Tail.twitches"                             
[27] "Approaches"                                
[28] "Indifferent"                               
[29] "Runs.from"                                 
[30] "Other.Interactions"                        
[31] "Lat.Long"

I want to create a dataframe with only these columns:

colnames(df)[c(3,9:11,16:20,1,2,31)]

 [1] "Unique.Squirrel.ID"                        
 [2] "Primary.Fur.Color"                         
 [3] "Highlight.Fur.Color"                       
 [4] "Combination.of.Primary.and.Highlight.Color"
 [5] "Running"                                   
 [6] "Chasing"                                   
 [7] "Climbing"                                  
 [8] "Eating"                                    
 [9] "Foraging"                                  
[10] "X"                                         
[11] "Y"                                         
[12] "Lat.Long"

Afterwards I want to do the following:

Create a binary “Active” squirrel column from the “Running,” “Chasing,” “Climbing,” “Eating,” and “Foraging” columns.
Convert the “Above Ground Sighter Measurement,” “X,” and “Y” columns to numeric (INT/FLOAT) values only.

The motivation for using this dataset is simple: I chose the first interesting, popular dataset I found on NYC OpenData. This encourages an exploratory approach, which might be useful when learning new skills.

Code-base

Setup

Read the data
Check data types

# ggplot2 is attached as part of the core tidyverse, so one call is enough
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.1     ✔ tibble    3.3.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

# Original Data
url <- "https://raw.githubusercontent.com/Siganz/data_607_week_1/refs/heads/main/data/2018_Central_Park_Squirrel_Census_Squirrel_Data_20260126.csv"

# Read Data
df <- read.csv(url, stringsAsFactors = FALSE)

# Optional, view first row (there are a lot of fields, so the output wraps)
df[1,]

          X        Y Unique.Squirrel.ID Hectare Shift     Date
1 -73.95613 40.79408     37F-PM-1014-03     37F    PM 10142018
  Hectare.Squirrel.Number Age Primary.Fur.Color Highlight.Fur.Color
1                       3                                          
  Combination.of.Primary.and.Highlight.Color Color.notes Location
1                                          +                     
  Above.Ground.Sighter.Measurement Specific.Location Running Chasing Climbing
1                                                      false   false    false
  Eating Foraging Other.Activities  Kuks Quaas Moans Tail.flags Tail.twitches
1  false    false                  false false false      false         false
  Approaches Indifferent Runs.from Other.Interactions
1      false       false     false                   
                                    Lat.Long
1 POINT (-73.9561344937861 40.7940823884086)

# Optional, view the field types
str(df)

'data.frame':   3023 obs. of  31 variables:
 $ X                                         : num  -74 -74 -74 -74 -74 ...
 $ Y                                         : num  40.8 40.8 40.8 40.8 40.8 ...
 $ Unique.Squirrel.ID                        : chr  "37F-PM-1014-03" "21B-AM-1019-04" "11B-PM-1014-08" "32E-PM-1017-14" ...
 $ Hectare                                   : chr  "37F" "21B" "11B" "32E" ...
 $ Shift                                     : chr  "PM" "AM" "PM" "PM" ...
 $ Date                                      : int  10142018 10192018 10142018 10172018 10172018 10102018 10102018 10082018 10062018 10102018 ...
 $ Hectare.Squirrel.Number                   : int  3 4 8 14 5 3 2 2 1 3 ...
 $ Age                                       : chr  "" "" "" "Adult" ...
 $ Primary.Fur.Color                         : chr  "" "" "Gray" "Gray" ...
 $ Highlight.Fur.Color                       : chr  "" "" "" "" ...
 $ Combination.of.Primary.and.Highlight.Color: chr  "+" "+" "Gray+" "Gray+" ...
 $ Color.notes                               : chr  "" "" "" "Nothing selected as Primary. Gray selected as Highlights. Made executive adjustments." ...
 $ Location                                  : chr  "" "" "Above Ground" "" ...
 $ Above.Ground.Sighter.Measurement          : chr  "" "" "10" "" ...
 $ Specific.Location                         : chr  "" "" "" "" ...
 $ Running                                   : chr  "false" "false" "false" "false" ...
 $ Chasing                                   : chr  "false" "false" "true" "false" ...
 $ Climbing                                  : chr  "false" "false" "false" "false" ...
 $ Eating                                    : chr  "false" "false" "false" "true" ...
 $ Foraging                                  : chr  "false" "false" "false" "true" ...
 $ Other.Activities                          : chr  "" "" "" "" ...
 $ Kuks                                      : chr  "false" "false" "false" "false" ...
 $ Quaas                                     : chr  "false" "false" "false" "false" ...
 $ Moans                                     : chr  "false" "false" "false" "false" ...
 $ Tail.flags                                : chr  "false" "false" "false" "false" ...
 $ Tail.twitches                             : chr  "false" "false" "false" "false" ...
 $ Approaches                                : chr  "false" "false" "false" "false" ...
 $ Indifferent                               : chr  "false" "false" "false" "false" ...
 $ Runs.from                                 : chr  "false" "false" "false" "true" ...
 $ Other.Interactions                        : chr  "" "" "" "" ...
 $ Lat.Long                                  : chr  "POINT (-73.9561344937861 40.7940823884086)" "POINT (-73.9688574691102 40.7837825208444)" "POINT (-73.97428114848522 40.775533619083)" "POINT (-73.9596413903948 40.7903128889029)" ...

Interestingly, the true/false columns were read as character rather than logical. I would like to change that, while also renaming the columns and creating vectors that group them.

# cols vector for field selection
# Lat.Long is left out because the point geometry can be rebuilt from X/Y alone.

cols <- c(
  "Unique.Squirrel.ID",
  "Primary.Fur.Color",
  "Highlight.Fur.Color",
  "Combination.of.Primary.and.Highlight.Color",
  "Running",
  "Chasing",
  "Climbing",
  "Eating",
  "Foraging",
  "Above.Ground.Sighter.Measurement",
  "X",
  "Y"
)

# Copy
df2 <- df[ , cols]

# Check structure
str(df2)

'data.frame':   3023 obs. of  12 variables:
 $ Unique.Squirrel.ID                        : chr  "37F-PM-1014-03" "21B-AM-1019-04" "11B-PM-1014-08" "32E-PM-1017-14" ...
 $ Primary.Fur.Color                         : chr  "" "" "Gray" "Gray" ...
 $ Highlight.Fur.Color                       : chr  "" "" "" "" ...
 $ Combination.of.Primary.and.Highlight.Color: chr  "+" "+" "Gray+" "Gray+" ...
 $ Running                                   : chr  "false" "false" "false" "false" ...
 $ Chasing                                   : chr  "false" "false" "true" "false" ...
 $ Climbing                                  : chr  "false" "false" "false" "false" ...
 $ Eating                                    : chr  "false" "false" "false" "true" ...
 $ Foraging                                  : chr  "false" "false" "false" "true" ...
 $ Above.Ground.Sighter.Measurement          : chr  "" "" "10" "" ...
 $ X                                         : num  -74 -74 -74 -74 -74 ...
 $ Y                                         : num  40.8 40.8 40.8 40.8 40.8 ...

df2[1:5,]

  Unique.Squirrel.ID Primary.Fur.Color Highlight.Fur.Color
1     37F-PM-1014-03                                      
2     21B-AM-1019-04                                      
3     11B-PM-1014-08              Gray                    
4     32E-PM-1017-14              Gray                    
5     13E-AM-1017-05              Gray            Cinnamon
  Combination.of.Primary.and.Highlight.Color Running Chasing Climbing Eating
1                                          +   false   false    false  false
2                                          +   false   false    false  false
3                                      Gray+   false    true    false  false
4                                      Gray+   false   false    false   true
5                              Gray+Cinnamon   false   false    false  false
  Foraging Above.Ground.Sighter.Measurement         X        Y
1    false                                  -73.95613 40.79408
2    false                                  -73.96886 40.78378
3    false                               10 -73.97428 40.77553
4     true                                  -73.95964 40.79031
5     true                                  -73.97027 40.77621

# Change column names
colnames(df2) <- c(
  "unique_id",
  "primary_color",
  "highlight_color",
  "combination_color",
  "running",
  "chasing",
  "climbing",
  "eating",
  "foraging",
  "above_ground_measurement",
  "x",
  "y"
)

cols <- c(
  "unique_id",
  "primary_color",
  "highlight_color",
  "combination_color",
  "running",
  "chasing",
  "climbing",
  "eating",
  "foraging",
  "x",
  "y"
)

bool_cols <- c(
  "running",
  "chasing",
  "climbing",
  "eating",
  "foraging"
)

color_cols <- c(
  "primary_color",
  "highlight_color",
  "combination_color"
)
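
Selecting with one vector and renaming with a second, position-matched vector works, but the two lists can drift out of sync if a column is added or reordered. As a minimal sketch of one alternative, the code below keeps each old name next to its new name. The `toy` data frame and the `rename_map` vector are illustrative names, not part of the assignment code.

```r
# Toy stand-in for the census data; only a few of the real columns are used.
toy <- data.frame(
  Unique.Squirrel.ID = c("37F-PM-1014-03", "21B-AM-1019-04"),
  Primary.Fur.Color  = c("", "Gray"),
  Hectare            = c("37F", "21B"),
  X                  = c(-73.95613, -73.96886),
  stringsAsFactors   = FALSE
)

# Names are the original columns, values are the new snake_case names.
rename_map <- c(
  Unique.Squirrel.ID = "unique_id",
  Primary.Fur.Color  = "primary_color",
  X                  = "x"
)

# Select by the old names, then rename from the same lookup.
toy2 <- toy[, names(rename_map)]
colnames(toy2) <- unname(rename_map)

colnames(toy2)
```

Because the selection and the new names come from the same object, dropping or adding a column only requires changing one line.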

Data viewing & Manipulation

I would now like to view the data and check out the uniques for certain columns and see what I can do with them.

# check names
str(df2)

'data.frame':   3023 obs. of  12 variables:
 $ unique_id               : chr  "37F-PM-1014-03" "21B-AM-1019-04" "11B-PM-1014-08" "32E-PM-1017-14" ...
 $ primary_color           : chr  "" "" "Gray" "Gray" ...
 $ highlight_color         : chr  "" "" "" "" ...
 $ combination_color       : chr  "+" "+" "Gray+" "Gray+" ...
 $ running                 : chr  "false" "false" "false" "false" ...
 $ chasing                 : chr  "false" "false" "true" "false" ...
 $ climbing                : chr  "false" "false" "false" "false" ...
 $ eating                  : chr  "false" "false" "false" "true" ...
 $ foraging                : chr  "false" "false" "false" "true" ...
 $ above_ground_measurement: chr  "" "" "10" "" ...
 $ x                       : num  -74 -74 -74 -74 -74 ...
 $ y                       : num  40.8 40.8 40.8 40.8 40.8 ...

# check unique
for (col in bool_cols) {
  print(col)
  print(unique(df2[[col]]))
}

[1] "running"
[1] "false" "true" 
[1] "chasing"
[1] "false" "true" 
[1] "climbing"
[1] "false" "true" 
[1] "eating"
[1] "false" "true" 
[1] "foraging"
[1] "false" "true"

The values are already clean: every column contains only “false” and “true”. as.logical() accepts these lowercase strings directly, so the toupper() step below is a precaution rather than a requirement.

# the columns are clean, will use toupper() on the values 
for (col in bool_cols) {
  df2[[col]] <- toupper(df2[[col]])
}

str(df2)

'data.frame':   3023 obs. of  12 variables:
 $ unique_id               : chr  "37F-PM-1014-03" "21B-AM-1019-04" "11B-PM-1014-08" "32E-PM-1017-14" ...
 $ primary_color           : chr  "" "" "Gray" "Gray" ...
 $ highlight_color         : chr  "" "" "" "" ...
 $ combination_color       : chr  "+" "+" "Gray+" "Gray+" ...
 $ running                 : chr  "FALSE" "FALSE" "FALSE" "FALSE" ...
 $ chasing                 : chr  "FALSE" "FALSE" "TRUE" "FALSE" ...
 $ climbing                : chr  "FALSE" "FALSE" "FALSE" "FALSE" ...
 $ eating                  : chr  "FALSE" "FALSE" "FALSE" "TRUE" ...
 $ foraging                : chr  "FALSE" "FALSE" "FALSE" "TRUE" ...
 $ above_ground_measurement: chr  "" "" "10" "" ...
 $ x                       : num  -74 -74 -74 -74 -74 ...
 $ y                       : num  40.8 40.8 40.8 40.8 40.8 ...

I am using a for loop because this type of iteration is similar to iteration in Python, which makes it easier for me to remember. Now I would like to convert the bool_cols columns into logical data types, using another loop.

for (col in bool_cols) {
  df2[[col]] <- as.logical(df2[[col]])
}

str(df2)

'data.frame':   3023 obs. of  12 variables:
 $ unique_id               : chr  "37F-PM-1014-03" "21B-AM-1019-04" "11B-PM-1014-08" "32E-PM-1017-14" ...
 $ primary_color           : chr  "" "" "Gray" "Gray" ...
 $ highlight_color         : chr  "" "" "" "" ...
 $ combination_color       : chr  "+" "+" "Gray+" "Gray+" ...
 $ running                 : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ chasing                 : logi  FALSE FALSE TRUE FALSE FALSE FALSE ...
 $ climbing                : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ eating                  : logi  FALSE FALSE FALSE TRUE FALSE FALSE ...
 $ foraging                : logi  FALSE FALSE FALSE TRUE TRUE TRUE ...
 $ above_ground_measurement: chr  "" "" "10" "" ...
 $ x                       : num  -74 -74 -74 -74 -74 ...
 $ y                       : num  40.8 40.8 40.8 40.8 40.8 ...
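
The two loops above can also be collapsed into a single vectorized assignment. The sketch below uses a toy data frame (`toy` is an illustrative name). It relies on `as.logical()` accepting lowercase "true"/"false" directly, and it checks that no value was silently turned into `NA`.

```r
toy <- data.frame(
  running = c("false", "true", "false"),
  eating  = c("true", "true", "false"),
  stringsAsFactors = FALSE
)
bool_cols <- c("running", "eating")

# Convert every listed column in one step; lapply() returns a list of
# logical vectors that is assigned back column by column.
toy[bool_cols] <- lapply(toy[bool_cols], as.logical)

# as.logical() returns NA for strings it does not recognise, so confirm
# that the conversion did not introduce any missing values.
stopifnot(!any(is.na(toy[bool_cols])))

str(toy)
```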

Now I will create a new column called activity, which is TRUE when at least one of the five behaviors was observed.

df2$activity <- as.logical(rowSums(df2[bool_cols]) > 0)
str(df2)

'data.frame':   3023 obs. of  13 variables:
 $ unique_id               : chr  "37F-PM-1014-03" "21B-AM-1019-04" "11B-PM-1014-08" "32E-PM-1017-14" ...
 $ primary_color           : chr  "" "" "Gray" "Gray" ...
 $ highlight_color         : chr  "" "" "" "" ...
 $ combination_color       : chr  "+" "+" "Gray+" "Gray+" ...
 $ running                 : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ chasing                 : logi  FALSE FALSE TRUE FALSE FALSE FALSE ...
 $ climbing                : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ eating                  : logi  FALSE FALSE FALSE TRUE FALSE FALSE ...
 $ foraging                : logi  FALSE FALSE FALSE TRUE TRUE TRUE ...
 $ above_ground_measurement: chr  "" "" "10" "" ...
 $ x                       : num  -74 -74 -74 -74 -74 ...
 $ y                       : num  40.8 40.8 40.8 40.8 40.8 ...
 $ activity                : logi  FALSE FALSE TRUE TRUE TRUE TRUE ...
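
`rowSums()` works here because R treats `TRUE` as 1 and `FALSE` as 0, so a row sum above zero means at least one behavior was observed. A small sketch on toy data (the `toy` frame is illustrative) suggests that the same flag can be built with an element-wise OR, which some readers may find closer to the intent.

```r
toy <- data.frame(
  running  = c(FALSE, TRUE,  FALSE),
  chasing  = c(FALSE, FALSE, FALSE),
  foraging = c(FALSE, TRUE,  TRUE)
)
bool_cols <- c("running", "chasing", "foraging")

# Approach used in this report: count TRUE values per row.
by_sum <- rowSums(toy[bool_cols]) > 0

# Alternative: OR the columns together one after another.
by_or <- Reduce(`|`, toy[bool_cols])

# rowSums() keeps row names, so strip them before comparing.
identical(unname(by_sum), by_or)
```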

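I don’t like that the activity column is at the tail, so I am moving it next to the behavior columns. Because highlight_color and combination_color are not listed in the selection, this step also drops them: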

df2 <- df2[, c(
  "unique_id",
  "primary_color",
  "running", "chasing", "climbing", "eating", "foraging",
  "activity", "above_ground_measurement",
  "x", "y"
)]

str(df2)

'data.frame':   3023 obs. of  11 variables:
 $ unique_id               : chr  "37F-PM-1014-03" "21B-AM-1019-04" "11B-PM-1014-08" "32E-PM-1017-14" ...
 $ primary_color           : chr  "" "" "Gray" "Gray" ...
 $ running                 : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ chasing                 : logi  FALSE FALSE TRUE FALSE FALSE FALSE ...
 $ climbing                : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ eating                  : logi  FALSE FALSE FALSE TRUE FALSE FALSE ...
 $ foraging                : logi  FALSE FALSE FALSE TRUE TRUE TRUE ...
 $ activity                : logi  FALSE FALSE TRUE TRUE TRUE TRUE ...
 $ above_ground_measurement: chr  "" "" "10" "" ...
 $ x                       : num  -74 -74 -74 -74 -74 ...
 $ y                       : num  40.8 40.8 40.8 40.8 40.8 ...

I don’t need the highlight_color and combination_color columns. The reordering above already left them out, so the two NULL assignments below are only a safeguard. I also check the unique values in the remaining color column, primary_color.

df2$highlight_color <- NULL
df2$combination_color <- NULL

# should return three lines, last two should just be NULL
for (col in color_cols){
  print(unique(df2[[col]]))
}

[1] ""         "Gray"     "Cinnamon" "Black"   
NULL
NULL

Now, I want to remove the empty string (“”) from the unique values by replacing it with NA.

df2$primary_color[df2$primary_color == ""] <- NA

unique(df2$primary_color)

[1] NA         "Gray"     "Cinnamon" "Black"
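
The same cleanup can also be done while reading the file. As a sketch, `read.csv()` accepts an `na.strings` argument. The inline CSV text below is a made-up two-column sample standing in for the real file.

```r
# Made-up sample; the first row has an empty primary_color field.
csv_text <- "unique_id,primary_color
37F-PM-1014-03,
11B-PM-1014-08,Gray
13E-AM-1017-05,Cinnamon"

# Treat empty fields (and the literal NA) as missing at read time.
toy <- read.csv(
  text = csv_text,
  na.strings = c("", "NA"),
  stringsAsFactors = FALSE
)

unique(toy$primary_color)
```

Note that `na.strings` applies to every column, so with the real file above_ground_measurement and the other character columns would also receive NA in place of empty strings.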

Checking on uniques for the above_ground_measurement

unique(df2$above_ground_measurement)

 [1] ""      "10"    "FALSE" "30"    "6"     "24"    "8"     "25"    "5"    
[10] "50"    "4"     "3"     "70"    "12"    "2"     "20"    "7"     "13"   
[19] "15"    "28"    "35"    "100"   "1"     "80"    "65"    "40"    "18"   
[28] "17"    "55"    "60"    "180"   "9"     "45"    "0"     "43"    "16"   
[37] "33"    "11"    "23"    "31"    "14"    "19"

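These look like integers, except for “FALSE” and the empty string (“”). So, I will find any values %in% those two and change them to “0” before converting to numeric. Note that “0” already appears as a real value, so this recoding makes missing measurements indistinguishable from true zeros. If this were a pipeline, I would build a function that normalizes the column and then reports any remaining non-numeric values.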

df2$above_ground_measurement[df2$above_ground_measurement %in% c("", "FALSE")] <- "0"

df2$above_ground_measurement <- as.numeric(df2$above_ground_measurement)

str(df2$above_ground_measurement)

 num [1:3023] 0 0 10 0 0 0 0 0 0 30 ...
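
The normalize-then-check function mentioned above might look like the following sketch. The function name `normalize_measurement` and its arguments are hypothetical, and mapping the placeholders to 0 simply mirrors the recoding used in this report.

```r
# Hypothetical helper: recode known placeholders, convert to numeric,
# and stop with a message if anything else fails to parse.
normalize_measurement <- function(x, placeholders = c("", "FALSE"), fill = 0) {
  x <- trimws(as.character(x))
  x[x %in% placeholders] <- as.character(fill)

  # Values that are not plain non-negative numbers are reported, not coerced.
  bad <- !grepl("^[0-9]+(\\.[0-9]+)?$", x)
  if (any(bad)) {
    stop("Non-numeric values found: ", paste(unique(x[bad]), collapse = ", "))
  }
  as.numeric(x)
}

normalize_measurement(c("", "10", "FALSE", "30"))
```

Failing loudly on unexpected values is a design choice: in a pipeline it surfaces new placeholder strings instead of silently turning them into NA.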

Checking X/Y to see whether they contain any non-numeric characters:

bad_rows <- which(grepl("^$|[^0-9.-]", df2$x))
bad_rows

integer(0)

bad_rows <- which(grepl("^$|[^0-9.-]", df2$y))
bad_rows

integer(0)

There are none. Since str() already reports x and y as num, the as.numeric() calls below are a no-op, kept only to make the intended type explicit:

df2$x <- as.numeric(df2$x)
df2$y <- as.numeric(df2$y)
str(df2)

'data.frame':   3023 obs. of  11 variables:
 $ unique_id               : chr  "37F-PM-1014-03" "21B-AM-1019-04" "11B-PM-1014-08" "32E-PM-1017-14" ...
 $ primary_color           : chr  NA NA "Gray" "Gray" ...
 $ running                 : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ chasing                 : logi  FALSE FALSE TRUE FALSE FALSE FALSE ...
 $ climbing                : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ eating                  : logi  FALSE FALSE FALSE TRUE FALSE FALSE ...
 $ foraging                : logi  FALSE FALSE FALSE TRUE TRUE TRUE ...
 $ activity                : logi  FALSE FALSE TRUE TRUE TRUE TRUE ...
 $ above_ground_measurement: num  0 0 10 0 0 0 0 0 0 30 ...
 $ x                       : num  -74 -74 -74 -74 -74 ...
 $ y                       : num  40.8 40.8 40.8 40.8 40.8 ...
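
The regex check on x and y works because `grepl()` coerces its input to character, but a type check is more direct. The sketch below uses toy values (`toy` and `raw` are illustrative names) to show both a type check and a way to locate unparseable entries in a character column.

```r
toy <- data.frame(x = c(-73.95613, -73.96886), y = c(40.79408, 40.78378))

# TRUE for each column that is already stored as a number.
sapply(toy[c("x", "y")], is.numeric)

# For a character column, values that fail to parse become NA, so the
# positions of NA after conversion point at the bad rows.
raw <- c("40.79", "", "40,78")
bad_rows <- which(is.na(suppressWarnings(as.numeric(raw))))
bad_rows
```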

This data looks good to me!

Reporting

Now I would like to plot these using ggplot2. I had an AI generate the following code since I’m not familiar with the package.

Squirrel Behaviors

rates <- colSums(df2[bool_cols]) / nrow(df2)

plot_df <- data.frame(
  behavior = names(rates),
  proportion = as.numeric(rates)
)

ggplot(plot_df, aes(x = behavior, y = proportion)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent) +
  labs(
    title = "Observed Squirrel Behaviors",
    x = "Behavior",
    y = "Percent of Observations"
  ) +
  theme_minimal()

Share of Total Activity by Fur Color

counts <- aggregate(
  activity ~ primary_color,
  data = df2,
  FUN = sum,
  na.action = na.omit
)

counts$percent <- counts$activity / sum(counts$activity)

ggplot(counts, aes(x = primary_color, y = percent, fill = primary_color)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent) +
  labs(
    title = "Share of Total Activity by Fur Color",
    x = "Primary Fur Color",
    y = "Percent of Total Activity"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

Summary Table for All Behaviors

# Create a summary table for all behaviors
# (.data[[col]] looks up the column whose name is stored in `col`)
bind_rows(lapply(bool_cols, function(col) {
  df2 %>%
    group_by(primary_color) %>%
    summarise(
      behavior = col,
      n_false = sum(!.data[[col]], na.rm = TRUE),
      n_true = sum(.data[[col]], na.rm = TRUE),
      total = n(),
      .groups = "drop"
    )
}))

# A tibble: 20 × 5
   primary_color behavior n_false n_true total
   <chr>         <chr>      <int>  <int> <int>
 1 Black         running       77     26   103
 2 Cinnamon      running      289    103   392
 3 Gray          running     1876    597  2473
 4 <NA>          running       51      4    55
 5 Black         chasing       96      7   103
 6 Cinnamon      chasing      362     30   392
 7 Gray          chasing     2235    238  2473
 8 <NA>          chasing       51      4    55
 9 Black         climbing      78     25   103
10 Cinnamon      climbing     310     82   392
11 Gray          climbing    1940    533  2473
12 <NA>          climbing      37     18    55
13 Black         eating        79     24   103
14 Cinnamon      eating       281    111   392
15 Gray          eating      1854    619  2473
16 <NA>          eating        49      6    55
17 Black         foraging      60     43   103
18 Cinnamon      foraging     189    203   392
19 Gray          foraging    1290   1183  2473
20 <NA>          foraging      49      6    55
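
The n_true column can be cross-checked without dplyr. A base R sketch on toy data (the `toy` frame and the "Unknown" label are illustrative) splits the behavior columns by color and sums each column.

```r
toy <- data.frame(
  primary_color = c("Gray", "Gray", "Black", NA),
  running       = c(TRUE, FALSE, TRUE, TRUE),
  eating        = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)
bool_cols <- c("running", "eating")

# Give missing colors an explicit label so those rows are not dropped.
color <- ifelse(is.na(toy$primary_color), "Unknown", toy$primary_color)

# One column per color, one row per behavior; each cell counts TRUE values.
n_true <- sapply(split(toy[bool_cols], color), colSums)
n_true
```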

Behavior Rates by Primary Fur Color

# Calculate activity rates for all behaviors
# (.data[[col]] looks up the column whose name is stored in `col`)
activity_rates <- bind_rows(lapply(bool_cols, function(col) {
  df2 %>%
    filter(!is.na(primary_color)) %>%  # Remove NA colors
    group_by(primary_color) %>%
    summarise(
      behavior = col,
      n_true = sum(.data[[col]], na.rm = TRUE),
      total = n(),
      activity_rate = n_true / total,
      .groups = "drop"
    )
}))

# Create the bar chart
ggplot(activity_rates, aes(x = primary_color, y = activity_rate, fill = primary_color)) +
  geom_col() +
  facet_wrap(~ behavior, ncol = 3) +
  scale_y_continuous(labels = scales::percent) +
  labs(
    title = "Behavior Rates by Primary Fur Color",
    x = "Primary Fur Color",
    y = "Percent Observed"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

Coming from Python, I like f-strings, so here are some similar strings I made in R:

[1] "Squirrel climbing stats: 658  /  3023"

[1] "Squirrel eating stats 760  /  3023"

[1] "Squirrel foraging stats 1435  /  3023"

[1] "Squirrel running stats 730  /  3023"
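
The code behind these lines is not shown above. As an assumption about how such output could be produced, the sketch below uses `paste()`, whose default single-space separator would explain the doubled spaces around the slash, next to `sprintf()`, which is closer to a Python f-string. The toy vector stands in for a column such as `df2$climbing`.

```r
# Toy stand-in for a logical behavior column.
climbing <- c(TRUE, FALSE, TRUE, FALSE, FALSE)

# paste() joins its arguments with a single space by default, so the
# " / " piece ends up with two spaces on each side.
with_paste <- paste("Squirrel climbing stats:", sum(climbing), " / ", length(climbing))

# sprintf() gives explicit control over spacing and number formatting.
with_sprintf <- sprintf("Squirrel climbing stats: %d / %d", sum(climbing), length(climbing))

print(with_paste)
print(with_sprintf)
```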

Average Activity Rate by Primary Fur Color

This one is pretty funny: I spent a good 30 minutes trying to figure out whether my data was wrong (for example, whether FALSE was somehow being counted), but it just turns out the squirrels were very active!

rates <- aggregate(
  activity ~ primary_color,
  data = df2,
  FUN = mean,
  na.action = na.omit
)

ggplot(rates, aes(x = primary_color, y = activity, fill = primary_color)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent) +
  labs(
    title = "Average Activity Rate by Primary Fur Color",
    x = "Primary Fur Color",
    y = "Percent of Observations with Activity"
  ) +
  theme_minimal() +
  theme(legend.position = "none")
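
One way to rule out the worry that FALSE values are inflating the result: the mean of a logical vector is the share of `TRUE` values, because `TRUE` counts as 1 and `FALSE` as 0. A sketch on toy data (illustrative values only) checks `aggregate()` against a manual count.

```r
toy <- data.frame(
  primary_color = c("Gray", "Gray", "Gray", "Black", "Black"),
  activity      = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Same call shape as in the report, applied to the toy data.
toy_rates <- aggregate(activity ~ primary_color, data = toy, FUN = mean)

# Manual check for one group: TRUE count divided by group size.
gray <- toy$activity[toy$primary_color == "Gray"]
manual_gray <- sum(gray) / length(gray)

toy_rates
manual_gray
```

If the two numbers agree for a group, the high rates come from the data rather than from the aggregation.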

Code Base Conclusion

I went a little overboard, but I found it really helpful for learning more about R. I enjoy using the .qmd format since the “{r}” chunk header is easy to type and I can always check the output via HTML or the console in RStudio. There is obviously a lot that I need to work on (ggplot2, aggregate, tidyverse), but this was a great introduction. I find base R very intuitive, and I feel like I can pick it up rather quickly.

Video

TODO: Video