AnS 4550/5550 Intro to R

R, RStudio, Quarto MarkDown and Reading Data into R

Author

Juan Steibel, Credit: Austin Putz

Published

February 1, 2026

Introduction

In this course we will use the language R for data science:

R (R home page).

R is excellent for reading data and implementing statistical analyses.

R

This programming language is based on an older language called S. It is “open source”, which means that all the code needed to build and modify R is freely available. R is distributed under the GNU General Public License, so it can be used free of charge, including for commercial work.

R also encourages the development of open-source tools, and a huge community of R users and developers continuously contributes new features to the language.

Let’s start with the basics:

Home: https://www.r-project.org/

Download R: https://mirror.las.iastate.edu/CRAN/

Basic introduction to R on YouTube: https://www.youtube.com/watch?v=yZ0bV2Afkjc

Check Version: When R starts up, it prints the version number. Please refer to this number when describing your work (e.g. R 4.5.1). A generic citation for R can be obtained with the citation() function. Also be aware of package versions, which change independently of R itself.
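As a small illustration of the point above, the following sketch collects the version information you would typically report. It uses only functions that ship with R; the object names (r_version, r_citation, stats_version) are made up for this example.

```r
# Version string of the running R session (the same text printed at startup)
r_version <- R.version.string
print(r_version)

# Generic citation for R itself
r_citation <- citation()
print(r_citation)

# Version of a single package; "stats" ships with R, so it is always available
stats_version <- packageVersion("stats")
print(stats_version)

# Full record of R, the operating system, and the attached packages
sessionInfo()
```

Pasting the output of sessionInfo() at the end of a report is a simple way to document package versions.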

RStudio

It is an Integrated Development Environment (IDE) for R (and other languages). This means that it’s an interface that makes working with R easier.

A frequent error is to report that “RStudio was used to analyze data”. This is not the case: R is used for data analysis, and RStudio is just the interface.

Home: https://posit.co/ (the company is now called Posit, not RStudio)

Download: https://posit.co/download/rstudio-desktop/

Basic introduction to RStudio on YouTube: https://www.youtube.com/watch?v=TQMAKGDIe_8

Quarto

Quarto is an open-source, next-generation scientific and technical publishing system designed for creating dynamic, reproducible documents, websites, books, and presentations.

It allows users to weave code elements (in R in this case) with formatted narrative elements to create reports in HTML (browser will open), PDF, or Word.

We prefer HTML output. We will provide the Quarto files (called Quarto Markdown, or .qmd, files) and expect that you will modify them or create new ones, and then return both the .qmd and .html files.

Home: https://quarto.org/

Download: https://quarto.org/docs/get-started/

Markdown Basics: https://quarto.org/docs/authoring/markdown-basics.html

Basic introduction to Quarto on YouTube: https://www.youtube.com/watch?v=_f3latmOhew

Quarto Themes

Themes here

Quarto YAML Header

YAML is the header of every Quarto (.qmd) file that tells the program how to format the output. One example would be how to format and place the table of contents for the page.

Quarto Options for HTML

HTML

HTML is the language used to build internet documents (along with some extensions). By default, Quarto renders documents to HTML.

Knitr options

The act of building a report from a Quarto Markdown file using R and RStudio is called “knitting” (Quarto itself calls this step “rendering”; the knitr package runs the R code).

A first “code chunk” in R should establish clear instructions for knitting:
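A minimal sketch of such a first chunk, assuming the knitr package is installed (it is the engine that runs R chunks in a .qmd file). The option values shown are illustrative choices, not the settings required for this course.

```r
# Hypothetical setup chunk: set default options for all later code chunks
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(
    echo       = TRUE,  # show the R code in the rendered report
    warning    = TRUE,  # keep warnings visible in the report
    fig.width  = 6,     # default figure width in inches
    fig.height = 4      # default figure height in inches
  )
}
```

Individual chunks can still override these defaults when needed.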

Setup Code

It is recommended to set up all the utilities needed by an R program at the top of your .qmd script. Some of these are:

  • Install all needed packages

  • Load (call/invoke) packages

  • Specify printing and formatting options

  • Prepare paths to working files

In R, we first have to download each package with the function install.packages(). This is done only once per computer. We then load the installed packages with the function library() every time we start a new session.
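One way to respect the "install once, load every time" rule is to check which packages are missing before installing anything. The sketch below is only an illustration: missing_packages() is a hypothetical helper written for these notes, not a function from R or from any package.

```r
# Hypothetical helper: return the packages from 'pkgs' that are NOT installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed     <- c("lubridate", "tidyverse")
to_install <- missing_packages(needed)
print(to_install)

# Install only what is missing (uncomment to actually install)
# if (length(to_install) > 0) install.packages(to_install)
```

After this check, the library() calls can run on every session without reinstalling anything.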

#====================================================================#
# Setup Options
#====================================================================#

# remove all objects if restarting script
rm(list=ls())

# set tibble width for printing
options(tibble.width = Inf)

# remove scientific notation
options(scipen=999)

#==============================================================================#
# Install Packages / Load Packages
#==============================================================================#

#install.packages("lubridate")
#install.packages("tidyverse")

library(lubridate)

Attaching package: 'lubridate'
The following objects are masked from 'package:base':

    date, intersect, setdiff, union
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr   1.1.4     ✔ readr   2.1.5
✔ forcats 1.0.1     ✔ stringr 1.5.2
✔ ggplot2 4.0.0     ✔ tibble  3.3.0
✔ purrr   1.1.0     ✔ tidyr   1.3.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
#==============================================================================#
# Set paths
#==============================================================================#

# set all paths
path_main    <- "C:/Users/jsteibel/OneDrive/Documents/job/455/"
path_data    <- str_c(path_main, "Data/", sep="")

# str_c() is from the stringr package, which is a part of 'tidyverse'
# this concatenates two pieces of the path. We will use it a lot.


# NOTE: We cannot set the working directory in quarto with this method
# as we do in an R script. Use the root.dir option in YAML. 

#==============================================================================#
# Set Inputs
#==============================================================================#

# data file name
data_file <- "Production.csv"
data_file_raw <- "Production_raw.csv"

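TIP: Load packages in order of least important to most important (i.e. load tidyverse last). When two packages define a function with the same name, the package loaded later masks the earlier one, so loading your most-used package last keeps its functions directly accessible.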

Data

Dr. Gustavo Silva provided all data.

Tips for reading in data:

  • Avoid opening a CSV file with Excel!
  • Open CSV and other ‘flat’ files with barebones text editors
  • Write R code to document all changes/alterations

Here are some common functions to read data into R:

  • read.table()
  • fread() (data.table package)
  • read_delim() (from the readr package)
  • read_excel() (readxl package)

In this class we will mostly use the read_delim() function from the readr package. This function provides modern features and very good performance.

Read in CSV file from folder

full_file_path<-str_c(path_data, data_file)

# read sow production data
#this code lets R guess the types of columns and read their names. 
#it's simple, but there are risks in letting R guess
production<-read_delim(full_file_path,
                       delim = ",",
                       col_names = TRUE)
Rows: 105 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl  (4): ID, PARITY, TOTALBORN, LIVEBORN
date (2): SERVDATE, FARROWINGDATE

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#this code explicitly assigns names and types of variables
production_2<-read_delim(
          file  = full_file_path,  # path to file
          delim = ",",      # 'comma' separated (fields/columns)
          skip  = 1,        # Skip the first line (we will give col names)
          col_names = c("ID", "PARITY", "SERVDATE", "FARROWDATE", "TOTALBORN", "BORNALIVE"),
          col_types = "ficcii", # Set Types
          na = c("", "NULL", "NA")  # vector of possible missing values
)

Print first 5 lines using head() function

# print first 5 lines
head(production, 5)
# A tibble: 5 × 6
     ID PARITY SERVDATE   FARROWINGDATE TOTALBORN LIVEBORN
  <dbl>  <dbl> <date>     <date>            <dbl>    <dbl>
1 18152      7 2024-07-16 2024-11-11           13       13
2 18367      7 2024-07-16 2024-11-11           23       14
3 19166      7 2024-07-16 2024-11-09           18       17
4 19600      6 2024-07-12 2024-11-05           15       14
5 20619      6 2024-07-16 2024-11-10           13       13
head(production_2, 5)
# A tibble: 5 × 6
  ID    PARITY SERVDATE   FARROWDATE TOTALBORN BORNALIVE
  <fct>  <int> <chr>      <chr>          <int>     <int>
1 18152      7 2024-07-16 2024-11-11        13        13
2 18367      7 2024-07-16 2024-11-11        23        14
3 19166      7 2024-07-16 2024-11-09        18        17
4 19600      6 2024-07-12 2024-11-05        15        14
5 20619      6 2024-07-16 2024-11-10        13        13

Print last 5 lines using tail() function

# print last 5 lines
tail(production, 5)
# A tibble: 5 × 6
     ID PARITY SERVDATE   FARROWINGDATE TOTALBORN LIVEBORN
  <dbl>  <dbl> <date>     <date>            <dbl>    <dbl>
1 34254      2 2024-07-12 2024-11-05           11       10
2 34889      1 2024-07-12 2024-11-06           14       11
3 34889      2 2024-12-07 2025-04-03           12       10
4 34905      2 2024-12-07 2025-04-03           19       17
5 34906      1 2024-07-14 2024-11-08           18       17
tail(production_2, 5)
# A tibble: 5 × 6
  ID    PARITY SERVDATE   FARROWDATE TOTALBORN BORNALIVE
  <fct>  <int> <chr>      <chr>          <int>     <int>
1 34254      2 2024-07-12 2024-11-05        11        10
2 34889      1 2024-07-12 2024-11-06        14        11
3 34889      2 2024-12-07 2025-04-03        12        10
4 34905      2 2024-12-07 2025-04-03        19        17
5 34906      1 2024-07-14 2024-11-08        18        17

Default print of a ‘tibble’

production
# A tibble: 105 × 6
      ID PARITY SERVDATE   FARROWINGDATE TOTALBORN LIVEBORN
   <dbl>  <dbl> <date>     <date>            <dbl>    <dbl>
 1 18152      7 2024-07-16 2024-11-11           13       13
 2 18367      7 2024-07-16 2024-11-11           23       14
 3 19166      7 2024-07-16 2024-11-09           18       17
 4 19600      6 2024-07-12 2024-11-05           15       14
 5 20619      6 2024-07-16 2024-11-10           13       13
 6 21079      6 2024-07-16 2024-11-12           10        8
 7 21228      6 2024-07-16 2024-11-10           16       14
 8 21502      6 2024-07-16 2024-11-09           16       14
 9 22119      6 2024-07-16 2024-11-08           15       14
10 22192      4 2024-07-12 2024-11-04           11        8
# ℹ 95 more rows
production_2
# A tibble: 105 × 6
   ID    PARITY SERVDATE   FARROWDATE TOTALBORN BORNALIVE
   <fct>  <int> <chr>      <chr>          <int>     <int>
 1 18152      7 2024-07-16 2024-11-11        13        13
 2 18367      7 2024-07-16 2024-11-11        23        14
 3 19166      7 2024-07-16 2024-11-09        18        17
 4 19600      6 2024-07-12 2024-11-05        15        14
 5 20619      6 2024-07-16 2024-11-10        13        13
 6 21079      6 2024-07-16 2024-11-12        10         8
 7 21228      6 2024-07-16 2024-11-10        16        14
 8 21502      6 2024-07-16 2024-11-09        16        14
 9 22119      6 2024-07-16 2024-11-08        15        14
10 22192      4 2024-07-12 2024-11-04        11         8
# ℹ 95 more rows

Help

The best way to view and search is using the “Help” tab in RStudio.

Or, from the Console, you can type ? followed by the function name, such as ?str_c. This will pull up the help file for the str_c() function.

Notice that, at the top of the help page, the curly brackets tell you which package the function is from. In this case, {stringr}.
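The same lookups can be done in code. This sketch uses paste(), a base R function, so that it runs without loading any package; the object name help_page is made up for the example.

```r
# help("paste") is the programmatic form of typing ?paste at the Console
help_page <- help("paste")
class(help_page)

# Show the arguments of a function without opening the help page
args(paste)

# Find out which package (namespace) a function comes from
environmentName(environment(paste))   # "base"

# Search all installed help pages by keyword (same as ??concatenate)
# help.search("concatenate")
```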

Types

Here is a list of the characters you can give to the read_delim() function and what they stand for.

  • c = character
  • f = factor
  • D = date
  • d = double (numeric with decimals)
  • n = number (a flexible numeric parser that ignores grouping marks such as commas)
  • i = integer
  • l = logical
  • ? = guess
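To see these codes in action, here is a small sketch that reads a made-up CSV from a string instead of a file. It assumes readr 2.0 or later (needed for the I() literal-data input); the data, the column names WEIGHT and WEANED, and the object names are invented for illustration. Here "d" is readr's code for a double column.

```r
library(readr)

# Toy CSV text (hypothetical data); I() tells readr this is data, not a path
toy_csv <- I("ID,PARITY,SERVDATE,WEIGHT,WEANED
A1,3,2024-07-16,210.5,TRUE
B2,1,2024-07-12,188.0,FALSE
")

# One character per column: factor, integer, Date, double, logical
toy <- read_delim(toy_csv, delim = ",", col_types = "fiDdl")

toy
sapply(toy, class)
```

Each letter in col_types is matched to a column by position, so the string must have one character per column.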

To determine what the ‘class’ of the object is, we use the class() function.

Character

Character types are simply held as strings in R. Common examples are IDs or farm names, or anything that may need string manipulation later on.

You can convert something into a character with as.character().
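A short sketch of the conversion, using one sow ID from the data as a plain number; the object names are made up for the example.

```r
sow_id <- 18152
class(sow_id)                 # "numeric"

# Convert the number to a string
sow_id_chr <- as.character(sow_id)
class(sow_id_chr)             # "character"
sow_id_chr                    # "18152"

# Once it is a string, string operations apply
nchar(sow_id_chr)             # 5
paste0("SOW_", sow_id_chr)    # "SOW_18152"
```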

Factors

Factors look like characters on the outside; internally, however, they are stored as integer codes, each mapped to a label. Factors are used to store categorical variables.

For example sex may be stored as:

  • 1 = Female
  • 2 = Male

This is because factor levels are ordered alphabetically by default, until they are reordered with another function.

The different categories are called levels, in this example, there are 2 levels (Female and Male).

You can convert something into a factor with as.factor().
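The sex example above can be reproduced in a few lines. The vector below is made-up data; factor() with an explicit levels argument is one way (among several) to change the order.

```r
sex <- c("Male", "Female", "Female", "Male")

# Default: levels are sorted alphabetically
sex_f <- as.factor(sex)
levels(sex_f)        # "Female" "Male"
as.integer(sex_f)    # 2 1 1 2  (the internal integer codes)
nlevels(sex_f)       # 2

# Reorder the levels explicitly
sex_f2 <- factor(sex, levels = c("Male", "Female"))
levels(sex_f2)       # "Male" "Female"
as.integer(sex_f2)   # 1 2 2 1
```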

Dates

Dates have the structure YYYY-MM-DD in R. Unlike the date formats commonly seen in Excel, this structure is unambiguous. Everyone should use this format for their research to avoid mix-ups with others.

In R, dates can be read as characters and then converted into dates with one of many functions. This is my preferred method.
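A minimal base R sketch of this read-as-character, then convert approach, using the service and farrowing dates of the first sow in the data. The object names are made up for the example.

```r
# Dates as they arrive when read as character
serv_chr   <- "2024-07-16"
farrow_chr <- "2024-11-11"
class(serv_chr)                 # "character"

# Convert to the Date class (YYYY-MM-DD is understood by default)
serv_date   <- as.Date(serv_chr)
farrow_date <- as.Date(farrow_chr)
class(serv_date)                # "Date"

# Real dates allow arithmetic: days from service to farrowing
gestation_days <- as.numeric(farrow_date - serv_date)
gestation_days                  # 118

# Other layouts need an explicit format string
european <- as.Date("16/07/2024", format = "%d/%m/%Y")
european == serv_date           # TRUE
```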

Numeric

Numeric is also known as a ‘double’, for double-precision floating point (a number with a decimal point). We store continuous variables with this type in R.

Use as.numeric() to convert something into a number.

Integers

Integers are whole numbers (positive, negative, or zero) such as -1, 0, 1, 2, 3. We use this type to store discrete variables, although discrete variables are often stored as numeric.

Convert a column into an integer with as.integer().
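The difference between numeric and integer is easy to check at the Console. This sketch uses only base R; the values are arbitrary.

```r
# A plain number is numeric (double) by default
class(13)            # "numeric"
typeof(13)           # "double"

# The L suffix creates an integer
class(13L)           # "integer"

# Conversions
as.numeric("3.14")   # 3.14
as.integer("42")     # 42
as.integer(12.9)     # 12  (truncates toward zero, does not round)
```

Note that as.integer() drops the decimal part, so use round() first if rounding is what you want.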

Logical

TRUE and FALSE are the values of the logical type, also known as boolean. They behave as 0s and 1s in arithmetic.

  • 0 = FALSE
  • 1 = TRUE

Use as.logical() to convert something into a logical type.
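Because TRUE counts as 1 and FALSE as 0, logical vectors are handy for counting. The vector below is made-up data for illustration.

```r
weaned <- c(TRUE, FALSE, TRUE, TRUE)

as.integer(weaned)       # 1 0 1 1
sum(weaned)              # 3     (how many are TRUE)
mean(weaned)             # 0.75  (proportion that are TRUE)

# Converting numbers: 0 becomes FALSE, any other number becomes TRUE
as.logical(c(0, 1, 2))   # FALSE TRUE TRUE

# Strings that are not recognized become NA
as.logical("yes")        # NA
```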

Check Types

To check the types in R, you can use the following 2 functions.

# check types - base R function
str(production_2)
spc_tbl_ [105 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ ID        : Factor w/ 76 levels "18152","18367",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ PARITY    : int [1:105] 7 7 7 6 6 6 6 6 6 4 ...
 $ SERVDATE  : chr [1:105] "2024-07-16" "2024-07-16" "2024-07-16" "2024-07-12" ...
 $ FARROWDATE: chr [1:105] "2024-11-11" "2024-11-11" "2024-11-09" "2024-11-05" ...
 $ TOTALBORN : int [1:105] 13 23 18 15 13 10 16 16 15 11 ...
 $ BORNALIVE : int [1:105] 13 14 17 14 13 8 14 14 14 8 ...
 - attr(*, "spec")=
  .. cols(
  ..   ID = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
  ..   PARITY = col_integer(),
  ..   SERVDATE = col_character(),
  ..   FARROWDATE = col_character(),
  ..   TOTALBORN = col_integer(),
  ..   BORNALIVE = col_integer()
  .. )
 - attr(*, "problems")=<externalptr> 
str(production)
spc_tbl_ [105 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ ID           : num [1:105] 18152 18367 19166 19600 20619 ...
 $ PARITY       : num [1:105] 7 7 7 6 6 6 6 6 6 4 ...
 $ SERVDATE     : Date[1:105], format: "2024-07-16" "2024-07-16" ...
 $ FARROWINGDATE: Date[1:105], format: "2024-11-11" "2024-11-11" ...
 $ TOTALBORN    : num [1:105] 13 23 18 15 13 10 16 16 15 11 ...
 $ LIVEBORN     : num [1:105] 13 14 17 14 13 8 14 14 14 8 ...
 - attr(*, "spec")=
  .. cols(
  ..   ID = col_double(),
  ..   PARITY = col_double(),
  ..   SERVDATE = col_date(format = ""),
  ..   FARROWINGDATE = col_date(format = ""),
  ..   TOTALBORN = col_double(),
  ..   LIVEBORN = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 

glimpse function

# check types - from dplyr
glimpse(production_2)
Rows: 105
Columns: 6
$ ID         <fct> 18152, 18367, 19166, 19600, 20619, 21079, 21228, 21502, 221…
$ PARITY     <int> 7, 7, 7, 6, 6, 6, 6, 6, 6, 4, 5, 5, 6, 6, 6, 6, 6, 6, 5, 5,…
$ SERVDATE   <chr> "2024-07-16", "2024-07-16", "2024-07-16", "2024-07-12", "20…
$ FARROWDATE <chr> "2024-11-11", "2024-11-11", "2024-11-09", "2024-11-05", "20…
$ TOTALBORN  <int> 13, 23, 18, 15, 13, 10, 16, 16, 15, 11, 16, 15, 19, 13, 17,…
$ BORNALIVE  <int> 13, 14, 17, 14, 13, 8, 14, 14, 14, 8, 12, 14, 16, 12, 16, 1…
glimpse(production)
Rows: 105
Columns: 6
$ ID            <dbl> 18152, 18367, 19166, 19600, 20619, 21079, 21228, 21502, …
$ PARITY        <dbl> 7, 7, 7, 6, 6, 6, 6, 6, 6, 4, 5, 5, 6, 6, 6, 6, 6, 6, 5,…
$ SERVDATE      <date> 2024-07-16, 2024-07-16, 2024-07-16, 2024-07-12, 2024-07…
$ FARROWINGDATE <date> 2024-11-11, 2024-11-11, 2024-11-09, 2024-11-05, 2024-11…
$ TOTALBORN     <dbl> 13, 23, 18, 15, 13, 10, 16, 16, 15, 11, 16, 15, 19, 13, …
$ LIVEBORN      <dbl> 13, 14, 17, 14, 13, 8, 14, 14, 14, 8, 12, 14, 16, 12, 16…

Here is an example of the class function on an individual column.

# grab class of ID column
class(production$ID)
[1] "numeric"
class(production_2$ID)
[1] "factor"

Class activity

Use one of the provided functions to check the classes of the two objects production and production_2.

Access Columns

In R, columns of data frames are accessed using the $ operator. We will learn other uses of this operator later.

We use the notation data$column to access individual columns to perform operations.

# access column in R
production$PARITY
  [1] 7 7 7 6 6 6 6 6 6 4 5 5 6 6 6 6 6 6 5 5 6 5 5 6 4 5 5 5 4 4 5 4 4 4 5 3 4
 [38] 1 2 1 1 2 1 2 1 2 1 2 1 2 2 1 2 1 1 0 0 0 1 0 0 1 1 0 1 0 0 0 0 0 1 0 0 1
 [75] 0 1 0 2 3 2 3 2 2 2 2 2 3 2 2 3 2 2 3 2 3 2 2 3 2 3 2 1 2 2 1
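The data$column notation can also be tried on a small data frame built by hand. The sketch below copies the first three rows of the production data; the data frame name litters and the new column NOT_LIVEBORN are made up for illustration.

```r
# Small data frame using the first three sows from the production data
litters <- data.frame(
  ID        = c("18152", "18367", "19166"),
  TOTALBORN = c(13L, 23L, 18L),
  LIVEBORN  = c(13L, 14L, 17L)
)

# Two equivalent ways to pull out one column as a vector
litters$TOTALBORN
litters[["TOTALBORN"]]

# Use the column in a calculation
mean(litters$TOTALBORN)   # 18

# $ on the left-hand side creates (or overwrites) a column
litters$NOT_LIVEBORN <- litters$TOTALBORN - litters$LIVEBORN
litters$NOT_LIVEBORN      # 0 9 1
```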

Type conversions

A common operation is type conversion. This is done to ensure that all columns in a dataset have a correct format for further analyses. There are many type conversion functions in R. Here are a few:

  • as.numeric()
  • as.character()
  • as.factor()
  • as.Date()
  • as.logical()
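One conversion deserves care: turning a factor of number-like labels (such as the ID column read as a factor) back into numbers. The sketch below shows the pitfall with three IDs from the data; the object names are made up for the example.

```r
id_f <- factor(c("18152", "18367", "19166"))

# Pitfall: as.numeric() on a factor returns the internal codes, not the labels
as.numeric(id_f)                  # 1 2 3

# Safe route: convert to character first, then to numeric
id_num <- as.numeric(as.character(id_f))
id_num                            # 18152 18367 19166

# Text that is not a number becomes NA (R also issues a warning)
not_a_number <- suppressWarnings(as.numeric("abc"))
not_a_number                      # NA
```

Checking class() before and after a conversion is a quick way to catch this kind of surprise.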

Dates

It is highly recommended that date variables/features are managed using functions in the lubridate package in R. For a full account of lubridate, please see https://lubridate.tidyverse.org/.

Let’s convert SERVDATE and FARROWDATE from character to date in production_2.

# print a date column
production_2$SERVDATE
  [1] "2024-07-16" "2024-07-16" "2024-07-16" "2024-07-12" "2024-07-16"
  [6] "2024-07-16" "2024-07-16" "2024-07-16" "2024-07-16" "2024-07-12"
 [11] "2024-12-06" "2024-07-16" "2024-07-16" "2024-07-16" "2024-07-13"
 [16] "2024-07-16" "2024-07-16" "2024-07-16" "2024-07-16" "2024-07-12"
 [21] "2024-12-06" "2024-07-12" "2024-07-12" "2024-12-06" "2024-07-12"
 [26] "2024-12-06" "2024-07-16" "2024-07-16" "2024-07-16" "2024-07-12"
 [31] "2024-12-06" "2024-07-13" "2024-07-16" "2024-07-15" "2024-12-06"
 [36] "2024-07-12" "2024-12-07" "2024-07-13" "2024-12-07" "2024-07-15"
 [41] "2024-07-16" "2024-12-08" "2024-07-13" "2024-12-07" "2024-07-15"
 [46] "2024-12-06" "2024-07-15" "2024-12-06" "2024-07-13" "2024-12-06"
 [51] "2024-12-06" "2024-07-15" "2024-12-06" "2024-07-16" "2024-07-14"
 [56] "2024-07-16" "2024-07-14" "2024-07-16" "2024-12-07" "2024-07-14"
 [61] "2024-07-15" "2024-12-07" "2024-12-06" "2024-07-14" "2024-12-06"
 [66] "2024-07-15" "2024-07-16" "2024-07-14" "2024-07-14" "2024-07-15"
 [71] "2024-12-07" "2024-07-15" "2024-07-14" "2024-12-07" "2024-07-14"
 [76] "2024-12-07" "2024-07-22" "2024-07-16" "2024-12-07" "2024-07-12"
 [81] "2024-12-06" "2024-07-12" "2024-07-16" "2024-07-16" "2024-07-13"
 [86] "2024-07-13" "2024-12-06" "2024-07-15" "2024-07-12" "2024-12-06"
 [91] "2024-07-12" "2024-07-16" "2024-12-08" "2024-07-16" "2024-12-04"
 [96] "2024-07-12" "2024-07-16" "2024-12-06" "2024-07-16" "2024-12-07"
[101] "2024-07-12" "2024-07-12" "2024-12-07" "2024-12-07" "2024-07-14"
as.Date(production_2$SERVDATE)
  [1] "2024-07-16" "2024-07-16" "2024-07-16" "2024-07-12" "2024-07-16"
  [6] "2024-07-16" "2024-07-16" "2024-07-16" "2024-07-16" "2024-07-12"
 [11] "2024-12-06" "2024-07-16" "2024-07-16" "2024-07-16" "2024-07-13"
 [16] "2024-07-16" "2024-07-16" "2024-07-16" "2024-07-16" "2024-07-12"
 [21] "2024-12-06" "2024-07-12" "2024-07-12" "2024-12-06" "2024-07-12"
 [26] "2024-12-06" "2024-07-16" "2024-07-16" "2024-07-16" "2024-07-12"
 [31] "2024-12-06" "2024-07-13" "2024-07-16" "2024-07-15" "2024-12-06"
 [36] "2024-07-12" "2024-12-07" "2024-07-13" "2024-12-07" "2024-07-15"
 [41] "2024-07-16" "2024-12-08" "2024-07-13" "2024-12-07" "2024-07-15"
 [46] "2024-12-06" "2024-07-15" "2024-12-06" "2024-07-13" "2024-12-06"
 [51] "2024-12-06" "2024-07-15" "2024-12-06" "2024-07-16" "2024-07-14"
 [56] "2024-07-16" "2024-07-14" "2024-07-16" "2024-12-07" "2024-07-14"
 [61] "2024-07-15" "2024-12-07" "2024-12-06" "2024-07-14" "2024-12-06"
 [66] "2024-07-15" "2024-07-16" "2024-07-14" "2024-07-14" "2024-07-15"
 [71] "2024-12-07" "2024-07-15" "2024-07-14" "2024-12-07" "2024-07-14"
 [76] "2024-12-07" "2024-07-22" "2024-07-16" "2024-12-07" "2024-07-12"
 [81] "2024-12-06" "2024-07-12" "2024-07-16" "2024-07-16" "2024-07-13"
 [86] "2024-07-13" "2024-12-06" "2024-07-15" "2024-07-12" "2024-12-06"
 [91] "2024-07-12" "2024-07-16" "2024-12-08" "2024-07-16" "2024-12-04"
 [96] "2024-07-12" "2024-07-16" "2024-12-06" "2024-07-16" "2024-12-07"
[101] "2024-07-12" "2024-07-12" "2024-12-07" "2024-12-07" "2024-07-14"
class(production_2$SERVDATE)
[1] "character"
class(as.Date(production_2$SERVDATE))
[1] "Date"

So far we have not changed the column in the dataset: as.Date() returned a converted copy that was printed but never stored.

To introduce changes in a dataset, we will use the mutate() function and make sure to assign the result back to the dataset:

# glimpse first to check initial column classes
glimpse(production_2)
Rows: 105
Columns: 6
$ ID         <fct> 18152, 18367, 19166, 19600, 20619, 21079, 21228, 21502, 221…
$ PARITY     <int> 7, 7, 7, 6, 6, 6, 6, 6, 6, 4, 5, 5, 6, 6, 6, 6, 6, 6, 5, 5,…
$ SERVDATE   <chr> "2024-07-16", "2024-07-16", "2024-07-16", "2024-07-12", "20…
$ FARROWDATE <chr> "2024-11-11", "2024-11-11", "2024-11-09", "2024-11-05", "20…
$ TOTALBORN  <int> 13, 23, 18, 15, 13, 10, 16, 16, 15, 11, 16, 15, 19, 13, 17,…
$ BORNALIVE  <int> 13, 14, 17, 14, 13, 8, 14, 14, 14, 8, 12, 14, 16, 12, 16, 1…
production_2<-mutate(production_2,
                     SERVDATE=as.Date(SERVDATE),
                     FARROWDATE=as.Date(FARROWDATE))
glimpse(production_2)
Rows: 105
Columns: 6
$ ID         <fct> 18152, 18367, 19166, 19600, 20619, 21079, 21228, 21502, 221…
$ PARITY     <int> 7, 7, 7, 6, 6, 6, 6, 6, 6, 4, 5, 5, 6, 6, 6, 6, 6, 6, 5, 5,…
$ SERVDATE   <date> 2024-07-16, 2024-07-16, 2024-07-16, 2024-07-12, 2024-07-16…
$ FARROWDATE <date> 2024-11-11, 2024-11-11, 2024-11-09, 2024-11-05, 2024-11-10…
$ TOTALBORN  <int> 13, 23, 18, 15, 13, 10, 16, 16, 15, 11, 16, 15, 19, 13, 17,…
$ BORNALIVE  <int> 13, 14, 17, 14, 13, 8, 14, 14, 14, 8, 12, 14, 16, 12, 16, 1…

Reformatting dates

Read data with dates in European format (day/month/year)

data_file<-"Production_Raw.csv"
full_file_path<-str_c(path_data, data_file)

#Explicitly assigns names and types of variables
production_r<-read_delim(
          file  = full_file_path,  # path to file
          delim = ",",      # 'comma' separated (fields/columns)
          skip  = 1,        # Skip the first line (we will give col names)
          col_names = c("ID", "PARITY", "SERVDATE", "FARROWDATE", "TOTALBORN", "BORNALIVE"),
          col_types = "ficcii", # Set Types
          na = c("", "NULL", "NA")  # vector of possible missing values
)


production_r #Service is in European format, Farrow is in American format
# A tibble: 105 × 28
   ID    PARITY SERVDATE   FARROWDATE TOTALBORN BORNALIVE X7    X8    X9   
   <fct>  <int> <chr>      <chr>          <int>     <int> <lgl> <lgl> <lgl>
 1 18152      7 16/07/2024 11/11/2024        13        13 NA    NA    NA   
 2 18367      7 16/07/2024 11/11/2024        23        14 NA    NA    NA   
 3 19166      7 16/07/2024 11/9/2024         18        17 NA    NA    NA   
 4 19600      6 12/7/2024  11/5/2024         15        14 NA    NA    NA   
 5 20619      6 16/07/2024 11/10/2024        13        13 NA    NA    NA   
 6 21079      6 16/07/2024 11/12/2024        10         8 NA    NA    NA   
 7 21228      6 16/07/2024 11/10/2024        16        14 NA    NA    NA   
 8 21502      6 16/07/2024 11/9/2024         16        14 NA    NA    NA   
 9 22119      6 16/07/2024 11/8/2024         15        14 NA    NA    NA   
10 22192      4 12/7/2024  11/4/2024         11         8 NA    NA    NA   
   X10   X11   X12   X13   X14   X15   X16   X17   X18   X19   X20   X21   X22  
   <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
 1 NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   
 2 NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   
 3 NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   
 4 NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   
 5 NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   
 6 NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   
 7 NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   
 8 NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   
 9 NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   
10 NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   
   X23   X24   X25   X26   X27   X28  
   <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
 1 NA    NA    NA    NA    NA    NA   
 2 NA    NA    NA    NA    NA    NA   
 3 NA    NA    NA    NA    NA    NA   
 4 NA    NA    NA    NA    NA    NA   
 5 NA    NA    NA    NA    NA    NA   
 6 NA    NA    NA    NA    NA    NA   
 7 NA    NA    NA    NA    NA    NA   
 8 NA    NA    NA    NA    NA    NA   
 9 NA    NA    NA    NA    NA    NA   
10 NA    NA    NA    NA    NA    NA   
# ℹ 95 more rows
production_2
# A tibble: 105 × 6
   ID    PARITY SERVDATE   FARROWDATE TOTALBORN BORNALIVE
   <fct>  <int> <date>     <date>         <int>     <int>
 1 18152      7 2024-07-16 2024-11-11        13        13
 2 18367      7 2024-07-16 2024-11-11        23        14
 3 19166      7 2024-07-16 2024-11-09        18        17
 4 19600      6 2024-07-12 2024-11-05        15        14
 5 20619      6 2024-07-16 2024-11-10        13        13
 6 21079      6 2024-07-16 2024-11-12        10         8
 7 21228      6 2024-07-16 2024-11-10        16        14
 8 21502      6 2024-07-16 2024-11-09        16        14
 9 22119      6 2024-07-16 2024-11-08        15        14
10 22192      4 2024-07-12 2024-11-04        11         8
# ℹ 95 more rows
dmy(production_r$SERVDATE)
  [1] "2024-07-16" "2024-07-16" "2024-07-16" "2024-07-12" "2024-07-16"
  [6] "2024-07-16" "2024-07-16" "2024-07-16" "2024-07-16" "2024-07-12"
 [11] "2024-12-06" "2024-07-16" "2024-07-16" "2024-07-16" "2024-07-13"
 [16] "2024-07-16" "2024-07-16" "2024-07-16" "2024-07-16" "2024-07-12"
 [21] "2024-12-06" "2024-07-12" "2024-07-12" "2024-12-06" "2024-07-12"
 [26] "2024-12-06" "2024-07-16" "2024-07-16" "2024-07-16" "2024-07-12"
 [31] "2024-12-06" "2024-07-13" "2024-07-16" "2024-07-15" "2024-12-06"
 [36] "2024-07-12" "2024-12-07" "2024-07-13" "2024-12-07" "2024-07-15"
 [41] "2024-07-16" "2024-12-08" "2024-07-13" "2024-12-07" "2024-07-15"
 [46] "2024-12-06" "2024-07-15" "2024-12-06" "2024-07-13" "2024-12-06"
 [51] "2024-12-06" "2024-07-15" "2024-12-06" "2024-07-16" "2024-07-14"
 [56] "2024-07-16" "2024-07-14" "2024-07-16" "2024-12-07" "2024-07-14"
 [61] "2024-07-15" "2024-12-07" "2024-12-06" "2024-07-14" "2024-12-06"
 [66] "2024-07-15" "2024-07-16" "2024-07-14" "2024-07-14" "2024-07-15"
 [71] "2024-12-07" "2024-07-15" "2024-07-14" "2024-12-07" "2024-07-14"
 [76] "2024-12-07" "2024-07-22" "2024-07-16" "2024-12-07" "2024-07-12"
 [81] "2024-12-06" "2024-07-12" "2024-07-16" "2024-07-16" "2024-07-13"
 [86] "2024-07-13" "2024-12-06" "2024-07-15" "2024-07-12" "2024-12-06"
 [91] "2024-07-12" "2024-07-16" "2024-12-08" "2024-07-16" "2024-12-04"
 [96] "2024-07-12" "2024-07-16" "2024-12-06" "2024-07-16" "2024-12-07"
[101] "2024-07-12" "2024-07-12" "2024-12-07" "2024-12-07" "2024-07-14"
dmy(production_r$FARROWDATE) #dangerous!
Warning: 2 failed to parse.
  [1] "2024-11-11" "2024-11-11" "2024-09-11" "2024-05-11" "2024-10-11"
  [6] "2024-12-11" "2024-10-11" "2024-09-11" "2024-08-11" "2024-04-11"
 [11] NA           "2024-09-11" "2024-09-11" "2024-10-11" "2024-08-11"
 [16] "2024-09-11" "2024-11-11" "2024-11-11" "2024-09-11" "2024-05-11"
 [21] "2025-02-04" "2024-07-11" "2024-06-11" "2025-02-04" "2024-05-11"
 [26] "2025-01-04" "2024-11-11" "2024-10-11" "2024-02-12" "2024-06-11"
 [31] "2025-02-04" "2024-08-11" "2024-09-11" "2024-10-11" "2025-02-04"
 [36] "2024-07-11" "2025-03-04" "2024-08-11" "2025-04-04" "2024-10-11"
 [41] "2024-09-11" "2025-04-04" "2024-07-11" "2025-03-04" "2024-10-11"
 [46] "2025-04-04" "2024-09-11" "2025-02-04" "2024-06-11" "2025-03-04"
 [51] "2025-02-04" "2024-07-11" "2025-02-04" "2024-10-11" "2024-06-11"
 [56] "2024-08-11" "2024-06-11" "2024-12-11" "2025-03-04" "2024-08-11"
 [61] "2024-09-11" "2025-05-04" "2025-01-04" "2024-06-11" "2025-02-04"
 [66] "2024-08-11" "2024-11-11" "2024-07-11" "2024-06-11" "2024-10-11"
 [71] "2025-03-04" "2024-09-11" "2024-08-11" "2025-03-04" "2024-06-11"
 [76] "2025-03-04" NA           "2024-10-11" "2025-03-04" "2024-08-11"
 [81] "2025-03-04" "2024-08-11" "2024-02-12" "2024-12-11" "2024-06-11"
 [86] "2024-06-11" "2025-02-04" "2024-11-11" "2024-06-11" "2025-02-04"
 [91] "2024-08-11" "2024-12-11" "2025-05-04" "2024-12-11" "2025-05-04"
 [96] "2024-05-11" "2024-11-11" "2025-03-04" "2024-11-11" "2025-01-04"
[101] "2024-05-11" "2024-06-11" "2025-03-04" "2025-03-04" "2024-08-11"
mdy(production_r$FARROWDATE)
  [1] "2024-11-11" "2024-11-11" "2024-11-09" "2024-11-05" "2024-11-10"
  [6] "2024-11-12" "2024-11-10" "2024-11-09" "2024-11-08" "2024-11-04"
 [11] "2025-03-31" "2024-11-09" "2024-11-09" "2024-11-10" "2024-11-08"
 [16] "2024-11-09" "2024-11-11" "2024-11-11" "2024-11-09" "2024-11-05"
 [21] "2025-04-02" "2024-11-07" "2024-11-06" "2025-04-02" "2024-11-05"
 [26] "2025-04-01" "2024-11-11" "2024-11-10" "2024-12-02" "2024-11-06"
 [31] "2025-04-02" "2024-11-08" "2024-11-09" "2024-11-10" "2025-04-02"
 [36] "2024-11-07" "2025-04-03" "2024-11-08" "2025-04-04" "2024-11-10"
 [41] "2024-11-09" "2025-04-04" "2024-11-07" "2025-04-03" "2024-11-10"
 [46] "2025-04-04" "2024-11-09" "2025-04-02" "2024-11-06" "2025-04-03"
 [51] "2025-04-02" "2024-11-07" "2025-04-02" "2024-11-10" "2024-11-06"
 [56] "2024-11-08" "2024-11-06" "2024-11-12" "2025-04-03" "2024-11-08"
 [61] "2024-11-09" "2025-04-05" "2025-04-01" "2024-11-06" "2025-04-02"
 [66] "2024-11-08" "2024-11-11" "2024-11-07" "2024-11-06" "2024-11-10"
 [71] "2025-04-03" "2024-11-09" "2024-11-08" "2025-04-03" "2024-11-06"
 [76] "2025-04-03" "2024-11-13" "2024-11-10" "2025-04-03" "2024-11-08"
 [81] "2025-04-03" "2024-11-08" "2024-12-02" "2024-11-12" "2024-11-06"
 [86] "2024-11-06" "2025-04-02" "2024-11-11" "2024-11-06" "2025-04-02"
 [91] "2024-11-08" "2024-11-12" "2025-04-05" "2024-11-12" "2025-04-05"
 [96] "2024-11-05" "2024-11-11" "2025-04-03" "2024-11-11" "2025-04-01"
[101] "2024-11-05" "2024-11-06" "2025-04-03" "2025-04-03" "2024-11-08"

Class Activity

Use these functions to modify production_r to match exactly the other two datasets.

All you need to do is alter the order of the ‘y’, ‘m’, and ‘d’ to get the correct function. If you have a date such as 25/01/2021, use the function dmy.

  • ymd() for year-month-day formats (any separator)
  • dmy() for day-month-year formats
  • mdy() for month-day-year formats
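A small sketch of the three parsers, assuming the lubridate package is installed. It also shows why choosing the wrong one is dangerous: an ambiguous string such as 11/9/2024 parses without any warning under both dmy() and mdy(), but gives two different dates.

```r
library(lubridate)

# The same calendar day written three ways
d1 <- dmy("25/01/2021")
d2 <- mdy("01/25/2021")
d3 <- ymd("2021-01-25")
d1 == d2 & d2 == d3       # TRUE

# An ambiguous string: both parsers succeed, with different answers
as_dmy <- dmy("11/9/2024")
as_mdy <- mdy("11/9/2024")
as_dmy                    # "2024-09-11"
as_mdy                    # "2024-11-09"
```

Because neither call fails, the only protection is knowing how the dates were recorded and checking the result against something you expect.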

The lubridate package also has functions to extract parts of the date such as year(), quarter(), month(), and day().

# extract month from service date
month(production_2$SERVDATE)
  [1]  7  7  7  7  7  7  7  7  7  7 12  7  7  7  7  7  7  7  7  7 12  7  7 12  7
 [26] 12  7  7  7  7 12  7  7  7 12  7 12  7 12  7  7 12  7 12  7 12  7 12  7 12
 [51] 12  7 12  7  7  7  7  7 12  7  7 12 12  7 12  7  7  7  7  7 12  7  7 12  7
 [76] 12  7  7 12  7 12  7  7  7  7  7 12  7  7 12  7  7 12  7 12  7  7 12  7 12
[101]  7  7 12 12  7
# set to a factor
as.factor(month(production_2$SERVDATE))
  [1] 7  7  7  7  7  7  7  7  7  7  12 7  7  7  7  7  7  7  7  7  12 7  7  12 7 
 [26] 12 7  7  7  7  12 7  7  7  12 7  12 7  12 7  7  12 7  12 7  12 7  12 7  12
 [51] 12 7  12 7  7  7  7  7  12 7  7  12 12 7  12 7  7  7  7  7  12 7  7  12 7 
 [76] 12 7  7  12 7  12 7  7  7  7  7  12 7  7  12 7  7  12 7  12 7  7  12 7  12
[101] 7  7  12 12 7 
Levels: 7 12

We can use the table() function to count the observations at each level of a factor. This is called a frequency table.

# table of service months
table(month(production_2$SERVDATE))

 7 12 
73 32 

Other useful date functions:

  • year() to extract the year
  • wday(label = TRUE) for the day of the week by name (weekdays() is the base R equivalent)
  • quarter() to extract the quarter (quarters() is the base R equivalent)
  • floor_date(), round_date(), and ceiling_date() to round a date down, to the nearest, or up to a specific unit
  • week() to extract the week of the year
  • yq() to parse year-quarter strings into dates
  • today() and now() to get the current date or date-time
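A few of these functions applied to a single service date from the data, as a sketch assuming lubridate is installed. The object name serv is made up, and week_start = 7 is passed explicitly so that the numbering starts on Sunday regardless of local settings.

```r
library(lubridate)

serv <- ymd("2024-07-16")

year(serv)                      # 2024
quarter(serv)                   # 3
month(serv)                     # 7
day(serv)                       # 16

# Day of the week as a number (1 = Sunday when week_start = 7)
wday(serv, week_start = 7)      # 3, i.e. a Tuesday

# Round down or up to the month
floor_date(serv, "month")       # "2024-07-01"
ceiling_date(serv, "month")     # "2024-08-01"

# Week of the year, and the current date
week(serv)
today()
```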

Many more to deal with periods, durations, and intervals!

See the cheatsheet here: https://rawgit.com/rstudio/cheatsheets/main/lubridate.pdf