Update August 2023

Quarto

Quarto enables you to weave together content and executable code into a finished document. To learn more about Quarto see https://quarto.org.

Running Code

When you click the Render button a document will be generated that includes both content and the output of embedded code. You can embed code like this:

library(lme4)

Loading required package: Matrix

library(lmerTest)


Attaching package: 'lmerTest'

The following object is masked from 'package:lme4':

    lmer

The following object is masked from 'package:stats':

    step

library(emmeans)
library(car)

Loading required package: carData

library(RVAideMemoire)

*** Package RVAideMemoire v 0.9-81 ***


Attaching package: 'RVAideMemoire'

The following object is masked from 'package:lme4':

    dummy

library(DHARMa)

This is DHARMa 0.4.4. For overview type '?DHARMa'. For recent changes, type news(package = 'DHARMa')

library(MuMIn)
library(tidyverse)

Warning: package 'tidyverse' was built under R version 4.1.3

Warning: package 'ggplot2' was built under R version 4.1.3

Warning: package 'tibble' was built under R version 4.1.3

Warning: package 'tidyr' was built under R version 4.1.3

Warning: package 'readr' was built under R version 4.1.3

Warning: package 'purrr' was built under R version 4.1.3

Warning: package 'dplyr' was built under R version 4.1.3

Warning: package 'stringr' was built under R version 4.1.3

Warning: package 'forcats' was built under R version 4.1.3

Warning: package 'lubridate' was built under R version 4.1.3

-- Attaching core tidyverse packages ------------------------ tidyverse 2.0.0 --
v dplyr     1.1.2     v readr     2.1.4
v forcats   1.0.0     v stringr   1.5.0
v ggplot2   3.4.2     v tibble    3.2.1
v lubridate 1.9.2     v tidyr     1.3.0
v purrr     1.0.1

-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x tidyr::expand() masks Matrix::expand()
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
x tidyr::pack()   masks Matrix::pack()
x dplyr::recode() masks car::recode()
x purrr::some()   masks car::some()
x tidyr::unpack() masks Matrix::unpack()
i Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(glmmTMB)

Warning in checkMatrixPackageVersion(): Package version inconsistency detected.
TMB was built with Matrix version 1.4.0
Current Matrix version is 1.3.4
Please re-install 'TMB' from source using install.packages('TMB', type = 'source') or ask CRAN for a binary version of 'TMB' matching CRAN's 'Matrix' package

library(corrplot)

corrplot 0.92 loaded

library(RColorBrewer)
library(ggplot2)
library(MASS)


Attaching package: 'MASS'

The following object is masked from 'package:dplyr':

    select

library(agricolae)

Warning: package 'agricolae' was built under R version 4.1.3

library(vegan)

Warning: package 'vegan' was built under R version 4.1.3

Loading required package: permute
Loading required package: lattice

Warning: package 'lattice' was built under R version 4.1.3

Registered S3 methods overwritten by 'vegan':
  method      from
  plot.rda    klaR
  predict.rda klaR
  print.rda   klaR
This is vegan 2.6-4

data <- read.csv("PIC_65_FIRE.AN.csv", header = T, na.strings = "N/A")
head(data)

        ID LINE     SIRE      DAM   LITTER   PEN FARM       ENTRY_TIME
1 96251326   65 91032775 92185339 78043216 B0113  774 10/27/2022 12:51
2 96251327   65 91032775 92185339 78043216 B0113  774 10/27/2022 12:00
3 96251327   65 91032775 92185339 78043216 B0113  774 10/27/2022 13:03
4 96284921   65 90218081 90208620 78064883 B0113  774  10/27/2022 7:27
5 96284921   65 90218081 90208620 78064883 B0113  774  10/27/2022 8:04
6 96284921   65 90218081 90208620 78064883 B0113  774  10/27/2022 8:41
         EXIT_TIME STAY_IN FEED_INTK ENTRY_WT EXIT_WT FEEDER_NO START_DAY
1 10/27/2022 13:02     649       265     1056     791         5 27-Oct-22
2 10/27/2022 12:32    1948       514     1086     572         5 27-Oct-22
3 10/27/2022 13:06     169        57      791     734         5 27-Oct-22
4  10/27/2022 7:37     627        85      473     388         5 27-Oct-22
5  10/27/2022 8:17     768       196      613     417         5 27-Oct-22
6  10/27/2022 8:48     413        88      691     603         5 27-Oct-22
  OFFTEST_DAY
1    2-Jan-23
2    2-Jan-23
3    2-Jan-23
4    2-Jan-23
5    2-Jan-23
6    2-Jan-23

summary(data)

       ID                LINE         SIRE               DAM          
 Min.   :96251326   Min.   :65   Min.   :87591163   Min.   :86315923  
 1st Qu.:96601065   1st Qu.:65   1st Qu.:89122431   1st Qu.:90888369  
 Median :97257792   Median :65   Median :90896190   Median :92297569  
 Mean   :97267678   Mean   :65   Mean   :90604058   Mean   :91786593  
 3rd Qu.:97887849   3rd Qu.:65   3rd Qu.:92013824   3rd Qu.:93218718  
 Max.   :98368540   Max.   :65   Max.   :93693188   Max.   :94232962  
     LITTER             PEN                 FARM      ENTRY_TIME       
 Min.   :78043165   Length:114263      Min.   :774   Length:114263     
 1st Qu.:78261882   Class :character   1st Qu.:774   Class :character  
 Median :78637026   Mode  :character   Median :774   Mode  :character  
 Mean   :78609475                      Mean   :774                     
 3rd Qu.:78945651                      3rd Qu.:774                     
 Max.   :79177365                      Max.   :774                     
  EXIT_TIME            STAY_IN        FEED_INTK          ENTRY_WT   
 Length:114263      Min.   :    5   Min.   :-3543.0   Min.   :-913  
 Class :character   1st Qu.:  576   1st Qu.:  245.0   1st Qu.: 817  
 Mode  :character   Median : 1315   Median :  593.0   Median :1117  
                    Mean   : 1402   Mean   :  638.9   Mean   :1167  
                    3rd Qu.: 2038   3rd Qu.:  947.0   3rd Qu.:1480  
                    Max.   :10169   Max.   : 9102.0   Max.   :9090  
    EXIT_WT         FEEDER_NO      START_DAY         OFFTEST_DAY       
 Min.   :-913.0   Min.   : 4.00   Length:114263      Length:114263     
 1st Qu.: 414.0   1st Qu.:13.00   Class :character   Class :character  
 Median : 516.0   Median :31.00   Mode  :character   Mode  :character  
 Mean   : 528.1   Mean   :28.65                                        
 3rd Qu.: 635.0   3rd Qu.:46.00                                        
 Max.   :3917.0   Max.   :64.00

# Assuming your dataset is named "data"
data$PEN <- as.factor(data$PEN)
data$START <- as.factor(data$START_DAY)

# Create the new variable PEN_START by pasting PEN and START together
data$PEN_START <- paste(data$PEN, data$START_DAY, sep = "_")

# Perform linear regression
FEED_INTK_model <- lm(FEED_INTK ~ PEN_START, data = data)

# Print the summary of the regression model
summary(FEED_INTK_model)


Call:
lm(formula = FEED_INTK ~ PEN_START, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-4218.5  -387.9   -40.2   310.1  8337.3 

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)    
(Intercept)               683.190      8.749  78.091  < 2e-16 ***
PEN_STARTB0111_27-Oct-22  -17.811     11.901  -1.497 0.134521    
PEN_STARTB0111_5-Jan-23    11.917     12.460   0.956 0.338867    
PEN_STARTB0113_16-Mar-23 -148.118     11.698 -12.662  < 2e-16 ***
PEN_STARTB0113_27-Oct-22  -13.105     12.152  -1.078 0.280865    
PEN_STARTB0113_5-Jan-23   -35.557     12.048  -2.951 0.003165 ** 
PEN_STARTB0211_12-Jan-23  -11.215     12.781  -0.877 0.380229    
PEN_STARTB0211_23-Mar-23 -111.176     12.091  -9.195  < 2e-16 ***
PEN_STARTB0211_3-Nov-22   -81.436     11.911  -6.837 8.11e-12 ***
PEN_STARTB0213_12-Jan-23  -43.755     12.098  -3.617 0.000299 ***
PEN_STARTB0213_23-Mar-23  -27.313     12.713  -2.148 0.031684 *  
PEN_STARTB0213_3-Nov-22    64.286     12.340   5.210 1.89e-07 ***
PEN_STARTB0311_10-Nov-22   27.153     12.576   2.159 0.030840 *  
PEN_STARTB0311_19-Jan-23   81.474     12.442   6.548 5.84e-11 ***
PEN_STARTB0311_29-Mar-23  -94.306     12.312  -7.660 1.88e-14 ***
PEN_STARTB0313_10-Nov-22  103.927     12.165   8.543  < 2e-16 ***
PEN_STARTB0313_19-Jan-23   20.497     12.523   1.637 0.101667    
PEN_STARTB0313_29-Mar-23    9.299     12.709   0.732 0.464387    
PEN_STARTB0411_17-Nov-22 -101.142     11.549  -8.757  < 2e-16 ***
PEN_STARTB0411_26-Jan-23 -150.333     11.663 -12.889  < 2e-16 ***
PEN_STARTB0411_5-Apr-23   -49.017     12.701  -3.859 0.000114 ***
PEN_STARTB0413_17-Nov-22  -90.186     12.009  -7.510 5.97e-14 ***
PEN_STARTB0413_26-Jan-23 -114.832     11.744  -9.778  < 2e-16 ***
PEN_STARTB0413_5-Apr-23  -101.393     12.141  -8.351  < 2e-16 ***
PEN_STARTB0502_13-Apr-23  -76.899     12.200  -6.303 2.93e-10 ***
PEN_STARTB0506_2-Feb-23   -80.731     12.707  -6.353 2.12e-10 ***
PEN_STARTB0506_23-Nov-22  114.995     13.344   8.617  < 2e-16 ***
PEN_STARTB0602_20-Apr-23  -51.492     12.947  -3.977 6.98e-05 ***
PEN_STARTB0606_1-Dec-22  -151.245     11.746 -12.876  < 2e-16 ***
PEN_STARTB0606_9-Feb-23  -124.174     11.688 -10.624  < 2e-16 ***
PEN_STARTB0706_16-Feb-23  -40.273     12.069  -3.337 0.000847 ***
PEN_STARTB0706_8-Dec-22   -91.277     11.712  -7.793 6.59e-15 ***
PEN_STARTB0906_2-Mar-23   -18.866     12.133  -1.555 0.119959    
PEN_STARTB0906_22-Dec-22   -7.728     12.702  -0.608 0.542917    
PEN_STARTB1006_29-Dec-22   57.441     12.530   4.584 4.56e-06 ***
PEN_STARTB1006_9-Mar-23   -27.339     12.341  -2.215 0.026738 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 479.7 on 114227 degrees of freedom
Multiple R-squared:  0.0205,    Adjusted R-squared:  0.0202 
F-statistic: 68.29 on 35 and 114227 DF,  p-value: < 2.2e-16

# Assuming your dataset is named "data"
data$TIME <- paste(data$PEN, data$START_DAY, data$OFFTEST_DAY, sep = "_")


library(dplyr)

# Assuming your dataset is named "data"
data_summary <- data %>%
  group_by(TIME) %>%
  summarize(
    Mean_FEED_INTK = mean(FEED_INTK),
    Median_FEED_INTK = median(FEED_INTK),
    SD_FEED_INTK = sd(FEED_INTK),
    Num_Animals = n(),
    Num_Unique_PEN = n_distinct(PEN)
  )

print(data_summary)

# A tibble: 36 x 6
   TIME  Mean_FEED_INTK Median_FEED_INTK SD_FEED_INTK Num_Animals Num_Unique_PEN
   <chr>          <dbl>            <dbl>        <dbl>       <int>          <int>
 1 B011~           683.             670          420.        3006              1
 2 B011~           665.             600.         490.        3534              1
 3 B011~           695.             680          443.        2923              1
 4 B011~           535.             521          361.        3815              1
 5 B011~           670.             646.         428.        3234              1
 6 B011~           648.             606          459.        3353              1
 7 B021~           672.             614          520.        2650              1
 8 B021~           572.             533          420.        3303              1
 9 B021~           602.             546.         465.        3522              1
10 B021~           639.             605          460.        3295              1
# i 26 more rows

data_summary

# A tibble: 36 x 6
   TIME  Mean_FEED_INTK Median_FEED_INTK SD_FEED_INTK Num_Animals Num_Unique_PEN
   <chr>          <dbl>            <dbl>        <dbl>       <int>          <int>
 1 B011~           683.             670          420.        3006              1
 2 B011~           665.             600.         490.        3534              1
 3 B011~           695.             680          443.        2923              1
 4 B011~           535.             521          361.        3815              1
 5 B011~           670.             646.         428.        3234              1
 6 B011~           648.             606          459.        3353              1
 7 B021~           672.             614          520.        2650              1
 8 B021~           572.             533          420.        3303              1
 9 B021~           602.             546.         465.        3522              1
10 B021~           639.             605          460.        3295              1
# i 26 more rows

#######################TRYING DIFFERENT APPROACH##########################
library(dplyr)
library(lubridate)

# Convert date columns to Date format
data$START_DAY <- ymd(data$START_DAY)       # Assuming 'ymd' is the correct format, adjust if needed
data$OFFTEST_DAY <- ymd(data$OFFTEST_DAY)

# Define the date ranges
date_range1 <- c("2022-10-27", "2023-01-02")
date_range2 <- c("2023-01-05", "2023-03-15")
date_range3 <- c("2023-03-16", "2023-05-15")

# Convert date ranges to Date objects
date_range1 <- ymd(date_range1)
date_range2 <- ymd(date_range2)
date_range3 <- ymd(date_range3)

# Subset the data based on date ranges
data_subset1 <- data %>%
  filter(START_DAY >= date_range1[1] & OFFTEST_DAY <= date_range1[2])

data_subset2 <- data %>%
  filter(START_DAY >= date_range2[1] & OFFTEST_DAY <= date_range2[2])

data_subset3 <- data %>%
  filter(START_DAY >= date_range3[1] & OFFTEST_DAY <= date_range3[2])
head(data)

        ID LINE     SIRE      DAM   LITTER   PEN FARM       ENTRY_TIME
1 96251326   65 91032775 92185339 78043216 B0113  774 10/27/2022 12:51
2 96251327   65 91032775 92185339 78043216 B0113  774 10/27/2022 12:00
3 96251327   65 91032775 92185339 78043216 B0113  774 10/27/2022 13:03
4 96284921   65 90218081 90208620 78064883 B0113  774  10/27/2022 7:27
5 96284921   65 90218081 90208620 78064883 B0113  774  10/27/2022 8:04
6 96284921   65 90218081 90208620 78064883 B0113  774  10/27/2022 8:41
         EXIT_TIME STAY_IN FEED_INTK ENTRY_WT EXIT_WT FEEDER_NO  START_DAY
1 10/27/2022 13:02     649       265     1056     791         5 2027-10-22
2 10/27/2022 12:32    1948       514     1086     572         5 2027-10-22
3 10/27/2022 13:06     169        57      791     734         5 2027-10-22
4  10/27/2022 7:37     627        85      473     388         5 2027-10-22
5  10/27/2022 8:17     768       196      613     417         5 2027-10-22
6  10/27/2022 8:48     413        88      691     603         5 2027-10-22
  OFFTEST_DAY     START       PEN_START                     TIME
1  2002-01-23 27-Oct-22 B0113_27-Oct-22 B0113_27-Oct-22_2-Jan-23
2  2002-01-23 27-Oct-22 B0113_27-Oct-22 B0113_27-Oct-22_2-Jan-23
3  2002-01-23 27-Oct-22 B0113_27-Oct-22 B0113_27-Oct-22_2-Jan-23
4  2002-01-23 27-Oct-22 B0113_27-Oct-22 B0113_27-Oct-22_2-Jan-23
5  2002-01-23 27-Oct-22 B0113_27-Oct-22 B0113_27-Oct-22_2-Jan-23
6  2002-01-23 27-Oct-22 B0113_27-Oct-22 B0113_27-Oct-22_2-Jan-23

# Create linear regression models for each subset
model1 <- lm(FEED_INTK ~ TIME, data = data_subset1)
model2 <- lm(FEED_INTK ~ TIME, data = data_subset2)
model3 <- lm(FEED_INTK ~ TIME, data = data_subset3)

# ... rest of the code for plotting ...

library(ggplot2)

# Plot for date range 1
ggplot(data_subset1, aes(x = TIME, y = FEED_INTK)) +
  geom_boxplot() +
  labs(title = "Boxplots of FEED_INTK for Date Range 1",
       x = "Social Group", y = "FEED_INTK")

# Plot for date range 2
ggplot(data_subset2, aes(x = TIME, y = FEED_INTK)) +
  geom_boxplot() +
  labs(title = "Boxplots of FEED_INTK for Date Range 2",
       x = "Social Group", y = "FEED_INTK")

# Plot for date range 3
ggplot(data_subset3, aes(x = TIME, y = FEED_INTK)) +
  geom_boxplot() +
  labs(title = "Boxplots of FEED_INTK for Date Range 3",
       x = "Social Group", y = "FEED_INTK")

library(ggplot2)

# Convert PEN column to numeric
#data$PEN <- as.numeric(data$PEN)

# Categorize PEN as a factor with 15 groups
data$PEN <- as.factor(data$PEN)
data$ID <- as.factor(data$ID)

# Plotting boxplots
ggplot(data = data, aes(x= PEN, y= FEED_INTK))+
  geom_boxplot(outlier.shape = 1)+
  theme(axis.text.x = element_text(hjust = 1))+
  labs(title = "Boxplot of FEED_INTK by PEN")+
  labs(y= "FEED_INTAKE")+
  labs(x="PEN")+
  theme(legend.position = "none")

ggplot(data = data, aes(x= PEN, y= STAY_IN))+
  geom_boxplot(outlier.shape = 1)+
  theme(axis.text.x = element_text(hjust = 1))+
  labs(title = "Boxplot of STAY_IN by PEN")+
  labs(y= "STAY_IN")+
  labs(x="PEN")+
  theme(legend.position = "none")

# Plot scatter plot with smooth curves for FEED_INTK and STAY_IN
library(ggplot2)

ggplot(data, aes(y = STAY_IN, x = FEED_INTK)) +
  geom_point() +
  facet_wrap(~PEN)+
  geom_smooth(method = "gam", se = FALSE, color = "blue") +
  labs(title = "Scatter Plot with Smooth Curves",
       x = "STAY_IN", y = "FEED_INTK") +
  theme_minimal()

`geom_smooth()` using formula = 'y ~ s(x, bs = "cs")'

library(ggplot2)

filter(data, PEN== "B0502") %>%
ggplot(aes(y = STAY_IN, x = FEED_INTK, color=ID)) +
  geom_point() +
  facet_wrap(~PEN)+
  geom_smooth(method = "gam", se = TRUE) +
  labs(title = "Scatter Plot with Smooth Curves",
       x = "STAY_IN", y = "FEED_INTK") +
  theme_minimal()

`geom_smooth()` using formula = 'y ~ s(x, bs = "cs")'

#I will break down the output from the linear regression model summary:

#1. Residual Standard Error: This is the estimated standard deviation of the residuals, which are the differences between the observed values and the predicted values from the regression model. In this case, the residual standard error is approximately 479.7.

#2. Degrees of Freedom (DF): The first value after the residual standard error (114227) represents the total degrees of freedom, which is the number of data points minus the number of estimated coefficients in the model.

#3. Multiple R-squared: This is a measure of how well the independent variables (PEN_START) explain the variation in the dependent variable (FEED_INTK). It ranges from 0 to 1, with 1 indicating that the model explains all the variation, and 0 indicating that the model explains none of the variation. In this case, the multiple R-squared is approximately 0.0205, suggesting that the model explains only about 2.05% of the variation in FEED_INTK.

#4. Adjusted R-squared: This is a modified version of the R-squared that takes into account the number of independent variables in the model and adjusts for the degrees of freedom. It penalizes the model for including additional variables that may not contribute significantly to the explanation of the dependent variable. In this case, the adjusted R-squared is approximately 0.0202.

#5. F-statistic: This is a test statistic that measures whether the overall regression model is statistically significant. It compares the variation explained by the model to the variation that cannot be explained by the model. In this case, the F-statistic is 68.29, and it has 35 and 114227 degrees of freedom for the numerator and denominator, respectively.

#6. p-value: This is the probability of obtaining an F-statistic as extreme as the one observed, assuming that the null hypothesis is true (i.e., the model has no explanatory power). In this case, the p-value is less than 2.2e-16, which is very close to zero. This extremely small p-value indicates that the regression model is highly statistically significant, suggesting that at least some of the independent variables (PEN_START) have a significant effect on the dependent variable (FEED_INTK).

#In summary, the linear regression model explains a small portion of the variation in the dependent variable (FEED_INTK), but it is statistically significant, indicating that at least some of the independent variables (PEN_START) are significantly associated with FEED_INTK. However, the overall explanatory power of the model is relatively low (R-squared is low), suggesting that other factors not included in the model may also influence FEED_INTK.

You can add options to executable code like this

[1] 4

The echo: false option disables the printing of code (only output is displayed).