feature_selection

options(warn = -1)
library(dplyr)
library(lmerTest)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ forcats   1.0.0     ✔ readr     2.1.5
✔ ggplot2   3.5.1     ✔ stringr   1.5.1
✔ lubridate 1.9.3     ✔ tibble    3.2.1
✔ purrr     1.0.2     ✔ tidyr     1.3.1
library(modelr)
library(purrr)
library(emmeans)
library(gridExtra)
library(writexl)
library(gt)
library(webshot2)
library(broom.mixed)
library(ggplot2)
library(isotree)


load("Z:/Isaac/Visual Features/1-5/step2.RData")

Below are the repeatability tables for the 10, 20, 30, and 60 min windows.

These values are from the model fitted to the mean value of each feature within each window.

print(df_10min_random_vars)
# A tibble: 17 × 4
   feature                  sow  Residual repeatability
   <fct>                  <dbl>     <dbl>         <dbl>
 1 Area              10653.     5575.             0.656
 2 Centroid.X           23.3      21.1            0.525
 3 Centroid.Y           16.7      11.2            0.597
 4 Concavity             0.0310    0.0484         0.390
 5 Convex.Area       12243.     8730.             0.584
 6 Convex.Perimeter     90.1      53.8            0.626
 7 Eccentricity          0.0266    0.0506         0.345
 8 Elasticity            0.159     0.143          0.526
 9 Elongation            0.144     0.239          0.376
10 Height               22.0      27.0            0.450
11 Major.Axis.Length    34.0      22.2            0.605
12 Minor.Axis.Length    22.1      27.0            0.450
13 Perimeter           173.      197.             0.467
14 Rightmost.X          27.4      14.7            0.650
15 Rightmost.Y          23.6      46.0            0.339
16 Roundness             0.0956    0.0871         0.523
17 Width                34.0      22.2            0.605
print(df_20min_random_vars)
# A tibble: 17 × 4
   feature                  sow  Residual repeatability
   <fct>                  <dbl>     <dbl>         <dbl>
 1 Area              10606.     5288.             0.667
 2 Centroid.X           23.3      19.8            0.541
 3 Centroid.Y           16.6      10.5            0.614
 4 Concavity             0.0302    0.0448         0.403
 5 Convex.Area       12142.     8179.             0.598
 6 Convex.Perimeter     89.8      50.0            0.642
 7 Eccentricity          0.0266    0.0479         0.357
 8 Elasticity            0.159     0.134          0.542
 9 Elongation            0.143     0.224          0.390
10 Height               21.6      25.3            0.461
11 Major.Axis.Length    34.1      20.8            0.621
12 Minor.Axis.Length    21.6      25.2            0.461
13 Perimeter           170.      184.             0.480
14 Rightmost.X          27.2      13.6            0.667
15 Rightmost.Y          23.4      42.0            0.358
16 Roundness             0.0952    0.0806         0.541
17 Width                34.1      20.8            0.621
print(df_30min_random_vars)
# A tibble: 17 × 4
   feature                  sow  Residual repeatability
   <fct>                  <dbl>     <dbl>         <dbl>
 1 Area              10549.     5096.             0.674
 2 Centroid.X           23.2      19.0            0.550
 3 Centroid.Y           16.4       9.87           0.624
 4 Concavity             0.0302    0.0425         0.415
 5 Convex.Area       12109.     7788.             0.609
 6 Convex.Perimeter     89.0      47.6            0.651
 7 Eccentricity          0.0266    0.0462         0.365
 8 Elasticity            0.159     0.130          0.550
 9 Elongation            0.144     0.215          0.402
10 Height               21.7      24.0            0.475
11 Major.Axis.Length    33.6      20.0            0.627
12 Minor.Axis.Length    21.8      24.0            0.475
13 Perimeter           169.      177.             0.489
14 Rightmost.X          26.7      12.9            0.675
15 Rightmost.Y          23.1      39.6            0.369
16 Roundness             0.0950    0.0770         0.552
17 Width                33.7      20.0            0.627
print(df_60min_random_vars)
# A tibble: 17 × 4
   feature                  sow  Residual repeatability
   <fct>                  <dbl>     <dbl>         <dbl>
 1 Area              10465.     4669.             0.691
 2 Centroid.X           23.2      17.8            0.567
 3 Centroid.Y           16.5       8.48           0.660
 4 Concavity             0.0292    0.0384         0.432
 5 Convex.Area       11981.     7002.             0.631
 6 Convex.Perimeter     89.7      42.9            0.677
 7 Eccentricity          0.0261    0.0425         0.381
 8 Elasticity            0.159     0.119          0.573
 9 Elongation            0.141     0.195          0.419
10 Height               20.8      21.7            0.490
11 Major.Axis.Length    34.4      18.4            0.651
12 Minor.Axis.Length    20.8      21.7            0.490
13 Perimeter           168.      161.             0.511
14 Rightmost.X          27.5      11.7            0.701
15 Rightmost.Y          23.0      34.1            0.402
16 Roundness             0.0954    0.0701         0.576
17 Width                34.5      18.4            0.651
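
The repeatability column in these tables is consistent with the share of total variance attributed to sow, that is sow / (sow + Residual). The sketch below recomputes it in base R for three rows of the 10 min table. The data frame `vc` is a hypothetical stand-in for df_10min_random_vars, and because it is typed in from the rounded values printed above, the results can differ from the table in the third decimal.

```r
# Variance components copied from the first three rows of the
# 10 min mean-model table (rounded as printed, so results are approximate).
vc <- data.frame(
  feature  = c("Area", "Centroid.X", "Centroid.Y"),
  sow      = c(10653, 23.3, 16.7),
  Residual = c(5575, 21.1, 11.2)
)

# Repeatability: between-sow variance as a fraction of total variance.
vc$repeatability <- vc$sow / (vc$sow + vc$Residual)

print(vc)
```

A value near 1 means most of the variation lies between sows, while a value near 0 means most of it lies within a sow over time.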

Based on these values, we will continue to examine the following features for the mean model, which have the six lowest repeatabilities in every window: Rightmost.Y, Eccentricity, Elongation, Concavity, Height, Minor.Axis.Length
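
One way to reproduce this shortlist is to sort the features by repeatability and keep the lowest six. The sketch below does this for the 10 min mean-model values printed above. `rep_10` is a hypothetical named vector typed in from that table, and the cutoff of six is an assumption chosen to match the length of the list.

```r
# Repeatability per feature, copied from the 10 min mean-model table.
rep_10 <- c(
  Area = 0.656, Centroid.X = 0.525, Centroid.Y = 0.597, Concavity = 0.390,
  Convex.Area = 0.584, Convex.Perimeter = 0.626, Eccentricity = 0.345,
  Elasticity = 0.526, Elongation = 0.376, Height = 0.450,
  Major.Axis.Length = 0.605, Minor.Axis.Length = 0.450, Perimeter = 0.467,
  Rightmost.X = 0.650, Rightmost.Y = 0.339, Roundness = 0.523, Width = 0.605
)

# Sort ascending and keep the six features with the lowest repeatability.
lowest_six <- names(sort(rep_10))[1:6]
print(lowest_six)
```

Running the same ranking on each window's table is a quick check that a shortlist holds across windows.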

These values are from the model fitted to the variance (var) of each feature within each window.

print(df_10min_random_vars_var)
# A tibble: 17 × 4
   feature               sow Residual repeatability
   <fct>               <dbl>    <dbl>         <dbl>
 1 Area              1.69e+7  2.39e+7         0.414
 2 Centroid.X        3.14e+2  5.78e+2         0.352
 3 Centroid.Y        3.10e+1  6.56e+1         0.321
 4 Concavity         1.33e-3  1.49e-3         0.472
 5 Convex.Area       1.54e+7  5.00e+7         0.236
 6 Convex.Perimeter  2.39e+3  3.25e+3         0.424
 7 Eccentricity      1.17e-3  1.68e-3         0.410
 8 Elasticity        1.16e-2  1.91e-2         0.377
 9 Elongation        1.24e-2  3.20e-2         0.279
10 Height            6.15e+1  3.89e+2         0.137
11 Major.Axis.Length 6.59e+2  7.48e+2         0.468
12 Minor.Axis.Length 6.15e+1  3.89e+2         0.137
13 Perimeter         1.91e+4  3.60e+4         0.346
14 Rightmost.X       4.72e+2  5.53e+2         0.460
15 Rightmost.Y       6.99e+2  1.74e+3         0.286
16 Roundness         2.59e-3  6.64e-3         0.281
17 Width             6.64e+2  7.50e+2         0.470
print(df_20min_random_vars_var)
# A tibble: 17 × 4
   feature               sow Residual repeatability
   <fct>               <dbl>    <dbl>         <dbl>
 1 Area              1.82e+7  2.52e+7         0.419
 2 Centroid.X        3.10e+2  5.62e+2         0.356
 3 Centroid.Y        3.40e+1  7.69e+1         0.306
 4 Concavity         1.34e-3  1.58e-3         0.459
 5 Convex.Area       1.29e+7  5.26e+7         0.197
 6 Convex.Perimeter  2.61e+3  3.26e+3         0.445
 7 Eccentricity      1.20e-3  1.85e-3         0.393
 8 Elasticity        1.39e-2  1.98e-2         0.412
 9 Elongation        1.16e-2  3.43e-2         0.254
10 Height            7.92e+1  4.21e+2         0.158
11 Major.Axis.Length 7.27e+2  7.23e+2         0.501
12 Minor.Axis.Length 7.92e+1  4.21e+2         0.158
13 Perimeter         2.28e+4  3.81e+4         0.374
14 Rightmost.X       5.02e+2  5.35e+2         0.484
15 Rightmost.Y       8.45e+2  1.74e+3         0.327
16 Roundness         2.72e-3  6.75e-3         0.287
17 Width             7.31e+2  7.25e+2         0.502
print(df_30min_random_vars_var)
# A tibble: 17 × 4
   feature               sow Residual repeatability
   <fct>               <dbl>    <dbl>         <dbl>
 1 Area              1.61e+7  2.61e+7         0.381
 2 Centroid.X        3.69e+2  5.50e+2         0.401
 3 Centroid.Y        3.04e+1  8.18e+1         0.271
 4 Concavity         1.37e-3  1.60e-3         0.461
 5 Convex.Area       1.21e+7  5.42e+7         0.182
 6 Convex.Perimeter  2.59e+3  3.28e+3         0.442
 7 Eccentricity      1.36e-3  1.92e-3         0.415
 8 Elasticity        1.47e-2  1.90e-2         0.435
 9 Elongation        1.37e-2  3.48e-2         0.283
10 Height            1.01e+2  4.31e+2         0.190
11 Major.Axis.Length 7.44e+2  7.08e+2         0.512
12 Minor.Axis.Length 1.01e+2  4.31e+2         0.189
13 Perimeter         2.41e+4  3.71e+4         0.394
14 Rightmost.X       5.69e+2  5.44e+2         0.511
15 Rightmost.Y       9.27e+2  1.68e+3         0.356
16 Roundness         2.75e-3  6.52e-3         0.296
17 Width             7.50e+2  7.10e+2         0.513
print(df_60min_random_vars_var)
# A tibble: 17 × 4
   feature               sow Residual repeatability
   <fct>               <dbl>    <dbl>         <dbl>
 1 Area              1.60e+7  2.78e+7         0.366
 2 Centroid.X        3.79e+2  5.10e+2         0.427
 3 Centroid.Y        4.15e+1  8.91e+1         0.318
 4 Concavity         1.15e-3  1.53e-3         0.428
 5 Convex.Area       1.49e+7  5.59e+7         0.210
 6 Convex.Perimeter  2.82e+3  3.26e+3         0.464
 7 Eccentricity      1.56e-3  2.06e-3         0.431
 8 Elasticity        1.67e-2  1.92e-2         0.464
 9 Elongation        1.52e-2  3.46e-2         0.306
10 Height            1.42e+2  4.32e+2         0.247
11 Major.Axis.Length 8.18e+2  6.61e+2         0.553
12 Minor.Axis.Length 1.42e+2  4.32e+2         0.247
13 Perimeter         2.79e+4  3.82e+4         0.422
14 Rightmost.X       6.64e+2  5.13e+2         0.564
15 Rightmost.Y       1.07e+3  1.63e+3         0.396
16 Roundness         2.79e-3  6.06e-3         0.315
17 Width             8.25e+2  6.63e+2         0.554

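Based on these values, we will continue to examine the following features for the var model: Convex.Area, Minor.Axis.Length, Height, Centroid.Y, Elongation, Roundness. These have the six lowest repeatabilities in the 20, 30, and 60 min windows; in the 10 min window, Rightmost.Y (0.286) has a lower repeatability than Centroid.Y (0.321).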

The following are the residual plots for each feature in each time window.

aug_res_10_filt <- aug_res_10 %>%
  group_by(sow) %>%
  filter(n() > 2000) %>%
  ungroup()
aug_res_20_filt <- aug_res_20 %>%
  group_by(sow) %>%
  filter(n() > 1100) %>%
  ungroup()
aug_res_30_filt <- aug_res_30 %>%
  group_by(sow) %>%
  filter(n() > 700) %>%
  ungroup()
aug_res_60_filt <- aug_res_60 %>%
  group_by(sow) %>%
  filter(n() > 400) %>%
  ungroup()
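
The four filters above differ only in the input data and the minimum number of rows required per sow. As a sketch of that step in isolation, the helper below does the same thing in base R so it runs without dplyr. `keep_large_groups` and the toy data frame are hypothetical and not part of the analysis.

```r
# Keep only the rows whose group has more than `min_rows` rows.
# Base R equivalent of: group_by(g) %>% filter(n() > min_rows) %>% ungroup()
keep_large_groups <- function(data, group_col, min_rows) {
  # ave() returns, for every row, the size of the group that row belongs to.
  counts <- ave(seq_len(nrow(data)), data[[group_col]], FUN = length)
  data[counts > min_rows, , drop = FALSE]
}

# Toy data: sow "a" has three rows, sow "b" has one.
toy <- data.frame(
  sow    = c("a", "a", "a", "b"),
  .resid = c(0.1, -0.2, 0.3, 0.5)
)

kept <- keep_large_groups(toy, "sow", min_rows = 2)
print(kept)
```

With the real data, the thresholds would be the ones used above (2000, 1100, 700, and 400 rows for the 10, 20, 30, and 60 min windows).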

Plots of the residuals from the mean-value model, which accounts for hour of the day.

options(repr.plot.width = 16, repr.plot.height = 250)
library(ggforce)  # provides facet_wrap_paginate()

# Each page holds a 2 x 2 grid of features, so pages 1 to 5 cover all 17 features.
for (i in 1:5) {
  # Colored curves: one smooth per sow. Black curve: one smooth over all sows.
  p1 <- ggplot(aug_res_10_filt, aes(x = ttf, y = .resid, color = sow)) +
    geom_smooth(se = FALSE) +
    geom_smooth(aes(x = ttf, y = .resid), linewidth = 1.2, color = "black") +
    facet_wrap_paginate(~feature, scales = "free", ncol = 2, nrow = 2, page = i) +
    ggtitle("10 min window mean")
  print(p1)
  p2 <- ggplot(aug_res_20_filt, aes(x = ttf, y = .resid, color = sow)) +
    geom_smooth(se = FALSE) +
    geom_smooth(aes(x = ttf, y = .resid), linewidth = 1.2, color = "black") +
    facet_wrap_paginate(~feature, scales = "free", ncol = 2, nrow = 2, page = i) +
    ggtitle("20 min window mean")
  print(p2)
  p3 <- ggplot(aug_res_30_filt, aes(x = ttf, y = .resid, color = sow)) +
    geom_smooth(se = FALSE) +
    geom_smooth(aes(x = ttf, y = .resid), linewidth = 1.2, color = "black") +
    facet_wrap_paginate(~feature, scales = "free", ncol = 2, nrow = 2, page = i) +
    ggtitle("30 min window mean")
  print(p3)
  p4 <- ggplot(aug_res_60_filt, aes(x = ttf, y = .resid, color = sow)) +
    geom_smooth(se = FALSE) +
    geom_smooth(aes(x = ttf, y = .resid), linewidth = 1.2, color = "black") +
    facet_wrap_paginate(~feature, scales = "free", ncol = 2, nrow = 2, page = i) +
    ggtitle("60 min window mean")
  print(p4)
}
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

After reviewing these plots for the mean-value model, we will continue to examine the following features:

Width - all windows. Consistent, then jumps and departs from the trend.
Rightmost.X - all windows. Consistent, then departs from the trend.

aug_res_10_var_filt <- aug_res_10_var %>%
  group_by(sow) %>%
  filter(n() > 2000) %>%
  ungroup()
aug_res_20_var_filt <- aug_res_20_var %>%
  group_by(sow) %>%
  filter(n() > 1100) %>%
  ungroup()
aug_res_30_var_filt <- aug_res_30_var %>%
  group_by(sow) %>%
  filter(n() > 700) %>%
  ungroup()
aug_res_60_var_filt <- aug_res_60_var %>%
  group_by(sow) %>%
  filter(n() > 400) %>%
  ungroup()

Plots of the residuals from the var-value model, which accounts for hour of the day.

# Each page holds a 2 x 2 grid of features, so pages 1 to 5 cover all 17 features.
for (i in 1:5) {
  # Colored curves: one smooth per sow. Black curve: one smooth over all sows.
  p1 <- ggplot(aug_res_10_var_filt, aes(x = ttf, y = .resid, color = sow)) +
    geom_smooth(se = FALSE) +
    geom_smooth(aes(x = ttf, y = .resid), linewidth = 1.2, color = "black") +
    facet_wrap_paginate(~feature, scales = "free", ncol = 2, nrow = 2, page = i) +
    ggtitle("10 min window var")
  print(p1)
  p2 <- ggplot(aug_res_20_var_filt, aes(x = ttf, y = .resid, color = sow)) +
    geom_smooth(se = FALSE) +
    geom_smooth(aes(x = ttf, y = .resid), linewidth = 1.2, color = "black") +
    facet_wrap_paginate(~feature, scales = "free", ncol = 2, nrow = 2, page = i) +
    ggtitle("20 min window var")
  print(p2)
  p3 <- ggplot(aug_res_30_var_filt, aes(x = ttf, y = .resid, color = sow)) +
    geom_smooth(se = FALSE) +
    geom_smooth(aes(x = ttf, y = .resid), linewidth = 1.2, color = "black") +
    facet_wrap_paginate(~feature, scales = "free", ncol = 2, nrow = 2, page = i) +
    ggtitle("30 min window var")
  print(p3)
  p4 <- ggplot(aug_res_60_var_filt, aes(x = ttf, y = .resid, color = sow)) +
    geom_smooth(se = FALSE) +
    geom_smooth(aes(x = ttf, y = .resid), linewidth = 1.2, color = "black") +
    facet_wrap_paginate(~feature, scales = "free", ncol = 2, nrow = 2, page = i) +
    ggtitle("60 min window var")
  print(p4)
}
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

After reviewing these plots for the var-value model, we will continue to examine the following features:

Centroid.X - all windows. Very consistent, then departs from the trendline.
Height - all windows. Very consistent, then departs from the trendline.
Major.Axis.Length - all windows but 60. Very consistent, then departs from the trendline.
Rightmost.X - all windows but 60. Very consistent, then departs from the trendline.
Width - all windows but 60. Very consistent, then departs from the trendline.

All features of interest:

Mean

from repeatability table - all windows

Rightmost.Y, Eccentricity, Elongation, Concavity, Height, Minor.Axis.Length

from plots of residuals

Width - all windows. Consistent, then jumps and departs from the trend.
Rightmost.X - all windows. Consistent, then departs from the trend.

Var

from repeatability table - all windows

Convex.Area, Minor.Axis.Length, Height, Centroid.Y, Elongation, Roundness

from plots of residuals

Centroid.X - all windows. Very consistent, then departs from the trendline.
Height - all windows. Very consistent, then departs from the trendline.
Major.Axis.Length - all windows but 60. Very consistent, then departs from the trendline.
Rightmost.X - all windows but 60. Very consistent, then departs from the trendline.
Width - all windows but 60. Very consistent, then departs from the trendline.
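
To carry these lists into later steps, it may help to hold them as character vectors and combine them with set operations. The sketch below is base R; the object names (`mean_repeat`, `mean_resid`, `var_repeat`, `var_resid`) are hypothetical, and the feature names are copied from the lists above.

```r
# Features of interest, grouped by model and by the evidence that flagged them.
mean_repeat <- c("Rightmost.Y", "Eccentricity", "Elongation", "Concavity",
                 "Height", "Minor.Axis.Length")
mean_resid  <- c("Width", "Rightmost.X")
var_repeat  <- c("Convex.Area", "Minor.Axis.Length", "Height", "Centroid.Y",
                 "Elongation", "Roundness")
var_resid   <- c("Centroid.X", "Height", "Major.Axis.Length", "Rightmost.X",
                 "Width")

mean_features <- union(mean_repeat, mean_resid)
var_features  <- union(var_repeat, var_resid)

# Every feature flagged at least once, and those flagged for both models.
all_features  <- sort(union(mean_features, var_features))
both_models   <- sort(intersect(mean_features, var_features))

print(all_features)
print(both_models)
```

The intersection is one way to see which features were flagged for both the mean and the var model.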