Summary of pork quality populations

Phenotype files

Meat quality phenotypes and relevant covariates were obtained from three populations described in Bernal Rubio et al. 2015.

rm(list = ls())
load("phenotypes.Rdata")
ls()

[1] "y_comm" "y_marc" "y_msu"

eapply(dim,env=.GlobalEnv)

$y_comm
[1] 1920   14

$y_marc
[1] 1237   13

$y_msu
[1] 948  20

eapply(head,env=.GlobalEnv)

$y_comm
  animal   cg   ph    ssf purge  imf moist cookl plantL  plantA  plantB L_M1_M2
1      1 7003 5.68 10.925    NA 1.26    NA    NA  46.50 -2.2747 20.6245      NA
2      2 7003 5.52 11.530    NA 1.45    NA    NA  50.66 -2.4326 21.0791      NA
3      3 7005 5.48 13.090    NA 4.03    NA    NA  51.12 -0.0843 18.3517      NA
4      4 7003 6.00 16.790    NA 2.98    NA    NA  44.16  0.8251 15.0149      NA
5      5 7005 5.50 12.750    NA 1.17    NA    NA  52.21 -1.5025 20.2692      NA
6      6 7003 5.61 12.215    NA 1.90    NA    NA  48.27 -2.3384 17.9502      NA
  A_M1_M2 B_M1_M2
1      NA      NA
2      NA      NA
3      NA      NA
4      NA      NA
5      NA      NA
6      NA      NA

$y_marc
     animal sex age cg plant ttloss   ssf  imf  ph purge color_l color_a
1 200521304   2 189  4     2  21.47 12.61 2.09 5.9  2.29      NA      NA
2 200521402   2 201  7     2  24.89 12.97 2.20 5.6  3.33      NA      NA
3 200521405   2 224 11     2  23.24 15.39 2.61 5.8  3.90      NA      NA
4 200521502   2 207  8     2  24.47 13.55 2.27 5.9  1.63      NA      NA
5 200521504   2 207  8     2  21.96 12.79 1.90 5.9  3.14      NA      NA
6 200521601   2 215 10     2  25.86 15.12 2.33 5.8  1.94      NA      NA
  color_b
1      NA
2      NA
3      NA
4      NA
5      NA
6      NA

$y_msu
  animal sex age_slg cgsl car_wt  wbs  fat ph24 dripl  cieL  cieA cieB avgCl_Cb
1 991001   1     182    3  97.51 2.35 3.33 5.54  0.63 56.04 18.35 7.95   31.995
2 991002   2     161    1  78.00 3.28 3.79 5.50  1.36    NA    NA   NA       NA
3 991003   1     161    1  99.77 2.44 1.90 5.40  1.61    NA    NA   NA       NA
4 991004   2     161    1  86.62 3.29 1.28 5.33  1.72    NA    NA   NA       NA
5 991006   2     179    3  60.77 3.06 2.64 5.52  0.80 53.36 18.25 6.89   30.125
6 991007   1     161    1  83.45 2.61 2.02 5.47  3.28    NA    NA   NA       NA
  cook_yi cookl ph45m  bf10    NA    NA    NA
1   81.49 18.51  6.36 33.02 40.64 33.02 25.40
2   77.93 22.07  6.32 21.59 46.99 30.48 20.32
3   78.90 21.10  7.03 27.94 49.53 36.83 29.21
4   76.26 23.74  6.16 15.24 33.02 21.59 12.70
5   76.81 23.19  6.19 22.86 33.02 17.78 12.70
6   75.75 24.25  6.29 21.59 40.64 27.94 20.32

A good variable to analyze first is ultimate PH (named Ph or Ph24, ). Covariates are contemporary slaughter group (cg or cgsl) and age at slaughter (age or age_sgl) if available.

library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0      ✔ purrr   1.0.0 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.5.0 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

library(lme4)

Loading required package: Matrix

Attaching package: 'Matrix'

The following objects are masked from 'package:tidyr':

    expand, pack, unpack

library(lmerTest)


Attaching package: 'lmerTest'

The following object is masked from 'package:lme4':

    lmer

The following object is masked from 'package:stats':

    step

lmer(ph~(1|cg),data=y_comm)%>%summary()

Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: ph ~ (1 | cg)
   Data: y_comm

REML criterion at convergence: -1666.1

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.8733 -0.6615 -0.2050  0.4389  5.5344 

Random effects:
 Groups   Name        Variance Std.Dev.
 cg       (Intercept) 0.004388 0.06624 
 Residual             0.023512 0.15333 
Number of obs: 1885, groups:  cg, 18

Fixed effects:
            Estimate Std. Error       df t value Pr(>|t|)    
(Intercept)  5.66223    0.01667 16.71957   339.6   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

lmer(ph24~sex+age_slg+(1|cgsl),data=y_msu)%>%summary()

Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: ph24 ~ sex + age_slg + (1 | cgsl)
   Data: y_msu

REML criterion at convergence: -1150.3

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.6408 -0.6187 -0.0769  0.4412  5.6483 

Random effects:
 Groups   Name        Variance Std.Dev.
 cgsl     (Intercept) 0.004568 0.06759 
 Residual             0.015304 0.12371 
Number of obs: 927, groups:  cgsl, 32

Fixed effects:
              Estimate Std. Error         df t value Pr(>|t|)    
(Intercept)   5.732340   0.202127  41.944695  28.360   <2e-16 ***
sex          -0.019847   0.008472 899.102873  -2.343   0.0194 *  
age_slg      -0.001097   0.001220  42.281713  -0.899   0.3738    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
        (Intr) sex   
sex     -0.009       
age_slg -0.996 -0.053

lmer(ph~sex+age+(1|cg),data=y_marc)%>%summary()

fixed-effect model matrix is rank deficient so dropping 1 column / coefficient

Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: ph ~ sex + age + (1 | cg)
   Data: y_marc

REML criterion at convergence: -388.7

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.2042 -0.6168 -0.1372  0.4896  7.3658 

Random effects:
 Groups   Name        Variance Std.Dev.
 cg       (Intercept) 0.002818 0.05309 
 Residual             0.025680 0.16025 
Number of obs: 531, groups:  cg, 29

Fixed effects:
              Estimate Std. Error         df t value Pr(>|t|)    
(Intercept)  5.8704737  0.1458112 56.8693925  40.261   <2e-16 ***
age         -0.0002926  0.0007394 57.3334417  -0.396    0.694    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
    (Intr)
age -0.997
fit warnings:
fixed-effect model matrix is rank deficient so dropping 1 column / coefficient

load("metadata.Rdata")
table(popid)

popid
   COMM    MARC     MSU unknown 
   1892    1234     922       6

head(map)

            chr    pos
MARC0044150   1 286933
ASGA0000014   1 342481
ASGA0000021   1 489855
ALGA0000009   1 538161
ALGA0000014   1 565627
H3GA0000032   1 573088

Metadata

SNP names (in the order of genotype matrix), animal ID (in the order of genotype matrix), population (in the order of animals in genotype matrix) and SNP map are stored in R objects saved in the metadata.Rdata file.

load("metadata.Rdata")
table(popid)

popid
   COMM    MARC     MSU unknown 
   1892    1234     922       6

head(map)

            chr    pos
MARC0044150   1 286933
ASGA0000014   1 342481
ASGA0000021   1 489855
ALGA0000009   1 538161
ALGA0000014   1 565627
H3GA0000032   1 573088

dim(map)

[1] 37069     2

table(map$chr)


   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
4409 2545 1867 2648 1611 2370 2283 1901 2293 1368 1335 1217 2905 2849 2004 1416 
  17   18 
1211  837

Genotype data

Genotype data is provided in loingeno.txt (SNP in rows and animals in columns), column ID match animal ID column in phenotype files. Note: some genotyped animals have missing phenotypes and viceversa.

library(data.table)


Attaching package: 'data.table'

The following objects are masked from 'package:dplyr':

    between, first, last

The following object is masked from 'package:purrr':

    transpose

library(ggplot2)
library(smartsnp)
geno<-fread("loingeno.txt")

dim(geno)

[1] 37069  4054

pc<-smart_pca("loingeno.txt",sample_group = popid,missing_impute = "mean",pc_axes = 3)

Checking argument options selected...

Argument options are correct...

Loading data...

Imported 37069 SNP by 4054 sample genotype matrix

Time elapsed: 0h 0m 22s

Filtering data...

No samples projected after PCA computation

37069 SNPs included in PCA computation

1 SNPs omitted from PCA computation

4054 samples included in PCA computation

Completed data filtering

Time elapsed: 0h 0m 32s

Scanning for invariant SNPs...

Scan complete: no invariant SNPs found

Time elapsed: 0h 0m 40s

Checking for missing values...

Scan completed: no missing values found

Time elapsed: 0h 0m 41s

Scaling values by SNP...

Centering and scaling by drift dispersion...

Completed scaling using drift

Time elapsed: 0h 0m 42s

Computing singular value decomposition using RSpectra...

Completed singular value decomposition using RSpectra

Time elapsed: 0h 0m 56s

Extracting eigenvalues and eigenvectors...

Eigenvalues and eigenvectors extracted

Time elapsed: 0h 0m 56s

Tabulating PCA outputs...

Completed tabulations of PCA outputs...

ggplot(pc$pca.sample_coordinates,aes(x=PC1,y=PC2,color=Group))+
  geom_point(size = 0.5)

ggplot(pc$pca.sample_coordinates,aes(x=PC1,y=PC3,color=Group))+
  geom_point(size = 0.5)

ggplot(pc$pca.sample_coordinates,aes(x=PC2,y=PC3,color=Group))+
  geom_point(size = 0.5)

Next steps

Next steps will be:

Building GRMs
Fitting GBLUP models to each dataset separately
Fitting GBLUP to all data (discuss different scenarios before doing this)
1. Homogeneous variances and dense GRM
2. Homogeneous variances and block-diagnola GRM by population
3. Heterogeneous variances and block diagonal GRM