Summary of ABA pedigree data

Author

Juan Steibel

Pedigree data description

ABA provided pedigree records that include each animal's registration number (regnum), sex, date of birth (DOB), number of registered progeny, and the regnums of its sire and dam. The table below has 835 records.

# A tibble: 835 × 9
         REG SEX   DOB                 PROGENY `SIRE REG` `DAM REG`
       <dbl> <chr> <dttm>                <dbl>      <dbl>     <dbl>
 1 110890004 MALE  2011-08-29 00:00:00     634  102107001  96272012
 2 112533002 MALE  2012-01-11 00:00:00       8  103308003 101072006
 3 113412001 MALE  2012-02-08 00:00:00       3  103447005 104857004
 4 113456001 MALE  2012-01-28 00:00:00    1303  109548003 103355002
 5 114007006 MALE  2012-01-23 00:00:00      13   96064001 106601003
 6 114486003 MALE  2012-05-22 00:00:00    1165  108856003  98849005
 7 115051001 MALE  2012-09-12 00:00:00      10  107289008 103905002
 8 115056001 MALE  2012-07-28 00:00:00      20  108856003 108032005
 9 115094010 MALE  2012-08-18 00:00:00      15  109562005 107631002
10 115286008 MALE  2012-07-01 00:00:00      12  109072003 104681002
# ℹ 825 more rows
# ℹ 3 more variables: `Grand Sire REG` <dbl>, `Grand Dam REG` <dbl>, yob <fct>

The file also has `Grand Sire REG` and `Grand Dam REG` columns, but it is not clear whether these are the paternal or the maternal grandparents.
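One possible check, assuming that at least some sires also appear as animals in `REG`: look up each animal's sire in the file and compare that sire's own `SIRE REG` with the animal's `Grand Sire REG`. A match rate close to 1 would suggest the grandparents are paternal. The sketch below runs on a hypothetical toy data frame with invented regnums; the helper name `paternal_match_rate` is also hypothetical.

```r
# Hypothetical toy pedigree: column names mirror the ABA file, regnums are invented
pd_toy <- data.frame(
  REG              = c(1, 2, 3, 4),
  `SIRE REG`       = c(10, 1, 1, 2),
  `Grand Sire REG` = c(20, 10, 10, 1),
  check.names = FALSE
)

# Hypothetical helper: fraction of animals whose `Grand Sire REG` equals their
# sire's own sire (the paternal grand sire). Only animals whose sire is itself
# listed in REG can be checked; n reports how many that is.
paternal_match_rate <- function(pd) {
  sire_row <- match(pd$`SIRE REG`, pd$REG)    # NA when the sire is not in REG
  known    <- !is.na(sire_row)
  pat_gs   <- pd$`SIRE REG`[sire_row[known]]  # sire of the sire
  c(n    = sum(known),
    rate = mean(pat_gs == pd$`Grand Sire REG`[known], na.rm = TRUE))
}

paternal_match_rate(pd_toy)  # n = 3, rate = 1 for this toy data
```

On the real data the same call would be `paternal_match_rate(pd)`; a low rate with a reasonable `n` would point toward maternal grandparents or inconsistent records.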

Registered progeny by year of birth

    # geom_col() is geom_bar(stat = "identity"): rows stack, so each bar is the total PROGENY for that yob
    ggplot(pd, aes(x = yob, y = PROGENY)) + geom_col()
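Because the bars are sums, the same numbers can also be printed as a table. A minimal base-R sketch, using the first three rows of the tibble above and assuming `yob` is the year of `DOB`:

```r
# First three rows of the printed tibble (assumption: yob = year of DOB)
pd_toy <- data.frame(
  yob     = factor(c("2011", "2012", "2012")),
  PROGENY = c(634, 8, 3)
)

# Total registered progeny per year of birth (rows with NA PROGENY are dropped)
totals <- aggregate(PROGENY ~ yob, data = pd_toy, FUN = sum)
totals  # 2011: 634, 2012: 11
```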

How many sires are shared by two or more animals?

    table(table(pd$`SIRE REG`)>1)

FALSE  TRUE 
  283   168 

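Each distinct sire regnum is counted once. TRUE means the sire appears in more than one row, i.e., at least two animals share that sire: 168 of the 451 sires are shared, and the remaining 283 appear only once.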
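The table above counts sires rather than animals. To count the animals that share a sire with at least one other animal, the row counts of the shared sires can be summed. A sketch on the `SIRE REG` values of the ten rows printed earlier, where 108856003 is the only repeated sire:

```r
# SIRE REG of the ten rows printed in the tibble above
sire_reg <- c(102107001, 103308003, 103447005, 109548003, 96064001,
              108856003, 107289008, 108856003, 109562005, 109072003)

sire_counts <- table(sire_reg)              # rows per sire; NA sires are excluded
shared      <- sire_counts[sire_counts > 1]

n_shared_sires    <- length(shared)         # sires with 2+ animals
n_animals_sharing <- sum(shared)            # animals sharing a sire with another animal
c(n_shared_sires, n_animals_sharing)        # 1 2
```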

Number of grand progeny per sire

    # GP = grand progeny per sire: total PROGENY of all animals listing that sire
    group_by(pd, `SIRE REG`) %>%
      summarize(GP = sum(PROGENY)) %>%
      ggplot(aes(x = GP)) +
      geom_histogram()

    group_by(pd, `SIRE REG`) %>%
      summarize(GP = sum(PROGENY)) %>%
      arrange(desc(GP))
# A tibble: 451 × 2
   `SIRE REG`    GP
        <dbl> <dbl>
 1  151050002  6462
 2  144730008  3326
 3  135219001  2270
 4  152569001  2114
 5  130514001  2018
 6  144800005  1872
 7  127678003  1788
 8  108856003  1641
 9  113576001  1445
10  148736001  1368
# ℹ 441 more rows
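Grand progeny are counted through the animals in the file: a sire's GP is the sum of `PROGENY` over every row that lists him as `SIRE REG`. A base-R sketch of the same summary on the ten rows printed earlier (so the totals are much smaller than in the full table above):

```r
# SIRE REG and PROGENY of the ten rows printed in the first tibble
pd_toy <- data.frame(
  `SIRE REG` = c(102107001, 103308003, 103447005, 109548003, 96064001,
                 108856003, 107289008, 108856003, 109562005, 109072003),
  PROGENY    = c(634, 8, 3, 1303, 13, 1165, 10, 20, 15, 12),
  check.names = FALSE
)

# Sum PROGENY within each sire, then sort from largest to smallest GP
gp <- tapply(pd_toy$PROGENY, pd_toy$`SIRE REG`, sum, na.rm = TRUE)
gp <- sort(setNames(as.vector(gp), names(gp)), decreasing = TRUE)
head(gp, 3)  # 109548003: 1303, 108856003: 1165 + 20 = 1185, 102107001: 634
```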

Proposed next steps:

  1. Select all sires of sires (the `SIRE REG` animals) with a large number of grand progeny (e.g., >1,000).

  2. Select sires with a large number of registered progeny (e.g., >50), avoiding full sibs.

  3. Build a list of 700 regnums according to these rules.

  4. Submit 500 for genotyping and save the remaining 200 for later genotyping.
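The steps above could be sketched as follows. This is a minimal sketch under several assumptions not stated in the plan: step 1 sires are placed first in the list, step 2 animals follow in decreasing order of `PROGENY`, and among full sibs (same `SIRE REG` and `DAM REG`) only the one with the most progeny is kept. The function name `select_for_genotyping`, its defaults and the toy data are hypothetical; the thresholds are the examples from the list.

```r
# Hypothetical helper: builds the genotyping list from a pedigree data frame
# with the ABA column names REG, `SIRE REG`, `DAM REG` and PROGENY.
select_for_genotyping <- function(pd, gp_min = 1000, prog_min = 50,
                                  n_total = 700, n_now = 500) {
  # Step 1: sires of sires whose grand progeny (summed PROGENY) exceed gp_min
  gp    <- tapply(pd$PROGENY, pd$`SIRE REG`, sum, na.rm = TRUE)
  step1 <- as.numeric(names(gp)[which(gp > gp_min)])

  # Step 2: animals with more than prog_min progeny, most prolific first
  cand <- pd[!is.na(pd$PROGENY) & pd$PROGENY > prog_min, ]
  cand <- cand[order(cand$PROGENY, decreasing = TRUE), ]
  # Assumption: full sibs share sire and dam; keep the first (most prolific)
  # of each pair. Animals with an unknown parent are never treated as full sibs.
  pair <- cand[, c("SIRE REG", "DAM REG")]
  cand <- cand[!(duplicated(pair) & complete.cases(pair)), ]
  step2 <- cand$REG

  # Step 3: combined list, step 1 sires first (assumed priority), capped at n_total
  picked <- head(unique(c(step1, step2)), n_total)

  # Step 4: the first n_now go to genotyping now, the rest are held in reserve
  list(genotype_now = picked[seq_along(picked) <= n_now],
       reserve      = picked[seq_along(picked) >  n_now])
}

# Toy pedigree with invented regnums: sire 100 has 900 + 200 + 40 = 1140
# grand progeny, and animals 1 and 2 are full sibs
pd_toy <- data.frame(
  REG        = c(1, 2, 3, 4, 5),
  `SIRE REG` = c(100, 100, 100, 200, 200),
  `DAM REG`  = c(50, 50, 60, 70, 71),
  PROGENY    = c(900, 200, 40, 60, 30),
  check.names = FALSE
)
res <- select_for_genotyping(pd_toy, n_total = 3, n_now = 2)
res$genotype_now  # c(100, 1)
res$reserve       # 4
```

On the real file the call would be `select_for_genotyping(pd)` with the defaults of 700 and 500. If fewer than 700 regnums pass both filters, the reserve list simply comes out shorter, so the thresholds may need tuning.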