Pedigree data description
ABA made available pedigree records that include each animal's registration number (regnum), date of birth (DOB), number of registered progeny, and the regnums of its sire and dam.
# A tibble: 835 × 9
REG SEX DOB PROGENY `SIRE REG` `DAM REG`
<dbl> <chr> <dttm> <dbl> <dbl> <dbl>
1 110890004 MALE 2011-08-29 00:00:00 634 102107001 96272012
2 112533002 MALE 2012-01-11 00:00:00 8 103308003 101072006
3 113412001 MALE 2012-02-08 00:00:00 3 103447005 104857004
4 113456001 MALE 2012-01-28 00:00:00 1303 109548003 103355002
5 114007006 MALE 2012-01-23 00:00:00 13 96064001 106601003
6 114486003 MALE 2012-05-22 00:00:00 1165 108856003 98849005
7 115051001 MALE 2012-09-12 00:00:00 10 107289008 103905002
8 115056001 MALE 2012-07-28 00:00:00 20 108856003 108032005
9 115094010 MALE 2012-08-18 00:00:00 15 109562005 107631002
10 115286008 MALE 2012-07-01 00:00:00 12 109072003 104681002
# ℹ 825 more rows
# ℹ 3 more variables: `Grand Sire REG` <dbl>, `Grand Dam REG` <dbl>, yob <fct>
There are also grand sire and grand dam columns, but I am not sure whether they refer to the paternal or the maternal grandparents.
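One way to settle the paternal/maternal question might be to use the file itself: wherever an animal's sire also appears as an animal in `REG`, the sire's own `SIRE REG` and `DAM REG` are that animal's paternal grandparents and can be compared with the grand sire and grand dam columns. The sketch below runs on a small made-up table (`pd_toy` and all of its values are hypothetical; only the column names come from the real data), and it assumes each `REG` appears only once.

```r
library(dplyr)

# Toy stand-in for `pd`: hypothetical values, real column names
pd_toy <- tibble(
  REG              = c(10, 20, 30, 40),
  `SIRE REG`       = c(1, 2, 10, 20),
  `DAM REG`        = c(5, 6, 7, 8),
  `Grand Sire REG` = c(NA, NA, 1, 2),
  `Grand Dam REG`  = c(NA, NA, 5, 6)
)

# Parents of every animal in the file, keyed by the animal's own regnum.
# When that animal is somebody's sire, these are paternal grandparents.
sire_parents <- pd_toy %>%
  transmute(sire = REG, pat_gs = `SIRE REG`, pat_gd = `DAM REG`)

# Keep only animals whose sire is also listed in REG, then count matches
gp_check <- pd_toy %>%
  inner_join(sire_parents, by = c("SIRE REG" = "sire")) %>%
  summarize(
    n_checked      = n(),
    gs_is_paternal = sum(`Grand Sire REG` == pat_gs, na.rm = TRUE),
    gd_is_paternal = sum(`Grand Dam REG` == pat_gd, na.rm = TRUE)
  )

gp_check
```

If nearly all checked animals match, the columns are most likely paternal; if almost none match, they are probably maternal, or the check needs repeating through `DAM REG`. The check is only informative if enough sires are themselves listed in the file.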
Distribution of progeny records per year
ggplot(pd, aes(x = yob, y = PROGENY)) + geom_col()
How many animals share a sire?
table(table(pd$`SIRE REG`)>1)
TRUE counts the sires that appear more than once in the file, i.e. sires shared by at least two animals; FALSE counts the sires that appear only once.
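The same count can be written with dplyr so that the result carries labelled columns instead of bare TRUE/FALSE. This is only a sketch on made-up data: `pd_toy`, `n_animals` and the summary column names are illustrative and not part of the original analysis.

```r
library(dplyr)

# Toy stand-in for `pd`: six animals by three sires (hypothetical values)
pd_toy <- tibble(
  REG        = 1:6,
  `SIRE REG` = c(101, 101, 102, 103, 103, 103)
)

sire_counts <- pd_toy %>%
  count(`SIRE REG`, name = "n_animals") %>%  # one row per sire
  summarize(
    sires_total  = n(),
    sires_shared = sum(n_animals > 1),   # corresponds to TRUE in table()
    sires_single = sum(n_animals == 1)   # corresponds to FALSE in table()
  )

sire_counts
```

Dropping the final `summarize()` leaves the per-sire counts, which can be sorted to see which sires contribute the most animals to the file.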
Number of grand progeny per sire
group_by(pd, `SIRE REG`) %>% summarize(GP = sum(PROGENY)) %>% ggplot(aes(x = GP)) + geom_bar()
group_by(pd, `SIRE REG`) %>% summarize(GP = sum(PROGENY)) %>% arrange(desc(GP))
# A tibble: 451 × 2
`SIRE REG` GP
<dbl> <dbl>
1 151050002 6462
2 144730008 3326
3 135219001 2270
4 152569001 2114
5 130514001 2018
6 144800005 1872
7 127678003 1788
8 108856003 1641
9 113576001 1445
10 148736001 1368
# ℹ 441 more rows
Proposed next steps:
Select all sires of sires with a large number of grand progeny (e.g. >1000)
Select sires with a large number of progeny (e.g. >50), but avoid full sibs
Create a list of 700 regnums according to these rules
Submit 500 for genotyping and hold back 200 for later genotyping
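The steps above could be sketched roughly as follows. Several choices here are assumptions rather than decisions taken in the text: full sibs are identified as animals sharing both `SIRE REG` and `DAM REG`, the sib with the most progeny is the one kept, sires of sires are listed ahead of the other sires, and the first 500 of the list are the ones submitted. `pd_toy` and its values are made up for illustration.

```r
library(dplyr)

# Toy stand-in for `pd` (hypothetical values, real column names)
pd_toy <- tibble(
  REG        = c(11, 12, 13, 14, 15, 16),
  PROGENY    = c(900, 300, 60, 80, 10, 700),
  `SIRE REG` = c(1, 1, 2, 2, 3, 3),
  `DAM REG`  = c(5, 6, 7, 7, 8, 9)
)

gp_cutoff      <- 1000  # example threshold for grand progeny
progeny_cutoff <- 50    # example threshold for progeny
n_total        <- 700   # length of the regnum list
n_now          <- 500   # submitted first; the rest is held back

# Rule 1: sires of sires whose sons sum to more than gp_cutoff progeny
grand_sires <- pd_toy %>%
  group_by(`SIRE REG`) %>%
  summarize(n_desc = sum(PROGENY), .groups = "drop") %>%
  filter(n_desc > gp_cutoff) %>%
  arrange(desc(n_desc)) %>%
  transmute(REG = `SIRE REG`, n_desc, rule = "sire of sires")

# Rule 2: sires above progeny_cutoff, one animal per full-sib family
sires <- pd_toy %>%
  filter(PROGENY > progeny_cutoff) %>%
  group_by(`SIRE REG`, `DAM REG`) %>%
  slice_max(PROGENY, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  arrange(desc(PROGENY)) %>%
  transmute(REG, n_desc = PROGENY, rule = "sire")

# Combine, drop regnums picked by both rules, and cap the list length
candidates <- bind_rows(grand_sires, sires) %>%
  distinct(REG, .keep_all = TRUE) %>%
  slice_head(n = n_total)

submit_now <- candidates %>% filter(row_number() <= n_now)
hold_back  <- candidates %>% filter(row_number() > n_now)
```

On the toy table this keeps regnum 1 as a sire of sires and regnums 11, 16, 12 and 14 as sires; 13 is dropped as a full sib of 14 and 15 falls below the progeny cutoff. With the real data the cutoffs would probably need tuning until the list reaches about 700 regnums.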