Steps 1–4 are complete and documented in this report. The data are now structured, cleaned, and characterised — ready for variance component estimation (Step 5) using BLUPF90.
The ADGG/TDGG projects (Bill & Melinda Gates Foundation / ILRI) aim to accelerate genetic gains in smallholder dairy systems in East Africa. Kenyan farms predominantly keep crossbred cows (local × exotic) that balance tropical adaptation with improved milk potential.
Accurate genetic evaluations require a well-characterised phenotypic foundation. This report documents the complete pipeline from raw field data to analysis-ready trait datasets for four key traits:
| Abbreviation | Full name | Unit |
|---|---|---|
| AFC | Age at First Calving | months |
| DMY | Daily Milk Yield (test-day) | kg per day |
| TLY | Total Lactation Yield | kg per lactation |
| CI | Calving Interval | days |
data_path <- "C:/Users/SAlemu/OneDrive - CGIAR/Documents/KenyaData/New_Kenya_data/Version2/"
output_path <- paste0(data_path, "BLUPF90CL/")
if (!dir.exists(output_path)) dir.create(output_path, recursive = TRUE)
cat("Data path :", data_path, "\nOutput path:", output_path, "\n")Data path : C:/Users/SAlemu/OneDrive - CGIAR/Documents/KenyaData/New_Kenya_data/Version2/
Output path: C:/Users/SAlemu/OneDrive - CGIAR/Documents/KenyaData/New_Kenya_data/Version2/BLUPF90CL/
pedigree <- fread(paste0(data_path, "pedigree-extract.csv"),
na.strings = c("NULL","null","NA","N/A",""," "), stringsAsFactors = FALSE)
lactation <- fread(paste0(data_path, "lactation_calving_improved(in).csv"),
na.strings = c("NULL","null","NA","N/A",""," "), stringsAsFactors = FALSE)
milking <- fread(paste0(data_path, "milking_data_improved.csv"),
na.strings = c("NULL","null","NA","N/A",""," "), stringsAsFactors = FALSE)
cat("Records loaded:\n")Records loaded:
Pedigree : 248,208 rows, 29 columns
Lactation : 88,352 rows, 34 columns
Milking : 1,048,575 rows, 36 columns
head(pedigree, 8) %>%
select(animal_id, main_breed, region, district, farm_id, birthdate, sex) %>%
kable(caption = "Sample pedigree records") %>%
kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = FALSE)| animal_id | main_breed | region | district | farm_id | birthdate | sex |
|---|---|---|---|---|---|---|
| 170285 | 3 | Migori | Kuria West | 515160 | 7/27/2016 | FEMALE |
| 170298 | 3 | Kakamega | Likuyani | 515626 | 6/15/2016 | FEMALE |
| 170315 | 3 | Kakamega | Lugari | 598316 | 5/18/2018 | FEMALE |
| 170328 | 3 | Kakamega | Likuyani | 595412 | 05/08/2018 | FEMALE |
| 170506 | 26 | Homa Bay | Ndhiwa | 514950 | 05/04/2017 | FEMALE |
| 170513 | 23 | Homa Bay | Ndhiwa | 515024 | 5/29/2019 | FEMALE |
| 170515 | 15 | Migori | Kuria West | 510092 | 01/07/2018 | FEMALE |
| 170554 | 9 | Kakamega | Likuyani | 616901 | 05/12/2017 | FEMALE |
head(lactation, 8) %>%
select(animal_id, parity, calving_date, afc_months, calving_interval_days, total_lactation_yield) %>%
kable(caption = "Sample lactation records (one row = one calving event)") %>%
kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = FALSE)| animal_id | parity | calving_date | afc_months | calving_interval_days | total_lactation_yield |
|---|---|---|---|---|---|
| 170285 | 1 | 11/23/1993 | -273 | NA | 11.1 |
| 170285 | 2 | 10/9/1994 | -273 | 320 | 35.4 |
| 170285 | 3 | 8/4/1996 | -273 | 665 | NA |
| 170285 | 4 | 4/7/1997 | -273 | 246 | 5.3 |
| 170285 | 5 | 9/9/1997 | -273 | 155 | 10.4 |
| 170285 | 6 | 12/19/1998 | -273 | 466 | 25.0 |
| 170285 | 7 | 1/1/2005 | -273 | 2205 | 8.9 |
| 170285 | 8 | 10/13/2007 | -273 | 1015 | 5.0 |
head(milking, 8) %>%
select(animal_id, parity, milking_date, dim, milk_morning, milk_evening, daily_yield) %>%
kable(caption = "Sample test-day records (one row = one farm visit)") %>%
kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = FALSE)| animal_id | parity | milking_date | dim | milk_morning | milk_evening | daily_yield |
|---|---|---|---|---|---|---|
| 170285 | 1 | 15/12/1993 | 22 | 1.6 | 0 | 1.6 |
| 170285 | 1 | 15/01/1994 | 53 | 1.6 | 0 | 1.6 |
| 170285 | 1 | 15/02/1994 | 84 | 1.2 | 0 | 1.2 |
| 170285 | 1 | 15/03/1994 | 112 | 1.2 | 0 | 1.2 |
| 170285 | 1 | 15/04/1994 | 143 | 1.7 | 0 | 1.7 |
| 170285 | 1 | 15/05/1994 | 173 | 0.7 | 0 | 0.7 |
| 170285 | 1 | 15/06/1994 | 204 | 1.5 | 0 | 1.5 |
| 170285 | 1 | 15/07/1994 | 234 | 1.6 | 0 | 1.6 |
pedigree_factors <- pedigree %>%
mutate(animal_id = as.character(animal_id)) %>%
select(animal_id, country, region, district, ward, village, farm_id, main_breed) %>%
group_by(animal_id) %>% slice(1) %>% ungroup()
breed_lookup <- milking %>%
filter(!is.na(main_breed), !is.na(main_breed_name),
trimws(as.character(main_breed_name)) != "") %>%
mutate(main_breed = as.integer(main_breed),
main_breed_name = trimws(as.character(main_breed_name))) %>%
distinct(main_breed, main_breed_name)
pedigree_factors <- pedigree_factors %>%
mutate(main_breed = as.integer(main_breed)) %>%
left_join(breed_lookup, by = "main_breed") %>%
mutate(main_breed_name = ifelse(is.na(main_breed_name),
paste0("Breed_", main_breed), main_breed_name))
region_levels <- pedigree_factors %>% filter(!is.na(region)) %>% distinct(region) %>% mutate(region_code = row_number())
district_levels <- pedigree_factors %>% filter(!is.na(district)) %>% distinct(district) %>% mutate(district_code = row_number())
farm_levels <- pedigree_factors %>% filter(!is.na(farm_id)) %>% distinct(farm_id) %>% mutate(farm_code = row_number())
breed_levels <- pedigree_factors %>% filter(!is.na(main_breed)) %>% distinct(main_breed, main_breed_name) %>% mutate(breed_code = row_number())
pedigree_factors_coded <- pedigree_factors %>%
left_join(region_levels, by="region") %>% left_join(district_levels, by="district") %>%
left_join(farm_levels, by="farm_id") %>% left_join(breed_levels, by=c("main_breed","main_breed_name"))
data.frame(
Factor = c("Countries","Regions","Districts","Wards","Villages","Farms/Herds","Breeds"),
`Number of levels` = c(n_distinct(na.omit(pedigree_factors$country)),
nrow(region_levels), nrow(district_levels),
n_distinct(na.omit(pedigree_factors$ward)),
n_distinct(na.omit(pedigree_factors$village)),
nrow(farm_levels), nrow(breed_levels)), check.names = FALSE
) %>%
kable(caption = "Number of unique levels for each factor", align = c("l","r")) %>%
kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE) %>%
column_spec(2, bold = TRUE)| Factor | Number of levels |
|---|---|
| Countries | 2 |
| Regions | 45 |
| Districts | 211 |
| Wards | 505 |
| Villages | 1050 |
| Farms/Herds | 28018 |
| Breeds | 40 |
pedigree_factors %>%
filter(!is.na(region), !is.na(district), !is.na(farm_id)) %>%
group_by(region) %>%
summarise(Districts = n_distinct(district), Wards = n_distinct(ward),
Villages = n_distinct(village), Farms = n_distinct(farm_id),
Animals = n_distinct(animal_id), .groups = "drop") %>%
arrange(desc(Animals)) %>%
kable(caption = "Geographic hierarchy: sub-units per region",
format.args = list(big.mark = ",")) %>%
kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE) %>%
row_spec(0, bold = TRUE, background = "#2b6cb0", color = "white")| region | Districts | Wards | Villages | Farms | Animals |
|---|---|---|---|---|---|
| Nakuru | 11 | 25 | 21 | 1,187 | 27,146 |
| Kiambu | 10 | 14 | 1 | 450 | 7,685 |
| Laikipia | 2 | 4 | 1 | 140 | 7,363 |
| Muranga | 7 | 8 | 1 | 1,501 | 6,011 |
| Nyandarua | 7 | 18 | 12 | 1,053 | 5,630 |
| Makueni | 7 | 33 | 180 | 3,256 | 4,965 |
| Kisumu | 7 | 37 | 115 | 1,364 | 4,160 |
| Uasin Gishu | 6 | 10 | 2 | 586 | 4,160 |
| Narok | 5 | 9 | 12 | 95 | 3,318 |
| Nyeri | 5 | 6 | 1 | 459 | 3,097 |
| Kakamega | 11 | 34 | 83 | 1,168 | 3,094 |
| Nandi County | 5 | 8 | 9 | 549 | 2,484 |
| Siaya | 6 | 28 | 123 | 1,033 | 2,374 |
| Meru | 5 | 5 | 1 | 395 | 1,641 |
| Homa Bay | 8 | 36 | 105 | 835 | 1,571 |
| Trans Nzoia | 4 | 5 | 1 | 35 | 1,570 |
| Taita Taveta | 5 | 12 | 20 | 174 | 1,566 |
| Kilifi | 2 | 3 | 2 | 5 | 1,551 |
| Busia | 7 | 33 | 76 | 701 | 1,499 |
| Migori | 9 | 27 | 63 | 819 | 1,488 |
| Kajiado | 4 | 8 | 11 | 82 | 1,355 |
| Nairobi | 9 | 8 | 2 | 28 | 1,325 |
| Baringo | 5 | 7 | 1 | 99 | 1,234 |
| Tharaka Nithi | 2 | 3 | 1 | 197 | 973 |
| Vihiga | 5 | 23 | 60 | 419 | 904 |
| Kitui | 8 | 22 | 59 | 504 | 816 |
| Bomet | 5 | 6 | 1 | 106 | 699 |
| Kirinyaga | 3 | 3 | 1 | 78 | 684 |
| Nyamira | 4 | 6 | 1 | 193 | 656 |
| Machakos | 5 | 8 | 1 | 102 | 614 |
| Bungoma | 8 | 38 | 59 | 344 | 585 |
| Kisii | 9 | 34 | 63 | 163 | 394 |
| Embu | 2 | 1 | 1 | 63 | 344 |
| Kericho | 4 | 9 | 1 | 49 | 294 |
| West Pokot | 1 | 1 | 1 | 8 | 112 |
| Marsabit | 1 | 1 | 1 | 13 | 57 |
| Kwale | 1 | 1 | 1 | 1 | 44 |
| Oromia | 1 | 1 | 1 | 1 | 18 |
| Elgeyo Marakwet | 3 | 3 | 1 | 4 | 13 |
| Amhara | 2 | 3 | 3 | 3 | 5 |
| Osun | 1 | 1 | 1 | 1 | 1 |
Note: The hierarchy is strictly nested. We use Region + Farm to represent geography without collinearity.
breed_dist <- pedigree_factors %>%
filter(!is.na(main_breed_name)) %>% count(main_breed_name, name = "Count") %>%
arrange(desc(Count)) %>%
mutate(Percentage = round(Count / sum(Count) * 100, 1),
main_breed_name = factor(main_breed_name, levels = main_breed_name))
ggplot(breed_dist, aes(x = main_breed_name, y = Count, fill = main_breed_name)) +
geom_bar(stat = "identity", show.legend = FALSE) +
geom_text(aes(label = paste0(Percentage, "%")), vjust = -0.5, size = 3.5) +
scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.12))) +
scale_fill_brewer(palette = "Set3") +
labs(title = "Distribution of Animals by Breed",
subtitle = paste("Total animals:", format(sum(breed_dist$Count), big.mark = ",")),
x = "Breed", y = "Number of Animals") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))breed_dist %>% rename(Breed = main_breed_name, `N Animals` = Count, `%` = Percentage) %>%
kable(caption = "Breed composition", format.args = list(big.mark = ",")) %>%
kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE)| Breed | N Animals | % |
|---|---|---|
| Holstein Friesian | 109,158 | 41.4 |
| Ayrshire | 32,484 | 12.3 |
| Sahiwal | 20,880 | 7.9 |
| Boran | 18,976 | 7.2 |
| FFFB | 15,203 | 5.8 |
| FFFF | 15,203 | 5.8 |
| Breed_0 | 12,611 | 4.8 |
| Jersey | 8,631 | 3.3 |
| Guernsey | 6,293 | 2.4 |
| Holstein-Friesian crosses | 3,726 | 1.4 |
| Unknown | 3,484 | 1.3 |
| Others | 3,341 | 1.3 |
| Breed_17 | 2,824 | 1.1 |
| Local Zebu | 2,169 | 0.8 |
| Sahiwal crosses | 1,143 | 0.4 |
| Breed_43 | 1,010 | 0.4 |
| Brown Swiss | 907 | 0.3 |
| Gir | 707 | 0.3 |
| Breed_10 | 650 | 0.2 |
| Breed_80 | 634 | 0.2 |
| Zebu | 498 | 0.2 |
| Red Poll | 443 | 0.2 |
| Breed_35 | 434 | 0.2 |
| Ayrshire crosses | 381 | 0.1 |
| Fleckvieh | 337 | 0.1 |
| Hereford | 258 | 0.1 |
| Breed_87 | 140 | 0.1 |
| Breed_36 | 136 | 0.1 |
| Breed_1 | 125 | 0.0 |
| Jersey crosses | 116 | 0.0 |
| Magic | 89 | 0.0 |
| Breed_38 | 87 | 0.0 |
| Breed_85 | 72 | 0.0 |
| Girolando | 66 | 0.0 |
| Breed_86 | 64 | 0.0 |
| Scandinavian Red | 53 | 0.0 |
| Guernsey crosses | 35 | 0.0 |
| Breed_12 | 28 | 0.0 |
| Breed_77 | 6 | 0.0 |
| Breed_16 | 5 | 0.0 |
| Breed_72 | 2 | 0.0 |
| Breed_74 | 2 | 0.0 |
| Trait | Minimum | Maximum | Biological Rationale |
|---|---|---|---|
| AFC (months) | 18 | 72 | Puberty ≥ 18 mo; > 72 mo = recording error |
| Daily yield (kg) | 0.5 | 60 | < 0.5 = missed milking; > 60 implausible |
| DIM (days) | 5 | 500 | Exclude colostrum; > 500 = extended lactation |
| TLY (kg) | 100 | 20,000 | < 100 = incomplete; > 20k implausible |
| CI (days) | 280 | 900 | Min = gestation; max ≈ 2.5 years |
ped_slim <- pedigree_factors_coded %>%
select(animal_id, region, region_code, farm_id, farm_code,
main_breed, main_breed_name, breed_code)
afc_data <- lactation %>%
mutate(animal_id = as.character(animal_id), afc_months = as.numeric(afc_months),
parity = as.numeric(parity)) %>%
filter(parity == 1, !is.na(afc_months), afc_months >= 18, afc_months <= 72) %>%
select(-any_of(c("region","farm_id","main_breed","main_breed_name"))) %>%
left_join(ped_slim, by = "animal_id") %>%
filter(!is.na(region), !is.na(farm_id), !is.na(main_breed_name)) %>%
group_by(animal_id) %>% slice(1) %>% ungroup() %>%
mutate(region = factor(region), farm_id = factor(farm_id),
main_breed = factor(main_breed), main_breed_name = factor(main_breed_name))
daily_milk_data <- milking %>%
mutate(animal_id = as.character(animal_id), daily_yield = as.numeric(daily_yield),
dim = as.numeric(dim), parity = as.numeric(parity)) %>%
filter(!is.na(daily_yield), daily_yield > 0.5, daily_yield < 60,
!is.na(dim), dim >= 5, dim <= 500) %>%
select(-any_of(c("region","farm_id","main_breed","main_breed_name"))) %>%
left_join(ped_slim, by = "animal_id") %>%
filter(!is.na(region), !is.na(farm_id), !is.na(main_breed_name)) %>%
mutate(parity_class = factor(ifelse(is.na(parity)|parity<1, 1, ifelse(parity>4, 4, parity))),
dim_class = factor(ifelse(ceiling(dim/30) > 17, 17, ceiling(dim/30))),
region = factor(region), farm_id = factor(farm_id),
main_breed = factor(main_breed), main_breed_name = factor(main_breed_name))
tly_data <- lactation %>%
mutate(animal_id = as.character(animal_id),
total_lactation_yield = as.numeric(total_lactation_yield),
parity = as.numeric(parity)) %>%
filter(!is.na(total_lactation_yield), total_lactation_yield >= 100,
total_lactation_yield <= 20000) %>%
select(-any_of(c("region","farm_id","main_breed","main_breed_name"))) %>%
left_join(ped_slim, by = "animal_id") %>%
filter(!is.na(region), !is.na(farm_id), !is.na(main_breed_name)) %>%
mutate(parity_class = factor(ifelse(is.na(parity)|parity<1, 1, ifelse(parity>4, 4, parity))),
region = factor(region), farm_id = factor(farm_id),
main_breed = factor(main_breed), main_breed_name = factor(main_breed_name))
ci_data <- lactation %>%
mutate(animal_id = as.character(animal_id),
calving_interval_days = as.numeric(calving_interval_days),
parity = as.numeric(parity)) %>%
filter(parity > 1, !is.na(calving_interval_days),
calving_interval_days >= 280, calving_interval_days <= 900) %>%
select(-any_of(c("region","farm_id","main_breed","main_breed_name"))) %>%
left_join(ped_slim, by = "animal_id") %>%
filter(!is.na(region), !is.na(farm_id), !is.na(main_breed_name)) %>%
mutate(parity_class = factor(ifelse(parity > 4, 4, parity)),
region = factor(region), farm_id = factor(farm_id),
main_breed = factor(main_breed), main_breed_name = factor(main_breed_name))
data.frame(
Trait = c("Age at First Calving","Daily Milk Yield","Total Lactation Yield","Calving Interval"),
Abbrev. = c("AFC","DMY","TLY","CI"),
Records = c(nrow(afc_data), nrow(daily_milk_data), nrow(tly_data), nrow(ci_data)),
Animals = c(n_distinct(afc_data$animal_id), n_distinct(daily_milk_data$animal_id),
n_distinct(tly_data$animal_id), n_distinct(ci_data$animal_id)),
Farms = c(n_distinct(afc_data$farm_id), n_distinct(daily_milk_data$farm_id),
n_distinct(tly_data$farm_id), n_distinct(ci_data$farm_id)),
Regions = c(n_distinct(afc_data$region), n_distinct(daily_milk_data$region),
n_distinct(tly_data$region), n_distinct(ci_data$region))
) %>%
kable(caption = "Data after quality-control filtering",
format.args = list(big.mark = ","), align = c("l","c","r","r","r","r")) %>%
kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE) %>%
row_spec(0, bold = TRUE, background = "#2b6cb0", color = "white")| Trait | Abbrev. | Records | Animals | Farms | Regions |
|---|---|---|---|---|---|
| Age at First Calving | AFC | 27,669 | 27,669 | 1,247 | 35 |
| Daily Milk Yield | DMY | 584,617 | 15,478 | 651 | 35 |
| Total Lactation Yield | TLY | 9,573 | 4,941 | 356 | 22 |
| Calving Interval | CI | 44,250 | 17,635 | 453 | 32 |
summary_stats <- data.frame(
Trait = c("AFC (months)","Daily Milk (kg)","Lactation Milk (kg)","Calving Interval (days)"),
N = c(nrow(afc_data), nrow(daily_milk_data), nrow(tly_data), nrow(ci_data)),
Mean = c(mean(afc_data$afc_months), mean(daily_milk_data$daily_yield),
mean(tly_data$total_lactation_yield), mean(ci_data$calving_interval_days)),
SD = c(sd(afc_data$afc_months), sd(daily_milk_data$daily_yield),
sd(tly_data$total_lactation_yield), sd(ci_data$calving_interval_days)),
Min = c(min(afc_data$afc_months), min(daily_milk_data$daily_yield),
min(tly_data$total_lactation_yield), min(ci_data$calving_interval_days)),
Q1 = c(quantile(afc_data$afc_months,.25), quantile(daily_milk_data$daily_yield,.25),
quantile(tly_data$total_lactation_yield,.25), quantile(ci_data$calving_interval_days,.25)),
Median = c(median(afc_data$afc_months), median(daily_milk_data$daily_yield),
median(tly_data$total_lactation_yield), median(ci_data$calving_interval_days)),
Q3 = c(quantile(afc_data$afc_months,.75), quantile(daily_milk_data$daily_yield,.75),
quantile(tly_data$total_lactation_yield,.75), quantile(ci_data$calving_interval_days,.75)),
Max = c(max(afc_data$afc_months), max(daily_milk_data$daily_yield),
max(tly_data$total_lactation_yield), max(ci_data$calving_interval_days)),
CV = c(sd(afc_data$afc_months)/mean(afc_data$afc_months)*100,
sd(daily_milk_data$daily_yield)/mean(daily_milk_data$daily_yield)*100,
sd(tly_data$total_lactation_yield)/mean(tly_data$total_lactation_yield)*100,
sd(ci_data$calving_interval_days)/mean(ci_data$calving_interval_days)*100),
check.names = FALSE)
summary_stats %>% mutate(across(where(is.numeric), ~round(., 2))) %>%
kable(caption = "Descriptive statistics (after QC filtering)",
format.args = list(big.mark = ","),
col.names = c("Trait","N","Mean","SD","Min","Q1","Median","Q3","Max","CV (%)")) %>%
kable_styling(bootstrap_options = c("striped","hover"), full_width = TRUE) %>%
row_spec(0, bold = TRUE, background = "#2b6cb0", color = "white")| Trait | N | Mean | SD | Min | Q1 | Median | Q3 | Max | CV (%) |
|---|---|---|---|---|---|---|---|---|---|
| AFC (months) | 27,669 | 39.33 | 11.88 | 18.00 | 30.00 | 36.0 | 46.0 | 72.0 | 30.22 |
| Daily Milk (kg) | 584,617 | 9.21 | 7.07 | 0.51 | 1.77 | 9.2 | 14.2 | 50.0 | 76.81 |
| Lactation Milk (kg) | 9,573 | 977.77 | 1,705.30 | 100.00 | 186.00 | 363.5 | 730.6 | 19,583.5 | 174.41 |
| Calving Interval (days) | 44,250 | 459.54 | 130.92 | 280.00 | 365.00 | 416.0 | 516.0 | 900.0 | 28.49 |
p1 <- ggplot(afc_data, aes(x = afc_months)) +
geom_histogram(aes(y = after_stat(density)), bins = 40, fill = trait_colors["AFC"], color = "white", alpha = 0.8) +
geom_density(linewidth = 1, color = "darkred") +
geom_vline(xintercept = mean(afc_data$afc_months), linetype = "dashed", linewidth = 1) +
annotate("text", x = mean(afc_data$afc_months)+2, y = Inf,
label = paste0("Mean = ", round(mean(afc_data$afc_months),1), " mo"),
vjust = 2, hjust = 0, fontface = "bold", size = 4) +
labs(title = "Age at First Calving (AFC)", x = "AFC (months)", y = "Density") +
scale_x_continuous(breaks = seq(18, 72, 6))
p2 <- ggplot(daily_milk_data, aes(x = daily_yield)) +
geom_histogram(aes(y = after_stat(density)), bins = 50, fill = trait_colors["Daily Milk"], color = "white", alpha = 0.8) +
geom_density(linewidth = 1, color = "darkblue") +
geom_vline(xintercept = mean(daily_milk_data$daily_yield), linetype = "dashed", linewidth = 1) +
annotate("text", x = mean(daily_milk_data$daily_yield)+1, y = Inf,
label = paste0("Mean = ", round(mean(daily_milk_data$daily_yield),1), " kg"),
vjust = 2, hjust = 0, fontface = "bold", size = 4) +
labs(title = "Daily Milk Yield", x = "Daily Milk (kg/day)", y = "Density")
p3 <- ggplot(tly_data, aes(x = total_lactation_yield)) +
geom_histogram(aes(y = after_stat(density)), bins = 50, fill = trait_colors["Lactation Milk"], color = "white", alpha = 0.8) +
geom_density(linewidth = 1, color = "darkgreen") +
geom_vline(xintercept = mean(tly_data$total_lactation_yield), linetype = "dashed", linewidth = 1) +
annotate("text", x = mean(tly_data$total_lactation_yield)+200, y = Inf,
label = paste0("Mean = ", round(mean(tly_data$total_lactation_yield),0), " kg"),
vjust = 2, hjust = 0, fontface = "bold", size = 4) +
labs(title = "Total Lactation Yield (TLY)", x = "Lactation Milk (kg)", y = "Density") +
scale_x_continuous(labels = comma)
p4 <- ggplot(ci_data, aes(x = calving_interval_days)) +
geom_histogram(aes(y = after_stat(density)), bins = 50, fill = trait_colors["Calving Interval"], color = "white", alpha = 0.8) +
geom_density(linewidth = 1, color = "purple4") +
geom_vline(xintercept = mean(ci_data$calving_interval_days), linetype = "dashed", linewidth = 1) +
geom_vline(xintercept = 365, linetype = "dotted", color = "red", linewidth = 0.8) +
annotate("text", x = mean(ci_data$calving_interval_days)+15, y = Inf,
label = paste0("Mean = ", round(mean(ci_data$calving_interval_days),0), " d"),
vjust = 2, hjust = 0, fontface = "bold", size = 4) +
annotate("text", x = 370, y = Inf, label = "Optimal\n(365 d)", vjust = 2, hjust = 0,
color = "red", size = 3.5, fontface = "italic") +
labs(title = "Calving Interval (CI)", x = "Calving Interval (days)", y = "Density")
(p1 + p2) / (p3 + p4) +
plot_annotation(title = "Distribution of All Four Traits",
subtitle = "Histograms with density curves; dashed = mean",
theme = theme(plot.title = element_text(face = "bold", size = 18, hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5, color = "gray40", size = 14)))p1b <- ggplot(afc_data, aes(x = reorder(region, afc_months, FUN=median), y = afc_months)) +
geom_boxplot(fill = trait_colors["AFC"], alpha = 0.7, outlier.size = 0.4) +
stat_summary(fun = mean, geom = "point", shape = 18, size = 3, color = "darkred") +
labs(title = "AFC by Region", x = NULL, y = "AFC (months)") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
p2b <- ggplot(daily_milk_data, aes(x = reorder(region, daily_yield, FUN=median), y = daily_yield)) +
geom_boxplot(fill = trait_colors["Daily Milk"], alpha = 0.7, outlier.size = 0.4) +
stat_summary(fun = mean, geom = "point", shape = 18, size = 3, color = "darkblue") +
labs(title = "Daily Milk by Region", x = NULL, y = "Daily Milk (kg)") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
p3b <- ggplot(tly_data, aes(x = reorder(region, total_lactation_yield, FUN=median), y = total_lactation_yield)) +
geom_boxplot(fill = trait_colors["Lactation Milk"], alpha = 0.7, outlier.size = 0.4) +
stat_summary(fun = mean, geom = "point", shape = 18, size = 3, color = "darkgreen") +
scale_y_continuous(labels = comma) +
labs(title = "Lactation Milk by Region", x = NULL, y = "Lactation Milk (kg)") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
p4b <- ggplot(ci_data, aes(x = reorder(region, calving_interval_days, FUN=median), y = calving_interval_days)) +
geom_boxplot(fill = trait_colors["Calving Interval"], alpha = 0.7, outlier.size = 0.4) +
stat_summary(fun = mean, geom = "point", shape = 18, size = 3, color = "purple4") +
geom_hline(yintercept = 365, linetype = "dashed", color = "red") +
labs(title = "Calving Interval by Region", x = NULL, y = "CI (days)") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
(p1b + p2b) / (p3b + p4b) +
plot_annotation(title = "Trait Distributions by Region",
subtitle = "Boxes = middle 50%; diamond = mean; line = median",
theme = theme(plot.title = element_text(face="bold", size=18, hjust=0.5),
plot.subtitle = element_text(hjust=0.5, color="gray40", size=14)))region_afc <- afc_data %>% group_by(region) %>%
summarise(AFC_N=n(), AFC_Mean=round(mean(afc_months),1), AFC_SD=round(sd(afc_months),1), .groups="drop")
region_dmk <- daily_milk_data %>% group_by(region) %>%
summarise(DMY_N=n(), DMY_Mean=round(mean(daily_yield),2), DMY_SD=round(sd(daily_yield),2), .groups="drop")
region_tly <- tly_data %>% group_by(region) %>%
summarise(TLY_N=n(), TLY_Mean=round(mean(total_lactation_yield),0), TLY_SD=round(sd(total_lactation_yield),0), .groups="drop")
region_ci <- ci_data %>% group_by(region) %>%
summarise(CI_N=n(), CI_Mean=round(mean(calving_interval_days),0), CI_SD=round(sd(calving_interval_days),0), .groups="drop")
region_afc %>% left_join(region_dmk, by="region") %>% left_join(region_tly, by="region") %>%
left_join(region_ci, by="region") %>% arrange(region) %>%
kable(caption = "Regional summary: mean (SD) for all four traits",
col.names = c("Region","N","Mean","SD","N","Mean","SD","N","Mean","SD","N","Mean","SD"),
format.args = list(big.mark = ",")) %>%
kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = TRUE) %>%
row_spec(0, bold = TRUE, background = "#2b6cb0", color = "white") %>%
add_header_above(c(" "=1, "AFC (months)"=3, "Daily Milk (kg/d)"=3, "Lactation Milk (kg)"=3, "Calving Interval (days)"=3))| Region | N | Mean | SD | N | Mean | SD | N | Mean | SD | N | Mean | SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Baringo | 110 | 39.5 | 9.9 | 4,990 | 1.25 | 1.25 | 2 | 196 | 20 | 328 | 489 | 131 |
| Bomet | 24 | 46.1 | 14.0 | 374 | 1.44 | 0.45 | NA | NA | NA | 14 | 450 | 132 |
| Bungoma | 10 | 51.6 | 12.7 | 83 | 0.94 | 0.26 | NA | NA | NA | 7 | 464 | 115 |
| Busia | 14 | 51.1 | 10.3 | 65 | 9.43 | 3.24 | 3 | 132 | 26 | 1 | 364 | NA |
| Elgeyo Marakwet | 5 | 52.0 | 16.8 | 130 | 1.11 | 0.61 | NA | NA | NA | 4 | 433 | 51 |
| Embu | 108 | 45.2 | 13.6 | 2,247 | 1.34 | 0.88 | 1 | 105 | NA | 91 | 472 | 129 |
| Kajiado | 71 | 42.7 | 13.1 | 255 | 1.58 | 1.18 | 42 | 573 | 290 | 17 | 498 | 202 |
| Kakamega | 238 | 39.1 | 12.7 | 1,211 | 2.04 | 2.44 | 7 | 129 | 16 | 441 | 438 | 122 |
| Kericho | 5 | 37.6 | 19.0 | 40 | 8.53 | 8.80 | 1 | 317 | NA | NA | NA | NA |
| Kiambu | 1,676 | 37.7 | 11.0 | 17,553 | 2.80 | 4.82 | 75 | 263 | 105 | 2,306 | 453 | 134 |
| Kilifi | 622 | 32.0 | 9.8 | 234,482 | 14.19 | 5.27 | 1,056 | 3,353 | 2,721 | 734 | 454 | 111 |
| Kirinyaga | 18 | 35.7 | 13.3 | 287 | 1.50 | 0.53 | NA | NA | NA | 4 | 476 | 104 |
| Kisii | 57 | 38.4 | 12.1 | 506 | 5.32 | 4.72 | 4 | 165 | 61 | 12 | 410 | 65 |
| Kisumu | 202 | 43.3 | 14.1 | 4,646 | 1.39 | 1.29 | 5 | 130 | 25 | 244 | 475 | 151 |
| Kwale | 6 | 30.2 | 2.6 | 24 | 2.05 | 0.32 | NA | NA | NA | NA | NA | NA |
| Laikipia | 1,601 | 31.8 | 7.8 | 20,265 | 3.74 | 5.13 | 339 | 151 | 32 | 3,019 | 449 | 120 |
| Machakos | 44 | 35.0 | 8.2 | 1,676 | 2.51 | 0.83 | 1 | 112 | NA | 82 | 434 | 107 |
| Makueni | 3,144 | 34.6 | 5.0 | 94,386 | 11.32 | 3.90 | 2,385 | 459 | 161 | 10,106 | 417 | 90 |
| Meru | 209 | 40.6 | 11.9 | 2,173 | 2.14 | 4.00 | 10 | 185 | 107 | 356 | 494 | 141 |
| Migori | 1 | 33.0 | NA | 388 | 1.41 | 0.92 | NA | NA | NA | 30 | 519 | 153 |
| Mombasa | 62 | 28.9 | 7.0 | NA | NA | NA | NA | NA | NA | 185 | 428 | 122 |
| Muranga | 41 | 38.5 | 14.2 | 613 | 1.58 | 0.75 | NA | NA | NA | 30 | 442 | 142 |
| Nairobi | 1,513 | 34.9 | 12.1 | 20,054 | 3.93 | 5.38 | 263 | 197 | 67 | 2,761 | 451 | 119 |
| Nakuru | 11,474 | 40.8 | 12.2 | 89,291 | 4.40 | 6.44 | 3,885 | 1,096 | 1,752 | 17,606 | 482 | 145 |
| Nandi County | 804 | 50.7 | 12.6 | 14,582 | 6.65 | 3.09 | 566 | 210 | 216 | 104 | 499 | 141 |
| Narok | 289 | 41.0 | 8.4 | 4,416 | 4.11 | 2.05 | 156 | 202 | 66 | 64 | 623 | 146 |
| Nyamira | 9 | 29.2 | 2.5 | 171 | 1.29 | 0.54 | NA | NA | NA | 8 | 474 | 87 |
| Nyandarua | 530 | 40.3 | 13.0 | 5,266 | 2.63 | 3.14 | 42 | 313 | 376 | 327 | 472 | 146 |
| Nyeri | 1,102 | 35.6 | 10.4 | 21,249 | 5.30 | 7.22 | 372 | 222 | 72 | 1,724 | 473 | 133 |
| Siaya | 40 | 46.8 | 15.0 | 781 | 1.69 | 0.65 | NA | NA | NA | 47 | 528 | 112 |
| Taita Taveta | 1 | 29.0 | NA | 6 | 0.77 | 0.12 | NA | NA | NA | NA | NA | NA |
| Tharaka Nithi | 1 | 26.0 | NA | 11 | 1.21 | 0.09 | NA | NA | NA | NA | NA | NA |
| Trans Nzoia | 2,407 | 44.4 | 11.7 | 20,108 | 3.69 | 5.09 | 321 | 191 | 124 | 2,059 | 469 | 137 |
| Uasin Gishu | 1,218 | 42.6 | 12.6 | 22,052 | 1.76 | 2.77 | 37 | 247 | 117 | 1,519 | 480 | 139 |
| Vihiga | 13 | 31.8 | 11.6 | 81 | 1.04 | 0.34 | NA | NA | NA | 2 | 390 | 16 |
bind_rows(
afc_data %>% group_by(main_breed_name) %>% summarise(N=n(), Mean=round(mean(afc_months),1), SD=round(sd(afc_months),1), .groups="drop") %>% mutate(Trait="AFC (months)"),
daily_milk_data %>% group_by(main_breed_name) %>% summarise(N=n(), Mean=round(mean(daily_yield),2), SD=round(sd(daily_yield),2), .groups="drop") %>% mutate(Trait="Daily Milk (kg)"),
tly_data %>% group_by(main_breed_name) %>% summarise(N=n(), Mean=round(mean(total_lactation_yield),0), SD=round(sd(total_lactation_yield),0), .groups="drop") %>% mutate(Trait="Lactation Milk (kg)"),
ci_data %>% group_by(main_breed_name) %>% summarise(N=n(), Mean=round(mean(calving_interval_days),0), SD=round(sd(calving_interval_days),0), .groups="drop") %>% mutate(Trait="CI (days)")
) %>% select(Trait, Breed=main_breed_name, N, Mean, SD) %>%
kable(caption = "Mean (SD) by breed for each trait (unadjusted)", format.args = list(big.mark=",")) %>%
kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = FALSE) %>%
row_spec(0, bold = TRUE, background = "#2b6cb0", color = "white")| Trait | Breed | N | Mean | SD |
|---|---|---|---|---|
| AFC (months) | Ayrshire | 5,067 | 43.60 | 11.60 |
| AFC (months) | Ayrshire crosses | 87 | 36.80 | 11.00 |
| AFC (months) | Boran | 101 | 38.70 | 9.10 |
| AFC (months) | Breed_0 | 147 | 37.30 | 8.90 |
| AFC (months) | Breed_12 | 5 | 44.00 | 17.40 |
| AFC (months) | Breed_17 | 1 | 26.00 | NA |
| AFC (months) | Breed_35 | 1 | 58.00 | NA |
| AFC (months) | Breed_38 | 1 | 46.00 | NA |
| AFC (months) | Breed_43 | 1 | 46.00 | NA |
| AFC (months) | Breed_72 | 1 | 52.00 | NA |
| AFC (months) | Breed_80 | 1 | 42.00 | NA |
| AFC (months) | Brown Swiss | 147 | 44.20 | 13.90 |
| AFC (months) | FFFB | 19 | 41.10 | 15.10 |
| AFC (months) | Fleckvieh | 53 | 31.80 | 8.10 |
| AFC (months) | Gir | 197 | 31.10 | 7.50 |
| AFC (months) | Girolando | 22 | 29.50 | 5.30 |
| AFC (months) | Guernsey | 573 | 41.50 | 12.60 |
| AFC (months) | Guernsey crosses | 8 | 37.00 | 15.20 |
| AFC (months) | Hereford | 2 | 27.50 | 0.70 |
| AFC (months) | Holstein-Friesian crosses | 1,253 | 35.80 | 7.00 |
| AFC (months) | Holstein Friesian | 15,541 | 37.10 | 11.00 |
| AFC (months) | Jersey | 1,130 | 31.40 | 10.10 |
| AFC (months) | Jersey crosses | 70 | 30.10 | 6.20 |
| AFC (months) | Local Zebu | 7 | 49.00 | 19.20 |
| AFC (months) | Magic | 12 | 37.80 | 13.50 |
| AFC (months) | Others | 2 | 61.50 | 6.40 |
| AFC (months) | Red Poll | 23 | 47.00 | 12.70 |
| AFC (months) | Sahiwal | 2,874 | 49.30 | 11.40 |
| AFC (months) | Sahiwal crosses | 226 | 38.40 | 8.30 |
| AFC (months) | Scandinavian Red | 50 | 26.80 | 4.50 |
| AFC (months) | Unknown | 43 | 40.10 | 14.00 |
| AFC (months) | Zebu | 4 | 43.20 | 18.60 |
| Daily Milk (kg) | Ayrshire | 56,017 | 3.27 | 4.36 |
| Daily Milk (kg) | Ayrshire crosses | 14,858 | 14.33 | 4.70 |
| Daily Milk (kg) | Boran | 3,176 | 10.49 | 4.33 |
| Daily Milk (kg) | Breed_0 | 2,067 | 2.16 | 3.66 |
| Daily Milk (kg) | Brown Swiss | 18,483 | 13.14 | 6.58 |
| Daily Milk (kg) | FFFB | 245 | 4.36 | 4.12 |
| Daily Milk (kg) | FFFF | 245 | 4.36 | 4.12 |
| Daily Milk (kg) | Fleckvieh | 15,816 | 12.15 | 4.51 |
| Daily Milk (kg) | Gir | 75,910 | 14.22 | 5.24 |
| Daily Milk (kg) | Girolando | 10,343 | 12.68 | 4.36 |
| Daily Milk (kg) | Guernsey | 6,140 | 1.99 | 2.74 |
| Daily Milk (kg) | Guernsey crosses | 215 | 15.47 | 5.70 |
| Daily Milk (kg) | Hereford | 8 | 1.40 | 0.32 |
| Daily Milk (kg) | Holstein-Friesian crosses | 67,360 | 13.24 | 5.42 |
| Daily Milk (kg) | Holstein Friesian | 213,921 | 6.11 | 6.59 |
| Daily Milk (kg) | Jersey | 13,392 | 5.46 | 6.13 |
| Daily Milk (kg) | Jersey crosses | 24,695 | 14.77 | 4.95 |
| Daily Milk (kg) | Local Zebu | 119 | 3.33 | 1.94 |
| Daily Milk (kg) | Magic | 342 | 4.87 | 6.69 |
| Daily Milk (kg) | Others | 5 | 8.40 | 1.41 |
| Daily Milk (kg) | Red Poll | 231 | 0.79 | 0.18 |
| Daily Milk (kg) | Sahiwal | 16,644 | 3.85 | 4.25 |
| Daily Milk (kg) | Sahiwal crosses | 23,655 | 13.31 | 5.21 |
| Daily Milk (kg) | Scandinavian Red | 20,357 | 13.08 | 4.36 |
| Daily Milk (kg) | Unknown | 369 | 3.90 | 3.01 |
| Daily Milk (kg) | Zebu | 4 | 8.53 | 1.40 |
| Lactation Milk (kg) | Ayrshire | 611 | 221.00 | 403.00 |
| Lactation Milk (kg) | Ayrshire crosses | 97 | 3,127.00 | 2,480.00 |
| Lactation Milk (kg) | Boran | 78 | 448.00 | 659.00 |
| Lactation Milk (kg) | Breed_0 | 9 | 179.00 | 75.00 |
| Lactation Milk (kg) | Breed_43 | 1 | 108.00 | NA |
| Lactation Milk (kg) | Breed_80 | 1 | 909.00 | NA |
| Lactation Milk (kg) | Brown Swiss | 61 | 4,438.00 | 3,245.00 |
| Lactation Milk (kg) | FFFB | 3 | 141.00 | 58.00 |
| Lactation Milk (kg) | FFFF | 3 | 141.00 | 58.00 |
| Lactation Milk (kg) | Fleckvieh | 69 | 2,848.00 | 1,909.00 |
| Lactation Milk (kg) | Gir | 372 | 3,069.00 | 2,758.00 |
| Lactation Milk (kg) | Girolando | 39 | 3,932.00 | 3,317.00 |
| Lactation Milk (kg) | Guernsey | 43 | 189.00 | 141.00 |
| Lactation Milk (kg) | Guernsey crosses | 1 | 3,248.00 | NA |
| Lactation Milk (kg) | Holstein-Friesian crosses | 1,264 | 800.00 | 1,324.00 |
| Lactation Milk (kg) | Holstein Friesian | 3,869 | 566.00 | 1,355.00 |
| Lactation Milk (kg) | Jersey | 286 | 215.00 | 162.00 |
| Lactation Milk (kg) | Jersey crosses | 150 | 2,505.00 | 2,298.00 |
| Lactation Milk (kg) | Magic | 6 | 201.00 | 64.00 |
| Lactation Milk (kg) | Sahiwal | 1,848 | 678.00 | 696.00 |
| Lactation Milk (kg) | Sahiwal crosses | 633 | 2,951.00 | 2,109.00 |
| Lactation Milk (kg) | Scandinavian Red | 89 | 3,391.00 | 3,304.00 |
| Lactation Milk (kg) | Unknown | 39 | 1,167.00 | 1,151.00 |
| Lactation Milk (kg) | Zebu | 1 | 205.00 | NA |
| CI (days) | Ayrshire | 5,680 | 491.00 | 148.00 |
| CI (days) | Ayrshire crosses | 71 | 472.00 | 129.00 |
| CI (days) | Boran | 163 | 425.00 | 86.00 |
| CI (days) | Breed_0 | 155 | 464.00 | 124.00 |
| CI (days) | Brown Swiss | 149 | 481.00 | 126.00 |
| CI (days) | FFFB | 23 | 532.00 | 142.00 |
| CI (days) | FFFF | 23 | 532.00 | 142.00 |
| CI (days) | Fleckvieh | 29 | 414.00 | 76.00 |
| CI (days) | Gir | 204 | 459.00 | 108.00 |
| CI (days) | Girolando | 15 | 496.00 | 112.00 |
| CI (days) | Guernsey | 814 | 460.00 | 137.00 |
| CI (days) | Guernsey crosses | 3 | 392.00 | 143.00 |
| CI (days) | Holstein-Friesian crosses | 3,101 | 416.00 | 91.00 |
| CI (days) | Holstein Friesian | 24,672 | 448.00 | 123.00 |
| CI (days) | Jersey | 2,545 | 442.00 | 115.00 |
| CI (days) | Jersey crosses | 122 | 419.00 | 92.00 |
| CI (days) | Local Zebu | 18 | 461.00 | 134.00 |
| CI (days) | Magic | 38 | 442.00 | 114.00 |
| CI (days) | Others | 8 | 728.00 | 10.00 |
| CI (days) | Red Poll | 17 | 540.00 | 189.00 |
| CI (days) | Sahiwal | 5,123 | 518.00 | 155.00 |
| CI (days) | Sahiwal crosses | 442 | 475.00 | 130.00 |
| CI (days) | Scandinavian Red | 38 | 502.00 | 120.00 |
| CI (days) | Unknown | 797 | 426.00 | 93.00 |
Breed groups are confounded with region and management. Fair comparisons require the fixed-effects models below.
dim_summary <- daily_milk_data %>%
mutate(dim_class_num = as.numeric(as.character(dim_class))) %>%
group_by(dim_class_num) %>%
summarise(N = n(), Mean = mean(daily_yield), SE = sd(daily_yield)/sqrt(n()), .groups = "drop") %>%
mutate(DIM_midpoint = (dim_class_num - 0.5)*30, Lower = Mean - 1.96*SE, Upper = Mean + 1.96*SE)
ggplot(dim_summary, aes(x = DIM_midpoint, y = Mean)) +
geom_ribbon(aes(ymin = Lower, ymax = Upper), fill = trait_colors["Daily Milk"], alpha = 0.25) +
geom_line(linewidth = 1.3, color = trait_colors["Daily Milk"]) +
geom_point(size = 3, color = trait_colors["Daily Milk"]) +
geom_vline(xintercept = 45, linetype = "dashed", color = "red", linewidth = 0.8) +
annotate("text", x = 55, y = max(dim_summary$Mean)*0.98, label = "Peak yield (~DIM 45)",
color = "red", fontface = "italic", hjust = 0, size = 4.5) +
labs(title = "Lactation Curve: Average Daily Milk Yield by Days in Milk",
subtitle = "Shaded = 95% CI", x = "Days in Milk (DIM)", y = "Mean Daily Milk Yield (kg)") +
scale_x_continuous(breaks = seq(0, 500, 50)) + theme(panel.grid.minor = element_blank())dim_summary %>%
mutate(`DIM Class` = dim_class_num,
`DIM Range` = paste0((dim_class_num-1)*30+1, "\u2013", dim_class_num*30),
`Mean yield (kg)` = round(Mean, 2),
`95% CI` = paste0("[", round(Lower,2), ", ", round(Upper,2), "]")) %>%
select(`DIM Class`, `DIM Range`, N, `Mean yield (kg)`, `95% CI`) %>%
kable(caption = "Mean daily milk by DIM class", format.args = list(big.mark=",")) %>%
kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE)| DIM Class | DIM Range | N | Mean yield (kg) | 95% CI |
|---|---|---|---|---|
| 1 | 1–30 | 56,973 | 9.43 | [9.36, 9.49] |
| 2 | 31–60 | 58,893 | 11.26 | [11.2, 11.33] |
| 3 | 61–90 | 54,743 | 10.78 | [10.72, 10.85] |
| 4 | 91–120 | 52,668 | 10.32 | [10.25, 10.38] |
| 5 | 121–150 | 50,189 | 9.82 | [9.76, 9.88] |
| 6 | 151–180 | 50,183 | 9.50 | [9.44, 9.56] |
| 7 | 181–210 | 45,369 | 9.07 | [9.01, 9.13] |
| 8 | 211–240 | 42,364 | 8.61 | [8.55, 8.67] |
| 9 | 241–270 | 40,582 | 8.28 | [8.23, 8.34] |
| 10 | 271–300 | 33,811 | 7.90 | [7.84, 7.96] |
| 11 | 301–330 | 26,118 | 7.47 | [7.4, 7.54] |
| 12 | 331–360 | 19,977 | 7.31 | [7.23, 7.39] |
| 13 | 361–390 | 16,074 | 7.19 | [7.09, 7.28] |
| 14 | 391–420 | 12,153 | 6.93 | [6.82, 7.03] |
| 15 | 421–450 | 10,061 | 6.63 | [6.51, 6.74] |
| 16 | 451–480 | 9,046 | 6.48 | [6.36, 6.6] |
| 17 | 481–510 | 5,413 | 6.25 | [6.1, 6.41] |
parity_dm <- daily_milk_data %>% group_by(parity_class) %>% summarise(Mean=mean(daily_yield), SE=sd(daily_yield)/sqrt(n()), .groups="drop")
parity_tly <- tly_data %>% group_by(parity_class) %>% summarise(Mean=mean(total_lactation_yield), SE=sd(total_lactation_yield)/sqrt(n()), .groups="drop")
parity_ci <- ci_data %>% group_by(parity_class) %>% summarise(Mean=mean(calving_interval_days), SE=sd(calving_interval_days)/sqrt(n()), .groups="drop")
mkp <- function(df, fc, yl, cy=FALSE) {
ggplot(df, aes(x=parity_class, y=Mean)) + geom_bar(stat="identity", fill=fc, alpha=0.85) +
geom_errorbar(aes(ymin=Mean-1.96*SE, ymax=Mean+1.96*SE), width=0.2) +
geom_text(aes(label=round(Mean, ifelse(yl=="Mean CI (days)",0,1))), vjust=-1.6, fontface="bold", size=4) +
labs(x="Parity", y=yl) + scale_y_continuous(expand=expansion(mult=c(0,0.18)), labels=if(cy) comma else waiver())
}
p_dm <- mkp(parity_dm, trait_colors["Daily Milk"], "Mean daily milk (kg)") + labs(title="Daily Milk by Parity")
p_tly <- mkp(parity_tly, trait_colors["Lactation Milk"], "Mean lactation milk (kg)", TRUE) + labs(title="Lactation Milk by Parity")
p_ci <- mkp(parity_ci, trait_colors["Calving Interval"],"Mean CI (days)") + labs(title="CI by Parity")
(p_dm | p_tly | p_ci) +
plot_annotation(title = "Effect of Parity on Trait Means (Raw, Unadjusted)",
subtitle = "Error bars = 95% CI; Parity 4 = '4 and above'",
theme = theme(plot.title = element_text(face="bold", size=17, hjust=0.5),
plot.subtitle = element_text(hjust=0.5, color="gray40", size=13)))Quantify and remove systematic environmental effects before genetic evaluation. The residual variation is the raw material for BLUPF90 breeding value estimation.
\[y_{ijklm} = \mu + \text{Region}_i + \text{Breed}_j + \text{Parity}_k + \text{DIM}_l + e_{ijklm}\]
Note: Region is used instead of Farm for interpretability. Farm will be included in the BLUPF90 evaluation.
Model: AFC ~ Region + Breed
afc_anova_df <- as.data.frame(anova(afc_model)) %>%
tibble::rownames_to_column("Effect") %>%
mutate(`% Variance` = round(`Sum Sq`/sum(`Sum Sq`)*100, 2),
`Pr(>F)` = ifelse(is.na(`Pr(>F)`), NA_character_,
ifelse(`Pr(>F)` < 0.001, "< 0.001", as.character(round(`Pr(>F)`, 4)))))
afc_anova_df %>%
kable(caption = "ANOVA \u2014 AFC ~ Region + Breed", digits = 2, format.args = list(big.mark=",")) %>%
kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE) %>%
row_spec(0, bold = TRUE, background = "#2b6cb0", color = "white")| Effect | Df | Sum Sq | Mean Sq | F value | Pr(>F) | % Variance |
|---|---|---|---|---|---|---|
| region | 34 | 473,609.0 | 13,929.68 | 126.13 | < 0.001 | 12.12 |
| main_breed_name | 31 | 385,667.1 | 12,440.87 | 112.65 | < 0.001 | 9.87 |
| Residuals | 27,603 | 3,048,357.6 | 110.44 | NA | NA | 78.01 |
afc_lsmeans <- afc_data %>% mutate(predicted = predict(afc_model)) %>%
group_by(main_breed_name) %>%
summarise(N=n(), Raw_Mean=round(mean(afc_months),2), LS_Mean=round(mean(predicted),2), .groups="drop") %>%
arrange(LS_Mean)
ggplot(afc_lsmeans, aes(x = reorder(main_breed_name, LS_Mean), y = LS_Mean)) +
geom_bar(stat="identity", fill=trait_colors["AFC"], alpha=0.8) +
geom_point(aes(y=Raw_Mean), color="darkred", size=3, shape=17) +
geom_text(aes(label=paste0(LS_Mean," mo")), vjust=-0.5, fontface="bold", size=3.5) +
labs(title="Breed Effect on AFC", subtitle="Bars = adjusted; triangles = raw",
x="Breed", y="AFC (months)") +
theme(axis.text.x = element_text(angle=45, hjust=1)) +
scale_y_continuous(expand = expansion(mult = c(0, 0.12)))afc_lsmeans %>% kable(caption="AFC by breed: raw vs. adjusted",
col.names=c("Breed","N","Raw Mean (mo)","Adjusted Mean (mo)"), format.args=list(big.mark=",")) %>%
kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE)| Breed | N | Raw Mean (mo) | Adjusted Mean (mo) |
|---|---|---|---|
| Breed_17 | 1 | 26.00 | 26.00 |
| Scandinavian Red | 50 | 26.84 | 26.84 |
| Hereford | 2 | 27.50 | 27.50 |
| Girolando | 22 | 29.45 | 29.45 |
| Jersey crosses | 70 | 30.11 | 30.11 |
| Gir | 197 | 31.11 | 31.11 |
| Jersey | 1,130 | 31.44 | 31.44 |
| Fleckvieh | 53 | 31.75 | 31.75 |
| Holstein-Friesian crosses | 1,253 | 35.79 | 35.79 |
| Ayrshire crosses | 87 | 36.77 | 36.77 |
| Guernsey crosses | 8 | 37.00 | 37.00 |
| Holstein Friesian | 15,541 | 37.08 | 37.08 |
| Breed_0 | 147 | 37.28 | 37.28 |
| Magic | 12 | 37.75 | 37.75 |
| Sahiwal crosses | 226 | 38.44 | 38.44 |
| Boran | 101 | 38.67 | 38.67 |
| Unknown | 43 | 40.09 | 40.09 |
| FFFB | 19 | 41.05 | 41.05 |
| Guernsey | 573 | 41.45 | 41.45 |
| Breed_80 | 1 | 42.00 | 42.00 |
| Zebu | 4 | 43.25 | 43.25 |
| Ayrshire | 5,067 | 43.59 | 43.59 |
| Breed_12 | 5 | 44.00 | 44.00 |
| Brown Swiss | 147 | 44.18 | 44.18 |
| Breed_38 | 1 | 46.00 | 46.00 |
| Breed_43 | 1 | 46.00 | 46.00 |
| Red Poll | 23 | 47.00 | 47.00 |
| Local Zebu | 7 | 49.00 | 49.00 |
| Sahiwal | 2,874 | 49.28 | 49.28 |
| Breed_72 | 1 | 52.00 | 52.00 |
| Breed_35 | 1 | 58.00 | 58.00 |
| Others | 2 | 61.50 | 61.50 |
Model: Daily_Milk ~ Region + Breed + Parity + DIM_class
dm_anova_df <- as.data.frame(anova(dm_model)) %>%
tibble::rownames_to_column("Effect") %>%
mutate(`% Variance` = round(`Sum Sq`/sum(`Sum Sq`)*100, 2),
`Pr(>F)` = ifelse(is.na(`Pr(>F)`), NA_character_,
ifelse(`Pr(>F)` < 0.001, "< 0.001", as.character(round(`Pr(>F)`, 4)))))
dm_anova_df %>%
kable(caption = "ANOVA \u2014 Daily Milk ~ Region + Breed + Parity + DIM", digits=2, format.args=list(big.mark=",")) %>%
kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE) %>%
row_spec(0, bold=TRUE, background="#2b6cb0", color="white") %>%
row_spec(which(dm_anova_df$`% Variance` == max(dm_anova_df$`% Variance`[dm_anova_df$Effect != "Residuals"], na.rm=TRUE)),
bold=TRUE, background="#FFF3CD")| Effect | Df | Sum Sq | Mean Sq | F value | Pr(>F) | % Variance |
|---|---|---|---|---|---|---|
| region | 34 | 13,984,781.8 | 411,317.11 | 17,832.35 | < 0.001 | 47.82 |
| main_breed_name | 25 | 472,462.1 | 18,898.49 | 819.33 | < 0.001 | 1.62 |
| parity_class | 3 | 339,307.2 | 113,102.41 | 4,903.47 | < 0.001 | 1.16 |
| dim_class | 16 | 967,813.4 | 60,488.34 | 2,622.43 | < 0.001 | 3.31 |
| Residuals | 584,538 | 13,482,827.3 | 23.07 | NA | NA | 46.10 |
dm_dim_means <- daily_milk_data %>%
mutate(predicted = predict(dm_model), dim_class_n = as.numeric(as.character(dim_class))) %>%
group_by(dim_class_n) %>%
summarise(N=n(), Raw_Mean=mean(daily_yield), LS_Mean=mean(predicted), .groups="drop")
p_dim <- ggplot(dm_dim_means, aes(x = dim_class_n*30-15)) +
geom_ribbon(aes(ymin=LS_Mean*0.95, ymax=LS_Mean*1.05), fill=trait_colors["Daily Milk"], alpha=0.15) +
geom_line(aes(y=LS_Mean, color="Adjusted"), linewidth=1.3) +
geom_point(aes(y=LS_Mean, color="Adjusted"), size=3) +
geom_line(aes(y=Raw_Mean, color="Raw"), linewidth=1, linetype="dashed") +
geom_point(aes(y=Raw_Mean, color="Raw"), size=2) +
scale_color_manual(values=c("Adjusted"=trait_colors["Daily Milk"], "Raw"="gray50")) +
labs(title="Lactation Curve: Adjusted vs Raw", x="Days in Milk", y="Daily Milk (kg)", color=NULL) +
scale_x_continuous(breaks=seq(0,510,60))
dm_breed_means <- daily_milk_data %>% mutate(predicted = predict(dm_model)) %>%
group_by(main_breed_name) %>%
summarise(N=n(), Raw_Mean=mean(daily_yield), LS_Mean=mean(predicted), .groups="drop")
p_breed <- ggplot(dm_breed_means, aes(x=reorder(main_breed_name, LS_Mean), y=LS_Mean)) +
geom_bar(stat="identity", fill=trait_colors["Daily Milk"], alpha=0.8) +
geom_point(aes(y=Raw_Mean), color="darkblue", size=3, shape=17) +
geom_text(aes(label=round(LS_Mean,1)), vjust=-0.5, fontface="bold", size=3.5) +
labs(title="Breed Effect on Daily Milk", x="Breed", y="Daily Milk (kg)") +
theme(axis.text.x=element_text(angle=45, hjust=1)) +
scale_y_continuous(expand=expansion(mult=c(0,0.12)))
p_dim | p_breedModel: Lactation_Milk ~ Region + Breed + Parity
tly_anova_df <- as.data.frame(anova(tly_model)) %>%
tibble::rownames_to_column("Effect") %>%
mutate(`% Variance` = round(`Sum Sq`/sum(`Sum Sq`)*100, 2),
`Pr(>F)` = ifelse(is.na(`Pr(>F)`), NA_character_,
ifelse(`Pr(>F)` < 0.001, "< 0.001", as.character(round(`Pr(>F)`, 4)))))
tly_anova_df %>%
kable(caption = "ANOVA \u2014 TLY ~ Region + Breed + Parity", digits=2, format.args=list(big.mark=",")) %>%
kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE) %>%
row_spec(0, bold=TRUE, background="#2b6cb0", color="white")| Effect | Df | Sum Sq | Mean Sq | F value | Pr(>F) | % Variance |
|---|---|---|---|---|---|---|
| region | 21 | 7,991,266,686 | 380,536,509 | 204.38 | < 0.001 | 28.71 |
| main_breed_name | 23 | 2,000,132,129 | 86,962,266 | 46.71 | < 0.001 | 7.19 |
| parity_class | 3 | 109,892,326 | 36,630,775 | 19.67 | < 0.001 | 0.39 |
| Residuals | 9,525 | 17,734,451,280 | 1,861,885 | NA | NA | 63.71 |
tly_par_means <- tly_data %>% mutate(predicted = predict(tly_model)) %>%
group_by(parity_class) %>%
summarise(N=n(), Raw_Mean=mean(total_lactation_yield), LS_Mean=mean(predicted), .groups="drop")
ggplot(tly_par_means, aes(x=parity_class)) +
geom_bar(aes(y=LS_Mean, fill="Adjusted"), stat="identity", alpha=0.85) +
geom_point(aes(y=Raw_Mean, color="Raw"), size=4, shape=17) +
geom_text(aes(y=LS_Mean, label=paste0(round(LS_Mean,0)," kg")), vjust=-0.6, fontface="bold", size=4) +
scale_fill_manual(values=c("Adjusted"=trait_colors["Lactation Milk"])) +
scale_color_manual(values=c("Raw"="darkgreen")) +
labs(title="Parity Effect on TLY", x="Parity", y="Lactation Milk (kg)", fill=NULL, color=NULL) +
scale_y_continuous(labels=comma, expand=expansion(mult=c(0,0.15)))tly_breed_means <- tly_data %>% mutate(predicted = predict(tly_model)) %>%
group_by(main_breed_name) %>%
summarise(N=n(), Raw_Mean=mean(total_lactation_yield), LS_Mean=mean(predicted), .groups="drop") %>%
arrange(LS_Mean)
ggplot(tly_breed_means, aes(x=reorder(main_breed_name, LS_Mean), y=LS_Mean)) +
geom_bar(stat="identity", fill=trait_colors["Lactation Milk"], alpha=0.85) +
geom_point(aes(y=Raw_Mean), color="darkgreen", size=3, shape=17) +
geom_text(aes(label=paste0(format(round(LS_Mean,0), big.mark=",")," kg")), vjust=-0.5, fontface="bold", size=3.5) +
labs(title="Breed Effect on TLY", x="Breed", y="Lactation Milk (kg)") +
theme(axis.text.x=element_text(angle=45, hjust=1)) +
scale_y_continuous(labels=comma, expand=expansion(mult=c(0,0.14)))tly_breed_means %>% mutate(across(c(Raw_Mean,LS_Mean), ~round(.,0))) %>%
kable(caption="TLY by breed: raw vs. adjusted", col.names=c("Breed","N","Raw (kg)","Adjusted (kg)"),
format.args=list(big.mark=",")) %>%
kable_styling(bootstrap_options=c("striped","hover"), full_width=FALSE) %>%
row_spec(0, bold=TRUE, background="#2b6cb0", color="white")| Breed | N | Raw (kg) | Adjusted (kg) |
|---|---|---|---|
| Breed_43 | 1 | 108 | 108 |
| FFFF | 3 | 141 | 141 |
| FFFB | 3 | 141 | 141 |
| Breed_0 | 9 | 179 | 179 |
| Guernsey | 43 | 189 | 189 |
| Magic | 6 | 201 | 201 |
| Zebu | 1 | 205 | 205 |
| Jersey | 286 | 215 | 215 |
| Ayrshire | 611 | 221 | 221 |
| Boran | 78 | 448 | 448 |
| Holstein Friesian | 3,869 | 566 | 566 |
| Sahiwal | 1,848 | 678 | 678 |
| Holstein-Friesian crosses | 1,264 | 800 | 800 |
| Breed_80 | 1 | 909 | 909 |
| Unknown | 39 | 1,167 | 1,167 |
| Jersey crosses | 150 | 2,505 | 2,505 |
| Fleckvieh | 69 | 2,848 | 2,848 |
| Sahiwal crosses | 633 | 2,951 | 2,951 |
| Gir | 372 | 3,069 | 3,069 |
| Ayrshire crosses | 97 | 3,127 | 3,127 |
| Guernsey crosses | 1 | 3,248 | 3,248 |
| Scandinavian Red | 89 | 3,391 | 3,391 |
| Girolando | 39 | 3,932 | 3,932 |
| Brown Swiss | 61 | 4,438 | 4,438 |
Model: CI ~ Region + Breed + Parity (parity 2+ only)
ci_anova_df <- as.data.frame(anova(ci_model)) %>%
tibble::rownames_to_column("Effect") %>%
mutate(`% Variance` = round(`Sum Sq`/sum(`Sum Sq`)*100, 2),
`Pr(>F)` = ifelse(is.na(`Pr(>F)`), NA_character_,
ifelse(`Pr(>F)` < 0.001, "< 0.001", as.character(round(`Pr(>F)`, 4)))))
ci_anova_df %>%
kable(caption = "ANOVA \u2014 CI ~ Region + Breed + Parity", digits=2, format.args=list(big.mark=",")) %>%
kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE) %>%
row_spec(0, bold=TRUE, background="#2b6cb0", color="white")| Effect | Df | Sum Sq | Mean Sq | F value | Pr(>F) | % Variance |
|---|---|---|---|---|---|---|
| region | 31 | 32,532,101 | 1,049,422.62 | 65.28 | < 0.001 | 4.29 |
| main_breed_name | 23 | 13,856,888 | 602,473.38 | 37.48 | < 0.001 | 1.83 |
| parity_class | 2 | 1,563,716 | 781,857.98 | 48.63 | < 0.001 | 0.21 |
| Residuals | 44,193 | 710,458,036 | 16,076.26 | NA | NA | 93.68 |
ci_breed_means <- ci_data %>% mutate(predicted = predict(ci_model)) %>%
group_by(main_breed_name) %>%
summarise(N=n(), Raw_Mean=mean(calving_interval_days), LS_Mean=mean(predicted), .groups="drop")
ggplot(ci_breed_means, aes(x=reorder(main_breed_name, LS_Mean), y=LS_Mean)) +
geom_bar(stat="identity", fill=trait_colors["Calving Interval"], alpha=0.85) +
geom_point(aes(y=Raw_Mean), color="purple4", size=3, shape=17) +
geom_text(aes(label=round(LS_Mean,0)), vjust=-0.5, fontface="bold", size=4) +
geom_hline(yintercept=365, linetype="dashed", color="red", linewidth=0.8) +
annotate("text", x=0.6, y=375, label="Economic optimum (365 d)", color="red", hjust=0, fontface="italic", size=4) +
labs(title="Breed Effect on Calving Interval", x="Breed", y="CI (days)") +
theme(axis.text.x=element_text(angle=45, hjust=1)) +
scale_y_continuous(expand=expansion(mult=c(0,0.12)))r2_df <- data.frame(
Trait = c("AFC","Daily Milk","Lactation Milk","Calving Interval"),
Model = c("Region + Breed","Region + Breed + Parity + DIM","Region + Breed + Parity","Region + Breed + Parity"),
R2 = c(afc_r2, dm_r2, tly_r2, ci_r2),
AdjR2 = c(afc_adjr2, dm_adjr2, tly_adjr2, ci_adjr2))
ggplot(r2_df, aes(x=reorder(Trait,R2), y=R2, fill=Trait)) +
geom_bar(stat="identity", alpha=0.85, show.legend=FALSE) +
geom_text(aes(label=paste0(R2,"%")), hjust=-0.2, fontface="bold", size=5) +
scale_fill_manual(values=trait_colors) +
labs(title="Variance Explained by Fixed Effects", x=NULL, y="R\u00B2 (%)") +
scale_y_continuous(expand=expansion(mult=c(0,0.20)), limits=c(0,100)) + coord_flip()r2_df %>% kable(caption="R\u00B2 by trait", col.names=c("Trait","Model","R\u00B2 (%)","Adj R\u00B2 (%)")) %>%
kable_styling(bootstrap_options=c("striped","hover"), full_width=FALSE) %>%
row_spec(0, bold=TRUE, background="#2b6cb0", color="white")| Trait | Model | R² (%) | Adj R² (%) |
|---|---|---|---|
| AFC | Region + Breed | 22.0 | 21.8 |
| Daily Milk | Region + Breed + Parity + DIM | 53.9 | 53.9 |
| Lactation Milk | Region + Breed + Parity | 36.3 | 36.0 |
| Calving Interval | Region + Breed + Parity | 6.3 | 6.2 |
AFC 22%
DMY 53.9%
TLY 36.3%
CI 6.3%
The lower the R², the more unexplained variation remains — much of it genetic. All four traits carry sufficient genetic variance for BLUPF90 evaluation.
final_summary <- data.frame(
Metric = c("Records (after QC)","Animals","Farms","Mean","Standard deviation","CV (%)",
"Minimum (QC limit)","Maximum (QC limit)","R\u00B2 of fixed effects (%)"),
AFC = c(afc_nrec, format(n_distinct(afc_data$animal_id),big.mark=","),
format(n_distinct(afc_data$farm_id),big.mark=","),
round(mean(afc_data$afc_months),1), round(sd(afc_data$afc_months),2),
round(sd(afc_data$afc_months)/mean(afc_data$afc_months)*100,1),
round(min(afc_data$afc_months),1), round(max(afc_data$afc_months),1), afc_r2),
`Daily Milk` = c(dm_nrec, format(n_distinct(daily_milk_data$animal_id),big.mark=","),
format(n_distinct(daily_milk_data$farm_id),big.mark=","),
round(mean(daily_milk_data$daily_yield),2), round(sd(daily_milk_data$daily_yield),2),
round(sd(daily_milk_data$daily_yield)/mean(daily_milk_data$daily_yield)*100,1),
round(min(daily_milk_data$daily_yield),1), round(max(daily_milk_data$daily_yield),1), dm_r2),
`Lactation Milk` = c(tly_nrec, format(n_distinct(tly_data$animal_id),big.mark=","),
format(n_distinct(tly_data$farm_id),big.mark=","),
round(mean(tly_data$total_lactation_yield),0), round(sd(tly_data$total_lactation_yield),0),
round(sd(tly_data$total_lactation_yield)/mean(tly_data$total_lactation_yield)*100,1),
round(min(tly_data$total_lactation_yield),0), round(max(tly_data$total_lactation_yield),0), tly_r2),
`Calving Interval` = c(ci_nrec, format(n_distinct(ci_data$animal_id),big.mark=","),
format(n_distinct(ci_data$farm_id),big.mark=","),
round(mean(ci_data$calving_interval_days),0), round(sd(ci_data$calving_interval_days),0),
round(sd(ci_data$calving_interval_days)/mean(ci_data$calving_interval_days)*100,1),
round(min(ci_data$calving_interval_days),0), round(max(ci_data$calving_interval_days),0), ci_r2),
check.names = FALSE)
final_summary %>%
kable(caption = "Comprehensive summary", col.names = c("Metric","AFC (months)","Daily Milk (kg)","Lactation Milk (kg)","Calving Interval (days)")) %>%
kable_styling(bootstrap_options = c("striped","hover"), full_width = TRUE) %>%
row_spec(0, bold=TRUE, background="#2b6cb0", color="white") %>%
row_spec(c(4,9), bold = TRUE) %>%
pack_rows("Data Volume",1,3) %>% pack_rows("Distribution",4,8) %>% pack_rows("Model Fit",9,9)| Metric | AFC (months) | Daily Milk (kg) | Lactation Milk (kg) | Calving Interval (days) |
|---|---|---|---|---|
| Data Volume | ||||
| Records (after QC) | 27,669 | 584,617 | 9,573 | 44,250 |
| Animals | 27,669 | 15,478 | 4,941 | 17,635 |
| Farms | 1,247 | 651 | 356 | 453 |
| Distribution | ||||
| Mean | 39.3 | 9.21 | 978 | 460 |
| Standard deviation | 11.88 | 7.07 | 1705 | 131 |
| CV (%) | 30.2 | 76.8 | 174.4 | 28.5 |
| Minimum (QC limit) | 18 | 0.5 | 100 | 280 |
| Maximum (QC limit) | 72 | 50 | 19584 | 900 |
| Model Fit | ||||
| R² of fixed effects (%) | 22 | 53.9 | 36.3 | 6.3 |
Three raw CSV files merged, harmonised, and QC-filtered into four analysis-ready trait datasets covering multiple regions, breeds, and parities.
Region, Breed, Parity, and DIM effects quantified for all four traits. All effects are highly significant (P < 0.001). DIM dominates daily milk variation.
Low-to-moderate R² values confirm substantial genetic variance remains after environmental correction — all four traits are suitable for BLUPF90 evaluation.
Variance component estimation using BLUPF90 animal models, followed by EBV computation and bull/cow ranking for selection decisions.
R version 4.5.1 (2025-06-13 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)
Matrix products: default
LAPACK version 3.12.1
locale:
[1] LC_COLLATE=English_Kenya.utf8 LC_CTYPE=English_Kenya.utf8
[3] LC_MONETARY=English_Kenya.utf8 LC_NUMERIC=C
[5] LC_TIME=English_Kenya.utf8
time zone: Africa/Nairobi
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] broom_1.0.9 scales_1.4.0 kableExtra_1.4.0 knitr_1.50
[5] patchwork_1.3.2 ggplot2_3.5.2 tidyr_1.3.1 dplyr_1.1.4
[9] data.table_1.17.8
loaded via a namespace (and not attached):
[1] gtable_0.3.6 jsonlite_2.0.0 compiler_4.5.1 tidyselect_1.2.1
[5] xml2_1.4.0 stringr_1.5.1 jquerylib_0.1.4 textshaping_1.0.2
[9] systemfonts_1.2.3 yaml_2.3.10 fastmap_1.2.0 R6_2.6.1
[13] labeling_0.4.3 generics_0.1.4 backports_1.5.0 tibble_3.3.0
[17] svglite_2.2.1 bslib_0.9.0 pillar_1.11.0 RColorBrewer_1.1-3
[21] rlang_1.1.6 stringi_1.8.7 cachem_1.1.0 xfun_0.52
[25] sass_0.4.10 viridisLite_0.4.2 cli_3.6.5 withr_3.0.2
[29] magrittr_2.0.3 digest_0.6.37 grid_4.5.1 rstudioapi_0.17.1
[33] lifecycle_1.0.4 vctrs_0.6.5 evaluate_1.0.5 glue_1.8.0
[37] farver_2.1.2 rmarkdown_2.29 purrr_1.1.0 tools_4.5.1
[41] pkgconfig_2.0.3 htmltools_0.5.8.1