Week 7: Data Mapping and Understanding

Data Loading and Initial Cleaning

First, we load the experimental data. Based on our preliminary analysis, we identified a significant outlier in the serum data where a Calf_ID (28980) was erroneously recorded as a protein value. We also filter for biologically plausible birth weights and consumption durations.

# Load Data
excel_file <- "data_animal_science.xlsx"
df_milk    <- read_excel(excel_file, sheet = "Milk Consumption")
df_weight  <- read_excel(excel_file, sheet = "Body Weights")
df_serum   <- read_excel(excel_file, sheet = "Serum Total Protein")

# Filtering Outliers
filtered_milk <- df_milk %>% 
  filter(Days_consuming_milk > 50)

filtered_weight <- df_weight %>% 
  filter(Birth_Weight >= 50, Birth_Weight <= 150)

# Removing ID entry error (28980) and keeping valid age ranges
filtered_serum <- df_serum %>% 
  filter(Days_of_age >= 1, Days_of_age <= 10, Serum_Total_Protein < 70)

Relationship Analysis

By joining the datasets on Calf_ID, we can evaluate how initial birth metrics influence consumption patterns.

# Join datasets
combined_data <- filtered_milk %>% 
  inner_join(filtered_weight, by = "Calf_ID") %>% 
  inner_join(filtered_serum, by = "Calf_ID")

# Birthweight vs Milk Consumption
ggplot(combined_data, aes(x = Milk_Consumption_Liters, y = Birth_Weight)) + 
  geom_point(alpha = 0.6, color = "steelblue") + 
  geom_smooth(method = "lm", color = "darkorange", se = FALSE) +
  labs(title = "Relationship Between Birthweight and Milk Consumption", 
       x = "Milk Consumption (Liters)", 
       y = "Birthweight (lbs)") +
  theme_minimal()


Week 8: Dairy Feed Efficiency

In this section, we analyze feed efficiency (FE), calculated as the ratio of milk produced to dry matter intake (DMI).

# Week 8: Dairy Feed Efficiency

# Load Week 8 Data
daily_milk <- read.csv("Merged_Dairy_Data_given.csv")
daily_feed <- read.csv("Daily Intake.csv")

# This code finds the column that starts with 'Dry' and renames it to 'DMI'
daily_feed <- daily_feed %>%
  rename_with(~ "DMI", contains("Dry"))

# Calculate milk components and Efficiency
merged_data <- daily_milk %>%
  mutate(
    Fat_lbs = Milk * (Fat. / 100),
    Protein_lbs = Milk * (Pro. / 100),
    Combined_Fat_Protein = Fat_lbs + Protein_lbs
  ) %>%
  left_join(daily_feed, by = "Index") %>%
  # Use the new 'DMI' name here to avoid the object not found error
  mutate(Feed_Efficiency = Milk / DMI)

# View Feed Efficiency Distribution
ggplot(merged_data, aes(x = Feed_Efficiency)) + 
  geom_histogram(bins = 30, fill = "orange", color = "white", alpha = 0.8) + 
  labs(title = "Feed Efficiency Distribution", 
       subtitle = "Calculated as Milk (lbs) / Dry Matter Intake (lbs)",
       x = "Feed Efficiency", 
       y = "Count") +
  theme_minimal()

Analysis Summary

  • Outlier Removal: The Serum Total Protein sheet contained a massive outlier (28980.0) which was successfully identified as an accidental Calf_ID entry and removed.
  • Correlations: Birthweight and milk consumption showed a high degree of variance, though a cluster exists around the mean.
  • Efficiency: The herd’s feed efficiency distribution helps identify top-performing cows, with most values centering around expected physiological norms for a lactating dairy herd.