El diseño factorial de dos factores permite evaluar simultáneamente
el efecto de dos factores sobre una variable respuesta, así como la
posible interacción entre ellos. En este documento se utiliza la base de
datos global_cattle_disease_detection_dataset.csv y se
define el siguiente esquema analítico:
Feed_TypeSeasonMilk_Yield_LDesde el punto de vista metodológico, este análisis reproduce la lógica de un diseño factorial de dos factores. Sin embargo, debe aclararse que la base utilizada corresponde a datos observacionales, no a un experimento aleatorizado controlado. Por ello, el procedimiento resulta muy útil para enseñanza, interpretación estadística y práctica con ANOVA factorial, pero la inferencia causal debe formularse con cautela.
Supóngase que:
Entonces, el número total de observaciones es:
\[ N = abr \]
En un diseño factorial de dos factores se estudian tres componentes:
La interacción es especialmente importante porque indica si el efecto de un factor depende del nivel del otro.
El modelo lineal es:
\[ Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk} \]
donde:
Se supone que:
\[ \varepsilon_{ijk} \sim N(0,\sigma^2) \]
e independientes.
\[ H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0 \]
\[ H_1: \text{al menos uno de los efectos } \alpha_i \text{ es diferente de cero} \]
\[ H_0: \beta_1 = \beta_2 = \cdots = \beta_b = 0 \]
\[ H_1: \text{al menos uno de los efectos } \beta_j \text{ es diferente de cero} \]
\[ H_0: (\alpha\beta)_{ij} = 0 \quad \text{para todo } i,j \]
\[ H_1: \text{existe al menos una interacción distinta de cero} \]
Esta sección reproduce el enfoque conceptual de los ejemplos iniciales: primero se identifican medias por celda, medias marginales, media general y luego se muestran las sumas de cuadrados del ANOVA factorial.
La media de la celda \((i,j)\) es:
\[ \bar{Y}_{ij.} = \frac{1}{r} \sum_{k=1}^{r} Y_{ijk} \]
La media marginal del factor A es:
\[ \bar{Y}_{i..} = \frac{1}{br} \sum_{j=1}^{b}\sum_{k=1}^{r} Y_{ijk} \]
La media marginal del factor B es:
\[ \bar{Y}_{.j.} = \frac{1}{ar} \sum_{i=1}^{a}\sum_{k=1}^{r} Y_{ijk} \]
La media general es:
\[ \bar{Y}_{...} = \frac{1}{N}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{r} Y_{ijk} \]
La suma de cuadrados total es:
\[ SC_T = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{r} (Y_{ijk} - \bar{Y}_{...})^2 \]
La suma de cuadrados del factor A es:
\[ SC_A = br\sum_{i=1}^{a}(\bar{Y}_{i..} - \bar{Y}_{...})^2 \]
La suma de cuadrados del factor B es:
\[ SC_B = ar\sum_{j=1}^{b}(\bar{Y}_{.j.} - \bar{Y}_{...})^2 \]
La suma de cuadrados de la interacción es:
\[ SC_{AB} = r\sum_{i=1}^{a}\sum_{j=1}^{b} (\bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...})^2 \]
La suma de cuadrados del error es:
\[ SC_E = SC_T - SC_A - SC_B - SC_{AB} \]
En un diseño balanceado:
\[ CM_A = \frac{SC_A}{a-1} \]
\[ CM_B = \frac{SC_B}{b-1} \]
\[ CM_{AB} = \frac{SC_{AB}}{(a-1)(b-1)} \]
\[ CM_E = \frac{SC_E}{ab(r-1)} \]
Los estadísticos de prueba son:
\[ F_A = \frac{CM_A}{CM_E}, \qquad F_B = \frac{CM_B}{CM_E}, \qquad F_{AB} = \frac{CM_{AB}}{CM_E} \]
datos <- read.csv("global_cattle_disease_detection_dataset.csv")
head(datos)
## Cattle_ID Breed Region Country Climate_Zone
## 1 CATTLE_000001 Tharparkar Africa CA Tropical
## 2 CATTLE_000002 Africander South_America ET Arid
## 3 CATTLE_000003 Holstein-Friesian Oceania KE Temperate
## 4 CATTLE_000004 Fleckvieh Europe_NA BR Tropical
## 5 CATTLE_000005 Danish_Red Oceania US Subtropical
## 6 CATTLE_000006 Jersey Oceania IN Continental
## Management_System Age_Months Weight_kg Parity Lactation_Stage Days_in_Milk
## 1 Intensive 32 259.9 4 Late 352
## 2 Semi_Intensive 63 593.9 6 Early 325
## 3 Intensive 132 675.4 3 Mid 79
## 4 Semi_Intensive 73 260.5 5 Late 249
## 5 Extensive 50 477.8 6 Early 339
## 6 Mixed 81 629.4 5 Late 166
## Feed_Type Feed_Quantity_kg Water_Intake_L Walking_Distance_km
## 1 Hay 16.8 58.5 7.89
## 2 Dry_Fodder 8.9 57.8 4.01
## 3 Crop_Residues 3.0 75.3 2.08
## 4 Crop_Residues 10.6 90.3 3.60
## 5 Concentrates 14.3 57.3 4.09
## 6 Pasture_Grass 9.6 54.4 3.76
## Grazing_Duration_hrs Rumination_Time_hrs Resting_Hours Body_Temperature_C
## 1 1.6 4.3 8.4 39.8
## 2 5.5 11.3 11.1 39.1
## 3 3.8 9.8 12.0 40.2
## 4 6.8 7.1 8.9 37.7
## 5 1.0 5.2 12.0 38.3
## 6 8.5 8.9 5.9 39.6
## Heart_Rate_bpm Respiratory_Rate Ambient_Temperature_C Humidity_percent
## 1 61 30 24.9 66.9
## 2 82 27 34.0 46.2
## 3 60 25 45.0 78.3
## 4 91 30 34.0 34.9
## 5 73 16 33.1 100.0
## 6 76 24 25.3 81.4
## Season Housing_Score Milk_Yield_L FMD_Vaccine Brucellosis_Vaccine HS_Vaccine
## 1 Summer 0.57 3.08 0 1 1
## 2 Summer 0.77 2.00 1 0 0
## 3 Autumn 0.54 14.06 1 0 0
## 4 Spring 0.69 12.74 0 0 0
## 5 Summer 0.83 15.64 0 0 0
## 6 Monsoon 0.36 10.59 0 1 1
## BQ_Vaccine Anthrax_Vaccine IBR_Vaccine BVD_Vaccine Rabies_Vaccine
## 1 1 1 0 0 0
## 2 1 0 0 1 1
## 3 1 1 0 1 0
## 4 1 0 0 0 0
## 5 1 0 1 1 1
## 6 1 1 1 0 0
## Previous_Week_Avg_Yield Body_Condition_Score Milking_Interval_hrs Date
## 1 4.88 3.0 24 2023-02-06
## 2 3.52 2.5 24 2022-10-31
## 3 11.28 2.5 12 2024-11-01
## 4 10.63 4.0 8 2023-07-07
## 5 16.99 2.0 12 2024-09-20
## 6 11.88 3.0 24 2024-11-26
## Farm_ID Disease_Status
## 1 FARM_0825 Foot_and_Mouth
## 2 FARM_0106 Ketosis_Subclinical
## 3 FARM_0201 Healthy
## 4 FARM_0174 Healthy
## 5 FARM_0028 Healthy
## 6 FARM_0750 Bovine_Tuberculosis
str(datos)
## 'data.frame': 250000 obs. of 40 variables:
## $ Cattle_ID : chr "CATTLE_000001" "CATTLE_000002" "CATTLE_000003" "CATTLE_000004" ...
## $ Breed : chr "Tharparkar" "Africander" "Holstein-Friesian" "Fleckvieh" ...
## $ Region : chr "Africa" "South_America" "Oceania" "Europe_NA" ...
## $ Country : chr "CA" "ET" "KE" "BR" ...
## $ Climate_Zone : chr "Tropical" "Arid" "Temperate" "Tropical" ...
## $ Management_System : chr "Intensive" "Semi_Intensive" "Intensive" "Semi_Intensive" ...
## $ Age_Months : int 32 63 132 73 50 81 50 133 98 50 ...
## $ Weight_kg : num 260 594 675 260 478 ...
## $ Parity : int 4 6 3 5 6 5 2 6 4 6 ...
## $ Lactation_Stage : chr "Late" "Early" "Mid" "Late" ...
## $ Days_in_Milk : int 352 325 79 249 339 166 232 265 103 63 ...
## $ Feed_Type : chr "Hay" "Dry_Fodder" "Crop_Residues" "Crop_Residues" ...
## $ Feed_Quantity_kg : num 16.8 8.9 3 10.6 14.3 9.6 5.5 11.9 12.9 5.9 ...
## $ Water_Intake_L : num 58.5 57.8 75.3 90.3 57.3 54.4 77.4 55.3 58.5 67.3 ...
## $ Walking_Distance_km : num 7.89 4.01 2.08 3.6 4.09 3.76 4.94 0.89 2.31 6.04 ...
## $ Grazing_Duration_hrs : num 1.6 5.5 3.8 6.8 1 8.5 3.8 9.2 8.3 5 ...
## $ Rumination_Time_hrs : num 4.3 11.3 9.8 7.1 5.2 8.9 4 8.2 9.1 4.1 ...
## $ Resting_Hours : num 8.4 11.1 12 8.9 12 5.9 14.5 11.6 6.5 15.2 ...
## $ Body_Temperature_C : num 39.8 39.1 40.2 37.7 38.3 39.6 37.9 38.2 40.2 38.4 ...
## $ Heart_Rate_bpm : num 61 82 60 91 73 76 69 52 63 63 ...
## $ Respiratory_Rate : num 30 27 25 30 16 24 26 15 19 21 ...
## $ Ambient_Temperature_C : num 24.9 34 45 34 33.1 25.3 17 -2.1 23.5 23.8 ...
## $ Humidity_percent : num 66.9 46.2 78.3 34.9 100 81.4 66.9 59.3 58.3 91.3 ...
## $ Season : chr "Summer" "Summer" "Autumn" "Spring" ...
## $ Housing_Score : num 0.57 0.77 0.54 0.69 0.83 0.36 0.45 0.81 0.84 0.74 ...
## $ Milk_Yield_L : num 3.08 2 14.06 12.74 15.64 ...
## $ FMD_Vaccine : int 0 1 1 0 0 0 1 0 1 0 ...
## $ Brucellosis_Vaccine : int 1 0 0 0 0 1 1 1 1 0 ...
## $ HS_Vaccine : int 1 0 0 0 0 1 1 1 1 0 ...
## $ BQ_Vaccine : int 1 1 1 1 1 1 0 0 0 0 ...
## $ Anthrax_Vaccine : int 1 0 1 0 0 1 1 0 1 1 ...
## $ IBR_Vaccine : int 0 0 0 0 1 1 1 0 1 1 ...
## $ BVD_Vaccine : int 0 1 1 0 1 0 1 0 0 1 ...
## $ Rabies_Vaccine : int 0 1 0 0 1 0 1 1 1 1 ...
## $ Previous_Week_Avg_Yield: num 4.88 3.52 11.28 10.63 16.99 ...
## $ Body_Condition_Score : num 3 2.5 2.5 4 2 3 3.5 3 3.5 4 ...
## $ Milking_Interval_hrs : int 24 24 12 8 12 24 12 24 24 6 ...
## $ Date : chr "2023-02-06" "2022-10-31" "2024-11-01" "2023-07-07" ...
## $ Farm_ID : chr "FARM_0825" "FARM_0106" "FARM_0201" "FARM_0174" ...
## $ Disease_Status : chr "Foot_and_Mouth" "Ketosis_Subclinical" "Healthy" "Healthy" ...
datos2 <- datos[, c("Feed_Type", "Season", "Milk_Yield_L")]
names(datos2)
## [1] "Feed_Type" "Season" "Milk_Yield_L"
datos2 <- na.omit(datos2)
datos2$Feed_Type <- as.factor(datos2$Feed_Type)
datos2$Season <- as.factor(datos2$Season)
datos2$Milk_Yield_L <- as.numeric(datos2$Milk_Yield_L)
str(datos2)
## 'data.frame': 250000 obs. of 3 variables:
## $ Feed_Type : Factor w/ 8 levels "Concentrates",..: 5 3 2 2 1 7 1 2 6 1 ...
## $ Season : Factor w/ 5 levels "Autumn","Monsoon",..: 4 4 1 3 4 2 5 2 2 4 ...
## $ Milk_Yield_L: num 3.08 2 14.06 12.74 15.64 ...
summary(datos2)
## Feed_Type Season Milk_Yield_L
## Dry_Fodder :31569 Autumn :49691 Min. : 0.000
## Pasture_Grass:31290 Monsoon:50106 1st Qu.: 4.340
## Concentrates :31282 Spring :49816 Median : 7.620
## Crop_Residues:31273 Summer :50293 Mean : 8.723
## Green_Fodder :31222 Winter :50094 3rd Qu.:12.290
## Mixed_Feed :31151 Max. :36.420
## (Other) :62213
levels(datos2$Feed_Type)
## [1] "Concentrates" "Crop_Residues" "Dry_Fodder" "Green_Fodder"
## [5] "Hay" "Mixed_Feed" "Pasture_Grass" "Silage"
levels(datos2$Season)
## [1] "Autumn" "Monsoon" "Spring" "Summer" "Winter"
nlevels(datos2$Feed_Type)
## [1] 8
nlevels(datos2$Season)
## [1] 5
tabla_diseno <- table(datos2$Feed_Type, datos2$Season)
tabla_diseno
##
## Autumn Monsoon Spring Summer Winter
## Concentrates 6134 6296 6303 6297 6252
## Crop_Residues 6146 6300 6266 6212 6349
## Dry_Fodder 6227 6311 6254 6425 6352
## Green_Fodder 6338 6207 6234 6324 6119
## Hay 6121 6200 6226 6294 6293
## Mixed_Feed 6150 6387 6234 6216 6164
## Pasture_Grass 6363 6106 6167 6411 6243
## Silage 6212 6299 6132 6114 6322
freq <- as.vector(tabla_diseno)
c(minimo = min(freq), maximo = max(freq), iguales = length(unique(freq)) == 1)
## minimo maximo iguales
## 6106 6425 0
medias_celda <- aggregate(Milk_Yield_L ~ Feed_Type + Season, data = datos2, mean)
medias_celda
## Feed_Type Season Milk_Yield_L
## 1 Concentrates Autumn 8.816224
## 2 Crop_Residues Autumn 8.674406
## 3 Dry_Fodder Autumn 8.671738
## 4 Green_Fodder Autumn 8.708429
## 5 Hay Autumn 8.675297
## 6 Mixed_Feed Autumn 8.815514
## 7 Pasture_Grass Autumn 8.776902
## 8 Silage Autumn 8.722502
## 9 Concentrates Monsoon 8.713482
## 10 Crop_Residues Monsoon 8.729208
## 11 Dry_Fodder Monsoon 8.712100
## 12 Green_Fodder Monsoon 8.703546
## 13 Hay Monsoon 8.687094
## 14 Mixed_Feed Monsoon 8.722090
## 15 Pasture_Grass Monsoon 8.695906
## 16 Silage Monsoon 8.610513
## 17 Concentrates Spring 8.688759
## 18 Crop_Residues Spring 8.810065
## 19 Dry_Fodder Spring 8.777173
## 20 Green_Fodder Spring 8.786819
## 21 Hay Spring 8.743532
## 22 Mixed_Feed Spring 8.841269
## 23 Pasture_Grass Spring 8.693493
## 24 Silage Spring 8.878819
## 25 Concentrates Summer 8.683313
## 26 Crop_Residues Summer 8.690856
## 27 Dry_Fodder Summer 8.704367
## 28 Green_Fodder Summer 8.760848
## 29 Hay Summer 8.793878
## 30 Mixed_Feed Summer 8.633798
## 31 Pasture_Grass Summer 8.696785
## 32 Silage Summer 8.821318
## 33 Concentrates Winter 8.811631
## 34 Crop_Residues Winter 8.638639
## 35 Dry_Fodder Winter 8.643997
## 36 Green_Fodder Winter 8.701641
## 37 Hay Winter 8.669369
## 38 Mixed_Feed Winter 8.583102
## 39 Pasture_Grass Winter 8.735140
## 40 Silage Winter 8.686884
medias_A <- aggregate(Milk_Yield_L ~ Feed_Type, data = datos2, mean)
medias_A
## Feed_Type Milk_Yield_L
## 1 Concentrates 8.742190
## 2 Crop_Residues 8.708634
## 3 Dry_Fodder 8.701753
## 4 Green_Fodder 8.732397
## 5 Hay 8.714065
## 6 Mixed_Feed 8.719264
## 7 Pasture_Grass 8.719910
## 8 Silage 8.742841
medias_B <- aggregate(Milk_Yield_L ~ Season, data = datos2, mean)
medias_B
## Season Milk_Yield_L
## 1 Autumn 8.732629
## 2 Monsoon 8.696800
## 3 Spring 8.777300
## 4 Summer 8.722895
## 5 Winter 8.683746
media_general <- mean(datos2$Milk_Yield_L)
media_general
## [1] 8.722596
N <- nrow(datos2)
N
## [1] 250000
SC_total <- sum((datos2$Milk_Yield_L - mean(datos2$Milk_Yield_L))^2)
SC_total
## [1] 8304204
modelo <- aov(Milk_Yield_L ~ Feed_Type * Season, data = datos2)
summary(modelo)
## Df Sum Sq Mean Sq F value Pr(>F)
## Feed_Type 7 50 7.20 0.217 0.9817
## Season 4 263 65.72 1.979 0.0948 .
## Feed_Type:Season 28 753 26.89 0.809 0.7496
## Residuals 249960 8303137 33.22
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(modelo)
## Analysis of Variance Table
##
## Response: Milk_Yield_L
## Df Sum Sq Mean Sq F value Pr(>F)
## Feed_Type 7 50 7.199 0.2167 0.98171
## Season 4 263 65.725 1.9786 0.09477 .
## Feed_Type:Season 28 753 26.890 0.8095 0.74957
## Residuals 249960 8303137 33.218
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Feed_Type presenta un p-valor pequeño, el tipo de
alimento influye en la producción de leche.Season presenta un p-valor pequeño, la estación
influye en la producción de leche.Feed_Type:Season presenta un p-valor pequeño, el
efecto del tipo de alimento depende de la estación.library(ggplot2)
ggplot(datos2, aes(x = Milk_Yield_L)) +
geom_histogram(bins = 30, fill = "steelblue", color = "white") +
labs(
title = "Distribución general de la producción de leche",
x = "Milk_Yield_L",
y = "Frecuencia"
) +
theme_minimal()
ggplot(datos2, aes(x = Milk_Yield_L)) +
geom_density(fill = "lightblue", alpha = 0.5) +
labs(
title = "Densidad de la producción de leche",
x = "Milk_Yield_L",
y = "Densidad"
) +
theme_minimal()
ggplot(datos2, aes(x = Feed_Type, y = Milk_Yield_L)) +
geom_boxplot(fill = "lightgreen") +
labs(
title = "Producción de leche por tipo de alimento",
x = "Feed_Type",
y = "Milk_Yield_L"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(datos2, aes(x = Season, y = Milk_Yield_L)) +
geom_boxplot(fill = "lightcoral") +
labs(
title = "Producción de leche por estación",
x = "Season",
y = "Milk_Yield_L"
) +
theme_minimal()
ggplot(datos2, aes(x = interaction(Feed_Type, Season), y = Milk_Yield_L, fill = Season)) +
geom_boxplot() +
labs(
title = "Producción de leche por combinación Feed_Type × Season",
x = "Combinación de tratamientos",
y = "Milk_Yield_L"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 60, hjust = 1))
ggplot(medias_A, aes(x = Feed_Type, y = Milk_Yield_L)) +
geom_col(fill = "skyblue") +
labs(
title = "Media de producción de leche por tipo de alimento",
x = "Feed_Type",
y = "Media de Milk_Yield_L"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(medias_B, aes(x = Season, y = Milk_Yield_L)) +
geom_col(fill = "tan") +
labs(
title = "Media de producción de leche por estación",
x = "Season",
y = "Media de Milk_Yield_L"
) +
theme_minimal()
ggplot(medias_celda,
aes(x = Season, y = Milk_Yield_L, color = Feed_Type, group = Feed_Type)) +
geom_point(size = 3) +
geom_line(linewidth = 1) +
labs(
title = "Gráfico de interacción: Feed_Type × Season",
x = "Season",
y = "Media de Milk_Yield_L",
color = "Feed_Type"
) +
theme_minimal()
ggplot(medias_celda, aes(x = Season, y = Feed_Type, fill = Milk_Yield_L)) +
geom_tile(color = "white") +
geom_text(aes(label = round(Milk_Yield_L, 2)), size = 3) +
labs(
title = "Mapa de calor de medias por combinación",
x = "Season",
y = "Feed_Type",
fill = "Media"
) +
theme_minimal()
ggplot(datos2, aes(x = Feed_Type, y = Milk_Yield_L, fill = Feed_Type)) +
geom_violin(alpha = 0.6) +
geom_boxplot(width = 0.12, fill = "white") +
labs(
title = "Distribución de Milk_Yield_L por tipo de alimento",
x = "Feed_Type",
y = "Milk_Yield_L"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "none")
ggplot(datos2, aes(x = Season, y = Milk_Yield_L, fill = Season)) +
geom_violin(alpha = 0.6) +
geom_boxplot(width = 0.12, fill = "white") +
labs(
title = "Distribución de Milk_Yield_L por estación",
x = "Season",
y = "Milk_Yield_L"
) +
theme_minimal() +
theme(legend.position = "none")
Los supuestos principales son:
residuos <- residuals(modelo)
ajustados <- fitted(modelo)
diagnostico <- data.frame(
Ajustados = ajustados,
Residuos = residuos
)
head(diagnostico)
## Ajustados Residuos
## 1 8.793878 -5.713878
## 2 8.704367 -6.704367
## 3 8.674406 5.385594
## 4 8.810065 3.929935
## 5 8.683313 6.956687
## 6 8.695906 1.894094
set.seed(123)
muestra_res <- sample(residuos, size = min(5000, length(residuos)))
shapiro.test(muestra_res)
##
## Shapiro-Wilk normality test
##
## data: muestra_res
## W = 0.95514, p-value < 2.2e-16
ggplot(diagnostico, aes(sample = Residuos)) +
stat_qq() +
stat_qq_line(color = "red") +
labs(
title = "Q-Q plot de los residuos",
x = "Cuantiles teóricos",
y = "Cuantiles muestrales"
) +
theme_minimal()
ggplot(diagnostico, aes(x = Residuos)) +
geom_histogram(bins = 30, fill = "gray70", color = "white") +
labs(
title = "Histograma de residuos",
x = "Residuos",
y = "Frecuencia"
) +
theme_minimal()
ggplot(diagnostico, aes(x = Residuos)) +
geom_density(fill = "gray80", alpha = 0.7) +
labs(
title = "Densidad de residuos",
x = "Residuos",
y = "Densidad"
) +
theme_minimal()
ggplot(diagnostico, aes(x = Ajustados, y = Residuos)) +
geom_point(alpha = 0.4) +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(
title = "Residuos vs valores ajustados",
x = "Valores ajustados",
y = "Residuos"
) +
theme_minimal()
diagnostico$RaizAbsRes <- sqrt(abs(diagnostico$Residuos))
ggplot(diagnostico, aes(x = Ajustados, y = RaizAbsRes)) +
geom_point(alpha = 0.4) +
geom_smooth(se = FALSE, color = "blue") +
labs(
title = "Scale-Location plot",
x = "Valores ajustados",
y = expression(sqrt("|Residuos|"))
) +
theme_minimal()
bartlett.test(Milk_Yield_L ~ interaction(Feed_Type, Season), data = datos2)
##
## Bartlett test of homogeneity of variances
##
## data: Milk_Yield_L by interaction(Feed_Type, Season)
## Bartlett's K-squared = 44.936, df = 39, p-value = 0.2372
fligner.test(Milk_Yield_L ~ interaction(Feed_Type, Season), data = datos2)
##
## Fligner-Killeen test of homogeneity of variances
##
## data: Milk_Yield_L by interaction(Feed_Type, Season)
## Fligner-Killeen:med chi-squared = 30.876, df = 39, p-value = 0.8201
diagnostico$Feed_Type <- datos2$Feed_Type
ggplot(diagnostico, aes(x = Feed_Type, y = Residuos)) +
geom_boxplot(fill = "lavender") +
labs(
title = "Residuos por tipo de alimento",
x = "Feed_Type",
y = "Residuos"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
diagnostico$Season <- datos2$Season
ggplot(diagnostico, aes(x = Season, y = Residuos)) +
geom_boxplot(fill = "mistyrose") +
labs(
title = "Residuos por estación",
x = "Season",
y = "Residuos"
) +
theme_minimal()
Si el efecto de un factor es significativo y la interacción no domina la interpretación, se pueden realizar comparaciones múltiples.
TukeyHSD(modelo, which = "Feed_Type")
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Milk_Yield_L ~ Feed_Type * Season, data = datos2)
##
## $Feed_Type
## diff lwr upr p adj
## Crop_Residues-Concentrates -0.0335564317 -0.17324259 0.1061297 0.9961814
## Dry_Fodder-Concentrates -0.0404370918 -0.17979538 0.0989212 0.9879132
## Green_Fodder-Concentrates -0.0097930496 -0.14953625 0.1299501 0.9999990
## Hay-Concentrates -0.0281247469 -0.16796675 0.1117173 0.9987685
## Mixed_Feed-Concentrates -0.0229258483 -0.16274873 0.1168970 0.9996778
## Pasture_Grass-Concentrates -0.0222805216 -0.16194770 0.1173867 0.9997316
## Silage-Concentrates 0.0006504259 -0.13925358 0.1405544 1.0000000
## Dry_Fodder-Crop_Residues -0.0068806601 -0.14624902 0.1324877 0.9999999
## Green_Fodder-Crop_Residues 0.0237633821 -0.11598986 0.1635166 0.9995898
## Hay-Crop_Residues 0.0054316849 -0.13442035 0.1452837 1.0000000
## Mixed_Feed-Crop_Residues 0.0106305834 -0.12920233 0.1504635 0.9999983
## Pasture_Grass-Crop_Residues 0.0112759101 -0.12840132 0.1509531 0.9999974
## Silage-Crop_Residues 0.0342068576 -0.10570718 0.1741209 0.9957390
## Green_Fodder-Dry_Fodder 0.0306440422 -0.10878149 0.1700696 0.9978220
## Hay-Dry_Fodder 0.0123123450 -0.12721222 0.1518369 0.9999952
## Mixed_Feed-Dry_Fodder 0.0175112436 -0.12199415 0.1570166 0.9999465
## Pasture_Grass-Dry_Fodder 0.0181565702 -0.12119277 0.1575059 0.9999312
## Silage-Dry_Fodder 0.0410875177 -0.09849919 0.1806742 0.9868450
## Hay-Green_Fodder -0.0183316972 -0.15824071 0.1215773 0.9999285
## Mixed_Feed-Green_Fodder -0.0131327987 -0.15302269 0.1267571 0.9999927
## Pasture_Grass-Green_Fodder -0.0124874720 -0.15222174 0.1272468 0.9999948
## Silage-Green_Fodder 0.0104434755 -0.12952751 0.1504145 0.9999985
## Mixed_Feed-Hay 0.0051988986 -0.13478970 0.1451875 1.0000000
## Pasture_Grass-Hay 0.0058442253 -0.13398886 0.1456773 1.0000000
## Silage-Hay 0.0287751727 -0.11129446 0.1688448 0.9985874
## Pasture_Grass-Mixed_Feed 0.0006453267 -0.13916863 0.1404593 1.0000000
## Silage-Mixed_Feed 0.0235762742 -0.11647426 0.1636268 0.9996162
## Silage-Pasture_Grass 0.0229309475 -0.11696414 0.1628260 0.9996784
TukeyHSD(modelo, which = "Season")
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Milk_Yield_L ~ Feed_Type * Season, data = datos2)
##
## $Season
## diff lwr upr p adj
## Monsoon-Autumn -0.035808209 -0.13534172 0.063725298 0.8637980
## Spring-Autumn 0.044715811 -0.05496185 0.144393468 0.7375578
## Summer-Autumn -0.009617638 -0.10905897 0.089823690 0.9989374
## Winter-Autumn -0.048797443 -0.14833689 0.050742000 0.6678568
## Spring-Monsoon 0.080524021 -0.01894677 0.179994810 0.1765358
## Summer-Monsoon 0.026190572 -0.07304340 0.125424539 0.9519925
## Winter-Monsoon -0.012989233 -0.11232152 0.086343054 0.9965428
## Summer-Spring -0.054333449 -0.15371200 0.045045103 0.5681030
## Winter-Spring -0.093513254 -0.19298998 0.005963475 0.0770926
## Winter-Summer -0.039179805 -0.13841973 0.060060116 0.8185053
tabla_aov <- anova(modelo)
SC <- tabla_aov$"Sum Sq"
nombres <- rownames(tabla_aov)
eta2 <- SC / sum(SC)
data.frame(Fuente = nombres, Eta2 = eta2)
## Fuente Eta2
## 1 Feed_Type 6.068625e-06
## 2 Season 3.165863e-05
## 3 Feed_Type:Season 9.066581e-05
## 4 Residuals 9.998716e-01
Desde el punto de vista aplicado:
Por tanto, el análisis factorial permite responder:
La respuesta a estas preguntas debe basarse en la tabla ANOVA, el gráfico de interacción y la revisión de supuestos.
Las conclusiones deben redactarse con base en los resultados obtenidos al ejecutar este documento. Un formato apropiado es el siguiente:
ggplot2 para apoyar la
interpretación de efectos principales e interacción.