Normalized Confusion Matrix

Purpose: Shows the performance of the model in a normalized form, where values represent proportions instead of absolute numbers.

Insights:

• The diagonal values represent the proportion of correctly classified instances for each class.

• Estrus, lying, and standing classes show relatively high accuracy, as their diagonal values are closer to 1.

• Grazing shows moderate confusion, most often being mistaken for “standing”.

• The model also has trouble separating background from the behaviour classes: misclassifications involving background are noticeable.

Action Point: Improve annotations or use data augmentation for classes with high misclassification.
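
The relationship between raw counts and normalized proportions can be sketched as follows. This is a minimal illustration, not the plotting tool's actual code: it assumes rows are predicted classes and columns are true classes, and that each column is scaled to sum to 1 (under that assumption the diagonal is the per-class recall). The three-class matrix and its counts are made up for the example.

```python
import numpy as np

def normalize_confusion_matrix(counts, axis=0):
    """Scale raw counts so each column (axis=0) or row (axis=1) sums to 1."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=axis, keepdims=True)
    # Leave all-zero columns/rows at 0 instead of dividing by zero.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

# Hypothetical counts (rows = predicted, columns = true), for illustration only.
names = ["grazing", "standing", "background"]
counts = np.array([
    [60, 10,  5],   # predicted grazing
    [30, 85, 10],   # predicted standing
    [10,  5,  0],   # predicted background (missed objects)
])

normalized = normalize_confusion_matrix(counts, axis=0)
per_class_recall = dict(zip(names, np.diag(normalized)))
```

If the matrix is instead normalized by row (axis=1), the diagonal reads as per-class precision, so it is worth confirming which convention the plot uses before interpreting it.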

Figure: Normalized Confusion Matrix

Confusion Matrix

Purpose: Displays the absolute number of correct and incorrect predictions for each class.

Insights:

• Shows the absolute number of predictions for each class.

• Highlights class imbalance in the dataset. For example, lying and standing have more data points than estrus.

• Grazing is frequently misclassified as standing, indicating overlapping visual features or poor annotation consistency.

Action Point: Add more diverse training examples for underrepresented behaviours like estrus and grazing.
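
One way to read class imbalance and dominant confusions directly off a count matrix is sketched below. It assumes the same layout as a typical detection confusion matrix (rows = predicted, columns = true); the function name `top_confusions` and the counts are hypothetical and only serve the illustration.

```python
import numpy as np

def top_confusions(counts, names):
    """For each true class, return the most frequent wrong prediction and its count."""
    off_diagonal = np.array(counts, dtype=int)
    np.fill_diagonal(off_diagonal, 0)  # ignore correct predictions
    result = {}
    for col, true_name in enumerate(names):
        row = int(np.argmax(off_diagonal[:, col]))
        result[true_name] = (names[row], int(off_diagonal[row, col]))
    return result

# Hypothetical counts (rows = predicted, columns = true), for illustration only.
names = ["grazing", "standing", "background"]
counts = np.array([
    [60, 10,  5],
    [30, 85, 10],
    [10,  5,  0],
])

support = dict(zip(names, counts.sum(axis=0).tolist()))  # instances per true class
confusions = top_confusions(counts, names)
```

Column sums give the number of instances per true class, which makes imbalance visible, and the largest off-diagonal entry per column names the class each behaviour is most often mistaken for.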

Figure: Confusion Matrix

F1 Confidence Curve

Purpose: Shows the F1 score (harmonic mean of precision and recall) across different confidence thresholds.

Insights:

• The F1 score for “lying” is the highest, indicating the model is performing very well for this behaviour.

• Estrus has the lowest F1 score, suggesting the model struggles to balance precision and recall for this class.

• The overall F1 score for all classes is reasonable, but there’s room for improvement, especially for estrus and grazing.

Action Point: Focus on increasing the quality and quantity of data for estrus. Adjust hyperparameters or try different augmentation strategies to reduce confusion with similar behaviours.

Figure: F1 Confidence Curve
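
As a rough illustration of how an F1-confidence curve is built, the sketch below computes the harmonic mean of precision and recall at a few thresholds and picks the threshold with the highest F1. The threshold, precision, and recall values are invented for the example and are not taken from this model.

```python
import numpy as np

def f1_score(precision, recall, eps=1e-16):
    """Harmonic mean of precision and recall; eps guards against 0/0."""
    precision = np.asarray(precision, dtype=float)
    recall = np.asarray(recall, dtype=float)
    return 2 * precision * recall / (precision + recall + eps)

# Hypothetical values at three confidence thresholds, for illustration only.
thresholds = np.array([0.25, 0.50, 0.75])
precision = np.array([0.60, 0.80, 0.90])
recall = np.array([0.90, 0.80, 0.30])

f1 = f1_score(precision, recall)
best_threshold = float(thresholds[np.argmax(f1)])
```

Because the harmonic mean is pulled toward the smaller of the two values, a class with high precision but low recall (or the reverse) still ends up with a low F1.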

Labels Correlogram

Purpose: Visual representation of label distributions, including bounding box characteristics (e.g., width, height, x, y coordinates).

Insights:

• Bounding boxes are distributed in a logical way (e.g., no extreme outliers).

• However, certain areas might have clustering, indicating potential biases in the dataset or specific patterns in how the data was annotated.

• Width and height distributions are generally uniform, but overlapping boxes may need further scrutiny.

Action Point: Check for annotation biases and ensure consistent labeling practices.

Figure: Labels Correlogram
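
A quick numerical check of the same relationships a correlogram shows could look like the sketch below. It assumes labels are stored one box per line as `class x_center y_center width height` with normalized coordinates (a common YOLO-style layout, which is an assumption about this dataset); the sample lines are made up.

```python
import numpy as np

def parse_box_labels(lines):
    """Parse 'class x y w h' lines into an (N, 4) array of x, y, w, h; skip malformed lines."""
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) != 5:
            continue
        rows.append([float(value) for value in parts[1:]])
    return np.array(rows, dtype=float).reshape(-1, 4)

# Hypothetical label lines, for illustration only.
sample_lines = [
    "2 0.50 0.60 0.20 0.10",
    "3 0.30 0.40 0.10 0.25",
    "1 0.70 0.55 0.15 0.20",
    "0 0.45 0.35 0.30 0.30",
]

boxes = parse_box_labels(sample_lines)
# Pairwise Pearson correlation between x, y, width, and height.
correlation = np.corrcoef(boxes, rowvar=False)
```

Strong off-diagonal correlations, or box centres concentrated in a narrow range, would point to the kind of annotation or collection bias mentioned above.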

Labels Overview

Purpose: Includes class distribution, annotation heatmaps, and bounding box patterns.

Insights:

• Highlights class imbalance, with standing and lying dominating the dataset while estrus has the fewest examples.

• Shows the spatial distribution of bounding boxes, with clustering in certain areas suggesting biases in the data collection process.

Action Point: Add more data for estrus and grazing and ensure a more even distribution of annotations to improve generalization.
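
To put a number on the imbalance, the share of annotations per class can be computed from the class ids in the label files. The sketch below is illustrative only: the id-to-name mapping and the list of ids are hypothetical, not read from this dataset.

```python
from collections import Counter

def class_shares(class_ids, names):
    """Return the fraction of annotations belonging to each named class."""
    counts = Counter(class_ids)
    total = sum(counts[i] for i in range(len(names)))
    if total == 0:
        return {name: 0.0 for name in names}
    return {name: counts[i] / total for i, name in enumerate(names)}

# Hypothetical mapping (index = class id) and annotation ids, for illustration only.
names = ["estrus", "grazing", "lying", "standing"]
class_ids = [2, 3, 3, 2, 0, 3, 1, 2]

shares = class_shares(class_ids, names)
rarest = min(shares, key=shares.get)
```

Tracking these shares before and after adding data gives a simple check that new collection effort actually goes to the underrepresented classes.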

Figure: Labels Overview

Precision Confidence Curve

Purpose: Displays how precision changes with varying confidence thresholds.

Insights:

• Shows precision values across different confidence levels for each class.

• The precision for “lying” and “standing” is strong, suggesting the model makes fewer false-positive predictions for these behaviours.

• Estrus and grazing have lower precision, indicating more false positives in these classes.

Action Point: Refine annotations and increase the dataset size for estrus and grazing. Additionally, investigate misclassifications to identify overlaps between behaviours.
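
The effect of the confidence threshold on precision (and, for comparison, recall) can be sketched as below. This assumes each detection has already been marked as a true or false positive by some matching step that is not shown; the function name and the sample detections are hypothetical.

```python
import numpy as np

def precision_recall_at(confidences, is_true_positive, num_ground_truth, threshold):
    """Precision and recall when only detections with confidence >= threshold are kept."""
    confidences = np.asarray(confidences, dtype=float)
    is_tp = np.asarray(is_true_positive, dtype=bool)
    kept = confidences >= threshold
    tp = int(np.sum(is_tp & kept))
    fp = int(np.sum(~is_tp & kept))
    # Convention assumed here: precision is 1.0 when nothing is predicted.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
    recall = tp / num_ground_truth if num_ground_truth > 0 else 0.0
    return precision, recall

# Hypothetical detections for one class, for illustration only.
confidences = [0.9, 0.8, 0.7, 0.6, 0.3]
is_true_positive = [True, True, False, True, False]
num_ground_truth = 4

low = precision_recall_at(confidences, is_true_positive, num_ground_truth, 0.5)
high = precision_recall_at(confidences, is_true_positive, num_ground_truth, 0.75)
```

Raising the threshold removes low-confidence false positives, so precision rises, but it also discards some correct detections, so recall falls.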

Figure: Precision Confidence Curve

Precision-Recall Curve

Purpose: Illustrates the trade-off between precision and recall as the confidence threshold varies.

Insights:

• Lying and standing have a high area under the curve (AUC, i.e. average precision), showing an excellent precision-recall trade-off.

• Grazing and estrus have a lower area under the curve, suggesting difficulty in balancing precision and recall.

• The model has high recall for lying and standing but lower precision for grazing and estrus.

Action Point: Enhance data quality for grazing and estrus by collecting more diverse examples and improving annotation consistency.
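
The area under a precision-recall curve can be approximated as in the sketch below. It uses all-point interpolation over a monotone precision envelope, which is one common convention; evaluation tools differ in the exact interpolation, so the numbers need not match a given library. The detections are made up and assumed to be already matched to ground truth.

```python
import numpy as np

def average_precision(confidences, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve using all-point interpolation."""
    order = np.argsort(-np.asarray(confidences, dtype=float))
    is_tp = np.asarray(is_true_positive, dtype=bool)[order]
    tp_cum = np.cumsum(is_tp)
    fp_cum = np.cumsum(~is_tp)
    recall = tp_cum / max(num_ground_truth, 1)
    precision = tp_cum / (tp_cum + fp_cum)
    # Sentinel points so the curve spans recall 0 to 1.
    recall = np.concatenate(([0.0], recall, [1.0]))
    precision = np.concatenate(([1.0], precision, [0.0]))
    # Make precision non-increasing from left to right (the envelope).
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    # Sum rectangle areas wherever recall increases.
    return float(np.sum(np.diff(recall) * precision[1:]))

# Hypothetical detections for one class, for illustration only.
ap = average_precision(
    confidences=[0.9, 0.8, 0.7, 0.6, 0.3],
    is_true_positive=[True, True, False, True, False],
    num_ground_truth=4,
)
```

In this toy case one of the four ground-truth objects is never detected, so recall stops at 0.75 and the area cannot reach 1.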

Figure: Precision Recall Curve

Recall-Confidence Curve

Purpose: Illustrates the recall values across different confidence thresholds.

Insights:

• Recall for lying and standing is strong across confidence levels.

• Estrus has lower recall, indicating the model fails to identify many instances of this class.

• Grazing recall is moderate but drops significantly at higher confidence thresholds, showing the model struggles with confident predictions for grazing.

Action Point: Focus on increasing recall for underrepresented behaviours through balanced datasets and fine-tuning.

Figure: Recall Confidence Curve

Results Graph

Purpose: Tracks training and validation metrics over epochs, including losses and mAP.

Insights:

• Losses: Training loss decreases steadily over epochs, indicating effective learning. The validation loss follows a similar pattern, which suggests the model is not overfitting.

• Metrics: mAP (mean average precision) improves over training, with mAP50 reaching close to 0.9 and mAP50-95 around 0.6. This indicates a good balance of precision and recall, although the model still struggles with some challenging classes.

Action Point: To improve mAP50-95, focus on multi-scale training and incorporate hard-negative mining to handle challenging examples.
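
The gap between mAP50 and mAP50-95 comes from how strictly a predicted box must overlap the ground truth. mAP50 counts a match at IoU >= 0.50, while mAP50-95 averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below illustrates this with a single hypothetical box pair in (x1, y1, x2, y2) format; the coordinates are made up.

```python
import numpy as np

def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

# The ten IoU thresholds averaged by mAP50-95.
iou_thresholds = np.linspace(0.50, 0.95, 10)

# Hypothetical prediction shifted slightly from the ground truth.
prediction = (0.0, 0.0, 10.0, 10.0)
ground_truth = (2.0, 0.0, 12.0, 10.0)

iou = box_iou(prediction, ground_truth)
matches = iou >= iou_thresholds
num_thresholds_passed = int(matches.sum())
```

A box that is only slightly offset counts as correct at IoU 0.50 but fails at the stricter thresholds, which is why improving localization tends to lift mAP50-95 more than mAP50.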

Figure: Results Graph