Efficient livestock management is an increasingly important part of modern agriculture, especially as the global demand for high-quality protein products rises alongside growing populations. Cows play an important role in sustaining agricultural economies worldwide; however, monitoring cow behaviour and health remains a challenge, particularly in large-scale operations where individual attention to animals is limited.
The Problem
The behaviours cows exhibit, including grazing, lying, standing, walking and estrus, can serve as indicators of health, welfare and productivity. Changes in these behaviours can be early signs of stress, illness or environmental discomfort. For instance:
• Increased lying could indicate lameness, injury or tiredness
• Reduced grazing could lead to poor nutrition or signal illness
• Changes in movement or standing time may indicate restlessness or environmental stress
• Monitoring estrus (heat) could provide insights into fertility and breeding management
Traditionally, monitoring these behaviours has relied heavily on human observation, which is time-consuming, labour-intensive and often prone to error. Technology is now being developed to take on these tasks.
Motivation
With an interest in cattle, and particularly in affordable livestock technology solutions for lower-middle-income countries, the motivation was to find an affordable way to monitor cattle and their behaviour. Artificial intelligence, computer vision and deep learning, alongside other technologies, are beginning to provide affordable opportunities to revolutionise livestock farming. By utilising computer vision models such as YOLO, it is now possible to detect and classify behaviours in real time from video and image data.
Utilising technology to monitor animals can have numerous benefits:
• Improved animal welfare
• Optimised productivity
• Sustainability
Objectives
The objective of this assignment is to develop a deep learning model that can:
• Detect cows in images.
• Classify their behaviours into key categories: grazing, lying, standing or estrus.
• Output images identifying the behaviour.
Project Development
Following completion of the module, the experimentation and investigation stage took up a good deal of time, working through different datasets and trying to identify a suitable problem to solve. Initial models were programmed simply to identify cows, but it soon became clear that this would have limited application in real-world scenarios.
During the second phase, a set of images was identified in which the animals had blue dots spray-painted onto relevant parts of their body (hip, shoulder, knees, neck and head) to mark key measurement points by which individual animals could be identified and possibly condition-scored. However, the model struggled to identify the dots, as the data were limited and the images unclear, so this approach was also abandoned.
Finally, it was decided that identifying behaviour was a more suitable application for this project, so a large dataset of 29,000 appropriate images was identified and downloaded. Annotation was initially undertaken using Roboflow with the YOLO annotation format; the pre-trained YOLO model was quickly able to identify the chosen behaviours (standing, walking, grazing and estrus), and model training began.
The training images and corresponding labels were organised into the requisite directory structure on Google Drive. Initially YOLOv8x was used, but this proved slow and expensive. Following further research, progressively smaller and faster YOLO variants were employed until finally YOLOv8n was adopted; combined with an L4 GPU runtime, this meant that training over 200 epochs was quick and the model could be adjusted, refined and retrained rapidly.
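For reference, a minimal sketch of the dataset layout and the data.yaml file that YOLOv8 expects (the paths and class ordering here are assumptions, not the exact Drive structure used):

```python
from pathlib import Path

# Assumed layout on disk (one YOLO-format .txt label per image):
# dataset/
# ├── images/train  images/val  images/test
# └── labels/train  labels/val  labels/test

# Write a minimal data.yaml pointing YOLOv8 at the dataset;
# class names follow the behaviours chosen for this project.
Path("data.yaml").write_text("""\
path: /content/drive/MyDrive/dataset
train: images/train
val: images/val
test: images/test
names:
  0: standing
  1: walking
  2: grazing
  3: estrus
""")
```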
| Model Variant | Size (Parameters) | Inference Speed | Accuracy | Hardware Requirements |
|---|---|---|---|---|
| YOLOv8n | Smallest | Fastest | Low | Very low |
| YOLOv8s | Small | Faster | Medium | Low |
| YOLOv8m | Medium | Moderate | Good | Medium |
| YOLOv8l | Large | Slow | High | High |
| YOLOv8x | Extra Large | Slowest | Highest | Very high |
Training Protocol
The YOLOv8n model was adjusted and trained for 200 epochs throughout December. To support the training process, data augmentation techniques such as horizontal flipping, scaling, and random cropping were applied to improve model generalization.
The following key steps were followed:
• Data Loading: The training and validation datasets were preprocessed and loaded using the YOLOv8 data loading pipeline.
• Model Initialization: The YOLOv8 architecture was initialized with a predefined configuration, optimized for object detection tasks.
• Training: The model was trained using a stochastic gradient descent (SGD) optimizer with a momentum of 0.937 and weight decay of 0.0005. The learning rate scheduler followed a cosine annealing strategy (see the sketch after this list).
• Validation: Performance was evaluated at the end of each epoch using the validation dataset to monitor overfitting and track metrics such as precision, recall, and mAP.
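As an illustration, a minimal Ultralytics training call matching this protocol might look as follows (the file paths are assumptions; the hyperparameter values are those reported in this section and the baseline setup below):

```python
from ultralytics import YOLO

# Start from the COCO pre-trained nano variant.
model = YOLO("yolov8n.pt")

# SGD with momentum 0.937, weight decay 0.0005 and a cosine-annealed
# learning rate; validation runs automatically at the end of each epoch.
model.train(
    data="data.yaml",
    epochs=200,
    imgsz=640,
    batch=24,
    optimizer="SGD",
    lr0=0.001,
    momentum=0.937,
    weight_decay=0.0005,
    cos_lr=True,
)
```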
Model Evaluation
After training, the model’s performance was assessed based on its ability to detect and classify cow behaviours in unseen data.
Metrics such as precision, recall, F1 score, and mAP were calculated to evaluate model accuracy. To identify potential overfitting, the divergence between training and validation losses across epochs was examined (see model analysis).
This was performed using AI tools as well as visual inspection of detection outputs, generating predictions on 10 random images from the validation set.
These predictions were visualised with bounding boxes overlaid on the images, highlighting the detected behaviour categories.
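A minimal sketch of this spot-check, assuming the default Ultralytics weights location and a hypothetical validation image directory:

```python
import random
from pathlib import Path
from ultralytics import YOLO

# Load the trained weights (default Ultralytics output path; adjust as needed).
model = YOLO("runs/detect/train/weights/best.pt")

# Pick 10 random validation images (directory name is an assumption).
val_images = list(Path("dataset/images/val").glob("*.jpg"))
sample = random.sample(val_images, 10)

# save=True writes copies of the images with bounding boxes and
# behaviour labels drawn on, into the runs/ directory.
results = model.predict([str(p) for p in sample], save=True, conf=0.25)
```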
Steps Taken and Techniques Applied
1. Model Training
Trained the YOLOv8 model with various configurations to achieve optimal performance:
• Baseline Setup:
  • Model: YOLOv8m (medium capacity).
  • Dataset: 29,000 images, split into train, val, and test directories with annotations in YOLO format.
• Hyperparameters:
  • Epochs: 200
  • Batch Size: 24
  • Image Size: 640
  • Learning Rate: 0.001
  • Augmentations: Enabled mosaic, horizontal flip, and HSV (hue, saturation, brightness) adjustments.
2. EarlyStopping Challenges
• Issue:
Training stopped after the first epoch due to EarlyStopping, as no significant improvement was observed in subsequent epochs.
• Solution: Adjusted the patience parameter to 50 to allow more epochs without improvement before stopping (see the combined sketch after section 5).
3. Data Quality Analysis
A comprehensive check was performed to ensure:
• Annotation Consistency: Verified that every image in the dataset had a corresponding label, and all labels matched their associated images.
• Class Distribution: Ensured balanced representation of cow poses, including back and side views, and variations in lighting conditions.
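A sketch of these checks under the directory layout assumed earlier (every image should have a same-named .txt label, and each label line begins with a class id):

```python
from collections import Counter
from pathlib import Path

img_dir = Path("dataset/images/train")  # assumed paths
lbl_dir = Path("dataset/labels/train")

# Annotation consistency: every image needs a matching label file, and vice versa.
images = {p.stem for p in img_dir.glob("*.jpg")}
labels = {p.stem for p in lbl_dir.glob("*.txt")}
print("images without labels:", images - labels)
print("labels without images:", labels - images)

# Class distribution: count class ids across all label files.
counts = Counter()
for lbl in lbl_dir.glob("*.txt"):
    for line in lbl.read_text().splitlines():
        if line.strip():
            counts[int(line.split()[0])] += 1
print("instances per class id:", dict(counts))
```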
4. Data Augmentation Techniques
Experimentation with data augmentations to enhance model generalization (the adjusted values are gathered in the combined sketch after section 5):
• Mosaic Augmentation: Combined multiple images into a single input image to improve detection in complex scenes. Intensity was adjusted for better results (mosaic=0.2).
• Horizontal Flip: Introduced mirrored images to simulate diverse orientations (fliplr=0.4).
• HSV Adjustments: Simulated varied lighting conditions by tweaking hue (hsv_h=0.01), saturation (hsv_s=0.5), and brightness (hsv_v=0.3).
5. Hyperparameter Tuning
To stabilize training and improve model convergence:
• Reduced the Learning Rate: Adjusted to lr0=0.0005 to prevent overshooting during weight updates.
• Increased Batch Size: Tested with batch=24 for more stable gradient estimates.
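A minimal sketch gathering the adjustments from sections 2, 4 and 5 into a single training call (argument names follow the Ultralytics API; the values are those reported above):

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
model.train(
    data="data.yaml",
    epochs=200,
    imgsz=640,
    batch=24,      # larger batch for more stable gradient estimates
    lr0=0.0005,    # reduced learning rate to prevent overshooting
    patience=50,   # relaxed EarlyStopping (section 2)
    mosaic=0.2,    # toned-down mosaic augmentation
    fliplr=0.4,    # horizontal flip probability
    hsv_h=0.01,    # hue jitter
    hsv_s=0.5,     # saturation jitter
    hsv_v=0.3,     # brightness jitter
)
```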
6. Model Validation
Post-training, the model was validated to evaluate performance:
• Metrics Observed:
• Precision: 65.88%
• Recall: 51.10%
• mAP@50: 58.53%
• mAP@50-95: 33.63%
Insights:
• Precision showed moderate improvement, indicating fewer false positives.
• Recall remained relatively low, highlighting missed detections.
• mAP@50-95 suggested the model struggled with finer-grained detections at higher IoU thresholds.
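These figures can be reproduced with the Ultralytics validation API; a minimal sketch, assuming the default weights location:

```python
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")
metrics = model.val(data="data.yaml", split="val")

# DetMetrics exposes mean precision/recall and mAP aggregated over classes.
print(f"precision: {metrics.box.mp:.4f}")
print(f"recall:    {metrics.box.mr:.4f}")
print(f"mAP@50:    {metrics.box.map50:.4f}")
print(f"mAP@50-95: {metrics.box.map:.4f}")
```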
7. Techniques for Visualization
To better understand the model’s performance:
• Prediction Visualisation: Used the model.predict() method to visualize detections on the validation set, analyzing false positives and missed detections.
• Validation Metrics: Collected precision, recall, and mAP values to monitor the model’s learning progress.
• Potential Confusion Matrix: Planned for future inclusion of a confusion matrix to identify inter-class confusion.
The results show that the model performs well in detecting cows in lying and standing positions, achieving a high level of precision and recall for these classes. These outcomes demonstrate that the model’s accuracy in identifying behaviours with distinct visual features potentially makes it a usable tool for routine livestock monitoring. For example, lying behaviour was detected with high F1 scores, showing that the model can track this particular activity. Likewise, standing behaviour was identified with a high degree of accuracy, further demonstrating the model’s potential to monitor activity levels in herds. These results suggest that the model could provide useful insights into the daily routines and welfare of cattle, offering farmers actionable data for decision-making.
However, despite these achievements, the analysis does highlight some shortfalls. One of the main issues was an imbalance within the dataset: images of lying and standing cows dominated, whilst grazing and estrus were underrepresented. This imbalance affected the model’s ability to accurately classify the less represented behaviours. Grazing often involves subtle, less dynamic movement and so is likely to be misclassified as standing due to overlapping visual features. Similarly, estrus, a behaviour critical for reproductive management, has the lowest performance metrics, indicating the need for additional data and improved annotations for this class.
Another shortfall lies in the quality and consistency of the annotations used for training. The analysis shows clustering patterns and overlapping bounding boxes, which could suggest potential bias in the data. These inconsistencies have contributed to the misclassification of behaviours, particularly grazing and estrus. Spending additional time refining the annotation process to ensure consistent labelling would significantly improve the model’s accuracy. Additionally, using more advanced data augmentation techniques would probably help to balance the dataset and improve the model’s performance across all classes. With more time, expertise and patience, these shortfalls could be overcome.
There are few applications for a model that only identifies behaviour in still images of cattle, so some experimentation followed the completion of the assignment to identify methods of deployment. Initially, the best.pt file was converted into an .onnx file to enable app deployment, where images could be uploaded to an app to identify cow behaviour within the image. Moderate success was achieved here, but this aspect of the project was sidelined in favour of working towards a model that could process video.
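The conversion itself can be done with the Ultralytics export API; a minimal sketch, assuming the default weights location:

```python
from ultralytics import YOLO

# Export the trained weights to ONNX for deployment outside Python
# (the weights path is an assumption based on Ultralytics defaults).
model = YOLO("runs/detect/train/weights/best.pt")
model.export(format="onnx")  # writes best.onnx next to best.pt
```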
For this, Deep SORT (Simple Online and Realtime Tracking with a Deep Association Metric) was identified as a suitable algorithm. It is a widely used multi-object tracking (MOT) algorithm designed to track objects across video sequences; it builds on the original SORT algorithm by incorporating a deep learning-based appearance descriptor, which significantly improves tracking performance. Again, moderate success was achieved, examples of which can be viewed in the GitHub repository; however, time constraints and the scope of the assignment meant that this was not satisfactorily completed.
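A sketch of how the YOLO detections could feed Deep SORT, using the third-party deep-sort-realtime package (the package choice, paths, and thresholds are assumptions; the repository’s actual implementation may differ):

```python
import cv2
from ultralytics import YOLO
from deep_sort_realtime.deepsort_tracker import DeepSort

model = YOLO("runs/detect/train/weights/best.pt")  # assumed weights path
tracker = DeepSort(max_age=30)  # drop tracks unseen for 30 frames

cap = cv2.VideoCapture("cows.mp4")  # hypothetical input video
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Detect cows and their behaviours in the current frame.
    result = model.predict(frame, conf=0.25, verbose=False)[0]
    detections = []
    for box in result.boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        # deep-sort-realtime expects ([left, top, width, height], conf, class).
        detections.append(([x1, y1, x2 - x1, y2 - y1],
                           float(box.conf), int(box.cls)))
    # Associate detections across frames; confirmed tracks carry stable IDs.
    for track in tracker.update_tracks(detections, frame=frame):
        if track.is_confirmed():
            print(f"track {track.track_id}: box {track.to_ltrb()}")
cap.release()
```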