KFYO: Fusing Vision and Biomechanics for Accurate Pitch Evaluation


Conner Robertson 1, Nick Katz 1

Gohar Asad 2, Md Ariful Islam Mozumder 2, Proloy Kumar Mondal 2, Rashedul Islam Sumon 2, Hee-Cheol Kim 2

1 Lawrence University | CMSC/STAT 405: Advanced Data Computing
2 Institute of Digital Anti-Aging Healthcare, Inje University, Gimhae, 50834, Republic of Korea

Introduction

Over the last two decades, Major League Baseball has come to track nearly every aspect of ball flight, including movement, spin, and trajectory. This data is now widely integrated into scouting and the evaluation of player metrics.

Central Problem

  • For performance analysis, accurately detecting, tracking, and evaluating a baseball pitch in real time is extremely difficult with ordinary video.

  • Baseballs are small, move at high speeds, can appear blurred because of camera frame-rate limitations, and may shrink or disappear visually as they travel away from the camera.

  • Modern tracking systems provide valuable data on pitch movement and player performance.

  • Low-cost video-based tracking is promising, but inconsistent environments make detection difficult.

  • Deep learning strengthens object detection by handling visual challenges such as blur, lighting changes, and background clutter.

What is YOLO?

Figure 1: What is YOLO?

YOLO (You Only Look Once) is an object detection model. It scans an image once and predicts:

  1. where an object is located
  2. what the object is

This makes YOLO especially useful: because it needs only a single pass over each image, it is faster and uses less processing power than many earlier detection and tracking systems.
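Since each pass returns a set of boxes with a class and a confidence, a tracker still has to decide which box is "the ball" in each frame. The sketch below is a hypothetical post-processing step, not YOLOv12's API or the authors' code: the `Detection` type, the `best_ball_center` helper, and the 0.25 confidence cutoff are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Detection:
    """One box from a single forward pass (hypothetical structure)."""
    x1: float
    y1: float
    x2: float
    y2: float
    label: str         # "Ball" or "Bat", the two annotated classes
    confidence: float  # detector score in [0, 1]


def best_ball_center(detections: List[Detection], min_conf: float = 0.25) -> Optional[Tuple[float, float]]:
    """Return the center of the most confident "Ball" box, or None if none passes min_conf."""
    balls = [d for d in detections if d.label == "Ball" and d.confidence >= min_conf]
    if not balls:
        return None  # missed detection: a tracker has to bridge this frame
    best = max(balls, key=lambda d: d.confidence)
    return ((best.x1 + best.x2) / 2, (best.y1 + best.y2) / 2)


frame = [
    Detection(100, 200, 110, 210, "Ball", 0.90),
    Detection(50, 60, 90, 200, "Bat", 0.95),
    Detection(300, 300, 308, 308, "Ball", 0.40),
]
print(best_ball_center(frame))  # center of the 0.90 ball box: (105.0, 205.0)
```

Returning `None` rather than a guess keeps missed frames explicit, which is what a downstream filter needs in order to coast through them.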

In this study, YOLOv12 is used to detect small, fast-moving baseballs in video. The system then tracks the ball across frames to estimate its trajectory, speed, and pitch outcome.
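Turning tracked positions into a speed requires a frame rate and a pixel-to-meter scale. The minimal sketch below assumes a single fixed camera, motion roughly parallel to the image plane, and a known `pixels_per_meter` scale (for example derived from known field dimensions); `estimate_speed_kmh` is a hypothetical helper, not the authors' method.

```python
import math
from typing import List, Tuple


def estimate_speed_kmh(centers: List[Tuple[float, float]], fps: float, pixels_per_meter: float) -> float:
    """Average ball speed from per-frame pixel centers under a fixed, known image scale."""
    if len(centers) < 2:
        raise ValueError("need at least two tracked positions")
    # Path length in pixels, summed over consecutive frame pairs
    path_px = sum(math.dist(a, b) for a, b in zip(centers, centers[1:]))
    meters = path_px / pixels_per_meter
    seconds = (len(centers) - 1) / fps
    return meters / seconds * 3.6  # convert m/s to km/h


# Ball moving 27.5 px per frame at 60 fps with 50 px per meter: 0.55 m per frame, 33 m/s
centers = [(27.5 * i, 100.0) for i in range(5)]
print(round(estimate_speed_kmh(centers, fps=60, pixels_per_meter=50), 1))  # 118.8
```

A single camera only sees motion projected onto the image, so any movement toward or away from the lens is lost; this is one reason multi-camera input is listed as future work.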

Objectives

  1. To analyze pitching actions across sequences of video frames and, using aerodynamic theory, reconstruct ball speed, trajectory, and rotation.
  2. Detect small, fast-moving objects in real-time video using a YOLOv12-optimized object detection model.
  3. Provide a low-cost, scalable alternative to commercial baseball tracking systems for training, performance analysis, and sports broadcasting.

Methods

  • A dataset of 8,892 images from MLB games, Korean league games, and shaky amateur video
  • Images manually annotated with two classes: ball and bat
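The poster does not state the annotation file format. Assuming the common YOLO text-label format (one line per object: class id, then box center x, center y, width, and height, each normalized by the image size), a pixel box could be converted as sketched below. The `CLASS_IDS` mapping and `to_yolo_line` helper are hypothetical.

```python
CLASS_IDS = {"ball": 0, "bat": 1}  # hypothetical class-id mapping for the two annotated classes


def to_yolo_line(label: str, x1: float, y1: float, x2: float, y2: float, img_w: int, img_h: int) -> str:
    """Convert a pixel-coordinate box into a normalized YOLO-format label line."""
    xc = (x1 + x2) / 2 / img_w  # box center, as a fraction of image width
    yc = (y1 + y2) / 2 / img_h  # box center, as a fraction of image height
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{CLASS_IDS[label]} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


# A 20 x 20 px ball at the center of a 1920 x 1080 frame
print(to_yolo_line("ball", 950, 530, 970, 550, 1920, 1080))
# 0 0.500000 0.500000 0.010417 0.018519
```

The example also shows the scale problem: the ball box spans only about 1% of the frame width.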

Workflow

  • Kalman filter + kinematics
    • helps classify pitches as balls or strikes using player movement data
    • compensates for blurred or missed detections using aerodynamic calculations
  • YOLOv12
    • draws the final bounding boxes for prediction
  • Performance Analysis
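The Kalman filter step can be sketched for one image axis as follows. This is a minimal sketch, not the authors' implementation: it assumes a constant-velocity model (matching the simplified motion model listed under Limitations), and the class `ConstantVelocityKalman1D`, the `track` helper, and the noise values `process_var` and `meas_var` are hypothetical choices. A frame where the detector misses the ball is passed as `None`, and the filter coasts on its prediction.

```python
from typing import List, Optional


class ConstantVelocityKalman1D:
    """Kalman filter for one image axis with state [position, velocity] (hypothetical helper)."""

    def __init__(self, dt: float, process_var: float = 1.0, meas_var: float = 4.0):
        self.dt = dt
        self.q = process_var  # assumed process noise; absorbs unmodeled gravity, drag, and spin
        self.r = meas_var     # assumed detector noise in pixels^2
        self.x = [0.0, 0.0]   # state: [position (px), velocity (px per time step)]
        self.P = [[1e3, 0.0], [0.0, 1e3]]  # large initial uncertainty

    def predict(self) -> None:
        """Project one step ahead: x = F x, P = F P F^T + Q, with F = [[1, dt], [0, 1]]."""
        dt, P = self.dt, self.P
        p, v = self.x
        self.x = [p + v * dt, v]
        p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + self.q
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + self.q  # simplified diagonal Q = q * I
        self.P = [[p00, p01], [p10, p11]]

    def update(self, z: float) -> None:
        """Correct the state with a measured position (H = [1, 0])."""
        P = self.P
        s = P[0][0] + self.r               # innovation covariance
        k0, k1 = P[0][0] / s, P[1][0] / s  # Kalman gain
        y = z - self.x[0]                  # innovation: measured minus predicted position
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P = (I - K H) P
        self.P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
                  [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]


def track(measurements: List[Optional[float]], dt: float) -> List[float]:
    """Filter one axis of ball positions; None marks a frame where the detector missed the ball."""
    kf = ConstantVelocityKalman1D(dt)
    kf.x = [measurements[0], 0.0]  # assumes the first frame has a detection
    estimates = [measurements[0]]
    for z in measurements[1:]:
        kf.predict()
        if z is not None:
            kf.update(z)
        estimates.append(kf.x[0])
    return estimates


print(track([0, 10, 20, None, 40, 50], dt=1.0))
```

For the ball moving 10 px per frame, the missing fourth frame is filled with a prediction close to 30. A 2D tracker would run one such filter per image axis.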

Results

In object detection, IoU (intersection over union) is central to evaluating precision. It is the area where a predicted box and the labeled box overlap, divided by the total area the two boxes cover together. When the predicted box around an object like a ball exactly matches the labeled ball, the IoU is 100%. This study used mean average precision (mAP), computed at one or more IoU thresholds, to get a more nuanced view of precision.
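The IoU definition above can be written directly for axis-aligned boxes. This is a standard formulation rather than the study's code; the `Box` tuple layout `(x1, y1, x2, y2)` is an assumption.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if the boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


label = (0, 0, 10, 10)
print(iou(label, (0, 0, 10, 10)))  # 1.0, a perfect match
print(iou(label, (3, 0, 13, 10)))  # about 0.538 for a 3 px shift
```

For a small 10 px ball, a 3 px error already drops IoU to about 0.54: it still passes a 0.5 threshold but would fail stricter ones, which is why small objects score much lower on strict metrics.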

Evaluation of YOLO Models

Model     mAP@0.5  mAP@0.5:0.95  Precision  Recall  Inference Time (ms)  Pitch Speed Est.
YOLOv8    0.875    0.52          0.84       0.81    6.3                  ~117 km/h
YOLOv11   0.915    0.57          0.88       0.86    7.5                  ~119 km/h
YOLOv11n  0.880    0.54          0.85       0.83    9.1                  ~116 km/h
YOLOv11m  0.920    0.58          0.89       0.87    7.0                  ~120 km/h
YOLOv12   0.945    0.60          0.91       0.89    8.6                  ~121 km/h
  • mAP@0.5: a forgiving IoU threshold, where any predicted box overlapping the labeled box with an IoU of 50% or higher counts as a correct detection. Loosely, a score of 0.945 means the approximate location is found almost 95% of the time.

  • mAP@0.5:0.95: the "strict" test, which averages mAP over IoU thresholds from 50% to 95% in steps of 5%. Because it includes stricter thresholds, it is never higher than mAP@0.5; YOLOv12's score of 0.60 was still the highest of all YOLO models tested.

  • Recall: YOLOv12 scored the highest recall, 0.89, meaning it detects almost 90% of the objects actually present.

  • Inference Time: YOLOv12's 8.6 ms per frame translates to about 116 frames per second (1000 / 8.6 ≈ 116).

    • Most video is captured at 30-60 fps, so although YOLOv12 was slower than YOLOv8, YOLOv11, and YOLOv11m, it is still fast enough for real-time use.
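The frame-rate arithmetic and the mAP@0.5:0.95 thresholds can be checked with a short script. The inference times are copied from the evaluation table; `max_fps` and `fits_realtime` are illustrative helpers, and the calculation assumes inference is the only per-frame cost (decoding and tracking are ignored).

```python
# Inference times in ms per frame, copied from the evaluation table
INFERENCE_MS = {"YOLOv8": 6.3, "YOLOv11": 7.5, "YOLOv11n": 9.1, "YOLOv11m": 7.0, "YOLOv12": 8.6}


def max_fps(ms_per_frame: float) -> float:
    """Throughput ceiling if model inference were the only per-frame cost."""
    return 1000.0 / ms_per_frame


def fits_realtime(ms_per_frame: float, capture_fps: float) -> bool:
    """True if inference finishes within one frame interval at the capture rate."""
    return ms_per_frame <= 1000.0 / capture_fps


# The ten IoU thresholds averaged by mAP@0.5:0.95: 0.50, 0.55, ..., 0.95
MAP_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

for name, ms in INFERENCE_MS.items():
    print(f"{name}: {max_fps(ms):.0f} fps, real-time at 60 fps: {fits_realtime(ms, 60)}")
```

At 60 fps each frame leaves a budget of about 16.7 ms, so every model in the table fits, including YOLOv12 at roughly 116 fps.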

User Interface Example of Tagged Frames

The basic interface for viewing results is simple: YOLOv12 draws bounding boxes on each frame, labeled either "Bat" or "Ball".

Limitations

  • Data imbalance: The model had more bat examples than ball examples.

  • Missed ball detections: recall for the ball class was about 80%, meaning roughly one in five balls went undetected.

  • Simplified motion model: The model assumes the ball moves at a mostly constant velocity, but real pitches are affected by gravity, drag, spin, and curve.

  • Camera angle requirements: the system also requires a specific camera angle and known field dimensions, which may not be available at amateur levels.

Reproducibility & Potential Future Work

  • Train on larger, more balanced datasets with stronger physics-based and time-based tracking models.

The current system does not fully model how a baseball moves through the air. Future models could incorporate physical effects such as gravity, drag, spin, and curve, making trajectory prediction more realistic.

  • Use multi-camera input for more accurate speed estimation.

  • The approach may be transferable to other high-speed sports such as cricket, tennis, or badminton.
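As an illustration of the physics-based modeling proposed above, the sketch below integrates a 2D pitch under gravity and quadratic air drag instead of assuming constant velocity. It is not part of the study: the drag coefficient, air density, ball size, and the 18.44 m rubber-to-plate distance (actual release happens somewhat closer to the plate) are assumptions, and spin (the Magnus effect behind curve) is deliberately left out.

```python
import math

G = 9.81         # gravitational acceleration, m/s^2
RHO = 1.2        # assumed sea-level air density, kg/m^3
MASS = 0.145     # approximate baseball mass, kg
RADIUS = 0.0366  # approximate baseball radius, m
CD = 0.35        # assumed drag coefficient; real values vary with speed and spin
AREA = math.pi * RADIUS ** 2


def simulate_pitch(speed_kmh: float, launch_deg: float, distance_m: float = 18.44,
                   cd: float = CD, dt: float = 1e-4):
    """Integrate a pitch (x toward the plate, y up) under gravity and quadratic drag.

    Returns (flight time in s, vertical drop in m, speed at the plate in km/h). Spin is ignored.
    """
    v = speed_kmh / 3.6
    vx = v * math.cos(math.radians(launch_deg))
    vy = v * math.sin(math.radians(launch_deg))
    x = y = t = 0.0
    k = 0.5 * RHO * cd * AREA / MASS  # drag deceleration = k * |v|^2, opposite to velocity
    while x < distance_m:
        speed = math.hypot(vx, vy)
        ax = -k * speed * vx
        ay = -G - k * speed * vy
        # semi-implicit Euler: update velocity first, then position
        vx += ax * dt
        vy += ay * dt
        x += vx * dt
        y += vy * dt
        t += dt
    return t, -y, math.hypot(vx, vy) * 3.6


t, drop, v_plate = simulate_pitch(121, 0.0)
print(f"flight {t:.3f} s, drop {drop:.2f} m, plate speed {v_plate:.0f} km/h")
```

Under a constant-velocity model, the predicted drop would be zero; even this simple physics model shows the ball falling more than a meter and slowing noticeably before it reaches the plate.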

Overall: performance could be improved further with a larger, more representative training dataset.

Conclusion

  • This study presented and evaluated the performance of deep learning models for baseball game analysis.

Final Remarks

The system achieved high scores and successfully performed its targeted tasks. The results indicate that coaches and players could use it to gain valuable information during training and to improve pitching or catching performance. While there are still inconsistencies in the model, further development could make it a more accurate, accessible, and practical tool for real-world baseball training and analysis.

References

https://www.sciencedirect.com/science/article/pii/S2405844026005037?via%3Dihub