Human Action Recognition Using YOLOv11 Ultralytics: A Comprehensive Study for Real-Time Applications
DOI: https://doi.org/10.56147/aaiet.1.2.14

Keywords: Human action recognition, YOLOv11, Ultralytics, Deep learning, Computer vision, Real-time detection, Surveillance, Healthcare

Abstract
Human action recognition (HAR) is a pivotal task in computer vision, with applications in surveillance, healthcare, robotics, and human-computer interaction. This study presents a novel framework for HAR using the YOLOv11 model by Ultralytics, a state-of-the-art object detection architecture optimized for real-time performance. We trained and evaluated the model on a custom dataset comprising 18 distinct human actions, captured in indoor environments using fisheye cameras. The actions range from everyday activities (e.g., walking, sitting) to specialized tasks (e.g., patient on stretcher, patient on wheelchair). Our results show that YOLOv11 achieves an overall mean Average Precision (mAP@0.5) of 0.401, with particularly strong per-class results for actions such as "cleaning" (mAP@0.5: 0.760), "searching" (mAP@0.5: 0.695), and "patient on wheelchair" (mAP@0.5: 0.995). We provide an in-depth analysis of the model's training metrics, bounding box distributions, precision-recall curves, F1-confidence curves, recall-confidence curves, and confusion matrices. Additionally, we present extensive qualitative results to demonstrate the model's robustness in real-world scenarios. A comparison with existing methods, such as two-stream CNNs and Transformer-based models, highlights YOLOv11's superior balance of accuracy and speed, making it a promising solution for real-time HAR applications. This study also discusses the model's limitations and outlines directions for future research, paving the way for enhanced action recognition systems.
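For readers who wish to reproduce a comparable pipeline, the sketch below shows how a YOLOv11 detector can be fine-tuned and evaluated with the Ultralytics Python API. It is a minimal illustration under stated assumptions, not the authors' released code: the dataset configuration file ("har_fisheye.yaml"), the checkpoint choice ("yolo11n.pt"), the sample image name, and all hyperparameters are hypothetical placeholders.

from ultralytics import YOLO

# Load a pretrained YOLOv11 detection checkpoint as the starting point.
model = YOLO("yolo11n.pt")

# Fine-tune on the 18-class action dataset; "har_fisheye.yaml" is a
# hypothetical dataset config listing train/val image paths and the
# 18 action class names. Hyperparameters here are illustrative only.
model.train(data="har_fisheye.yaml", epochs=100, imgsz=640, batch=16)

# Evaluate on the validation split; metrics.box.map50 corresponds to
# the mAP@0.5 figure reported in the abstract.
metrics = model.val()
print(f"mAP@0.5: {metrics.box.map50:.3f}")

# Run inference on a new frame (hypothetical file name) and print the
# per-box class labels, which in this setup encode recognized actions.
results = model("hallway_frame.jpg")
for box in results[0].boxes:
    print(results[0].names[int(box.cls)], float(box.conf))

Because each action is annotated as a detection class, recognition reduces to standard object detection, which is what lets the model retain YOLO's real-time inference speed.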