Visual Analysis of Human Pose Estimation under Frame Degradation Using MediaPipe and ViTPose
Keywords:
Human Pose Estimation, MediaPipe, ViTPose, Rotation, Low Resolution, Model Performance
Abstract
Human Pose Estimation (HPE) is a core task in many computer vision applications. However, its accuracy tends to drop when the input is visually degraded, for example when video frames are low-resolution or rotated. This paper compares two widely used pose estimation models, MediaPipe (MP) and ViTPose, on carefully selected frames extracted from three of our own daily-life videos. To emulate non-optimal conditions, we applied three kinds of visual filters to the videos: lossy video compression (to approximately 70% of the original size), a 90-degree clockwise rotation, and a 180-degree rotation. We then compared the original frames with their filtered counterparts using visual overlays of the predicted landmarks. Our results shed light on how each model reacts to such changes and provide a visual representation that can help explain performance anomalies under different conditions. These observations help identify the weaknesses of HPE systems in unpredictable environments and point to opportunities for improving pose estimation models for broader, real-world applications.
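The abstract does not state how landmarks predicted on a rotated frame were aligned with the original frame for the overlays. The following is a minimal sketch of one way to do it, assuming frames are NumPy arrays of shape (H, W) or (H, W, C) and landmarks are normalized (x, y) values in [0, 1], the convention MediaPipe uses for its pose landmarks. The mode names and the functions `rotate_frame`, `to_original`, and `normalized_center` are hypothetical illustrations, not the authors' code.

```python
import numpy as np

# Hypothetical mode names for the two rotation filters described in the abstract.
ROT90_CW = "rot90_cw"
ROT180 = "rot180"


def rotate_frame(frame: np.ndarray, mode: str) -> np.ndarray:
    """Rotate an (H, W) or (H, W, C) frame; np.rot90 with k=-1 turns it clockwise."""
    if mode == ROT90_CW:
        return np.rot90(frame, k=-1)
    if mode == ROT180:
        return np.rot90(frame, k=2)
    raise ValueError(f"unknown mode: {mode}")


def to_original(u: float, v: float, mode: str) -> tuple[float, float]:
    """Map a normalized landmark (u, v) predicted on the rotated frame back
    onto the original, unrotated frame so both can share one overlay."""
    if mode == ROT90_CW:
        # Clockwise rotation sends (u, v) to (1 - v, u); the inverse is u = v', v = 1 - u'.
        return v, 1.0 - u
    if mode == ROT180:
        # A 180-degree rotation flips both axes and is its own inverse.
        return 1.0 - u, 1.0 - v
    raise ValueError(f"unknown mode: {mode}")


def normalized_center(row: int, col: int, height: int, width: int) -> tuple[float, float]:
    """Normalized (x, y) of a pixel center, matching a [0, 1] landmark convention."""
    return (col + 0.5) / width, (row + 0.5) / height


if __name__ == "__main__":
    # A 4x6 frame with one marked pixel at row 1, column 3 stands in for a landmark.
    frame = np.zeros((4, 6), dtype=np.uint8)
    frame[1, 3] = 255
    expected = normalized_center(1, 3, *frame.shape)
    for mode in (ROT90_CW, ROT180):
        rotated = rotate_frame(frame, mode)
        row, col = np.argwhere(rotated == 255)[0]
        u, v = normalized_center(int(row), int(col), *rotated.shape)
        # The mapped-back point should coincide with the original pixel center.
        print(mode, to_original(u, v, mode), expected)
```

Mapping predictions back to the original frame, rather than rotating the original landmarks forward, lets every filtered result be drawn on the same unrotated image, which keeps side-by-side overlays directly comparable. The compression filter needs no coordinate mapping because it leaves the frame geometry unchanged.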
License
Copyright (c) 2025 International Journal of Computers and Informatics (Zagazig University)

This work is licensed under a Creative Commons Attribution 4.0 International License.