Comparative Analysis of Fine-Tuned Pre-Trained Models for Person Re-Identification on the Market-1501 Dataset
Keywords:
Person Re-Identification, Re-ID, Fine-Tuning, Deep Learning, CNNs, ViT, Pre-Trained Models

Abstract
Person re-identification (Re-ID) plays a pivotal role in intelligent surveillance, enabling consistent identification of individuals across non-overlapping camera views. Despite the widespread adoption of deep learning, the comparative performance of modern pre-trained architectures under consistent fine-tuning conditions remains underexplored. This study presents a systematic evaluation of five widely used models—ResNet50, DenseNet121, EfficientNetB3, Vision Transformer (ViT), and Swin Transformer—fine-tuned on the Market-1501 dataset using a unified training pipeline. The models were assessed using Rank-1 and Rank-5 accuracy, mean Average Precision (mAP), and computational efficiency metrics such as GFLOPs, FPS, and parameter count. The Swin Transformer achieved the highest Rank-1 accuracy (96.2%) and mAP (89.1%), outperforming its convolutional counterparts while maintaining competitive inference speed. On this benchmark, transformer-based architectures showed superior feature generalization and greater robustness to viewpoint and illumination variations. The study provides a reproducible benchmark that connects architectural design principles with Re-ID performance, offering practical guidance for future research and for deployment in real-time surveillance systems.
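The evaluation metrics named in the abstract (Rank-k accuracy and mAP) are computed from a query-to-gallery distance matrix. The sketch below illustrates a minimal, simplified version of that computation; it assumes a plain single-gallery protocol and omits the camera-ID and junk-image filtering used in the official Market-1501 evaluation, so it is illustrative rather than a drop-in replica of the paper's pipeline.

```python
import numpy as np

def evaluate_reid(dist, query_ids, gallery_ids, ranks=(1, 5)):
    """Compute Rank-k (CMC) accuracy and mAP from a distance matrix.

    dist: (num_query, num_gallery) pairwise distances, smaller = more similar.
    Simplified protocol: no camera-ID filtering (unlike official Market-1501 eval).
    """
    num_q = dist.shape[0]
    cmc = np.zeros(max(ranks))
    aps = []
    for i in range(num_q):
        order = np.argsort(dist[i])                       # gallery, nearest first
        matches = (gallery_ids[order] == query_ids[i]).astype(int)
        if matches.sum() == 0:                            # no true match in gallery
            continue
        first_hit = np.argmax(matches)                    # rank of first correct match
        if first_hit < len(cmc):
            cmc[first_hit:] += 1                          # counts toward rank-k for k > first_hit
        # Average precision: precision at each position holding a correct match
        hits = np.cumsum(matches)
        precision = hits * matches / (np.arange(len(matches)) + 1)
        aps.append(precision.sum() / matches.sum())
    cmc /= num_q
    return {f"rank{k}": float(cmc[k - 1]) for k in ranks}, float(np.mean(aps))
```

For example, with two queries where one is matched at rank 1 and the other only at rank 3, Rank-1 accuracy is 0.5 while Rank-5 is 1.0, and the mAP averages the per-query average precisions.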
License
Copyright (c) 2025 International Journal of Computers and Informatics (Zagazig University)

This work is licensed under a Creative Commons Attribution 4.0 International License.