Camo-M3FD: A New Benchmark Dataset for Cross-Spectral Camouflaged Pedestrian Detection

CVPR-Workshop 2026

About the paper

Pedestrian detection is fundamental to autonomous driving, robotics, and surveillance. Despite progress in deep learning, reliable identification remains challenging due to occlusions, cluttered backgrounds, and degraded visibility. While multispectral detection, which combines visible and thermal sensors, mitigates poor visibility, the challenge of camouflaged pedestrians remains largely unexplored. Existing Camouflaged Object Detection (COD) benchmarks focus on biological species, leaving a gap in safety-critical human detection where targets blend into their surroundings. To address this, we introduce Camo-M3FD (derived from the M3FD* dataset), a novel benchmark for cross-spectral camouflaged pedestrian detection, consisting of registered visible-thermal image pairs. The dataset is curated using quantitative metrics to ensure high foreground-background similarity. We provide high-quality pixel-level masks and establish a standardized evaluation framework using state-of-the-art COD models. Our results demonstrate that while thermal signals provide indispensable localization cues, multispectral fusion is essential for refining structural details. Camo-M3FD serves as a foundational resource for developing robust, safety-critical detection systems. The dataset is available on GitHub: https://cod-espol.github.io/Camo-M3FD/.

(*) Liu, J., Fan, X., Huang, Z., Wu, G., Liu, R., Zhong, W., & Luo, Z. (2022). Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 5802-5811).

[Figure 1: image grid for samples 00032, 00359, 00459, 01119, and 00785; rows: RGB, Thermal, GT.]
Figure 1. Example images of the Camo-M3FD dataset. (1st row) Visible (RGB) images. (2nd row) Thermal images. (3rd row) Segmentation mask images of camouflaged objects.
| Dataset | Source | Year | Scope | Type of images | # images |
|---|---|---|---|---|---|
| Chameleon [29] | - | 2018 | Animal | RGB | 76 |
| CAMO [21, 39] | CVIU | 2019 | Animal & others | RGB | 1,250 |
| COD10K [11, 12] | CVPR | 2020 | Animal & others | RGB | 10,000 |
| NC4K [25] | CVPR | 2021 | Animal & others | RGB | 4,121 |
| Camo-M3FD (Ours) | CVPR | 2026 | Pedestrian | RGB + Thermal | 614 |

Table 1. COD datasets comparison.
[Figure 4: image grid of 11 examples; rows: RGB, RGB Sobel, GT Edges.]
Sα (examples 1 to 11): 0.8059, 0.6292, 0.3946, 0.7715, 0.7711, 0.4472, 0.7873, 0.6052, 0.3358, 0.3573, 0.3600
Figure 4. Examples of accepted and rejected (marked in red) images alongside their respective edges extracted from the RGB image using Sobel, the edges of the GT mask, and the camouflage scores (Sα).
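Figure 4 compares Sobel edges from the RGB image with the edges of the GT mask. The sketch below shows one way to compute a Sobel gradient magnitude map for either input. It is only an illustration: the function name `sobel_magnitude`, the replicate-border padding, and the use of plain Python lists (so it runs without third-party packages) are assumptions, and it does not reproduce how the camouflage score Sα is derived from the two edge maps.

```python
from math import hypot

# 3x3 Sobel kernels for horizontal (x) and vertical (y) intensity changes.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(gray):
    """Return the Sobel gradient magnitude of a 2D grayscale image.

    `gray` is a list of equal-length rows of numbers. Border pixels are
    handled by replicating the nearest edge pixel (an assumption; other
    padding modes are equally valid).
    """
    h, w = len(gray), len(gray[0])

    def px(y, x):
        # Clamp coordinates so out-of-range reads reuse the border pixel.
        return gray[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = gy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    v = px(y + dy, x + dx)
                    gx += SOBEL_X[dy + 1][dx + 1] * v
                    gy += SOBEL_Y[dy + 1][dx + 1] * v
            out[y][x] = hypot(gx, gy)
    return out
```

Applied to a binary GT mask, the map is non-zero only along the object boundary; applied to a grayscale version of the RGB image, it responds to every intensity change, including background texture. For real images, an array library would be the usual choice over nested lists.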
| Technique | Source | Source Type | Year | Image Size (px) | Backbone | #Param. (M) |
|---|---|---|---|---|---|---|
| BASNet [28] | CVPR | Conference | 2019 | 256 × 256 | ResNet-34 [16] | 87.06 |
| SINet-v2 [12] | TPAMI | Journal | 2021 | 352 × 352 | Res2Net-50 [14] | 24.93 |
| BGNet [4] | IJCAI | Conference | 2022 | 416 × 416 | Res2Net-50 [14] | 77.80 |
| C2F-Net [3] | TCSVT | Conference | 2022 | 352 × 352 | Res2Net-50 [14] | 26.36 |
| OCENet [23] | WACV | Conference | 2022 | 352 × 352 | ResNet-50 [16] | 58.17 |
| EAMNet [30] | ICME | Conference | 2023 | 384 × 384 | Res2Net-50 [14] | 30.51 |
| DGNet [19] | MIR | Journal | 2023 | 352 × 352 | EfficientNet [31] | 8.30 |
| HitNet [17] | AAAI | Conference | 2023 | 352 × 352 | PVTv2 [37] | 25.73 |
| PCNet [40] | arXiv | - | 2024 | 352 × 352 | PVTv2 [37] | 27.66 |
| CTF-Net [41] | CVIU | Journal | 2025 | 384 × 384 | PVTv2 [37] | 64.48 |
| AVNet [33] | VISAPP | Conference | 2026 | 416 × 416 | PVTv2 [37] | 48.04 |

Table 2. Distinctive characteristics of the evaluated SoTA COD techniques.
| Technique | Input | Sα ↑ | Fβ^w ↑ | M ↓ | Eφ^adp ↑ | Eφ^mean ↑ | Eφ^max ↑ | Fβ^adp ↑ | Fβ^mean ↑ | Fβ^max ↑ |
|---|---|---|---|---|---|---|---|---|---|---|
| BASNet [28] | Vis | 0.6239 | 0.2902 | 0.0032 | 0.6972 | 0.7183 | 0.8042 | 0.2879 | 0.3057 | 0.3137 |
| | Th | 0.7051 | 0.4161 | 0.0028 | 0.7293 | 0.7822 | 0.8078 | 0.3762 | 0.4358 | 0.4571 |
| SINet-v2 [12] | Vis | 0.6275 | 0.2693 | 0.0037 | 0.6039 | 0.7080 | 0.7227 | 0.2244 | 0.2872 | 0.3033 |
| | Th | 0.6927 | 0.4072 | 0.0034 | 0.6450 | 0.7593 | 0.7949 | 0.3428 | 0.4244 | 0.4424 |
| BGNet [4] | Vis | 0.6745 | 0.3922 | 0.0500 | 0.7594 | 0.7687 | 0.8142 | 0.3576 | 0.4124 | 0.4255 |
| | Th | 0.7196 | 0.4699 | 0.0106 | 0.7664 | 0.8306 | 0.8539 | 0.4315 | 0.4865 | 0.4963 |
| C2F-Net [3] | Vis | 0.5137 | 0.0432 | 0.0811 | 0.4804 | 0.6079 | 0.7333 | 0.1433 | 0.2155 | 0.2554 |
| | Th | 0.5244 | 0.0522 | 0.0663 | 0.5122 | 0.6432 | 0.7656 | 0.2064 | 0.2882 | 0.3437 |
| OCENet [23] | Vis | 0.5994 | 0.2357 | 0.0037 | 0.6680 | 0.7975 | 0.8201 | 0.2240 | 0.2546 | 0.2632 |
| | Th | 0.7277 | 0.4884 | 0.0037 | 0.7122 | 0.8152 | 0.8666 | 0.4253 | 0.4998 | 0.5403 |
| EAMNet [30] | Vis | 0.5227 | 0.0494 | 0.0160 | 0.4048 | 0.6141 | 0.8109 | 0.0998 | 0.1752 | 0.2352 |
| | Th | 0.5047 | 0.0333 | 0.0506 | 0.4946 | 0.6458 | 0.8091 | 0.1836 | 0.2622 | 0.3799 |
| DGNet [19] | Vis | 0.6438 | 0.3109 | 0.0039 | 0.6720 | 0.7598 | 0.7739 | 0.2759 | 0.3235 | 0.3377 |
| | Th | 0.6898 | 0.4073 | 0.0052 | 0.6765 | 0.7928 | 0.8227 | 0.3586 | 0.4244 | 0.4403 |
| HitNet [17] | Vis | 0.5659 | 0.1593 | 0.0030 | 0.7333 | 0.5685 | 0.7353 | 0.1815 | 0.1721 | 0.1809 |
| | Th | 0.6682 | 0.3622 | 0.0029 | 0.7694 | 0.7466 | 0.7778 | 0.3910 | 0.3800 | 0.3919 |
| PCNet [40] | Vis | 0.6512 | 0.3227 | 0.0034 | 0.5048 | 0.7639 | 0.8069 | 0.1688 | 0.3464 | 0.3552 |
| | Th | 0.7034 | 0.4260 | 0.0030 | 0.6187 | 0.8280 | 0.8428 | 0.2674 | 0.4504 | 0.4572 |
| CTF-Net [41] | Vis | 0.5077 | 0.0525 | 0.0755 | 0.4201 | 0.5912 | 0.7296 | 0.1322 | 0.2449 | 0.3146 |
| | Th | 0.6532 | 0.2955 | 0.0116 | 0.4178 | 0.7515 | 0.8073 | 0.1409 | 0.4310 | 0.4794 |
| AVNet [33] | Vis | 0.6669 | 0.4035 | 0.0029 | 0.8294 | 0.8164 | 0.8287 | 0.3923 | 0.3985 | 0.4068 |
| | Th | 0.7289 | 0.5066 | 0.0026 | 0.7989 | 0.8075 | 0.8242 | 0.4831 | 0.4926 | 0.5113 |
| | Vis+Th | 0.7318 | 0.5301 | 0.0030 | 0.8167 | 0.8287 | 0.8617 | 0.5051 | 0.5139 | 0.5362 |

Table 3. Metric evaluation results for each COD technique on the Camo-M3FD dataset, reported for the RGB and Thermal baselines. Results are presented using the metric notation defined in Sec. 3.5; "↑ / ↓" indicates that larger or smaller is better. The best three performing results are highlighted using color: First, Second, and Third, respectively.
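To make two of the columns in Table 3 concrete, the sketch below computes a mean absolute error and an adaptive F-measure for one prediction map against its GT mask. It assumes that M denotes the mean absolute error, as is customary in COD benchmarks, and that the adaptive variant binarises the prediction at twice its mean value with β² = 0.3. Both conventions are common in COD evaluation toolkits but are assumptions here, since Sec. 3.5 is not reproduced on this page. The function names and the use of plain Python lists are illustrative only.

```python
def mae(pred, gt):
    """Mean absolute error between a prediction map and a binary GT mask.

    Both inputs are 2D lists of the same shape with values in [0, 1].
    """
    diffs = [abs(p - g) for pr, gr in zip(pred, gt) for p, g in zip(pr, gr)]
    return sum(diffs) / len(diffs)


def adaptive_f_measure(pred, gt, beta2=0.3):
    """F-measure after binarising `pred` at twice its mean value.

    The 2x-mean threshold and beta^2 = 0.3 are assumed conventions,
    not values taken from the paper.
    """
    flat_p = [p for row in pred for p in row]
    flat_g = [g >= 0.5 for row in gt for g in row]
    # Cap the threshold at 1 so a very bright map can still yield positives.
    threshold = min(2 * sum(flat_p) / len(flat_p), 1.0)
    binary = [p >= threshold for p in flat_p]
    tp = sum(1 for b, g in zip(binary, flat_g) if b and g)
    if tp == 0:
        return 0.0
    precision = tp / sum(binary)
    recall = tp / sum(flat_g)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)
```

A dataset-level score would average these per-image values over all test pairs. Because pedestrians occupy a small fraction of each frame, a low M on its own says little about segmentation quality, which is why it is read together with the S, E, and F columns.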
[Figure 5: image grid of 6 examples; rows: RGB, Thermal, GT, BASNet Th [28], BGNet Th [4], OCENet Th [23], AVNet Vis [33], AVNet Th [33], AVNet Vis+Th [33].]
Figure 5. Results of the SoTA COD techniques that achieved first or second place in at least one of the metrics. White areas mark successful matches between GT and predicted masks, red areas mark false positive regions (over-segmentation), and blue areas mark false negative regions (missed segmentation).
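The colour coding used in Figure 5 can be reproduced from a binarised prediction and its GT mask. The following is a minimal sketch under stated assumptions: the function name `error_map` is hypothetical, masks are given as 2D lists of 0/1 values, and true negatives are drawn in black, which the caption does not specify.

```python
WHITE, RED, BLUE, BLACK = (255, 255, 255), (255, 0, 0), (0, 0, 255), (0, 0, 0)


def error_map(pred, gt):
    """Colour-code a binary prediction against a binary GT mask.

    Returns a 2D list of RGB tuples: white = true positive,
    red = false positive, blue = false negative, black = true negative
    (the black background is an assumption).
    """
    colours = {
        (True, True): WHITE,    # predicted and present in GT
        (True, False): RED,     # predicted but absent from GT
        (False, True): BLUE,    # present in GT but not predicted
        (False, False): BLACK,  # background in both
    }
    return [
        [colours[(bool(p), bool(g))] for p, g in zip(pred_row, gt_row)]
        for pred_row, gt_row in zip(pred, gt)
    ]
```

For example, `error_map([[1, 1], [0, 0]], [[1, 0], [1, 0]])` yields one pixel of each colour, which makes over- and under-segmentation visible at a glance.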

BibTeX

If you use the Camo-M3FD dataset, please cite the following paper:

        
@inproceedings{velesaca2026camo-m3fd,
  title={Camo-M3FD: A New Benchmark Dataset for Cross-Spectral Camouflaged Pedestrian Detection},
  author={Velesaca, Henry O and Mero, Andrea and Castillo, Guillermo and Sappa, Angel},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  pages={1--8},
  year={2026}
}