Abstract
It has been shown that fusing information from multi-modality images increases the accuracy of pedestrian recognition systems. One of the best-performing approaches so far is to concatenate the features from all modalities into a single large feature vector, but this requires accurate camera calibration, and non-discriminative modalities can lead to the misclassification of particular images. We present a modality fusion approach for pedestrian recognition which dynamically selects and fuses the most discriminative modalities for a given image and uses them in the classification process. Firstly, we extract kernel descriptor features from the image in three modalities: intensity, depth and flow. Secondly, we dynamically determine the most suitable modalities for that image using a modality pertinence classifier. Thirdly, we concatenate the features from the selected modalities and classify the image with a linear SVM. Numerical experiments performed on the Daimler benchmark dataset, which consists of pedestrian and non-pedestrian bounding boxes captured in outdoor urban environments, indicate that our model outperforms all individual-modality classifiers and is slightly better than the model obtained by concatenating all multi-modality features.
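To make the three-step pipeline concrete, the following is a minimal Python sketch using scikit-learn, not the authors' implementation. It assumes the per-modality kernel descriptor features are already precomputed as NumPy arrays; the names stack, train, predict, the per-subset expert SVMs, and the pertinence training target best_subset_idx are all illustrative assumptions not specified by the paper.

```
# Hypothetical sketch of dynamic modality fusion (assumptions noted above).
from itertools import combinations
import numpy as np
from sklearn.svm import LinearSVC

MODALITIES = ("intensity", "depth", "flow")
# Every non-empty subset of modalities; each subset gets its own SVM
# because the concatenated feature vectors differ in dimension.
SUBSETS = [s for r in range(1, len(MODALITIES) + 1)
           for s in combinations(MODALITIES, r)]

def stack(features, subset):
    """Concatenate the feature matrices of the selected modalities."""
    return np.hstack([features[m] for m in subset])

def train(features, y, best_subset_idx):
    """features: {modality: (n, d_m) array}; y: pedestrian labels (n,);
    best_subset_idx: per-sample index into SUBSETS -- a hypothetical
    training target for the modality pertinence classifier."""
    pertinence = LinearSVC().fit(stack(features, MODALITIES), best_subset_idx)
    experts = {s: LinearSVC().fit(stack(features, s), y) for s in SUBSETS}
    return pertinence, experts

def predict(features, pertinence, experts):
    """Step 2: select the most suitable modalities per sample;
    step 3: classify with the linear SVM trained for that subset."""
    subset_ids = pertinence.predict(stack(features, MODALITIES))
    preds = []
    for j, i in enumerate(subset_ids):
        row = {m: features[m][j:j + 1] for m in MODALITIES}
        preds.append(experts[SUBSETS[i]].predict(stack(row, SUBSETS[i]))[0])
    return np.array(preds)
```

One design point worth noting: because each modality subset yields a feature vector of different dimension, the sketch trains one linear SVM per subset rather than a single classifier, and the pertinence classifier routes each sample to the appropriate expert.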
Citation
Rus, A., Rogozan, A., Dioșan, L., Benshrair, A., "Pedestrian recognition using a dynamic modality fusion approach," Proc. ICCP, 2015, pp. 393-400.
https://doi.org/10.1109/ICCP.2015.7312691