Mask R-CNN for Moving Shadow Detection and Segmentation

Document Type: Original Article

Authors

1 Information Technology Dept., Faculty of Computers and Information, Menoufia University, Egypt

2 Information Technology Dept., Faculty of Computers and Information, Menoufia University, Egypt

Abstract

Identifying and removing shadow regions is a primary task in building many computer vision applications. Most existing moving shadow detection methods rely on hand-crafted features of object and shadow regions, such as chromaticity, physical, or geometric properties. Such features are hard to make robust, because environmental conditions such as camouflage and irregular illumination vary widely and hand-crafted features handle them poorly. The proposed method instead uses Convolutional Neural Networks (CNNs) to learn distinctive features automatically and to model shadows under different environmental conditions. In this paper, the Mask Region-based Convolutional Neural Network (Mask R-CNN) framework is evaluated and tested for automatically performing semantic segmentation to detect and classify shadow pixels across the entire video frame. To adapt Mask R-CNN to detecting and segmenting shadow regions, the most significant features are extracted from video frames in a supervised way using a deep Residual Network (ResNet-101) architecture. Then, the Region Proposal Network (RPN) predicts regions of interest (ROIs) that contain foreground objects, together with their classes. Finally, a Fully Convolutional Network (FCN) generates a binary segmentation mask for each class detected in an ROI. The proposed framework is evaluated on common shadow detection datasets that present different environmental issues. Experimental results show high performance compared with several state-of-the-art methods: an average detection rate of 96.81%, an average discrimination rate of 99.42%, and an overall accuracy of 98.09%.
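The abstract reports detection and discrimination rates without defining them. The sketch below shows one way such rates could be computed from per-pixel label maps. It assumes the definitions commonly used in the moving shadow literature: detection rate is the fraction of ground-truth shadow pixels labelled as shadow, and discrimination rate is the fraction of ground-truth foreground pixels not mislabelled as shadow. The label values and the function name `shadow_metrics` are hypothetical and chosen only for illustration; the paper's own evaluation code may differ.

```python
# Label convention (an assumption for this sketch, not taken from the paper).
BACKGROUND, FOREGROUND, SHADOW = 0, 1, 2


def shadow_metrics(ground_truth, predicted):
    """Return (detection_rate, discrimination_rate) for two label maps.

    Both arguments are same-sized 2D lists of per-pixel labels.
    """
    shadow_hit = shadow_miss = 0      # ground-truth shadow pixels: found / missed
    fg_total = fg_as_shadow = 0       # ground-truth foreground pixels: all / mislabelled

    for gt_row, pred_row in zip(ground_truth, predicted):
        for gt, pred in zip(gt_row, pred_row):
            if gt == SHADOW:
                if pred == SHADOW:
                    shadow_hit += 1
                else:
                    shadow_miss += 1
            elif gt == FOREGROUND:
                fg_total += 1
                if pred == SHADOW:
                    fg_as_shadow += 1

    shadow_total = shadow_hit + shadow_miss
    # Guard against frames that contain no shadow or no foreground pixels.
    detection = shadow_hit / shadow_total if shadow_total else 0.0
    discrimination = (fg_total - fg_as_shadow) / fg_total if fg_total else 0.0
    return detection, discrimination


if __name__ == "__main__":
    # Tiny hand-made frame, purely illustrative.
    gt = [[SHADOW, SHADOW, FOREGROUND, FOREGROUND],
          [SHADOW, SHADOW, FOREGROUND, BACKGROUND]]
    pred = [[SHADOW, SHADOW, FOREGROUND, SHADOW],
            [SHADOW, BACKGROUND, FOREGROUND, BACKGROUND]]
    print(shadow_metrics(gt, pred))
```

Reporting the two rates together matters because a method that labels every moving pixel as shadow would score a perfect detection rate while its discrimination rate collapses.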

Keywords