Improving object recognition using context based top-down signals

Document Type: Research Paper


1 Department of Computer Engineering, Shahid Rajaee Teacher Training University, Tehran, Iran.

2 Department of Computer Engineering, Shahid Rajaee Teacher Training University

3 PhD Student, School of Cognitive Science, Institute for Research in Fundamental Science (IPM)


The human visual system recognizes objects accurately, swiftly, and effortlessly, even under challenging conditions. Many research groups have tried to model this ability; however, the resulting computational models have not reached human performance. Convolutional neural networks (CNNs) are the state-of-the-art computational vision models and try to implement the feedforward path of the human visual system. However, evidence shows that the human visual system also uses top-down expectation signals to increase the accuracy and speed of object recognition under difficult conditions. In this study, we extend a well-known model with top-down expectation signals. AlexNet serves as the feedforward path. We used a network pre-trained on the ImageNet dataset for object recognition and a network pre-trained on the Places dataset for scene recognition. The network pre-trained on Places provides top-down feedback signals based on scene information. These feedback signals carry information about how frequently objects occur in the scene, and they are integrated with the information from the feedforward path. To evaluate the proposed model, several experiments were conducted on different image sets. The results showed that integrating the feedback information with the feedforward information significantly improves object recognition accuracy compared with the base model. This supports the idea that contextual information facilitates object recognition, particularly when objects appear under challenging conditions.
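
The integration step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact integration rule, so the multiplicative reweighting below is an assumption, not the authors' method. The function names, the `weight` and `eps` parameters, and the toy numbers are all hypothetical; in practice the object and scene probabilities would come from the ImageNet and Places networks.

```python
# Hypothetical sketch: combine feedforward object scores with a scene-based prior.
# The multiplicative rule is an assumption; the paper may integrate differently.

def context_prior(scene_probs, object_freq):
    """Expected occurrence frequency of each object under the scene distribution.

    scene_probs: list of P(scene) values from a scene classifier.
    object_freq: object_freq[s][o] is how often object o occurs in scene s.
    """
    n_objects = len(object_freq[0])
    prior = [0.0] * n_objects
    for p_scene, freqs in zip(scene_probs, object_freq):
        for j, f in enumerate(freqs):
            prior[j] += p_scene * f
    return prior


def integrate(object_probs, prior, weight=1.0, eps=1e-6):
    """Reweight feedforward object probabilities by the context prior, then renormalize.

    weight controls how strongly the feedback signal is trusted (assumed parameter);
    eps keeps objects with zero observed frequency from being ruled out entirely.
    """
    scores = [p * (q + eps) ** weight for p, q in zip(object_probs, prior)]
    total = sum(scores)
    return [s / total for s in scores]


# Toy example with two objects ("mug", "helmet") and two scenes ("kitchen", "street").
object_probs = [0.45, 0.55]      # ambiguous feedforward output, slightly favoring "helmet"
scene_probs = [0.9, 0.1]         # scene network is fairly sure this is a kitchen
object_freq = [[0.8, 0.2],       # kitchen: mugs are common, helmets are rare
               [0.1, 0.9]]       # street: the reverse

prior = context_prior(scene_probs, object_freq)
combined = integrate(object_probs, prior)
print(prior, combined)
```

In this toy case the feedforward path alone slightly prefers the second object, while the combined score prefers the first, which is the kind of correction that scene context is meant to supply under ambiguous viewing conditions.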