Document Type: Survey
A semantic video analysis system is a semi- or fully automated system that examines video content and extracts concepts and semantics from it. Such systems can be classified from different viewpoints. This article discusses the semantic gap between low-level features and high-level concepts in semantic video analysis systems and reviews the literature from the viewpoint of the semantic hierarchy used in video production. To this end, after a brief description of video analysis systems and their general block diagram, two main challenges of these systems are discussed: the sensory gap and the semantic gap. Then, different approaches to reducing the semantic gap are studied based on a semantic hierarchy used for video production. According to this hierarchical structure, reducing the semantic gap involves three main steps: frame processing, content analysis, and semantic extraction. The article closes with open problems in this field of research. First, a wide variety of events and concepts typically occur in a video. Second, different concepts may be assigned to the same event in different circumstances. Third, some high-level concepts span a relatively long duration of video; extracting them requires processing that long duration and constructing a semantic network among the concepts extracted from shorter segments. In addition, analysis of multi-modal information may close semantic gaps that remain in the analysis of single-modal information.