In this paper, a video semantic retrieval framework is proposed based on a novel unsupervised motion region detection algorithm which works reasonably well with dynamic background and camera motion. The proposed framework is inspired by biological mechanisms of human vision which make motion salience (defined as attention due to motion) is more attractive than some other low-level visual features to people while watching videos. Under this biological observation, motion vectors in frame sequences are calculated using the optical flow algorithm to estimate the movement of a block from one frame to another. Next, a center-surround coherency evaluation model is proposed to compute the local motion saliency in a completely unsupervised manner. The integral density algorithm is employed to search the globally optimal solution of the minimum coherency region as the motion region which is then integrated into the video semantic retrieval framework to enhance the performance of video semantic analysis and understanding. Our proposed framework is evaluated using video sequences in non-static background, and the promising experimental results reveal that the semantic retrieval performance can be improved by integrating the global texture and local motion information.