In today's technology-driven world, vast amounts of multimedia data are produced and distributed globally. One of the major challenges in identifying meaningful semantic concepts from such large volumes of data is the data imbalance problem. Data imbalance occurs when the number of positive instances (i.e., instances that contain the target concept) is much smaller than the number of negative instances (i.e., instances that do not contain the target concept). In other words, the ratio of positive to negative instances is extremely low. Rebalancing the dataset by sampling or data pruning is a common remedy. In this paper, we propose a sampling method that consists of three stages: selecting features to identify the negative instances, producing negative ranking scores, and performing sampling. The method is compared with several existing methods on the TRECVID dataset and is shown to achieve better performance.
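
To make the three-stage structure concrete, the following is a minimal illustrative sketch in Python. It is not the method evaluated in this paper. The abstract does not specify the selection criterion, the score, or the sampling rule, so every one of these is an assumption made only for illustration: features are ranked by a standardized difference of class means, the negative ranking score is the squared distance to the positive centroid, and sampling keeps the lowest-scoring negatives. All function and parameter names (`select_features`, `negative_ranking_scores`, `undersample`, `k`, `neg_per_pos`) are hypothetical.

```python
from statistics import mean, pstdev


def select_features(X, y, k):
    """Stage 1 (assumed criterion): keep the k features whose class means
    differ most, relative to the overall spread of the feature."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        spread = pstdev(pos + neg) or 1.0  # constant feature: avoid dividing by zero
        scores.append((abs(mean(pos) - mean(neg)) / spread, j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])


def negative_ranking_scores(X, y, features):
    """Stage 2 (assumed score): squared distance of each negative instance
    from the centroid of the positives, over the selected features."""
    pos_rows = [row for row, label in zip(X, y) if label == 1]
    centroid = [mean(row[j] for row in pos_rows) for j in features]
    return {
        i: sum((X[i][j] - c) ** 2 for j, c in zip(features, centroid))
        for i, label in enumerate(y)
        if label == 0
    }


def undersample(X, y, k=1, neg_per_pos=1.0):
    """Stage 3 (assumed rule): keep all positives and only the negatives
    with the lowest ranking scores, up to neg_per_pos negatives per positive."""
    features = select_features(X, y, k)
    scores = negative_ranking_scores(X, y, features)
    n_keep = min(len(scores), max(1, round(neg_per_pos * sum(y))))
    kept_neg = set(sorted(scores, key=scores.get)[:n_keep])
    keep = [i for i, label in enumerate(y) if label == 1 or i in kept_neg]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: 2 positives, 6 negatives; only the first feature is informative.
X_demo = [[1.0, 5.0], [1.2, 5.1], [2.0, 5.0], [4.0, 5.2],
          [5.0, 4.9], [6.0, 5.0], [7.0, 5.1], [8.0, 5.0]]
y_demo = [1, 1, 0, 0, 0, 0, 0, 0]

X_bal, y_bal = undersample(X_demo, y_demo, k=1, neg_per_pos=1.0)
print(len(y_demo), "->", len(y_bal), "instances after rebalancing")
```

On the toy data, the positive-to-negative ratio changes from 2:6 to 2:2. Whether the retained negatives should be those nearest to the positives, as assumed here, or those ranked as most clearly negative is a design choice that this sketch does not settle.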