With the rapid growth in the amount of multimedia data, research on semantic information retrieval faces a challenging problem known as the data imbalance issue: the positive data instances, which contain the target concept, object, or event, are far fewer than the negative data instances, which do not. Data pruning has therefore become a popular topic in multimedia information processing and retrieval. It is a technique that automatically identifies and prunes data instances from the training data set so that the pruned data set enhances the performance of model learning, classification, and concept detection. In this paper, we propose a novel data pruning framework that assigns each transaction a weight based on multiple correspondence analysis (MCA). These transaction weights serve as the measure for pruning the training data set. The testing data set can be weighted and pruned as well, so that the computational cost is reduced not only when building the model but also when applying the classifiers. In experiments with 18 high-level concepts and benchmark data sets (both balanced and imbalanced) from TRECVID, the proposed framework achieves promising results, enhancing the concept detection performance of three well-known classifiers commonly used for concept detection.