Most content-based recommender systems focus on analyzing the textual information of items. For items that have images, the images can be treated as an additional information modality. In this paper, an effective method called MSLIM is proposed to integrate multimodal information for content-based item recommendation. MSLIM formalizes the problem as a regularized least-squares optimization problem, which is solved with coordinate gradient descent. During this process, the aggregation coefficients of the items are learned in an unsupervised manner. Based on these coefficients, the k-nearest neighbor (k-NN) algorithm generates the top-N recommendations for each item by finding its k nearest neighbors. A framework for applying MSLIM to item recommendation is proposed accordingly. Experimental results on a self-collected handbag dataset show that MSLIM outperforms the selected comparison methods. They also illustrate how the model parameters affect the final recommendation results.
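The pipeline summarized above (learn aggregation coefficients by regularized least squares, then rank neighbors) can be illustrated with a minimal sketch. The abstract does not give MSLIM's exact objective, so this sketch assumes a SLIM-style formulation. It assumes a feature matrix `F` (shape `d × n`) whose columns are item feature vectors, for example text features stacked on image features. It also assumes a coefficient matrix `W` learned column by column by minimizing `0.5*||f_j - F w||^2 + 0.5*beta*||w||^2 + lam*||w||_1` subject to `w >= 0` and `w_j = 0`, using cyclic coordinate descent. The names `learn_coefficients`, `top_n`, `beta`, and `lam` are hypothetical. Neighbors are taken to be the items with the largest coefficients in column `j`, which is an assumption; the paper's exact k-NN step may differ.

```python
import numpy as np


def learn_coefficients(F, beta=0.1, lam=0.01, n_iters=50, tol=1e-6):
    """Hypothetical SLIM-style solver (assumed formulation, not the paper's code).

    F: (d, n) array; column i is the feature vector of item i
       (assumed: text and image features concatenated).
    Returns W: (n, n) nonnegative matrix with zero diagonal, where
    column j holds the coefficients that reconstruct item j from the others.
    """
    d, n = F.shape
    W = np.zeros((n, n))
    col_sq = (F ** 2).sum(axis=0)  # squared norm of each item's feature vector
    for j in range(n):
        w = np.zeros(n)
        r = F[:, j].astype(float).copy()  # residual f_j - F w (w starts at 0)
        for _ in range(n_iters):
            max_change = 0.0
            for i in range(n):
                if i == j:
                    continue  # an item may not reconstruct itself (w_j = 0)
                old = w[i]
                # Correlation of feature i with the residual, with w_i's own term added back
                rho = F[:, i] @ r + col_sq[i] * old
                # Soft-threshold (L1), clip at zero (nonnegativity), shrink (L2)
                new = max(0.0, rho - lam) / (col_sq[i] + beta)
                if new != old:
                    r -= F[:, i] * (new - old)  # keep residual consistent with w
                    w[i] = new
                    max_change = max(max_change, abs(new - old))
            if max_change < tol:
                break  # coefficients for item j have converged
        W[:, j] = w
    return W


def top_n(W, j, N):
    """Return the N items with the largest coefficients for item j (assumed neighbor rule)."""
    scores = W[:, j].copy()
    scores[j] = -np.inf  # never recommend the item itself
    return np.argsort(-scores)[:N].tolist()


# Toy example: items 0 and 1 share identical features, item 2 is unrelated.
F = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
W = learn_coefficients(F)
print(top_n(W, 0, 1))
```

The L1 term drives many coefficients to exactly zero, so each item is explained by a few similar items. The L2 term keeps the per-coordinate denominator positive and stabilizes the solution when several items have near-identical features.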