Protein-protein interactions (PPIs) take part in a wide range of dynamic cellular and biological processes. A thorough understanding of PPIs is therefore crucial for elucidating how diseases arise, achieving the desired therapeutic effect on drug targets, and modeling the structures of protein complexes. However, compared with the number of protein sequences available for different organisms, the number of known protein-protein interactions is relatively limited. Many methods have been investigated to facilitate the discovery of novel PPIs. Among them, PPI prediction techniques that depend solely on protein sequences are more prevalent than methods that require a wealth of domain knowledge. In this paper, we propose a multi-modal deep learning framework that combines protein physicochemical features with local structural context features from PPI networks. In other words, our method takes into account not only the protein sequence information but also the neighborhood effect on protein nodes in the PPI networks. We use a stacked auto-encoder architecture together with a continuous bag-of-words (CBOW) based metapath model for PPI prediction. On top of these, we use supervised deep neural networks both to identify PPIs and to classify protein families. The results show that our multi-modal deep learning framework achieves better performance than the main existing methods.