The ever-accumulating wealth of knowledge about protein interactions and the domain architectures of the proteins involved, across different organisms, offers ways to understand the intricate interplay between interactome and proteome. Ultimately, combining these sources of information will allow the prediction of interactions among proteins for which only the domain composition is known. Based on the currently available protein-protein interaction and domain data of Saccharomyces cerevisiae and Drosophila melanogaster, we introduce a novel method, Maximum Specificity Set Cover (MSSC), to predict potential protein-protein interactions. Using protein interactions and the domain architectures of the interacting proteins as training sets, the algorithm employs a set cover approach to partition domain pairs so that the underlying protein interactions are explained with the greatest possible specificity. While the basic version of MSSC considers only domain pairs as the driving force behind interactions, we also modified the algorithm to account for combinations of more than two domains that govern a protein-protein interaction. This approach allows us to predict previously unknown protein-protein interactions in S. cerevisiae and D. melanogaster with a sensitivity and specificity that clearly outperform other approaches. As a proof of concept, we also observe high levels of co-expression and decreasing GO distances between interacting proteins. Although our results are very encouraging, we observe that the quality of the predictions depends significantly on the quality of the interactions used as the training set. The algorithm is part of a Web portal available at http://ppi.cse.nd.edu .
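The abstract does not spell out how specificity is scored or how the cover is built, so the following is only a minimal Python sketch under stated assumptions. It assumes that the specificity of a domain pair is the fraction of protein pairs carrying that domain pair that are observed to interact, and it builds the cover greedily, always taking the most specific domain pair that still explains an unexplained training interaction. The function names `domain_pairs` and `mssc_sketch` and the toy domain labels are hypothetical illustrations, not the authors' implementation.

```python
from itertools import combinations, product


def domain_pairs(doms_a, doms_b):
    # Every unordered domain pair with one domain taken from each protein.
    return {tuple(sorted(pair)) for pair in product(doms_a, doms_b)}


def mssc_sketch(domains, interactions):
    """Greedy, specificity-first set cover over domain pairs (illustrative only).

    domains: dict mapping protein -> set of domain names
    interactions: iterable of frozenset({p, q}) training interactions, p != q
    Returns (chosen domain pairs in selection order, predicted new interactions).
    """
    interactions = set(interactions)

    # For every domain pair, collect all protein pairs (interacting or not) carrying it.
    containing = {}
    for p, q in combinations(sorted(domains), 2):
        for dp in domain_pairs(domains[p], domains[q]):
            containing.setdefault(dp, set()).add(frozenset((p, q)))

    # ASSUMPTION: specificity = share of carrying protein pairs that interact.
    spec = {dp: len(ps & interactions) / len(ps) for dp, ps in containing.items()}

    # Only interactions carried by at least one domain pair can be explained.
    uncovered = interactions & set().union(*containing.values())
    chosen = []
    while uncovered:
        # Most specific pair that explains something new; ties go to the pair
        # explaining more uncovered interactions, then to the lexicographically larger pair.
        best = max(
            (dp for dp, ps in containing.items() if ps & uncovered),
            key=lambda dp: (spec[dp], len(containing[dp] & uncovered), dp),
        )
        chosen.append(best)
        uncovered -= containing[best]

    # Predictions: protein pairs carrying a chosen domain pair, minus the training data.
    predicted = set().union(*(containing[dp] for dp in chosen)) - interactions
    return chosen, predicted


# Hypothetical toy data: proteins A-E with made-up domain annotations.
domains = {
    "A": {"SH3", "PH"},
    "B": {"PRM"},
    "C": {"PH"},
    "D": {"PRM"},
    "E": {"SH3"},
}
training = {frozenset({"A", "B"}), frozenset({"A", "D"}), frozenset({"B", "E"})}
chosen, predicted = mssc_sketch(domains, training)
print(chosen)  # [('PRM', 'SH3')]
print(sorted(tuple(sorted(e)) for e in predicted))  # [('D', 'E')]
```

In this toy case three of the four protein pairs carrying (PRM, SH3) are training interactions (specificity 0.75), so it is preferred over the less specific (PH, PRM) pair (0.5), and the only new prediction is D-E. The sketch covers only the basic domain-pair version; the multi-domain extension mentioned above is omitted.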