SOV-refine: A further refined definition of segment overlap score and its significance for protein structure similarity

Tong Liu, Zheng Wang

Research output: Contribution to journal › Article › peer-review

7 Scopus citations


Background: The segment overlap score (SOV) has been used to evaluate predicted protein secondary structures, which are sequences composed of helix (H), strand (E), and coil (C). It does this by comparing each prediction with the native or reference secondary structure, which is also a sequence of H, E, and C. The advantage of SOV is that it considers the size of continuous overlapping segments and assigns extra allowance to longer continuous overlapping segments. The Q3 score, by contrast, judges only from the percentage of individual positions that overlap. However, we found a drawback in the previous SOV definition: it cannot ensure that the allowance increases as more residues in a segment are predicted accurately.

Results: We designed a new way of assigning allowance that keeps all the advantages of the previous SOV definitions and ensures that the amount of allowance assigned increases as more elements in a segment are predicted accurately. Furthermore, our improved SOV achieved a higher correlation with the quality of protein models as measured by GDT-TS score and TM-score, indicating a better ability to evaluate tertiary structure quality at the secondary structure level. We analyzed the statistical significance of SOV scores and found threshold values for distinguishing two protein structures (SOV_refine > 0.19) and for indicating whether two proteins are under the same CATH fold (SOV_refine > 0.94 and > 0.90 for three- and eight-state secondary structures, respectively). We provided two further example applications: using SOV_refine as a machine learning feature for protein model quality assessment, and comparing different definitions of topologically associating domains. In both applications, our newly defined SOV score resulted in better performance.

Conclusions: The SOV score can be widely used in bioinformatics research and in other fields that need to compare two sequences of letters in which continuous segments have important meanings. We also generalized the previous SOV definitions so that they work for sequences composed of more than three states (e.g., the eight-state definition of protein secondary structures). A standalone software package has been implemented in Perl, with its source code released. The software can be downloaded from
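The abstract contrasts per-residue Q3 with segment-based SOV. As an illustration only, the Python sketch below computes Q3 and an SOV'99-style score, assuming the commonly used earlier allowance delta = min(maxov - minov, minov, len(s1)/2, len(s2)/2). It does not implement the SOV_refine allowance, so the thresholds quoted in the abstract do not apply to its output. The function names (`segments`, `q3`, `sov99_style`) are hypothetical, and the score is scaled to [0, 1] rather than the conventional 0 to 100. Because states are read from the input strings, the same code accepts three-state or eight-state alphabets.

```python
from itertools import groupby


def segments(seq):
    """Split a state string into maximal runs: (state, start, end), end inclusive."""
    segs, pos = [], 0
    for state, run in groupby(seq):
        length = len(list(run))
        segs.append((state, pos, pos + length - 1))
        pos += length
    return segs


def q3(ref, pred):
    """Per-residue accuracy: fraction of positions whose states agree."""
    if len(ref) != len(pred):
        raise ValueError("sequences must have equal length")
    return sum(a == b for a, b in zip(ref, pred)) / len(ref)


def sov99_style(ref, pred):
    """SOV'99-style segment overlap score in [0, 1] (assumed formulation, not SOV_refine)."""
    if len(ref) != len(pred):
        raise ValueError("sequences must have equal length")
    pred_segs = segments(pred)
    total = 0.0  # sum over overlapping pairs of (minov + delta) / maxov * len(s1)
    norm = 0     # N: len(s1) per overlapping pair, plus len(s1) per unmatched s1
    for state, s1_start, s1_end in segments(ref):
        len1 = s1_end - s1_start + 1
        matched = False
        for p_state, s2_start, s2_end in pred_segs:
            if p_state != state or s2_end < s1_start or s2_start > s1_end:
                continue  # different state, or no shared residue
            matched = True
            len2 = s2_end - s2_start + 1
            minov = min(s1_end, s2_end) - max(s1_start, s2_start) + 1  # shared residues
            maxov = max(s1_end, s2_end) - min(s1_start, s2_start) + 1  # combined extent
            # The first term, maxov - minov, shrinks as the overlap grows, so this
            # allowance is not guaranteed to increase with better predictions.
            delta = min(maxov - minov, minov, len1 // 2, len2 // 2)
            total += (minov + delta) / maxov * len1
            norm += len1
        if not matched:
            norm += len1
    return total / norm if norm else 0.0


# A one-residue boundary shift is forgiven by SOV but not by Q3.
print(sov99_style("CCHHHHHCC", "CCCHHHHCC"))  # 1.0
print(round(q3("CCHHHHHCC", "CCCHHHHCC"), 3))  # 0.889
# A helix broken into two fragments is penalized more by SOV than by Q3.
print(sov99_style("HHHHHHHH", "HHHCHHHH"))  # 0.625
print(q3("HHHHHHHH", "HHHCHHHH"))  # 0.875
```

The two examples show the behavior the abstract attributes to SOV: segment continuity matters, so a shifted boundary costs little while a fragmented segment costs more than its per-residue error rate suggests.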

Original language: English (US)
Article number: 1
Journal: Source Code for Biology and Medicine
Issue number: 1
State: Published - Apr 20 2018

Keywords

  • Assessment of protein secondary structure predictions
  • Comparing different definitions of topologically associating domains
  • Protein secondary structure prediction
  • Protein structure similarity
  • SOV score
  • Segment overlap score
  • Similarity of segmented biological sequences

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Health Informatics
  • Information Systems and Management

