Design of multispecific protein sequences using probabilistic graphical modeling.

Publication Type: Journal Article


Proteins, Volume 78, Issue 3, p.530-47 (2010)


2010; Algorithms; Amino Acid Sequence; Center-Authored Paper; Computational Biology; Evolution, Molecular; Models, Biological; Models, Chemical; Models, Molecular; Models, Statistical; Molecular Sequence Data; Peroxisome Proliferator-Activated Receptors; PROTEINS; Public Health Sciences Division; Structure-Activity Relationship; Temperature; Thioredoxins; Transducin


In nature, proteins partake in numerous protein-protein interactions that mediate their functions. Moreover, proteins have been shown to be physically stable in multiple structures, induced by cellular conditions, small ligands, or covalent modifications. Understanding how protein sequences achieve this structural promiscuity at the atomic level is a fundamental step in the drug design pipeline and a critical question in protein physics. One way to investigate this subject is to computationally predict protein sequences that are compatible with multiple states, i.e., multiple target structures or binding to distinct partners. The goal of engineering such proteins has been termed multispecific protein design. We develop a novel computational framework to efficiently and accurately perform multispecific protein design. This framework utilizes recent advances in probabilistic graphical modeling to predict sequences with low energies in multiple target states. Furthermore, it is also geared to specifically yield positional amino acid probability profiles compatible with these target states. Such profiles can be used as input to randomly bias high-throughput experimental sequence screening techniques, such as phage display, thus providing an alternative avenue for elucidating the multispecificity of natural proteins and the synthesis of novel proteins with specific functionalities. We prove the utility of such multispecific design techniques in better recovering amino acid sequence diversities similar to those resulting from millions of years of evolution. We then compare the approaches of prediction of low energy ensembles and of amino acid profiles and demonstrate their complementarity in providing more robust predictions for protein design.
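
The two outputs named in the abstract, low-energy sequences across several target states and positional amino acid probability profiles, can be illustrated with a toy sketch. This is not the paper's method. It assumes that the multistate energy is the plain sum of per-state energies, that each state has only singleton and pairwise terms, and that the energy tables are random. It also replaces graphical-model inference with exhaustive enumeration, which is feasible only for very short sequences. All names here (`ALPHABET`, `N_POS`, `random_state`, `profiles`) are hypothetical.

```python
import itertools
import math
import random

ALPHABET = "AVLK"  # hypothetical reduced amino acid alphabet
N_POS = 3          # hypothetical number of designable positions


def random_state(rng):
    """Build toy singleton and pairwise energy tables for one target state."""
    single = [{a: rng.uniform(-1, 1) for a in ALPHABET} for _ in range(N_POS)]
    pair = {
        (i, j): {(a, b): rng.uniform(-1, 1) for a in ALPHABET for b in ALPHABET}
        for i in range(N_POS)
        for j in range(i + 1, N_POS)
    }
    return single, pair


def energy(seq, state):
    """Energy of one sequence in one state: singleton plus pairwise terms."""
    single, pair = state
    e = sum(single[i][seq[i]] for i in range(N_POS))
    e += sum(table[(seq[i], seq[j])] for (i, j), table in pair.items())
    return e


def multistate_energy(seq, states):
    """Assumed combination rule: sum the energies over all target states."""
    return sum(energy(seq, s) for s in states)


def profiles(states, temperature=1.0):
    """Enumerate all sequences; return positional marginals and the best sequence."""
    weights = {
        seq: math.exp(-multistate_energy(seq, states) / temperature)
        for seq in itertools.product(ALPHABET, repeat=N_POS)
    }
    z = sum(weights.values())  # partition function
    marginals = [{a: 0.0 for a in ALPHABET} for _ in range(N_POS)]
    for seq, w in weights.items():
        for i, a in enumerate(seq):
            marginals[i][a] += w / z
    best = max(weights, key=weights.get)  # highest weight = lowest energy
    return marginals, best


rng = random.Random(0)
states = [random_state(rng), random_state(rng)]
marginals, best_seq = profiles(states)
print("lowest-energy sequence:", "".join(best_seq))
for i, m in enumerate(marginals):
    print(i, {a: round(p, 3) for a, p in m.items()})
```

Each row of `marginals` is a probability distribution over the alphabet at one position, the kind of profile that could in principle be used to bias a library. Raising `temperature` flattens the profiles, and lowering it concentrates them on the lowest-energy sequence.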