In this issue of Blood, Howe and Stack1 present findings linking the predicted structural changes introduced by amino acid substitutions to the immunogenicity of clinically relevant blood group antigens. The authors identify an inverse correlation between predicted protein disorder and immunogenicity.
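To make the idea of an inverse correlation concrete, here is a minimal Python sketch of a Spearman rank correlation between a per-antigen disorder score and an immunogenicity score. Spearman's statistic, the helper names (`average_ranks`, `spearman_rho`), and all scores are hypothetical illustrations. They are not the analysis or data of Howe and Stack.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last member of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / (sxx * syy) ** 0.5


# Hypothetical scores for five antigens (illustration only, not study data).
disorder = [0.10, 0.25, 0.40, 0.70, 0.90]        # higher = more disordered
immunogenicity = [0.95, 0.60, 0.50, 0.20, 0.05]  # higher = more immunogenic
print(spearman_rho(disorder, immunogenicity))    # prints -1.0
```

A value near −1 means that antigens ranked as more disordered tend to be ranked as less immunogenic. A rank-based statistic is used here because it does not assume a linear relationship between the two scores.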
The most immunogenic substitutions are predicted to reside in more rigid and buried regions of the proteins. This result may not come as much of a surprise, because more constrained geometries may lend themselves to higher-affinity interactions with B- and T-cell receptors, possibly giving rise to a differential immune response more generally. More surprisingly, peptide stretches centered on the most immunogenic substitutions are predicted to have lower relative solvent accessibility than those of less immunogenic antigens, which are predicted to be presented in surface loops or in disordered regions. On the one hand, the immunological responses to transfusion with the antithetical antigens have been well characterized. On the other hand, it remains an open question how reliable structural predictions for single amino acid substitutions are in regions where protein structure is ill-defined or even intrinsically disordered.
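Relative solvent accessibility (RSA) is a residue's solvent-accessible surface area divided by the maximum possible value for that residue type. The sketch below assumes that per-residue RSA values have already been computed for a predicted model. It averages RSA over a window centered on a substitution and flags the stretch as buried if the average falls below a cutoff. The helper names, the window half-width, the 0.25 cutoff, and the RSA values are all hypothetical choices for illustration.

```python
from statistics import mean
from typing import Sequence


def window_mean(values: Sequence[float], center: int, half_width: int) -> float:
    """Mean of values within `half_width` residues of `center` (0-based index).

    The window is clipped at the sequence ends rather than padded.
    """
    if not 0 <= center < len(values):
        raise IndexError("center lies outside the sequence")
    start = max(0, center - half_width)
    stop = min(len(values), center + half_width + 1)
    return mean(values[start:stop])


def is_buried(mean_rsa: float, cutoff: float = 0.25) -> bool:
    """Call a stretch buried if its mean RSA is below `cutoff` (assumed value)."""
    return mean_rsa < cutoff


# Hypothetical per-residue RSA values (SASA / maximum SASA for each residue
# type), eg, as computed by a structure-analysis tool on a predicted model.
rsa = [0.80, 0.65, 0.30, 0.10, 0.05, 0.12, 0.20, 0.70, 0.90]

# Substitution at 0-based index 4; the window spans indices 2..6.
stretch = window_mean(rsa, center=4, half_width=2)
print(round(stretch, 3), is_buried(stretch))  # prints 0.154 True
```

Averaging over a window, rather than reading the value at the substituted residue alone, follows the text's focus on peptide stretches centered on the substitution.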
The prediction of protein structures for which experimental data exist in the form of homologs has been heavily investigated. Consequently, a vast array of approaches is available to help researchers explore the chemical space of protein structure prediction, ranging from free-energy minimization to molecular dynamics, clustering of in silico predictions, and neural networks trained to predict polypeptide folds ab initio.2 Most will agree that polypeptide conformational predictions, however they are derived, can be considered highly reliable for protein domains for which experimental data exist. But what about regions that are omitted from experimental structures because they lack definition? In experimental structure determination, such regions typically show up in one of three ways: as an absence of defined electron density, as spectra that cannot be interpreted, or, simply and commonly, as several structural arrangements that satisfy the measured data equally well, with no single one preferred. With the advent of deep learning–based algorithms, a wealth of predictive data is becoming available to researchers. These data are expected to have a dramatic impact, reaching as far as whole-proteome modeling and evolutionary comparison of protein families.3
The study by Howe and Stack represents a special case within the wider topic of polypeptide folding and the accuracy of overall structure prediction. They address the consequences of single point mutations, which can be correlated with known immunogenicity following antigen-mismatched transfusion. The ability of algorithms such as AlphaFold2 (AF2)4 to accurately predict the structural consequences of single point mutations has been a matter of much debate, given evidence of bias toward existing structures (ie, structures closely represented in the seed or training set of a given algorithm).5,6 Comparative analyses of AF2 output against experimentally available data have revealed that AF2 tends to predict a single fold with high consistency where a rigid domain structure is likely to be maintained within proteins of conserved function. Prediction becomes much harder for intrinsically disordered regions, which may exert their physiological role as clusters or ensembles of varying conformations.
The predicted local distance difference test (pLDDT), the per-residue confidence score reported by AF2, has emerged as a useful predictor of such disorder.7 Polypeptide regions that exert their physiological role as clusters of conformations (eg, intrinsically disordered protein domains) should be treated and analyzed as such, that is, as structural ensembles. Predictive approaches reflecting this have been presented. They either combine multiple iterations of the predictive algorithm (AF2), each seeded with a different multiple sequence alignment (MSA) input, or subcluster the MSAs (eg, across different species). These approaches thus provide a pipeline for generating ensembles that could be used to study how destabilizing mutations affect local domain structure, for example fold-switching properties.8,9 Further combining such ensemble approaches with molecular dynamics and (folding) free-energy minimization calculations may provide valuable insight into the biochemical relevance of artificial intelligence–derived predictions. This is especially true when a physiological metric (such as immunogenicity) can be used to correlate amino acid substitutions with bona fide structural changes.
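The Python below sketches one way an ensemble from such a pipeline might be summarized per residue. It computes the root-mean-square fluctuation (RMSF) of each residue across ensemble members. It also lists residues whose pLDDT falls below 50, a threshold the AlphaFold Protein Structure Database associates with regions that may be unstructured in isolation. The sketch assumes the models are already superposed and that each residue is represented by one coordinate (eg, Cα). The helper names, coordinates, and scores are hypothetical, and this is not the method of the cited studies.

```python
from math import sqrt
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def per_residue_rmsf(models: Sequence[Sequence[Coord]]) -> list[float]:
    """RMSF of each residue across ensemble members.

    Assumes one coordinate per residue (eg, C-alpha), identical residue counts,
    and that all models were already superposed; no fitting is done here.
    """
    if not models:
        raise ValueError("empty ensemble")
    n_models, n_res = len(models), len(models[0])
    if any(len(m) != n_res for m in models):
        raise ValueError("models differ in residue count")
    rmsf = []
    for i in range(n_res):
        points = [m[i] for m in models]
        # Mean position of residue i over the ensemble.
        cx = sum(p[0] for p in points) / n_models
        cy = sum(p[1] for p in points) / n_models
        cz = sum(p[2] for p in points) / n_models
        # Mean squared deviation from that position, then its square root.
        msd = sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 + (p[2] - cz) ** 2
                  for p in points) / n_models
        rmsf.append(sqrt(msd))
    return rmsf


def low_confidence(plddt: Sequence[float], cutoff: float = 50.0) -> list[int]:
    """0-based indices of residues whose pLDDT is below `cutoff`."""
    return [i for i, score in enumerate(plddt) if score < cutoff]


# Hypothetical three-member ensemble of a three-residue stretch.
ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (10.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (12.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (14.0, 0.0, 0.0)],
]
plddt = [92.1, 78.4, 41.7]  # hypothetical per-residue scores

print(per_residue_rmsf(ensemble))
print(low_confidence(plddt))  # prints [2]
```

In this toy case, the residue that varies across the ensemble is also the one with low pLDDT. Whether the two measures agree on real ensembles is an empirical question, not something this sketch establishes.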
What still lies beyond the reach of predictive algorithms is the impact that posttranslational modifications may have on local domain structure. For example, the Kell substitution investigated by Howe and Stack, in which methionine replaces threonine at position 193, also abrogates glycosylation of asparagine 191.10 It is reasonable to assume that a large, flexible glycan near the amino acid position in question will further affect local polypeptide folding. Even if folding trajectories were to converge on similar structures, the glycan could still alter the conformational space available to nearby flexible regions. Many antigenic proteins, including those studied by Howe and Stack, are not only heavily glycosylated but also exist as physiological multimers. These may form with copies of the same protein or, more promiscuously, with other proteins (eg, glycophorin B in ankyrin complexes). Like posttranslational modifications, protein-protein interfaces will constrain the conformations that certain regions can adopt at the interface and possibly beyond. The resulting conformation may therefore differ from the one predicted for a monomeric protein lacking any large-scale posttranslational modifications or multimer interfaces.
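The link between the Thr193Met substitution and loss of glycosylation at Asn191 follows from the N-linked glycosylation sequon, Asn-X-Ser/Thr, where X is any residue except proline. Thr193 occupies the third position of the sequon that begins at Asn191. The Python sketch below scans a sequence for sequons before and after a substitution. The peptide is a hypothetical stand-in numbered to start at residue 186, not the actual Kell sequence, and the helper names are illustrative. A sequon is necessary but not sufficient for glycosylation, so this approach only flags candidate sites.

```python
import re

# N-linked glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (eg, "NNST") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(seq: str, first_residue: int = 1) -> list[int]:
    """Residue numbers of Asn residues that begin an N-X-S/T sequon.

    `first_residue` is the residue number of seq[0].
    """
    return [m.start() + first_residue for m in SEQUON.finditer(seq)]


def substitute(seq: str, position: int, new_residue: str,
               first_residue: int = 1) -> str:
    """Return `seq` with the residue numbered `position` replaced."""
    i = position - first_residue
    if not 0 <= i < len(seq):
        raise IndexError("position outside sequence")
    return seq[:i] + new_residue + seq[i + 1:]


# Hypothetical peptide numbered 186-195 (NOT the real Kell sequence),
# with Asn191-Gln192-Thr193 forming a sequon.
peptide = "GAVLKNQTWE"
variant = substitute(peptide, 193, "M", first_residue=186)  # Thr193Met

print(sequon_positions(peptide, first_residue=186))  # prints [191]
print(sequon_positions(variant, first_residue=186))  # prints []
```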
Future developments that include constraints reflecting such modifications and interfaces could improve our ability to critically assess the physiological relevance of predicted structures. For now, these remain known unknowns.
Conflict-of-interest disclosure: P.W. declares no competing financial interests.