One hypothesis for the pathogenesis of T-cell large granular lymphocytic leukemia (LGL) posits that it originates from an initially polyclonal stimulation by unknown antigens, e.g., viral, autoimmune, or tumor antigens. LGL then evolves via oligoclonal and ultimately monoclonal cytotoxic T-cell expansion. This process may be accompanied by acquisition of STAT3 (or, rarely, STAT5) mutations in 40% of cases. Identification of driver antigens is a key research question in LGL, as it could clarify both the triggers of the disease and the mechanisms of cytopenias. We investigated a cohort of patients with LGL (n=195) by combining HLA allotypes and NGS-based TCR analysis to identify marker TCR Vβ clonotypes, with the goal of reverse engineering or deducing peptides and therefore antigenic targets. For this purpose, we used in silico and machine-learning modelling to capture structural features predictive of CDR3 Vβ-epitope interactions in the context of specific HLA allotypes.
To focus on specific HLA/TCR interactions, we analyzed HLA for the presence of common haplotypes. First, we generated HLA haplotype calls using whole-exome sequencing data. HLA-A*02 was the most common class I allele but was not overrepresented in LGL vs controls. We also identified HLA-B*44 as a risk allele, present in 36% of LGL cases with STAT3 mutation; this enrichment was more pronounced when HLA-A*02 and HLA-B*44 were considered together (Fig. 1A).
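The co-occurrence comparison described above can be illustrated with a minimal sketch. It assumes each individual is represented as a set of allele groups and defines the delta frequency as the carrier fraction in cases minus the carrier fraction in controls; the function names and the toy genotypes are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_freq(genotypes):
    """Fraction of individuals carrying each unordered pair of allele groups."""
    n = len(genotypes)
    if n == 0:
        return {}
    counts = Counter()
    for alleles in genotypes:
        # Sort so that each pair has a single canonical key.
        for pair in combinations(sorted(set(alleles)), 2):
            counts[pair] += 1
    return {pair: c / n for pair, c in counts.items()}


def delta_frequency(cases, controls):
    """Pairwise co-occurrence frequency in cases minus that in controls."""
    f_cases = cooccurrence_freq(cases)
    f_controls = cooccurrence_freq(controls)
    pairs = set(f_cases) | set(f_controls)
    return {p: f_cases.get(p, 0.0) - f_controls.get(p, 0.0) for p in pairs}


# Invented toy genotypes, for illustration only.
cases = [
    {"A*02", "B*44"},
    {"A*02", "B*44", "C*05"},
    {"A*01", "B*08"},
    {"A*02", "B*07"},
]
controls = [
    {"A*02", "B*44"},
    {"A*01", "B*08"},
    {"A*02", "B*07"},
    {"A*03", "B*07"},
]

delta = delta_frequency(cases, controls)
print(delta[("A*02", "B*44")])
```

A positive value marks a pair seen more often in cases than in controls. A real analysis would add a significance test and multiple-testing correction, which this sketch omits.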
After TCR NGS, we selected a threshold of 5.0% (based on a study of 900 controls) to define expanded (co)dominant clonotypes: 145 expanded clones were identified, of which 143 were unique/private. To identify a possible consensus sequence across the CDR3 profile, we clustered the CDR3 amino acid (AA) sequences based on global alignment scores (Needleman-Wunsch) using the BLOSUM62 substitution matrix and identified 9 clusters. However, multiple-sequence alignment within each CDR3 cluster did not show any distinct consensus sequence patterns, either overall or within cases sharing HLA-A*02. Cross-referencing with CDR3 databases of known conditions yielded 87 CDR3β (co)dominant LGL clonotypes. About 50% of those LGL clonotypes were shared with 2 common diseases: type I diabetes and celiac disease. We also mined the VDJdb[2] for known CDR3-pHLA associations. After filtering for high-quality annotations, matches, and HLA-A*02 restriction, LGL-derived TCRs were found to recognize multiple epitopes. Prominent examples include CMV- and EBV-derived peptide antigens from pp65, BRLF1, and BMLF1 (NLVPMVATV, RPPIFIRRL), present in 44% of LGL expanded clonotypes (Fig. 1B).
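The global alignment scoring that underlies the clustering step can be sketched as follows. This is a simplified illustration and not the pipeline used in the study: it assumes a linear gap penalty and replaces BLOSUM62 with a toy match/mismatch score so that the example needs no external library. The `score` callable is where a BLOSUM62 lookup would be supplied, and the CDR3 strings are invented.

```python
def needleman_wunsch(a, b, score=lambda x, y: 1 if x == y else -1, gap=-2):
    """Return the optimal global alignment score of sequences a and b.

    `score` is a toy match/mismatch function (assumption); a BLOSUM62
    lookup could be passed in its place. `gap` is a linear gap penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per residue.
    for i in range(1, rows):
        dp[i][0] = i * gap
    for j in range(1, cols):
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # (mis)match
                dp[i - 1][j] + gap,  # gap in b
                dp[i][j - 1] + gap,  # gap in a
            )
    return dp[-1][-1]


def pairwise_scores(seqs):
    """Symmetric matrix of global alignment scores between all sequences."""
    return [[needleman_wunsch(s, t) for t in seqs] for s in seqs]


# Invented CDR3-like strings, for illustration only.
cdr3 = ["CASSL", "CASSF", "CAS"]
matrix = pairwise_scores(cdr3)
for row in matrix:
    print(row)
```

The resulting score matrix could then be converted to distances and passed to a clustering routine such as hierarchical clustering; the choice of clustering method and cut-off is not specified in the text and is left open here.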
In order to reverse engineer the identity of antigenic peptides based on the CDR3β sequence and HLA type (high-affinity peptides restricted to HLA-A2), we developed a machine-learning method as a predictive modelling tool that quantifies CDR3β/peptide binding. A previously developed LLM trained on AA sequences was used for that purpose[3]: the LLM embeddings serve as input to a deep CNN-based model, which is trained on IEDB positive/negative datasets[4]. The training data were augmented with control CDR3 sequences as negative data points. The resulting model showed an AUC of 0.94 in cross-validation. As a proof of concept of the proposed model in epitope discovery, CMV pp65-derived peptides were fitted to specific LGL CDR3 sequences, and CMV-specific CDR3 clonotypes were found to match, e.g., the pp65 epitope NLVPMVATV, among several others overlapping with the above-described findings.
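For readers less familiar with the metric, the AUC can be read as the probability that a randomly chosen binding CDR3β/peptide pair receives a higher predicted score than a randomly chosen non-binding pair. The sketch below computes it directly from that definition. The labels and scores are invented, and the function is a generic illustration, not the evaluation code of the model described above.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 1 (binder) or 0 (non-binder).
    scores: predicted binding scores, higher meaning more likely to bind.
    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Invented held-out predictions, for illustration only.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
print(auc)
```

In a cross-validation setting, this quantity would be computed on each held-out fold and then summarized across folds; how the folds were constructed in the study is not stated in the text.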
In conclusion, our analysis shows that a significant number of expanded clonotypes may be driven by known epitopes, e.g., from autoimmune disease and viral infection, pointing towards molecular mimicry. However, the specificity of a large fraction of the LGL spectrum remains unknown. For the discovery of the corresponding peptides, we generated an ML-based antigen simulation method that identifies the peptides best fitting the corresponding HLA allele and CDR3 Vβ clonotype.
Fig. Global map of co-occurrence of HLA allotypes showing the delta frequency compared to healthy controls, and Sankey plot of identified CDR3β-epitope associations. (a) Using the whole-exome sequencing-based HLA genotype calls, we calculated the frequency of co-occurrence of different HLA loci and investigated the differences relative to controls. (b) Using the VDJdb, we conducted a sequence similarity-based search and extracted the CDR3-pHLA annotations.
Disclosures
Maciejewski: Omeros: Consultancy; Alexion: Membership on an entity's Board of Directors or advisory committees; Regeneron: Consultancy, Honoraria; Novartis: Honoraria, Speakers Bureau.