Abstract
A systematic approach to identifying novel therapeutic strategies is essential for expanding curative options in hematopoietic malignancies such as acute myeloid leukemia (AML). We have recently established a high-resolution genetic platform that enables de novo therapeutic development through the CRISPR-Tiling Instructed Computer-Aided (CRISPR-TICA) workflow (Nat Struct Mol Biol 2024, PMID: 38316881). Using this approach, we further identified a new therapeutic pocket and its lead compound for AML treatment (Sci Adv 2024, PMID: 38394203). Here, we present an integrative pipeline, CRISPR-TICA.ai, which combines CRISPR-based functional genomics with generative artificial intelligence (GenAI)-driven ligand modeling to perform high-throughput evaluation of the top 100 leukemia-essential genes and their therapeutic potential in AML.
We first used the CRISPR screen dataset published by Novartis (Cancer Discov 2016, PMID: 27260157), which tile-scanned 68 genes in three cancer cell lines, as the initial training dataset to establish the CRISPR-TICA.ai pipeline. Ligandable protein surface pockets were identified with the P2Rank algorithm (J Cheminform 2018, PMID: 30109435). For each pocket, we used the structure-guided diffusion model DiffSBDD (Nat Comput Sci 2024, PMID: 39653846) to generate previously unseen chemical structures that differ substantially from existing drugs, enabling exploration of novel chemical space. The model was further fine-tuned with direct preference optimization (DPO; NeurIPS 2023) on a curated dataset of known inhibitors to enrich for favorable binding profiles (low binding free energy, ΔG), and was calibrated against drug-likeness (QED), synthetic accessibility (SA), lipophilicity (logP), and Lipinski's rule-of-five compliance. Finally, the pipeline overlays ligand–protein contact maps on CRISPR-hypersensitive positions to assess spatial interactions, prioritizing high-affinity ligands that mask biologically essential residues.
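For readers unfamiliar with DPO, the per-pair objective from the NeurIPS 2023 paper can be sketched as below. This is a minimal illustration, not the CRISPR-TICA.ai code: it assumes each preference pair is a preferred ligand `w` (e.g., the one with lower ΔG) and a dispreferred ligand `l`, and that log-likelihoods for both are available from the model being tuned and from a frozen reference copy. For a diffusion model such as DiffSBDD these likelihoods are not directly available and would have to be approximated; that detail is an assumption here, and `beta` is a hypothetical default.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single preference pair (Rafailov et al., 2023).

    logp_w, logp_l: log-likelihood of the preferred (w) and dispreferred (l)
        ligand under the model being fine-tuned.
    ref_logp_w, ref_logp_l: the same quantities under the frozen reference model.
    beta: strength of the implicit KL penalty toward the reference (hypothetical value).
    """
    # How much more the tuned model favors w over l, relative to the reference.
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -log_sigmoid(beta * margin)
```

When the tuned model and the reference agree, the margin is zero and the loss equals log 2; shifting probability toward the preferred ligand lowers the loss, which is what pushes generation toward the favorable binding profiles described above.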
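The ligand-prioritization step above can be illustrated with a small, self-contained sketch. Everything here is hypothetical: the `Ligand` record, the 4 Å contact cutoff, the "at most one rule-of-five violation" filter, and the ranking rule (fraction of CRISPR-hypersensitive residues contacted, then ΔG as a tie-breaker) are illustrative choices, not the published scoring. QED and SA are omitted because computing them requires a cheminformatics toolkit such as RDKit.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Ligand:
    name: str
    delta_g: float      # predicted binding free energy (kcal/mol); lower = tighter
    mol_weight: float   # Da
    logp: float
    h_donors: int
    h_acceptors: int
    atom_coords: list[tuple[float, float, float]]  # docked pose, Å


def lipinski_violations(lig: Ligand) -> int:
    """Count violations of Lipinski's rule of five."""
    return sum([
        lig.mol_weight > 500,
        lig.logp > 5,
        lig.h_donors > 5,
        lig.h_acceptors > 10,
    ])


def contacted_residues(lig: Ligand, residue_coords: dict, cutoff: float = 4.0) -> set:
    """Residues with any atom within `cutoff` Å of any ligand atom (a simple contact map)."""
    return {
        res_id
        for res_id, atoms in residue_coords.items()
        if any(math.dist(a, b) <= cutoff for a in atoms for b in lig.atom_coords)
    }


def masking_score(lig: Ligand, residue_coords: dict, hypersensitive: set,
                  cutoff: float = 4.0) -> float:
    """Fraction of CRISPR-hypersensitive residues that the ligand contacts."""
    if not hypersensitive:
        return 0.0
    contacts = contacted_residues(lig, residue_coords, cutoff)
    return len(contacts & hypersensitive) / len(hypersensitive)


def prioritize(ligands: list[Ligand], residue_coords: dict, hypersensitive: set,
               max_violations: int = 1) -> list[Ligand]:
    """Drop rule-of-five failures, then rank by masking (desc), then ΔG (asc)."""
    passing = [l for l in ligands if lipinski_violations(l) <= max_violations]
    return sorted(
        passing,
        key=lambda l: (-masking_score(l, residue_coords, hypersensitive), l.delta_g),
    )
```

Ranking masking first and affinity second encodes the stated goal of covering essential residues with high affinity; a weighted sum of the two terms would be an equally reasonable design choice.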
We evaluated the utility of CRISPR-TICA.ai in AML drug discovery by selecting the top 100 leukemia-essential genes from the DepMap database (https://depmap.org/portal/) and generating custom CRISPR libraries for high-resolution gene tiling screens. These libraries comprised 20,233 sgRNAs tile-scanning all 100 genes (47,901 amino acid residues) in MOLM-13 AML cells at an average resolution of 2.4 amino acids per sgRNA. This dataset provided in-depth structural and functional input for CRISPR-TICA.ai to examine AI-enabled ligand design for over 300 high-confidence druggable pockets in AML. We further developed a user-friendly, web-based 3D visualization platform that interactively maps CRISPR tiling scores and functional hotspots alongside predicted binding pockets and prioritized ligands, improving accessibility and reproducibility. To annotate therapeutically relevant sites, we incorporated data from multiple genomic and proteomic resources, including AlphaFold protein models, Pfam domain annotations, post-translational modification (PTM) sites, and COSMIC leukemia mutational data. Exportable files and GitHub integration are also provided so that advanced users can deploy CRISPR-TICA.ai within their own facilities.
In summary, our study introduces a scalable, leukemia-focused framework for de novo therapeutic discovery. By integrating high-density CRISPR mutagenesis with diffusion-based GenAI modeling, CRISPR-TICA.ai provides a rational roadmap for designing compounds that target residues essential for leukemia maintenance. Importantly, the CRISPR-TICA.ai pipeline is adaptable to CRISPR tiling datasets from solid tumors (e.g., the Novartis dataset), highlighting its versatility across diverse cancer types. This approach is especially promising for addressing historically undruggable and resistance-prone targets in AML and beyond.
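To make the tiling-screen terminology concrete, the sketch below shows one common way to turn sgRNA-level scores into per-residue scores and CRISPR-hypersensitive positions, and checks the stated resolution (47,901 residues / 20,233 sgRNAs ≈ 2.37, i.e. about 2.4 amino acids per sgRNA). It is not the CRISPR-TICA.ai implementation. The assumptions are: each sgRNA is assigned to the residue at its cut site, more negative scores mean stronger depletion, scores are smoothed with a sliding-window mean, and the `window` and `threshold` defaults are hypothetical.

```python
from statistics import mean


def residue_scores(guides, protein_length, window=5):
    """Map sgRNA scores to residues, then smooth with a sliding-window mean.

    guides: iterable of (residue_position, score) pairs with 1-based positions,
        where residue_position is the amino acid at the sgRNA cut site.
    window: odd window width in residues (hypothetical default).
    Returns {position: smoothed score} for residues with data inside the window.
    """
    by_pos = {}
    for pos, score in guides:
        by_pos.setdefault(pos, []).append(score)
    half = window // 2
    smoothed = {}
    for pos in range(1, protein_length + 1):
        vals = [s for p in range(pos - half, pos + half + 1) for s in by_pos.get(p, [])]
        if vals:
            smoothed[pos] = mean(vals)
    return smoothed


def hypersensitive_positions(smoothed, threshold=-1.0):
    """Residues whose smoothed depletion score is at or below `threshold`."""
    return {pos for pos, s in smoothed.items() if s <= threshold}


def tiling_resolution(n_residues, n_guides):
    """Average number of amino acids covered per sgRNA."""
    return n_residues / n_guides


print(round(tiling_resolution(47_901, 20_233), 2))  # 2.37
```

Smoothing across neighboring residues reduces noise from individual sgRNAs of variable efficiency, at the cost of blurring the boundaries of very short functional segments.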