Identification of LV CISs in human HSPCs from hematochimeric mice and the ALD clinical trial, and comparative analysis of integration distribution within the CISs and in the surrounding chromosomal regions in datasets with documented insertional mutagenesis events. (A) Experimental strategy for LV integration site profiling in human CD34+ HSPCs derived from BM, MPB, and CB. After ex vivo transduction with LVs expressing a therapeutic (arylsulfatase A; LV.ARSA) or marker (LV.GFP) gene, cells were transplanted into immunodeficient mice (Rag2−/−Il2rg−/−), and a portion was cultured in vitro for 14 days. BM, thymus (Thy), and spleen (Spl) from mice that received a transplant were harvested 12 weeks after transplantation. Vector copy number, engraftment, and integration site analysis were then performed on the available samples. (B) Frequency distributions of LV integrations at 5 chromosomal regions targeted at high frequency. The bin size used for the chromosomal distributions is 1 Mb. The y-axis is the percentage of the total integrations of each dataset; the x-axis is chromosomal coordinates in megabase ×10. Genes at CIS locations are indicated for the ALD dataset and are shown in red when common to the ALD dataset and our datasets from panel A. (C) Frequency distributions of γRV integrations surrounding validated genotoxic CISs found in X-SCID and CGD clinical trials. (D) Frequency distributions of γRV or SB transposon integrations surrounding validated genotoxic CISs found in tumors generated in different insertional mutagenesis studies. (E) Frequency distribution of SB transposon integrations at chromosome 1 near the transposon concatemer locus in transgenic mice. (F) Frequency distribution of genotoxic LV integrations targeting (left) Braf in hematopoietic tumors from Cdkn2a−/− mice and (right) the Ghr gene in IL-3–independent cell clones from Bokhoven et al.26 (G-H) Distribution of vector integrations around CIS centers.
(G) Tukey box-and-whisker graph representing the distance of vector integrations from the center of CISs found in each dataset in a ±2.5-Mb region (x-axis, units in base pairs). (H) Tukey box-and-whisker graph representing the distance of vector integrations from the center of each CIS within the CIS interval. The center of each CIS was calculated as the position closest to the highest number of integrations within the CIS interval. Although the tighter clustering of genotoxic integrations within CIS boundaries is suggestive of positional constraints for cancer gene–activating integrations, it does not test whether the integration frequency at the CIS differs significantly from that of other regions and therefore cannot be used to discriminate between different CIS types.
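
The quantities plotted in panels B and G can be illustrated with a minimal Python sketch; it is not the authors' pipeline. It bins integration coordinates into 1-Mb bins expressed as a percentage of a dataset's total integrations (panel B), and collects signed distances from a CIS center within a ±2.5-Mb window (panel G). The function names are hypothetical. The center estimate is one possible reading of "the position closest to the highest number of integrations": it picks the integration with the most neighbors within an assumed `window` parameter, which the legend does not specify.

```python
from collections import Counter

BIN_SIZE = 1_000_000       # 1 Mb, as stated for panel B
HALF_WINDOW = 2_500_000    # +/- 2.5 Mb, as stated for panel G


def binned_percentages(positions, total_integrations, bin_size=BIN_SIZE):
    """Map each bin start to the percentage of the dataset's total integrations.

    `positions` are base-pair coordinates on one chromosome;
    `total_integrations` is the size of the whole dataset.
    """
    counts = Counter((p // bin_size) * bin_size for p in positions)
    return {start: 100.0 * n / total_integrations
            for start, n in sorted(counts.items())}


def estimate_center(positions, window):
    """Assumed center rule: the integration with the most neighbors
    within `window` bp (ties go to the lowest coordinate)."""
    ordered = sorted(positions)
    return max(ordered,
               key=lambda c: sum(abs(p - c) <= window for p in ordered))


def distances_from_center(positions, center, half_window=HALF_WINDOW):
    """Signed distances (bp) of integrations lying within the window."""
    return [p - center for p in positions if abs(p - center) <= half_window]


if __name__ == "__main__":
    # Toy coordinates, invented purely for illustration.
    sites = [1_200_000, 1_700_000, 3_100_000, 9_999_999]
    print(binned_percentages(sites, total_integrations=8))
    center = estimate_center(sites, window=600_000)
    print(center, distances_from_center(sites, center))
```

The resulting distance lists are the kind of input a Tukey box-and-whisker plot summarizes. As the legend notes, such a summary describes clustering around the center but does not test whether integration frequency at the CIS differs from that of other regions.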