Genetic and epigenetic environment around the proviral insertion site. UISs identified in vivo were organized according to the disease status of the subject (AC, HAM/TSP, and ATLL) and UIS abundance (number of UISs per 10 000 PBMCs). In the ATLL category, the last column, labeled “Major UIS,” refers to the most abundant UIS in each person (ie, the proviral insertion site present in the putative malignant clone). “In vitro” refers to insertion sites isolated after coculture of uninfected T cells with an HTLV-1–infected cell line. The y-axis represents the departure from a random distribution. The in vitro sites were compared with sites randomly generated in silico (horizontal asterisks below “vs random”). The UISs of lowest and highest abundance in each disease status group were compared with the in vitro sites (vertical asterisks to the right of “vs in vitro”). Trends associated with UIS abundance were also tested for significance (asterisks below the black arrows). (A) “Pr” is the proportion of insertion sites lying within 10 kb of a CpG island or a RefSeq gene. Enrichment for a given mark is calculated as the log ratio of “Pr” to “Pr random” (the proportion expected under perfectly random integration). Insertion sites isolated in vitro were enriched in the vicinity of CpG islands and genes compared with random sites (χ2 test). Increasing UIS abundance was correlated with proximity to CpG islands and genes (χ2 test for trend). The UISs of lowest abundance in each disease group were significantly less frequently integrated near CpG islands and genes than were the in vitro sites (χ2 test). (B-D) “N” is the number of occurrences of a given epigenetic mark in a 10-kb window (± 5 kb) around the insertion site. “N random” is the number expected if the insertion sites were distributed perfectly at random. Enrichment for a given epigenetic mark was calculated as log(N/N random).
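The enrichment scores in panels A-D are log ratios of an observed quantity to its random expectation, and the abundance trend in panel A is assessed with a χ2 test for trend. The Python sketch below illustrates both calculations under stated assumptions: the legend does not give the log base (natural log is used by default), the trend test is written as the Cochran-Armitage form of the χ2 test for trend with equally spaced group scores, and the function names `log_enrichment` and `chi2_trend` are hypothetical. This is not the authors' analysis code.

```python
import math


def log_enrichment(observed, expected, base=math.e):
    """Log ratio of an observed value to its random expectation.

    Covers panel A (Pr / Pr random) and panels B-D (N / N random).
    Positive = enriched relative to random, negative = depleted.
    ASSUMPTION: the legend does not state the log base; natural log by default.
    """
    if observed <= 0 or expected <= 0:
        raise ValueError("observed and expected must both be positive")
    return math.log(observed / expected, base)


def chi2_trend(successes, totals, scores=None):
    """Cochran-Armitage chi-square test for trend across ordered groups.

    successes[i]: sites within 10 kb of a CpG island (or gene) in group i
    totals[i]:    all sites in abundance group i
    scores[i]:    ordinal score for group i (ASSUMPTION: 0, 1, 2, ... by default)
    Returns (chi-square statistic with 1 df, P value).
    """
    if scores is None:
        scores = list(range(len(totals)))
    n = sum(totals)
    p = sum(successes) / n  # pooled proportion under the null (no trend)
    # Score-weighted deviation of observed from expected successes.
    t = sum(x * (r - m * p) for x, r, m in zip(scores, successes, totals))
    sx = sum(m * x for x, m in zip(scores, totals))
    sxx = sum(m * x * x for x, m in zip(scores, totals))
    var = p * (1 - p) * (sxx - sx * sx / n)
    chi2 = t * t / var
    # Upper tail of chi-square with 1 df: P(X > c) = erfc(sqrt(c / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

With hypothetical numbers, a proportion of 0.30 against 0.15 expected at random gives `log_enrichment(0.30, 0.15, base=2)` equal to 1.0, that is, a twofold enrichment; changing the base only rescales the y-axis and does not change the sign of the enrichment.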
In vitro insertion sites lay in an environment enriched for both active and repressive epigenetic marks compared with random sites (B-D; unpaired t test with Welch correction). UIS abundance was negatively correlated with the density of gene-silencing marks (B; Pearson correlation test) and positively correlated with the density of marks associated with active transcription start sites (TSS), promoters, and transcribed units (C-D; Pearson correlation test). The UISs of highest abundance in each disease group were less frequently associated with gene-silencing marks than were the in vitro sites (B; unpaired t test with Welch correction). The UISs of lowest abundance were less frequently associated with activating marks than were the in vitro sites (C-D; unpaired t test with Welch correction). Sample size: In vitro, n = 2135; AC < 0.1, n = 4544; AC 0.1 to 1, n = 8649; AC 1 to 10, n = 727; HAM/TSP < 0.1, n = 26 200; HAM/TSP 0.1 to 1, n = 36 377; HAM/TSP 1 to 10, n = 2931; HAM/TSP > 10, n = 39; ATLL 0.1 to 1, n = 9827; ATLL 1 to 10, n = 1659; ATLL > 10, n = 69; ATLL major UIS, n = 19. ***P < .001. **P < .01. *P < .05. NS indicates not significant (P > .05).
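Panels B-D rely on two standard statistics: the unpaired t test with Welch correction, which does not assume equal variances in the two groups, and the Pearson correlation coefficient. The standard-library Python sketch below computes both, assuming that each insertion site contributes one value (for example, its mark count in the ±5-kb window) and, for the correlation, that abundance and mark density are paired per site. The function names `welch_t` and `pearson_r` are hypothetical, and this is not the authors' code. Converting t and its degrees of freedom into a P value requires the Student t distribution, which the standard library lacks; SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the Welch test directly.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Unpaired t statistic with Welch correction (unequal variances).

    Returns (t, df), where df is the Welch-Satterthwaite approximation.
    The P value is the two-sided tail of Student's t with df degrees of freedom.
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def pearson_r(x, y):
    """Pearson correlation coefficient between paired samples x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)
```

The Welch degrees of freedom fall between the smaller group size minus 1 and the pooled n1 + n2 - 2, which is why the correction matters most when a small group (such as ATLL major UIS, n = 19) is compared with a large one.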