Figure 2.
Processing of variants. An overview of the processing of variants after they were imported into the database. Firstly, variants found at multiallelic sites (n = 312) were split and treated as separate individual variants. A multiallelic site is a specific locus in the genome at which ≥2 alternate sequences are observed in addition to the reference sequence. Secondly, all variants were remapped from the GRCh37 assembly to their corresponding LRG reference. Discrepancies between these references were found at 101 variant locations; to handle the transition to the LRG reference correctly, additional variants were generated to reflect these differences. Thirdly, accurate descriptions according to HGVS nomenclature were generated at the DNA level [24]. All variants were then classified according to terms defined by Sequence Ontology v2.5 [25], and alleles were generated. 1000 Genomes, 1000 Genomes Project [4]; GRCh37, Genome Reference Consortium human genome build 37 [45]; HGVS, Human Genome Variation Society [24]; LRG, Locus Reference Genomic [46]. The Sequence Ontology set of terms and relationships is used to describe features and attributes of biological sequences [25].
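The first step in the caption, splitting multiallelic sites into individual variants, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Variant` record, its field names, and the `split_multiallelic` helper are hypothetical, and the coordinates are made up for illustration. The sketch only separates the alternate sequences; it does not trim or normalize the resulting alleles.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record (not taken from the paper)."""
    chrom: str
    pos: int                 # 1-based position on the reference
    ref: str                 # reference sequence at this locus
    alts: Tuple[str, ...]    # one or more alternate sequences


def split_multiallelic(variant: Variant) -> List[Variant]:
    """Return one Variant per alternate sequence observed at the site."""
    return [
        Variant(variant.chrom, variant.pos, variant.ref, (alt,))
        for alt in variant.alts
    ]


# Illustrative site with two alternate sequences (invented values).
site = Variant("1", 1000, "G", ("A", "T"))
for v in split_multiallelic(site):
    print(v.chrom, v.pos, v.ref, v.alts[0])
```

Each resulting record carries a single alternate sequence, so later steps such as remapping and HGVS description can treat every variant independently.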