Information about numbering systems.
The position of a substitution, deletion, etc., must be specified using a
designated numbering system. At least five are used in different communities,
and this database keeps track of the equivalencies among them. Choose the
numbering system you prefer and use it to fill in position numbers in the
appropriate fields. The systems are illustrated below using Hb Villejuif,
a Thr to Ile substitution at codon 123 of the beta-globin gene that results
from an ACC to ATC transition.
- Common DNA-based description:
For hemoglobin genes, it is most common to use +1 for the cap site
(initiation site for transcription). Thus for Hb Villejuif, the mutation
can be described as g.1401C>T in this numbering system. The requested
number for the position of the substitution would be 1401. (The "g" means
genomic DNA.)
- Official HGVS genomic DNA-based description:
In this numbering system, +1 is the A in the initiator ATG. Thus the
substitution in Hb Villejuif can be described as g.1351C>T (since HBB has
a 50 bp 5' untranslated region), and the requested number for the position of
the substitution would be 1351. For more information, see the HGVS
nomenclature recommendations.
- Description using the GenBank reference sequence:
In this convention, +1 is the first nucleotide of the reference sequence
entry in GenBank (NG_000007.3 for the beta-like globin genes, or NG_000006.1
for the alpha-like globin genes). Thus in the reference sequence for beta,
the Hb Villejuif substitution occurs at position 71945.
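The three DNA-based numbering systems above differ only by fixed offsets, so a position can be converted between them with simple arithmetic. The following is a minimal sketch for HBB, not the database's own code. The function and constant names are hypothetical. The 50 bp offset comes from the text; the GenBank position of the cap site is not stated in the text and is derived here from the Hb Villejuif example (1401 in cap-site numbering, 71945 in NG_000007.3). The sketch is restricted to positions downstream of the initiator ATG, because positions upstream of +1 need separate handling in conventions that have no position 0.

```python
# Offsets for HBB only. UTR5_LENGTH is stated in the text (50 bp 5' UTR).
# CAP_SITE_IN_GENBANK is an assumption derived from the Hb Villejuif example:
# cap-site position 1401 corresponds to NG_000007.3 position 71945.
UTR5_LENGTH = 50
CAP_SITE_IN_GENBANK = 71945 - 1401 + 1  # GenBank position of cap-site +1


def cap_to_atg(pos_cap: int) -> int:
    """Cap-site numbering -> numbering with the A of the initiator ATG as +1."""
    if pos_cap <= UTR5_LENGTH:
        raise ValueError("sketch only handles positions at or after the ATG")
    return pos_cap - UTR5_LENGTH


def atg_to_cap(pos_atg: int) -> int:
    """Numbering with the A of the ATG as +1 -> cap-site numbering."""
    if pos_atg < 1:
        raise ValueError("sketch only handles positions at or after the ATG")
    return pos_atg + UTR5_LENGTH


def cap_to_genbank(pos_cap: int) -> int:
    """Cap-site numbering -> position in the NG_000007.3 entry."""
    return pos_cap + CAP_SITE_IN_GENBANK - 1


def genbank_to_cap(pos_genbank: int) -> int:
    """Position in the NG_000007.3 entry -> cap-site numbering."""
    return pos_genbank - CAP_SITE_IN_GENBANK + 1


if __name__ == "__main__":
    # Hb Villejuif: 1401 (cap site), 1351 (ATG-based), 71945 (GenBank)
    print(cap_to_atg(1401), cap_to_genbank(1401))
```

For other genes the two constants would differ and would have to be looked up for the relevant reference sequence.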
- Common protein-based description:
Hb Villejuif can be described as p.T123I, where the amino acid after
the initiator methionine is 1. The requested number for this example would
be 123. (The "p" means protein.)
- Official HGVS protein-based description:
In this convention, the initiator methionine is 1. Thus Hb Villejuif
would be described as p.T124I, and the requested number for this example
would be 124. For more information, see the HGVS nomenclature recommendations.
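The two protein-based descriptions differ by exactly one residue, because the common convention starts counting after the initiator methionine while the HGVS convention counts the methionine as 1. As an illustration only, the sketch below converts single-letter substitution descriptions such as p.T123I between the two conventions. The function names and the regular expression are hypothetical and cover only simple one-letter substitutions.

```python
import re

# Matches simple substitutions written with one-letter codes, e.g. "p.T123I".
_PROTEIN_SUB = re.compile(r"^p\.([A-Z])(\d+)([A-Z])$")


def _parse(description: str):
    match = _PROTEIN_SUB.match(description)
    if match is None:
        raise ValueError(f"not a simple protein substitution: {description!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def common_to_hgvs_protein(description: str) -> str:
    """Common numbering (residue after Met = 1) -> HGVS numbering (Met = 1)."""
    ref, pos, alt = _parse(description)
    return f"p.{ref}{pos + 1}{alt}"


def hgvs_to_common_protein(description: str) -> str:
    """HGVS numbering (Met = 1) -> common numbering (residue after Met = 1)."""
    ref, pos, alt = _parse(description)
    if pos < 2:
        # The initiator methionine has no positive number in the common system.
        raise ValueError("initiator methionine cannot be converted")
    return f"p.{ref}{pos - 1}{alt}"


if __name__ == "__main__":
    print(common_to_hgvs_protein("p.T123I"))  # Hb Villejuif
```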
- Large deletions:
For thalassemia deletions that extend beyond the GenBank sequences, the
sequence HUMHBB_DOM_385 was used to locate endpoints for the beta-like globin
genes; the GenBank NG_000007.3 sequence corresponds to nts 215117 to 296612
of HUMHBB_DOM_385. For the alpha-like globin genes, the sequence
Hum_16p13.3_376 D_Higgs_99June18 was used to locate endpoints; the GenBank
NG_000006.1 sequence corresponds to nts 129137 to 172194 of that sequence.
These endpoints are still stored in the GenBank numbering system. For
example, a deletion starting one nt before GenBank position 1 is stored as 0
and corresponds to position 129136 in the Hum_16p13.3_376 D_Higgs_99June18
sequence. This is how numbers larger or smaller than those in the GenBank
sequences are computed.
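The mapping between the longer sequences and the stored GenBank numbering described above is a single linear shift. A minimal sketch, assuming only the start positions quoted in the text; the dictionary and function names are hypothetical, and the stored numbering is taken to run through 0 and negative values as in the example above.

```python
# Position, within the longer sequence, of nucleotide 1 of each GenBank entry.
# Values are the start coordinates quoted in the text.
GENBANK_START = {
    "NG_000007.3": 215117,  # within HUMHBB_DOM_385 (beta-like globin genes)
    "NG_000006.1": 129137,  # within Hum_16p13.3_376 D_Higgs_99June18 (alpha-like)
}


def extended_to_genbank(accession: str, extended_pos: int) -> int:
    """Position in the longer sequence -> stored GenBank-style number.

    The result may be 0 or negative for positions upstream of the GenBank
    entry, and larger than the entry length for positions downstream of it.
    """
    return extended_pos - GENBANK_START[accession] + 1


def genbank_to_extended(accession: str, genbank_pos: int) -> int:
    """Stored GenBank-style number -> position in the longer sequence."""
    return genbank_pos + GENBANK_START[accession] - 1


if __name__ == "__main__":
    # One nt before GenBank position 1 of NG_000006.1 is stored as 0.
    print(extended_to_genbank("NG_000006.1", 129136))
```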
- Build hg#, chromosome # coordinates:
These are the chromosome coordinates as used in genome browsers.
hg18 is the Human March 2006 assembly (NCBI Build 36.1).