Human Genome Headlines
-
The genomic landscape shows marked variation in the distribution of a number
of features, including genes, transposable elements, GC content, CpG islands
and recombination rate. This gives us important clues about function.
For example, the developmentally important HOX gene clusters are the most
repeat-poor regions of the human genome, probably reflecting the very complex
coordinate regulation of the genes in the clusters.
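GC content is usually compared across the genome in fixed-size windows. The following Python sketch is a minimal illustration and assumes the sequence is a plain string. The window size and toy sequence are hypothetical. Bases other than A, C, G and T (for example N in unsequenced gaps) are left out of each window's denominator.

```python
def gc_content_windows(seq: str, window: int) -> list[float]:
    """Return the G+C fraction of each non-overlapping window of `seq`.

    Ambiguous bases (anything other than A, C, G, T) are excluded from the
    denominator; a window with no unambiguous bases yields float('nan').
    """
    fractions = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        gc = sum(1 for base in chunk if base in "GC")
        called = sum(1 for base in chunk if base in "ACGT")
        fractions.append(gc / called if called else float("nan"))
    return fractions


if __name__ == "__main__":
    # Hypothetical toy sequence: an AT-rich stretch followed by a GC-rich one.
    toy = "ATATATTA" + "GCGCGGCC"
    print(gc_content_windows(toy, 8))  # [0.0, 1.0]
```

Real analyses use windows of tens to hundreds of kilobases. The same per-window values are what one would compare against features such as gene density or G-band position.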
-
There appear to be about 30,000-40,000 protein-coding genes in the human
genome, only about twice as many as in worm or fly. However, the genes are more
complex, with more alternative splicing generating a larger number of protein
products.
-
The full set of proteins (the 'proteome') encoded by the human genome is more
complex than those of invertebrates. This is due in part to the presence of
vertebrate-specific protein domains and motifs (an estimated 7% of the total),
but more to the fact that vertebrates appear to have arranged pre-existing
components into a richer collection of domain architectures.
-
Hundreds of human genes appear likely to have resulted from horizontal transfer
from bacteria at some point in the vertebrate lineage. Dozens of genes appear
to have been derived from transposable elements.
-
Although about half of the human genome derives from transposable elements,
there has been a marked decline in the overall activity of such elements in
the hominid lineage. DNA transposons appear to have become completely inactive
and long-terminal repeat (LTR) retroposons may also have done so.
-
The pericentromeric and subtelomeric regions of chromosomes are filled with
large recent segmental duplications of sequence from elsewhere in the genome.
Segmental duplication is much more frequent in humans than in yeast, fly
or worm.
-
Analysis of the organization of Alu elements explains the longstanding mystery
of their surprising concentration in GC-rich regions. It suggests that there
may be strong selection favouring the retention of Alu elements in those
regions, and that these 'selfish' elements may benefit their human hosts.
-
The mutation rate is about twice as high in male as in female meiosis,
showing that most mutation occurs in males.
-
Cytogenetic analysis of the sequenced clones confirms suggestions that large
GC-poor regions are strongly correlated with 'dark G-bands' in karyotypes.
-
Recombination rates tend to be much higher in the distal regions of chromosomes
(roughly the terminal 20 megabases (Mb)) and on shorter chromosome arms in
general. This pattern promotes the occurrence of at least one crossover per
chromosome arm in each meiosis.
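A deliberately simple model shows why higher rates on short arms help. Assume crossovers fall as a Poisson process with no interference (Haldane's model, which real meiosis does not strictly follow). Under that assumption, an arm with an expected E crossovers gets none with probability e^(-E). The arm lengths and the 1 cM/Mb rate below are hypothetical round numbers, and both function names are illustrative.

```python
import math


def prob_no_crossover(arm_length_mb: float, rate_cm_per_mb: float) -> float:
    """Probability that an arm receives zero crossovers in one meiosis.

    Assumes a Poisson crossover process with no interference (Haldane's
    model), so the expected count equals the genetic length in Morgans:
    arm_length_mb * rate_cm_per_mb / 100.
    """
    expected = arm_length_mb * rate_cm_per_mb / 100.0
    return math.exp(-expected)


def rate_for_target(arm_length_mb: float, max_prob_zero: float) -> float:
    """Minimum rate (cM/Mb) keeping P(zero crossovers) <= max_prob_zero."""
    return -math.log(max_prob_zero) * 100.0 / arm_length_mb


if __name__ == "__main__":
    print(round(prob_no_crossover(20, 1.0), 3))   # short arm: 0.819
    print(round(prob_no_crossover(100, 1.0), 3))  # long arm: 0.368
    print(round(rate_for_target(20, 0.05), 1))    # 15.0
```

Under this toy model, a 20 Mb arm at a uniform 1 cM/Mb would lack a crossover in about 82% of meioses. Keeping that below 5% would need roughly 15 cM/Mb, so a single genome-wide rate would often leave short arms without a crossover.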
-
More than 1.4 million single nucleotide polymorphisms (SNPs) in the human
genome have been identified. This collection should allow the initiation of
genome-wide linkage disequilibrium mapping of the genes in the human population.
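Linkage disequilibrium between two biallelic SNPs is commonly summarized by the coefficient D and by r². The Python sketch below assumes that phased haplotype counts are already in hand; the counts in the example are hypothetical. It applies the standard definitions D = p_AB - p_A p_B and r² = D² / (p_A p_a p_B p_b).

```python
def ld_stats(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> tuple[float, float]:
    """Return (D, r^2) for two biallelic loci from phased haplotype counts.

    A/a are the alleles at the first SNP and B/b at the second.
    D = p_AB - p_A * p_B;  r^2 = D^2 / (p_A * p_a * p_B * p_b).
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total  # frequency of allele A at the first SNP
    p_B = (n_AB + n_aB) / total  # frequency of allele B at the second SNP
    d = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d, d * d / denom


if __name__ == "__main__":
    d, r2 = ld_stats(40, 10, 10, 40)  # hypothetical counts from 100 haplotypes
    print(round(d, 3), round(r2, 3))  # 0.15 0.36
```

An r² near 1 means one SNP is a good proxy for the other. This property is what makes a dense SNP collection useful for genome-wide association mapping.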

Mouse Genome Headlines
-
The mouse genome is about 14% smaller than the human genome (2.5 Gb
compared with 2.9 Gb). The difference probably reflects a higher rate of
deletion in the mouse lineage.
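The 14% figure follows directly from the two genome sizes:

```latex
\frac{2.9\ \text{Gb} - 2.5\ \text{Gb}}{2.9\ \text{Gb}} = \frac{0.4}{2.9} \approx 0.138 \approx 14\%
```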
-
Over 90% of the mouse and human genomes can be partitioned into corresponding
regions of conserved synteny, reflecting segments in which the gene order in
the most recent common ancestor has been conserved in both species.
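One way to find conserved synteny blocks is to walk along one genome's gene order and group consecutive orthologues. A group continues while its members stay on the same chromosome of the other genome, in a consistent order. The Python sketch below is a simplified illustration of that idea, not the consortium's pipeline. It assumes a hypothetical input: a list of orthologue records already sorted by human position, with distinct mouse positions. It also ignores the small local rearrangements that real methods tolerate.

```python
from typing import NamedTuple


class Ortholog(NamedTuple):
    gene: str         # gene name (hypothetical identifiers in the example)
    mouse_chrom: str  # mouse chromosome carrying the orthologue
    mouse_pos: int    # position of the orthologue on that mouse chromosome


def synteny_blocks(orthologs: list[Ortholog], min_genes: int = 2) -> list[list[str]]:
    """Split orthologues (sorted by human position) into runs that stay on one
    mouse chromosome with a consistent direction (all increasing or all
    decreasing mouse positions). Runs shorter than min_genes are dropped.
    Mouse positions are assumed distinct; a tie is treated as a decrease.
    """
    blocks: list[list[Ortholog]] = []
    current: list[Ortholog] = []
    direction = 0  # +1 increasing, -1 decreasing, 0 not yet determined
    for o in orthologs:
        if current and o.mouse_chrom == current[-1].mouse_chrom:
            step = 1 if o.mouse_pos > current[-1].mouse_pos else -1
            if direction in (0, step):
                direction = step
                current.append(o)
                continue
        # Chromosome change or order reversal: close the block, start a new one.
        if current:
            blocks.append(current)
        current = [o]
        direction = 0
    if current:
        blocks.append(current)
    return [[o.gene for o in block] for block in blocks if len(block) >= min_genes]


if __name__ == "__main__":
    genes = [
        Ortholog("g1", "chr1", 100), Ortholog("g2", "chr1", 200),
        Ortholog("g3", "chr1", 300), Ortholog("g4", "chr7", 50),
        Ortholog("g5", "chr7", 40), Ortholog("g6", "chr2", 10),
    ]
    print(synteny_blocks(genes))  # [['g1', 'g2', 'g3'], ['g4', 'g5']]
```

The second block runs in decreasing mouse order. Conserved synteny is defined by gene order within a segment, not by orientation.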
-
At the nucleotide level, approximately 40% of the human genome can be aligned to
the mouse genome. These sequences seem to represent most of the orthologous
sequences that remain in both lineages from the common ancestor, with the rest
likely to have been deleted in one or both genomes.
-
The neutral substitution rate has been roughly half a nucleotide substitution
per site since the divergence of the species, with about twice as many of these
substitutions having occurred in the mouse compared with the human lineage.
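Read literally, these two figures split the divergence between the two lineages as follows. The split treats "about twice as many" as an exact 2:1 ratio. Here d_human and d_mouse (notation introduced for this illustration) denote substitutions per site along each lineage:

```latex
d_{\text{human}} + d_{\text{mouse}} \approx 0.5, \qquad d_{\text{mouse}} \approx 2\, d_{\text{human}}
\;\Rightarrow\; d_{\text{human}} \approx \frac{0.5}{3} \approx 0.17, \qquad d_{\text{mouse}} \approx \frac{2 \times 0.5}{3} \approx 0.33
```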
-
By comparing the extent of genome-wide sequence conservation to the neutral
rate, the proportion of small (50-100 bp) segments in the mammalian genome that
is under (purifying) selection can be estimated to be about 5%.
This proportion is much higher than can be explained by protein-coding
sequences alone, implying that the genome contains many additional features
(such as untranslated regions, regulatory elements, non-protein-coding genes,
and chromosomal structural elements) under selection for biological function.
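The logic of this estimate can be sketched as a two-component mixture. Conservation scores of genome-wide windows are compared with scores from a reference set assumed to evolve neutrally. Any excess of highly conserved windows is attributed to selection. The Python sketch below is a simplification of that idea, not the consortium's actual decomposition. The scores and threshold are hypothetical, and so is the assumption that every selected window scores above the threshold.

```python
def excess_conserved_fraction(genome_scores: list[float],
                              neutral_scores: list[float],
                              threshold: float) -> float:
    """Estimate the fraction of windows under purifying selection.

    Simplified mixture model: genome-wide windows are a mix of neutral
    windows (distributed like `neutral_scores`) and selected windows that
    always score above `threshold`. Then
        q_genome = (1 - f) * q_neutral + f
    which rearranges to f = (q_genome - q_neutral) / (1 - q_neutral).
    """
    q_genome = sum(s > threshold for s in genome_scores) / len(genome_scores)
    q_neutral = sum(s > threshold for s in neutral_scores) / len(neutral_scores)
    if q_neutral >= 1.0:
        raise ValueError("threshold too low: every neutral window exceeds it")
    # Clamp at zero: sampling noise can make q_genome fall below q_neutral.
    return max(0.0, (q_genome - q_neutral) / (1.0 - q_neutral))


if __name__ == "__main__":
    neutral = [0.1] * 9 + [0.9]      # hypothetical: 10% of neutral windows score high
    genome = [0.1] * 16 + [0.9] * 4  # hypothetical: 20% of genome windows score high
    print(round(excess_conserved_fraction(genome, neutral, 0.5), 3))  # 0.111
```

The correction by (1 - q_neutral) matters. Some neutral windows exceed the threshold by chance, so the raw excess (20% - 10%) is rescaled rather than read off directly.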
-
The mammalian genome is evolving in a non-uniform manner, with various
measures of divergence showing substantial variation across the genome.
-
The mouse and human genomes each seem to contain about 30,000 protein-coding
genes. These refined estimates have been derived from both new evidence-based
analyses that produce larger and more complete sets of gene predictions, and
new de novo gene predictions that do not rely on previous evidence of
transcription or homology. The proportion of mouse genes with a single
identifiable orthologue in the human genome seems to be approximately 80%.
The proportion of mouse genes without any homologue currently detectable in
the human genome (and vice versa) seems to be less than 1%.
-
Dozens of local gene family expansions have occurred in the mouse lineage.
Most of these seem to involve genes related to reproduction, immunity and
olfaction, suggesting that these physiological systems have been the focus of
extensive lineage-specific innovation in rodents.
-
Mouse-human sequence comparisons allow an estimate of the rate of protein
evolution in mammals. Certain classes of secreted proteins implicated in
reproduction, host defence and immune response seem to be under positive
selection, which drives rapid evolution.
-
Despite marked differences in the activity of transposable elements between
mouse and human, similar types of repeat sequences have accumulated in the
corresponding genomic regions in both species. The correlation is stronger
than can be explained simply by local (G+C) content and points to additional
factors influencing how the genome is moulded by transposons.
-
By additional sequencing in other mouse strains, we have identified about 80,000
single nucleotide polymorphisms (SNPs). The distribution of SNPs reveals that
genetic variation among mouse strains occurs in large blocks, mostly reflecting
contributions of the two subspecies Mus musculus domesticus and
Mus musculus musculus to current laboratory strains.
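One simple way to see this block structure is to count, in fixed windows, the SNPs at which two strains differ. Where both strains inherited the same subspecies haplotype, a window shows few differences; where their ancestry differs, it shows many. The window size, threshold, input format and function names below are hypothetical. This is an illustrative Python sketch, not the method used in the study.

```python
def classify_windows(diff_positions: list[int], chrom_length: int,
                     window: int, threshold: int) -> list[str]:
    """Label each window 'shared' or 'divergent' by SNP-difference count.

    diff_positions: 0-based positions, each in [0, chrom_length), where two
    strains carry different alleles. A window with more than `threshold`
    differences is labelled 'divergent' (suggesting different subspecies
    ancestry); otherwise it is labelled 'shared'.
    """
    n_windows = (chrom_length + window - 1) // window
    counts = [0] * n_windows
    for pos in diff_positions:
        counts[pos // window] += 1
    return ["divergent" if c > threshold else "shared" for c in counts]


def merge_runs(labels: list[str]) -> list[tuple[str, int, int]]:
    """Collapse consecutive identical labels into (label, first, last) window runs."""
    runs: list[tuple[str, int, int]] = []
    for i, label in enumerate(labels):
        if runs and runs[-1][0] == label:
            runs[-1] = (label, runs[-1][1], i)
        else:
            runs.append((label, i, i))
    return runs


if __name__ == "__main__":
    diffs = [5, 250, 260, 270, 310, 320, 330, 390]  # hypothetical positions
    labels = classify_windows(diffs, chrom_length=400, window=100, threshold=2)
    print(merge_runs(labels))  # [('shared', 0, 1), ('divergent', 2, 3)]
```

Merging adjacent windows with the same label turns per-window calls into the large blocks described above. In practice the threshold would have to be calibrated against the background SNP density between strains.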