Readings for BMMB/CSE 597D


Overview

  • The DNA sequence of human chromosome 22. Dunham I, et al. The complete article can be found by following links from the Sanger Center's website for Chr. 22, and excerpts concerning gene finding are here.

    Interspersed repeats

  • The origin of interspersed repeats in the human genome. Smit AF.
  • Ancestral, mammalian-wide subfamilies of LINE-1 repetitive sequences. Smit AF, Toth G, Riggs AD, Jurka J.
  • Interspersed repeats and other mementos of transposable elements in mammalian genomes. Smit AF.
  • MIRs are classic, tRNA-derived SINEs that amplified before the mammalian radiation. Smit AF, Riggs AD.

    Homology-based gene-finding methods

  • Sequence analysis and database searching. Gregory D. Schuler. Chapter 7 in the book by Baxevanis & Ouellette.
  • Basic local alignment search tool. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ.
  • Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. The complete paper is available on-line.
  • On-line BLAST tutorial.
  • Stephen Altschul's three lectures on BLAST statistics: 1, 2, 3.
  • Analysis of EST-driven gene annotation in human genomic sequence. Bailey LC Jr, Searls DB, Overton GC.
  • A comparison of expressed sequence tags (ESTs) to human genomic sequences. Wolfsberg TG, Landsman D.
  • A computer program for aligning a cDNA sequence with a genomic DNA sequence. Florea L, Hartzell G, Zhang Z, Rubin GM, Miller W.

    Comparison of human and mouse genomic sequences

  • Long human-mouse sequence alignments reveal novel regulatory elements: a reason to sequence the mouse genome. Hardison RC, Oeltjen J, Miller W.
  • PipMaker -- A Web server for aligning two genomic DNA sequences. Schwartz S, Zhang Z, Frazer KA, Smit A, Riemer C, Bouck J, Gibbs R, Hardison R, Miller W.
  • Comparative analysis of 1196 orthologous mouse and human full-length mRNA and protein sequences. Makalowski W, Zhang J, Boguski MS.
  • Evolutionary parameters of the transcribed mammalian genome: an analysis of 2,820 orthologous rodent and human sequences. Makalowski W, Boguski MS.
  • Comparative analysis of noncoding regions of 77 orthologous mouse and human gene pairs. Jareborg N, Birney E, Durbin R.

    Substitution matrices for protein comparisons

  • Scores for sequence searches and alignments. Henikoff S.
  • Amino acid substitution matrices from protein blocks. Henikoff S, Henikoff JG.

    Ab initio gene-finding methods

  • Computational methods for the identification of genes in vertebrate genomic sequences. Claverie JM.
  • Finding the genes in genomic DNA. Burge CB, Karlin S.
  • Predictive methods using nucleotide sequences. Fickett JW. Chapter 10 in the book by Baxevanis & Ouellette.
  • Prediction of complete gene structures in human genomic DNA. Burge C, Karlin S.
  • Eukaryotic promoter recognition. Fickett JW, Hatzigeorgiou AG.

    Hidden Markov models and protein families

  • Hidden Markov models. Eddy SR.
  • Profile hidden Markov models. Eddy SR.
  • Pfam: a comprehensive database of protein domain families based on seed alignments. Sonnhammer EL, Eddy SR, Durbin R. See also the updates for 1998, 1999 and 2000.

    Multiple sequence alignment

  • Practical aspects of multiple sequence alignment. Andreas D. Baxevanis. Chapter 8 in the book by Baxevanis & Ouellette.
  • CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Thompson JD, Higgins DG, Gibson TJ.
  • Using CLUSTAL for multiple sequence alignments. Higgins DG, Thompson JD, Gibson TJ.
  • The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG.

    Related topics

  • Noncoding RNA genes. Eddy SR.
  • Annotating sequence data using Genotator. Harris NL.
  • Frequent alternative splicing of human genes. Mironov AA, Fickett JW, Gelfand MS.
  • Interpreting cDNA sequences: some insights from studies on translation. Kozak M.
  • Computational methods for the identification of differential and coordinated gene expression. Claverie JM.