Greg, Mark and David,

We have a suite of programs that attempts to produce better consensus
sequences for EST clusters by discarding suspicious Blast hits.
Specifically, it improves on our earlier consensus-generating programs
in three respects:

  1. Smith-Waterman analysis enforces high-quality, edge-to-edge matching.
  2. Gene's graph-reduction technique detects multi-sequence anomalies.
  3. A more sophisticated algorithm is used to produce a "crude" alignment
     (as input to the alignment improver).

The following files can be found in ~ftp/dist/NCBI on galapagos.cse.psu.edu.

	Makefile
	README - this file
	align.c
	band1.c
	genedata.c
	genelaps.c
	getstats.c
	go
	littlelib.c
	reAligner.c
	swband1.c
	versa.c

Start with a directory containing "*.seq"-"*.hsp" file pairs. Create a
sibling directory called "code.d" and ftp the above files to it. Then run
"make", go to the directory with file pairs, and run "../code.d/go &".
For each pair, this will create a corresponding "*.cns" file with the
alignment(s) and consensus sequence(s) for each contig, and summarize the
output in a file called "Results". The command "../code.d/getstats Results"
produces a one-line summary of the Results file.

We fixed a bug in reAligner.c, so please replace the earlier version.

We ran the new programs on a collection of 366 clusters (3' reads only),
with a total of 3527 sequences. The earlier programs produced alignments
with an average of 4.1% mismatches, a worst alignment with 17.0%
mismatches, and 8 alignments exceeding 10%. The new programs discarded
160 sequences for which no hit remained after Smith-Waterman filtering.
Another 352 correspond to a single-sequence contig after filtering by
graph-reduction. The remaining 3015 sequences were placed into 577
contigs, with 2.5% mismatches on average, and a maximum of 6.1%. Thus, on
average, each cluster was modified by removal of about (160+352)/366 = 1.4
sequences and split into 577/366 = 1.6 contigs.

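For anyone who wants to recheck the per-cluster figures above, here is a
minimal sketch in C. The function names (removed_per_cluster,
contigs_per_cluster, counts_consistent, report) are hypothetical and are not
part of the distributed files; the numbers are the ones quoted in this note.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helpers for rechecking the summary figures in this note.
 * They are NOT part of the distributed programs. */

/* Average number of sequences removed per cluster: sequences with no
 * surviving hit plus sequences left as single-sequence contigs. */
double removed_per_cluster(int no_hit, int singletons, int clusters)
{
    return (double)(no_hit + singletons) / clusters;
}

/* Average number of multi-sequence contigs per cluster. */
double contigs_per_cluster(int contigs, int clusters)
{
    return (double)contigs / clusters;
}

/* The three groups of sequences should add up to the total. */
int counts_consistent(int total, int no_hit, int singletons, int placed)
{
    return no_hit + singletons + placed == total;
}

/* Print the two averages, rounded to one decimal place. */
void report(FILE *out)
{
    assert(counts_consistent(3527, 160, 352, 3015));
    fprintf(out, "removed per cluster: %.1f\n",
            removed_per_cluster(160, 352, 366));
    fprintf(out, "contigs per cluster: %.1f\n",
            contigs_per_cluster(577, 366));
}
```

Calling report(stdout) from a small main() prints 1.4 and 1.6, matching the
rounded values above (the unrounded quotients are about 1.40 and 1.58). The
three counts 160, 352 and 3015 also sum to the 3527 input sequences.
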
Please try the programs on one or several contigs and get back to us with
your impressions. In particular, if desirable, it might be possible to
discard fewer Blast hits (and thereby discard fewer sequences and create
fewer contigs) without much degrading the percentage of mismatches in the
alignments.

Thanks for all the help you're giving us on this very worthwhile and
interesting project.

--Gene, Eric, Zheng and Webb