Command-line options for tffind
release 1.1
Nov 3, 2003
Matt Weirauch (weirauch@soe.ucsc.edu)

Running tffind in batch mode provides exactly the same capabilities and
functionality as running tffind in interactive mode. To run in batch
mode, type

    tffind ALIGNMENT_FILE MATRIX_FILE

along with any of the following options:

-n TFBS_NAME
    The name of the binding site matrix exactly as it occurs in the
    current matrix file. To change matrix files, use the "-db" option
    BEFORE entering the TFBS that is found in the matrix file. TFBS
    names are case insensitive, and need to be enclosed in single
    quotes only if the name contains a character such as '(' or '<'.
    In IMD files, the name is the first word on each header line. In
    TransFac files, the name occurs near the top of each entry, by
    itself on a line that begins with 'NA'.

-p TFBS_CONSENSUS_PATTERN
    The consensus pattern of the binding site, as it is listed in the
    matrix file. To change matrix files, use the "-db" option BEFORE
    entering the TFBS that is found in the matrix file. TFBS consensus
    patterns are case insensitive, and often contain ambiguous
    nucleotide characters such as W (A or T) or N (any nucleotide).
    In IMD files, the consensus pattern is the fourth entry on the
    header line. In TransFac files, the consensus pattern occurs along
    the right-hand side of the matrix, running vertically from top to
    bottom.

-a TFBS_ACCESS_NUMBER
    The access number of the binding site, as it is listed in the
    matrix file. To change matrix files, use the "-db" option BEFORE
    entering the TFBS that is found in the matrix file. TFBS access
    numbers are case insensitive, and are of the format "M00034". In
    IMD files, the access number is the final entry on the header
    line. In TransFac files, the access number occurs near the top of
    each entry, by itself on a line that begins with 'AC'.

-i 'TFBS_TRANSFAC_ID'
    The TransFac ID of the binding site, as it is listed in the
    TransFac file.
    To change matrix files to one in TransFac format, use the "-db"
    option BEFORE entering the TFBS that is found in the TransFac
    matrix file. TransFac IDs are case insensitive, and are of the
    format "V$MYOD_01". When entering the ID, be sure to enclose it in
    single quotes. In TransFac files, the ID is near the top of each
    entry (under the access number), by itself on a line that begins
    with 'ID'.

-s 'SPACER'
    Add a spacer to the current pattern that is being constructed.
    Spacers are used to specify how far TFBSs are from each other (see
    Tffind documentation for more information). They are of the format
    "<20>", meaning exactly 20 bps, or "<10,20>", meaning between 10
    and 20 bps. When entering the spacer, be sure to enclose it in
    single quotes.
    *** NEW FOR VERSION 1.1 ***
    Tffind now also accepts spacers of the format "10,20" and "10".

-list
    List the available TFBSs in the current matrix database. The TFBSs
    are printed to a file with the same name as the database, but with
    "_list" appended to the end of its name (e.g. the list of
    available TFBSs in the file 'imd.txt' would be printed to a file
    called 'imd.txt_list').

-db DATABASE_FILENAME
    Change the current matrix database. Two matrix formats are
    supported: the IMD format and the TransFac format (see Tffind
    documentation for information on these formats). Be sure to change
    databases before entering the TFBS you want to use from that
    database. For example, entering

        tffind hmr1 imd.txt -n GATA-1 -db TransFac.txt

    would use the GATA-1 matrix from the IMD database, because the
    "-db" option was specified after the GATA-1 TFBS was specified. In
    this way, it is possible to utilize TFBSs from different databases
    within the same pattern.

-min MIN_SEQS
    Specify the minimum number of sequences in the alignment that must
    match the pattern for a hit to be recorded. This number must be
    between 1 and the number of sequences in the alignment.
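
The four spacer spellings listed under -s can be illustrated with a short Python sketch. This is not tffind's own parser: the function name parse_spacer is hypothetical, and the sketch assumes that a single number means an exact distance and that a pair of numbers gives inclusive lower and upper bounds, as the -s description suggests. Rejecting a pair whose first number is larger than its second is also an assumption.

```python
import re


def parse_spacer(text):
    """Return (low, high) in bps for a spacer string.

    Hypothetical helper for illustration only; it mirrors the formats
    documented for -s: "<20>", "<10,20>", "10,20" and "10".
    """
    body = text.strip()
    # Angle brackets are optional as of version 1.1, but must be paired.
    if body.startswith("<") and body.endswith(">"):
        body = body[1:-1]
    match = re.fullmatch(r"(\d+)(?:,(\d+))?", body)
    if match is None:
        raise ValueError("unrecognized spacer: %r" % text)
    low = int(match.group(1))
    # A single number means "exactly that many bps".
    high = int(match.group(2)) if match.group(2) is not None else low
    if low > high:
        # Assumption: a reversed range is treated as an error here.
        raise ValueError("spacer range is reversed: %r" % text)
    return low, high


if __name__ == "__main__":
    for spacer in ("<20>", "<10,20>", "10,20", "10"):
        print(spacer, "->", parse_spacer(spacer))
```

Under these assumptions, "<20>" and "20" both describe a gap of exactly 20 bps, while "<10,20>" and "10,20" both describe a gap of 10 to 20 bps.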
-cut CUTOFF
    Specify the cutoff value used to calculate the threshold score
    that a motif must meet or exceed to be deemed a hit. The formula
    used is:

        Threshold = (cutoff) * MaxPossibleScore
                    + (1 - cutoff) * MinPossibleScore

    This equation was taken from the MOTIF program
    (http://motif.genome.ad.jp). The default value for the cutoff is
    0.85, the same value used by MOTIF. The higher the cutoff value,
    the more stringent the search requirement (and consequently, the
    fewer matches that will be reported). Its value must be between 0
    and 1.

    Note that the position of this option on the command line matters:
    a cutoff applies only to the TFBSs entered after it. For example,
    the queries

        tffind hmr1 imd.txt -cut .5 -n gata-1 -s '<0,50>' -n gata-1

    and

        tffind hmr1 imd.txt -n gata-1 -s '<0,50>' -cut .5 -n gata-1

    are different from each other. The former will have a value of .5
    for both GATA TFBSs, whereas the latter will have a (default)
    value of .85 for the first one, and a value of .5 for the second
    one.

-range FROM TO
    Specify the range in the alignment to search for the pattern. The
    beginning of the range is followed by a space and the end of the
    range. This range is relative to the top sequence in the
    alignment. The values of the range must be between 1 and the
    length of the topmost sequence in the alignment.

-lookin SEQS_TO_LOOK_IN
    Designate which sequences in the alignment to look for the pattern
    in. The topmost sequence is sequence 0, the second topmost 1, etc.
    The sequences are entered one at a time, space separated (e.g.
    "-lookin 0 2 3").

-exons g|c|e WHICH_SEQ EXONS_FILENAME
    Filter out hits that occur in regions specified in a file in exons
    format. The first argument is the type of region that is filtered
    out: g filters out entire genes (in an exons file, the line
    ">gene1 100 500"), c filters out the coding region of the gene
    (the line "+ 200 400"), and e filters out exon regions (the lines
    "120 230", etc.). See Tffind documentation for more information on
    exons files.
    The second argument specifies which sequence in the alignment the
    positions are relative to. The third argument gives the name of
    the exons file, and may need to be enclosed in single quotes in
    certain cases. The option "-exons c 0 ex1" would filter out all
    hits that occur in the coding regions of the exons found in file
    'ex1', relative to sequence 0.

-reltop
    Report the positions of all hits relative to the topmost sequence
    in the alignment, as opposed to being relative to the sequence
    they occur in.

-lc
    *** NEW FOR VERSION 1.1 ***
    Treat lowercase nucleotides as if they were uppercase. The default
    setting masks out all lowercase nucleotides, treating them as if
    they were gaps.

-gff
    *** NEW FOR VERSION 1.1 ***
    Change output format to gff format. Documentation for this format
    is available at the Sanger Institute web page:
    http://www.sanger.ac.uk/Software/formats/GFF/
    The default format is Tffind format, as described in the Tffind
    manual.

A sample query:

To search in the alignment file "hmr1" for a GATA-1 binding site
followed by the NF-kB binding site 0 to 200 bps away, with a minimum
of 2 sequences in the alignment matching, enter the following at the
command line:

    tffind hmr1 imd.txt -n gata-1 -s '<0,200>' -n NF-kB -min 2

More information on all of these options and a more in-depth
discussion can be found in the Tffind documentation
"tffind_manual.txt".
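
As an illustration of the -cut option described above, the following Python sketch evaluates the threshold formula and models how the position of -cut on the command line decides which cutoff each TFBS receives. It is a reading of the documented behavior, not tffind's actual implementation. The function names threshold and cutoffs_in_effect are hypothetical, the score values in the usage lines are made-up numbers, and only TFBSs given with -n are tracked.

```python
DEFAULT_CUTOFF = 0.85  # default stated in the -cut description


def threshold(cutoff, max_score, min_score):
    """Threshold = cutoff * MaxPossibleScore + (1 - cutoff) * MinPossibleScore."""
    if not 0.0 <= cutoff <= 1.0:
        raise ValueError("cutoff must be between 0 and 1")
    return cutoff * max_score + (1.0 - cutoff) * min_score


def cutoffs_in_effect(args):
    """Pair each -n TFBS with the cutoff active when it was entered.

    Hypothetical model: options are read left to right, and a -cut
    value applies to every TFBS named after it.
    """
    current = DEFAULT_CUTOFF
    sites = []
    i = 0
    while i < len(args):
        if args[i] == "-cut" and i + 1 < len(args):
            current = float(args[i + 1])
            i += 2
        elif args[i] == "-n" and i + 1 < len(args):
            sites.append((args[i + 1], current))
            i += 2
        else:
            i += 1  # skip options this sketch does not model
    return sites


if __name__ == "__main__":
    # Made-up matrix scores, chosen only to show the arithmetic.
    print(threshold(0.5, 10.0, -10.0))
    print(cutoffs_in_effect(["-cut", ".5", "-n", "gata-1",
                             "-s", "<0,50>", "-n", "gata-1"]))
    print(cutoffs_in_effect(["-n", "gata-1", "-s", "<0,50>",
                             "-cut", ".5", "-n", "gata-1"]))
```

With the first argument list both GATA-1 sites are paired with a cutoff of 0.5; with the second, the first site keeps the default 0.85 and only the second is paired with 0.5, matching the two example queries given under -cut.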