Command-line options for tffind release 1.1
Nov 3, 2003
Matt Weirauch (weirauch@soe.ucsc.edu)


Running tffind in batch mode has the exact same capabilities and functionality as
running tffind in interactive mode.  To run in batch mode, type 

	tffind ALIGNMENT_FILE MATRIX_FILE

along with any of the following options:


	-n TFBS_NAME
		The name of the binding site matrix exactly as it occurs in the current
		matrix file.  To change matrix files, use the "-db" option BEFORE entering
		the TFBS that is found in the matrix file.  TFBS names are 
		case-insensitive, and may or may not need to be enclosed in single quotes,
		depending on whether the name contains a character such as '(' or '<'.  In
		IMD files, the name is the first word on each header line.  In TransFac
		files, the name occurs near the top of each entry, by itself on a line
		that begins with 'NA'.
	
	-p TFBS_CONSENSUS_PATTERN
		The consensus pattern of the binding site, as it is listed in the matrix
		file.  To change matrix files, use the "-db" option BEFORE entering
		the TFBS that is found in the matrix file.  TFBS consensus patterns are
		case insensitive, and often contain ambiguous nucleotide characters such
		as W (A or T) or N (any nucleotide.)  In IMD files, the consensus pattern
		is the fourth entry on the header line.  In TransFac files, the consensus
		pattern occurs along the right hand side of the matrix, running vertically
		from top to bottom.

	-a TFBS_ACCESS_NUMBER
		The access number of the binding site, as it is listed in the matrix file.
		To change matrix files, use the "-db" option BEFORE entering the TFBS that
		is found in the matrix file.  TFBS access numbers are case insensitive,
		and are of the format "M00034".  In IMD files, the access number is the 
		final entry on the header line.  In TransFac files, the access number
		occurs near the top of each entry, by itself on a line that begins with
		'AC'.

	-i 'TFBS_TRANSFAC_ID'
		The TransFac ID of the binding site, as it is listed in the TransFac file.
		To change matrix files to one in TransFac format, use the "-db" option 
		BEFORE entering the TFBS that is found in the TransFac matrix file.
		TransFac IDs are case insensitive, and are of the format "V$MYOD_01".
		When entering the ID, be sure to enclose it in single quotes.  In 
		TransFac files, the ID is near the top of each entry (under the access
		number), by itself on a line that begins with 'ID'.

	-s 'SPACER'
		Add a spacer to the current pattern that is being constructed.
		Spacers are used to specify how far TFBSs are from each other (see Tffind
		documentation for more information.)  They are of the format "<20>", 
		meaning exactly 20 bps, or "<10,20>", meaning between 10 and 20 bps.
		When entering the spacer, be sure to enclose it in single quotes.

		*** NEW FOR VERSION 1.1 ***
		Tffind now also accepts spacers of the format "10,20" and "10".

	-list
		List the available TFBSs in the current matrix database.  The TFBSs
		are printed to a file with the same name as the database, but with 
		"_list" appended to the end of its name (e.g. the list of available 
		TFBSs in the file 'imd.txt' would be printed to a file called 
		'imd.txt_list'.)

	-db DATABASE_FILENAME
		Change the current matrix database.  Two matrix formats are supported:
		the IMD format and the TransFac format (see Tffind documentation for
		information on these formats.)  Be sure to change databases before 
		entering the TFBS you want to use from that database.  For example,
		entering "tffind hmr1 imd.txt -n GATA-1 -db TransFac.txt" would use the 
		GATA-1 matrix from the IMD database, because the "-db" option
		was specified after the GATA-1 TFBS was specified.  In this way, it
		is possible to utilize TFBSs from different databases within the same
		pattern.
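
		As an illustration of mixing databases, the sketch below prints a query
		instead of running it.  "tffind_dry" is a hypothetical stand-in, and
		whether a matrix with the ID V$MYOD_01 exists in a given TransFac file
		is an assumption.

```shell
# Hypothetical dry-run stand-in: print the command line instead of running tffind.
tffind_dry() { printf 'tffind'; printf ' %s' "$@"; printf '\n'; }

# GATA-1 is named before -db, so it would be read from imd.txt; the matrix
# named after -db would be read from TransFac.txt.
# The ID is single-quoted so the shell does not expand $MYOD_01 as a variable.
tffind_dry hmr1 imd.txt -n GATA-1 -s '<0,50>' -db TransFac.txt -i 'V$MYOD_01'
```

		The printed line shows the arguments as tffind would receive them, with
		the quotes removed by the shell and the '<', '>' and '$' characters
		intact.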

	-min MIN_SEQS
		Specify the minimum number of sequences in the alignment that must
		match the pattern for a hit to be recorded.  This number must be between
		1 and the number of sequences in the alignment.

	-cut CUTOFF
		Specify the cutoff value used to calculate the threshold score that a
		motif must be greater than or equal to if it is to be deemed a hit.  The
		formula used is:

		Threshold = (cutoff) * MaxPossibleScore + (1 - cutoff) * MinPossibleScore

		This equation was taken from the MOTIF program (http://motif.genome.ad.jp).
		The default value for the cutoff is 0.85, the same value used by MOTIF.
		The higher the cutoff value, the more stringent the search requirement
		(and consequently, the fewer matches that will be reported.) Its value
		must be between 0 and 1.  Note that the position of this option on the
		command line matters: it applies only to the TFBSs specified after it.
		For example, the queries 

		"tffind hmr1 imd.txt -cut .5 -n gata-1 -s '<0,50>' -n gata-1" and
		"tffind hmr1 imd.txt -n gata-1 -s '<0,50>' -cut .5 -n gata-1" 

		are different from each other.  The former will have a value of .5 for
		both GATA TFBSs, whereas the latter will have a (default) value of .85 for
		the first one, and a value of .5 for the second one.
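
		To make the threshold formula concrete, here is a small worked example.
		The "threshold" function is a hypothetical helper, and the score bounds
		(10 and -5) are made-up numbers rather than values from any real matrix.

```shell
# Hypothetical helper: apply the threshold formula given above.
# Arguments: cutoff, maximum possible score, minimum possible score.
threshold() {
    LC_ALL=C awk -v c="$1" -v max="$2" -v min="$3" \
        'BEGIN { printf "%.2f\n", c * max + (1 - c) * min }'
}

# Made-up score bounds, for illustration only.
threshold 0.85 10 -5    # default cutoff: prints 7.75
threshold 0.50 10 -5    # looser cutoff:  prints 2.50
```

		With these numbers, lowering the cutoff from .85 to .5 lowers the score
		a motif must reach from 7.75 to 2.50, which is why a lower cutoff
		reports more matches.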

	-range FROM TO
		Specify the range in the alignment to search for the pattern.  The 
		beginning of the range is followed by a space and the end of the range.
		This range is relative to the top sequence in the alignment.  The values
		of the range must be between 1 and the length of the topmost sequence in
		the alignment.

	-lookin SEQS_TO_LOOK_IN
		Designate which sequences in the alignment to look for the pattern.
		The topmost sequence is sequence 0, the second topmost 1, etc.  Each 
		sequence is entered one at a time, space separated (e.g. "-lookin 0 2 3").

	-exons g|c|e WHICH_SEQ EXONS_FILENAME
		Filter out hits that occur in regions specified in a file in exons format.
		The first argument is the type of region that is filtered out: g filters
		out entire genes (in an exons file, the line ">gene1 100 500"), c
		filters out the coding region of the gene (the line "+ 200 400"), and
		e filters out exon regions (the lines "120 230", etc.).  See Tffind 
		documentation for more information on exons files.  The second argument
		specifies which sequence in the alignment the positions are relative to.
		The third argument gives the name of the exons file, and may need to be
		enclosed in single quotes in certain cases.  The option "-exons c 0 ex1"
		would filter out all hits that occur in the coding regions of the genes
		found in file 'ex1', relative to sequence 0.
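
		The sketch below shows which lines each selector refers to.  It uses a
		tiny file assembled only from the three example lines quoted above; the
		line order is an assumption, and the authoritative description of the
		exons format is in the Tffind documentation.  "regions" is a hypothetical
		helper, not part of tffind.

```shell
# Write a tiny exons file from the example lines quoted above.
# Assumption: the lines appear in this order; see the manual for the real format.
exons_file=$(mktemp)
cat > "$exons_file" <<'EOF'
>gene1 100 500
+ 200 400
120 230
EOF

# Hypothetical helper: print the start and end of the regions that the
# selector g, c, or e refers to.
regions() {
    awk -v sel="$1" '
        sel == "g" && /^>/      { print $2, $3 }
        sel == "c" && $1 == "+" { print $2, $3 }
        sel == "e" && /^[0-9]/  { print $1, $2 }
    ' "$2"
}

regions g "$exons_file"    # gene line
regions c "$exons_file"    # coding-region line
regions e "$exons_file"    # exon lines
rm -f "$exons_file"
```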

	-reltop
		Report the positions of all hits relative to the topmost sequence in the
		alignment, as opposed to being relative to the sequence they occur in.

	-lc		*** NEW FOR VERSION 1.1 ***
		Treat lowercase nucleotides as if they were uppercase.  The default setting masks
		out all lowercase nucleotides, treating them as if they were gaps. 

	-gff		*** NEW FOR VERSION 1.1 ***
		Change output format to gff format.  Documentation for this format is available
		at the Sanger Institute Webpage: http://www.sanger.ac.uk/Software/formats/GFF/
		The default format is Tffind format, as described in the Tffind manual.


A sample query:

To search in the alignment file "hmr1" for a GATA-1 binding site
followed by an NF-kB binding site 0 to 200 bps away, with a minimum of 2 sequences
in the alignment matching, enter the following at the command-line:

tffind hmr1 imd.txt -n gata-1 -s '<0,200>' -n NF-kB -min 2


More information on all of these options and a more in-depth discussion can be found
in the Tffind documentation "tffind_manual.txt".