INFO:
=====

Tests were initially performed for two versions of phylogen, depending on
whether or not a flexible anchor was used. Data and the associated
command/results files were grouped accordingly in two directories, N.data
(no flexible anchor) and A.data (the number of active seqs in the column was
used as the flexible anchor). Later, the number of active seqs proved to be
constant for our particular alignments, so these data became redundant.
Hence, only N-type tests were used, for simplification.

TYPES OF FILES in this directory:
================================

*_reference files
    Reference set(s) containing the locations of the landmarks.

.c files
    C-source code.

README
    Summary of files, software/procedures.

TYPES OF FILES in the N.data and A.data directories:
====================================================

DEVIATION files (or "run"-files)
    Contain the raw deviation data (fp, fn, fp+fn) with respect to the
    reference (functional) sites, obtained for runs of phylogen when
    varying the anchor. The parameter l stays the same for a given file.

interval files
    Derived from DEVIATION files; contain the intervals of invariance for
    the anchor value.

*bat* files
    Command files for the tests.

make.best
    Command file for producing the point data for the blocks in
    Figs 4A,B,C of the "Comparison of five methods..." paper and the.
    Initiates phylogen runs for the best sets of the (a,l) parameters.

PROGRAMS:
=========

button.c
    * Initiates phylogen test runs and, for each run, computes the counts
      of false positives and false negatives with respect to a specified
      set of landmarks (). The minimum length of the region is fixed
      (specified as [S=]) while a is varied over [-10,2] during the
      procedure. By default, phylogen with no flexible anchor is used.
      When compiled with the -DANCHOR flag, the number of active
      sequences becomes the value for the flexible anchor.

interval.c
    * Determines intervals of invariance for the anchor value, given a
      phylogen DEVIATION file; produces the 'interval files'.
min.c
    * Determines the best fp+fn count for a DEVIATION file (l = constant).
      Used in analysis for individual regions.
      *** source code in ../tools ***

min_interval.c
    * Determines the best fp+fn count for an 'interval file' (ie, a file
      like those produced by the 'interval' command). Used in analysis
      for combined regions.
      *** source code in ../tools ***

NOTE: min and min_interval perform the same operation on files with
different formats: min operates on DEVIATION ("run") files, while
min_interval operates on interval files.

merge.c
    * Merges two interval files of anchor values, to produce an interval
      file with a finer interleaving. Used in combined region assessments.

crop.c
    * 'Crops' a DEVIATION file to the runs which conform to the specified
      true positives percentage cut. The last argument on the command
      line is the total length of the functional regions.
      *** source code in ../tools ***

crops_total.c
    * Finds the a-intervals with the best "total count" among the set of
      runs conforming to the specified true positives percentage cut.
      Starts from a DEVIATION file, from which it produces an
      intermediate interval file. Used in assessment of individual regions.

crops_fp.c
    * Finds the a-intervals with the best "fp count" among the set of
      runs conforming to the specified true positives percentage cut.
      Starts from a DEVIATION file, from which it produces an
      intermediate interval file.

ctotal.c
    * The counterpart of crops_total.c in combined region evaluations.
      Finds the a-intervals with the best "total count" among the set of
      runs conforming to the specified true positives percentage cut.
      Starts directly from an interval file.

cfp.c
    * The counterpart of crops_fp.c in combined region evaluations.
      Finds the a-intervals with the best "fp count" among the set of
      runs conforming to the specified true positives percentage cut.
      Starts directly from an interval file.
COMMAND FILES:
==============

/* A : ---------- PER REGION ASSESSMENTS ----------- */

./hs2bat
    * Command file to produce data files (hs2_DEVIATION.N.l* or
      hs2_DEVIATION.A.l*) for the hs2 region.

./hs3bat
    * Same for hs3.

./betabat
    * Same for the HBB promoter.

./araCe_bat
    * Same for the araBAD-araC intergenic region.

./N.data/bat_min
./A.data/bat_min
    * Command files to extract the best fp+fn counts in each DEVIATION
      file, when l varies.

./N.data/bat_crops
./A.data/bat_crops
    * Command files to produce the best a-intervals (for each l) for a
      given region, including only the runs that conform to a given true
      positives percentage cut. When hs2 is analyzed, for example, output
      is saved as hs2.best_total.cut60 and hs2.best_total.cut80
      (hs2.best_fp.cut60 and hs2.best_fp.cut80 for "best fp count"
      evaluations).

/* B : --------- COMBINED REGION ASSESSMENTS ----------- */

./N.data/bat_merge
./A.data/bat_merge
    * Command files to produce the interval files for the overall hs2,
      hs3 and HBB promoter regions. Output is saved in the
      hs23b_DEVIATION.l* files.

./N.data/bat_merge.cut
./A.data/bat_merge.cut
    * Command files to produce the interval files for the overall hs2,
      hs3 and HBB promoter regions, for the 60% cut (that is, for each
      region only consider runs with < 40% fn count). Can be modified for
      use with any value of the percentage cut (eg, 80).

./N.data/bat_crops_combined
./A.data/bat_crops_combined
    * Counterpart of bat_crops for combined region analyses. Produces the
      best a-intervals (for each l) for the combined hs2-hs3-HBB promoter
      (hs23b) region, conforming to a specified true positives 'cut'.

./N.data/bat_min_combined
./A.data/bat_min_combined
    * Counterpart of bat_min for combined region assessments. Operates on
      interval files (eg, hs23b_DEVIATION.N.l5).