INFO:
=====
Tests are performed for two versions of agree, strict (gap free, X) and
general (gaps allowed, G), with the data and the associated command/results
files grouped accordingly in two directories, X.data and G.data.


TYPES OF FILES in this directory:
================================
*_reference files
    Reference set(s) containing the locations of the landmarks.
.c files
    C-source code.
README
    Summary of files, software/procedures.


TYPES OF FILES in the X.data and G.data directories:
====================================================
DEVIATION files
    (or "run"-files) Contain the raw deviation data (fp, fn, fp+fn) with
    respect to the reference (functional) sites, obtained for runs of agree
    when varying the percent identity. The parameter l stays the same for
    a given file.
interval files
    Are derived from DEVIATION files, and contain the intervals of
    invariance for the percent value (p-intervals).
*bat* file
    Command files for the tests.
make.best
    Command file(s) for producing the point data for the blocks in
    Figs 4A,B,C of the "Comparison of five methods..." paper. Initiates
    agree runs for the best sets of the (p,l) parameters.


PROGRAMS:
=========
button.c *
    Initiates agree test runs and, for each run, computes the counts of
    false positives and false negatives with respect to a specified set of
    landmarks. The minimum length of the region is fixed (specified as
    [S=]) while p is varied over [10,100] in increments of 1% during the
    procedure.
    If compiled without a flag, it triggers 'strict' (gap free) agree runs;
    this form is saved as buttonx.
    If compiled with the flag DGAP, it generates 'general' agree runs (gaps
    allowed in regions); this form is saved as buttong.
interval.c *
    Determines intervals of invariance for the percent identity (p), and
    produces a p-interval file from a given agree DEVIATION file.
min.c *
    Determines the best fp+fn count for a DEVIATION file (l = constant).
    Used in analysis for individual regions.
    *** source code in ../tools ***
min_intervalp.c
    Determines the best fp+fn count for a p-interval file (ie, for a file
    such as those produced by the 'interval' command). Used in analysis
    for combined regions. Executable is min_interval.
    *** source code in ../tools ***

    NOTE: min and min_interval perform the same operation on files with
    different formats. min operates on DEVIATION ("run") files, while
    min_interval operates on interval files.
merge.c *
    Merges two interval files of percent identity values, to produce a
    p-interval file with a finer interleaving. Used in combined region
    assessments.
crop.c *
    'Crops' a DEVIATION file to the runs which conform to the specified
    true positives percentage cut. The last argument on the command line
    is the total length of the functional regions.
    *** source code in ../tools ***
crops_total.c *
    Finds the p-intervals with the best "total count" among the set of
    runs conforming to the specified true positives percentage cut. Starts
    from a DEVIATION file, from which it produces an intermediate
    p-interval file. Used in assessment of individual regions.
crops_fp.c *
    Finds the p-intervals with the best "fp count" among the set of runs
    conforming to the specified true positives percentage cut. Starts from
    a DEVIATION file, from which it produces an intermediate p-interval
    file.
ctotal.c *
    The counterpart of crops_total.c in combined region evaluations.
    Finds the p-intervals with the best "total count" among the set of
    runs conforming to the specified true positives percentage cut. Starts
    directly from an interval file.
cfp.c *
    The counterpart of crops_fp.c in combined region evaluations. Finds
    the p-intervals with the best "fp count" among the set of runs
    conforming to the specified true positives percentage cut. Starts
    directly from an interval file.
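
ILLUSTRATIVE SKETCH (not part of the distributed sources):
    The idea behind interval and min_interval can be pictured roughly as
    below, assuming that an "interval of invariance" is a maximal range of
    consecutive p values whose runs gave identical fp and fn counts. The
    Run and PInterval structs and the function names p_intervals and
    best_total are hypothetical, introduced only for this example; the
    actual file layouts and tie-breaking rules of the tools are not
    described in this README.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory form of one DEVIATION record: the counts
   obtained for a single agree run at percent identity p. */
typedef struct {
    int p;   /* percent identity used for the run */
    int fp;  /* false positives */
    int fn;  /* false negatives */
} Run;

/* Hypothetical p-interval: a maximal range [p_lo, p_hi] of consecutive
   p values over which the (fp, fn) counts do not change. */
typedef struct {
    int p_lo, p_hi;
    int fp, fn;
} PInterval;

/* Collapse runs (assumed sorted by increasing p, in steps of 1%) into
   p-intervals. Writes at most cap intervals to out and returns the
   number written. */
size_t p_intervals(const Run *runs, size_t n, PInterval *out, size_t cap)
{
    size_t k = 0;
    assert(runs != NULL || n == 0);
    assert(out != NULL || cap == 0);
    for (size_t i = 0; i < n; i++) {
        if (k > 0 && out[k - 1].fp == runs[i].fp
                  && out[k - 1].fn == runs[i].fn
                  && out[k - 1].p_hi + 1 == runs[i].p) {
            out[k - 1].p_hi = runs[i].p;   /* same counts: extend interval */
        } else {
            if (k == cap)
                break;                      /* output buffer is full */
            out[k] = (PInterval){ runs[i].p, runs[i].p,
                                  runs[i].fp, runs[i].fn };
            k++;
        }
    }
    return k;
}

/* Index of the interval with the smallest fp+fn (the first one on ties),
   or -1 if there are no intervals. */
long best_total(const PInterval *iv, size_t n)
{
    long best = -1;
    for (size_t i = 0; i < n; i++)
        if (best < 0 || iv[i].fp + iv[i].fn < iv[best].fp + iv[best].fn)
            best = (long)i;
    return best;
}
```

    Under these assumptions, runs at p = 10..14 with counts (5,0), (5,0),
    (4,1), (4,1), (4,1) would collapse to the two p-intervals [10,11] and
    [12,14].
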
COMMAND FILES:
==============

/* A : ---------- PER REGION ASSESSMENTS ----------- */

./hs2bat *
    Command file to produce data files (hs2_DEVIATION.X.l* or
    hs2_DEVIATION.G.l*) for the hs2 region.
./hs3bat *
    Same for hs3.
./betabat *
    Same for the HBB promoter.
./araCe_bat *
    Same for the araBAD-araC intergenic region.
./X.data/bat_min
./G.data/bat_min *
    Command files to extract the best fn+fp counts in each DEVIATION file,
    when l varies.
./X.data/bat_crops
./G.data/bat_crops *
    Command file to produce best p-intervals (for each l) for a given
    region, provided that only the runs conforming to a given true
    positives percentage cut are included. Output saved as
    hs2.best_total.cut60, hs2.best_total.cut80 (hs2.best_fp.cut60,
    hs2.best_fp.cut80 for "best fp count" evaluations), eg, when hs2 is
    analyzed.

/* B : --------- COMBINED REGION ASSESSMENTS ----------- */

./X.data/bat_merge
./G.data/bat_merge *
    Command file to produce the interval files for the overall hs2, hs3
    and HBB promoter regions. Output is saved in the
    hs23b_DEVIATION.X.agree.l* and hs23b_DEVIATION.G.agree.l* files.
./N.data/bat_merge.cut
./A.data/bat_merge.cut *
    Command file to produce the interval files for the overall hs2, hs3
    and HBB promoter regions, for the 60% cut (that is, for each region
    only consider runs with < 40% fn count). Can be modified for use with
    any value of the percentage cut (eg, 80).
./N.data/bat_crops_combined
./A.data/bat_crops_combined *
    Counterpart of bat_crops, for combined region analyses. Produces the
    best p-intervals (for each l) for the combined hs2-hs3-HBB promoter
    (hs23b) region, conforming to a specified true positives 'cut'.
./N.data/bat_min_combined
./A.data/bat_min_combined *
    Counterpart of bat_min for combined region assessments. Operates on
    interval files (eg, hs23b_DEVIATION.X.agree.l5).
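
ILLUSTRATIVE SKETCH (not part of the distributed sources):
    The true positives percentage cut used by the crop-style command files
    can be read as the test below. It assumes that fn is counted in the
    same units as the total length of the functional regions, and that the
    comparison is strict, as in "< 40% fn count" for the 60% cut. The
    function names passes_cut and crop_fn are hypothetical; the actual
    tools may apply the cut differently.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if a run with fn false negatives passes the cut, that is, if
   fn is strictly less than (100 - cut_percent)% of total_len.
   Assumption: fn and total_len are expressed in the same units. */
int passes_cut(long fn, long total_len, int cut_percent)
{
    assert(total_len > 0 && fn >= 0 && fn <= total_len);
    assert(cut_percent >= 0 && cut_percent <= 100);
    /* fn / total_len < (100 - cut) / 100, kept in integer arithmetic */
    return 100 * fn < (long)(100 - cut_percent) * total_len;
}

/* Keep only the fn counts that pass the cut, compacting the array in
   place. Returns the number of entries kept. */
size_t crop_fn(long *fn, size_t n, long total_len, int cut_percent)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (passes_cut(fn[i], total_len, cut_percent))
            fn[kept++] = fn[i];
    return kept;
}
```

    With a total functional length of 100 and the 60% cut, a run with
    fn = 39 is kept and a run with fn = 40 is dropped.
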