Glop is a powerful query engine based on the Prolog programming language. It is intended primarily for users who already know Prolog. If you do not, you will probably find it easier to use Genie instead.
Glop uses the same data as Genie, but in a different format. If you have not familiarized yourself with the records, sets, and lists used by this database, please refer to Understanding the Schema. To look up the exact names of tables and fields currently in the database, see the current schema.
Each table is stored as a single predicate, with the set of tuples as its only argument. For example, the binding assay table could be abbreviated
binding_assay({[Tuple1|RemainingTuples]}).
The table name is written as a functor, and the set of tuples is stored as a Prolog list inside the curly braces ("{ }") commonly used to denote sets. Each tuple in the table has the form
(Reference, Item_No, Description, Corrections, Contributor, Assay)
Query 1: To examine each tuple in the binding assay table, we could use the predicates member and findall like this:
/* 1 */  :-
/* 2 */     binding_assay({BindingAssaySet}),
/* 3 */     findall(
/* 4 */         BindingAssayTuple,
/* 5 */         member(BindingAssayTuple, BindingAssaySet),
/* 6 */         Results
/* 7 */     ),
/* 8 */     report(binding_assay({Results})).
The line numbers are listed for reference only and do not need to be included in the query. Line 5 unifies the variable BindingAssayTuple with one of the tuples in BindingAssaySet. After examining a tuple, findall uses backtracking to look at the next tuple in the table.
Glop identifies fields by both their name and their location within the tuple, as specified in the current schema. For example, Query 2 finds all of the articles published in 1996:
/* 1 */  :- reference_info({RefSet}),
/* 2 */     findall(
/* 3 */         Reference,
/* 4 */         (
/* 5 */             member(Reference, RefSet),
/* 6 */             Reference = (_, _, _, _, _, year(1996), _, _, _)
/* 7 */         ),
/* 8 */         Results
/* 9 */     ),
/* 10 */    report(reference_info({Results})).
In line 6, the field name, year, is used as the functor of a structure containing the desired data value. Because year is the sixth of nine fields shown in the schema for the reference_info table, we make the year structure the sixth of nine arguments of an anonymous structure representing a reference_info tuple.
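The positional matching described above can be tried without the database. The following is a minimal sketch, assuming a hypothetical three-field table toy_reference (id, title, year) with invented data; it is not the real reference_info schema, which has nine fields.

```prolog
:- use_module(library(lists)).   % member/2

% Hypothetical table with three fields per tuple: (id, title, year).
toy_reference({[ (id(r1), title('Alpha'), year(1995)),
                 (id(r2), title('Beta'),  year(1996)),
                 (id(r3), title('Gamma'), year(1996)) ]}).

% Ids is the list of ids of all tuples whose third field is year(1996).
ids_from_1996(Ids) :-
    toy_reference({RefSet}),
    findall(Id,
            member((id(Id), _, year(1996)), RefSet),
            Ids).
```

Calling ids_from_1996(Ids) binds Ids to [r2, r3]. The field name selects the functor, and the position of the structure inside the parenthesized tuple selects the field.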
Like Genie, Glop requires any string that contains spaces, carriage returns, or punctuation to be enclosed in single quotes, 'like this'. Any string that begins with a capital letter must also be enclosed in single quotes, since Prolog treats any unquoted word starting with a capital letter as a variable. Glop is case sensitive, so 'A BIG Word' is not the same as 'a big word'. Numbers should not be in quotes.
A record is simply a collection of subfields that are grouped together because they have some logical association with each other. Many records are defined as types in the Type Definition Macros section of the current schema. For example, each region is defined as a REGION_DESCRIPTOR, which is defined in the schema as follows:
TYPEDEF REGION_DESCRIPTOR:
    RECORD (
        origin: STRING #FOREIGN_KEY origin_info
        start: INTEGER
        stop: INTEGER
    ) #RESTRICT stop >= start
Recall that anything following a '#' is a comment and is ignored by the computer. If we want to use the variables St and Sp to retrieve the start and stop positions of a region with origin 'humhbb' in a conserved tuple (to retrieve the origin as well, replace 'humhbb' with a variable such as Or), our query would include the line:
Tuple = (_, _, _, _, _, region(origin('humhbb'), start(St), stop(Sp)), _),
Notice that region, the name of the field in the conserved table, becomes the functor of a structure representing the record. The type of the record, REGION_DESCRIPTOR, is not mentioned in the query.
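To see how a record pattern binds its subfields, here is a small sketch. It assumes a hypothetical two-field table toy_conserved (id, region) with invented origins and offsets; the real conserved table has more fields.

```prolog
:- use_module(library(lists)).   % member/2

% Hypothetical table: each tuple is (id, region record).
toy_conserved({[ (id(c1), region(origin(humhbb), start(100), stop(250))),
                 (id(c2), region(origin(mushbb), start(40),  stop(90))),
                 (id(c3), region(origin(humhbb), start(300), stop(420))) ]}).

% Spans is a list of Start-Stop pairs, one per region whose origin is Origin.
region_spans(Origin, Spans) :-
    toy_conserved({Set}),
    findall(St-Sp,
            member((_, region(origin(Origin), start(St), stop(Sp))), Set),
            Spans).
```

With this data, region_spans(humhbb, Spans) binds Spans to [100-250, 300-420]. A constant in a subfield acts as a filter, while a variable in a subfield retrieves the stored value.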
In Glop, a set is represented as a Prolog list enclosed in curly braces, and a list is represented as a Prolog list enclosed in parentheses:
An example of a set: {[ASetElement | RemainingSetElements]}
An example of a list: ([ListItem1 | RemainingListItems])
To access the elements in the set (or list), instantiate a variable to the contents of the curly braces (or parentheses), and then use that variable anywhere a Prolog list could be used.
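The unwrapping step can be sketched as follows, assuming two hypothetical facts, toy_set and toy_list, that hold a Glop set and a Glop list. In standard Prolog, parentheses around a single term do not create a wrapper, so the parenthesized form unifies directly with the list.

```prolog
% Hypothetical facts: one Glop set and one Glop list.
toy_set({[a, b, c]}).
toy_list(([x, y, z])).

% Unwrap each container, then treat the contents as ordinary Prolog lists.
container_sizes(SetSize, ListSize) :-
    toy_set({SetElements}),
    toy_list((ListElements)),
    length(SetElements, SetSize),
    length(ListElements, ListSize).
```

Here container_sizes(S, L) binds both S and L to 3.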
Query 3: For example, suppose we want to find all DNA transfer experiments with a reversed orientation on any construct segment. We could write the query like this:
/* 1 */  :-
/* 2 */     dna_transfer_experiment({DNASet}),
/* 3 */     findall(
/* 4 */         DTExperiment,
/* 5 */         (
/* 6 */             member(DTExperiment, DNASet),
/* 7 */             DTExperiment = (_, _, _, _, _, construct({ConstructSet}), _),
/* 8 */             member((construct_segment((CSegmentList)), _), ConstructSet),
/* 9 */             member(Segment, CSegmentList),
/* 10 */            Segment = (segment(_, orientation('reversed'), _))
/* 11 */        ),
/* 12 */        Results
/* 13 */    ),
/* 14 */    report(dna_transfer_experiment({Results})).
The line numbers are provided for reference purposes only and do not need to be included in the actual query. As usual, the predicates findall and member do most of the work in this query. Lines 2 and 6 work together to select a tuple from the set of tuples in the dna_transfer_experiment table. Line 7 retrieves the set of constructs. Each construct in the ConstructSet is a record with two fields, construct_segment and parent. We use the structure representing this record as the first argument of the member predicate in line 8 to instantiate CSegmentList to the list of construct segments. Line 9 selects one segment from this list, and line 10 discards the segment if its orientation is not 'reversed'. Finally, we use the report predicate to display the results.
When referring to a variant in Glop, we use a plus symbol and the variant tag as a functor to a structure containing the fields of the record.
Query 4 finds all binding assay tuples with gelshift assays where the probe overlaps the region corresponding to 8700-8800 in the human sequence:
/* 1 */  :-
/* 2 */     binding_assay({BindingAssaySet}),
/* 3 */     findall(
/* 4 */         BindingAssay,
/* 5 */         (
/* 6 */             member(BindingAssay, BindingAssaySet),
/* 7 */             BindingAssay = (_,_,_,_,_,assay(+Assay)),
/* 8 */             Assay = gelshift(probe(+beta_g_region(Region, _)), _, _, _, _, _, _),
/* 9 */             overlaps(Region, region(origin(human), start(8700), stop(8800)))
/* 10 */        ),
/* 11 */        BindingAssayList
/* 12 */    ),
/* 13 */    report(binding_assay({BindingAssayList})).
This query involves two variant records, assay in lines 7 and 8, and probe in line 8. Notice that, in line 7, the variable Assay is instantiated to the tag name and variant record (gelshift(probe(+beta_g_region(Region, _)), _, _, _, _, _, _)), without the plus sign, so the plus sign is not needed in line 8. The predicates member, overlaps, and report are described below.
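The plus-sign convention can be tried on a toy example. This sketch assumes a hypothetical two-field table toy_binding_assay (id, assay); the variant tags come from the text, but their arguments are invented and much shorter than the real records.

```prolog
:- use_module(library(lists)).   % member/2

% Hypothetical table: each tuple is (id, assay variant).
toy_binding_assay({[ (id(b1), assay(+gelshift(probe(p1)))),
                     (id(b2), assay(+in_vivo_footprint(cell(k562)))),
                     (id(b3), assay(+gelshift(probe(p2)))) ]}).

% Ids is the list of ids of all tuples whose assay variant has the tag Tag.
ids_with_tag(Tag, Ids) :-
    toy_binding_assay({Set}),
    findall(Id,
            ( member((id(Id), assay(+Assay)), Set),   % strip the plus sign
              functor(Assay, Tag, _) ),               % compare the variant tag
            Ids).
```

With this data, ids_with_tag(gelshift, Ids) binds Ids to [b1, b3]. Matching on assay(+Assay) removes the plus sign once, so Assay can then be inspected like any other structure.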
When writing Glop queries, you can use any Prolog predicates, Glop predicates, or predicates you have written. This section describes the predicates that are used frequently in our sample queries.
The member(Element, List) predicate succeeds if the Element is in the List. The member predicate is defined in the SICStus Prolog "lists" library. If you are using a different version of Prolog, defining your own member predicate should be straightforward.
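For a Prolog system without a lists library, one common way to write such a predicate is shown below. The name my_member is a hypothetical choice, used here only to avoid clashing with a library member predicate.

```prolog
% my_member(Element, List): succeeds once for each occurrence of Element in List.
my_member(X, [X|_]).
my_member(X, [_|Tail]) :-
    my_member(X, Tail).
```

On backtracking, my_member(X, [a, b, c]) binds X to a, then b, then c, which is the behavior the sample queries rely on.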
The predicate findall(Object, Goal, List) creates a List of all Objects that satisfy the specified Goal. If Object is a variable, it must be used in the Goal. If Object is a structure that contains variables, then at least one of those variables must be used in the Goal. The findall predicate is a standard part of Prolog.
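The case where Object is a structure containing variables can be sketched as follows. The predicate name long_regions is hypothetical, and the length calculation assumes inclusive coordinates, which the text does not state.

```prolog
:- use_module(library(lists)).   % member/2

% Results is a list of Origin-Length pairs, one for each region in Regions
% whose length (assuming inclusive coordinates) is at least MinLength.
long_regions(Regions, MinLength, Results) :-
    findall(Origin-Length,
            ( member(region(origin(Origin), start(St), stop(Sp)), Regions),
              Length is Sp - St + 1,
              Length >= MinLength ),
            Results).
```

Both variables in the template Origin-Length are bound by the goal. If no region qualifies, findall succeeds with an empty list rather than failing.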
The findall predicate is often used to examine each tuple in a table, ignoring tuples that do not meet certain criteria and adding tuples that do satisfy the criteria to the result set. When findall is used this way, Object is a variable representing a single tuple, Goal expresses the criteria, and List is a variable representing the set of results.
To select one tuple from the set of possibilities, Goal starts with the member predicate to instantiate Object to a specific tuple. The criteria that test the tuple follow member, as shown.
binding_assay({BindingAssaySet}),
findall(
    Object,
    (
        member(Object, BindingAssaySet),
        true    % replace true with the criteria that test Object
    ),
    List
),
The following predicates allow us to examine the data that is specific to this database. These predicates are implemented as part of Glop.
The predicate overlaps(TestRegion, SpecifiedRegion) succeeds when the two regions have the same origin and the offsets overlap. The overlaps predicate works the same way as the OVERLAPS function in Genie.
overlaps(Region, region(origin(human),start(8658),stop(8677)))
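The test that overlaps performs can be approximated in plain Prolog. The following is only a sketch, not Glop's implementation: the name toy_overlaps is hypothetical, and it assumes inclusive integer coordinates with start no greater than stop in both regions.

```prolog
% toy_overlaps(RegionA, RegionB): both regions share an origin and their
% inclusive intervals have at least one position in common.
toy_overlaps(region(origin(O), start(S1), stop(E1)),
             region(origin(O), start(S2), stop(E2))) :-
    S1 =< E2,
    S2 =< E1.
```

Using the same variable O in both arguments enforces the same-origin requirement through unification. Whether Glop treats regions that merely touch at an endpoint as overlapping is not stated in the text; this sketch does.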
The predicate contains_region(ContainerRegion, ContainedRegion) succeeds if the ContainerRegion completely contains the ContainedRegion. It is similar to the COVERS function in Genie.
contains_region(region(origin(human),start(7750),stop(9230)),Region)
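Containment can be approximated in the same spirit. Again this is a sketch under assumptions, not Glop's implementation: the name toy_contains_region is hypothetical, coordinates are taken to be inclusive integers, and both regions are required to share an origin.

```prolog
% toy_contains_region(Container, Contained): same origin, and the contained
% interval lies entirely within the container interval.
toy_contains_region(region(origin(O), start(CS), stop(CE)),
                    region(origin(O), start(S), stop(E))) :-
    CS =< S,
    E =< CE.
```

With the HS2 and AP1 coordinates quoted in this document, toy_contains_region(region(origin(human),start(7750),stop(9230)), region(origin(human),start(8658),stop(8677))) succeeds, and swapping the two arguments fails.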
report(Table) displays the results of your query in the same format that Genie uses. If you would like to use any other programs on the Globin Gene Server to process your results, the results need to be in this format. If you just want to see the results yourself, you can use another predicate to display them.
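As one possible alternative to report, a short recursive predicate can print each tuple on its own line. The name print_tuples is hypothetical, and the output is plain Prolog term syntax rather than the Genie format.

```prolog
% print_tuples(Tuples): write each tuple of a result list on its own line,
% quoting atoms where needed so the output can be read back by Prolog.
print_tuples([]).
print_tuples([Tuple|Rest]) :-
    format('~q~n', [Tuple]),
    print_tuples(Rest).
```

In a query, print_tuples(Results) would take the place of the final report goal, with Results being the list produced by findall.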
When we create a query using findall, we get a set of tuples as an intermediate result. To convert this set to a table format, we create a structure whose functor is the table name and whose only argument is the variable containing the set of tuples (BindingAssayList) enclosed in curly braces representing a Glop set, as shown in the last line of Query 4:
/* 1 */  :-
/* 2 */     binding_assay({BindingAssaySet}),
/* 3 */     findall(
/* 4 */         BindingAssay,
/* 5 */         (
/* 6 */             member(BindingAssay, BindingAssaySet),
/* 7 */             BindingAssay = (_,_,_,_,_,assay(+Assay)),
/* 8 */             Assay = gelshift(probe(+beta_g_region(Region, _)), _, _, _, _, _, _),
/* 9 */             overlaps(Region, region(origin(human), start(8700), stop(8800)))
/* 10 */        ),
/* 11 */        BindingAssayList
/* 12 */    ),
/* 13 */    report(binding_assay({BindingAssayList})).
To write a Glop query involving one table, you can follow the outline below. To illustrate, each step is followed by the numbers of the lines that accomplish that step in Query 5.
When writing complex queries, it is often helpful to write predicates expressing the criteria. The following query adds on to Query 4.
Query 5: Retrieve all binding assay tuples where the probe (for gelshifts) or one of the regional effects (for non-gelshifts) overlaps the region corresponding to 8700-8800 in the human sequence.
With this query, we check the probe or regional effect, depending on the assay. Assay is a variant record, so different attributes can be stored for each type of assay. Notice that regional_effect is the third attribute of in_vivo_footprint assays, but the seventh attribute of in_vitro_footprint and methylation_interference assays.
/* 1 */  q(gelshift(probe(+beta_g_region(Region,_)),_,_,_,_,_,_)) :-
/* 2 */     overlaps(Region,region(origin(human),start(8700),stop(8800))).
/* 3 */  q(in_vitro_footprint(_,_,_,_,_,_,regional_effect({RegionalEffectSet}),_)) :-
/* 4 */     member((Region,_,_,_,_),RegionalEffectSet),
/* 5 */     overlaps(Region,region(origin(human),start(8700),stop(8800))),
/* 6 */     !.
/* 7 */  q(methylation_interference(_,_,_,_,_,_,regional_effect({RegionalEffectSet}),_)) :-
/* 8 */     member((Region,_,_,_,_),RegionalEffectSet),
/* 9 */     overlaps(Region,region(origin(human),start(8700),stop(8800))),
/* 10 */    !.
/* 11 */ q(in_vivo_footprint(_,_,regional_effect({RegionalEffectSet}))) :-
/* 12 */    member((Region,_,_,_,_),RegionalEffectSet),
/* 13 */    overlaps(Region,region(origin(human),start(8700),stop(8800))),
/* 14 */    !.
/* 15 */
/* 16 */ :-
/* 17 */    binding_assay({BindingAssaySet}),
/* 18 */    findall(
/* 19 */        BindingAssay,
/* 20 */        (
/* 21 */            member(BindingAssay, BindingAssaySet),
/* 22 */            BindingAssay = (_,_,_,_,_,assay(+Assay)),
/* 23 */            q(Assay)
/* 24 */        ),
/* 25 */        BindingAssayList
/* 26 */    ),
/* 27 */    report(binding_assay({BindingAssayList})).
More Hints
Query 6: Retrieve all DNA transfer experiment tuples having in one of their constructs a segment with a beta-globin region that overlaps the AP1 region (corresponding to 8658-8677 in the human sequence) and is completely contained in the HS2 region (corresponding to 7750-9230 in the human sequence), while no other segment overlaps the LCR (corresponding to anything < 15,000 in the human sequence). In addition, one of the segments in this construct must feature a particular gene (e.g. beta-globin) as a reporter.
This query follows the same outline as Query 5. It uses four predicates to express the criteria:
q: ConstructSet contains a ConstructSegmentList for which q1 succeeds, q2 succeeds exactly once (so the segment found by q1 is the only one that overlaps the LCR), and q3 succeeds.
q1: ConstructSegmentList contains a segment whose beta-globin region overlaps the AP1 region and is completely contained in the HS2 region.
q2: ConstructSegmentList contains a segment whose beta-globin region overlaps the LCR.
q3: ConstructSegmentList features a beta-globin reporter.
/* 1 */  q(ConstructSet) :-
/* 2 */     member((construct_segment(ConstructSegmentList),_),ConstructSet),
/* 3 */     q1(ConstructSegmentList),
/* 4 */     findall(_,q2(ConstructSegmentList),AnswerList), length(AnswerList,1),
/* 5 */     q3(ConstructSegmentList), !.
/* 6 */
/* 7 */  q1(ConstructSegmentList) :-
/* 8 */     member(segment(dna_fragment(+beta_g_region(Region,_)),_,_),ConstructSegmentList),
/* 9 */     overlaps(Region,region(origin(human),start(8658),stop(8677))),
/* 10 */    contains_region(region(origin(human),start(7750),stop(9230)),Region), !.
/* 11 */
/* 12 */ q2(ConstructSegmentList) :-
/* 13 */    member(segment(dna_fragment(+beta_g_region(Region,_)),_,_),ConstructSegmentList),
/* 14 */    overlaps(Region,region(origin(human),start(-999999),stop(15000))).
/* 15 */
/* 16 */ q3(ConstructSegmentList) :-
/* 17 */    member(segment(_,_,feature({FeatureSet})),ConstructSegmentList),
/* 18 */    member(feature_element(+reporter(gene(beta_globin))),FeatureSet), !.
/* 19 */
/* 20 */ :-
/* 21 */    dna_transfer_experiment({DTESet}),
/* 22 */    findall(
/* 23 */        DTE,
/* 24 */        (
/* 25 */            member(DTE,DTESet),
/* 26 */            DTE = (_,_,_,_,_,construct({ConstructSet}),_),
/* 27 */            q(ConstructSet)
/* 28 */        ),
/* 29 */        DTEList
/* 30 */    ),
/* 31 */    report(dna_transfer_experiment({DTEList})).
Instead of using the report predicate to display results as Genie would, you can define your own predicates to provide the information you want in an easy-to-read format. Query 7 defines five predicates to display a summary of the data.
Counting
Counting is not directly supported by Glop, but you can use Prolog to count tuples or data values. Query 7 uses predicates from SICStus Prolog's lists library to accomplish its counting tasks.
Query 7: For each reference, count the number of tuples and
(a) show the number of DNA transfer experiments and constructs; and
(b) show the number of binding assays and assay types.
This query involves the following steps, each described below.
We use one goal to retrieve each of the tables involved: reference_info, binding_assay, and dna_transfer_experiment (lines 62-64).
To count the tuples, experiments, constructs, and assays for each reference, lines 65-88 of Query 7 create a list AuthorStats, in which each list element contains information for one reference. The elements in AuthorStats have the form:
Author-(Ref-DTEs-BAs)
where
Author is the name of the first author listed for the reference,
Ref is the ID of the reference,
DTEs is a list with one element for each DNA transfer experiment described in the reference, and
BAs is a list with one element for each binding assay described in the reference.
The elements in the lists AuthorStats, DTEs, and BAs are in the form Key-OtherInfo. This allows the query to use the SICStus Prolog predicate keysort to sort the list items by their Keys. In AuthorStats, OtherInfo is the structure (Ref-DTEs-BAs), and in DTEs and BAs, OtherInfo is simply a "0" acting as a placeholder.
In this query, we would like to find all DNA transfer experiments and binding assay experiments for each reference. To accomplish this, we nest the findall predicate. The "outer" findall examines each tuple in the reference_info table. We use the findall predicate two more times within the second argument of the "outer" findall. The first "inner" findall finds all DNA transfer experiments from the current reference, and the second "inner" findall finds all binding assay experiments from the current reference.
The predicate keysort(OldList, SortedList) is a SICStus Prolog built-in predicate. The elements in both lists have the form Key-OtherInfo. The predicate keysort sorts the elements of OldList by their Keys to create the new list SortedList. Elements with equal Keys are all kept and end up next to each other.
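A small example shows why the Key-0 placeholder form is convenient. The wrapper name sorted_pairs is hypothetical; the keys are assay tags taken from this document.

```prolog
% Sort Key-0 pairs so that equal keys become adjacent; duplicates are kept.
sorted_pairs(Sorted) :-
    keysort([gelshift-0, in_vivo_footprint-0, gelshift-0], Sorted).
```

Here sorted_pairs(Sorted) binds Sorted to [gelshift-0, gelshift-0, in_vivo_footprint-0]. Because both gelshift entries survive and sit together, the number of entries with a given key can be counted afterwards.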
Several predicates are defined with this query to process and neatly display the results. Most of these predicates call format(FormatString, ResultList), which is defined in SICStus Prolog.
The first of these predicates, write_report, uses the collect predicate to collect all of the articles by the same author, then calls write_author to display results for this author, and finally calls write_report recursively to repeat the process for the authors remaining in the list.
The predicate collect(+DesiredKey, +PairList, -ValList, -PairTail) takes a specific key, DesiredKey, as its first argument, and a sorted list of Key-Val pairs as its second argument, PairList. The collect predicate creates a ValList from the values Val at the front of PairList that are paired with a Key equal to the DesiredKey, and it returns the remaining portion of the PairList, starting at the first element whose Key is not equal to DesiredKey, as PairTail.
To calculate the number of items with a certain key, we simply need to call the length predicate on the ValList created by collect.
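The counting idiom can be sketched as follows. The definition of collect is copied from Query 7; the helper name count_key is hypothetical.

```prolog
% collect/4 as defined in Query 7: gather the values of the leading pairs
% whose key is Key, and return the rest of the list.
collect(Key, [Key-Val|R1], [Val|R2], R3) :- !,
    collect(Key, R1, R2, R3).
collect(_, List, [], List).

% count_key(Key, SortedPairs, Count, Rest): Count is the number of leading
% pairs in SortedPairs whose key is Key; Rest is the remainder of the list.
count_key(Key, SortedPairs, Count, Rest) :-
    collect(Key, SortedPairs, Vals, Rest),
    length(Vals, Count).
```

For example, count_key(gelshift, [gelshift-0, gelshift-0, in_vivo_footprint-0], N, Rest) binds N to 2 and Rest to [in_vivo_footprint-0]. Since collect only looks at the front of the list, keys have to be requested in the same order that keysort produced them.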
For each author, the predicate write_author(Author, RefStats) displays the number of articles for that author, and then calls write_refs to display information on each article and the experiments described. The write_author predicate uses keysort and length to process the list of references, and format to neatly display the results.
For each article, the predicate write_refs(RefStatList) writes the reference ID and then calls write_DTEs and write_BAs to list the experiments described in the article. Each element of the RefStatList has the form:
Ref-DTEs-BAs
where Ref, DTEs, and BAs have the same meaning as in the elements of AuthorStats. The write_refs predicate uses keysort and length to process the lists of experiments, and format to neatly display the results.
The predicate write_DTEs displays the numbers of stable and transient transfections and the number of liver, blood, and brain tissue samples used in DNA transfer experiments. The write_DTEs predicate uses collect and length to process the list of experiments, and format to neatly display the results.
The predicate write_BAs displays the numbers of each kind of assay in binding assay experiments. The write_BAs predicate uses collect and length to process the list of experiments, and format to neatly display the results.
/* 1 */  write_report([]).
/* 2 */  write_report([Author-(Ref-DTEs-BAs)|AuthorStatRest]) :-
/* 3 */     collect(Author,AuthorStatRest,RefStatRest,NewAuthorStatRest),
/* 4 */     write_author(Author,[Ref-DTEs-BAs|RefStatRest]),
/* 5 */     write_report(NewAuthorStatRest).
/* 6 */
/* 7 */  write_author(Author,RefStats) :-
/* 8 */     length(RefStats,NumRefs),
/* 9 */     format('~2nAuthor: ~w --- Articles: ~d',[Author,NumRefs]),
/* 10 */    format('~n-------------------------------------------------------------',[]),
/* 11 */    keysort(RefStats,SortedRefStats), write_refs(SortedRefStats).
/* 12 */
/* 13 */ write_refs([]).
/* 14 */ write_refs([Ref-DTEs-BAs|RefStatRest]) :-
/* 15 */    length(DTEs,NumDTEs), length(BAs,NumBAs), NumExps is NumDTEs+NumBAs,
/* 16 */    format('~n Article: ~w --- Experiments: ~d',[Ref,NumExps]),
/* 17 */    format('~n -----------------------------------------------------------',[]),
/* 18 */    format('~n DNA Transfer Experiments: ~d',[NumDTEs]),
/* 19 */    format('~n ---------------------------------------------------------',[]),
/* 20 */    keysort(DTEs,SortedDTEs), write_DTEs(SortedDTEs),
/* 21 */    format('~n ---------------------------------------------------------',[]),
/* 22 */    format('~n Binding Assays: ~d',[NumBAs]),
/* 23 */    format('~n ---------------------------------------------------------',[]),
/* 24 */    keysort(BAs,SortedBAs), write_BAs(SortedBAs),
/* 25 */    format('~n -----------------------------------------------------------',[]),
/* 26 */    write_refs(RefStatRest).
/* 27 */
/* 28 */ write_DTEs(DTEs) :-
/* 29 */    collect(assay(stable),DTEs,Stable,DTEs1), length(Stable,NumStable),
/* 30 */    collect(assay(transient),DTEs1,Trans,DTEs2), length(Trans,NumTrans),
/* 31 */    NumTransfections is NumStable+NumTrans,
/* 32 */    format('~n Transfections: ~d',[NumTransfections]),
/* 33 */    format('~n -------------------------------------------------------',[]),
/* 34 */    format('~n Stable: ~d',[NumStable]),
/* 35 */    format('~n Transient: ~d',[NumTrans]),
/* 36 */    format('~n -------------------------------------------------------',[]),
            % The keys must be collected in the order keysort leaves them:
            % tissue(blood), tissue(brain), tissue(liver).
/* 37 */    collect(tissue(blood),DTEs2,Blood,DTEs3), length(Blood,NumBlood),
/* 38 */    collect(tissue(brain),DTEs3,Brain,DTEs4), length(Brain,NumBrain),
/* 39 */    collect(tissue(liver),DTEs4,Liver,[]), length(Liver,NumLiver),
/* 40 */    NumTransGenMice is NumLiver+NumBlood+NumBrain,
/* 41 */    format('~n Transgenic Mice: ~d',[NumTransGenMice]),
/* 42 */    format('~n -------------------------------------------------------',[]),
/* 43 */    format('~n Liver: ~d',[NumLiver]),
/* 44 */    format('~n Blood: ~d',[NumBlood]),
/* 45 */    format('~n Brain: ~d',[NumBrain]).
/* 46 */
/* 47 */ write_BAs(BAs) :-
/* 48 */    collect(gelshift,BAs,GelShift,BAs1), length(GelShift,NumGelShift),
/* 49 */    collect(in_vitro_footprint,BAs1,Vitro,BAs2), length(Vitro,NumVitro),
/* 50 */    collect(in_vivo_footprint,BAs2,Vivo,BAs3), length(Vivo,NumVivo),
/* 51 */    collect(methylation_interference,BAs3,Meth,[]), length(Meth,NumMeth),
/* 52 */    format('~n Gelshifts: ~d', [NumGelShift]),
/* 53 */    format('~n In Vitro Footprint: ~d',[NumVitro]),
/* 54 */    format('~n Methylation Interference: ~d',[NumMeth]),
/* 55 */    format('~n In Vivo Footprint: ~d',[NumVivo]).
/* 56 */ collect(Key,[Key-Val|R1],[Val|R2],R3) :- !,
/* 57 */    collect(Key,R1,R2,R3).
/* 58 */ collect(_,List,[],List).
/* 59 */
/* 61 */ :-
/* 62 */    reference_info({RISet}),
/* 63 */    dna_transfer_experiment({DTESet}),
/* 64 */    binding_assay({BASet}),
/* 65 */    findall(
/* 66 */        Author-(Ref-DTEs-BAs),
/* 67 */        (
/* 68 */            member((id(Ref),author([name(Author)|_]),_,_,_,_,_,_,_),RISet),
                    % member((id(Ref),author({[name(Author)|_]}),_,_,_,_,_,_,_),RISet),
/* 69 */            findall(
/* 70 */                DTE-0,
/* 71 */                (
/* 72 */                    member((reference(Ref),_,_,_,_,_,experiment({ESet})),DTESet),
/* 73 */                    member(type_of_assay(+TOA),ESet),
/* 74 */                    (TOA=transfection(DTE,_,_,_); TOA=transgenic_mouse(_,_,DTE,_))
/* 75 */                ),
/* 76 */                DTEs
/* 77 */            ),
/* 78 */            findall(
/* 79 */                BA-0,
/* 80 */                (
/* 81 */                    member((reference(Ref),_,_,_,_,assay(+Assay)),BASet),
/* 82 */                    functor(Assay,BA,_)
/* 83 */                ),
/* 84 */                BAs
/* 85 */            )
/* 86 */        ),
/* 87 */        AuthorStats
/* 88 */    ),
/* 89 */    keysort(AuthorStats,SortedAuthorStats), write_report(SortedAuthorStats).