Customizing
Customizing EASE requires knowledge of the directory structure of EASE and the
format of the files therein. The majority of files are text files with lines in
the "standard" format:
GeneIdentifier[tab]Information
This schema allows for many-to-many relationships between genes and pieces of
information. In the LocusLink-centric schema created by the automated update
process, a line in these files might look like this:
10 apoptosis
If this line occurred in a file called "GO Biological Process.txt" in the
\Data\Class\ directory, it would map LocusLink number 10 to the gene category
"apoptosis" in the "GO Biological Process" system of classifying genes. If it
occurred in a file called "Gene name.txt" in the \Data\ directory, it would map
LocusLink number 10 to "apoptosis" for the "Gene name" annotation field. Hence,
by knowing the purpose of the various subdirectories of the EASE directory, a
user can create files with database or spreadsheet software to fit the format
expected by EASE and have EASE use the files accordingly.
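The standard format can be read with a few lines of script when preparing or
checking custom files. The following is a minimal Python sketch, not part of
EASE itself; the function name parse_standard_lines and all sample lines other
than the "10 apoptosis" line are invented for illustration.

```python
from collections import defaultdict


def parse_standard_lines(lines):
    """Map each gene identifier to the set of values listed for it.

    Each line is expected to be GeneIdentifier[tab]Information. A gene may
    appear on many lines and a value may appear for many genes, which is
    what gives the many-to-many relationship.
    """
    gene_to_info = defaultdict(set)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # assumption: lines without a tab are skipped
        gene_to_info[fields[0]].add(fields[1])
    return dict(gene_to_info)


# Hypothetical sample data; only the first line comes from the text above.
sample = ["10\tapoptosis", "10\tcell death", "355\tapoptosis"]
mapping = parse_standard_lines(sample)
```

For a real file, the same function can be given an open file object, for
example one opened on "GO Biological Process.txt" in \Data\Class\.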
Below are descriptions of the subfolders in the EASE directory:
\_Inline\
contains system files. Nothing to customize here.
\Data\
contains "annotation field" files in the standard format that map
genes to annotation corresponding to the field.
\Data\Class\
contains gene classification "system" files in the standard format
that map genes to their classification in the corresponding classification
system. These files can optionally contain a third field that lists PMIDs of
MEDLINE articles supporting the classification of the gene into the category.
The PMIDs are separated by semicolons within the field. These files can also be
used as fields in annotation tables.
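As an illustration of the optional third field, the sketch below splits one
classification line into gene, category, and a list of PMIDs. It is written in
Python under the assumptions stated in the comments; the function name
parse_class_line and the PMID values are placeholders, not real data.

```python
def parse_class_line(line):
    """Split a \\Data\\Class\\ line into (gene, category, pmids).

    The third tab-separated field is optional; when present it holds
    PMIDs separated by semicolons.
    """
    fields = line.rstrip("\r\n").split("\t")
    gene, category = fields[0], fields[1]
    pmids = []
    if len(fields) > 2 and fields[2]:
        # assumption: surrounding spaces and empty entries are ignored
        pmids = [p.strip() for p in fields[2].split(";") if p.strip()]
    return gene, category, pmids


# Placeholder PMIDs, for illustration only.
with_refs = parse_class_line("10\tapoptosis\t111;222")
without_refs = parse_class_line("10\tapoptosis")
```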
\Data\Class\Implies\
contains files that supplement files of the same name in the \Data\Class\
directory. These files map the classifications stated explicitly in the
\Data\Class\ version to the other classifications they imply, in the format:
ExplicitClassification[tab]ImplicitClassification
This is useful for systems such as those of the Gene Ontology, in which
classifying a gene under a certain term implies that it also belongs to every
superordinate (parent) category of the ontology. Although you could directly map
a given LocusLink number to every ancestor Gene Ontology term in the appropriate
file in \Data\Class\, this is very inefficient and leads to excessive loading
times. To speed data loading for such systems, only direct "child implies
parent" relationships need to be specified in the \Data\Class\Implies\
supplemental file. Grandparent and more distant ancestor relationships are
loaded recursively. If a file from this directory is selected during annotation
field selection, that field will list all explicit and implicit categories for a
given gene.
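The effect of the "child implies parent" lines can be sketched as follows. This
Python example is an assumption about the behavior described above, not EASE's
own code: it uses an iterative worklist rather than literal recursion, and the
function names and sample term pairs are chosen only for illustration.

```python
def load_implies(lines):
    """Read ExplicitClassification[tab]ImplicitClassification lines."""
    parents = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 2:
            parents.setdefault(fields[0], set()).add(fields[1])
    return parents


def expand(explicit, parents):
    """Return the explicit categories plus every category they imply."""
    seen = set()
    stack = list(explicit)
    while stack:
        term = stack.pop()
        if term in seen:
            continue  # also guards against accidental cycles
        seen.add(term)
        stack.extend(parents.get(term, ()))
    return seen


# Illustrative direct relationships only; ancestors are derived.
implies = load_implies([
    "apoptosis\tprogrammed cell death",
    "programmed cell death\tcell death",
    "cell death\tdeath",
])
all_terms = expand({"apoptosis"}, implies)
```

Only three direct lines are listed, yet the gene classified as "apoptosis" ends
up in all four categories.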
\Data\Class\URL data\
contains configuration files for specifying URLs that hyperlink gene categories
to their definitions. The files are matched to system files in the \Data\Class\
directory, and each contains two lines of URLs. The first line defines the URL
to the definition of the system itself. The second line defines a template of
the URL for a specific category within that classification system. EASE makes
the URL specific to a classification term by replacing the string [*TERM*] in
the template with the actual classification term. For URLs that require some
conversion from classification term to some tag before URL generation, the
template contains the string [*TAG*] instead. In this case, another file with
the same name will occur in the \Data\Class\URL data\Tags\ directory, as
described below.
\Data\Class\URL data\Tags\
contains the tag conversion files for the URL creation mentioned above. These
files are in the format:
Classification[tab]Tag
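The template substitution for category URLs can be sketched in Python as shown
below. The function name category_url, the example.org templates, and the tag
"T0001" are hypothetical. Whether EASE encodes special characters in the term
is not described here, so this sketch does a plain string replacement.

```python
def category_url(template, term, tags=None):
    """Fill a category URL template for one classification term.

    tags is a dict built from a Classification[tab]Tag file and is only
    consulted when the template contains [*TAG*].
    """
    if "[*TAG*]" in template:
        tag = (tags or {}).get(term)
        if tag is None:
            return None  # assumption: no link when no tag is known
        return template.replace("[*TAG*]", tag)
    return template.replace("[*TERM*]", term)


# Hypothetical templates and tag, for illustration only.
by_term = category_url("http://example.org/term?name=[*TERM*]", "apoptosis")
by_tag = category_url(
    "http://example.org/term?id=[*TAG*]", "apoptosis", {"apoptosis": "T0001"}
)
```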
\Data\Convert\
contains files in the standard format that map gene identifiers to some
accession system for referring to that gene.
Files in this directory are named for the type of accession number being
linked. For example, the "Genbank.txt"
file installed by the automated update links LocusLink numbers to Genbank accessions
with lines like:
3558 S77834
These files can also be selected during the annotation field selection process
to include a column listing all accessions within this system that refer to the
gene of a given row in the output table.
\Enhance\
contains files to define genes that should "share" annotation when
using the "Enhance" function of EASE. These files map all genes in a pair-wise fashion in the standard
format:
GeneIdentifier[tab]GeneIdentifier
\Help\
contains help files (like this one!) in plain text.
\Links\
contains link definition files containing templates for URLs to online tools
for analyzing gene lists. EASE will detect two occurrences of one of the
following strings in any given URL template:
[*EaseID*]
[*DATA*]
[*CONVERT*]
[*CLASS*]
and construct the URL accordingly. The character between the two occurrences (a
semicolon or a comma in the examples here) is the separator used to join the
values. In the case of [*EaseID*], EASE replaces the string with the standard
gene identifiers of the list to complete the URL. For example, if the URL
template looks like:
http://david.niaid.nih.gov/david/ease.asp?locus=[*EaseID*];[*EaseID*]
EASE initializes the URL with:
http://david.niaid.nih.gov/david/ease.asp?locus=
and then joins all LocusLink numbers in the list with semicolons and adds them
to the end of the URL.
In the cases of [*DATA*], [*CONVERT*], and [*CLASS*], EASE will first convert
the list of genes to tags using the file named on the second line of the link
definition file. That file is located in the \Data\, \Data\Convert\, or
\Data\Class\ directory, respectively. For example, the file
\Links\View Abstracts of PMIDs from LocusLink.txt
contains the following lines:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&doptcmdl=Abstract&term=[*DATA*],[*DATA*]
Known PMIDs.txt
To send a given list of genes to this link, named "View Abstracts of PMIDs from
LocusLink", EASE first converts all LocusLink numbers of the gene list to PMID
numbers using the file "Known PMIDs.txt" in the \Data\ directory. EASE then
initializes the URL with:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&doptcmdl=Abstract&term=
and then joins all of the PMID numbers with commas and adds them to the end of
the URL.
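The URL construction for link definition files can be approximated with the
Python sketch below. It is an interpretation of the two examples above rather
than EASE's actual code. The function name build_link_url is hypothetical, and
so are the assumptions that the text between the two placeholders is the
separator, that anything after the second placeholder is dropped, and that
duplicate converted values are listed once. The PMIDs in the sample conversion
table are placeholders.

```python
PLACEHOLDERS = ("[*EaseID*]", "[*DATA*]", "[*CONVERT*]", "[*CLASS*]")


def build_link_url(template, genes, conversion=None):
    """Build a link URL from a template containing a placeholder pair.

    conversion maps gene identifiers to sets of values and stands in for
    the file named on the second line of the link definition file.
    """
    for ph in PLACEHOLDERS:
        first = template.find(ph)
        if first == -1:
            continue
        second = template.find(ph, first + len(ph))
        if second == -1:
            continue
        prefix = template[:first]
        separator = template[first + len(ph):second]
        if ph == "[*EaseID*]":
            values = list(genes)
        else:
            values = []
            for gene in genes:
                for value in sorted((conversion or {}).get(gene, ())):
                    if value not in values:  # assumption: no duplicates
                        values.append(value)
        return prefix + separator.join(values)
    raise ValueError("no placeholder pair found in template")


david_url = build_link_url(
    "http://david.niaid.nih.gov/david/ease.asp?locus=[*EaseID*];[*EaseID*]",
    ["10", "355"],
)
# Placeholder PMIDs standing in for the contents of "Known PMIDs.txt".
pubmed_url = build_link_url(
    "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi"
    "?db=PubMed&doptcmdl=Abstract&term=[*DATA*],[*DATA*]",
    ["10", "355"],
    {"10": {"111"}, "355": {"111", "222"}},
)
```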