The easy-to-update and flexible data file schema of EASE comes at a
substantial price in terms of speed. EASE must read the entire text file for
every process, from figuring out which genes your identifiers refer to, to
looking up gene categories for the entire population, to looking up how to
make custom URLs and how to make the term-to-tag conversions sometimes
required therein.
So the main trick to optimizing EASE is to trim the lines in all files to
only those needed for the genes on your microarray.
Say you typically refer to genes using GenBank accessions, but EASE uses
LocusLink numbers as its standard gene identifiers. You could take all
GenBank accessions for your microarray and annotate them with LocusLink
numbers:
Paste all GenBank accessions into the main gene list.
[Select annotation fields]
[Add fields]
Browse to \Convert\LocusLink numbers.txt
Select it and click [open], then [done]
[annotate genes]
Now you have a list of LocusLink numbers for your microarray. You can use it
with a database package to run queries against all the files in the \Data\,
\Data\Class\, and \Data\Convert\ directories, trimming each file to include
only lines that begin with one of these LocusLink numbers.
Of course, this will practically destroy any later efforts to "Enhance" your
annotation. To also keep any "synonymous" LocusLink numbers in the trimmed
files, be sure to use the Enhance function when annotating your original
list.
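
If no database package is at hand, the trimming step can also be scripted. The
following is only a minimal sketch: it assumes the data files are
tab-delimited with the gene identifier as the first field, and the names
trim_lines and trim_file are hypothetical helpers, not part of EASE. Work on
copies of the files so the originals can be restored.

```python
def trim_lines(lines, keep_ids, delimiter="\t"):
    """Return only the lines whose first field is one of keep_ids.

    Assumes (not confirmed by the EASE documentation) that fields are
    separated by tabs and that the gene identifier comes first.
    """
    kept = []
    for line in lines:
        # Compare the whole first field, so "123" does not match "1234".
        first_field = line.rstrip("\r\n").split(delimiter, 1)[0].strip()
        if first_field in keep_ids:
            kept.append(line)
    return kept


def trim_file(src_path, dst_path, keep_ids, delimiter="\t"):
    """Write a trimmed copy of src_path to dst_path; return lines kept."""
    # newline="" leaves the original line endings untouched.
    with open(src_path, "r", newline="") as src:
        kept = trim_lines(src, keep_ids, delimiter)
    with open(dst_path, "w", newline="") as dst:
        dst.writelines(kept)
    return len(kept)
```

Here keep_ids would be a set of strings holding the LocusLink numbers produced
by the annotation steps, including any synonymous numbers added by Enhance.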
Another tip to speed up EASE is to eliminate any PMIDs from the optional
third column of the categorical files. If you don't really care what articles
"prove" a given categorical assignment, then it's best not to make EASE read
through all of these PMIDs every time it loads that category.
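
Removing the PMID column can be scripted as well. This is a rough sketch
under the assumption that the categorical files are tab-delimited, with the
PMIDs in the optional third field; drop_pmid_column is a hypothetical helper
name, not an EASE function.

```python
def drop_pmid_column(lines, delimiter="\t"):
    """Keep only the first two fields of each line.

    Assumes tab-delimited lines whose optional third field holds PMIDs;
    lines that already have two or fewer fields are returned unchanged.
    """
    trimmed = []
    for line in lines:
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # preserve the original line ending
        fields = body.split(delimiter)
        trimmed.append(delimiter.join(fields[:2]) + ending)
    return trimmed
```

Apply it to the lines of each categorical file and write the result to a
copy, keeping the original file as a backup.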
One final note: EASE was designed to work most quickly with many lists run
against the same population and the same analysis options. When you change
the population or your analysis options, you might see a substantial drop in
speed. To remedy this, close EASE and restart it; then run the lists with the
new options selected.