UCSC Gene Sorter User's Guide
Genes function and evolve together. To understand a gene,
you often need to understand an entire gene family.
Many such families are already known and described, such as the
HOX family that mediates many aspects of limb and brain development
and the Cytochrome P450 family that is central to the metabolism of many
medications.
One easy way to identify well-known relatives of a gene
is by looking for genes with name similarity, because biologists
tend to use similar names for similar genes. However, scientists
only partly understand the function of perhaps one-third of
the genes of the genome; therefore, other techniques for grouping genes
into families are necessary.
The UCSC Gene Sorter is an excellent resource for
exploring gene families and the relationships among genes. This
tool displays a table of genes within
a selected genome that are related to one another. Several
different relationships may be explored: protein-level homology,
similarity of gene expression profiles, or genomic proximity. The
Gene Sorter supports searches on a variety of terms and phrases,
including the gene name, the UniProtKB protein name, a GenBank
accession, or a word or phrase present in a gene's description. The
gene family display is highly configurable, allowing the user to
control the order and number of columns, the number of rows, and
the genes displayed. The tool provides several output formats,
including a simple tab-delimited format that may be imported into a
spreadsheet or a relational database.
An important use of the Gene Sorter is to gather
together a collection of genes that share similar properties for
statistical analysis. For instance, one might want to examine
promoter regions of genes that share a similar expression
pattern or look for protein sequence motifs in genes that
share similar GO annotations.
One of the most powerful features of the Gene Sorter is its
filtering capabilities. The filter enables the user to
quickly select an interesting subset of the roughly 25,000 genes in
the genome based on a variety of detailed and flexible selection criteria.
For example, the filter
may be used to select all human genes over-expressed in the cerebellum
that have GO-annotated G-protein coupled receptor activity.
The Gene Sorter was designed and implemented by Jim Kent,
Fan Hsu, David Haussler, and the UCSC Genome Bioinformatics Group.
This work is supported by a grant from the National Human Genome
Research Institute and by the Howard Hughes Medical Institute.
To begin using the Gene Sorter, you will first
have to select a genomic region and the type of gene relationship
you wish to display.
You may also want to change some of the Gene Sorter's configuration
settings to tailor the display to your research needs. These
configuration options are described in Configuring
the Gene Sorter display.
Starting the Gene Sorter
- Open the Gene Sorter home page.
- Specify the genome and assembly you wish to view by selecting
the appropriate options from the genome and
assembly pull-down menus.
- Type a term or phrase into the search text box to
determine which genes will be displayed in the browser. Valid search
terms (with human genome examples) include:
- a gene name (HOXA9)
- a UniProtKB protein name (HXA9)
- a word or phrase that occurs in the description of a gene (MAP kinase)
- a GenBank mRNA accession (U14680)
- Choose the gene relationship that you would like to examine by
selecting an option from the sort by pull-down menu. Genes will
be sorted in order of proximity to the chosen gene, based on one of the
following criteria:
- Expression (GNF Atlas1) -- similarity in gene expression, based on GNF
Atlas 1 data
- Protein Homology - BLASTP -- similarity in protein homology, based on the
BLASTP E-value
- Protein Homology - Rankprop -- similarity in protein homology, based
on the Rankprop algorithm
- Protein Homology - PSI-BLAST -- similarity in protein homology, based on
the PSI-BLAST E-value
- Pfam Similarity -- similarity based on number of shared domains
- Gene Distance -- absolute distance (left or right) on the chromosome
from the selected gene
- Chromosome -- list sorted by chromosomal location
- Name Similarity -- similarity to the name of selected gene, based on
the first several characters of the name
- Alphabetical -- list sorted by gene name
- GO Similarity -- number of Gene Ontology (GO) terms shared with
selected gene
- Choose the number of items to display from the display
pull-down menu (the default is 50).
- Press the Go! button to display your search results.
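As a rough illustration of the Gene Distance ordering listed above, the following Python sketch sorts a few genes by absolute distance from a selected gene on the same chromosome. The gene names, coordinates, the Gene record, and the sort_by_distance helper are all hypothetical and are not part of the Gene Sorter. This guide does not say which coordinate the tool measures from, so the sketch assumes gene start positions.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    chrom: str
    start: int  # invented coordinate, for illustration only


def sort_by_distance(genes, selected):
    """Order genes on the selected gene's chromosome by absolute distance."""
    same_chrom = [g for g in genes if g.chrom == selected.chrom]
    # abs() makes genes to the left and right of the selection comparable.
    return sorted(same_chrom, key=lambda g: abs(g.start - selected.start))


genes = [
    Gene("GENE_A", "chr7", 1000),   # the selected gene
    Gene("GENE_B", "chr7", 1300),
    Gene("GENE_C", "chr7", 800),
    Gene("GENE_D", "chr17", 500),   # different chromosome, dropped
]
ordered = [g.name for g in sort_by_distance(genes, genes[0])]
print(ordered)
```

The selected gene has a distance of zero to itself, so it sorts to the top, which matches the way the display places the selected gene first.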
Understanding the Gene Sorter display
The main page of the Gene Sorter displays a table
containing rows of genes and associated attributes. In most
cases, the currently-selected gene is shown at the top of
the list, highlighted in light green. The remaining genes
are ordered relative to the selected gene based on the sort
criteria specified in the sort by menu. For example,
in a table sorted by gene distance, the genes are listed in
order of greater to lesser chromosomal proximity to the
selected gene.
The initial Gene Sorter display shows only a default subset of
the columns available. The set of columns may be expanded,
reduced, and rearranged by using the Gene Sorter's
configuration utility. To view
information about the data shown in the column, click on
the column's label.
To select a different gene in the table, click on the name of
the gene. The Gene Sorter will move the gene entry to the top of the
list and highlight it. The remaining genes will be reordered
relative to the new selection.
Column descriptions (listed in alphabetical order)
-
# column: Displays the position of each gene in the table,
which is useful when examining data in tables with
many rows. Clicking on the gene number selects it and moves
it to the top of the list.
-
3' UTR Fold column: Shows the estimated
energy in kcal/mol of folding the 3' UTR of a
gene into the best predicted secondary structure.
The energy calculations and secondary structure
predictions were obtained from the RNAfold
program, which is part of the Vienna RNA
Package.
-
5' UTR Fold column: Shows the estimated
energy in kcal/mol of folding the 5' UTR of a
gene into the best predicted secondary structure.
The energy calculations and secondary structure
predictions were obtained from the RNAfold
program, which is part of the Vienna RNA
Package.
-
Abundance column (Yeast): Displays protein
abundance information, as reported in Ghaemmaghami
et al. Global analysis of protein
expression in yeast, Nature
425(6959), 737-741 (2003). For more
information, see the
Yeast GFP Fusion Localization
Database.
-
Arbeitman et al. 2002 Life-Cycle Expression Data column (Fruitfly):
Shows the median ratio of gene expression in various phases of the fly life cycle
relative to the expression of the gene in mixed egg-to-adult cultures.
The level of detail may be
increased or decreased with the configuration
utility. See the Expression columns
description for more information about the display and configuration
of this column. For more details about the experiments and methods used to
create this data, click on the column's label.
-
BLASTP Bits column: Shows the bit score measure of protein similarity
between a gene and the selected gene. The greater the
similarity between two proteins, the higher the bit score. For
more information about how the bit score was calculated, click
on the Bits column label.
-
BLASTP E-Value column: Shows the Blastp E-value (expectation value)
between each gene and the selected gene. The greater the
similarity of two proteins, the lower the E-value is.
Identical long proteins have an E-value of zero.
Formally, an E-value is the number of other known genes that
are expected to have at least this level of homology by chance.
An E-value of less than 0.1 can be safely interpreted as the
probability that a match this good would occur merely by
chance. For more information about how the E-value was
calculated, click on the E-Value column label.
Clicking on a gene's E-value displays an alignment
between the gene and the selected (highlighted)
gene.
-
C. elegans column: Shows the best Blastp match to the WormBase
protein set. Clicking on the ID number displays the
corresponding WormBase database record.
-
Coding SNPs column: Shows the Simple Nucleotide Polymorphisms (SNPs)
located in the coding region of each gene. Clicking on a SNP
ID displays the SNP record associated with the gene in
NCBI's dbSNP.
-
Description column: Shows a brief description of each gene taken from
its mRNA record. Clicking on a description displays a page
showing additional information and links for the gene.
-
Drosophila column: Shows the FlyBase ID of the best Blastp match to
the FlyBase protein set. Clicking on the FlyBase ID displays
the corresponding FlyBase report.
-
Ensembl column: Shows the Ensembl transcript ID associated with
the gene. Ensembl is an automatic gene prediction pipeline and
a major genome database and web site run by the Sanger
Institute and the European Bioinformatics Institute (EMBL-EBI).
It is especially effective at mapping genes that are known
in one organism to another organism. Compared to other gene
predictions, those of Ensembl tend to have high specificity
but low sensitivity to genes not already associated with
characterized mRNA or protein sequence. Clicking on an
Ensembl ID displays the Ensembl GeneView page for the gene.
-
Entrez Gene column (Human): Formerly LocusLink. Shows
the NCBI Entrez Gene
ID associated with the gene. Clicking on the entry
displays the Entrez Gene record. If the record shows a
link to the Online Mendelian Inheritance in Man
(OMIM) -- indicated by an orange square with an
"O" inside it -- an OMIM record is available for
the gene.
-
Exon Count column: Displays the number of
exons in the gene (coding and non-coding).
-
Exp Delta column: Shows the similarity of the expression of
each gene to the selected gene. Genes with identical expression
profiles have a value of 0. This column shows data only for the
1000 genes (including splicing variants) that have the most
similar expression profiles. For more information about how
expression distance was calculated, click on the Exp Delta column
label.
-
Expression columns: Show the ratio of
expression of the gene in selected tissues or life cycle stages
to the expression of the gene overall.
A gene that is more highly expressed is colored red, and a less
expressed gene is shown in green. The values are colored on a
logarithmic scale. This coloring is standard, but is the opposite
of what an inexperienced user might expect: in this case, red means go
and green means stop! Black indicates that a gene is neither
over nor under expressed in the tissue. Uncolored boxes
(white on most browsers) represent missing data.
Various attributes of the expression column displays may be configured
using the configuration utility. Depending on the
organism, the user
may adjust the color scheme and brightness, toggle between expression
ratio and absolute expression values, and increase or decrease the
level of detail shown. In particular, color-blind users
may wish to switch the coloring from red/green to yellow/blue.
For more information about the selection criteria used for expression
columns, click on the column's label.
-
GenBank column: Shows the GenBank RefSeq or mRNA accession number
associated with the gene. Clicking on the accession number
displays the GenBank record associated with it.
-
Gene Ontology column: Shows the Gene Ontology (GO) terms associated with
the gene. GO terms are words from a controlled vocabulary
assigned to a gene by human curators. Clicking on a GO term
displays the associated Gene Ontology Consortium database
record.
-
Genome Position column: Shows the chromosomal location of each gene in
the genome. Clicking on a chromosomal position displays the
gene at that location in the UCSC Genome Browser.
-
GNF1M ID column (Mouse): Shows the
Affymetrix ID from the GNF1M chip that best
corresponds to each gene. For more details about
the experiments and methods used to
create this data, click on the column's label.
-
GNF Atlas2 column (Human): Shows the ID of
the probe in the GNF Atlas 2 that overlaps most
with the selected gene. The GNF Atlas 2 is based
on two Affymetrix chips: the U133A and a
custom-designed GNF1H chip.
-
GNF Delta column: Shows the similarity of the
expression of each gene to the selected gene.
Genes with identical expression profiles have a
value of 0. This column shows data only for the
1000 genes (including splicing variants) that
have the most similar expression profiles.
-
GNF U74a, GNF U74b, GNF U74c columns (Mouse): Shows data
from the Mouse Gene Expression Atlas from the Genomics Institute of
the Novartis Research Foundation (GNF) on the Affymetrix U74a, U74b,
and U74c chips. By default, the columns display the median ratio of
expression in a specific set of tissues relative to the expression
of the gene overall. The level of detail may be
increased or decreased with the configuration
utility. Currently, the full spectrum of tissues is available only
on the U74a chip.
See the Expression columns
description for more information about the display and configuration
of this column. For more details about the experiments and methods used to
create this data, click on the column's label.
-
GNF U95 column (Human): Shows data from the GNF Expression
Atlas on the Affymetrix U95 chip. By default, the column
displays the median ratio of expression in a specific set of tissues
relative to the expression of the gene overall. The level of detail may be
increased or decreased with the configuration
utility. See the Expression columns
description for more information about the display and configuration
of this column. For more details about the experiments and methods used to
create this data, click on the GNF U95 column's label.
-
Human column (Mouse): Shows the best Blastp match to the known genes
protein set from the UCSC Human Genome Browser database.
Clicking the accession number displays the gene in the UCSC
Human Genome Browser.
-
Kim-Lab Life-Cycle Expression Data (All) column (C. elegans):
Shows the ratio of gene expression in all phases of the worm life cycle relative to
the expression of the gene in mixed wild-type adult cultures.
See the Expression columns
description for more information about the display and configuration
of this column. For more details about the experiments and methods used to
create this data, click on the column's label.
-
Kim-Lab Life-Cycle Expression Data (Median) column (C. elegans):
Shows the median ratio of gene expression in selected phases of the worm life cycle.
See the Expression columns
description for more information about the display and configuration
of this column. For more details about the experiments and methods used to
create this data, click on the column's label.
-
Max GNF Atlas 2 column: Shows the
maximum absolute expression level for any tissue
in the GNF Gene Expression Atlas 2. Most of the
values fall in the zero to 50,000 range, but a
few outliers may range as high as 200,000. A
value of less than 20 indicates the expression
could not be detected in any tissue at levels
significantly above the cross-hybridization
controls.
-
Max GNF U95 column (Human): Shows the
maximum absolute expression level for any tissue.
Most values range between
zero and 30,000, but a few outliers may range as
high as 52,000. A value of less than 20 indicates
the expression could not be detected in any
tissue at levels significantly above the
cross-hybridization controls.
-
Max Rinn Sex column (Mouse): Shows the
maximum expression value in adult male and female
mouse tissue as described in (Rinn et al.,
Developmental Cell, 2004). For more information
about the methods used to generate these data,
click on the Max Rinn Sex column header.
-
Module column (Yeast): Shows the predicted
regulatory module (a combination of transcription
factor binding sites) that regulates the gene.
The approach underlying this annotation is
described in Segal, E. et al.,
Genome-wide discovery of
transcriptional modules from DNA sequence and gene
expression, Bioinformatics
19(Suppl 1), i273-i282 (2003). For more
information about the methods used, click the
Module column header. To view genes that share
a regulatory module, select the Regulatory
Module option from the sort by
menu.
-
MOE430 ID column (Mouse): Shows the
Affymetrix ID from the MOE430 series of chips
(A & B) that best corresponds to each gene.
For more details about the experiments and methods
used to create this data, click on the column's
label.
-
Mouse column (Human): Shows the accession
number of the best Blastp match to the known genes
protein set in the UCSC Mouse Genome Browser database.
Clicking on the accession number displays the gene in
the UCSC Mouse Genome Browser.
-
Name column: Displays the name of the gene. When possible,
the HUGO Gene Nomenclature Committee (HGNC) name is shown. If the gene does not yet have an HGNC
name, the GenBank accession number of the associated RefSeq
or mRNA record is shown instead. Clicking on the gene name
selects it and moves it to the top of the list.
-
% ID column: Shows the percentage identity at the protein
level between the gene and the selected gene. For more
information about how the % ID was
calculated, click on the % ID column label.
-
PDB column: Displays all Protein Data
Bank (PDB) IDs associated with the gene. PDB is a
database of proteins with known 3-D structures.
In some cases these records will correspond
to only a fragment of the gene. In other cases
the PDB record may include other molecules with
which the protein interacts. Clicking on a PDB
entry displays the associated PDB Structure
Explorer page.
-
Pfam Domains column: Shows a list of
Protein Family (Pfam) domains contained in the gene
product. Clicking on a domain displays the associated
Pfam record.
-
PSI-BLAST E-Value column: Shows the
PSI-BLAST E-value (expectation value) between the
UniProtKB or TrEMBL protein associated with the
gene and the protein associated with the selected
(highlighted) gene. The greater the similarity of
two proteins, the lower the E-value is. Identical
long proteins have an E-value of zero.
For more information about how the E-value was
calculated, click on the E-Value column label.
Clicking on a gene's E-value displays an alignment
between the gene and the selected (highlighted)
gene.
-
Rankprop column: Displays protein
similarity scores assigned by the Rankprop
algorithm. The scores reported in this column
range from zero to one, with one being the most
significant. Currently, Rankprop does not report
an E-value statistic. For more information on this
algorithm, click on the Rankprop column
label.
-
RefSeq column: Displays the NCBI RefSeq
accession associated with the gene, if available.
RefSeq genes are a non-redundant set of
high-quality mRNA sequences. Clicking on the accession number
displays the NCBI Entrez Gene record for a RefSeq accession. If
the record shows a link to the Online Mendelian Inheritance in
Man (OMIM) -- represented by an orange box with an "O"
on it -- an OMIM record is available for the gene. OMIM is
often an excellent source of human-curated information about
human genes.
-
Regulatory Motif column (Yeast): Shows the
predicted transcription factor binding sites that
correlate with expression of the gene.
The approach underlying this annotation is
described in Segal, E. et al.,
Genome-wide discovery of
transcriptional modules from DNA sequence and gene
expression, Bioinformatics
19(Suppl 1), i273-i282 (2003). For more
information about the methods used, click the
Module column header.
-
SGD ORF column: Shows the Saccharomyces Genome
Database (SGD) ID associated with the gene.
-
SP Acc column: Shows the UniProtKB
protein accession of a gene. Clicking on an entry
displays the corresponding UniProtKB NiceProt
view of the protein.
-
Superfamily column: Displays a list of
Structural Classification of Proteins (SCOP)
superfamilies associated with a protein. The gene
set was mapped to SCOP superfamilies using the
Superfamily HMM library. Clicking on an entry
displays the associated Superfamily record.
-
UniProtKB column: Displays the UniProtKB
protein name of each gene,
if it is available. Otherwise, it shows the primary
accession number. Clicking on a protein name or accession
displays the corresponding UniProtKB NiceProt view of the
protein.
-
U133 ID column (Human): Shows the Affymetrix ID
from the HG-U133 chip that best
corresponds to each gene. For more information about
the selection criteria used for this column, click on
the U133 ID column's label.
-
U133Plus2 ID column (Human): Shows the
Affymetrix ID from the HG-U133 Plus 2.0 chip that
best corresponds to each gene. For more
information about the selection
criteria used for this column, click on the
U133Plus2 ID column's label.
-
U74 ID column (Mouse): Shows the Affymetrix ID
from the U74 series of chips (a, b, c) that best
corresponds to each gene. For more information about
the selection criteria used for this
column, click on the U74 ID column's label.
-
U95 ID column (Human): Shows the Affymetrix ID
from the HG-U95 chip that best corresponds to each gene.
For more information about the selection criteria used
for this column, click on the U95 ID column's label.
-
UCLA Long Expression column (Human): Shows UCLA
expression data from normal tissues on the U133 chip.
This column shows the ratio of expression of a gene in the entire
tissue set relative to the expression of the gene overall.
See the Expression columns
description for more information about the display and configuration
of this column. For more information about the methods used to
generate this data, click on the UCLA Long Expression column's label.
-
UCLA Short Expression column (Human): Shows UCLA
expression data from normal tissues on the U133 chip.
This column shows the ratio of expression of a gene in a particular
subset of tissues relative to the expression of the gene overall.
See the Expression columns
description for more information about the display and configuration
of this column. For more information about the methods used to
generate this data, click on the UCLA Short Expression column's label.
-
WormBase column (C. elegans): Shows the ORF name
associated with each gene. Clicking on an
ORF name displays the associated WormBase record.
-
Yeast column: Shows the best Blastp match to the Saccharomyces
Genome Database (SGD) protein set. Clicking on the yeast ID
displays the corresponding SGD record.
-
Zebrafish column: Shows the Ensembl peptide ID of the best Blastp
match to the Ensembl gene predictions on the zebrafish genome.
Clicking on the Ensembl peptide ID displays it in Ensembl
Zebrafish Protein View.
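The BLASTP E-Value entry above notes that a small E-value can be read as a probability. One common way to see why, under the usual assumption that chance matches follow a Poisson distribution, is the relation P = 1 - exp(-E), which is nearly equal to E when E is small. The sketch below only illustrates that arithmetic. The function name is hypothetical, and the code does not reproduce how the Gene Sorter computes its values.

```python
import math


def evalue_to_pvalue(e_value):
    """Probability of at least one chance match, assuming Poisson statistics."""
    # -expm1(-E) equals 1 - exp(-E) and stays accurate for very small E.
    return -math.expm1(-e_value)


for e in (0.001, 0.1, 1.0, 10.0):
    print(f"E = {e:<6} P = {evalue_to_pvalue(e):.4f}")
```

For E-values well below 0.1 the two numbers are almost indistinguishable. For larger E-values they diverge, because a probability cannot exceed one while an expected count can.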
Configuring the Gene Sorter display
The Gene Sorter is highly configurable, allowing you
to fine-tune the display to show just the genes and data
columns that interest you, in an order that best suits your research
needs. Most of the configuration is controlled through settings on
the Configuration page, accessed via the configure button at
the top of the Gene Sorter page.
Changing the number of rows displayed
To increase or decrease the number of rows shown in the table, pick a
new value from the display pull-down menu, then click the Go!
button.
Changing the number of columns displayed
By default, the Gene Sorter shows only a small subset of the
table columns available for the genome. You can view the full list of columns,
or add or remove columns from your display, on the Configuration page.
The configuration table shows all the
columns available for the currently-selected genome, listed in left-to-right
display order. To add or remove a column from the Gene Sorter display, click the
On checkbox to toggle the setting (a check indicates that the column
is displayed). To quickly change the On settings of all columns,
click the Hide All or Show All button at the top of the page. Click the
Submit button to display the changes in the Gene Sorter.
Changing the column positions
In addition to adding or removing columns, it is
also possible to move the columns to the left or right within the Gene Sorter
table. The order of the column names in the configuration table indicates the
current relative position of the columns in the Gene Sorter display from left to
right. To shift a column one position to the left, click the up arrow in the
item's Position column. Similarly, click the down arrow to shift a
column to the right. When you have finished making changes, click the Submit
button.
Changing the expression colors
By default, the gene expression ratios are shown using a red/green color
scheme,
where red indicates a gene that is more highly expressed and green corresponds
to less expression. Color-blind users may find it helpful to switch the
coloring from red/green to yellow/blue. To do so, select the "yellow high/blue
low" option from the Expression ratio colors pull-down menu on the
Configuration page, then click the Submit button.
Changing the brightness of expression colors
To increase or decrease the brightness of the colors in an expression
column, edit the brightness value for the corresponding entry in the
configuration table. Values greater than 1.0 increase the brightness, while
those less than 1.0 dim the color. Click the Submit button to display the
new values.
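To make the color scheme and brightness settings concrete, here is a minimal sketch of how an expression ratio might be mapped to a color on a logarithmic scale. The ratio_to_rgb function, the max_log2 scaling constant, and the exact RGB values are assumptions chosen for illustration. The Gene Sorter's actual color computation is not described in this guide.

```python
import math


def ratio_to_rgb(ratio, brightness=1.0, scheme="red/green", max_log2=4.0):
    """Map a positive expression ratio to an (r, g, b) tuple on a log2 scale.

    Assumptions: ratios above 1 get the "high" color, ratios below 1 get
    the "low" color, a ratio of exactly 1 is black, and None (missing
    data) is white.
    """
    if ratio is None:
        return (255, 255, 255)
    # Brightness values above 1.0 saturate sooner; values below 1.0 dim.
    level = math.log2(ratio) / max_log2 * brightness
    intensity = int(round(255 * min(abs(level), 1.0)))
    if scheme == "red/green":
        return (intensity, 0, 0) if level > 0 else (0, intensity, 0)
    # Any other scheme name is treated as "yellow high/blue low".
    return (intensity, intensity, 0) if level > 0 else (0, 0, intensity)


print(ratio_to_rgb(4.0))                        # over-expressed
print(ratio_to_rgb(0.25, scheme="yellow/blue"))  # under-expressed
print(ratio_to_rgb(1.0))                        # neither
print(ratio_to_rgb(None))                       # missing data
```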
Changing the type of tissue data shown in expression columns
By default, the expression columns show the median ratio of expression
of a gene in a small selected set of tissues. Use the tissues
pull-down menu to configure the tissue display for the column.
The "all replicas" option will show the value
of each individual experimental replica of each tissue. The "median
of replicas" option displays a single value for each tissue that
represents the median of all replicas for that tissue.
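The difference between the two tissue options can be sketched in a few lines of Python. The replicate values and the summarize helper below are invented for illustration and do not come from any Gene Sorter data set.

```python
from statistics import median

# Hypothetical replicate measurements per tissue (not real data).
replicas = {
    "cerebellum": [2.0, 2.4, 1.8],
    "liver": [0.5, 0.7],
}


def summarize(replicas, mode="median of replicas"):
    """Return one value per tissue, or every replica, depending on mode."""
    if mode == "all replicas":
        return {tissue: list(values) for tissue, values in replicas.items()}
    return {tissue: median(values) for tissue, values in replicas.items()}


print(summarize(replicas))
print(summarize(replicas, mode="all replicas"))
```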
Toggling between ratio and absolute expression values
By default, expression columns show the ratio of expression of a gene relative
to expression of the gene overall. To view absolute expression values instead,
select the "absolute" option from the values pull-down menu.
Displaying splicing variants
By default, the Gene Sorter shows only one splicing variant: the one that produces
the largest protein. To show all splicing variants, click the Show all
splicing variants checkbox. Note that in most cases, the column values
(and sometimes the names) will be identical across variants.
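The default rule of keeping only the variant that produces the largest protein can be sketched as follows. The records, field layout, and function name are hypothetical. How the Gene Sorter breaks ties between variants of equal length is not stated in this guide, so the sketch simply keeps the first one encountered.

```python
# Hypothetical (gene, variant, protein length) records; values are invented.
variants = [
    ("GENE1", "variant_a", 310),
    ("GENE1", "variant_b", 295),
    ("GENE2", "variant_a", 120),
]


def largest_variant_per_gene(records):
    """Keep, for each gene, the variant with the longest protein."""
    best = {}
    for gene, variant, length in records:
        # Strict ">" keeps the first variant seen when lengths tie.
        if gene not in best or length > best[gene][1]:
            best[gene] = (variant, length)
    return best


print(largest_variant_per_gene(variants))
```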
Restoring the default settings
At any time during your Gene Sorter session, you can restore the Gene Sorter table
to its default layout by clicking the Default button on the Configuration page,
then clicking Submit.
Saving a configuration for future use
The Gene Sorter configuration utility allows you to store multiple configurations
for use in future sessions. This feature is particularly useful if you require
different layouts for different research uses.
To save the current configuration of the Gene Sorter layout, click the Save button
on the Configuration page. Type in a name for the configuration in the text
box at the top of the page, then click Save.
Loading a previously-saved configuration
Once you have saved a configuration, you can load it back into your Gene Sorter
in a future session. To load a configuration, click the Load button on the
Configuration page. The Gene Sorter will display a list of the names of your saved
configurations. Click on a name to highlight it, then click Load to
reconfigure your Gene Sorter based on the saved settings.
Viewing a list of saved configurations
To display a list of configurations that you have saved, click the Save button
on the Configuration page. If you have any saved configurations, the Gene Sorter
will display an Existing Setups list that shows the configuration names. To
permanently remove a configuration from the list, click on the name to
highlight it, then click the Delete Existing Setup button.
Filtering the gene display
The Gene Sorter's gene filtering capabilities provide a versatile
way to fine-tune the display to show just the genes in which you are
interested. Filters are applied to individual gene fields, and may be
combined to increase the specificity of the search. To access the Filter page,
click the filter button at the top of the Gene Sorter page.
At any time during the filter setup process, you can click the List Names
button on the Filter page to view a list of genes
that will be returned when the current filter settings are applied to the
genome. You may find this list helpful in fine-tuning the filter.
Filtering based on matching one or more terms
Filters based on names, IDs, or other words restrict the display
to only those genes that match one or more terms typed into the
search text box.
Examples of values that can be filtered on this basis include the gene
name, RefSeq accession number, gene description, coding SNPs, and GO terms.
This search supports wildcard matching on "*" and "?".
Multiple terms must be separated by a space or tab. For example,
the search criteria "HOXA9 FOX*" on the gene name field returns the gene
named HOXA9 and any gene whose name begins with the letters "FOX".
When searching on fields that consist of values containing more than one
word (GO terms, coding SNPs, Pfam domains, and gene descriptions), the
multi-word
elements must be enclosed in single quotes. For instance, a search on the
description phrase "forkhead box protein" should be entered as
'forkhead box protein'.
Use the
"any" and "all" options to determine whether the search should
return any gene that matches any term ("any") or only those genes that
match all terms ("all").
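The term-matching rules above (wildcards, single-quoted phrases, and the "any" and "all" options) can be approximated with the Python standard library. This is a sketch under stated assumptions, not the Gene Sorter's own matching code: it assumes case-sensitive matching against the whole field value, and the helper names are hypothetical. Note also that Python's fnmatch treats "[" as a special character, which the Gene Sorter may not.

```python
import shlex
from fnmatch import fnmatchcase


def parse_terms(text):
    """Split on spaces or tabs, keeping single-quoted phrases together."""
    return shlex.split(text)


def matches(value, terms, mode="any"):
    """Test one field value against wildcard terms with any/all logic."""
    hits = [fnmatchcase(value, term) for term in terms]
    return any(hits) if mode == "any" else all(hits)


names = ["HOXA9", "FOXP2", "FOXA1", "TP53"]
terms = parse_terms("HOXA9 FOX*")
selected = [n for n in names if matches(n, terms, mode="any")]
print(selected)

# A multi-word phrase in single quotes stays one term.
print(parse_terms("'forkhead box protein' kinase"))
```

With the "all" option, the same two terms would match nothing here, because no single name equals HOXA9 and also begins with FOX.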
To facilitate searching on multiple terms, the Gene Sorter provides
the option to paste in or upload a list of search terms. To paste in a list
of terms, click the filter's Paste List button, then paste or type the
terms into the text box. Terms must be separated by a space, a tab, or be
entered on separate lines, and may not include wildcards. When you have
completed the list, click the Submit button to return to the main Filter page.
The file upload utility - accessed via the Upload List button - has a similar
functionality.
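If you prepare a term list in a script before pasting or uploading it, a quick check along the following lines may catch formatting problems early. The function is a hypothetical helper, not part of the Gene Sorter. It assumes simple single-word terms and does not handle quoted multi-word phrases.

```python
def parse_pasted_terms(text):
    """Split pasted text on spaces, tabs, and newlines; reject wildcards."""
    terms = text.split()  # split() with no argument handles all whitespace
    bad = [t for t in terms if "*" in t or "?" in t]
    if bad:
        raise ValueError(f"wildcards are not allowed in pasted lists: {bad}")
    return terms


pasted = "HOXA9\tHOXA10\nFOXP2 TP53"
print(parse_pasted_terms(pasted))
```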
Filtering based on numerical ranges
Several of the gene fields can be filtered by specifying a numerical range
within which the value must fall. Examples of
fields in this category include expression ratios, Blastp data, and genome
position. To use this type of filter, enter the minimum and maximum
values delimiting the range in which you are interested. In some cases, the
range of valid values is indicated in the filter box.
The genome position filter requires the name of a chromosome (in the
format chrN) in addition
to the chromosomal start and end positions. To list all genes on a chromosome,
enter only the chromosome name.
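The behavior of the genome position filter can be sketched as below. The gene records and the position_filter helper are hypothetical. The guide does not say whether a gene must lie entirely inside the range or merely overlap it, so this sketch assumes full containment.

```python
# Hypothetical (name, chromosome, start, end) records; coordinates invented.
genes = [
    ("GENE_A", "chr7", 1000, 2000),
    ("GENE_B", "chr7", 5000, 7000),
    ("GENE_C", "chr17", 1500, 1800),
]


def position_filter(genes, chrom, start=None, end=None):
    """Keep genes on chrom; with start and end, keep genes inside the range."""
    kept = []
    for name, g_chrom, g_start, g_end in genes:
        if g_chrom != chrom:
            continue
        if start is not None and end is not None:
            if g_start < start or g_end > end:
                continue
        kept.append(name)
    return kept


print(position_filter(genes, "chr7"))              # whole chromosome
print(position_filter(genes, "chr7", 500, 3000))   # restricted range
```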
Expression filters include "any" and "all" options to
determine whether the search should
return a gene if any of the tissue expression values meet the minimum and
maximum criteria ("any") or only if all tissue expression values meet
the search criteria ("all").
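The "any" and "all" options for expression filters amount to the following logic. The function name and the sample ratios are invented for illustration, and the sketch assumes the minimum and maximum bounds are inclusive.

```python
def passes_expression_filter(values, minimum, maximum, mode="any"):
    """Check tissue expression values against an inclusive numeric range."""
    in_range = [minimum <= v <= maximum for v in values]
    return any(in_range) if mode == "any" else all(in_range)


tissue_ratios = [0.4, 1.2, 3.5]  # hypothetical values for one gene
print(passes_expression_filter(tissue_ratios, 1.0, 4.0, mode="any"))
print(passes_expression_filter(tissue_ratios, 1.0, 4.0, mode="all"))
```

In this example the gene passes with "any" because two tissues fall inside the range, but fails with "all" because one tissue falls below the minimum.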
Saving filter settings
The Gene Sorter provides a mechanism for saving filter settings for use in future
sessions. To preserve the current filter configuration, click the Save Filter
button on the Filter page. Type in a name for the filter, then click Save
to save the filter and return to the Filter page.
Loading a saved filter
Once a filter configuration has been saved, you can retrieve it in later
sessions by loading it back into your Gene Sorter. To load the saved filter
settings, click the Load Filter button on the Filter page. Click on the
name of the filter you wish to load, then click the Load button. Click
the Submit button on the Filter page to apply the filter settings to the
Gene Sorter.
Viewing a list of saved filters
To display a list of filter settings that you have saved, click the Save Filter button
on the Filter page. If you have any saved filters, the Gene Sorter
will display an Existing Setups list that shows the filter names. To
permanently remove a filter from the list, click on the name to
highlight it, then click the Delete Existing Setup button.
Displaying sequence and text-based output
The Gene Sorter's graphical presentation of data facilitates the visual
observation of relationships and patterns among the genes in the
display. However, it is often useful to convert the data to a text-based
format that can be easily saved to a file or loaded into another program,
database, or spreadsheet for further analysis. The Gene Sorter provides
a mechanism for saving the current display in a tab-delimited text file or
showing a text-based view of the sequence underlying the current display.
Creating text-based output
To output the current Gene Sorter table as text, click the text button at the
top of the page. The Gene Sorter will display each row of table data
on a separate tab-delimited line.
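Once saved to a file, tab-delimited output of this kind is straightforward to read with Python's csv module. The sample text below is a made-up stand-in for real Gene Sorter output. The column names, the values, and the assumption that the first line is a header row are all hypothetical.

```python
import csv
import io

# Placeholder text standing in for a saved Gene Sorter table.
sample = (
    "name\tposition\tdescription\n"
    "GENE_A\tchr7:1000-2000\tplaceholder description A\n"
    "GENE_B\tchr7:5000-7000\tplaceholder description B\n"
)

# QUOTE_NONE keeps any quote characters in descriptions as literal text.
reader = csv.DictReader(io.StringIO(sample), delimiter="\t", quoting=csv.QUOTE_NONE)
rows = list(reader)

for row in rows:
    print(row["name"], row["position"])
```

To read from a saved file instead, replace io.StringIO(sample) with an open file handle.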
Viewing the underlying sequence
To display the protein, mRNA, or genomic sequence underlying the current
Gene Sorter table, click the sequence button at the top of the
page. On the Get Sequence page, select the desired sequence configuration
settings, then click the Get Sequence button. The Gene Sorter will
display a text-based list of FASTA format records for each gene displayed
in the table. The FASTA records may be cut and pasted into
Blat for further study.
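If you save the FASTA output to a file for analysis outside the browser, a small parser like the one below is usually enough to split it into records. The sample headers and sequences are placeholders, and the exact header layout the Gene Sorter writes is not assumed. The parser relies only on the general FASTA convention that each record begins with a ">" line.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Placeholder records, not real sequences.
sample = ">GENE_A placeholder\nMKT\nAYI\n>GENE_B placeholder\nMSS\n"
records = list(read_fasta(sample))
for header, sequence in records:
    print(header, len(sequence))
```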