The bigChain format describes a pairwise alignment that allows gaps in both sequences
simultaneously, just as Chain files do, but
bigChain files are compressed and indexed as bigBeds. bigChain files are created using
the program bedToBigBed with a special AutoSQL file that defines the fields
of the bigChain. The resulting bigChain files are in an indexed binary format. The main
advantage of bigChain files is that only the portions of the file needed to display a
particular region are transferred to UCSC, so for large data sets bigChain is
considerably faster than regular Chain files. The bigChain file remains on
your web-accessible server (http, https, or ftp), not on the UCSC server.
Only the portion that is needed
for the chromosomal position you are currently viewing is locally cached as a
"sparse file".
Big Chain
The following AutoSQL definition is used for bigChain pairwise alignment files.
It is the bigChain.as
file passed to bedToBigBed with the -as option.
table bigChain
"bigChain pairwise alignment"
(
string chrom; "Reference sequence chromosome or scaffold"
uint chromStart; "Start position in chromosome"
uint chromEnd; "End position in chromosome"
string name; "Name or ID of item, ideally both human readable and unique"
uint score; "Score (0-1000)"
char[1] strand; "+ or - for strand"
uint tSize; "size of target sequence"
string qName; "name of query sequence"
uint qSize; "size of query sequence"
uint qStart; "start of alignment on query sequence"
uint qEnd; "end of alignment on query sequence"
uint chainScore; "score from chain"
)
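Before running bedToBigBed, it can help to confirm that each input row matches this 12-field layout. The following is a minimal sketch, not part of the UCSC utilities: the function name check_bigchain_rows and the sample rows are hypothetical, and the checks (12 tab-separated fields, chromStart below chromEnd, strand of + or -) are only those implied by the definition above.

```shell
# Hypothetical helper: report rows that do not match the bigChain.as layout.
# Reads tab-separated rows on stdin and prints one message per bad row.
check_bigchain_rows() {
  awk -F'\t' '
    NF != 12               { print "line " NR ": expected 12 fields, found " NF; next }
    $2 + 0 >= $3 + 0       { print "line " NR ": chromStart is not below chromEnd"; next }
    $6 != "+" && $6 != "-" { print "line " NR ": strand must be + or -" }
  '
}

# Made-up sample input: the first row is valid, the second lacks the chainScore field.
sample_rows=$(printf '%b\n' \
  'chr22_KI270731v1_random\t100\t900\t7\t1000\t+\t150754\tchr5\t151834684\t2000\t2800\t4500' \
  'chr22_KI270731v1_random\t950\t990\t8\t1000\t-\t150754\tchr5\t151834684\t3000\t3040')

printf '%s\n' "$sample_rows" | check_bigchain_rows
# prints: line 2: expected 12 fields, found 11
```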
Note that the bedToBigBed utility uses a substantial amount of
memory: roughly 1.25 times the size of the
uncompressed BED input file.
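As a rough planning aid, the rule of thumb can be turned into a quick estimate. This sketch reads the note as "about 1.25 times the input size in bytes"; the helper name estimate_ram_bytes is hypothetical, and actual memory use may differ.

```shell
# Hypothetical helper: estimate bedToBigBed memory use from the input file size.
# $1 = path to the uncompressed BED-style input file.
estimate_ram_bytes() {
  bytes=$(wc -c < "$1")
  # 1.25 = 5/4, kept in integer arithmetic
  echo $(( bytes * 5 / 4 ))
}

# Demonstration on a throwaway 1000-byte file.
tmp_file=$(mktemp)
head -c 1000 /dev/zero > "$tmp_file"
estimate_ram_bytes "$tmp_file"   # prints 1250
rm -f "$tmp_file"
```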
To create a bigChain track, follow these steps:
- If you already have a Chain file you would like to convert to a bigChain, skip to Step 2,
otherwise download the example
Chain file for the Human
GRCh38(hg38) assembly.
- Download the AutoSQL files needed by bedToBigBed: bigChain.as and bigLink.as.
- Download the bedToBigBed and hgLoadChain programs from the
directory
of binary utilities.
- Use the fetchChromSizes script from the same
directory
to create a chrom.sizes file for the UCSC database you are working with
(e.g. hg38). Alternatively, you can download the chrom.sizes file for
any assembly hosted at UCSC from our
downloads page (click on "Full data set" for any assembly). For example, for the hg38
database, the hg38.chrom.sizes are located at
http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes.
- Generate the chain.tab and link.tab files needed to create our bigChain
file with the hgLoadChain utility:
hgLoadChain -noBin -test hg38 bigChain chr22_KI270731v1_random.hg38.mm10.rbest.chain
- Create the bigChain file from your input file using a combination of sed,
awk and the bedToBigBed utility like so:
sed 's/\.000000//' chain.tab | awk 'BEGIN {OFS="\t"} {print $2, $4, $5, $11, 1000, $8, $3, $6, $7, $9, $10, $1}' > chr22_KI270731v1_random.hg38.mm10.rbest.bigChain
bedToBigBed -type=bed6+6 -as=bigChain.as -tab chr22_KI270731v1_random.hg38.mm10.rbest.bigChain hg38.chrom.sizes bigChain.bb
- You must also create a binary indexed link file along with your bigChain file in order to display
your data in the browser. Use the following commands to generate a link file:
awk 'BEGIN {OFS="\t"} {print $1, $2, $3, $5, $4}' link.tab | sort -k1,1 -k2,2n > bigChain.bigLink
bedToBigBed -type=bed4+1 -as=bigLink.as -tab bigChain.bigLink hg38.chrom.sizes bigChain.link.bb
- Move the newly created bigChain and bigLink files (bigChain.bb and
bigChain.link.bb) to an http, https, or ftp location.
- Construct a custom track using a single
track line.
Note that any of the track attributes listed
here are applicable
to tracks of type bigBed.
The most basic version of the "track" line will look something
like this:
track type=bigChain name="My Big Chain" bigDataUrl=http://myorg.edu/mylab/bigChain.bb linkDataUrl=http://myorg.edu/mylab/bigChain.link.bb
- Paste this custom track line into the text box on the
custom track management page.
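The column reordering in the sed and awk steps above is easier to follow on a single row. The sketch below applies the same reordering to one made-up chain.tab row and two made-up link.tab rows. The column orders named in the comments are inferred from the awk commands and the AutoSQL definition rather than taken from hgLoadChain documentation, and the function names are hypothetical. The sed pattern escapes the dot so that only a literal ".000000" is removed from the score.

```shell
# Assumed chain.tab columns (hgLoadChain -noBin):
#   1 score, 2 tName, 3 tSize, 4 tStart, 5 tEnd, 6 qName,
#   7 qSize, 8 qStrand, 9 qStart, 10 qEnd, 11 id
chain_to_bigchain() {
  sed 's/\.000000//' |
    awk 'BEGIN {OFS="\t"} {print $2, $4, $5, $11, 1000, $8, $3, $6, $7, $9, $10, $1}'
}

# Assumed link.tab columns: 1 tName, 2 tStart, 3 tEnd, 4 qStart, 5 chainId
link_to_biglink() {
  awk 'BEGIN {OFS="\t"} {print $1, $2, $3, $5, $4}' | sort -k1,1 -k2,2n
}

# Made-up rows for illustration only.
printf '4500.000000\tchr22_KI270731v1_random\t150754\t100\t900\tchr5\t151834684\t+\t2000\t2800\t7\n' |
  chain_to_bigchain
printf 'chr22_KI270731v1_random\t500\t900\t2400\t7\nchr22_KI270731v1_random\t100\t400\t2000\t7\n' |
  link_to_biglink
```

The first command prints one 12-field row in bigChain.as order, with the chain id as the name and the chain score last. The second prints the two link rows sorted by start position, with the chain id moved ahead of qStart.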
The bedToBigBed program can also be run with several additional options.
Run bedToBigBed with no arguments to view a full list of available options.
Example One
In this example, you will use an existing bigChain file to create a bigChain
custom track. A bigChain file that contains data on the hg38
assembly has been placed on our http server.
You can create a custom track using this bigChain file by constructing a
"track" line that references this file like so:
track type=bigChain name="bigChain Example One" description="A bigChain file" bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.bb linkDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.link.bb
Paste the above "track" line into the
custom track management page for the
human assembly hg38 (Dec. 2013), then press the submit button.
Custom tracks can also be loaded via one URL line. The link below loads the same
bigChain track, but includes parameters on the URL line:
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr22_KI270731v1_random&hgct_customText=track%20type=bigChain%20name=Example%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.bb%20linkDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.link.bb%20visibility=pack
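A URL of this form can be assembled from a track line with a few lines of shell. This is a sketch under one assumption: only spaces need encoding (as %20), which is all the example URL above requires. A track line containing other special characters would need fuller percent-encoding. The variable names are illustrative.

```shell
# Track line to embed, using the same attributes as the example URL above.
track_line='track type=bigChain name=Example bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.bb linkDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.link.bb visibility=pack'

# Encode spaces only; see the assumption stated in the lead-in.
encoded=$(printf '%s' "$track_line" | sed 's/ /%20/g')

# db and position select the assembly and the region to display.
url="http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr22_KI270731v1_random&hgct_customText=${encoded}"
printf '%s\n' "$url"
```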
With this example bigChain loaded, click into a chain from the track. Note
that the details page displays information about the individual chains,
similar to a standard chain track.
Example Two
In this example, you will create your own bigChain file from an existing
Chain input file.
- Save this
Chain file to your machine
(Step 1 above).
- Save these bigChain.as and
bigLink.as files to your machine
(Step 2).
- Download the bedToBigBed and hgLoadChain
utilities
(Step 3).
- Save this hg38.chrom.sizes text file to your machine.
It contains the chrom.sizes for the human (hg38) assembly
(Step 4).
- Run the utilities to create the bigChain and bigLink output files
(Steps 5-7).
- Place the bigChain and bigLink files you just created (bigChain.bb and
bigChain.link.bb) on a web-accessible server (Step 8).
- Construct a "track" line that points to your bigChain and bigLink files
(see Step 9).
- Create the custom track on the human assembly hg38 (Dec. 2013), and
view it in the Genome Browser (see Step 10).
Sharing Your Data with Others
If you would like to share your bigChain data track with a colleague, learn
how to create a URL by looking at Example 11 on
this page.
Extracting Data from the bigChain Format
Because bigChain files are an extension of bigBed files, which are indexed binary files,
extracting data from them can be difficult.
We have developed the following
programs to help, all of which are available from the
directory of binary
utilities.
- bigBedToBed — this program converts a bigBed file
to ASCII BED format.
- bigBedSummary — this program extracts summary information
from a bigBed file.
- bigBedInfo — this program prints out information about a
bigBed file.
As with all UCSC Genome Browser programs, simply type the program name
at the command line with no parameters to see the usage statement.
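Once a bigChain file has been converted back to text, ordinary text tools apply. The sketch below assumes the converted rows keep the 12-field bigChain.as layout and summarizes them per query sequence. The rows are made up so the example runs without the UCSC utilities, and summarize_by_query is a hypothetical helper name.

```shell
# With the UCSC utility installed, the text rows would come from a command such as:
#   bigBedToBed bigChain.bb bigChain.bed
# Made-up rows stand in for that output here.
bigchain_rows=$(printf '%b\n' \
  'chr22_KI270731v1_random\t100\t900\t7\t1000\t+\t150754\tchr5\t151834684\t2000\t2800\t4500' \
  'chr22_KI270731v1_random\t950\t990\t8\t1000\t-\t150754\tchr5\t151834684\t3000\t3040\t300' \
  'chr22_KI270731v1_random\t1200\t1500\t9\t1000\t+\t150754\tchr11\t122082543\t500\t800\t2100')

# Count chains per query sequence (field 8, qName) and sum the
# reference bases each one spans (chromEnd - chromStart).
summarize_by_query() {
  awk -F'\t' 'BEGIN {OFS="\t"}
    { n[$8]++; span[$8] += $3 - $2 }
    END { for (q in n) print q, n[q], span[q] }' | sort -k1,1
}

printf '%s\n' "$bigchain_rows" | summarize_by_query
```

For the sample rows this reports one chain spanning 300 bases for chr11 and two chains spanning 840 bases in total for chr5.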
Troubleshooting
If you encounter an error when you run the bedToBigBed program,
it may be because your input bigChain file has data off the end of a chromosome.
In this case, run the bedClip program on the input file
before running the
bedToBigBed program. It will remove the row(s) in your input BED
file that are off the end of a chromosome.
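To see which rows are affected before clipping, the input can be compared against the chrom.sizes file. This is a minimal sketch that does not use bedClip: rows_off_chrom_end is a hypothetical helper, and the file contents are made up for the demonstration.

```shell
# Hypothetical helper: list input rows whose chromEnd exceeds the chromosome size.
# $1 = chrom.sizes file (name<TAB>size), $2 = tab-separated BED-style input file.
rows_off_chrom_end() {
  awk -F'\t' 'NR == FNR { size[$1] = $2; next }
              ($1 in size) && $3 + 0 > size[$1] + 0' "$1" "$2"
}

# Made-up demonstration files.
work_dir=$(mktemp -d)
printf 'chrTest\t1000\n' > "$work_dir/chrom.sizes"
printf 'chrTest\t100\t900\titemA\nchrTest\t950\t1200\titemB\n' > "$work_dir/input.bed"
rows_off_chrom_end "$work_dir/chrom.sizes" "$work_dir/input.bed"   # prints the itemB row
rm -rf "$work_dir"
```

Rows reported here are the ones bedClip would be expected to drop. Run bedClip with no arguments to confirm its usage statement before relying on it.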