Quick Start Guide to Assembly Hubs

Assembly Hubs allow researchers to create Track Data Hubs on assemblies that are not in the UCSC Browser. By including the underlying reference sequence in UCSC twoBit format, as well as data tracks, researchers can browse and annotate any genome. For more information, please refer to the Assembly Hub Wiki. A later section of this guide covers starting an assembly hub on GBiB.

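STEP 1: Copy the example assembly hub for the plant Arabidopsis thaliana into a publicly accessible directory using the following wget command: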

wget -r --no-parent --reject "index.html*" -nH --cut-dirs=3 http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/

If you do not have wget installed, you can use curl to download the files one at a time, for example:

curl -O http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/hub.txt

If you use curl, be sure to recreate the structure with matching araTha1 and araTha1/bbi directories. Double check you have all the files by looking here: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/
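The curl approach can be scripted so that the subdirectories are recreated automatically. The following is a minimal sketch, not part of the original guide: fetch_hub_files and DRY_RUN are hypothetical names, and the list of relative paths is something you supply after looking at the directory listing above.

```shell
#!/bin/sh
# Sketch: fetch hub files one at a time with curl, recreating subdirectories.
# BASE_URL is the example hub location used in this guide.
BASE_URL="http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1"

# fetch_hub_files REL_PATH...   (hypothetical helper)
# Downloads each relative path below BASE_URL into the same relative path
# locally. With DRY_RUN=1 the curl commands are printed instead of executed.
fetch_hub_files() {
    for rel in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "curl -f --create-dirs -o $rel $BASE_URL/$rel"
        else
            # -f fails on HTTP errors instead of saving an error page;
            # --create-dirs makes any missing local directories named in -o.
            curl -f --create-dirs -o "$rel" "$BASE_URL/$rel" || return 1
        fi
    done
}
```

For example, `fetch_hub_files hub.txt genomes.txt araTha1/araTha1.2bit` would fetch three of the files named in this guide. The remaining files, including those under araTha1/bbi, still need to be listed by hand from the directory listing.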

STEP 2: Paste your hub.txt link (http://yourURL/hub.txt) into the My Hubs tab of the Track Data Hubs page, click the "Add Hub" button, and then click the "Genome Browser" link from the top bar.

http://genome.ucsc.edu/cgi-bin/hgHubConnect?hgHub_do_redirect=on&hgHubConnect.remakeTrackHub=on&hgHub_do_firstDb=1&hubUrl=http://yourURL/hub.txt

The following URL, which loads the original example files you just copied, should behave the same way:
http://genome.ucsc.edu/cgi-bin/hgHubConnect?hgHub_do_redirect=on&hgHubConnect.remakeTrackHub=on&hgHub_do_firstDb=1&hubUrl=http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/hub.txt
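Both links above follow the same pattern: a fixed hgHubConnect prefix followed by the location of hub.txt. As a small sketch (hub_connect_url is a hypothetical helper name, not a UCSC tool), the link for any hub.txt can be printed like this:

```shell
#!/bin/sh
# hub_connect_url HUB_TXT_URL   (hypothetical helper)
# Prints the hgHubConnect link used in this guide for the given hub.txt URL.
hub_connect_url() {
    printf '%s%s\n' \
        'http://genome.ucsc.edu/cgi-bin/hgHubConnect?hgHub_do_redirect=on&hgHubConnect.remakeTrackHub=on&hgHub_do_firstDb=1&hubUrl=' \
        "$1"
}
```

Calling `hub_connect_url http://yourURL/hub.txt` prints the first link shown in STEP 2.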

STEP 3: Congratulations! Your assembly hub should display!

If you are having problems, be sure all your files and directories are publicly accessible. You may also wish to reset the browser occasionally to clear all existing data. For hubs to work, your server must also accept byte-ranges. To check, run the following command and verify that "Accept-Ranges: bytes" appears in the output: curl -I http://yourURL/hub.txt
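The byte-range check can also be scripted rather than read by eye. This is a minimal sketch: has_byte_ranges is a hypothetical helper that reads the response headers on standard input, so it can be fed by curl -I or by a saved header dump.

```shell
#!/bin/sh
# has_byte_ranges   (hypothetical helper)
# Reads HTTP response headers on stdin and succeeds only if an
# "Accept-Ranges: bytes" header is present.
has_byte_ranges() {
    # Header names are case-insensitive, so match with -i.
    grep -qi '^accept-ranges:[[:space:]]*bytes'
}
```

A possible use is `curl -sI http://yourURL/hub.txt | has_byte_ranges && echo "byte ranges accepted"`.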

Now that you have the assembly hub copied from above, you can copy the directory and start to edit some of the documents like genomes.txt, groups.txt, and trackDb.txt to understand how they work. Refer to the Assembly Hub Wiki to understand how to build a twoBit file for your own original fasta files. Read more about trackDb settings in the definition document.

Please note that the Browser waits 300 seconds before checking for any changes to these files. When editing hub.txt, genomes.txt, trackDb.txt, and related hub files, you can shorten this delay by adding udcTimeout=1 to your URL. For more information, please see the Debugging and Updating Track Hubs section of the Track Hub User Guide. For more detailed instructions on setting up a regular hub, please see the Setting Up Your Own Track Hub section of the Track Hub User Guide.
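Whether udcTimeout is joined to the URL with "?" or "&" depends on whether the URL already has a query string. A small sketch that handles both cases (add_udc_timeout is a hypothetical helper name; the default of 1 second is the value mentioned above):

```shell
#!/bin/sh
# add_udc_timeout URL [SECONDS]   (hypothetical helper)
# Appends udcTimeout=SECONDS to URL, using "&" if the URL already has a
# query string and "?" otherwise. SECONDS defaults to 1.
add_udc_timeout() {
    url=$1
    secs=${2:-1}
    case $url in
        *\?*) printf '%s&udcTimeout=%s\n' "$url" "$secs" ;;
        *)    printf '%s?udcTimeout=%s\n' "$url" "$secs" ;;
    esac
}
```

Remember to drop the parameter again once you have finished editing, so that the browser returns to its normal 300-second delay.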

Setting up Blat for an Assembly Hub

Starting Blat for an Assembly Hub
By running gfServers from your institution, you can enable blat on your assembly hubs.

Setting up an Assembly Hub on GBiB with Blat included

Starting a Blat enabled Assembly Hub on GBiB
With an operational Genome Browser in a Box (GBiB), you can quickly and easily acquire an example assembly hub and run gfServers locally on the GBiB to enable blat.

Starting Blat for an Assembly Hub

From the directory containing your yourAssembly.2bit file (served at http://yourURL/yourAssembly/yourAssembly.2bit), you can start two gfServers, specifying a port for each so the assembly hub can reach them. In this example, port 17777 with -trans serves amino acid (translated) searches and port 17779 serves DNA searches:

gfServer start localhost 17777 -trans -mask yourAssembly.2bit &
gfServer start localhost 17779 -stepSize=5 yourAssembly.2bit &

Then you can edit the genomes.txt file of your assembly hub to include two lines in the stanza referring to yourAssembly, that would have matching port numbers:

transBlat yourLab.yourInstitution.edu 17777
blat yourLab.yourInstitution.edu 17779
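Because the ports in genomes.txt must match the ports the gfServers were started on, it can help to print the blat settings a hub currently declares. The following is a minimal sketch; list_blat_ports is a hypothetical helper, and it assumes each setting sits on its own line as keyword, host, and port separated by whitespace.

```shell
#!/bin/sh
# list_blat_ports GENOMES_TXT   (hypothetical helper)
# Prints "keyword host port" for every active blat or transBlat line.
# Commented-out lines are skipped because their first field starts with "#".
list_blat_ports() {
    awk '$1 == "blat" || $1 == "transBlat" { print $1, $2, $3 }' "$1"
}
```

Running `list_blat_ports genomes.txt` and comparing the output with your gfServer start commands is a quick way to spot a mismatched port or a lower-case "transblat".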

The example hub's genomes.txt contains these lines commented out; please note the capital B in transBlat. For more information, see the "Adding BLAT servers" section of the Assembly Hub Wiki. The Source Downloads page offers access to pre-compiled binaries of utilities such as gfServer, found in a blat/ directory for your machine type, along with further blat documentation. Please note that the -mask option in the 17777 -trans gfServer command above will mask all lower-case sequence from being matched, so you may not wish to include it. See the blat documentation and the gfServer usage statement for more information.

If your blat servers have trouble connecting to the browser, note that some institutions have firewalls that prevent the browser from sending multiple inquiries. In that case you may need to ask your admins to add this IP range as an exception that is not limited: 128.114.119.* That range covers the U.S. genome.ucsc.edu site. If you also want requests to work from our European mirror, genome-euro.ucsc.edu, add 129.70.40.120 to the exception list as well.

Starting a Blat enabled Assembly Hub on GBiB

Acquiring GBiB and Assembly Hub

Install GBiB

1. Download and install GBiB by following the instructions at:

http://genome.ucsc.edu/goldenPath/help/gbib.html

You may wish to read this blog post and the related OpenHelix blog post.

SSH into your GBiB

2. With your GBiB operational, use your computer's terminal program to ssh into your GBiB: ssh browser@localhost -p 1235, using browser for the password.

Wget example hub

3. Navigate to the GBiB's folders directory and then use sudo to wget this assembly hub:

cd /folders

sudo wget -r --no-parent --reject "index.html*" -nH --cut-dirs=3 http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/

Load hub on your GBiB

4. Load the hub on your GBiB using this URL:

http://127.0.0.1:1234/cgi-bin/hgGateway?genome=araTha1&hubUrl=http://127.0.0.1:1234/folders/hubExamples/hubAssembly/plantAraTha1/hub.txt

Enabling Blat

Acquire gfServer utility to create local blat servers

5. To enable blat you must acquire the gfServer utility. The UCSC Genome Browser and Blat software are free for academic, nonprofit, and personal use. Commercial download and installation of the Blat and In-Silico PCR software may be licensed through Kent Informatics (http://www.kentinformatics.com).

You can obtain just the gfServer utility on your GBiB with the following command that will create a bin directory and install the tool.

mkdir -p ~/bin && rsync -avP hgdownload.cse.ucsc.edu::genome/admin/exe/linux.x86_64/blat/gfServer ~/bin/

The GBiB also includes a tool, gbibAddTools, that you can run on the command line to download an entire suite of tools including gfServer.

Enable blat lines in genomes.txt

6. Navigate to the genomes.txt file of this assembly hub: cd /folders/hubExamples/hubAssembly/plantAraTha1/

Edit the currently commented-out blat lines with sudo vi genomes.txt: press "x" with the cursor over the # at the start of each line to remove it, then type :w! to save the changes and :q to quit. The two lines should then read:

blat localhost 17779
transBlat localhost 17777
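If you would rather not edit in vi, the same change can be sketched with sed. This is an illustration only: uncomment_blat_lines is a hypothetical helper, and it assumes the commented-out lines begin with "#" (optionally followed by spaces) and then the keyword blat or transBlat. An ordinary comment whose first word happens to be "blat" would also be uncommented, so review the result.

```shell
#!/bin/sh
# uncomment_blat_lines   (hypothetical helper)
# Reads genomes.txt content on stdin and writes it to stdout with the
# leading "#" removed from lines that start with "#blat " or "#transBlat "
# (whitespace after the "#" is allowed). All other lines pass through.
uncomment_blat_lines() {
    sed -E 's/^#[[:space:]]*((blat|transBlat)[[:space:]])/\1/'
}
```

One possible way to apply it on the GBiB, keeping a backup copy first, is `sudo cp genomes.txt genomes.txt.bak` followed by `uncomment_blat_lines < genomes.txt.bak | sudo tee genomes.txt > /dev/null`.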

Please note that if you loaded your hub earlier, it will take five minutes (300 seconds) for the browser to check for any changes to genomes.txt, and that this delay can be shortened temporarily by adding &udcTimeout=10 to the URL. See more information in the Debugging and Updating section of the Track Hub User Guide.

Start gfServers on your assembly hub

7. Change directories to the 2bit file: cd /folders/hubExamples/hubAssembly/plantAraTha1/araTha1

Run the two gfServer commands to start the blat servers:

gfServer start localhost 17777 -trans -mask araTha1.2bit &
gfServer start localhost 17779 -stepSize=5 araTha1.2bit &
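Since the servers run in the background, it is worth confirming they came up before loading the hub. The sketch below uses the gfServer status command; check_blat_servers and the GFSERVER variable are hypothetical names, and the script assumes that gfServer status exits with a non-zero code when it cannot reach a server.

```shell
#!/bin/sh
# GFSERVER names the gfServer executable; override it (for example with
# ~/bin/gfServer) if gfServer is not on your PATH.
GFSERVER=${GFSERVER:-gfServer}

# check_blat_servers HOST PORT...   (hypothetical helper)
# Runs "gfServer status HOST PORT" for each port and reports the outcome.
# Returns non-zero if any server did not answer.
check_blat_servers() {
    host=$1
    shift
    failed=0
    for port in "$@"; do
        if "$GFSERVER" status "$host" "$port" >/dev/null 2>&1; then
            echo "port $port: running"
        else
            echo "port $port: not responding"
            failed=1
        fi
    done
    return "$failed"
}
```

For the two servers in this example, the call would be `check_blat_servers localhost 17777 17779`. Large genomes can take a while to index, so a server may report as not responding for a short time after it is started.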

Load hub and use blat

8. Load this plant assembly hub by using this URL and selecting it under the "group" category where "Plant araTha1" displays:

http://127.0.0.1:1234/cgi-bin/hgGateway?genome=araTha1&hubUrl=http://127.0.0.1:1234/folders/hubExamples/hubAssembly/plantAraTha1/hub.txt

On the blat page, http://127.0.0.1:1234/cgi-bin/hgBlat, you can now select the Arabidopsis thaliana assembly and blat plant amino acid sequences, like IYQTRENKYIIGEIQITESERDRRRSSLPGNH, or DNA sequences, like TAAGTAAAAAATAATATGATTAAGACTAATAAATCTTAATAGTTAATACT.