The axt alignment files are produced by Blastz,
an alignment tool available from
Webb Miller's lab at Penn State University.
The axtNet and axtChain alignments are produced by
processing the alignment files with additional utilities
written by Jim Kent at UCSC.
Example:
The following segment from an axt file shows the first 2
sets of alignments of the human assembly (the aligning
assembly) to mouse chromosome 19 (the primary assembly).
0 chr19 3001012 3001075 chr11 70568380 70568443 - 3500
TCAGCTCATAAATCACCTCCTGCCACAAGCCTGGCCTGGTCCCAGGAGAGTGTCCAGGCTCAGA
TCTGTTCATAAACCACCTGCCATGACAAGCCTGGCCTGTTCCCAAGACAATGTCCAGGCTCAGA

1 chr19 3008279 3008357 chr11 70573976 70574054 - 3900
CACAATCTTCACATTGAGATCCTGAGTTGCTGATCAGAATGGAAGGCTGAGCTAAGATGAGCGACGAGGCAATGTCACA
CACAGTCTTCACATTGAGGTACCAAGTTGTGGATCAGAATGGAAAGCTAGGCTATGATGAGGGACAGTGCGCTGTCACA
Structure
Each alignment block in an axt file contains three lines:
a summary line and 2 sequence lines.
Blocks are separated from one another by blank lines.
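As a rough illustration of the block structure described above, the following Python sketch groups the lines of an axt file into three-line blocks. The names iter_axt_blocks, _as_block and AxtBlock are hypothetical and not part of any UCSC tool. The sketch assumes that blocks are separated by blank lines, as stated, and that the file contains no other kinds of lines.

```python
from typing import Iterable, Iterator, List, Tuple

# (summary line, primary sequence line, aligning sequence line)
AxtBlock = Tuple[str, str, str]


def _as_block(lines: List[str]) -> AxtBlock:
    """Check that a group of lines forms one three-line axt block."""
    if len(lines) != 3:
        raise ValueError(f"expected 3 lines per axt block, got {len(lines)}")
    return lines[0], lines[1], lines[2]


def iter_axt_blocks(lines: Iterable[str]) -> Iterator[AxtBlock]:
    """Yield one (summary, primary, aligning) tuple per alignment block."""
    current: List[str] = []
    for raw in lines:
        line = raw.strip()
        if line:
            current.append(line)
        elif current:
            # A blank line closes the block collected so far.
            yield _as_block(current)
            current = []
    if current:
        # The last block may not be followed by a blank line.
        yield _as_block(current)
```

Because the function accepts any iterable of strings, it can be given an open file object as well as a list of lines.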
1. Summary line
0 chr19 3001012 3001075 chr11 70568380 70568443 - 3500
The summary line contains chromosomal position and size
information about the alignment. It consists of 9 required
fields:
- Alignment number -- The alignment numbering starts with 0 and
  increments by 1, i.e. the first alignment in a file is numbered 0,
  the next 1, etc.
- Chromosome (primary organism)
- Alignment start (primary organism) -- The first base is numbered 1.
- Alignment end (primary organism) -- The end base is included.
- Chromosome (aligning organism)
- Alignment start (aligning organism)
- Alignment end (aligning organism)
- Strand (aligning organism) -- If the strand value is "-", the values
  of the aligning organism's start and end fields are relative to the
  reverse-complemented coordinates of its chromosome.
- Blastz score -- Different blastz scoring matrices are used for
  different organisms. See the README.txt file in the alignments
  directory for scoring information specific to a pair of alignments.
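The nine fields listed above could be read into a small record as in the following Python sketch. The names AxtSummary, parse_summary and aligning_forward_coords are hypothetical. The sketch assumes that the score is an integer, that "+" is the only other strand value, and that the aligning coordinates use the same 1-based, end-inclusive convention as the primary coordinates. Converting minus-strand coordinates back to the forward strand needs the chromosome size, which is not stored in the axt file and must be supplied by the caller.

```python
from typing import NamedTuple, Tuple


class AxtSummary(NamedTuple):
    number: int
    primary_chrom: str
    primary_start: int   # first base is numbered 1
    primary_end: int     # end base is included
    aligning_chrom: str
    aligning_start: int
    aligning_end: int
    strand: str          # strand of the aligning organism
    score: int           # assumed to be an integer


def parse_summary(line: str) -> AxtSummary:
    """Split a summary line into its 9 whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    num, p_chrom, p_start, p_end, a_chrom, a_start, a_end, strand, score = fields
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand value: {strand!r}")
    return AxtSummary(int(num), p_chrom, int(p_start), int(p_end),
                      a_chrom, int(a_start), int(a_end), strand, int(score))


def aligning_forward_coords(s: AxtSummary, chrom_size: int) -> Tuple[int, int]:
    """Return the aligning start and end on the forward strand.

    For a "-" strand, positions count from the end of the chromosome,
    so 1-based inclusive position p maps to chrom_size - p + 1.
    """
    if s.strand == "+":
        return s.aligning_start, s.aligning_end
    return (chrom_size - s.aligning_end + 1,
            chrom_size - s.aligning_start + 1)


summary = parse_summary("0 chr19 3001012 3001075 chr11 70568380 70568443 - 3500")
# Because the end base is included, the primary span covers end - start + 1 bases.
print(summary.primary_end - summary.primary_start + 1)  # 64
```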
2. & 3. Sequence lines
TCAGCTCATAAATCACCTCCTGCCACAAGCCTGGCCTGGTCCCAGGAGAGTGTCCAGGCTCAGA
TCTGTTCATAAACCACCTGCCATGACAAGCCTGGCCTGTTCCCAAGACAATGTCCAGGCTCAGA
The sequence lines contain the sequence of the primary
assembly (line 2) and aligning assembly (line 3) with
inserts. Repeats are indicated by lower-case letters.
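A minimal Python sketch of how the two sequence lines might be summarized is shown below. The function name sequence_stats is hypothetical. The sketch assumes that inserts appear as "-" characters and that both sequence lines of a block have the same length; the lower-case convention for repeats is taken from the description above.

```python
from typing import Dict


def sequence_stats(primary: str, aligning: str) -> Dict[str, int]:
    """Count columns, matches, bases and repeat bases for one block."""
    if len(primary) != len(aligning):
        raise ValueError("sequence lines of a block must have equal length")
    # A column is a match when both lines hold the same base,
    # ignoring case so that repeat-masked bases still count.
    matches = sum(
        1 for p, a in zip(primary, aligning)
        if p != "-" and p.upper() == a.upper()
    )
    return {
        "columns": len(primary),
        "matches": matches,
        # Bases exclude the "-" characters assumed to mark inserts.
        "primary_bases": sum(1 for c in primary if c != "-"),
        "aligning_bases": sum(1 for c in aligning if c != "-"),
        # Lower-case letters indicate repeats.
        "primary_repeat_bases": sum(1 for c in primary if c.islower()),
    }
```

Under these assumptions, the primary_bases value of a block should equal the primary end minus the primary start plus 1 from its summary line, which gives a simple consistency check.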