Schema for Assembly - Assembly from Fragments
  Database: hg38    Primary Table: gold    Row Count: 45,830   Data last updated: 2019-06-25
field       example     SQL type          info
bin         585         smallint(6)       range
chrom       chr1        varchar(255)      values
chromStart  10000       int(10) unsigned  range
chromEnd    10615       int(10) unsigned  range
ix          2           int(11)           range
type        F           char(1)           values
frag        AP006221.1  varchar(255)      values
fragStart   36116       int(10) unsigned  range
fragEnd     36731       int(10) unsigned  range
strand      -           char(1)           values
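
As a rough illustration of how these fields fit together, one row of the table could be modelled in Python as shown below. The names `GoldRow` and `parse_gold_line` are hypothetical, and the sketch assumes a tab-separated dump with the ten fields in the order listed in the schema.

```python
from typing import NamedTuple


class GoldRow(NamedTuple):
    """One row of the gold table (hypothetical container; field order follows the schema)."""
    bin: int
    chrom: str
    chromStart: int
    chromEnd: int
    ix: int
    type: str
    frag: str
    fragStart: int
    fragEnd: int
    strand: str


def parse_gold_line(line: str) -> GoldRow:
    """Parse one tab-separated line; assumes exactly ten fields."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 10:
        raise ValueError(f"expected 10 fields, got {len(f)}")
    return GoldRow(int(f[0]), f[1], int(f[2]), int(f[3]), int(f[4]),
                   f[5], f[6], int(f[7]), int(f[8]), f[9])


# The example values from the schema above, joined with tabs.
row = parse_gold_line("585\tchr1\t10000\t10615\t2\tF\tAP006221.1\t36116\t36731\t-")
print(row.chrom, row.chromStart, row.chromEnd, row.frag)
```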

Sample Rows
 
bin  chrom  chromStart  chromEnd  ix  type  frag         fragStart  fragEnd  strand
585  chr1   10000       10615     2   F     AP006221.1   36116      36731    -
73   chr1   10615       177417    3   F     AL627309.15  102        166904   +
586  chr1   177417      207666    4   F     FO538757.3   2000       32249    +
73   chr1   257666      297968    6   F     AP006222.1   0          40302    +
73   chr1   347968      501617    8   F     AL732372.15  0          153649   -
73   chr1   501617      535988    9   P     FO681485.2   0          34371    -
73   chr1   585988      697537    11  F     AC114498.2   0          111549   +
73   chr1   697537      835050    12  F     AL669831.13  0          137513   +
591  chr1   835050      835333    13  O     KF495845.1   0          283      -
591  chr1   835333      871977    14  F     AL669831.13  137796     174440   +
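
The bin values in the sample rows are consistent with the standard UCSC binning scheme, which assigns each feature to the smallest bin that fully contains it. This page does not describe the scheme, so the sketch below is an assumption: the function name `bin_from_range`, the offsets, and the shift sizes are taken from the commonly documented scheme, and coordinates are treated as 0-based with an exclusive end.

```python
# Offsets of the first bin at each level, from the finest (128 kb) to the coarsest.
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
FIRST_SHIFT = 17   # 2**17 = 128 kb bins at the finest level
NEXT_SHIFT = 3     # each coarser level is 8 times larger


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin containing the 0-based, half-open range [start, end)."""
    start_bin = start >> FIRST_SHIFT
    end_bin = (end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    raise ValueError(f"range {start}-{end} is outside the standard binning scheme")


# (bin, chromStart, chromEnd) taken from three of the sample rows.
samples = [(585, 10000, 10615), (73, 10615, 177417), (591, 835050, 835333)]
for expected, start, end in samples:
    print(expected, bin_from_range(start, end))
```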

Note: all start coordinates in our database are 0-based, not 1-based. See explanation here.
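
A minimal sketch of what the 0-based convention means in practice, using the first sample row. It assumes that end coordinates are exclusive (half-open intervals), which the note above does not state but which matches the sample rows, where one contig's chromEnd equals the next contig's chromStart. The function names are hypothetical.

```python
from typing import Tuple


def to_one_based(start0: int, end0: int) -> Tuple[int, int]:
    """Convert a 0-based, half-open interval to a 1-based, inclusive one."""
    return start0 + 1, end0


def span_length(start0: int, end0: int) -> int:
    """Length of a 0-based, half-open interval."""
    return end0 - start0


# First sample row: chromStart=10000, chromEnd=10615, fragStart=36116, fragEnd=36731.
print(to_one_based(10000, 10615))   # position as shown in 1-based displays
print(span_length(10000, 10615))    # length on the chromosome
print(span_length(36116, 36731))    # length on the fragment
```

Under this convention the chromosome span and the fragment span of a row have the same length.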

Assembly (gold) Track Description
 

Description

This track shows the contigs used to construct the GRCh38 (hg38) genome assembly, as defined in the AGP file delivered with the sequence. For information on the AGP file format, see the NCBI AGP Specification. The NCBI website also provides an overview of genome assembly procedures, as well as specific information about the hg38 assembly.

In dense mode, this track depicts the contigs that make up the currently viewed scaffold. Contig boundaries are distinguished by the use of alternating gold and brown coloration. Where gaps exist between contigs, spaces are shown between the gold and brown blocks. The relative order and orientation of the contigs within a scaffold are always known; therefore, a line is drawn in the graphical display to bridge the blocks.
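
The gaps between contigs can also be recovered from the table itself: a gap exists wherever one contig's chromEnd is smaller than the next contig's chromStart. The sketch below illustrates this on the sample rows. The function name `find_gaps` is hypothetical, and the input is assumed to be sorted by chromStart within each chromosome.

```python
from typing import List, Tuple

Contig = Tuple[str, int, int]  # (chrom, chromStart, chromEnd)


def find_gaps(rows: List[Contig]) -> List[Contig]:
    """Return (chrom, gapStart, gapEnd) for every space between adjacent contigs."""
    gaps = []
    for (chrom_a, _, end_a), (chrom_b, start_b, _) in zip(rows, rows[1:]):
        if chrom_a == chrom_b and start_b > end_a:
            gaps.append((chrom_a, end_a, start_b))
    return gaps


# chrom, chromStart and chromEnd from the sample rows.
sample = [
    ("chr1", 10000, 10615), ("chr1", 10615, 177417), ("chr1", 177417, 207666),
    ("chr1", 257666, 297968), ("chr1", 347968, 501617), ("chr1", 501617, 535988),
    ("chr1", 585988, 697537), ("chr1", 697537, 835050), ("chr1", 835050, 835333),
    ("chr1", 835333, 871977),
]
for chrom, start, end in find_gaps(sample):
    print(chrom, start, end, end - start)
```

In the sample rows, the positions where this finds a gap coincide with the skipped ix values (5, 7 and 10).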

Component types found in this track (with the count of each type in parentheses):

  • F - finished sequence (35,798)
  • O - other sequence (8,536)
  • W - whole genome shotgun (764)
  • P - pre draft (16)
  • D - draft sequence (8)
  • A - active finishing (8)
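
For scripts that read the type column, the codes above can be kept in a small lookup table. The sketch below tallies the types of the ten sample rows; the names `COMPONENT_TYPES` and `sample_types` are hypothetical.

```python
from collections import Counter

# Component type codes as listed above.
COMPONENT_TYPES = {
    "F": "finished sequence",
    "O": "other sequence",
    "W": "whole genome shotgun",
    "P": "pre draft",
    "D": "draft sequence",
    "A": "active finishing",
}

# The type column of the ten sample rows, in order.
sample_types = ["F", "F", "F", "F", "F", "P", "F", "F", "O", "F"]

tally = Counter(sample_types)
for code, count in tally.most_common():
    print(f"{code} - {COMPONENT_TYPES[code]}: {count}")
```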

In addition to the standard nucleotide codes, the raw sequence files from NCBI also include IUPAC ambiguity codes for bases that could not be positively identified as A, C, G or T (see Wikipedia's IUPAC notation article for more information). As part of the UCSC assembly creation process, all IUPAC ambiguity characters are converted to Ns. The FASTA files available for download from UCSC reflect this. The raw data files containing the original IUPAC characters can be downloaded from the NCBI FTP site.
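
The conversion described above can be sketched as a simple substitution. This is an illustration only, not the code used in the UCSC assembly process; the function name `mask_ambiguity` is hypothetical, and preserving letter case is an assumption.

```python
import re

# IUPAC nucleotide ambiguity codes other than N itself.
_AMBIGUITY = re.compile(r"[RYSWKMBDHV]", re.IGNORECASE)


def mask_ambiguity(seq: str) -> str:
    """Replace every IUPAC ambiguity code with N, keeping the original letter case."""
    return _AMBIGUITY.sub(lambda m: "N" if m.group().isupper() else "n", seq)


print(mask_ambiguity("ACGTRYKMACGT"))
```

Plain A, C, G, T and existing N characters pass through unchanged.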

The following table lists the counts by chromosome of the various IUPAC ambiguity characters in the original NCBI data files:

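Totals by chromosome:

chromosome  1  2  3  6  7  9  10  12  13  16  17  21  22  X  Y  Total
Total       2  9  7  1  4  3  36   3   3   1  12   3   5  5  5     99

Counts by code (nonzero cells only, in chromosome order), with the total for each code:

code  nonzero counts               Total
B     1 1                              2
K     1 4 1 2                          8
M     1 1 3 1 2                        8
R     1 1 1 1 1 13 1 3 1 2 1 1        27
S     1 1 1 1 1                        5
W     2 2 6 1 1 1 1                   14
Y     4 3 1 2 2 8 2 2 5 2 2 2         35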
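
As a quick consistency check, the per-code totals and the per-chromosome totals in the table should add up to the same grand total. The sketch below copies both sets of totals from the table; the variable names are hypothetical.

```python
# Row totals (last value of each code row in the table).
code_totals = {"B": 2, "K": 8, "M": 8, "R": 27, "S": 5, "W": 14, "Y": 35}

# Column totals (the "Total" row of the table), keyed by chromosome.
chrom_totals = {
    "1": 2, "2": 9, "3": 7, "6": 1, "7": 4, "9": 3, "10": 36, "12": 3,
    "13": 3, "16": 1, "17": 12, "21": 3, "22": 5, "X": 5, "Y": 5,
}

grand_total = sum(code_totals.values())
print(grand_total, sum(chrom_totals.values()))

# Chromosome with the most ambiguity characters.
top_chrom = max(chrom_totals, key=chrom_totals.get)
print(top_chrom, chrom_totals[top_chrom])
```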