Schema for Assembly - Assembly from Fragments

field       example      SQL type           info
bin         585          smallint(6)        range
chrom       chr1         varchar(255)       values
chromStart  10000        int(10) unsigned   range
chromEnd    10615        int(10) unsigned   range
ix                       int(11)            range
type                     char(1)            values
frag        AP006221.1   varchar(255)       values
fragStart   36116        int(10) unsigned   range
fragEnd     36731        int(10) unsigned   range
strand                   char(1)            values
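The bin column is an index value that lets range queries skip most of the table. The sketch below follows the standard UCSC binning scheme (smallest bins of 128 kb, each level eight times larger); the function name `bin_from_range` is illustrative, and the assumption that this table uses the standard scheme rather than the extended one is mine, although it agrees with the bin values shown for the sample coordinates.

```python
# Constants of the standard UCSC binning scheme.
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17  # smallest bins cover 2**17 = 128 kb
BIN_NEXT_SHIFT = 3    # each coarser level is 8 times larger


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin that fully contains the 0-based, half-open range [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        # The range straddles a bin boundary at this level; try the next coarser level.
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"range {start}-{end} is out of bounds for the standard scheme (max 512 Mb)")


print(bin_from_range(10000, 10615))    # chr1:10000-10615
print(bin_from_range(177417, 207666))  # chr1:177417-207666
```

A query for a region then only needs to inspect rows whose bin could overlap that region, instead of scanning every row on the chromosome.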

bin  chrom  chromStart  chromEnd  type  frag         fragStart  fragEnd  strand
585  chr1   10000       10615           AP006221.1   36116      36731
     chr1   10615       177417          AL627309.15  102        166904
586  chr1   177417      207666          FO538757.3   2000       32249
     chr1   257666      297968          AP006222.1              40302
     chr1   347968      501617          AL732372.15             153649
     chr1   501617      535988          FO681485.2              34371
     chr1   585988      697537          AC114498.2              111549
     chr1   697537      835050          AL669831.13             137513
591  chr1   835050      835333          KF495845.1              283
591  chr1   835333      871977          AL669831.13  137796     174440

Description

This track shows the contigs used to construct the GRCh38 (hg38) genome assembly, as defined in the AGP file delivered with the sequence. For information on the AGP file format, see the NCBI AGP Specification. The NCBI website also provides an overview of genome assembly procedures, as well as specific information about the hg38 assembly.
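As a rough illustration of how an AGP component line relates to the fields of this table, the sketch below parses one tab-separated AGP line into a row. The column order follows the AGP specification; the conversion from 1-based inclusive AGP coordinates to 0-based half-open table coordinates, the `GoldRow` type, and the function name are assumptions made for this example, not a description of the actual UCSC loader. The part number, component type, and orientation in the example line are made up for illustration.

```python
from typing import NamedTuple, Optional


class GoldRow(NamedTuple):
    """One row of the table, without the bin column (hypothetical helper type)."""
    chrom: str
    chromStart: int
    chromEnd: int
    ix: int
    type: str
    frag: str
    fragStart: int
    fragEnd: int
    strand: str


GAP_TYPES = {"N", "U"}  # AGP component types that denote gaps, not sequence


def agp_line_to_row(line: str) -> Optional[GoldRow]:
    """Convert one AGP line to a GoldRow; return None for comments, gaps, and short lines."""
    if line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 9 or cols[4] in GAP_TYPES:
        return None
    obj, obj_beg, obj_end, part, comp_type, comp_id, comp_beg, comp_end, orient = cols[:9]
    # Assumption: AGP starts are 1-based, table starts are 0-based, ends are unchanged.
    return GoldRow(obj, int(obj_beg) - 1, int(obj_end), int(part), comp_type,
                   comp_id, int(comp_beg) - 1, int(comp_end), orient)


# Illustrative line: coordinates chosen to match the first sample row;
# the part number (1), type (F), and orientation (+) are invented.
example = "chr1\t10001\t10615\t1\tF\tAP006221.1\t36117\t36731\t+"
row = agp_line_to_row(example)
print(row)
```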

In dense mode, this track depicts the contigs that make up the currently viewed scaffold. Alternating gold and brown coloring marks the contig boundaries. Where gaps exist between contigs, spaces are shown between the gold and brown blocks. The relative order and orientation of the contigs within a scaffold are always known; therefore, a line is drawn in the graphical display to bridge the blocks.
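The gaps drawn as spaces between blocks can be recovered from the table itself: wherever one contig's chromEnd is smaller than the next contig's chromStart, the interval between them is a gap. The following is a minimal sketch using the chromStart and chromEnd values of the sample rows; the function name `find_gaps` is illustrative, and the rows are assumed to be on one chromosome and sorted by chromStart.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (chromStart, chromEnd), 0-based half-open


def find_gaps(contigs: List[Interval]) -> List[Interval]:
    """Return the intervals not covered between consecutive contigs."""
    gaps = []
    for (_, prev_end), (next_start, _) in zip(contigs, contigs[1:]):
        if next_start > prev_end:
            gaps.append((prev_end, next_start))
    return gaps


# chromStart and chromEnd of the sample rows on chr1.
sample = [
    (10000, 10615), (10615, 177417), (177417, 207666), (257666, 297968),
    (347968, 501617), (501617, 535988), (585988, 697537), (697537, 835050),
    (835050, 835333), (835333, 871977),
]

gaps = find_gaps(sample)
for start, end in gaps:
    print(f"gap chr1:{start}-{end} length {end - start}")
```

For these sample rows the sketch reports three gaps, each 50,000 bases long.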

Component types found in this track (with counts of that type in parentheses):

F - finished sequence (35,798)

O - other sequence (8,536)

W - whole genome shotgun (764)

P - pre draft (16)

D - draft sequence (8)

A - active finishing (8)
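To put the counts above in proportion, the short sketch below totals them and computes each type's share. The dictionary simply restates the list; the variable names are illustrative.

```python
# Component type codes and counts, as listed above.
COMPONENT_COUNTS = {
    "F": 35798,  # finished sequence
    "O": 8536,   # other sequence
    "W": 764,    # whole genome shotgun
    "P": 16,     # pre draft
    "D": 8,      # draft sequence
    "A": 8,      # active finishing
}

total = sum(COMPONENT_COUNTS.values())
share = {code: count / total for code, count in COMPONENT_COUNTS.items()}

print(f"total components: {total}")
for code, fraction in share.items():
    print(f"{code}: {fraction:.1%}")
```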

In addition to the standard nucleotide codes, the raw sequence files from NCBI also include IUPAC ambiguity codes for bases that could not be positively identified as A, C, G or T (see Wikipedia's IUPAC notation article for more information). As part of the UCSC assembly creation process, all IUPAC ambiguity characters are converted to Ns. The FASTA files available for download from UCSC reflect this. The raw data files containing the original IUPAC characters can be downloaded from the NCBI FTP site.
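The conversion described above can be sketched in a few lines. This is an illustration, not the UCSC pipeline: the function names are hypothetical, and preserving letter case (so that lowercase ambiguity codes become lowercase n) is an assumption made here.

```python
from collections import Counter

# IUPAC nucleotide ambiguity codes other than N itself.
IUPAC_AMBIGUITY = "RYSWKMBDHV"

_TO_N = str.maketrans(
    IUPAC_AMBIGUITY + IUPAC_AMBIGUITY.lower(),
    "N" * len(IUPAC_AMBIGUITY) + "n" * len(IUPAC_AMBIGUITY),
)


def mask_ambiguity(seq: str) -> str:
    """Replace every IUPAC ambiguity character with N (or n), leaving A, C, G, T, N untouched."""
    return seq.translate(_TO_N)


def count_ambiguity(seq: str) -> Counter:
    """Count occurrences of each ambiguity code, ignoring case."""
    return Counter(base for base in seq.upper() if base in IUPAC_AMBIGUITY)


raw = "ACGTRYacgtkN"
print(mask_ambiguity(raw))
print(count_ambiguity(raw))
```

Running `count_ambiguity` on each chromosome's raw sequence would yield per-chromosome tallies of the kind summarized in a counts table.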

The following table lists the counts by chromosome of the various IUPAC ambiguity characters in the original NCBI data files:

chromosome  Total

code  Total