IMGT/BlastSearch Documentation

Version:
1.1.1 (2017/11/07) using NCBI WWWBLAST 2.2.26

BLAST search programs

The NCBI BLAST family of programs includes:

blastp
compares an amino acid query sequence against a protein sequence database
blastn
compares a nucleotide query sequence against a nucleotide sequence database
blastx
compares a nucleotide query sequence translated in all reading frames against a protein sequence database
tblastn
compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames
tblastx
compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database. Please note that the tblastx program cannot be used with the nr database on the BLAST Web page.
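
The list above amounts to a lookup on the molecule type of the query and of the database. A minimal sketch in Python follows; the names BLAST_PROGRAMS and candidate_programs are hypothetical and are not part of IMGT/BlastSearch or NCBI BLAST.

```python
# Molecule types of (query, database) for each program, as listed above.
BLAST_PROGRAMS = {
    "blastp": ("protein", "protein"),
    "blastn": ("nucleotide", "nucleotide"),
    "blastx": ("nucleotide", "protein"),      # query translated in all reading frames
    "tblastn": ("protein", "nucleotide"),     # database translated in all reading frames
    "tblastx": ("nucleotide", "nucleotide"),  # query and database both translated
}


def candidate_programs(query_type, database_type):
    """Return the programs whose query/database types match the given pair."""
    return [
        name
        for name, types in BLAST_PROGRAMS.items()
        if types == (query_type, database_type)
    ]


print(candidate_programs("nucleotide", "nucleotide"))  # ['blastn', 'tblastx']
print(candidate_programs("protein", "nucleotide"))     # ['tblastn']
```

A nucleotide query against a nucleotide database leaves two candidates: blastn compares the nucleotides directly, whereas tblastx compares their translations.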

IMGT® databases

Summary

Size                                        Description                                                              Updated on
178,936 sequences; 197,057,871 total bases  IMGT/LIGM-DB nucleotide sequences                                        2017-11-19
7,175 sequences; 1,583,245 total bases      IMGT/GENE-DB nucleotide sequences (F+ORF+inframeP)                       2017-11-19
8,076 sequences; 1,833,990 total bases      IMGT/GENE-DB nucleotide sequences (F+ORF+allP)                           2017-11-19
14,417 sequences; 3,013,982 total residues  IMGT/2Dstructure-DB and IMGT/3Dstructure-DB amino acid chain sequences   2017-11-19
1,138 sequences; 289,552 total residues     IMGT/2Dstructure-DB amino acid chain sequences                           2017-11-19
13,279 sequences; 2,724,430 total residues  IMGT/3Dstructure-DB amino acid chain sequences                           2017-11-19
27,877 sequences; 2,924,652 total residues  IMGT/2Dstructure-DB and IMGT/3Dstructure-DB amino acid domain sequences  2017-11-19
15,434 sequences; 1,430,965 total residues  IMGT/DomainDisplay amino acid reference sequences                        2017-11-19

Nucleotide sequences

IMGT/LIGM-DB
Nucleotide sequences of IG and TR from human and other vertebrate species.
IMGT/GENE-DB
Nucleotide reference sequences of IG and TR genes in IMGT/GENE-DB.

Amino acid sequences

IMGT/3Dstructure-DB
Amino acid sequences of IG, TR, MH and RPI from 3D structures in IMGT/3Dstructure-DB.
IMGT/DomainDisplay
Amino acid sequences of IG, TR, MH and RPI domains from the IMGT reference directory.

Options

HISTOGRAM
Display a histogram of scores for each search; default is yes (see parameter H in the BLAST Manual).
DESCRIPTIONS
Restricts the number of short descriptions of matching sequences reported to the number specified; default limit is 100 descriptions (see parameter V in the BLAST Manual). See also EXPECT and CUTOFF.
ALIGNMENTS
Restricts the number of database sequences for which high-scoring segment pairs (HSPs) are reported to the number specified; the default limit is 50. If more database sequences than this happen to satisfy the statistical significance threshold for reporting (see EXPECT and CUTOFF below), only the matches ascribed the greatest statistical significance are reported (see parameter B in the BLAST Manual).
EXPECT
The statistical significance threshold for reporting matches against database sequences; the default value is 10, such that 10 matches are expected to be found merely by chance, according to the stochastic model of Karlin and Altschul (1990). If the statistical significance ascribed to a match is greater than the EXPECT threshold, the match will not be reported. Lower EXPECT thresholds are more stringent, leading to fewer chance matches being reported. Fractional values are acceptable (see parameter E in the BLAST Manual).
CUTOFF
Cutoff score for reporting high-scoring segment pairs. The default value is calculated from the EXPECT value (see above). HSPs are reported for a database sequence only if the statistical significance ascribed to them is at least as high as would be ascribed to a lone HSP having a score equal to the CUTOFF value. Higher CUTOFF values are more stringent, leading to fewer chance matches being reported. (See parameter S in the BLAST Manual). Typically, significance thresholds can be more intuitively managed using EXPECT.
MATRIX
Specify an alternate scoring matrix for BLASTP, BLASTX, TBLASTN and TBLASTX. The default matrix is BLOSUM62 (Henikoff & Henikoff, 1992). The valid alternative choices include: PAM40, PAM120, PAM250 and IDENTITY. No alternate scoring matrices are available for BLASTN; specifying the MATRIX directive in BLASTN requests returns an error response.
STRAND
Restrict a TBLASTN search to just the top or bottom strand of the database sequences; or restrict a BLASTN, BLASTX or TBLASTX search to just reading frames on the top or bottom strand of the query sequence.
FILTER
Low-complexity
Mask off segments of the query sequence that have low compositional complexity, as determined by the SEG program of Wootton & Federhen (Computers and Chemistry, 1993), or segments consisting of short-periodicity internal repeats, as determined by the XNU program of Claverie & States (Computers and Chemistry, 1993), or, for BLASTN, by the DUST program of Tatusov and Lipman (in preparation). Filtering can eliminate statistically significant but biologically uninteresting reports from the BLAST output (e.g., hits against common acidic-, basic- or proline-rich regions), leaving the more biologically interesting regions of the query sequence available for specific matching against database sequences.
Filtering is only applied to the query sequence (or its translation products), not to database sequences. Default filtering is DUST for BLASTN, SEG for other programs.
It is not unusual for nothing at all to be masked by SEG, XNU, or both, when applied to sequences in SWISS-PROT, so filtering should not be expected to always yield an effect. Furthermore, in some cases, sequences are masked in their entirety, indicating that the statistical significance of any matches reported against the unfiltered query sequence should be suspect.
Mask for lookup table only
This option masks only for purposes of constructing the lookup table used by BLAST. The BLAST extensions are performed without masking. This option is still experimental and may change in the near future.
NCBI-gi
Causes NCBI gi identifiers to be shown in the output, in addition to the accession and/or locus name.
Genetic Code
Genetic code to be used in translation of the query or the database.
Graphical Overview
An overview of the database sequences aligned to the query sequence is shown. The score of each alignment is indicated by one of five different colors, which divide the range of scores into five groups. Multiple alignments on the same database sequence are connected by a striped line. Mousing over a hit sequence causes the definition and score to be shown in the window at the top; clicking on a hit sequence takes the user to the associated alignments.
BLAST Color schema description
  • Color schema 1:
    • masked regions in lower case
    • everything else in upper case
  • Color schema 2:
    • masked regions in lower case, gray letters
    • unaligned regions in italic
    • everything else in upper case
  • Color schema 3:
    • no middle line
    • masked regions in lower case, gray letters unless identity
    • everything else in upper case
    • unaligned regions in italic
    • identity shown in red color
    • similarity shown in blue color
    • mismatches shown in black color
  • Color schema 4:
    • no middle line
    • masked regions in lower case, gray letters
    • everything else in upper case
    • unaligned regions in italic
    • identity shown in blue color
    • similarity shown in brown color
    • mismatches shown in red color
  • Color schema 5:
    • no middle line
    • masked regions in lower case, gray letters
    • everything else in upper case
    • unaligned regions in italic
    • identity shown in red color
    • similarity shown in blue color
    • mismatches shown in black color
  • Color schema 6:
    • no middle line
    • masked regions in lower case, gray letters unless identity
    • everything else in upper case
    • unaligned regions in italic
    • identity shown in red bold color
    • similarity shown in blue color
    • mismatches shown in gray color
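
The EXPECT, CUTOFF and ALIGNMENTS options described earlier in this section can be related through the Karlin-Altschul formula E = K * m * n * exp(-lambda * S), where m is the query length, n the database length and S the raw score. The sketch below is illustrative only. The values of lambda and K are assumptions (they depend on the scoring matrix and gap costs and are printed in each BLAST report), the function names are hypothetical, and the real programs use length-corrected search spaces rather than the raw lengths used here.

```python
import math

# Illustrative Karlin-Altschul parameters (assumed, not taken from this page).
LAMBDA = 0.267
K = 0.041


def expect_value(score, query_len, db_len, lam=LAMBDA, k=K):
    """Expected number of chance HSPs scoring at least `score`."""
    return k * query_len * db_len * math.exp(-lam * score)


def cutoff_score(expect, query_len, db_len, lam=LAMBDA, k=K):
    """Smallest integer raw score whose E-value does not exceed `expect`."""
    return math.ceil(math.log(k * query_len * db_len / expect) / lam)


def select_reported(hits, expect=10.0, max_alignments=50):
    """Keep (name, e_value) hits with E <= expect, most significant first."""
    kept = sorted((h for h in hits if h[1] <= expect), key=lambda h: h[1])
    return kept[:max_alignments]


# A 120-residue query against the 1,430,965 residues of the
# IMGT/DomainDisplay set listed in the Summary table.
s = cutoff_score(10.0, 120, 1_430_965)
print(s, expect_value(s, 120, 1_430_965))

hits = [("hit_a", 1e-5), ("hit_b", 20.0), ("hit_c", 0.5)]
print(select_reported(hits))  # hit_b is dropped because its E-value exceeds 10
```

Lowering EXPECT raises the equivalent cutoff score, which is why the two options are alternative ways of setting one threshold.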
Out-Of-Frame BLAST notation

When a protein is aligned to a nucleotide sequence, there are 6 possible kinds of match at any point. In an OOF (out-of-frame) alignment, the upper sequence is DNAP (3-frame translated DNA) and the lower sequence is the protein. At any position, the next protein residue may be aligned to one of 6 possible positions in DNAP:

(TBO: traditional BLAST output)

0:         3 nucleotides missing - gap (TBO notation "-")

OOF alignment with DNAP:

      DTRGGDTPQKSVFSRAQNTLWGERGDTQKRGGAQRGDIFSLWGG-GVLCV
      |  |  |  |  |  |  |  |  |  |  |  |  |  |  |   |  |
      D  G  T  K  F  A  T  G  G  Q  G  Q  D  S  G K V  V

TBO:

      DGTKFATGGQGQDSG-VV
      DGTKFATGGQGQDSG VV
      DGTKFATGGQGQDSGKVV
1:         2 nucleotides missing - "frameshift -2" (TBO notation "\\")
OOF alignment with DNAP:

      DTRGGDTPQKSVFSRAQNTLWGERGDTQKRGGAQRGDIFSLWGGGGVLCV
      |  |  |  |  |  |  |  |  |  |  |  |  |  |  |/  |  |
      D  G  T  K  F  A  T  G  G  Q  G  Q  D  S  GK  V  V

TBO:

      DGTKFATGGQGQDSG\\GVV
      DGTKFATGGQGQDSG   VV
      DGTKFATGGQGQDSG  KVV
2:         1 nucleotide missing - "frameshift -1" (TBO notation "\")
OOF alignment with DNAP:

      DTRGGDTPQKSVFSRAQNTLWGERGDTQKRGGAQRGDIFSLWGGERGV
      |  |  |  |  |  |  |  |  |  |  |  |  |  | /  |  |  
      D  G  T  K  F  A  T  G  G  Q  G  Q  D  S G  K  V  

TBO:

      DGTKFATGGQGQDS\GEV
      DGTKFATGGQGQDS G V
      DGTKFATGGQGQDS GKV  
3:         Complete match
OOF alignment with DNAP:

      DTRGGDTPQKSVFSRAQNTLWGERGDTQKRGGAQRGDIFSLWGGEKRGV
      |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | 
      D  G  T  K  F  A  T  G  G  Q  G  Q  D  S  G  K  V 

TBO:

      DGTKFATGGQGQDSGKV
      DGTKFATGGQGQDSGKV 
      DGTKFATGGQGQDSGKV 

4:         1 nucleotide insertion - "frameshift +1" (TBO notation "/")
OOF alignment with DNAP:

      DTRGGDTPQKSVFSRAQNTLWGERGDTQKRGGAQRGDIFSLWGGVEKRGV
      |  |  |  |  |  |  |  |  |  |  |  |  |  |  |   \
      D  G  T  K  F  A  T  G  G  Q  G  Q  D  S  G   K  V

TBO:

      DGTKFATGGQGQDSG/KV
      DGTKFATGGQGQDSG KV
      DGTKFATGGQGQDSG KV
5:         2-nucleotide insertion - "frameshift +2" (TBO notation "//")
OOF alignment with DNAP:

      DTRGGDTPQKSVFSRAQNTLWGERGDTQKRGGAQRGDIFSLFLWGGEKRGV
      |  |  |  |  |  |  |  |  |  |  |  |  |  |    \  |  |
      D  G  T  K  F  A  T  G  G  Q  G  Q  D  S    G  K  V

TBO:

      DGTKFATGGQGQDS//GKV
      DGTKFATGGQGQDS  GKV
      DGTKFATGGQGQDS  GKV
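
In the examples above, consecutive protein residues sit three DNAP columns apart when the match is complete, which suggests that DNAP holds one translated codon per nucleotide position (the three forward frames interleaved). The sketch below builds such a string in Python. It assumes the standard genetic code, and the names dnap and OOF_CASES are hypothetical. On this reading, the case number equals the number of nucleotides advanced between consecutive protein residues (3 minus the nucleotides missing, or 3 plus those inserted).

```python
from itertools import product

# Standard genetic code with codons enumerated in TCAG order (an assumption;
# the Genetic Code option may select a different table).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}


def dnap(dna):
    """Translate the codon starting at every nucleotide position."""
    dna = dna.upper()
    # Codons containing anything other than A, C, G or T are shown as X.
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(len(dna) - 2))


# case number -> (TBO symbol, nucleotides advanced before the next residue)
OOF_CASES = {
    0: ("-", 0),      # 3 nucleotides missing (gap)
    1: ("\\\\", 1),   # 2 nucleotides missing, frameshift -2
    2: ("\\", 2),     # 1 nucleotide missing, frameshift -1
    3: ("", 3),       # complete match
    4: ("/", 4),      # 1 nucleotide insertion, frameshift +1
    5: ("//", 5),     # 2 nucleotide insertion, frameshift +2
}

print(dnap("ATGGCCAAA"))  # MWGAPQK
```

Reading every third letter of the output from position 0 gives the frame 1 translation (M, A, K); starting from position 1 or 2 gives the other two forward frames.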

Advanced Options

BLASTN Program

  -G  Cost to open a gap [Integer]
    default = 5
  -E  Cost to extend a gap [Integer]
    default = 2
  -q  Penalty for a mismatch in the blast portion of run [Integer]
    default = -3
  -r  Reward for a match in the blast portion of run [Integer]
    default = 1
  -e  Expectation value (E) [Real]
    default = 10.0
  -W  Word size [Integer]
    default = 11 for blastn, 3 for other programs
  -v  Number of one-line descriptions (V) [Integer]
    default = 100
  -b  Number of alignments to show (B) [Integer]
    default = 100
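
To illustrate how the -r, -q, -G and -E defaults above combine, the following Python sketch scores an already aligned pair of nucleotide rows. It assumes the usual affine convention in which a gap of length k costs G + k * E. The function name blastn_score is hypothetical, and the sketch only scores a given alignment; it does not search for one.

```python
def blastn_score(query_row, subject_row, reward=1, penalty=-3,
                 gap_open=5, gap_extend=2):
    """Raw score of two equal-length aligned rows in which '-' marks a gap."""
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same length")
    score = 0
    gap_side = None  # which row the current gap is in, if any
    for q, s in zip(query_row, subject_row):
        if q == "-" or s == "-":
            side = "query" if q == "-" else "subject"
            if side != gap_side:
                score -= gap_open      # charged once when a gap starts
            score -= gap_extend        # charged for every gap position
            gap_side = side
        else:
            score += reward if q == s else penalty
            gap_side = None
    return score


print(blastn_score("ACGTACGT", "ACCTACGT"))  # 7 matches, 1 mismatch -> 4
print(blastn_score("ACGTACGT", "ACGT--GT"))  # 6 matches, one gap of 2 -> -3
```

With these defaults a two-base gap costs 9, the same as nine matches earn, which is why short alignments rarely contain gaps.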

BLASTP/BLASTX/TBLASTN Program

  -G  Cost to open a gap [Integer]
    default = 11
  -E  Cost to extend a gap [Integer]
    default = 1
  -e  Expectation value (E) [Real]
    default = 10.0
  -W  Word size [Integer]
    default = 3 for these programs (11 for blastn)
  -v  Number of one-line descriptions (V) [Integer]
    default = 100
  -b  Number of alignments to show (B) [Integer]
    default = 100

Limited values for gap existence and extension are supported for these three programs.  
Some supported and suggested values are:

  Existence  Extension

     10          1
     10          2
     11          1
      8          2
      9          2
Versions

1.1.1
  • Add more databases
1.1.0
  • Remove bug introduced by the use of BLAST+ makeblastdb in place of formatdb
1.0.3
  • New data bank from IMGT/GENE-DB reference sequences
  • Add JavaScript to restrict the available BLAST programs according to the selected data bank
1.0.2
  • File uploading regression (bug)
  • Internal change
  • Upgrade NCBI BLAST
1.0.1
  • Interface refresh
  • Upgrade NCBI BLAST