seeMotif

Exploring and visualizing sequence motifs in 3D structures

About seeMotif

Using seeMotif

Related sites

Troubleshooting



What is seeMotif?

In recent years, the many sequencing projects have accumulated numerous sets of homologous sequences, creating an urgent demand for efficient ways to derive sequence motifs. Several state-of-the-art pattern mining algorithms have shown their potential for discovering protein functional signatures from sequences alone. As the number of protein structures increases at an astounding rate, it is highly desirable to have a tool with which researchers can quickly see where a sequence motif occurs in protein structures and what it looks like in space. Here we present seeMotif, a service that provides an easy-to-use web interface for visualizing and exploring sequence motifs in 3D protein structures. A motif is specified as a pattern in the form of a regular expression (e.g. H-G-T-x(3)-G-x(77,101)-A-x-G-N-x(57,78)-G-T-S-x(3)-P), a PROSITE ID (e.g. PS00136), or an ELM ID (e.g. CLV_PCSK_SKI1_1).

What is a motif?

In seeMotif, the term 'motif' refers to a sequence motif, even though you explore these sequence motifs in structures (one of the most useful features of seeMotif). In genetics, a sequence motif is a biologically significant nucleotide or amino-acid sequence pattern. Such a pattern can be described in many ways. For example, here is the motif for the N-glycosylation site described in words:

Asn, followed by anything but Pro, followed by either Ser or Thr, followed by anything but Pro.
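As an illustration only, the same N-glycosylation description can be written in the seeMotif syntax defined below as N-{P}-[ST]-{P}. The minimal Python sketch below expresses it directly as a Python regular expression and reports every 1-based start position. The function name `find_sites` is hypothetical, and using a zero-width lookahead is simply one way to report overlapping sites; it is not necessarily how seeMotif scans sequences.

```python
import re

# Asn, then anything but Pro, then Ser or Thr, then anything but Pro.
# Wrapping the pattern in a lookahead makes each match zero-width,
# so overlapping sites are all reported.
N_GLYCOSYLATION = re.compile(r"(?=N[^P][ST][^P])")


def find_sites(seq):
    """Return the 1-based start position of every N-glycosylation site in seq."""
    return [m.start() + 1 for m in N_GLYCOSYLATION.finditer(seq)]


print(find_sites("NNSTS"))  # [1, 2]
print(find_sites("MNPTA"))  # []
```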

seeMotif accepts motifs in the following forms:

Regular expression
A regular expression describes a set of strings and is commonly used to represent a characteristic shared by the members of a set. In bioinformatics, regular expressions are widely used to describe a functional region shared by a group of protein sequences. seeMotif uses the following regular-expression syntax:

  • An uppercase letter matches the amino acid with that one-letter code, as defined by the IUPAC standard.
  • The symbol 'x' is used for a position where any amino acid is accepted.
  • Symbols within a pair of square brackets '[]' match a position where any of the listed amino acids is accepted. For example: [ALT] stands for Ala or Leu or Thr.
  • Symbols within a pair of curly brackets '{}' match a position where any amino acid is accepted except the listed ones. For example: {AM} stands for any amino acid except Ala and Met.
  • The symbol '-' is used to separate any two elements described above.
  • An element followed by parentheses '()' is repeated: a single number gives an exact count and two numbers give a range. For example, x(3) corresponds to x-x-x; x(2,4) corresponds to x-x or x-x-x or x-x-x-x; A(3) corresponds to A-A-A.
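The syntax above maps closely onto ordinary regular expressions. A minimal sketch of such a translation follows, assuming Python's `re` module as the matching engine; the function name `motif_to_regex` is hypothetical, and PROSITE extras such as the '<' and '>' terminal anchors are not handled.

```python
import re

# One element: 'x', a single letter, a [..] class, or a {..} exclusion,
# optionally followed by (n) or (m,n).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def motif_to_regex(motif):
    """Translate a hyphen-separated motif (hypothetical helper) into a Python regex."""
    parts = []
    for element in motif.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unrecognized element: {element!r}")
        symbol, low, high = m.groups()
        if symbol == "x":
            core = "."  # any amino acid
        elif symbol.startswith("{"):
            core = "[^" + symbol[1:-1] + "]"  # any amino acid except the listed ones
        else:
            core = symbol  # a letter or a [..] class is already valid regex syntax
        if low is None:
            parts.append(core)
        elif high is None:
            parts.append(f"{core}{{{low}}}")  # exact repeat count
        else:
            parts.append(f"{core}{{{low},{high}}}")  # repeat range
    return "".join(parts)


print(motif_to_regex("Y-[LIVAC]-R-[VA]-S-[ST]-x(2)-Q"))  # Y[LIVAC]R[VA]S[ST].{2}Q
print(motif_to_regex("{AM}-x(2,4)-A(3)"))  # [^AM].{2,4}A{3}
```

The resulting string can be passed to `re.search` or `re.finditer` to locate the motif in a reference sequence.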

PROSITE ID
seeMotif recognizes a string of seven characters starting with 'PS' as a PROSITE ID. When you specify a PROSITE ID, seeMotif automatically retrieves the corresponding pattern from PROSITE. A PROSITE pattern is a regular expression in the syntax described above. For example, PS00397 denotes Y-[LIVAC]-R-[VA]-S-[ST]-x(2)-Q.
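The recognition rule for IDs (together with the analogous 'PF' rule for Pfam IDs) can be sketched as a simple classifier. This is an illustration following the rules as stated, not seeMotif's actual code; the function name and return labels are hypothetical, and ELM IDs are not handled.

```python
def classify_query(query):
    """Guess how a query string is interpreted (hypothetical helper).

    A seven-character string starting with 'PS' is treated as a PROSITE ID,
    one starting with 'PF' as a Pfam ID; anything else as a regular expression.
    """
    q = query.strip()
    if len(q) == 7 and q.startswith("PS"):
        return "prosite_id"
    if len(q) == 7 and q.startswith("PF"):
        return "pfam_id"
    return "regex"


print(classify_query("PS00397"))  # prosite_id
print(classify_query("N-{P}-[ST]-{P}"))  # regex
```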

Pfam ID
seeMotif recognizes a string of seven characters starting with 'PF' as a Pfam ID. As with a PROSITE ID, when you specify a Pfam ID, seeMotif automatically retrieves the corresponding hidden Markov model (HMM) from Pfam. A Pfam HMM is a profile rather than a regular expression. For example, the following figure is a (cropped) graphical view of PF00239:

How does seeMotif visualize a motif?

The visualization procedure must be handled carefully to present sequence motifs correctly in 3D structures. Usually, none of the sequences from which a motif was derived has an experimentally determined 3D structure. seeMotif therefore asks the user for a reference sequence that contains at least one subsequence matching the query motif. A local alignment search (using BLAST) of the reference sequence against the protein chains in the Protein Data Bank (PDB) finds potentially homologous structures. Global alignment (using ClustalW) is then used to build a position mapping between the reference sequence and the polypeptide chains in those 3D structures.
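The position-mapping step can be illustrated with a short sketch. It does not run BLAST or ClustalW; it assumes the two rows of a pairwise global alignment are already available as gapped strings ('-' for gaps). The function name is hypothetical, and the resulting positions are indices in the chain's sequence, which may differ from the residue numbers written in the PDB file.

```python
def map_positions(aligned_ref, aligned_pdb):
    """Map 1-based reference positions to 1-based positions in the PDB chain sequence.

    Both arguments are rows of one global alignment, with '-' for gaps.
    Reference residues aligned to a gap are left unmapped.
    """
    if len(aligned_ref) != len(aligned_pdb):
        raise ValueError("aligned rows must have equal length")
    mapping = {}
    i = j = 0  # residues consumed so far in each sequence
    for a, b in zip(aligned_ref, aligned_pdb):
        if a != "-":
            i += 1
        if b != "-":
            j += 1
        if a != "-" and b != "-":
            mapping[i] = j  # both columns hold a residue: record the pair
    return mapping


print(map_positions("MK-LV", "M-ALV"))  # {1: 1, 3: 3, 4: 4}
```

Motif residues whose reference positions are missing from the mapping cannot be shown in that structure.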

How to start?

This is the home page of seeMotif:

To use seeMotif, all you have to do is to specify a motif and a reference protein sequence that matches the motif.

Visualizing page

After submitting a motif and a reference sequence, you will see the visualizing page of seeMotif like this:

This page consists of three parts. The first part provides a sequence view of the input motif on the reference sequence. The second and third parts provide a three-dimensional view of the input motif: the second part shows the input motif mapped onto a selected protein with an available PDB structure (not the reference protein itself), and the third part lets you choose which of the existing protein structures the motif is mapped onto.

Sequence panel

This panel (the first part of the visualizing page) highlights the input motif on the reference sequence.

If the input motif matches more than one position in the reference sequence, this panel provides a control to switch among these positions.

The letters in the center are the reference sequence in FASTA format, and the numbers beside the sequence let you find the position of a specific amino acid conveniently. In addition, if you hold the cursor over an amino acid for a moment, a tooltip displays the index of that amino acid.

The underlined amino acids are the residues matched by the input motif. Different colors represent different blocks of the input motif. The same colors are used in the structure panel (explained below), which helps you map each residue to its counterpart in the structure.
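How per-block highlighting and switching among matched positions could be computed is sketched below. This assumes each motif block is written as one capturing group of a Python regular expression; how seeMotif actually splits a motif into blocks is not described here, and the function name is hypothetical. Each inner list corresponds to one matched position and gives a (first, last) residue range per block, so a viewer could give block k the k-th color. Note that `re.finditer` reports non-overlapping matches only.

```python
import re


def block_positions(pattern, seq):
    """For each match, list the 1-based (first, last) residue of every block.

    Assumes each motif block is one capturing group in `pattern`.
    """
    result = []
    for m in re.finditer(pattern, seq):
        # m.start(g) is 0-based inclusive and m.end(g) is 0-based exclusive,
        # so start + 1 and end give the 1-based inclusive range.
        spans = [(m.start(g) + 1, m.end(g)) for g in range(1, m.re.groups + 1)]
        result.append(spans)
    return result


print(block_positions(r"(HG).(GN)", "AHGAGNA"))  # [[(2, 3), (5, 6)]]
```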

Structure panel

This panel consists of two regions (the second and third parts of the visualizing page). The left region embeds a Java viewer for 3D chemical structures, and the right region provides an interface to choose which PDB structure is shown in the left region.

In the left region, the blocks of the query motif are drawn as sticks, each in the same color as the corresponding block in the sequence panel. Ligands are displayed as spacefill and colored in CPK mode.

seeMotif collects PDB structures that contain polypeptide chains similar to your reference sequence and lists their information in the right region of this panel. In this example, there are 20 structures with a protein chain similar to the reference sequence. The check mark in front of a PDB ID (here, 1ZR4:E) indicates the PDB chain currently shown in the left region.

About the columns

PDB ID - PDB ID of the structure

e-value - BLAST expectation value: the number of different alignments with scores equal to or better than this alignment's score S that are expected to occur by chance in a database search

Dis - the shortest distance from the matched positions to any binding partner; a distance of 100 indicates that no binding partner was found in the PDB file

#MR - the number of matched residues found in structures

Link - link to PDB

Visualization mode

In multi-motif mode, two motifs may compete for the same residues on the reference sequence, so overlaps between motifs must be detected before the residues are colored. To avoid an excessive number of conflicts, overlap detection is performed at the block level. In multi-motif mode, only one matched position is considered for each motif block at a time; this yields many possible combinations of motif matches, from which the user can choose manually. Once the positions of the motif blocks are selected, motif blocks that overlap on the reference sequence are assigned the same color to show their relationship. A block that does not overlap any other block can be colored in one of two ways: (1) in the simpler scheme, all blocks belonging to the same motif share a single color; (2) in the more detailed scheme, each block has its own color.
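The block-level overlap rule and the two coloring schemes could be sketched as follows. This is an illustration under assumptions: each block of the currently selected combination is given as (motif_id, start, end) with 1-based inclusive positions, colors are plain integer indices, and blocks connected through a chain of overlaps are placed in one color group using a union-find, a choice the description above does not specify. The function name `assign_colors` is hypothetical.

```python
def assign_colors(blocks, per_block=False):
    """Assign a color index to each motif block (hypothetical helper).

    Overlapping blocks share a color. A block that overlaps nothing gets its
    motif's color, or its own color when per_block is True.
    """
    n = len(blocks)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Join every pair of blocks whose intervals intersect.
    for i in range(n):
        for j in range(i + 1, n):
            _, s1, e1 = blocks[i]
            _, s2, e2 = blocks[j]
            if s1 <= e2 and s2 <= e1:
                parent[find(i)] = find(j)

    group_size = {}
    for i in range(n):
        root = find(i)
        group_size[root] = group_size.get(root, 0) + 1

    colors, palette = [], {}
    for i, (motif_id, _, _) in enumerate(blocks):
        root = find(i)
        if group_size[root] > 1:
            key = ("overlap", root)  # overlapping blocks share one color
        elif per_block:
            key = ("block", i)  # scheme (2): every block has its own color
        else:
            key = ("motif", motif_id)  # scheme (1): one color per motif
        colors.append(palette.setdefault(key, len(palette)))
    return colors


blocks = [("A", 1, 5), ("B", 4, 8), ("A", 20, 25), ("A", 30, 32)]
print(assign_colors(blocks))  # [0, 0, 1, 1]
print(assign_colors(blocks, per_block=True))  # [0, 0, 1, 2]
```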

In single-motif mode, the coloring scheme is simple: each motif takes a distinct color, and all matched positions of the selected motif are shown simultaneously. Multi-block motifs are excluded from the list in this mode, since showing all of their occurrences at once would be too complicated.

Swiss-Prot

Swiss-Prot is a curated protein sequence database that strives to provide a high level of annotation (such as descriptions of a protein's function, its domain structure, post-translational modifications, and variants), a minimal level of redundancy, and a high level of integration with other databases.

PDB

The Protein Data Bank (PDB) is the single worldwide repository of information about the three-dimensional structures of large biological molecules, including proteins and nucleic acids. These are the molecules of life found in all organisms, including bacteria, yeast, plants, flies, and mice, and in healthy as well as diseased humans. Understanding the shape of a molecule helps to understand how it works.

PROSITE

PROSITE consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them.

My structure panel seems broken

The structure panel in the visualizing page provides an interactive 3D view of protein structures. To do so, the panel relies on web technologies such as Java applets and AJAX. The checkpoints below may help users who are unfamiliar with these technologies.

  1. Browser
    The default settings of most modern browsers (Opera, Firefox, and Internet Explorer) should be okay for seeMotif.
    • enable Java applet
    • enable Javascript (note that Java applet and Javascript are different)
  2. Java Runtime
    Some users have been able to use the structure panel after updating their Java runtime, although we do not know which version they used before the update. In addition, all Java applets run in the same Java virtual machine and share a fixed stack space, so closing other Java applications can sometimes free up more stack space.
    • install the Java runtime
    • update to the latest Java runtime
    • enlarge the stack space
    • turn off other Java applications
    • turn off other browser pages (keep only one browser instance and only one tab)
  3. Tested Environments
    The environments that have been tested for compatibility are as follows:
    1. Windows XP - Internet Explorer 6.0 - Java 1.6.0
    2. Windows XP - Firefox 2.0.0.11 - Java 1.6.0
    3. Windows XP - Opera 9.25 - Java 1.6.0
    We would appreciate hearing about any environment in which seeMotif runs well. Please also feel free to contact us if you cannot use seeMotif in a specific environment; we will try to test seeMotif in your environment.