TarO
is hosted by the Barton Group,
First released:
Introduction
|
|
||
Home Page
|
|
||
|
The home page provides a table summarising queries that have been run through the pipeline. Only queries submitted by you and by users in your group are visible, and queries are grouped by user ID. Guest queries are visible to everyone. There is also an acknowledgements section listing the references for software incorporated into TarO. Please cite these as appropriate. Click on the image to navigate to an example home page. |
|||
Input Sequences Page
|
|
||
|
This page presents results for the input sequences and the progress of pipeline queries. There is also a button to start the Jalview applet and so visualise annotations of the input sequences in a Multiple Sequence Alignment. Links from this page lead to the results for putative orthologues and homologues. The results pages generally provide numerous links, for example to allow DAS lookups via Dasty2, and to the UniProt and COG websites. Click on the image to navigate to an example input sequences page. |
|||
Query Status Table
|
|
||
|
|
On the Input Sequences
page there is a table detailing the query progress. The
various TarO pipeline stages are summarised in the
left column, and the status of each stage is summarised in the right-hand
column. The colour of each row reflects the status of each step, according to a ‘traffic
lights’ scheme.
|
||
Annotated Multiple Sequence Alignment |
|||
|
|
Jalview is used to visualise annotations that can be mapped to residues in the
sequence (e.g. phosphorylation sites); other annotations (e.g. extinction
coefficient) are available in the results tables. The Jalview applet provides
the facility to start the full Jalview application (from the menu, click
File > View in Full Application). The full Jalview application allows lookup
of DAS features and saving of alignment files. The multiple sequence alignment
is constructed using the MUSCLE algorithm.
|
||
Orthologues Page
|
|
||
|
This page presents tabulated results for putative orthologues of the input sequence(s), annotated from BLAST searches of the COG database. Results on this page are ordered by predicted crystallisation propensity (ParCrys) and then by BLASTP Expectation value (for the match to the user input sequence). Methodology is described in more detail below. There are links for each sequence for homologues obtained by a search of UniRef100. Click on the image to navigate to an example orthologues page. |
|||
Homologues Page
|
|
||
|
This page presents results for putative homologues of the sequence that was clicked on (which could be an input sequence or a putative orthologue). Results on this page are ordered by estimated crystallisation propensity (ParCrys) and then by PSIBLAST Expectation Value. Homologues are gathered using a PSIBLAST search of UniRef100. Methodology is described in more detail below. Click on the image to navigate to an example homologues page. |
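The two-level ordering used on the orthologues and homologues pages can be sketched as follows. This is only an illustration: the `Hit` record and its field names are hypothetical, and it assumes the primary key is the raw ParCrys score (highest first) with ties broken by the lowest Expectation value.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    # Hypothetical record; field names are not taken from TarO itself.
    sequence_id: str
    parcrys_score: float
    evalue: float


def order_hits(hits):
    """Sort by ParCrys score (descending), then by E-value (ascending)."""
    return sorted(hits, key=lambda h: (-h.parcrys_score, h.evalue))


hits = [
    Hit("seqA", 4000000, 1e-30),
    Hit("seqB", 7000000, 1e-10),
    Hit("seqC", 7000000, 1e-50),
]
for hit in order_hits(hits):
    print(hit.sequence_id, hit.parcrys_score, hit.evalue)
```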
|||
Submit New Query Page
|
|
||
|
This page is used to start a new TarO query. The query description box lets you specify a name for the query, which is displayed in the home page query summary table, so a description that helps you identify the query is recommended. The input must be protein sequence in FASTA format. You can either upload an input file or paste FASTA-format sequence into the large box. This page also has a field to specify the maximum number of sequences to include in the Multiple Sequence Alignment; the default is 100. If too many sequences are included, the alignment may become rather “gappy”. Click on the image to navigate to an example new query page. |
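Before submitting, it can be useful to check locally that the input really is FASTA-format protein sequence. The sketch below is not TarO's own validation; the function names are hypothetical, and it assumes only the 20 standard one-letter residue codes are acceptable.

```python
# Assumption: only the 20 standard amino-acid codes are accepted.
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-format text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def looks_like_protein(sequence):
    """True if the sequence is non-empty and uses only standard residue codes.

    Note: a nucleotide sequence (A, C, G, T) also passes this check, because
    those letters are valid residue codes too.
    """
    return bool(sequence) and set(sequence) <= PROTEIN_ALPHABET


for name, seq in parse_fasta(">example protein\nMKTLLV\nAGHW\n"):
    print(name, len(seq), looks_like_protein(seq))
```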
|||
Summary of Methodology |
|
||
|
1 User input sequence(s) are searched against the COG database using BLASTP
(thresholds coupling sequence identity with alignment length, as defined in Rost (1999) Protein Eng. 12:85-94). The top-scoring matched COG sequence is used to assign a COG cluster to the input sequence,
and the COG sequences from that cluster (putative orthologues) are thus associated with the user input
sequence. Sequences within an assigned COG cluster are displayed if the
BLASTP E-value is 1e-3 or better. |
|
||

TarO is still evolving and user feedback is most
welcome. Please direct any comments to taro@compbio.dundee.ac.uk
Site layout
|
|
|
Description of column headings |
|
The following section elaborates upon headings displayed in the tables on the TarO website. This section is primarily intended to be accessed via links from the main pages for additional explanation of particular table headings. |
|
Reference Table |
|
• Introduction |
QUERY_ID |
|
|
The TarO identifier for the search triggered from the user input page |
|
Query Description |
|
|
The user-specified description for the given query |
|
#Sequences |
|
|
The total number of unique sequence identifiers associated with the QUERY_ID. Currently these may be from user input, COG or UniRef100. |
|
Sequence_ID |
|
|
The sequence identifier (may be supplied by the user or from external databases). This sequence is referred to as the query sequence in the context of database searching. |
|
Organism |
|
|
The organism associated with
the sequence. UniRef100 sequences are assigned a name according to the
information in the header of the UniRef100.fasta file or, where this is not
informative, organisms are assigned to UniRef100 identifiers using the IPI
database. However, there is still a small proportion
of UniRef100 sequences for which meaningful organism information is not
currently assigned. Also, for UniRef sequences,
"..." following the organism name indicates that
additional organism information is available, which appears on mouseover. The COG/KOG sequences are associated with an
abbreviated organism name, as given in the COG database. The list of abbreviations
with their corresponding full organism names is given below: |
|
Links |
|
|
Links to display further
details of results (One-letter codes as follows): |
|
Sequence statistics |
|
|
Empirically calculated and predicted properties for the given sequence. |
|
Seqlen |
|
|
Sequence length |
|
Mr |
|
|
Molecular weight |
|
GpIclus |
|
|
Cluster assigned from the GRAVY/pI index (see PNAS 99:11664) |
|
pI |
|
|
Isoelectric point |
|
GRAVY |
|
|
GRand AVerage of hydrophobicitY (Kyte-Doolittle scores) |
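As an illustration of the GRAVY value, the sketch below averages the published Kyte-Doolittle hydropathy values over all residues. The function name is hypothetical, and the handling of non-standard residues (an error here) is an assumption; TarO may treat them differently.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Mean Kyte-Doolittle hydropathy over all residues in the sequence."""
    if not sequence:
        raise ValueError("empty sequence")
    # Assumption: non-standard residue codes are an error (KeyError).
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return sum(values) / len(values)


print(round(gravy("MKTLLVAGHW"), 3))
```

Positive values indicate a more hydrophobic sequence overall, negative values a more hydrophilic one.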
|
SigP |
|
|
SignalP (JMB 340:783) predicted signal peptide. This column details the last residue of any predicted signal peptide. More information on SignalP is available here |
|
SPconf |
|
|
SignalP (JMB 340:783) HMM confidence score. More information on SignalP is available here |
|
#TMH |
|
|
The number of transmembrane helices predicted by the program TMHMM2 (JMB 305:567). More information on TMHMM2 is available here |
|
TMH_span |
|
|
The portion of the sequence (start-end) that includes transmembrane helices, taken from the predictions of TMHMM2 (JMB 305:567). All predicted transmembrane helices are included in these sequence co-ordinates. More information on TMHMM2 is available here |
|
#His |
|
|
Number of Histidines |
|
#Met |
|
|
Number of Methionines |
|
#Cys |
|
|
Number of Cysteines |
|
COG top hit details |
|
|
These statistics are compiled
from a BLASTP
search of the COG database |
|
COGclus |
|
|
Assigned COG cluster based on the BLASTP search of the COG database. Note that a good BLAST match to a COG sequence does not automatically allow the assignment of a COG cluster because sequences in COG are not necessarily associated with a COG cluster. |
|
Subject |
|
|
Database sequence identifier |
|
eval |
|
|
BLAST expectation value |
|
%id |
|
|
Percentage identity |
|
Alen |
|
|
Alignment length |
|
Qst |
|
|
Alignment start position on query sequence |
|
Qen |
|
|
Alignment end position on query sequence |
|
Sst |
|
|
Alignment start position on subject (database) sequence |
|
Sen |
|
|
Alignment end position on subject (database) sequence |
|
Seqlen |
|
|
Sequence length |
|
ORIGIN |
|
|
The source from which the sequence data has been retrieved |
|
Uniref Tophit |
|
|
The top-scoring UniRef100 sequence found with a BLASTP search. |
|
PDB Tophit |
|
|
The top-scoring
PDB sequence found with a PSIBLAST
search (3 iterations, 1E-03) or BLASTP search
(for the orthologue/homologue sequences), and thresholds from Rost
(1999) Protein |
|
TargetDB Tophit |
|
|
The top-scoring TargetDB sequence found with a BLASTP search (E-value threshold 1E-03, with the match also required to exceed the Rost thresholds coupling alignment length and percentage identity). If these columns are empty, no hit was found above the thresholds. |
|
TargetDB_groupID |
|
|
The group associated with the TargetDB identifier |
|
TargetDB_status |
|
|
The status of efforts towards obtaining the molecular structure of the given protein. |
|
More |
|
|
Display more information.
One-letter codes are specified as follows: |
|
99%qcov |
|
|
Whether at least 99% of the query sequence is covered by the alignment with the subject (database) sequence. This information is specified by 1 (True) or 0 (False). |
|
99%qcov+99%id |
|
|
Whether at least 99% of the query sequence is covered by the alignment with the subject (database) sequence with at least 99% identity. This information is specified by 1 (True) or 0 (False). |
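The two flags can be derived from the alignment columns described above (Qst, Qen, Seqlen and %id). The sketch below is an assumption about how such flags might be computed, not TarO's code: it treats positions as 1-based and inclusive, and measures coverage as the aligned span of the query divided by the query length.

```python
def coverage_flags(q_start, q_end, query_length, percent_identity,
                   min_coverage=99.0, min_identity=99.0):
    """Return (99%qcov, 99%qcov+99%id) as 1/0 flags.

    Assumes 1-based, inclusive alignment coordinates on the query.
    """
    aligned_span = q_end - q_start + 1
    coverage = 100.0 * aligned_span / query_length
    qcov = int(coverage >= min_coverage)
    qcov_and_id = int(qcov == 1 and percent_identity >= min_identity)
    return qcov, qcov_and_id


print(coverage_flags(1, 250, 250, 99.6))
print(coverage_flags(1, 250, 250, 87.0))
print(coverage_flags(20, 250, 250, 100.0))
```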
|
RPSBLAST |
|
|
RPS-BLAST
(Reverse PSI-BLAST) searches a query sequence against a database of profiles,
producing BLAST-like output. More details.. |
|
PSIBLAST Statistics |
|
|
These statistics are compiled from a PSIBLAST search of the UniRef100 database |
|
BLASTP Statistics |
|
|
These statistics refer to the BLASTP alignment of the user input sequence to sequences from the COG database |
|
ParCrys prediction |
|
|
ParCrys is a Parzen Window approach to crystallisation propensity prediction (Overton et al. 2007). The prediction can be "Highly amenable", "Amenable" or "Recalcitrant" to crystallisation. These three predictions are based on an analysis of TargetDB data. The ParCrys score thresholds for defining these boundaries are 6637270 (Highly amenable/Amenable) and 3564600 (Amenable/Recalcitrant). More information on ParCrys is available from here |
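The mapping from raw score to prediction class can be sketched as below, using the threshold values exactly as quoted above. The function name is hypothetical, and whether a score exactly equal to a threshold falls in the higher or lower class is an assumption (here it falls in the higher class).

```python
# Threshold values as quoted in the text above.
HIGHLY_AMENABLE_MIN = 6637270
AMENABLE_MIN = 3564600


def parcrys_class(score):
    """Map a raw ParCrys score to one of the three prediction classes.

    Assumption: a score equal to a threshold belongs to the higher class.
    """
    if score >= HIGHLY_AMENABLE_MIN:
        return "Highly amenable"
    if score >= AMENABLE_MIN:
        return "Amenable"
    return "Recalcitrant"


for score in (7000000, 5000000, 1000000):
    print(score, parcrys_class(score))
```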
|
ParCrys-Sc |
|
|
ParCrys is a Parzen Window approach to crystallisation propensity prediction; ParCrys-Sc is the raw ParCrys score. The higher the score, the more similar the input sequence is to sequences associated with diffraction-quality crystals. More information on ParCrys is available from here |
|
|
|
|
The OB-Score is a z-score scale based on calculated hydrophobicity and isoelectric point values from PDB sequences against a background distribution generated from UniRef50. The OB-Score can be used to estimate crystallisation propensity. For more details, see Overton & Barton (2006). FEBS Lett. 580, 4005-4009. More information on the OB-Score is also available from here |
|
RONN |
|
|
The RONN algorithm is used to predict disordered regions. The column RONN gives the percentage of residues that are predicted to be disordered by RONN. More information on RONN is available from here |
|
Jpred_H |
|
|
The Jpred algorithm is used to predict secondary structure. The column Jpred_H gives the percentage of residues that are predicted in helical conformation. More information on Jpred is available from here |
|
Jpred_E |
|
|
The Jpred algorithm is used to predict secondary structure. The column Jpred_E gives the percentage of residues that are predicted in extended conformation. More information on Jpred is available from here |
|
NetNglyc |
|
|
NetNglyc
was developed to predict N-linked glycosylation in
human proteins. NetNglyc predictions with a score
of at least 0.7 are displayed in the format: "ResidueNumber:Score_ResidueNumber:Score_etc.". More
information on NetNglyc is available from here |
|
NetOglyc |
|
|
NetOglyc
is used to predict mucin-type GalNAc
O-glycosylation sites in mammalian proteins. NetOglyc predictions with a score of at least 0.7 are
displayed in the format: "ResidueNumber:Score_ResidueNumber:Score_etc.". More
information on NetOglyc is available from here |
|
NetPhos |
|
|
NetPhos is used to predict serine, threonine and tyrosine phosphorylation sites in eukaryotic proteins. NetPhos predictions with a score of at least 0.7 are displayed in the format: "ResidueNumber:Score_ResidueNumber:Score_etc.". More information on NetPhos is available from here |
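The NetNglyc, NetOglyc and NetPhos columns share the same "ResidueNumber:Score" format, with entries joined by underscores. A minimal parser for such a field might look like the sketch below; the function name is hypothetical, and it assumes that ResidueNumber is a plain integer and that an empty field means no predictions.

```python
def parse_site_predictions(field):
    """Parse 'ResidueNumber:Score_ResidueNumber:Score' into (int, float) pairs.

    Assumption: residue numbers are plain integers with no residue letter.
    """
    sites = []
    for item in field.strip().split("_"):
        if not item:
            continue  # empty field or stray separator
        residue_number, score = item.split(":")
        sites.append((int(residue_number), float(score)))
    return sites


for residue_number, score in parse_site_predictions("12:0.81_45:0.93"):
    print(residue_number, score)
```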
|
A280 |
|
|
The predicted molar extinction coefficient at 280 nm. |
|
A280_1mg |
|
|
The predicted extinction coefficient for a 1 mg/ml solution of the protein at 280 nm. |
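The two extinction columns are related through the molecular weight: for a 1 cm path length, the absorbance of a 1 mg/ml solution is the molar extinction coefficient divided by Mr. The sketch below illustrates this. The method TarO uses to predict the molar coefficient is not stated here, so the estimate shown uses the widely used Pace et al. (1995) values for Trp, Tyr and cystine as a stand-in, and the function names are hypothetical.

```python
def estimate_molar_extinction(sequence, n_cystines=0):
    """Estimate the molar extinction coefficient at 280 nm (M^-1 cm^-1).

    Uses the Pace et al. (1995) values: 5500 per Trp, 1490 per Tyr and
    125 per cystine (disulphide). This is an assumed stand-in, not
    necessarily the method used by TarO. By default all cysteines are
    treated as reduced (n_cystines=0).
    """
    sequence = sequence.upper()
    return 5500 * sequence.count("W") + 1490 * sequence.count("Y") + 125 * n_cystines


def a280_1mg_per_ml(molar_extinction, molecular_weight):
    """Absorbance at 280 nm of a 1 mg/ml solution, 1 cm path length."""
    return molar_extinction / molecular_weight


epsilon = estimate_molar_extinction("MKWTYLLVAGHW")
print(epsilon, a280_1mg_per_ml(epsilon, 1400.0))
```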
|
Multiple Sequence Alignment (MSA) Information |
|
|
Clicking on the button to "View Multiple Sequence
Alignment Annotated with...." starts the Jalview applet,
displaying a window with the MSA, and a window entitled "Feature
Settings". The full Jalview
application can be started from within the applet for additional
functionality (click 'File'->'View in Full Application'). The MSA was
constructed with the MUSCLE algorithm, including sequences that have a BLAST match to the query sequence of 1E-20 or better. Sequences are excluded if their sequence length is more than 125% of the query sequence length. |
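The selection rules for the alignment can be sketched as a simple filter. The dictionary keys and function name below are hypothetical. The E-value cut-off (1E-20), the 125% length limit and the default maximum of 100 sequences come from the text, but keeping the best-scoring hits when the maximum is exceeded is an assumption.

```python
def select_for_msa(query_length, hits, max_evalue=1e-20,
                   max_length_ratio=1.25, max_sequences=100):
    """Choose hits to include in the multiple sequence alignment.

    Each hit is a dict with hypothetical keys 'id', 'evalue' and 'seqlen'.
    """
    kept = [
        hit for hit in hits
        if hit["evalue"] <= max_evalue
        and hit["seqlen"] <= max_length_ratio * query_length
    ]
    # Assumption: when too many hits pass, the lowest E-values are kept.
    kept.sort(key=lambda hit: hit["evalue"])
    return kept[:max_sequences]


hits = [
    {"id": "hit1", "evalue": 1e-45, "seqlen": 210},
    {"id": "hit2", "evalue": 1e-12, "seqlen": 200},  # E-value too weak
    {"id": "hit3", "evalue": 1e-60, "seqlen": 300},  # longer than 125% of query
    {"id": "hit4", "evalue": 1e-80, "seqlen": 195},
]
print([hit["id"] for hit in select_for_msa(200, hits)])
```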
|
|
There are currently 5 groups
of annotation on the MSA. Displaying all groups simultaneously can be
confusing, so we strongly suggest that you customise the display of
groups using the "Feature Settings" window (described above). The
groups are: |
|